Models & Providers

The BPAI MCP Server supports 7 AI providers and 50+ models. Each provider has its own strengths, pricing characteristics, and configuration requirements. This guide covers the model registry, multi-key parallel generation, and provider-specific settings.


list_models

List all available models for a specific provider (or all providers).

Parameters

Parameter Type Required Default Description
provider enum No All One of openai, claude, deepseek, grok, gemini, perplexity, kimi; omit to list every provider

Example

What models are available for OpenAI?
Show me all reasoning models across all providers
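Behind prompts like these, the MCP client calls the list_models tool. The sketch below shows what that call might look like on the wire. It assumes the standard MCP JSON-RPC 2.0 framing (method "tools/call" with the tool name and arguments in params). It does not cover the transport or the server's response format, which this guide does not document.

```python
import json


def build_tool_call(request_id: int, tool: str, arguments: dict) -> str:
    """Serialize an MCP `tools/call` request as a JSON-RPC 2.0 message."""
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    }
    return json.dumps(message)


# "What models are available for OpenAI?" -> restrict to one provider
openai_request = build_tool_call(1, "list_models", {"provider": "openai"})

# Omitting `provider` falls back to its default (all providers)
all_request = build_tool_call(2, "list_models", {})

print(openai_request)
```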

Provider Registry

OpenAI

Model Context Max Output Category Notes
gpt-5.2 400K 128K Flagship Latest, highest quality
gpt-5-mini 400K 128K Fast Cost-effective, good quality
gpt-5.1 400K 128K Flagship Previous flagship
o4-mini 200K 100K Reasoning Chain-of-thought reasoning
gpt-4.1 1M 32K Flagship Large context, reliable
gpt-4.1-mini 1M 32K Fast Budget option
gpt-4.1-nano 1M 32K Fast Fastest, lowest cost

Base URL: https://api.openai.com/v1

Anthropic (Claude)

Model Context Max Output Category Notes
claude-opus-4.5 200K 64K Flagship Best writing quality
claude-sonnet-4.5 200K 64K Flagship Balanced performance
claude-3.7-sonnet 200K 128K Flagship Extended output available
claude-3.5-sonnet 200K 8K Fast Previous generation
claude-3.5-haiku 200K 8K Fast Fastest Claude

Base URL: https://api.anthropic.com/v1

Google (Gemini)

Model Context Max Output Category Notes
gemini-3-pro-preview 1M+ 64K Flagship Google Search grounding
gemini-2.5-flash 1M 64K Fast Thinking model, high speed
gemini-2.5-pro 1M 64K Reasoning Deep analysis
gemini-2.0-flash 1M 8K Fast Previous generation fast

Base URL: https://generativelanguage.googleapis.com/v1beta

xAI (Grok)

Model Context Max Output Category Notes
grok-4-flagship 256K 32K Flagship Highest quality Grok
grok-4.1-fast 2M 32K Fast Massive 2M context
grok-3 131K 8K Flagship Previous generation
grok-3-mini 131K 8K Fast Budget option

Base URL: https://api.x.ai/v1

DeepSeek

Model Context Max Output Category Notes
deepseek-reasoner (V3.2) 128K 64K Reasoning Best reasoning model
deepseek-chat 128K 64K Flagship General purpose

Base URL: https://api.deepseek.com/v1

Kimi (Moonshot)

Model Context Max Output Category Notes
kimi-k2.5-thinking 256K 32K Reasoning Thinking mode (temp fixed at 1.0)
kimi-k2.5 256K 32K Flagship Latest general model
kimi-k2 131K 32K Flagship Previous generation

Base URL: https://api.moonshot.cn/v1

Perplexity

Model Context Max Output Category Notes
sonar-pro 127K 4K Research Web search + citations
sonar-reasoning-pro 127K 4K Research Web search + reasoning
sonar-deep-research 127K 4K Research Multi-query deep research
sonar 127K 4K Research Standard web search

Base URL: https://api.perplexity.ai
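The sketch below models the registry tables as data and filters them by provider and category. This is the kind of query behind "Show me all reasoning models across all providers." The ModelEntry type, the REGISTRY subset, and filter_registry are hypothetical illustrations, not the server's internal code. Only the model rows are copied from the tables above.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class ModelEntry:
    provider: str    # provider enum value, e.g. "openai"
    model: str
    context: str     # context window as written in the tables, e.g. "400K"
    max_output: str
    category: str    # Flagship, Fast, Reasoning, or Research


# Hypothetical in-memory subset of the registry, copied from the tables above
REGISTRY = [
    ModelEntry("openai", "gpt-5.2", "400K", "128K", "Flagship"),
    ModelEntry("openai", "o4-mini", "200K", "100K", "Reasoning"),
    ModelEntry("claude", "claude-opus-4.5", "200K", "64K", "Flagship"),
    ModelEntry("gemini", "gemini-2.5-pro", "1M", "64K", "Reasoning"),
    ModelEntry("grok", "grok-4.1-fast", "2M", "32K", "Fast"),
    ModelEntry("deepseek", "deepseek-reasoner", "128K", "64K", "Reasoning"),
    ModelEntry("kimi", "kimi-k2.5-thinking", "256K", "32K", "Reasoning"),
    ModelEntry("perplexity", "sonar-pro", "127K", "4K", "Research"),
]


def filter_registry(provider: Optional[str] = None,
                    category: Optional[str] = None) -> List[ModelEntry]:
    """Return entries matching the optional provider and category filters."""
    return [
        entry for entry in REGISTRY
        if (provider is None or entry.provider == provider)
        and (category is None or entry.category == category)
    ]


reasoning = [entry.model for entry in filter_registry(category="Reasoning")]
print(reasoning)
```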


get_settings

View current AI generation settings.

Parameters

None required.

Response

Returns configured providers, default models, temperatures, and token limits from your BPAI account.

Note: Settings are managed through the BPAI web dashboard. The MCP server reads these settings at startup and applies them to all generation requests.


Reasoning Model Handling

Models with internal reasoning phases (chain-of-thought) require special handling:

Temperature Omission

These models reject the temperature parameter and manage it internally:

  • O-series (o4-mini, o3, o1)
  • DeepSeek Reasoner
  • Kimi K2.5 Thinking (fixed at 1.0)
  • Gemini 2.5 series (thinking mode)

The MCP server automatically omits temperature for these models. If you set it, the server strips it silently.
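Here is a minimal sketch of that omission step. It assumes a simple prefix match on model IDs. The NO_TEMPERATURE_PREFIXES tuple and the prepare_params helper are hypothetical, since the server's actual matching rules are not documented here. Only the model names come from the list above.

```python
# Hypothetical prefix list built from the models named above; the real
# server may match models differently.
NO_TEMPERATURE_PREFIXES = (
    "o1", "o3", "o4-mini",      # O-series
    "deepseek-reasoner",
    "kimi-k2.5-thinking",
    "gemini-2.5",               # Gemini 2.5 series (thinking mode)
)


def prepare_params(model: str, params: dict) -> dict:
    """Return a copy of params with `temperature` silently dropped when needed."""
    cleaned = dict(params)  # never mutate the caller's dict
    if model.startswith(NO_TEMPERATURE_PREFIXES):
        cleaned.pop("temperature", None)
    return cleaned


print(prepare_params("o4-mini", {"temperature": 0.7, "max_tokens": 8000}))
```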

Minimum Output Tokens

Reasoning models need extra output budget for their internal thinking tokens:

Model Minimum Output Tokens
o4-mini 8,000
DeepSeek Reasoner 8,000
Gemini 2.5 Pro 4,000
Kimi K2.5 Thinking 4,000

The server enforces these minimums. If you request fewer tokens, the server raises the output limit to the minimum.
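The enforcement rule is a simple floor: take the larger of the requested limit and the model's minimum. The sketch below encodes the table above. The model IDs used as dictionary keys are assumptions about how the registry names these models, and effective_max_tokens is a hypothetical helper.

```python
# Minimums from the table above, keyed by assumed registry model IDs
MIN_OUTPUT_TOKENS = {
    "o4-mini": 8_000,
    "deepseek-reasoner": 8_000,
    "gemini-2.5-pro": 4_000,
    "kimi-k2.5-thinking": 4_000,
}


def effective_max_tokens(model: str, requested: int) -> int:
    """Raise `requested` to the model's minimum; models without one are unchanged."""
    return max(requested, MIN_OUTPUT_TOKENS.get(model, 0))


print(effective_max_tokens("o4-mini", 2_000))  # raised to the 8,000 minimum
```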

Parameter Self-Healing

If a model returns a 400 error for an unsupported parameter, the server automatically:

  1. Identifies the rejected parameter
  2. Strips it from the request
  3. Retries the request once

This keeps requests from failing when a provider stops accepting a parameter without notice.
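The three steps can be sketched as a wrapper around any send function. This assumes the provider's 400 error identifies the rejected parameter by name. UnsupportedParameterError, send_with_self_healing, and the fake provider are hypothetical stand-ins, because real providers report these errors in different formats.

```python
from typing import Callable


class UnsupportedParameterError(Exception):
    """Hypothetical error for an HTTP 400 that names the rejected parameter."""

    def __init__(self, param: str):
        super().__init__(f"400: unsupported parameter '{param}'")
        self.param = param


def send_with_self_healing(send: Callable[[dict], dict], params: dict) -> dict:
    try:
        return send(params)
    except UnsupportedParameterError as err:
        # 1. Identify the rejected parameter (carried on the error here)
        # 2. Strip it from the request
        healed = {k: v for k, v in params.items() if k != err.param}
        # 3. Retry exactly once; a second failure propagates to the caller
        return send(healed)


def fake_provider(params: dict) -> dict:
    """Stand-in provider that no longer accepts `top_k`."""
    if "top_k" in params:
        raise UnsupportedParameterError("top_k")
    return {"ok": True, "params": params}


result = send_with_self_healing(fake_provider, {"model": "gpt-5.2", "top_k": 40})
print(result)
```

Because the retry happens only once, a request that trips two unsupported parameters still fails on the second one, and the error reaches the caller rather than looping.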


Multi-Key Configuration

For high-volume generation, register multiple API keys per provider. The server distributes requests across keys using round-robin rotation.

How It Works

{
  "enabled": true,
  "openai": [
    { "key": "sk-key-1...", "label": "Main Account", "enabled": true },
    { "key": "sk-key-2...", "label": "Secondary", "enabled": true },
    { "key": "sk-key-3...", "label": "Team Account", "enabled": true }
  ]
}

With 3 keys, batch generation processes 3 articles simultaneously. A 100-article batch completes roughly 3× faster.
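Round-robin rotation over the configuration above can be sketched with itertools.cycle. The sketch reuses the JSON shape shown here, including its placeholder keys. It assumes that entries with "enabled": false are skipped and that a disabled top-level flag yields no multi-key pool. The enabled_keys and assign_round_robin helpers are hypothetical.

```python
import itertools
import json
from typing import List, Tuple

# Same shape as the configuration above (placeholder keys kept as-is)
CONFIG = json.loads("""
{
  "enabled": true,
  "openai": [
    { "key": "sk-key-1...", "label": "Main Account", "enabled": true },
    { "key": "sk-key-2...", "label": "Secondary", "enabled": true },
    { "key": "sk-key-3...", "label": "Team Account", "enabled": true }
  ]
}
""")


def enabled_keys(config: dict, provider: str) -> List[dict]:
    """Return the provider's enabled key entries, or [] if multi-key mode is off."""
    if not config.get("enabled", False):
        return []
    return [entry for entry in config.get(provider, []) if entry.get("enabled", False)]


def assign_round_robin(tasks: List[str], keys: List[dict]) -> List[Tuple[str, str]]:
    """Pair each task with a key label, cycling through the keys in order."""
    if not keys:
        raise ValueError("no enabled keys for this provider")
    rotation = itertools.cycle(keys)
    return [(task, next(rotation)["label"]) for task in tasks]


articles = [f"article-{i}" for i in range(1, 6)]
for article, label in assign_round_robin(articles, enabled_keys(CONFIG, "openai")):
    print(article, "->", label)
```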

Parallelism Calculation

Parallel threads = number of enabled keys for the selected provider
Batch time ≈ (total articles / parallel threads) × avg generation time
Estimated batch times, assuming 30 seconds of generation time per article:

Keys 50 Articles 200 Articles 1000 Articles
1 key 25 minutes 100 minutes 500 minutes
3 keys ~8 minutes ~33 minutes ~167 minutes
5 keys ~5 minutes ~20 minutes ~100 minutes
10 keys ~2.5 minutes ~10 minutes ~50 minutes
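The formula above reproduces these table values. The sketch below implements it, plus a stricter variant. The variant assumes each key handles one article at a time, so work proceeds in ceil(articles / threads) full rounds. For example, 50 articles on 3 keys take 17 rounds (8.5 minutes) rather than the idealized 8.3. Both function names are hypothetical.

```python
import math


def estimated_batch_minutes(total_articles: int, parallel_threads: int,
                            avg_seconds: float) -> float:
    """Formula from this guide: (articles / threads) * avg time, in minutes."""
    return total_articles / parallel_threads * avg_seconds / 60


def wave_batch_minutes(total_articles: int, parallel_threads: int,
                       avg_seconds: float) -> float:
    """Stricter estimate: the last, partially filled round still takes a full slot."""
    rounds = math.ceil(total_articles / parallel_threads)
    return rounds * avg_seconds / 60


# Regenerate the table rows at 30 seconds per article
for keys in (1, 3, 5, 10):
    row = [round(estimated_batch_minutes(n, keys, 30), 1) for n in (50, 200, 1000)]
    print(keys, "keys:", row)
```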

Configuration via MCP

Multi-key configuration is managed through the web dashboard (Settings → Multi-Key Config). The MCP server inherits these settings automatically.


Choosing the Right Model

Use Case Recommended Model Why
Highest writing quality claude-opus-4.5 Best at natural, engaging prose
Fastest generation gpt-5-mini Low latency, reliable output
Budget-friendly bulk gpt-4.1-nano Cheapest per token, decent quality
Research-heavy content sonar-pro + gpt-5.2 Perplexity researches, GPT writes
Technical/code content deepseek-reasoner Superior reasoning for technical topics
Massive context needed grok-4.1-fast 2M token context window
SEO-optimized content gpt-5.2 Strong data-driven SEO compliance

Next Steps