Models & Providers
The BPAI MCP Server supports 7 AI providers and 50+ models. Each provider has unique strengths, pricing characteristics, and configuration requirements. This guide covers the full model registry, multi-key parallel generation, and provider-specific settings.
list_models
List all available models for a specific provider (or all providers).
Parameters
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| provider | enum | ❌ | All | openai, claude, deepseek, grok, gemini, perplexity, kimi |
Example
What models are available for OpenAI?
Show me all reasoning models across all providers
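In most clients you simply type a prompt like the ones above and the client issues the tool call for you. As a rough sketch of what that call might look like on the wire, the snippet below builds a JSON-RPC `tools/call` message as used by MCP. The helper name `build_list_models_request` is hypothetical, and the argument shape assumes the single optional `provider` parameter from the table above.

```python
import json

# Provider names taken from the parameter table above.
PROVIDERS = {"openai", "claude", "deepseek", "grok", "gemini", "perplexity", "kimi"}


def build_list_models_request(provider=None, request_id=1):
    """Build a JSON-RPC 2.0 `tools/call` message for the list_models tool."""
    if provider is not None and provider not in PROVIDERS:
        raise ValueError(f"unknown provider: {provider!r}")
    # Omitting `provider` asks for every provider (the documented default).
    arguments = {} if provider is None else {"provider": provider}
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": "list_models", "arguments": arguments},
    }


print(json.dumps(build_list_models_request("openai"), indent=2))
```

Validating the provider name on the client side is optional; it only saves a round trip when the value is obviously wrong.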
Provider Registry
OpenAI
| Model | Context | Max Output | Category | Notes |
|---|---|---|---|---|
| gpt-5.2 | 400K | 128K | Flagship | Latest, highest quality |
| gpt-5-mini | 400K | 128K | Fast | Cost-effective, good quality |
| gpt-5.1 | 400K | 128K | Flagship | Previous flagship |
| o4-mini | 200K | 100K | Reasoning | Chain-of-thought reasoning |
| gpt-4.1 | 1M | 32K | Flagship | Large context, reliable |
| gpt-4.1-mini | 1M | 32K | Fast | Budget option |
| gpt-4.1-nano | 1M | 32K | Fast | Fastest, lowest cost |
Base URL: https://api.openai.com/v1
Anthropic (Claude)
| Model | Context | Max Output | Category | Notes |
|---|---|---|---|---|
| claude-opus-4.5 | 200K | 64K | Flagship | Best writing quality |
| claude-sonnet-4.5 | 200K | 64K | Flagship | Balanced performance |
| claude-3.7-sonnet | 200K | 128K | Flagship | Extended output available |
| claude-3.5-sonnet | 200K | 8K | Fast | Previous generation |
| claude-3.5-haiku | 200K | 8K | Fast | Fastest Claude |
Base URL: https://api.anthropic.com/v1
Google (Gemini)
| Model | Context | Max Output | Category | Notes |
|---|---|---|---|---|
| gemini-3-pro-preview | 1M+ | 64K | Flagship | Google Search grounding |
| gemini-2.5-flash | 1M | 64K | Fast | Thinking model, high speed |
| gemini-2.5-pro | 1M | 64K | Reasoning | Deep analysis |
| gemini-2.0-flash | 1M | 8K | Fast | Previous generation fast |
Base URL: https://generativelanguage.googleapis.com/v1beta
xAI (Grok)
| Model | Context | Max Output | Category | Notes |
|---|---|---|---|---|
| grok-4-flagship | 256K | 32K | Flagship | Highest quality Grok |
| grok-4.1-fast | 2M | 32K | Fast | Massive 2M context |
| grok-3 | 131K | 8K | Flagship | Previous generation |
| grok-3-mini | 131K | 8K | Fast | Budget option |
Base URL: https://api.x.ai/v1
DeepSeek
| Model | Context | Max Output | Category | Notes |
|---|---|---|---|---|
| deepseek-reasoner (V3.2) | 128K | 64K | Reasoning | Best reasoning model |
| deepseek-chat | 128K | 64K | Flagship | General purpose |
Base URL: https://api.deepseek.com/v1
Kimi (Moonshot)
| Model | Context | Max Output | Category | Notes |
|---|---|---|---|---|
| kimi-k2.5-thinking | 256K | 32K | Reasoning | Thinking mode (temp fixed at 1.0) |
| kimi-k2.5 | 256K | 32K | Flagship | Latest general model |
| kimi-k2 | 131K | 32K | Flagship | Previous generation |
Base URL: https://api.moonshot.cn/v1
Perplexity
| Model | Context | Max Output | Category | Notes |
|---|---|---|---|---|
| sonar-pro | 127K | 4K | Research | Web search + citations |
| sonar-reasoning-pro | 127K | 4K | Research | Web search + reasoning |
| sonar-deep-research | 127K | 4K | Research | Multi-query deep research |
| sonar | 127K | 4K | Research | Standard web search |
Base URL: https://api.perplexity.ai
get_settings
View current AI generation settings.
Parameters
None required.
Response
Returns configured providers, default models, temperatures, and token limits from your BPAI account.
Note: Settings are managed through the BPAI web dashboard. The MCP server reads these settings at startup and applies them to all generation requests.
Reasoning Model Handling
Models with internal reasoning phases (Chain-of-Thought) require special handling:
Temperature Omission
These models manage temperature internally and do not accept a caller-supplied value:
- O-series (o4-mini, o3, o1)
- DeepSeek Reasoner
- Kimi K2.5 Thinking (fixed at 1.0)
- Gemini 2.5 series (thinking mode)
The MCP server automatically omits temperature from requests to these models. If you supply a value, the server removes it without returning an error or warning.
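The stripping behavior described above could be sketched roughly as follows. This is an illustration, not the server's actual code: the function name `prepare_params` and the prefix list are assumptions derived from the model list in this section.

```python
# Assumed model-ID prefixes for models that manage temperature internally.
# The server's real list may differ.
TEMPERATURE_LOCKED_PREFIXES = (
    "o1",
    "o3",
    "o4-mini",
    "deepseek-reasoner",
    "kimi-k2.5-thinking",
    "gemini-2.5",
)


def prepare_params(model, params):
    """Return a copy of params with temperature removed for locked models."""
    cleaned = dict(params)  # copy so the caller's dict is left untouched
    if model.startswith(TEMPERATURE_LOCKED_PREFIXES):
        cleaned.pop("temperature", None)  # no error if it was never set
    return cleaned


print(prepare_params("o4-mini", {"temperature": 0.7, "max_tokens": 9000}))
print(prepare_params("gpt-5.2", {"temperature": 0.7}))
```

Prefix matching is used here so that dated or suffixed model IDs are also covered; an exact-match set would be stricter.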
Minimum Output Tokens
Reasoning models need extra output budget for their internal thinking tokens:
| Model | Minimum Output Tokens |
|---|---|
| o4-mini | 8,000 |
| DeepSeek Reasoner | 8,000 |
| Gemini 2.5 Pro | 4,000 |
| Kimi K2.5 Thinking | 4,000 |
The server enforces these minimums. If you request fewer output tokens, the server raises the limit to the minimum for that model.
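As a minimal sketch of this rule, assuming the minimums are keyed by the model IDs used in the registry tables (the names `MIN_OUTPUT_TOKENS` and `effective_max_tokens` are hypothetical):

```python
# Minimums copied from the table above, keyed by assumed model IDs.
MIN_OUTPUT_TOKENS = {
    "o4-mini": 8000,
    "deepseek-reasoner": 8000,
    "gemini-2.5-pro": 4000,
    "kimi-k2.5-thinking": 4000,
}


def effective_max_tokens(model, requested):
    """Raise the requested output limit to the model's minimum if needed."""
    # Models without an entry have no minimum, so the request passes through.
    return max(requested, MIN_OUTPUT_TOKENS.get(model, 0))


print(effective_max_tokens("o4-mini", 2000))   # raised to 8000
print(effective_max_tokens("o4-mini", 12000))  # already above the minimum
print(effective_max_tokens("gpt-5.2", 500))    # no minimum for this model
```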
Parameter Self-Healing
If a model returns a 400 error for an unsupported parameter, the server automatically:
- Identifies the rejected parameter
- Strips it from the request
- Retries the request once
This prevents failures when providers change their API without notice.
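The strip-and-retry steps above can be sketched like this. The exception class is a hypothetical stand-in for a provider's HTTP 400 response; real providers report the rejected parameter in different formats, so identifying it would need per-provider parsing that is not shown here.

```python
class UnsupportedParameterError(Exception):
    """Hypothetical stand-in for an HTTP 400 that names the rejected parameter."""

    def __init__(self, param):
        super().__init__(f"unsupported parameter: {param}")
        self.param = param


def call_with_self_healing(send, params):
    """Call send(params); if a parameter is rejected, strip it and retry once."""
    try:
        return send(params)
    except UnsupportedParameterError as err:
        if err.param not in params:
            raise  # nothing to strip, so surface the original error
        healed = {k: v for k, v in params.items() if k != err.param}
        return send(healed)  # single retry; a second failure propagates


def fake_send(params):
    # Simulated provider that rejects `temperature`.
    if "temperature" in params:
        raise UnsupportedParameterError("temperature")
    return {"ok": True, "sent": sorted(params)}


result = call_with_self_healing(fake_send, {"model": "o4-mini", "temperature": 0.7})
print(result)
```

Limiting the retry to one attempt keeps a persistently failing request from looping.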
Multi-Key Configuration
For high-volume generation, register multiple API keys per provider. The server distributes requests across keys using round-robin rotation.
How It Works
{
  "enabled": true,
  "openai": [
    { "key": "sk-key-1...", "label": "Main Account", "enabled": true },
    { "key": "sk-key-2...", "label": "Secondary", "enabled": true },
    { "key": "sk-key-3...", "label": "Team Account", "enabled": true }
  ]
}
With 3 keys, batch generation processes 3 articles simultaneously. A 100-article batch completes roughly 3× faster.
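To make the round-robin rotation concrete, here is a small sketch that cycles through the enabled keys of a configuration shaped like the one above. The function name `key_rotation` is hypothetical, and the server's actual scheduler may differ (for example, in how it handles rate limits or failed keys).

```python
from itertools import cycle

# Mirrors the example configuration; key values are placeholders.
config = {
    "enabled": True,
    "openai": [
        {"key": "sk-key-1...", "label": "Main Account", "enabled": True},
        {"key": "sk-key-2...", "label": "Secondary", "enabled": True},
        {"key": "sk-key-3...", "label": "Team Account", "enabled": True},
    ],
}


def key_rotation(config, provider):
    """Return an endless round-robin iterator over the provider's enabled keys."""
    if not config.get("enabled"):
        raise ValueError("multi-key mode is disabled")
    keys = [k for k in config.get(provider, []) if k.get("enabled")]
    if not keys:
        raise ValueError(f"no enabled keys for provider: {provider}")
    return cycle(keys)


rotation = key_rotation(config, "openai")
labels = [next(rotation)["label"] for _ in range(4)]
print(labels)  # the fourth request wraps around to the first key
```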
Parallelism Calculation
Parallel threads = number of enabled keys for the selected provider
Batch time ≈ (total articles / parallel threads) × avg generation time
Estimated batch times, assuming 30 seconds per article:
| Keys | 50 Articles | 200 Articles | 1000 Articles |
|---|---|---|---|
| 1 key | 25 minutes | 100 minutes | 500 minutes |
| 3 keys | ~8 minutes | ~33 minutes | ~167 minutes |
| 5 keys | ~5 minutes | ~20 minutes | ~100 minutes |
| 10 keys | ~2.5 minutes | ~10 minutes | ~50 minutes |
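The estimates in the table follow directly from the batch-time formula. A short sketch, using a hypothetical helper `estimate_batch_minutes` and the 30-second average from the table header, reproduces them:

```python
def estimate_batch_minutes(total_articles, enabled_keys, avg_seconds=30.0):
    """Batch time ≈ (total articles / parallel threads) × avg generation time."""
    if enabled_keys < 1:
        raise ValueError("at least one enabled key is required")
    seconds = (total_articles / enabled_keys) * avg_seconds
    return seconds / 60  # convert to minutes


for keys in (1, 3, 5, 10):
    row = [round(estimate_batch_minutes(n, keys), 1) for n in (50, 200, 1000)]
    print(keys, row)
```

This is an idealized estimate: it ignores rate limits, retries, and uneven article lengths, so real batches may take longer.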
Configuration via MCP
Multi-key configuration is managed through the web dashboard (Settings → Multi-Key Config). MCP inherits these settings automatically.
Choosing the Right Model
| Use Case | Recommended Model | Why |
|---|---|---|
| Highest writing quality | claude-opus-4.5 | Best at natural, engaging prose |
| Fastest generation | gpt-5-mini | Low latency, reliable output |
| Budget-friendly bulk | gpt-4.1-nano | Cheapest per token, decent quality |
| Research-heavy content | sonar-pro + gpt-5.2 | Perplexity researches, GPT writes |
| Technical/code content | deepseek-reasoner | Superior reasoning for technical topics |
| Massive context needed | grok-4.1-fast | 2M token context window |
| SEO-optimized content | gpt-5.2 | Strong data-driven SEO compliance |
Next Steps
- Advanced Usage: Multi-provider workflows and automation
- Content Generation: Apply these model settings to generation