Models & Providers

The BPAI MCP Server supports 7 AI providers and 50+ models. Each provider has unique strengths, pricing characteristics, and configuration requirements. This guide covers the full model registry, multi-key parallel generation, and provider-specific settings.

`list_models`

List all available models for a specific provider (or all providers).

Parameters

Parameter	Type	Required	Default	Description
`provider`	enum	❌	All	`openai`, `claude`, `deepseek`, `grok`, `gemini`, `perplexity`, `kimi`

Example

What models are available for OpenAI?

Show me all reasoning models across all providers
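
On the wire, an MCP client turns prompts like these into a `tools/call` request. The following is a minimal sketch of that payload, assuming the standard MCP JSON-RPC framing. The helper name `build_list_models_request` is hypothetical, and in practice your MCP client builds this request for you.

```python
import json

# Provider values accepted by the `provider` parameter (from the table above).
PROVIDERS = {"openai", "claude", "deepseek", "grok", "gemini", "perplexity", "kimi"}


def build_list_models_request(provider=None, request_id=1):
    """Build a JSON-RPC `tools/call` payload for the `list_models` tool.

    Hypothetical helper for illustration; omit `provider` to list all providers.
    """
    arguments = {}
    if provider is not None:
        if provider not in PROVIDERS:
            raise ValueError(f"unknown provider: {provider!r}")
        arguments["provider"] = provider
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": "list_models", "arguments": arguments},
    }


print(json.dumps(build_list_models_request("openai"), indent=2))
```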

Provider Registry

OpenAI

Model	Context	Max Output	Category	Notes
`gpt-5.2`	400K	128K	Flagship	Latest, highest quality
`gpt-5-mini`	400K	128K	Fast	Cost-effective, good quality
`gpt-5.1`	400K	128K	Flagship	Previous flagship
`o4-mini`	200K	100K	Reasoning	Chain-of-thought reasoning
`gpt-4.1`	1M	32K	Flagship	Large context, reliable
`gpt-4.1-mini`	1M	32K	Fast	Budget option
`gpt-4.1-nano`	1M	32K	Fast	Fastest, lowest cost

Base URL: https://api.openai.com/v1

Anthropic (Claude)

Model	Context	Max Output	Category	Notes
`claude-opus-4.5`	200K	64K	Flagship	Best writing quality
`claude-sonnet-4.5`	200K	64K	Flagship	Balanced performance
`claude-3.7-sonnet`	200K	128K	Flagship	Extended output available
`claude-3.5-sonnet`	200K	8K	Fast	Previous generation
`claude-3.5-haiku`	200K	8K	Fast	Fastest Claude

Base URL: https://api.anthropic.com/v1

Google (Gemini)

Model	Context	Max Output	Category	Notes
`gemini-3-pro-preview`	1M+	64K	Flagship	Google Search grounding
`gemini-2.5-flash`	1M	64K	Fast	Thinking model, high speed
`gemini-2.5-pro`	1M	64K	Reasoning	Deep analysis
`gemini-2.0-flash`	1M	8K	Fast	Previous generation fast

Base URL: https://generativelanguage.googleapis.com/v1beta

xAI (Grok)

Model	Context	Max Output	Category	Notes
`grok-4-flagship`	256K	32K	Flagship	Highest quality Grok
`grok-4.1-fast`	2M	32K	Fast	Massive 2M context
`grok-3`	131K	8K	Flagship	Previous generation
`grok-3-mini`	131K	8K	Fast	Budget option

Base URL: https://api.x.ai/v1

DeepSeek

Model	Context	Max Output	Category	Notes
`deepseek-reasoner` (V3.2)	128K	64K	Reasoning	Best reasoning model
`deepseek-chat`	128K	64K	Flagship	General purpose

Base URL: https://api.deepseek.com/v1

Kimi (Moonshot)

Model	Context	Max Output	Category	Notes
`kimi-k2.5-thinking`	256K	32K	Reasoning	Thinking mode (temp fixed at 1.0)
`kimi-k2.5`	256K	32K	Flagship	Latest general model
`kimi-k2`	131K	32K	Flagship	Previous generation

Base URL: https://api.moonshot.cn/v1

Perplexity

Model	Context	Max Output	Category	Notes
`sonar-pro`	127K	4K	Research	Web search + citations
`sonar-reasoning-pro`	127K	4K	Research	Web search + reasoning
`sonar-deep-research`	127K	4K	Research	Multi-query deep research
`sonar`	127K	4K	Research	Standard web search

Base URL: https://api.perplexity.ai

`get_settings`

View current AI generation settings.

Parameters

None required.

Response

Returns configured providers, default models, temperatures, and token limits from your BPAI account.

Note: Settings are managed through the BPAI web dashboard. The MCP server reads these settings at startup and applies them to all generation requests.

Reasoning Model Handling

Models with internal reasoning phases (Chain-of-Thought) require special handling:

Temperature Omission

These models either reject the temperature parameter outright or fix its value internally:

O-series (o4-mini, o3, o1)
DeepSeek Reasoner
Kimi K2.5 Thinking (fixed at 1.0)
Gemini 2.5 series (thinking mode)

The MCP server automatically omits temperature for these models. If you set it, the server strips it silently.
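
The stripping behavior can be pictured with a short sketch. The prefix list below is an assumption derived from the model families named above; the server's actual matching rules are not documented here, and `strip_temperature` is a hypothetical name.

```python
# Assumed prefixes for models that manage temperature themselves (illustrative only).
NO_TEMPERATURE_PREFIXES = (
    "o1",
    "o3",
    "o4-mini",
    "deepseek-reasoner",
    "kimi-k2.5-thinking",
    "gemini-2.5",
)


def strip_temperature(model, params):
    """Return a copy of params, without `temperature` for reasoning models."""
    if model.startswith(NO_TEMPERATURE_PREFIXES):
        return {k: v for k, v in params.items() if k != "temperature"}
    return dict(params)


print(strip_temperature("o4-mini", {"temperature": 0.7, "max_tokens": 9000}))
print(strip_temperature("gpt-5.2", {"temperature": 0.7}))
```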

Minimum Output Tokens

Reasoning models need extra output budget for their internal thinking tokens:

Model	Minimum Output Tokens
o4-mini	8,000
DeepSeek Reasoner	8,000
Gemini 2.5 Pro	4,000
Kimi K2.5 Thinking	4,000

The server enforces these minimums. If you request fewer output tokens, the server raises the value to the model's minimum.
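
As a rough illustration, the enforcement amounts to taking the larger of the requested value and the model's minimum. The mapping below pairs the minimums from the table with model IDs from the provider registry; the function name `effective_max_tokens` is hypothetical.

```python
# Minimum output tokens from the table above, keyed by registry model ID.
MIN_OUTPUT_TOKENS = {
    "o4-mini": 8000,
    "deepseek-reasoner": 8000,
    "gemini-2.5-pro": 4000,
    "kimi-k2.5-thinking": 4000,
}


def effective_max_tokens(model, requested):
    """Raise the requested output budget to the model's minimum if needed."""
    return max(requested, MIN_OUTPUT_TOKENS.get(model, 0))


print(effective_max_tokens("o4-mini", 2000))   # below the minimum, so raised
print(effective_max_tokens("o4-mini", 12000))  # above the minimum, so unchanged
```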

Parameter Self-Healing

If a model returns a 400 error for an unsupported parameter, the server automatically:

1. Identifies the rejected parameter
2. Strips it from the request
3. Retries the request once

This prevents failures when providers change their API without notice.
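
A minimal sketch of this strip-and-retry loop follows. It assumes the rejected parameter name can be read from the error; real providers format 400 responses differently, so `BadRequest`, `call_with_self_healing`, and `fake_send` are hypothetical stand-ins rather than the server's actual code.

```python
class BadRequest(Exception):
    """Stand-in for a provider's HTTP 400 response (hypothetical)."""

    def __init__(self, rejected_param):
        super().__init__(f"unsupported parameter: {rejected_param}")
        self.rejected_param = rejected_param


def call_with_self_healing(send, params):
    """Call send(params); on a 400, drop the rejected parameter and retry once."""
    try:
        return send(params)
    except BadRequest as err:
        if err.rejected_param not in params:
            raise  # nothing to strip, so a retry would fail the same way
        healed = {k: v for k, v in params.items() if k != err.rejected_param}
        return send(healed)  # a second failure propagates to the caller


def fake_send(params):
    # Simulated provider that rejects `temperature`.
    if "temperature" in params:
        raise BadRequest("temperature")
    return {"ok": True, "sent": sorted(params)}


result = call_with_self_healing(fake_send, {"model": "o4-mini", "temperature": 0.7})
print(result)
```

Retrying only once keeps a persistent provider-side error from turning into a retry loop.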

Multi-Key Configuration

For high-volume generation, register multiple API keys per provider. The server distributes requests across keys using round-robin rotation.

How It Works

{
  "enabled": true,
  "openai": [
    { "key": "sk-key-1...", "label": "Main Account", "enabled": true },
    { "key": "sk-key-2...", "label": "Secondary", "enabled": true },
    { "key": "sk-key-3...", "label": "Team Account", "enabled": true }
  ]
}

With 3 enabled keys, batch generation processes 3 articles simultaneously, so a 100-article batch completes roughly 3× faster than it would with a single key.
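
Round-robin rotation over the enabled keys can be sketched as follows. This is an illustration only: it treats the top-level `enabled` flag as a master switch, which is an assumption, and it disables the second key to show how skipping works. The function name `key_rotation` is hypothetical.

```python
from itertools import cycle

# Same shape as the config above; the second key is disabled for illustration.
config = {
    "enabled": True,
    "openai": [
        {"key": "sk-key-1...", "label": "Main Account", "enabled": True},
        {"key": "sk-key-2...", "label": "Secondary", "enabled": False},
        {"key": "sk-key-3...", "label": "Team Account", "enabled": True},
    ],
}


def key_rotation(config, provider):
    """Return an endless round-robin iterator over a provider's enabled keys."""
    enabled = [k for k in config.get(provider, []) if k["enabled"]]
    if not config.get("enabled") or not enabled:
        raise ValueError(f"no enabled keys for {provider}")
    return cycle(enabled)


rotation = key_rotation(config, "openai")
assigned = [next(rotation)["label"] for _ in range(4)]
print(assigned)
```

Each new request takes the next key in the cycle, so load spreads evenly across the enabled keys and disabled keys are never used.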

Parallelism Calculation

Parallel threads = number of enabled keys for the selected provider
Batch time ≈ (total articles / parallel threads) × avg generation time

Estimates assume an average of 30 seconds per article.

Keys	50 Articles	200 Articles	1000 Articles
1 key	25 minutes	100 minutes	500 minutes
3 keys	~8 minutes	~33 minutes	~167 minutes
5 keys	~5 minutes	~20 minutes	~100 minutes
10 keys	~2.5 minutes	~10 minutes	~50 minutes
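
The table values follow directly from the formula. A small sketch that reproduces them, assuming 30 seconds per article (the function name `batch_minutes` is hypothetical):

```python
def batch_minutes(total_articles, enabled_keys, avg_seconds=30):
    """Estimate batch time in minutes: (articles / threads) x avg generation time."""
    if enabled_keys < 1:
        raise ValueError("at least one enabled key is required")
    return (total_articles / enabled_keys) * avg_seconds / 60


# One row per key count, with columns for 50, 200, and 1000 articles.
for keys in (1, 3, 5, 10):
    print(keys, [round(batch_minutes(n, keys), 1) for n in (50, 200, 1000)])
```

This is an idealized estimate. It ignores provider rate limits, retries, and variation in generation time, so real batches may take longer.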

Configuration via MCP

Multi-key configuration is managed through the web dashboard (Settings → Multi-Key Config). MCP inherits these settings automatically.

Choosing the Right Model

Use Case	Recommended Model	Why
Highest writing quality	`claude-opus-4.5`	Best at natural, engaging prose
Fastest generation	`gpt-5-mini`	Low latency, reliable output
Budget-friendly bulk	`gpt-4.1-nano`	Cheapest per token, decent quality
Research-heavy content	`sonar-pro` + `gpt-5.2`	Perplexity researches, GPT writes
Technical/code content	`deepseek-reasoner`	Superior reasoning for technical topics
Massive context needed	`grok-4.1-fast`	2M token context window
SEO-optimized content	`gpt-5.2`	Strong data-driven SEO compliance

Next Steps

Advanced Usage: Multi-provider workflows and automation
Content Generation: Apply these model settings to generation