# Providers
orchex supports 5 LLM providers out of the box. You bring your own API keys (BYOK) — orchex routes requests to the provider you've configured, and you pay the provider directly.
## Supported Providers
| Provider | Models | Env Variable | Context Limit |
|---|---|---|---|
| Anthropic | Claude Sonnet 4, Claude Haiku | `ANTHROPIC_API_KEY` | 200,000 tokens |
| OpenAI | GPT-4o, GPT-4o-mini, o1, o3-mini | `OPENAI_API_KEY` | 128,000 tokens |
| Google | Gemini 2.5 Pro, Gemini 2.0 Flash | `GOOGLE_API_KEY` / `GEMINI_API_KEY` | 1,000,000 tokens |
| DeepSeek | DeepSeek V3, DeepSeek R1, Coder | `DEEPSEEK_API_KEY` | 128,000 tokens |
| Ollama | Any local model | No key (local) | Model-dependent |
## BYOK (Bring Your Own Key)
orchex never stores, proxies, or logs your API keys. When running locally (MCP mode), your key goes directly from your machine to the provider's API. On orchex cloud, BYOK keys are encrypted with AES-256-GCM and used only for the duration of the request.
### Setting Up
Export your provider's API key:
```bash
# Anthropic
export ANTHROPIC_API_KEY="sk-ant-..."

# OpenAI
export OPENAI_API_KEY="sk-..."

# Google Gemini
export GOOGLE_API_KEY="AI..."

# DeepSeek
export DEEPSEEK_API_KEY="sk-..."

# Ollama (no key needed — runs locally)
# Just ensure Ollama is running: ollama serve
```

### Auto-Detection
orchex auto-detects the provider from whichever API key is set. If multiple keys are set, it uses the first one found (in the order listed above). To force a specific provider:
```bash
export ORCHEX_PROVIDER="openai"
```

## Per-Stream Provider Selection
Each stream can target a specific provider via the `provider` field:
```typescript
orchex.init({
  feature: "mixed-providers",
  streams: {
    "core-logic": {
      name: "Core Business Logic",
      owns: ["src/core.ts"],
      // Uses default provider (Anthropic in this case)
      plan: "Implement the payment processing pipeline"
    },
    "documentation": {
      name: "API Documentation",
      owns: ["docs/api.md"],
      provider: "deepseek", // Use DeepSeek for docs (cheaper)
      plan: "Write comprehensive API documentation"
    },
    "data-processing": {
      name: "Data Pipeline",
      owns: ["src/pipeline.ts"],
      provider: "gemini", // Use Gemini for large context
      plan: "Process and transform the dataset schema"
    }
  }
});
```

## Cost Optimization Strategy
Route streams based on complexity and cost:
| Use Case | Recommended Provider | Why |
|---|---|---|
| Complex logic, refactoring | Anthropic (Claude Sonnet) | Best code quality |
| Documentation, comments | DeepSeek V3 | Very low cost ($0.001/1K) |
| Large file context (>100K tokens) | Google Gemini | 1M token context window |
| Quick type definitions | OpenAI GPT-4o-mini | Fast, cheap |
| Privacy-sensitive code | Ollama (local) | Never leaves your machine |
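The routing table above can be encoded as a small helper, so streams don't hard-code provider strings. This is an illustrative sketch, not part of the orchex API; the `routeProvider` function and its category names are hypothetical:

```typescript
// Hypothetical helper: pick a provider for a stream based on task category.
// Mirrors the cost-optimization table above; not an orchex API.
type Category = "complex-logic" | "docs" | "large-context" | "types" | "sensitive";

function routeProvider(category: Category): string {
  switch (category) {
    case "complex-logic": return "anthropic"; // best code quality
    case "docs":          return "deepseek";  // lowest cost
    case "large-context": return "gemini";    // 1M token window
    case "types":         return "openai";    // fast, cheap (GPT-4o-mini)
    case "sensitive":     return "ollama";    // never leaves the machine
  }
}
```

A stream definition could then use `provider: routeProvider("docs")` instead of a literal string, keeping the routing policy in one place.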
## Rate Limit Strategy
When you configure multiple provider API keys, orchex can distribute parallel streams across providers. If one provider rate-limits, streams on other providers continue executing.
```
Wave 2: [auth-api]    → Anthropic (rate limited... waiting)
        [billing-api] → OpenAI    (executing)
        [docs]        → DeepSeek  (executing)
```

This is particularly useful for large orchestrations with many parallel streams.
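The distribution idea can be pictured with a short sketch. This is hypothetical: it assumes simple round-robin assignment over whichever providers have keys configured, while orchex's actual scheduler may also weigh live rate-limit state:

```typescript
// Hypothetical sketch: spread parallel streams across configured providers
// round-robin, so one provider's rate limit doesn't stall every stream.
function assignProviders(
  streams: string[],
  providers: string[],
): Map<string, string> {
  const assignment = new Map<string, string>();
  streams.forEach((stream, i) => {
    // Cycle through the available providers.
    assignment.set(stream, providers[i % providers.length]);
  });
  return assignment;
}

const plan = assignProviders(
  ["auth-api", "billing-api", "docs"],
  ["anthropic", "openai", "deepseek"],
);
// auth-api → anthropic, billing-api → openai, docs → deepseek
```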
## Context Limits
Each provider has different context window sizes. orchex tracks context budget per stream and warns when approaching limits:
| Provider | Context Limit | Soft Warning | Hard Limit |
|---|---|---|---|
| Anthropic | 200,000 | 140,000 | 180,000 |
| OpenAI | 128,000 | 89,600 | 115,200 |
| Gemini | 1,000,000 | 700,000 | 900,000 |
| DeepSeek | 128,000 | 89,600 | 115,200 |
| Ollama | 128,000 | 89,600 | 115,200 |
Streams that exceed the soft limit generate a warning. Streams that exceed the hard limit may fail or produce truncated output. See Context Budgets for details.
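The soft and hard thresholds in the table are 70% and 90% of each provider's context limit. A sketch of that check follows; the `contextStatus` function name is illustrative, not an orchex API:

```typescript
// Illustrative: soft warning at 70% and hard limit at 90% of the context
// window, matching the table above (e.g. 128,000 → 89,600 soft / 115,200 hard).
function contextStatus(
  tokensUsed: number,
  contextLimit: number,
): "ok" | "warn" | "over" {
  const soft = contextLimit * 0.7;
  const hard = contextLimit * 0.9;
  if (tokensUsed > hard) return "over"; // may fail or truncate output
  if (tokensUsed > soft) return "warn"; // stream generates a warning
  return "ok";
}
```

With Anthropic's 200,000-token window, for example, 150,000 used tokens falls between the 140,000 soft and 180,000 hard thresholds and would warn.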
## Tier Limits on Providers
The number of distinct providers you can use in a single orchestration is subject to tier limits:
| Tier | Max Providers |
|---|---|
| Local (Free) | 1 |
| Pro | 2 |
| Team | 3 |
| Enterprise | Unlimited |
This limit counts distinct `provider` values across all streams. Streams without an explicit `provider` field use the default (auto-detected) provider and don't count toward additional provider slots.
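The counting rule can be sketched as follows. This is illustrative only: the `StreamDef` shape mirrors the `orchex.init` example above, and the `usedProviderSlots` helper is hypothetical:

```typescript
// Illustrative check of the tier rule: streams without an explicit `provider`
// all share the default provider's slot; explicit values each take a slot.
interface StreamDef {
  provider?: string;
}

function usedProviderSlots(
  streams: Record<string, StreamDef>,
  defaultProvider: string,
): number {
  const used = new Set<string>();
  for (const s of Object.values(streams)) {
    used.add(s.provider ?? defaultProvider);
  }
  return used.size;
}
```

Under this sketch, two implicit streams plus one `"deepseek"` stream use two slots, which would fit the Pro tier but not Local (Free).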
## Ollama (Local Models)
Ollama lets you run open-source models locally — no API key needed, no data leaves your machine.
### Setup
- Install Ollama: ollama.com
- Pull a model:
  ```bash
  ollama pull codellama
  ```
- Start the server:
  ```bash
  ollama serve
  ```
- Set the base URL:
  ```bash
  export OLLAMA_BASE_URL="http://localhost:11434"
  ```

### Considerations
- Speed — Local models are slower than cloud APIs
- Quality — Code generation quality varies by model
- Context — Most local models have smaller context windows
- Privacy — Your code never leaves your machine
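Once the server is running, a privacy-sensitive stream can target Ollama while other streams stay on cloud providers. This sketch reuses the `orchex.init` shape from earlier; treating `"ollama"` as a valid `provider` value is an assumption based on the supported-providers table:

```typescript
// Sketch: route a privacy-sensitive stream to the local Ollama model while
// a non-sensitive stream uses a cheap cloud provider.
orchex.init({
  feature: "local-sensitive",
  streams: {
    "secrets-handling": {
      name: "Secrets Handling",
      owns: ["src/secrets.ts"],
      provider: "ollama", // code never leaves the machine
      plan: "Refactor credential storage"
    },
    "docs": {
      name: "Docs",
      owns: ["docs/secrets.md"],
      provider: "deepseek", // non-sensitive, low cost
      plan: "Document the credential API"
    }
  }
});
```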
## Choosing a Provider
### For Getting Started
Use Anthropic (Claude Sonnet) — it offers the best overall code quality, and orchex is optimized for Claude's output format.
### For Cost-Sensitive Workloads
Use DeepSeek for documentation, comments, and simple transformations. Save Claude/GPT-4 for complex logic.
### For Large Codebases
Use Google Gemini when streams need to read many large files. The 1M token context window handles what other providers can't.
### For Enterprise/Privacy
Use Ollama to keep all code local. Combine with cloud providers for non-sensitive streams.
## Related
- Streams — Stream definitions, including the `provider` field
- Context Budgets — Provider-aware token limits
- Configure MCP — Setting up API keys
- Cloud Execution — BYOK on orchex cloud