Context Budgets

Every LLM has a context window limit — the maximum number of tokens it can process in a single request. orchex tracks context usage per stream and enforces budgets to prevent truncation, degraded output, or outright failures.

How Context Is Built

For each stream, orchex assembles a multi-layer context prompt:

┌─────────────────────────────────────┐
│ 1. Project Context                  │  File tree, dependencies, config
│    (~2,000-5,000 tokens)            │
├─────────────────────────────────────┤
│ 2. Stream Context                   │  Owned files (with line numbers)
│    (varies by file count/size)      │  + Read-only files
├─────────────────────────────────────┤
│ 3. Dependency Context               │  Completed artifact summaries
│    (~500-2,000 per dependency)      │  from upstream streams
├─────────────────────────────────────┤
│ 4. Instructions                     │  Artifact format rules,
│    (~1,000 tokens)                  │  ownership constraints, plan
└─────────────────────────────────────┘

The total token count across all four layers is the stream's context budget usage.
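The summation can be sketched as follows. This is illustrative, not orchex's actual estimator: the `ContextLayers` shape and the ~4 characters-per-token heuristic are assumptions.

```typescript
// Hypothetical sketch: estimate a stream's context usage by summing the four
// layers. The layer names and the 4-chars-per-token heuristic are assumptions.
interface ContextLayers {
  projectContext: string;    // file tree, dependencies, config
  streamContext: string;     // owned + read-only file contents
  dependencyContext: string; // upstream artifact summaries
  instructions: string;      // format rules, constraints, plan
}

const estimateTokens = (text: string): number => Math.ceil(text.length / 4);

function estimateContextUsage(layers: ContextLayers): number {
  return (
    estimateTokens(layers.projectContext) +
    estimateTokens(layers.streamContext) +
    estimateTokens(layers.dependencyContext) +
    estimateTokens(layers.instructions)
  );
}
```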

Provider-Aware Limits

Each LLM provider has a different context window. orchex sets soft and hard limits as percentages of the provider's capacity:

Provider     Context Window    Soft Limit (70%)    Hard Limit (90%)
Anthropic    200,000           140,000             180,000
OpenAI       128,000           89,600              115,200
Gemini       1,000,000         700,000             900,000
DeepSeek     128,000           89,600              115,200
Ollama       128,000           89,600              115,200
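Given a provider's window size, the soft and hard limits in the table follow directly from the 70%/90% percentages. A small sketch (the `CONTEXT_WINDOWS` map is written out here for illustration):

```typescript
// Sketch: derive soft/hard limits from a provider's context window using the
// 70% / 90% percentages from the table above.
const CONTEXT_WINDOWS: Record<string, number> = {
  anthropic: 200_000,
  openai: 128_000,
  gemini: 1_000_000,
  deepseek: 128_000,
  ollama: 128_000,
};

function providerLimits(provider: string): { soft: number; hard: number } {
  const window = CONTEXT_WINDOWS[provider];
  return { soft: Math.floor(window * 0.7), hard: Math.floor(window * 0.9) };
}
```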

Soft Limit

When a stream's estimated context exceeds the soft limit, orchex logs a warning but proceeds with execution:

{
  "event": "budget_warning",
  "streamId": "large-refactor",
  "violationType": "soft",
  "estimatedTokens": 156000,
  "budgetLimit": 140000,
  "provider": "anthropic"
}

The LLM may still produce correct output, but quality can degrade as context grows.

Hard Limit

When a stream exceeds the hard limit, orchex generates an error. The stream may fail or produce truncated output:

{
  "event": "budget_exceeded",
  "streamId": "monolith-stream",
  "violationType": "hard",
  "estimatedTokens": 195000,
  "budgetLimit": 180000,
  "provider": "anthropic"
}

Action: Split the stream into smaller sub-streams, reduce owned/read files, or switch to a provider with a larger context window (e.g., Gemini).
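The two events above share the same shape, and classifying an estimate against the limits is a simple comparison. A minimal sketch (the function and type names are illustrative, not orchex internals):

```typescript
// Sketch: classify an estimated token count against soft/hard limits,
// mirroring the budget_warning / budget_exceeded events above.
type ViolationType = "none" | "soft" | "hard";

function classifyBudget(
  estimatedTokens: number,
  softLimit: number,
  hardLimit: number
): ViolationType {
  if (estimatedTokens > hardLimit) return "hard";
  if (estimatedTokens > softLimit) return "soft";
  return "none";
}
```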

Configuring Budgets

You can configure per-stream context budgets in the stream definition:

"large-stream": {
  name: "Large Refactor",
  owns: ["src/core.ts"],
  reads: ["src/types.ts", "src/config.ts"],
  contextBudget: {
    softLimitTokens: 100000,
    hardLimitTokens: 150000,
    enforcementLevel: "warn",     // "warn" | "soft" | "hard"
    warningThreshold: 0.8         // Warn at 80% of soft limit
  }
}

Field              Default               Description
softLimitTokens    Provider-dependent    Warning threshold
hardLimitTokens    Provider-dependent    Failure threshold
enforcementLevel   "warn"                "warn" = log only, "soft" = warn + degrade, "hard" = fail
warningThreshold   0.8                   Fraction of soft limit that triggers a warning (0.0-1.0)
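One way to picture how these fields combine: missing fields fall back to provider-derived defaults, and the warning fires at `warningThreshold × softLimitTokens`. The fallback logic below is a sketch under that assumption, not orchex's actual resolution code.

```typescript
// Sketch: resolve a stream's effective budget, falling back to
// provider-derived defaults when no per-stream contextBudget is set.
// Field names follow the config above; the defaulting logic is an assumption.
interface ContextBudget {
  softLimitTokens?: number;
  hardLimitTokens?: number;
  enforcementLevel?: "warn" | "soft" | "hard";
  warningThreshold?: number;
}

function resolveBudget(
  budget: ContextBudget | undefined,
  providerSoft: number,
  providerHard: number
) {
  const soft = budget?.softLimitTokens ?? providerSoft;
  return {
    softLimitTokens: soft,
    hardLimitTokens: budget?.hardLimitTokens ?? providerHard,
    enforcementLevel: budget?.enforcementLevel ?? "warn",
    // The warning fires at this fraction of the soft limit.
    warnAtTokens: Math.floor(soft * (budget?.warningThreshold ?? 0.8)),
  };
}
```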

Reducing Context Usage

1. Split Large Streams

The most effective strategy. If a stream owns 6+ files, split it:

// Before: One stream, high context
"full-api": {
  owns: ["src/routes/users.ts", "src/routes/posts.ts",
         "src/routes/auth.ts", "src/routes/billing.ts",
         "tests/api.test.ts"],
  reads: ["src/types/api.ts", "src/config.ts"]
}

// After: Focused streams, manageable context
"api-users": {
  owns: ["src/routes/users.ts"],
  reads: ["src/types/api.ts"]
},
"api-posts": {
  owns: ["src/routes/posts.ts"],
  reads: ["src/types/api.ts"]
}

2. Minimize Read Files

Each file listed in reads adds its entire content to the context. Only include files the LLM actually needs:

// Bad: Reading entire config
reads: ["src/config.ts"]       // 500 lines of config

// Better: Read only the relevant section
reads: ["src/config/database.ts"]   // 50 lines

3. Use Dependency Context Instead of Reads

If a file was created by an upstream stream, its artifact summary is automatically included in downstream context. You don't need to add it to reads unless the LLM needs the full file content:

"api-routes": {
  deps: ["api-types"],
  // api-types' artifact summary is automatically included
  // Only add to reads if you need full file content
}

4. Choose the Right Provider

For streams with inherently large context (many files, long code), use a provider with a larger window:

"massive-refactor": {
  provider: "gemini",           // 1M token context
  owns: ["src/legacy/core.ts"],
  reads: ["src/legacy/types.ts", "src/legacy/utils.ts",
          "src/legacy/config.ts", "src/legacy/helpers.ts"]
}

Stream Category Recommendations

orchex categorizes streams by their file patterns and recommends max file counts:

Category     Max owns    Max reads    Notes
code         4           4            Implementation files
docs         6           3            Documentation pages
tutorial     3           4            Tutorial sections
test         4           5            Test files with imports
migration    3           4            Schema migrations

These are guidelines, not hard limits. Exceeding them increases timeout and quality risk.
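A guideline check based on the table above might look like this. Since the limits are advisory, the sketch returns warnings rather than failing (the function name is illustrative):

```typescript
// Sketch: check a stream's file counts against the category guidelines above.
// These are advisory, so the check returns warnings instead of throwing.
const RECOMMENDATIONS: Record<string, { maxOwns: number; maxReads: number }> = {
  code: { maxOwns: 4, maxReads: 4 },
  docs: { maxOwns: 6, maxReads: 3 },
  tutorial: { maxOwns: 3, maxReads: 4 },
  test: { maxOwns: 4, maxReads: 5 },
  migration: { maxOwns: 3, maxReads: 4 },
};

function checkStream(category: string, owns: string[], reads: string[]): string[] {
  const rec = RECOMMENDATIONS[category];
  const warnings: string[] = [];
  if (owns.length > rec.maxOwns)
    warnings.push(`owns ${owns.length} files (recommended max: ${rec.maxOwns})`);
  if (reads.length > rec.maxReads)
    warnings.push(`reads ${reads.length} files (recommended max: ${rec.maxReads})`);
  return warnings;
}
```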

Adaptive Learning

orchex tracks context budget usage across executions and adapts thresholds based on your project's history:

  • Per-category limits — Code streams vs. documentation vs. tutorials have different optimal budgets
  • Confidence levels — Low (0-49 samples), Medium (50-99), High (100+)
  • Persistent state — Saved in .orchex/learn/thresholds.json

As orchex accumulates execution history, its budget estimates become more accurate for your specific codebase and coding patterns.
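The confidence tiers map directly from sample count. A one-function sketch (the name is illustrative):

```typescript
// Sketch: map execution-history sample counts to the confidence levels
// described above: low (0-49), medium (50-99), high (100+).
type Confidence = "low" | "medium" | "high";

function confidenceLevel(samples: number): Confidence {
  if (samples >= 100) return "high";
  if (samples >= 50) return "medium";
  return "low";
}
```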

Monitoring Budget Usage

During `orchex learn`

The learn pipeline estimates token counts for each generated stream and warns about potential budget issues:

Stream "large-docs" estimated at 156,000 tokens (Anthropic soft limit: 140,000)
→ Consider splitting into focused sub-streams or using Gemini provider

During `orchex execute`

Budget usage is reported in real-time:

Wave 1: [auth-types] context: 12,400 tokens (6% of 200K)
        [api-types]  context: 18,200 tokens (9% of 200K)
Wave 2: [auth-api]   context: 45,600 tokens (23% of 200K)

In Telemetry

Context budget metrics are recorded in orchex's telemetry system:

  • contextTokensEstimated — Pre-execution estimate
  • contextTokensActual — Actual tokens used
  • contextBudgetUtilization — Ratio of actual to limit (0.0-1.0)
  • budgetViolationType — "none", "soft", or "hard"
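
The utilization metric is a ratio of actual usage to the limit. A sketch of one plausible computation, assuming the value is clamped to the documented 0.0-1.0 range:

```typescript
// Sketch: compute contextBudgetUtilization as the ratio of actual tokens to
// the budget limit, clamped to [0.0, 1.0]. Illustrative, not orchex internals.
function budgetUtilization(contextTokensActual: number, limit: number): number {
  return Math.min(contextTokensActual / limit, 1.0);
}
```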