Self-Healing

When a stream fails, orchex doesn't just retry blindly. It categorizes the error, generates a targeted fix stream with specific repair instructions, and retries with augmented context, for up to 3 total attempts (the original run plus 2 fixes).

How Self-Healing Works

Stream executes → Verify commands run → Failure detected
                                              ↓
                                    Error categorized (1 of 10 types)
                                              ↓
                                    Fix stream generated with:
                                      • Original plan
                                      • Error output
                                      • Category-specific fix instructions
                                              ↓
                                    Fix stream executes
                                              ↓
                                    Pass → Continue  |  Fail → Retry (up to 3 attempts total)

Step by Step

  1. Detection — A stream fails during execution or when its verify commands return non-zero exit codes
  2. Categorization — orchex analyzes the error output and classifies it into one of 10 error categories
  3. Fix Generation — A new fix stream is created with the original plan, the error details, and category-specific repair instructions
  4. Inheritance — The fix stream inherits the parent stream's file ownership, dependencies, and verify commands
  5. Execution — The fix stream runs with augmented context
  6. Chain Limit — If the fix fails, steps 2-5 repeat. After 3 total attempts, the stream is marked as permanently failed
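
The steps above can be sketched as a small retry loop. This is a minimal illustration, not orchex's actual implementation: the `StreamDef` and `RunResult` shapes and the injected `execute` and `categorize` callbacks are hypothetical names introduced here so the control flow can be shown on its own.

```typescript
// Hypothetical shapes for illustration; orchex's internals are not shown in this document.
type RunResult = { ok: true } | { ok: false; errorOutput: string };

interface StreamDef {
  id: string;
  plan: string;
  parentStreamId?: string;
}

const MAX_ATTEMPTS = 3; // the original run plus 2 fix streams

function runWithSelfHealing(
  original: StreamDef,
  execute: (stream: StreamDef) => RunResult, // runs the stream and its verify commands
  categorize: (errorOutput: string) => string, // maps error output to a category name
): { status: "complete" | "failed"; attempts: string[] } {
  const attempts: string[] = [];
  let current = original;

  for (let attempt = 1; attempt <= MAX_ATTEMPTS; attempt++) {
    attempts.push(current.id);
    const result = execute(current); // step 1: detection
    if (result.ok) return { status: "complete", attempts };
    if (attempt === MAX_ATTEMPTS) break; // step 6: chain limit reached

    const category = categorize(result.errorOutput); // step 2: categorization
    current = {
      // steps 3-4: a fix stream that links back to the stream that just failed
      id: `${original.id}_fix${attempt}`,
      parentStreamId: current.id,
      plan: [
        `ERROR CATEGORY: ${category}`,
        `ERROR OUTPUT:\n${result.errorOutput}`,
        `ORIGINAL PLAN:\n${original.plan}`,
      ].join("\n\n"),
    };
    // step 5: the next loop iteration executes the fix stream
  }
  return { status: "failed", attempts };
}
```

Passing `execute` and `categorize` in as callbacks keeps the sketch free of any real process or LLM calls, so the loop can be exercised with stubs.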

Error Categories

orchex recognizes 10 distinct error types, each with a targeted fix strategy:

Category | Example Error | Fix Strategy
TIMEOUT | Stream exceeded time limit | Retry with increased timeout, suggest splitting the stream
TEST_FAILURE | vitest or jest tests failed | Include test output, ask LLM to fix specific failing assertions
LINT_ERROR | ESLint or Prettier violations | Include lint output, ask LLM to fix specific rules
TYPE_ERROR | TypeScript tsc compilation failed | Include compiler errors with line numbers, fix type mismatches
BUILD_ERROR | Build step (npm run build) failed | Include build output, fix configuration or missing dependencies
RUNTIME_ERROR | Code threw an exception at runtime | Include stack trace, fix logic errors
SYNTAX_ERROR | Invalid JavaScript/TypeScript syntax | Include parser error, fix syntax
IMPORT_ERROR | Module not found / import resolution failure | Fix import paths, add missing dependencies
PERMISSION_ERROR | File ownership violation | Fix file access patterns to respect owns boundaries
UNKNOWN | Unrecognized error | General retry with full error context
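
To make the categorization step concrete, here is a rough sketch of a pattern-based classifier. The regular expressions and their ordering are assumptions chosen for illustration; the document does not describe how orchex actually matches error output, and a real classifier would likely need more signals than these.

```typescript
type ErrorCategory =
  | "TIMEOUT" | "TEST_FAILURE" | "LINT_ERROR" | "TYPE_ERROR" | "BUILD_ERROR"
  | "RUNTIME_ERROR" | "SYNTAX_ERROR" | "IMPORT_ERROR" | "PERMISSION_ERROR" | "UNKNOWN";

// Illustrative patterns only (assumed, not taken from orchex).
// Order matters: the first matching rule wins, so narrower rules come first.
const CATEGORY_RULES: Array<[ErrorCategory, RegExp]> = [
  ["TIMEOUT", /timed out|timeout/i],
  ["PERMISSION_ERROR", /ownership violation|EACCES/i],
  ["IMPORT_ERROR", /cannot find module|module not found/i],
  ["TYPE_ERROR", /error TS\d+/],
  ["SYNTAX_ERROR", /SyntaxError|unexpected token/i],
  ["LINT_ERROR", /eslint|prettier/i],
  ["TEST_FAILURE", /^\s*FAIL\s|✕/m],
  ["BUILD_ERROR", /build failed|npm ERR!/i],
  ["RUNTIME_ERROR", /\b(?:TypeError|ReferenceError|RangeError)\b|^\s+at .+:\d+:\d+/m],
];

function categorizeError(errorOutput: string): ErrorCategory {
  for (const [category, pattern] of CATEGORY_RULES) {
    if (pattern.test(errorOutput)) return category;
  }
  return "UNKNOWN"; // nothing matched: fall back to a general retry
}
```

Falling through to UNKNOWN mirrors the last row of the table: an unrecognized error still gets retried, only without category-specific instructions.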

Fix Stream Anatomy

When a test failure is detected, orchex generates a fix stream like this:

{
  id: "auth-middleware_fix1",
  name: "Fix: Auth Middleware (attempt 1)",
  owns: ["src/middleware/auth.ts"],        // Same as parent
  reads: ["src/types/auth.ts"],            // Same as parent
  parentStreamId: "auth-middleware",       // Links to parent
  plan: `
    The previous attempt to implement auth middleware failed.

    ERROR CATEGORY: TEST_FAILURE
    ERROR OUTPUT:
    FAIL tests/auth.test.ts
      ✕ returns 401 for expired tokens (12ms)
        Expected: 401
        Received: 500

    ORIGINAL PLAN:
    Create Express middleware that validates JWT tokens...

    FIX INSTRUCTIONS:
    The test expects a 401 response for expired tokens, but the middleware
    is returning 500. Wrap the jwt.verify() call in a try/catch and return
    res.status(401) when a TokenExpiredError is caught.
  `,
  verify: ["npx vitest run tests/auth.test.ts"]
}

Key properties of fix streams:

  • Same ownership — The fix stream owns the same files as the original
  • Same verify commands — It must pass the same checks
  • Augmented plan — Includes the error output and targeted fix instructions
  • Parent chain — parentStreamId links to the parent stream (the original stream, for the first fix) for tracking
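
These properties can be expressed as a small builder function. The sketch below is an assumption about how such a fix stream could be assembled from its parent; `buildFixStream` and the `Failure` shape are hypothetical names, and the field names simply follow the example object above.

```typescript
interface StreamDef {
  id: string;
  name: string;
  owns: string[];
  reads: string[];
  plan: string;
  verify: string[];
  parentStreamId?: string;
}

// Hypothetical container for what is known about the failed attempt.
interface Failure {
  category: string;
  errorOutput: string;
  instructions: string;
}

function buildFixStream(
  original: StreamDef,
  parentId: string, // the stream that just failed (the original or an earlier fix)
  fixNumber: number,
  failure: Failure,
): StreamDef {
  return {
    id: `${original.id}_fix${fixNumber}`,
    name: `Fix: ${original.name} (attempt ${fixNumber})`,
    owns: [...original.owns], // same ownership
    reads: [...original.reads],
    verify: [...original.verify], // same verify commands
    parentStreamId: parentId, // parent chain
    plan: [
      "The previous attempt failed.",
      "",
      `ERROR CATEGORY: ${failure.category}`,
      "ERROR OUTPUT:",
      failure.errorOutput,
      "",
      "ORIGINAL PLAN:",
      original.plan,
      "",
      "FIX INSTRUCTIONS:",
      failure.instructions,
    ].join("\n"), // augmented plan
  };
}
```

Copying `owns`, `reads`, and `verify` rather than sharing the arrays means a later change to the fix stream cannot accidentally modify the parent definition.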

Chain Limits

Fix attempts are limited to prevent infinite loops:

  • Maximum 3 total attempts (original + 2 fixes)
  • Chain tracking — parentStreamId links to the immediate parent, and orchex traverses the full chain to count attempts
  • Escalation — After 3 failures, the stream is marked as failed and requires manual intervention

auth-middleware (attempt 1) → FAIL
    ↓
auth-middleware_fix1 (attempt 2) → FAIL
    ↓
auth-middleware_fix2 (attempt 3) → FAIL
    ↓
Stream marked as FAILED (manual intervention needed)
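
Counting attempts by walking parentStreamId links could look roughly like the following. This is a sketch under assumptions: the `ChainNode` shape and the lookup table keyed by stream id are hypothetical, and only the traversal idea comes from the description above.

```typescript
interface ChainNode {
  id: string;
  parentStreamId?: string;
}

const MAX_ATTEMPTS = 3; // the original run plus 2 fixes

// Walks from the given stream back to the original, counting every stream on the way.
function countAttempts(streamId: string, streams: Record<string, ChainNode>): number {
  const seen: Record<string, boolean> = {}; // guards against accidental cycles
  let count = 0;
  let node: ChainNode | undefined = streams[streamId];
  while (node !== undefined && !seen[node.id]) {
    seen[node.id] = true;
    count++;
    node = node.parentStreamId !== undefined ? streams[node.parentStreamId] : undefined;
  }
  return count;
}

function canRetry(streamId: string, streams: Record<string, ChainNode>): boolean {
  return countAttempts(streamId, streams) < MAX_ATTEMPTS;
}
```

For the chain shown above, `countAttempts("auth-middleware_fix2", ...)` covers three streams, so `canRetry` returns false and the stream would be escalated.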

When to Intervene

Self-healing handles most issues automatically. You should step in when:

The Same Error Repeats

If the same error appears across all 3 attempts, the LLM most likely lacks the information it needs to fix it. Common causes:

  • Missing dependency not in reads
  • Incorrect assumption in the plan
  • API or library that doesn't exist

The Error is Architectural

Self-healing fixes code-level issues. It can't restructure your stream definitions, so it won't resolve:

  • Wrong stream decomposition
  • Missing streams
  • Incorrect dependency ordering

A Dependency is Missing

If a stream needs a file that hasn't been created yet and isn't in its reads or deps, self-healing can't add the dependency. Update the stream definition manually.

Verify Command Isolation

orchex runs verify commands per-stream, not globally. This means:

  • Stream A's verify failure doesn't block Stream B in the same wave
  • Each fix stream re-runs only its own verify commands
  • Cross-stream verify failures (e.g., a shared type check) are attributed to the stream that triggered them
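
Per-stream verification can be pictured as follows. This is a simplified sketch, not orchex's scheduler: `verifyWave` and the injected `runCommand` callback (assumed to return a process exit code) are hypothetical, and the example does not model how cross-stream failures are attributed.

```typescript
interface VerifiableStream {
  id: string;
  verify: string[];
}

type VerifyOutcome = "pass" | "fail";

function verifyWave(
  wave: VerifiableStream[],
  runCommand: (command: string) => number, // assumed to return the exit code
): Record<string, VerifyOutcome> {
  const outcomes: Record<string, VerifyOutcome> = {};
  for (const stream of wave) {
    // every() stops at the first failing command of this stream only;
    // other streams in the wave are still verified independently.
    const passed = stream.verify.every((command) => runCommand(command) === 0);
    outcomes[stream.id] = passed ? "pass" : "fail";
  }
  return outcomes;
}
```

Because each outcome is keyed by stream id, a failure recorded for Stream A leaves Stream B's result untouched.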

Tier Gating

Self-healing is available on the Pro tier and above. On the Local (Free) tier, failed streams are marked as failed without automatic fix attempts.

Tier | Self-Healing
Local (Free) | Not available
Pro | Up to 3 attempts
Team | Up to 3 attempts
Enterprise | Up to 3 attempts (configurable)
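
The table can be read as a mapping from tier to an attempt limit. The function below is only an illustration: the tier identifiers and the `enterpriseOverride` parameter are hypothetical, since this document does not say how the Enterprise limit is configured.

```typescript
type Tier = "local" | "pro" | "team" | "enterprise";

// Returns the total number of attempts allowed, counting the original run.
function maxAttemptsForTier(tier: Tier, enterpriseOverride?: number): number {
  switch (tier) {
    case "local":
      return 1; // no self-healing: the original run only
    case "pro":
    case "team":
      return 3;
    case "enterprise":
      return enterpriseOverride ?? 3; // configurable; defaults to 3 (override is hypothetical)
  }
}
```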

Monitoring Self-Healing

During execution, orchex reports self-healing activity:

Wave 2: [auth-middleware] FAILED — TEST_FAILURE
         → Generating fix stream (attempt 2/3)
         → [auth-middleware_fix1] executing...
         → [auth-middleware_fix1] PASS

After execution, orchex.status() shows the self-healing chain:

{
  "streams": {
    "auth-middleware": {
      "status": "failed",
      "error": "TEST_FAILURE: Expected 401, received 500"
    },
    "auth-middleware_fix1": {
      "status": "complete",
      "parentStreamId": "auth-middleware"
    }
  }
}
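
A status report like the one above can be turned back into an ordered chain by following parentStreamId links forward from the original stream. The helper below is a sketch that assumes the report has exactly the shape shown; `healingChain` and the type names are hypothetical and not part of the orchex API.

```typescript
interface StreamStatus {
  status: string;
  error?: string;
  parentStreamId?: string;
}

interface StatusReport {
  streams: Record<string, StreamStatus>;
}

// Returns the stream ids in attempt order, starting from the original stream.
function healingChain(report: StatusReport, rootId: string): string[] {
  const chain: string[] = [rootId];
  let tail = rootId;
  let advanced = true;
  while (advanced) {
    advanced = false;
    for (const id of Object.keys(report.streams)) {
      // The next link is the stream whose parent is the current tail.
      if (report.streams[id]?.parentStreamId === tail && chain.indexOf(id) === -1) {
        chain.push(id);
        tail = id;
        advanced = true;
        break;
      }
    }
  }
  return chain;
}

// The status of the last stream in the chain is the overall outcome.
function finalStatus(report: StatusReport, rootId: string): string | undefined {
  const chain = healingChain(report, rootId);
  const lastId = chain[chain.length - 1];
  return lastId !== undefined ? report.streams[lastId]?.status : undefined;
}
```

For the report above, the chain for "auth-middleware" has two entries and the final status is "complete", even though the original stream itself is recorded as failed.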