Tutorial: Self-Healing in Action

Learn how Orchex automatically recovers from failures by generating fix streams. When a stream fails, Orchex analyzes the error, generates a targeted fix, and retries — all without manual intervention.

Time: 15-20 minutes
Prerequisites: Completed First Orchestration Tutorial


Introduction

Why Self-Healing?

AI agents aren't perfect. They might:

  • Write code with syntax errors
  • Miss edge cases that fail tests
  • Produce output that doesn't match expectations

Without self-healing, you'd need to manually:

  1. Read the error
  2. Understand what went wrong
  3. Modify the prompt
  4. Re-run the stream

Self-healing automates this loop.

How It Works

Stream fails → Error analyzed → Fix stream generated → Automatic retry
  1. Error Analysis: Orchex categorizes the error (test failure, lint error, timeout, etc.)
  2. Fix Generation: A new stream is created with the original plan + error context
  3. Retry: The fix stream executes, with knowledge of what went wrong
  4. Limits: Maximum 3 attempts to prevent infinite loops
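
The loop described above can be sketched in a few lines of TypeScript. This is only an illustration under stated assumptions: `Outcome`, `RunStream`, and `runWithSelfHealing` are hypothetical names, not Orchex's actual API, and the fix plan is reduced to the original plan plus the error text.

```typescript
// Hypothetical types and names for illustration; not Orchex's actual API.
type Outcome = { ok: true } | { ok: false; error: string };
type RunStream = (plan: string, attempt: number) => Outcome;

const MAX_ATTEMPTS = 3; // original + 2 fixes, matching the limit described above

function runWithSelfHealing(
  originalPlan: string,
  run: RunStream
): { attempts: number; succeeded: boolean } {
  let plan = originalPlan;
  for (let attempt = 1; attempt <= MAX_ATTEMPTS; attempt++) {
    const outcome = run(plan, attempt);
    if (outcome.ok) return { attempts: attempt, succeeded: true };
    // Fix generation: the next attempt sees the original plan plus error context.
    plan = `PREVIOUS ERROR:\n${outcome.error}\n\nORIGINAL TASK:\n${originalPlan}`;
  }
  return { attempts: MAX_ATTEMPTS, succeeded: false };
}

// A fake runner that fails on the first attempt and succeeds on the second.
const result = runWithSelfHealing("Create tests", (_plan, attempt) =>
  attempt < 2 ? { ok: false, error: "Test failed" } : { ok: true }
);
console.log(result); // { attempts: 2, succeeded: true }
```

Passing the runner in as a function keeps the sketch independent of how streams are actually executed.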

Step 1: Create a Scenario That Might Fail

Let's create an orchestration with verification that might initially fail:

# self-healing-demo.yaml
# Demonstrates self-healing when tests fail

streams:
  - id: create-calculator
    prompt: |
      Create `src/calculator.ts` with a Calculator class:

      Requirements:
      - add(a, b): returns sum
      - subtract(a, b): returns difference
      - multiply(a, b): returns product
      - divide(a, b): returns quotient, throws on divide by zero

      Export the class as default.
    dependencies: []
    verify:
      - npm run build

  - id: create-tests
    prompt: |
      Create `tests/calculator.test.ts` with comprehensive tests:

      Test cases:
      - add: 2+3=5, -1+1=0, 0+0=0
      - subtract: 5-3=2, 1-1=0
      - multiply: 3*4=12, 0*5=0, -2*3=-6
      - divide: 10/2=5, 9/3=3, division by zero throws

      Use vitest syntax.
    dependencies: [create-calculator]
    verify:
      - npm test

Step 2: Understanding Verification

The verify field runs commands after a stream completes:

verify:
  - npm run build    # Check TypeScript compiles
  - npm test         # Run tests

If any verification command fails, the stream is marked as failed, triggering self-healing.
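
As a rough mental model, verification can be pictured as running each command in order and treating a non-zero exit code as a failure. The sketch below assumes exactly that, and also assumes that checking stops at the first failing command. `Exec` and `runVerification` are hypothetical names; the command runner is injected so the example stays self-contained.

```typescript
// Hypothetical helper: `exec` stands in for whatever runs a shell command
// and returns its exit code. This is not Orchex's actual API.
type Exec = (command: string) => number;

function runVerification(
  commands: string[],
  exec: Exec
): { passed: boolean; failedCommand?: string } {
  for (const command of commands) {
    if (exec(command) !== 0) {
      // Assumption: stop at the first failing command.
      return { passed: false, failedCommand: command };
    }
  }
  return { passed: true };
}

// Fake runner: the build succeeds, the tests fail.
const fakeExec: Exec = (command) => (command === "npm test" ? 1 : 0);
console.log(runVerification(["npm run build", "npm test"], fakeExec));
// { passed: false, failedCommand: "npm test" }
```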

Step 3: Execute and Watch Self-Healing

Initialize and execute:

Initialize orchestration from self-healing-demo.yaml
Execute the orchestration

Scenario A: Everything Works

If the AI gets it right the first time:

⚡ Wave 1/2 (1 stream)
└─ create-calculator ... ✓ complete

⚡ Wave 2/2 (1 stream)
└─ create-tests ... ✓ complete

✅ Orchestration complete!
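
The two waves in this output follow from the dependencies field: create-tests depends on create-calculator, so it has to wait for the next wave. One plausible way to derive waves from dependencies is sketched below. `StreamDef` and `computeWaves` are hypothetical names, and the grouping rule (a stream runs in the first wave after all of its dependencies have completed) is an assumption about the scheduler, not a documented algorithm.

```typescript
// Hypothetical sketch; the real scheduler may group streams differently.
type StreamDef = { id: string; dependencies: string[] };

function computeWaves(streams: StreamDef[]): string[][] {
  const done = new Set<string>();
  const waves: string[][] = [];
  let remaining = [...streams];
  while (remaining.length > 0) {
    // A stream is ready once every dependency finished in an earlier wave.
    const ready = remaining.filter((s) => s.dependencies.every((d) => done.has(d)));
    if (ready.length === 0) throw new Error("Circular or missing dependency");
    waves.push(ready.map((s) => s.id));
    ready.forEach((s) => done.add(s.id));
    remaining = remaining.filter((s) => !done.has(s.id));
  }
  return waves;
}

const demoWaves = computeWaves([
  { id: "create-calculator", dependencies: [] },
  { id: "create-tests", dependencies: ["create-calculator"] },
]);
console.log(demoWaves); // [["create-calculator"], ["create-tests"]]
```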

Scenario B: Test Fails → Self-Healing Kicks In

More realistically, the first attempt might fail:

⚡ Wave 1/2 (1 stream)
└─ create-calculator ... ✓ complete

⚡ Wave 2/2 (1 stream)
└─ create-tests ... ✗ failed

📊 Error Analysis:
   Category: test_failure
   Retryable: yes
   Suggestion: Review test assertions against implementation

🔧 Generating fix stream: create-tests-fix-2

⚡ Wave 2/2 (1 stream - retry)
└─ create-tests-fix-2 ... ✓ complete

✅ Orchestration complete (with 1 fix)

Understanding Fix Streams

What's in a Fix Stream?

When create-tests fails, Orchex generates create-tests-fix-2:

streams:
  create-tests-fix-2:
    name: "Create Tests (Fix #2)"
    parentStreamId: create-tests
    plan: |
      PREVIOUS ERROR:
      Test failed: Expected divide(10, 0) to throw, but it returned Infinity

      ORIGINAL TASK:
      Create tests/calculator.test.ts with comprehensive tests...

      FIX INSTRUCTIONS:
      1. Review the test assertions for the divide function
      2. Ensure the test correctly expects an exception for divide by zero
      3. Check that the calculator implementation actually throws
      4. Make sure the test syntax for expecting throws is correct

      SUGGESTION:
      Review test assertions against implementation

The fix stream has:

  • Error context: What went wrong
  • Original task: The full original prompt
  • Fix instructions: Guidance based on error analysis
  • Parent tracking: parentStreamId prevents infinite fix chains
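
To make the structure of the plan concrete, here is a minimal sketch that assembles the four sections shown in the example. `FixContext` and `buildFixPlan` are hypothetical names, and the exact formatting Orchex uses may differ.

```typescript
// Hypothetical sketch of assembling a fix plan; not Orchex's actual implementation.
type FixContext = {
  error: string;          // what went wrong
  originalPlan: string;   // the full original prompt
  instructions: string[]; // guidance derived from error analysis
  suggestion: string;     // category-level suggestion
};

function buildFixPlan(ctx: FixContext): string {
  const steps = ctx.instructions.map((step, i) => `${i + 1}. ${step}`).join("\n");
  return [
    `PREVIOUS ERROR:\n${ctx.error}`,
    `ORIGINAL TASK:\n${ctx.originalPlan}`,
    `FIX INSTRUCTIONS:\n${steps}`,
    `SUGGESTION:\n${ctx.suggestion}`,
  ].join("\n\n");
}

const fixPlan = buildFixPlan({
  error: "Test failed: Expected divide(10, 0) to throw, but it returned Infinity",
  originalPlan: "Create tests/calculator.test.ts with comprehensive tests...",
  instructions: ["Review the test assertions for the divide function"],
  suggestion: "Review test assertions against implementation",
});
console.log(fixPlan);
```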

Error Categories

Orchex categorizes errors to provide targeted suggestions:

Category          Example                  Suggestion
test_failure      Tests don't pass         Review test assertions
lint_error        ESLint/TSLint fails      Fix code style issues
edit_mismatch     File edit failed         Verify file contents match
timeout           Execution took too long  Simplify task or increase timeout
runtime_error     Code throws at runtime   Debug runtime exception
invalid_artifact  Output format wrong      Follow artifact schema

Non-Retryable Errors

Some errors aren't retried:

Category                  Why Not Retryable
environment               Missing dependencies, wrong Node version
timeout (infrastructure)  Network/API timeout, not code issue
rate_limit                API rate limited, need to wait

For these, Orchex reports the error and stops.
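
The two tables above can be read as a lookup: a category either maps to a suggestion and a retry, or it stops the stream. The sketch below encodes them that way. The category names and suggestions are taken from the tables, while `infra_timeout` is a made-up key for the "timeout (infrastructure)" row and `nextAction` is a hypothetical helper.

```typescript
// Category names come from the tables above. `infra_timeout` is a made-up key
// for the "timeout (infrastructure)" row; `nextAction` is a hypothetical helper.
type RetryableCategory =
  | "test_failure" | "lint_error" | "edit_mismatch"
  | "timeout" | "runtime_error" | "invalid_artifact";
type NonRetryableCategory = "environment" | "infra_timeout" | "rate_limit";
type ErrorCategory = RetryableCategory | NonRetryableCategory;

const SUGGESTIONS: Record<RetryableCategory, string> = {
  test_failure: "Review test assertions",
  lint_error: "Fix code style issues",
  edit_mismatch: "Verify file contents match",
  timeout: "Simplify task or increase timeout",
  runtime_error: "Debug runtime exception",
  invalid_artifact: "Follow artifact schema",
};

const NON_RETRYABLE = new Set<ErrorCategory>(["environment", "infra_timeout", "rate_limit"]);

function nextAction(category: ErrorCategory): { retry: boolean; suggestion?: string } {
  if (NON_RETRYABLE.has(category)) return { retry: false }; // report the error and stop
  return { retry: true, suggestion: SUGGESTIONS[category as RetryableCategory] };
}

console.log(nextAction("test_failure")); // { retry: true, suggestion: "Review test assertions" }
console.log(nextAction("rate_limit"));   // { retry: false }
```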


Step 4: Examining the Self-Healing Process

Check the Manifest

After self-healing, examine .orchex/active/manifest.yaml:

streams:
  create-calculator:
    status: complete
    appliedAt: "2026-02-05T10:00:00Z"

  create-tests:
    status: failed
    error: "Test failed: Expected divide(10, 0) to throw"
    attempts: 1

  create-tests-fix-2:
    name: "Create Tests (Fix #2)"
    parentStreamId: create-tests
    status: complete
    attempts: 1
    appliedAt: "2026-02-05T10:02:30Z"

Why `-fix-2` (not `-fix-1`)?

The attempt counter increments when execution starts, so the original stream runs as attempt 1 and its first fix stream runs as attempt 2. The suffix matches that attempt number, which keeps attempt numbers unique across the original stream and all of its fixes.


Step 5: Configuring Self-Healing Behavior

Per-Stream Timeout

For streams that need more time:

streams:
  - id: complex-generation
    prompt: Generate comprehensive documentation
    timeoutMs: 900000  # 15 minutes instead of default 10
    dependencies: []
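
Since timeoutMs is given in milliseconds, it is easy to get the number of zeros wrong. A tiny conversion helper (hypothetical, shown only to check the arithmetic) makes the values easier to verify:

```typescript
// Hypothetical helper for checking timeoutMs values; not part of Orchex.
const minutesToMs = (minutes: number): number => minutes * 60 * 1000;

console.log(minutesToMs(15)); // 900000 (the value used above)
console.log(minutesToMs(10)); // 600000 (the default mentioned above)
```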

Maximum Attempts

Self-healing has a built-in limit of 3 attempts (original + 2 fixes). After that, the stream remains failed.

Attempt 1: Original stream
Attempt 2: First fix stream (-fix-2)
Attempt 3: Second fix stream (-fix-3)
No attempt 4: ✗ Max attempts reached, stream stays failed

Preventing Fix Chains

Fix streams track their parent to prevent cascading fixes:

original-stream:
  # fails

original-stream-fix-2:
  parentStreamId: original-stream
  # fails again

original-stream-fix-3:
  parentStreamId: original-stream  # Same parent!
  # This is attempt 3

If a fix stream generates another fix, it still counts toward the original's attempt limit.
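
The naming and parent-tracking rules can be sketched together. The example below assumes the fix ID is built from the original stream's ID plus the next attempt number, as in the names shown above. `ManifestStream` and `nextFix` are hypothetical names, not Orchex's actual API.

```typescript
// Hypothetical sketch of fix-stream naming and parent tracking.
type ManifestStream = { id: string; parentStreamId?: string };

const MAX_ATTEMPTS = 3; // original + 2 fixes

function nextFix(failed: ManifestStream, failedAttempt: number): ManifestStream | null {
  if (failedAttempt >= MAX_ATTEMPTS) return null; // limit reached, stream stays failed
  // A failed fix stream still points at the original, so chains never nest.
  const root = failed.parentStreamId ?? failed.id;
  return { id: `${root}-fix-${failedAttempt + 1}`, parentStreamId: root };
}

const fix2 = nextFix({ id: "original-stream" }, 1);
console.log(fix2); // { id: "original-stream-fix-2", parentStreamId: "original-stream" }
const fix3 = fix2 ? nextFix(fix2, 2) : null;
console.log(fix3); // { id: "original-stream-fix-3", parentStreamId: "original-stream" }
console.log(fix3 ? nextFix(fix3, 3) : null); // null
```

Resolving the root before building the ID is what keeps the second fix from being named original-stream-fix-2-fix-3.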


Real-World Example: API Integration

Here's a more realistic scenario where self-healing shines:

# api-integration.yaml
streams:
  - id: define-types
    prompt: |
      Create src/api/types.ts with TypeScript interfaces for a REST API:
      - User, Post, Comment types
      - Request/Response types
      - Error types
    dependencies: []
    verify:
      - npm run build

  - id: implement-client
    prompt: |
      Create src/api/client.ts implementing the API client:
      - fetchUsers(): Promise<User[]>
      - fetchUser(id): Promise<User>
      - createPost(data): Promise<Post>
      - Error handling with proper types
    dependencies: [define-types]
    verify:
      - npm run build
      - npm test -- --testPathPattern=client

  - id: implement-hooks
    prompt: |
      Create src/hooks/useApi.ts with React hooks:
      - useUsers(): { users, loading, error }
      - useUser(id): { user, loading, error }
      - useCreatePost(): { createPost, loading, error }
      Use the API client from src/api/client.ts
    dependencies: [implement-client]
    verify:
      - npm run build
      - npm test -- --testPathPattern=hooks

Common failure points:

  1. Types don't match between files → Self-healing adds type fixes
  2. Tests expect different behavior → Self-healing adjusts implementation
  3. Build fails on type errors → Self-healing fixes type issues

Debugging Self-Healing

View Fix Stream Details

Check what Orchex learned from the failure:

grep -A 20 "fix-2" .orchex/active/manifest.yaml

Check Error Analysis

The error category determines the fix approach:

create-tests:
  error: |
    FAIL tests/calculator.test.ts
    ✕ divide by zero should throw (5 ms)
      Expected: [Error: Division by zero]
      Received: Infinity

This is categorized as test_failure, with the suggestion to review assertions.
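
The "Received: Infinity" line is what plain division by zero produces in JavaScript and TypeScript: the expression evaluates to Infinity instead of throwing. The sketch below contrasts an unguarded division with a guarded one. Both functions are illustrative, not the code an agent would necessarily generate; the error message is taken from the test output above.

```typescript
// Plain division does not throw on a zero divisor.
const unguardedDivide = (a: number, b: number): number => a / b;

// Illustrative guarded version matching "throws on divide by zero".
function divide(a: number, b: number): number {
  if (b === 0) throw new Error("Division by zero");
  return a / b;
}

console.log(unguardedDivide(10, 0)); // Infinity
console.log(divide(10, 2));          // 5
try {
  divide(10, 0);
} catch (err) {
  console.log((err as Error).message); // Division by zero
}
```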

Examine the Generated Plan

Compare original vs fix stream plans:

Original:

Create tests/calculator.test.ts with comprehensive tests...

Fix:

PREVIOUS ERROR:
Expected divide(10, 0) to throw, but it returned Infinity

ORIGINAL TASK:
Create tests/calculator.test.ts...

FIX INSTRUCTIONS:
1. Review the test assertions for the divide function
2. Ensure the calculator implementation throws on divide by zero
3. If implementation doesn't throw, the test expectation is wrong

Best Practices

1. Write Good Verification Commands

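# Good: Specific, informative failures
# (Avoid piping into head or tail: without pipefail, a pipeline reports the
# last command's exit status, so a failed build could pass verification.)
verify:
  - npm run build 2>&1
  - npm test -- --reporter=verbose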

# Bad: Silent failures
verify:
  - npm run build > /dev/null
  - npm test --silent

2. Include Multiple Verification Steps

verify:
  - npm run typecheck          # Type errors
  - npm run lint               # Style issues
  - npm run test               # Logic errors
  - npm run build              # Final check

3. Use Specific Test Patterns

verify:
  # Only run tests for this stream's files
  # (--testPathPattern is a Jest flag; Vitest takes the filter as a positional argument)
  - npm test -- calculator

4. Set Appropriate Timeouts

# Quick stream
- id: create-types
  timeoutMs: 120000  # 2 minutes

# Complex stream
- id: generate-docs
  timeoutMs: 600000  # 10 minutes

What You've Learned

✅ How self-healing automatically recovers from failures

✅ How errors are analyzed and categorized

✅ How fix streams are generated with error context

✅ How to configure timeouts and verification

✅ Best practices for reliable orchestrations


Self-healing makes your orchestrations more robust and reduces manual intervention!