Tutorial: Self-Healing in Action
Learn how Orchex automatically recovers from failures by generating fix streams. When a stream fails, Orchex analyzes the error, generates a targeted fix, and retries — all without manual intervention.
Time: 15-20 minutes
Prerequisites: Completed First Orchestration Tutorial
Introduction
Why Self-Healing?
AI agents aren't perfect. They might:
- Write code with syntax errors
- Miss edge cases that fail tests
- Produce output that doesn't match expectations
Without self-healing, you'd need to manually:
- Read the error
- Understand what went wrong
- Modify the prompt
- Re-run the stream
Self-healing automates this loop.
How It Works
Stream fails → Error analyzed → Fix stream generated → Automatic retry
- Error Analysis: Orchex categorizes the error (test failure, lint error, timeout, etc.)
- Fix Generation: A new stream is created with the original plan + error context
- Retry: The fix stream executes, with knowledge of what went wrong
- Limits: Maximum 3 attempts to prevent infinite loops
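The loop described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not Orchex's actual implementation: the `Stream` and `StreamResult` types, the `runWithSelfHealing` function, and the injected `execute` callback are all hypothetical names. Only the 3-attempt limit and the `-fix-N` naming come from this tutorial.

```typescript
// Hypothetical types: none of these names come from Orchex itself.
type StreamResult = { ok: true } | { ok: false; error: string };

interface Stream {
  id: string;
  prompt: string;
}

const MAX_ATTEMPTS = 3; // the limit stated in this tutorial

// Runs a stream and, on failure, builds a fix stream that carries the error context.
function runWithSelfHealing(
  original: Stream,
  execute: (stream: Stream, attempt: number) => StreamResult,
): { succeeded: boolean; attempts: number; lastStreamId: string } {
  let current: Stream = original;
  for (let attempt = 1; attempt <= MAX_ATTEMPTS; attempt++) {
    const result = execute(current, attempt);
    if (result.ok) {
      return { succeeded: true, attempts: attempt, lastStreamId: current.id };
    }
    if (attempt === MAX_ATTEMPTS) break; // no further fix streams after the last attempt
    // The fix stream keeps the original task and adds what went wrong.
    current = {
      id: `${original.id}-fix-${attempt + 1}`,
      prompt: `PREVIOUS ERROR:\n${result.error}\n\nORIGINAL TASK:\n${original.prompt}`,
    };
  }
  return { succeeded: false, attempts: MAX_ATTEMPTS, lastStreamId: current.id };
}

// Usage: a fake executor that fails on the first attempt, then succeeds.
const demo = runWithSelfHealing(
  { id: "create-tests", prompt: "Create tests/calculator.test.ts" },
  (_stream, attempt) =>
    attempt === 1 ? { ok: false, error: "Test failed" } : { ok: true },
);
console.log(demo);
```

Passing the executor in as a callback keeps the sketch runnable without any AI backend; a real system would call the agent and run verification at that point.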
Step 1: Create a Scenario That Might Fail
Let's create an orchestration with verification that might initially fail:
# self-healing-demo.yaml
# Demonstrates self-healing when tests fail
streams:
  - id: create-calculator
    prompt: |
      Create `src/calculator.ts` with a Calculator class:
      Requirements:
      - add(a, b): returns sum
      - subtract(a, b): returns difference
      - multiply(a, b): returns product
      - divide(a, b): returns quotient, throws on divide by zero
      Export the class as default.
    dependencies: []
    verify:
      - npm run build
  - id: create-tests
    prompt: |
      Create `tests/calculator.test.ts` with comprehensive tests:
      Test cases:
      - add: 2+3=5, -1+1=0, 0+0=0
      - subtract: 5-3=2, 1-1=0
      - multiply: 3*4=12, 0*5=0, -2*3=-6
      - divide: 10/2=5, 9/3=3, division by zero throws
      Use vitest syntax.
    dependencies: [create-calculator]
    verify:
      - npm test
Step 2: Understanding Verification
The verify field runs commands after a stream completes:
verify:
  - npm run build  # Check TypeScript compiles
  - npm test       # Run tests
If any verification command fails, the stream is marked as failed, triggering self-healing.
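One way to picture this rule is a runner that walks the verify commands in order and reports failure on the first non-zero exit code. The sketch below is an assumption-laden illustration: `CommandRunner`, `runVerification`, and the stop-at-first-failure behavior are hypothetical and not taken from Orchex. The command runner is stubbed so the example runs without a shell.

```typescript
// Hypothetical shapes; Orchex's real internals may differ.
interface CommandOutcome {
  exitCode: number;
  output: string;
}

type CommandRunner = (command: string) => CommandOutcome;

interface VerificationResult {
  passed: boolean;
  failedCommand: string | null;
  output: string;
}

// Runs verify commands in order and stops at the first non-zero exit code (an assumption).
function runVerification(commands: string[], run: CommandRunner): VerificationResult {
  for (const command of commands) {
    const outcome = run(command);
    if (outcome.exitCode !== 0) {
      return { passed: false, failedCommand: command, output: outcome.output };
    }
  }
  return { passed: true, failedCommand: null, output: "" };
}

// Usage with a stubbed runner: the build passes, the tests fail.
const fakeRunner: CommandRunner = (command) =>
  command === "npm test"
    ? { exitCode: 1, output: "FAIL tests/calculator.test.ts" }
    : { exitCode: 0, output: "" };

const verification = runVerification(["npm run build", "npm test"], fakeRunner);
console.log(
  verification.passed
    ? "stream complete"
    : `stream failed: ${verification.failedCommand}`,
);
```

The captured output of the failing command is what would feed the error analysis, which is why informative verification output matters.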
Step 3: Execute and Watch Self-Healing
Initialize and execute:
Initialize orchestration from self-healing-demo.yaml
Execute the orchestration
Scenario A: Everything Works
If the AI gets it right the first time:
⚡ Wave 1/2 (1 stream)
└─ create-calculator ... ✓ complete
⚡ Wave 2/2 (1 stream)
└─ create-tests ... ✓ complete
✅ Orchestration complete!
Scenario B: Test Fails → Self-Healing Kicks In
More realistically, the first attempt might fail:
⚡ Wave 1/2 (1 stream)
└─ create-calculator ... ✓ complete
⚡ Wave 2/2 (1 stream)
└─ create-tests ... ✗ failed
📊 Error Analysis:
Category: test_failure
Retryable: yes
Suggestion: Review test assertions against implementation
🔧 Generating fix stream: create-tests-fix-2
⚡ Wave 2/2 (1 stream - retry)
└─ create-tests-fix-2 ... ✓ complete
✅ Orchestration complete (with 1 fix)
Understanding Fix Streams
What's in a Fix Stream?
When create-tests fails, Orchex generates create-tests-fix-2:
streams:
  create-tests-fix-2:
    name: "Create Tests (Fix #2)"
    parentStreamId: create-tests
    plan: |
      PREVIOUS ERROR:
      Test failed: Expected divide(10, 0) to throw, but it returned Infinity
      ORIGINAL TASK:
      Create tests/calculator.test.ts with comprehensive tests...
      FIX INSTRUCTIONS:
      1. Review the test assertions for the divide function
      2. Ensure the test correctly expects an exception for divide by zero
      3. Check that the calculator implementation actually throws
      4. Make sure the test syntax for expecting throws is correct
      SUGGESTION:
      Review test assertions against implementation
The fix stream has:
- Error context: What went wrong
- Original task: The full original prompt
- Fix instructions: Guidance based on error analysis
- Parent tracking: parentStreamId prevents infinite fix chains
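To make the structure of a fix plan concrete, here is a small sketch that assembles the same four sections shown in the example plan. The `FixPlanInput` shape and the `buildFixPlan` function are hypothetical names chosen for illustration; how Orchex actually composes the plan is not documented here.

```typescript
// Hypothetical input shape, for illustration only.
interface FixPlanInput {
  error: string;
  originalTask: string;
  fixInstructions: string[];
  suggestion: string;
}

// Assembles the plan text in the same section order as the example plan.
function buildFixPlan(input: FixPlanInput): string {
  const numbered = input.fixInstructions.map((step, i) => `${i + 1}. ${step}`);
  return [
    "PREVIOUS ERROR:",
    input.error,
    "ORIGINAL TASK:",
    input.originalTask,
    "FIX INSTRUCTIONS:",
    ...numbered,
    "SUGGESTION:",
    input.suggestion,
  ].join("\n");
}

const fixPlan = buildFixPlan({
  error: "Test failed: Expected divide(10, 0) to throw, but it returned Infinity",
  originalTask: "Create tests/calculator.test.ts with comprehensive tests...",
  fixInstructions: [
    "Review the test assertions for the divide function",
    "Check that the calculator implementation actually throws",
  ],
  suggestion: "Review test assertions against implementation",
});
console.log(fixPlan);
```

Keeping the original task verbatim inside the plan means the fix attempt does not lose any of the original requirements while it addresses the error.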
Error Categories
Orchex categorizes errors to provide targeted suggestions:
| Category | Example | Suggestion |
|---|---|---|
| test_failure | Tests don't pass | Review test assertions |
| lint_error | ESLint/TSLint fails | Fix code style issues |
| edit_mismatch | File edit failed | Verify file contents match |
| timeout | Execution took too long | Simplify task or increase timeout |
| runtime_error | Code throws at runtime | Debug runtime exception |
| invalid_artifact | Output format wrong | Follow artifact schema |
Non-Retryable Errors
Some errors aren't retried:
| Category | Why Not Retryable |
|---|---|
| environment | Missing dependencies, wrong Node version |
| timeout (infrastructure) | Network/API timeout, not code issue |
| rate_limit | API rate limited, need to wait |
For these, Orchex reports the error and stops.
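The two tables can be read as a lookup: retryable categories map to a suggestion, and everything else stops the run. The sketch below encodes that reading. The category names are copied from the tables, except `infrastructure_timeout`, which is a made-up identifier standing in for "timeout (infrastructure)". The type names and the `nextAction` function are likewise hypothetical.

```typescript
// Category names follow the tables above; "infrastructure_timeout" is an invented
// identifier for the "timeout (infrastructure)" row.
type RetryableCategory =
  | "test_failure"
  | "lint_error"
  | "edit_mismatch"
  | "timeout"
  | "runtime_error"
  | "invalid_artifact";

type NonRetryableCategory = "environment" | "infrastructure_timeout" | "rate_limit";

type ErrorCategory = RetryableCategory | NonRetryableCategory;

// Suggestions as listed in the Error Categories table.
const SUGGESTIONS: Record<RetryableCategory, string> = {
  test_failure: "Review test assertions",
  lint_error: "Fix code style issues",
  edit_mismatch: "Verify file contents match",
  timeout: "Simplify task or increase timeout",
  runtime_error: "Debug runtime exception",
  invalid_artifact: "Follow artifact schema",
};

// A category is retryable exactly when it has a suggestion entry.
function isRetryable(category: ErrorCategory): category is RetryableCategory {
  return category in SUGGESTIONS;
}

function nextAction(category: ErrorCategory): string {
  return isRetryable(category)
    ? `generate fix stream (${SUGGESTIONS[category]})`
    : "report error and stop";
}

console.log(nextAction("test_failure"));
console.log(nextAction("rate_limit"));
```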
Step 4: Examining the Self-Healing Process
Check the Manifest
After self-healing, examine .orchex/active/manifest.yaml:
streams:
  create-calculator:
    status: complete
    appliedAt: "2026-02-05T10:00:00Z"
  create-tests:
    status: failed
    error: "Test failed: Expected divide(10, 0) to throw"
    attempts: 1
  create-tests-fix-2:
    name: "Create Tests (Fix #2)"
    parentStreamId: create-tests
    status: complete
    attempts: 1
    appliedAt: "2026-02-05T10:02:30Z"
Why `-fix-2` (not `-fix-1`)?
The attempt counter increments when execution starts. When the original stream fails on attempt 1, the fix stream starts at attempt 2. This ensures attempt numbers are globally unique.
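The naming rule is easy to express as a one-line helper. The function name `fixStreamId` is hypothetical; only the `-fix-N` pattern and the "failed attempt plus one" numbering come from the explanation above.

```typescript
// Hypothetical helper: the fix stream is numbered by the attempt it will run as,
// which is one more than the attempt that just failed.
function fixStreamId(parentStreamId: string, failedAttempt: number): string {
  return `${parentStreamId}-fix-${failedAttempt + 1}`;
}

console.log(fixStreamId("create-tests", 1)); // create-tests-fix-2
console.log(fixStreamId("create-tests", 2)); // create-tests-fix-3
```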
Step 5: Configuring Self-Healing Behavior
Per-Stream Timeout
For streams that need more time:
streams:
  - id: complex-generation
    prompt: Generate comprehensive documentation
    timeoutMs: 900000  # 15 minutes instead of default 10
    dependencies: []
Maximum Attempts
Self-healing has a built-in limit of 3 attempts (original + 2 fixes). After that, the stream remains failed.
Attempt 1: Original stream
Attempt 2: First fix stream (-fix-2)
Attempt 3: Second fix stream (-fix-3)
Attempt 4: ✗ Max attempts reached, stream stays failed
Preventing Fix Chains
Fix streams track their parent to prevent cascading fixes:
original-stream:
  # fails
original-stream-fix-2:
  parentStreamId: original-stream
  # fails again
original-stream-fix-3:
  parentStreamId: original-stream  # Same parent!
  # This is attempt 3
If a fix stream generates another fix, it still counts toward the original's attempt limit.
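A rough sketch of this accounting follows: every fix stream points at the original, so counting attempts means counting the original plus all streams whose parentStreamId matches it. The `FixManifest` shape and both function names are assumptions made for this example; only `parentStreamId`, `status`, and the limit of 3 appear in this tutorial.

```typescript
// Hypothetical manifest shape, loosely modeled on the YAML shown in this tutorial.
interface ManifestStream {
  status: string;
  parentStreamId?: string;
}

type FixManifest = Record<string, ManifestStream>;

const MAX_TOTAL_ATTEMPTS = 3; // original + 2 fixes, as stated above

// Counts the original stream plus every fix stream that points at it.
function attemptsUsed(manifest: FixManifest, originalId: string): number {
  let count = 0;
  for (const id of Object.keys(manifest)) {
    if (id === originalId || manifest[id]?.parentStreamId === originalId) {
      count++;
    }
  }
  return count;
}

// A failed fix stream is charged to its parent, not to itself.
function canGenerateFix(manifest: FixManifest, failedId: string): boolean {
  const originalId = manifest[failedId]?.parentStreamId ?? failedId;
  return attemptsUsed(manifest, originalId) < MAX_TOTAL_ATTEMPTS;
}

const manifestDemo: FixManifest = {
  "original-stream": { status: "failed" },
  "original-stream-fix-2": { status: "failed", parentStreamId: "original-stream" },
};
console.log(canGenerateFix(manifestDemo, "original-stream-fix-2")); // true

manifestDemo["original-stream-fix-3"] = {
  status: "failed",
  parentStreamId: "original-stream",
};
console.log(canGenerateFix(manifestDemo, "original-stream-fix-3")); // false
```

Because every fix stream shares the same parent, the count cannot reset when a fix stream itself fails, which is what keeps the chain bounded.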
Real-World Example: API Integration
Here's a more realistic scenario where self-healing shines:
# api-integration.yaml
streams:
  - id: define-types
    prompt: |
      Create src/api/types.ts with TypeScript interfaces for a REST API:
      - User, Post, Comment types
      - Request/Response types
      - Error types
    dependencies: []
    verify:
      - npm run build
  - id: implement-client
    prompt: |
      Create src/api/client.ts implementing the API client:
      - fetchUsers(): Promise<User[]>
      - fetchUser(id): Promise<User>
      - createPost(data): Promise<Post>
      - Error handling with proper types
    dependencies: [define-types]
    verify:
      - npm run build
      - npm test -- --testPathPattern=client
  - id: implement-hooks
    prompt: |
      Create src/hooks/useApi.ts with React hooks:
      - useUsers(): { users, loading, error }
      - useUser(id): { user, loading, error }
      - useCreatePost(): { createPost, loading, error }
      Use the API client from src/api/client.ts
    dependencies: [implement-client]
    verify:
      - npm run build
      - npm test -- --testPathPattern=hooks
Common failure points:
- Types don't match between files → Self-healing adds type fixes
- Tests expect different behavior → Self-healing adjusts implementation
- Build fails on type errors → Self-healing fixes type issues
Debugging Self-Healing
View Fix Stream Details
Check what Orchex learned from the failure:
grep -A 20 "fix-2" .orchex/active/manifest.yaml
Check Error Analysis
The error category determines the fix approach:
create-tests:
  error: |
    FAIL tests/calculator.test.ts
    ✕ divide by zero should throw (5 ms)
    Expected: [Error: Division by zero]
    Received: Infinity
This is categorized as test_failure, with a suggestion to review assertions.
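The "Received: Infinity" line is worth a closer look, because it explains why this failure is so common. In JavaScript and TypeScript, dividing a number by zero does not throw; it evaluates to Infinity. The sketch below contrasts a naive divide with one that meets the tutorial's requirement. Both function names are illustrative and not part of the generated calculator.

```typescript
// A naive divide: 10 / 0 evaluates to Infinity instead of throwing.
function naiveDivide(a: number, b: number): number {
  return a / b;
}

// A divide that matches the requirement: throw on divide by zero.
function safeDivide(a: number, b: number): number {
  if (b === 0) {
    throw new Error("Division by zero");
  }
  return a / b;
}

console.log(naiveDivide(10, 0)); // Infinity

try {
  safeDivide(10, 0);
} catch (err) {
  console.log((err as Error).message); // Division by zero
}
```

So when a test expects a throw and receives Infinity, the likely culprit is an implementation missing the explicit zero check rather than a wrong assertion.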
Examine the Generated Plan
Compare original vs fix stream plans:
Original:
Create tests/calculator.test.ts with comprehensive tests...
Fix:
PREVIOUS ERROR:
Expected divide(10, 0) to throw, but it returned Infinity
ORIGINAL TASK:
Create tests/calculator.test.ts...
FIX INSTRUCTIONS:
1. Review the test assertions for the divide function
2. Ensure the calculator implementation throws on divide by zero
3. If implementation doesn't throw, the test expectation is wrong
Best Practices
1. Write Good Verification Commands
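# Good: Specific, informative failures
# Caution: in most shells a pipeline reports the exit status of its last command
# (head here), so confirm that a failed build still fails verification.
verify:
  - npm run build 2>&1 | head -50
  - npm test -- --reporter=verbose
# Bad: Silent failures
verify:
  - npm run build > /dev/null
  - npm test --silent
2. Include Multiple Verification Steps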
verify:
  - npm run typecheck  # Type errors
  - npm run lint       # Style issues
  - npm run test       # Logic errors
  - npm run build      # Final check
3. Use Specific Test Patterns
verify:
  # Only run tests for this stream's files
  - npm test -- --testPathPattern=calculator
4. Set Appropriate Timeouts
# Quick stream
- id: create-types
  timeoutMs: 120000  # 2 minutes
# Complex stream
- id: generate-docs
  timeoutMs: 600000  # 10 minutes
What You've Learned
✅ How self-healing automatically recovers from failures
✅ How errors are analyzed and categorized
✅ How fix streams are generated with error context
✅ How to configure timeouts and verification
✅ Best practices for reliable orchestrations
Next Steps
- Context Optimization Tutorial — Improve quality and reduce costs
- Error Handling Guide — Deep dive into error categories
- Troubleshooting — Common issues and fixes
Self-healing makes your orchestrations more robust and reduces manual intervention!