Error Handling Guide
Orchex provides intelligent error handling with automatic classification, retry logic, and self-healing capabilities. This guide explains how errors are categorized and handled throughout the execution pipeline.
Error Classification
Retryable Errors
Retryable errors are transient failures: the same operation may succeed on a later attempt. Orchex retries these automatically with exponential backoff.
API Rate Limits (429)
// Automatic retry with backoff
{
"error": "rate_limit_error",
"message": "Request rate limit exceeded",
"retryable": true,
"retryAfter": 60 // seconds
}
Retry Strategy:
- Initial delay: 1 second
- Maximum retries: 5
- Backoff: Exponential (2x multiplier)
- Respects Retry-After headers
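As a rough illustration of how the rate-limit strategy above might be computed, the sketch below parses a Retry-After value (either delta-seconds or an HTTP-date, the two forms HTTP allows) and combines it with the 1-second, 2x backoff. The helper names parseRetryAfterMs and nextDelayMs are hypothetical, and taking the larger of the two delays is an assumption about what "respects" means here, not documented Orchex behavior.

```typescript
// Sketch only: parseRetryAfterMs and nextDelayMs are hypothetical helpers, not Orchex APIs.

/** Convert a Retry-After header value (delta-seconds or HTTP-date) to milliseconds. */
function parseRetryAfterMs(
  value: string | undefined,
  nowMs: number = Date.now()
): number | undefined {
  if (value === undefined) return undefined;
  const trimmed = value.trim();
  // Form 1: a non-negative integer number of seconds
  if (/^\d+$/.test(trimmed)) return Number(trimmed) * 1000;
  // Form 2: an HTTP-date; wait until that moment (never a negative delay)
  const dateMs = Date.parse(trimmed);
  if (Number.isNaN(dateMs)) return undefined;
  return Math.max(0, dateMs - nowMs);
}

/** Delay before retry `attempt` (0-based): 1s initial delay, 2x multiplier. */
function nextDelayMs(
  attempt: number,
  retryAfterHeader?: string,
  nowMs: number = Date.now()
): number {
  const backoffMs = 1000 * Math.pow(2, attempt);
  // Assumption: never retry sooner than the server asked for
  const serverMs = parseRetryAfterMs(retryAfterHeader, nowMs) ?? 0;
  return Math.max(backoffMs, serverMs);
}
```

With no header, attempts 0 through 4 wait 1, 2, 4, 8, and 16 seconds; a header of "60" raises any shorter wait to 60 seconds.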
Overloaded Errors (529)
{
"error": "overloaded_error",
"message": "API is temporarily overloaded",
"retryable": true
}
Retry Strategy:
- Initial delay: 5 seconds
- Maximum retries: 3
- Backoff: Exponential
- Additional jitter to prevent thundering herd
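The jitter mentioned above spreads retries out so that many clients recovering from the same overload do not all retry at the same instant. A minimal sketch follows; the function name overloadedDelayMs, the 2x multiplier, and the "up to 50% extra" jitter ratio are all assumptions chosen for illustration, since the guide only states a 5-second initial delay, exponential backoff, and added jitter.

```typescript
// Sketch only: overloadedDelayMs is a hypothetical helper, not an Orchex API.

/**
 * Delay before retry `attempt` (0-based) after a 529 response.
 * `random` is injectable so the result can be made deterministic in tests.
 */
function overloadedDelayMs(
  attempt: number,
  random: () => number = Math.random
): number {
  const baseMs = 5000 * Math.pow(2, attempt); // 5s, 10s, 20s (assumed 2x multiplier)
  const jitterMs = random() * baseMs * 0.5;   // assumed: add up to 50% of the base delay
  return Math.round(baseMs + jitterMs);
}
```

Because the jitter is additive, a retry never fires earlier than the plain exponential delay; it only fires somewhat later.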
Network Errors
// Connection failures, timeouts, DNS issues
{
"error": "network_error",
"code": "ECONNRESET" | "ETIMEDOUT" | "ENOTFOUND",
"retryable": true
}
Retry Strategy:
- Initial delay: 2 seconds
- Maximum retries: 3
- Backoff: Exponential
Non-Retryable Errors
Non-retryable errors are permanent failures that require user intervention or indicate invalid requests.
Authentication Errors (401)
{
"error": "authentication_error",
"message": "Invalid API key",
"retryable": false
}
Resolution: Check your API key configuration in .env or cloud settings.
Permission Errors (403)
{
"error": "permission_error",
"message": "Insufficient permissions for this operation",
"retryable": false
}
Resolution: Verify that the API key has the required permissions, or upgrade your plan.
Validation Errors (400)
{
"error": "invalid_request_error",
"message": "Invalid parameters",
"retryable": false,
"details": {
"field": "commands",
"issue": "Empty command list"
}
}
Resolution: Fix the manifest or command parameters.
Not Found Errors (404)
{
"error": "not_found_error",
"message": "Resource not found",
"retryable": false
}
Resolution: Verify the resource exists or update references.
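The HTTP status codes covered in the classification above can be summarized as a lookup table. The sketch below is illustrative only: classifyStatus and STATUS_TABLE are hypothetical names, and the fallback for unlisted statuses ("unknown_error", not retryable) is an assumption that mirrors the guide's advice to treat unknown errors as non-retryable by default.

```typescript
// Sketch only: classifyStatus is a hypothetical helper, not an Orchex API.

type Classification = { error: string; retryable: boolean };

// Status codes and error names as listed in this guide
const STATUS_TABLE: Record<number, Classification> = {
  400: { error: "invalid_request_error", retryable: false },
  401: { error: "authentication_error", retryable: false },
  403: { error: "permission_error", retryable: false },
  404: { error: "not_found_error", retryable: false },
  429: { error: "rate_limit_error", retryable: true },
  529: { error: "overloaded_error", retryable: true },
};

function classifyStatus(status: number): Classification {
  // Assumed fallback: anything unlisted is treated as permanent
  return STATUS_TABLE[status] ?? { error: "unknown_error", retryable: false };
}
```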
Stream Error Patterns
Command Execution Errors
When a command fails during stream execution:
# manifest.yaml
streams:
  - id: my-stream
    commands:
      - read_file
      - edit_file   # Fails here
      - write_file  # Not executed
Behavior:
- Stream stops at failed command
- Error is analyzed and classified
- If retryable: automatic retry with backoff
- If non-retryable: stream marked as failed
- If self-healing is enabled, it attempts an automatic fix
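The behavior listed above can be sketched as a small synchronous simulation. Everything here is an assumption made for illustration: the Command and StreamResult shapes, the runStream function, and the convention that a thrown error carries a boolean retryable property. The sketch also skips the backoff wait and the self-healing step so that it stays short and deterministic.

```typescript
// Sketch only: a simplified model of stream execution, not the Orchex runner.

type Command = { name: string; run: () => void };

type StreamResult = {
  status: "completed" | "failed";
  executed: string[];  // commands that finished successfully
  failedAt?: string;   // command that ended the stream, if any
  retries: number;     // total retries across all commands
};

function isRetryable(err: unknown): boolean {
  return (
    typeof err === "object" &&
    err !== null &&
    (err as { retryable?: unknown }).retryable === true
  );
}

function runStream(commands: Command[], maxRetries: number): StreamResult {
  const executed: string[] = [];
  let retries = 0;
  for (const command of commands) {
    let attempt = 0;
    while (true) {
      try {
        command.run();
        executed.push(command.name);
        break; // move on to the next command
      } catch (err) {
        if (isRetryable(err) && attempt < maxRetries) {
          attempt++;
          retries++;
          continue; // a real runner would wait with backoff here
        }
        // Non-retryable, or retries exhausted: later commands never run
        return { status: "failed", executed, failedAt: command.name, retries };
      }
    }
  }
  return { status: "completed", executed, retries };
}
```

In the manifest example above, a non-retryable failure in edit_file would yield status "failed" with only read_file in executed.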
Retry Configuration
Configure retry behavior per stream:
streams:
  - id: critical-stream
    maxRetries: 5
    retryDelay: 1000   # milliseconds
    retryBackoff: 2.0  # multiplier
    commands:
      - ...
Error Recovery Patterns
Pattern 1: Graceful Degradation
streams:
  - id: optional-enhancement
    continueOnError: true  # Continue even if stream fails
    commands:
      - add_feature
Use case: Non-critical enhancements that shouldn't block main work.
Pattern 2: Critical Path
streams:
  - id: core-functionality
    continueOnError: false  # Default: stop on error
    commands:
      - implement_feature
Use case: Essential changes that must succeed.
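To make the difference between Pattern 1 and Pattern 2 concrete, the sketch below models how a continueOnError flag could decide whether a failed stream halts the rest of a run. It assumes streams run one after another and that each stream reports success as a boolean; StreamSpec, RunSummary, and runStreams are hypothetical names, and the real scheduler may behave differently (for example, when streams run concurrently).

```typescript
// Sketch only: a simplified, sequential model of continueOnError.

interface StreamSpec {
  id: string;
  continueOnError?: boolean; // defaults to false, as in Pattern 2
  run: () => boolean;        // true on success, false on failure
}

interface RunSummary {
  completed: string[];
  failed: string[];
  halted: boolean; // true when a critical stream stopped the run
}

function runStreams(streams: StreamSpec[]): RunSummary {
  const summary: RunSummary = { completed: [], failed: [], halted: false };
  for (const stream of streams) {
    if (stream.run()) {
      summary.completed.push(stream.id);
      continue;
    }
    summary.failed.push(stream.id);
    if (!(stream.continueOnError ?? false)) {
      summary.halted = true; // critical path: stop on error
      break;
    }
    // graceful degradation: record the failure and keep going
  }
  return summary;
}
```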
Pattern 3: Rollback on Failure
streams:
  - id: database-migration
    rollbackOn: error
    commands:
      - backup_schema
      - alter_table
      - verify_migration
Note: Rollback support is planned for a future version.
Command Design for Error Handling
Validation Best Practices
1. Explicit Validation
// ✅ Good: Clear validation with specific errors
export const myCommand: CommandDefinition = {
name: 'my_command',
parameters: z.object({
path: z.string().min(1, 'Path cannot be empty'),
content: z.string(),
encoding: z.enum(['utf8', 'base64']).default('utf8')
}),
async execute(params) {
// Additional runtime validation
if (!existsSync(params.path)) {
throw new CommandError('File not found', 'NOT_FOUND', false);
}
// ...
}
};
// ❌ Bad: Vague validation, unclear errors
export const badCommand: CommandDefinition = {
name: 'bad_command',
parameters: z.object({
data: z.any() // Too permissive
}),
async execute(params) {
// Implicit validation that may throw unclear errors
const result = params.data.something.deeply.nested;
// ...
}
};
2. Error Context
// ✅ Good: Rich error context
throw new CommandError(
`Failed to write file: ${error.message}`,
'WRITE_ERROR',
true, // retryable
{
path: params.path,
originalError: error.code,
diskSpace: await checkDiskSpace()
}
);
// ❌ Bad: Minimal context
throw new Error('Write failed');
3. Retryable vs Non-Retryable
// Retryable: Temporary conditions
if (error.code === 'EBUSY') {
throw new CommandError(
'File is locked by another process',
'FILE_LOCKED',
true // Retryable
);
}
// Non-retryable: Permanent conditions
if (error.code === 'EACCES') {
throw new CommandError(
'Permission denied',
'PERMISSION_DENIED',
false // Not retryable
);
}
Custom Command Error Handling
import { CommandDefinition, CommandError } from './types';
import { z } from 'zod';
export const safeCommand: CommandDefinition = {
name: 'safe_command',
description: 'Example with comprehensive error handling',
parameters: z.object({
input: z.string()
}),
async execute(params, context) {
try {
// Pre-execution validation
if (!context.projectRoot) {
throw new CommandError(
'Project root not configured',
'INVALID_CONTEXT',
false
);
}
// Main logic
const result = await riskyOperation(params.input);
return {
success: true,
data: result
};
} catch (error) {
// Classify and rethrow with context
if (error instanceof CommandError) {
throw error; // Already classified
}
// Network errors: retryable
if (error.code === 'ECONNRESET' || error.code === 'ETIMEDOUT') {
throw new CommandError(
`Network error: ${error.message}`,
'NETWORK_ERROR',
true,
{ originalCode: error.code }
);
}
// Unknown errors: not retryable by default
throw new CommandError(
`Unexpected error: ${error.message}`,
'UNKNOWN_ERROR',
false,
{ stack: error.stack }
);
}
}
};
Self-Healing Capabilities
Orchex includes intelligent self-healing that attempts to fix errors automatically.
Automatic Fixes
1. Missing Dependencies
// Error detected
{
"error": "MODULE_NOT_FOUND",
"message": "Cannot find module 'lodash'"
}
// Self-healing action
// Automatically runs: npm install lodash
2. Syntax Errors
// Error detected
{
"error": "SYNTAX_ERROR",
"message": "Unexpected token '}'",
"file": "src/utils.ts",
"line": 42
}
// Self-healing action
// Analyzes context and suggests fix
// May automatically fix common issues (missing commas, brackets)
3. Type Errors
// Error detected
{
"error": "TYPE_ERROR",
"message": "Property 'map' does not exist on type 'string'"
}
// Self-healing action
// Analyzes intended operation
// Suggests type fixes or refactoring
Configuration
Enable/disable self-healing in your manifest:
metadata:
  selfHealing:
    enabled: true
    maxAttempts: 3
    strategies:
      - fix_syntax
      - install_dependencies
      - update_imports
Or via environment:
ORCHEX_SELF_HEALING=true
ORCHEX_MAX_HEALING_ATTEMPTS=3
Debugging Failed Executions
Error Logs
Orchex provides detailed error logs:
# Local execution
npx @wundam/orchex execute ./manifest.yaml --verbose
# Cloud execution (check logs)
curl https://api.orchex.dev/v1/jobs/{jobId}/logs \
-H "Authorization: Bearer $ORCHEX_API_KEY"
Error Analysis
Use the error analyzer to understand failures:
import { analyzeError } from '@orchex/intelligence/error-analyzer';
const analysis = await analyzeError(error, {
context: executionContext,
previousAttempts: retryHistory
});
console.log(analysis);
// {
// classification: 'RATE_LIMIT',
// retryable: true,
// suggestedDelay: 60000,
// confidence: 0.95,
// recommendations: ['Wait 60s before retry', 'Consider rate limiting']
// }
Common Issues
Issue: Commands Always Failing
Symptoms:
- All commands fail immediately
- Error: "Authentication failed"
Solution:
# Check API key
echo $ANTHROPIC_API_KEY
# Verify key has correct permissions
curl https://api.anthropic.com/v1/messages \
-H "x-api-key: $ANTHROPIC_API_KEY" \
-H "anthropic-version: 2023-06-01"
Issue: Intermittent Failures
Symptoms:
- Commands fail randomly
- Error: "Rate limit exceeded"
Solution:
# Adjust concurrency in manifest
execution:
  concurrency: 1  # Reduce from default 3
  delay: 1000     # Add delay between commands
Issue: Timeout Errors
Symptoms:
- Long-running operations fail
- Error: "Request timeout"
Solution:
# Increase timeout
streams:
  - id: long-running
    timeout: 300000  # 5 minutes (milliseconds)
    commands:
      - complex_operation
Best Practices
1. Design for Idempotency
Make commands idempotent so retries are safe:
// ✅ Good: Idempotent
export const createFileIfNotExists: CommandDefinition = {
async execute(params) {
// Check once, before writing, so the result reflects what this call did
const alreadyExists = existsSync(params.path);
if (!alreadyExists) {
await writeFile(params.path, params.content);
}
return { created: !alreadyExists };
}
};
// ❌ Bad: Not idempotent
export const appendToFile: CommandDefinition = {
async execute(params) {
// Retry will duplicate content
await appendFile(params.path, params.content);
}
};
2. Provide Clear Error Messages
// ✅ Good: Actionable error message
throw new CommandError(
'Failed to read package.json. Ensure file exists and is valid JSON.',
'INVALID_PACKAGE_JSON',
false,
{ path: './package.json', parseError: error.message }
);
// ❌ Bad: Vague error
throw new Error('Failed');
3. Use Appropriate Retry Logic
# Critical operations: More retries
streams:
  - id: deploy
    maxRetries: 5
    retryDelay: 2000
    commands:
      - build
      - deploy
# Experimental features: Fewer retries
streams:
  - id: optional-enhancement
    maxRetries: 1
    continueOnError: true
    commands:
      - add_feature
4. Monitor and Alert
For cloud executions, set up monitoring:
// Webhook for error notifications
const job = await orchex.execute(manifest, {
webhooks: {
onError: 'https://your-app.com/webhooks/orchex-error',
onComplete: 'https://your-app.com/webhooks/orchex-complete'
}
});
API Reference
CommandError Class
class CommandError extends Error {
constructor(
message: string,
code: string,
retryable: boolean,
context?: Record<string, unknown>
);
}
Error Analyzer
interface ErrorAnalysis {
classification: ErrorClassification;
retryable: boolean;
suggestedDelay?: number;
confidence: number;
recommendations: string[];
context: Record<string, unknown>;
}
function analyzeError(
error: Error,
context: ExecutionContext
): Promise<ErrorAnalysis>;
Self-Healer
interface HealingResult {
success: boolean;
strategy: string;
changes: FileChange[];
recommendations: string[];
}
function attemptHeal(
error: Error,
context: ExecutionContext,
options?: HealingOptions
): Promise<HealingResult>;
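Returning to the CommandError signature listed earlier in this API reference: the guide shows only the constructor, so the sketch below is one way such a class might be implemented. The property names code, retryable, and context are assumptions derived from the constructor parameters, not confirmed fields of the Orchex class.

```typescript
// Sketch only: one possible implementation matching the documented constructor.

class CommandError extends Error {
  readonly code: string;
  readonly retryable: boolean;
  readonly context: Record<string, unknown>;

  constructor(
    message: string,
    code: string,
    retryable: boolean,
    context: Record<string, unknown> = {}
  ) {
    super(message);
    this.name = "CommandError";
    this.code = code;
    this.retryable = retryable;
    this.context = context;
    // Keep `instanceof CommandError` working when compiled to older JS targets
    Object.setPrototypeOf(this, new.target.prototype);
  }
}

// Usage: callers can branch on `retryable` without parsing the message
const locked = new CommandError(
  "File is locked by another process",
  "FILE_LOCKED",
  true,
  { path: "./data.json" }
);
```

Storing retryable on the error itself lets a runner decide whether to retry with a simple instanceof check followed by a property read.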