Error Handling Guide

Orchex classifies errors automatically, retries transient failures with backoff, and can attempt self-healing fixes. This guide explains how errors are categorized and handled throughout the execution pipeline.

Error Classification

Retryable Errors

Retryable errors are temporary failures: the same operation may succeed on a later attempt. Orchex automatically retries these with exponential backoff.

API Rate Limits (429)

// Automatic retry with backoff
{
  "error": "rate_limit_error",
  "message": "Request rate limit exceeded",
  "retryable": true,
  "retryAfter": 60 // seconds
}

Retry Strategy:

Initial delay: 1 second
Maximum retries: 5
Backoff: Exponential (2x multiplier)
Respects Retry-After headers
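
The retry strategy above can be sketched as a small delay calculator. This is a minimal illustration, not Orchex's implementation: the function name `rateLimitDelayMs`, the `BackoffOptions` shape, and the rule that a server-provided `retryAfter` wins whenever it is longer than the computed backoff are all assumptions.

```typescript
// Hypothetical helper for illustration; not part of the Orchex API.
interface BackoffOptions {
  initialDelayMs: number; // 1000 for rate limits, per the list above
  multiplier: number;     // 2 for a 2x exponential backoff
  maxRetries: number;     // 5 for rate limits
}

const RATE_LIMIT_DEFAULTS: BackoffOptions = {
  initialDelayMs: 1000,
  multiplier: 2,
  maxRetries: 5,
};

/**
 * Returns the delay in milliseconds before retry number `attempt` (1-based),
 * or null once the retry budget is exhausted.
 * Assumption: a retryAfter value (in seconds) overrides a shorter backoff.
 */
function rateLimitDelayMs(
  attempt: number,
  retryAfterSeconds?: number,
  opts: BackoffOptions = RATE_LIMIT_DEFAULTS
): number | null {
  if (attempt < 1 || attempt > opts.maxRetries) return null;
  const backoffMs = opts.initialDelayMs * Math.pow(opts.multiplier, attempt - 1);
  const retryAfterMs = (retryAfterSeconds ?? 0) * 1000;
  return Math.max(backoffMs, retryAfterMs);
}
```

With these defaults the computed delays grow as 1 s, 2 s, 4 s, 8 s, 16 s, and a `retryAfter` of 60 raises the first delay to 60 s.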

Overloaded Errors (529)

{
  "error": "overloaded_error",
  "message": "API is temporarily overloaded",
  "retryable": true
}

Retry Strategy:

Initial delay: 5 seconds
Maximum retries: 3
Backoff: Exponential
Additional jitter to prevent thundering herd
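
Jitter spreads retries from many clients over time so they do not all hit the API at the same instant. The sketch below adds a random extra delay on top of the exponential backoff. The function name, the 2x multiplier, and the 25% jitter fraction are assumptions chosen for illustration; the guide does not state the values Orchex uses.

```typescript
// Hypothetical helper for illustration; not part of the Orchex API.
// `random` is injectable so the function can be tested deterministically.
function overloadedDelayMs(
  attempt: number,                       // 1-based retry number
  random: () => number = Math.random,    // returns a value in [0, 1)
  initialDelayMs = 5000,                 // initial delay from the list above
  multiplier = 2,                        // assumed; the guide only says "Exponential"
  jitterFraction = 0.25                  // assumed upper bound for the extra delay
): number {
  const baseMs = initialDelayMs * Math.pow(multiplier, attempt - 1);
  // Extra delay between 0 and jitterFraction * baseMs
  const jitterMs = random() * jitterFraction * baseMs;
  return Math.round(baseMs + jitterMs);
}
```

Passing a fixed `random` function (for example `() => 0`) makes the result reproducible in tests, while the default `Math.random` gives each client a different delay.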

Network Errors

// Connection failures, timeouts, DNS issues
{
  "error": "network_error",
  "code": "ECONNRESET", // or "ETIMEDOUT", "ENOTFOUND"
  "retryable": true
}

Retry Strategy:

Initial delay: 2 seconds
Maximum retries: 3
Backoff: Exponential
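
A command can recognize these network failures by inspecting the `code` property that Node.js attaches to system errors. The helper below is a sketch: its name is hypothetical, and it covers only the three codes listed above.

```typescript
// Codes listed in this guide as retryable network failures.
const RETRYABLE_NETWORK_CODES = new Set(['ECONNRESET', 'ETIMEDOUT', 'ENOTFOUND']);

// Hypothetical helper for illustration; not part of the Orchex API.
function isRetryableNetworkError(error: unknown): boolean {
  // Thrown values are not guaranteed to be objects, so narrow first.
  if (typeof error !== 'object' || error === null) return false;
  const code = (error as { code?: unknown }).code;
  return typeof code === 'string' && RETRYABLE_NETWORK_CODES.has(code);
}
```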

Non-Retryable Errors

Non-retryable errors are permanent failures that require user intervention or indicate invalid requests.

Authentication Errors (401)

{
  "error": "authentication_error",
  "message": "Invalid API key",
  "retryable": false
}

Resolution: Check your API key configuration in .env or cloud settings.

Permission Errors (403)

{
  "error": "permission_error",
  "message": "Insufficient permissions for this operation",
  "retryable": false
}

Resolution: Verify that the API key has the required permissions, or upgrade your plan.

Validation Errors (400)

{
  "error": "invalid_request_error",
  "message": "Invalid parameters",
  "retryable": false,
  "details": {
    "field": "commands",
    "issue": "Empty command list"
  }
}

Resolution: Fix the manifest or command parameters.

Not Found Errors (404)

{
  "error": "not_found_error",
  "message": "Resource not found",
  "retryable": false
}

Resolution: Verify the resource exists or update references.
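
The status codes covered so far can be summarized in a lookup table. This is a sketch of the classification described in this guide, not Orchex's internal code; the names `classifyStatus` and `Classification` are hypothetical, and treating unlisted statuses as non-retryable is an assumption.

```typescript
interface Classification {
  error: string;
  retryable: boolean;
}

// Status codes and error names as documented in this guide.
const STATUS_TABLE: Record<number, Classification> = {
  429: { error: 'rate_limit_error', retryable: true },
  529: { error: 'overloaded_error', retryable: true },
  400: { error: 'invalid_request_error', retryable: false },
  401: { error: 'authentication_error', retryable: false },
  403: { error: 'permission_error', retryable: false },
  404: { error: 'not_found_error', retryable: false },
};

// Hypothetical helper for illustration; not part of the Orchex API.
function classifyStatus(status: number): Classification {
  // Assumption: anything not listed is treated as unknown and not retried.
  return STATUS_TABLE[status] ?? { error: 'unknown_error', retryable: false };
}
```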

Stream Error Patterns

Command Execution Errors

When a command fails during stream execution:

# manifest.yaml
streams:
  - id: my-stream
    commands:
      - read_file
      - edit_file  # Fails here
      - write_file # Not executed

Behavior:

The stream stops at the failed command
The error is analyzed and classified
If retryable: automatic retry with backoff
If non-retryable: the stream is marked as failed
If self-healing is enabled, it attempts a fix
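
The stop-at-first-failure behavior can be sketched as a sequential loop. This is a simplified model, not Orchex's executor: the `Step` and `StreamResult` types are hypothetical, commands are synchronous here for brevity (real commands are asynchronous), and retry and self-healing are left out.

```typescript
interface Step {
  name: string;
  run: () => void; // throws on failure
}

interface StreamResult {
  status: 'complete' | 'failed';
  executed: string[];  // commands that finished successfully
  failedAt?: string;   // command that threw, if any
  skipped: string[];   // commands after the failure, never executed
}

// Hypothetical helper for illustration; not part of the Orchex API.
function runStream(steps: Step[]): StreamResult {
  const executed: string[] = [];
  let index = 0;
  for (const step of steps) {
    try {
      step.run();
      executed.push(step.name);
    } catch {
      // Stop at the failed command; everything after it is skipped.
      const skipped = steps.slice(index + 1).map((s) => s.name);
      return { status: 'failed', executed, failedAt: step.name, skipped };
    }
    index++;
  }
  return { status: 'complete', executed, skipped: [] };
}
```

Applied to the manifest above, a failure in `edit_file` would leave `read_file` in `executed` and `write_file` in `skipped`.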

Retry Configuration

Configure retry behavior per stream:

streams:
  - id: critical-stream
    maxRetries: 5
    retryDelay: 1000  # milliseconds
    retryBackoff: 2.0  # multiplier
    commands:
      - ...
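
To see what these settings imply, the sketch below computes the wait before each retry. It assumes the delay before retry n is `retryDelay * retryBackoff^(n-1)`; that formula and the `retrySchedule` helper are illustrative assumptions, not documented Orchex behavior.

```typescript
interface RetryConfig {
  maxRetries: number;   // number of retries after the first attempt
  retryDelay: number;   // initial delay in milliseconds
  retryBackoff: number; // multiplier applied after each retry
}

// Hypothetical helper for illustration; not part of the Orchex API.
function retrySchedule(config: RetryConfig): number[] {
  return Array.from(
    { length: config.maxRetries },
    (_, i) => config.retryDelay * Math.pow(config.retryBackoff, i)
  );
}

// Settings from the critical-stream example above.
const schedule = retrySchedule({ maxRetries: 5, retryDelay: 1000, retryBackoff: 2.0 });
// schedule is [1000, 2000, 4000, 8000, 16000]
const totalWaitMs = schedule.reduce((sum, delay) => sum + delay, 0);
// totalWaitMs is 31000
```

Under that assumption, `critical-stream` would spend up to 31 seconds waiting before it is marked as failed, which is worth weighing against any stream timeout.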

Error Recovery Patterns

Pattern 1: Graceful Degradation

streams:
  - id: optional-enhancement
    continueOnError: true  # Continue even if stream fails
    commands:
      - add_feature

Use case: Non-critical enhancements that shouldn't block main work.

Pattern 2: Critical Path

streams:
  - id: core-functionality
    continueOnError: false  # Default: stop on error
    commands:
      - implement_feature

Use case: Essential changes that must succeed.
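
Patterns 1 and 2 differ only in the `continueOnError` flag. The sketch below models that decision for a list of finished streams. It assumes that a failed stream without `continueOnError: true` halts the whole run; the types and the function name are hypothetical.

```typescript
interface StreamOutcome {
  id: string;
  failed: boolean;
  continueOnError?: boolean; // treated as false when omitted (Pattern 2 default)
}

/**
 * Returns the id of the first failed stream that should halt the run,
 * or null when every failure is tolerated.
 * Hypothetical helper for illustration; not part of the Orchex API.
 */
function firstBlockingFailure(outcomes: StreamOutcome[]): string | null {
  for (const outcome of outcomes) {
    if (outcome.failed && outcome.continueOnError !== true) {
      return outcome.id;
    }
  }
  return null;
}
```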

Pattern 3: Rollback on Failure

streams:
  - id: database-migration
    rollbackOn: error
    commands:
      - backup_schema
      - alter_table
      - verify_migration

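Note: Rollback support is planned for a future version, so the manifest above illustrates the intended configuration rather than a currently available feature.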

Command Design for Error Handling

Validation Best Practices

1. Explicit Validation

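import { existsSync } from 'node:fs';
import { z } from 'zod';
import { CommandDefinition, CommandError } from './types';

// ✅ Good: Clear validation with specific errors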
export const myCommand: CommandDefinition = {
  name: 'my_command',
  parameters: z.object({
    path: z.string().min(1, 'Path cannot be empty'),
    content: z.string(),
    encoding: z.enum(['utf8', 'base64']).default('utf8')
  }),
  async execute(params) {
    // Additional runtime validation
    if (!existsSync(params.path)) {
      throw new CommandError('File not found', 'NOT_FOUND', false);
    }
    // ...
  }
};

// ❌ Bad: Vague validation, unclear errors
export const badCommand: CommandDefinition = {
  name: 'bad_command',
  parameters: z.object({
    data: z.any()  // Too permissive
  }),
  async execute(params) {
    // Implicit validation that may throw unclear errors
    const result = params.data.something.deeply.nested;
    // ...
  }
};

2. Error Context

// ✅ Good: Rich error context
throw new CommandError(
  `Failed to write file: ${error.message}`,
  'WRITE_ERROR',
  true,  // retryable
  {
    path: params.path,
    originalError: error.code,
    diskSpace: await checkDiskSpace()
  }
);

// ❌ Bad: Minimal context
throw new Error('Write failed');

3. Retryable vs Non-Retryable

// Retryable: Temporary conditions
if (error.code === 'EBUSY') {
  throw new CommandError(
    'File is locked by another process',
    'FILE_LOCKED',
    true  // Retryable
  );
}

// Non-retryable: Permanent conditions
if (error.code === 'EACCES') {
  throw new CommandError(
    'Permission denied',
    'PERMISSION_DENIED',
    false  // Not retryable
  );
}

Custom Command Error Handling

import { CommandDefinition, CommandError } from './types';
import { z } from 'zod';

export const safeCommand: CommandDefinition = {
  name: 'safe_command',
  description: 'Example with comprehensive error handling',
  parameters: z.object({
    input: z.string()
  }),
  
  async execute(params, context) {
    try {
      // Pre-execution validation
      if (!context.projectRoot) {
        throw new CommandError(
          'Project root not configured',
          'INVALID_CONTEXT',
          false
        );
      }

      // Main logic
      const result = await riskyOperation(params.input);
      
      return {
        success: true,
        data: result
      };
      
    } catch (error) {
      // Classify and rethrow with context
      if (error instanceof CommandError) {
        throw error;  // Already classified
      }
      
      // Caught values are `unknown` in strict TypeScript; narrow before reading fields
      const err = error as NodeJS.ErrnoException;

      // Network errors: retryable
      if (err.code === 'ECONNRESET' || err.code === 'ETIMEDOUT') {
        throw new CommandError(
          `Network error: ${err.message}`,
          'NETWORK_ERROR',
          true,
          { originalCode: err.code }
        );
      }
      
      // Unknown errors: not retryable by default
      throw new CommandError(
        `Unexpected error: ${err.message}`,
        'UNKNOWN_ERROR',
        false,
        { stack: err.stack }
      );
    }
  }
};

Self-Healing Capabilities

Orchex includes intelligent self-healing that attempts to fix errors automatically.

Automatic Fixes

1. Missing Dependencies

// Error detected
{
  "error": "MODULE_NOT_FOUND",
  "message": "Cannot find module 'lodash'"
}

// Self-healing action
// Automatically runs: npm install lodash

2. Syntax Errors

// Error detected
{
  "error": "SYNTAX_ERROR",
  "message": "Unexpected token '}'",
  "file": "src/utils.ts",
  "line": 42
}

// Self-healing action
// Analyzes context and suggests fix
// May automatically fix common issues (missing commas, brackets)

3. Type Errors

// Error detected
{
  "error": "TYPE_ERROR",
  "message": "Property 'map' does not exist on type 'string'"
}

// Self-healing action
// Analyzes intended operation
// Suggests type fixes or refactoring

Configuration

Enable/disable self-healing in your manifest:

metadata:
  selfHealing:
    enabled: true
    maxAttempts: 3
    strategies:
      - fix_syntax
      - install_dependencies
      - update_imports

Or via environment:

ORCHEX_SELF_HEALING=true
ORCHEX_MAX_HEALING_ATTEMPTS=3
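
For tooling that needs to read the same variables, a parser might look like the sketch below. Only the two variable names come from this guide; the accepted values (`true` in any letter case, a non-negative integer) and the fallback behavior are assumptions, and `readSelfHealingEnv` is a hypothetical helper.

```typescript
interface SelfHealingSettings {
  enabled: boolean;
  maxAttempts: number;
}

// Hypothetical helper for illustration; not part of the Orchex API.
function readSelfHealingEnv(
  env: Record<string, string | undefined>,
  fallback: SelfHealingSettings
): SelfHealingSettings {
  const rawEnabled = env['ORCHEX_SELF_HEALING'];
  const attemptsText = (env['ORCHEX_MAX_HEALING_ATTEMPTS'] ?? '').trim();

  // Assumption: only the literal "true" (any letter case) enables self-healing.
  const enabled =
    rawEnabled === undefined
      ? fallback.enabled
      : rawEnabled.trim().toLowerCase() === 'true';

  // Assumption: anything that is not a non-negative integer falls back.
  const maxAttempts = /^\d+$/.test(attemptsText)
    ? Number(attemptsText)
    : fallback.maxAttempts;

  return { enabled, maxAttempts };
}
```

In Node.js this could be called as `readSelfHealingEnv(process.env, { enabled: false, maxAttempts: 3 })`, where the fallback values are whatever the caller considers sensible.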

Debugging Failed Executions

Error Logs

Orchex provides detailed error logs:

# Local execution
npx @wundam/orchex execute ./manifest.yaml --verbose

# Cloud execution (check logs)
curl https://api.orchex.dev/v1/jobs/{jobId}/logs \
  -H "Authorization: Bearer $ORCHEX_API_KEY"

Error Analysis

Use the error analyzer to understand failures:

import { analyzeError } from '@orchex/intelligence/error-analyzer';

const analysis = await analyzeError(error, {
  context: executionContext,
  previousAttempts: retryHistory
});

console.log(analysis);
// {
//   classification: 'RATE_LIMIT',
//   retryable: true,
//   suggestedDelay: 60000,
//   confidence: 0.95,
//   recommendations: ['Wait 60s before retry', 'Consider rate limiting']
// }

Common Issues

Issue: Commands Always Failing

Symptoms:

All commands fail immediately
Error: "Authentication failed"

Solution:

# Check API key
echo $ANTHROPIC_API_KEY

# Verify key has correct permissions
curl https://api.anthropic.com/v1/messages \
  -H "x-api-key: $ANTHROPIC_API_KEY" \
  -H "anthropic-version: 2023-06-01"

Issue: Intermittent Failures

Symptoms:

Commands fail randomly
Error: "Rate limit exceeded"

Solution:

# Adjust concurrency in manifest
execution:
  concurrency: 1  # Reduce from default 3
  delay: 1000      # Add delay between commands

Issue: Timeout Errors

Symptoms:

Long-running operations fail
Error: "Request timeout"

Solution:

# Increase timeout
streams:
  - id: long-running
    timeout: 300000  # 5 minutes (milliseconds)
    commands:
      - complex_operation

Best Practices

1. Design for Idempotency

Make commands idempotent so retries are safe:

// ✅ Good: Idempotent
export const createFileIfNotExists: CommandDefinition = {
  async execute(params) {
    // Check once, before writing, so the result reflects what this call did
    const alreadyExists = existsSync(params.path);
    if (!alreadyExists) {
      await writeFile(params.path, params.content);
    }
    return { created: !alreadyExists };
  }
};

// ❌ Bad: Not idempotent
export const appendToFile: CommandDefinition = {
  async execute(params) {
    // Retry will duplicate content
    await appendFile(params.path, params.content);
  }
};
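
One way to make an append safe to retry is to check whether the content is already at the end of the file before adding it. The pure function below sketches that check on strings; `appendOnce` is a hypothetical name, and the approach assumes the same trailing content is never meant to be appended twice on purpose.

```typescript
// Hypothetical helper for illustration; not part of the Orchex API.
// Returns the new file contents. Calling it again with the same
// arguments applied to its own result changes nothing.
function appendOnce(existing: string, content: string): string {
  if (existing.endsWith(content)) {
    return existing; // a previous attempt already appended this content
  }
  return existing + content;
}

const first = appendOnce('line 1\n', 'line 2\n');
const retried = appendOnce(first, 'line 2\n');
// first and retried are both 'line 1\nline 2\n'
```

A command built on this would read the file, pass its contents through `appendOnce`, and write the result back, so a retry after a partial failure does not duplicate content.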

2. Provide Clear Error Messages

// ✅ Good: Actionable error message
throw new CommandError(
  'Failed to read package.json. Ensure file exists and is valid JSON.',
  'INVALID_PACKAGE_JSON',
  false,
  { path: './package.json', parseError: error.message }
);

// ❌ Bad: Vague error
throw new Error('Failed');

3. Use Appropriate Retry Logic

# Critical operations: More retries
streams:
  - id: deploy
    maxRetries: 5
    retryDelay: 2000
    commands:
      - build
      - deploy

# Experimental features: Fewer retries
streams:
  - id: optional-enhancement
    maxRetries: 1
    continueOnError: true
    commands:
      - add_feature

4. Monitor and Alert

For cloud executions, set up monitoring:

// Webhook for error notifications
const job = await orchex.execute(manifest, {
  webhooks: {
    onError: 'https://your-app.com/webhooks/orchex-error',
    onComplete: 'https://your-app.com/webhooks/orchex-complete'
  }
});

API Reference

CommandError Class

class CommandError extends Error {
  constructor(
    message: string,
    code: string,
    retryable: boolean,
    context?: Record<string, unknown>
  );
}
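
A minimal implementation matching this signature might look like the following sketch. The property names `code`, `retryable`, and `context` are assumptions inferred from the constructor parameters; the actual class may store or expose them differently.

```typescript
// Sketch of a class with the constructor signature shown above.
class CommandError extends Error {
  readonly code: string;
  readonly retryable: boolean;
  readonly context: Record<string, unknown> | undefined;

  constructor(
    message: string,
    code: string,
    retryable: boolean,
    context?: Record<string, unknown>
  ) {
    super(message);
    this.name = 'CommandError';
    this.code = code;
    this.retryable = retryable;
    this.context = context;
    // Keeps `instanceof CommandError` working when compiled to ES5 targets.
    Object.setPrototypeOf(this, CommandError.prototype);
  }
}

const locked = new CommandError('File is locked by another process', 'FILE_LOCKED', true, {
  path: 'src/utils.ts',
});
```

Callers can then branch on `locked.retryable` or `locked.code` instead of parsing the message text.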

Error Analyzer

interface ErrorAnalysis {
  classification: ErrorClassification;
  retryable: boolean;
  suggestedDelay?: number;
  confidence: number;
  recommendations: string[];
  context: Record<string, unknown>;
}

function analyzeError(
  error: Error,
  context: ExecutionContext
): Promise<ErrorAnalysis>;

Self-Healer

interface HealingResult {
  success: boolean;
  strategy: string;
  changes: FileChange[];
  recommendations: string[];
}

function attemptHeal(
  error: Error,
  context: ExecutionContext,
  options?: HealingOptions
): Promise<HealingResult>;
