Error Handling Guide

Orchex provides intelligent error handling with automatic classification, retry logic, and self-healing capabilities. This guide explains how errors are categorized and handled throughout the execution pipeline.

Error Classification

Retryable Errors

Retryable errors are transient failures: the same request may succeed on a later attempt. Orchex retries these automatically with exponential backoff.

API Rate Limits (429)

// Automatic retry with backoff
{
  "error": "rate_limit_error",
  "message": "Request rate limit exceeded",
  "retryable": true,
  "retryAfter": 60 // seconds
}

Retry Strategy:

  • Initial delay: 1 second
  • Maximum retries: 5
  • Backoff: Exponential (2x multiplier)
  • Respects Retry-After headers
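As an illustration of what respecting a Retry-After header can involve, the sketch below converts the header value to a wait in milliseconds. The header carries either a number of seconds or an HTTP date. The function names and the policy of waiting for the longer of the server hint and the local backoff are assumptions made for this example, not Orchex internals.

```typescript
/**
 * Converts a Retry-After header value to a wait in milliseconds.
 * The value is either delay-seconds ("60") or an HTTP date.
 * Returns null when the value cannot be parsed.
 */
function parseRetryAfterMs(header: string, nowMs: number): number | null {
  const trimmed = header.trim();
  if (/^\d+$/.test(trimmed)) {
    return parseInt(trimmed, 10) * 1000;
  }
  const dateMs = Date.parse(trimmed);
  if (isNaN(dateMs)) {
    return null;
  }
  // A date in the past means "retry now".
  return Math.max(0, dateMs - nowMs);
}

// Hypothetical policy: never retry sooner than the server asked for.
function nextDelayMs(
  backoffMs: number,
  retryAfterHeader: string | undefined,
  nowMs: number
): number {
  const hinted =
    retryAfterHeader === undefined
      ? null
      : parseRetryAfterMs(retryAfterHeader, nowMs);
  return hinted === null ? backoffMs : Math.max(backoffMs, hinted);
}
```

For example, `nextDelayMs(1000, "60", Date.now())` yields 60000, matching the `retryAfter` value in the error payload above.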

Overloaded Errors (529)

{
  "error": "overloaded_error",
  "message": "API is temporarily overloaded",
  "retryable": true
}

Retry Strategy:

  • Initial delay: 5 seconds
  • Maximum retries: 3
  • Backoff: Exponential
  • Additional jitter to prevent thundering herd
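The jitter mentioned above can be sketched as follows. This is a minimal illustration only: the 2x multiplier, the 25% jitter ceiling, and the function name are assumed values, since the guide does not specify them.

```typescript
// Hypothetical helper: multiplier and jitter ceiling are assumptions.
function overloadedDelayMs(
  attempt: number, // 1-based retry number
  random: () => number = Math.random
): number | null {
  const maxRetries = 3;
  const initialDelayMs = 5000;
  if (attempt < 1 || attempt > maxRetries) {
    return null; // retry budget exhausted
  }
  const base = initialDelayMs * Math.pow(2, attempt - 1);
  // Additive jitter of up to 25% of the base delay, so that clients which
  // failed at the same moment do not all retry at the same moment.
  const jitter = Math.floor(random() * base * 0.25);
  return base + jitter;
}
```

Passing the random source as a parameter keeps the function deterministic under test; production callers can rely on the `Math.random` default.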

Network Errors

// Connection failures, timeouts, DNS issues
{
  "error": "network_error",
  "code": "ECONNRESET" | "ETIMEDOUT" | "ENOTFOUND",
  "retryable": true
}

Retry Strategy:

  • Initial delay: 2 seconds
  • Maximum retries: 3
  • Backoff: Exponential

Non-Retryable Errors

Non-retryable errors are permanent failures that require user intervention or indicate invalid requests.

Authentication Errors (401)

{
  "error": "authentication_error",
  "message": "Invalid API key",
  "retryable": false
}

Resolution: Check your API key configuration in .env or cloud settings.

Permission Errors (403)

{
  "error": "permission_error",
  "message": "Insufficient permissions for this operation",
  "retryable": false
}

Resolution: Verify that the API key has the required permissions, or upgrade your plan.

Validation Errors (400)

{
  "error": "invalid_request_error",
  "message": "Invalid parameters",
  "retryable": false,
  "details": {
    "field": "commands",
    "issue": "Empty command list"
  }
}

Resolution: Fix the manifest or command parameters.

Not Found Errors (404)

{
  "error": "not_found_error",
  "message": "Resource not found",
  "retryable": false
}

Resolution: Verify the resource exists or update references.
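The classification described in this section can be summarized as a lookup from HTTP status or network error code to an error type and a retryable flag. The sketch below mirrors the codes listed above. The `classifyFailure` name, the input shape, and the `unknown_error` fallback are assumptions for illustration.

```typescript
type FailureClass = { error: string; retryable: boolean };

// Status codes and error names as listed in this guide.
const BY_STATUS: Record<number, FailureClass> = {
  429: { error: "rate_limit_error", retryable: true },
  529: { error: "overloaded_error", retryable: true },
  400: { error: "invalid_request_error", retryable: false },
  401: { error: "authentication_error", retryable: false },
  403: { error: "permission_error", retryable: false },
  404: { error: "not_found_error", retryable: false }
};

const NETWORK_CODES = ["ECONNRESET", "ETIMEDOUT", "ENOTFOUND"];

function classifyFailure(input: { status?: number; code?: string }): FailureClass {
  if (input.code !== undefined && NETWORK_CODES.indexOf(input.code) !== -1) {
    return { error: "network_error", retryable: true };
  }
  const hit = input.status !== undefined ? BY_STATUS[input.status] : undefined;
  if (hit) {
    return hit;
  }
  // Assumption: anything unrecognized is treated as non-retryable.
  return { error: "unknown_error", retryable: false };
}
```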

Stream Error Patterns

Command Execution Errors

When a command fails during stream execution:

# manifest.yaml
streams:
  - id: my-stream
    commands:
      - read_file
      - edit_file  # Fails here
      - write_file # Not executed

Behavior:

  1. Stream stops at failed command
  2. Error is analyzed and classified
  3. If retryable: automatic retry with backoff
  4. If non-retryable: stream marked as failed
  5. If self-healing is enabled: Orchex attempts an automatic fix
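Steps 1 through 4 can be sketched as a small loop. This is an illustration under simplifying assumptions, not the Orchex runner: commands are synchronous, no delay is applied between attempts, self-healing is left out, and all type and function names are hypothetical.

```typescript
type StreamCommand = { name: string; run: () => void };
type StreamOutcome = {
  status: "completed" | "failed";
  executed: string[];
  failedAt?: string;
};

// An error counts as retryable when it carries `retryable: true`.
function isRetryable(error: unknown): boolean {
  return (
    typeof error === "object" &&
    error !== null &&
    (error as { retryable?: unknown }).retryable === true
  );
}

function runStream(commands: StreamCommand[], maxRetries: number): StreamOutcome {
  const executed: string[] = [];
  for (const command of commands) {
    let retries = 0;
    for (;;) {
      try {
        command.run();
        executed.push(command.name);
        break; // move on to the next command
      } catch (error) {
        if (isRetryable(error) && retries < maxRetries) {
          retries++; // a real runner would wait with backoff here
          continue;
        }
        // Non-retryable, or retries exhausted: the stream stops and
        // the remaining commands are never executed.
        return { status: "failed", executed, failedAt: command.name };
      }
    }
  }
  return { status: "completed", executed };
}
```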

Retry Configuration

Configure retry behavior per stream:

streams:
  - id: critical-stream
    maxRetries: 5
    retryDelay: 1000  # milliseconds
    retryBackoff: 2.0  # multiplier
    commands:
      - ...
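To make the effect of these three settings concrete, the sketch below computes the delays they would produce. It assumes the delay before retry n is `retryDelay * retryBackoff^(n-1)`. The guide does not state the exact formula, so treat this as one plausible reading.

```typescript
// Field names follow the manifest keys above.
interface RetryConfig {
  maxRetries: number;
  retryDelay: number; // milliseconds
  retryBackoff: number; // multiplier
}

// Returns the wait before each retry, in order.
function retrySchedule(config: RetryConfig): number[] {
  const delays: number[] = [];
  for (let retry = 0; retry < config.maxRetries; retry++) {
    delays.push(Math.round(config.retryDelay * Math.pow(config.retryBackoff, retry)));
  }
  return delays;
}

console.log(retrySchedule({ maxRetries: 5, retryDelay: 1000, retryBackoff: 2.0 }));
// [ 1000, 2000, 4000, 8000, 16000 ]
```

Under this reading, the configuration above waits 31 seconds in total before giving up.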

Error Recovery Patterns

Pattern 1: Graceful Degradation

streams:
  - id: optional-enhancement
    continueOnError: true  # Continue even if stream fails
    commands:
      - add_feature

Use case: Non-critical enhancements that shouldn't block main work.

Pattern 2: Critical Path

streams:
  - id: core-functionality
    continueOnError: false  # Default: stop on error
    commands:
      - implement_feature

Use case: Essential changes that must succeed.
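The difference between Pattern 1 and Pattern 2 can be sketched as follows. The example assumes streams run one after another and that each reports success or failure as a boolean. Actual Orchex scheduling may differ, and the names here are hypothetical.

```typescript
type StreamSpec = {
  id: string;
  continueOnError?: boolean; // defaults to false, as in Pattern 2
  run: () => boolean; // true on success
};

type RunSummary = { ran: string[]; failed: string[]; aborted: boolean };

function runAllStreams(streams: StreamSpec[]): RunSummary {
  const ran: string[] = [];
  const failed: string[] = [];
  for (const stream of streams) {
    ran.push(stream.id);
    if (stream.run()) {
      continue;
    }
    failed.push(stream.id);
    // A failed stream without continueOnError stops the whole run.
    if (!stream.continueOnError) {
      return { ran, failed, aborted: true };
    }
  }
  return { ran, failed, aborted: false };
}
```

A failing `optional-enhancement` stream is recorded in `failed` but later streams still run, while a failing `core-functionality` stream ends the run.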

Pattern 3: Rollback on Failure

streams:
  - id: database-migration
    rollbackOn: error
    commands:
      - backup_schema
      - alter_table
      - verify_migration

Note: Rollback support is planned for future versions.

Command Design for Error Handling

Validation Best Practices

1. Explicit Validation

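import { existsSync } from 'fs';
import { z } from 'zod';
import { CommandDefinition, CommandError } from './types';

// ✅ Good: Clear validation with specific errors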
export const myCommand: CommandDefinition = {
  name: 'my_command',
  parameters: z.object({
    path: z.string().min(1, 'Path cannot be empty'),
    content: z.string(),
    encoding: z.enum(['utf8', 'base64']).default('utf8')
  }),
  async execute(params) {
    // Additional runtime validation
    if (!existsSync(params.path)) {
      throw new CommandError('File not found', 'NOT_FOUND', false);
    }
    // ...
  }
};

// ❌ Bad: Vague validation, unclear errors
export const badCommand: CommandDefinition = {
  name: 'bad_command',
  parameters: z.object({
    data: z.any()  // Too permissive
  }),
  async execute(params) {
    // Implicit validation that may throw unclear errors
    const result = params.data.something.deeply.nested;
    // ...
  }
};

2. Error Context

// ✅ Good: Rich error context
throw new CommandError(
  `Failed to write file: ${error.message}`,
  'WRITE_ERROR',
  true,  // retryable
  {
    path: params.path,
    originalError: error.code,
    diskSpace: await checkDiskSpace()
  }
);

// ❌ Bad: Minimal context
throw new Error('Write failed');

3. Retryable vs Non-Retryable

// Retryable: Temporary conditions
if (error.code === 'EBUSY') {
  throw new CommandError(
    'File is locked by another process',
    'FILE_LOCKED',
    true  // Retryable
  );
}

// Non-retryable: Permanent conditions
if (error.code === 'EACCES') {
  throw new CommandError(
    'Permission denied',
    'PERMISSION_DENIED',
    false  // Not retryable
  );
}

Custom Command Error Handling

import { CommandDefinition, CommandError } from './types';
import { z } from 'zod';

export const safeCommand: CommandDefinition = {
  name: 'safe_command',
  description: 'Example with comprehensive error handling',
  parameters: z.object({
    input: z.string()
  }),
  
  async execute(params, context) {
    try {
      // Pre-execution validation
      if (!context.projectRoot) {
        throw new CommandError(
          'Project root not configured',
          'INVALID_CONTEXT',
          false
        );
      }

      // Main logic
      const result = await riskyOperation(params.input);
      
      return {
        success: true,
        data: result
      };
      
    } catch (error) {
      // Classify and rethrow with context
      if (error instanceof CommandError) {
        throw error;  // Already classified
      }

      // `error` is `unknown` under strict TypeScript, so narrow it
      // before reading Node-style fields such as `code`
      const err = error as NodeJS.ErrnoException;

      // Network errors: retryable
      if (err.code === 'ECONNRESET' || err.code === 'ETIMEDOUT') {
        throw new CommandError(
          `Network error: ${err.message}`,
          'NETWORK_ERROR',
          true,
          { originalCode: err.code }
        );
      }

      // Unknown errors: not retryable by default
      throw new CommandError(
        `Unexpected error: ${err.message}`,
        'UNKNOWN_ERROR',
        false,
        { stack: err.stack }
      );
    }
  }
};

Self-Healing Capabilities

Orchex includes intelligent self-healing that attempts to fix errors automatically.

Automatic Fixes

1. Missing Dependencies

// Error detected
{
  "error": "MODULE_NOT_FOUND",
  "message": "Cannot find module 'lodash'"
}

// Self-healing action
// Automatically runs: npm install lodash

2. Syntax Errors

// Error detected
{
  "error": "SYNTAX_ERROR",
  "message": "Unexpected token '}'",
  "file": "src/utils.ts",
  "line": 42
}

// Self-healing action
// Analyzes context and suggests fix
// May automatically fix common issues (missing commas, brackets)

3. Type Errors

// Error detected
{
  "error": "TYPE_ERROR",
  "message": "Property 'map' does not exist on type 'string'"
}

// Self-healing action
// Analyzes intended operation
// Suggests type fixes or refactoring

Configuration

Enable/disable self-healing in your manifest:

metadata:
  selfHealing:
    enabled: true
    maxAttempts: 3
    strategies:
      - fix_syntax
      - install_dependencies
      - update_imports

Or via environment:

ORCHEX_SELF_HEALING=true
ORCHEX_MAX_HEALING_ATTEMPTS=3
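How the two environment variables might be merged with manifest settings is sketched below. The precedence (environment overrides manifest) and the parsing rules are assumptions, since the guide does not specify them. The function takes the environment as a plain object, so a caller would typically pass `process.env`.

```typescript
interface SelfHealingSettings {
  enabled: boolean;
  maxAttempts: number;
}

// Hypothetical helper: environment values, when present and valid,
// override the values read from the manifest.
function selfHealingFromEnv(
  env: Record<string, string | undefined>,
  manifest: SelfHealingSettings
): SelfHealingSettings {
  const rawEnabled = env["ORCHEX_SELF_HEALING"];
  const rawAttempts = env["ORCHEX_MAX_HEALING_ATTEMPTS"];
  const attempts = rawAttempts === undefined ? NaN : parseInt(rawAttempts, 10);
  return {
    enabled:
      rawEnabled === undefined
        ? manifest.enabled
        : rawEnabled.toLowerCase() === "true",
    // NaN and negative numbers fail this check and fall back to the manifest.
    maxAttempts: attempts >= 0 ? attempts : manifest.maxAttempts
  };
}
```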

Debugging Failed Executions

Error Logs

Orchex provides detailed error logs:

# Local execution
npx @wundam/orchex execute ./manifest.yaml --verbose

# Cloud execution (check logs)
curl https://api.orchex.dev/v1/jobs/{jobId}/logs \
  -H "Authorization: Bearer $ORCHEX_API_KEY"

Error Analysis

Use the error analyzer to understand failures:

import { analyzeError } from '@orchex/intelligence/error-analyzer';

const analysis = await analyzeError(error, {
  context: executionContext,
  previousAttempts: retryHistory
});

console.log(analysis);
// {
//   classification: 'RATE_LIMIT',
//   retryable: true,
//   suggestedDelay: 60000,
//   confidence: 0.95,
//   recommendations: ['Wait 60s before retry', 'Consider rate limiting']
// }
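One way a caller might act on such an analysis is sketched below. The confidence threshold and the decision policy are hypothetical and are not part of the analyzer. The interface repeats only the fields this example needs.

```typescript
// Subset of the analysis fields used by this sketch.
interface AnalysisSummary {
  retryable: boolean;
  suggestedDelay?: number; // milliseconds, as in the output above
  confidence: number;
  recommendations: string[];
}

type RetryDecision =
  | { action: "retry"; delayMs: number }
  | { action: "fail"; reasons: string[] };

// Hypothetical policy: retry only when the analyzer is reasonably sure
// the error is transient; otherwise surface its recommendations.
function decideNextStep(
  analysis: AnalysisSummary,
  fallbackDelayMs: number,
  minConfidence: number = 0.8
): RetryDecision {
  if (analysis.retryable && analysis.confidence >= minConfidence) {
    const delayMs =
      analysis.suggestedDelay !== undefined
        ? analysis.suggestedDelay
        : fallbackDelayMs;
    return { action: "retry", delayMs };
  }
  return { action: "fail", reasons: analysis.recommendations };
}
```

With the analysis shown above, `decideNextStep(analysis, 1000)` returns a retry decision with a 60000 ms delay.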

Common Issues

Issue: Commands Always Failing

Symptoms:

  • All commands fail immediately
  • Error: "Authentication failed"

Solution:

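# Check that the API key is set (without printing the secret)
[ -n "$ANTHROPIC_API_KEY" ] && echo "ANTHROPIC_API_KEY is set" || echo "ANTHROPIC_API_KEY is missing"

# Verify the key is accepted: a 401 response means the key is invalid
curl https://api.anthropic.com/v1/models \
  -H "x-api-key: $ANTHROPIC_API_KEY" \
  -H "anthropic-version: 2023-06-01"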

Issue: Intermittent Failures

Symptoms:

  • Commands fail randomly
  • Error: "Rate limit exceeded"

Solution:

# Adjust concurrency in manifest
execution:
  concurrency: 1  # Reduce from default 3
  delay: 1000      # Add delay between commands

Issue: Timeout Errors

Symptoms:

  • Long-running operations fail
  • Error: "Request timeout"

Solution:

# Increase timeout
streams:
  - id: long-running
    timeout: 300000  # 5 minutes (milliseconds)
    commands:
      - complex_operation

Best Practices

1. Design for Idempotency

Make commands idempotent so retries are safe:

// ✅ Good: Idempotent
export const createFileIfNotExists: CommandDefinition = {
  async execute(params) {
    // Check once, before writing, so the result reflects what this call did
    const existed = existsSync(params.path);
    if (!existed) {
      await writeFile(params.path, params.content);
    }
    return { created: !existed };
  }
};

// ❌ Bad: Not idempotent
export const appendToFile: CommandDefinition = {
  async execute(params) {
    // Retry will duplicate content
    await appendFile(params.path, params.content);
  }
};
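If an append really is required, one way to make it safe to retry is to skip the write when the file already ends with the text being appended. The pure helper below sketches that check on strings. `appendOnce` is a hypothetical name, and the approach assumes the same text is never meant to be appended twice in a row.

```typescript
// Returns the content a file should have after an "append once" operation.
function appendOnce(existing: string, addition: string): string {
  if (addition === "") {
    return existing;
  }
  // If a previous attempt already appended the text, change nothing.
  if (existing.slice(-addition.length) === addition) {
    return existing;
  }
  return existing + addition;
}

const first = appendOnce("line 1\n", "line 2\n");
const retried = appendOnce(first, "line 2\n");
console.log(first === retried); // true
```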

2. Provide Clear Error Messages

// ✅ Good: Actionable error message
throw new CommandError(
  'Failed to read package.json. Ensure file exists and is valid JSON.',
  'INVALID_PACKAGE_JSON',
  false,
  { path: './package.json', parseError: error.message }
);

// ❌ Bad: Vague error
throw new Error('Failed');

3. Use Appropriate Retry Logic

streams:
  # Critical operations: More retries
  - id: deploy
    maxRetries: 5
    retryDelay: 2000
    commands:
      - build
      - deploy

  # Experimental features: Fewer retries
  - id: optional-enhancement
    maxRetries: 1
    continueOnError: true
    commands:
      - add_feature

4. Monitor and Alert

For cloud executions, set up monitoring:

// Webhook for error notifications
const job = await orchex.execute(manifest, {
  webhooks: {
    onError: 'https://your-app.com/webhooks/orchex-error',
    onComplete: 'https://your-app.com/webhooks/orchex-complete'
  }
});

API Reference

CommandError Class

class CommandError extends Error {
  constructor(
    message: string,
    code: string,
    retryable: boolean,
    context?: Record<string, unknown>
  );
}
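A minimal class satisfying this constructor signature might look like the sketch below. Only the constructor parameters come from the declaration above. Exposing them as `code`, `retryable`, and `context` properties, and the `formatCommandError` helper, are assumptions for illustration.

```typescript
class CommandError extends Error {
  readonly code: string;
  readonly retryable: boolean;
  readonly context: Record<string, unknown>;

  constructor(
    message: string,
    code: string,
    retryable: boolean,
    context: Record<string, unknown> = {}
  ) {
    super(message);
    this.name = "CommandError";
    this.code = code;
    this.retryable = retryable;
    this.context = context;
  }
}

// Hypothetical helper: one-line summary suitable for logs.
function formatCommandError(error: CommandError): string {
  const kind = error.retryable ? "retryable" : "permanent";
  return `[${error.code}] ${error.message} (${kind})`;
}

const locked = new CommandError("File is locked", "FILE_LOCKED", true, {
  path: "a.txt"
});
console.log(formatCommandError(locked));
// [FILE_LOCKED] File is locked (retryable)
```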

Error Analyzer

interface ErrorAnalysis {
  classification: ErrorClassification;
  retryable: boolean;
  suggestedDelay?: number;
  confidence: number;
  recommendations: string[];
  context: Record<string, unknown>;
}

function analyzeError(
  error: Error,
  context: ExecutionContext
): Promise<ErrorAnalysis>;

Self-Healer

interface HealingResult {
  success: boolean;
  strategy: string;
  changes: FileChange[];
  recommendations: string[];
}

function attemptHeal(
  error: Error,
  context: ExecutionContext,
  options?: HealingOptions
): Promise<HealingResult>;

See Also