Orchestrator

The orchestrator is the execution engine that powers Bootspring workflows. It manages phase transitions, coordinates agents, handles failures, and ensures complex tasks complete successfully.

How the Orchestrator Works

The orchestrator coordinates the entire workflow lifecycle:

    ┌─────────────────────────────────────────────────────────────────────────┐
    │                           Orchestrator Engine                           │
    ├─────────────────────────────────────────────────────────────────────────┤
    │                                                                         │
    │  ┌─────────────┐    ┌─────────────┐    ┌─────────────┐                  │
    │  │  Workflow   │    │   Phase     │    │   Agent     │                  │
    │  │  Registry   │───>│  Manager    │───>│ Coordinator │                  │
    │  └─────────────┘    └─────────────┘    └─────────────┘                  │
    │         │                  │                  │                         │
    │         │                  │                  │                         │
    │         ▼                  ▼                  ▼                         │
    │  ┌─────────────┐    ┌─────────────┐    ┌─────────────┐                  │
    │  │   State     │    │    Gate     │    │  Artifact   │                  │
    │  │  Manager    │<──>│  Manager    │<──>│  Manager    │                  │
    │  └─────────────┘    └─────────────┘    └─────────────┘                  │
    │                                                                         │
    │  ═════════════════════════════════════════════════════════════════════  │
    │                            Persistence Layer                            │
    │                     Checkpoints │ Logs │ Artifacts                      │
    │                                                                         │
    └─────────────────────────────────────────────────────────────────────────┘

Development Lifecycle Phases

The orchestrator understands 9 standard development phases:

    ┌──────────────────────────────────────────────────────────────────────────┐
    │                          Development Lifecycle                           │
    ├──────────────────────────────────────────────────────────────────────────┤
    │                                                                          │
    │  1. Ideation   2. Planning   3. Design     4. Development                │
    │  ┌─────────┐   ┌─────────┐   ┌─────────┐   ┌─────────┐                   │
    │  │Concepts │──>│ Scope & │──>│ Schema  │──>│  Code   │                   │
    │  │Research │   │Strategy │   │API, UX  │   │Building │                   │
    │  └─────────┘   └─────────┘   └─────────┘   └─────────┘                   │
    │                                                 │                        │
    │  ┌──────────────────────────────────────────────┘                        │
    │  │                                                                       │
    │  │  5. Testing    6. Review     7. Deploy     8. Monitor                 │
    │  │  ┌─────────┐   ┌─────────┐   ┌─────────┐   ┌─────────┐                │
    │  └─>│  Unit   │──>│Security │──>│Release  │──>│ Health  │                │
    │     │E2E, QA  │   │ Code QA │   │ CI/CD   │   │Analytics│                │
    │     └─────────┘   └─────────┘   └─────────┘   └─────────┘                │
    │                                                    │                     │
    │  ┌─────────────────────────────────────────────────┘                     │
    │  │                                                                       │
    │  │  9. Iterate                                                           │
    │  │  ┌─────────┐                                                          │
    │  └─>│Feedback │──────────────────────────────────┐                       │
    │     │ Improve │                                  │                       │
    │     └─────────┘                                  ▼                       │
    │                                          Back to any phase               │
    │                                                                          │
    └──────────────────────────────────────────────────────────────────────────┘

Phase Details

| Phase       | Purpose                   | Default Agent                      |
|-------------|---------------------------|------------------------------------|
| Ideation    | Brainstorm and research   | research-expert                    |
| Planning    | Scope and strategy        | architecture-expert                |
| Design      | Technical specifications  | database-expert, api-expert        |
| Development | Code implementation       | backend-expert, frontend-expert    |
| Testing     | Quality assurance         | testing-expert                     |
| Review      | Code and security review  | security-expert, code-review-expert |
| Deploy      | Release to production     | devops-expert                      |
| Monitor     | Track health and metrics  | monitoring-expert                  |
| Iterate     | Improve based on feedback | product-expert                     |
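
As a rough illustration, the phase-to-agent mapping can be expressed as a lookup from phase name to default agents. The `DEFAULT_AGENTS` map and the `defaultAgentsFor` helper are hypothetical names chosen for this sketch; they are not part of a documented Bootspring API.

```javascript
// Hypothetical lookup mirroring the Phase Details table (not a Bootspring API).
const DEFAULT_AGENTS = new Map([
  ['ideation', ['research-expert']],
  ['planning', ['architecture-expert']],
  ['design', ['database-expert', 'api-expert']],
  ['development', ['backend-expert', 'frontend-expert']],
  ['testing', ['testing-expert']],
  ['review', ['security-expert', 'code-review-expert']],
  ['deploy', ['devops-expert']],
  ['monitor', ['monitoring-expert']],
  ['iterate', ['product-expert']],
]);

// Returns a copy of the default agents for a phase; throws on unknown names.
function defaultAgentsFor(phase) {
  const agents = DEFAULT_AGENTS.get(String(phase).toLowerCase());
  if (!agents) throw new Error(`Unknown phase: ${phase}`);
  return [...agents];
}

console.log(defaultAgentsFor('Design')); // [ 'database-expert', 'api-expert' ]
```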

Execution Modes

Sequential Execution

Phases run one after another:

    Plan ──> Design ──> Build ──> Test ──> Review
     │         │          │        │         │
     ▼         ▼          ▼        ▼         ▼
    Done      Done       Done     Done      Done

Parallel Execution

Multiple agents work simultaneously:

                       ┌──> Backend ──┐
    Plan ──> Design ──>│              ├──> Test ──> Review
                       └──> Frontend ─┘

Adaptive Execution

The orchestrator can adjust based on results:

    Plan ──> Design ──> Build ──> Test ──┬──> Review (pass)
                                         │
                                         └──> Fix ──> Test (fail, retry)
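
The three execution modes can be sketched in a few lines each. This is a minimal illustration, not the orchestrator's actual implementation: it assumes each phase or task is modeled as an async function that receives a shared context object, and the names `runSequential`, `runParallel`, and `runAdaptive` are hypothetical.

```javascript
// Assumption: a phase/task is an async function taking a shared context object.

// Sequential: run phases one after another, in order.
async function runSequential(phases, context) {
  for (const phase of phases) {
    await phase(context);
  }
}

// Parallel: start every task at once and wait until all have finished.
async function runParallel(tasks, context) {
  await Promise.all(tasks.map((task) => task(context)));
}

// Adaptive: run the test; on failure run the fix and test again,
// applying at most `maxRetries` fixes. Resolves to true on a pass.
async function runAdaptive(test, fix, context, maxRetries = 3) {
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    if (await test(context)) return true; // pass: continue to review
    if (attempt < maxRetries) await fix(context);
  }
  return false; // still failing after all retries
}
```

Note that `Promise.all` rejects as soon as any task rejects, so a real coordinator would likely need to decide what happens to the tasks that are still running.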

State Management

The orchestrator maintains comprehensive state:

    {
      "workflowId": "wf_abc123",
      "workflow": "feature-development",
      "status": "running",
      "currentPhase": "development",
      "progress": {
        "completed": 2,
        "total": 5,
        "percentage": 40
      },
      "phases": [
        {
          "name": "planning",
          "status": "completed",
          "startedAt": "2024-02-19T10:00:00Z",
          "completedAt": "2024-02-19T10:03:00Z",
          "duration": 180000,
          "agent": "architecture-expert",
          "artifacts": ["plan.md"]
        },
        {
          "name": "design",
          "status": "completed",
          "duration": 240000,
          "agents": ["database-expert", "api-expert"],
          "artifacts": ["design.md", "schema.prisma"]
        },
        {
          "name": "development",
          "status": "in_progress",
          "startedAt": "2024-02-19T10:07:00Z",
          "agents": ["backend-expert", "frontend-expert"],
          "parallel": true,
          "tasks": [
            { "agent": "backend-expert", "status": "in_progress" },
            { "agent": "frontend-expert", "status": "completed" }
          ]
        },
        {
          "name": "testing",
          "status": "pending"
        },
        {
          "name": "review",
          "status": "pending"
        }
      ],
      "checkpoints": [
        { "phase": "planning", "path": "checkpoints/planning.json" },
        { "phase": "design", "path": "checkpoints/design.json" }
      ],
      "context": {
        "feature": "user notifications",
        "requirements": ["email", "push", "in-app"]
      }
    }
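
One plausible way to derive a `progress` summary from the `phases` array is sketched below. It assumes progress counts only phases whose status is `completed`; the source does not state how the orchestrator computes this, and `computeProgress` is a hypothetical helper.

```javascript
// Assumption: progress = share of phases whose status is "completed".
function computeProgress(phases) {
  const total = phases.length;
  const completed = phases.filter((p) => p.status === 'completed').length;
  const percentage = total === 0 ? 0 : Math.round((completed / total) * 100);
  return { completed, total, percentage };
}

const phases = [
  { name: 'planning', status: 'completed' },
  { name: 'design', status: 'completed' },
  { name: 'development', status: 'in_progress' },
  { name: 'testing', status: 'pending' },
  { name: 'review', status: 'pending' },
];

console.log(computeProgress(phases)); // { completed: 2, total: 5, percentage: 40 }
```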

Status Values

| Status    | Description                      |
|-----------|----------------------------------|
| pending   | Not yet started                  |
| running   | Currently executing              |
| paused    | Manually or automatically paused |
| completed | Successfully finished            |
| failed    | Encountered an error             |
| cancelled | Manually stopped                 |

Agent Coordination

The orchestrator manages multiple agents working together:

Agent Assignment

Each phase can have:

  • Single agent: One expert handles the phase
  • Multiple agents: Several experts collaborate
  • Parallel agents: Agents work simultaneously

    // Configuration example
    {
      phases: [
        {
          name: 'planning',
          agent: 'architecture-expert' // Single
        },
        {
          name: 'design',
          agents: ['database-expert', 'api-expert', 'ui-ux-expert'] // Multiple
        },
        {
          name: 'development',
          parallel: true,
          tasks: [
            { agent: 'backend-expert', task: 'Build API endpoints' },
            { agent: 'frontend-expert', task: 'Build UI components' }
          ]
        }
      ]
    }
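
To show how the three assignment shapes differ, the sketch below normalizes a phase entry into a flat list of agents plus a parallel flag. `resolveAgents` is a hypothetical helper written for this example; how the orchestrator actually interprets these fields is not specified here.

```javascript
// Normalizes a phase entry (single `agent`, `agents` list, or `tasks` list)
// into { agents, parallel }. Hypothetical helper, for illustration only.
function resolveAgents(phase) {
  if (Array.isArray(phase.tasks)) {
    return { agents: phase.tasks.map((t) => t.agent), parallel: phase.parallel === true };
  }
  if (Array.isArray(phase.agents)) {
    return { agents: [...phase.agents], parallel: phase.parallel === true };
  }
  if (typeof phase.agent === 'string') {
    return { agents: [phase.agent], parallel: false };
  }
  throw new Error(`Phase "${phase.name}" has no agent assigned`);
}

console.log(resolveAgents({ name: 'planning', agent: 'architecture-expert' }));
// { agents: [ 'architecture-expert' ], parallel: false }
```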

Agent Communication

Agents share context through:

  1. Workflow context: Initial parameters and requirements
  2. Phase artifacts: Documents created by previous phases
  3. State updates: Real-time progress information

    Phase 1 Output ──────────────────────────────────────┐
                                                         │
    Phase 2 reads Phase 1 artifacts                      │
           │                                             │
           ▼                                             ▼
    Phase 2 Output ───────────> Phase 3 reads all previous artifacts
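
The artifact hand-off between phases can be sketched as a simple accumulator: each phase receives the workflow context plus everything earlier phases produced. The `runWithSharedArtifacts` function and the `{ name, run }` phase shape are assumptions made for this example.

```javascript
// Runs phases in order. Each `run(context, previous)` receives the workflow
// context and a frozen copy of the artifact records from all earlier phases,
// and returns the names of the artifacts it produced.
function runWithSharedArtifacts(phases, context) {
  const artifacts = []; // accumulated { phase, name } records
  for (const { name, run } of phases) {
    const produced = run(context, Object.freeze([...artifacts]));
    for (const artifact of produced) {
      artifacts.push({ phase: name, name: artifact });
    }
  }
  return artifacts;
}

const seenByDevelopment = [];
const all = runWithSharedArtifacts(
  [
    { name: 'planning', run: () => ['plan.md'] },
    { name: 'design', run: () => ['design.md', 'schema.prisma'] },
    {
      name: 'development',
      run: (_context, previous) => {
        seenByDevelopment.push(...previous.map((a) => a.name));
        return [];
      },
    },
  ],
  { feature: 'user notifications' }
);

console.log(seenByDevelopment); // [ 'plan.md', 'design.md', 'schema.prisma' ]
```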

Quality Gate Integration

The orchestrator enforces quality gates between phases:

    Development ──┬──> pre-commit gate ──> pass ──> Testing
                  │            │
                  │            └──> fail ──> Fix & Retry
                  │
                  └──> blocked until gate passes

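Gate Types

| Gate       | When              | What It Checks                 |
|------------|-------------------|--------------------------------|
| pre-commit | After development | Linting, formatting, types     |
| pre-push   | After testing     | Tests pass, coverage threshold |
| pre-deploy | After review      | Security scan, build success   |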

Gate Failure Handling

When a gate fails:

  1. Workflow pauses
  2. Failure details recorded
  3. Options presented:
    • Fix and retry
    • Skip gate (if allowed)
    • Cancel workflow
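
The pause-and-offer-options flow can be sketched as follows. This is an illustration under assumptions: the gate shape (`name` plus a list of `checks`), the `coverageAtLeast` check, and the `allowSkip` flag are hypothetical, and the returned object only loosely mirrors what an orchestrator might record.

```javascript
// Hypothetical check factory: passes when coverage meets the threshold.
const coverageAtLeast = (required) => (results) => ({
  ok: results.coverage >= required,
  message: `Test coverage (${results.coverage}%) below threshold (${required}%)`,
});

// Runs every check of a gate. On failure, returns a paused outcome that
// records the failure details and lists the available options.
function evaluateGate(gate, results, { allowSkip = false } = {}) {
  const failures = gate.checks
    .map((check) => check(results))
    .filter((outcome) => !outcome.ok);

  if (failures.length === 0) return { gate: gate.name, passed: true };

  return {
    gate: gate.name,
    passed: false,
    status: 'paused',
    failures: failures.map((f) => f.message),
    options: ['retry', ...(allowSkip ? ['skip'] : []), 'cancel'],
  };
}

const prePush = { name: 'pre-push', checks: [coverageAtLeast(80)] };
console.log(evaluateGate(prePush, { coverage: 68 }).options); // [ 'retry', 'cancel' ]
```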

Checkpoint System

The orchestrator creates checkpoints for recovery:

Automatic Checkpoints

Created after each phase completes:

    .bootspring/workflows/wf_abc123/
    ├── checkpoints/
    │   ├── planning.json
    │   ├── design.json
    │   └── development.json
    └── state.json

Checkpoint Content

    {
      "phase": "design",
      "timestamp": "2024-02-19T10:07:00Z",
      "state": { /* full state snapshot */ },
      "artifacts": ["design.md", "schema.prisma"],
      "context": { /* accumulated context */ }
    }

Recovery

Restore from any checkpoint:

    Restore workflow wf_abc123 to the design checkpoint.

The orchestrator will:

  1. Load checkpoint state
  2. Reset phases after checkpoint
  3. Resume from that point
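
The three recovery steps can be sketched on an in-memory checkpoint object. This example skips file I/O and assumes the checkpoint's `state` field holds a snapshot with a `phases` array; `restoreFromCheckpoint` is a hypothetical helper, and resetting later phases to a bare `pending` entry is one possible interpretation of "reset".

```javascript
// Restores workflow state from a checkpoint object of the shape
// { phase, state: { phases: [...] , ... } } without mutating it.
function restoreFromCheckpoint(checkpoint) {
  // 1. Load checkpoint state
  const snapshot = checkpoint.state;
  const index = snapshot.phases.findIndex((p) => p.name === checkpoint.phase);
  if (index === -1) {
    throw new Error(`Checkpoint phase "${checkpoint.phase}" not found in snapshot`);
  }

  // 2. Reset every phase after the checkpoint to a bare pending entry
  const phases = snapshot.phases.map((phase, i) =>
    i <= index ? { ...phase } : { name: phase.name, status: 'pending' }
  );

  // 3. Resume at the first phase after the checkpoint (null if none remain)
  const next = phases[index + 1];
  return { ...snapshot, phases, currentPhase: next ? next.name : null };
}
```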

Failure Handling

Automatic Retries

Transient failures are retried automatically:

    module.exports = {
      orchestrator: {
        retry: {
          maxAttempts: 3,
          backoff: 'exponential',
          initialDelay: 1000,
        },
      },
    };
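
A minimal sketch of what these retry settings could mean in practice follows. It assumes `initialDelay` is in milliseconds and that "exponential" means the delay doubles after each failed attempt; neither detail is stated in the configuration itself. `backoffDelay` and `withRetry` are hypothetical names.

```javascript
const defaultSleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Delay before the retry that follows failed attempt `attempt` (1-based).
// Assumption: the delay doubles each time (1000, 2000, 4000, ...).
function backoffDelay(attempt, initialDelay = 1000) {
  return initialDelay * 2 ** (attempt - 1);
}

// Calls `fn` up to `maxAttempts` times, sleeping between failed attempts.
// Rethrows the last error if every attempt fails.
async function withRetry(fn, { maxAttempts = 3, initialDelay = 1000, sleep = defaultSleep } = {}) {
  let lastError;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return await fn(attempt);
    } catch (err) {
      lastError = err;
      if (attempt < maxAttempts) await sleep(backoffDelay(attempt, initialDelay));
    }
  }
  throw lastError;
}

console.log([1, 2, 3].map((n) => backoffDelay(n))); // [ 1000, 2000, 4000 ]
```

With `maxAttempts: 3`, only the first two delays are ever used, because no sleep follows the final attempt.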

Pause on Failure

Significant failures pause the workflow:

    {
      "status": "paused",
      "error": {
        "phase": "testing",
        "type": "QUALITY_GATE_FAILED",
        "message": "Test coverage (68%) below threshold (80%)",
        "details": {
          "metric": "coverage",
          "actual": 68,
          "required": 80
        }
      },
      "recovery": {
        "options": ["retry", "skip", "cancel"],
        "recommended": "retry"
      }
    }

Manual Intervention

Some failures require human decision:

    The workflow has paused because tests are failing.
    Options:
    1. Fix the tests and retry
    2. Skip the testing phase (not recommended)
    3. Cancel the workflow

Configuration

Basic Configuration

    // bootspring.config.js
    module.exports = {
      orchestrator: {
        // Auto-advance to next phase
        autoAdvance: true,

        // Pause on any failure
        pauseOnFailure: true,

        // Save checkpoints
        saveCheckpoints: true,

        // Notify on completion
        notifyOnComplete: false,
      },
    };

Phase Configuration

    module.exports = {
      orchestrator: {
        phases: {
          planning: {
            timeout: 300000, // 5 minute timeout
            required: true, // Cannot skip
          },
          testing: {
            timeout: 600000, // 10 minute timeout
            required: true,
            qualityGate: 'pre-push',
          },
          review: {
            timeout: 300000,
            required: false, // Can skip
          },
        },
      },
    };
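
A per-phase `timeout` (in milliseconds, judging by the comments above) could be enforced by racing the phase against a timer. This is a generic sketch of that technique, not Bootspring's implementation; `withTimeout` is a hypothetical helper.

```javascript
// Rejects if `promise` has not settled within `ms` milliseconds.
// The timer is always cleared so it cannot keep the process alive.
function withTimeout(promise, ms, label) {
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(() => reject(new Error(`${label} timed out after ${ms} ms`)), ms);
  });
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}

// Usage sketch: runPlanning is a placeholder for whatever executes the phase.
// await withTimeout(runPlanning(), 300000, 'planning');
```

Racing does not cancel the underlying work; a phase that has timed out keeps running unless it is stopped by some other mechanism.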

Checkpoint Configuration

    module.exports = {
      orchestrator: {
        checkpoints: {
          frequency: 'phase', // 'phase', 'step', or 'manual'
          retention: 7, // Days to keep
          autoRestore: true, // Auto-restore on resume
          compress: true, // Compress checkpoint files
        },
      },
    };
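
To make the `retention` setting concrete, the sketch below selects checkpoints older than a given number of days. It assumes each checkpoint carries an ISO 8601 `timestamp` and that age is measured from that timestamp; how the orchestrator actually prunes files is not documented here, and `expiredCheckpoints` is a hypothetical helper.

```javascript
const DAY_MS = 24 * 60 * 60 * 1000;

// Returns the checkpoints whose timestamp is older than `retentionDays`
// relative to `now`. Does not delete anything.
function expiredCheckpoints(checkpoints, retentionDays, now = new Date()) {
  const cutoff = now.getTime() - retentionDays * DAY_MS;
  return checkpoints.filter((c) => new Date(c.timestamp).getTime() < cutoff);
}

const sample = [
  { phase: 'design', timestamp: '2024-02-19T10:07:00Z' },
  { phase: 'development', timestamp: '2024-02-25T09:00:00Z' },
];

console.log(expiredCheckpoints(sample, 7, new Date('2024-02-27T00:00:00Z')).map((c) => c.phase));
// [ 'design' ]
```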

Monitoring and Logs

Workflow Logs

All orchestrator activity is logged:

    .bootspring/workflows/wf_abc123/
    └── logs/
        ├── orchestrator.log      # Orchestrator decisions
        ├── phase-planning.log    # Planning phase log
        ├── phase-design.log      # Design phase log
        └── phase-dev.log         # Development phase log

Log Format

    [2024-02-19T10:00:00Z] [INFO] Workflow wf_abc123 started
    [2024-02-19T10:00:00Z] [INFO] Phase: planning - Starting
    [2024-02-19T10:00:00Z] [INFO] Agent: architecture-expert - Invoked
    [2024-02-19T10:03:00Z] [INFO] Phase: planning - Completed (180s)
    [2024-02-19T10:03:00Z] [INFO] Checkpoint saved: planning
    [2024-02-19T10:03:00Z] [INFO] Phase: design - Starting
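
Lines in this format are easy to parse with a regular expression. The sketch below assumes every entry follows the `[timestamp] [LEVEL] message` shape shown in the sample; `parseLogLine` is a hypothetical helper, and levels other than `INFO` are assumed to be uppercase words.

```javascript
// Matches "[timestamp] [LEVEL] message".
const LOG_LINE = /^\[([^\]]+)\] \[([A-Z]+)\] (.*)$/;

// Returns { timestamp, level, message }, or null if the line does not match.
function parseLogLine(line) {
  const match = LOG_LINE.exec(line);
  if (!match) return null;
  const [, timestamp, level, message] = match;
  return { timestamp: new Date(timestamp), level, message };
}

const entry = parseLogLine('[2024-02-19T10:03:00Z] [INFO] Phase: planning - Completed (180s)');
console.log(entry.level, '|', entry.message); // INFO | Phase: planning - Completed (180s)
```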

Metrics

The orchestrator tracks:

  • Total workflow duration
  • Phase durations
  • Retry counts
  • Gate pass/fail rates
  • Agent utilization
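
As a small illustration of the duration metrics, the sketch below sums the `duration` fields (milliseconds) found on phase entries in the workflow state. Treating the total as the sum of phase durations is an assumption that ignores any gaps between phases; `summarizeDurations` is a hypothetical helper.

```javascript
// Collects per-phase durations and their sum from a `phases` array.
// Phases without a numeric `duration` (for example, unfinished ones) are skipped.
function summarizeDurations(phases) {
  const phaseDurations = {};
  let total = 0;
  for (const phase of phases) {
    if (typeof phase.duration !== 'number') continue;
    phaseDurations[phase.name] = phase.duration;
    total += phase.duration;
  }
  return { total, phaseDurations };
}

console.log(
  summarizeDurations([
    { name: 'planning', duration: 180000 },
    { name: 'design', duration: 240000 },
    { name: 'development', status: 'in_progress' },
  ])
);
// { total: 420000, phaseDurations: { planning: 180000, design: 240000 } }
```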

Best Practices

1. Let the Orchestrator Drive

Don't manually skip phases without good reason. The phase structure exists to protect quality.

2. Review Checkpoints

Before resuming a paused workflow, review the last checkpoint to understand the state.

3. Use Quality Gates

Enable gates for production-critical workflows:

    module.exports = {
      orchestrator: {
        enforceGates: true,
        requiredGates: ['pre-commit', 'pre-push'],
      },
    };

4. Monitor Long Workflows

For workflows over an hour, consider:

  • Breaking into smaller workflows
  • Adding more checkpoints
  • Enabling notifications

5. Handle Failures Properly

  • Always investigate failures before skipping
  • Use retry for transient issues
  • Cancel and restart for fundamental problems