Orchestrator
The orchestrator is the execution engine that powers Bootspring workflows. It manages phase transitions, coordinates agents, handles failures, and checkpoints progress so that complex tasks can run to completion.
How the Orchestrator Works#
The orchestrator coordinates the entire workflow lifecycle:
┌─────────────────────────────────────────────────────────────────────────┐
│                           Orchestrator Engine                           │
├─────────────────────────────────────────────────────────────────────────┤
│                                                                         │
│          ┌─────────────┐    ┌─────────────┐    ┌─────────────┐          │
│          │  Workflow   │    │    Phase    │    │    Agent    │          │
│          │  Registry   │───>│   Manager   │───>│ Coordinator │          │
│          └─────────────┘    └─────────────┘    └─────────────┘          │
│                 │                  │                  │                 │
│                 │                  │                  │                 │
│                 ▼                  ▼                  ▼                 │
│          ┌─────────────┐    ┌─────────────┐    ┌─────────────┐          │
│          │    State    │    │    Gate     │    │  Artifact   │          │
│          │   Manager   │<──>│   Manager   │<──>│   Manager   │          │
│          └─────────────┘    └─────────────┘    └─────────────┘          │
│                                                                         │
│  ═════════════════════════════════════════════════════════════════════  │
│                            Persistence Layer                            │
│                     Checkpoints │ Logs │ Artifacts                      │
│                                                                         │
└─────────────────────────────────────────────────────────────────────────┘
Development Lifecycle Phases#
The orchestrator understands 9 standard development phases:
┌──────────────────────────────────────────────────────────────────────────┐
│                          Development Lifecycle                           │
├──────────────────────────────────────────────────────────────────────────┤
│                                                                          │
│     1. Ideation   2. Planning   3. Design     4. Development             │
│     ┌─────────┐   ┌─────────┐   ┌─────────┐   ┌─────────┐                │
│     │Concepts │──>│ Scope & │──>│ Schema  │──>│  Code   │                │
│     │Research │   │Strategy │   │API, UX  │   │Building │                │
│     └─────────┘   └─────────┘   └─────────┘   └─────────┘                │
│                                                    │                     │
│  ┌─────────────────────────────────────────────────┘                     │
│  │                                                                       │
│  │  5. Testing    6. Review     7. Deploy     8. Monitor                 │
│  │  ┌─────────┐   ┌─────────┐   ┌─────────┐   ┌─────────┐                │
│  └─>│  Unit   │──>│Security │──>│ Release │──>│ Health  │                │
│     │E2E, QA  │   │ Code QA │   │  CI/CD  │   │Analytics│                │
│     └─────────┘   └─────────┘   └─────────┘   └─────────┘                │
│                                                    │                     │
│  ┌─────────────────────────────────────────────────┘                     │
│  │                                                                       │
│  │  9. Iterate                                                           │
│  │  ┌─────────┐                                                          │
│  └─>│Feedback │────────────────────────────────────┐                     │
│     │ Improve │                                    │                     │
│     └─────────┘                                    ▼                     │
│                                            Back to any phase             │
│                                                                          │
└──────────────────────────────────────────────────────────────────────────┘
Phase Details#
| Phase | Purpose | Default Agent |
|---|---|---|
| Ideation | Brainstorm and research | research-expert |
| Planning | Scope and strategy | architecture-expert |
| Design | Technical specifications | database-expert, api-expert |
| Development | Code implementation | backend-expert, frontend-expert |
| Testing | Quality assurance | testing-expert |
| Review | Code and security review | security-expert, code-review-expert |
| Deploy | Release to production | devops-expert |
| Monitor | Track health and metrics | monitoring-expert |
| Iterate | Improve based on feedback | product-expert |
Execution Modes#
Sequential Execution#
Phases run one after another:
Plan ──> Design ──> Build ──> Test ──> Review
 │         │          │        │         │
 ▼         ▼          ▼        ▼         ▼
Done      Done      Done      Done      Done
Parallel Execution#
Multiple agents work simultaneously:
                   ┌──> Backend ──┐
Plan ──> Design ──>│              ├──> Test ──> Review
                   └──> Frontend ─┘
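The difference between sequential and parallel steps can be sketched in a few lines. This is only an illustration: `runAgent`, `runPlan`, and the plan shape (a string for one agent, an array for a parallel group) are hypothetical names chosen for the example, not part of Bootspring's API.

```javascript
// Hypothetical stand-in for invoking an agent; a real call would do work here.
async function runAgent(name) {
  return { agent: name, status: 'completed' };
}

// A plan is a list of steps. A string is a single agent; an array is a
// group of agents that run at the same time.
async function runPlan(plan) {
  const results = [];
  for (const step of plan) {
    if (Array.isArray(step)) {
      // Parallel step: start every agent, then wait for all of them.
      const group = await Promise.all(step.map((name) => runAgent(name)));
      results.push(...group);
    } else {
      // Sequential step: wait for this agent before moving on.
      results.push(await runAgent(step));
    }
  }
  return results;
}

// Mirrors the diagram: backend and frontend run together between design and test.
const plan = ['plan', 'design', ['backend', 'frontend'], 'test', 'review'];
const done = runPlan(plan);
done.then((results) => console.log(results.map((r) => r.agent).join(' -> ')));
```

`Promise.all` resolves only when every agent in the group has finished, which is why the test step cannot start until both backend and frontend are done.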
Adaptive Execution#
The orchestrator can adjust based on results:
Plan ──> Design ──> Build ──> Test ──┬──> Review (pass)
                                     │
                                     └──> Fix ──> Test (fail, retry)
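The test/fix loop in the diagram can be pictured as follows. This is a minimal sketch under assumptions: `testWithFixes`, its callbacks, and the `maxFixes` limit are hypothetical, and pausing once the limit is reached is a guess at sensible behavior rather than documented orchestrator logic.

```javascript
// Run `test` until it passes, calling `fix` after each failure.
// `test` returns true on a pass; both callbacks are hypothetical stand-ins.
function testWithFixes(test, fix, maxFixes) {
  let fixes = 0;
  while (true) {
    if (test()) return { next: 'review', fixes };            // pass: move on
    if (fixes >= maxFixes) return { next: 'paused', fixes }; // give up (assumed)
    fix();
    fixes += 1;
  }
}

// Example: the tests keep failing until two fixes have been applied.
let applied = 0;
const outcome = testWithFixes(() => applied >= 2, () => { applied += 1; }, 3);
console.log(outcome); // { next: 'review', fixes: 2 }
```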
State Management#
The orchestrator tracks each workflow in a single state object that records status, progress, per-phase results, checkpoints, and context:
{
  "workflowId": "wf_abc123",
  "workflow": "feature-development",
  "status": "running",
  "currentPhase": "development",
  "progress": {
    "completed": 2,
    "total": 5,
    "percentage": 40
  },
  "phases": [
    {
      "name": "planning",
      "status": "completed",
      "startedAt": "2024-02-19T10:00:00Z",
      "completedAt": "2024-02-19T10:03:00Z",
      "duration": 180000,
      "agent": "architecture-expert",
      "artifacts": ["plan.md"]
    },
    {
      "name": "design",
      "status": "completed",
      "duration": 240000,
      "agents": ["database-expert", "api-expert"],
      "artifacts": ["design.md", "schema.prisma"]
    },
    {
      "name": "development",
      "status": "in_progress",
      "startedAt": "2024-02-19T10:07:00Z",
      "agents": ["backend-expert", "frontend-expert"],
      "parallel": true,
      "tasks": [
        { "agent": "backend-expert", "status": "in_progress" },
        { "agent": "frontend-expert", "status": "completed" }
      ]
    },
    {
      "name": "testing",
      "status": "pending"
    },
    {
      "name": "review",
      "status": "pending"
    }
  ],
  "checkpoints": [
    { "phase": "planning", "path": "checkpoints/planning.json" },
    { "phase": "design", "path": "checkpoints/design.json" }
  ],
  "context": {
    "feature": "user notifications",
    "requirements": ["email", "push", "in-app"]
  }
}
Status Values#
| Status | Description |
|---|---|
| pending | Not yet started |
| running | Currently executing |
| paused | Manually or automatically paused |
| completed | Successfully finished |
| failed | Encountered an error |
| cancelled | Manually stopped |
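As an illustration of how a `progress` block could be derived from the `phases` array, here is a small sketch. It assumes that progress counts only phases whose status is `completed`; `computeProgress` is a hypothetical helper, not a Bootspring function.

```javascript
// Derive a progress summary from the phases of a workflow state.
// Assumption: only phases with status "completed" count toward progress.
function computeProgress(phases) {
  const total = phases.length;
  const completed = phases.filter((p) => p.status === 'completed').length;
  const percentage = total === 0 ? 0 : Math.round((completed / total) * 100);
  return { completed, total, percentage };
}

const phases = [
  { name: 'planning', status: 'completed' },
  { name: 'design', status: 'completed' },
  { name: 'development', status: 'in_progress' },
  { name: 'testing', status: 'pending' },
  { name: 'review', status: 'pending' },
];
console.log(computeProgress(phases)); // { completed: 2, total: 5, percentage: 40 }
```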
Agent Coordination#
The orchestrator manages multiple agents working together:
Agent Assignment#
Each phase can have:
- Single agent: One expert handles the phase
- Multiple agents: Several experts collaborate
- Parallel agents: Agents work simultaneously
// Configuration example
{
  phases: [
    {
      name: 'planning',
      agent: 'architecture-expert' // Single
    },
    {
      name: 'design',
      agents: ['database-expert', 'api-expert', 'ui-ux-expert'] // Multiple
    },
    {
      name: 'development',
      parallel: true,
      tasks: [
        { agent: 'backend-expert', task: 'Build API endpoints' },
        { agent: 'frontend-expert', task: 'Build UI components' }
      ]
    }
  ]
}
Agent Communication#
Agents share context through:
- Workflow context: Initial parameters and requirements
- Phase artifacts: Documents created by previous phases
- State updates: Real-time progress information
Phase 1 Output ──────────────────────────────────────┐
                                                     │
Phase 2 reads Phase 1 artifacts                      │
       │                                             │
       ▼                                             ▼
Phase 2 Output ───────────> Phase 3 reads all previous artifacts
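The artifact flow in the diagram amounts to each phase seeing everything produced before it. A minimal sketch, assuming phases are listed in execution order and each carries an `artifacts` array as in the state example; `visibleArtifacts` is a hypothetical helper name.

```javascript
// For each phase, list the artifacts produced by all earlier phases.
function visibleArtifacts(phases) {
  const seen = [];
  const byPhase = {};
  for (const phase of phases) {
    byPhase[phase.name] = [...seen];       // snapshot of earlier outputs
    seen.push(...(phase.artifacts || []));  // this phase's outputs are visible next
  }
  return byPhase;
}

const visible = visibleArtifacts([
  { name: 'planning', artifacts: ['plan.md'] },
  { name: 'design', artifacts: ['design.md', 'schema.prisma'] },
  { name: 'development', artifacts: [] },
]);
console.log(visible.development); // [ 'plan.md', 'design.md', 'schema.prisma' ]
```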
Quality Gate Integration#
The orchestrator enforces quality gates between phases:
Development ──┬──> pre-commit gate ──> pass ──> Testing
              │         │
              │         └──> fail ──> Fix & Retry
              │
              └──> blocked until gate passes
Gate Types#
| Gate | When | What It Checks |
|---|---|---|
| pre-commit | After development | Linting, formatting, types |
| pre-push | After testing | Tests pass, coverage threshold |
| pre-deploy | After review | Security scan, build success |
Gate Failure Handling#
When a gate fails:
- Workflow pauses
- Failure details recorded
- Options presented:
  - Fix and retry
  - Skip gate (if allowed)
  - Cancel workflow
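The steps above can be sketched for a coverage gate. The returned record follows the paused-state shape shown under Pause on Failure; the function name `checkCoverageGate` and the `allowSkip` flag are assumptions made for the example, not confirmed orchestrator internals.

```javascript
// Evaluate a coverage gate and, on failure, build a paused-workflow record.
// `allowSkip` is a hypothetical flag for the "Skip gate (if allowed)" option.
function checkCoverageGate(phase, actual, required, allowSkip) {
  if (actual >= required) return { status: 'running' };
  const options = allowSkip ? ['retry', 'skip', 'cancel'] : ['retry', 'cancel'];
  return {
    status: 'paused', // workflow pauses
    error: {          // failure details recorded
      phase,
      type: 'QUALITY_GATE_FAILED',
      message: `Test coverage (${actual}%) below threshold (${required}%)`,
      details: { metric: 'coverage', actual, required },
    },
    recovery: { options, recommended: 'retry' }, // options presented
  };
}

const gateResult = checkCoverageGate('testing', 68, 80, false);
console.log(gateResult.error.message); // Test coverage (68%) below threshold (80%)
console.log(gateResult.recovery.options); // [ 'retry', 'cancel' ]
```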
Checkpoint System#
The orchestrator creates checkpoints for recovery:
Automatic Checkpoints#
Created after each phase completes:
.bootspring/workflows/wf_abc123/
├── checkpoints/
│   ├── planning.json
│   ├── design.json
│   └── development.json
└── state.json
Checkpoint Content#
{
  "phase": "design",
  "timestamp": "2024-02-19T10:07:00Z",
  "state": { /* full state snapshot */ },
  "artifacts": ["design.md", "schema.prisma"],
  "context": { /* accumulated context */ }
}
Recovery#
Restore from any checkpoint:
Restore workflow wf_abc123 to the design checkpoint.
The orchestrator will:
- Load checkpoint state
- Reset phases after checkpoint
- Resume from that point
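The reset and resume steps can be sketched on an in-memory state object. Reading the checkpoint file from disk is left out, `restoreTo` is a hypothetical helper, and setting the status back to `running` after a restore is an assumption.

```javascript
// Roll a workflow state back to the checkpoint taken after `checkpointPhase`.
function restoreTo(state, checkpointPhase) {
  const index = state.phases.findIndex((p) => p.name === checkpointPhase);
  if (index === -1) throw new Error(`Unknown phase: ${checkpointPhase}`);

  // Reset every phase after the checkpoint to "pending".
  const phases = state.phases.map((p, i) =>
    i <= index ? p : { name: p.name, status: 'pending' }
  );

  // Resume from the phase that follows the checkpoint, if there is one.
  const next = phases[index + 1];
  return {
    ...state,
    status: next ? 'running' : 'completed',
    currentPhase: next ? next.name : checkpointPhase,
    phases,
  };
}

const restored = restoreTo(
  {
    workflowId: 'wf_abc123',
    status: 'paused',
    currentPhase: 'testing',
    phases: [
      { name: 'planning', status: 'completed' },
      { name: 'design', status: 'completed' },
      { name: 'development', status: 'completed' },
      { name: 'testing', status: 'failed' },
    ],
  },
  'design'
);
console.log(restored.currentPhase); // development
```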
Failure Handling#
Automatic Retries#
Transient failures are retried automatically:
module.exports = {
  orchestrator: {
    retry: {
      maxAttempts: 3,
      backoff: 'exponential',
      initialDelay: 1000,
    },
  },
};
Pause on Failure#
Significant failures pause the workflow:
{
  "status": "paused",
  "error": {
    "phase": "testing",
    "type": "QUALITY_GATE_FAILED",
    "message": "Test coverage (68%) below threshold (80%)",
    "details": {
      "metric": "coverage",
      "actual": 68,
      "required": 80
    }
  },
  "recovery": {
    "options": ["retry", "skip", "cancel"],
    "recommended": "retry"
  }
}
Manual Intervention#
Some failures require human decision:
The workflow has paused because tests are failing.
Options:
1. Fix the tests and retry
2. Skip the testing phase (not recommended)
3. Cancel the workflow
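Before a failure reaches manual intervention, the retry settings from Automatic Retries decide how long the orchestrator waits between attempts. The sketch below shows one plausible reading of those settings. It assumes that `maxAttempts` includes the first attempt, that `initialDelay` is in milliseconds, and that `'exponential'` doubles the delay after each failure; none of this is confirmed by the source, and `retryDelays` is a hypothetical helper.

```javascript
// Delay (in ms) to wait before each retry, given the retry settings.
// Assumes "exponential" doubles the delay after every failed attempt.
function retryDelays({ maxAttempts, backoff, initialDelay }) {
  const delays = [];
  // The first attempt runs immediately, so there are maxAttempts - 1 waits.
  for (let retry = 0; retry < maxAttempts - 1; retry += 1) {
    const factor = backoff === 'exponential' ? 2 ** retry : 1;
    delays.push(initialDelay * factor);
  }
  return delays;
}

const delays = retryDelays({ maxAttempts: 3, backoff: 'exponential', initialDelay: 1000 });
console.log(delays); // [ 1000, 2000 ]
```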
Configuration#
Basic Configuration#
// bootspring.config.js
module.exports = {
  orchestrator: {
    // Auto-advance to next phase
    autoAdvance: true,

    // Pause on any failure
    pauseOnFailure: true,

    // Save checkpoints
    saveCheckpoints: true,

    // Notify on completion
    notifyOnComplete: false,
  },
};
Phase Configuration#
module.exports = {
  orchestrator: {
    phases: {
      planning: {
        timeout: 300000, // 5 minute timeout
        required: true, // Cannot skip
      },
      testing: {
        timeout: 600000, // 10 minute timeout
        required: true,
        qualityGate: 'pre-push',
      },
      review: {
        timeout: 300000,
        required: false, // Can skip
      },
    },
  },
};
Checkpoint Configuration#
module.exports = {
  orchestrator: {
    checkpoints: {
      frequency: 'phase', // 'phase', 'step', or 'manual'
      retention: 7, // Days to keep
      autoRestore: true, // Auto-restore on resume
      compress: true, // Compress checkpoint files
    },
  },
};
Monitoring and Logs#
Workflow Logs#
All orchestrator activity is logged:
.bootspring/workflows/wf_abc123/
└── logs/
    ├── orchestrator.log     # Orchestrator decisions
    ├── phase-planning.log   # Planning phase log
    ├── phase-design.log     # Design phase log
    └── phase-dev.log        # Development phase log
Log Format#
[2024-02-19T10:00:00Z] [INFO] Workflow wf_abc123 started
[2024-02-19T10:00:00Z] [INFO] Phase: planning - Starting
[2024-02-19T10:00:00Z] [INFO] Agent: architecture-expert - Invoked
[2024-02-19T10:03:00Z] [INFO] Phase: planning - Completed (180s)
[2024-02-19T10:03:00Z] [INFO] Checkpoint saved: planning
[2024-02-19T10:03:00Z] [INFO] Phase: design - Starting
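Lines in this format are straightforward to parse for ad hoc analysis. The following sketch assumes every line follows the `[timestamp] [LEVEL] message` pattern shown above; `parseLogLine` is a hypothetical helper, not a Bootspring utility.

```javascript
// Parse one log line of the form "[timestamp] [LEVEL] message".
const LOG_LINE = /^\[([^\]]+)\] \[([A-Z]+)\] (.*)$/;

function parseLogLine(line) {
  const match = LOG_LINE.exec(line);
  if (!match) return null; // line is not in the expected format
  const [, timestamp, level, message] = match;
  return { timestamp: new Date(timestamp), level, message };
}

const entry = parseLogLine('[2024-02-19T10:03:00Z] [INFO] Phase: planning - Completed (180s)');
console.log(entry.level, entry.message); // INFO Phase: planning - Completed (180s)
```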
Metrics#
The orchestrator tracks:
- Total workflow duration
- Phase durations
- Retry counts
- Gate pass/fail rates
- Agent utilization
Best Practices#
1. Let the Orchestrator Drive#
Don't manually skip phases without good reason. Each phase and gate in the workflow exists to protect quality.
2. Review Checkpoints#
Before resuming a paused workflow, review the last checkpoint to understand the state.
3. Use Quality Gates#
Enable gates for production-critical workflows:
module.exports = {
  orchestrator: {
    enforceGates: true,
    requiredGates: ['pre-commit', 'pre-push'],
  },
};
4. Monitor Long Workflows#
For workflows over an hour, consider:
- Breaking into smaller workflows
- Adding more checkpoints
- Enabling notifications
5. Handle Failures Properly#
- Always investigate failures before skipping
- Use retry for transient issues
- Cancel and restart for fundamental problems
Related#
- Workflows - Workflow concepts
- bootspring_orchestrator - Tool reference
- bootspring_loop - Autonomous execution
- Quality Gates - Gate configuration