γ Server Lifecycle — Algorithm Extraction
⚠ HERITAGE EXTRACTION — donor AMS γ Server Lifecycle (Wave 8 quarantine)
This file extracts the donor AMS γ lifecycle from `src/server.js` (deleted R53). The 2-phase startup with file watcher boot in Phase 2 is donor surface. Phase 0 Colibri targets `src/server.ts` (P0.2.1) with a stdio one-shot startup and no file watcher (S17 §2). The “list/all/none” `AMS_WATCH_MODE` modes have been dropped from Phase 0 — Phase 0 ships only `FULL`, `READONLY`, `TEST`, `MINIMAL` (no `WATCH`). The canonical Phase 0 γ surface is in `../../concepts/γ-server-lifecycle.md`. Read this file as donor genealogy only.
Algorithmic content extracted from AMS `src/server.js` for Colibri implementation reference.
Phase 1: Transport (Immediate)
Goal: satisfy the MCP client’s handshake timeout before any blocking I/O.
```javascript
// Phase 1 — connect transport first, gate tool calls
const server = new Server({ name: "colibri", version: "..." }, { capabilities: { tools: {} } });

// Create the Phase 2 completion gate BEFORE registering handlers,
// so the CallTool closure never references an undeclared binding
let resolveInit;
const initReady = new Promise(resolve => { resolveInit = resolve; });

// Register handlers BEFORE connecting transport
server.setRequestHandler(ListToolsRequestSchema, async () => {
  return { tools: visibleTools }; // static import, no DB needed
});
server.setRequestHandler(ListResourcesRequestSchema, async () => {
  return { resources: [] };
});
server.setRequestHandler(CallToolRequestSchema, async (request) => {
  await initReady; // ← GATE: block until Phase 2 completes
  return routeToolCall(request);
});

// Connect transport — client handshake succeeds here
const transport = new StdioServerTransport();
await server.connect(transport);

// Start Phase 2 asynchronously (do not await here)
initializeServer().then(resolveInit);
```
Phase 2: Init (Sequential, After Transport)
Internal sequence within initializeServer():
Step 1: DB connect
- `getDb()` — opens SQLite, returns connection handle
- If it fails: log the error, enter SAFE_MODE, continue with partial init

Step 2: Run migrations
- `migrate(db)` — applies pending schema changes from schema files
- If it fails: log the error, abort init; the server stays in BOOT mode

Step 3: Load domains
- Import all domain modules from `src/domains/`
- Each domain: `registerTools(toolRegistry)`, `initializeDb(db)`
- Order matters: `gsd_projects` before `tasks` (FK dependency)

Step 4: Start watchers
- `startWatchers(db)` — chokidar watches on `data/roadmaps/`, `planning/`, config files
- Debounce: 500ms default
- Safety: skip symlinks, limit directory entries to 200

Step 5: Open HTTP
- `startDashboardServer(config)` — binds HTTP port for observability
- `startWebSocketServer(config)` — only if `CLAUDE_WEBSOCKET_ENABLED=true`

Step 6: Health check loop
- `setInterval(runHealthChecks, AMS_HEALTH_INTERVAL || 30000)`

Step 7: Resolve gate
- `resolveInit()` — queued CallTool requests begin executing
- The server enters OPERATIONAL mode
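The steps above can be sketched as one sequential function. This is an illustration, not the AMS implementation: the dependencies (`getDb`, `migrate`, `loadDomains`, `startWatchers`, `startDashboardServer`, `runHealthChecks`, `log`) mirror the donor names but are injected as stubs here, and the return shape is hypothetical.

```javascript
// Hedged sketch of the Phase 2 init sequence with its two failure paths:
// a failed DB connect degrades to SAFE_MODE, a failed migration aborts
// and leaves the server in BOOT.
async function initializeServer(deps) {
  const { getDb, migrate, loadDomains, startWatchers,
          startDashboardServer, runHealthChecks, log } = deps;
  let db;
  try {
    db = await getDb();                       // Step 1: DB connect
  } catch (err) {
    log("db connect failed", err);
    return { mode: "SAFE_MODE" };             // partial init, no DB
  }
  try {
    await migrate(db);                        // Step 2: migrations
  } catch (err) {
    log("migration failed", err);
    return { mode: "BOOT" };                  // abort: stay in BOOT
  }
  await loadDomains(db);                      // Step 3: domains (ordered)
  await startWatchers(db);                    // Step 4: file watchers
  await startDashboardServer();               // Step 5: HTTP observability
  const interval = Number(process.env.AMS_HEALTH_INTERVAL) || 30000;
  const timer = setInterval(runHealthChecks, interval); // Step 6: health loop
  return { mode: "OPERATIONAL", timer };      // Step 7: caller resolves the gate
}
```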
Promise Gate Pattern
The gate eliminates the race between transport readiness and heavy initialization:
Timeline:
t=0ms Phase 1: transport connected, client handshake succeeds
t=0ms initReady Promise created (unresolved)
t=Xms Client sends ListTools → returns immediately (no gate)
t=Xms Client sends CallTool → awaits initReady → queued
t=Yms Phase 2 completes → resolveInit() called
t=Yms Queued CallTool requests execute in arrival order
t=Yms All subsequent CallTool requests execute directly
Multiple tool calls arriving during Phase 2 queue behind the same Promise — they all resolve when initReady resolves, then execute through the serialization lock (tool-lock middleware) one at a time.
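The gate pattern can be shown in isolation as a minimal sketch (the `createInitGate` and `demo` names are illustrative): calls arriving before `resolveInit()` all await the same Promise and release together, in arrival order.

```javascript
// Minimal sketch of the promise-gate pattern: every gated handler awaits
// one shared Promise created before any handler can run.
function createInitGate() {
  let resolveInit;
  const initReady = new Promise((resolve) => { resolveInit = resolve; });
  return { initReady, resolveInit };
}

// Usage sketch: three calls arrive during "Phase 2"; none executes until
// resolveInit() fires, then they run in arrival order.
async function demo() {
  const { initReady, resolveInit } = createInitGate();
  const order = [];
  const call = async (id) => { await initReady; order.push(id); };
  const pending = [call(1), call(2), call(3)]; // queued behind the gate
  order.push("init-done");                     // Phase 2 work happens here
  resolveInit();                               // Phase 2 completes
  await Promise.all(pending);
  return order;                                // ["init-done", 1, 2, 3]
}
```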
5 Runtime Modes
| Mode | What is accepted | Entry condition | Exit condition |
|---|---|---|---|
| BOOT | Nothing — all CallTool requests queue | Server start, Phases 1–2 executing | Phase 2 completes → OPERATIONAL |
| OPERATIONAL | All tool calls, all transports | Phase 2 resolves normally | Degraded health check, manual trigger, shutdown signal |
| MAINTENANCE | Admin tools only; client calls return HTTP 503 | Manual trigger (set_maintenance_mode) or scheduled window | Manual clear |
| SAFE_MODE | Read-only tools only; no DB writes | DB corruption detected, disk pressure critical, health check severity HIGH | Manual clear after issue resolved |
| DIAGNOSE | Health/debug tools only; external traffic blocked | Manual trigger (enable_diagnose_mode) for investigation | Manual clear |
Mode Transition Triggers
- OPERATIONAL → MAINTENANCE: manual trigger or scheduled maintenance window
- OPERATIONAL → SAFE_MODE: health check (SQLite integrity fail OR disk critical)
- OPERATIONAL → DIAGNOSE: manual trigger
- SAFE_MODE → OPERATIONAL: manual clear after root cause resolved
- MAINTENANCE → OPERATIONAL: maintenance window end or manual clear
- DIAGNOSE → OPERATIONAL: manual clear
- Any → BOOT: server restart
All mode transitions are logged to audit_log with: old_mode, new_mode, reason, triggered_by, timestamp.
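A guarded transition helper following these rules might look like the sketch below. The transition map encodes the triggers above, and each transition appends the audit fields named in the text (`old_mode`, `new_mode`, `reason`, `triggered_by`, `timestamp`); the plain-array `auditLog` and function names are illustrative, not donor code.

```javascript
// Sketch of a guarded mode transition with audit logging.
const ALLOWED_TRANSITIONS = {
  BOOT: ["OPERATIONAL"],
  OPERATIONAL: ["MAINTENANCE", "SAFE_MODE", "DIAGNOSE"],
  MAINTENANCE: ["OPERATIONAL"],
  SAFE_MODE: ["OPERATIONAL"],
  DIAGNOSE: ["OPERATIONAL"],
};

function transitionMode(state, newMode, reason, triggeredBy, auditLog) {
  // Any mode may fall back to BOOT (server restart); every other target
  // must appear in the map for the current mode.
  const legal = newMode === "BOOT" ||
    (ALLOWED_TRANSITIONS[state.mode] || []).includes(newMode);
  if (!legal) {
    throw new Error(`illegal transition ${state.mode} -> ${newMode}`);
  }
  auditLog.push({
    old_mode: state.mode,
    new_mode: newMode,
    reason,
    triggered_by: triggeredBy,
    timestamp: Date.now(),
  });
  state.mode = newMode;
  return state;
}
```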
Allowed Operations Per Mode
| Operation | OPERATIONAL | MAINTENANCE | SAFE_MODE | DIAGNOSE | BOOT |
|---|---|---|---|---|---|
| Read tools | Yes | Admin only | Yes | Health only | Queue |
| Write tools | Yes | Admin only | No | No | Queue |
| DB reads | Yes | Yes | Yes | Yes | No |
| DB writes | Yes | Admin only | No | No | No |
| Watcher events | Yes | Yes | Muted | Yes | No |
| Health checks | Yes | Yes | Yes | Yes | No |
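The read/write rows of the table above can be encoded as a small dispatch. The string values ("allow" / "admin" / "health" / "queue" / "deny") are an illustrative encoding of the Yes / Admin only / Health only / Queue / No cells, and the caller-context flags (`isAdmin`, `isHealthTool`) are hypothetical names.

```javascript
// Sketch of a per-mode tool-call gate derived from the operations table.
const MODE_POLICY = {
  OPERATIONAL: { read: "allow", write: "allow" },
  MAINTENANCE: { read: "admin", write: "admin" },
  SAFE_MODE:   { read: "allow", write: "deny" },
  DIAGNOSE:    { read: "health", write: "deny" },
  BOOT:        { read: "queue", write: "queue" },
};

function gateToolCall(mode, kind, { isAdmin = false, isHealthTool = false } = {}) {
  switch (MODE_POLICY[mode][kind]) {
    case "allow":  return "execute";
    case "queue":  return "queue";                          // BOOT: wait for the init gate
    case "admin":  return isAdmin ? "execute" : "reject";   // non-admin clients get 503
    case "health": return isHealthTool ? "execute" : "reject";
    default:       return "reject";
  }
}
```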
Graceful Shutdown Algorithm
Triggered by SIGTERM or SIGINT:
1. Set runtime mode to BOOT (stops accepting new connections)
   - Log: "shutdown initiated"
2. Drain in-flight tool calls
   - Timeout: 30 seconds
   - Poll every 100ms for the active call count to reach zero
   - If timeout: log warning, force-close remaining calls
3. Flush pending audit log writes
   - `audit_flush()` — writes buffered audit entries to SQLite
4. Close SQLite connection
   - `db.close()`
   - Wait for all pending write transactions to commit
5. Clean up PID file
   - `fs.unlinkSync(pidFilePath)`
6. Exit
   - `process.exit(0)`
Unhandled cases caught by global handlers:
process.on('unhandledRejection', (reason) => { log(reason); gracefulShutdown(); })
process.on('uncaughtException', (err) => { log(err); gracefulShutdown(); })
These log the error and trigger graceful shutdown instead of silent crash.
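The drain step (step 2) can be sketched as a timeout-bounded polling loop. `getActiveCount` is an illustrative hook; the real server tracks in-flight calls in its serialization middleware.

```javascript
// Sketch of the drain step: poll the active-call counter until it reaches
// zero (clean drain) or the deadline passes (force-close path).
function drainInFlight(getActiveCount, { timeoutMs = 30000, pollMs = 100 } = {}) {
  const deadline = Date.now() + timeoutMs;
  return new Promise((resolve) => {
    const tick = () => {
      if (getActiveCount() === 0) return resolve({ drained: true });
      if (Date.now() >= deadline) return resolve({ drained: false }); // force-close remaining
      setTimeout(tick, pollMs);
    };
    tick();
  });
}
```

The caller would log a warning and force-close remaining calls when `drained` is false, then continue with the audit flush and DB close.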
Health Check Types (6 types, 30s loop)
Check 1: SQL integrity
- `PRAGMA integrity_check` (sampled: runs every Nth cycle, not every 30s)
- Pass: returns "ok"
- Fail: triggers SAFE_MODE

Check 2: Memory pressure
- `process.memoryUsage().rss` vs `AMS_MEMORY_THRESHOLD_MB`
- Warning threshold: 80% of limit → log warning
- Critical threshold: 95% of limit → SAFE_MODE

Check 3: Event loop latency
- Measurement: `setImmediate` timing delta
- Warning: >100ms
- Critical: >500ms → DIAGNOSE mode

Check 4: Watcher status
- Verify each registered watcher is alive (not in error/stalled state)
- Stalled watcher: attempt restart
- Cannot restart: log error, emit health event

Check 5: Queue depth
- Count pending tool calls in the serialization lock queue
- Warning: >50 pending calls
- No mode transition (informational)

Check 6: DB connection
- Lightweight ping query: `SELECT 1`
- Fail → attempt reconnect
- Cannot reconnect → SAFE_MODE
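Check 3 can be sketched with the `setImmediate` delta the text describes, classified against the 100ms/500ms thresholds above; the function names are illustrative.

```javascript
// Sketch of the event-loop latency check: time how long a setImmediate
// callback takes to run, then classify against the documented thresholds.
function measureEventLoopLag() {
  return new Promise((resolve) => {
    const start = process.hrtime.bigint();
    setImmediate(() => {
      const lagMs = Number(process.hrtime.bigint() - start) / 1e6;
      resolve(lagMs);
    });
  });
}

function classifyLag(lagMs) {
  if (lagMs > 500) return "critical"; // candidate DIAGNOSE transition
  if (lagMs > 100) return "warning";
  return "ok";
}
```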
See Also
- [[concepts/γ-server-lifecycle γ Server Lifecycle]] — concept overview
- [[extractions/alpha-system-core-extraction α Core Extraction]] — middleware chain and tool routing
- [[architecture/server-lifecycle Server Lifecycle Architecture]] — detailed phase descriptions
- [[reference/config Config Reference]] — AMS_HEALTH_INTERVAL and related env vars