γ Server Lifecycle — Algorithm Extraction

⚠ HERITAGE EXTRACTION — donor AMS γ Server Lifecycle (Wave 8 quarantine)

This file extracts the donor AMS γ lifecycle from src/server.js (deleted R53). The 2-phase startup, with the file watcher booting in Phase 2, is donor surface. Phase 0 Colibri targets src/server.ts (P0.2.1) with a stdio one-shot startup and no file watcher (S17 §2). The “list/all/none” AMS_WATCH_MODE modes have been dropped: Phase 0 ships only FULL, READONLY, TEST, and MINIMAL (no WATCH). The canonical Phase 0 γ surface is in ../../concepts/γ-server-lifecycle.md.

Read this file as donor genealogy only.

Algorithmic content extracted from AMS src/server.js for Colibri implementation reference.

Phase 1: Transport (Immediate)

Goal: satisfy the MCP client’s handshake timeout before any blocking I/O.

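// Phase 1: connect transport first, gate tool calls on Phase 2
// visibleTools, routeToolCall and initializeServer come from the rest of src/server.js
import { Server } from "@modelcontextprotocol/sdk/server/index.js";
import { StdioServerTransport } from "@modelcontextprotocol/sdk/server/stdio.js";
import {
  CallToolRequestSchema,
  ListResourcesRequestSchema,
  ListToolsRequestSchema,
} from "@modelcontextprotocol/sdk/types.js";

// Create the Phase 2 completion gate first, so it exists before any handler can run
let resolveInit;
const initReady = new Promise((resolve) => { resolveInit = resolve; });

// The resources capability must be declared for the ListResources handler to register
const server = new Server(
  { name: "colibri", version: "..." },
  { capabilities: { tools: {}, resources: {} } }
);

// Register handlers BEFORE connecting transport
server.setRequestHandler(ListToolsRequestSchema, async () => {
  return { tools: visibleTools };  // static import, no DB needed
});

server.setRequestHandler(ListResourcesRequestSchema, async () => {
  return { resources: [] };
});

server.setRequestHandler(CallToolRequestSchema, async (request) => {
  await initReady;  // ← GATE: block until Phase 2 completes
  return routeToolCall(request);
});

// Connect transport: the client handshake succeeds here
const transport = new StdioServerTransport();
await server.connect(transport);

// Start Phase 2 asynchronously (do not await here)
initializeServer().then(resolveInit);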

Phase 2: Init (Sequential, After Transport)

Internal sequence within initializeServer():

Step 1: DB connect
  getDb()  — opens SQLite, returns connection handle
  If fails: log error, enter SAFE_MODE, partial init

Step 2: Run migrations
  migrate(db)  — applies pending schema changes from schema files
  If fails: log error, abort init, server stays in BOOT mode

Step 3: Load domains
  import all domain modules from src/domains/
  Each domain: registerTools(toolRegistry), initializeDb(db)
  Order matters: gsd_projects before tasks (FK dependency)

Step 4: Start watchers
  startWatchers(db)  — chokidar watches on data/roadmaps/, planning/, config files
  Debounce: 500ms default
  Safety: skip symlinks, limit directory entries to 200

Step 5: Open HTTP
  startDashboardServer(config)  — binds HTTP port for observability
  startWebSocketServer(config)  — if CLAUDE_WEBSOCKET_ENABLED=true

Step 6: Health check loop
  setInterval(runHealthChecks, AMS_HEALTH_INTERVAL || 30000)

Step 7: Resolve gate
  resolveInit()  — queued CallTool requests begin executing
  Server enters OPERATIONAL mode
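
The sequence above can be sketched as a single async function. This is a minimal illustration, not the donor code: the `steps` object, the `state` object, and every function name on them are hypothetical stand-ins for getDb, migrate, domain loading, watchers, HTTP, and the health loop. The source does not spell out what "partial init" covers after a DB failure, so the sketch assumes it means skipping every DB-dependent step.

```javascript
// Hypothetical sketch of the Phase 2 sequence. `steps` is an injected bag of
// functions; none of these names are confirmed AMS APIs.
async function initializeServer(steps, state, log = console.error) {
  let db;
  try {
    db = await steps.getDb();            // Step 1: DB connect
  } catch (err) {
    log("db connect failed", err);
    state.mode = "SAFE_MODE";            // assumption: partial init = stop here
    return state;
  }
  try {
    await steps.migrate(db);             // Step 2: run migrations
  } catch (err) {
    log("migration failed", err);
    return state;                        // abort init, mode stays BOOT
  }
  await steps.loadDomains(db);           // Step 3: order inside matters (FK deps)
  await steps.startWatchers(db);         // Step 4: donor only, dropped in Phase 0
  await steps.startHttp();               // Step 5: dashboard / WebSocket
  steps.startHealthLoop();               // Step 6: setInterval-based loop
  state.mode = "OPERATIONAL";            // Step 7: caller resolves the gate
  return state;
}
```

Passing the steps in as arguments keeps the ordering testable without a real database or file system.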

Promise Gate Pattern

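The gate removes the race between transport readiness and heavy initialization: the transport answers the handshake and ListTools immediately, while every CallTool waits on one shared Promise until Phase 2 finishes.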

Timeline:
  t=0ms   Phase 1: transport connected, client handshake succeeds
  t=0ms   initReady Promise created (unresolved)
  t=Xms   Client sends ListTools → returns immediately (no gate)
  t=Xms   Client sends CallTool → awaits initReady → queued
  t=Yms   Phase 2 completes → resolveInit() called
  t=Yms   Queued CallTool requests execute in arrival order
  t=Yms   All subsequent CallTool requests execute directly

Multiple tool calls arriving during Phase 2 queue behind the same Promise — they all resolve when initReady resolves, then execute through the serialization lock (tool-lock middleware) one at a time.
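
The queue-then-serialize behaviour can be reproduced in a few lines of plain JavaScript. The sketch below is illustrative only: `createGate`, `createSerialLock`, `handleCall`, and `demo` are hypothetical names, and the promise-chain lock is a stand-in for the tool-lock middleware, whose real implementation is not shown in this extraction.

```javascript
// A gate is a Promise plus the function that resolves it.
function createGate() {
  let open;
  const ready = new Promise((resolve) => { open = resolve; });
  return { ready, open };
}

// Chains each task onto the previous one so tasks run one at a time,
// in the order they were submitted.
function createSerialLock() {
  let tail = Promise.resolve();
  return function runExclusive(task) {
    const result = tail.then(task);
    tail = result.catch(() => {}); // a failed task must not block the queue
    return result;
  };
}

const gate = createGate();
const runExclusive = createSerialLock();
const executed = [];

async function handleCall(name) {
  await gate.ready;                  // queued while Phase 2 is running
  return runExclusive(async () => {  // then serialized behind the lock
    executed.push(name);
    return `${name}:done`;
  });
}

async function demo() {
  const pending = [handleCall("a"), handleCall("b"), handleCall("c")];
  const before = executed.length;    // gate still closed, nothing has run
  gate.open();                       // stand-in for Phase 2 completing
  const results = await Promise.all(pending);
  return { before, results, order: [...executed] };
}
```

Calling `demo()` shows that no call runs before the gate opens and that the three calls then run in arrival order, because native Promise reactions fire in the order they were registered.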

5 Runtime Modes

| Mode | What is accepted | Entry condition | Exit condition |
|---|---|---|---|
| BOOT | Nothing; all CallTool requests queue | Server start, Phases 1–2 executing | Phase 2 completes → OPERATIONAL |
| OPERATIONAL | All tool calls, all transports | Phase 2 resolves normally | Degraded health check, manual trigger, shutdown signal |
| MAINTENANCE | Admin tools only; client calls return HTTP 503 | Manual trigger (set_maintenance_mode) or scheduled window | Manual clear or maintenance window end |
| SAFE_MODE | Read-only tools only; no DB writes | DB corruption detected, disk pressure critical, health check severity HIGH | Manual clear after issue resolved |
| DIAGNOSE | Health/debug tools only; external traffic blocked | Manual trigger (enable_diagnose_mode) for investigation | Manual clear |

Mode Transition Triggers

BOOT        → OPERATIONAL   Phase 2 completes normally
BOOT        → SAFE_MODE     DB connect fails in Phase 2 Step 1
OPERATIONAL → MAINTENANCE   manual trigger or scheduled maintenance window
OPERATIONAL → SAFE_MODE     health check: SQLite integrity fail, memory critical, DB reconnect fail, OR disk critical
OPERATIONAL → DIAGNOSE      manual trigger, or health check: event loop latency critical
SAFE_MODE   → OPERATIONAL   manual clear after root cause resolved
MAINTENANCE → OPERATIONAL   maintenance window end or manual clear
DIAGNOSE    → OPERATIONAL   manual clear
Any         → BOOT          server restart

All mode transitions are logged to audit_log with: old_mode, new_mode, reason, triggered_by, timestamp.
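
One way to enforce these transitions is a small state machine that rejects anything not in the table and writes an audit record for each change. The sketch below is an assumption about shape, not the donor implementation: `createModeMachine` and `writeAudit` are hypothetical names, and the BOOT entries are taken from the Runtime Modes table and Phase 2 Step 1 rather than from the trigger list itself.

```javascript
// Allowed targets per current mode. "Any → BOOT" is handled in transition().
const TRANSITIONS = {
  BOOT: ["OPERATIONAL", "SAFE_MODE"], // Phase 2 completes / DB connect fails
  OPERATIONAL: ["MAINTENANCE", "SAFE_MODE", "DIAGNOSE"],
  SAFE_MODE: ["OPERATIONAL"],
  MAINTENANCE: ["OPERATIONAL"],
  DIAGNOSE: ["OPERATIONAL"],
};

// writeAudit is a hypothetical sink, for example an INSERT into audit_log.
function createModeMachine(writeAudit, now = () => new Date().toISOString()) {
  let mode = "BOOT";
  return {
    get mode() { return mode; },
    transition(newMode, reason, triggeredBy) {
      const allowed =
        newMode === "BOOT" || (TRANSITIONS[mode] || []).includes(newMode);
      if (!allowed) {
        throw new Error(`illegal transition ${mode} -> ${newMode}`);
      }
      // Field names follow the audit_log columns listed above.
      writeAudit({
        old_mode: mode,
        new_mode: newMode,
        reason,
        triggered_by: triggeredBy,
        timestamp: now(),
      });
      mode = newMode;
      return mode;
    },
  };
}
```

Writing the audit record before updating the mode means a failing audit write leaves the mode unchanged, which is one possible design choice among several.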

Allowed Operations Per Mode

| Operation | OPERATIONAL | MAINTENANCE | SAFE_MODE | DIAGNOSE | BOOT |
|---|---|---|---|---|---|
| Read tools | Yes | Admin only | Yes | Health only | Queue |
| Write tools | Yes | Admin only | No | No | Queue |
| DB reads | Yes | Yes | Yes | Yes | No |
| DB writes | Yes | Admin only | No | No | No |
| Watcher events | Yes | Yes | Muted | Yes | No |
| Health checks | Yes | Yes | Yes | Yes | No |
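
The two tool rows of this matrix translate naturally into a lookup that a router could consult before dispatching a call. The following is a sketch under assumptions: the `POLICY` table mirrors the "Read tools" and "Write tools" rows, while `decide` and the `{ kind, admin, health }` tool descriptor are hypothetical and do not appear in the donor source.

```javascript
// Per-mode rule for read and write tools, copied from the matrix above.
const POLICY = {
  OPERATIONAL: { read: "yes", write: "yes" },
  MAINTENANCE: { read: "admin", write: "admin" },
  SAFE_MODE: { read: "yes", write: "no" },
  DIAGNOSE: { read: "health", write: "no" },
  BOOT: { read: "queue", write: "queue" },
};

// tool: { kind: "read" | "write", admin?: boolean, health?: boolean }
// Returns "execute", "reject", or "queue".
function decide(mode, tool) {
  const rule = POLICY[mode] && POLICY[mode][tool.kind];
  switch (rule) {
    case "yes":
      return "execute";
    case "queue":
      return "queue";
    case "admin":
      return tool.admin ? "execute" : "reject";
    case "health":
      return tool.health ? "execute" : "reject";
    default:
      return "reject"; // "no", unknown mode, or unknown tool kind
  }
}
```

For example, `decide("SAFE_MODE", { kind: "write" })` yields `"reject"`, while the same call in BOOT yields `"queue"`.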

Graceful Shutdown Algorithm

Triggered by SIGTERM or SIGINT:

1. Set runtime mode to BOOT (stops accepting new connections)
   Log: "shutdown initiated"

2. Drain in-flight tool calls
   Timeout: 30 seconds
   Poll every 100ms for active call count to reach zero
   If timeout: log warning, force-close remaining calls

3. Flush pending audit log writes
   Call audit_flush() — writes buffered audit entries to SQLite

4. Close SQLite connection
   Wait for all pending write transactions to commit
   db.close()

5. Clean up PID file
   fs.unlinkSync(pidFilePath)

6. Exit
   process.exit(0)

Unhandled cases caught by global handlers:
  process.on('unhandledRejection', (reason) => { log(reason); gracefulShutdown(); })
  process.on('uncaughtException', (err) => { log(err); gracefulShutdown(); })
  These log the error and trigger graceful shutdown instead of silent crash.
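
The drain-then-close ordering can be sketched as follows. Everything on the `deps` object (`setMode`, `getActiveCalls`, `forceClose`, `flushAudit`, `closeDb`, `removePidFile`, `exit`) is a hypothetical injection point standing in for the real calls named in the steps above; only the 30 second timeout and 100 ms poll interval come from the source.

```javascript
// Polls getActiveCalls until it reports zero (resolves true) or the
// timeout elapses (resolves false).
function drain(getActiveCalls, { timeoutMs = 30000, pollMs = 100 } = {}) {
  return new Promise((resolve) => {
    const deadline = Date.now() + timeoutMs;
    const tick = () => {
      if (getActiveCalls() === 0) return resolve(true);
      if (Date.now() >= deadline) return resolve(false);
      setTimeout(tick, pollMs);
    };
    tick();
  });
}

let shuttingDown = false; // guards against a second signal re-entering

async function gracefulShutdown(deps) {
  if (shuttingDown) return;
  shuttingDown = true;
  deps.setMode("BOOT");                        // 1. stop accepting work
  deps.log("shutdown initiated");
  const drained = await drain(deps.getActiveCalls, deps.drainOptions); // 2.
  if (!drained) {
    deps.log("drain timeout, force-closing remaining calls");
    deps.forceClose();
  }
  await deps.flushAudit();                     // 3. buffered audit entries
  deps.closeDb();                              // 4. db.close()
  deps.removePidFile();                        // 5. fs.unlinkSync(pidFilePath)
  deps.exit(0);                                // 6. process.exit(0)
}
```

The re-entry guard is an addition of this sketch: SIGTERM followed by an uncaught exception would otherwise run the sequence twice.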

Health Check Types (6 types, 30s loop)

Check 1: SQL integrity
  PRAGMA integrity_check  (sampled: runs every Nth cycle, not every 30s)
  Pass: returns "ok"
  Fail: triggers SAFE_MODE

Check 2: Memory pressure
  process.memoryUsage().rss vs AMS_MEMORY_THRESHOLD_MB
  Warning threshold: 80% of limit → log warning
  Critical threshold: 95% of limit → SAFE_MODE

Check 3: Event loop latency
  Measurement: setImmediate timing delta
  Warning: >100ms
  Critical: >500ms → DIAGNOSE mode

Check 4: Watcher status
  Verify each registered watcher is alive (not in error/stalled state)
  Stalled watcher: attempt restart
  Cannot restart: log error, emit health event

Check 5: Queue depth
  Count pending tool calls in serialization lock queue
  Warning: >50 pending calls
  No mode transition (informational)

Check 6: DB connection
  Lightweight ping query: SELECT 1
  Fail → attempt reconnect
  Cannot reconnect → SAFE_MODE
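
Checks 2 and 3 reduce to comparing a measurement against two thresholds. The sketch below shows one possible shape, assuming the thresholds listed above; `checkMemory`, `checkEventLoop`, and `measureEventLoopLag` are hypothetical names, and treating the memory thresholds as inclusive is an assumption the source does not settle.

```javascript
// Check 2: RSS against a limit in megabytes (AMS_MEMORY_THRESHOLD_MB).
function checkMemory(rssBytes, limitMb) {
  const ratio = rssBytes / (limitMb * 1024 * 1024);
  if (ratio >= 0.95) return { level: "critical", action: "SAFE_MODE" };
  if (ratio >= 0.8) return { level: "warning", action: "log" };
  return { level: "ok", action: null };
}

// Check 3: event loop latency in milliseconds.
function checkEventLoop(lagMs) {
  if (lagMs > 500) return { level: "critical", action: "DIAGNOSE" };
  if (lagMs > 100) return { level: "warning", action: "log" };
  return { level: "ok", action: null };
}

// Measures how long a setImmediate callback waits before it runs.
// A busy event loop delays the callback, so the delta grows under load.
function measureEventLoopLag() {
  return new Promise((resolve) => {
    const start = process.hrtime.bigint();
    setImmediate(() => {
      const elapsedNs = process.hrtime.bigint() - start;
      resolve(Number(elapsedNs) / 1e6); // nanoseconds to milliseconds
    });
  });
}
```

A health loop could then combine them, for example `checkMemory(process.memoryUsage().rss, limitMb)` and `checkEventLoop(await measureEventLoopLag())`, and hand any non-null action to the mode transition logic.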

See Also

  • [[concepts/γ-server-lifecycle|γ Server Lifecycle]]: concept overview
  • [[extractions/alpha-system-core-extraction|α Core Extraction]]: middleware chain and tool routing
  • [[architecture/server-lifecycle|Server Lifecycle Architecture]]: detailed phase descriptions
  • [[reference/config|Config Reference]]: AMS_HEALTH_INTERVAL and related env vars


Colibri — documentation-first MCP runtime. Apache 2.0 + Commons Clause.
