γ Server Lifecycle — Algorithm Extraction

⚠ HERITAGE EXTRACTION — donor AMS γ Server Lifecycle (Wave 8 quarantine)

This file extracts the donor AMS γ lifecycle from src/server.js (deleted R53). The 2-phase startup with file watcher boot in Phase 2 is donor surface. Phase 0 Colibri targets src/server.ts (P0.2.1) with a stdio one-shot startup and no file watcher (S17 §2). The “list/all/none” AMS_WATCH_MODE modes have been dropped from Phase 0 — Phase 0 ships only FULL, READONLY, TEST, MINIMAL (no WATCH). The canonical Phase 0 γ surface is in ../../concepts/γ-server-lifecycle.md.

Read this file as donor genealogy only.

Algorithmic content extracted from AMS src/server.js for Colibri implementation reference.

Phase 1: Transport (Immediate)

Goal: satisfy the MCP client’s handshake timeout before any blocking I/O.

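// Phase 1: connect transport first, gate tool calls
import { Server } from "@modelcontextprotocol/sdk/server/index.js";
import { StdioServerTransport } from "@modelcontextprotocol/sdk/server/stdio.js";
import {
  CallToolRequestSchema,
  ListResourcesRequestSchema,
  ListToolsRequestSchema,
} from "@modelcontextprotocol/sdk/types.js";
// visibleTools, routeToolCall, and initializeServer are project-local (imports omitted)

// Create the Phase 2 completion gate first, so the CallTool handler
// never references initReady before it is initialized
let resolveInit;
const initReady = new Promise(resolve => { resolveInit = resolve; });

// Declare resources as well as tools, because a resources/list handler is registered below
const server = new Server(
  { name: "colibri", version: "..." },
  { capabilities: { tools: {}, resources: {} } }
);

// Register handlers BEFORE connecting transport
server.setRequestHandler(ListToolsRequestSchema, async () => {
  return { tools: visibleTools };  // static import, no DB needed
});

server.setRequestHandler(ListResourcesRequestSchema, async () => {
  return { resources: [] };
});

server.setRequestHandler(CallToolRequestSchema, async (request) => {
  await initReady;  // GATE: block until Phase 2 completes
  return routeToolCall(request);
});

// Connect transport: client handshake succeeds here
const transport = new StdioServerTransport();
await server.connect(transport);

// Start Phase 2 asynchronously (do not await here); a rejection leaves the gate closed
initializeServer().then(resolveInit);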

Phase 2: Init (Sequential, After Transport)

Internal sequence within initializeServer():

Step 1: DB connect
  getDb()  — opens SQLite, returns connection handle
  If fails: log error, enter SAFE_MODE, partial init

Step 2: Run migrations
  migrate(db)  — applies pending schema changes from schema files
  If fails: log error, abort init, server stays in BOOT mode

Step 3: Load domains
  import all domain modules from src/domains/
  Each domain: registerTools(toolRegistry), initializeDb(db)
  Order matters: gsd_projects before tasks (FK dependency)

Step 4: Start watchers
  startWatchers(db)  — chokidar watches on data/roadmaps/, planning/, config files
  Debounce: 500ms default
  Safety: skip symlinks, limit directory entries to 200

Step 5: Open HTTP
  startDashboardServer(config)  — binds HTTP port for observability
  startWebSocketServer(config)  — if CLAUDE_WEBSOCKET_ENABLED=true

Step 6: Health check loop
  setInterval(runHealthChecks, AMS_HEALTH_INTERVAL || 30000)

Step 7: Resolve gate
  resolveInit()  — queued CallTool requests begin executing
  Server enters OPERATIONAL mode
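
The sequence above can be sketched as a single async function with its collaborators injected. This is an illustrative sketch, not the donor implementation: the `steps` object, `makeSteps`, and the string return values are hypothetical, and the source does not say what "partial init" covers after a DB failure, so the sketch simply stops there. Only the step order and the two failure outcomes are taken from the list above.

```javascript
// Sketch of the Phase 2 sequence. Every function in `steps` is a
// hypothetical stand-in for the donor helper of the same name.
async function initializeServer(steps, log = console.error) {
  let db;
  try {
    db = await steps.getDb();            // Step 1: DB connect
  } catch (err) {
    log("db connect failed:", err.message);
    return "SAFE_MODE";                  // remaining steps skipped (assumption)
  }
  try {
    await steps.migrate(db);             // Step 2: run migrations
  } catch (err) {
    log("migration failed:", err.message);
    throw err;                           // init aborted: gate stays closed, mode stays BOOT
  }
  await steps.loadDomains(db);           // Step 3: load domains in dependency order
  await steps.startWatchers(db);         // Step 4: start watchers
  await steps.startHttp();               // Step 5: open HTTP
  steps.startHealthLoop();               // Step 6: health check loop
  return "OPERATIONAL";                  // Step 7: caller resolves the gate
}

// Stub steps that record the order in which they run; `failAt` names
// the step that should throw.
function makeSteps(trace, failAt = null) {
  const step = (name, value) => async () => {
    trace.push(name);
    if (name === failAt) throw new Error(name + " failed");
    return value;
  };
  return {
    getDb: step("getDb", {}),
    migrate: step("migrate"),
    loadDomains: step("loadDomains"),
    startWatchers: step("startWatchers"),
    startHttp: step("startHttp"),
    startHealthLoop: () => { trace.push("startHealthLoop"); },
  };
}
```

Throwing on a migration failure means a caller written as `initializeServer(...).then(resolveInit)` never opens the gate, which matches "server stays in BOOT mode".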

Promise Gate Pattern

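The gate removes the race between transport readiness and heavy initialization: the client can connect and list tools immediately, but no tool call runs before Phase 2 has finished.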

Timeline:
  t=0ms   Phase 1: transport connected, client handshake succeeds
  t=0ms   initReady Promise created (unresolved)
  t=Xms   Client sends ListTools → returns immediately (no gate)
  t=Xms   Client sends CallTool → awaits initReady → queued
  t=Yms   Phase 2 completes → resolveInit() called
  t=Yms   Queued CallTool requests execute in arrival order
  t=Yms   All subsequent CallTool requests execute directly

Multiple tool calls arriving during Phase 2 all await the same Promise. When initReady resolves they resume in arrival order, then pass through the serialization lock (tool-lock middleware) one at a time.
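
The queueing behaviour can be reproduced without the MCP SDK. The following is a minimal sketch under stated assumptions: `createGate`, `handleCall`, and the 20 ms simulated Phase 2 delay are hypothetical names and values chosen for illustration, and the tool-lock middleware is not modelled.

```javascript
// Minimal promise gate: callers await `ready` until `open()` is called.
function createGate() {
  let open;
  const ready = new Promise((resolve) => { open = resolve; });
  return { ready, open };
}

const gate = createGate();
const executed = [];

// Stand-in for the CallTool handler: block on the gate, then do the work.
async function handleCall(name) {
  await gate.ready;
  executed.push(name);
  return name;
}

// Three calls arrive while the simulated Phase 2 is still running.
const pending = [handleCall("a"), handleCall("b"), handleCall("c")];

// Simulated Phase 2: opens the gate after 20 ms, then reports what ran.
const demo = new Promise((resolve) => setTimeout(resolve, 20)).then(() => {
  const before = executed.length; // calls executed before the gate opened
  gate.open();
  return Promise.all(pending).then(() => ({ before, order: [...executed] }));
});
```

Because every caller awaits the same Promise, their continuations are scheduled in the order the calls were made, which is what gives the arrival-order guarantee in the timeline.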

5 Runtime Modes

| Mode | What is accepted | Entry condition | Exit condition |
|---|---|---|---|
| `BOOT` | Nothing; all CallTool requests queue | Server start, Phases 1–2 executing | Phase 2 completes → OPERATIONAL |
| `OPERATIONAL` | All tool calls, all transports | Phase 2 resolves normally | Degraded health check, manual trigger, shutdown signal |
| `MAINTENANCE` | Admin tools only; client calls return HTTP 503 | Manual trigger (`set_maintenance_mode`) or scheduled window | Manual clear |
| `SAFE_MODE` | Read-only tools only; no DB writes | DB corruption detected, disk pressure critical, health check severity HIGH | Manual clear after issue resolved |
| `DIAGNOSE` | Health/debug tools only; external traffic blocked | Manual trigger (`enable_diagnose_mode`) for investigation | Manual clear |

Mode Transition Triggers

OPERATIONAL → MAINTENANCE   manual trigger or scheduled maintenance window
OPERATIONAL → SAFE_MODE     health check: SQLite integrity fail, memory or disk critical, or DB reconnect fail
OPERATIONAL → DIAGNOSE      manual trigger, or health check: event loop latency critical
SAFE_MODE   → OPERATIONAL   manual clear after root cause resolved
MAINTENANCE → OPERATIONAL   maintenance window end or manual clear
DIAGNOSE    → OPERATIONAL   manual clear
Any         → BOOT          server restart

All mode transitions are logged to audit_log with: old_mode, new_mode, reason, triggered_by, timestamp.
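
The transition list and the audit record can be sketched as a small state machine. This is an assumption-laden illustration rather than the donor code: `TRANSITIONS`, `canTransition`, `setMode`, and the in-memory `auditLog` array (standing in for the audit_log table) are hypothetical. The BOOT → OPERATIONAL edge comes from the runtime modes table; the other edges come from the list above.

```javascript
// Allowed transitions. "Any → BOOT" (server restart) is handled in canTransition.
const TRANSITIONS = {
  BOOT: ["OPERATIONAL"],
  OPERATIONAL: ["MAINTENANCE", "SAFE_MODE", "DIAGNOSE"],
  SAFE_MODE: ["OPERATIONAL"],
  MAINTENANCE: ["OPERATIONAL"],
  DIAGNOSE: ["OPERATIONAL"],
};

function canTransition(from, to) {
  if (to === "BOOT") return true; // any mode may return to BOOT on restart
  return (TRANSITIONS[from] || []).includes(to);
}

// Hypothetical in-memory stand-in for the audit_log table.
const auditLog = [];
let currentMode = "BOOT";

// Validate the transition, record it with the five audit fields, then apply it.
function setMode(newMode, reason, triggeredBy) {
  if (!canTransition(currentMode, newMode)) {
    throw new Error(`illegal transition ${currentMode} -> ${newMode}`);
  }
  auditLog.push({
    old_mode: currentMode,
    new_mode: newMode,
    reason,
    triggered_by: triggeredBy,
    timestamp: new Date().toISOString(),
  });
  currentMode = newMode;
  return currentMode;
}
```

Rejecting unlisted edges (for example SAFE_MODE → MAINTENANCE) is a design choice of this sketch; the source only enumerates the allowed transitions and does not say how an invalid request is handled.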

Allowed Operations Per Mode

| Operation | OPERATIONAL | MAINTENANCE | SAFE_MODE | DIAGNOSE | BOOT |
|---|---|---|---|---|---|
| Read tools | Yes | Admin only | Yes | Health only | Queue |
| Write tools | Yes | Admin only | No | No | Queue |
| DB reads | Yes | Yes | Yes | Yes | No |
| DB writes | Yes | Admin only | No | No | No |
| Watcher events | Yes | Yes | Muted | Yes | No |
| Health checks | Yes | Yes | Yes | Yes | No |
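
One way to apply the matrix at runtime is a lookup keyed by operation and mode. The sketch below is illustrative only: the `POLICY` key names, the `decide` function, and the `caller.isAdmin` / `caller.isHealthTool` flags are hypothetical, and only the cell values are transcribed from the table.

```javascript
// POLICY[operation][mode], transcribed from the table above.
const POLICY = {
  read_tools:     { OPERATIONAL: "yes", MAINTENANCE: "admin_only", SAFE_MODE: "yes",   DIAGNOSE: "health_only", BOOT: "queue" },
  write_tools:    { OPERATIONAL: "yes", MAINTENANCE: "admin_only", SAFE_MODE: "no",    DIAGNOSE: "no",          BOOT: "queue" },
  db_reads:       { OPERATIONAL: "yes", MAINTENANCE: "yes",        SAFE_MODE: "yes",   DIAGNOSE: "yes",         BOOT: "no" },
  db_writes:      { OPERATIONAL: "yes", MAINTENANCE: "admin_only", SAFE_MODE: "no",    DIAGNOSE: "no",          BOOT: "no" },
  watcher_events: { OPERATIONAL: "yes", MAINTENANCE: "yes",        SAFE_MODE: "muted", DIAGNOSE: "yes",         BOOT: "no" },
  health_checks:  { OPERATIONAL: "yes", MAINTENANCE: "yes",        SAFE_MODE: "yes",   DIAGNOSE: "yes",         BOOT: "no" },
};

// Turn a table cell into a decision for one caller.
function decide(mode, operation, caller = {}) {
  const rule = POLICY[operation] && POLICY[operation][mode];
  switch (rule) {
    case "yes": return "allow";
    case "no": return "deny";
    case "admin_only": return caller.isAdmin ? "allow" : "deny";
    case "health_only": return caller.isHealthTool ? "allow" : "deny";
    case "queue": return "queue";
    case "muted": return "mute";
    default: throw new Error(`unknown operation or mode: ${operation}/${mode}`);
  }
}
```

For example, `decide("MAINTENANCE", "write_tools", { isAdmin: true })` yields `"allow"`, while the same call without the admin flag yields `"deny"`.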

Graceful Shutdown Algorithm

Triggered by SIGTERM or SIGINT:

1. Set runtime mode to BOOT (stops accepting new connections)
   Log: "shutdown initiated"

2. Drain in-flight tool calls
   Timeout: 30 seconds
   Poll every 100ms for active call count to reach zero
   If timeout: log warning, force-close remaining calls

3. Flush pending audit log writes
   Call audit_flush() — writes buffered audit entries to SQLite

4. Close SQLite connection
   Wait for all pending write transactions to commit
   db.close()

5. Clean up PID file
   fs.unlinkSync(pidFilePath)

6. Exit
   process.exit(0)

Unhandled cases caught by global handlers:
  process.on('unhandledRejection', (reason) => { log(reason); gracefulShutdown(); })
  process.on('uncaughtException', (err) => { log(err); gracefulShutdown(); })
  Both handlers log the error and trigger graceful shutdown instead of crashing silently.
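
The drain step and the overall ordering can be sketched as follows. This is a sketch under stated assumptions, not the donor code: `drainInFlight`, the `deps` object, and all of its members (`setMode`, `getActiveCount`, `forceClose`, `auditFlush`, `closeDb`, `removePidFile`, `exit`) are hypothetical injection points. Only the 30 second timeout, the 100 ms poll interval, and the step order come from the algorithm above.

```javascript
// Step 2: poll until no calls are in flight or the timeout elapses.
// Resolves to true if drained cleanly, false if the timeout was hit.
function drainInFlight(getActiveCount, { timeoutMs = 30000, pollMs = 100 } = {}) {
  const deadline = Date.now() + timeoutMs;
  return new Promise((resolve) => {
    const tick = () => {
      if (getActiveCount() === 0) return resolve(true);
      if (Date.now() >= deadline) return resolve(false);
      setTimeout(tick, pollMs);
    };
    tick();
  });
}

// Steps 1 to 6 in order, with every side effect injected through `deps`.
async function gracefulShutdown(deps) {
  deps.setMode("BOOT");                       // 1. stop accepting new work
  deps.log("shutdown initiated");
  const drained = await drainInFlight(deps.getActiveCount, deps.drainOptions);
  if (!drained) {                             // 2. drain timed out
    deps.log("drain timeout, force-closing remaining calls");
    deps.forceClose();
  }
  await deps.auditFlush();                    // 3. flush buffered audit entries
  deps.closeDb();                             // 4. close SQLite
  deps.removePidFile();                       // 5. clean up PID file
  deps.exit(0);                               // 6. exit
}
```

Injecting `exit` instead of calling `process.exit` directly keeps the sequence testable; a real entry point would pass `process.exit`.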

Health Check Types (6 types, 30s loop)

Check 1: SQL integrity
  PRAGMA integrity_check  (sampled: runs every Nth cycle, not every 30s)
  Pass: returns "ok"
  Fail: triggers SAFE_MODE

Check 2: Memory pressure
  process.memoryUsage().rss vs AMS_MEMORY_THRESHOLD_MB
  Warning threshold: 80% of limit → log warning
  Critical threshold: 95% of limit → SAFE_MODE

Check 3: Event loop latency
  Measurement: setImmediate timing delta
  Warning: >100ms
  Critical: >500ms → DIAGNOSE mode

Check 4: Watcher status
  Verify each registered watcher is alive (not in error/stalled state)
  Stalled watcher: attempt restart
  Cannot restart: log error, emit health event

Check 5: Queue depth
  Count pending tool calls in serialization lock queue
  Warning: >50 pending calls
  No mode transition (informational)

Check 6: DB connection
  Lightweight ping query: SELECT 1
  Fail → attempt reconnect
  Cannot reconnect → SAFE_MODE
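
Checks 2 and 3 reduce to a threshold comparison and a timing measurement. The sketch below is illustrative: the function names are hypothetical, and treating the memory thresholds as inclusive (80% and 95% exactly count as warning and critical) is an assumption, since the source does not say. The latency probe uses `setImmediate` with `process.hrtime.bigint()` as the clock.

```javascript
// Check 2: classify RSS against a limit given in MB.
function classifyMemory(rssBytes, limitMb) {
  const ratio = rssBytes / (limitMb * 1024 * 1024);
  if (ratio >= 0.95) return "critical"; // would trigger SAFE_MODE
  if (ratio >= 0.80) return "warning";  // log only
  return "ok";
}

// Check 3: event loop latency, measured as the delay before a
// setImmediate callback actually runs. Resolves to milliseconds.
function measureLoopLatencyMs() {
  const start = process.hrtime.bigint();
  return new Promise((resolve) => {
    setImmediate(() => resolve(Number(process.hrtime.bigint() - start) / 1e6));
  });
}

function classifyLatency(ms) {
  if (ms > 500) return "critical"; // would trigger DIAGNOSE mode
  if (ms > 100) return "warning";
  return "ok";
}
```

A health loop could call `classifyMemory(process.memoryUsage().rss, limitMb)` and `classifyLatency(await measureLoopLatencyMs())` on each cycle, with `limitMb` read from AMS_MEMORY_THRESHOLD_MB.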