P0.2.3 — Two-Phase Startup — Packet

1. Commit plan

Five commits on feature/p0-2-3-two-phase-startup:

| # | Commit | Files | Rationale |
|---|--------|-------|-----------|
| 1 | audit(p0-2-3-two-phase-startup): inventory surface | docs/audits/p0-2-3-two-phase-startup-audit.md | Step 1 landed. |
| 2 | contract(p0-2-3-two-phase-startup): behavioral contract | docs/contracts/p0-2-3-two-phase-startup-contract.md | Step 2 landed. |
| 3 | packet(p0-2-3-two-phase-startup): execution plan | docs/packets/p0-2-3-two-phase-startup-packet.md | This file. |
| 4 | feat(p0-2-3-two-phase-startup): two-phase startup + graceful shutdown | src/startup.ts, src/server.ts, src/__tests__/startup.test.ts | One feat commit for the module + wiring + tests (logically indivisible). |
| 5 | verify(p0-2-3-two-phase-startup): test evidence | docs/verification/p0-2-3-two-phase-startup-verification.md | Step 5 lands after npm test / npm run lint / npm run build all green. |

2. src/startup.ts — skeleton

/**
 * Colibri — Phase 0 two-phase startup orchestrator (α System Core).
 *
 * Phase 1 (fast) — transport ready, server_ping registered. No DB. This is
 * the P0.2.1 `bootstrap()` invocation, untouched.
 *
 * Phase 2 (heavy) — open the SQLite file, run integrity_check + migrations,
 * publish the `db` singleton via `src/db/index.ts`. Future domain code
 * (β/ε/ζ/η/ν) registers its tools + migrations here.
 *
 * Graceful shutdown — on Phase 2 throw, on SIGINT, and on SIGTERM, close
 * the transport, close the DB, remove the signal listeners, and exit 0
 * (clean) or 1 (error).
 *
 * Canonical references:
 *   - docs/audits/p0-2-3-two-phase-startup-audit.md
 *   - docs/contracts/p0-2-3-two-phase-startup-contract.md
 *   - docs/packets/p0-2-3-two-phase-startup-packet.md
 *   - docs/2-plugin/boot.md §"Two-Phase Startup"
 *   - docs/spec/s17-mcp-surface.md §"Transport lifecycle"
 */

import { performance } from 'node:perf_hooks';
import type Database from 'better-sqlite3';

import { config } from './config.js';
import { initDb as initDbImpl, closeDb as closeDbImpl } from './db/index.js';
import {
  bootstrap as bootstrapImpl,
  stop as stopImpl,
  type BootstrapOptions,
  type ColibriServerContext,
  type CreateServerOptions,
} from './server.js';

export interface StartupOptions {  }          // §2.1 of contract
export interface StartupResult {  }           // §2.1 of contract

// Module state — reset via __resetForTests.
let startupInvoked = false;
let activeCtx: ColibriServerContext | null = null;
let activeDb: Database.Database | null = null;
let activeOptions: Required<Pick<StartupOptions, 'stopFn' | 'closeDbFn' | 'logger' | 'exit' | 'cleanupTimeoutMs' | 'nowMs'>> | null = null;
let shutdownPromise: Promise<void> | null = null;
let sigintHandler: NodeJS.SignalsListener | null = null;
let sigtermHandler: NodeJS.SignalsListener | null = null;

export async function startup(options: StartupOptions = {}): Promise<StartupResult> {
  if (startupInvoked) throw new Error('startup() already invoked');
  startupInvoked = true;

  const logger = options.logger ?? console.error;
  const nowMs  = options.nowMs  ?? (() => performance.now());
  const exit   = options.exit   ?? process.exit.bind(process);
  const cleanupTimeoutMs = options.cleanupTimeoutMs ?? 5000;
  const bootstrapFn = options.bootstrapFn ?? bootstrapImpl;
  const stopFn      = options.stopFn      ?? stopImpl;
  const initDbFn    = options.initDbFn    ?? initDbImpl;
  const closeDbFn   = options.closeDbFn   ?? closeDbImpl;
  const dbPath      = options.dbPath      ?? config.COLIBRI_DB_PATH;
  const registerSignalHandlers = options.registerSignalHandlers ?? true;

  const phase1StartMs = nowMs();

  // Phase 1 — transport
  logger('[Startup] Phase 1: transport...');
  const bootOpts: BootstrapOptions = {
    ...(options.createOptions !== undefined ? { createOptions: options.createOptions } : {}),
    exit,
  };
  const ctx = await bootstrapFn(bootOpts);
  activeCtx = ctx;
  logger('[Startup] Phase 1 ready');

  // Stash cleanup references for shutdown()
  activeOptions = { stopFn, closeDbFn, logger, exit, cleanupTimeoutMs, nowMs };

  // Register signal handlers only after Phase 1 succeeds
  if (registerSignalHandlers) {
    sigintHandler  = () => { void gracefulSignalExit('SIGINT'); };
    sigtermHandler = () => { void gracefulSignalExit('SIGTERM'); };
    process.on('SIGINT',  sigintHandler);
    process.on('SIGTERM', sigtermHandler);
  }

  // Phase 2 — heavy init
  logger('[Startup] Phase 2: heavy-init...');
  try {
    const db = initDbFn(dbPath);         // sync — throws synchronously on error
    activeDb = db;
    const elapsedMs = Math.floor(nowMs() - phase1StartMs);
    logger(`[Startup] Complete in ${elapsedMs}ms`);
    return { ctx, db, elapsedMs };
  } catch (err) {
    const msg = err instanceof Error ? err.message : String(err);
    logger('[Startup] Phase 2 failed:', msg);
    const abortedMs = Math.floor(nowMs() - phase1StartMs);
    logger(`[Startup] Aborted after ${abortedMs}ms`);
    await shutdown('phase-2-failed');
    exit(1);
    // If exit doesn't terminate (tests), rethrow so caller sees it.
    throw err instanceof Error ? err : new Error(String(err));
  }
}

export function shutdown(reason: string): Promise<void> {
  if (shutdownPromise !== null) return shutdownPromise;
  const opts = activeOptions;
  const ctx  = activeCtx;
  const { logger, stopFn, closeDbFn, cleanupTimeoutMs } =
    opts ?? { logger: console.error, stopFn: stopImpl, closeDbFn: closeDbImpl, cleanupTimeoutMs: 5000 };
  logger(`[Shutdown] ${reason}`);

  shutdownPromise = (async () => {
    let forced = false;

    // Transport first
    if (ctx !== null) {
      let timer: ReturnType<typeof setTimeout> | undefined;
      try {
        await Promise.race([
          stopFn(ctx),
          new Promise<void>((resolve) => {
            timer = setTimeout(() => { forced = true; resolve(); }, cleanupTimeoutMs);
            timer.unref();
          }),
        ]);
      } catch (err) {
        logger('[Shutdown] stop failed:', err);
      } finally {
        // Clear the timer on every path (including a stopFn rejection) so a
        // late fire cannot flip `forced` after the race has settled.
        if (timer !== undefined) clearTimeout(timer);
      }
    }

    // DB second
    try {
      closeDbFn();
    } catch (err) {
      logger('[Shutdown] close failed:', err);
    }

    // Signals last
    if (sigintHandler !== null) { process.off('SIGINT',  sigintHandler);  sigintHandler  = null; }
    if (sigtermHandler !== null){ process.off('SIGTERM', sigtermHandler); sigtermHandler = null; }

    if (forced) logger('[Shutdown] Forced after 5000ms timeout');
    logger('[Shutdown] Clean');
  })();

  return shutdownPromise;
}

async function gracefulSignalExit(signalName: string): Promise<void> {
  const opts = activeOptions;
  const { logger, exit } = opts ?? { logger: console.error, exit: process.exit.bind(process) };
  try {
    await shutdown(`signal-${signalName}`);
    exit(0);
  } catch (err) {
    logger('[Shutdown] signal handler failed:', err);
    exit(1);
  }
}

export function __resetForTests(): void {
  startupInvoked = false;
  shutdownPromise = null;
  activeCtx = null;
  activeDb = null;
  activeOptions = null;
  if (sigintHandler !== null)  { process.off('SIGINT',  sigintHandler);  sigintHandler  = null; }
  if (sigtermHandler !== null) { process.off('SIGTERM', sigtermHandler); sigtermHandler = null; }
}

(The real implementation fleshes this out with full JSDoc, strict types, and imports.)
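The bounded wait that shutdown() performs around stopFn can be read in isolation. The following is a minimal sketch under stated assumptions, not the packet's implementation: `raceWithTimeout` is a hypothetical helper name, and it returns the "forced" flag instead of logging it.

```typescript
// Hypothetical helper (not part of src/startup.ts): waits for `work`, gives up
// after `timeoutMs`, and reports whether the timeout won the race.
async function raceWithTimeout(
  work: Promise<unknown>,
  timeoutMs: number,
): Promise<{ forced: boolean }> {
  let timer: ReturnType<typeof setTimeout> | undefined;

  // Resolves with a marker value so the winner of the race is unambiguous.
  const timeout = new Promise<'timeout'>((resolve) => {
    timer = setTimeout(() => resolve('timeout'), timeoutMs);
  });

  try {
    const winner = await Promise.race([
      work.then(() => 'done' as const),
      timeout,
    ]);
    return { forced: winner === 'timeout' };
  } finally {
    // Runs on success, timeout, and rejection alike, so no timer is left armed.
    clearTimeout(timer);
  }
}
```

A rejection from `work` still propagates to the caller in this sketch; the skeleton above catches and logs it instead, which is what keeps shutdown() from throwing.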

3. src/server.ts — exact diff sketch

Change only this 3-line block at the bottom of the file (one line removed, two added):

 if (isInvokedAsScript()) {
-  await bootstrap();
+  const { startup } = await import('./startup.js');
+  await startup();
 }

The isInvokedAsScript() guard function is untouched. All 13 existing exports from src/server.ts are untouched. No import changes — startup is loaded dynamically so src/server.ts does not statically import src/startup.ts (which imports src/server.ts). Dynamic import avoids the cycle.

4. src/__tests__/startup.test.ts — test case list

24 in-process tests plus one subprocess smoke test, organized into 7 describe blocks. All in-process tests:

  • Call __resetForTests() in afterEach.
  • Pass registerSignalHandlers: false by default; the sub-describe that tests signal installation explicitly enables it and cleans up.
  • Pass a fake exit that pushes into a number[] so Jest does not die.
  • Pass createOptions: { transport: <InMemoryTransport half>, installGlobalHandlers: false, startupTimeoutMs: 5000 } so bootstrapFn defaults work without real stdio.
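The fake exit and the per-test log capture described above could be bundled into one small helper. This is only a sketch: `makeHarness` and `indexOfLog` are hypothetical names, and the `exit` signature is an assumption, since the real StartupOptions type is defined in the contract rather than in this packet.

```typescript
// Hypothetical test helpers; names and types are illustrative only.
type Logger = (...args: unknown[]) => void;

interface Harness {
  logs: string[];               // one joined string per logger call
  exits: number[];              // every exit code the code under test requested
  logger: Logger;
  exit: (code?: number) => void;
}

function makeHarness(): Harness {
  const logs: string[] = [];
  const exits: number[] = [];
  return {
    logs,
    exits,
    // Join arguments the way a human would read console.error output.
    logger: (...args) => { logs.push(args.map(String).join(' ')); },
    // Record the code instead of terminating, so the test runner survives.
    exit: (code = 0) => { exits.push(code); },
  };
}

// Index of the first captured line starting with `prefix`, or -1 if absent.
function indexOfLog(logs: readonly string[], prefix: string): number {
  return logs.findIndex((line) => line.startsWith(prefix));
}
```

Because each test builds its own harness, log-order assertions never depend on a shared console.error.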

describe 1 — happy path

  1. startup resolves with { ctx, db, elapsedMs } — inject initDbFn that returns a fake handle, assert the returned object shape and the phase-2 log.
  2. Phase 1 log precedes Phase 2 log — capture logs into an array, assert [Startup] Phase 1:... index < [Startup] Phase 2:... index.
  3. bootstrapFn runs before initDbFn — track call order via a shared counter; assert the sequence number recorded by bootstrapFn is lower than the one recorded by initDbFn.
  4. uses config.COLIBRI_DB_PATH when dbPath is not supplied — inject an initDbFn that captures its argument; assert the argument equals config.COLIBRI_DB_PATH.
  5. honours a custom dbPath — pass dbPath: '/tmp/x.db'; assert capture.
  6. emits [Startup] Complete in <N>ms on success — regex the log.
  7. elapsedMs is computed from the injected nowMs — inject a monotonic counter nowMs; assert elapsedMs equals the expected delta.
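For the elapsedMs test, a deterministic clock makes the expected delta easy to state. The sketch below assumes the success path calls nowMs exactly twice (once at Phase 1 start, once at completion), as the skeleton in §2 does; `makeCounterClock` is a hypothetical helper name.

```typescript
// Hypothetical fake clock: each call returns startMs + stepMs * (number of
// previous calls), so consecutive readings differ by exactly stepMs.
function makeCounterClock(startMs: number, stepMs: number): () => number {
  let calls = 0;
  return () => startMs + stepMs * calls++;
}

// Mirrors the arithmetic in the skeleton's success path.
const nowMs = makeCounterClock(100, 12.5);
const phase1StartMs = nowMs();                          // 100
const elapsedMs = Math.floor(nowMs() - phase1StartMs);  // floor(112.5 - 100) = 12
```

A fractional step is useful here because it also exercises the Math.floor in the elapsed-time computation.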

describe 2 — Phase 2 failure

  1. rethrows the underlying error when exit does not terminate — inject initDbFn throwing Error('boom'), fake exit; assert startup() rejects with the original boom.
  2. calls exit(1) on Phase 2 failure — inject failing initDbFn, fake exit; assert exits.includes(1).
  3. runs shutdown(phase-2-failed) before rejecting — inject failing initDbFn + spy stopFn + spy closeDbFn; assert both spies called exactly once before the reject.
  4. emits [Startup] Phase 2 failed: <msg> — regex the log.
  5. emits [Startup] Aborted after <N>ms — regex the log.
  6. closeDb is called even when initDb throws — already asserted in test 3 of this block, but kept as its own expect so a regression is loud.

describe 3 — shutdown contract

  1. shutdown is idempotent — in-flight call returns same promise — call shutdown('a') and shutdown('b') before either awaits, assert stopFn invocation count is exactly 1.
  2. shutdown never throws — inject stopFn that throws, assert await expect(shutdown('x')).resolves.toBeUndefined().
  3. transport closes before DB — track order in arrays, assert stopFn.order < closeDbFn.order.
  4. emits [Shutdown] Forced after 5000ms timeout when stopFn hangs — pass cleanupTimeoutMs: 50 + stopFn that never resolves; assert the timeout log and the subsequent [Shutdown] Clean.
  5. emits [Shutdown] stop failed when stopFn throws — assert log, assert closeDbFn still ran.
  6. emits [Shutdown] close failed when closeDbFn throws — assert log.
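The idempotency and ordering assertions above rest on one pattern: cache the in-flight promise and run the two cleanup steps in sequence. A stripped-down sketch follows; `makeShutdown` is a hypothetical factory used only for illustration, and it omits the timeout, logging, and signal cleanup of the real skeleton.

```typescript
// Hypothetical reduction of the shutdown pattern: the first call starts the
// cleanup, every later call gets the same promise back.
function makeShutdown(
  stopFn: () => Promise<void>,
  closeDbFn: () => void,
): () => Promise<void> {
  let inFlight: Promise<void> | null = null;
  return () => {
    if (inFlight !== null) return inFlight;
    inFlight = (async () => {
      // Transport first; errors are swallowed so the DB step always runs.
      try { await stopFn(); } catch { /* logged in the real skeleton */ }
      // DB second.
      try { closeDbFn(); } catch { /* logged in the real skeleton */ }
    })();
    return inFlight;
  };
}

// Usage: record the order of side effects and call twice before awaiting.
const order: string[] = [];
const shutdownOnce = makeShutdown(
  async () => { order.push('stop'); },
  () => { order.push('close'); },
);
const firstCall = shutdownOnce();
const secondCall = shutdownOnce();
```

Comparing `firstCall` and `secondCall` by identity is a stricter check than counting stopFn invocations, since it also proves no second promise was created.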

describe 4 — signals

  1. registers SIGINT and SIGTERM when enabled — count listeners before/after startup; assert +1 each.
  2. does NOT register signals when option is false — count listeners; assert unchanged.
  3. SIGINT triggers shutdown + exit(0) — fire process.emit('SIGINT', 'SIGINT'), await a microtask, assert shutdown ran + exits contains 0.
  4. signal handler that throws calls exit(1) — inject a stopFn that throws AND a closeDbFn that throws; both errors are logged and shutdown still resolves, so this alone exercises only the exit-0 path. The exit-1 branch is reached either by wrapping shutdown in a spy that rejects, or by asserting only the exit-0 path here and covering the exit-1 branch through the _-prefixed internal (documented as defense-in-depth coverage). See R-9.
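The listener-count and emit-based checks above can be tried outside Jest with a small probe. This is a sketch assuming a Node.js runtime with the global `process`; `probeSignal` is a hypothetical function, not something the packet defines.

```typescript
// Hypothetical probe: installs one listener for `signal`, fires it in-process,
// removes it again, and reports the listener counts observed along the way.
function probeSignal(signal: NodeJS.Signals): {
  before: number;
  during: number;
  after: number;
  fired: number;
} {
  let fired = 0;
  const handler = (): void => { fired += 1; };

  const before = process.listenerCount(signal);
  process.on(signal, handler);
  const during = process.listenerCount(signal);

  // emit() invokes every registered listener synchronously; no OS-level
  // signal is delivered, so the process is not interrupted by this call.
  process.emit(signal, signal);

  process.off(signal, handler);
  const after = process.listenerCount(signal);
  return { before, during, after, fired };
}
```

For a test in the style of describe 4, the expectation would be `during === before + 1` and `after === before`. Note that emit() also runs any listener another module has installed for the same signal, which is one more reason to keep registerSignalHandlers off by default.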

describe 5 — re-entry guard

  1. second startup() call throws — call startup() successfully, then call it again in the same test; because startup() is async, assert with await expect(startup(...)).rejects.toThrow('startup() already invoked').

describe 6 — signal-handler leakage regression

Counted inside describe 4; a final assertion confirms process.listenerCount('SIGINT') is zero after each test thanks to __resetForTests.

describe 7 — subprocess smoke

  1. main() IIFE smoke (script invocation boots, logs [Startup] Phase 1) — spawnSync tsx src/server.ts with NODE_ENV=test and a short timeout. Assert that [colibri] starting (from bootstrap() → start()) AND [Startup] Phase 1: transport... both appear on stderr. Matches the server.test.ts pattern.

Target coverage on src/startup.ts:

  • Stmt ≥ 95%
  • Branch ≥ 90% (contract I-10)
  • Func 100%
  • Line ≥ 95%

5. Files changed — final list

| Path | Action | Lines (approx) |
|------|--------|----------------|
| src/startup.ts | Create | ~230 |
| src/server.ts | Edit (3-line swap) | +2 / -1 |
| src/__tests__/startup.test.ts | Create | ~650 |
| docs/audits/p0-2-3-two-phase-startup-audit.md | Create (step 1) | 279 |
| docs/contracts/p0-2-3-two-phase-startup-contract.md | Create (step 2) | ~340 |
| docs/packets/p0-2-3-two-phase-startup-packet.md | Create (step 3, this file) | ~370 |
| docs/verification/p0-2-3-two-phase-startup-verification.md | Create (step 5) | ~200 |

Zero edits to package.json, tsconfig.json, jest.config.ts, ESLint config, src/config.ts, src/modes.ts, src/db/*, src/domains/*, or any sibling test file — the batch-lock list from §9 of the audit.

6. Risk mitigations

  • R-1 (server.test.ts IIFE smoke). The subprocess test asserts [colibri] starting — emitted by start() which is called by bootstrap() which is now called by startup(). Preserved.
  • R-2 (argv1 undefined). The if (isInvokedAsScript()) guard is preserved byte-for-byte. The import-only path stays silent (no startup call, no bootstrap call).
  • R-3 (module-scope signal leak). Signal handlers are installed inside startup() only. __resetForTests() + shutdown() remove them.
  • R-4 (process.exit kills Jest). All tests inject a fake exit.
  • R-5 (sync initDb in async startup). Handled — try/catch around a sync throw works transparently inside an async function.
  • R-6 (parallel batch collision). Wave C siblings own src/domains/* only; none touch src/server.ts, src/startup.ts, or src/__tests__/.
  • R-7 (dynamic import cycle). src/server.ts uses await import('./startup.js') inside the IIFE so the import is triggered only when isInvokedAsScript() is true — Jest loading src/server.ts never pulls src/startup.ts via this path.
  • R-8 (shutdown reentrancy during signal burst). shutdownPromise is cached and returned; a second SIGINT sees the in-flight promise.
  • R-9 (signal-handler exit-1 branch). The exit-1 branch is reached only if shutdown() rejects or the exit(0) call itself throws. Since shutdown never throws by contract (§4.1), the branch is defense-in-depth. We cover it either by mocking shutdown so that it rejects, or by invoking the internal handler directly with an exit fn that throws on code 0. If branch coverage slips below 90%, we add a direct call to the handler with a temporarily broken shutdown spy.
  • R-10 (log-order flakiness). logger is injected; each test uses its own array. No shared console.error reliance.
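One of the R-9 options, an exit fn that throws on code 0, can be illustrated without touching the real module. The sketch below re-states the handler's try/catch shape with everything injected; `signalExit` and `makeThrowingExit` are hypothetical names, and the real handler reads its dependencies from module state rather than from parameters.

```typescript
type ExitFn = (code: number) => void;
type LogFn = (...args: unknown[]) => void;

// Same control flow as the skeleton's signal handler, with injected parts.
async function signalExit(
  shutdown: () => Promise<void>,
  exit: ExitFn,
  logger: LogFn,
): Promise<void> {
  try {
    await shutdown();
    exit(0);
  } catch (err) {
    logger('[Shutdown] signal handler failed:', err);
    exit(1);
  }
}

// Fake exit that records every code and refuses code 0 by throwing, which
// pushes control into the catch block even though shutdown resolved.
function makeThrowingExit(exits: number[]): ExitFn {
  return (code) => {
    exits.push(code);
    if (code === 0) throw new Error('exit(0) refused by test double');
  };
}
```

With a shutdown that resolves normally, the recorded codes are expected to be 0 followed by 1, and exactly one "[Shutdown] signal handler failed:" line is logged.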

7. Verification script for Step 5

cd .worktrees/claude/p0-2-3-two-phase-startup
npm ci
npm run lint
npm test          # full suite, including new startup.test.ts
npm run build

All four commands must exit 0. Coverage for src/startup.ts is extracted from coverage/lcov-report/startup.ts.html (or the JSON summary) and copied into the verification doc along with the full test output.
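If the JSON summary route is taken, the comparison against the §4 targets could be scripted. The sketch below assumes the summary has the shape written by Istanbul's json-summary reporter (one entry per file path, each with statements, branches, functions, and lines metrics carrying a numeric pct); whether that reporter is enabled in this repository is not stated in the packet. `findShortfalls` is a hypothetical helper.

```typescript
interface Metric { total: number; covered: number; skipped: number; pct: number; }
interface FileCoverage {
  statements: Metric;
  branches: Metric;
  functions: Metric;
  lines: Metric;
}
type CoverageSummary = Record<string, FileCoverage>;

// Targets copied from §4: Stmt >= 95, Branch >= 90, Func 100, Line >= 95.
const TARGETS = { statements: 95, branches: 90, functions: 100, lines: 95 } as const;

// Returns one message per metric below target; an empty array means all pass.
function findShortfalls(summary: CoverageSummary, fileSuffix: string): string[] {
  // Keys are typically absolute paths, so match on the normalized suffix.
  const found = Object.entries(summary).find(([path]) =>
    path.replace(/\\/g, '/').endsWith(fileSuffix),
  );
  if (found === undefined) return [`no coverage entry for ${fileSuffix}`];

  const [, entry] = found;
  const shortfalls: string[] = [];
  for (const metric of Object.keys(TARGETS) as Array<keyof typeof TARGETS>) {
    const pct = entry[metric].pct;
    if (pct < TARGETS[metric]) {
      shortfalls.push(`${metric}: ${pct}% < ${TARGETS[metric]}%`);
    }
  }
  return shortfalls;
}
```

The parsed summary would come from reading the JSON file with JSON.parse; the result of `findShortfalls(summary, 'src/startup.ts')` can then be pasted into the verification doc next to the test output.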

8. Exit criteria for step 3 (this packet)

  • Commit plan with 5 entries (§1).
  • src/startup.ts skeleton with exports (§2).
  • src/server.ts exact diff sketch (§3).
  • Test case list with ≥ 24 tests + subprocess smoke (§4).
  • Final files list within batch-lock limits (§5).
  • Risk mitigations R-1 through R-10 (§6).
  • Verification commands (§7).

Packet approved (Sigma pre-approved via dispatch prompt). Proceed to step 4 (implement).



Colibri — documentation-first MCP runtime. Apache 2.0 + Commons Clause.
