P0.2.3 — Two-Phase Startup — Audit

1. Scope of this audit

P0.2.3 layers two-phase startup orchestration on top of the P0.2.1 MCP server bootstrap and the P0.2.2 SQLite initializer:

  • Phase 1 (fast): MCP stdio transport is live; the server_ping tool can answer a handshake. No DB. No domain code.
  • Phase 2 (heavy): initDb(config.COLIBRI_DB_PATH) runs; DB singleton is published. Domain tool registration is deferred to β/ε/ζ/η sub-tasks.
  • Graceful shutdown: on Phase 2 throw, on SIGINT, and on SIGTERM, we close the transport and the DB in reverse registration order, then exit with a defined code (0 on clean, 1 on error).

This audit answers: what is already in the repo that Phase 2 startup must reuse, and what seams do we need to carve into src/server.ts without breaking its existing contracts?

2. Existing surface — src/server.ts (from P0.2.1)

Canonical path: src/server.ts (559 LOC, exported module — package.json "main" points here).

2.1. Public exports relevant to startup

Export Shape Relevance to P0.2.3
createServer(options?) CreateServerOptions → ColibriServerContext Reused as-is. Constructs the MCP server + transport + audit sink + mode detection. Does NOT connect the transport. Does NOT install process.on(...) handlers.
registerColibriTool(ctx, name, config, handler) 4-arg register Reused by Phase 2 when domain tasks land. P0.2.3 does not register new tools itself beyond what bootstrap() already does.
start(ctx) (ctx) → Promise<void> This is the Phase 1 primitive. It installs global handlers (gated by installGlobalHandlers), calls ctx.server.connect(ctx.transport) inside a Promise.race against ctx.startupTimeoutMs, and logs [colibri] starting + [colibri] ready.
stop(ctx) (ctx) → Promise<void> await ctx.server.close(). P0.2.3 wraps this in shutdown() and additionally calls closeDb().
bootstrap(options?) BootstrapOptions → Promise<ColibriServerContext> The Phase 1-only entry path. Constructs the ctx, registers server_ping, calls start(), and on error logs [colibri] fatal: + invokes the injected exit. P0.2.3’s startup() composes Phase 2 on top of this.
createNoOpAuditSink() () → AuditSink Not touched.
ColibriServerContext (type) See server.ts §”Opaque context” Holds server, transport, version, mode, nowMs, bootStartMs, logger, startupTimeoutMs, _toolLocks, _registeredToolNames, _globalHandlersInstalled. No DB field — Phase 2 must return the DB alongside the ctx, not mutate the ctx.
CreateServerOptions, BootstrapOptions, ColibriToolConfig, AuditSink, ToolEnterEvent, ToolExitEvent (types) Various Not touched.

2.2. Script-invocation guard (bottom of server.ts)

function isInvokedAsScript(): boolean {  }

if (isInvokedAsScript()) {
  await bootstrap();
}

This is the current production entry path. Problem for P0.2.3: bootstrap() as-shipped does Phase 1 only. Once P0.2.3 lands, the script-invocation path must run the two-phase startup, not just Phase 1.

Decision (baked into the contract): the module-level guard will switch from bootstrap() to startup() from ./startup.js. bootstrap() stays exported for tests + for any future callers that deliberately want a Phase-1-only path; it is not removed. Verification: the P0.2.1 subprocess test (main() IIFE smoke) asserts that stderr shows [colibri] starting — that message is still emitted by start() which startup() calls, so the assertion stays green.

2.3. Logging conventions already established

  • All server-side logs go through ctx.logger (defaults to console.error).
  • Stdout is reserved for MCP JSON-RPC traffic. S17. src/server.ts does not monkey-patch process.stdout.write (donor bug #3).
  • Current start log: [colibri] starting in mode= <mode> version= <v> on entry to start(); [colibri] ready on successful connect.
  • Current fatal log: [colibri] fatal: <err> from bootstrap().

P0.2.3 new log lines (must match this prefix convention):

Phase Message When
Phase 1 start [Startup] Phase 1: transport… Just before calling start()
Phase 1 done [Startup] Phase 1 ready After start() resolves
Phase 2 start [Startup] Phase 2: heavy-init… Before initDb
Phase 2 done [Startup] Complete in <N>ms After initDb resolves
Phase 2 fail [Startup] Phase 2 failed: <err> In the catch block
Phase 2 abort [Startup] Aborted after <N>ms In the catch block
Shutdown start [Shutdown] <reason> Entry to shutdown()
Shutdown force [Shutdown] Forced after 5000ms timeout On the 5s race
Shutdown done [Shutdown] Clean After closeDb() resolves

All via console.error through an injectable logger in tests.

3. Existing surface — src/db/index.ts (from P0.2.2)

Canonical path: src/db/index.ts.

3.1. Public exports relevant to startup

Export Shape Relevance to P0.2.3
initDb(path) (string) → Database.Database (sync) Called once from Phase 2. Sync call, but we wrap in an async context. Deliberately does NOT default to config.COLIBRI_DB_PATH — the caller passes the path explicitly so tests can bypass env. P0.2.3 passes config.COLIBRI_DB_PATH.
getDb() () → Database.Database Throws until initDb runs. P0.2.3 does not call this; it’s for domain code (future P0.3+).
closeDb() () → void (sync, idempotent) Called from shutdown(). Safe to call even if initDb never ran (early Phase 1 failure).

3.2. Error semantics

  • initDb throws Error("Database integrity check failed: …") on corruption.
  • initDb throws Error("Migration <file> failed: …") on migration SQL error.
  • initDb throws Error("Migration prefix collision at N: <a> vs <b>") on file-ordering collision.
  • In all of the above, the underlying handle is db.close()‘d before re-throw so closeDb() is a safe no-op after failure.

Implication for P0.2.3: if initDb throws inside Phase 2, calling closeDb() during graceful shutdown is safe — the singleton was never published.

4. Existing surface — src/config.ts (from P0.1.4)

config.COLIBRI_DB_PATH is a frozen string, default data/colibri.db. config.COLIBRI_STARTUP_TIMEOUT_MS is a frozen positive integer, default

  1. P0.2.3 reads both values; it does not mutate or override them.

Note the spec deviation (Sigma-approved, bake into packet): the P0.2.3 prompt snippet in p0.2-alpha-system-core.md uses config.DATABASE_PATH || …. This is stale. The canonical env key is COLIBRI_DB_PATH.

5. Where bootstrap happens today (pre-P0.2.3)

Two places:

  1. Script invocation (src/server.ts bottom): if (isInvokedAsScript()) { await bootstrap(); }.
  2. Test harness (src/__tests__/server.test.ts): tests call bootstrap directly with a fake InMemoryTransport + injected exit.

Neither place currently opens the DB. P0.2.3 adds the DB open to the script invocation only — tests continue to use bootstrap() for Phase-1-only coverage; new startup tests use the same InMemoryTransport pattern.

6. Is there already a module orchestrating startup?

No. There is no src/startup.ts, no src/index.ts, and no other file that sequences Phase 1 → Phase 2. The current entry point is the bare if (isInvokedAsScript()) await bootstrap(); tail of src/server.ts. That tail is the one line P0.2.3 must modify.

The spec file (p0.2-alpha-system-core.md) mentions creating src/index.ts as an entry point that calls startup(). We deviate from that: we keep package.json "main" on src/server.ts (unchanged from P0.2.1) and rewire the script-invocation tail of server.ts to call startup() instead of bootstrap(). Rationale: package.json "main" should not move under a parallel-batch lock constraint (package.json is in our MUST-NOT-TOUCH list), and keeping the entry module stable avoids a churn commit on dist/server.js / bin.colibri.

7. console.error audit — who writes to stderr today?

  • src/server.ts: [colibri] starting, [colibri] ready, [colibri] fatal:, [colibri] unhandledRejection, [colibri] uncaughtException, [colibri] audit-exit sink failed:.
  • src/db/index.ts: nothing — DB module is silent, throws only.
  • src/config.ts: nothing — throws on env errors.
  • src/modes.ts: nothing.

P0.2.3 decision: new lines use a distinct [Startup] / [Shutdown] prefix so they are grep-separable from server-transport logs. start() and the global-handler log lines keep their existing [colibri] prefix — no retrofit.

8. Test conventions established

  • Jest 29 with ESM (node --experimental-vm-modules).
  • roots: ['<rootDir>/src']; tests live in src/__tests__/.
  • jest.isolateModulesAsync is broken under ts-jest ESM (zod cache bug documented in feedback_phase0_first_code_hazards.md) — use pure factory functions + spawnSync + tsx for per-test env isolation.
  • InMemoryTransport.createLinkedPair() from @modelcontextprotocol/sdk for in-process E2E.
  • spawnSync(process.execPath, ['--import', 'tsx', …]) for subprocess tests.
  • Spy on process.exit via direct reassign + restore (see server.test.ts describe 4 “installed handlers log via ctx.logger and call process.exit(1)”).

P0.2.3 startup tests reuse these patterns end-to-end.

9. Known constraints (non-negotiable)

  • No new deps. (Batch-lock rule.)
  • No edits to src/config.ts, src/modes.ts, src/db/index.ts, src/db/schema.sql, src/db/migrations/*. (Batch-lock rule.)
  • No edits to package.json, jest.config.ts, .eslintrc.json, .prettierrc, tsconfig.json, .github/workflows/*. (Batch-lock rule.)
  • No edits to any src/domains/*. (Parallel executors own those.)
  • No edits to src/__tests__/server.test.ts etc. (Parallel lock.)
  • src/server.ts is ALLOWED to modify — but only the script-invocation tail. The contract documents the exact change.

10. Open questions answered by the packet

  1. Does bootstrap() stay exported? Yes. It becomes a Phase-1-only primitive that startup() composes on top of. Tests keep using it for transport-only coverage.
  2. Where does the DB handle live? Returned from startup(), not stashed on the ctx. Callers who need it use getDb() from src/db/index.js — P0.2.3 does not reinvent that path.
  3. Who owns SIGINT/SIGTERM? startup() registers them idempotently (guarded by a module-level signalsRegistered flag, reset on shutdown() return). They call shutdown('SIGINT') / shutdown('SIGTERM') and then process.exit(0) on clean, process.exit(1) on error.
  4. What’s the re-entry guard? A module-level startupInvoked boolean. Set at the top of startup() before any work. Second call throws Error('startup() already invoked').
  5. What timeout on in-flight cleanup? 5000 ms per the packet spec. Uses Promise.race([cleanup, sleep(5000)]); if the sleep wins, log [Shutdown] Forced after 5000ms timeout and continue.

11. Files this task will create or modify

Create (owned by P0.2.3, batch-lock safe)

Path Purpose
src/startup.ts Two-phase orchestrator + shutdown + signal handlers.
src/__tests__/startup.test.ts Jest coverage for all 8 acceptance criteria.
docs/audits/p0-2-3-two-phase-startup-audit.md This file (step 1).
docs/contracts/p0-2-3-two-phase-startup-contract.md Step 2.
docs/packets/p0-2-3-two-phase-startup-packet.md Step 3.
docs/verification/p0-2-3-two-phase-startup-verification.md Step 5.

Modify (batch-lock explicitly allows)

Path Change Rationale
src/server.ts Replace the 3-line if (isInvokedAsScript()) { await bootstrap(); } tail with a call to startup() from ./startup.js. No other changes. Must swap the production entry path to the two-phase orchestrator. Limited to the tail; bootstrap() is still exported and used by tests.

12. Risks surfaced for the packet

  • R-1. The main() IIFE smoke test in server.test.ts spawns src/server.ts and asserts [colibri] starting on stderr. If we change server.ts’s tail to call startup(), startup() itself must still cause [colibri] starting to be emitted (because startup() calls bootstrap()start() → the existing log line). The contract pins this: startup() MUST reuse bootstrap() or start(), not bypass them.
  • R-2. The isInvokedAsScript() returns false when process.argv[1] is undefined test asserts that loading src/server.ts without a direct script path does NOT print [colibri] starting. The new tail must keep the if (isInvokedAsScript()) guard. Verified in packet.
  • R-3. Signal handlers installed at module scope would leak into Jest workers on every test file import of src/startup.ts. Solution: install the signal handlers inside startup() (not at module scope) so they are only registered when a real startup runs.
  • R-4. process.exit in tests kills Jest. Tests must spy on process.exit (direct reassign + restore pattern from server.test.ts describe 4). The module’s production path keeps the real process.exit.
  • R-5. initDb is synchronous. An async startup() that wraps a sync initDb inside a try/catch is fine — JS try/catch handles sync throws inside async functions transparently. No change.
  • R-6. The batch is running P0.3.1, P0.6.1, P0.7.1 in parallel. None of them touch src/startup.ts or the server-ts script tail — confirmed by the batch-lock rules in the Sigma prompt. No merge conflict expected.

13. Exit criteria for step 1 (this audit)

  • Current src/server.ts exports enumerated (§2.1).
  • Script-invocation tail identified as the one line to modify (§2.2, §11).
  • src/db/index.ts surface mapped with error semantics (§3).
  • config.COLIBRI_DB_PATH as the Phase-2 DB path argument confirmed (§4).
  • Log-prefix convention chosen (§7).
  • Test-harness conventions recorded (§8).
  • Batch-lock constraints enumerated (§9).
  • Open design questions resolved (§10).
  • Risks for the packet (§12).

Ready for step 2 (contract).


Back to top

Colibri — documentation-first MCP runtime. Apache 2.0 + Commons Clause.

This site uses Just the Docs, a documentation theme for Jekyll.