P0.2.3 — Two-Phase Startup — Audit
1. Scope of this audit
P0.2.3 layers two-phase startup orchestration on top of the P0.2.1 MCP server bootstrap and the P0.2.2 SQLite initializer:
- Phase 1 (fast): MCP stdio transport is live; the
server_pingtool can answer a handshake. No DB. No domain code. - Phase 2 (heavy):
initDb(config.COLIBRI_DB_PATH)runs; DB singleton is published. Domain tool registration is deferred to β/ε/ζ/η sub-tasks. - Graceful shutdown: on Phase 2 throw, on SIGINT, and on SIGTERM, we close the transport and the DB in reverse registration order, then exit with a defined code (0 on clean, 1 on error).
This audit answers: what is already in the repo that Phase 2 startup must
reuse, and what seams do we need to carve into src/server.ts without
breaking its existing contracts?
2. Existing surface — src/server.ts (from P0.2.1)
Canonical path: src/server.ts (559 LOC, exported module — package.json
"main" points here).
2.1. Public exports relevant to startup
| Export | Shape | Relevance to P0.2.3 |
|---|---|---|
createServer(options?) |
CreateServerOptions → ColibriServerContext |
Reused as-is. Constructs the MCP server + transport + audit sink + mode detection. Does NOT connect the transport. Does NOT install process.on(...) handlers. |
registerColibriTool(ctx, name, config, handler) |
4-arg register | Reused by Phase 2 when domain tasks land. P0.2.3 does not register new tools itself beyond what bootstrap() already does. |
start(ctx) |
(ctx) → Promise<void> |
This is the Phase 1 primitive. It installs global handlers (gated by installGlobalHandlers), calls ctx.server.connect(ctx.transport) inside a Promise.race against ctx.startupTimeoutMs, and logs [colibri] starting + [colibri] ready. |
stop(ctx) |
(ctx) → Promise<void> |
await ctx.server.close(). P0.2.3 wraps this in shutdown() and additionally calls closeDb(). |
bootstrap(options?) |
BootstrapOptions → Promise<ColibriServerContext> |
The Phase 1-only entry path. Constructs the ctx, registers server_ping, calls start(), and on error logs [colibri] fatal: + invokes the injected exit. P0.2.3’s startup() composes Phase 2 on top of this. |
createNoOpAuditSink() |
() → AuditSink |
Not touched. |
ColibriServerContext (type) |
See server.ts §”Opaque context” | Holds server, transport, version, mode, nowMs, bootStartMs, logger, startupTimeoutMs, _toolLocks, _registeredToolNames, _globalHandlersInstalled. No DB field — Phase 2 must return the DB alongside the ctx, not mutate the ctx. |
CreateServerOptions, BootstrapOptions, ColibriToolConfig, AuditSink, ToolEnterEvent, ToolExitEvent (types) |
Various | Not touched. |
2.2. Script-invocation guard (bottom of server.ts)
function isInvokedAsScript(): boolean { … }
if (isInvokedAsScript()) {
await bootstrap();
}
This is the current production entry path. Problem for P0.2.3: bootstrap()
as-shipped does Phase 1 only. Once P0.2.3 lands, the script-invocation path
must run the two-phase startup, not just Phase 1.
Decision (baked into the contract): the module-level guard will switch from
bootstrap() to startup() from ./startup.js. bootstrap() stays exported
for tests + for any future callers that deliberately want a Phase-1-only path;
it is not removed. Verification: the P0.2.1 subprocess test
(main() IIFE smoke) asserts that stderr shows [colibri] starting — that
message is still emitted by start() which startup() calls, so the
assertion stays green.
2.3. Logging conventions already established
- All server-side logs go through
ctx.logger(defaults toconsole.error). - Stdout is reserved for MCP JSON-RPC traffic. S17.
src/server.tsdoes not monkey-patchprocess.stdout.write(donor bug #3). - Current start log:
[colibri] starting in mode= <mode> version= <v>on entry tostart();[colibri] readyon successful connect. - Current fatal log:
[colibri] fatal: <err>frombootstrap().
P0.2.3 new log lines (must match this prefix convention):
| Phase | Message | When |
|---|---|---|
| Phase 1 start | [Startup] Phase 1: transport… |
Just before calling start() |
| Phase 1 done | [Startup] Phase 1 ready |
After start() resolves |
| Phase 2 start | [Startup] Phase 2: heavy-init… |
Before initDb |
| Phase 2 done | [Startup] Complete in <N>ms |
After initDb resolves |
| Phase 2 fail | [Startup] Phase 2 failed: <err> |
In the catch block |
| Phase 2 abort | [Startup] Aborted after <N>ms |
In the catch block |
| Shutdown start | [Shutdown] <reason> |
Entry to shutdown() |
| Shutdown force | [Shutdown] Forced after 5000ms timeout |
On the 5s race |
| Shutdown done | [Shutdown] Clean |
After closeDb() resolves |
All via console.error through an injectable logger in tests.
3. Existing surface — src/db/index.ts (from P0.2.2)
Canonical path: src/db/index.ts.
3.1. Public exports relevant to startup
| Export | Shape | Relevance to P0.2.3 |
|---|---|---|
initDb(path) |
(string) → Database.Database (sync) |
Called once from Phase 2. Sync call, but we wrap in an async context. Deliberately does NOT default to config.COLIBRI_DB_PATH — the caller passes the path explicitly so tests can bypass env. P0.2.3 passes config.COLIBRI_DB_PATH. |
getDb() |
() → Database.Database |
Throws until initDb runs. P0.2.3 does not call this; it’s for domain code (future P0.3+). |
closeDb() |
() → void (sync, idempotent) |
Called from shutdown(). Safe to call even if initDb never ran (early Phase 1 failure). |
3.2. Error semantics
initDbthrowsError("Database integrity check failed: …")on corruption.initDbthrowsError("Migration <file> failed: …")on migration SQL error.initDbthrowsError("Migration prefix collision at N: <a> vs <b>")on file-ordering collision.- In all of the above, the underlying handle is
db.close()‘d before re-throw socloseDb()is a safe no-op after failure.
Implication for P0.2.3: if initDb throws inside Phase 2, calling
closeDb() during graceful shutdown is safe — the singleton was never
published.
4. Existing surface — src/config.ts (from P0.1.4)
config.COLIBRI_DB_PATH is a frozen string, default data/colibri.db.
config.COLIBRI_STARTUP_TIMEOUT_MS is a frozen positive integer, default
- P0.2.3 reads both values; it does not mutate or override them.
Note the spec deviation (Sigma-approved, bake into packet): the P0.2.3 prompt
snippet in p0.2-alpha-system-core.md uses config.DATABASE_PATH || …. This
is stale. The canonical env key is COLIBRI_DB_PATH.
5. Where bootstrap happens today (pre-P0.2.3)
Two places:
- Script invocation (
src/server.tsbottom):if (isInvokedAsScript()) { await bootstrap(); }. - Test harness (
src/__tests__/server.test.ts): tests callbootstrapdirectly with a fakeInMemoryTransport+ injectedexit.
Neither place currently opens the DB. P0.2.3 adds the DB open to the script
invocation only — tests continue to use bootstrap() for Phase-1-only
coverage; new startup tests use the same InMemoryTransport pattern.
6. Is there already a module orchestrating startup?
No. There is no src/startup.ts, no src/index.ts, and no other file
that sequences Phase 1 → Phase 2. The current entry point is the bare
if (isInvokedAsScript()) await bootstrap(); tail of src/server.ts. That
tail is the one line P0.2.3 must modify.
The spec file (p0.2-alpha-system-core.md) mentions creating src/index.ts
as an entry point that calls startup(). We deviate from that: we keep
package.json "main" on src/server.ts (unchanged from P0.2.1) and rewire
the script-invocation tail of server.ts to call startup() instead of
bootstrap(). Rationale: package.json "main" should not move under a
parallel-batch lock constraint (package.json is in our MUST-NOT-TOUCH
list), and keeping the entry module stable avoids a churn commit on
dist/server.js / bin.colibri.
7. console.error audit — who writes to stderr today?
src/server.ts:[colibri] starting,[colibri] ready,[colibri] fatal:,[colibri] unhandledRejection,[colibri] uncaughtException,[colibri] audit-exit sink failed:.src/db/index.ts: nothing — DB module is silent, throws only.src/config.ts: nothing — throws on env errors.src/modes.ts: nothing.
P0.2.3 decision: new lines use a distinct [Startup] / [Shutdown]
prefix so they are grep-separable from server-transport logs. start() and
the global-handler log lines keep their existing [colibri] prefix — no
retrofit.
8. Test conventions established
- Jest 29 with ESM (
node --experimental-vm-modules). roots: ['<rootDir>/src']; tests live insrc/__tests__/.jest.isolateModulesAsyncis broken under ts-jest ESM (zod cache bug documented infeedback_phase0_first_code_hazards.md) — use pure factory functions +spawnSync+tsxfor per-test env isolation.InMemoryTransport.createLinkedPair()from@modelcontextprotocol/sdkfor in-process E2E.spawnSync(process.execPath, ['--import', 'tsx', …])for subprocess tests.- Spy on
process.exitvia direct reassign + restore (see server.test.ts describe 4 “installed handlers log via ctx.logger and call process.exit(1)”).
P0.2.3 startup tests reuse these patterns end-to-end.
9. Known constraints (non-negotiable)
- No new deps. (Batch-lock rule.)
- No edits to
src/config.ts,src/modes.ts,src/db/index.ts,src/db/schema.sql,src/db/migrations/*. (Batch-lock rule.) - No edits to
package.json,jest.config.ts,.eslintrc.json,.prettierrc,tsconfig.json,.github/workflows/*. (Batch-lock rule.) - No edits to any
src/domains/*. (Parallel executors own those.) - No edits to
src/__tests__/server.test.tsetc. (Parallel lock.) src/server.tsis ALLOWED to modify — but only the script-invocation tail. The contract documents the exact change.
10. Open questions answered by the packet
- Does
bootstrap()stay exported? Yes. It becomes a Phase-1-only primitive thatstartup()composes on top of. Tests keep using it for transport-only coverage. - Where does the DB handle live? Returned from
startup(), not stashed on the ctx. Callers who need it usegetDb()fromsrc/db/index.js— P0.2.3 does not reinvent that path. - Who owns SIGINT/SIGTERM?
startup()registers them idempotently (guarded by a module-levelsignalsRegisteredflag, reset onshutdown()return). They callshutdown('SIGINT')/shutdown('SIGTERM')and thenprocess.exit(0)on clean,process.exit(1)on error. - What’s the re-entry guard? A module-level
startupInvokedboolean. Set at the top ofstartup()before any work. Second call throwsError('startup() already invoked'). - What timeout on in-flight cleanup? 5000 ms per the packet spec. Uses
Promise.race([cleanup, sleep(5000)]); if the sleep wins, log[Shutdown] Forced after 5000ms timeoutand continue.
11. Files this task will create or modify
Create (owned by P0.2.3, batch-lock safe)
| Path | Purpose |
|---|---|
src/startup.ts |
Two-phase orchestrator + shutdown + signal handlers. |
src/__tests__/startup.test.ts |
Jest coverage for all 8 acceptance criteria. |
docs/audits/p0-2-3-two-phase-startup-audit.md |
This file (step 1). |
docs/contracts/p0-2-3-two-phase-startup-contract.md |
Step 2. |
docs/packets/p0-2-3-two-phase-startup-packet.md |
Step 3. |
docs/verification/p0-2-3-two-phase-startup-verification.md |
Step 5. |
Modify (batch-lock explicitly allows)
| Path | Change | Rationale |
|---|---|---|
src/server.ts |
Replace the 3-line if (isInvokedAsScript()) { await bootstrap(); } tail with a call to startup() from ./startup.js. No other changes. |
Must swap the production entry path to the two-phase orchestrator. Limited to the tail; bootstrap() is still exported and used by tests. |
12. Risks surfaced for the packet
- R-1. The
main() IIFE smoketest inserver.test.tsspawnssrc/server.tsand asserts[colibri] startingon stderr. If we changeserver.ts’s tail to callstartup(),startup()itself must still cause[colibri] startingto be emitted (becausestartup()callsbootstrap()→start()→ the existing log line). The contract pins this:startup()MUST reusebootstrap()orstart(), not bypass them. - R-2. The
isInvokedAsScript() returns false when process.argv[1] is undefinedtest asserts that loadingsrc/server.tswithout a direct script path does NOT print[colibri] starting. The new tail must keep theif (isInvokedAsScript())guard. Verified in packet. - R-3. Signal handlers installed at module scope would leak into Jest
workers on every test file import of
src/startup.ts. Solution: install the signal handlers insidestartup()(not at module scope) so they are only registered when a real startup runs. - R-4.
process.exitin tests kills Jest. Tests must spy onprocess.exit(direct reassign + restore pattern from server.test.ts describe 4). The module’s production path keeps the realprocess.exit. - R-5.
initDbis synchronous. An asyncstartup()that wraps a syncinitDbinside a try/catch is fine — JS try/catch handles sync throws inside async functions transparently. No change. - R-6. The batch is running P0.3.1, P0.6.1, P0.7.1 in parallel. None of
them touch
src/startup.tsor the server-ts script tail — confirmed by the batch-lock rules in the Sigma prompt. No merge conflict expected.
13. Exit criteria for step 1 (this audit)
- Current
src/server.tsexports enumerated (§2.1). - Script-invocation tail identified as the one line to modify (§2.2, §11).
src/db/index.tssurface mapped with error semantics (§3).config.COLIBRI_DB_PATHas the Phase-2 DB path argument confirmed (§4).- Log-prefix convention chosen (§7).
- Test-harness conventions recorded (§8).
- Batch-lock constraints enumerated (§9).
- Open design questions resolved (§10).
- Risks for the packet (§12).
Ready for step 2 (contract).