P0.2.3 — Two-Phase Startup — Packet
1. Commit plan
Five commits on feature/p0-2-3-two-phase-startup:
| # | Commit | Files | Rationale |
|---|---|---|---|
| 1 | `audit(p0-2-3-two-phase-startup): inventory surface` | `docs/audits/p0-2-3-two-phase-startup-audit.md` | Step 1 landed. |
| 2 | `contract(p0-2-3-two-phase-startup): behavioral contract` | `docs/contracts/p0-2-3-two-phase-startup-contract.md` | Step 2 landed. |
| 3 | `packet(p0-2-3-two-phase-startup): execution plan` | `docs/packets/p0-2-3-two-phase-startup-packet.md` | This file. |
| 4 | `feat(p0-2-3-two-phase-startup): two-phase startup + graceful shutdown` | `src/startup.ts`, `src/server.ts`, `src/__tests__/startup.test.ts` | One feat commit for the module + wiring + tests; logically indivisible. |
| 5 | `verify(p0-2-3-two-phase-startup): test evidence` | `docs/verification/p0-2-3-two-phase-startup-verification.md` | Step 5 lands after `npm test` / `npm run lint` / `npm run build` are all green. |
2. src/startup.ts — skeleton
/**
* Colibri — Phase 0 two-phase startup orchestrator (α System Core).
*
* Phase 1 (fast) — transport ready, server_ping registered. No DB. This is
* the P0.2.1 `bootstrap()` invocation, untouched.
*
* Phase 2 (heavy) — open the SQLite file, run integrity_check + migrations,
* publish the `db` singleton via `src/db/index.ts`. Future domain code
* (β/ε/ζ/η/ν) registers its tools + migrations here.
*
* Graceful shutdown — on Phase 2 throw, on SIGINT, and on SIGTERM, close
* the transport, close the DB, remove the signal listeners, and exit 0
* (clean) or 1 (error).
*
* Canonical references:
* - docs/audits/p0-2-3-two-phase-startup-audit.md
* - docs/contracts/p0-2-3-two-phase-startup-contract.md
* - docs/packets/p0-2-3-two-phase-startup-packet.md
* - docs/2-plugin/boot.md §"Two-Phase Startup"
* - docs/spec/s17-mcp-surface.md §"Transport lifecycle"
*/
import { performance } from 'node:perf_hooks';
import type Database from 'better-sqlite3';
import { config } from './config.js';
import { initDb as initDbImpl, closeDb as closeDbImpl } from './db/index.js';
import {
  bootstrap as bootstrapImpl,
  stop as stopImpl,
  type BootstrapOptions,
  type ColibriServerContext,
  type CreateServerOptions,
} from './server.js';
export interface StartupOptions { … } // §2.1 of contract
export interface StartupResult { … } // §2.1 of contract
// Module state — reset via __resetForTests.
let startupInvoked = false;
let activeCtx: ColibriServerContext | null = null;
let activeDb: Database.Database | null = null;
let activeOptions: Required<Pick<StartupOptions, 'stopFn' | 'closeDbFn' | 'logger' | 'exit' | 'cleanupTimeoutMs' | 'nowMs'>> | null = null;
let shutdownPromise: Promise<void> | null = null;
let sigintHandler: NodeJS.SignalsListener | null = null;
let sigtermHandler: NodeJS.SignalsListener | null = null;
export async function startup(options: StartupOptions = {}): Promise<StartupResult> {
  if (startupInvoked) throw new Error('startup() already invoked');
  startupInvoked = true;
  const logger = options.logger ?? console.error;
  const nowMs = options.nowMs ?? (() => performance.now());
  const exit = options.exit ?? process.exit.bind(process);
  const cleanupTimeoutMs = options.cleanupTimeoutMs ?? 5000;
  const bootstrapFn = options.bootstrapFn ?? bootstrapImpl;
  const stopFn = options.stopFn ?? stopImpl;
  const initDbFn = options.initDbFn ?? initDbImpl;
  const closeDbFn = options.closeDbFn ?? closeDbImpl;
  const dbPath = options.dbPath ?? config.COLIBRI_DB_PATH;
  const registerSignalHandlers = options.registerSignalHandlers ?? true;
  const phase1StartMs = nowMs();
  // Phase 1 — transport
  logger('[Startup] Phase 1: transport...');
  const bootOpts: BootstrapOptions = {
    ...(options.createOptions !== undefined ? { createOptions: options.createOptions } : {}),
    exit,
  };
  const ctx = await bootstrapFn(bootOpts);
  activeCtx = ctx;
  logger('[Startup] Phase 1 ready');
  // Stash cleanup references for shutdown()
  activeOptions = { stopFn, closeDbFn, logger, exit, cleanupTimeoutMs, nowMs };
  // Register signal handlers only after Phase 1 succeeds
  if (registerSignalHandlers) {
    sigintHandler = () => { void gracefulSignalExit('SIGINT'); };
    sigtermHandler = () => { void gracefulSignalExit('SIGTERM'); };
    process.on('SIGINT', sigintHandler);
    process.on('SIGTERM', sigtermHandler);
  }
  // Phase 2 — heavy init
  logger('[Startup] Phase 2: heavy-init...');
  try {
    const db = initDbFn(dbPath); // sync — throws synchronously on error
    activeDb = db;
    const elapsedMs = Math.floor(nowMs() - phase1StartMs);
    logger(`[Startup] Complete in ${elapsedMs}ms`);
    return { ctx, db, elapsedMs };
  } catch (err) {
    const msg = err instanceof Error ? err.message : String(err);
    logger('[Startup] Phase 2 failed:', msg);
    const abortedMs = Math.floor(nowMs() - phase1StartMs);
    logger(`[Startup] Aborted after ${abortedMs}ms`);
    await shutdown('phase-2-failed');
    exit(1);
    // If exit doesn't terminate (tests), rethrow so caller sees it.
    throw err instanceof Error ? err : new Error(String(err));
  }
}
export function shutdown(reason: string): Promise<void> {
  if (shutdownPromise !== null) return shutdownPromise;
  const opts = activeOptions;
  const ctx = activeCtx;
  const { logger, stopFn, closeDbFn, cleanupTimeoutMs } =
    opts ?? { logger: console.error, stopFn: stopImpl, closeDbFn: closeDbImpl, cleanupTimeoutMs: 5000 };
  logger(`[Shutdown] ${reason}`);
  shutdownPromise = (async () => {
    let forced = false;
    // Transport first
    if (ctx !== null) {
      let timer: ReturnType<typeof setTimeout> | undefined;
      try {
        await Promise.race([
          stopFn(ctx),
          new Promise<void>((resolve) => {
            timer = setTimeout(() => { forced = true; resolve(); }, cleanupTimeoutMs);
            timer.unref();
          }),
        ]);
      } catch (err) {
        logger('[Shutdown] stop failed:', err);
      } finally {
        // Clear the timer whether stopFn resolved or rejected.
        if (timer !== undefined) clearTimeout(timer);
      }
    }
    // DB second
    try {
      closeDbFn();
    } catch (err) {
      logger('[Shutdown] close failed:', err);
    }
    // Signals last
    if (sigintHandler !== null) { process.off('SIGINT', sigintHandler); sigintHandler = null; }
    if (sigtermHandler !== null) { process.off('SIGTERM', sigtermHandler); sigtermHandler = null; }
    if (forced) logger('[Shutdown] Forced after 5000ms timeout');
    logger('[Shutdown] Clean');
  })();
  return shutdownPromise;
}
async function gracefulSignalExit(signalName: string): Promise<void> {
  const opts = activeOptions;
  const { logger, exit } = opts ?? { logger: console.error, exit: process.exit.bind(process) };
  try {
    await shutdown(`signal-${signalName}`);
    exit(0);
  } catch (err) {
    logger('[Shutdown] signal handler failed:', err);
    exit(1);
  }
}
export function __resetForTests(): void {
  startupInvoked = false;
  shutdownPromise = null;
  activeCtx = null;
  activeDb = null;
  activeOptions = null;
  if (sigintHandler !== null) { process.off('SIGINT', sigintHandler); sigintHandler = null; }
  if (sigtermHandler !== null) { process.off('SIGTERM', sigtermHandler); sigtermHandler = null; }
}
(The real implementation fleshes this out with full JSDoc, strict types, and imports.)
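The idempotency of `shutdown()` in the skeleton rests on caching the in-flight promise. The following is a minimal standalone sketch of only that pattern; `makeOnce`, `shutdownOnce`, and `stopCalls` are hypothetical names for illustration and are not part of `src/startup.ts`.

```typescript
// Hypothetical helper (not in src/startup.ts): wraps an async task so that
// every caller shares the same in-flight promise.
function makeOnce(task: () => Promise<void>): () => Promise<void> {
  let inFlight: Promise<void> | null = null;
  return () => {
    if (inFlight === null) {
      inFlight = task();
    }
    return inFlight;
  };
}

let stopCalls = 0;
const shutdownOnce = makeOnce(async () => {
  stopCalls += 1; // an async body runs synchronously up to its first await
  await Promise.resolve();
});

const first = shutdownOnce();
const second = shutdownOnce(); // returns the cached promise, task is not re-run
console.log(first === second, stopCalls); // prints: true 1
```

Unlike this helper, the skeleton keeps the cached promise in module state and clears it only in `__resetForTests()`.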
3. src/server.ts — exact diff sketch
Replace only these 3 lines at the bottom of the file:
 if (isInvokedAsScript()) {
-  await bootstrap();
+  const { startup } = await import('./startup.js');
+  await startup();
 }
The isInvokedAsScript() guard function is untouched. All 13 existing
exports from src/server.ts are untouched. No import changes — startup
is loaded dynamically so src/server.ts does not statically import
src/startup.ts (which imports src/server.ts). Dynamic import avoids
the cycle.
4. src/__tests__/startup.test.ts — test case list
24 in-process tests plus one subprocess smoke test, organized into 7 describe blocks. All in-process tests:
- Call `__resetForTests()` in `afterEach`.
- Pass `registerSignalHandlers: false` by default; the sub-describe that tests signal installation explicitly enables it and cleans up.
- Pass a fake `exit` that pushes into a `number[]` so Jest does not die.
- Pass `createOptions: { transport: <InMemoryTransport half>, installGlobalHandlers: false, startupTimeoutMs: 5000 }` so `bootstrapFn` defaults work without real stdio.
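The fake `exit` and injected logger conventions could be bundled into one small helper, sketched below. The names `makeHarness`, `logs`, and `exits` are hypothetical, and the `Logger` and `ExitFn` signatures are assumptions, since the real option types live in the contract (§2.1) and are not reproduced here.

```typescript
// Assumed signatures; the real StartupOptions types are defined in the contract.
type Logger = (...args: unknown[]) => void;
type ExitFn = (code?: number) => void;

interface Harness {
  logs: string[];
  exits: number[];
  logger: Logger;
  exit: ExitFn;
}

// Hypothetical test helper: each test gets its own arrays, so nothing is
// shared through console.error and nothing terminates the test runner.
function makeHarness(): Harness {
  const logs: string[] = [];
  const exits: number[] = [];
  const logger: Logger = (...args) => {
    logs.push(args.map(String).join(' '));
  };
  const exit: ExitFn = (code) => {
    exits.push(code ?? 0); // record the exit code instead of exiting
  };
  return { logs, exits, logger, exit };
}

const h = makeHarness();
h.logger('[Startup] Phase 2 failed:', 'boom');
h.exit(1);
```

A test would then pass `h.logger` and `h.exit` into `startup()` and assert on `h.logs` and `h.exits` afterwards.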
describe 1 — happy path
1. `startup resolves with { ctx, db, elapsedMs }`: inject `initDbFn` that returns a fake handle, assert the returned object shape and the phase-2 log.
2. `Phase 1 log precedes Phase 2 log`: capture logs into an array, assert `[Startup] Phase 1:...` index < `[Startup] Phase 2:...` index.
3. `bootstrapFn runs before initDbFn`: track call order via shared counter; assert `bootstrap` timestamp < `initDb` timestamp.
4. `uses config.COLIBRI_DB_PATH when dbPath is not supplied`: inject an `initDbFn` that captures its argument; assert the argument equals `config.COLIBRI_DB_PATH`.
5. `honours a custom dbPath`: pass `dbPath: '/tmp/x.db'`; assert capture.
6. `emits [Startup] Complete in <N>ms on success`: regex the log.
7. `elapsedMs is computed from the injected nowMs`: inject a monotonic counter `nowMs`; assert `elapsedMs` equals the expected delta.
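For the `elapsedMs is computed from the injected nowMs` case, a deterministic clock makes the expected delta easy to state. The sketch below assumes a hypothetical `makeCounterClock` helper and relies on the skeleton calling `nowMs()` exactly twice on the success path (once for `phase1StartMs`, once when computing `elapsedMs`).

```typescript
// Hypothetical fake clock: returns start, start + step, start + 2 * step, ...
function makeCounterClock(start: number, step: number): () => number {
  let calls = 0;
  return () => start + step * calls++;
}

const nowMs = makeCounterClock(1000, 25);

// Mirrors the two reads in the skeleton's success path.
const phase1StartMs = nowMs();                          // first read: 1000
const elapsedMs = Math.floor(nowMs() - phase1StartMs);  // second read: 1025
console.log(elapsedMs); // prints: 25
```

With this clock the expected `elapsedMs` equals the step, independent of how long the test actually takes.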
describe 2 — Phase 2 failure
8. `rethrows the underlying error when exit does not terminate`: inject `initDbFn` throwing `Error('boom')`, fake `exit`; assert `startup()` rejects with the original `boom`.
9. `calls exit(1) on Phase 2 failure`: inject failing `initDbFn`, fake `exit`; assert `exits.includes(1)`.
10. `runs shutdown(phase-2-failed) before rejecting`: inject failing `initDbFn` + spy `stopFn` + spy `closeDbFn`; assert both spies called exactly once before the reject.
11. `emits [Startup] Phase 2 failed: <msg>`: regex the log.
12. `emits [Startup] Aborted after <N>ms`: regex the log.
13. `closeDb is called even when initDb throws`: asserted in (10) but explicitly documented as its own `expect` so regression is loud.
describe 3 — shutdown contract
14. `shutdown is idempotent — in-flight call returns same promise`: call `shutdown('a')` and `shutdown('b')` before either awaits, assert `stopFn` invocation count is exactly 1.
15. `shutdown never throws`: inject `stopFn` that throws, assert `await expect(shutdown('x')).resolves.toBeUndefined()`.
16. `transport closes before DB`: track order in arrays, assert `stopFn.order < closeDbFn.order`.
17. `emits [Shutdown] Forced after 5000ms timeout when stopFn hangs`: pass `cleanupTimeoutMs: 50` + `stopFn` that never resolves; assert the timeout log and the subsequent `[Shutdown] Clean`.
18. `emits [Shutdown] stop failed when stopFn throws`: assert log, assert `closeDbFn` still ran.
19. `emits [Shutdown] close failed when closeDbFn throws`: assert log.
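The hanging-`stopFn` case depends on racing the stop call against a timer. A minimal sketch of that race in isolation follows; `stopWithTimeout` and `RaceResult` are hypothetical names, and unlike the skeleton this version does not call `timer.unref()`, so a standalone script stays alive until the timer fires.

```typescript
interface RaceResult {
  forced: boolean;
}

// Hypothetical helper mirroring the Promise.race pattern in the skeleton.
async function stopWithTimeout(
  stop: () => Promise<void>,
  timeoutMs: number,
): Promise<RaceResult> {
  let forced = false;
  let timer: ReturnType<typeof setTimeout> | undefined;
  try {
    await Promise.race([
      stop(),
      new Promise<void>((resolve) => {
        timer = setTimeout(() => {
          forced = true;
          resolve();
        }, timeoutMs);
      }),
    ]);
  } catch {
    // A failing stop() is swallowed here; the skeleton logs it instead.
  } finally {
    if (timer !== undefined) clearTimeout(timer);
  }
  return { forced };
}

// A stop function that never settles is cut off by the timer ...
const hung = stopWithTimeout(() => new Promise<void>(() => {}), 20);
// ... while one that settles promptly wins the race and the timer is cleared.
const quick = stopWithTimeout(async () => {}, 20);
```

Awaiting `hung` should yield `forced: true` and awaiting `quick` should yield `forced: false`.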
describe 4 — signals
20. `registers SIGINT and SIGTERM when enabled`: count listeners before/after startup; assert `+1` each.
21. `does NOT register signals when option is false`: count listeners; assert unchanged.
22. `SIGINT triggers shutdown + exit(0)`: fire `process.emit('SIGINT', 'SIGINT')`, await a microtask, assert shutdown ran + `exits` contains `0`.
23. `signal handler that throws calls exit(1)`: inject a `stopFn` that throws and a `closeDbFn` that throws; both are logged and `shutdown` does not throw. Because `shutdown` swallows errors, injected failures alone cannot reach the exit-1 branch. Reach it either by wrapping `shutdown` in a spy that rejects, or assert only the exit-0 path here and keep the exit-1 branch tested via the `_`-prefixed internal (we document that the exit-1 path is defense-in-depth coverage). See R-9.
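The listener-count assertions and the synthetic `process.emit('SIGINT', 'SIGINT')` call can be tried in isolation. The sketch below uses only Node's `process` event-emitter methods; `onSigint` and `received` are illustrative names, not part of the planned test file.

```typescript
const received: string[] = [];
const onSigint: NodeJS.SignalsListener = (signal) => {
  received.push(signal);
};

const before = process.listenerCount('SIGINT');
process.on('SIGINT', onSigint);
const during = process.listenerCount('SIGINT'); // before + 1

// Synchronously invokes the registered listeners; no OS signal is sent.
process.emit('SIGINT', 'SIGINT');

process.off('SIGINT', onSigint);
const after = process.listenerCount('SIGINT'); // back to the baseline
```

Comparing against a `before` baseline rather than a fixed number keeps the assertion valid if the host process already has its own listeners.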
describe 5 — re-entry guard
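24. `second startup() call throws`: call `startup()` successfully, then call it again in the same test. Because `startup` is an async function, the guard surfaces as a rejected promise, so assert `.rejects.toThrow('startup() already invoked')`.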
describe 6 — signal-handler leakage regression
Counted inside describe 4; a final assertion confirms
process.listenerCount('SIGINT') is zero after each test thanks to
__resetForTests.
describe 7 — subprocess smoke
25. `main() IIFE smoke — script invocation boots, logs [Startup] Phase 1`: `spawnSync` `tsx src/server.ts` with `NODE_ENV=test`, short timeout. Assert `[colibri] starting` (from `bootstrap() → start()`) AND `[Startup] Phase 1: transport...` both appear on stderr. Matches the server.test.ts pattern.
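The shape of the subprocess smoke test can be sketched with `spawnSync` from `node:child_process`. To keep the example self-contained, the child below is a stand-in one-line Node script that writes the expected marker to stderr; the real test would spawn `tsx src/server.ts` as described above. The 10-second timeout is an assumed value.

```typescript
import { spawnSync } from 'node:child_process';

// Stand-in child process (assumption): prints the Phase 1 marker to stderr.
// The planned test spawns `tsx src/server.ts` instead.
const child = spawnSync(
  process.execPath,
  ['-e', "console.error('[Startup] Phase 1: transport...')"],
  {
    encoding: 'utf8', // return stdout/stderr as strings
    timeout: 10_000,  // assumed value; the packet only says "short timeout"
    env: { ...process.env, NODE_ENV: 'test' },
  },
);

const sawPhase1 = child.stderr.includes('[Startup] Phase 1: transport...');
```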
Target coverage on src/startup.ts:
- Stmt ≥ 95%
- Branch ≥ 90% (contract I-10)
- Func 100%
- Line ≥ 95%
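These targets can be checked mechanically against a per-file coverage entry. The sketch below assumes the entry has the shape produced by Istanbul's `json-summary` reporter (`statements`, `branches`, `functions`, `lines`, each with a `pct` field); `unmetTargets` is a hypothetical helper, and the sample numbers are invented for illustration, not measured results.

```typescript
interface Metric {
  total: number;
  covered: number;
  skipped: number;
  pct: number;
}

// Assumed shape of one file entry in a json-summary coverage report.
interface FileCoverage {
  statements: Metric;
  branches: Metric;
  functions: Metric;
  lines: Metric;
}

// Targets for src/startup.ts, taken from the list above.
const targets = { statements: 95, branches: 90, functions: 100, lines: 95 } as const;

// Returns the names of the metrics that fall below their target.
function unmetTargets(cov: FileCoverage): string[] {
  return (Object.keys(targets) as Array<keyof typeof targets>).filter(
    (key) => cov[key].pct < targets[key],
  );
}

// Invented sample data for illustration only.
const sample: FileCoverage = {
  statements: { total: 100, covered: 97, skipped: 0, pct: 97 },
  branches: { total: 40, covered: 35, skipped: 0, pct: 87.5 },
  functions: { total: 6, covered: 6, skipped: 0, pct: 100 },
  lines: { total: 95, covered: 92, skipped: 0, pct: 96.84 },
};

const unmet = unmetTargets(sample);
console.log(unmet); // prints: [ 'branches' ]
```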
5. Files changed — final list
| Path | Action | Lines (approx) |
|---|---|---|
| `src/startup.ts` | Create | ~230 |
| `src/server.ts` | Edit (3-line swap) | +2 / -1 |
| `src/__tests__/startup.test.ts` | Create | ~650 |
| `docs/audits/p0-2-3-two-phase-startup-audit.md` | Create (step 1) | 279 |
| `docs/contracts/p0-2-3-two-phase-startup-contract.md` | Create (step 2) | ~340 |
| `docs/packets/p0-2-3-two-phase-startup-packet.md` | Create (step 3, this file) | ~370 |
| `docs/verification/p0-2-3-two-phase-startup-verification.md` | Create (step 5) | ~200 |
Zero edits to package.json, tsconfig.json, jest.config.ts, ESLint
config, src/config.ts, src/modes.ts, src/db/*, src/domains/*, or
any sibling test file — the batch-lock list from §9 of the audit.
6. Risk mitigations
- R-1 (server.test.ts IIFE smoke). The subprocess test asserts `[colibri] starting`, emitted by `start()`, which is called by `bootstrap()`, which is now called by `startup()`. Preserved.
- R-2 (argv1 undefined). The `if (isInvokedAsScript())` guard is preserved byte-for-byte. The import-only path stays silent (no `startup` call, no `bootstrap` call).
- R-3 (module-scope signal leak). Signal handlers are installed inside `startup()` only. `__resetForTests()` + `shutdown()` remove them.
- R-4 (process.exit kills Jest). All tests inject a fake `exit`.
- R-5 (sync initDb in async startup). Handled: try/catch around a sync throw works transparently inside an async function.
- R-6 (parallel batch collision). Wave C siblings own `src/domains/*` only; none touch `src/server.ts`, `src/startup.ts`, or `src/__tests__/`.
- R-7 (dynamic import cycle). `src/server.ts` uses `await import('./startup.js')` inside the IIFE so the import is triggered only when `isInvokedAsScript()` is true; Jest loading `src/server.ts` never pulls `src/startup.ts` via this path.
- R-8 (shutdown reentrancy during signal burst). `shutdownPromise` is cached and returned; a second SIGINT sees the in-flight promise.
- R-9 (signal-handler exit-1 branch). The exit-1 branch is reached only if `shutdown()` rejects. Since `shutdown` never throws by contract (§4.1), the exit-1 branch is defense-in-depth. We cover it by injecting a fake `shutdown` mock OR by direct invocation of the internal handler using a thrown `Error` from the exit fn itself. If branch coverage slips <90%, we add a direct call to the handler with a temporarily-broken `shutdown` spy.
- R-10 (log-order flakiness). `logger` is injected; each test uses its own array. No shared `console.error` reliance.
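The claim in R-5 can be demonstrated in a few lines: a synchronous throw inside an async function is caught by an ordinary `try`/`catch` in that function. In this sketch `failingInitDb`, `phase2`, and `seen` are hypothetical stand-ins for `initDbFn`, the Phase 2 block, and the injected logger.

```typescript
// Stand-in for an initDbFn that fails synchronously.
function failingInitDb(): never {
  throw new Error('boom');
}

const seen: string[] = [];

// Stand-in for the Phase 2 block of startup().
async function phase2(initDb: () => unknown): Promise<boolean> {
  try {
    initDb(); // synchronous call; a throw lands in the catch below
    return true;
  } catch (err) {
    seen.push(err instanceof Error ? err.message : String(err));
    return false;
  }
}

// The promise resolves to false; the error never escapes as a rejection.
const outcome = phase2(failingInitDb);
```

Because no `await` precedes the failing call in this sketch, the catch block has already run by the time `phase2` returns its promise.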
7. Verification script for Step 5
cd .worktrees/claude/p0-2-3-two-phase-startup
npm ci
npm run lint
npm test # full suite, including new startup.test.ts
npm run build
All four commands must exit 0. Coverage for src/startup.ts is
extracted from coverage/lcov-report/startup.ts.html (or the JSON
summary) and copied into the verification doc along with the full test
output.
8. Exit criteria for step 3 (this packet)
- Commit plan with 5 entries (§1).
- `src/startup.ts` skeleton with exports (§2).
- `src/server.ts` exact diff sketch (§3).
- Test case list with ≥ 24 tests + subprocess smoke (§4).
- Final files list within batch-lock limits (§5).
- Risk mitigations R-1 through R-10 (§6).
- Verification commands (§7).
Packet approved (Sigma pre-approved via dispatch prompt). Proceed to step 4 (implement).