P0.2.3 — Two-Phase Startup — Contract

1. Intent

Introduce a single async orchestrator, startup(), that sequences the P0.2.1 MCP transport and the P0.2.2 SQLite initializer into two visible phases with graceful shutdown. Phase 1 runs fast and makes the MCP handshake answerable; Phase 2 opens the database. Heavy work is deferred until after the transport is live so that database initialization cannot delay the MCP handshake past the client's timeout; this is the donor-bug-#4 mitigation that P0.2.1 established and this task upholds.

This contract fixes the public surface, the guarantees each phase owes its caller, and the error / shutdown flows. It gates Step 3 (packet).

2. Public surface

2.1. New module src/startup.ts

export interface StartupOptions {
  readonly createOptions?: CreateServerOptions;
  readonly dbPath?: string;                    // defaults to config.COLIBRI_DB_PATH
  readonly initDbFn?: (path: string) => Database.Database;
  readonly closeDbFn?: () => void;
  readonly bootstrapFn?: (opts: BootstrapOptions) => Promise<ColibriServerContext>;
  readonly stopFn?: (ctx: ColibriServerContext) => Promise<void>;
  readonly exit?: (code: number) => void;
  readonly registerSignalHandlers?: boolean;   // default true
  readonly cleanupTimeoutMs?: number;          // default 5000
  readonly nowMs?: () => number;
  readonly logger?: (...args: unknown[]) => void;
}

export interface StartupResult {
  readonly ctx: ColibriServerContext;
  readonly db: Database.Database;
  readonly elapsedMs: number;
}

export function startup(options?: StartupOptions): Promise<StartupResult>;
export function shutdown(reason: string): Promise<void>;

/** @internal — test-only reset so each test gets a fresh module state. */
export function __resetForTests(): void;

Nothing else is exported. The contract pins this list.

2.2. src/server.ts delta (owned by this task)

Replace the bottom-of-file entry-point block (the isInvokedAsScript() guard):

// BEFORE (P0.2.1)
if (isInvokedAsScript()) {
  await bootstrap();
}

// AFTER (P0.2.3) — identical guard, swaps bootstrap() for startup()
if (isInvokedAsScript()) {
  const { startup } = await import('./startup.js');
  await startup();
}

No other lines of src/server.ts change. bootstrap() stays exported, stays Phase-1-only, and is still used by the in-process bootstrap tests.

3. Phase invariants

3.1. Phase 1 — Transport (fast)

  • Composes bootstrapFn() (default bootstrap from ./server.js).
  • On return: ctx.server.isConnected() === true, server_ping is registered, and the [colibri] ready log has been emitted by start().
  • Database handle is not opened. getDb() still throws.
  • Elapsed time at the Phase-1 boundary is recorded via ctx.nowMs() (the same clock P0.2.1 uses so durations are comparable).
  • Any exception from bootstrapFn propagates out of startup(); the caller receives it. The injected exit is not called at this point — bootstrap’s own internal catch already did that.

3.2. Phase 2 — Heavy init

  • Runs only if Phase 1 resolved without throwing.
  • Emits [Startup] Phase 2: heavy-init... on stderr before work.
  • Calls initDbFn(dbPath) exactly once. Default initDbFn = initDb from ./db/index.js. Default dbPath = config.COLIBRI_DB_PATH.
  • On success:
    • db is the return value — the live handle.
    • startup() resolves with { ctx, db, elapsedMs } where elapsedMs is Math.floor(nowMs() - phase1StartMs).
    • Emits [Startup] Complete in <N>ms.
  • On failure:
    • Emits [Startup] Phase 2 failed: <message>.
    • Invokes the internal shutdown('phase-2-failed') which closes the transport and the DB (closeDbFn; safe no-op because initDb cleans up its own handle on throw).
    • Calls the injected exit(1). If exit does not terminate (tests), the promise rejects with the original error so the caller observes it.
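
The Phase 1 / Phase 2 sequencing above can be sketched as follows. This is a minimal illustration under stated assumptions, not the contracted implementation: startupSketch, SketchOptions, Ctx and Db are hypothetical stand-ins for startup(), StartupOptions, ColibriServerContext and Database.Database, and every dependency is passed in explicitly instead of being defaulted.

```typescript
// Hypothetical stand-ins for the real context and DB handle types.
type Ctx = { connected: boolean };
type Db = { path: string };

interface SketchOptions {
  bootstrapFn: () => Promise<Ctx>;
  initDbFn: (path: string) => Db;
  shutdownFn: (reason: string) => Promise<void>;
  exit: (code: number) => void;
  nowMs: () => number;
  logger: (...args: unknown[]) => void;
  dbPath: string;
}

async function startupSketch(
  o: SketchOptions,
): Promise<{ ctx: Ctx; db: Db; elapsedMs: number }> {
  const phase1StartMs = o.nowMs();

  // Phase 1: transport only. A rejection here propagates to the caller.
  o.logger('[Startup] Phase 1: transport...');
  const ctx = await o.bootstrapFn();
  o.logger('[Startup] Phase 1 ready');

  // Phase 2: heavy init, attempted only after Phase 1 resolved.
  o.logger('[Startup] Phase 2: heavy-init...');
  try {
    const db = o.initDbFn(o.dbPath);
    const elapsedMs = Math.floor(o.nowMs() - phase1StartMs);
    o.logger(`[Startup] Complete in ${elapsedMs}ms`);
    return { ctx, db, elapsedMs };
  } catch (err) {
    const message = err instanceof Error ? err.message : String(err);
    o.logger(`[Startup] Phase 2 failed: ${message}`);
    o.logger(`[Startup] Aborted after ${Math.floor(o.nowMs() - phase1StartMs)}ms`);
    await o.shutdownFn('phase-2-failed');
    o.exit(1);
    // Reached only when exit() does not terminate the process (tests).
    throw err;
  }
}
```

With a fake exit that only records its argument, a throwing initDbFn makes the returned promise reject with the original error, which is the behaviour the last bullet describes.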

3.3. Re-entrancy guard

startup() sets a module-level startupInvoked boolean at the very top. A second invocation throws:

Error('startup() already invoked')

Reset is ONLY possible via __resetForTests(). Production callers are intentionally locked out — a second startup would leak signal handlers and duplicate-register server_ping.
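
A minimal sketch of the guard, assuming startup() is declared async (so the throw surfaces to callers as a rejected promise). guardedStartup and resetForTestsSketch are hypothetical names standing in for startup() and __resetForTests().

```typescript
// Module-level flag, set before any other work happens.
let startupInvoked = false;

async function guardedStartup(): Promise<string> {
  if (startupInvoked) {
    throw new Error('startup() already invoked');
  }
  startupInvoked = true;
  // ... Phase 1 and Phase 2 would run here ...
  return 'started';
}

// Test-only escape hatch; production code has no way to clear the flag.
function resetForTestsSketch(): void {
  startupInvoked = false;
}
```

Setting the flag before the first await matters: two calls issued in the same tick would otherwise both pass the check.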

4. Shutdown contract

4.1. shutdown(reason: string): Promise<void>

  • Idempotent. A second call while the first is in flight returns the same in-flight promise; a call after completion is a no-op.
  • Emits [Shutdown] <reason> on entry.
  • Closes the transport by awaiting stopFn(ctx) (default stop from ./server.js). Transport cleanup races a cleanupTimeoutMs timer (default 5000 ms); on timeout, emits [Shutdown] Forced after 5000ms timeout and continues.
  • Closes the DB via closeDbFn() (default closeDb from ./db/index.js). Always safe — sync no-op when not initialized.
  • Removes the SIGINT and SIGTERM listeners this task’s startup() installed.
  • Emits [Shutdown] Clean on success.

shutdown() never throws. Internal errors from stopFn and closeDbFn are logged via logger with the prefixes [Shutdown] stop failed: and [Shutdown] close failed: respectively, swallowed, and cleanup continues. Rationale: during a shutdown we want both halves (transport + DB) to close regardless of which half threw.
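
The idempotency and never-throws rules can be illustrated with a small factory. This is a sketch under assumptions: makeShutdown is a hypothetical helper (the contract keeps this state at module scope), stopFn takes no ctx argument here, and the cleanup timeout and signal-listener removal are left out to keep the focus on promise sharing.

```typescript
type Logger = (...args: unknown[]) => void;

function makeShutdown(
  stopFn: () => Promise<void>,
  closeDbFn: () => void,
  logger: Logger,
): (reason: string) => Promise<void> {
  let shutdownPromise: Promise<void> | null = null;

  return function shutdown(reason: string): Promise<void> {
    // In-flight and later callers all receive the first call's promise.
    if (shutdownPromise !== null) return shutdownPromise;

    shutdownPromise = (async () => {
      let clean = true;
      logger(`[Shutdown] ${reason}`);
      try {
        await stopFn();
      } catch (err) {
        clean = false;
        logger('[Shutdown] stop failed:', err);
      }
      // The DB is closed even when the transport half threw.
      try {
        closeDbFn();
      } catch (err) {
        clean = false;
        logger('[Shutdown] close failed:', err);
      }
      if (clean) logger('[Shutdown] Clean');
    })();
    return shutdownPromise;
  };
}
```

Because both failure paths are caught inside the async body, the shared promise always resolves, so callers never need their own try/catch around shutdown().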

4.2. Signal handling

Inside startup(), when registerSignalHandlers !== false:

  • Register one-shot listeners on SIGINT and SIGTERM pointing at:
    async (): Promise<void> => {
      try {
        await shutdown(`signal-${signalName}`);
        exit(0);
      } catch (err) {
        logger('[Shutdown] signal handler failed:', err);
        exit(1);
      }
    };
    
  • Listeners are stored at module scope so shutdown() can remove them via process.off(...). This is R-3 from the audit: signal handlers are installed inside startup(), never at module import time, so they do not leak into Jest workers.
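
A sketch of installing and removing the listeners, assuming a generic EventEmitter as the signal source so the example runs without sending real signals; in production that target would be process. installSignalHandlers is a hypothetical helper, and it returns a remover closure where the contract stores the handlers at module scope.

```typescript
import { EventEmitter } from 'node:events';

type SignalName = 'SIGINT' | 'SIGTERM';

function installSignalHandlers(
  target: EventEmitter,
  shutdown: (reason: string) => Promise<void>,
  exit: (code: number) => void,
  logger: (...args: unknown[]) => void,
): () => void {
  const makeHandler = (signalName: SignalName) => async (): Promise<void> => {
    try {
      await shutdown(`signal-${signalName}`);
      exit(0);
    } catch (err) {
      logger('[Shutdown] signal handler failed:', err);
      exit(1);
    }
  };

  // Keep references so the exact same functions can be removed later.
  const sigintHandler = makeHandler('SIGINT');
  const sigtermHandler = makeHandler('SIGTERM');
  target.once('SIGINT', sigintHandler);
  target.once('SIGTERM', sigtermHandler);

  // Called from shutdown(); removing an already-fired listener is harmless.
  return () => {
    target.off('SIGINT', sigintHandler);
    target.off('SIGTERM', sigtermHandler);
  };
}
```

Nothing is registered until the function is called, which is the property R-3 asks for: importing the module leaves the listener counts untouched.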

4.3. Shutdown ordering

emit "[Shutdown] <reason>"
await Promise.race([stopFn(ctx), sleep(cleanupTimeoutMs)])
closeDbFn()
process.off('SIGINT', ...)
process.off('SIGTERM', ...)
emit "[Shutdown] Clean" | "[Shutdown] Forced after 5000ms timeout"

Transport first, DB second, signals last. Rationale: an inbound MCP call in flight should receive its response (or at least a transport-level error) before the DB handle it was reading from goes away.
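
The ordering and the timeout race can be sketched together. Assumptions in this sketch: orderedShutdown, cancellableSleep and ShutdownSteps are hypothetical names, stopFn takes no ctx argument, and the Forced message interpolates the configured timeout, whereas the contract text only shows the 5000 ms default.

```typescript
interface ShutdownSteps {
  stopFn: () => Promise<void>;
  closeDbFn: () => void;
  removeSignalHandlers: () => void;
  logger: (...args: unknown[]) => void;
  cleanupTimeoutMs: number;
}

// A timer promise that can be cancelled so it never keeps the process alive.
function cancellableSleep(ms: number): { promise: Promise<'forced'>; cancel: () => void } {
  let cancel: () => void = () => {};
  const promise = new Promise<'forced'>((resolve) => {
    const timer = setTimeout(() => resolve('forced'), ms);
    cancel = () => clearTimeout(timer);
  });
  return { promise, cancel };
}

async function orderedShutdown(reason: string, s: ShutdownSteps): Promise<void> {
  s.logger(`[Shutdown] ${reason}`);
  const timeout = cancellableSleep(s.cleanupTimeoutMs);
  let outcome: 'stopped' | 'forced' | 'failed' = 'failed';
  try {
    // 1. Transport first, bounded by the cleanup timeout.
    outcome = await Promise.race([
      s.stopFn().then(() => 'stopped' as const),
      timeout.promise,
    ]);
  } catch (err) {
    s.logger('[Shutdown] stop failed:', err);
  } finally {
    timeout.cancel();
  }
  // 2. DB second, 3. signal listeners last.
  try {
    s.closeDbFn();
  } catch (err) {
    outcome = 'failed';
    s.logger('[Shutdown] close failed:', err);
  }
  s.removeSignalHandlers();
  if (outcome === 'stopped') s.logger('[Shutdown] Clean');
  if (outcome === 'forced') s.logger(`[Shutdown] Forced after ${s.cleanupTimeoutMs}ms timeout`);
}
```

Cancelling the timer in the finally block is a design choice of this sketch: without it, a fast clean shutdown would still leave a pending timer for the rest of the timeout window.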

5. Error modes

| Where | Condition | startup() behaviour | Injected exit |
| --- | --- | --- | --- |
| Phase 1 | bootstrapFn throws BEFORE its internal catch | Rethrows | NOT called by startup() (bootstrap already called it) |
| Phase 1 | bootstrapFn returns (its catch fired, it called exit internally) | Resolves as if Phase 1 succeeded but ctx._registeredToolNames may be empty; Phase 2 still proceeds because Phase 1 returned. | Called by bootstrap internally |
| Phase 2 | initDbFn throws | Rejects with the original error after running shutdown('phase-2-failed') | Called with code 1 |
| Shutdown | stopFn throws | Logs, continues to closeDbFn | Not called by shutdown itself |
| Shutdown | closeDbFn throws | Logs, continues | Not called by shutdown itself |
| Signal | SIGINT / SIGTERM received | Runs shutdown(signal) then exit(0) on success, exit(1) on error | Called by the signal handler |
| Re-entry | startup() called twice | Throws Error('startup() already invoked') | Not called |

Note on the Phase-1-bootstrap-already-exited case: bootstrap() has its own try/catch that invokes exit(1) on Phase-1 failure. In production, exit is process.exit.bind(process) so the process is gone — Phase 2 never runs. In tests with a fake exit, bootstrap returns a ctx whose server.isConnected() === false; we still proceed to Phase 2 because the contract treats “bootstrap returned without throwing” as Phase-1 success. Tests that want the full Phase-1-failed path inject a bootstrapFn that rejects.

6. Log format

All via logger (default console.error):

| Message | When |
| --- | --- |
| [Startup] Phase 1: transport... | Before bootstrapFn() |
| [Startup] Phase 1 ready | After bootstrapFn() resolves |
| [Startup] Phase 2: heavy-init... | Before initDbFn(dbPath) |
| [Startup] Complete in <N>ms | After Phase 2 success |
| [Startup] Phase 2 failed: <message> | In the catch block |
| [Startup] Aborted after <N>ms | Appended after the “failed” line |
| [Shutdown] <reason> | Entry to shutdown() |
| [Shutdown] stop failed: <err> | When stopFn throws |
| [Shutdown] close failed: <err> | When closeDbFn throws |
| [Shutdown] signal handler failed: <err> | When a signal handler’s path throws |
| [Shutdown] Forced after 5000ms timeout | On the cleanup race timeout |
| [Shutdown] Clean | After clean cleanup |

Stderr only. Stdout is owned by the MCP stdio transport (donor bug #3 + S17).

7. Test seams

The StartupOptions interface is the sole dependency-injection seam. Every default reads a binding exported by another module; every default is overridable via the options bag. Test patterns required:

| Test path | Technique |
| --- | --- |
| Happy path | Inject fake bootstrapFn + initDbFn that resolve; assert StartupResult fields. |
| Phase 2 fails | Inject initDbFn that throws; assert shutdown ran, exit(1) was called, promise rejects. |
| Graceful SIGINT | Register signals, fire process.emit('SIGINT', 'SIGINT'), assert shutdown ran + exit(0). |
| Graceful SIGTERM | Same, with SIGTERM. |
| Re-entry guard | Call startup() twice, expect throw. Reset via __resetForTests. |
| Signal-handler install gated by option | Pass registerSignalHandlers: false, assert process.listenerCount unchanged. |
| Cleanup timeout | Inject a stopFn that never resolves with cleanupTimeoutMs: 50, assert [Shutdown] Forced log. |
| stopFn throws | Inject a throwing stopFn, assert [Shutdown] stop failed: log and DB still closed. |
| closeDbFn throws | Inject a throwing closeDbFn, assert log and shutdown completes. |
| Idempotent shutdown | Call shutdown() twice in flight, assert stopFn called exactly once. |

__resetForTests() clears module state between tests:

export function __resetForTests(): void {
  startupInvoked = false;
  shutdownPromise = null;
  activeCtx = null;
  // Remove any still-installed signal listeners (defensive).
  if (sigintHandler !== null) {
    process.off('SIGINT', sigintHandler);
    sigintHandler = null;
  }
  if (sigtermHandler !== null) {
    process.off('SIGTERM', sigtermHandler);
    sigtermHandler = null;
  }
}

afterEach in the test file MUST call __resetForTests() so individual tests do not leak state.

8. Invariants — enumerated

  1. I-1. startup() MUST NOT call initDbFn before bootstrapFn resolves. Verified by order-of-calls test.
  2. I-2. startup() MUST emit Phase 1 logs BEFORE any Phase 2 log. Verified by log-order assertion in happy-path test.
  3. I-3. On Phase 2 failure, closeDbFn MUST be called EVEN IF the DB was never successfully opened. This is safe because initDb cleans up its own handle on throw, which leaves closeDb with nothing to close (a no-op).
  4. I-4. shutdown() MUST close the transport before the DB.
  5. I-5. shutdown() MUST NOT throw. Internal errors are logged and swallowed.
  6. I-6. startup() MUST reject with the original Phase 2 error when the injected exit does not terminate (tests). In production with process.exit, the reject is unreachable.
  7. I-7. startup() MUST register AT MOST ONE SIGINT listener and AT MOST ONE SIGTERM listener, regardless of how many times signals fire. (Node dispatches the listener once per signal; the listener itself guards against reentrancy via the shutdown() idempotency.)
  8. I-8. registerSignalHandlers: false MUST result in zero signal listeners added.
  9. I-9. bootstrap() and start() from src/server.ts remain exported unchanged. No test for this task modifies server.test.ts.
  10. I-10. Coverage on src/startup.ts ≥ 90% branch. All exported functions exercised. __resetForTests exercised implicitly via afterEach.

9. Backward compatibility

  • bootstrap() as called by src/__tests__/server.test.ts is unchanged.
  • The main() IIFE smoke test (server.test.ts describe 5) spawns tsx src/server.ts with NODE_ENV=test and asserts [colibri] starting on stderr. With the new tail calling startup(), that log line is still emitted by start() via bootstrap() via startup(). The assertion passes without modification.
  • The "isInvokedAsScript() returns false" test in server.test.ts describe 8 asserts the import-only path is silent. Under the new tail, the if (isInvokedAsScript()) guard is preserved byte-for-byte; the only change inside the guard is await bootstrap() → await (await import('./startup.js')).startup(). Silence is preserved.
  • package.json "main" remains src/server.ts. bin.colibri remains dist/server.js. No packaging changes.

10. What this task explicitly does NOT do

  • Does not register any new MCP tools.
  • Does not modify the 5-stage middleware chain (P0.2.4 owns the 11-stage split).
  • Does not add a database-health MCP tool (that is the P0.2.5 follow-up, already flagged in the Phase 0 task list).
  • Does not wire domain registration. The StartupResult exposes ctx and db; later P0.3+ tasks register their tools against the returned ctx.
  • Does not change data/colibri.db or any DB schema.
  • Does not modify src/config.ts — the DATABASE_PATH reference in the spec file is stale (the canonical key is COLIBRI_DB_PATH).

11. Exit criteria for step 2

  • StartupOptions, StartupResult, startup(), shutdown(), __resetForTests() defined (§2.1).
  • Exact edit to src/server.ts pinned (§2.2).
  • Phase 1 + Phase 2 invariants (§3).
  • Shutdown contract with ordering + signal handling (§4).
  • All error modes tabulated (§5).
  • Log format enumerated (§6).
  • Test seams enumerated (§7).
  • Invariants I-1 through I-10 (§8).
  • Backward compatibility confirmed against existing server tests (§9).
  • Non-goals listed (§10).

Ready for step 3 (packet).



Colibri — documentation-first MCP runtime. Apache 2.0 + Commons Clause.
