P0.2.4 — Health Check Tool — Contract

1. Intent

Register a single MCP tool — server_health — that answers a zero-argument probe with a snapshot of the server’s runtime state:

{
  "status": "ok",
  "version": "0.0.1",
  "uptime_ms": 1234,
  "db_tables": 0,
  "phase": "phase2",
  "mode": "FULL"
}

The tool exists so external clients can distinguish:

Is the server alive? — any response is evidence; status: "ok" makes the intent explicit.
Which build? — version (from package.json).
How long has it been up? — uptime_ms (monotonic, bootStart-relative).
Is the DB wired? — db_tables > 0 confirms migrations ran.
Which startup phase? — phase1 means “transport up, DB not yet” (the Phase 2 race window); phase2 means “DB open, all migrations applied”.
Which runtime mode? — mode enables clients to branch on READONLY etc.
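As an illustration, here is how a client might turn the payload into answers to the questions above. This is a minimal sketch. `HealthPayload` mirrors the output schema in §3.2. `interpretHealth` and the `HealthSummary` fields are hypothetical client-side helpers and are not part of this contract. Treating `READONLY` as "no writes" is an assumption about how a client would branch.

```typescript
// Mirrors the output schema in §3.2.
type RuntimeMode = 'FULL' | 'READONLY' | 'TEST' | 'MINIMAL';

interface HealthPayload {
  status: 'ok';
  version: string;
  uptime_ms: number;
  db_tables: number;
  phase: 'phase1' | 'phase2';
  mode: RuntimeMode;
}

// Hypothetical client-side summary; field names are illustrative only.
interface HealthSummary {
  alive: boolean;         // any response with status "ok"
  build: string;          // version from package.json
  uptimeSeconds: number;  // whole seconds since bootStart
  dbOpen: boolean;        // phase2: DB open, migrations applied
  hasTables: boolean;     // db_tables > 0 (expected false in P0.2.4)
  acceptsWrites: boolean; // assumption: READONLY means "do not write"
}

function interpretHealth(p: HealthPayload): HealthSummary {
  return {
    alive: p.status === 'ok',
    build: p.version,
    uptimeSeconds: Math.floor(p.uptime_ms / 1000),
    dbOpen: p.phase === 'phase2',
    hasTables: p.db_tables > 0,
    acceptsWrites: p.mode !== 'READONLY',
  };
}

const sample: HealthPayload = {
  status: 'ok',
  version: '0.0.1',
  uptime_ms: 1234,
  db_tables: 0,
  phase: 'phase2',
  mode: 'FULL',
};
const summary = interpretHealth(sample);
console.log(summary);
```

With the example payload, `dbOpen` is true and `hasTables` is false. That combination is the expected Phase 0 state.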

This contract fixes the public surface, invariants, error modes, and observability requirements. It gates Step 3 (packet).

2. Public surface

2.1. New module `src/tools/health.ts`

import type { ColibriServerContext } from '../server.js';

/**
 * Register `server_health` with the given server context. Must be called
 * before `start(ctx)` (MCP SDK forbids registration after connect).
 */
export function registerHealthTool(ctx: ColibriServerContext): void;

Exports a single function registerHealthTool. No other exports. No default export. No top-level side effects.

2.2. Extension to `src/server.ts` — `ColibriServerContext`

Two new fields appended to the context type (NON-readonly, unlike the existing readonly fields):

export interface ColibriServerContext {
  // ... existing fields (all readonly) ...

  /** P0.2.4: DB handle, set by startup.ts Phase 2 after `initDb()`. */
  db?: Database.Database;

  /** P0.2.4: current startup phase, set by bootstrap() and startup.ts. */
  phase?: 'phase1' | 'phase2';
}

Rationale for optional mutable fields:

readonly is inappropriate because bootstrap() and startup.ts must assign them after construction.
undefined is the correct default — Phase 1 may run briefly without any phase value (between createServer and the ctx.phase = 'phase1' write in bootstrap), and db is never set until Phase 2.
Moving the fields behind a getter adds complexity without benefit — tests can assign directly.

2.3. Extension to `src/server.ts` — `bootstrap()`

Two additions, in order:

export async function bootstrap(
  options: BootstrapOptions = {},
): Promise<ColibriServerContext> {
  const exit = options.exit ?? process.exit.bind(process);
  const ctx = createServer(options.createOptions ?? {});
  try {
    ctx.phase = 'phase1';                        // ← NEW, before registration
    registerColibriTool(ctx, 'server_ping', ...);
    registerHealthTool(ctx);                     // ← NEW, after server_ping
    await start(ctx);
    return ctx;
  } catch (err) {
    ctx.logger('[colibri] fatal:', err);
    exit(1);
    return ctx;
  }
}

bootstrap() imports registerHealthTool at the top of server.ts.
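The sketch below shows why the order matters. It assumes a hypothetical `FakeServer` that models only the rule cited in §2.1, namely no tool registration after connect. `miniBootstrap` follows the steps of `bootstrap()`. For brevity, the async `start(ctx)` is reduced to a synchronous `connect()`. None of these names belong to the contract.

```typescript
// Hypothetical stand-in for the MCP server: it models only the
// "no tool registration after connect" rule cited in §2.1.
class FakeServer {
  private connected = false;
  readonly tools: string[] = [];

  registerTool(name: string): void {
    if (this.connected) {
      throw new Error(`cannot register ${name} after connect`);
    }
    this.tools.push(name);
  }

  connect(): void {
    this.connected = true;
  }
}

interface MiniCtx {
  server: FakeServer;
  phase?: 'phase1' | 'phase2';
}

// Same order as bootstrap(): phase first, then tools, then start.
function miniBootstrap(ctx: MiniCtx): MiniCtx {
  ctx.phase = 'phase1';
  ctx.server.registerTool('server_ping');
  ctx.server.registerTool('server_health');
  ctx.server.connect(); // stands in for `await start(ctx)`
  return ctx;
}

const ctx = miniBootstrap({ server: new FakeServer() });
console.log(ctx.server.tools, ctx.phase);
```

If registration moved after `connect()`, the fake would throw. The ordering in `bootstrap()` avoids exactly that failure.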

2.4. Extension to `src/startup.ts` — Phase 2

Two lines are added to the Phase 2 try block, immediately after the initDbFn call:

try {
  const db = initDbFn(dbPath);
  ctx.db = db;                                   // ← NEW
  ctx.phase = 'phase2';                          // ← NEW
  const elapsedMs = Math.floor(nowMs() - phase1StartMs);
  logger(`[Startup] Complete in ${elapsedMs}ms`);
  return { ctx, db, elapsedMs };
}

If initDbFn throws, neither mutation runs — ctx.phase remains 'phase1' (set by bootstrap), ctx.db remains undefined. server_health can still answer with { phase: 'phase1', db_tables: 0 }.
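A small sketch can check the ordering argument. `Db`, `Ctx`, and `runPhase2` are hypothetical reductions of the real types and of the Phase 2 try block, with logging and timing omitted. The contract does not specify the catch branch of the real startup, so here the error simply propagates to the caller. The `'colibri.db'` path is a placeholder.

```typescript
// Hypothetical minimal shapes; the real types live in server.ts / startup.ts.
interface Db {
  close(): void;
}

interface Ctx {
  db?: Db;
  phase?: 'phase1' | 'phase2';
}

// Mirrors the Phase 2 try block: both ctx mutations sit after initDbFn,
// so a throw from initDbFn leaves ctx exactly as bootstrap() left it.
function runPhase2(ctx: Ctx, initDbFn: (path: string) => Db, dbPath: string): Db {
  const db = initDbFn(dbPath);
  ctx.db = db;
  ctx.phase = 'phase2';
  return db;
}

// Failure path: initDbFn throws before any mutation.
const failed: Ctx = { phase: 'phase1' };
try {
  runPhase2(failed, () => { throw new Error('cannot open database'); }, 'colibri.db');
} catch (err) {
  console.log('phase 2 failed:', (err as Error).message);
}
console.log(failed.phase, failed.db); // phase1 undefined

// Success path: both fields are set.
const ok: Ctx = { phase: 'phase1' };
runPhase2(ok, () => ({ close: () => {} }), 'colibri.db');
console.log(ok.phase, ok.db !== undefined); // phase2 true
```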

3. Tool input / output schema

3.1. Input schema

z.object({})

No arguments. Extraneous fields are silently discarded (Zod default). Passing a non-object (e.g. "foo") is rejected by stage 2 with the standard INVALID_PARAMS envelope.
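The next sketch makes the accepted and rejected shapes concrete. It is dependency-free and emulates how an empty Zod object schema treats its input in the default (strip) mode. `parseEmptyObject` is a hypothetical stand-in written for illustration, and its issue text is only illustrative. The real stage 2 calls Zod.

```typescript
type ParseResult =
  | { success: true; data: Record<string, never> }
  | { success: false; issue: string };

// Emulates z.object({}) in its default mode: plain objects pass with
// unknown keys stripped; null, arrays, and primitives are rejected.
function parseEmptyObject(input: unknown): ParseResult {
  if (typeof input !== 'object' || input === null || Array.isArray(input)) {
    const received = input === null ? 'null' : Array.isArray(input) ? 'array' : typeof input;
    return { success: false, issue: `Expected object, received ${received}` };
  }
  // Unknown keys are dropped rather than rejected, so the handler always sees {}.
  return { success: true, data: {} };
}

console.log(parseEmptyObject({ verbose: true }));
console.log(parseEmptyObject('foo'));
```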

3.2. Output schema

z.object({
  status: z.literal('ok'),
  version: z.string().min(1),
  uptime_ms: z.number().int().nonnegative(),
  db_tables: z.number().int().nonnegative(),
  phase: z.enum(['phase1', 'phase2']),
  mode: z.enum(['FULL', 'READONLY', 'TEST', 'MINIMAL']),
})

All six fields are required. No optional fields. The enum for mode mirrors RUNTIME_MODES from src/modes.ts.
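A client or test that cannot import the Zod schema could check the same constraints by hand. The following is a minimal sketch. `MODES` and `PHASES` are local copies of the enum values. The contract says the mode list itself lives in `RUNTIME_MODES` in src/modes.ts. `isHealthPayload` is a hypothetical name. Like `z.object`, the guard ignores extra keys rather than rejecting them.

```typescript
// Local copies of the enum values in §3.2 (the real mode list is RUNTIME_MODES in src/modes.ts).
const MODES = ['FULL', 'READONLY', 'TEST', 'MINIMAL'] as const;
const PHASES = ['phase1', 'phase2'] as const;

interface HealthPayload {
  status: 'ok';
  version: string;
  uptime_ms: number;
  db_tables: number;
  phase: (typeof PHASES)[number];
  mode: (typeof MODES)[number];
}

// Equivalent of z.number().int().nonnegative().
function isNonNegativeInt(v: unknown): v is number {
  return typeof v === 'number' && Number.isInteger(v) && v >= 0;
}

// Hand-written counterpart of the output schema; extra keys are ignored.
function isHealthPayload(v: unknown): v is HealthPayload {
  if (typeof v !== 'object' || v === null) return false;
  const o = v as Record<string, unknown>;
  return (
    o.status === 'ok' &&
    typeof o.version === 'string' &&
    o.version.length >= 1 &&
    isNonNegativeInt(o.uptime_ms) &&
    isNonNegativeInt(o.db_tables) &&
    (PHASES as readonly unknown[]).includes(o.phase) &&
    (MODES as readonly unknown[]).includes(o.mode)
  );
}

const example: unknown = {
  status: 'ok', version: '0.0.1', uptime_ms: 1234, db_tables: 0, phase: 'phase2', mode: 'FULL',
};
console.log(isHealthPayload(example)); // true
```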

4. Handler behaviour

4.1. Pseudocode

function handler(): HealthPayload {
  const uptime_ms = Math.floor(ctx.nowMs() - ctx.bootStartMs);
  const db_tables = countTables(ctx.db);
  const phase = ctx.phase ?? 'phase1';
  return {
    status: 'ok',
    version: ctx.version,
    uptime_ms,
    db_tables,
    phase,
    mode: ctx.mode,
  };
}

function countTables(db: Database.Database | undefined): number {
  if (db === undefined) return 0;
  try {
    const row = db
      .prepare(
        "SELECT COUNT(*) AS c FROM sqlite_master " +
          "WHERE type = 'table' AND name NOT LIKE 'sqlite_%'",
      )
      .get() as { c: number } | undefined;
    return row?.c ?? 0;
  } catch {
    return 0;
  }
}
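The pseudocode can be made runnable without a real database. This sketch assumes that only the `prepare(sql).get()` subset of a better-sqlite3 `Database` is needed, modelled here as the hypothetical `MiniDb` interface. `makeHealthHandler` is a hypothetical factory that closes over `ctx`, as the handler in §4.1 does. The fake DB handles let each branch run in isolation.

```typescript
type Phase = 'phase1' | 'phase2';
type RuntimeMode = 'FULL' | 'READONLY' | 'TEST' | 'MINIMAL';

// Assumption: only prepare(sql).get() of a better-sqlite3 Database is used.
interface MiniDb {
  prepare(sql: string): { get(): unknown };
}

interface HealthCtx {
  readonly version: string;
  readonly mode: RuntimeMode;
  readonly bootStartMs: number;
  readonly nowMs: () => number;
  db?: MiniDb;
  phase?: Phase;
}

interface HealthPayload {
  status: 'ok';
  version: string;
  uptime_ms: number;
  db_tables: number;
  phase: Phase;
  mode: RuntimeMode;
}

const COUNT_TABLES_SQL =
  "SELECT COUNT(*) AS c FROM sqlite_master " +
  "WHERE type = 'table' AND name NOT LIKE 'sqlite_%'";

function countTables(db: MiniDb | undefined): number {
  if (db === undefined) return 0; // no DB yet (Phase 1)
  try {
    const row = db.prepare(COUNT_TABLES_SQL).get() as { c: number } | undefined;
    return row?.c ?? 0; // row present / absent
  } catch {
    return 0; // any thrown value, Error or not
  }
}

// Hypothetical factory: the handler closes over ctx, as in §4.1.
function makeHealthHandler(ctx: HealthCtx): () => HealthPayload {
  return () => ({
    status: 'ok',
    version: ctx.version,
    uptime_ms: Math.floor(ctx.nowMs() - ctx.bootStartMs),
    db_tables: countTables(ctx.db),
    phase: ctx.phase ?? 'phase1', // defensive default
    mode: ctx.mode,
  });
}

// Usage with a fake clock and fake DB handles.
let clock = 1234.9;
const ctx: HealthCtx = { version: '0.0.1', mode: 'FULL', bootStartMs: 0, nowMs: () => clock };
const health = makeHealthHandler(ctx);

console.log(health()); // phase 'phase1', db_tables 0, uptime_ms 1234

ctx.db = { prepare: () => ({ get: () => ({ c: 3 }) }) };
ctx.phase = 'phase2';
console.log(health().db_tables); // 3

ctx.db = { prepare: () => { throw new Error('connection is not open'); } };
console.log(health().db_tables); // 0
```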

4.2. Synchronous

The handler is synchronous (() => HealthPayload, not () => Promise<HealthPayload>). registerColibriTool accepts either shape, since its handler return type is declared as Promise<unknown> | unknown. Staying synchronous avoids an unnecessary microtask hop. It does not by itself guarantee the < 100 ms bound, which §5.1 covers.

4.3. `ctx.phase ?? 'phase1'` default

If phase is undefined, the handler defensively returns 'phase1'. This should not happen in production, because bootstrap sets the phase before start. The fallback narrows phase from 'phase1' | 'phase2' | undefined to 'phase1' | 'phase2', as the output schema's z.enum requires.

4.4. Never throws

The db_tables query runs inside countTables' untyped try { ... } catch { return 0 }. Any value thrown on that path therefore yields db_tables: 0, whether it is a SqliteError, the error raised by a closed handle, or even a non-Error. All other operations (arithmetic, property reads) cannot throw on a well-formed ctx.

5. Invariants

5.1. Response time < 100 ms

The chain overhead (stages 1-5 plus SDK envelope serialisation) is typically 1-5 ms on Node 20 under load. The SELECT COUNT(*) FROM sqlite_master query scans only the schema table. That table holds one row per schema object, of which Phase 0 has only a handful, so the query is expected to take < 1 ms. Tests pin a < 100 ms ceiling.
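One way to pin the ceiling is to take the worst case over repeated calls rather than a single sample. This is a minimal sketch. `worstCaseMs` is a hypothetical helper, and the stand-in handler builds the payload with no DB attached. It measures only the in-process handler. An end-to-end test would instead time the full tool call through the client, which also includes stages 1-5 and SDK serialisation.

```typescript
// Worst-case latency of a synchronous function over several calls.
function worstCaseMs(fn: () => unknown, iterations = 50): number {
  let worst = 0;
  for (let i = 0; i < iterations; i++) {
    const t0 = performance.now();
    fn();
    const elapsed = performance.now() - t0;
    if (elapsed > worst) worst = elapsed;
  }
  return worst;
}

// Stand-in handler: a hypothetical payload builder with no DB attached.
const bootStartMs = performance.now();
const handler = () => ({
  status: 'ok' as const,
  version: '0.0.1',
  uptime_ms: Math.floor(performance.now() - bootStartMs),
  db_tables: 0,
  phase: 'phase1' as const,
  mode: 'FULL' as const,
});

const worst = worstCaseMs(handler);
console.log(`worst of 50 calls: ${worst.toFixed(3)} ms`);
```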

5.2. `db_tables` accurate post-migration

Tests create an in-memory DB, execute a known number of CREATE TABLE statements, assign the DB to ctx.db, and assert that the returned count matches exactly. The count excludes sqlite_* internal tables.

5.3. `phase` matches runtime state

| Runtime state | `ctx.phase` | Expected in health |
|---|---|---|
| Before `bootstrap` | `undefined` | `'phase1'` (default) |
| After `ctx.phase = 'phase1'` in bootstrap, before `start` | `'phase1'` | `'phase1'` |
| After `start`, before `initDb` in startup | `'phase1'` | `'phase1'` |
| After `initDb` succeeds in startup | `'phase2'` | `'phase2'` |
| After `initDb` throws in startup | `'phase1'` (unchanged) | `'phase1'` |
| After shutdown closes DB (rare; the transport usually closes first) | `'phase2'` (stale), `ctx.db` closed | `'phase2'` + `db_tables: 0` (defensive catch) |

5.4. `version` matches package.json

ctx.version is read from package.json in createServer(). Tests assert the returned version equals the value read from package.json via readFileSync.

5.5. `mode` is one of the four

RuntimeMode is a closed union, and the output schema's z.enum enforces it. Tests instantiate the ctx with each of the four modes and assert that server_health returns that mode unchanged.

6. Error modes

6.1. Stage 2 schema rejection on non-object args

Calling server_health with extra, unexpected argument keys succeeds. Zod's z.object({}) strips unknown keys by default rather than rejecting them, so the handler sees {}. However, passing non-object arguments (e.g. a string) is rejected by stage 2 with:

{
  "ok": false,
  "error": {
    "code": "INVALID_PARAMS",
    "message": "schema validation failed",
    "details": { "issues": [ ... ] }
  }
}

Test: call with arguments: "foo" as any → response.isError === true, response.structuredContent.error.code === 'INVALID_PARAMS'.
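The test above inspects the tool result produced by stage 2. The following sketch shows the envelope wrapped into such a result. It assumes a hypothetical `invalidParams` helper and the `isError` / `structuredContent` / `content` fields of an MCP tool-call result. The JSON text copy in `content` is an assumption about how P0.2.1 serialises the envelope, and the issue text is illustrative.

```typescript
interface Issue {
  path: (string | number)[];
  message: string;
}

interface InvalidParamsEnvelope {
  ok: false;
  error: {
    code: 'INVALID_PARAMS';
    message: string;
    details: { issues: Issue[] };
  };
}

interface ToolResult {
  isError: boolean;
  structuredContent: InvalidParamsEnvelope;
  content: { type: 'text'; text: string }[];
}

// Hypothetical helper: wraps validation issues in the §6.1 envelope.
function invalidParams(issues: Issue[]): ToolResult {
  const envelope: InvalidParamsEnvelope = {
    ok: false,
    error: {
      code: 'INVALID_PARAMS',
      message: 'schema validation failed',
      details: { issues },
    },
  };
  return {
    isError: true,
    structuredContent: envelope,
    // Assumption: a JSON text copy for clients that ignore structuredContent.
    content: [{ type: 'text', text: JSON.stringify(envelope) }],
  };
}

// What a call with arguments: "foo" might produce.
const response = invalidParams([{ path: [], message: 'Expected object, received string' }]);
console.log(response.isError, response.structuredContent.error.code); // true INVALID_PARAMS
```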

6.2. Handler never throws

By construction (try/catch around the DB query). Even if ctx.db is a closed handle, ctx.db.prepare(...) throws because better-sqlite3 rejects calls on a closed connection. The error is caught, and db_tables: 0 is returned.

6.3. Stage 5 sink failure does not leak

Standard middleware invariant from P0.2.1: a stage-5 sink throw is caught within registerColibriTool's finally block and logged via ctx.logger('[colibri] audit-exit sink failed:', ...). The original response envelope is returned unchanged.
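The invariant relies on a try/catch nested inside `finally`, because a bare `finally` cannot catch anything. This is a minimal sketch. `runWithAuditExit` is a hypothetical reduction of the wrapper that returns the handler's value directly instead of the full response envelope, and the event shape is simplified.

```typescript
type Logger = (...args: unknown[]) => void;
type ExitSink = (event: { tool: string; ok: boolean }) => void;

// Hypothetical reduction of the stage 4/5 wrapper: run the handler, then emit
// the audit-exit event in finally while shielding the caller from sink errors.
function runWithAuditExit<T>(tool: string, handler: () => T, sink: ExitSink, logger: Logger): T {
  let ok = false;
  try {
    const result = handler();
    ok = true;
    return result; // the return value survives the finally block below
  } finally {
    try {
      sink({ tool, ok });
    } catch (err) {
      logger('[colibri] audit-exit sink failed:', err);
    }
  }
}

const logged: unknown[][] = [];
const value = runWithAuditExit(
  'server_health',
  () => 42,
  () => { throw new Error('sink down'); },
  (...args) => { logged.push(args); },
);
console.log(value, logged.length); // 42 1
```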

7. Observability

7.1. Stage 3 (audit-enter) event

Emitted once per call:

{
  tool: 'server_health',
  args: {},                 // empty object after Zod parse
  timestamp: performance.now() reading,
  correlationId: UUID v4,
}

7.2. Stage 5 (audit-exit) event

Emitted once per call (from the finally block):

{
  tool: 'server_health',
  correlationId: <matching>,
  durationMs: <floor(nowMs() - enterTs)>,
  result: { ok: true, data: HealthPayload },  // on success
  error: <Error>,                             // on failure (never happens in practice)
}

Tests use makeRecordingSink to assert both events fire exactly once and share the same correlationId.
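The pairing that such a test checks can be sketched as follows. The real `makeRecordingSink` API is not specified here, so `recordingSinkSketch` and `callWithAudit` are hypothetical stand-ins. The contract specifies UUID v4 correlation IDs (in Node, `crypto.randomUUID()` produces them). The sketch injects a counter instead, which keeps it deterministic.

```typescript
interface EnterEvent {
  tool: string;
  args: unknown;
  timestamp: number;
  correlationId: string;
}

interface ExitEvent {
  tool: string;
  correlationId: string;
  durationMs: number;
  result?: unknown;
  error?: unknown;
}

interface AuditSink {
  onEnter(e: EnterEvent): void;
  onExit(e: ExitEvent): void;
}

// Hypothetical recording sink: collects events for later inspection.
function recordingSinkSketch() {
  const enters: EnterEvent[] = [];
  const exits: ExitEvent[] = [];
  const sink: AuditSink = {
    onEnter: (e) => { enters.push(e); },
    onExit: (e) => { exits.push(e); },
  };
  return { sink, enters, exits };
}

// Stage 3 before the handler, stage 5 in finally, both with one correlationId.
function callWithAudit(
  tool: string,
  args: unknown,
  handler: () => unknown,
  sink: AuditSink,
  newId: () => string,
  nowMs: () => number = () => performance.now(),
): unknown {
  const correlationId = newId();
  const enterTs = nowMs();
  sink.onEnter({ tool, args, timestamp: enterTs, correlationId });
  let outcome: { result: unknown } | { error: unknown } = { result: undefined };
  try {
    const result = { ok: true, data: handler() };
    outcome = { result };
    return result;
  } catch (err) {
    outcome = { error: err };
    throw err;
  } finally {
    sink.onExit({ tool, correlationId, durationMs: Math.floor(nowMs() - enterTs), ...outcome });
  }
}

let n = 0;
const rec = recordingSinkSketch();
callWithAudit('server_health', {}, () => ({ status: 'ok' }), rec.sink, () => `id-${++n}`);
console.log(rec.enters.length, rec.exits.length); // 1 1
```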

7.3. No log lines

The handler does NOT call ctx.logger. Neither a success nor a failure log is emitted for the probe itself, because clients probing the endpoint frequently would otherwise flood stderr.

8. Non-goals

Not a deep liveness probe: beyond status: 'ok', we do not check DB connectivity by running a real query on a user table (there are no user tables in Phase 0 P0.2.4). db_tables > 0 becomes meaningful evidence only once later migrations create tables; Phase 0 P0.2.4 ships with db_tables: 0.
Not a versioned probe — a future server_health/v2 is a separate task; this is the v1 surface.
Not a readiness probe vs. liveness probe split — Phase 0 collapses both into the single phase field.
Not a multi-DB probe — Phase 0 has one DB (config.COLIBRI_DB_PATH).

9. Acceptance criteria mapping

Task spec (task-breakdown.md L140-151):

✅ Tool name: server_health (deviation #2 — underscore not slash)
✅ Returns: { status, version, uptime_ms, db_tables, phase, mode }
✅ status: "ok"
✅ version from package.json — already in ctx
✅ uptime_ms: Math.floor(ctx.nowMs() - ctx.bootStartMs)
✅ db_tables = count of SQLite tables excluding sqlite_*
✅ phase: "phase1" | "phase2" — reflects runtime state
✅ mode = current RuntimeMode
✅ Response time < 100 ms — asserted in tests
✅ 100% branch coverage — 6 branches mapped in §4.1

10. References

Task spec: docs/guides/implementation/task-breakdown.md L140-151
Audit: docs/audits/p0-2-4-health-audit.md
P0.2.1 contract: docs/contracts/p0-2-1-mcp-server-contract.md
P0.2.2 contract: docs/contracts/p0-2-2-sqlite-init-contract.md
P0.2.3 contract: docs/contracts/p0-2-3-two-phase-startup-contract.md
P0.4.1 contract: docs/contracts/p0-4-1-modes-contract.md