P0.2.4 — Health Check Tool — Contract

1. Intent

Register a single MCP tool — server_health — that answers a zero-argument probe with a snapshot of the server’s runtime state:

{
  "status": "ok",
  "version": "0.0.1",
  "uptime_ms": 1234,
  "db_tables": 0,
  "phase": "phase2",
  "mode": "FULL"
}

The tool exists so external clients can answer six questions:

  1. Is the server alive? Any response is evidence; status: "ok" makes the intent explicit.
  2. Which build? version (from package.json).
  3. How long has it been up? uptime_ms (monotonic, bootStart-relative).
  4. Is the DB wired? db_tables > 0 confirms migrations ran.
  5. Which startup phase? phase1 means “transport up, DB not yet” (the Phase 2 race window); phase2 means “DB open, all migrations applied”.
  6. Which runtime mode? mode lets clients branch on READONLY etc.
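
As an illustration of how a client might consume the six fields, the sketch below folds a payload into a few booleans. The HealthSummary type and summarizeHealth function are hypothetical names introduced here, and treating READONLY as "not writable" is an assumption about client policy, not something this contract specifies.

```typescript
// Shape of the payload shown above.
interface HealthPayload {
  status: 'ok';
  version: string;
  uptime_ms: number;
  db_tables: number;
  phase: 'phase1' | 'phase2';
  mode: 'FULL' | 'READONLY' | 'TEST' | 'MINIMAL';
}

// Hypothetical client-side view; not part of the server contract.
interface HealthSummary {
  alive: boolean;    // the server answered at all
  dbReady: boolean;  // Phase 2 finished, DB handle is open
  migrated: boolean; // at least one non-internal table exists
  writable: boolean; // assumption: clients avoid writes in READONLY
}

function summarizeHealth(p: HealthPayload): HealthSummary {
  return {
    alive: p.status === 'ok',
    dbReady: p.phase === 'phase2',
    migrated: p.db_tables > 0,
    writable: p.mode !== 'READONLY',
  };
}
```

A client polling during the Phase 2 race window could, for example, retry until dbReady is true before issuing DB-backed calls.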

This contract fixes the public surface, invariants, error modes, and observability requirements. It gates Step 3 (packet).

2. Public surface

2.1. New module src/tools/health.ts

import type { ColibriServerContext } from '../server.js';

/**
 * Register `server_health` with the given server context. Must be called
 * before `start(ctx)` (MCP SDK forbids registration after connect).
 */
export function registerHealthTool(ctx: ColibriServerContext): void;

Exports a single function registerHealthTool. No other exports. No default export. No top-level side effects.

2.2. Extension to src/server.ts: ColibriServerContext

Two new fields appended to the context type (NON-readonly, unlike the existing readonly fields):

export interface ColibriServerContext {
  // ... existing fields (all readonly) ...

  /** P0.2.4: DB handle, set by startup.ts Phase 2 after `initDb()`. */
  db?: Database.Database;

  /** P0.2.4: current startup phase, set by bootstrap() and startup.ts. */
  phase?: 'phase1' | 'phase2';
}

Rationale for optional mutable fields:

  • readonly is inappropriate because startup.ts must assign them post-construction.
  • undefined is the correct default — Phase 1 may run briefly without any phase value (between createServer and the ctx.phase = 'phase1' write in bootstrap), and db is never set until Phase 2.
  • Moving the fields behind a getter adds complexity without benefit — tests can assign directly.

2.3. Extension to src/server.ts: bootstrap()

Two additions, in order:

export async function bootstrap(
  options: BootstrapOptions = {},
): Promise<ColibriServerContext> {
  const exit = options.exit ?? process.exit.bind(process);
  const ctx = createServer(options.createOptions ?? {});
  try {
    ctx.phase = 'phase1';                        // ← NEW, before registration
    registerColibriTool(ctx, 'server_ping', ...);
    registerHealthTool(ctx);                     // ← NEW, after server_ping
    await start(ctx);
    return ctx;
  } catch (err) {
    ctx.logger('[colibri] fatal:', err);
    exit(1);
    return ctx;
  }
}

server.ts imports registerHealthTool (from src/tools/health.ts) at the top of the file, so it is in scope when bootstrap() runs.

2.4. Extension to src/startup.ts — Phase 2

Two lines added to the Phase 2 try block, after initDbFn:

try {
  const db = initDbFn(dbPath);
  ctx.db = db;                                   // ← NEW
  ctx.phase = 'phase2';                          // ← NEW
  const elapsedMs = Math.floor(nowMs() - phase1StartMs);
  logger(`[Startup] Complete in ${elapsedMs}ms`);
  return { ctx, db, elapsedMs };
}

If initDbFn throws, neither mutation runs — ctx.phase remains 'phase1' (set by bootstrap), ctx.db remains undefined. server_health can still answer with { phase: 'phase1', db_tables: 0 }.
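
The ordering guarantee above can be exercised in isolation. The following is a minimal sketch, assuming a stripped-down context (PhaseCtx) and a placeholder handle type (DbHandle); both are hypothetical stand-ins, and the real error handling in startup.ts is not shown in this contract.

```typescript
// Placeholder for the real Database.Database handle.
interface DbHandle {
  readonly path: string;
}

// Only the two mutable fields from section 2.2.
interface PhaseCtx {
  db?: DbHandle;
  phase?: 'phase1' | 'phase2';
}

// Mirrors the Phase 2 try block: both writes happen only after
// initDbFn returns, so a throw leaves ctx untouched.
function runPhase2(ctx: PhaseCtx, initDbFn: () => DbHandle): boolean {
  try {
    const db = initDbFn(); // may throw
    ctx.db = db;
    ctx.phase = 'phase2';
    return true;
  } catch {
    // Assumption: the caller decides how to report the failure.
    return false;
  }
}
```

Calling runPhase2 with an initDbFn that throws leaves ctx.phase at whatever bootstrap wrote and ctx.db undefined, which is the state server_health reports as phase1 with db_tables: 0.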

3. Tool input / output schema

3.1. Input schema

z.object({})

No arguments. Extraneous fields are silently discarded (Zod default). Passing a non-object (e.g. "foo") is rejected by stage 2 with the standard INVALID_PARAMS envelope.
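
To make the two input cases concrete without pulling in Zod, here is a dependency-free stand-in that mimics the behaviour described: plain objects are accepted and reduced to {}, anything else is rejected. The parseHealthArgs name and ParseResult type are hypothetical; the real stage 2 uses the Zod schema and also attaches details.issues, which this sketch omits.

```typescript
type ParseResult =
  | { ok: true; data: Record<string, never> }
  | { ok: false; error: { code: 'INVALID_PARAMS'; message: string } };

function parseHealthArgs(args: unknown): ParseResult {
  // Arrays and null are typeof 'object' but are not plain objects.
  const isPlainObject =
    typeof args === 'object' && args !== null && !Array.isArray(args);
  if (!isPlainObject) {
    return {
      ok: false,
      error: { code: 'INVALID_PARAMS', message: 'schema validation failed' },
    };
  }
  // Extraneous keys are dropped: the handler always sees {}.
  return { ok: true, data: {} };
}
```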

3.2. Output schema

z.object({
  status: z.literal('ok'),
  version: z.string().min(1),
  uptime_ms: z.number().int().nonnegative(),
  db_tables: z.number().int().nonnegative(),
  phase: z.enum(['phase1', 'phase2']),
  mode: z.enum(['FULL', 'READONLY', 'TEST', 'MINIMAL']),
})

All six fields are required. No optional fields. The enum for mode mirrors RUNTIME_MODES from src/modes.ts.
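
A client that does not depend on Zod could check the same constraints with a hand-written type guard. This is only a sketch: isHealthPayload and isNonNegativeInt are hypothetical helpers, and the mode list is copied from the schema above rather than imported from src/modes.ts.

```typescript
interface HealthPayload {
  status: 'ok';
  version: string;
  uptime_ms: number;
  db_tables: number;
  phase: 'phase1' | 'phase2';
  mode: 'FULL' | 'READONLY' | 'TEST' | 'MINIMAL';
}

// Copied from the output schema; the server's source of truth is RUNTIME_MODES.
const KNOWN_MODES: readonly string[] = ['FULL', 'READONLY', 'TEST', 'MINIMAL'];

function isNonNegativeInt(v: unknown): v is number {
  return typeof v === 'number' && Number.isInteger(v) && v >= 0;
}

function isHealthPayload(value: unknown): value is HealthPayload {
  if (typeof value !== 'object' || value === null) return false;
  const v = value as Record<string, unknown>;
  return (
    v.status === 'ok' &&
    typeof v.version === 'string' &&
    v.version.length > 0 &&
    isNonNegativeInt(v.uptime_ms) &&
    isNonNegativeInt(v.db_tables) &&
    (v.phase === 'phase1' || v.phase === 'phase2') &&
    typeof v.mode === 'string' &&
    KNOWN_MODES.indexOf(v.mode) !== -1
  );
}
```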

4. Handler behaviour

4.1. Pseudocode

function handler(): HealthPayload {
  const uptime_ms = Math.floor(ctx.nowMs() - ctx.bootStartMs);
  const db_tables = countTables(ctx.db);
  const phase = ctx.phase ?? 'phase1';
  return {
    status: 'ok',
    version: ctx.version,
    uptime_ms,
    db_tables,
    phase,
    mode: ctx.mode,
  };
}

function countTables(db: Database.Database | undefined): number {
  if (db === undefined) return 0;
  try {
    const row = db
      .prepare(
        "SELECT COUNT(*) AS c FROM sqlite_master " +
          "WHERE type = 'table' AND name NOT LIKE 'sqlite_%'",
      )
      .get() as { c: number } | undefined;
    return row?.c ?? 0;
  } catch {
    return 0;
  }
}

4.2. Synchronous

The handler is synchronous (() => HealthPayload, not () => Promise<HealthPayload>). registerColibriTool accepts both shapes via Promise<unknown> | unknown. A synchronous handler avoids an unnecessary microtask hop, which helps keep response time under the 100 ms ceiling.

4.3. ctx.phase ?? 'phase1' default

If phase is undefined (should not happen in production because bootstrap sets it before start, but defensive), return 'phase1'. This is the narrowing branch that makes phase always a string in the output schema.

4.4. Never throws

The handler catches any SqliteError from the db_tables query. All other operations (arithmetic, property reads) cannot throw on a well-formed ctx. A non-Error value thrown from the query path is swallowed the same way, because the catch clause in countTables is untyped: try { ... } catch { return 0 }.
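
The never-throws property can be checked without a real SQLite handle. The sketch below restates countTables against a structural DbLike interface (a hypothetical stand-in for Database.Database that exposes only prepare and get) and drives it with stub handles.

```typescript
// Hypothetical structural stand-ins for the better-sqlite3 types.
interface StatementLike {
  get(): unknown;
}
interface DbLike {
  prepare(sql: string): StatementLike;
}

function countTablesSketch(db: DbLike | undefined): number {
  if (db === undefined) return 0;
  try {
    const row = db
      .prepare(
        "SELECT COUNT(*) AS c FROM sqlite_master " +
          "WHERE type = 'table' AND name NOT LIKE 'sqlite_%'",
      )
      .get() as { c: number } | undefined;
    return row?.c ?? 0;
  } catch {
    // Untyped catch: Error instances and non-Error values both land here.
    return 0;
  }
}

// Stub handles covering each path.
const healthyDb: DbLike = { prepare: () => ({ get: () => ({ c: 3 }) }) };
const emptyRowDb: DbLike = { prepare: () => ({ get: () => undefined }) };
const closedDb: DbLike = {
  prepare: () => {
    throw new Error('database is closed');
  },
};
const nonErrorDb: DbLike = {
  prepare: () => {
    throw 'boom'; // a non-Error value is swallowed the same way
  },
};
```

Each stub corresponds to one branch: undefined handle, populated row, missing row, and a throwing query path.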

5. Invariants

5.1. Response time < 100 ms

The chain overhead (stages 1-5 + SDK envelope serialisation) is typically 1-5 ms on Node 20 under load. The SELECT COUNT(*) FROM sqlite_master query reads only the schema table, so its cost scales with the number of schema objects rather than with user data; with the handful of tables expected in Phase 0 it should take < 1 ms. Tests pin a < 100 ms ceiling.
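
One way a test could pin the ceiling is to wrap the call in a small timing helper with an injectable clock, in the same spirit as ctx.nowMs. The timeCall helper below is a hypothetical illustration, not part of the test suite described here.

```typescript
// Runs fn once and reports the floored elapsed time using the supplied clock.
function timeCall<T>(
  fn: () => T,
  nowMs: () => number,
): { result: T; elapsedMs: number } {
  const start = nowMs();
  const result = fn();
  return { result, elapsedMs: Math.floor(nowMs() - start) };
}
```

In a real test the clock would be a monotonic source such as performance.now(), and the assertion would be elapsedMs < 100; a fake clock makes the helper itself deterministic to test.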

5.2. db_tables accurate post-migration

Tests create an in-memory DB, execute a known number of CREATE TABLE statements, wire the DB into ctx.db, and assert the returned count matches exactly. The count excludes sqlite_* internal tables.

5.3. phase matches runtime state

| Runtime state | ctx.phase | Expected in health |
|---|---|---|
| Before bootstrap | undefined | 'phase1' (default) |
| After ctx.phase = 'phase1' in bootstrap, before start | 'phase1' | 'phase1' |
| After start, before initDb in startup | 'phase1' | 'phase1' |
| After initDb succeeds in startup | 'phase2' | 'phase2' |
| After initDb throws in startup | 'phase1' (unchanged) | 'phase1' |
| After shutdown closes DB (rare; transport usually closes first) | 'phase2' (stale), ctx.db closed | 'phase2' + db_tables: 0 (defensive catch) |

5.4. version matches package.json

ctx.version is read from package.json in createServer(). Tests assert the returned version equals the value read from package.json via readFileSync.

5.5. mode is one of the four

RuntimeMode is a closed union. The output schema’s z.enum enforces this. Tests instantiate the ctx with each mode and round-trip.

6. Error modes

6.1. Stage 2 schema rejection on unexpected args

Calling server_health with non-empty arguments passes through: Zod’s z.object({}) strips unknown keys by default rather than rejecting them, so the handler still sees {}. However, passing non-object arguments (e.g. a string) is rejected by stage 2 with:

{
  "ok": false,
  "error": {
    "code": "INVALID_PARAMS",
    "message": "schema validation failed",
    "details": { "issues": [ ... ] }
  }
}

Test: call with arguments: "foo" as any; expect response.isError === true and response.structuredContent.error.code === 'INVALID_PARAMS'.

6.2. Handler never throws

By construction (try/catch around the DB query). Even if ctx.db is a closed handle, ctx.db.prepare(...) throws SqliteError: database is closed, which is caught and db_tables: 0 is returned.

6.3. Stage 5 sink failure does not leak

Standard middleware invariant from P0.2.1 — a stage-5 sink throw is caught by registerColibriTool’s finally block and logged via ctx.logger('[colibri] audit-exit sink failed:', ...). The original response envelope is returned unchanged.
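
The invariant can be sketched as a try/finally wrapper whose finally block guards the sink call. callWithAuditExit and its parameter types are hypothetical; the actual registerColibriTool signature is defined in the P0.2.1 contract and is not reproduced here.

```typescript
interface AuditExitEvent {
  tool: string;
  durationMs: number;
}
type ExitSink = (event: AuditExitEvent) => void;
type Logger = (...args: unknown[]) => void;

function callWithAuditExit<T>(
  tool: string,
  handler: () => T,
  sink: ExitSink,
  logger: Logger,
  nowMs: () => number,
): T {
  const enterTs = nowMs();
  try {
    return handler();
  } finally {
    try {
      sink({ tool, durationMs: Math.floor(nowMs() - enterTs) });
    } catch (err) {
      // A failing sink is logged and never replaces the handler's result.
      logger('[colibri] audit-exit sink failed:', err);
    }
  }
}
```

Because the inner catch neither rethrows nor returns, the value produced by handler() is what the caller receives even when the sink throws.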

7. Observability

7.1. Stage 3 (audit-enter) event

Emitted once per call:

{
  tool: 'server_health',
  args: {},                 // empty object after Zod parse
  timestamp: performance.now() reading,
  correlationId: UUID v4,
}

7.2. Stage 5 (audit-exit) event

Emitted once per call (from the finally block):

{
  tool: 'server_health',
  correlationId: <matching>,
  durationMs: <floor(nowMs() - enterTs)>,
  result: { ok: true, data: HealthPayload },  // on success
  error: <Error>,                             // on failure (never happens in practice)
}

Tests use makeRecordingSink to assert both events fire exactly once and share the same correlationId.
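
The shape of such an assertion might look like the sketch below. The signature of makeRecordingSink is not given in this contract, so makeRecordingSinkSketch, AuditEventSketch, and simulateProbe are hypothetical stand-ins, and the correlation ID is passed in rather than generated as a UUID v4.

```typescript
interface AuditEventSketch {
  stage: 'enter' | 'exit';
  tool: string;
  correlationId: string;
}
type AuditSink = (e: AuditEventSketch) => void;

// Collects every event it receives so a test can inspect them afterwards.
function makeRecordingSinkSketch(): {
  events: AuditEventSketch[];
  sink: AuditSink;
} {
  const events: AuditEventSketch[] = [];
  return {
    events,
    sink: (e) => {
      events.push(e);
    },
  };
}

// Emits one enter and one exit event around a (here empty) handler call.
function simulateProbe(
  tool: string,
  correlationId: string,
  sink: AuditSink,
): void {
  sink({ stage: 'enter', tool, correlationId });
  try {
    // The handler would run here.
  } finally {
    sink({ stage: 'exit', tool, correlationId });
  }
}
```

A test would then filter the recorded events by stage, expect exactly one of each, and compare their correlationId values.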

7.3. No log lines

The handler does NOT call ctx.logger. Neither a success nor a failure log is emitted for the probe itself. Operators probing the endpoint would otherwise flood stderr.

8. Non-goals

  • Not a liveness probe beyond status: 'ok' — we do not check DB connectivity by running a real query on a user table (there are no user tables in Phase 0 P0.2.4). db_tables > 0 is later evidence; Phase 0 P0.2.4 ships with db_tables: 0.
  • Not a versioned probe — a future server_health/v2 is a separate task; this is the v1 surface.
  • Not a readiness probe vs. liveness probe split — Phase 0 collapses both into the single phase field.
  • Not a multi-DB probe — Phase 0 has one DB (config.COLIBRI_DB_PATH).

9. Acceptance criteria mapping

Task spec (task-breakdown.md L140-151):

  • ✅ Tool name: server_health (deviation #2 — underscore not slash)
  • ✅ Returns: { status, version, uptime_ms, db_tables, phase, mode }
      • status: "ok"
      • version from package.json (already in ctx)
      • uptime_ms: Math.floor(ctx.nowMs() - ctx.bootStartMs)
      • db_tables = count of SQLite tables excluding sqlite_*
      • phase: "phase1" | "phase2" (reflects runtime state)
      • mode = current RuntimeMode
  • ✅ Response time < 100 ms — asserted in tests
  • ✅ 100% branch coverage — 6 branches mapped in §4.1

10. References

  • Task spec: docs/guides/implementation/task-breakdown.md L140-151
  • Audit: docs/audits/p0-2-4-health-audit.md
  • P0.2.1 contract: docs/contracts/p0-2-1-mcp-server-contract.md
  • P0.2.2 contract: docs/contracts/p0-2-2-sqlite-init-contract.md
  • P0.2.3 contract: docs/contracts/p0-2-3-two-phase-startup-contract.md
  • P0.4.1 contract: docs/contracts/p0-4-1-modes-contract.md

Colibri — documentation-first MCP runtime. Apache 2.0 + Commons Clause.