P0.2.4 — Health Check Tool — Contract
1. Intent
Register a single MCP tool — server_health — that answers a zero-argument
probe with a snapshot of the server’s runtime state:
{
"status": "ok",
"version": "0.0.1",
"uptime_ms": 1234,
"db_tables": 0,
"phase": "phase2",
"mode": "FULL"
}
The tool exists so external clients can distinguish:
- Is the server alive? Any response is evidence; status: "ok" makes the intent explicit.
- Which build? version (from package.json).
- How long has it been up? uptime_ms (monotonic, bootStart-relative).
- Is the DB wired? db_tables > 0 confirms migrations ran.
- Which startup phase? phase1 means “transport up, DB not yet” (the Phase 2 race window); phase2 means “DB open, all migrations applied”.
- Which runtime mode? mode enables clients to branch on READONLY etc.
This contract fixes the public surface, invariants, error modes, and observability requirements. It gates Step 3 (packet).
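As an illustration of how a client might use these answers, the sketch below branches on phase and mode. The decide helper, the ClientDecision type, and the rule that READONLY means "avoid write-style tools" are assumptions made for this example; only the field names and values come from the payload above.

```typescript
// Hypothetical client-side types and helper. Only the field names and
// literal values are taken from the contract.
type Phase = 'phase1' | 'phase2';
type Mode = 'FULL' | 'READONLY' | 'TEST' | 'MINIMAL';

interface HealthPayload {
  status: 'ok';
  version: string;
  uptime_ms: number;
  db_tables: number;
  phase: Phase;
  mode: Mode;
}

type ClientDecision = 'wait-for-db' | 'read-only' | 'ready';

function decide(health: HealthPayload): ClientDecision {
  // phase1: the transport is up but the DB is not open yet.
  if (health.phase === 'phase1') return 'wait-for-db';
  // Assumption: a READONLY server should not be sent write-style calls.
  if (health.mode === 'READONLY') return 'read-only';
  return 'ready';
}

// The example payload from this section, with phase2 already reached.
const sample: HealthPayload = {
  status: 'ok',
  version: '0.0.1',
  uptime_ms: 1234,
  db_tables: 0,
  phase: 'phase2',
  mode: 'FULL',
};
```

A client polling during startup would keep probing while decide returns 'wait-for-db'.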
2. Public surface
2.1. New module src/tools/health.ts
import type { ColibriServerContext } from '../server.js';
/**
* Register `server_health` with the given server context. Must be called
* before `start(ctx)` (MCP SDK forbids registration after connect).
*/
export function registerHealthTool(ctx: ColibriServerContext): void;
Exports a single function registerHealthTool. No other exports. No
default export. No top-level side effects.
2.2. Extension to src/server.ts — ColibriServerContext
Two new fields appended to the context type (NON-readonly, unlike the existing readonly fields):
export interface ColibriServerContext {
// ... existing fields (all readonly) ...
/** P0.2.4: DB handle, set by startup.ts Phase 2 after `initDb()`. */
db?: Database.Database;
/** P0.2.4: current startup phase, set by bootstrap() and startup.ts. */
phase?: 'phase1' | 'phase2';
}
Rationale for optional mutable fields:
- readonly is inappropriate because startup.ts must assign them post-construction.
- undefined is the correct default: Phase 1 may run briefly without any phase value (between createServer and the ctx.phase = 'phase1' write in bootstrap), and db is never set until Phase 2.
- Moving the fields behind a getter adds complexity without benefit; tests can assign directly.
2.3. Extension to src/server.ts — bootstrap()
Two additions, in order:
export async function bootstrap(
options: BootstrapOptions = {},
): Promise<ColibriServerContext> {
const exit = options.exit ?? process.exit.bind(process);
const ctx = createServer(options.createOptions ?? {});
try {
ctx.phase = 'phase1'; // ← NEW, before registration
registerColibriTool(ctx, 'server_ping', ...);
registerHealthTool(ctx); // ← NEW, after server_ping
await start(ctx);
return ctx;
} catch (err) {
ctx.logger('[colibri] fatal:', err);
exit(1);
return ctx;
}
}
server.ts imports registerHealthTool (from src/tools/health.ts) at the top of the file so that bootstrap() can call it.
2.4. Extension to src/startup.ts — Phase 2
Two lines added to the Phase 2 try block, after initDbFn:
try {
const db = initDbFn(dbPath);
ctx.db = db; // ← NEW
ctx.phase = 'phase2'; // ← NEW
const elapsedMs = Math.floor(nowMs() - phase1StartMs);
logger(`[Startup] Complete in ${elapsedMs}ms`);
return { ctx, db, elapsedMs };
}
If initDbFn throws, neither mutation runs — ctx.phase remains
'phase1' (set by bootstrap), ctx.db remains undefined.
server_health can still answer with { phase: 'phase1', db_tables: 0 }.
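The all-or-nothing behaviour of the two assignments can be sketched with slimmed-down shapes. FakeDb, Ctx, and runPhase2 are hypothetical names for this example, and the catch branch here only reports failure; how the real startup.ts handles the error is not specified in this section.

```typescript
// Hypothetical stand-ins: only `db` and `phase` mirror the contract.
interface FakeDb {
  readonly path: string;
}
interface Ctx {
  db?: FakeDb;
  phase?: 'phase1' | 'phase2';
}

// Both assignments sit after initDbFn, so a throw skips them together.
function runPhase2(
  ctx: Ctx,
  initDbFn: (path: string) => FakeDb,
  dbPath: string,
): boolean {
  try {
    const db = initDbFn(dbPath);
    ctx.db = db;
    ctx.phase = 'phase2';
    return true;
  } catch {
    // Assumption: this sketch just signals failure to the caller.
    return false;
  }
}

// Success path: both fields are set.
const okCtx: Ctx = { phase: 'phase1' };
const okResult = runPhase2(okCtx, (path) => ({ path }), ':memory:');

// Failure path: ctx keeps the values bootstrap left behind.
const failedCtx: Ctx = { phase: 'phase1' };
const failedResult = runPhase2(
  failedCtx,
  () => {
    throw new Error('cannot open');
  },
  ':memory:',
);
```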
3. Tool input / output schema
3.1. Input schema
z.object({})
No arguments. Extraneous fields are silently discarded (Zod default).
Passing a non-object (e.g. "foo") is rejected by stage 2 with the
standard INVALID_PARAMS envelope.
3.2. Output schema
z.object({
status: z.literal('ok'),
version: z.string().min(1),
uptime_ms: z.number().int().nonnegative(),
db_tables: z.number().int().nonnegative(),
phase: z.enum(['phase1', 'phase2']),
mode: z.enum(['FULL', 'READONLY', 'TEST', 'MINIMAL']),
})
All six fields are required. No optional fields. The enum for mode
mirrors RUNTIME_MODES from src/modes.ts.
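For readers without Zod at hand, the same constraints can be expressed as a plain type guard. This is a dependency-free approximation of the schema above, not the schema itself; isHealthPayload and isNonNegativeInt are names invented for this sketch.

```typescript
const PHASES = ['phase1', 'phase2'] as const;
const MODES = ['FULL', 'READONLY', 'TEST', 'MINIMAL'] as const;

interface HealthPayload {
  status: 'ok';
  version: string;
  uptime_ms: number;
  db_tables: number;
  phase: (typeof PHASES)[number];
  mode: (typeof MODES)[number];
}

// Approximates z.number().int().nonnegative().
function isNonNegativeInt(v: unknown): v is number {
  return typeof v === 'number' && Number.isInteger(v) && v >= 0;
}

// Hand-written approximation of the output schema: all six fields required.
function isHealthPayload(value: unknown): value is HealthPayload {
  if (typeof value !== 'object' || value === null) return false;
  const v = value as Record<string, unknown>;
  return (
    v.status === 'ok' &&
    typeof v.version === 'string' &&
    v.version.length >= 1 &&
    isNonNegativeInt(v.uptime_ms) &&
    isNonNegativeInt(v.db_tables) &&
    PHASES.some((p) => p === v.phase) &&
    MODES.some((m) => m === v.mode)
  );
}
```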
4. Handler behaviour
4.1. Pseudocode
function handler(): HealthPayload {
const uptime_ms = Math.floor(ctx.nowMs() - ctx.bootStartMs);
const db_tables = countTables(ctx.db);
const phase = ctx.phase ?? 'phase1';
return {
status: 'ok',
version: ctx.version,
uptime_ms,
db_tables,
phase,
mode: ctx.mode,
};
}
function countTables(db: Database.Database | undefined): number {
if (db === undefined) return 0;
try {
const row = db
.prepare(
"SELECT COUNT(*) AS c FROM sqlite_master " +
"WHERE type = 'table' AND name NOT LIKE 'sqlite_%'",
)
.get() as { c: number } | undefined;
return row?.c ?? 0;
} catch {
return 0;
}
}
4.2. Synchronous
The handler is synchronous (() => HealthPayload, not
() => Promise<HealthPayload>). registerColibriTool accepts both shapes
via Promise<unknown> | unknown. A synchronous handler avoids an unnecessary
microtask hop, which helps keep response time under 100 ms.
4.3. ctx.phase ?? 'phase1' default
If phase is undefined (should not happen in production because
bootstrap sets it before start, but defensive), return 'phase1'.
This is the narrowing branch that makes phase always a string in the
output schema.
4.4. Never throws
The handler catches any SqliteError from the db_tables query. All
other operations (arithmetic, property reads) cannot throw on a
well-formed ctx. A non-Error thrown from the query path is swallowed the
same way — the outer try { ... } catch { return 0 } is untyped.
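The never-throws property can be exercised with stub handles, one per branch of countTables. CountableDb is a hypothetical minimal slice of the DB handle introduced for this sketch; the function body repeats the pseudocode from §4.1.

```typescript
// Hypothetical minimal slice of the DB handle: just what countTables uses.
interface CountableDb {
  prepare(sql: string): { get(): unknown };
}

function countTables(db: CountableDb | undefined): number {
  if (db === undefined) return 0;
  try {
    const row = db
      .prepare(
        "SELECT COUNT(*) AS c FROM sqlite_master " +
          "WHERE type = 'table' AND name NOT LIKE 'sqlite_%'",
      )
      .get() as { c: number } | undefined;
    return row?.c ?? 0;
  } catch {
    // Untyped catch: Error subclasses and non-Error throws land here alike.
    return 0;
  }
}

// Stub handles, one per branch.
const healthyDb: CountableDb = { prepare: () => ({ get: () => ({ c: 3 }) }) };
const emptyRowDb: CountableDb = { prepare: () => ({ get: () => undefined }) };
const closedDb: CountableDb = {
  prepare: () => {
    throw new Error('database is closed');
  },
};
const oddDb: CountableDb = {
  prepare: () => {
    throw 'not an Error object';
  },
};
```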
5. Invariants
5.1. Response time < 100 ms
The chain overhead (stages 1-5 + SDK envelope serialisation) is typically
1-5 ms on Node 20 under load. The SELECT COUNT(*) FROM sqlite_master
query reads only the schema table, so its cost grows with the number of
schema objects rather than with user data; for the small Phase 0 schema it
is expected to take < 1 ms. Tests pin a < 100 ms ceiling.
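A ceiling assertion of this kind might look like the sketch below. measureMs and fakeHandler are hypothetical; Date.now is used only as a default stand-in for the injected ctx.nowMs clock, and a fake clock can be passed in to make the measurement deterministic.

```typescript
// Hypothetical timing helper with an injectable clock.
function measureMs<T>(
  fn: () => T,
  nowMs: () => number = Date.now,
): { value: T; elapsedMs: number } {
  const start = nowMs();
  const value = fn();
  return { value, elapsedMs: nowMs() - start };
}

// Stand-in for the real handler: does no I/O, returns a fixed shape.
const fakeHandler = () => ({ status: 'ok' as const, uptime_ms: 0 });

// Wall-clock measurement, to be compared against the 100 ms ceiling.
const measured = measureMs(fakeHandler);

// Deterministic measurement: each clock read advances by 7 ms.
let tick = 0;
const fakeClocked = measureMs(fakeHandler, () => (tick += 7));
```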
5.2. db_tables accurate post-migration
Tests create an in-memory DB, execute a known number of CREATE TABLE
statements, wire the DB into ctx.db, and assert the returned count
matches exactly. The count excludes sqlite_* internal tables.
5.3. phase matches runtime state
| Runtime state | ctx.phase | Expected in health |
|---|---|---|
| Before bootstrap | undefined | 'phase1' (default) |
| After ctx.phase = 'phase1' in bootstrap, before start | 'phase1' | 'phase1' |
| After start, before initDb in startup | 'phase1' | 'phase1' |
| After initDb succeeds in startup | 'phase2' | 'phase2' |
| After initDb throws in startup | 'phase1' (unchanged) | 'phase1' |
| After shutdown closes DB (rare; transport usually closes first) | 'phase2' (stale), ctx.db closed | 'phase2' + db_tables: 0 (defensive catch) |
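The phase column of this table lends itself to a table-driven test. The sketch below encodes each runtime state as a row and checks it against the ctx.phase ?? 'phase1' rule from §4.3; the row labels and the reportedPhase helper are illustrative, not part of the codebase.

```typescript
type Phase = 'phase1' | 'phase2';

interface PhaseRow {
  state: string;
  ctxPhase: Phase | undefined;
  expected: Phase;
}

// One row per runtime state; labels are shortened for the sketch.
const phaseRows: PhaseRow[] = [
  { state: 'before bootstrap', ctxPhase: undefined, expected: 'phase1' },
  { state: 'phase set, before start', ctxPhase: 'phase1', expected: 'phase1' },
  { state: 'after start, before initDb', ctxPhase: 'phase1', expected: 'phase1' },
  { state: 'initDb succeeded', ctxPhase: 'phase2', expected: 'phase2' },
  { state: 'initDb threw', ctxPhase: 'phase1', expected: 'phase1' },
  { state: 'DB closed on shutdown (stale)', ctxPhase: 'phase2', expected: 'phase2' },
];

// Same defaulting rule as the handler: undefined reads as phase1.
function reportedPhase(ctxPhase: Phase | undefined): Phase {
  return ctxPhase ?? 'phase1';
}

// Rows whose reported phase disagrees with the expected column.
const mismatches = phaseRows.filter(
  (row) => reportedPhase(row.ctxPhase) !== row.expected,
);
```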
5.4. version matches package.json
ctx.version is read from package.json in createServer(). Tests
assert the returned version equals the value read from
package.json via readFileSync.
5.5. mode is one of the four
RuntimeMode is a closed union. The output schema’s z.enum enforces
this. Tests instantiate the ctx with each mode and round-trip.
6. Error modes
6.1. Stage 2 schema rejection on unexpected args
Calling server_health with non-empty arguments passes through:
Zod’s z.object({}) strips unknown keys by default rather than rejecting
them, so the handler still sees {}. However, passing non-object arguments
(e.g. a string) is rejected by stage 2 with:
{
"ok": false,
"error": {
"code": "INVALID_PARAMS",
"message": "schema validation failed",
"details": { "issues": [ ... ] }
}
}
Test: call with arguments: "foo" as any → response.isError === true,
response.structuredContent.error.code === 'INVALID_PARAMS'.
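The two outcomes can be illustrated without Zod. The validator below is a hand-rolled stand-in for stage 2, not the real middleware: validateNoArgs is a hypothetical name, and the issues entries are plain strings here, whereas Zod reports structured issue objects.

```typescript
type StageTwoResult =
  | { ok: true; data: Record<string, never> }
  | {
      ok: false;
      error: {
        code: 'INVALID_PARAMS';
        message: string;
        details: { issues: string[] };
      };
    };

// Hand-rolled approximation of validating against an empty object schema.
function validateNoArgs(args: unknown): StageTwoResult {
  if (typeof args !== 'object' || args === null || Array.isArray(args)) {
    const received =
      args === null ? 'null' : Array.isArray(args) ? 'array' : typeof args;
    return {
      ok: false,
      error: {
        code: 'INVALID_PARAMS',
        message: 'schema validation failed',
        details: { issues: [`expected object, received ${received}`] },
      },
    };
  }
  // Extra keys are dropped: the parsed value is always the empty object.
  return { ok: true, data: {} };
}

const rejected = validateNoArgs('foo');
const accepted = validateNoArgs({ unexpected: 1 });
```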
6.2. Handler never throws
By construction (try/catch around the DB query). Even if ctx.db is a
closed handle, ctx.db.prepare(...) throws on the closed connection; the
untyped catch absorbs the error whatever its class, and db_tables: 0 is
returned.
6.3. Stage 5 sink failure does not leak
Standard middleware invariant from P0.2.1 — a stage-5 sink throw is
caught by registerColibriTool’s finally block and logged via
ctx.logger('[colibri] audit-exit sink failed:', ...). The original
response envelope is returned unchanged.
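The containment pattern can be sketched as a nested try/catch inside the finally block. This is an illustration of the invariant under simplified types, not the actual registerColibriTool source; callWithExitSink and its parameter shapes are assumptions.

```typescript
type Logger = (...parts: unknown[]) => void;
type ExitSink = (event: { tool: string }) => void;

// Hypothetical wrapper: the sink runs in finally, and its own failure is
// logged rather than allowed to replace the handler's result.
function callWithExitSink<T>(
  tool: string,
  handler: () => T,
  sink: ExitSink,
  logger: Logger,
): T {
  try {
    return handler();
  } finally {
    try {
      sink({ tool });
    } catch (err) {
      logger('[colibri] audit-exit sink failed:', err);
    }
  }
}

// A sink that always throws, and a logger that records its calls.
const logged: unknown[][] = [];
const response = callWithExitSink(
  'server_health',
  () => ({ ok: true }),
  () => {
    throw new Error('sink down');
  },
  (...parts) => {
    logged.push(parts);
  },
);
```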
7. Observability
7.1. Stage 3 (audit-enter) event
Emitted once per call:
{
tool: 'server_health',
args: {}, // empty object after Zod parse
timestamp: <performance.now() reading>,
correlationId: <UUID v4>,
}
7.2. Stage 5 (audit-exit) event
Emitted once per call (from the finally block):
{
tool: 'server_health',
correlationId: <matching>,
durationMs: <floor(nowMs() - enterTs)>,
result: { ok: true, data: HealthPayload }, // on success
error: <Error>, // on failure (never happens in practice)
}
Tests use makeRecordingSink to assert both events fire exactly once
and share the same correlationId.
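A recording sink for such a test might be shaped like the sketch below. makeRecordingSinkSketch and auditedCall are hypothetical stand-ins (the real makeRecordingSink may differ), the event shapes are reduced to the fields the assertion needs, and the id generator and clock are injected so the example stays deterministic.

```typescript
interface EnterEvent {
  kind: 'enter';
  tool: string;
  correlationId: string;
}
interface ExitEvent {
  kind: 'exit';
  tool: string;
  correlationId: string;
  durationMs: number;
}
type AuditEvent = EnterEvent | ExitEvent;

// Hypothetical recording sink: stores every event it receives.
function makeRecordingSinkSketch() {
  const events: AuditEvent[] = [];
  return {
    events,
    sink: (e: AuditEvent): void => {
      events.push(e);
    },
  };
}

// Emits one enter and one exit event that share a correlationId.
function auditedCall<T>(
  tool: string,
  handler: () => T,
  sink: (e: AuditEvent) => void,
  newId: () => string,
  nowMs: () => number,
): T {
  const correlationId = newId();
  const enterTs = nowMs();
  sink({ kind: 'enter', tool, correlationId });
  try {
    return handler();
  } finally {
    const durationMs = Math.floor(nowMs() - enterTs);
    sink({ kind: 'exit', tool, correlationId, durationMs });
  }
}

const recorder = makeRecordingSinkSketch();
let clock = 0;
auditedCall(
  'server_health',
  () => ({ status: 'ok' }),
  recorder.sink,
  () => 'fixed-id-for-test', // stand-in for a UUID v4 generator
  () => (clock += 2.5),
);
```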
7.3. No log lines
The handler does NOT call ctx.logger. Neither a success nor a failure
log is emitted for the probe itself. Operators probing the endpoint would
otherwise flood stderr.
8. Non-goals
- Not a liveness probe beyond status: 'ok'. We do not check DB connectivity by running a real query on a user table (there are no user tables in Phase 0 P0.2.4). db_tables > 0 is later evidence; Phase 0 P0.2.4 ships with db_tables: 0.
- Not a versioned probe. A future server_health/v2 is a separate task; this is the v1 surface.
- Not a readiness probe vs. liveness probe split. Phase 0 collapses both into the single phase field.
- Not a multi-DB probe. Phase 0 has one DB (config.COLIBRI_DB_PATH).
9. Acceptance criteria mapping
Task spec (task-breakdown.md L140-151):
- ✅ Tool name: server_health (deviation #2: underscore not slash)
- ✅ Returns: { status, version, uptime_ms, db_tables, phase, mode }
- ✅ status: "ok"
- ✅ version from package.json (already in ctx)
- ✅ uptime_ms: Math.floor(ctx.nowMs() - ctx.bootStartMs)
- ✅ db_tables = count of SQLite tables excluding sqlite_*
- ✅ phase: "phase1" | "phase2", reflects runtime state
- ✅ mode = current RuntimeMode
- ✅ Response time < 100 ms, asserted in tests
- ✅ 100% branch coverage: 6 branches mapped in §4.1
10. References
- Task spec: docs/guides/implementation/task-breakdown.md L140-151
- Audit: docs/audits/p0-2-4-health-audit.md
- P0.2.1 contract: docs/contracts/p0-2-1-mcp-server-contract.md
- P0.2.2 contract: docs/contracts/p0-2-2-sqlite-init-contract.md
- P0.2.3 contract: docs/contracts/p0-2-3-two-phase-startup-contract.md
- P0.4.1 contract: docs/contracts/p0-4-1-modes-contract.md