P0.2.1 — Step 2 Contract

Behavioral contract for src/server.ts (α System Core — MCP server bootstrap). Every invariant is verifiable. Step 3 (packet) may not expand scope without amending this document first. Step 5 (verify) MUST cite each section as pass/fail.

Upstream authority:

  • docs/guides/implementation/task-breakdown.md §P0.2.1 (acceptance checklist)
  • docs/spec/s17-mcp-surface.md §2, §4, §6, §8 (stdio-only, 5-stage chain, response envelope)
  • docs/2-plugin/boot.md §1-6 (boot sequence, Promise gate, exit codes)
  • docs/2-plugin/middleware.md Stage 1-5 (per-stage semantics)
  • docs/2-plugin/modes.md §“Tool surface per mode” (admitted tools per mode)
  • docs/architecture/decisions/ADR-004-tool-surface.md (19-tool inventory, server_ping name)
  • Sigma dispatch brief (uniform envelope, AuditSink seam, donor-bug mitigation)
  • Audit a7305e43 (surface inventory + spec-contradiction catalogue)

Where the dispatch brief is stricter than the spec, it is because the added detail (uptime_ms, correlationId, AuditSink) feeds P0.2.3 + P0.7 directly.


§1. Module surface (exports of src/server.ts)

src/server.ts MUST export exactly these symbols, no more, no fewer:

1a. Types

  1. AuditSink — interface. Pluggable seam for ζ Decision Trail (P0.7). See §5 for the exact shape.
  2. ToolEnterEvent — the shape AuditSink.enter() receives. Frozen at runtime via Object.freeze at construction time.
  3. ToolExitEvent — the shape AuditSink.exit() receives. Frozen.
  4. ColibriToolConfig: the config object passed to registerColibriTool(): { title?: string; description?: string; inputSchema: ZodObject<any>; outputSchema?: ZodObject<any> }. inputSchema is required so every tool has a Zod schema per s17 §5 (“A tool without a Zod schema cannot be registered”); outputSchema is optional, matching the §4a signature.
  5. ColibriServerContext — the value returned by createServer(...). Opaque to callers except for registerColibriTool, start, stop, getVersion, getMode, isConnected.

1b. Functions

  1. createServer(options?: CreateServerOptions): ColibriServerContext — factory. See §3 for inputs.
  2. registerColibriTool<I extends ZodRawShape, O extends ZodRawShape>(ctx: ColibriServerContext, name: string, config: ColibriToolConfig, handler: (args: z.infer<ZodObject<I>>) => Promise<unknown> | unknown): void — register a tool with the 5-stage chain. See §4.
  3. start(ctx: ColibriServerContext): Promise<void> — runs boot steps 1-4 (server creation happens in createServer, but transport connect + handshake wait happen in start). See §6.
  4. stop(ctx: ColibriServerContext): Promise<void> — graceful shutdown. Closes transport; no DB to flush in P0.2.1.

1c. Defaults (named exports)

  1. createNoOpAuditSink(): AuditSink — production default for P0.2.1 (before ζ lands).

1d. Entry-point behavior

  1. main(): IIFE-style default entry that reads config/detectMode and calls createServer + start. Guarded by import.meta.url === pathToFileURL(process.argv[1]).href so importing src/server.ts in tests does NOT run main(). When the file is executed directly (npm start → node dist/server.js), main() runs.

No default export. Every symbol is a named export (matches src/config.ts + src/modes.ts convention).

No additional symbols. The 5 middleware stages are INTERNAL to src/server.ts and not re-exported; P0.2.4 re-homes them into src/middleware/*.ts and the public API of src/server.ts does not change.


§2. OPEN QUESTION for T0 — tool-lock stage semantics

The audit (§7a) surfaced a spec contradiction:

  • s17 §4 + middleware.md Stage 1: pure per-tool mutex. Concurrency control only. Never rejects a call.
  • modes.md §“What ‘admitted’ means”: consults the active mode and rejects with ToolNotAdmittedError at the lock stage, before validation.

Contract’s proposed reading (pending T0 sign-off):

Stage 1 tool-lock in P0.2.1 is pure per-tool mutex, matching s17 + middleware.md. Capability-gating (which tools are admitted per mode) is a SEPARATE concern handled at TOOL REGISTRATION time: registerColibriTool inspects the current mode via capabilitiesFor(mode) and the per-tool requires set (added in a later task — P0.4.2 or P0.2.3), and refuses to register tools the current mode does not admit. In P0.2.1 there is only server_ping, which every mode admits per modes.md line 57, so no gating logic is exercised.

This reading aligns three of four specs (s17, middleware.md, boot.md) and defers the modes.md scope collision to a later task that can fix modes.md explicitly. T0 must sign off on this reading before the packet’s Step-4 implementation begins. If T0 directs the modes.md reading instead, the contract and packet must be amended to add capability-gating to stage 1.

The rest of this contract assumes the pure-mutex reading.
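
The registration-time gating described above could look roughly like the sketch below. It is illustrative only: the capability names ('read', 'write'), the CAPABILITIES table, and the helpers isAdmitted and assertAdmitted are hypothetical, and the real capabilitiesFor in src/modes.ts may have a different shape.

```typescript
// Hypothetical sketch: capability names and this table are invented for
// illustration; the real capabilitiesFor lives in src/modes.ts.
type RuntimeMode = 'FULL' | 'READONLY' | 'TEST' | 'MINIMAL';

const CAPABILITIES: Record<RuntimeMode, ReadonlySet<string>> = {
  FULL: new Set(['read', 'write']),
  READONLY: new Set(['read']),
  TEST: new Set(['read', 'write']),
  MINIMAL: new Set<string>(),
};

function capabilitiesFor(mode: RuntimeMode): ReadonlySet<string> {
  return CAPABILITIES[mode];
}

// A tool is admitted when every capability it requires is granted by the mode.
function isAdmitted(mode: RuntimeMode, requires: readonly string[]): boolean {
  const granted = capabilitiesFor(mode);
  return requires.every((cap) => granted.has(cap));
}

// Registration-time gate: refuse to register, instead of rejecting per call
// inside the tool-lock stage.
function assertAdmitted(mode: RuntimeMode, tool: string, requires: readonly string[]): void {
  if (!isAdmitted(mode, requires)) {
    throw new Error(`tool not admitted in mode ${mode}: ${tool}`);
  }
}
```

Under this reading a tool with an empty requires list, such as server_ping, is admitted in every mode, so the gate is never exercised in P0.2.1.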


§3. createServer semantics

3a. Signature

export function createServer(options?: CreateServerOptions): ColibriServerContext;

export interface CreateServerOptions {
  readonly auditSink?: AuditSink;
  readonly transport?: Transport;      // Transport from @modelcontextprotocol/sdk
  readonly version?: string;           // Override; default = read from package.json
  readonly mode?: RuntimeMode;         // Override; default = detectMode(process.env)
  readonly nowMs?: () => number;       // Override; default = () => performance.now()
  readonly bootStartMs?: number;       // Override; default = nowMs() at construction
  readonly logger?: (...args: unknown[]) => void;  // Override; default = console.error
}

All options have defaults; passing {} or omitting the argument is valid. Options are the dependency-injection seams tests use to exercise the server without real stdio, real filesystem reads, or real time.

3b. Default wiring (invoked when options are absent or partial)

  1. version → readPackageJson(): reads ../package.json relative to import.meta.url (resolved via path.resolve(path.dirname(fileURLToPath(import.meta.url)), '..', 'package.json')), parses, returns .version. Throws Error with message "Failed to read package.json version: <reason>" on failure. Called once at construction; result is cached on the context.
  2. mode → detectMode(process.env) from src/modes.ts. Throws on AMS_MODE or any invalid COLIBRI_MODE value.
  3. auditSink → createNoOpAuditSink().
  4. transport → new StdioServerTransport() (uses default process.stdin / process.stdout).
  5. nowMs → () => performance.now() from node:perf_hooks.
  6. bootStartMs → nowMs() at the moment createServer is called. Ping uptime_ms is calculated as floor(nowMs() - bootStartMs).
  7. logger → console.error. Chosen because console.log writes to stdout which is owned by the SDK’s stdio transport (donor bug #3); stderr is safe for our messages.
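
The default wiring above can be sketched as a small resolver. This is a minimal illustration under assumptions, not the module's code: resolveOptions, Options, Resolved and DefaultProviders are hypothetical names, and only four of the seven options are modelled. Defaults are passed as thunks so that, for example, package.json is not read when version is injected.

```typescript
type RuntimeMode = 'FULL' | 'READONLY' | 'TEST' | 'MINIMAL';

interface Options {
  readonly version?: string;
  readonly mode?: RuntimeMode;
  readonly nowMs?: () => number;
  readonly bootStartMs?: number;
}

interface Resolved {
  readonly version: string;
  readonly mode: RuntimeMode;
  readonly nowMs: () => number;
  readonly bootStartMs: number;
}

// Hypothetical: default providers are thunks, evaluated only when the
// corresponding option is absent.
interface DefaultProviders {
  readonly version: () => string;
  readonly mode: () => RuntimeMode;
  readonly nowMs: () => number;
}

function resolveOptions(options: Options, defaults: DefaultProviders): Resolved {
  const nowMs = options.nowMs ?? defaults.nowMs;
  return {
    version: options.version ?? defaults.version(),
    mode: options.mode ?? defaults.mode(),
    nowMs,
    // bootStartMs defaults to "now" at construction, read from the resolved clock.
    bootStartMs: options.bootStartMs ?? nowMs(),
  };
}
```

A test can then pass { nowMs: () => 42 } and observe bootStartMs === 42 without touching the real clock or the filesystem.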

3c. What createServer DOES NOT do

  • Does NOT call transport.connect(). That happens in start(). This split lets tests observe the constructed-but-not-connected state.
  • Does NOT call process.on(...). Global handlers are installed by start() (or by the main() entry). Tests that call createServer in isolation do NOT pollute Jest’s handlers.
  • Does NOT read process.env directly. All env reads go through config (from src/config.ts) or detectMode (from src/modes.ts).
  • Does NOT throw if auditSink is missing — the no-op default is used.

§4. registerColibriTool — the 5-stage middleware chain

4a. Signature

export function registerColibriTool<
  I extends z.ZodRawShape,
  O extends z.ZodRawShape,
>(
  ctx: ColibriServerContext,
  name: string,
  config: {
    readonly title?: string;
    readonly description?: string;
    readonly inputSchema: z.ZodObject<I>;
    readonly outputSchema?: z.ZodObject<O>;
  },
  handler: (
    args: z.infer<z.ZodObject<I>>,
  ) => Promise<unknown> | unknown,
): void;

Calls ctx.server.registerTool(name, sdkConfig, wrappedHandler) under the hood, where wrappedHandler is the composed 5-stage chain.

4b. The 5 stages (canonical order, matches s17 §4 + middleware.md)

Each stage is a thin async function. They compose as follows (pseudocode, not the implementation):

wrappedHandler = async (args) => {
  return await toolLock(name, async () => {
    const validated = schemaValidate(config.inputSchema, args);
    const correlationId = uuidV4();
    const enterTs = nowMs();
    let result, error;
    try {
      await auditEnter({ tool: name, args: validated, timestamp: enterTs, correlationId });
      result = await dispatch(handler, validated);
      return { ok: true, data: result };
    } catch (e) {
      error = e;
      throw e;
    } finally {
      await auditExit({
        tool: name,
        correlationId,
        durationMs: floor(nowMs() - enterTs),
        result,
        error,
      });
    }
  });
};

4c. Stage-by-stage contract

Stage 1 — tool-lock.

  • Input: tool name (string).
  • Behavior: per-tool mutex. The lock map lives on ColibriServerContext as a private Map<string, Promise<void>>. Acquiring the lock: await the current Promise for name (if any), then replace with a new Promise that resolves when the current call’s finally runs.
  • Output: runs the inner chain exactly once per tool-name, serialized.
  • Failure handling: the lock itself cannot fail. If the inner chain throws, the mutex still releases (via the outer finally).
  • In P0.2.1: does NOT consult mode. See §2 OPEN QUESTION.
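
A per-tool mutex over a Map<string, Promise<void>> could be sketched as follows. createToolLock and withLock are hypothetical names; the sketch is one way to satisfy the stage-1 bullets, not the locked implementation.

```typescript
// Per-name mutex: each caller waits for the previous holder of the same
// name, then releases its own "done" promise when its work finishes.
function createToolLock() {
  const tails = new Map<string, Promise<void>>();

  return async function withLock<T>(name: string, fn: () => Promise<T>): Promise<T> {
    const previous = tails.get(name) ?? Promise.resolve();
    let release!: () => void;
    const current = new Promise<void>((resolve) => { release = resolve; });
    // Publish synchronously, before awaiting, so a concurrent caller queues behind us.
    tails.set(name, current);
    await previous;
    try {
      return await fn();
    } finally {
      release();
      // Drop the entry if nobody queued behind us, so the map does not grow.
      if (tails.get(name) === current) tails.delete(name);
    }
  };
}
```

One design point worth noting: the sketch installs its own promise before awaiting the previous one. Awaiting first and replacing afterwards would let two callers that arrive during the same wait both proceed once it resolves.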

Stage 2 — schema-validate.

  • Input: the Zod schema (from config.inputSchema), raw args (unknown).
  • Behavior: config.inputSchema.safeParse(args). On .success === false, throw SchemaValidationError (an internal class) with the Zod error tree attached. The outer envelope maps this to { ok: false, error: { code: 'INVALID_PARAMS', message, details: { issues } } } at response-serialization time. .success === true yields the typed .data object, which is passed to stage 3 and stage 4.
  • Failure handling: throw. The chain stops here; stage 3 (audit-enter) does NOT run. This matches middleware.md Stage 2: “a call that never validated never enters the decision trail.”

Stage 3 — audit-enter.

  • Input: ToolEnterEvent = { tool, args, timestamp, correlationId }.
  • Behavior: await ctx.auditSink.enter(event). In P0.2.1, the no-op sink does nothing; in P0.7, the ζ sink writes an audit_events row.
  • Failure handling: per middleware.md Stage 3 (“Audit insert failure is a hard stop”), if auditSink.enter() throws, the chain stops and the error propagates. Stage 4 does NOT run for this call. Stage 5 (audit-exit) still runs from the outer finally and records the audit error; see §4d for the ordering subtlety.

Stage 4 — dispatch.

  • Input: the typed args (from stage 2).
  • Behavior: await handler(args). The returned value is wrapped in { ok: true, data: <value> }.
  • Failure handling: exceptions propagate to the outer finally. Stage 5 still runs.

Stage 5 — audit-exit.

  • Input: ToolExitEvent = { tool, correlationId, durationMs, result, error }.
  • Behavior: await ctx.auditSink.exit(event). Runs unconditionally in the chain-level finally block, regardless of whether stage 3 or stage 4 threw.
  • Failure handling: if auditSink.exit() itself throws, the error is logged via ctx.logger but is NOT re-thrown (the ORIGINAL error from stage 3 or 4, if any, is preserved). This avoids double-fault patterns where a flaky sink hides the real handler error. Deviation from middleware.md: middleware.md says “exit-row insert failure is a hard stop”, but ALSO says “the handler’s own result is dropped in favour of the insert error”. In P0.2.1 with a no-op sink the question is moot; when P0.7 lands a real sink, the contract here may need re-negotiation. Flagged to T0 as a secondary open question (§14, Q-2).
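
Stages 2 to 5 can be sketched as runnable TypeScript. The sketch follows the §4b pseudocode (validation sits outside the try block) and the log-and-continue rule for audit-exit. Assumptions: composeChain and ChainDeps are hypothetical names, validate stands in for inputSchema.safeParse, newId stands in for a uuid v4 generator, and the tool-lock wrapper is omitted.

```typescript
interface ToolEnterEvent { readonly tool: string; readonly args: unknown; readonly timestamp: number; readonly correlationId: string; }
interface ToolExitEvent { readonly tool: string; readonly correlationId: string; readonly durationMs: number; readonly result?: unknown; readonly error?: Error; }
interface AuditSink {
  enter(event: ToolEnterEvent): Promise<void> | void;
  exit(event: ToolExitEvent): Promise<void> | void;
}

// Hypothetical dependency bundle; the real chain reads these from the context.
interface ChainDeps {
  readonly sink: AuditSink;
  readonly nowMs: () => number;
  readonly newId: () => string;
  readonly logger: (...args: unknown[]) => void;
}

function composeChain<A>(
  tool: string,
  validate: (raw: unknown) => A,
  handler: (args: A) => Promise<unknown> | unknown,
  deps: ChainDeps,
): (raw: unknown) => Promise<{ ok: true; data: unknown }> {
  return async (raw) => {
    const args = validate(raw); // stage 2: a throw here skips stages 3-5 in this sketch
    const correlationId = deps.newId();
    const enterTs = deps.nowMs();
    let result: unknown;
    let error: Error | undefined;
    try {
      // stage 3: audit-enter
      await deps.sink.enter(Object.freeze({ tool, args, timestamp: enterTs, correlationId }));
      result = await handler(args); // stage 4: dispatch
      return { ok: true, data: result };
    } catch (e) {
      error = e instanceof Error ? e : new Error(String(e));
      throw error;
    } finally {
      try {
        // stage 5: audit-exit, runs whether stage 3 or stage 4 threw or not
        const durationMs = Math.floor(deps.nowMs() - enterTs);
        await deps.sink.exit(Object.freeze({ tool, correlationId, durationMs, result, error }));
      } catch (exitError) {
        // Log and continue: a failing sink must not mask `error` or the result.
        deps.logger('[colibri] audit exit failed:', exitError);
      }
    }
  };
}
```

With a sink whose exit() always throws, a successful handler still yields the success envelope and a failing handler still surfaces its own error; the sink failure appears only in the log.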

4d. Error propagation table

Stage that throws | Outcome | What the client sees
--- | --- | ---
1 tool-lock | Impossible (acquisition-only) | n/a
2 schema-validate | Chain stops; stages 3-4 skipped; stage 5 runs (records the validation error) | { ok: false, error: { code: 'INVALID_PARAMS', message, details: { issues } } }
3 audit-enter | Chain stops; stage 4 skipped; stage 5 runs from outer finally (records the audit error) | { ok: false, error: { code: 'AUDIT_ENTER_FAILED', message } }
4 dispatch (handler) | Stage 5 runs from outer finally (records the handler error) | { ok: false, error: { code: 'HANDLER_ERROR', message, details?: { stack in non-prod } } }
5 audit-exit | Logged via ctx.logger; if stages 2-4 already produced an error, that error is what the client sees; otherwise the client sees a success envelope with a warning log on the server | Original stage-2-4 error if any; otherwise success

All errors are mapped to JSON-RPC error codes per s17 §6 + docs/spec/s05-errors.md before returning. In P0.2.1 the mapping is simple (only HANDLER_ERROR and INVALID_PARAMS are reachable via server_ping); P0.3+ exercises the full table.
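
The mapping from thrown errors to the failure envelopes in the §4d table could be sketched as below. SchemaValidationError is named in §4c; AuditEnterError and toErrorEnvelope are hypothetical names introduced here. The JSON-RPC numeric codes from s17 §6 are not shown because that section is not reproduced in this contract.

```typescript
class SchemaValidationError extends Error {
  readonly issues: readonly unknown[];
  constructor(message: string, issues: readonly unknown[]) {
    super(message);
    this.name = 'SchemaValidationError';
    this.issues = issues;
  }
}

// Hypothetical marker class: the contract names the code AUDIT_ENTER_FAILED
// but not the error type that carries it.
class AuditEnterError extends Error {
  constructor(message: string) {
    super(message);
    this.name = 'AuditEnterError';
  }
}

interface ErrorEnvelope {
  readonly ok: false;
  readonly error: {
    readonly code: string;
    readonly message: string;
    readonly details?: Record<string, unknown>;
  };
}

function toErrorEnvelope(err: unknown): ErrorEnvelope {
  if (err instanceof SchemaValidationError) {
    return { ok: false, error: { code: 'INVALID_PARAMS', message: err.message, details: { issues: err.issues } } };
  }
  if (err instanceof AuditEnterError) {
    return { ok: false, error: { code: 'AUDIT_ENTER_FAILED', message: err.message } };
  }
  // Anything else came out of the handler (stage 4), including non-Error throws.
  const message = err instanceof Error ? err.message : String(err);
  return { ok: false, error: { code: 'HANDLER_ERROR', message } };
}
```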

4e. Registration-time invariants

  • name MUST be a non-empty string matching /^[a-z_][a-z0-9_]*$/ (snake_case). The helper asserts this and throws Error('invalid tool name: <name>') otherwise.
  • config.inputSchema MUST be a z.ZodObject<any>. The helper’s type signature enforces this at compile time; at runtime a defensive instanceof z.ZodObject check throws Error('inputSchema must be a Zod object') for JavaScript callers (belt-and-braces).
  • Calling registerColibriTool with the same name twice throws Error('tool already registered: <name>'). This matches SDK behavior (the underlying McpServer.registerTool also throws on duplicate).
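
The name and duplicate checks above can be sketched in isolation. createRegistry is a hypothetical stand-in for the bookkeeping that registerColibriTool would do before delegating to the SDK; the inputSchema instanceof check is left out because it needs Zod.

```typescript
// snake_case tool names, as required by the first invariant.
const TOOL_NAME = /^[a-z_][a-z0-9_]*$/;

function createRegistry() {
  const names = new Set<string>();
  return {
    register(name: string): void {
      // The regex also rejects the empty string, since it needs one leading character.
      if (!TOOL_NAME.test(name)) throw new Error(`invalid tool name: ${name}`);
      if (names.has(name)) throw new Error(`tool already registered: ${name}`);
      names.add(name);
    },
    has(name: string): boolean {
      return names.has(name);
    },
  };
}
```

For example, register('server_ping') succeeds once, a second register('server_ping') throws, and register('server-ping') throws because of the dash.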

§5. AuditSink interface (the P0.7 seam)

export interface ToolEnterEvent {
  readonly tool: string;
  readonly args: unknown;        // post-validation, typed at the stage 2 boundary
  readonly timestamp: number;    // nowMs() at chain-enter; ms since process start or since epoch — see note
  readonly correlationId: string; // uuid v4
}

export interface ToolExitEvent {
  readonly tool: string;
  readonly correlationId: string;
  readonly durationMs: number;   // floor(nowMs() - enterTs)
  readonly result?: unknown;     // present on success
  readonly error?: Error;        // present on failure
}

export interface AuditSink {
  enter(event: ToolEnterEvent): Promise<void> | void;
  exit(event: ToolExitEvent): Promise<void> | void;
}

5a. Design rationale

  • enter and exit are called at stage 3 and stage 5 respectively.
  • Both methods return Promise<void> | void so a sync sink can omit the Promise; the chain always awaits, so sync sinks incur one microtask hop.
  • correlationId is generated by stage 3 (NOT stage 2 or stage 1) because the identifier should only exist for calls that validated. This matches middleware.md Stage 3 (“a call that never validated never enters the decision trail”). The correlationId is the join key that audit_exit uses to close the entry row.
  • timestamp semantics: performance.now() returns milliseconds since process start. This is fine for Phase 0 where the sink is process-local; P0.7 may need to switch to Date.now() for cross-session audit anchoring. The contract documents both options; the packet locks in performance.now() for P0.2.1 because it is monotonic and immune to system-clock skew. P0.7 will wrap both values into the audit row if needed.
  • args is UNKNOWN-typed, not a generic — the sink is tool-agnostic. The sink implementor (ζ) is responsible for stable-serializing and hashing.
  • error is typed Error (not unknown) so sinks can read .stack without narrowing. A non-Error throw (e.g. throw "string") reaches stage 5 wrapped by the chain as new Error(String(value)).
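
The wrapping rule in the last bullet amounts to a one-line normalizer. toError is a hypothetical helper name; the chain may inline the same expression.

```typescript
// Normalize anything that was thrown into an Error, so ToolExitEvent.error
// can be typed Error. Existing Error instances pass through unchanged.
function toError(value: unknown): Error {
  return value instanceof Error ? value : new Error(String(value));
}
```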

5b. createNoOpAuditSink()

export function createNoOpAuditSink(): AuditSink {
  return Object.freeze({
    enter(): void { /* no-op */ },
    exit(): void { /* no-op */ },
  });
}

Returns a frozen no-op sink. The implementation MAY return a module-scoped singleton, so multiple calls MAY return the same reference; this is an implementation detail, and tests MUST NOT assume referential identity across calls.

5c. T0 sign-off asked on

  • Field names (tool, args, timestamp, correlationId, durationMs, result, error) — any rename must propagate to P0.7.
  • Method names (enter, exit). Alternatives considered: onEnter/onExit (rejected: more verbose); open/close (rejected: overloaded with DB semantics).
  • Return type Promise<void> | void (alternative: always Promise<void>). Current choice lets sync sinks skip the async keyword; asymmetric async cost is negligible.

§6. start and main — boot sequence

6a. start(ctx) signature

export function start(ctx: ColibriServerContext): Promise<void>;

Returns when the transport is connected and the handshake has completed. Never resolves to a value.

6b. start() behavior

  1. Install global handlers (if not already installed):
    • process.on('unhandledRejection', (reason) => { ctx.logger('[colibri] unhandledRejection:', reason); process.exit(1); })
    • process.on('uncaughtException', (err) => { ctx.logger('[colibri] uncaughtException:', err); process.exit(1); })
    • Installation is idempotent (checks process.listenerCount('unhandledRejection') === 0 before installing) so tests that invoke start() multiple times do not pile up handlers.
  2. Log one line: ctx.logger('[colibri] starting in mode=', ctx.mode, 'version=', ctx.version).
  3. await Promise.race([ ctx.server.connect(ctx.transport), new Promise((_, reject) => setTimeout(() => reject(new Error('startup timeout exceeded')), config.COLIBRI_STARTUP_TIMEOUT_MS)) ]). On timeout, exit with code 75 (resource) per boot.md §“Startup Timeout”.
  4. Log one line: ctx.logger('[colibri] ready').
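
Steps 1 and 3 can be sketched independently of the SDK. withTimeout, ProcessLike and installGlobalHandlers are hypothetical names. ProcessLike is a minimal slice of the process API so a test can pass a fake instead of the real process object. Translating the timeout rejection into exit code 75 is left to the caller and not shown.

```typescript
// Rejects if `work` does not settle within `ms`; clears the timer either way
// so a fast connect does not leave a pending timeout behind.
function withTimeout(work: Promise<void>, ms: number, message: string): Promise<void> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new Error(message)), ms);
  });
  return Promise.race([work, timeout]).finally(() => {
    if (timer !== undefined) clearTimeout(timer);
  });
}

// Minimal slice of the process API needed here (assumption for testability).
interface ProcessLike {
  listenerCount(event: string): number;
  on(event: string, listener: (arg: unknown) => void): unknown;
}

// Returns true if handlers were installed, false if some were already present.
function installGlobalHandlers(
  proc: ProcessLike,
  onFatal: (label: string, arg: unknown) => void,
): boolean {
  if (proc.listenerCount('unhandledRejection') > 0) return false;
  proc.on('unhandledRejection', (reason) => onFatal('unhandledRejection', reason));
  proc.on('uncaughtException', (err) => onFatal('uncaughtException', err));
  return true;
}
```

Unlike the bare Promise.race in step 3, this sketch clears the timer once the race settles. That is a design choice of the sketch, not something the contract requires.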

6c. stop(ctx) signature and behavior

export function stop(ctx: ColibriServerContext): Promise<void>;

Calls ctx.server.close(). No global-handler cleanup (Node’s own process shutdown drains them). Returns when the transport is closed. Tests use this for teardown.

6d. main() — the entry-point IIFE

Runs only when the module is executed directly (via node dist/server.js), not when imported:

if (import.meta.url === pathToFileURL(process.argv[1] ?? '').href) {
  await main();
}

async function main(): Promise<void> {
  const ctx = createServer();
  try {
    await start(ctx);
    // In P0.2.1 there are no domain handlers to load; start() is the whole lifecycle.
    // Process stays alive because stdio is open.
  } catch (err) {
    ctx.logger('[colibri] fatal:', err);
    process.exit(1);
  }
}

Not strictly tested at the unit level (entry-point IIFEs are notoriously hard to unit-test without spawnSync). §8d specifies the coverage boundary.


§7. server_ping tool (the one registered tool)

7a. Name

server_ping (snake_case, no slash). Matches ADR-004 + S17. The task-breakdown + task-prompt text server/ping is treated as heritage draft and overridden (audit §7b).

7b. Input schema

const pingInput = z.object({});

No parameters. Stage 2 parses {} on every call.

7c. Output schema

Two levels:

  • Handler return (the value handler() produces): { version: string; mode: RuntimeMode; uptime_ms: number }. Zod schema z.object({ version: z.string(), mode: z.enum(['FULL','READONLY','TEST','MINIMAL']), uptime_ms: z.number().int().nonnegative() }).
  • Wire envelope (what the client sees): { ok: true, data: <handler return> } per s17 §6.

Failure envelope (if the handler somehow fails — should be unreachable for server_ping in P0.2.1): { ok: false, error: { code: 'HANDLER_ERROR', message } }.

7d. Handler implementation

async (_args) => ({
  version: ctx.version,
  mode: ctx.mode,
  uptime_ms: Math.floor(ctx.nowMs() - ctx.bootStartMs),
});

No I/O, no async work, deterministic given ctx. nowMs is injected for test determinism.
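
The determinism claim can be illustrated with an injected clock. PingContext and makePingHandler are hypothetical names that model only the four context fields the handler reads.

```typescript
type RuntimeMode = 'FULL' | 'READONLY' | 'TEST' | 'MINIMAL';

// Hypothetical slice of ColibriServerContext: just what the ping handler reads.
interface PingContext {
  readonly version: string;
  readonly mode: RuntimeMode;
  readonly nowMs: () => number;
  readonly bootStartMs: number;
}

function makePingHandler(ctx: PingContext) {
  return async () => ({
    version: ctx.version,
    mode: ctx.mode,
    // Whole milliseconds since boot, rounded down.
    uptime_ms: Math.floor(ctx.nowMs() - ctx.bootStartMs),
  });
}

// With a fixed clock the result is fully determined:
// nowMs() = 1500.75 and bootStartMs = 1000.25 give uptime_ms = 500.
const examplePing = makePingHandler({
  version: '9.9.9-test',
  mode: 'TEST',
  nowMs: () => 1500.75,
  bootStartMs: 1000.25,
});
```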

7e. Response-time invariant

Per docs/reference/mcp-tools-phase-0.md (the server_ping row implicit in ADR-004 Phase-0 inventory): handler completes in under 100 ms end-to-end in FULL mode on a typical dev machine. Not asserted in tests (timing-sensitive), but the handler does no I/O so this is guaranteed by construction.


§8. Test contract (WHAT must be covered, not HOW)

The test file is src/__tests__/server.test.ts (per audit §2e). It MUST cover:

8a. createServer unit tests

  • T-1 createServer() with all defaults returns a context with version matching package.json#version.
  • T-2 createServer({ version: '9.9.9-test' }) overrides the version.
  • T-3 createServer({ mode: 'READONLY' }) overrides the mode (bypasses detectMode).
  • T-4 createServer() does NOT connect the transport (inject a fake transport and assert its start() was never called).
  • T-5 createServer() does NOT install unhandledRejection/uncaughtException handlers (count listeners before/after).
  • T-6 createServer() reads config without throwing given a valid env (spawns tsx subprocess OR uses the already-loaded config).
  • T-7 createServer({ auditSink: customSink }) wires the custom sink (observable by registering a tool and triggering enter/exit).

8b. registerColibriTool unit tests

  • T-8 registering server_ping succeeds while ctx.server.isConnected() === false; the tool is retrievable via ctx.server._registeredTools['server_ping'] (or equivalent SDK introspection).
  • T-9 registering the same name twice throws.
  • T-10 registering with an invalid name ('server-ping' with a dash) throws.
  • T-11 registering with a non-Zod-object inputSchema throws.

8c. 5-stage middleware tests (end-to-end, via the SDK’s in-memory transport)

The SDK ships an InMemoryTransport (in node_modules/@modelcontextprotocol/sdk/dist/esm/inMemory.js) — NOT StdioServerTransport. Tests pass a pair of linked InMemoryTransport instances via createServer({ transport: serverHalf }) and client.connect(clientHalf), then use the client to call tools. This exercises the full chain without real stdio.

  • T-12 calling server_ping returns { ok: true, data: { version, mode, uptime_ms } } with all three fields present and correctly typed.
  • T-13 uptime_ms is non-negative.
  • T-14 a custom AuditSink observes exactly one enter and one exit per successful call, with matching correlationId + tool='server_ping' + a non-negative integer durationMs (floor can yield 0 for a fast call).
  • T-15 a custom AuditSink sees exit.result is defined and exit.error is undefined on success.
  • T-16 calling a handler that throws causes exit.error to be the thrown Error and exit.result to be undefined.
  • T-17 calling server_ping twice concurrently (via Promise.all) serializes: the sink observes enter, exit, enter, exit, and the second call’s enter never precedes the first call’s exit.
  • T-18 a handler with a deliberately-failing Zod schema (register a test-only tool with z.object({ required: z.string() }) and call with {}) yields { ok: false, error: { code: 'INVALID_PARAMS' } } and the enter/exit sink sees exactly one pair (exit records the error).
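
Tests T-14 through T-18 all need a sink that records what it saw. One possible test double is sketched below. createRecordingSink and isStrictlySerialized are hypothetical names, and the event interfaces repeat §5 so the block stands alone.

```typescript
interface ToolEnterEvent { readonly tool: string; readonly args: unknown; readonly timestamp: number; readonly correlationId: string; }
interface ToolExitEvent { readonly tool: string; readonly correlationId: string; readonly durationMs: number; readonly result?: unknown; readonly error?: Error; }
interface AuditSink {
  enter(event: ToolEnterEvent): Promise<void> | void;
  exit(event: ToolExitEvent): Promise<void> | void;
}

type Recorded =
  | { readonly kind: 'enter'; readonly event: ToolEnterEvent }
  | { readonly kind: 'exit'; readonly event: ToolExitEvent };

// Test double: records every event in arrival order.
function createRecordingSink(): { sink: AuditSink; events: Recorded[] } {
  const events: Recorded[] = [];
  const sink: AuditSink = {
    enter(event) { events.push({ kind: 'enter', event }); },
    exit(event) { events.push({ kind: 'exit', event }); },
  };
  return { sink, events };
}

// True when the log is a sequence of (enter, exit) pairs with matching
// correlationId, i.e. no call entered before the previous one exited.
function isStrictlySerialized(events: readonly Recorded[]): boolean {
  if (events.length % 2 !== 0) return false;
  for (let i = 0; i < events.length; i += 2) {
    const first = events[i];
    const second = events[i + 1];
    if (first.kind !== 'enter' || second.kind !== 'exit') return false;
    if (first.event.correlationId !== second.event.correlationId) return false;
  }
  return true;
}
```

A test would pass sink to createServer({ auditSink: sink }), drive calls through the client, and then assert on events.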

8d. Boot-sequence tests

  • T-19 start(ctx) with an injected stub transport resolves once transport.start() has been invoked.
  • T-20 start(ctx) rejects with a timeout error if transport.connect() never resolves (override COLIBRI_STARTUP_TIMEOUT_MS via a test-local createServer({ ... }) override OR by using a very short fake timeout).
  • T-21 start(ctx) installs unhandledRejection + uncaughtException listeners (assert count incremented by at least 1 for each).
  • T-22 start(ctx) is idempotent w.r.t. listener installation — calling start twice does not double-register.
  • T-23 stop(ctx) calls transport.close().

8e. Negative / regression tests

  • T-24 createServer({}) does NOT override process.stdout.write. Assertion: process.stdout.write === <captured-before>. (Donor bug #3.)
  • T-25 src/server.ts imports no HTTP/WebSocket SDK module. Assertion: grep the module source (or inspect import graph).
  • T-26 createServer() propagates a detectMode throw (e.g. AMS_MODE set) when called without an override — uses a tsx subprocess because src/config.ts eagerly reads env at module load.

8f. Coverage invariant

  • src/server.ts reaches 100% statement / function / line and ≥90% branch coverage.
  • src/config.ts + src/modes.ts coverage unchanged.

Total new tests target: 26 tests added. Combined with 15 (config) + 24 (modes) + 1 (smoke) = 66 tests total. Packet §2 finalizes the exact test names.


§9. Integration points — stability

After P0.2.1 lands, the following are the stable API:

  • createServer(options?) — signature frozen for Phase 0; adding optional fields to CreateServerOptions is non-breaking.
  • registerColibriTool(ctx, name, config, handler) — signature frozen for Phase 0; P0.3+ domain tools call this.
  • AuditSink — interface frozen for Phase 0; P0.7 implements it.

The following are internal and may change without notice:

  • The tool-lock Map shape (currently Map<string, Promise<void>>).
  • The 5-stage composition implementation (may factor out into src/middleware/*.ts per P0.2.4).
  • The exact wire format of JSON-RPC error codes (locked by s17 §6 eventually).

§10. Files touched

Exhaustive list (new files + modifications):

New files:

  • src/server.ts — the server bootstrap module. ~350-450 lines.
  • src/__tests__/server.test.ts — test file. ~500-650 lines.

Modified files:

  • src/index.ts: remains as export {}; (no change). package.json#main points at dist/server.js, so src/index.ts serves no purpose after P0.2.1, but deleting it risks confusing future contributors who expect an index. Contract decision: keep it as a placeholder and leave removal to a dedicated cleanup task.

NOT modified in P0.2.1:

  • package.json — no new deps (SDK already in).
  • tsconfig.json — existing config suffices.
  • jest.config.ts — existing config suffices.
  • .eslintrc.json — existing override for .test.ts suffices.
  • .env.example — no new env vars in P0.2.1.
  • Any of src/config.ts, src/modes.ts — read only.

§11. Out-of-scope — EXPLICIT LIST

P0.2.1 does NOT ship any of these (deferred to cited downstream tasks):

  • SQLite DB open or migration (P0.2.2).
  • Two-phase startup splitting start() into transport + heavy-init (P0.2.3).
  • Health tool server_health — different task (P0.2.4 per task-breakdown, or P0.4.2 per modes.md).
  • server_info, server_shutdown tools (P0.2.4+ / γ lifecycle). R75 Wave H update: neither tool shipped. server_info was struck as a phantom; server_shutdown was deferred. Capability reporting is folded into server_health; process teardown is SIGTERM/SIGINT-level.
  • Middleware layer extraction into src/middleware/*.ts (P0.2.4).
  • Any β task tool (task_create, etc.) (P0.3).
  • Any ε skill tool (skill_list) (P0.6).
  • Any ζ thought tool (thought_record, etc.) (P0.7).
  • Any η merkle tool (P0.8).
  • Any ν integration (P0.9).
  • Real ζ AuditSink implementation (P0.7).
  • HTTP transport (NEVER in Phase 0 per s17 §2).
  • WebSocket transport (NEVER in Phase 0 per s17 §2).
  • Multi-model routing / δ (Phase 1.5 per ADR-005).
  • Capability-gating at tool-lock (pending T0 decision in §2 — either this task adds it or a later task does).
  • Rate limiting, ACL, circuit breaker, retry (per middleware.md appendix — earned by later concepts).

§12. Contract acceptance criteria (the checklist Step 5 verify cites)

  • §1 — exports exactly the 11 symbols listed, no more.
  • §2 — tool-lock semantics reading is pure-mutex (pending T0 sign-off).
  • §3 — createServer({}) succeeds with defaults; every option is injectable.
  • §4 — 5 stages run in canonical order; error propagation table holds.
  • §5 — AuditSink interface matches the exact shape in §5.
  • §6 — start() installs global handlers idempotently; stop() closes the transport.
  • §7 — server_ping returns the { ok: true, data: { version, mode, uptime_ms } } envelope.
  • §8: all 26 tests (T-1 through T-26) pass.
  • §8f — coverage invariant holds.
  • §10 — only src/server.ts + src/__tests__/server.test.ts are new; src/index.ts unchanged.
  • §11 — no out-of-scope code snuck in.
  • CI docs-check + build-test-lint on Node 20 is green.

§13. Risks (surfaced to the packet)

  • R-1 package.json resolution under ESM + ts-jest. path.resolve(fileURLToPath(import.meta.url), '..', '..') behaves differently at test time (ts-jest transforms .ts sources) vs run time (compiled dist/server.js). The packet §1 locks a resolution idiom that works in both.
  • R-2 SDK registerTool type ergonomics. The SDK’s registerTool<OutputArgs, InputArgs> uses ZodRawShapeCompat not ZodObject. The helper signature may need to unwrap ZodObject.shape at the boundary. Packet §1d specifies the exact adapter code.
  • R-3 In-memory transport availability. If @modelcontextprotocol/sdk/inMemory.js does not expose what we need, tests may fall back to calling the wrapped handlers directly (bypassing the SDK’s JSON-RPC layer). Packet §2 specifies the fallback.
  • R-4 Coverage threshold. Some error branches (e.g. Failed to read package.json) are hard to trigger without mocking fs. Packet §3 specifies the fs-injection strategy (either via CreateServerOptions.readPackageJson?: () => string or via jest.mock).
  • R-5 Global handlers & Jest. Jest installs its own handlers. If start() installs ours on top, a test-level crash is now handled by OUR listener which calls process.exit(1) — which kills the test runner. Packet §2d specifies that test-time start() invocations override via CreateServerOptions.installGlobalHandlers?: boolean (defaulted to true).

§14. Open questions deferred to the packet

  • Q-1 (§2): T0’s reading of tool-lock stage — pure mutex vs capability-gate?
  • Q-2 (§4c): if auditSink.exit() throws, do we re-throw or just log? Contract proposes log-and-continue; middleware.md says hard-stop. P0.2.1 ships log-and-continue (the no-op sink cannot throw anyway); P0.7 may revisit.
  • Q-3 (§6d): does main() get unit-test coverage? Contract defers to packet; likely answer is “covered via spawnSync tsx subprocess integration test” or “excluded via coverage pragma”.
  • Q-4 (§10): keep or delete src/index.ts? Contract says keep.
  • Q-5 (§8c): use the SDK’s InMemoryTransport or fake streams via PassThrough? Contract leaves this to the packet to decide based on what exactly the SDK exports.

P0.2.1 Step 2 Contract — 2026-04-16 — branch feature/p0-2-1-mcp-server. Audit: a7305e43. Gates Step 3 Packet.



Colibri — documentation-first MCP runtime. Apache 2.0 + Commons Clause.
