Audit — debug-startup-smoke-flake

Step 1 of the 5-step executor chain. Root-causes the flaky test startup — subprocess smoke › tsx src/server.ts boots and logs [Startup] Phase 1 in src/__tests__/startup.test.ts.

Scope

A single Jest test case — src/__tests__/startup.test.ts:767 (describe 'startup — subprocess smoke') — that fails intermittently in full-suite runs (Jest default parallel --maxWorkers) but passes deterministically in isolation (npm test -- --testPathPattern=startup).

History

Source Quote
MEMORY.md (R75 Wave H carry-over) “Pre-existing startup — subprocess smoke flakiness under full-suite load — predates Wave H.”
MEMORY.md (R77) “all 4 R77 executors hit it once, always green on rerun.”
Round R83 (this round) “Every wave-1, wave-2, wave-3 executor in this round (9) hit it the same way.”

The flake fires intermittently — typically once per cold-cache full-suite run, never on rerun. Every prior round has worked around it by re-running the suite rather than fixing it.

The test

src/__tests__/startup.test.ts:766–806

describe('startup — subprocess smoke', () => {
  it('tsx src/server.ts boots and logs [Startup] Phase 1', () => {
    /* … env setup … */
    const result = spawnSync(
      process.execPath,
      ['--import', 'tsx', SERVER_MODULE_FS_PATH],
      {
        env: { /* … */ },
        encoding: 'utf8',
        timeout: 3000,                   // ← spawnSync hard timeout
        input: '',
      },
    );
    const stderr = result.stderr ?? '';
    expect(stderr).toMatch(/\[colibri\] starting/);
    expect(stderr).toMatch(/\[Startup\] Phase 1/);
  }, 15000);                              // ← Jest test timeout
});

Two timeouts coexist:

  1. Outer (Jest): 15000 ms.
  2. Inner (spawnSync): 3000 ms.

The inner is the binding constraint — when it fires, spawnSync SIGTERMs the child and returns whatever stderr was buffered up to that moment. If [Startup] Phase 1 had not yet been emitted, the second expect(...).toMatch fails.

What the subprocess does

process.execPath --import tsx <abs path to src/server.ts> triggers:

  1. Node spawn (cold process): ~50–150 ms on Windows.
  2. tsx loader registers itself as an ES-module loader hook: ~200–500 ms cold; ~50 ms warm in the same shell.
  3. TypeScript compilation of src/server.ts and its transitive imports — src/config.ts, src/startup.ts, src/db/*, src/domains/*, plus the MCP SDK + better-sqlite3 + zod etc. tsx uses esbuild but still parses every imported .ts. Cold: ~800–2500 ms.
  4. module evaluation: configures the MCP server, registers tools.
  5. isInvokedAsScript() IIFE at src/server.ts:596 dynamically imports ./startup.js.
  6. startup({ bootstrapFn, stopFn }) runs Phase 1 — emits [Startup] Phase 1: transport... (the regex target).

Steps 1–4 are pure cold-start cost. The test cannot observe [Startup] Phase 1 until step 6, which means the entire 1–6 chain must complete inside the 3000 ms spawnSync budget.

Cold-start measurement

Standalone bench (no Jest, no parallel workers, just node --import tsx src/server.ts < /dev/null with NODE_ENV=test + a unique COLIBRI_DB_PATH), measured from process spawn to the appearance of [Startup] Phase 1: transport... in stderr. 5 sequential runs on the host machine (Windows 10, Node 22.20.0, Opus 4.7 development sandbox):

Run Elapsed to [Startup] Phase 1
1 (cold) 4205 ms
2 3055 ms
3 1439 ms
4 1799 ms
5 1864 ms

Run 1 alone exceeds the 3000 ms test timeout by 40 %. Run 2 also exceeds it. Even a “warm” run hits 1.4–3 s, so any extra system load (other Jest workers compiling TS, spawning subprocesses, opening SQLite WAL files in parallel) reliably tips the test over.

10-run full-suite reproduction

After rm -rf node_modules/.cache (defensive cold-cache reset) and 10 sequential CI=true FORCE_COLOR=0 npm test runs:

Run Result
1 1357 passed
2 1 failed, 1356 passed
3 1357 passed
4 1357 passed
5 1357 passed
6 1357 passed
7 1357 passed
8 1357 passed
9 1357 passed
10 1357 passed

1/10 = 10 % failure rate observed in this audit pass. (An earlier 3-run cold pass also reproduced 1/3 — the failure rate fluctuates with first-run conditions.)

The single failed test is always the same one: startup — subprocess smoke › tsx src/server.ts boots and logs [Startup] Phase 1. The failure mode is always the same: expect(stderr).toMatch(/\[Startup\] Phase 1/) fails because the subprocess was SIGTERMed by spawnSync’s 3000 ms timeout before reaching that log line.

Failure-mode signature

Captured during the failing run (Jest output stripped of color escapes, lines collapsed):

FAIL src/__tests__/startup.test.ts (24.26 s)
  ● startup — subprocess smoke › tsx src/server.ts boots and logs [Startup] Phase 1
     803 |     // And now Phase 1 log from P0.2.3.
     804 |     expect(stderr).toMatch(/\[Startup\] Phase 1/);
Tests:       1 failed, 1356 passed, 1357 total
Time:        38.425 s

Total wall time 38.425 s — significantly higher than typical warm runs (~21–26 s). The first-run penalty across 7 ts-jest workers + the tsx cold-start in the smoke subprocess + Windows Defender real-time scanning the spawned node.exe adds up. The first expect(stderr).toMatch(/\[colibri\] starting/) apparently also fails when the subprocess gets SIGTERMed before any logs are emitted — but the second expect(...) is the more typical failure when the child was making progress but had not yet reached [Startup] Phase 1.

Why isolation does not reproduce

npm test -- --testPathPattern=startup: 1 worker, 1 test file, no contention. npm test: 7 workers (default Jest maxWorkers=cpu/2 - 1 ≈ 7 on the dev box), 26 test suites, all racing the same disk + CPU.

Cold-start cost of tsx + ts-node-style transformers + better-sqlite3 native bindings + MCP SDK lands well below 3 s in isolation. Under contention, it doesn’t.

Hypothesis (root cause)

The timeout: 3000 literal at src/__tests__/startup.test.ts:796 is too tight for the cold-start cost of a tsx-loaded TypeScript subprocess on Windows when the parent Jest run is paralleling 7 workers. The flake is purely a timing race; nothing in src/startup.ts or src/server.ts is broken. Cold-start measurement (4205 ms) on its own confirms the budget is undersized even without parallel load.

What is NOT broken

Per the task constraints, the audit re-checked these and found no production bug:

  • src/server.tsisInvokedAsScript() works correctly; the IIFE at :596–607 invokes startup() as designed. The first log line emitted is [Startup] Phase 1: transport... from startup.ts:225. No race exists inside the boot path; the race is between boot completion and the test’s externally-imposed deadline.
  • src/startup.ts — Phase 1 / Phase 2 ordering, signal handlers, __resetForTests are all hermetic. None of the 39 other tests in the same file flake.
  • src/db/index.ts — initDb runs in Phase 2, well after the log line the test asserts on. DB I/O is not on the critical path.

Conclusion: this is a test-side flake, not a production bug. Fix is in the test, not the production code.

Candidate fixes (evaluated)

The task prompt offered five candidates. Per-candidate evaluation:

  1. Increase the spawnSync timeoutdirect fix, minimum risk. A typed constant (e.g. STARTUP_SUBPROCESS_TIMEOUT_MS = 30_000) gives ample headroom for cold tsx + Windows AV scanning + 7-worker contention. Survives a 10× cold-start regression. Selected.
  2. Buffer stdout/stderr capture differently — irrelevant. spawnSync collects both streams in full when the child exits or is killed; the issue is the child being killed before writing the log, not the parent failing to read it.
  3. --runInBand for this suite via Jest config or annotation — heavy hammer. Forces the entire startup.test.ts file (40 tests) to run serially even in CI, slowing the pipeline for the sake of one test. Rejected unless the timeout fix proves insufficient.
  4. Eliminate the subprocess test — rejected by task constraints. “DO NOT remove subprocess testing from the test surface entirely.”
  5. Re-architect to await a stable signal[Startup] Phase 1: transport... is the stable signal; the issue is reaching it within budget, not detecting it. Rejected.

Selected: Candidate 1 (increase the timeout, cite the rationale in a comment, name it as a typed constant). Optional belt-and-suspenders: also widen the Jest test timeout from 15000 → 35000 ms so the inner timeout stays the binding constraint with margin.

Implementation hint forwarded to the packet

  • Replace timeout: 3000 with a typed constant STARTUP_SUBPROCESS_TIMEOUT_MS = 30_000.
  • Bump the Jest test timeout from 15000 to STARTUP_SUBPROCESS_TIMEOUT_MS + 5_000 = 35_000.
  • Add a comment naming the cold-start budget so future readers understand why 30 s.
  • Improve the diagnostic on failure: include the full stderr and the result.signal/result.error in the assertion failure so future flakes (e.g. if tsx cold-start one day exceeds 30 s) are easy to debug.

No production code changes. No other test changes.

References

  • src/__tests__/startup.test.ts:766–806 — the test under audit.
  • src/__tests__/startup.test.ts:48SERVER_MODULE_FS_PATH resolution.
  • src/server.ts:454[colibri] starting log line emission.
  • src/server.ts:596–607 — script-invocation IIFE.
  • src/startup.ts:225[Startup] Phase 1: transport... log line emission.
  • MEMORY.md (R75 Wave H, R77, R83) — historical sightings.

Back to top

Colibri — documentation-first MCP runtime. Apache 2.0 + Commons Clause.

This site uses Just the Docs, a documentation theme for Jekyll.