Audit — debug-startup-smoke-flake
Step 1 of the 5-step executor chain. Root-causes the flaky test
startup — subprocess smoke › tsx src/server.ts boots and logs [Startup] Phase 1insrc/__tests__/startup.test.ts.
Scope
A single Jest test case — src/__tests__/startup.test.ts:767 (describe 'startup — subprocess smoke') — that fails intermittently in full-suite runs (Jest default parallel --maxWorkers) but passes deterministically in isolation (npm test -- --testPathPattern=startup).
History
| Source | Quote |
|---|---|
MEMORY.md (R75 Wave H carry-over) |
“Pre-existing startup — subprocess smoke flakiness under full-suite load — predates Wave H.” |
MEMORY.md (R77) |
“all 4 R77 executors hit it once, always green on rerun.” |
| Round R83 (this round) | “Every wave-1, wave-2, wave-3 executor in this round (9) hit it the same way.” |
The flake fires intermittently — typically once per cold-cache full-suite run, never on rerun. Every prior round has worked around it by re-running the suite rather than fixing it.
The test
src/__tests__/startup.test.ts:766–806
describe('startup — subprocess smoke', () => {
it('tsx src/server.ts boots and logs [Startup] Phase 1', () => {
/* … env setup … */
const result = spawnSync(
process.execPath,
['--import', 'tsx', SERVER_MODULE_FS_PATH],
{
env: { /* … */ },
encoding: 'utf8',
timeout: 3000, // ← spawnSync hard timeout
input: '',
},
);
const stderr = result.stderr ?? '';
expect(stderr).toMatch(/\[colibri\] starting/);
expect(stderr).toMatch(/\[Startup\] Phase 1/);
}, 15000); // ← Jest test timeout
});
Two timeouts coexist:
- Outer (Jest):
15000ms. - Inner (spawnSync):
3000ms.
The inner is the binding constraint — when it fires, spawnSync SIGTERMs the child and returns whatever stderr was buffered up to that moment. If [Startup] Phase 1 had not yet been emitted, the second expect(...).toMatch fails.
What the subprocess does
process.execPath --import tsx <abs path to src/server.ts> triggers:
- Node spawn (cold process): ~50–150 ms on Windows.
- tsx loader registers itself as an ES-module loader hook: ~200–500 ms cold; ~50 ms warm in the same shell.
- TypeScript compilation of
src/server.tsand its transitive imports —src/config.ts,src/startup.ts,src/db/*,src/domains/*, plus the MCP SDK +better-sqlite3+zodetc. tsx uses esbuild but still parses every imported.ts. Cold: ~800–2500 ms. - module evaluation: configures the MCP server, registers tools.
isInvokedAsScript()IIFE atsrc/server.ts:596dynamically imports./startup.js.startup({ bootstrapFn, stopFn })runs Phase 1 — emits[Startup] Phase 1: transport...(the regex target).
Steps 1–4 are pure cold-start cost. The test cannot observe [Startup] Phase 1 until step 6, which means the entire 1–6 chain must complete inside the 3000 ms spawnSync budget.
Cold-start measurement
Standalone bench (no Jest, no parallel workers, just node --import tsx src/server.ts < /dev/null with NODE_ENV=test + a unique COLIBRI_DB_PATH), measured from process spawn to the appearance of [Startup] Phase 1: transport... in stderr. 5 sequential runs on the host machine (Windows 10, Node 22.20.0, Opus 4.7 development sandbox):
| Run | Elapsed to [Startup] Phase 1 |
|---|---|
| 1 (cold) | 4205 ms |
| 2 | 3055 ms |
| 3 | 1439 ms |
| 4 | 1799 ms |
| 5 | 1864 ms |
Run 1 alone exceeds the 3000 ms test timeout by 40 %. Run 2 also exceeds it. Even a “warm” run hits 1.4–3 s, so any extra system load (other Jest workers compiling TS, spawning subprocesses, opening SQLite WAL files in parallel) reliably tips the test over.
10-run full-suite reproduction
After rm -rf node_modules/.cache (defensive cold-cache reset) and 10 sequential CI=true FORCE_COLOR=0 npm test runs:
| Run | Result |
|---|---|
| 1 | 1357 passed |
| 2 | 1 failed, 1356 passed |
| 3 | 1357 passed |
| 4 | 1357 passed |
| 5 | 1357 passed |
| 6 | 1357 passed |
| 7 | 1357 passed |
| 8 | 1357 passed |
| 9 | 1357 passed |
| 10 | 1357 passed |
1/10 = 10 % failure rate observed in this audit pass. (An earlier 3-run cold pass also reproduced 1/3 — the failure rate fluctuates with first-run conditions.)
The single failed test is always the same one: startup — subprocess smoke › tsx src/server.ts boots and logs [Startup] Phase 1. The failure mode is always the same: expect(stderr).toMatch(/\[Startup\] Phase 1/) fails because the subprocess was SIGTERMed by spawnSync’s 3000 ms timeout before reaching that log line.
Failure-mode signature
Captured during the failing run (Jest output stripped of color escapes, lines collapsed):
FAIL src/__tests__/startup.test.ts (24.26 s)
● startup — subprocess smoke › tsx src/server.ts boots and logs [Startup] Phase 1
803 | // And now Phase 1 log from P0.2.3.
804 | expect(stderr).toMatch(/\[Startup\] Phase 1/);
Tests: 1 failed, 1356 passed, 1357 total
Time: 38.425 s
Total wall time 38.425 s — significantly higher than typical warm runs (~21–26 s). The first-run penalty across 7 ts-jest workers + the tsx cold-start in the smoke subprocess + Windows Defender real-time scanning the spawned node.exe adds up. The first expect(stderr).toMatch(/\[colibri\] starting/) apparently also fails when the subprocess gets SIGTERMed before any logs are emitted — but the second expect(...) is the more typical failure when the child was making progress but had not yet reached [Startup] Phase 1.
Why isolation does not reproduce
npm test -- --testPathPattern=startup: 1 worker, 1 test file, no contention.
npm test: 7 workers (default Jest maxWorkers=cpu/2 - 1 ≈ 7 on the dev box), 26 test suites, all racing the same disk + CPU.
Cold-start cost of tsx + ts-node-style transformers + better-sqlite3 native bindings + MCP SDK lands well below 3 s in isolation. Under contention, it doesn’t.
Hypothesis (root cause)
The timeout: 3000 literal at src/__tests__/startup.test.ts:796 is too tight for the cold-start cost of a tsx-loaded TypeScript subprocess on Windows when the parent Jest run is paralleling 7 workers. The flake is purely a timing race; nothing in src/startup.ts or src/server.ts is broken. Cold-start measurement (4205 ms) on its own confirms the budget is undersized even without parallel load.
What is NOT broken
Per the task constraints, the audit re-checked these and found no production bug:
src/server.ts—isInvokedAsScript()works correctly; the IIFE at:596–607invokesstartup()as designed. The first log line emitted is[Startup] Phase 1: transport...fromstartup.ts:225. No race exists inside the boot path; the race is between boot completion and the test’s externally-imposed deadline.src/startup.ts— Phase 1 / Phase 2 ordering, signal handlers,__resetForTestsare all hermetic. None of the 39 other tests in the same file flake.src/db/index.ts— initDb runs in Phase 2, well after the log line the test asserts on. DB I/O is not on the critical path.
Conclusion: this is a test-side flake, not a production bug. Fix is in the test, not the production code.
Candidate fixes (evaluated)
The task prompt offered five candidates. Per-candidate evaluation:
- Increase the spawnSync timeout — direct fix, minimum risk. A typed constant (e.g.
STARTUP_SUBPROCESS_TIMEOUT_MS = 30_000) gives ample headroom for cold tsx + Windows AV scanning + 7-worker contention. Survives a 10× cold-start regression. Selected. - Buffer stdout/stderr capture differently — irrelevant.
spawnSynccollects both streams in full when the child exits or is killed; the issue is the child being killed before writing the log, not the parent failing to read it. --runInBandfor this suite via Jest config or annotation — heavy hammer. Forces the entirestartup.test.tsfile (40 tests) to run serially even in CI, slowing the pipeline for the sake of one test. Rejected unless the timeout fix proves insufficient.- Eliminate the subprocess test — rejected by task constraints. “DO NOT remove subprocess testing from the test surface entirely.”
- Re-architect to await a stable signal —
[Startup] Phase 1: transport...is the stable signal; the issue is reaching it within budget, not detecting it. Rejected.
Selected: Candidate 1 (increase the timeout, cite the rationale in a comment, name it as a typed constant). Optional belt-and-suspenders: also widen the Jest test timeout from 15000 → 35000 ms so the inner timeout stays the binding constraint with margin.
Implementation hint forwarded to the packet
- Replace
timeout: 3000with a typed constantSTARTUP_SUBPROCESS_TIMEOUT_MS = 30_000. - Bump the Jest test timeout from
15000toSTARTUP_SUBPROCESS_TIMEOUT_MS + 5_000 = 35_000. - Add a comment naming the cold-start budget so future readers understand why 30 s.
- Improve the diagnostic on failure: include the full
stderrand theresult.signal/result.errorin the assertion failure so future flakes (e.g. iftsxcold-start one day exceeds 30 s) are easy to debug.
No production code changes. No other test changes.
References
src/__tests__/startup.test.ts:766–806— the test under audit.src/__tests__/startup.test.ts:48—SERVER_MODULE_FS_PATHresolution.src/server.ts:454—[colibri] startinglog line emission.src/server.ts:596–607— script-invocation IIFE.src/startup.ts:225—[Startup] Phase 1: transport...log line emission.MEMORY.md(R75 Wave H, R77, R83) — historical sightings.