P0.6.2 Skills-Loader ESM Deadlock — Contract
Task: 109b31c3-6419-4019-a14a-2fb1fb26f355
Audit: docs/audits/p0-6-2-skills-deadlock-audit.md
ζ thought (analysis): captured after audit; chained off 4b95bb8b…cec3117 (plan).
1. Scope
This contract governs the patch that closes the deadlock surfaced by the audit. In scope: changes to src/server.ts (script-invocation site) and one new regression test. Out of scope: restructuring the server.ts ↔ repository.ts static cycle; deletion of startup.ts’s dynamic-import fallback; refactoring of any other domain’s tool-registration shape; doc-loop or skill-pack changes.
2. Invariants the fix MUST hold
I-1. Production-test parity for the skill-load code path
The production startup path (script invocation of dist/server.js) MUST execute the same loadSkillsFromDisk function the test suite already exercises. No production-only branch may stand between the script-invocation site and the loader function.
Why: the deadlock survived two months because production took a different code path (dynamic import) than tests (option injection). Re-introducing any production-only branch reopens that gap.
I-2. No ESM dynamic imports during script-invocation top-level await
startup() is called from a top-level await at server.ts:606. While that await is suspended, import.meta’s module evaluation is still in progress. No await import(...) inside the call graph reachable from that await is permitted to target a module that participates in the server.ts ↔ <domain>/repository.ts static cycle.
The fix MAY leave dynamic imports of side-cycle modules (e.g. modules with no static back-edge to server.ts) — those resolve from cache without re-entering the suspended loader.
I-3. The options.loadSkillsFn seam stays first-class
startup.ts:111 (options.loadSkillsFn) MUST remain the canonical injection point — for tests AND for the production script-invocation site. Tests already use it; the fix repurposes it for production. Removing or renaming the seam breaks both.
I-4. The dynamic-import fallback at startup.ts:298 MAY remain as defense-in-depth
The fallback (?? (await import(...)).loadSkillsFromDisk) becomes unreachable from the script-invocation site once the fix lands. It MAY be left in place to support hypothetical future callers of startup() that don’t pre-resolve the loader — at the cost of one dead branch in coverage. The fix MUST NOT depend on the fallback for production correctness.
I-5. No new public API surface
The fix adds zero new exported names from src/server.ts, src/startup.ts, src/domains/skills/repository.ts. It does not add new env vars, new MCP tools, new config keys, new error codes. The options.loadSkillsFn shape (signature, error handling) is unchanged.
I-6. Idempotent on repeat boots
Running the fixed dist/server.js twice against the same data/colibri.db MUST produce the same skill_count both times (no duplicate inserts; loadSkillsFromDisk already handles upsert + prune semantics — the fix MUST NOT alter that).
3. Acceptance criteria
A correct patch satisfies ALL of:
- A-1.
tscclean against the patched source. - A-2. ESLint clean (
npm run lint). - A-3. Full Jest suite passes —
npm test. No new failures, no regressions. -
A-4. Standalone smoke test from repo root:
COLIBRI_DB_PATH=... COLIBRI_MODE=FULL NODE_ENV=production \ timeout 5 node dist/server.js < /dev/null 2>&1 | grep -E '\[Startup\] Skills:'prints
[Startup] Skills: loaded=23, skipped=0, pruned=0(or whatever count matches the on-disk corpus at the time) before timeout. Currently this line never appears. - A-5. Live MCP
mcp__colibri__server_healthreturnsskill_count > 0after a Claude Desktop relaunch with the patcheddist/. - A-6. Live MCP
mcp__colibri__skill_listreturnstotal_count > 0and a non-emptyskills[]array. - A-7. New regression test
src/__tests__/scripts/script-invocation-skills-load.test.ts(name negotiable) FAILS on the pre-fix code and PASSES on the fixed code. The test MUST exercise the production injection path — i.e. it MUST verify that the script-invocation site forwards a real (not undefined)loadSkillsFnto startup. A test that injectsloadSkillsFnitself via the seam does NOT satisfy this — that’s the same masking that let the bug ship.
4. Non-acceptance — these do NOT count
- N-1. A test that simply asserts
loadSkillsFromDiskworks in isolation. Such a test would have passed before the fix; it doesn’t characterize the bug. - N-2. A patch that removes
startup.ts:298’s dynamic import without forwarding via options. That would break the test-seam contract (I-3) — tests inject mocks viaoptions.loadSkillsFnand expect the dynamic-import branch to never fire when the option is set, which is already the case; the issue is the production path, not the option-set path. - N-3. A patch that adds a new env var like
COLIBRI_PRELOAD_SKILLS=1. Adds public API surface without need (violates I-5) and re-introduces a production-only branch (violates I-1). - N-4. A patch that deletes or rewrites
src/domains/skills/repository.ts:36’s import ofregisterColibriTool. That breaks the static cycle by a different vector — out of scope and would require parallel changes to all 6 domain modules.
5. Risk register for the implementation
| Risk | Severity | Mitigation |
|---|---|---|
Patch regresses Jest test that asserts await import fires when options.loadSkillsFn === undefined |
Medium | Inspect existing startup-suite tests; if such an assertion exists, update its mocking story to align with the fix. |
| Regression test asserts on log strings, which are flaky / timing-sensitive | Low | Test asserts on the script-invocation site’s static reference to loadSkillsFromDisk, not on logger output. |
| Future contributor reintroduces dynamic import without realizing the deadlock | Medium | Add an inline comment at server.ts:606 next to the loadSkillsFn: argument explaining why static resolution at this site is load-bearing. Cite this contract by path. |
Live deploy: replacing dist/ while Claude Desktop holds it open on Windows |
Low | dist/server.js is loaded by node at process start, then the file handle is released. Tray-quit + relaunch ensures clean swap. Contract verification step covers this. |
6. Goes to Packet step
The packet defines: which exact lines change, in which order, plus the regression-test shape. Implementation begins after the packet is approved.