P1.5.4 — Canonical Serialization — Audit

Step 1 of the 5-step chain (CLAUDE.md §6).

§1. Surface inventory at base SHA 7218b34b

| Path | Exists? | Role |
|---|---|---|
| src/domains/rules/canonical.ts | No (to create) | The canonical serializer |
| src/__tests__/domains/rules/canonical.test.ts | No (to create) | Test suite |
| src/domains/rules/parser.ts | Yes (P1.2.2, shipped in #205) | Source of the AstNode/RuleNode types we serialize |
| src/domains/rules/lexer.ts | Yes (R83.C, #189) | Indirect dependency via parser.ts |
| src/domains/rules/integer-math.ts | Yes (R81.A, #173) | Source pattern: pure, no float, no Math.*, no async |
| src/domains/rules/bps-constants.ts | Yes (R83.B, #188) | Same |
| src/domains/rules/determinism.ts | Yes (R83.A, #190) | Defines the forbidden-op manifest and self-scan test |
| src/__tests__/domains/rules/parser.test.ts | Yes | Reference test pattern; uses a JSON.stringify-after-bigint-stringify shim |
| src/__tests__/domains/rules/determinism.test.ts §Group 12 | Yes | Self-scan over every *.ts in src/domains/rules/ except determinism.ts; canonical.ts MUST pass |

§2. AST shape (from parser.ts)

The serializer must handle these 11 node types (extraction §2):

| Node | Discriminant | Notable fields |
|---|---|---|
| RuleNode | 'RuleNode' | name: string, guards: GuardClause[], effects: EffectCall[], location |
| GuardClause | 'GuardClause' | condition: Expression \| null, action: 'admit' \| 'reject', reason: string \| null |
| EffectCall | 'EffectCall' | function: string, args: Expression[] |
| BinaryOp | 'BinaryOp' | op (one of 11 string literals), left, right |
| UnaryOp | 'UnaryOp' | op: '-', operand |
| LogicalOp | 'LogicalOp' | op: 'and' \| 'or' \| 'not', operands: Expression[] |
| IntLiteral | 'IntLiteral' | value: bigint (the load-bearing case) |
| BoolLiteral | 'BoolLiteral' | value: boolean |
| StringLiteral | 'StringLiteral' | value: string (decoded; escapes already resolved by the parser) |
| VarRef | 'VarRef' | path: string[] |
| FuncCall | 'FuncCall' | name: string, args: Expression[] |

Every node also carries location: Location with four numeric fields. The canonical form serializes whatever the caller passes — a P1.5.1 hash consumer might strip location first, but P1.5.4 itself does not.
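For reference while reading the walker constraints, the table can be transcribed into a TypeScript discriminated union. This is a sketch, not the parser.ts source. Several details are assumptions: the discriminant field name type; an open-ended SourceLocation standing in for Location, since the table does not name its four fields; a plain string type for BinaryOp's eleven operators; and an Expression union made of the eight expression nodes. The hypothetical childrenOf helper shows a discriminant switch kept exhaustive by a never check.

```typescript
// Illustrative transcription of the §2 table; NOT the parser.ts declarations.
// Assumptions: discriminant field is `type`; Location's four numeric fields
// are left unnamed; BinaryOp's eleven operators are typed as plain string.
type SourceLocation = Readonly<Record<string, number>>;

interface IntLiteral { type: 'IntLiteral'; value: bigint; location: SourceLocation }
interface BoolLiteral { type: 'BoolLiteral'; value: boolean; location: SourceLocation }
interface StringLiteral { type: 'StringLiteral'; value: string; location: SourceLocation }
interface VarRef { type: 'VarRef'; path: string[]; location: SourceLocation }
interface FuncCall { type: 'FuncCall'; name: string; args: Expression[]; location: SourceLocation }
interface UnaryOp { type: 'UnaryOp'; op: '-'; operand: Expression; location: SourceLocation }
interface BinaryOp {
  type: 'BinaryOp'; op: string; left: Expression; right: Expression; location: SourceLocation;
}
interface LogicalOp {
  type: 'LogicalOp'; op: 'and' | 'or' | 'not'; operands: Expression[]; location: SourceLocation;
}

type Expression =
  | IntLiteral | BoolLiteral | StringLiteral | VarRef
  | FuncCall | UnaryOp | BinaryOp | LogicalOp;

interface GuardClause {
  type: 'GuardClause';
  condition: Expression | null;
  action: 'admit' | 'reject';
  reason: string | null;
  location: SourceLocation;
}
interface EffectCall { type: 'EffectCall'; function: string; args: Expression[]; location: SourceLocation }
interface RuleNode {
  type: 'RuleNode'; name: string; guards: GuardClause[]; effects: EffectCall[]; location: SourceLocation;
}

type AstNode = RuleNode | GuardClause | EffectCall | Expression;

// Direct children of a node. Assigning to `never` in the default branch makes
// tsc reject this function if a member of AstNode is left unhandled.
function childrenOf(node: AstNode): AstNode[] {
  switch (node.type) {
    case 'RuleNode': return [...node.guards, ...node.effects];
    case 'GuardClause': return node.condition === null ? [] : [node.condition];
    case 'EffectCall':
    case 'FuncCall':
      return node.args;
    case 'BinaryOp': return [node.left, node.right];
    case 'UnaryOp': return [node.operand];
    case 'LogicalOp': return node.operands;
    case 'IntLiteral':
    case 'BoolLiteral':
    case 'StringLiteral':
    case 'VarRef':
      return [];
    default: {
      const unhandled: never = node;
      throw new Error(`unhandled node: ${String(unhandled)}`);
    }
  }
}
```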

§3. Constraint inventory

3.1 Determinism corpus self-scan (FORBIDDEN tokens)

Per src/__tests__/domains/rules/determinism.test.ts §Group 12, canonical.ts will be scanned for these patterns after comments are stripped:

  • Math.* — no Math.floor, Math.min, etc.
  • Date.*, new Date
  • setTimeout, setInterval, setImmediate
  • fetch, XMLHttpRequest
  • require('fs') / from 'fs' / from 'node:fs'
  • crypto.*
  • process.hrtime, process.nextTick
  • await
  • async function / async (
  • Float literals (e.g. 3.14) — but 1n, 100, 1_000_000 are fine
  • [native code]
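To make the list concrete, here is a rough comment-stripping token scan in the same spirit. It approximates the determinism.test.ts implementation rather than reproducing it. The regexes are this sketch's reading of the bullets, and the comment stripper is naive: it would also remove // inside string literals. scanForbidden, stripComments and FORBIDDEN are hypothetical names. Note how the float pattern flags 3.14 but not 100, 1_000_000, or 1n.

```typescript
// Rough approximation of the §Group 12 self-scan; the real patterns live in
// determinism.test.ts and may differ.
const FORBIDDEN: ReadonlyArray<readonly [string, RegExp]> = [
  ['Math.*', /\bMath\./],
  ['Date', /\bDate\.|\bnew\s+Date\b/],
  ['timers', /\b(?:setTimeout|setInterval|setImmediate)\b/],
  ['network', /\b(?:fetch|XMLHttpRequest)\b/],
  ['fs', /require\(\s*['"]fs['"]\s*\)|from\s+['"](?:node:)?fs['"]/],
  ['crypto.*', /\bcrypto\./],
  ['process timing', /\bprocess\.(?:hrtime|nextTick)\b/],
  ['await', /\bawait\b/],
  ['async', /\basync\s+function\b|\basync\s*\(/],
  // Digits, a dot, then a digit: flags 3.14, not 100, 1_000_000, or 1n.
  ['float literal', /\b\d[\d_]*\.\d/],
  ['[native code]', /\[native code\]/],
];

// Naive: strips block comments, then line comments (even inside strings).
function stripComments(source: string): string {
  return source.replace(/\/\*[\s\S]*?\*\//g, '').replace(/\/\/.*$/gm, '');
}

// Labels of every forbidden pattern found outside comments, in list order.
function scanForbidden(source: string): string[] {
  const code = stripComments(source);
  return FORBIDDEN.filter(([, pattern]) => pattern.test(code)).map(([label]) => label);
}
```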

3.2 Task-prompt “FORBIDDENS”

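  • No JSON.stringify (its key order follows the engine's own-key order: integer-like keys first, then insertion order, so structurally equal objects can serialize differently; it also throws a TypeError on bigint values)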
  • No localeCompare, no Intl.Collator (locale-dependent)
  • Strict JSON output (no single quotes, no trailing commas)
  • Don’t edit the main checkout
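Both reasons behind the JSON.stringify ban can be shown directly. Output order follows the engine's own-key order: integer-like keys come first in ascending order, then the other string keys in insertion order. As a result, equal objects built in different orders serialize differently. Even inserting keys pre-sorted by code unit ("10" before "2") gets undone. Separately, JSON.stringify throws a TypeError on any bigint, which is exactly the IntLiteral.value case. The variable names are illustrative.

```typescript
// Why JSON.stringify cannot serve as the canonical form.

// 1. Insertion order leaks into the output.
const built1: Record<string, number> = {};
built1['b'] = 1;
built1['a'] = 2;
const built2: Record<string, number> = {};
built2['a'] = 2;
built2['b'] = 1;
const insertionA = JSON.stringify(built1); // '{"b":1,"a":2}'
const insertionB = JSON.stringify(built2); // '{"a":2,"b":1}'

// 2. Integer-like keys are enumerated numerically whatever the insertion
//    order, so pre-sorting keys by code unit ("10" < "2") is not preserved.
const numericKeys: Record<string, number> = {};
for (const key of ['10', '2'].sort()) numericKeys[key] = 0;
const numericOut = JSON.stringify(numericKeys); // '{"2":0,"10":0}'

// 3. BigInt values are rejected outright.
function stringifyThrows(value: unknown): boolean {
  try {
    JSON.stringify(value);
    return false;
  } catch (err) {
    return err instanceof TypeError;
  }
}
const bigintThrows = stringifyThrows({ value: 13n }); // true
```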

3.3 TypeScript strictness

tsconfig.json enables strict, noImplicitAny, strictNullChecks, noImplicitReturns, noFallthroughCasesInSwitch, noUncheckedIndexedAccess, exactOptionalPropertyTypes.

The walker must return a string for every reachable case, and switches over discriminants must be exhaustive: assigning the switched value to a never-typed variable in the default branch makes TS report any missing case.
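Because canonicalize takes unknown, its top-level dispatch is naturally a switch on typeof. Here is a minimal sketch of the exhaustiveness idiom under these flags. It uses a hypothetical runtimeKind helper that returns a label where the real walker would emit JSON. TypeScript types typeof value as the union of its eight possible result strings. Once every case is listed, the default branch narrows to never, and any unhandled result would fail to compile.

```typescript
// Classifies a value the way a canonical walker would dispatch on it.
// Every reachable path returns (noImplicitReturns); the `never` check in
// `default` turns any unhandled `typeof` result into a compile-time error.
function runtimeKind(value: unknown): string {
  const kind = typeof value;
  switch (kind) {
    case 'undefined':
    case 'function':
    case 'symbol':
      return 'rejected';
    case 'boolean':
    case 'number':
    case 'bigint':
    case 'string':
      return kind;
    case 'object':
      if (value === null) return 'null';
      return Array.isArray(value) ? 'array' : 'object';
    default: {
      const unreachable: never = kind;
      throw new Error(`unexpected typeof result: ${String(unreachable)}`);
    }
  }
}
```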

3.4 Test extension convention

tsconfig.json excludes **/*.test.ts, so the Jest test at src/__tests__/domains/rules/canonical.test.ts is not built by tsc; ts-jest transpiles it at test time.

3.5 ESM .js import rule

Inside src/, every relative TypeScript import ends in .js (as in the live parser.ts and integer-math.ts files). For example, a test file imports from '../../../domains/rules/parser.js'.

§4. Idempotence requirement

From task prompt and task-breakdown.md §P1.5.4 acceptance:

canonicalize(parse(canonicalize(parse(x)))) == canonicalize(parse(x))

This requires:

  1. Output is byte-identical between two canonicalize calls on equal inputs.
  2. Read literally, the string output, re-parsed by P1.2.2, must yield a RuleNode[] whose canonicalization equals the first canonicalization. The note below explains why this reading cannot hold.

Note (subtle): canonicalize’s output is JSON, not κ DSL — so canonicalize(parse(x)) produces a JSON string, and parse(canonical_json) would FAIL (the parser eats DSL, not JSON). The intended round-trip is:

const str0 = canonicalize(parse(dsl));   // JSON string
const ast0 = parse(dsl);                 // RuleNode[]
const str1 = canonicalize(ast0);         // same JSON
// expect str0 === str1: both come from equal ASTs, and nothing is parsed twice

Or, reading the prompt’s literal canonicalize(parse(canonicalize(parse(x)))) more strictly: the inner canonicalize produces the JSON we test against, and the outer canonicalize(parse(...)) would re-run on a parse of that JSON, which the parser does not accept. We therefore interpret the property as idempotence on the AST: two successive canonicalize calls on the same AST yield identical bytes. This is the testable form, and we take it to be what the spec intends.
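The testable form can be exercised as a property check without going through the DSL. This sketch rests on several assumptions. The canonicalizer is passed in as a parameter. The demo supplies referenceCanon, a test-side stand-in that sorts keys recursively, defers leaf encoding to JSON.stringify (tolerable in __tests__/, never in canonical.ts), and encodes bigint as a quoted decimal. Inputs come from a seeded xorshift32 generator over a tiny value grammar, not the real AST grammar. Each sample is checked two ways: repeated calls must agree, and a deep copy built with reversed key insertion order must produce the same bytes. All helper names are hypothetical.

```typescript
// Property check: output must not depend on call count or key insertion order.
type Canon = (value: unknown) => string;

// Test-side stand-in only (JSON.stringify is banned in canonical.ts itself).
const referenceCanon: Canon = (value) => {
  if (typeof value === 'bigint') return `"${value.toString()}"`;
  if (Array.isArray(value)) return `[${value.map((item) => referenceCanon(item)).join(',')}]`;
  if (value !== null && typeof value === 'object') {
    const obj = value as Record<string, unknown>;
    const body = Object.keys(obj)
      .sort()
      .map((k) => `${JSON.stringify(k)}:${referenceCanon(obj[k])}`)
      .join(',');
    return `{${body}}`;
  }
  return JSON.stringify(value);
};

// Seeded xorshift32; each call returns an integer in [0, bound).
function xorshift32(seed: number): (bound: number) => number {
  let state = seed >>> 0 || 1;
  return (bound) => {
    state ^= state << 13;
    state ^= state >>> 17;
    state ^= state << 5;
    state >>>= 0;
    return state % bound;
  };
}

const KEYS = ['b', 'a', '10', '2', 'type'];
const STRINGS = ['', 'a', '\u00e9', 'x\n"y"', '\u0001', '\u{1F600}'];

// Random value from a tiny grammar: only leaves at depth 0, containers above.
function randomValue(next: (bound: number) => number, depth: number): unknown {
  switch (next(depth > 0 ? 6 : 4)) {
    case 0: return null;
    case 1: return next(2) === 0;
    case 2: return BigInt(next(1000)) - 500n;
    case 3: return STRINGS[next(STRINGS.length)] ?? '';
    case 4: {
      const items: unknown[] = [];
      for (let n = next(4); n > 0; n -= 1) items.push(randomValue(next, depth - 1));
      return items;
    }
    default: {
      const obj: Record<string, unknown> = {};
      for (let n = next(4); n > 0; n -= 1) {
        obj[KEYS[next(KEYS.length)] ?? 'k'] = randomValue(next, depth - 1);
      }
      return obj;
    }
  }
}

// Deep copy that inserts object keys in reverse order.
function reversedCopy(value: unknown): unknown {
  if (Array.isArray(value)) return value.map((item) => reversedCopy(item));
  if (value !== null && typeof value === 'object') {
    const src = value as Record<string, unknown>;
    const out: Record<string, unknown> = {};
    for (const key of Object.keys(src).reverse()) out[key] = reversedCopy(src[key]);
    return out;
  }
  return value;
}

function checkIdempotent(canon: Canon, samples: number, seed: number): void {
  const next = xorshift32(seed);
  for (let i = 0; i < samples; i += 1) {
    const input = randomValue(next, 3);
    const first = canon(input);
    if (canon(input) !== first) throw new Error(`unstable output on sample ${i}`);
    if (canon(reversedCopy(input)) !== first) throw new Error(`key order leaked on sample ${i}`);
  }
}
```

The sample count of 100 in the risk table maps directly onto the samples parameter. Fixing the seed keeps any failure reproducible.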

§5. Spec for canonicalize

5.1 Function signature

export function canonicalize(value: unknown): string;
export function byteLength(value: unknown): number;

canonicalize accepts unknown because it walks an arbitrary tree and dispatches each node on a runtime type test.

5.2 Per-type encoding

| JS type | Canonical form |
|---|---|
| null | null |
| undefined | Rejected (throws); JSON has no undefined |
| boolean | true / false |
| number | If Number.isSafeInteger(value), value.toString(). Floats and integers outside the safe range are rejected (see 5.5). |
| bigint | Decimal string, e.g. 13n → "13" |
| string | Quoted, with canonical JSON escapes for ", \, \b, \f, \n, \r, \t; \u00xx (lowercase hex) for any other code unit below 0x20; and \uXXXX (lowercase hex) for a lone (unpaired) surrogate, as JSON.stringify does. Well-formed surrogate pairs are emitted literally. |
| Array<T> | [v1,v2,...] with no whitespace; items canonicalized recursively |
| Plain object | {"k1":v1,"k2":v2,...} with keys sorted in UTF-16 code-unit order by the default Array.prototype.sort comparator |

Functions, Maps, Sets, Dates, and other non-plain objects are rejected. (The audit errs on the conservative side; in practice the κ AST contains only the types above.)
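The per-type table and the rejection rule fit in one small walker. What follows is a minimal sketch, not the canonical.ts contract, and it makes these assumptions:
- bigint is emitted as a quoted decimal string ("13").
- Numbers are gated on Number.isSafeInteger rather than Number.isInteger, because Number.isInteger(1e21) is true while (1e21).toString() is "1e+21".
- "Plain object" means an object whose prototype is Object.prototype or null, which rejects Map, Set, Date and class instances without naming them.
- Array holes are rejected the same way undefined is.

The sketch avoids the constructs banned in 3.1 and 3.2.

```typescript
// Sketch of a canonical JSON walker; assumptions are marked inline.

// for...of yields code points, so a well-formed surrogate pair arrives as one
// character; only lone surrogates reach 0xd800..0xdfff.
function escapeJsonString(s: string): string {
  let out = '"';
  for (const ch of s) {
    const cp = ch.codePointAt(0) ?? 0;
    switch (cp) {
      case 0x22: out += '\\"'; break;
      case 0x5c: out += '\\\\'; break;
      case 0x08: out += '\\b'; break;
      case 0x0c: out += '\\f'; break;
      case 0x0a: out += '\\n'; break;
      case 0x0d: out += '\\r'; break;
      case 0x09: out += '\\t'; break;
      default:
        // Other control characters and lone surrogates: lowercase \uXXXX.
        out += cp < 0x20 || (cp >= 0xd800 && cp <= 0xdfff)
          ? '\\u' + cp.toString(16).padStart(4, '0')
          : ch;
    }
  }
  return out + '"';
}

// Plain = object literal or Object.create(null); everything else is rejected.
function isPlainObject(value: object): value is Record<string, unknown> {
  const proto: unknown = Object.getPrototypeOf(value);
  return proto === Object.prototype || proto === null;
}

function canonicalize(value: unknown): string {
  switch (typeof value) {
    case 'boolean':
      return value ? 'true' : 'false';
    case 'number':
      // Stricter than Number.isInteger: (1e21).toString() is "1e+21".
      if (!Number.isSafeInteger(value)) throw new TypeError(`non-canonical number: ${value}`);
      return value.toString();
    case 'bigint':
      return '"' + value.toString() + '"'; // assumption: quoted decimal string
    case 'string':
      return escapeJsonString(value);
    case 'object': {
      if (value === null) return 'null';
      // Array.from turns holes into undefined, which is then rejected.
      if (Array.isArray(value)) return '[' + Array.from(value, (item) => canonicalize(item)).join(',') + ']';
      if (!isPlainObject(value)) throw new TypeError('only plain objects can be canonicalized');
      const obj: Record<string, unknown> = value;
      // Default comparator: UTF-16 code-unit order, no locale involved.
      const keys = Object.keys(obj).sort();
      return '{' + keys.map((k) => escapeJsonString(k) + ':' + canonicalize(obj[k])).join(',') + '}';
    }
    default:
      // undefined, function, symbol
      throw new TypeError(`cannot canonicalize a ${typeof value}`);
  }
}
```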

5.3 Ordering rule

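Array.prototype.sort with no comparator. Per the JS spec, the default comparison converts elements to strings and compares them as sequences of UTF-16 code units, which matches code-point order for BMP-only strings. Keys containing non-BMP characters (stored as surrogate pairs) are still ordered by code unit, so they can sort differently from strict code-point order; that is acceptable here because the order is still deterministic. Non-string keys cannot reach the sort: Object.keys returns only string keys (symbol-keyed properties are skipped), so we are safe.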
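The ordering rule is easiest to see on concrete keys. In this sketch the default comparator puts uppercase ASCII before lowercase, and puts dotless ı (U+0131) after all of ASCII, whatever the process locale. The sketch also contrasts the default with a hypothetical byCodePoint comparator. For U+1F600 (stored as the surrogate pair 0xd83d 0xde00) against U+FF61, code-unit order and code-point order disagree. The spec keeps the default; byCodePoint exists only for the contrast.

```typescript
// Default comparator: UTF-16 code units, no locale involved.
const asciiAndTurkish = ['i', 'a', '\u0131', 'Z', 'I'].sort();
// Order: 'I' (0x49), 'Z' (0x5a), 'a' (0x61), 'i' (0x69), 'ı' (0x131).

// Hypothetical code-point comparator, for contrast only.
function byCodePoint(x: string, y: string): number {
  const a = Array.from(x, (c) => c.codePointAt(0) ?? 0);
  const b = Array.from(y, (c) => c.codePointAt(0) ?? 0);
  const shared = a.length < b.length ? a.length : b.length;
  for (let i = 0; i < shared; i += 1) {
    const diff = (a[i] ?? 0) - (b[i] ?? 0);
    if (diff !== 0) return diff;
  }
  return a.length - b.length;
}

const emoji = '\u{1F600}'; // stored as code units 0xd83d 0xde00
const halfwidth = '\uff61'; // one code unit, 0xff61
const byUnits = [halfwidth, emoji].sort(); // emoji first: 0xd83d < 0xff61
const byPoints = [halfwidth, emoji].sort(byCodePoint); // halfwidth first: 0xff61 < 0x1f600
```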

5.4 String escape rule

Build the quoted string one character at a time, switching on each code point:

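| Code point | Output |
|---|---|
| 0x22 (") | \" |
| 0x5c (\) | \\ |
| 0x08 | \b |
| 0x0c | \f |
| 0x0a | \n |
| 0x0d | \r |
| 0x09 | \t |
| any other in 0x00 .. 0x1f | \u00xx, padded to four hex digits. Pick the hex case once; lowercase matches JSON.stringify. |
| 0xd800 .. 0xdfff (only a lone surrogate lands here; a well-formed pair arrives as one code point ≥ 0x10000) | \uXXXX in lowercase hex, matching JSON.stringify |
| All other | literal char |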

Note: / is not escaped. The \/ form is legal JSON, but staying byte-compatible with JSON.stringify keeps the round-trip clean.
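The escape table can be written as a per-character loop and checked against JSON.stringify, which it is meant to match. In this sketch (escapeJsonString and matchesJsonStringify are hypothetical names), for...of iterates by code point. A well-formed surrogate pair therefore arrives as one character, and only a lone surrogate falls in 0xd800..0xdfff. That branch mirrors JSON.stringify since ES2019, which escapes unpaired surrogates as lowercase \uXXXX. JSON.stringify appears only as a test oracle.

```typescript
// Per-character JSON string escaping following the 5.4 table.
function escapeJsonString(s: string): string {
  let out = '"';
  for (const ch of s) {
    const cp = ch.codePointAt(0) ?? 0;
    switch (cp) {
      case 0x22: out += '\\"'; break;
      case 0x5c: out += '\\\\'; break;
      case 0x08: out += '\\b'; break;
      case 0x0c: out += '\\f'; break;
      case 0x0a: out += '\\n'; break;
      case 0x0d: out += '\\r'; break;
      case 0x09: out += '\\t'; break;
      default:
        // Remaining control characters and lone surrogates: lowercase \uXXXX.
        // Everything else, including '/', is copied through literally.
        out += cp < 0x20 || (cp >= 0xd800 && cp <= 0xdfff)
          ? '\\u' + cp.toString(16).padStart(4, '0')
          : ch;
    }
  }
  return out + '"';
}

// Test-side oracle check: JSON.stringify is fine in __tests__/, not in canonical.ts.
function matchesJsonStringify(samples: readonly string[]): boolean {
  return samples.every((s) => escapeJsonString(s) === JSON.stringify(s));
}
```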

5.5 Number (integer) form

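Don’t write floats: reject any non-integer number. For integers, toString() gives plain decimal only while the magnitude is below 1e21. Number.isInteger(1e21) is true, yet (1e21).toString() is "1e+21". Gating on Number.isSafeInteger (magnitude at most 2^53 − 1) rules out both the exponent form and integers that may already have lost precision.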
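A short sketch of the integer rule, using a hypothetical encodeInteger helper. The gate is Number.isSafeInteger, because Number.isInteger also accepts 1e21, whose toString() is exponent form. It likewise accepts 2^53, above which integers may already have been rounded. No special case is needed for -0, since (-0).toString() is "0".

```typescript
// Canonical decimal text for an integer-valued number; throws otherwise.
function encodeInteger(value: number): string {
  if (!Number.isSafeInteger(value)) {
    throw new RangeError(`not a safe integer: ${value}`);
  }
  return value.toString();
}

// Why Number.isInteger alone is not enough:
const exponentForm = (1e21).toString(); // "1e+21"
const acceptedByIsInteger = Number.isInteger(1e21); // true
```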

5.6 BigInt form

(13n).toString() returns "13", with no n suffix. Negatives get a leading -. There are no leading zeros: BigInt.prototype.toString never produces them.
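Why bigint is the load-bearing case shows up one step past Number.MAX_SAFE_INTEGER. BigInt.prototype.toString keeps every digit, while a detour through number rounds silently. This is a sketch only: bigintDigits is a hypothetical helper that returns the bare decimal text. Whether the canonical form wraps that text in quotes is decided by the 5.2 table, not here.

```typescript
// Decimal digits of a bigint: no `n` suffix, '-' for negatives, no leading zeros.
function bigintDigits(value: bigint): string {
  return value.toString();
}

const pastSafe = 2n ** 53n + 1n;
const exact = bigintDigits(pastSafe); // "9007199254740993"
const viaNumber = Number(pastSafe).toString(); // "9007199254740992" (rounded)
```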

§6. Risks / unknowns

| Risk | Mitigation |
|---|---|
| Self-scan flags Math.min / Math.floor if used for clamps | Don’t use them; we don’t need any math |
| Self-scan flags a float literal in a test fixture | Don’t write any .X decimal in code; tests use bigint or integer values |
| JSON.stringify slipping in via tests | Tests are exempt from the self-scan (they live in __tests__/). JSON.stringify never serves as the system under test, but tests use it freely for fixture-equality assertions |
| Surrogate pairs in object keys | Object.keys returns strings of 16-bit code units; the default sort is fine |
| Sample size for the idempotence property test | 100 random ASTs, generated from a small grammar of the AST node shapes |
| Locale fixture | Set process.env.LANG = 'tr_TR.UTF-8', then confirm that no localeCompare is used; the default code-unit sort is locale-independent by spec |
| exactOptionalPropertyTypes strictness | Use type guards rather than obj.field access on optional fields |

§7. Plan summary

  1. Implement a single-file walker that dispatches on JS runtime type:
    • null, undefined, boolean, number, bigint, string, Array, plain object.
  2. Sort object keys with the default Array.prototype.sort comparator.
  3. Emit single-line JSON with no whitespace.
  4. Hand-rolled string escape using a per-char loop.
  5. Tests cover F1–F7 from the prompt + locale fixture + idempotence property.

§8. Out of scope

  • Parsing JSON back (that’s JSON.parse’s job at the consumer)
  • Hashing (P1.5.1 consumes our output)
  • Migrating an old version → new version (P1.5.2)
  • Effect-set hashing (P1.5.5)
  • Validation of well-formed AST (P1.2.3)

§9. Cross-check

Spec citations:

  • docs/3-world/physics/laws/rule-engine.md §139–156 — “canonical serialization of the rule bodies plus engine version”
  • docs/guides/implementation/task-prompts/p1.1-kappa-rule-engine.md §P1.5.4 lines 2618–2748
  • docs/guides/implementation/task-breakdown.md §P1.5.4 lines 731–741
  • src/domains/rules/parser.ts §2 (AST types) lines 87–211
  • src/__tests__/domains/rules/determinism.test.ts §Group 12 lines 819–890

This audit is approved for Step 2 (contract).



Colibri — documentation-first MCP runtime. Apache 2.0 + Commons Clause.
