P1.5.4 — Canonical Serialization — Audit
Step 1 of the 5-step chain (CLAUDE.md §6).
§1. Surface inventory at base SHA 7218b34b
| Path | Exists? | Role |
|---|---|---|
| `src/domains/rules/canonical.ts` | No (to create) | The canonical serializer |
| `src/__tests__/domains/rules/canonical.test.ts` | No (to create) | Test suite |
| `src/domains/rules/parser.ts` | Yes (P1.2.2 shipped #205) | Source of AstNode/RuleNode types we serialize |
| `src/domains/rules/lexer.ts` | Yes (R83.C #189) | Indirect dep via `parser.ts` |
| `src/domains/rules/integer-math.ts` | Yes (R81.A #173) | Source pattern: pure, no float, no `Math.*`, no async |
| `src/domains/rules/bps-constants.ts` | Yes (R83.B #188) | Same |
| `src/domains/rules/determinism.ts` | Yes (R83.A #190) | Defines forbidden-op manifest + self-scan test |
| `src/__tests__/domains/rules/parser.test.ts` | Yes | Reference test pattern; uses JSON.stringify-after-bigint-stringify shim |
| `src/__tests__/domains/rules/determinism.test.ts` §Group 12 | Yes | Self-scan that scans every `*.ts` in `src/domains/rules/` (except `determinism.ts`); `canonical.ts` MUST pass |
§2. AST shape (from parser.ts)
The serializer must handle these 11 node types (extraction §2):
| Node | Discriminant | Notable fields |
|---|---|---|
| `RuleNode` | `'RuleNode'` | `name: string`, `guards: GuardClause[]`, `effects: EffectCall[]`, `location` |
| `GuardClause` | `'GuardClause'` | `condition: Expression \| null`, `action: 'admit' \| 'reject'`, `reason: string \| null` |
| `EffectCall` | `'EffectCall'` | `function: string`, `args: Expression[]` |
| `BinaryOp` | `'BinaryOp'` | `op` (one of 11 string literals), `left`, `right` |
| `UnaryOp` | `'UnaryOp'` | `op: '-'`, `operand` |
| `LogicalOp` | `'LogicalOp'` | `op: 'and' \| 'or' \| 'not'`, `operands: Expression[]` |
| `IntLiteral` | `'IntLiteral'` | `value: bigint` ← the load-bearing case |
| `BoolLiteral` | `'BoolLiteral'` | `value: boolean` |
| `StringLiteral` | `'StringLiteral'` | `value: string` (decoded; escapes already resolved by parser) |
| `VarRef` | `'VarRef'` | `path: string[]` |
| `FuncCall` | `'FuncCall'` | `name: string`, `args: Expression[]` |
Every node also carries location: Location with four numeric fields. The canonical form serializes whatever the caller passes — a P1.5.1 hash consumer might strip location first, but P1.5.4 itself does not.
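A consumer that wants location-free output could strip the field before calling the serializer. The sketch below is only an illustration of that split of responsibilities: `stripLocation` is a hypothetical helper, not part of `parser.ts` or of P1.5.4, and the `type` discriminant name and the four field names inside `location` are assumptions.

```typescript
// Hypothetical consumer-side helper: returns a deep copy of `value` with every
// `location` key removed. The input is left untouched.
function stripLocation(value: unknown): unknown {
  if (Array.isArray(value)) {
    return value.map((item) => stripLocation(item));
  }
  if (typeof value === 'object' && value !== null) {
    const source = value as Record<string, unknown>;
    const out: Record<string, unknown> = {};
    for (const key of Object.keys(source)) {
      if (key === 'location') continue; // drop position info only
      out[key] = stripLocation(source[key]);
    }
    return out;
  }
  return value; // primitives pass through unchanged
}

// Assumed node shape; the real Location field names may differ.
const sampleNode = {
  type: 'BoolLiteral',
  value: true,
  location: { line: 1, column: 1, endLine: 1, endColumn: 5 },
};
const withoutLocation = stripLocation(sampleNode) as Record<string, unknown>;
```

Two rules that differ only in whitespace would then serialize identically, which is presumably what a hash consumer wants; P1.5.4 itself stays agnostic.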
§3. Constraint inventory
3.1 Determinism corpus self-scan (FORBIDDEN tokens)
Per src/__tests__/domains/rules/determinism.test.ts §Group 12 the file canonical.ts will be scanned with these patterns (post strip-comments):
- `Math.*`: no `Math.floor`, `Math.min`, etc.
- `Date.*`, `new Date`
- `setTimeout`, `setInterval`, `setImmediate`
- `fetch`, `XMLHttpRequest`
- `require('fs')` / `from 'fs'` / `from 'node:fs'`
- `crypto.*`
- `process.hrtime`, `process.nextTick`
- `await`
- `async function` / `async (`
- Float literals (e.g. `3.14`), but `1n`, `100`, `1_000_000` are fine
3.2 Task-prompt “FORBIDDENS”
- No `JSON.stringify` (key ordering is insertion-based; non-idempotent on nested objects)
- No `localeCompare`, no `Intl.Collator` (locale-dependent)
- Strict JSON output (no single quotes, no trailing commas)
- Don’t edit the main checkout
3.3 TypeScript strictness
tsconfig.json enables strict, noImplicitAny, strictNullChecks, noImplicitReturns,
noFallthroughCasesInSwitch, noUncheckedIndexedAccess, exactOptionalPropertyTypes.
The walker must return string for every reachable case; switches over discriminants must be exhaustive (TS will error on missing cases via never exhaustion).
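The `never` exhaustion pattern can be sketched on a reduced union. The three-member `Literal` type, the `type` discriminant name, and the helpers `assertNever` and `labelOf` are all assumptions made for illustration; the real union in `parser.ts` has more members.

```typescript
// Reduced, hypothetical subset of the AST: literal nodes only.
type Literal =
  | { type: 'IntLiteral'; value: bigint }
  | { type: 'BoolLiteral'; value: boolean }
  | { type: 'StringLiteral'; value: string };

// Accepts only `never`, so calling it with an unhandled variant is a compile error.
function assertNever(x: never): never {
  throw new Error('unhandled node: ' + String(x));
}

function labelOf(node: Literal): string {
  switch (node.type) {
    case 'IntLiteral':
      return 'int:' + String(node.value);
    case 'BoolLiteral':
      return node.value ? 'bool:true' : 'bool:false';
    case 'StringLiteral':
      return 'str:' + node.value;
    default:
      // If a member is added to Literal without a case above, `node` is no
      // longer `never` at this point and tsc rejects the call.
      return assertNever(node);
  }
}
```

Every branch returns a string, which also satisfies `noImplicitReturns`.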
3.4 Test extension convention
tsconfig.json excludes **/*.test.ts. The Jest test under src/__tests__/domains/rules/canonical.test.ts is not built by tsc; ts-jest handles transpile.
3.5 ESM .js import rule
Inside src/, every relative TypeScript import ends in .js (per the live parser.ts and integer-math.ts files). For example, a test file under src/__tests__/domains/rules/ imports from '../../../domains/rules/parser.js'.
§4. Idempotence requirement
From task prompt and task-breakdown.md §P1.5.4 acceptance:
canonicalize(parse(canonicalize(parse(x)))) == canonicalize(parse(x))
This requires:
- Output is structurally identical between two `canonicalize` calls on equal inputs.
- The string output, re-parsed by P1.2.2, yields a `RuleNode[]` whose canonicalization equals the first canonicalization.
Note (subtle): canonicalize’s output is JSON, not κ DSL — so canonicalize(parse(x)) produces a JSON string, and parse(canonical_json) would FAIL (the parser eats DSL, not JSON). The intended round-trip is:
str0 = canonicalize(parse(dsl)) // JSON string
ast0 = parse(dsl) // RuleNode[]
str1 = canonicalize(ast0) // same JSON
=> str0 === str1 (both stem from same AST, no double-parse)
OR, more strictly per the prompt’s literal canonicalize(parse(canonicalize(parse(x)))): the inner canonicalize yields the JSON we test against, and the outer canonicalize(parse(...)) would have to parse that JSON, which the parser does not accept. We therefore interpret the property as idempotence on the AST: two successive canonicalize calls on the same AST yield identical bytes. This is the testable form and is what the spec actually means.
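The testable form can be written as a small property check that takes the serializer as a parameter. This is a sketch under assumptions: `agreesOnEqualInputs` and `flatStandIn` are hypothetical names, and `flatStandIn` is a deliberately tiny stand-in (flat objects, string values, no escaping) so the block runs on its own; the real `canonicalize` would be passed in its place.

```typescript
type Canonicalize = (value: unknown) => string;

// Serializing the same AST twice, or a structurally equal AST whose keys were
// inserted in a different order, must give identical strings.
function agreesOnEqualInputs(
  canonicalize: Canonicalize,
  ast: unknown,
  sameAstOtherKeyOrder: unknown,
): boolean {
  const first = canonicalize(ast);
  const second = canonicalize(ast);
  const third = canonicalize(sameAstOtherKeyOrder);
  return first === second && first === third;
}

// Stand-in serializer for the demo only.
const flatStandIn: Canonicalize = (value) => {
  const obj = value as Record<string, string>;
  const parts = Object.keys(obj)
    .sort()
    .map((k) => '"' + k + '":"' + String(obj[k]) + '"');
  return '{' + parts.join(',') + '}';
};

const propertyHolds = agreesOnEqualInputs(
  flatStandIn,
  { type: 'VarRef', head: 'balance' },
  { head: 'balance', type: 'VarRef' },
);
```

A serializer that emits keys in insertion order fails the third comparison, which is the failure mode the `JSON.stringify` ban is meant to exclude.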
§5. Spec for canonicalize
5.1 Function signature
export function canonicalize(value: unknown): string;
export function byteLength(value: unknown): number;
Accepts unknown because it walks the tree; runtime type tests dispatch each node.
5.2 Per-type encoding
| JS type | Canonical form |
|---|---|
| `null` | `null` |
| `undefined` | rejected (throws); JSON has no undefined |
| `boolean` | `true` / `false` |
| `number` | If integer (`Number.isInteger`), `value.toString()`. Floats rejected (per task prompt). |
| `bigint` | Decimal string, e.g. `13n` → `"13"`. |
| `string` | Quoted, with canonical JSON escapes for `"`, `\`, `\b`, `\f`, `\n`, `\r`, `\t`, and `\uXXXX` for any char with code < 0x20 or in surrogate range. |
| `Array<T>` | `[v1,v2,...]` no whitespace, items canonicalized recursively. |
| Plain object | `{"k1":v1,"k2":v2,...}` with keys sorted by codepoint via `Array.prototype.sort` default. |
Functions, Maps, Sets, Dates, etc. are rejected. (The audit errs on the conservative side; in practice the κ AST contains only the types above.)
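The dispatch in the table can be sketched as a single recursive function. Several things here are assumptions rather than settled design: the names `canonicalizeSketch` and `quotePlain`, the `type` field in the usage example, and above all the bigint encoding, where the table leaves open whether the digits are wrapped in JSON quotes; this sketch emits them bare. String escaping is left out, so `quotePlain` refuses any string that would need an escape instead of emitting a non-canonical one.

```typescript
// Quotes a string that needs no escaping; throws otherwise (escaping omitted here).
function quotePlain(s: string): string {
  for (let i = 0; i < s.length; i++) {
    const code = s.charCodeAt(i);
    if (code < 0x20 || code === 0x22 || code === 0x5c) {
      throw new Error('escape handling omitted in this sketch');
    }
  }
  return '"' + s + '"';
}

function canonicalizeSketch(value: unknown): string {
  if (value === null) return 'null';
  switch (typeof value) {
    case 'boolean':
      return value ? 'true' : 'false';
    case 'number':
      if (!Number.isInteger(value)) throw new Error('non-integer number rejected');
      return String(value);
    case 'bigint':
      return String(value); // ASSUMPTION: bare digits, no JSON quotes
    case 'string':
      return quotePlain(value);
    case 'object':
      break; // arrays and plain objects are handled below
    default:
      // undefined, function, symbol
      throw new Error('unsupported type: ' + typeof value);
  }
  if (Array.isArray(value)) {
    return '[' + value.map((item) => canonicalizeSketch(item)).join(',') + ']';
  }
  const proto = Object.getPrototypeOf(value);
  if (proto !== Object.prototype && proto !== null) {
    throw new Error('non-plain object rejected'); // Map, Set, Date, class instances
  }
  const obj = value as Record<string, unknown>;
  const parts = Object.keys(obj)
    .sort() // default sort: UTF-16 code unit order
    .map((k) => quotePlain(k) + ':' + canonicalizeSketch(obj[k]));
  return '{' + parts.join(',') + '}';
}

// Key insertion order does not affect the output.
const fromValueFirst = canonicalizeSketch({ value: 13n, type: 'IntLiteral' });
const fromTypeFirst = canonicalizeSketch({ type: 'IntLiteral', value: 13n });
```

The sketch uses no `Math.*`, no `JSON.stringify`, and no async constructs, so it stays within the forbidden-token list of §3.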
5.3 Ordering rule
Keys are ordered with Array.prototype.sort and no comparator. Per the JS spec, the default comparison of two strings is by UTF-16 code units, which is equivalent to codepoint order as long as every key stays within the BMP. Keys containing non-BMP characters (surrogate pairs) are still compared unit by unit, so they can sort before BMP characters at U+E000 and above even though their codepoints are higher; the order is nevertheless deterministic. Non-string keys cannot occur: Object.keys returns strings only, so we are safe.
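A short demonstration of the default sort on keys, including one non-BMP case. The sample keys are made up for illustration; escape sequences are used instead of literal characters so the code units are unambiguous.

```typescript
// Default sort compares by UTF-16 code unit, independent of the process locale.
const sampleKeys = ['b', 'a', 'B', '\u00e4', '_'];
const sortedKeys = sampleKeys.slice().sort();
// Order: 'B' (0x42), '_' (0x5f), 'a' (0x61), 'b' (0x62), U+00E4 (0xe4)

// U+1F600 is stored as the surrogate pair D83D DE00. Its first code unit
// (0xd83d) is smaller than 0xff5e, so it sorts first even though the
// codepoint 0x1f600 is larger than 0xff5e.
const mixedKeys = ['\uff5e', '\ud83d\ude00'].sort();
```

A locale-aware comparator would typically place `a` before `B` and could order the last pair differently, which is why `localeCompare` and `Intl.Collator` are excluded.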
5.4 String escape rule
Build the canonical JSON quote one character at a time. Switch on the codepoint:
| Range | Action |
|---|---|
| `0x22` (`"`) | `\"` |
| `0x5c` (`\`) | `\\` |
| `0x08` | `\b` |
| `0x0c` | `\f` |
| `0x0a` | `\n` |
| `0x0d` | `\r` |
| `0x09` | `\t` |
| `[0x00 .. 0x1f]` other | `\u00XX` (lowercase hex, padded to 4 digits). The case must be chosen once; lowercase matches `JSON.stringify`. |
| All other | literal char |
Note: / is not escaped (avoid the \/ form even though it’s legal JSON — sticking with JSON.stringify-compat keeps round-trip clean).
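The escape table maps directly onto a per-character loop. The function name `escapeJsonString` is hypothetical. The loop walks UTF-16 code units, so a surrogate pair is copied through as two literal units; lone surrogates are also passed through literally, following the "all other" row above, which means the `\uXXXX` treatment of the surrogate range mentioned in 5.2 is not implemented in this sketch.

```typescript
function escapeJsonString(s: string): string {
  let out = '"';
  for (let i = 0; i < s.length; i++) {
    const code = s.charCodeAt(i);
    switch (code) {
      case 0x22: out += '\\"'; break;
      case 0x5c: out += '\\\\'; break;
      case 0x08: out += '\\b'; break;
      case 0x0c: out += '\\f'; break;
      case 0x0a: out += '\\n'; break;
      case 0x0d: out += '\\r'; break;
      case 0x09: out += '\\t'; break;
      default:
        if (code < 0x20) {
          // toString(16) yields lowercase hex; left-pad to four digits.
          out += '\\u' + ('0000' + code.toString(16)).slice(-4);
        } else {
          out += s.charAt(i); // includes '/', which is never escaped
        }
    }
  }
  return out + '"';
}

const escapedSample = escapeJsonString('a"b\\c\n/');
```

Because tests may use `JSON.stringify` freely, comparing this function against it for every code unit below 0x80 is a cheap oracle for the table.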
5.5 Number (integer) form
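Floats are never written: any non-integer number is rejected. For integers, toString() produces a plain decimal as long as the magnitude is below 1e21; at or above that it switches to exponent notation (e.g. 1e+21), so the contract should state whether such values are rejected or must arrive as bigint.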
5.6 BigInt form
(13n).toString() returns "13": there is no n suffix in the output. Negative values carry a leading -. Leading zeros never appear, because BigInt.prototype.toString does not produce them.
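The two integer forms can be checked with a few concrete values. `integerForm` is a hypothetical helper, and its use of `Number.isSafeInteger` is stricter than the `Number.isInteger` check named in 5.2; that stricter guard is an assumption of this sketch, chosen because it keeps `toString()` away from exponent notation.

```typescript
function integerForm(value: number | bigint): string {
  if (typeof value === 'bigint') return value.toString();
  // ASSUMPTION: stricter than Number.isInteger; rejects floats and |n| > 2^53 - 1.
  if (!Number.isSafeInteger(value)) {
    throw new Error('rejected: not a safe integer');
  }
  return value.toString();
}

const integerForms = [
  integerForm(13n),           // "13"
  integerForm(-7n),           // "-7"
  integerForm(BigInt('007')), // "7" (leading zeros in the input do not survive)
  integerForm(42),            // "42"
];

// Number.isInteger(1e21) is true, yet the string form is exponential.
const exponentForm = (1e21).toString(); // "1e+21"
```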
§6. Risks / unknowns
| Risk | Mitigation |
|---|---|
| Self-scan flags `Math.min` / `Math.floor` if used for clamps | Don’t use them; we don’t need any math |
| Self-scan flags float literal in test fixture | Don’t use any `.X` decimal in code; tests use bigint or integer |
| `JSON.stringify` slip in tests | Tests are exempt from self-scan (in `__tests__/`), but we still avoid using `JSON.stringify` for the system under test; we use it freely in tests for fixture-equality assertions |
| Surrogate pairs in object keys | `Object.keys` returns 16-bit-code-unit strings; default sort is fine |
| Property test idempotence sample size | 100 random ASTs, generated from a small grammar of the AST node shapes |
| Locale fixture | Set `process.env.LANG = 'tr_TR.UTF-8'` then ensure no `localeCompare` is used; codepoint sort is locale-independent by spec |
| `exactOptionalPropertyTypes` strictness | Use type-guards rather than `obj.field` access on optional fields |
§7. Plan summary
- Implement a single-file walker that dispatches on JS runtime type: `null`, `undefined`, `boolean`, `number`, `bigint`, `string`, `Array`, plain object.
- Sort object keys with `Array.sort()` default.
- Emit single-line JSON with no whitespace.
- Hand-rolled string escape using a per-char loop.
- Tests cover F1–F7 from the prompt + locale fixture + idempotence property.
§8. Out of scope
- Parsing JSON back (that’s `JSON.parse`’s job at the consumer)
- Hashing (P1.5.1 consumes our output)
- Migrating an old version → new version (P1.5.2)
- Effect-set hashing (P1.5.5)
- Validation of well-formed AST (P1.2.3)
§9. Cross-check
Spec citations:
- `docs/3-world/physics/laws/rule-engine.md` §139–156: “canonical serialization of the rule bodies plus engine version”
- `docs/guides/implementation/task-prompts/p1.1-kappa-rule-engine.md` §P1.5.4 lines 2618–2748
- `docs/guides/implementation/task-breakdown.md` §P1.5.4 lines 731–741
- `src/domains/rules/parser.ts` §2 (AST types) lines 87–211
- `src/__tests__/domains/rules/determinism.test.ts` §Group 12 lines 819–890
This audit is approved for Step 2 (contract).