P1.5.4 — Canonical Serialization — Packet
Step 3 of the 5-step chain. This packet gates implementation. Approval here unlocks Step 4 (feat).
§1. Goal
Ship src/domains/rules/canonical.ts plus src/__tests__/domains/rules/canonical.test.ts that satisfy every invariant in the contract (docs/contracts/p1-5-4-canonical-contract.md) under the constraints from the audit (docs/audits/p1-5-4-canonical-audit.md).
§2. Implementation outline (src/domains/rules/canonical.ts)
§2.1 File structure
1. Header docblock — references audit, contract, spec, ADR-006
2. CanonicalSerializationError class
3. ESCAPE_TABLE — readonly map of low-codepoint escapes
4. encodeString(s: string): string — hand-rolled escaper
5. encodeAtom(value): string | null — handles primitives; returns null if not an atom
6. encodeValue(value, seen, path): string — main recursive walker
7. canonicalize(value): string — entry point, instantiates `seen` Set
8. byteLength(value): number — Buffer.byteLength of the canonical form
§2.2 Pseudocode for encodeValue
function encodeValue(value: unknown, seen: Set<object>, path: string): string {
  // Atoms
  if (value === null) return 'null';
  if (value === undefined) throw new CanonicalSerializationError(
    `undefined is not representable in canonical JSON at ${path}`);
  const t = typeof value;
  if (t === 'boolean') return value ? 'true' : 'false';
  if (t === 'bigint') return (value as bigint).toString();
  if (t === 'number') {
    const n = value as number;
    // Rejects NaN, ±Infinity, and any fractional value
    if (!Number.isFinite(n) || !Number.isInteger(n)) {
      throw new CanonicalSerializationError(
        `non-integer number ${n} at ${path}`);
    }
    return n.toString();
  }
  if (t === 'string') return encodeString(value as string);
  if (t === 'symbol' || t === 'function') {
    throw new CanonicalSerializationError(`${t} not representable at ${path}`);
  }
  // Composites
  if (typeof value === 'object') {
    // `seen` holds only the current ancestor chain, so a hit means a cycle
    if (seen.has(value)) {
      throw new CanonicalSerializationError(`cycle detected at ${path}`);
    }
    seen.add(value);
    try {
      if (Array.isArray(value)) {
        const parts: string[] = [];
        for (let i = 0; i < value.length; i++) {
          parts.push(encodeValue(value[i], seen, path + '/' + i));
        }
        return '[' + parts.join(',') + ']';
      }
      // Plain-object guard: reject Map, Set, Date, RegExp, Promise, class instances
      const proto = Object.getPrototypeOf(value);
      if (proto !== null && proto !== Object.prototype) {
        throw new CanonicalSerializationError(
          `non-plain object at ${path}: ${value.constructor?.name ?? 'unknown'}`);
      }
      // Default sort compares UTF-16 code units; this equals codepoint order
      // as long as every key stays within the Basic Multilingual Plane.
      const keys = Object.keys(value as object).sort();
      const parts: string[] = [];
      for (const k of keys) {
        const child = encodeValue(
          (value as Record<string, unknown>)[k],
          seen,
          path + '/' + k,
        );
        parts.push(encodeString(k) + ':' + child);
      }
      return '{' + parts.join(',') + '}';
    } finally {
      // Leave the ancestor chain on the way back up (see §2.4)
      seen.delete(value);
    }
  }
  throw new CanonicalSerializationError(`unknown type ${t} at ${path}`);
}
§2.3 Pseudocode for encodeString
function encodeString(s: string): string {
  let out = '"';
  for (let i = 0; i < s.length; i++) {
    const code = s.charCodeAt(i);
    if (code === 0x22) out += '\\"';
    else if (code === 0x5c) out += '\\\\';
    else if (code === 0x08) out += '\\b';
    else if (code === 0x09) out += '\\t';
    else if (code === 0x0a) out += '\\n';
    else if (code === 0x0c) out += '\\f';
    else if (code === 0x0d) out += '\\r';
    else if (code < 0x20) {
      // \u00XX with lowercase hex, padded to 4 chars
      const hex = code.toString(16);
      out += '\\u' + '0'.repeat(4 - hex.length) + hex;
    } else {
      out += s[i];
    }
  }
  out += '"';
  return out;
}
Self-scan watch: No Math.*, no Date.*, no await, no async, no float literal. Hex padding uses string-repeat, not arithmetic.
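§2.1 lists ESCAPE_TABLE, while the pseudocode above spells the short escapes out as an if-chain. As a minimal sketch, assuming ESCAPE_TABLE is a readonly record keyed by code unit (its exact shape is not fixed by this packet), the same escaper could be table-driven. encodeStringTabled is a hypothetical name used only for this illustration:

```typescript
// Assumed shape for ESCAPE_TABLE: code unit -> two-character escape sequence.
const ESCAPE_TABLE: Readonly<Record<number, string>> = {
  0x22: '\\"',
  0x5c: '\\\\',
  0x08: '\\b',
  0x09: '\\t',
  0x0a: '\\n',
  0x0c: '\\f',
  0x0d: '\\r',
};

// Hypothetical table-driven variant; intended to match the if-chain output.
function encodeStringTabled(s: string): string {
  let out = '"';
  for (let i = 0; i < s.length; i++) {
    const code = s.charCodeAt(i);
    const short = ESCAPE_TABLE[code];
    if (short !== undefined) {
      out += short; // one of the seven short escapes
    } else if (code < 0x20) {
      // Remaining control characters: \u00XX, lowercase hex, padded by string-repeat
      const hex = code.toString(16);
      out += '\\u' + '0'.repeat(4 - hex.length) + hex;
    } else {
      out += s.charAt(i); // charAt always returns a string
    }
  }
  return out + '"';
}
```

Using charAt here also sidesteps the noUncheckedIndexedAccess concern raised in the risk register.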
§2.4 Per-call seen set
Created fresh inside canonicalize. The try/finally removes each object on the way back up so siblings don’t false-positive cycles.
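The distinction matters for shared references: one object reachable through two sibling keys is not a cycle. A minimal sketch of the add/delete discipline, using a stripped-down walker (hasCycle is hypothetical and is not the real encodeValue):

```typescript
// Hypothetical walker: `seen` tracks only the current ancestor chain.
function hasCycle(value: unknown, seen: Set<object> = new Set<object>()): boolean {
  if (typeof value !== 'object' || value === null) return false;
  if (seen.has(value)) return true; // value is its own ancestor
  seen.add(value);
  try {
    for (const k of Object.keys(value)) {
      if (hasCycle((value as Record<string, unknown>)[k], seen)) return true;
    }
    return false;
  } finally {
    seen.delete(value); // pop on the way back up so siblings start clean
  }
}

const shared = { n: 1 };
const siblings = { left: shared, right: shared }; // same object twice, no cycle
const loop: Record<string, unknown> = {};
loop['self'] = loop; // genuine cycle
```

Without the delete in the finally block, the second visit to shared would be reported as a cycle.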
§2.5 Object.getPrototypeOf plain-object check
Allowed prototypes: Object.prototype, null (for Object.create(null)). Anything else is rejected. (κ AST nodes are plain objects per parser.ts.)
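The two-arm check can be sketched in isolation. isPlainObject and Point are hypothetical names for illustration; the packet keeps the check inline in encodeValue:

```typescript
// Hypothetical helper mirroring the prototype check: only Object.prototype or null pass.
function isPlainObject(value: object): boolean {
  const proto = Object.getPrototypeOf(value);
  return proto === null || proto === Object.prototype;
}

class Point {} // stand-in for any class instance

const accepted = [{}, { a: 1 }, Object.create(null)];
const rejected = [new Date(0), new Point(), /x/];
```

Arrays would also fail this check, which is why the walker handles Array.isArray before reaching it. One known limitation of prototype identity checks in general: an object created in another realm (for example a vm context) has a different Object.prototype and would be rejected.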
§3. Test plan (src/__tests__/domains/rules/canonical.test.ts)
§3.1 Imports
import { canonicalize, byteLength, CanonicalSerializationError } from '../../../domains/rules/canonical.js';
import { parse } from '../../../domains/rules/parser.js';
§3.2 Test groups
Group 1 — Atoms (8 cases)
- null → 'null'
- true → 'true', false → 'false'
- 0n → '0', 13n → '13', -7n → '-7', 9999999999999999999n → '9999999999999999999'
- 0 → '0', 42 → '42', -5 → '-5'
- '' → '""', 'abc' → '"abc"'
Group 2 — String escape (1 fixture, table-driven)
- 'hello\nworld' → '"hello\\nworld"'
- 'a\\b' → '"a\\\\b"'
- 'q\"q' → '"q\\"q"'
- '\b\f\r\t' → '"\\b\\f\\r\\t"'
- '\x00' → '"\\u0000"', '\x1f' → '"\\u001f"'
- ' ' (space, 0x20) → '" "' (literal, not escaped)
- '/' → '"/"' (forward slash is not escaped)
Group 3 — Object key sort (5 cases)
- {b:1, a:2} → '{"a":2,"b":1}'
- {c:3, a:1, b:2} → '{"a":1,"b":2,"c":3}'
- {} → '{}'
- {'10':1, '2':2} → '{"10":1,"2":2}' (string sort, not numeric)
- {'B':1, 'a':1} → '{"B":1,"a":1}' (uppercase sorts before lowercase by codepoint)
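One caveat on "codepoint" ordering: Array.prototype.sort with no comparator orders strings by UTF-16 code unit, which matches codepoint order only while every key stays inside the Basic Multilingual Plane. The sketch below shows where the two diverge, with a hypothetical compareByCodePoint comparator. Whether the contract requires true codepoint order for astral keys is not stated in this packet, so treat this as an edge case to confirm rather than a required change.

```typescript
// Hypothetical comparator: orders strings by Unicode code point, not UTF-16 code unit.
function compareByCodePoint(a: string, b: string): number {
  let i = 0;
  while (i < a.length && i < b.length) {
    const ca = a.codePointAt(i) as number;
    const cb = b.codePointAt(i) as number;
    if (ca !== cb) return ca < cb ? -1 : 1;
    i += ca > 0xffff ? 2 : 1; // skip both halves of a surrogate pair
  }
  return a.length - b.length; // a strict prefix sorts first
}

const bmpHigh = '\uFF5E';      // U+FF5E, a single code unit
const astral = '\uD83D\uDE00'; // U+1F600, stored as the surrogate pair D83D DE00

const byCodeUnit = [bmpHigh, astral].sort();                    // astral first: 0xD83D < 0xFF5E
const byCodePoint = [bmpHigh, astral].sort(compareByCodePoint); // bmpHigh first: U+FF5E < U+1F600
```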
Group 4 — Arrays (3 cases)
- [] → '[]'
- [1, 2, 3] → '[1,2,3]'
- [{c:1}, {a:2}] → '[{"c":1},{"a":2}]'
Group 5 — Nested (2 cases)
- {a:{b:[1, 2, {x:3, y:4}]}, c:[]} → '{"a":{"b":[1,2,{"x":3,"y":4}]},"c":[]}'
- Real κ rule from extraction §1: rule X { guards { $a > 0n -> admit } effects {} }, parsed and canonicalized; assert known canonical bytes.
Group 6 — Idempotence (round-trip)
- For DSL fixture: parse twice, canonicalize each, assert equal bytes.
- Property: 100 generated random ASTs; canonicalize twice each; assert string-equal.
Group 7 — Locale independence
- Save current process.env.LANG. Set it to 'tr_TR.UTF-8'. Run a fixture with keys ['I', 'i'] (Turkish dotless-i collation famously differs). Assert sort is codepoint-based regardless. Restore env.
Group 8 — Error model
- undefined → throws CanonicalSerializationError
- () => {} → throws
- Symbol('x') → throws
- NaN → throws
- Infinity → throws
- 1.5 → throws
- new Map() → throws
- new Date() → throws
- new Set([1]) → throws
- A self-referencing object → throws “cycle detected”
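These cases rely on the thrown value being recognizable as a CanonicalSerializationError. A minimal sketch of such a class, assuming it carries nothing beyond a message (the real class in canonical.ts may add fields such as the path). throwsCanonicalError is a hypothetical test helper:

```typescript
class CanonicalSerializationError extends Error {
  constructor(message: string) {
    super(message);
    this.name = 'CanonicalSerializationError';
    // Only needed when compiling to ES5, where subclassing Error loses the prototype.
    Object.setPrototypeOf(this, CanonicalSerializationError.prototype);
  }
}

// Hypothetical helper: true only if fn throws a CanonicalSerializationError.
function throwsCanonicalError(fn: () => unknown): boolean {
  try {
    fn();
    return false;
  } catch (err) {
    return err instanceof CanonicalSerializationError;
  }
}
```

Checking instanceof rather than the message text keeps the Group 8 assertions stable if the wording of individual messages changes.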
Group 9 — byteLength
- byteLength('hello') === Buffer.byteLength('"hello"') === 7
- For an ASCII-only canonical form: byte length === string length.
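The ASCII qualifier matters because byte length and string length diverge as soon as a string contains a non-ASCII character. A dependency-free sketch that counts UTF-8 bytes by hand (utf8Length is a hypothetical helper for illustration; the packet itself plans to wrap Buffer.byteLength):

```typescript
// Hypothetical helper: UTF-8 byte count of a JS string, without Node APIs.
function utf8Length(s: string): number {
  let bytes = 0;
  for (let i = 0; i < s.length; i++) {
    const c = s.charCodeAt(i);
    if (c < 0x80) {
      bytes += 1; // ASCII
    } else if (c < 0x800) {
      bytes += 2;
    } else if (c >= 0xd800 && c <= 0xdbff && i + 1 < s.length) {
      const next = s.charCodeAt(i + 1);
      if (next >= 0xdc00 && next <= 0xdfff) {
        bytes += 4; // valid surrogate pair encodes one 4-byte code point
        i++;
      } else {
        bytes += 3; // lone surrogate, counted as the 3-byte replacement character
      }
    } else {
      bytes += 3;
    }
  }
  return bytes;
}

const asciiForm = '"hello"';           // 7 code units, 7 bytes
const accentedForm = '"\u00e9"';       // 3 code units, 4 bytes
const astralForm = '"\uD83D\uDE00"';   // 4 code units, 6 bytes
```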
§3.3 Property generator
A small recursive shape-generator using a deterministic LCG-like seeded RNG (no Math.random). Generates one of: int literal, bigint, bool, string (random ASCII), null, array of N children, object of M children. Depth-limited to 6, branching factor 1–4. Run 100 trials.
(Math.random would not trip the determinism rule here, because files under __tests__/ are not scanned. A deterministic seed is still preferable: any failing trial can be reproduced exactly.)
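A minimal sketch of such a generator, under stated assumptions: the LCG constants are the common Numerical Recipes pair, and makeLcg, genShape, and Shape are hypothetical names. The packet does not fix any of these choices.

```typescript
type Shape = null | boolean | number | bigint | string | Shape[] | { [key: string]: Shape };

// Hypothetical seeded LCG; returns the top 16 bits of a 32-bit state (low bits are weak).
function makeLcg(seed: number): () => number {
  let state = seed >>> 0;
  return () => {
    state = (Math.imul(state, 1664525) + 1013904223) >>> 0;
    return state >>> 16;
  };
}

// Generates one random shape; composites are excluded once depth reaches 6.
function genShape(next: () => number, depth: number): Shape {
  const kind = depth >= 6 ? next() % 5 : next() % 7;
  switch (kind) {
    case 0: return null;
    case 1: return next() % 2 === 0;
    case 2: return (next() % 2001) - 1000;          // integer in [-1000, 1000]
    case 3: return BigInt(next()) - BigInt(32768);  // signed bigint
    case 4: {
      let s = '';
      const len = next() % 8;
      for (let i = 0; i < len; i++) s += String.fromCharCode(0x20 + (next() % 95)); // printable ASCII
      return s;
    }
    case 5: {
      const arr: Shape[] = [];
      const n = 1 + (next() % 4); // branching factor 1 to 4
      for (let i = 0; i < n; i++) arr.push(genShape(next, depth + 1));
      return arr;
    }
    default: {
      const obj: { [key: string]: Shape } = {};
      const n = 1 + (next() % 4);
      for (let i = 0; i < n; i++) obj['k' + i] = genShape(next, depth + 1);
      return obj;
    }
  }
}
```

In the property test, each of the 100 trials would call genShape(makeLcg(trialSeed), 0) and canonicalize the result twice; logging trialSeed on failure is enough to replay the case.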
§3.4 Locale fixture
const original = process.env.LANG;
process.env.LANG = 'tr_TR.UTF-8';
try {
  // Test that {İ:1, i:2, I:3} sorts in codepoint order regardless
  // codepoint('I') = 0x49, codepoint('i') = 0x69, codepoint('İ') = 0x130
  const obj = { 'İ': 1, 'i': 2, 'I': 3 };
  expect(canonicalize(obj)).toBe('{"I":3,"i":2,"İ":1}');
} finally {
  if (original === undefined) {
    delete process.env.LANG;
  } else {
    process.env.LANG = original;
  }
}
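The fixture is expected to pass regardless of LANG because Array.prototype.sort with no comparator compares UTF-16 code units and never consults the locale; only locale-aware comparators such as String.prototype.localeCompare do. A minimal sketch of the contrast (the exact ordering under the 'tr' locale depends on the ICU data in the runtime, so it is not asserted here):

```typescript
const localeKeys = ['\u0130', 'i', 'I']; // İ (U+0130), i (U+0069), I (U+0049)

// Default sort: code-unit comparison, locale never consulted.
const sortedByCodeUnit = localeKeys.slice().sort();

// Locale-aware sort: the result may vary with the runtime's ICU data.
const sortedTurkish = localeKeys.slice().sort((a, b) => a.localeCompare(b, 'tr'));
```

The regression this test guards against is a future refactor that swaps the default sort for localeCompare or Intl.Collator.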
§4. Step ordering (commits)
| # | Action | Commit message |
|---|---|---|
| C1 | Already done | audit(p1-5-4-canonical): inventory surface |
| C2 | Already done | contract(p1-5-4-canonical): behavioral contract |
| C3 | This file | packet(p1-5-4-canonical): execution plan |
| C4 | canonical.ts + tests | feat(p1-5-4-canonical): byte-identical json serialization |
| C5 | Verification doc | verify(p1-5-4-canonical): test evidence |
After C4 lands and tests pass, push the branch (early-push policy per round prompt; quota safety).
§5. Risk register
| Risk | Probability | Mitigation |
|---|---|---|
| ESLint disagrees with hand-rolled string escape (control-flow complexity) | Med | Use guard clauses + early returns; if complexity rule fires, refactor into a switch. |
| noUncheckedIndexedAccess complains about s[i] returning string \| undefined | Med | Use s.charAt(i) (always returns string) or assert non-null after length-bound check. |
| Object.create(null) plain-object check accepts both Object.prototype and null proto | Low | Explicit two-arm check. |
| Self-scan flags Math.something in a comment | Low | Comments are stripped before scan (stripComments in determinism.test.ts). Still, keep code free of forbidden tokens even in prose. |
| ts-jest ESM .js imports | Low | Match the existing parser.test.ts pattern with .js extensions. |
| Property test flake (no determinism) | Low | Use seeded LCG, not Math.random. |
| 1467 baseline test count assertion (none yet) | None | No test file in repo asserts the count. Just confirm npm test is green and >= 1467. |
§6. Rollback plan
If implementation reveals a contract gap, return to Step 2 (amend contract) and re-issue this packet. Do not skip back-fill.
§7. Acceptance gates (recap)
- All three green: npm run build && npm run lint && npm test
- Determinism corpus self-scan stays green
- canonical.ts coverage ≥ 95% lines, 90% branches
- No regression on existing 1467-test baseline
Ready to begin Step 4 implementation.