P1.5.4 — Canonical Serialization — Packet

Step 3 of the 5-step chain. This packet gates implementation. Approval here unlocks Step 4 (feat).

§1. Goal

Ship src/domains/rules/canonical.ts plus src/__tests__/domains/rules/canonical.test.ts that satisfy every invariant in the contract (docs/contracts/p1-5-4-canonical-contract.md) under the constraints from the audit (docs/audits/p1-5-4-canonical-audit.md).

§2. Implementation outline (src/domains/rules/canonical.ts)

§2.1 File structure

1. Header docblock — references audit, contract, spec, ADR-006
2. CanonicalSerializationError class
3. ESCAPE_TABLE — readonly map of low-codepoint escapes
4. encodeString(s: string): string — hand-rolled escaper
5. encodeAtom(value): string | null — handles primitives; returns null if not an atom
6. encodeValue(value, seen): string — main recursive walker
7. canonicalize(value): string — entry point, instantiates `seen` Set
8. byteLength(value): number — Buffer.byteLength wrapper

§2.2 Pseudocode for encodeValue

function encodeValue(value: unknown, seen: Set<object>, path: string): string {
  // Atoms
  if (value === null) return 'null';
  if (value === undefined) throw new CanonicalSerializationError(
    `undefined is not representable in canonical JSON at ${path}`);

  const t = typeof value;
  if (t === 'boolean') return value ? 'true' : 'false';
  if (t === 'bigint') return (value as bigint).toString();
  if (t === 'number') {
    const n = value as number;
    if (!Number.isFinite(n) || !Number.isInteger(n)) {
      throw new CanonicalSerializationError(
        `non-integer number ${n} at ${path}`);
    }
    return n.toString();
  }
  if (t === 'string') return encodeString(value as string);
  if (t === 'symbol' || t === 'function') {
    throw new CanonicalSerializationError(`${t} not representable at ${path}`);
  }

  // Composites
  if (typeof value === 'object') {
    if (seen.has(value)) {
      throw new CanonicalSerializationError(`cycle detected at ${path}`);
    }
    seen.add(value);
    try {
      if (Array.isArray(value)) {
        const parts: string[] = [];
        for (let i = 0; i < value.length; i++) {
          parts.push(encodeValue(value[i], seen, path + '/' + i));
        }
        return '[' + parts.join(',') + ']';
      }

      // Plain-object guard — reject Map, Set, Date, RegExp, Promise, class instances
      const proto = Object.getPrototypeOf(value);
      if (proto !== null && proto !== Object.prototype) {
        throw new CanonicalSerializationError(
          `non-plain object at ${path}: ${value.constructor?.name ?? 'unknown'}`);
      }

      const keys = Object.keys(value as object).sort(); // default sort: UTF-16 code-unit order (same as codepoint order for BMP-only keys)
      const parts: string[] = [];
      for (const k of keys) {
        const child = encodeValue(
          (value as Record<string, unknown>)[k],
          seen,
          path + '/' + k,
        );
        parts.push(encodeString(k) + ':' + child);
      }
      return '{' + parts.join(',') + '}';
    } finally {
      seen.delete(value);
    }
  }

  throw new CanonicalSerializationError(`unknown type ${t} at ${path}`);
}
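
One edge of the number branch may be worth checking against the contract: Number.isInteger accepts every integral double, including values beyond the safe-integer range, and Number.prototype.toString switches to exponent notation from 1e21 upward. The sketch below only observes built-in behavior. Whether the contract wants a tighter guard (for example Number.isSafeInteger) is an assumption to confirm, not something this packet states.

```typescript
// Observes how the built-ins used by the number branch treat edge-case integers.
const big = 1e21;

const observations = {
  negativeZero: (-0).toString(),                       // "0"
  maxSafe: Number.MAX_SAFE_INTEGER.toString(),         // "9007199254740991"
  bigIsInteger: Number.isInteger(big),                 // true: passes the integer guard
  bigIsSafe: Number.isSafeInteger(big),                // false
  bigText: big.toString(),                             // "1e+21": not plain digits
  asNumber: Number('9999999999999999999').toString(),  // "10000000000000000000" (rounded)
  asBigint: BigInt('9999999999999999999').toString(),  // "9999999999999999999" (exact)
};

console.log(observations);
```

The last two entries show why the Group 1 fixture for 9999999999999999999 is written as a bigint rather than a number.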

§2.3 Pseudocode for encodeString

function encodeString(s: string): string {
  let out = '"';
  for (let i = 0; i < s.length; i++) {
    const code = s.charCodeAt(i);
    if (code === 0x22) out += '\\"';
    else if (code === 0x5c) out += '\\\\';
    else if (code === 0x08) out += '\\b';
    else if (code === 0x09) out += '\\t';
    else if (code === 0x0a) out += '\\n';
    else if (code === 0x0c) out += '\\f';
    else if (code === 0x0d) out += '\\r';
    else if (code < 0x20) {
      // \u00XX with lowercase hex, padded to 4 chars
      const hex = code.toString(16);
      out += '\\u' + '0'.repeat(4 - hex.length) + hex;
    } else {
      out += s.charAt(i); // charAt always returns string, so noUncheckedIndexedAccess stays quiet
    }
  }
  out += '"';
  return out;
}

Self-scan watch: No Math.*, no Date.*, no await, no async, no float literal. Hex padding uses string-repeat, not arithmetic.
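
A cheap way to gain confidence in the hand-rolled escaper is to compare it against JSON.stringify, used purely as a test oracle and not as the implementation. The sketch below assumes the contract's escape set is the one listed in Group 2 (short escapes for \b, \t, \n, \f, \r, lowercase \u00xx for the other control characters). It stops below the surrogate block, where JSON.stringify applies extra escaping that this pseudocode does not attempt.

```typescript
// Copy of the §2.3 escaper, kept free of Math.* and Date.* like the original.
function encodeString(s: string): string {
  let out = '"';
  for (let i = 0; i < s.length; i++) {
    const code = s.charCodeAt(i);
    if (code === 0x22) out += '\\"';
    else if (code === 0x5c) out += '\\\\';
    else if (code === 0x08) out += '\\b';
    else if (code === 0x09) out += '\\t';
    else if (code === 0x0a) out += '\\n';
    else if (code === 0x0c) out += '\\f';
    else if (code === 0x0d) out += '\\r';
    else if (code < 0x20) {
      const hex = code.toString(16);
      out += '\\u' + '0'.repeat(4 - hex.length) + hex;
    } else {
      out += s.charAt(i);
    }
  }
  return out + '"';
}

// Compare every single-code-unit string below the surrogate range with the oracle.
const mismatches: number[] = [];
for (let code = 0; code < 0xd800; code++) {
  const s = String.fromCharCode(code);
  if (encodeString(s) !== JSON.stringify(s)) mismatches.push(code);
}

console.log(mismatches.length); // 0
console.log(encodeString('\x1f')); // "\u001f"
```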

§2.4 Per-call seen set

Created fresh inside canonicalize. The try/finally removes each object on the way back up so siblings don’t false-positive cycles.
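
The effect of the try/finally can be seen in a stripped-down walker that only tracks containment. This is a hypothetical helper for illustration (`reportsCycle` and its `release` flag are not part of canonical.ts): with `release` on, a shared child referenced twice is fine, and with it off the second reference is misreported as a cycle.

```typescript
// Hypothetical mini-walker: mirrors the seen-set handling of encodeValue,
// but returns whether a cycle would be reported instead of serializing.
function reportsCycle(
  value: unknown,
  release: boolean,
  seen: Set<object> = new Set<object>(),
): boolean {
  if (typeof value !== 'object' || value === null) return false;
  if (seen.has(value)) return true;
  seen.add(value);
  try {
    return Object.values(value).some((child) => reportsCycle(child, release, seen));
  } finally {
    // Removing the object on the way back up keeps `seen` equal to the
    // current ancestor chain, so siblings sharing a child are not flagged.
    if (release) seen.delete(value);
  }
}

const shared = { x: 1 };
const diamond = { left: shared, right: shared }; // shared reference, no cycle

const loop: Record<string, unknown> = { name: 'loop' };
loop['self'] = loop; // genuine cycle

console.log(reportsCycle(diamond, true));  // false
console.log(reportsCycle(diamond, false)); // true (false positive without the delete)
console.log(reportsCycle(loop, true));     // true
```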

§2.5 Object.getPrototypeOf plain-object check

Allowed prototypes: Object.prototype, null (for Object.create(null)). Anything else is rejected. (κ AST nodes are plain objects per parser.ts.)
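
As a rough illustration of the two-arm prototype check, the sketch below isolates it in a hypothetical `isPlainObject` helper (the packet keeps the check inline in encodeValue) and runs it against the kinds of values Group 8 expects to be rejected.

```typescript
// Hypothetical helper: true only for objects whose prototype is
// Object.prototype or null, i.e. object literals and Object.create(null).
function isPlainObject(value: object): boolean {
  const proto: unknown = Object.getPrototypeOf(value);
  return proto === null || proto === Object.prototype;
}

class Point {
  x = 1;
}

console.log(isPlainObject({ a: 1 }));             // true
console.log(isPlainObject(Object.create(null)));  // true
console.log(isPlainObject(new Map()));            // false
console.log(isPlainObject(new Date(0)));          // false
console.log(isPlainObject(new Point()));          // false
console.log(isPlainObject([]));                   // false (arrays are handled before this check)
```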

§3. Test plan (src/__tests__/domains/rules/canonical.test.ts)

§3.1 Imports

import { canonicalize, byteLength, CanonicalSerializationError } from '../../../domains/rules/canonical.js';
import { parse } from '../../../domains/rules/parser.js';

§3.2 Test groups

Group 1 — Atoms (8 cases)

  • null → 'null'
  • true → 'true', false → 'false'
  • 0n → '0', 13n → '13', -7n → '-7', 9999999999999999999n → '9999999999999999999'
  • 0 → '0', 42 → '42', -5 → '-5'
  • '' → '""', 'abc' → '"abc"'

Group 2 — String escape (1 fixture, table-driven)

  • 'hello\nworld' → '"hello\\nworld"'
  • 'a\\b' → '"a\\\\b"'
  • 'q\"q' → '"q\\"q"'
  • '\b\f\r\t' → '"\\b\\f\\r\\t"'
  • '\x00' → '"\\u0000"', '\x1f' → '"\\u001f"'
  • ' ' (space, 0x20) → '" "' (literal — not escaped)
  • '/' → '"/"' (forward slash is not escaped)

Group 3 — Object key sort (5 cases)

  • {b:1, a:2} → '{"a":2,"b":1}'
  • {c:3, a:1, b:2} → '{"a":1,"b":2,"c":3}'
  • {} → '{}'
  • {'10':1, '2':2} → '{"10":1,"2":2}' (string sort, not numeric)
  • {'B':1, 'a':1} → '{"B":1,"a":1}' (uppercase sorts before lowercase by codepoint)
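
The orderings these cases pin down can be checked directly against Array.prototype.sort. The sketch below uses hypothetical helper names and contrasts the default comparator with a numeric and a locale-aware one. It also shows that the default comparator works on UTF-16 code units, which only diverges from codepoint order when an astral character meets a key in the U+E000 to U+FFFF range. Whether the contract cares about that corner is an open assumption.

```typescript
// Three ways to order the same keys; only the first matches the cases above.
const codeUnitSort = (keys: string[]): string[] => [...keys].sort();
const numericSort = (keys: string[]): string[] =>
  [...keys].sort((a, b) => Number(a) - Number(b));
const localeSort = (keys: string[]): string[] =>
  [...keys].sort((a, b) => a.localeCompare(b, 'en'));

console.log(codeUnitSort(['2', '10'])); // [ '10', '2' ]
console.log(numericSort(['2', '10']));  // [ '2', '10' ]
console.log(codeUnitSort(['a', 'B']));  // [ 'B', 'a' ]
console.log(localeSort(['a', 'B']));    // [ 'a', 'B' ]

// U+1F600 is stored as the surrogate pair D83D DE00, so by code unit it
// sorts before U+FF5E even though its codepoint is larger.
const astral = '\u{1F600}';
const bmpHigh = '\uFF5E';
console.log(codeUnitSort([bmpHigh, astral])[0] === astral); // true
```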

Group 4 — Arrays (3 cases)

  • [] → '[]'
  • [1, 2, 3] → '[1,2,3]'
  • [{c:1}, {a:2}] → '[{"c":1},{"a":2}]'

Group 5 — Nested (2 cases)

  • {a:{b:[1, 2, {x:3, y:4}]}, c:[]} → '{"a":{"b":[1,2,{"x":3,"y":4}]},"c":[]}'
  • Real κ rule from extraction §1: rule X { guards { $a > 0n -> admit } effects {} } parsed and canonicalized — assert known canonical bytes.

Group 6 — Idempotence (round-trip)

  • For DSL fixture: parse twice, canonicalize each, assert equal bytes.
  • Property: 100 generated random ASTs; canonicalize twice each; assert string-equal.

Group 7 — Locale independence

  • Save the current process.env.LANG. Set it to 'tr_TR.UTF-8'. Run a fixture with keys ['İ', 'i', 'I'] (Turkish rules for dotted and dotless i famously differ from the default). Assert the sort is codepoint-based regardless. Restore the env.

Group 8 — Error model

  • undefined → throws CanonicalSerializationError
  • () => {} → throws
  • Symbol('x') → throws
  • NaN → throws
  • Infinity → throws
  • 1.5 → throws
  • new Map() → throws
  • new Date() → throws
  • new Set([1]) → throws
  • A self-referencing object → throws “cycle detected”

Group 9 — byteLength

  • byteLength('hello') === 7, the same value as Buffer.byteLength('"hello"')
  • For an ASCII-only canonical form: byte length === string length.
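
The ASCII-only qualifier matters because string length counts UTF-16 code units while the byte length counts UTF-8 bytes. A minimal sketch, using a hypothetical `utf8Length` helper that takes an already-canonical string (the packet's byteLength takes the value and canonicalizes it first):

```typescript
import { Buffer } from 'node:buffer';

// Hypothetical helper: UTF-8 size of a string that is already in canonical form.
const utf8Length = (canonical: string): number => Buffer.byteLength(canonical, 'utf8');

const ascii = '"hello"';        // 7 code units, 7 bytes
const dotted = '"\u0130"';      // quotes + U+0130: 3 code units, 4 bytes
const astral = '"\u{1F600}"';   // quotes + surrogate pair: 4 code units, 6 bytes

console.log(ascii.length, utf8Length(ascii));   // 7 7
console.log(dotted.length, utf8Length(dotted)); // 3 4
console.log(astral.length, utf8Length(astral)); // 4 6
```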

§3.3 Property generator

A small recursive shape-generator using a deterministic LCG-like seeded RNG (no Math.random). Generates one of: int literal, bigint, bool, string (random ASCII), null, array of N children, object of M children. Depth-limited to 6, branching factor 1–4. Run 100 trials.

(Math.random would not actually trip the determinism rule here: test files live under __tests__/ and are NOT scanned. A deterministic seed is still cleaner and reproducible.)
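
One possible shape for this generator is sketched below. The names (`makeRng`, `pick`, `genShape`), the Park-Miller constants, and the key and string length limits are illustrative assumptions. The packet only fixes the value kinds, the depth limit of 6, and the branching factor of 1 to 4.

```typescript
type Shape = null | boolean | number | bigint | string | Shape[] | { [key: string]: Shape };

// Park-Miller style LCG. state < 2^31 and the multiplier < 2^16, so the
// product stays below 2^53 and plain number arithmetic is exact.
// Assumes a non-negative integer seed.
function makeRng(seed: number): () => number {
  let state = (seed % 2147483646) + 1; // keep state in [1, 2147483646]
  return () => {
    state = (state * 48271) % 2147483647;
    return state;
  };
}

// Integer in [lo, hi], inclusive.
function pick(rng: () => number, lo: number, hi: number): number {
  return lo + (rng() % (hi - lo + 1));
}

// Printable ASCII only (0x20..0x7e), up to 8 characters.
function randomAscii(rng: () => number): string {
  let s = '';
  const len = pick(rng, 0, 8);
  for (let i = 0; i < len; i++) s += String.fromCharCode(pick(rng, 0x20, 0x7e));
  return s;
}

function genShape(rng: () => number, depth: number): Shape {
  // At the depth limit only atoms (kinds 0..4) are allowed.
  const kind = depth >= 6 ? pick(rng, 0, 4) : pick(rng, 0, 6);
  switch (kind) {
    case 0: return null;
    case 1: return rng() % 2 === 0;
    case 2: return pick(rng, -1000, 1000);
    case 3: return BigInt(pick(rng, -1000, 1000));
    case 4: return randomAscii(rng);
    case 5: {
      const items: Shape[] = [];
      const n = pick(rng, 1, 4);
      for (let i = 0; i < n; i++) items.push(genShape(rng, depth + 1));
      return items;
    }
    default: {
      const obj: { [key: string]: Shape } = {};
      const m = pick(rng, 1, 4);
      // The index suffix keeps keys unique within one object.
      for (let i = 0; i < m; i++) obj[randomAscii(rng) + i] = genShape(rng, depth + 1);
      return obj;
    }
  }
}

// Debug printer only; bigint needs a replacer because JSON.stringify rejects it.
const show = (shape: Shape): string =>
  JSON.stringify(shape, (_key, v) => (typeof v === 'bigint' ? v.toString() + 'n' : v));

console.log(show(genShape(makeRng(42), 0)) === show(genShape(makeRng(42), 0))); // true
```

Reusing the seed reproduces a failing trial exactly, which is the point of avoiding Math.random even though the scan would not object.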

§3.4 Locale fixture

const original = process.env.LANG;
process.env.LANG = 'tr_TR.UTF-8';
try {
  // Test that {İ:1, i:2, I:3} sorts in codepoint order regardless
  // codepoint('I') = 0x49, codepoint('i') = 0x69, codepoint('İ') = 0x130
  const obj = { 'İ': 1, 'i': 2, 'I': 3 };
  expect(canonicalize(obj)).toBe('{"I":3,"i":2,"İ":1}');
} finally {
  if (original === undefined) {
    delete process.env.LANG;
  } else {
    process.env.LANG = original;
  }
}

§4. Step ordering (commits)

| # | Action | Commit message |
|---|---|---|
| C1 | Already done | audit(p1-5-4-canonical): inventory surface |
| C2 | Already done | contract(p1-5-4-canonical): behavioral contract |
| C3 | This file | packet(p1-5-4-canonical): execution plan |
| C4 | canonical.ts + tests | feat(p1-5-4-canonical): byte-identical json serialization |
| C5 | Verification doc | verify(p1-5-4-canonical): test evidence |

After C4 lands and tests pass, push the branch (early-push policy per round prompt; quota safety).

§5. Risk register

| Risk | Probability | Mitigation |
|---|---|---|
| ESLint disagrees with hand-rolled string escape (control-flow complexity) | Med | Use guard clauses + early returns; if the complexity rule fires, refactor into a switch. |
| noUncheckedIndexedAccess complains about s[i] returning string \| undefined | Med | Use s.charAt(i) (always returns string) or assert non-null after a length-bound check. |
| Plain-object check must accept both Object.prototype and null prototypes (Object.create(null)) | Low | Explicit two-arm check. |
| Self-scan flags Math.something in a comment | Low | Comments are stripped before the scan (stripComments in determinism.test.ts). Still, keep code free of forbidden tokens even in prose. |
| ts-jest ESM .js imports | Low | Match the existing parser.test.ts pattern with .js extensions. |
| Property test flake (no determinism) | Low | Use a seeded LCG, not Math.random. |
| 1467 baseline test count assertion (none yet) | None | No test file in the repo asserts the count. Just confirm npm test is green and >= 1467. |

§6. Rollback plan

If implementation reveals a contract gap, return to Step 2 (amend contract) and re-issue this packet. Do not skip back-fill.

§7. Acceptance gates (recap)

  • All three: npm run build && npm run lint && npm test green
  • Determinism corpus self-scan stays green
  • canonical.ts coverage ≥ 95% lines, 90% branches
  • No regression on existing 1467-test baseline

Ready to begin Step 4 implementation.



Colibri — documentation-first MCP runtime. Apache 2.0 + Commons Clause.
