P1.2.2 — κ DSL Parser — Execution Packet

Step 3 of the 5-step executor chain. Builds on audit + contract. Packet gates implementation: once it lands, Step 4 is a mechanical translation of this plan.

§P1. Plan at a glance

Deliver, in a single implementation commit (Step 4):

  1. src/domains/rules/parser.ts — one TypeScript file, ~450–500 lines including JSDoc, AST interfaces, Chevrotain parser class, AST-cap walker, and the parse() entry point.
  2. src/__tests__/domains/rules/parser.test.ts — one Jest file, ~480–520 lines. Target 35–50 cases across 7 fixture groups.

No new dependencies (chevrotain@11.0.3 already pinned by P1.2.1). No edits to any file outside the two listed above. No barrel index.ts for src/domains/rules/.

§P2. File layout (final)

src/domains/rules/
  parser.ts                     # NEW — only file in this task
src/__tests__/domains/rules/
  parser.test.ts                # NEW — colocated per repo convention

Existing files in src/domains/rules/ are untouched: bps-constants.ts, determinism.ts, integer-math.ts, lexer.ts.

§P3. parser.ts — structural sketch

The source file is organised into the following sections, in order:

§P3.1. Header JSDoc (~40 lines)

  • File role + Greek letter κ + Phase 1 Wave 3.
  • Links to audit, contract, packet, the concept doc, the extraction §1+§2, and the s11/s12 specs.
  • Note on the still-missing ADR-006-dsl-grammar (one-line pointer to audit §3).
  • “Pure module” declaration — no I/O, no side effects, never throws.
  • Cite R83.C lexer custom-pattern caveat (no parser regression).

§P3.2. Imports (~3 lines)

import { EmbeddedActionsParser, EOF } from 'chevrotain';
import type { IToken, ILexingResult, IRecognitionException } from 'chevrotain';
import { tokenize, Keywords, Operators, Delimiters, Literals } from './lexer.js';

§P3.3. Constants (~6 lines)

export const MAX_AST_NODES_PER_RULE = 10_000;
export const MAX_PARSE_ERRORS = 5;

§P3.4. Location helper (~15 lines)

A pure helper to build a Location from start + end tokens. Hardens against tokens with missing endLine/endColumn (Chevrotain guarantees them with positionTracking: 'full', but the lexer’s tokenize wraps the result, so we still null-coalesce defensively for the noUncheckedIndexedAccess strict-mode discipline).

function locOf(start: IToken, end: IToken = start): Location {
  return {
    startLine:   start.startLine   ?? 1,
    startColumn: start.startColumn ?? 1,
    endLine:     end.endLine       ?? end.startLine   ?? 1,
    endColumn:   end.endColumn     ?? end.startColumn ?? 1,
  };
}

§P3.5. AST node interfaces (~110 lines)

11 interfaces + 2 union types + 2 result types, exactly as specified in docs/contracts/p1-2-2-parser-contract.md §2.1–§2.2. No methods. Plain data.

§P3.6. String escape decoder (~25 lines)

The lexer’s StringLiteral retains escapes verbatim in image (e.g. "a\\nb" has image === '"a\\nb"', length 7). The parser decodes:

function decodeString(image: string): string {
  // image includes outer quotes; strip them.
  const inner = image.slice(1, -1);
  let out = '';
  let i = 0;
  while (i < inner.length) {
    const ch = inner[i]!;
    if (ch === '\\' && i + 1 < inner.length) {
      const next = inner[i + 1]!;
      switch (next) {
        case 'n': out += '\n'; break;
        case 't': out += '\t'; break;
        case 'r': out += '\r'; break;
        case '\\': out += '\\'; break;
        case '"': out += '"'; break;
        default:  out += next;       // fallback — preserve as-is
      }
      i += 2;
    } else {
      out += ch;
      i += 1;
    }
  }
  return out;
}

§P3.7. The KappaParser class (~250 lines)

A single Chevrotain EmbeddedActionsParser subclass. Each public/private method maps 1:1 to an EBNF production. this.RULE(...) returns the AST node directly (or undefined during error recovery — handled at parent level).

Class skeleton (RULE bodies elided here; see §P3.7.1 for the precedence spine):

class KappaParser extends EmbeddedActionsParser {
  constructor() {
    super(allTokensRefs, {
      recoveryEnabled: true,
      maxLookahead: 3,                     // sufficient for the grammar
    });
    this.performSelfAnalysis();
  }

  // top-level
  public ruleset = this.RULE('ruleset', (): RuleNode[] => { ... });

  // structural
  private rule         = this.RULE('rule',         (): RuleNode | undefined  => { ... });
  private guardBlock   = this.RULE('guardBlock',   (): GuardClause[]         => { ... });
  private guardClause  = this.RULE('guardClause',  (): GuardClause | undefined => { ... });
  private action       = this.RULE('action',       (): { kind: 'admit' | 'reject'; reason: string | null } => { ... });
  private effectBlock  = this.RULE('effectBlock',  (): EffectCall[]          => { ... });
  private effectCall   = this.RULE('effectCall',   (): EffectCall | undefined => { ... });

  // expression spine (precedence stratified)
  private expression     = this.RULE('expression',     (): Expression | undefined => this.SUBRULE(this.orExpr));
  private orExpr         = this.RULE('orExpr',         (): Expression | undefined => { ... });
  private andExpr        = this.RULE('andExpr',        (): Expression | undefined => { ... });
  private notExpr        = this.RULE('notExpr',        (): Expression | undefined => { ... });
  private comparison     = this.RULE('comparison',     (): Expression | undefined => { ... });
  private additive       = this.RULE('additive',       (): Expression | undefined => { ... });
  private multiplicative = this.RULE('multiplicative', (): Expression | undefined => { ... });
  private unary          = this.RULE('unary',          (): Expression | undefined => { ... });
  private primary        = this.RULE('primary',        (): Expression | undefined => { ... });

  // leaves
  private variable    = this.RULE('variable',    (): VarRef    | undefined => { ... });
  private funcCall    = this.RULE('funcCall',    (): FuncCall  | undefined => { ... });
  private intLiteral  = this.RULE('intLiteral',  (): IntLiteral             => { ... });
  private boolLiteral = this.RULE('boolLiteral', (): BoolLiteral            => { ... });
  private stringArg   = this.RULE('stringArg',   (): StringLiteral          => { ... });
}

Rationale for the public/private split: only ruleset is invoked from outside the class (by the parse entry point). Every other method is an internal sub-rule. Marking them private keeps the surface tight.

§P3.7.1. Precedence spine — the load-bearing detail

Each level builds a left-associative chain by iteration (MANY):

private orExpr = this.RULE('orExpr', () => {
  let left = this.SUBRULE(this.andExpr) as Expression;
  this.MANY(() => {
    const op = this.CONSUME(Keywords.Or);
    const right = this.SUBRULE2(this.andExpr) as Expression;
    left = {
      type: 'LogicalOp',
      op: 'or',
      operands: [left, right],
      location: spanLoc(left.location, right.location),
    } as LogicalOp;
  });
  return left;
});

private andExpr = this.RULE('andExpr', () => { /* same shape with Keywords.And */ });

private notExpr = this.RULE('notExpr', () => {
  let notTok: IToken | undefined;
  this.OPTION(() => { notTok = this.CONSUME(Keywords.Not); });
  const inner = this.SUBRULE(this.comparison) as Expression;
  if (notTok) {
    return {
      type: 'LogicalOp', op: 'not', operands: [inner],
      location: spanLoc(locOf(notTok), inner.location),
    } as LogicalOp;
  }
  return inner;
});

private comparison = this.RULE('comparison', () => {
  const left = this.SUBRULE(this.additive) as Expression;
  let op: BinaryOp['op'] | null = null;
  let opTok: IToken | null = null;
  this.OPTION(() => {
    opTok = this.OR([
      { ALT: () => { op = '=='; return this.CONSUME(Operators.Eq); } },
      { ALT: () => { op = '!='; return this.CONSUME(Operators.NotEq); } },
      { ALT: () => { op = '<='; return this.CONSUME(Operators.Lte); } },
      { ALT: () => { op = '>='; return this.CONSUME(Operators.Gte); } },
      { ALT: () => { op = '<';  return this.CONSUME(Operators.Lt); } },
      { ALT: () => { op = '>';  return this.CONSUME(Operators.Gt); } },
    ]);
  });
  if (op === null) return left;
  const right = this.SUBRULE(this.additive) as Expression;
  return {
    type: 'BinaryOp', op, left, right,
    location: spanLoc(left.location, right.location),
  } as BinaryOp;
});

// additive, multiplicative — same iterative shape, op set restricted accordingly.

private unary = this.RULE('unary', () => {
  let minusTok: IToken | undefined;
  this.OPTION(() => { minusTok = this.CONSUME(Operators.Minus); });
  const inner = this.SUBRULE(this.primary) as Expression;
  if (minusTok) {
    return {
      type: 'UnaryOp', op: '-', operand: inner,
      location: spanLoc(locOf(minusTok), inner.location),
    } as UnaryOp;
  }
  return inner;
});

§P3.7.2. Primary

Five alternatives. Variable / FuncCall disambiguated by LA(1) (Variable starts with Variable token; FuncCall starts with Identifier and the parser commits when the next token is LParen).

private primary = this.RULE('primary', () => {
  return this.OR([
    { ALT: () => this.SUBRULE(this.intLiteral)   as Expression },
    { ALT: () => this.SUBRULE(this.boolLiteral)  as Expression },
    { ALT: () => this.SUBRULE(this.variable)     as Expression },
    {
      // funcCall — gated on LA(2) === LParen so a bare identifier (which is invalid at this position) parses as funcCall and surfaces a clean error
      ALT: () => this.SUBRULE(this.funcCall)     as Expression,
    },
    {
      // parenthesised expression
      ALT: () => {
        this.CONSUME(Delimiters.LParen);
        const inner = this.SUBRULE(this.expression) as Expression;
        this.CONSUME(Delimiters.RParen);
        return inner;
      },
    },
  ]);
});

If multiple ALTs start with the same token type, Chevrotain raises a self-analysis error at construction time. Variable / FuncCall do not collide (different start tokens). IntLiteral / BoolLiteral / Variable / FuncCall / ( all have distinct start tokens (IntegerLiteral, True/False, Variable, Identifier, LParen). No GATE needed.

§P3.7.3. funcCall

private funcCall = this.RULE('funcCall', (): FuncCall | undefined => {
  const nameTok = this.CONSUME(Literals.Identifier);
  this.CONSUME(Delimiters.LParen);
  const args: Expression[] = [];
  this.OPTION(() => {
    const first = this.SUBRULE(this.expression);
    if (first) args.push(first);
    this.MANY(() => {
      this.CONSUME(Delimiters.Comma);
      const next = this.SUBRULE2(this.expression);
      if (next) args.push(next);
    });
  });
  const rparen = this.CONSUME(Delimiters.RParen);
  return {
    type: 'FuncCall',
    name: nameTok.image,
    args,
    location: locOf(nameTok, rparen),
  };
});

Note the difference from effectCall: effectCall accepts STRING as a top-level argument (per extraction §1 Arg = Expression | STRING); funcCall inside an expression does not. This separation keeps the AST clean — string literals only appear via EffectCall.args and GuardClause.reason.

§P3.7.4. ruleset top-level

public ruleset = this.RULE('ruleset', (): RuleNode[] => {
  const rules: RuleNode[] = [];
  this.MANY(() => {
    const r = this.SUBRULE(this.rule);
    if (r) rules.push(r);
  });
  return rules;
});

§P3.8. AST node counter (~25 lines)

Recursive walker. Pure function over the AST union.

function countNodes(node: AnyNode): number {
  let n = 1;
  switch (node.type) {
    case 'RuleNode':
      for (const g of node.guards)  n += countNodes(g);
      for (const e of node.effects) n += countNodes(e);
      break;
    case 'GuardClause':
      if (node.condition) n += countNodes(node.condition);
      break;
    case 'EffectCall':
    case 'FuncCall':
      for (const a of node.args) n += countNodes(a);
      break;
    case 'BinaryOp':
      n += countNodes(node.left) + countNodes(node.right);
      break;
    case 'UnaryOp':
      n += countNodes(node.operand);
      break;
    case 'LogicalOp':
      for (const o of node.operands) n += countNodes(o);
      break;
    case 'IntLiteral':
    case 'BoolLiteral':
    case 'StringLiteral':
    case 'VarRef':
      // leaves
      break;
  }
  return n;
}

§P3.9. The parse entry point (~50 lines)

const parserInstance = new KappaParser();

export function parse(input: string): ParseResult {
  const errors: ParseError[] = [];

  // 1. Tokenize
  const lex: ILexingResult = tokenize(input);
  for (const e of lex.errors) {
    errors.push({
      kind: 'lex',
      message: e.message,
      location: {
        startLine: e.line ?? 1,
        startColumn: e.column ?? 1,
        endLine: e.line ?? 1,
        endColumn: (e.column ?? 1) + Math.max(0, e.length - 1),
      },
    });
  }

  // 2. Parse
  parserInstance.input = lex.tokens;
  const cstOrAst = parserInstance.ruleset();
  const rules: RuleNode[] = (cstOrAst ?? []) as RuleNode[];

  // 3. Convert parser errors (truncate to first MAX_PARSE_ERRORS)
  let parseErrCount = 0;
  for (const e of parserInstance.errors) {
    if (parseErrCount >= MAX_PARSE_ERRORS) break;
    parseErrCount += 1;
    errors.push({
      kind: 'parse',
      message: e.message,
      location: e.token && e.token.tokenType !== EOF
        ? locOf(e.token)
        : null,
    });
  }

  // 4. AST-cap pass
  const accepted: RuleNode[] = [];
  for (const rule of rules) {
    const count = countNodes(rule);
    if (count > MAX_AST_NODES_PER_RULE) {
      errors.push({
        kind: 'ast-cap',
        message: `Rule '${rule.name}' exceeds maximum AST node count (${count} > ${MAX_AST_NODES_PER_RULE})`,
        location: rule.location,
      });
    } else {
      accepted.push(rule);
    }
  }

  return { ast: accepted, errors };
}

The parser instance is module-level (new KappaParser() runs once on first import — the lexer follows the same pattern). Reuse is safe because parserInstance.input = ... resets all internal Chevrotain state.

§P4. Test matrix (parser.test.ts)

Seven fixture groups (F1–F7) plus boundary cases. Each group’s tests are numbered for traceability.

F1 — AcceptCommitment golden fixture (~5 cases)

Translate the AcceptCommitment rule from docs/3-world/physics/laws/rule-engine.md §Worked rule into the extraction §1 block syntax. The rule:

rule AcceptCommitment {
  guards {
    event.type == "COMMITMENT_REQUEST"
      and event.status == "PENDING"
      and stake_available($event.actor) >= $event.amount
      and reputation_score($event.actor, "commissioning") >= 100
      -> admit
    else -> reject "no_match"
  }
  effects {
    state_transition($event.id, "PENDING", "ACCEPTED")
    stake_freeze($event.actor, $event.amount)
    obligation_assign($event.actor, $event.id, $event.deadline)
  }
}

Note: the concept-doc rule uses event.type, stake.available(), stake.freeze() — dot-prefixed identifiers. The κ DSL grammar (extraction §1) uses $variable.path for property access and bare identifiers for function calls. The fixture above translates to that style. Documenting this translation in the test docstring.

  • F1.1: parse the rule, assert errors === [] and ast.length === 1.
  • F1.2: assert ast[0].name === 'AcceptCommitment'.
  • F1.3: assert ast[0].guards.length === 2 (one main guard + one else).
  • F1.4: assert ast[0].guards[1].condition === null (the else clause).
  • F1.5: assert ast[0].effects.length === 3.

F2 — Operator precedence (~10 cases)

Per task spec: $a + $b * $c == 10 and not $dAnd(Eq(Add($a, Mul($b, $c)), 10), Not(Var(d))).

  • F2.1: precedence fixture from spec — assert top-level is LogicalOp{op:'and'} with two operands.
  • F2.2: assert left operand of and is BinaryOp{op:'==', left: BinaryOp{op:'+', left: VarRef[a], right: BinaryOp{op:'*', left: VarRef[b], right: VarRef[c]}}, right: IntLiteral 10n}.
  • F2.3: assert right operand of and is LogicalOp{op:'not', operands: [VarRef[d]]}.
  • F2.4: 1 + 2 + 3 is left-associative → Add(Add(1,2),3).
  • F2.5: 1 + 2 * 3Add(1, Mul(2,3)).
  • F2.6: (1 + 2) * 3Mul(Add(1,2), 3).
  • F2.7: not $a or not $bLogicalOp{op:'or', operands: [Not($a), Not($b)]}.
  • F2.8: $a < $b == true is a parse error (chained comparison).
  • F2.9: unary -5UnaryOp{op:'-', operand: IntLiteral 5n}.
  • F2.10: not not $aLogicalOp{op:'not', operands: [LogicalOp{op:'not', operands: [VarRef[a]]}]}.

F3 — Malformed input recovery (~7 cases)

  • F3.1: rule X { guards { -> admit } effects { } } — guard clause missing expression. Expect ≥ 1 parse error, no throw.
  • F3.2: rule { guards {} effects {} } — missing rule name. ≥ 1 parse error.
  • F3.3: rule X { guards {} effects {} extra } — trailing token. ≥ 1 parse error.
  • F3.4: ((((((( — unbalanced parens (not a rule, but exercises lexer + parser errors).
  • F3.5: many-error input — synthesize >5 parse errors, assert errors.filter(e => e.kind === 'parse').length === MAX_PARSE_ERRORS (= 5).
  • F3.6: lex error (3.14) inside otherwise-valid rule — assert one lex error in errors and the rule still parses minus the bad token.
  • F3.7: empty input → { ast: [], errors: [] }.

F4 — AST cap (~3 cases)

  • F4.1: a rule with 9999 guard nodes parses cleanly (just under the cap).
  • F4.2: a rule with > 10000 nodes (synthesized via deeply-nested ((...)) expression or many guard clauses) is omitted from ast and surfaces an ast-cap error.
  • F4.3: MAX_AST_NODES_PER_RULE === 10000 constant exposed correctly.

Synthesizer: generate a parens-nested expression like ((( ... 1 + 1 ... ))) to control node count exactly. Each + inside parens contributes 1 BinaryOp + 2 IntLiteral = 3 nodes; nesting wraps without adding nodes. Synthesizer documented in test as synthDeepRule(n).

F5 — Round-trip stability (~3 cases)

Per audit §10 / contract §I17 — P1.5.4 canonicalize is unwritten. Use parse(s) twice and assert deep equality.

  • F5.1: parse the AcceptCommitment fixture twice; assert JSON.stringify of ast is identical (after BigInt serialization helper).
  • F5.2: parse a precedence fixture twice; same.
  • F5.3: explicit // TODO(P1.5.4): replace with parse(canonicalize(parse(s))).ast === parse(s).ast comment in the test — left in code for the next agent.

F6 — Lexer-error pass-through (~3 cases)

  • F6.1: input 3.14 → one lex error (float rejected), no parse errors, empty ast.
  • F6.2: input 1_000 → one lex error (underscore int), no parse errors, empty ast.
  • F6.3: input @ → one lex error (unrecognised char), no parse errors, empty ast.

F7 — Boundary / edge cases (~7 cases)

  • F7.1: IntLiteral.value is biginttypeof === 'bigint'.
  • F7.2: variable with deep dot path: $a.b.c.d.eVarRef{path: ['a','b','c','d','e']}.
  • F7.3: else -> admit (no reason) → GuardClause{condition: null, action: 'admit', reason: null}.
  • F7.4: else -> reject "blocked"GuardClause{condition: null, action: 'reject', reason: 'blocked'}.
  • F7.5: multiple rules in one input → ast.length === 2 and names match.
  • F7.6: rule with empty guards {} and empty effects {} blocks parses cleanly (validator’s job to flag, not parser’s).
  • F7.7: rule named admissionRule (keyword prefix) — RuleNode.name === 'admissionRule'.

F8 — Pure-function / referential transparency (~2 cases)

  • F8.1: parse(s) called twice with equal s → deep-equal results.
  • F8.2: parse('valid') does not mutate any module-level state observable from outside (lightweight check — second call returns equivalent ast even after a separate parse(invalid) call interleaved).

Total: ~40 cases.

§P5. Implementation order (Step 4)

  1. Header JSDoc + imports + constants — ~5 minutes.
  2. Location helper + escape decoder — ~10 minutes.
  3. AST node interfaces + union types + ParseResult — ~15 minutes.
  4. KappaParser skeleton (constructor, performSelfAnalysis) — ~5 minutes.
  5. Precedence spine — expression down through primary — ~30 minutes.
  6. Leaves — intLiteral, boolLiteral, variable, funcCall, stringArg — ~20 minutes.
  7. Structural — rule, guardBlock, guardClause, action, effectBlock, effectCall, ruleset — ~30 minutes.
  8. AST node counter — ~10 minutes.
  9. parse() entry point — ~15 minutes.
  10. parser.test.ts — fixtures F1–F8 — ~90 minutes.
  11. Run npm run build && npm run lint && npm test; iterate until green — ~30 minutes.

Estimated total wall time: 4 hours.

§P6. Step-4 commit message (template)

feat(p1-2-2-parser): chevrotain parser + 11-node AST

Step 4 of 5 — implements P1.2.2 per the contract. Adds:

- src/domains/rules/parser.ts   — Chevrotain EmbeddedActionsParser, 11
  AST node types (RuleNode, GuardClause, EffectCall, BinaryOp, UnaryOp,
  LogicalOp, IntLiteral, BoolLiteral, StringLiteral, VarRef, FuncCall),
  parse() entry point, AST-cap walker.
- src/__tests__/domains/rules/parser.test.ts — ~40 Jest cases across 8
  fixture groups (AcceptCommitment golden, precedence, malformed input
  recovery, AST cap, round-trip stability, lexer error pass-through,
  boundary cases, referential transparency).

Grammar matches kappa-rule-engine-extraction.md §1 EBNF (the
authoritative superset). Operator precedence stratified across
orExpr/andExpr/notExpr/comparison/additive/multiplicative/unary/primary
productions. recoveryEnabled: true; first 5 parse errors reported.
AST cap = 10000 nodes per rule, enforced at parse time via post-parse
recursive walker. parse() never throws; all errors flow through
ParseResult.errors with kind: 'lex' | 'parse' | 'ast-cap'.

Unblocks P1.2.3 (validator), P1.2.4 (registry), P1.3.1 (evaluator),
P1.5.4 (canonical serializer).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

§P7. Test gate (CLAUDE.md §5 — ALL THREE green)

cd .worktrees/claude/p1-2-2-parser
npm run build && npm run lint && npm test

Pre-existing test count on main: ~1409. Expected post-merge: ~1449 (+40 parser cases). Zero existing-test regressions.

Known-flake from memory (startup subprocess smoke) — rerun once if it fails; document if it fails twice.

§P8. Step-5 (verify) plan

docs/verification/p1-2-2-parser-verification.md will record:

  1. The exact build/lint/test command output (test counts, suite names).
  2. The 11 AST types verified at runtime via the type discriminant matrix.
  3. The precedence fixture (F2.1) AST in pretty-print form for visual inspection.
  4. The AST-cap test result with the synthesized rule node count.
  5. The drift items handled (ADR-006-dsl-grammar still missing — re-noted).
  6. PR / branch / commit metadata for the writeback record.

§P9. Risks (from audit §11) — mitigations baked into this packet

  • EmbeddedActions self-analysis warnings: the Chevrotain parser’s performSelfAnalysis() is called explicitly in the constructor; any ambiguity surfaces at module load time, not at parse time. Test F8.1 (referential transparency) doubles as a smoke check.
  • noUncheckedIndexedAccess strictness: every tokens[i] access in this packet uses [i]! after an explicit length check or OPTION guard. The AST-walker uses for…of over arrays, never indexed access.
  • BigInt overflow at parse time: BigInt(text) accepts arbitrary digit sequences and produces a bigint; overflow vs. MAX_INT64 is a downstream (P1.2.3 / P1.3.1) concern. Documented in test F7.1 narrative.
  • Variable token decoding: the lexer’s Variable matches \$[A-Za-z_][A-Za-z0-9_]*(?:\.[A-Za-z_][A-Za-z0-9_]*)*. The parser splits on . and drops the leading $. Test F7.2 covers a 5-segment path.

§P10. Sign-off

Audit + contract reviewed; this packet is internally consistent. Step 4 (implement) may begin once this packet is committed.


Back to top

Colibri — documentation-first MCP runtime. Apache 2.0 + Commons Clause.

This site uses Just the Docs, a documentation theme for Jekyll.