P1.2.2 — κ DSL Parser — Audit

Step 1 of the 5-step executor chain (audit → contract → packet → implement → verify). Builds on the P1.2.1 lexer (src/domains/rules/lexer.ts, R83.C 686ede6b). Greenfield parser plus AST surface.

§1. Surface inventory

§1.1. Target files (greenfield for this task)

| Path | Exists at base? | Purpose |
|---|---|---|
| src/domains/rules/parser.ts | No | Chevrotain EmbeddedActionsParser + AST node types |
| src/__tests__/domains/rules/parser.test.ts | No | Jest parser tests (see §1.3 layout reconciliation) |

§1.2. Touched but not owned

| Path | Delta | Purpose |
|---|---|---|
| src/domains/rules/ | already exists with bps-constants.ts, determinism.ts, integer-math.ts, lexer.ts (4 files) | Adding parser.ts as a peer; no edits to existing files. |
| package.json | none | chevrotain@11.0.3 already pinned in dependencies from P1.2.1 (E:\AMS\package.json); no version change. |
| package-lock.json | none | No new dependencies. |

§1.3. Test-file layout reconciliation

The task prompt places the test at src/domains/rules/__tests__/parser.test.ts. The shipped Phase 0 + Phase 1 convention, however, is that tests live under src/__tests__/domains/<name>/, as confirmed by inspection at base SHA 6345ba7a:

  • src/__tests__/domains/rules/{bps-constants,determinism,integer-math,lexer}.test.ts (P1.1.1, P1.1.2, P1.1.3, P1.2.1)
  • src/__tests__/domains/{router,skills,tasks,proof,trail}/... (Phase 0 axes)

Jest testMatch in jest.config.ts picks both layouts up. To stay consistent with the in-repo κ tests already shipped (the lexer test was placed under src/__tests__/domains/rules/lexer.test.ts for the same reason — see docs/audits/r81-b-p1-2-1-lexer-audit.md §1.3), the parser test will live at:

src/__tests__/domains/rules/parser.test.ts

This is a convention reconciliation, not a spec deviation. The verification doc will re-cite.

§2. Authoritative grammar sources

The task prompt lists six pre-flight reads. One of them (docs/architecture/decisions/ADR-006-dsl-grammar.md) does not exist — see §3 drift finding. For authoritative grammar this parser relies on:

| Source | Path | Weight |
|---|---|---|
| Heritage extraction, full EBNF | docs/reference/extractions/kappa-rule-engine-extraction.md §1 | Authoritative superset (per prompt) |
| Heritage extraction, AST shape | docs/reference/extractions/kappa-rule-engine-extraction.md §2 | Authoritative AST node list (11 types) |
| Concept doc, EBNF fragment | docs/3-world/physics/laws/rule-engine.md §DSL grammar | Narrower phrasing: the concept doc uses guard: / effects: prefix syntax; the extraction uses guards { } / effects { } block syntax. Extraction wins (per prompt). |
| Concept doc, worked rule | docs/3-world/physics/laws/rule-engine.md §Worked rule (AcceptCommitment) | Realistic fixture for parser tests. The body uses guard: style; the test will translate to guards {} block style to match the extraction grammar. |
| DSL spec | docs/spec/s12-dsl.md | Load-bearing, high-level. |
| Rule engine spec | docs/spec/s11-rule-engine.md | Load-bearing, semantic level. |
| Lexer source | src/domains/rules/lexer.ts | The token surface this parser binds to. |

§3. Drift finding — ADR-006-dsl-grammar still missing

The task prompt asks the agent to read docs/architecture/decisions/ADR-006-dsl-grammar.md for Chevrotain ratification (it is also referenced from the concept doc at docs/3-world/physics/laws/rule-engine.md line 206). This ADR is not in the repo at base 6345ba7a.

  • Actual ADR-006 in repo: docs/architecture/decisions/ADR-006-executable-meaning.md — different subject.
  • Other ADRs present at base: ADR-001..009 (no dsl-grammar slot).
  • The R81.B audit (docs/audits/r81-b-p1-2-1-lexer-audit.md §3) raised this drift; the lexer was implemented using extraction §1 + s11/s12 as the authoritative grammar triad. Same approach taken here.

Scope of this task: note the drift again, do not write the ADR. The follow-up to ratify Chevrotain/grammar in an ADR remains a docs round candidate.

§4. Lexer / parser interface — what the parser binds to

Inspecting src/domains/rules/lexer.ts at base:

  • Module exports:
    • tokenize(input: string): ILexingResult — never throws.
    • allTokens: TokenType[] — the priority-ordered registry.
    • Bundles: Keywords, Operators, Delimiters, Literals, RejectedLiterals.
    • Re-exports: IToken, TokenType, ILexingResult, ILexingError.
  • Tokens the parser will reference (29 of 39 — non-error, non-whitespace):
    • Keywords (11 used by the parser; the other 7 of the 18 are reserved for future κ use and do not appear in the extraction §1 grammar):
      • Used: Rule, Guards, Effects, Else, And, Or, Not, True, False, Admit, Reject.
      • Reserved/unused at this task: When, Then, If, Admission, Transition, Consequence, Promotion. (See §6 — these will be used by the rule classifier in P1.2.4 / P1.3.1.)
    • Operators (12): Eq, NotEq, Lte, Gte, Lt, Gt, Plus, Minus, Mul, Div, Mod, Arrow.
    • Delimiters (5 of 7): LBrace, RBrace, LParen, RParen, Comma. (Colon, Dot are not used at the rule-level grammar but Dot is internal to Variable regex.)
    • Literals (4): Identifier, Variable, IntegerLiteral, StringLiteral.
  • Lexer caveats relevant to parser correctness:
    1. Identifier uses the R83.C custom-pattern-function escape hatch, because Chevrotain 11.0.3’s regexp-to-ast does NOT support the Unicode u flag. The parser must not bypass this; it consumes the IToken[] the lexer has already produced, so no regex re-engagement is needed.
    2. The lexer rejects float literals and underscore-separated integers via positioned errors (FLOAT_REJECTED_MESSAGE, UNDERSCORE_INT_REJECTED_MESSAGE). The parser sees only well-typed tokens; it does NOT need to re-detect these.
    3. The lexer handles whitespace (Lexer.SKIPPED); the parser sees no whitespace tokens.
    4. The Variable token’s image is the full $dot.path string; the parser strips the leading $ and splits the remainder on . to populate VarRef.path: string[].
    5. IntegerLiteral is unsigned; sign is parser-level via Unary rule.
    6. Each IToken carries startLine, startColumn, endLine, endColumn, startOffset, endOffset (lexer constructs with positionTracking: 'full'). The parser uses these to set location on AST nodes.
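
Caveats 4 and 6 can be sketched together as follows. This is a minimal illustration, not the shipped parser: TokenLike is a hypothetical local stand-in for the position fields of Chevrotain's IToken, and spanOf / varRefFromToken are illustrative helper names that do not appear in the task prompt.

```typescript
// Hypothetical stand-in for the IToken fields used here; the real parser
// would import IToken from the lexer module instead.
interface TokenLike {
  image: string;
  startLine: number;
  startColumn: number;
  endLine: number;
  endColumn: number;
}

interface Location {
  startLine: number;
  startColumn: number;
  endLine: number;
  endColumn: number;
}

interface VarRef {
  type: 'VarRef';
  location: Location;
  path: string[];
}

// Location spanning from the first token of a node to its last token.
function spanOf(first: TokenLike, last: TokenLike): Location {
  return {
    startLine: first.startLine,
    startColumn: first.startColumn,
    endLine: last.endLine,
    endColumn: last.endColumn,
  };
}

// Drop the leading `$`, then split the dotted path into segments.
function varRefFromToken(tok: TokenLike): VarRef {
  return {
    type: 'VarRef',
    location: spanOf(tok, tok),
    path: tok.image.slice(1).split('.'),
  };
}

const exampleVarRef = varRefFromToken({
  image: '$actor.reputation',
  startLine: 1,
  startColumn: 1,
  endLine: 1,
  endColumn: 17,
});
// exampleVarRef.path is ['actor', 'reputation']
```

A multi-token node (for example a BinaryOp) would pass its first and last tokens to spanOf, so the node's location covers the whole expression.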

§5. AST node taxonomy (per extraction §2)

11 node types. Every node carries {type: string discriminant, location: {startLine, startColumn, endLine, endColumn}} plus type-specific fields. Plain data — no classes with behavior (forbidden per task §FORBIDDENS).

| # | Node type | Fields (beyond type + location) | Notes |
|---|---|---|---|
| 1 | RuleNode | name: string, guards: GuardClause[], effects: EffectCall[] | Top-level rule declaration. |
| 2 | GuardClause | condition: Expression \| null (null = else), action: 'admit' \| 'reject', reason: string \| null (only set when action === 'reject') | First-match-wins evaluation. |
| 3 | EffectCall | function: string, args: Expression[] | Side-effect invocation; semantics live downstream (P1.3.x). |
| 4 | BinaryOp | op: '+' \| '-' \| '*' \| '/' \| '%' \| '==' \| '!=' \| '<' \| '>' \| '<=' \| '>=', left: Expression, right: Expression | Arithmetic + comparison. |
| 5 | UnaryOp | op: '-', operand: Expression | Numeric negation only; not is a LogicalOp. |
| 6 | LogicalOp | op: 'and' \| 'or' \| 'not', operands: Expression[] (length 2 for and/or, 1 for not) | Boolean logic. |
| 7 | IntLiteral | value: bigint | Integer constant. bigint to match P1.1.1 integer-math.ts and P1.1.3 bps-constants.ts invariants; the engine is bigint throughout for κ determinism. (Extraction §2 says int64; bigint is the JS-side carrier with the engine enforcing the int64 envelope at evaluation time.) |
| 8 | BoolLiteral | value: boolean | true or false. |
| 9 | StringLiteral | value: string | Decoded string (escapes resolved); image retained on the parent token only. |
| 10 | VarRef | path: string[] | E.g. $actor.reputation → path: ['actor', 'reputation']. |
| 11 | FuncCall | name: string, args: Expression[] | Built-in function invocation; semantics live in P1.3.1 evaluator. |

Expression is a union of: BinaryOp | UnaryOp | LogicalOp | IntLiteral | BoolLiteral | StringLiteral | VarRef | FuncCall. (StringLiteral is in the union because the extraction’s Arg = Expression | STRING permits string args; it is not valid as a top-level expression in arithmetic / boolean position. The AST permits the type but the validator (P1.2.3) and evaluator (P1.3.1) reject misplaced strings.)
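
To make the discriminated-union shape concrete, here is a small sketch of the AST a condition such as `$actor.reputation >= 100 and not $actor.banned` might produce. It is illustrative only: Expr is a deliberately reduced subset of the Expression union, the `banned` field and the render helper are hypothetical, and the locations are placeholders.

```typescript
interface Location { startLine: number; startColumn: number; endLine: number; endColumn: number; }
interface IntLiteral { type: 'IntLiteral'; location: Location; value: bigint; }
interface VarRef { type: 'VarRef'; location: Location; path: string[]; }
interface BinaryOp { type: 'BinaryOp'; location: Location; op: '>=' | '+'; left: Expr; right: Expr; }
interface LogicalOp { type: 'LogicalOp'; location: Location; op: 'and' | 'or' | 'not'; operands: Expr[]; }

// Reduced subset of the Expression union, enough for this example.
type Expr = IntLiteral | VarRef | BinaryOp | LogicalOp;

// Placeholder location; real nodes carry token positions.
const at: Location = { startLine: 1, startColumn: 1, endLine: 1, endColumn: 1 };

// AST for: $actor.reputation >= 100 and not $actor.banned
const guardCondition: Expr = {
  type: 'LogicalOp',
  location: at,
  op: 'and',
  operands: [
    {
      type: 'BinaryOp',
      location: at,
      op: '>=',
      left: { type: 'VarRef', location: at, path: ['actor', 'reputation'] },
      right: { type: 'IntLiteral', location: at, value: BigInt(100) },
    },
    {
      type: 'LogicalOp',
      location: at,
      op: 'not',
      operands: [{ type: 'VarRef', location: at, path: ['actor', 'banned'] }],
    },
  ],
};

// Exhaustive switch on the `type` discriminant; the compiler narrows each case.
function render(e: Expr): string {
  switch (e.type) {
    case 'IntLiteral':
      return e.value.toString();
    case 'VarRef':
      return '$' + e.path.join('.');
    case 'BinaryOp':
      return `(${render(e.left)} ${e.op} ${render(e.right)})`;
    case 'LogicalOp':
      return e.op === 'not'
        ? `(not ${render(e.operands[0]!)})`
        : `(${e.operands.map((o) => render(o)).join(` ${e.op} `)})`;
  }
}

const rendered = render(guardCondition);
```

Because every node is plain data with a string discriminant, downstream consumers (validator, evaluator) can branch with a switch rather than relying on class hierarchies.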

§6. Rule classification — Admission / StateTransition / Consequence / Promotion

The task prompt asks the parser to “parse 4 rule types: Admission, StateTransition, Consequence, Promotion”. The extraction §1 grammar’s Rule = "rule" IDENTIFIER "{" GuardBlock EffectBlock "}" does not carry type information at the syntax level; classification is a downstream concern. From rule-engine.md §Rule Execution Order, the four kinds are categories used by the registry / executor, not grammatical productions.

Decision for this task: the parser produces RuleNode instances; it does not classify them at parse time. Classification by name convention (e.g. prefix Admit*, State*, etc.) or by an explicit attribute (e.g. a kind keyword) is a P1.2.4 (registry) / P1.3.1 (engine) concern. The PR will document this explicitly so reviewers do not flag a missing classifier.

This is consistent with the lexer reserving Admission, Transition, Consequence, Promotion as keywords — they exist in the token stream for future use but the extraction §1 grammar does not consume them yet. They tokenize today; they bind to grammar productions later.

§7. AST cap (10,000 nodes per rule)

The task prompt requires rejection of any single rule with > 10,000 AST nodes at parse time. Two implementation choices:

Choice A — count during parsing, threading state through Chevrotain’s parser DSL. The task prompt §Common Gotchas explicitly cautions against this (“threading state through Chevrotain’s parser DSL mid-parse is brittle”).

Choice B — count after parsing with a recursive walker. The task prompt §Common Gotchas explicitly recommends this (“Walk the final tree with a simple recursive counter”).

Decision: Choice B — post-parse recursive walker countNodes(node: AnyNode): number, called by the public entry point after parseRuleset returns. Rules exceeding the cap produce a synthetic parse-error entry rather than throwing. The cap is exposed as an exported constant MAX_AST_NODES_PER_RULE = 10000 for tests + future ADR.
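
A minimal sketch of the Choice B walker, assuming that every AST node is a plain object with a string `type` field and that child nodes appear either directly as field values or inside arrays. AnyNode and isNode are illustrative definitions, not the contracted types.

```typescript
const MAX_AST_NODES_PER_RULE = 10000;

// Any plain-data AST node: a `type` discriminant plus arbitrary fields.
interface AnyNode {
  type: string;
  [field: string]: unknown;
}

function isNode(v: unknown): v is AnyNode {
  return (
    typeof v === 'object' &&
    v !== null &&
    typeof (v as { type?: unknown }).type === 'string'
  );
}

// Counts `node` itself plus every node reachable through its fields.
// Objects without a `type` (such as `location`) are not counted.
function countNodes(node: AnyNode): number {
  let total = 1;
  for (const value of Object.values(node)) {
    if (Array.isArray(value)) {
      for (const item of value) {
        if (isNode(item)) total += countNodes(item);
      }
    } else if (isNode(value)) {
      total += countNodes(value);
    }
  }
  return total;
}

// RuleNode + GuardClause + BoolLiteral
const sampleRule: AnyNode = {
  type: 'RuleNode',
  name: 'Example',
  guards: [
    {
      type: 'GuardClause',
      action: 'admit',
      reason: null,
      condition: { type: 'BoolLiteral', value: true },
    },
  ],
  effects: [],
};

const sampleCount = countNodes(sampleRule);
const sampleOverCap = sampleCount > MAX_AST_NODES_PER_RULE;
```

A very deep expression chain could in principle stress the call stack before the cap check runs; if that turns out to matter, the same count can be computed with an explicit work stack.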

§8. Error recovery (5-error cap)

Chevrotain’s recoveryEnabled: true is the documented switch for non-fatal parse errors. The task prompt requires that the first 5 errors be reported and that malformed input not crash the parser. The Chevrotain parser instance already accumulates every error it encounters in its errors array; the parse() entry point truncates that array to the first 5.

Decision: the parse() entry point returns { ast: RuleNode[], errors: ParseError[] } where errors is the union of:

  • Lexer errors (passed through from tokenize).
  • Chevrotain parse errors (truncated to first 5).
  • AST-cap errors (one per offending rule).

If errors is non-empty, ast may be partial (rules that parsed cleanly still appear; rules that failed contribute nothing). This matches the spec’s “doesn’t crash on malformed input” requirement.
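
The merge described above could look roughly like this. The helper name mergeErrors and the ordering (lexer errors first, then parse errors, then AST-cap errors) are assumptions made for illustration; the contract doc is where the actual shape gets locked.

```typescript
interface Location { startLine: number; startColumn: number; endLine: number; endColumn: number; }

interface ParseError {
  kind: 'lex' | 'parse' | 'ast-cap';
  message: string;
  location: Location | null;
}

const MAX_PARSE_ERRORS = 5;

// Lexer errors pass through, parse errors are truncated to the first
// MAX_PARSE_ERRORS, and AST-cap errors (one per offending rule) are appended.
function mergeErrors(
  lexErrors: ParseError[],
  parseErrors: ParseError[],
  astCapErrors: ParseError[],
): ParseError[] {
  return [
    ...lexErrors,
    ...parseErrors.slice(0, MAX_PARSE_ERRORS),
    ...astCapErrors,
  ];
}

// Seven synthetic parse errors plus one synthetic lexer error.
const sevenParseErrors: ParseError[] = Array.from({ length: 7 }, (_, i) => ({
  kind: 'parse' as const,
  message: `synthetic parse error ${i + 1}`,
  location: null,
}));

const mergedErrors = mergeErrors(
  [{ kind: 'lex', message: 'synthetic lexer error', location: null }],
  sevenParseErrors,
  [],
);
// mergedErrors holds 1 lexer error followed by the first 5 parse errors.
```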

§9. Public API surface (committed by §contract)

The parser module exports — provisional, locked in docs/contracts/p1-2-2-parser-contract.md:

// AST union types — discriminated by `type`
export type Expression =
  | BinaryOp | UnaryOp | LogicalOp
  | IntLiteral | BoolLiteral | StringLiteral
  | VarRef | FuncCall;

export interface Location { startLine: number; startColumn: number; endLine: number; endColumn: number; }
export interface RuleNode { type: 'RuleNode'; location: Location; name: string; guards: GuardClause[]; effects: EffectCall[]; }
// ... 10 more interfaces (one per AST node)

export interface ParseError {
  kind: 'lex' | 'parse' | 'ast-cap';
  message: string;
  location: Location | null;     // null only for non-positioned errors
}

export interface ParseResult {
  ast: RuleNode[];
  errors: ParseError[];
}

export const MAX_AST_NODES_PER_RULE: number;
export const MAX_PARSE_ERRORS: number;            // 5

export function parse(input: string): ParseResult;

No classes. Interfaces only. Plain data. Pure function.

§10. Non-goals

This task explicitly excludes:

  • AST validator — semantic checks (forbidden ops in expressions, type coherence, function arity). That is P1.2.3.
  • Rule registry / loader — keying rules by name, looking up by registry id. That is P1.2.4.
  • Evaluator / interpreter — executing the AST against a context. That is P1.3.1.
  • Canonical serialization — pretty-printing AST back to DSL text. That is P1.5.4. The round-trip test (Fixture F5) leaves a TODO(P1.5.4) comment where the canonical-serialize call would go; the test asserts parse(s) is structurally stable when re-parsed (i.e. parse twice and assert equal — a weaker but locally testable property).
  • Rule classification by kind (Admission / StateTransition / Consequence / Promotion) — see §6.
  • A new ADR — see §3.
  • Mutating any existing file outside src/domains/rules/parser.ts, src/__tests__/domains/rules/parser.test.ts, and the four docs (audit, contract, packet, verification).
  • Performance SLOs — none gated; informational only.
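
The weaker round-trip property mentioned for Fixture F5 (parse twice, assert structural equality) needs an equality check that tolerates bigint leaves, since JSON.stringify throws on BigInt values. The sketch below is one way to express it; structurallyEqual, isStableUnderReparse and stubParse are hypothetical names, and stubParse merely stands in for the real parse() entry point.

```typescript
// Deep structural equality over plain data (objects, arrays, primitives).
function structurallyEqual(a: unknown, b: unknown): boolean {
  if (a === b) return true; // primitives, including bigints of equal value
  if (typeof a !== 'object' || typeof b !== 'object' || a === null || b === null) {
    return false;
  }
  if (Array.isArray(a) !== Array.isArray(b)) return false;
  const ra = a as Record<string, unknown>;
  const rb = b as Record<string, unknown>;
  const keysA = Object.keys(ra);
  const keysB = Object.keys(rb);
  if (keysA.length !== keysB.length) return false;
  return keysA.every(
    (k) =>
      Object.prototype.hasOwnProperty.call(rb, k) &&
      structurallyEqual(ra[k], rb[k]),
  );
}

// The locally testable property: parsing the same source twice yields
// structurally equal results.
function isStableUnderReparse<T>(parseFn: (source: string) => T, source: string): boolean {
  return structurallyEqual(parseFn(source), parseFn(source));
}

// Stub parser used only to exercise the helper.
const stubParse = (source: string) => ({
  ast: [{ type: 'IntLiteral', value: BigInt(source) }],
  errors: [] as string[],
});

const stable = isStableUnderReparse(stubParse, '42');
const differs = structurallyEqual({ value: BigInt(1) }, { value: BigInt(2) });
```

Inside the Jest suite, the built-in matchers would likely serve the same purpose; the helper only shows what the property amounts to.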

§11. Risk register

| Risk | Mitigation |
|---|---|
| Chevrotain LL(k) left-recursion gotcha | EBNF in extraction §1 already iterative ({ ... } repetition for binary chains); maps to MANY rules in Chevrotain. No left recursion in design. |
| EmbeddedActionsParser vs CstParser choice | Pick EmbeddedActionsParser per task spec. Chevrotain warns about EmbeddedActions in self-analysis; mitigate via recoveryEnabled: true plus careful RULE definitions returning AST nodes directly. |
| Operator precedence collapse | Stratified grammar productions (OrExpr → AndExpr → NotExpr → Comparison → Additive → Multiplicative → Unary → Primary) per extraction §1; no precedence-table hack. |
| AST cap counting brittleness | Post-parse walker (Choice B in §7). |
| BigInt overflow in IntLiteral parsing | Use BigInt(text) directly; if overflow at parser time is a concern (it is, for MAX_INT64-exceeding literals), defer to P1.2.3 validator. The parser stores the bigint as-is. |
| Round-trip property without P1.5.4 canonicalize | Fixture F5 uses parse(s) twice and asserts structural equality (a weaker invariant). Comment marks the upgrade target as P1.5.4. |
| Cross-worktree leak (memory mentions persistent issue at Wave C) | Strict scope discipline: only the files this task owns (see §10) are edited. git status checked at every commit. |
| Lexer keywords Admission/Transition/Consequence/Promotion unused | Documented as reserved per §6. Tests exercise some of them via prefix-of-identifier (admissionRule is a valid Identifier, via Chevrotain’s longer_alt). |
| noUncheckedIndexedAccess in tsconfig | Care needed when indexing tokens[i]: every parser-internal access is either asserted (tokens[i]!) or guarded; the AST walker must check children for undefined before recursing. |
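
For the BigInt row above, a brief sketch of why the parser itself cannot overflow and what a later envelope check might compare against. MAX_INT64, intLiteralValue and exceedsInt64 are illustrative names; the actual check belongs to the P1.2.3 validator and may be shaped differently.

```typescript
// 2^63 - 1, the largest signed 64-bit integer.
const MAX_INT64 = BigInt('9223372036854775807');

// Parser side: IntegerLiteral images are unsigned digit strings, so
// BigInt(image) is lossless regardless of magnitude.
function intLiteralValue(image: string): bigint {
  return BigInt(image);
}

// Validator side (assumed): flag literals above the int64 envelope.
function exceedsInt64(value: bigint): boolean {
  return value > MAX_INT64;
}

const inRange = intLiteralValue('9223372036854775807');
const tooLarge = intLiteralValue('9223372036854775808');

const inRangeFlag = exceedsInt64(inRange);
const tooLargeFlag = exceedsInt64(tooLarge);
```

One edge worth keeping in mind: because sign is applied by the Unary rule, the literal 9223372036854775808 exceeds the envelope on its own even though its negation is a valid int64 value, so a validator would need to decide how to treat the operand of a unary minus.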

§12. Estimated implementation

| Step | Lines (rough) |
|---|---|
| parser.ts JSDoc + types | ~150 |
| parser.ts Chevrotain parser class | ~200 |
| parser.ts AST cap walker + helpers | ~50 |
| parser.ts parse() entry point | ~50 |
| parser.ts total | ~450 |
| parser.test.ts AST assertion helpers | ~80 |
| parser.test.ts 5 fixture groups (F1–F5) + boundary cases | ~400 |
| parser.test.ts total | ~480 |

Test count target: 35–50 cases (slightly larger than lexer’s 22 because the AST surface is wider).

§13. Pre-flight verification

  • ✅ Worktree created at .worktrees/claude/p1-2-2-parser off origin/main 6345ba7aec8d2507337fa5161928c13d4a3b4d3e.
  • ✅ Branch feature/p1-2-2-parser set up to track origin/main.
  • ✅ chevrotain@11.0.3 already in dependencies (P1.2.1 lockfile inherited).
  • ✅ Lexer module readable; surface mapped (§4).
  • ✅ EBNF read and codified (§5–6).
  • ✅ ADR-006-dsl-grammar drift re-noted (§3).

Next step: contract (Step 2 of 5).



Colibri — documentation-first MCP runtime. Apache 2.0 + Commons Clause.
