P1.2.2 — κ DSL Parser — Audit
Step 1 of the 5-step executor chain (audit → contract → packet → implement → verify). Builds on the P1.2.1 lexer (`src/domains/rules/lexer.ts`, R83.C `686ede6b`). Greenfield parser plus AST surface.
§1. Surface inventory
§1.1. Target files (greenfield for this task)
| Path | Exists at base? | Purpose |
|---|---|---|
| `src/domains/rules/parser.ts` | No | Chevrotain `EmbeddedActionsParser` + AST node types |
| `src/__tests__/domains/rules/parser.test.ts` | No | Jest parser tests (see §1.3 layout reconciliation) |
§1.2. Touched but not owned
| Path | Delta | Purpose |
|---|---|---|
| `src/domains/rules/` | Already exists with `bps-constants.ts`, `determinism.ts`, `integer-math.ts`, `lexer.ts` (4 files) | Adding `parser.ts` as a peer; no edits to existing files. |
| `package.json` | None | `chevrotain@11.0.3` already pinned in `dependencies` from P1.2.1 (`E:\AMS\package.json`); no version change. |
| `package-lock.json` | None | No new dependencies. |
§1.3. Test-file layout reconciliation
The task prompt places the test at `src/domains/rules/__tests__/parser.test.ts`. The shipped Phase 0 + Phase 1 convention is that tests live under `src/__tests__/domains/<name>/`, confirmed by inspection at base SHA `6345ba7a`:

- `src/__tests__/domains/rules/{bps-constants,determinism,integer-math,lexer}.test.ts` (P1.1.1, P1.1.2, P1.1.3, P1.2.1)
- `src/__tests__/domains/{router,skills,tasks,proof,trail}/...` (Phase 0 axes)
Jest testMatch in jest.config.ts picks both layouts up. To stay consistent with the in-repo κ tests already shipped (the lexer test was placed under src/__tests__/domains/rules/lexer.test.ts for the same reason — see docs/audits/r81-b-p1-2-1-lexer-audit.md §1.3), the parser test will live at:
src/__tests__/domains/rules/parser.test.ts
This is a convention reconciliation, not a spec deviation. The verification doc will re-cite.
§2. Authoritative grammar sources
The task prompt lists six pre-flight reads. One of them (`docs/architecture/decisions/ADR-006-dsl-grammar.md`) does not exist; see the §3 drift finding. For its authoritative grammar, this parser relies on:
| Source | Path | Weight |
|---|---|---|
| Heritage extraction, full EBNF | `docs/reference/extractions/kappa-rule-engine-extraction.md` §1 | Authoritative superset (per prompt). |
| Heritage extraction, AST shape | `docs/reference/extractions/kappa-rule-engine-extraction.md` §2 | Authoritative AST node list (11 types). |
| Concept doc, EBNF fragment | `docs/3-world/physics/laws/rule-engine.md` §DSL grammar | Narrower phrasing: the concept doc uses `guard:` / `effects:` prefix syntax; the extraction uses `guards { }` / `effects { }` block syntax. Extraction wins (per prompt). |
| Concept doc, worked rule | `docs/3-world/physics/laws/rule-engine.md` §Worked rule (AcceptCommitment) | Realistic fixture for parser tests. The body uses `guard:` style; the test translates it to `guards { }` block style to match the extraction grammar. |
| DSL spec | `docs/spec/s12-dsl.md` | Load-bearing, high-level. |
| Rule engine spec | `docs/spec/s11-rule-engine.md` | Load-bearing, semantic level. |
| Lexer source | `src/domains/rules/lexer.ts` | The token surface this parser binds to. |
§3. Drift finding — ADR-006-dsl-grammar still missing
The task prompt asks the agent to read docs/architecture/decisions/ADR-006-dsl-grammar.md for Chevrotain ratification (it is also referenced from the concept doc at docs/3-world/physics/laws/rule-engine.md line 206). This ADR is not in the repo at base 6345ba7a.
- Actual ADR-006 in repo: `docs/architecture/decisions/ADR-006-executable-meaning.md`, a different subject.
- Other ADRs present at base: ADR-001..009 (no `dsl-grammar` slot).
- The R81.B audit (`docs/audits/r81-b-p1-2-1-lexer-audit.md` §3) raised this drift; the lexer was implemented using extraction §1 + s11/s12 as the authoritative grammar triad. Same approach taken here.
Scope of this task: note the drift again, do not write the ADR. The follow-up to ratify Chevrotain/grammar in an ADR remains a docs round candidate.
§4. Lexer / parser interface — what the parser binds to
Inspecting src/domains/rules/lexer.ts at base:
- Module exports:
  - `tokenize(input: string): ILexingResult`, which never throws.
  - `allTokens: TokenType[]`, the priority-ordered registry.
  - Bundles: `Keywords`, `Operators`, `Delimiters`, `Literals`, `RejectedLiterals`.
  - Re-exports: `IToken`, `TokenType`, `ILexingResult`, `ILexingError`.
- Tokens the parser will reference (32 in total, all non-error and non-whitespace):
  - Keywords (11 of the 18 are used by the parser; the other 7 are reserved for future κ and are not in the extraction §1 grammar):
    - Used: `Rule`, `Guards`, `Effects`, `Else`, `And`, `Or`, `Not`, `True`, `False`, `Admit`, `Reject`.
    - Reserved/unused in this task: `When`, `Then`, `If`, `Admission`, `Transition`, `Consequence`, `Promotion`. (See §6: the last four are expected to bind to rule classification in P1.2.4 / P1.3.1.)
  - Operators (12): `Eq`, `NotEq`, `Lte`, `Gte`, `Lt`, `Gt`, `Plus`, `Minus`, `Mul`, `Div`, `Mod`, `Arrow`.
  - Delimiters (5 of 7): `LBrace`, `RBrace`, `LParen`, `RParen`, `Comma`. (`Colon` and `Dot` are not used by the rule-level grammar; `Dot` is internal to the `Variable` regex.)
  - Literals (4): `Identifier`, `Variable`, `IntegerLiteral`, `StringLiteral`.
- Lexer caveats relevant to parser correctness:
  - The R83.C identifier custom-pattern-function escape hatch (Chevrotain 11.0.3's `regexp-to-ast` does NOT support the Unicode `u` flag). The parser must not bypass this: it consumes the `IToken[]` already produced, so no regex re-engagement is needed.
  - The lexer rejects float literals and underscore-separated integers via positioned errors (`FLOAT_REJECTED_MESSAGE`, `UNDERSCORE_INT_REJECTED_MESSAGE`). The parser sees only well-typed tokens; it does NOT need to re-detect these.
  - The lexer handles whitespace (`Lexer.SKIPPED`); the parser sees no whitespace tokens.
  - The lexer's `Variable` token's `image` is the full `$dot.path` string; the parser drops the leading `$` and splits on `.` to populate `VarRef.path: string[]`.
  - `IntegerLiteral` is unsigned; the sign is handled at the parser level via the `Unary` rule.
  - Each `IToken` carries `startLine`, `startColumn`, `endLine`, `endColumn`, `startOffset`, `endOffset` (the lexer is constructed with `positionTracking: 'full'`). The parser uses these to set `location` on AST nodes.
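The `Variable` and position caveats combine into one small conversion step. A minimal sketch, assuming a local `TokenLike` interface that mirrors only the `IToken` fields read here (so the snippet runs without Chevrotain installed); `locationOf` and `varRefFromToken` are hypothetical helper names, and dropping the `$` follows the §5 example (`$actor.reputation` → `['actor', 'reputation']`):

```typescript
// Structural subset of Chevrotain's IToken: only the fields these helpers read.
interface TokenLike {
  image: string;
  startLine?: number;
  startColumn?: number;
  endLine?: number;
  endColumn?: number;
}
interface Location { startLine: number; startColumn: number; endLine: number; endColumn: number; }
interface VarRef { type: 'VarRef'; location: Location; path: string[]; }

const isPos = (n: number | undefined): n is number => n !== undefined && Number.isFinite(n);

// With positionTracking: 'full' all four fields are populated, so a missing one
// signals a lexer-configuration bug rather than malformed DSL input.
function locationOf(tok: TokenLike): Location {
  const { startLine, startColumn, endLine, endColumn } = tok;
  if (!isPos(startLine) || !isPos(startColumn) || !isPos(endLine) || !isPos(endColumn)) {
    throw new Error(`token '${tok.image}' lacks full position info`);
  }
  return { startLine, startColumn, endLine, endColumn };
}

// The Variable image is the full '$dot.path' string, e.g. '$actor.reputation'.
function varRefFromToken(tok: TokenLike): VarRef {
  const body = tok.image.startsWith('$') ? tok.image.slice(1) : tok.image;
  return { type: 'VarRef', location: locationOf(tok), path: body.split('.') };
}
```

Throwing on missing positions treats it as an internal invariant; malformed user input never reaches this path because the lexer has already produced a well-typed token.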
§5. AST node taxonomy (per extraction §2)
11 node types. Every node carries a `type: string` discriminant and a `location: {startLine, startColumn, endLine, endColumn}`, plus type-specific fields. Plain data: no classes with behavior (forbidden per task §FORBIDDENS).

| # | Node type | Fields (beyond `type` + `location`) | Notes |
|---|---|---|---|
| 1 | `RuleNode` | `name: string`, `guards: GuardClause[]`, `effects: EffectCall[]` | Top-level rule declaration. |
| 2 | `GuardClause` | `condition: Expression \| null` (`null` = else), `action: 'admit' \| 'reject'`, `reason: string \| null` (only set when `action === 'reject'`) | First-match-wins evaluation. |
| 3 | `EffectCall` | `function: string`, `args: Expression[]` | Side-effect invocation; semantics live downstream (P1.3.x). |
| 4 | `BinaryOp` | `op: '+' \| '-' \| '*' \| '/' \| '%' \| '==' \| '!=' \| '<' \| '>' \| '<=' \| '>='`, `left: Expression`, `right: Expression` | Arithmetic + comparison. |
| 5 | `UnaryOp` | `op: '-'`, `operand: Expression` | Numeric negation only; `not` is a `LogicalOp`. |
| 6 | `LogicalOp` | `op: 'and' \| 'or' \| 'not'`, `operands: Expression[]` (length 2 for `and`/`or`, 1 for `not`) | Boolean logic. |
| 7 | `IntLiteral` | `value: bigint` | Integer constant. `bigint` matches the P1.1.1 `integer-math.ts` and P1.1.3 `bps-constants.ts` invariants: the engine is `bigint` throughout for κ determinism. (Extraction §2 says int64; `bigint` is the JS-side carrier, with the engine enforcing the int64 envelope at evaluation time.) |
| 8 | `BoolLiteral` | `value: boolean` | `true` or `false`. |
| 9 | `StringLiteral` | `value: string` | Decoded string (escapes resolved); the raw `image` stays on the source token only. |
| 10 | `VarRef` | `path: string[]` | E.g. `$actor.reputation` → `path: ['actor', 'reputation']`. |
| 11 | `FuncCall` | `name: string`, `args: Expression[]` | Built-in function invocation; semantics live in the P1.3.1 evaluator. |
Expression is a union of: BinaryOp | UnaryOp | LogicalOp | IntLiteral | BoolLiteral | StringLiteral | VarRef | FuncCall. (StringLiteral is in the union because the extraction’s Arg = Expression | STRING permits string args; it is not valid as a top-level expression in arithmetic / boolean position. The AST permits the type but the validator (P1.2.3) and evaluator (P1.3.1) reject misplaced strings.)
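To make the table concrete, here is a sketch of the node shapes transcribed from it, plus the tree a guard condition would be expected to produce. The DSL text `$actor.reputation >= 500 and not $actor.suspended` is a hypothetical example (not taken from the worked rule), locations are collapsed to a placeholder, and the contract doc remains authoritative for the real types:

```typescript
// Expression node shapes transcribed from the §5 table (sketch only).
interface Location { startLine: number; startColumn: number; endLine: number; endColumn: number; }
type Expression = BinaryOp | UnaryOp | LogicalOp | IntLiteral | BoolLiteral | StringLiteral | VarRef | FuncCall;
interface BinaryOp {
  type: 'BinaryOp'; location: Location;
  op: '+' | '-' | '*' | '/' | '%' | '==' | '!=' | '<' | '>' | '<=' | '>=';
  left: Expression; right: Expression;
}
interface UnaryOp { type: 'UnaryOp'; location: Location; op: '-'; operand: Expression; }
interface LogicalOp { type: 'LogicalOp'; location: Location; op: 'and' | 'or' | 'not'; operands: Expression[]; }
interface IntLiteral { type: 'IntLiteral'; location: Location; value: bigint; }
interface BoolLiteral { type: 'BoolLiteral'; location: Location; value: boolean; }
interface StringLiteral { type: 'StringLiteral'; location: Location; value: string; }
interface VarRef { type: 'VarRef'; location: Location; path: string[]; }
interface FuncCall { type: 'FuncCall'; location: Location; name: string; args: Expression[]; }

// Placeholder: real nodes carry the token positions from the lexer.
const loc: Location = { startLine: 1, startColumn: 1, endLine: 1, endColumn: 1 };

// Hypothetical condition: $actor.reputation >= 500 and not $actor.suspended
// `and` binds looser than `>=`; `not` is a one-operand LogicalOp, not a UnaryOp.
const condition: Expression = {
  type: 'LogicalOp', location: loc, op: 'and',
  operands: [
    {
      type: 'BinaryOp', location: loc, op: '>=',
      left: { type: 'VarRef', location: loc, path: ['actor', 'reputation'] },
      right: { type: 'IntLiteral', location: loc, value: 500n },
    },
    {
      type: 'LogicalOp', location: loc, op: 'not',
      operands: [{ type: 'VarRef', location: loc, path: ['actor', 'suspended'] }],
    },
  ],
};
```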
§6. Rule classification — Admission / StateTransition / Consequence / Promotion
The task prompt asks the parser to “parse 4 rule types: Admission, StateTransition, Consequence, Promotion”. The extraction §1 grammar’s Rule = "rule" IDENTIFIER "{" GuardBlock EffectBlock "}" does not carry type information at the syntax level; classification is a downstream concern. From rule-engine.md §Rule Execution Order, the four kinds are categories used by the registry / executor, not grammatical productions.
Decision for this task: the parser produces RuleNode instances; it does not classify them at parse time. Classification by name convention (e.g. prefix Admit*, State*, etc.) or by an explicit attribute (e.g. a kind keyword) is a P1.2.4 (registry) / P1.3.1 (engine) concern. The PR will document this explicitly so reviewers do not flag a missing classifier.
This is consistent with the lexer reserving Admission, Transition, Consequence, Promotion as keywords — they exist in the token stream for future use but the extraction §1 grammar does not consume them yet. They tokenize today; they bind to grammar productions later.
§7. AST cap (10,000 nodes per rule)
The task prompt requires rejection of any single rule with > 10,000 AST nodes at parse time. Two implementation choices:
Choice A — count during parsing, threading state through Chevrotain’s parser DSL. The task prompt §Common Gotchas explicitly cautions against this (“threading state through Chevrotain’s parser DSL mid-parse is brittle”).
Choice B — count after parsing with a recursive walker. The task prompt §Common Gotchas explicitly recommends this (“Walk the final tree with a simple recursive counter”).
Decision: Choice B — post-parse recursive walker countNodes(node: AnyNode): number, called by the public entry point after parseRuleset returns. Rules exceeding the cap produce a synthetic parse-error entry rather than throwing. The cap is exposed as an exported constant MAX_AST_NODES_PER_RULE = 10000 for tests + future ADR.
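A minimal sketch of the Choice B walker, assuming the §5 node shapes (operator unions widened to `string` for brevity). `countAll` and `enforceAstCap` are hypothetical helper names, and dropping over-cap rules from `ast` is an assumption based on §8's "rules that failed contribute nothing":

```typescript
interface Location { startLine: number; startColumn: number; endLine: number; endColumn: number; }
type Expression = BinaryOp | UnaryOp | LogicalOp | IntLiteral | BoolLiteral | StringLiteral | VarRef | FuncCall;
interface RuleNode { type: 'RuleNode'; location: Location; name: string; guards: GuardClause[]; effects: EffectCall[]; }
interface GuardClause { type: 'GuardClause'; location: Location; condition: Expression | null; action: 'admit' | 'reject'; reason: string | null; }
interface EffectCall { type: 'EffectCall'; location: Location; function: string; args: Expression[]; }
interface BinaryOp { type: 'BinaryOp'; location: Location; op: string; left: Expression; right: Expression; }
interface UnaryOp { type: 'UnaryOp'; location: Location; op: '-'; operand: Expression; }
interface LogicalOp { type: 'LogicalOp'; location: Location; op: 'and' | 'or' | 'not'; operands: Expression[]; }
interface IntLiteral { type: 'IntLiteral'; location: Location; value: bigint; }
interface BoolLiteral { type: 'BoolLiteral'; location: Location; value: boolean; }
interface StringLiteral { type: 'StringLiteral'; location: Location; value: string; }
interface VarRef { type: 'VarRef'; location: Location; path: string[]; }
interface FuncCall { type: 'FuncCall'; location: Location; name: string; args: Expression[]; }
type AnyNode = RuleNode | GuardClause | EffectCall | Expression;
interface ParseError { kind: 'lex' | 'parse' | 'ast-cap'; message: string; location: Location | null; }

const MAX_AST_NODES_PER_RULE = 10000;

// Sums node counts over a child list.
function countAll(nodes: readonly AnyNode[]): number {
  let total = 0;
  for (const n of nodes) total += countNodes(n);
  return total;
}

// Counts this node plus all descendants. Under `strict`, the exhaustive switch
// means a new node type without a case fails to compile (missing return).
function countNodes(node: AnyNode): number {
  switch (node.type) {
    case 'RuleNode': return 1 + countAll(node.guards) + countAll(node.effects);
    case 'GuardClause': return 1 + (node.condition === null ? 0 : countNodes(node.condition));
    case 'EffectCall':
    case 'FuncCall': return 1 + countAll(node.args);
    case 'BinaryOp': return 1 + countNodes(node.left) + countNodes(node.right);
    case 'UnaryOp': return 1 + countNodes(node.operand);
    case 'LogicalOp': return 1 + countAll(node.operands);
    case 'IntLiteral':
    case 'BoolLiteral':
    case 'StringLiteral':
    case 'VarRef': return 1;
  }
}

// Keeps rules within the cap; emits one 'ast-cap' error per offender instead of throwing.
function enforceAstCap(rules: readonly RuleNode[]): { kept: RuleNode[]; errors: ParseError[] } {
  const kept: RuleNode[] = [];
  const errors: ParseError[] = [];
  for (const rule of rules) {
    const n = countNodes(rule);
    if (n > MAX_AST_NODES_PER_RULE) {
      errors.push({
        kind: 'ast-cap',
        message: `rule '${rule.name}' has ${n} AST nodes (max ${MAX_AST_NODES_PER_RULE})`,
        location: rule.location,
      });
    } else {
      kept.push(rule);
    }
  }
  return { kept, errors };
}
```

Recursion depth tracks tree depth, not node count; if pathologically nested input is a concern, an explicit-stack version of `countNodes` avoids deep call stacks.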
§8. Error recovery (5-error cap)
Chevrotain's `recoveryEnabled: true` is the documented switch for non-fatal parse errors. The task prompt requires that the first 5 errors be reported and that the parser not crash on malformed input. Chevrotain's `errors` array on the parser instance already accumulates every recognition error encountered during the parse; the parser truncates it to the first 5.
Decision: the parse() entry point returns { ast: RuleNode[], errors: ParseError[] } where errors is the union of:
- Lexer errors (passed through from `tokenize`).
- Chevrotain parse errors (truncated to the first 5).
- AST-cap errors (one per offending rule).
If errors is non-empty, ast may be partial (rules that parsed cleanly still appear; rules that failed contribute nothing). This matches the spec’s “doesn’t crash on malformed input” requirement.
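The error-merging rule above can be sketched as follows. `LexErrorLike` and `RecognitionLike` are local stand-ins for the fields read from Chevrotain's `ILexingError` and `IRecognitionException`; the helper names are hypothetical, deriving a lexer error's end column from `length` is an assumption, and so is treating `NaN` positions (assumed for errors reported against the end-of-input token) as non-positioned:

```typescript
interface Location { startLine: number; startColumn: number; endLine: number; endColumn: number; }
interface ParseError { kind: 'lex' | 'parse' | 'ast-cap'; message: string; location: Location | null; }

const MAX_PARSE_ERRORS = 5;

// Structural subsets of Chevrotain's ILexingError and IRecognitionException.
interface LexErrorLike { message: string; line?: number; column?: number; length: number; }
interface RecognitionLike {
  message: string;
  token: { startLine?: number; startColumn?: number; endLine?: number; endColumn?: number };
}

// Positions may be undefined or NaN; either way the error is non-positioned.
const isPos = (n: number | undefined): n is number => n !== undefined && Number.isFinite(n);

function fromLexError(e: LexErrorLike): ParseError {
  const { line, column } = e;
  const location: Location | null =
    isPos(line) && isPos(column)
      ? { startLine: line, startColumn: column, endLine: line, endColumn: column + Math.max(e.length, 1) - 1 }
      : null;
  return { kind: 'lex', message: e.message, location };
}

function fromRecognition(e: RecognitionLike): ParseError {
  const { startLine, startColumn, endLine, endColumn } = e.token;
  const location: Location | null =
    isPos(startLine) && isPos(startColumn) && isPos(endLine) && isPos(endColumn)
      ? { startLine, startColumn, endLine, endColumn }
      : null;
  return { kind: 'parse', message: e.message, location };
}

// Lexer errors pass through, parse errors are capped at 5, AST-cap errors are appended.
function collectErrors(
  lex: readonly LexErrorLike[],
  parse: readonly RecognitionLike[],
  cap: readonly ParseError[],
): ParseError[] {
  return [
    ...lex.map(fromLexError),
    ...parse.slice(0, MAX_PARSE_ERRORS).map(fromRecognition),
    ...cap,
  ];
}
```

Only the Chevrotain slice is truncated, matching the list above; lexer and AST-cap errors are reported in full.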
§9. Public API surface (committed by §contract)
The parser module exports the following (provisional until locked in `docs/contracts/p1-2-2-parser-contract.md`):
```ts
// AST union types: discriminated by `type`
export type Expression =
  | BinaryOp | UnaryOp | LogicalOp
  | IntLiteral | BoolLiteral | StringLiteral
  | VarRef | FuncCall;

export interface Location { startLine: number; startColumn: number; endLine: number; endColumn: number; }

export interface RuleNode { type: 'RuleNode'; location: Location; name: string; guards: GuardClause[]; effects: EffectCall[]; }

// ... 10 more interfaces (one per AST node)

export interface ParseError {
  kind: 'lex' | 'parse' | 'ast-cap';
  message: string;
  location: Location | null; // null only for non-positioned errors
}

export interface ParseResult {
  ast: RuleNode[];
  errors: ParseError[];
}

export declare const MAX_AST_NODES_PER_RULE: number; // 10000
export declare const MAX_PARSE_ERRORS: number;       // 5
export declare function parse(input: string): ParseResult;
```
No classes. Interfaces only. Plain data. Pure function.
§10. Non-goals
This task explicitly excludes:
- AST validator: semantic checks (forbidden ops in expressions, type coherence, function arity). That is P1.2.3.
- Rule registry / loader: keying rules by name, looking up by registry id. That is P1.2.4.
- Evaluator / interpreter: executing the AST against a context. That is P1.3.1.
- Canonical serialization: pretty-printing the AST back to DSL text. That is P1.5.4. The round-trip test (Fixture F5) leaves a `TODO(P1.5.4)` comment where the canonical-serialize call would go; the test asserts that `parse(s)` is structurally stable when re-parsed (i.e. parse twice and assert equal, a weaker but locally testable property).
- Rule classification by kind (Admission / StateTransition / Consequence / Promotion): see §6.
- A new ADR: see §3.
- Mutating any existing file outside `src/domains/rules/parser.ts`, `src/__tests__/domains/rules/parser.test.ts`, and the four task docs (audit, contract, packet, verification).
- Performance SLOs: none gated; informational only.
§11. Risk register
| Risk | Mitigation |
|---|---|
| Chevrotain LL(k) left-recursion gotcha | The EBNF in extraction §1 is already iterative (`{ ... }` repetition for binary chains), which maps to `MANY` rules in Chevrotain. No left recursion in the design. |
| `EmbeddedActionsParser` vs `CstParser` choice | Pick `EmbeddedActionsParser` per the task spec. Chevrotain warns about embedded actions in self-analysis; mitigate via `recoveryEnabled: true` plus careful `RULE` definitions that return AST nodes directly, wrapping token-dependent action code in `this.ACTION(...)` so it is skipped during Chevrotain's grammar-recording phase. |
| Operator precedence collapse | Stratified grammar productions (OrExpr → AndExpr → NotExpr → Comparison → Additive → Multiplicative → Unary → Primary) per extraction §1; no precedence-table hack. |
| AST cap counting brittleness | Post-parse walker (Choice B in §7). |
| BigInt overflow in `IntLiteral` parsing | `BigInt(text)` is arbitrary-precision and cannot overflow, so the parser stores the value as-is. Literals exceeding `MAX_INT64` are a range concern, deferred to the P1.2.3 validator. |
| Round-trip property without P1.5.4 canonicalize | Fixture F5 calls `parse(s)` twice and asserts structural equality (a weaker invariant). A comment marks P1.5.4 as the upgrade target. |
| Cross-worktree leak (memory mentions a persistent issue at Wave C) | Strict scope discipline: only the files this task owns (§10) are edited. `git status` checked at every commit. |
| Lexer keywords `Admission` / `Transition` / `Consequence` / `Promotion` unused | Documented as reserved per §6. Tests exercise some of them via prefix-of-identifier (`admissionRule` is a valid `Identifier` via Chevrotain's `longer_alt`). |
| `noUncheckedIndexedAccess` in `tsconfig` | Care needed when indexing `tokens[i]`: every parser-internal access is `[i]!` or guarded, and the AST walker must check children for `undefined` before recursing. |
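The left-recursion, precedence, and `BigInt` rows can be illustrated without Chevrotain. Each binary level is a loop (the analogue of a Chevrotain `MANY`) that folds operands left to right, so there is no left recursion and `1 - 2 - 3` associates as `(1 - 2) - 3`. This is a hand-rolled stand-in, not the planned `EmbeddedActionsParser`: `parseArithmetic` is a hypothetical name, tokens are pre-split strings, only the Additive → Multiplicative → Unary → Primary levels are shown, locations are omitted, and errors throw rather than recover:

```typescript
// Hypothetical stand-in for the Chevrotain grammar: arithmetic levels only, no locations.
type Expr =
  | { type: 'IntLiteral'; value: bigint }
  | { type: 'UnaryOp'; op: '-'; operand: Expr }
  | { type: 'BinaryOp'; op: '+' | '-' | '*' | '/' | '%'; left: Expr; right: Expr };

function parseArithmetic(toks: readonly string[]): Expr {
  let pos = 0;
  const peek = (): string | undefined => toks[pos];
  const next = (): string => {
    const t = toks[pos];
    if (t === undefined) throw new Error('unexpected end of input');
    pos += 1;
    return t;
  };

  // Additive := Multiplicative (('+' | '-') Multiplicative)*  -- a loop that folds left
  function additive(): Expr {
    let left = multiplicative();
    while (true) {
      const t = peek();
      if (t !== '+' && t !== '-') return left;
      next();
      left = { type: 'BinaryOp', op: t, left, right: multiplicative() };
    }
  }

  // Multiplicative := Unary (('*' | '/' | '%') Unary)*
  function multiplicative(): Expr {
    let left = unary();
    while (true) {
      const t = peek();
      if (t !== '*' && t !== '/' && t !== '%') return left;
      next();
      left = { type: 'BinaryOp', op: t, left, right: unary() };
    }
  }

  // Unary := '-' Unary | Primary   (IntegerLiteral is unsigned; the sign lives here)
  function unary(): Expr {
    if (peek() === '-') {
      next();
      return { type: 'UnaryOp', op: '-', operand: unary() };
    }
    return primary();
  }

  // Primary := INT | '(' Additive ')'
  function primary(): Expr {
    const t = next();
    if (t === '(') {
      const inner = additive();
      if (next() !== ')') throw new Error("expected ')'");
      return inner;
    }
    if (!/^[0-9]+$/.test(t)) throw new Error(`unexpected token '${t}'`);
    return { type: 'IntLiteral', value: BigInt(t) }; // arbitrary precision: stored as-is
  }

  const result = additive();
  if (pos !== toks.length) throw new Error(`trailing token '${toks[pos]}'`);
  return result;
}
```

For example, `parseArithmetic(['1', '+', '2', '*', '3'])` nests the `*` node under the right operand of `+`, because each lower level is fully consumed before the loop above it resumes.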
§12. Estimated implementation
| Step | Lines (rough) |
|---|---|
| `parser.ts` JSDoc + types | ~150 |
| `parser.ts` Chevrotain parser class | ~200 |
| `parser.ts` AST cap walker + helpers | ~50 |
| `parser.ts` `parse()` entry point | ~50 |
| `parser.ts` total | ~450 |
| `parser.test.ts` AST assertion helpers | ~80 |
| `parser.test.ts` 5 fixture groups (F1–F5) + boundary cases | ~400 |
| `parser.test.ts` total | ~480 |
Test count target: 35–50 cases (slightly larger than lexer’s 22 because the AST surface is wider).
§13. Pre-flight verification
- ✅ Worktree created at `.worktrees/claude/p1-2-2-parser` off `origin/main` `6345ba7aec8d2507337fa5161928c13d4a3b4d3e`.
- ✅ Branch `feature/p1-2-2-parser` set up to track `origin/main`.
- ✅ `chevrotain@11.0.3` already in `dependencies` (P1.2.1 lockfile inherited).
- ✅ Lexer module readable; surface mapped (§4).
- ✅ EBNF read and codified (§5–6).
- ✅ ADR-006-dsl-grammar drift re-noted (§3).
Next step: contract (Step 2 of 5).