P1.2.2 — κ DSL Parser — Behavioral Contract

Step 2 of the 5-step executor chain. Builds on docs/audits/p1-2-2-parser-audit.md. Defines the public surface, semantics, and invariants for src/domains/rules/parser.ts.

§1. Module identity

Path: src/domains/rules/parser.ts
Axis: κ — Rule Engine (Phase 1 Wave 3)
Kind: pure synchronous module; no I/O, no DB access, no network, no env reads, no console output
Runtime dependency: chevrotain@11.0.3 (exact pin, inherited from P1.2.1)
Internal dependencies:
- ./lexer.js — token-type bundles + tokenize entry point
No imports from src/db/*, src/middleware/*, src/domains/{tasks,skills,trail,proof,router,integrations}/*, or any Node built-ins.

§2. Public API

The module exports the following named entities. The type discriminant on every AST interface drives downstream pattern-matching.

§2.1. AST node interfaces

export interface Location {
  startLine: number;       // 1-indexed (Chevrotain convention)
  startColumn: number;     // 1-indexed
  endLine: number;         // 1-indexed, inclusive of last char
  endColumn: number;       // 1-indexed, inclusive of last char
}

export interface RuleNode {
  type: 'RuleNode';
  location: Location;
  name: string;                       // identifier following the `rule` keyword
  guards: GuardClause[];              // body of `guards { ... }`
  effects: EffectCall[];              // body of `effects { ... }`
}

export interface GuardClause {
  type: 'GuardClause';
  location: Location;
  condition: Expression | null;       // null iff the source uses `else`
  action: 'admit' | 'reject';
  reason: string | null;              // populated only when action === 'reject'
}

export interface EffectCall {
  type: 'EffectCall';
  location: Location;
  function: string;                   // identifier
  args: Expression[];
}

export interface BinaryOp {
  type: 'BinaryOp';
  location: Location;
  op: '+' | '-' | '*' | '/' | '%' | '==' | '!=' | '<' | '>' | '<=' | '>=';
  left: Expression;
  right: Expression;
}

export interface UnaryOp {
  type: 'UnaryOp';
  location: Location;
  op: '-';                            // numeric negation only
  operand: Expression;
}

export interface LogicalOp {
  type: 'LogicalOp';
  location: Location;
  op: 'and' | 'or' | 'not';
  operands: Expression[];             // length 2 for and/or, 1 for not
}

export interface IntLiteral {
  type: 'IntLiteral';
  location: Location;
  value: bigint;                      // bigint to match P1.1.1 / P1.1.3 invariants
}

export interface BoolLiteral {
  type: 'BoolLiteral';
  location: Location;
  value: boolean;
}

export interface StringLiteral {
  type: 'StringLiteral';
  location: Location;
  value: string;                      // decoded; escapes resolved
}

export interface VarRef {
  type: 'VarRef';
  location: Location;
  path: string[];                     // e.g. ['actor', 'reputation']
}

export interface FuncCall {
  type: 'FuncCall';
  location: Location;
  name: string;
  args: Expression[];
}

export type Expression =
  | BinaryOp | UnaryOp | LogicalOp
  | IntLiteral | BoolLiteral | StringLiteral
  | VarRef | FuncCall;

export type AnyNode = RuleNode | GuardClause | EffectCall | Expression;
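
Because every node carries a string `type` discriminant, a consumer can switch over `Expression` exhaustively and let the compiler flag unhandled variants. The sketch below is illustrative only: it redeclares a trimmed, location-free subset of the interfaces so that it is self-contained, and `Expr` and `render` are hypothetical names, not part of parser.ts.

```typescript
// Trimmed, location-free copies of a few §2.1 node shapes (illustrative only).
interface IntLiteral { type: 'IntLiteral'; value: bigint }
interface VarRef { type: 'VarRef'; path: string[] }
interface BinaryOp {
  type: 'BinaryOp';
  op: '+' | '-' | '*' | '/' | '%' | '==' | '!=' | '<' | '>' | '<=' | '>=';
  left: Expr;
  right: Expr;
}
interface LogicalOp { type: 'LogicalOp'; op: 'and' | 'or' | 'not'; operands: Expr[] }
type Expr = IntLiteral | VarRef | BinaryOp | LogicalOp;

// Hypothetical helper: renders an expression as fully parenthesised text.
function render(e: Expr): string {
  switch (e.type) {
    case 'IntLiteral':
      return e.value.toString();
    case 'VarRef':
      return '$' + e.path.join('.');
    case 'BinaryOp':
      return '(' + render(e.left) + ' ' + e.op + ' ' + render(e.right) + ')';
    case 'LogicalOp': {
      const parts = e.operands.map(render);
      return e.op === 'not'
        ? '(not ' + parts.join(' ') + ')'
        : '(' + parts.join(' ' + e.op + ' ') + ')';
    }
    default: {
      // Compile-time exhaustiveness check: this assignment stops type-checking
      // as soon as a variant of Expr is left unhandled above.
      const unreachable: never = e;
      throw new Error('unhandled node: ' + String(unreachable));
    }
  }
}

// Hand-built AST for the source text `$actor.reputation >= 10`.
const sample: Expr = {
  type: 'BinaryOp',
  op: '>=',
  left: { type: 'VarRef', path: ['actor', 'reputation'] },
  right: { type: 'IntLiteral', value: BigInt(10) },
};
```

Calling `render(sample)` yields `($actor.reputation >= 10)`. The `never` assignment in the default branch is the usual TypeScript idiom for keeping such switches in sync with the union.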

§2.2. Parse-result types

export interface ParseError {
  kind: 'lex' | 'parse' | 'ast-cap';
  message: string;
  location: Location | null;          // null for non-positioned errors only
}

export interface ParseResult {
  ast: RuleNode[];                    // all rules that parsed cleanly
  errors: ParseError[];               // truncated to first MAX_PARSE_ERRORS for kind='parse'
}

§2.3. Constants

export const MAX_AST_NODES_PER_RULE = 10000;
export const MAX_PARSE_ERRORS = 5;

§2.4. Entry point

export function parse(input: string): ParseResult;

The parser exports no Chevrotain-specific types — IToken, CstNode, etc. are encapsulated. Callers consume ParseResult directly.
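
A caller-side sketch of consuming a `ParseResult` follows. It is an assumption about how a consumer might report errors, not part of the contract: `Loc` stands in for the contract's `Location` (renamed only to avoid clashing with the DOM global in a standalone script), and `formatError` and `report` are hypothetical helpers.

```typescript
// Stand-ins for the §2.1 / §2.2 types, redeclared so the sketch is self-contained.
interface Loc { startLine: number; startColumn: number; endLine: number; endColumn: number }
interface ParseError { kind: 'lex' | 'parse' | 'ast-cap'; message: string; location: Loc | null }
interface ParseResultLike { ast: unknown[]; errors: ParseError[] }

// Hypothetical helper: one human-readable line per error.
function formatError(e: ParseError): string {
  const where = e.location === null
    ? '?:?'
    : e.location.startLine + ':' + e.location.startColumn;
  return '[' + e.kind + '] ' + where + ' ' + e.message;
}

// Hypothetical helper: summarises a result without ever throwing,
// mirroring the "errors are data" style of the contract.
function report(result: ParseResultLike): string {
  if (result.errors.length === 0) {
    return 'ok: ' + result.ast.length + ' rule(s)';
  }
  return result.errors.map(formatError).join('\n');
}
```

In real use the argument would come from `parse(source)`. A caller that wants all-or-nothing behavior would check `errors.length === 0` before using `ast`.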

§3. Function semantics — `parse`

Signature: parse(input: string): ParseResult

Behavior:

1. Pass input to tokenize() from the lexer module. Convert each ILexingError it reports to a ParseError with kind: 'lex'.
2. Feed the tokens to the module-level Chevrotain EmbeddedActionsParser instance, which is constructed once and reused; input is set per call via parser.input = tokens.
3. Run the top-level ruleset rule, which returns RuleNode[]. Convert each parser.errors entry to a ParseError with kind: 'parse', keeping only the first MAX_PARSE_ERRORS = 5.
4. Walk each returned rule with the recursive node counter; if a rule exceeds MAX_AST_NODES_PER_RULE = 10000, record a ParseError with kind: 'ast-cap' and omit that rule from ast.
5. Return { ast, errors }.
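
The control flow of the steps above can be sketched with the lexer, parser, and node counter injected as stubs. This is a minimal sketch under stated assumptions: `Stages`, `Rule`, `Loc`, and `runPipeline` are hypothetical names, the stages are assumed to have already converted their errors to `ParseError`, and the ordering of `errors` (lex, then parse, then ast-cap) is an assumption the contract does not state.

```typescript
interface Loc { startLine: number; startColumn: number; endLine: number; endColumn: number }
interface ParseError { kind: 'lex' | 'parse' | 'ast-cap'; message: string; location: Loc | null }
interface Rule { name: string; location: Loc }   // stand-in for RuleNode

// Injected stages so the sketch runs without Chevrotain.
interface Stages {
  lex(input: string): { tokens: unknown[]; errors: ParseError[] };         // step 1
  parseRules(tokens: unknown[]): { rules: Rule[]; errors: ParseError[] };  // steps 2 and 3
  countNodes(rule: Rule): number;                                          // step 4 walker
}

const MAX_AST_NODES_PER_RULE = 10000;
const MAX_PARSE_ERRORS = 5;

function runPipeline(input: string, stages: Stages): { ast: Rule[]; errors: ParseError[] } {
  const lexed = stages.lex(input);
  const parsed = stages.parseRules(lexed.tokens);

  // Lexer errors are all kept; parse errors are truncated to the first five.
  const errors: ParseError[] = [
    ...lexed.errors,
    ...parsed.errors.slice(0, MAX_PARSE_ERRORS),
  ];

  // Rules over the cap are reported and dropped from the returned AST.
  const ast: Rule[] = [];
  for (const rule of parsed.rules) {
    const count = stages.countNodes(rule);
    if (count > MAX_AST_NODES_PER_RULE) {
      errors.push({
        kind: 'ast-cap',
        message: `Rule '${rule.name}' exceeds maximum AST node count (${count} > ${MAX_AST_NODES_PER_RULE})`,
        location: rule.location,
      });
    } else {
      ast.push(rule);
    }
  }
  return { ast, errors };
}
```

Injecting the stages keeps the orchestration testable in isolation. The sketch does not show how the never-throws guarantee is enforced.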

Purity & non-throwing:

No time reads, no random reads, no DB / network / file I/O.
No side effects on import; importing the module does not parse anything.
parse(s) called twice with equal s returns structurally equal results (excluding object identity).
The function never throws. All errors are returned in errors.
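
Structural equality of two results cannot be checked with `===` (distinct objects) or with a plain `JSON.stringify` comparison, because `IntLiteral.value` is a `bigint` and `JSON.stringify` throws on bigint values. A minimal sketch of a suitable comparison follows; `structurallyEqual` is a hypothetical helper, and a test suite may well use its framework's deep-equality matcher instead.

```typescript
// Hypothetical deep-equality helper for plain-data ASTs.
// Handles bigint leaves, which compare by value under ===.
function structurallyEqual(a: unknown, b: unknown): boolean {
  if (a === b) return true; // same reference, or equal primitives (including bigint)
  if (typeof a !== 'object' || typeof b !== 'object' || a === null || b === null) {
    return false;
  }
  if (Array.isArray(a) !== Array.isArray(b)) return false;

  const recA = a as Record<string, unknown>;
  const recB = b as Record<string, unknown>;
  const keysA = Object.keys(recA);
  const keysB = Object.keys(recB);
  if (keysA.length !== keysB.length) return false;

  // Every key of a must exist on b with a structurally equal value.
  return keysA.every(
    (k) => Object.prototype.hasOwnProperty.call(recB, k) && structurallyEqual(recA[k], recB[k]),
  );
}
```

With such a helper, the determinism property reads `structurallyEqual(parse(s), parse(s))` for any `s`.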

Recovery:

The Chevrotain parser is constructed with recoveryEnabled: true. Errors do not abort parsing; the parser advances past the offending token and tries to recover.
An input with no rules (empty or whitespace-only) returns { ast: [], errors: [] }. The DSL has no comment syntax, so comment-only input is not a special case; the lexer skips whitespace.

§4. Grammar — stratified productions

The grammar mirrors docs/reference/extractions/kappa-rule-engine-extraction.md §1. Each EBNF production maps to one Chevrotain RULE. Operator precedence is stratified across productions, not collapsed into a precedence table.

ruleset       = { rule } ;
rule          = "rule" IDENTIFIER "{" guardBlock effectBlock "}" ;
guardBlock    = "guards" "{" { guardClause } "}" ;        (* 1+ in practice; 0+ allowed by grammar; validator (P1.2.3) flags empty guard blocks *)
guardClause   = ( expression | "else" ) "->" action ;
action        = "admit"
              | "reject" STRING ;
effectBlock   = "effects" "{" { effectCall } "}" ;
effectCall    = IDENTIFIER "(" [ argList ] ")" ;
argList       = arg { "," arg } ;
arg           = expression
              | STRING ;                                  (* string-only args are valid for effect calls *)

expression    = orExpr ;
orExpr        = andExpr { "or" andExpr } ;
andExpr       = notExpr { "and" notExpr } ;
notExpr       = [ "not" ] comparison ;
comparison    = additive [ compOp additive ] ;
compOp        = "==" | "!=" | "<" | ">" | "<=" | ">=" ;
additive      = multiplicative { ("+" | "-") multiplicative } ;
multiplicative = unary { ("*" | "/" | "%") unary } ;
unary         = [ "-" ] primary ;
primary       = INTEGER
              | "true"
              | "false"
              | variable
              | funcCall
              | "(" expression ")" ;
variable      = VARIABLE ;                                 (* whole `$dot.path` is a single token from the lexer *)
funcCall      = IDENTIFIER "(" [ argList ] ")" ;
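
How stratified productions encode precedence can be illustrated with a hand-written recursive-descent sketch of the arithmetic subset (additive, multiplicative, unary, primary). This is not the Chevrotain implementation: it works on a pre-split token array, `Arith` and `parseArithmetic` are hypothetical names, and unlike `parse` it throws on malformed input.

```typescript
// Hand-written recursive descent over pre-split tokens; NOT the Chevrotain
// implementation. It throws on bad input, unlike the contract's parse().
type Arith =
  | { type: 'IntLiteral'; value: bigint }
  | { type: 'UnaryOp'; op: '-'; operand: Arith }
  | { type: 'BinaryOp'; op: string; left: Arith; right: Arith };

function parseArithmetic(tokens: string[]): Arith {
  let pos = 0;
  const peek = (): string | undefined => tokens[pos];
  const next = (): string => {
    const t = tokens[pos];
    if (t === undefined) throw new Error('unexpected end of input');
    pos += 1;
    return t;
  };

  // additive = multiplicative { ("+" | "-") multiplicative }
  function additive(): Arith {
    let left: Arith = multiplicative();
    while (peek() === '+' || peek() === '-') {
      const op = next();
      const right = multiplicative();
      left = { type: 'BinaryOp', op, left, right }; // nests to the left
    }
    return left;
  }

  // multiplicative = unary { ("*" | "/" | "%") unary }
  function multiplicative(): Arith {
    let left: Arith = unary();
    while (peek() === '*' || peek() === '/' || peek() === '%') {
      const op = next();
      const right = unary();
      left = { type: 'BinaryOp', op, left, right };
    }
    return left;
  }

  // unary = [ "-" ] primary
  function unary(): Arith {
    if (peek() === '-') {
      next();
      return { type: 'UnaryOp', op: '-', operand: primary() };
    }
    return primary();
  }

  // primary = INTEGER | "(" expression ")"   (subset of the full production)
  function primary(): Arith {
    const t = next();
    if (t === '(') {
      const inner = additive();
      if (next() !== ')') throw new Error("expected ')'");
      return inner;
    }
    if (/^[0-9]+$/.test(t)) return { type: 'IntLiteral', value: BigInt(t) };
    throw new Error(`unexpected token '${t}'`);
  }

  const result = additive();
  if (pos !== tokens.length) throw new Error(`unexpected token '${tokens[pos]}'`);
  return result;
}
```

Each level only calls the next tighter level for its operands, so `1 + 2 * 3` groups as `1 + (2 * 3)` without any precedence table. The same layering extends upward through comparison, notExpr, andExpr, and orExpr.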

§4.1. Disambiguation: `funcCall` vs `Identifier` in primary

primary does not enumerate IDENTIFIER directly; the only place a bare identifier appears is as the head of funcCall. An IDENTIFIER is therefore a valid primary only when it is followed by (. A bare identifier without the $ prefix in expression position (e.g. actor.reputation instead of $actor.reputation) matches no production and surfaces as a parse error. This matches the EBNF, in which variables are always $-prefixed.

§4.2. Disambiguation: `funcCall` vs `effectCall`

Both share the surface IDENTIFIER ( args ). The disambiguator is the parent production: effectCall is invoked from inside effectBlock and produces an EffectCall AST node. funcCall inside primary produces a FuncCall AST node. They are kept as two distinct Chevrotain rules to keep the AST shape unambiguous.

Per extraction §1, EffectCall and FuncCall share the same ArgList = Arg { "," Arg } production where Arg = Expression | STRING. Both call shapes therefore accept either expressions or string literals as arguments. The implementation reuses a single effectArg SUBRULE for both EffectCall.args and FuncCall.args to preserve grammar alignment.

§4.3. AST shape rules — operator chains

EBNF chained productions like orExpr = andExpr { "or" andExpr } produce left-associative chains. The AST is built bottom-up:

$a or $b or $c
=>
LogicalOp{ op: 'or', operands: [
  LogicalOp{ op: 'or', operands: [$a, $b] },
  $c
]}

(Chained two-operand or/and are nested left, not flattened. P1.5.4 canonical serialization may flatten; the parser does not.)
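
The left-nested shape above is what a left fold over the operand list produces. A minimal sketch, assuming trimmed location-free node types; `Expr` and `foldLeft` are hypothetical names, and the real parser builds these nodes inside its embedded actions rather than from a prepared list.

```typescript
// Trimmed, location-free node shapes (illustrative only).
interface VarRef { type: 'VarRef'; path: string[] }
interface LogicalOp { type: 'LogicalOp'; op: 'and' | 'or'; operands: Expr[] }
type Expr = VarRef | LogicalOp;

// Folds `first op rest[0] op rest[1] ...` into a left-nested chain of
// two-operand LogicalOp nodes. With an empty `rest`, `first` is returned as-is.
function foldLeft(op: 'and' | 'or', first: Expr, rest: Expr[]): Expr {
  return rest.reduce<Expr>(
    (left, right) => ({ type: 'LogicalOp', op, operands: [left, right] }),
    first,
  );
}

const a: Expr = { type: 'VarRef', path: ['a'] };
const b: Expr = { type: 'VarRef', path: ['b'] };
const c: Expr = { type: 'VarRef', path: ['c'] };

// $a or $b or $c
const chain = foldLeft('or', a, [b, c]);
```

Here `chain` is an `or` node whose first operand is the inner `or` over `$a` and `$b` and whose second operand is `$c`, so every `LogicalOp` for and/or keeps exactly two operands.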

§4.4. Comparison

comparison = additive [ compOp additive ] allows at most ONE comparison operator per chain. $a < $b < $c is a parse error: chained comparisons are not supported in κ DSL. The lexer and parser conform to the extraction §1 grammar, which uses the optional form [ compOp additive ] rather than the iterative form { compOp additive }.

§5. Invariants

ID	Invariant	Verified by
I1	`parse` returns a `ParseResult` and never throws	`parser.test.ts` — every malformed input case
I2	Empty input → `{ ast: [], errors: [] }`	test
I3	Whitespace-only input → `{ ast: [], errors: [] }`	test
I4	Every AST node has `type` discriminant matching one of 11 defined values	test (full AcceptCommitment AST walked)
I5	Every AST node has a `location` with valid 1-indexed positions	test
I6	Operator precedence: unary `-` > `*`/`/`/`%` > `+`/`-` > comparison > `not` > `and` > `or`	test (precedence fixture F2)
I7	`not` is `LogicalOp` with `operands.length === 1`; binary `and`/`or` have `operands.length === 2`	test
I8	Unary `-` is `UnaryOp` with `op === '-'` and one `operand`	test
I9	`IntLiteral.value` is `bigint`, not `number`	test (typeof === 'bigint')
I10	`VarRef.path` is the dot-split of the lexer’s `Variable` image (without the leading `$`)	test
I11	`else` guard clause produces `condition: null`	test
I12	`reject "reason"` action sets `reason` to the decoded string; `admit` sets `reason: null`	test
I13	`parse` is referentially transparent: two calls `parse(s)` with equal `s` return structurally equal results (deep equality, not `===` identity)	test
I14	A rule with > `MAX_AST_NODES_PER_RULE` nodes is omitted from `ast` and recorded in `errors` with `kind: 'ast-cap'`	test (F4)
I15	Up to `MAX_PARSE_ERRORS = 5` parse errors are reported; further parse errors are silently dropped (the count remains at 5)	test
I16	Lexer errors are passed through with `kind: 'lex'`	test (3.14 input)
I17	Round-trip-stable: `parse(s).ast` is structurally equal to the AST obtained by serializing it and parsing the serialized text again (deferred to P1.5.4; fixture F5 uses deep equality of two `parse(s).ast` results as the locally testable proxy)	test (F5, with TODO)
I18	Identifier-prefixed-by-keyword inputs (e.g. `admissionRule` as a rule name) tokenize as Identifier and parse as the rule name	test
I19	Empty `guards { }` and empty `effects { }` blocks parse cleanly (validation is P1.2.3’s job)	test
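
Invariants I9 and I10 describe two small token-to-value conversions, sketched below. Both helpers are hypothetical names; the sketch assumes the lexer hands over the raw token image as a string, as I10 implies.

```typescript
// I10: `$actor.reputation` -> ['actor', 'reputation'].
// Assumes the image is a well-formed Variable token, i.e. starts with `$`.
function variableImageToPath(image: string): string[] {
  if (image.charAt(0) !== '$') {
    throw new Error('not a variable image: ' + image);
  }
  return image.slice(1).split('.');
}

// I9: integer images become bigint, never number.
// Assumes the image contains decimal digits only (the lexer rejects
// floats and underscore-separated integers before this point).
function integerImageToValue(image: string): bigint {
  return BigInt(image);
}
```

One practical consequence of I9: `integerImageToValue('9007199254740993')` keeps the exact value, whereas `Number('9007199254740993')` rounds to 9007199254740992 because the value lies above 2^53.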

§6. Error model

ParseError.kind is one of three:

Kind	Source	Examples
`'lex'`	`tokenize()` errors (float literal, underscore int, unknown char)	`3.14`, `1_000`, `@`
`'parse'`	Chevrotain parse errors (unexpected token, missing token, etc.)	`rule X { guards { -> admit } effects { } }` (missing guard expression)
`'ast-cap'`	Post-parse cap walker	A rule whose AST exceeds 10,000 nodes

ParseError.location:

For 'lex': derived from ILexingError.{line, column, length}.
For 'parse': derived from the Chevrotain error’s token.{startLine, startColumn, endLine, endColumn} — null only if Chevrotain reports an error with no token (rare; happens at EOF).
For 'ast-cap': the offending rule’s RuleNode.location.

ParseError.message:

For 'lex': passed through verbatim.
For 'parse': Chevrotain’s default error message; no rephrasing.
For 'ast-cap': "Rule '<name>' exceeds maximum AST node count (<count> > <MAX_AST_NODES_PER_RULE>)".
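
The 'lex' location mapping and the 'ast-cap' message format can be sketched as two small constructors. This is an illustration under assumptions: `LexingErrorLike` models only the ILexingError fields named above and assumes `line` and `column` are present, the end position assumes the offending span does not cross a line break, and `fromLexingError`, `astCapError`, and `Loc` (standing in for `Location`) are hypothetical names.

```typescript
interface Loc { startLine: number; startColumn: number; endLine: number; endColumn: number }
interface ParseError { kind: 'lex' | 'parse' | 'ast-cap'; message: string; location: Loc | null }

// Only the fields the contract names; assumed to be populated.
interface LexingErrorLike { line: number; column: number; length: number; message: string }

function fromLexingError(e: LexingErrorLike): ParseError {
  return {
    kind: 'lex',
    message: e.message, // passed through verbatim
    location: {
      startLine: e.line,
      startColumn: e.column,
      endLine: e.line,                      // assumption: single-line span
      endColumn: e.column + e.length - 1,   // inclusive end, as in Location
    },
  };
}

function astCapError(ruleName: string, count: number, max: number, location: Loc): ParseError {
  return {
    kind: 'ast-cap',
    message: `Rule '${ruleName}' exceeds maximum AST node count (${count} > ${max})`,
    location, // the offending rule's own location
  };
}
```

For example, a four-character lexing error starting at line 2, column 5 maps to end column 8.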

§7. Dependency rules

In: chevrotain (peer of lexer.ts), ./lexer.js (token-type bundles + tokenize). Out: AST consumed by P1.2.3 validator, P1.2.4 registry, P1.3.1 evaluator, P1.5.4 canonical serializer.

Explicitly forbidden imports (mirror lexer):

No src/db/* — parser is pure.
No src/middleware/* — parser is outside the MCP pipeline.
No src/domains/{tasks,skills,trail,proof,router,integrations}/* — κ is a peer axis.
No Node built-ins (fs, path, crypto, os, child_process, http, net, …).
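
One way a test could guard these import rules is a textual scan of the module source. The sketch below is an assumption about how such a check might look, not something the contract prescribes: `findForbiddenImports` is a hypothetical helper, it only recognises `from '...'` specifiers (not side-effect or dynamic imports), and the built-in list covers only the modules named above.

```typescript
// Patterns derived from the forbidden-import list above.
const FORBIDDEN: RegExp[] = [
  /\/db\//,
  /\/middleware\//,
  /\/domains\/(tasks|skills|trail|proof|router|integrations)\//,
  /^(node:)?(fs|path|crypto|os|child_process|http|net)$/,
];

// Returns every import specifier in `source` that matches a forbidden pattern.
function findForbiddenImports(source: string): string[] {
  const specifiers: string[] = [];
  const fromClause = /\bfrom\s+['"]([^'"]+)['"]/g;
  let match: RegExpExecArray | null;
  while ((match = fromClause.exec(source)) !== null) {
    specifiers.push(match[1]);
  }
  return specifiers.filter((s) => FORBIDDEN.some((p) => p.test(s)));
}
```

A test would read parser.ts as text and expect an empty array. The allowed imports `chevrotain` and `./lexer.js` match none of the patterns.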

§8. Performance envelope (informational, not gated)

Short rule (~50 tokens) parses in < 5 ms on a modern laptop.
Memory: AST size proportional to source length; no caching across calls.
AST-cap walker is O(N) over total node count.
No memoization — callers may cache ParseResult if desired.
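
A linear-time node counter can be sketched as follows. The contract describes a recursive walker; this variant uses an explicit stack instead, which visits the same nodes without deepening the call stack. What counts as a node (any object with a string `type` property) and the early exit once the limit is exceeded are assumptions of this sketch, and `countNodes` is a hypothetical name.

```typescript
// Counts AST nodes reachable from `root`, where a node is any object with a
// string `type` property. Location objects have no `type` and are not counted.
// Stops early once the count exceeds `limit`.
function countNodes(root: object, limit: number = Infinity): number {
  let count = 0;
  const stack: unknown[] = [root];
  while (stack.length > 0) {
    const current = stack.pop();
    if (typeof current !== 'object' || current === null) continue; // skip primitives
    if (Array.isArray(current)) {
      for (const item of current) stack.push(item);
      continue;
    }
    const record = current as Record<string, unknown>;
    if (typeof record['type'] === 'string') {
      count += 1;
      if (count > limit) return count; // no need to walk the rest
    }
    for (const key of Object.keys(record)) stack.push(record[key]);
  }
  return count;
}
```

Each object and array is visited once, so the walk is O(N) in the size of the tree. With a limit supplied, the work on an oversized rule is bounded by roughly the limit rather than by the full tree.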

§9. Non-goals (re-stated from audit §10)

AST validator (P1.2.3).
Rule registry / loader (P1.2.4).
Evaluator / interpreter (P1.3.1).
Canonical serialization (P1.5.4).
Rule classification by kind (Admission / StateTransition / Consequence / Promotion).
A new ADR for DSL grammar.
Mutating any file outside src/domains/rules/parser.ts and src/__tests__/domains/rules/parser.test.ts (plus the four chain docs).
Performance SLOs.

§10. Change log

v1 (this commit) — initial contract.

Any subsequent change to the public surface of parser.ts MUST land a contract revision in the same PR. Backward-incompatible changes MUST also add a new version entry to this change log.

§11. Traceability

Requirement	Where defined	Where tested
11 AST node types with `type` discriminant	extraction §2 + contract §2.1	`parser.test.ts` AST shape matrix
Operator precedence stratified	extraction §1 + contract §4	`parser.test.ts` F2 precedence
`recoveryEnabled: true`	task spec + contract §3	`parser.test.ts` F3 malformed input
AST cap at 10000 nodes	task spec + contract §2.3 / §3 step 4	`parser.test.ts` F4
Round-trip stability	task spec + contract §5 (I17)	`parser.test.ts` F5
AcceptCommitment fixture	task spec + concept doc	`parser.test.ts` F1
Lexer errors flow through	contract §6	`parser.test.ts` F6
First 5 parse errors	task spec + contract §2.3 / §3 step 3	`parser.test.ts` F7 (synthesized many-error input)
Identifier collisions with reserved keywords	lexer `longer_alt` + contract §4.1 / §5 (I18)	`parser.test.ts`

§12. Summary

src/domains/rules/parser.ts exports a single function parse(input: string): ParseResult plus 11 AST node interfaces and supporting types. It uses Chevrotain 11.0.3’s EmbeddedActionsParser with recoveryEnabled: true, mirrors the EBNF in kappa-rule-engine-extraction.md §1 with stratified operator precedence, walks the post-parse AST to enforce a 10,000-node cap per rule, and reports up to 5 parse errors plus all lexer errors plus all AST-cap errors. The function never throws; all errors flow through ParseResult.errors. AST nodes are pure data — no methods, no classes — to keep the surface trivially serialisable for canonical hashing in P1.5.4.

Next step: packet (Step 3 of 5).