P1.2.2 — κ DSL Parser — Behavioral Contract

Step 2 of the 5-step executor chain. Builds on docs/audits/p1-2-2-parser-audit.md. Defines the public surface, semantics, and invariants for src/domains/rules/parser.ts.

§1. Module identity

  • Path: src/domains/rules/parser.ts
  • Axis: κ — Rule Engine (Phase 1 Wave 3)
  • Kind: pure synchronous module; no I/O, no DB access, no network, no env reads, no console output
  • Runtime dependency: chevrotain@11.0.3 (exact pin, inherited from P1.2.1)
  • Internal dependencies:
    • ./lexer.js — token-type bundles + tokenize entry point
  • No imports from src/db/*, src/middleware/*, src/domains/{tasks,skills,trail,proof,router,integrations}/*, or any Node built-ins.

§2. Public API

The module exports the following named entities. The type discriminant on every AST interface drives downstream pattern-matching.

§2.1. AST node interfaces

export interface Location {
  startLine: number;       // 1-indexed (Chevrotain convention)
  startColumn: number;     // 1-indexed
  endLine: number;         // 1-indexed, inclusive of last char
  endColumn: number;       // 1-indexed, inclusive of last char
}

export interface RuleNode {
  type: 'RuleNode';
  location: Location;
  name: string;                       // identifier following the `rule` keyword
  guards: GuardClause[];              // body of `guards { ... }`
  effects: EffectCall[];              // body of `effects { ... }`
}

export interface GuardClause {
  type: 'GuardClause';
  location: Location;
  condition: Expression | null;       // null iff the source uses `else`
  action: 'admit' | 'reject';
  reason: string | null;              // populated only when action === 'reject'
}

export interface EffectCall {
  type: 'EffectCall';
  location: Location;
  function: string;                   // identifier
  args: Expression[];
}

export interface BinaryOp {
  type: 'BinaryOp';
  location: Location;
  op: '+' | '-' | '*' | '/' | '%' | '==' | '!=' | '<' | '>' | '<=' | '>=';
  left: Expression;
  right: Expression;
}

export interface UnaryOp {
  type: 'UnaryOp';
  location: Location;
  op: '-';                            // numeric negation only
  operand: Expression;
}

export interface LogicalOp {
  type: 'LogicalOp';
  location: Location;
  op: 'and' | 'or' | 'not';
  operands: Expression[];             // length 2 for and/or, 1 for not
}

export interface IntLiteral {
  type: 'IntLiteral';
  location: Location;
  value: bigint;                      // bigint to match P1.1.1 / P1.1.3 invariants
}

export interface BoolLiteral {
  type: 'BoolLiteral';
  location: Location;
  value: boolean;
}

export interface StringLiteral {
  type: 'StringLiteral';
  location: Location;
  value: string;                      // decoded; escapes resolved
}

export interface VarRef {
  type: 'VarRef';
  location: Location;
  path: string[];                     // e.g. ['actor', 'reputation']
}

export interface FuncCall {
  type: 'FuncCall';
  location: Location;
  name: string;
  args: Expression[];
}

export type Expression =
  | BinaryOp | UnaryOp | LogicalOp
  | IntLiteral | BoolLiteral | StringLiteral
  | VarRef | FuncCall;

export type AnyNode = RuleNode | GuardClause | EffectCall | Expression;
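
Because every node carries a string type discriminant, a consumer can narrow with a plain switch. The sketch below is illustrative only: it redeclares a reduced, location-free subset of the interfaces above so that it stands alone, and describeNode is a hypothetical helper, not part of parser.ts.

```typescript
// Reduced, location-free copies of three AST interfaces, redeclared here
// only so the sketch is self-contained.
interface BoolLiteral { type: 'BoolLiteral'; value: boolean }
interface VarRef { type: 'VarRef'; path: string[] }
interface LogicalOp { type: 'LogicalOp'; op: 'and' | 'or' | 'not'; operands: Expr[] }
type Expr = BoolLiteral | VarRef | LogicalOp;

// Hypothetical helper: renders an expression back to DSL-like text.
function describeNode(node: Expr): string {
  switch (node.type) {
    case 'BoolLiteral':
      return node.value ? 'true' : 'false';
    case 'VarRef':
      return '$' + node.path.join('.');
    case 'LogicalOp':
      if (node.op === 'not') {
        return 'not ' + describeNode(node.operands[0]);
      }
      return '(' + node.operands.map((o) => describeNode(o)).join(' ' + node.op + ' ') + ')';
    default: {
      // Exhaustiveness check: this assignment stops compiling if a new
      // member is added to Expr without a matching case.
      const unreachable: never = node;
      return unreachable;
    }
  }
}
```

The never assignment in the default branch is one common way to make the compiler flag a missed node type; downstream consumers may prefer a different exhaustiveness style.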

§2.2. Parse-result types

export interface ParseError {
  kind: 'lex' | 'parse' | 'ast-cap';
  message: string;
  location: Location | null;          // null for non-positioned errors only
}

export interface ParseResult {
  ast: RuleNode[];                    // all rules that parsed cleanly
  errors: ParseError[];               // truncated to first MAX_PARSE_ERRORS for kind='parse'
}
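
Since ast holds the rules that parsed cleanly even when errors is non-empty, a caller usually needs to inspect both fields. The following is a minimal sketch of one way to do that, assuming local stand-in types (Span, ParseErrorLike, ParseResultLike) that mirror the interfaces above; formatErrors and isClean are hypothetical helpers, not exports of parser.ts.

```typescript
// Local stand-ins for the contract's Location / ParseError / ParseResult,
// redeclared so the sketch compiles on its own.
interface Span { startLine: number; startColumn: number; endLine: number; endColumn: number }
interface ParseErrorLike { kind: 'lex' | 'parse' | 'ast-cap'; message: string; location: Span | null }
interface ParseResultLike { ast: unknown[]; errors: ParseErrorLike[] }

// Hypothetical helper: one diagnostic line per error, "line:col [kind] message".
// Non-positioned errors (location === null) are shown as "?:?".
function formatErrors(result: ParseResultLike): string[] {
  return result.errors.map((e) => {
    const where = e.location === null
      ? '?:?'
      : e.location.startLine + ':' + e.location.startColumn;
    return where + ' [' + e.kind + '] ' + e.message;
  });
}

// Hypothetical policy: treat a result as usable only when no error of any
// kind was reported, regardless of how many rules ended up in ast.
function isClean(result: ParseResultLike): boolean {
  return result.errors.length === 0;
}
```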

§2.3. Constants

export const MAX_AST_NODES_PER_RULE = 10000;
export const MAX_PARSE_ERRORS = 5;

§2.4. Entry point

export function parse(input: string): ParseResult;

The parser exports no Chevrotain-specific types — IToken, CstNode, etc. are encapsulated. Callers consume ParseResult directly.

§3. Function semantics — parse

Signature: parse(input: string): ParseResult

Behavior:

  1. Pass input to tokenize() from the lexer module. Collect any ILexingError entries and convert each to a ParseError with kind: 'lex'.
  2. Use the module-level Chevrotain EmbeddedActionsParser instance (constructed once and reused across calls); set the input per call via parser.input = tokens.
  3. Run the top-level ruleset rule, which returns RuleNode[]. Collect any parserInstance.errors entries; convert each to a ParseError with kind: 'parse'. Truncate to first MAX_PARSE_ERRORS = 5.
  4. Walk each returned rule with the recursive node-counter; if any rule exceeds MAX_AST_NODES_PER_RULE = 10000, record a ParseError with kind: 'ast-cap' and omit that rule from ast.
  5. Return { ast, errors }.
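
The error handling in steps 1, 3 and 4 can be sketched as below. This is an assumption-laden illustration, not the module's code: Diag is a location-free stand-in for ParseError, assembleErrors is a hypothetical helper, and the lex, parse, ast-cap ordering of the combined list is assumed rather than specified by this contract.

```typescript
type ErrorKind = 'lex' | 'parse' | 'ast-cap';
interface Diag { kind: ErrorKind; message: string }

const MAX_PARSE_ERRORS = 5; // value from §2.3

// Hypothetical helper: lexer and AST-cap errors are kept in full, while
// parse errors are cut to the first MAX_PARSE_ERRORS entries.
function assembleErrors(lex: Diag[], parse: Diag[], astCap: Diag[]): Diag[] {
  return lex.concat(parse.slice(0, MAX_PARSE_ERRORS), astCap);
}
```

Only the kind: 'parse' bucket is truncated, so the combined list can hold more than five entries in total.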

Purity & non-throwing:

  • No time reads, no random reads, no DB / network / file I/O.
  • No side effects on import; importing the module does not parse anything.
  • parse(s) called twice with equal s returns structurally equal results (excluding object identity).
  • The function never throws. All errors are returned in errors.
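
Structural equality of two results needs a deep comparison, and JSON.stringify is not a safe shortcut here because it throws on bigint values such as IntLiteral.value. A minimal sketch of a comparison helper follows; structurallyEqual is a hypothetical name, and the sketch assumes AST values are plain data (primitives, arrays, plain objects).

```typescript
// Hypothetical helper: deep structural equality for plain-data values.
// Primitives (including bigint) are compared with ===; arrays and plain
// objects are compared key by key, ignoring object identity.
function structurallyEqual(a: unknown, b: unknown): boolean {
  if (a === b) return true;
  if (typeof a !== 'object' || typeof b !== 'object' || a === null || b === null) {
    return false;
  }
  if (Array.isArray(a) !== Array.isArray(b)) return false;
  const ra = a as Record<string, unknown>;
  const rb = b as Record<string, unknown>;
  const keysA = Object.keys(ra);
  const keysB = Object.keys(rb);
  if (keysA.length !== keysB.length) return false;
  for (const key of keysA) {
    if (!Object.prototype.hasOwnProperty.call(rb, key)) return false;
    if (!structurallyEqual(ra[key], rb[key])) return false;
  }
  return true;
}
```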

Recovery:

  • The Chevrotain parser is constructed with recoveryEnabled: true. Errors do not abort parsing; the parser advances past the offending token and tries to recover.
  • An input with no rules (empty or whitespace-only) returns { ast: [], errors: [] }. The DSL has no comment syntax; the lexer skips whitespace only.

§4. Grammar — stratified productions

The grammar mirrors docs/reference/extractions/kappa-rule-engine-extraction.md §1. Each EBNF production maps to one Chevrotain RULE. Operator precedence is stratified across productions, not collapsed into a precedence table.

ruleset       = { rule } ;
rule          = "rule" IDENTIFIER "{" guardBlock effectBlock "}" ;
guardBlock    = "guards" "{" { guardClause } "}" ;        (* 1+ in practice; 0+ allowed by grammar; validator (P1.2.3) flags empty guard blocks *)
guardClause   = ( expression | "else" ) "->" action ;
action        = "admit"
              | "reject" STRING ;
effectBlock   = "effects" "{" { effectCall } "}" ;
effectCall    = IDENTIFIER "(" [ argList ] ")" ;
argList       = arg { "," arg } ;
arg           = expression
              | STRING ;                                  (* string-only args are valid for effect calls *)

expression    = orExpr ;
orExpr        = andExpr { "or" andExpr } ;
andExpr       = notExpr { "and" notExpr } ;
notExpr       = [ "not" ] comparison ;
comparison    = additive [ compOp additive ] ;
compOp        = "==" | "!=" | "<" | ">" | "<=" | ">=" ;
additive      = multiplicative { ("+" | "-") multiplicative } ;
multiplicative = unary { ("*" | "/" | "%") unary } ;
unary         = [ "-" ] primary ;
primary       = INTEGER
              | "true"
              | "false"
              | variable
              | funcCall
              | "(" expression ")" ;
variable      = VARIABLE ;                                 (* whole `$dot.path` is a single token from the lexer *)
funcCall      = IDENTIFIER "(" [ argList ] ")" ;

§4.1. Disambiguation: funcCall vs Identifier in primary

primary does not enumerate IDENTIFIER directly; within an expression, a bare identifier appears only as the head of funcCall, so an IDENTIFIER must be followed by ( to form a primary. A bare identifier without the $ prefix in expression position (e.g. actor.reputation) matches no production and surfaces as a parse error. This matches the EBNF, in which variables are always $-prefixed.

§4.2. Disambiguation: funcCall vs effectCall

Both share the surface IDENTIFIER ( args ). The disambiguator is the parent production: effectCall is invoked from inside effectBlock and produces an EffectCall AST node. funcCall inside primary produces a FuncCall AST node. They are kept as two distinct Chevrotain rules to keep the AST shape unambiguous.

Per extraction §1, EffectCall and FuncCall share the same ArgList = Arg { "," Arg } production where Arg = Expression | STRING. Both call shapes therefore accept either expressions or string literals as arguments. The implementation reuses a single effectArg SUBRULE for both EffectCall.args and FuncCall.args to preserve grammar alignment.

§4.3. AST shape rules — operator chains

EBNF chained productions like orExpr = andExpr { "or" andExpr } produce left-associative chains. The AST is built bottom-up:

$a or $b or $c
=>
LogicalOp{ op: 'or', operands: [
  LogicalOp{ op: 'or', operands: [$a, $b] },
  $c
]}

(Chained two-operand or/and are nested left, not flattened. P1.5.4 canonical serialization may flatten; the parser does not.)
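
The left-nested shape above corresponds to a left fold over the operands of a chain. The sketch below shows that fold in isolation, assuming reduced node types without locations; foldLeft is a hypothetical helper and is not claimed to be how the embedded actions in parser.ts are written.

```typescript
// Reduced stand-ins for VarRef and the binary forms of LogicalOp.
interface Leaf { type: 'VarRef'; path: string[] }
interface Logical { type: 'LogicalOp'; op: 'and' | 'or'; operands: Tree[] }
type Tree = Leaf | Logical;

// Hypothetical helper: folds [first, ...rest] into a left-nested chain, the
// way an action for `andExpr { "or" andExpr }` could build its result.
function foldLeft(op: 'and' | 'or', first: Tree, rest: Tree[]): Tree {
  let acc: Tree = first;
  for (const next of rest) {
    // Each step wraps the accumulated tree as the LEFT operand.
    acc = { type: 'LogicalOp', op: op, operands: [acc, next] };
  }
  return acc;
}
```

With an empty rest the first operand is returned unchanged, so a lone andExpr does not get wrapped in a LogicalOp.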

§4.4. Comparison

comparison = additive [ compOp additive ] allows at most ONE comparison operator per chain. $a < $b < $c is a parse error: chained comparisons are not supported in κ DSL, because the extraction §1 grammar uses the optional form [ compOp additive ], not an iterative { compOp additive }.

§5. Invariants

ID | Invariant | Verified by
I1 | parse returns a ParseResult and never throws | parser.test.ts (every malformed input case)
I2 | Empty input → { ast: [], errors: [] } | test
I3 | Whitespace-only input → { ast: [], errors: [] } | test
I4 | Every AST node has type discriminant matching one of 11 defined values | test (full AcceptCommitment AST walked)
I5 | Every AST node has a location with valid 1-indexed positions | test
I6 | Operator precedence: * / % > + - > comparison > not > and > or | test (precedence fixture F2)
I7 | not is LogicalOp with operands.length === 1; binary and/or have operands.length === 2 | test
I8 | Unary - is UnaryOp with op === '-' and one operand | test
I9 | IntLiteral.value is bigint, not number | test (typeof === 'bigint')
I10 | VarRef.path is the dot-split of the lexer’s Variable image (without the leading $) | test
I11 | else guard clause produces condition: null | test
I12 | reject "reason" action sets reason to the decoded string; admit sets reason: null | test
I13 | parse is referentially transparent: parse(s) === parse(s) structurally for equal s | test
I14 | A rule with > MAX_AST_NODES_PER_RULE nodes is omitted from ast and recorded in errors with kind: 'ast-cap' | test (F4)
I15 | Up to MAX_PARSE_ERRORS = 5 parse errors are reported; further parse errors are silently dropped (the count remains at 5) | test
I16 | Lexer errors are passed through with kind: 'lex' | test (3.14 input)
I17 | Round-trip-stable: parse(s).ast is structurally equal to parse(parse(s).ast→serialized).ast (deferred to P1.5.4; fixture F5 uses parse(s).ast === parse(s).ast as the locally-testable proxy) | test (F5, with TODO)
I18 | Identifier-prefixed-by-keyword inputs (e.g. admissionRule as a rule name) tokenize as Identifier and parse as the rule name | test
I19 | Empty guards { } and empty effects { } blocks parse cleanly (validation is P1.2.3’s job) | test
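
Invariant I10 can be illustrated with a short sketch. It assumes the lexer always delivers a Variable image that starts with $ (as §4 states, the whole $dot.path is a single token); variableImageToPath is a hypothetical helper name.

```typescript
// Hypothetical helper for I10: drops the leading `$` from a Variable token
// image and splits the remainder on dots. Assumes the lexer guarantees the
// `$` prefix, so no validation is performed here.
function variableImageToPath(image: string): string[] {
  return image.slice(1).split('.');
}
```

For example, the image $actor.reputation yields the path ['actor', 'reputation'].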

§6. Error model

ParseError.kind is one of three:

Kind | Source | Examples
'lex' | tokenize() errors (float literal, underscore int, unknown char) | 3.14, 1_000, @
'parse' | Chevrotain parse errors (unexpected token, missing token, etc.) | rule X { guards { -> admit } } (missing expression)
'ast-cap' | Post-parse cap walker | A rule whose AST exceeds 10,000 nodes

ParseError.location:

  • For 'lex': derived from ILexingError.{line, column, length}.
  • For 'parse': derived from the Chevrotain error’s token.{startLine, startColumn, endLine, endColumn}; null only if Chevrotain reports an error with no token (rare; happens at EOF).
  • For 'ast-cap': the offending rule’s RuleNode.location.
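
For the 'lex' case, one plausible derivation from line, column and length is sketched below. It rests on two assumptions that this contract does not state: the offending text lies on a single line, and a zero-length error is widened to a one-character span. Span and LexErrorLike are local stand-ins, and lexErrorToSpan is a hypothetical helper.

```typescript
// Local stand-in for the contract's Location interface.
interface Span { startLine: number; startColumn: number; endLine: number; endColumn: number }

// Structural stand-in for the three ILexingError fields named above.
interface LexErrorLike { line: number; column: number; length: number }

// Assumes a single-line error, so endLine === startLine and the inclusive
// endColumn is column + length - 1. Zero-length errors are clamped to one
// character (an assumption of this sketch).
function lexErrorToSpan(err: LexErrorLike): Span {
  const width = Math.max(err.length, 1);
  return {
    startLine: err.line,
    startColumn: err.column,
    endLine: err.line,
    endColumn: err.column + width - 1,
  };
}
```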

ParseError.message:

  • For 'lex': passed through verbatim.
  • For 'parse': Chevrotain’s default error message; no rephrasing.
  • For 'ast-cap': "Rule '<name>' exceeds maximum AST node count (<count> > <MAX_AST_NODES_PER_RULE>)".
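
The cap check and its message format can be sketched as follows. CountedNode is a generic stand-in with an explicit children array (the real AST keeps children in named fields such as guards, effects, left, right and operands), and countNodes and astCapMessage are hypothetical helpers.

```typescript
const MAX_AST_NODES_PER_RULE = 10000; // value from §2.3

// Generic stand-in for an AST node: a discriminant plus child nodes.
interface CountedNode { type: string; children: CountedNode[] }

// Hypothetical recursive walker: counts a node and all of its descendants.
function countNodes(node: CountedNode): number {
  let count = 1;
  for (const child of node.children) {
    count += countNodes(child);
  }
  return count;
}

// Builds the 'ast-cap' message in the format given above, or returns null
// when the rule is within the cap (count <= MAX_AST_NODES_PER_RULE).
function astCapMessage(ruleName: string, count: number): string | null {
  if (count <= MAX_AST_NODES_PER_RULE) return null;
  return "Rule '" + ruleName + "' exceeds maximum AST node count (" +
    count + ' > ' + MAX_AST_NODES_PER_RULE + ')';
}
```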

§7. Dependency rules

In: chevrotain (peer of lexer.ts), ./lexer.js (token-type bundles + tokenize). Out: AST consumed by P1.2.3 validator, P1.2.4 registry, P1.3.1 evaluator, P1.5.4 canonical serializer.

Explicitly forbidden imports (mirror lexer):

  • No src/db/* — parser is pure.
  • No src/middleware/* — parser is outside the MCP pipeline.
  • No src/domains/{tasks,skills,trail,proof,router,integrations}/* — κ is a peer axis.
  • No Node built-ins (fs, path, crypto, os, child_process, http, net, …).

§8. Performance envelope (informational, not gated)

  • Short rule (~50 tokens) parses in < 5 ms on a modern laptop.
  • Memory: AST size proportional to source length; no caching across calls.
  • AST-cap walker is O(N) over total node count.
  • No memoization — callers may cache ParseResult if desired.

§9. Non-goals (re-stated from audit §10)

  • AST validator (P1.2.3).
  • Rule registry / loader (P1.2.4).
  • Evaluator / interpreter (P1.3.1).
  • Canonical serialization (P1.5.4).
  • Rule classification by kind (Admission / StateTransition / Consequence / Promotion).
  • A new ADR for DSL grammar.
  • Mutating any file outside src/domains/rules/parser.ts and src/__tests__/domains/rules/parser.test.ts (plus the four chain docs).
  • Performance SLOs.

§10. Change log

  • v1 (this commit) — initial contract.

Any subsequent change to the public surface of parser.ts MUST land a contract revision in the same PR. Backward-incompatible changes MUST advance a minor version note here.

§11. Traceability

Requirement | Where defined | Where tested
11 AST node types with type discriminant | extraction §2 + contract §2.1 | parser.test.ts AST shape matrix
Operator precedence stratified | extraction §1 + contract §4 | parser.test.ts F2 precedence
recoveryEnabled: true | task spec + contract §3 | parser.test.ts F3 malformed input
AST cap at 10000 nodes | task spec + contract §2.3 / §3 step 4 | parser.test.ts F4
Round-trip stability | task spec + contract §5 (I17) | parser.test.ts F5
AcceptCommitment fixture | task spec + concept doc | parser.test.ts F1
Lexer errors flow through | contract §6 | parser.test.ts F6
First 5 parse errors | task spec + contract §2.3 / §3 step 3 | parser.test.ts F7 (synthesized many-error input)
Identifier collisions with reserved keywords | lexer longer_alt + contract §4.1 / §5 (I18) | parser.test.ts

§12. Summary

src/domains/rules/parser.ts exports a single function parse(input: string): ParseResult plus 11 AST node interfaces and supporting types. It uses Chevrotain 11.0.3’s EmbeddedActionsParser with recoveryEnabled: true, mirrors the EBNF in kappa-rule-engine-extraction.md §1 with stratified operator precedence, walks the post-parse AST to enforce a 10,000-node cap per rule, and reports up to 5 parse errors plus all lexer errors plus all AST-cap errors. The function never throws; all errors flow through ParseResult.errors. AST nodes are pure data — no methods, no classes — to keep the surface trivially serialisable for canonical hashing in P1.5.4.

Next step: packet (Step 3 of 5).


Colibri — documentation-first MCP runtime. Apache 2.0 + Commons Clause.
