P1.2.2 — κ DSL Parser — Behavioral Contract
Step 2 of the 5-step executor chain. Builds on `docs/audits/p1-2-2-parser-audit.md`. Defines the public surface, semantics, and invariants for `src/domains/rules/parser.ts`.
§1. Module identity
- Path: `src/domains/rules/parser.ts`
- Axis: κ, Rule Engine (Phase 1 Wave 3)
- Kind: pure synchronous module; no I/O, no DB access, no network, no env reads, no console output
- Runtime dependency: `chevrotain@11.0.3` (exact pin, inherited from P1.2.1)
- Internal dependencies:
  - `./lexer.js`: token-type bundles + `tokenize` entry point
- No imports from `src/db/*`, `src/middleware/*`, `src/domains/{tasks,skills,trail,proof,router,integrations}/*`, or any Node built-ins.
§2. Public API
The module exports the following named entities. The type discriminant on every AST interface drives downstream pattern-matching.
§2.1. AST node interfaces
export interface Location {
startLine: number; // 1-indexed (Chevrotain convention)
startColumn: number; // 1-indexed
endLine: number; // 1-indexed, inclusive of last char
endColumn: number; // 1-indexed, inclusive of last char
}
export interface RuleNode {
type: 'RuleNode';
location: Location;
name: string; // identifier following the `rule` keyword
guards: GuardClause[]; // body of `guards { ... }`
effects: EffectCall[]; // body of `effects { ... }`
}
export interface GuardClause {
type: 'GuardClause';
location: Location;
condition: Expression | null; // null iff the source uses `else`
action: 'admit' | 'reject';
reason: string | null; // populated only when action === 'reject'
}
export interface EffectCall {
type: 'EffectCall';
location: Location;
function: string; // identifier
args: Expression[];
}
export interface BinaryOp {
type: 'BinaryOp';
location: Location;
op: '+' | '-' | '*' | '/' | '%' | '==' | '!=' | '<' | '>' | '<=' | '>=';
left: Expression;
right: Expression;
}
export interface UnaryOp {
type: 'UnaryOp';
location: Location;
op: '-'; // numeric negation only
operand: Expression;
}
export interface LogicalOp {
type: 'LogicalOp';
location: Location;
op: 'and' | 'or' | 'not';
operands: Expression[]; // length 2 for and/or, 1 for not
}
export interface IntLiteral {
type: 'IntLiteral';
location: Location;
value: bigint; // bigint to match P1.1.1 / P1.1.3 invariants
}
export interface BoolLiteral {
type: 'BoolLiteral';
location: Location;
value: boolean;
}
export interface StringLiteral {
type: 'StringLiteral';
location: Location;
value: string; // decoded; escapes resolved
}
export interface VarRef {
type: 'VarRef';
location: Location;
path: string[]; // e.g. ['actor', 'reputation']
}
export interface FuncCall {
type: 'FuncCall';
location: Location;
name: string;
args: Expression[];
}
export type Expression =
| BinaryOp | UnaryOp | LogicalOp
| IntLiteral | BoolLiteral | StringLiteral
| VarRef | FuncCall;
export type AnyNode = RuleNode | GuardClause | EffectCall | Expression;
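To illustrate how the `type` discriminant drives pattern-matching, the sketch below renders an expression back to text with an exhaustive `switch`. It redeclares a trimmed, location-free subset of the interfaces above so that it is self-contained; the `render` helper and the `Expr` alias are hypothetical and not part of the parser's surface.

```typescript
// Trimmed, location-free copies of four expression nodes from §2.1 (illustration only).
interface BoolLiteral { type: 'BoolLiteral'; value: boolean }
interface VarRef { type: 'VarRef'; path: string[] }
interface BinaryOp { type: 'BinaryOp'; op: '==' | '!=' | '<' | '>' | '<=' | '>='; left: Expr; right: Expr }
interface LogicalOp { type: 'LogicalOp'; op: 'and' | 'or' | 'not'; operands: Expr[] }
type Expr = BoolLiteral | VarRef | BinaryOp | LogicalOp;

// Hypothetical helper: renders an expression as fully parenthesised text.
function render(node: Expr): string {
  switch (node.type) {
    case 'BoolLiteral':
      return node.value ? 'true' : 'false';
    case 'VarRef':
      return '$' + node.path.join('.');
    case 'BinaryOp':
      return `(${render(node.left)} ${node.op} ${render(node.right)})`;
    case 'LogicalOp': {
      const parts = node.operands.map(render);
      // `not` carries one operand; `and` / `or` carry two.
      return node.op === 'not' ? `(not ${parts[0]})` : `(${parts.join(' ' + node.op + ' ')})`;
    }
    default: {
      // Compile-time exhaustiveness check: fails to type-check if a case is missing.
      const unreachable: never = node;
      return unreachable;
    }
  }
}
```

Because every interface carries a literal `type`, the compiler narrows `node` inside each `case`, and the `never` assignment in `default` flags any node kind added later without a matching branch.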
§2.2. Parse-result types
export interface ParseError {
kind: 'lex' | 'parse' | 'ast-cap';
message: string;
location: Location | null; // null for non-positioned errors only
}
export interface ParseResult {
ast: RuleNode[]; // all rules that parsed cleanly
errors: ParseError[]; // truncated to first MAX_PARSE_ERRORS for kind='parse'
}
§2.3. Constants
export const MAX_AST_NODES_PER_RULE = 10000;
export const MAX_PARSE_ERRORS = 5;
§2.4. Entry point
export function parse(input: string): ParseResult;
The parser exports no Chevrotain-specific types — IToken, CstNode, etc. are encapsulated. Callers consume ParseResult directly.
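As a sketch of how a caller might consume `ParseResult` without touching Chevrotain types, the helper below turns each `ParseError` into one display line. The `Loc` and `ParseErrorLike` types are local mirrors of `Location` and `ParseError` from §2.1 and §2.2, and `formatError` is a hypothetical name, not something the module exports.

```typescript
// Local mirrors of the contract types, so the sketch stands alone.
interface Loc { startLine: number; startColumn: number; endLine: number; endColumn: number }
interface ParseErrorLike { kind: 'lex' | 'parse' | 'ast-cap'; message: string; location: Loc | null }

// Hypothetical helper: one human-readable line per error.
// Non-positioned errors (location === null) are shown with a '?:?' placeholder.
function formatError(err: ParseErrorLike): string {
  const where = err.location === null
    ? '?:?'
    : `${err.location.startLine}:${err.location.startColumn}`;
  return `[${err.kind}] ${where} ${err.message}`;
}

// Hypothetical helper: formats every error of a result, preserving order.
function formatErrors(result: { errors: ParseErrorLike[] }): string[] {
  return result.errors.map((e) => formatError(e));
}
```

A caller would typically treat a non-empty `errors` array as a failed parse even when `ast` is non-empty, since recovery can still return the rules that parsed cleanly.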
§3. Function semantics — parse
Signature: parse(input: string): ParseResult
Behavior:
1. Pass `input` to `tokenize()` from the lexer module. Collect any `ILexingError` entries and convert each to a `ParseError` with `kind: 'lex'`.
2. Construct a Chevrotain `EmbeddedActionsParser` instance (module-level: constructed once and reused; `input` is set per call via `parser.input = tokens`).
3. Run the top-level `ruleset` rule, which returns `RuleNode[]`. Collect any `parserInstance.errors` entries; convert each to a `ParseError` with `kind: 'parse'`. Truncate to the first `MAX_PARSE_ERRORS = 5`.
4. Walk each returned rule with the recursive node-counter; if any rule exceeds `MAX_AST_NODES_PER_RULE = 10000`, record a `ParseError` with `kind: 'ast-cap'` and omit that rule from `ast`.
5. Return `{ ast, errors }`.
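The steps above can be sketched as a small orchestration function. To stay self-contained, the lexer, the Chevrotain parser, and the node counter are passed in as a hypothetical `Stages` object instead of being imported; `parseWith`, `Stages`, `RuleLike`, and the ordering of errors (lex, then parse, then ast-cap) are assumptions made for illustration, not the module's actual internals.

```typescript
interface Loc { startLine: number; startColumn: number; endLine: number; endColumn: number }
interface ParseErrorLike { kind: 'lex' | 'parse' | 'ast-cap'; message: string; location: Loc | null }
interface RuleLike { name: string; location: Loc }

const MAX_AST_NODES_PER_RULE = 10000;
const MAX_PARSE_ERRORS = 5;

// Hypothetical seams standing in for the lexer, the parser, and the cap walker.
interface Stages<Tok> {
  tokenize(input: string): { tokens: Tok[]; errors: ParseErrorLike[] };       // errors already kind: 'lex'
  runRuleset(tokens: Tok[]): { rules: RuleLike[]; errors: ParseErrorLike[] }; // errors already kind: 'parse'
  countNodes(rule: RuleLike): number;
}

function parseWith<Tok>(stages: Stages<Tok>, input: string): { ast: RuleLike[]; errors: ParseErrorLike[] } {
  const lexed = stages.tokenize(input);                       // step 1
  const parsed = stages.runRuleset(lexed.tokens);             // steps 2 and 3
  // All lexer errors are kept; parse errors are truncated to the first MAX_PARSE_ERRORS.
  const errors = lexed.errors.concat(parsed.errors.slice(0, MAX_PARSE_ERRORS));
  const ast: RuleLike[] = [];
  for (const rule of parsed.rules) {                          // step 4
    const count = stages.countNodes(rule);
    if (count > MAX_AST_NODES_PER_RULE) {
      errors.push({
        kind: 'ast-cap',
        message: `Rule '${rule.name}' exceeds maximum AST node count (${count} > ${MAX_AST_NODES_PER_RULE})`,
        location: rule.location,
      });
    } else {
      ast.push(rule);
    }
  }
  return { ast, errors };                                     // step 5
}
```

Injecting the stages keeps the sketch testable with fakes; the real module holds one parser instance at module level, as step 2 describes.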
Purity & non-throwing:
- No time reads, no random reads, no DB / network / file I/O.
- No side effects on import; importing the module does not parse anything.
- `parse(s)` called twice with equal `s` returns structurally equal results (excluding object identity).
- The function never throws. All errors are returned in `errors`.
Recovery:
- The Chevrotain parser is constructed with `recoveryEnabled: true`. Errors do not abort parsing; the parser advances past the offending token and tries to recover.
- An input with no rules (empty or whitespace-only) returns `{ ast: [], errors: [] }`. The DSL has no comment syntax; the lexer skips whitespace only.
§4. Grammar — stratified productions
The grammar mirrors docs/reference/extractions/kappa-rule-engine-extraction.md §1. Each EBNF production maps to one Chevrotain RULE. Operator precedence is stratified across productions, not collapsed into a precedence table.
ruleset = { rule } ;
rule = "rule" IDENTIFIER "{" guardBlock effectBlock "}" ;
guardBlock = "guards" "{" { guardClause } "}" ; (* 1+ in practice; 0+ allowed by grammar; validator (P1.2.3) flags empty guard blocks *)
guardClause = ( expression | "else" ) "->" action ;
action = "admit"
| "reject" STRING ;
effectBlock = "effects" "{" { effectCall } "}" ;
effectCall = IDENTIFIER "(" [ argList ] ")" ;
argList = arg { "," arg } ;
arg = expression
| STRING ; (* string-only args are valid for effect calls *)
expression = orExpr ;
orExpr = andExpr { "or" andExpr } ;
andExpr = notExpr { "and" notExpr } ;
notExpr = [ "not" ] comparison ;
comparison = additive [ compOp additive ] ;
compOp = "==" | "!=" | "<" | ">" | "<=" | ">=" ;
additive = multiplicative { ("+" | "-") multiplicative } ;
multiplicative = unary { ("*" | "/" | "%") unary } ;
unary = [ "-" ] primary ;
primary = INTEGER
| "true"
| "false"
| variable
| funcCall
| "(" expression ")" ;
variable = VARIABLE ; (* whole `$dot.path` is a single token from the lexer *)
funcCall = IDENTIFIER "(" [ argList ] ")" ;
§4.1. Disambiguation: funcCall vs Identifier in primary
`primary` does not enumerate `IDENTIFIER` directly; the only place a bare identifier appears is as the head of `funcCall`, so an `IDENTIFIER` must be followed by `(` to form a `primary`. A bare identifier without the `$` prefix in expression position (e.g. `actor.reputation` instead of `$actor.reputation`) matches no production and surfaces as a parse error. This matches the EBNF: variables are always `$`-prefixed.
§4.2. Disambiguation: funcCall vs effectCall
Both share the surface IDENTIFIER ( args ). The disambiguator is the parent production: effectCall is invoked from inside effectBlock and produces an EffectCall AST node. funcCall inside primary produces a FuncCall AST node. They are kept as two distinct Chevrotain rules to keep the AST shape unambiguous.
Per extraction §1, EffectCall and FuncCall share the same ArgList = Arg { "," Arg } production where Arg = Expression | STRING. Both call shapes therefore accept either expressions or string literals as arguments. The implementation reuses a single effectArg SUBRULE for both EffectCall.args and FuncCall.args to preserve grammar alignment.
§4.3. AST shape rules — operator chains
EBNF chained productions like orExpr = andExpr { "or" andExpr } produce left-associative chains. The AST is built bottom-up:
$a or $b or $c
=>
LogicalOp{ op: 'or', operands: [
LogicalOp{ op: 'or', operands: [$a, $b] },
$c
]}
(Chained two-operand or/and are nested left, not flattened. P1.5.4 canonical serialization may flatten; the parser does not.)
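The left-nesting can be sketched as a fold over the operands collected by a chained production such as `orExpr = andExpr { "or" andExpr }`. The `Expr` type is a trimmed, location-free stand-in for the AST nodes, and `foldLeft` is a hypothetical helper name; the real embedded actions may build the chain differently.

```typescript
// Trimmed stand-in for the expression nodes (no location, two kinds only).
type Expr =
  | { type: 'VarRef'; path: string[] }
  | { type: 'LogicalOp'; op: 'and' | 'or'; operands: Expr[] };

// Hypothetical helper: `first` is the mandatory head of the chain,
// `rest` holds the operands matched by the `{ op operand }` repetition.
function foldLeft(op: 'and' | 'or', first: Expr, rest: Expr[]): Expr {
  let acc: Expr = first;
  for (const next of rest) {
    // Each step wraps the chain built so far as the LEFT operand.
    acc = { type: 'LogicalOp', op, operands: [acc, next] };
  }
  return acc;
}
```

With an empty `rest` the head is returned unchanged, so a lone operand never gains a wrapping `LogicalOp`.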
§4.4. Comparison
`comparison = additive [ compOp additive ]`: at most ONE comparison operator per chain. `$a < $b < $c` is a parse error. Chained comparisons are not supported in the κ DSL: the extraction §1 grammar, to which the lexer and parser conform, makes the `[ compOp additive ]` tail optional, not iterative.
§5. Invariants
| ID | Invariant | Verified by |
|---|---|---|
| I1 | `parse` returns a `ParseResult` and never throws | `parser.test.ts`: every malformed input case |
| I2 | Empty input → `{ ast: [], errors: [] }` | test |
| I3 | Whitespace-only input → `{ ast: [], errors: [] }` | test |
| I4 | Every AST node has a `type` discriminant matching one of 11 defined values | test (full AcceptCommitment AST walked) |
| I5 | Every AST node has a `location` with valid 1-indexed positions | test |
| I6 | Operator precedence: `*` / `/` / `%` > `+` / `-` > comparison > `not` > `and` > `or` | test (precedence fixture F2) |
| I7 | `not` is a `LogicalOp` with `operands.length === 1`; binary `and` / `or` have `operands.length === 2` | test |
| I8 | Unary `-` is a `UnaryOp` with `op === '-'` and one operand | test |
| I9 | `IntLiteral.value` is `bigint`, not `number` | test (`typeof === 'bigint'`) |
| I10 | `VarRef.path` is the dot-split of the lexer's `Variable` image (without the leading `$`) | test |
| I11 | An `else` guard clause produces `condition: null` | test |
| I12 | A `reject "reason"` action sets `reason` to the decoded string; `admit` sets `reason: null` | test |
| I13 | `parse` is referentially transparent: two calls `parse(s)` with equal `s` return structurally equal results | test |
| I14 | A rule with > `MAX_AST_NODES_PER_RULE` nodes is omitted from `ast` and recorded in `errors` with `kind: 'ast-cap'` | test (F4) |
| I15 | Up to `MAX_PARSE_ERRORS = 5` parse errors are reported; further parse errors are silently dropped (the count remains at 5) | test |
| I16 | Lexer errors are passed through with `kind: 'lex'` | test (`3.14` input) |
| I17 | Round-trip-stable: `parse(s).ast` is structurally equal to `parse(parse(s).ast→serialized).ast` (deferred to P1.5.4; fixture F5 uses structural equality of `parse(s).ast` across two calls as the locally testable proxy) | test (F5, with TODO) |
| I18 | Identifier-prefixed-by-keyword inputs (e.g. `admissionRule` as a rule name) tokenize as `Identifier` and parse as the rule name | test |
| I19 | Empty `guards { }` and empty `effects { }` blocks parse cleanly (validation is P1.2.3's job) | test |
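Two of the invariants above lend themselves to small checks. The sketch below shows one possible reading of I10 (the dot-split of the `Variable` image) and I5 (valid 1-indexed positions). Both helper names are hypothetical, and the "end not before start" condition in `isValidLocation` is an assumption about what "valid" means beyond 1-indexing.

```typescript
// Local mirror of Location from §2.1.
interface Loc { startLine: number; startColumn: number; endLine: number; endColumn: number }

// I10: '$actor.reputation' -> ['actor', 'reputation'].
// Assumes the image always starts with '$', as the lexer's Variable token guarantees.
function variableImageToPath(image: string): string[] {
  return image.slice(1).split('.');
}

// I5: every position is a finite integer >= 1, and the end does not precede the start.
function isValidLocation(loc: Loc): boolean {
  const values = [loc.startLine, loc.startColumn, loc.endLine, loc.endColumn];
  for (const v of values) {
    if (!isFinite(v) || Math.floor(v) !== v || v < 1) return false;
  }
  if (loc.endLine < loc.startLine) return false;
  // On a single line the inclusive end column must not precede the start column.
  return loc.endLine > loc.startLine || loc.endColumn >= loc.startColumn;
}
```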
§6. Error model
ParseError.kind is one of three:
| Kind | Source | Examples |
|---|---|---|
| `'lex'` | `tokenize()` errors (float literal, underscore int, unknown char) | `3.14`, `1_000`, `@` |
| `'parse'` | Chevrotain parse errors (unexpected token, missing token, etc.) | `rule X { guards { -> admit } }` (missing expression) |
| `'ast-cap'` | Post-parse cap walker | A rule whose AST exceeds 10,000 nodes |
ParseError.location:
- For `'lex'`: derived from `ILexingError.{line, column, length}`.
- For `'parse'`: derived from the Chevrotain error's `token.{startLine, startColumn, endLine, endColumn}`; `null` only if Chevrotain reports an error with no token (rare; happens at EOF).
- For `'ast-cap'`: the offending rule's `RuleNode.location`.

ParseError.message:
- For `'lex'`: passed through verbatim.
- For `'parse'`: Chevrotain's default error message; no rephrasing.
- For `'ast-cap'`: `"Rule '<name>' exceeds maximum AST node count (<count> > <MAX_AST_NODES_PER_RULE>)"`.
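The location rules above can be sketched as two small conversions. `LexErrorLike` and `TokenLike` are local stand-ins for the fields the contract names on `ILexingError` and on the error's token; the helper names are hypothetical. Two assumptions are baked in: a lexer error spans a single line with an inclusive end column, and a token with missing or non-finite positions counts as "no token" for the EOF case.

```typescript
interface Loc { startLine: number; startColumn: number; endLine: number; endColumn: number }
interface LexErrorLike { line: number; column: number; length: number; message: string }
interface TokenLike { startLine?: number; startColumn?: number; endLine?: number; endColumn?: number }

// 'lex': build a single-line span from line, column and length (end column inclusive).
function lexErrorLocation(e: LexErrorLike): Loc {
  const width = Math.max(e.length, 1); // treat zero-length errors as one character wide
  return { startLine: e.line, startColumn: e.column, endLine: e.line, endColumn: e.column + width - 1 };
}

// True only for defined, finite numbers.
function isPosition(n: number | undefined): n is number {
  return n !== undefined && isFinite(n);
}

// 'parse': copy the token's positions, or return null when no usable position exists.
function tokenLocation(t: TokenLike | undefined): Loc | null {
  if (t === undefined) return null;
  const { startLine, startColumn, endLine, endColumn } = t;
  if (!isPosition(startLine) || !isPosition(startColumn) || !isPosition(endLine) || !isPosition(endColumn)) {
    return null;
  }
  return { startLine, startColumn, endLine, endColumn };
}
```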
§7. Dependency rules
In: chevrotain (peer of lexer.ts), ./lexer.js (token-type bundles + tokenize).
Out: AST consumed by P1.2.3 validator, P1.2.4 registry, P1.3.1 evaluator, P1.5.4 canonical serializer.
Explicitly forbidden imports (mirror lexer):
- No `src/db/*` (the parser is pure).
- No `src/middleware/*` (the parser is outside the MCP pipeline).
- No `src/domains/{tasks,skills,trail,proof,router,integrations}/*` (κ is a peer axis).
- No Node built-ins (`fs`, `path`, `crypto`, `os`, `child_process`, `http`, `net`, …).
§8. Performance envelope (informational, not gated)
- Short rule (~50 tokens) parses in `< 5 ms` on a modern laptop.
- Memory: AST size proportional to source length; no caching across calls.
- AST-cap walker is `O(N)` over total node count.
- No memoization; callers may cache `ParseResult` if desired.
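A node counter with the `O(N)` behaviour described above can be sketched as follows. The contract describes the walker as recursive; this sketch swaps in an explicit stack so that deeply nested chains cannot exhaust the call stack, which is a substitution made for illustration. The node types are trimmed stand-ins, and counting the rule itself plus every guard, effect, and expression node once is an assumption about what "node count" includes.

```typescript
// Trimmed stand-ins for the AST nodes (only the fields the walker needs).
type Expr =
  | { type: 'VarRef'; path: string[] }
  | { type: 'BoolLiteral'; value: boolean }
  | { type: 'UnaryOp'; op: '-'; operand: Expr }
  | { type: 'BinaryOp'; op: string; left: Expr; right: Expr }
  | { type: 'LogicalOp'; op: 'and' | 'or' | 'not'; operands: Expr[] }
  | { type: 'FuncCall'; name: string; args: Expr[] };
interface Guard { type: 'GuardClause'; condition: Expr | null }
interface Effect { type: 'EffectCall'; args: Expr[] }
interface Rule { type: 'RuleNode'; guards: Guard[]; effects: Effect[] }
type AnyNode = Rule | Guard | Effect | Expr;

// Direct children of a node; leaves return an empty array.
function childrenOf(node: AnyNode): AnyNode[] {
  switch (node.type) {
    case 'RuleNode': return [...node.guards, ...node.effects];
    case 'GuardClause': return node.condition === null ? [] : [node.condition];
    case 'EffectCall':
    case 'FuncCall': return node.args;
    case 'UnaryOp': return [node.operand];
    case 'BinaryOp': return [node.left, node.right];
    case 'LogicalOp': return node.operands;
    default: return []; // VarRef, BoolLiteral
  }
}

// Visits each node exactly once, so the cost is linear in the node count.
function countNodes(root: AnyNode): number {
  let count = 0;
  const stack: AnyNode[] = [root];
  while (stack.length > 0) {
    const node = stack.pop() as AnyNode;
    count += 1;
    for (const child of childrenOf(node)) stack.push(child);
  }
  return count;
}
```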
§9. Non-goals (re-stated from audit §10)
- AST validator (P1.2.3).
- Rule registry / loader (P1.2.4).
- Evaluator / interpreter (P1.3.1).
- Canonical serialization (P1.5.4).
- Rule classification by kind (Admission / StateTransition / Consequence / Promotion).
- A new ADR for DSL grammar.
- Mutating any file outside `src/domains/rules/parser.ts` and `src/__tests__/domains/rules/parser.test.ts` (plus the four chain docs).
- Performance SLOs.
§10. Change log
- v1 (this commit) — initial contract.
Any subsequent change to the public surface of parser.ts MUST land a contract revision in the same PR. Backward-incompatible changes MUST advance a minor version note here.
§11. Traceability
| Requirement | Where defined | Where tested |
|---|---|---|
| 11 AST node types with `type` discriminant | extraction §2 + contract §2.1 | `parser.test.ts` AST shape matrix |
| Operator precedence stratified | extraction §1 + contract §4 | `parser.test.ts` F2 precedence |
| `recoveryEnabled: true` | task spec + contract §3 | `parser.test.ts` F3 malformed input |
| AST cap at 10000 nodes | task spec + contract §2.3 / §3 step 4 | `parser.test.ts` F4 |
| Round-trip stability | task spec + contract §I17 | `parser.test.ts` F5 |
| AcceptCommitment fixture | task spec + concept doc | `parser.test.ts` F1 |
| Lexer errors flow through | contract §6 | `parser.test.ts` F6 |
| First 5 parse errors | task spec + contract §2.3 / §3 step 3 | `parser.test.ts` F7 (synthesized many-error input) |
| Identifier collisions with reserved keywords | lexer `longer_alt` + contract §4.1 / §I18 | `parser.test.ts` |
§12. Summary
src/domains/rules/parser.ts exports a single function parse(input: string): ParseResult plus 11 AST node interfaces and supporting types. It uses Chevrotain 11.0.3’s EmbeddedActionsParser with recoveryEnabled: true, mirrors the EBNF in kappa-rule-engine-extraction.md §1 with stratified operator precedence, walks the post-parse AST to enforce a 10,000-node cap per rule, and reports up to 5 parse errors plus all lexer errors plus all AST-cap errors. The function never throws; all errors flow through ParseResult.errors. AST nodes are pure data — no methods, no classes — to keep the surface trivially serialisable for canonical hashing in P1.5.4.
Next step: packet (Step 3 of 5).