P1.3.1 — κ Core Evaluation Loop — Behavioral Contract

Step 2 of the 5-step executor chain. Builds on docs/audits/p1-3-1-engine-audit.md. Defines the public surface, semantics, and invariants for src/domains/rules/engine.ts.

§1. Module identity

Path: src/domains/rules/engine.ts
Axis: κ — Rule Engine (Phase 1 Wave 4)
Kind: pure synchronous module; no I/O, no DB access, no network, no env reads, no console output.
Internal dependencies:
- ./parser.js — type-only re-imports of AST node interfaces (RuleNode, Expression, etc.).
- ./integer-math.js — safe_mul, safe_div, OverflowError, DivisionByZeroError.
No imports from src/db/*, src/middleware/*, src/server.ts, or other domain folders. No Node built-ins.

§2. Public API

The module exports the following named entities:

§2.1. Constants

export const MAX_INTEGER_OPS = 10_000;  // total ops per rule (guard + effects)
export const MAX_CALL_DEPTH = 16;       // nested FuncCall frames
export const MAX_ARG_COUNT = 8;         // arity of any single FuncCall

These match the concept doc docs/3-world/physics/laws/rule-engine.md §Default budget constants.

§2.2. Category enum

export type Category =
  | 'Admission'
  | 'StateTransition'
  | 'Consequence'
  | 'Promotion';

export const CATEGORY_ORDER: readonly Category[] = [
  'Admission',
  'StateTransition',
  'Consequence',
  'Promotion',
] as const;

CATEGORY_ORDER is the deterministic execution order — load-bearing for θ consensus.

§2.3. Mutation

export interface Mutation {
  kind: 'set' | 'emit' | 'apply';
  target: string;          // dotted path or external sink name
  field: string;           // field name on the target
  old_value?: unknown;     // prior value if known (P1.4.1 fills this in)
  new_value: unknown;      // value after the mutation lands
}

A Mutation is a description, not an action. P1.4.1 (state-application layer) consumes the array and writes through to ζ / β state.

§2.4. Context + BudgetTracker

export interface BudgetTracker {
  integer_ops: number;
  call_depth: number;
  current_arg_count: number;
}

export interface Context {
  readonly event: Readonly<Record<string, unknown>>;
  readonly state: Readonly<Record<string, unknown>>;
  readonly rule_version: string;
  readonly epoch: bigint;
  readonly bindings: ReadonlyMap<string, bigint | string | boolean>;
  readonly budget: BudgetTracker;  // mutable counter; everything else readonly
}

The Context interface is read-only at the type level for event, state, rule_version, epoch, and bindings — only budget is mutated during walking (counter increments). The contract requires that the evaluator never writes to event, state, or bindings; binding extensions (e.g. let semantics in future phases) would return a new Context with a new bindings map.
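
The non-mutating extension described above might look like the sketch below. The helper name withBinding is hypothetical (the contract does not define it), the interfaces are repeated from §2.4 only so the block stands alone, and sharing the budget object between the old and new Context is an assumption.

```typescript
interface BudgetTracker {
  integer_ops: number;
  call_depth: number;
  current_arg_count: number;
}

interface Context {
  readonly event: Readonly<Record<string, unknown>>;
  readonly state: Readonly<Record<string, unknown>>;
  readonly rule_version: string;
  readonly epoch: bigint;
  readonly bindings: ReadonlyMap<string, bigint | string | boolean>;
  readonly budget: BudgetTracker;
}

// Hypothetical helper (not part of the contract): extend bindings by copying.
function withBinding(
  ctx: Context,
  name: string,
  value: bigint | string | boolean,
): Context {
  const bindings = new Map(ctx.bindings); // copy; the original map is untouched
  bindings.set(name, value);
  // Assumption: the budget object is shared so counters keep accumulating.
  return { ...ctx, bindings };
}

const base: Context = {
  event: { amount: 5n },
  state: {},
  rule_version: 'v1',
  epoch: 1n,
  bindings: new Map(),
  budget: { integer_ops: 0, call_depth: 0, current_arg_count: 0 },
};
const extended = withBinding(base, 'x', 42n);
```

Here base.bindings stays empty while extended.bindings carries x, and both contexts point at the same BudgetTracker.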

§2.5. Result types

export type RuleResult =
  | { status: 'admitted'; mutations: Mutation[] }
  | { status: 'rejected'; reason: string };

export interface TransitionResult {
  all_mutations: Mutation[];
  per_category_results: Map<Category, RuleResult[]>;
}

reason strings used by the engine are:

- 'NO_MATCH': no guard clause matched (and no else clause).
- 'budget:integer_ops' / 'budget:call_depth' / 'budget:arg_count': budget exceeded.
- 'overflow:<details>': integer-math overflow during arithmetic.
- 'div_by_zero:<details>': explicit divide-by-zero.
- 'undefined_function:<name>': FuncCall with no registered builtin (P1.3.2 will register; pre-P1.3.2, every FuncCall rejects).
- 'undefined_variable:<path>': VarRef resolves to nothing.
- 'type_mismatch:<details>': operator + operand type mismatch (e.g. + on a string).
- Any reason set by an explicit reject "..." guard clause is passed through verbatim from GuardClause.reason.

Caller code that depends on reason strings should treat the format as stable for the prefix before the colon; the post-colon detail is informational.
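
A caller that wants to branch only on the stable part of a reason could use something like the sketch below. The helper name reasonPrefix and the sample detail text are made up for illustration; engine.ts does not export such a function.

```typescript
// Hypothetical caller-side helper; not exported by engine.ts.
function reasonPrefix(reason: string): string {
  const colon = reason.indexOf(':');
  // No colon (e.g. 'NO_MATCH'): the whole string is the stable part.
  return colon === -1 ? reason : reason.slice(0, colon);
}

// The post-colon details below are invented sample text.
const samples = ['NO_MATCH', 'budget:call_depth', 'overflow:product out of range'];
const prefixes = samples.map(reasonPrefix);
// prefixes is ['NO_MATCH', 'budget', 'overflow']
```

One caveat: a verbatim reject "..." reason that happens to contain a colon is truncated by this helper too, so callers may prefer to match only against the known engine prefixes.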

§2.6. Typed errors

export class RuleBudgetExceeded extends Error {
  override readonly name = 'RuleBudgetExceeded';
  readonly which: 'integer_ops' | 'call_depth' | 'arg_count';
  readonly limit: number;
  readonly observed: number;
  constructor(
    which: 'integer_ops' | 'call_depth' | 'arg_count',
    limit: number,
    observed: number,
  );
}

The engine throws RuleBudgetExceeded from the deep walker. executeRuleset catches it at the rule boundary and converts it to a 'rejected' result with reason = 'budget:<which>'.

The engine does not export an “EngineError” union; it throws RuleBudgetExceeded for budget overruns and lets OverflowError / DivisionByZeroError from integer-math.js propagate through evaluate(...) unchanged. executeRuleset catches them at the rule boundary and returns a 'rejected' result (see §3.1 and §7). Tests can still assert on the converted reason strings.
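
As a rough illustration of how the typed error and the counter helper could fit together, consider the sketch below. The message format, the BudgetKind alias, the toRejected helper, and the exact cap semantics (the throw happens on visit MAX_INTEGER_OPS + 1) are assumptions; the contract only fixes the class fields and the resulting reason string.

```typescript
const MAX_INTEGER_OPS = 10_000; // value from §2.1

type BudgetKind = 'integer_ops' | 'call_depth' | 'arg_count';

class RuleBudgetExceeded extends Error {
  readonly which: BudgetKind;
  readonly limit: number;
  readonly observed: number;
  constructor(which: BudgetKind, limit: number, observed: number) {
    // Message text is an assumption; only the fields are contractual.
    super(`budget:${which} (limit ${limit}, observed ${observed})`);
    this.name = 'RuleBudgetExceeded';
    this.which = which;
    this.limit = limit;
    this.observed = observed;
  }
}

interface BudgetTracker {
  integer_ops: number;
  call_depth: number;
  current_arg_count: number;
}

// Assumed semantics: MAX_INTEGER_OPS visits are allowed, the next one throws.
function bumpIntegerOps(budget: BudgetTracker): void {
  budget.integer_ops += 1;
  if (budget.integer_ops > MAX_INTEGER_OPS) {
    throw new RuleBudgetExceeded('integer_ops', MAX_INTEGER_OPS, budget.integer_ops);
  }
}

// Hypothetical helper showing the conversion done at the rule boundary.
function toRejected(e: RuleBudgetExceeded): { status: 'rejected'; reason: string } {
  return { status: 'rejected', reason: `budget:${e.which}` };
}
```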

§2.7. RuleRegistry interface (consumed; not implemented in this task)

export interface CategorizedRule {
  rule: RuleNode;            // from ./parser.js
  category: Category;
}

export interface RuleRegistry {
  getAll(): readonly CategorizedRule[];
}

P1.2.4 (in-flight as a sibling slice) will implement RuleRegistry. P1.3.1 only consumes the getAll() method; any additional methods (getByName, getByTransitionType) remain optional and outside this contract’s scope.

The interface is declared by the engine, implemented by the registry — same direction as a function signature. The engine has no knowledge of how categories are derived (annotation, naming convention, explicit kind keyword); it only sees the result.
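
Until P1.2.4 lands, engine tests need some stand-in for the registry. A minimal test double might look like the following; makeStaticRegistry and the rule names are hypothetical, and RuleNode is reduced to a name field here because the real interface lives in ./parser.js.

```typescript
type Category = 'Admission' | 'StateTransition' | 'Consequence' | 'Promotion';

// Stand-in for the real RuleNode from ./parser.js; only `name` is modelled.
interface RuleNode {
  name: string;
}

interface CategorizedRule {
  rule: RuleNode;
  category: Category;
}

interface RuleRegistry {
  getAll(): readonly CategorizedRule[];
}

// Hypothetical test double: returns the same frozen snapshot on every call.
function makeStaticRegistry(rules: readonly CategorizedRule[]): RuleRegistry {
  const snapshot: readonly CategorizedRule[] = Object.freeze([...rules]);
  return { getAll: () => snapshot };
}

const registry = makeStaticRegistry([
  { rule: { name: 'cap_transfer' }, category: 'Admission' },
  { rule: { name: 'promote_tier' }, category: 'Promotion' },
]);
```

Because the engine only sees getAll(), any object of this shape satisfies the contract regardless of how categories were derived.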

§2.8. Functions

export function evaluate(rule: RuleNode, context: Context): RuleResult;

export function evaluateExpr(
  expr: Expression,
  context: Context,
): bigint | string | boolean;

export function executeRuleset(
  registry: RuleRegistry,
  event: Readonly<Record<string, unknown>>,
  state: Readonly<Record<string, unknown>>,
  rule_version: string,
  epoch: bigint,
): TransitionResult;

Plus internal helpers (not exported): evaluateGuard, collectEffectMutation, resolveVarRef, compareValues, applyBinaryArithmetic, applyBinaryComparison, bumpIntegerOps, bumpCallDepth, freshBudget.

§3. Semantics

§3.1. `evaluate(rule, context)` — per-rule evaluator

Algorithm (matches extraction §5 pseudocode):

1. For each guard in rule.guards (in declaration order):
     a. bumpIntegerOps(context.budget) — counts the guard clause itself.
     b. If guard.condition === null (else clause): match = true.
        Else: match = evaluateExpr(guard.condition, context) (must be boolean).
     c. If match:
          if guard.action === 'reject': return { status: 'rejected', reason: guard.reason ?? '' }
          else (admit): break — proceed to effects.
2. If no guard matched and we did not break out via admit:
     return { status: 'rejected', reason: 'NO_MATCH' }
3. For each effect in rule.effects:
     a. bumpIntegerOps(context.budget) — counts the effect call itself.
     b. mutation = collectEffectMutation(effect, context)
     c. mutations.push(mutation)
4. return { status: 'admitted', mutations }

Errors thrown during steps 1–3:

- RuleBudgetExceeded from bumpIntegerOps / bumpCallDepth / the arg-count check: evaluate does not catch it; it propagates up to executeRuleset.
- OverflowError / DivisionByZeroError from integer-math.js: evaluate does not catch these either; they propagate the same way. (executeRuleset is the boundary that converts these to 'rejected' results so that one rule’s overflow doesn’t blow up the whole ruleset.)
- Plain Error for 'undefined_function', 'undefined_variable', 'type_mismatch': thrown by helpers; converted at the same boundary.

evaluate is pure with respect to its inputs: it does not write to rule, context.event, context.state, or context.bindings. It only mutates context.budget (counter increments). This is the single permitted side-effect, and it is contained in a short-lived object: executeRuleset constructs a fresh BudgetTracker for each rule (see §3.3).
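
Steps 1 to 4 of the algorithm above can be sketched as follows. This is a simplified illustration, not the engine code: guard conditions are pre-evaluated booleans instead of Expression nodes, effects are already Mutation-shaped, and the cap check inside the budget bump is left out. All type names here are local stand-ins.

```typescript
interface GuardClause {
  condition: boolean | null; // pre-evaluated for brevity; null marks an else clause
  action: 'admit' | 'reject';
  reason?: string;
}

interface Mutation {
  kind: 'set' | 'emit' | 'apply';
  target: string;
  field: string;
  new_value: unknown;
}

interface Rule {
  name: string;
  guards: GuardClause[];
  effects: Mutation[];
}

type RuleResult =
  | { status: 'admitted'; mutations: Mutation[] }
  | { status: 'rejected'; reason: string };

function evaluateSketch(rule: Rule, budget: { integer_ops: number }): RuleResult {
  let admitted = false;
  for (const guard of rule.guards) {
    budget.integer_ops += 1; // step 1a (cap check omitted)
    const match = guard.condition === null ? true : guard.condition; // step 1b
    if (!match) continue;
    if (guard.action === 'reject') {
      return { status: 'rejected', reason: guard.reason ?? '' }; // step 1c, reject
    }
    admitted = true; // step 1c, admit: first match wins
    break;
  }
  if (!admitted) return { status: 'rejected', reason: 'NO_MATCH' }; // step 2
  const mutations: Mutation[] = [];
  for (const effect of rule.effects) {
    budget.integer_ops += 1; // step 3a
    mutations.push({ ...effect }); // steps 3b and 3c: describe, never apply
  }
  return { status: 'admitted', mutations }; // step 4
}
```

A rule whose second guard is a matching reject returns that reason without ever reaching a later else clause, which is the first-match-wins behavior the contract asks for.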

§3.2. `evaluateExpr(expr, context)` — recursive walker

Returns bigint | string | boolean. The result type is determined by the AST node:

AST node	Return type	Notes
`IntLiteral`	`bigint`	`node.value` directly.
`BoolLiteral`	`boolean`	`node.value` directly.
`StringLiteral`	`string`	`node.value` directly. Strings are leaves only; arithmetic / comparison helpers reject string operands with `type_mismatch`.
`VarRef`	`bigint | string | boolean`	Resolved against `context.bindings`, then `context.event`, then `context.state` (in that order). Throws if absent.
`UnaryOp` (`-`)	`bigint`	Operand must be `bigint`; throws `type_mismatch` otherwise.
`BinaryOp` (arithmetic `+`/`-`/`*`/`/`/`%`)	`bigint`	Both operands must be `bigint`; uses `safe_mul`/`safe_div`/native `+`/`-` (with overflow check via `safe_mul` for products).
`BinaryOp` (comparison `==`/`!=`)	`boolean`	Operands must be same primitive type; `==` is value equality, never reference.
`BinaryOp` (comparison `<`/`>`/`<=`/`>=`)	`boolean`	Operands must both be `bigint`; throws `type_mismatch` for non-bigint operands. (Strings have no ordering semantics in κ; booleans neither.)
`LogicalOp` (`and`, `or`)	`boolean`	Both operands must be `boolean` after evaluation. Evaluation short-circuits: if `left` of `and` is `false`, or `left` of `or` is `true`, `right` is not evaluated. Determinism is preserved because every arbiter skips the same subtree and the budget for a skipped subtree is not consumed (matching what real arbiters would do).
`LogicalOp` (`not`)	`boolean`	Single operand; must evaluate to `boolean`.
`FuncCall`	`bigint | string | boolean`	Throws `'undefined_function:<name>'` in P1.3.1 (no built-ins registered until P1.3.2). Args evaluated left-to-right; `args.length` checked against `MAX_ARG_COUNT` before any arg is evaluated. `bumpCallDepth` increments before recursion into args; decrements after.
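
The VarRef lookup order from the table (bindings, then event, then state) could be implemented along these lines. This is a sketch under two assumptions that the contract does not settle: the path is treated as a flat key (dotted-path traversal is not modelled), and a value that is not a bigint, string, or boolean is treated as absent. The names Scopes, isValue, and resolveVarRefSketch are local to the example.

```typescript
type Value = bigint | string | boolean;

interface Scopes {
  bindings: ReadonlyMap<string, Value>;
  event: Readonly<Record<string, unknown>>;
  state: Readonly<Record<string, unknown>>;
}

function isValue(v: unknown): v is Value {
  return typeof v === 'bigint' || typeof v === 'string' || typeof v === 'boolean';
}

// Assumption: `path` is a flat key; non-primitive values count as absent.
function resolveVarRefSketch(path: string, scopes: Scopes): Value {
  const bound = scopes.bindings.get(path);
  if (bound !== undefined) return bound; // 1. bindings
  for (const scope of [scopes.event, scopes.state]) { // 2. event, 3. state
    const candidate = Object.prototype.hasOwnProperty.call(scope, path)
      ? scope[path]
      : undefined;
    if (isValue(candidate)) return candidate;
  }
  throw new Error(`undefined_variable:${path}`);
}

const scopes: Scopes = {
  bindings: new Map<string, Value>([['x', 1n]]),
  event: { x: 2n, y: 'from_event' },
  state: { x: 3n, y: 'from_state', z: true },
};
// 'x' resolves from bindings, 'y' from event, 'z' from state.
```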

Every entry into evaluateExpr increments context.budget.integer_ops, then checks the cap. This is a per-node visit count, independent of whether the node is arithmetic. (The “integer ops” name is a legacy label; in practice it counts AST node visits, which is what the upstream pseudocode’s node_budget counts.)
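
To make the visit counting concrete, here is a reduced walker over just two node kinds. It is an illustrative sketch: the Expr shape is a stand-in for the parser's AST, the cap check is omitted, and evalBool is a hypothetical name. It shows that a short-circuited operand costs no budget.

```typescript
// Reduced AST: just enough nodes to show visit counting and short-circuiting.
type Expr =
  | { kind: 'BoolLiteral'; value: boolean }
  | { kind: 'LogicalOp'; op: 'and' | 'or'; left: Expr; right: Expr };

interface Budget {
  integer_ops: number;
}

function evalBool(expr: Expr, budget: Budget): boolean {
  budget.integer_ops += 1; // one bump per node entered (cap check omitted here)
  if (expr.kind === 'BoolLiteral') return expr.value;
  const left = evalBool(expr.left, budget);
  if (expr.op === 'and') {
    return left ? evalBool(expr.right, budget) : false; // right skipped when left is false
  }
  return left ? true : evalBool(expr.right, budget); // right skipped when left is true
}

const T: Expr = { kind: 'BoolLiteral', value: true };
const F: Expr = { kind: 'BoolLiteral', value: false };

const skipped: Budget = { integer_ops: 0 };
evalBool({ kind: 'LogicalOp', op: 'and', left: F, right: T }, skipped);
// skipped.integer_ops is 2: the LogicalOp node and its left literal

const full: Budget = { integer_ops: 0 };
evalBool({ kind: 'LogicalOp', op: 'and', left: T, right: T }, full);
// full.integer_ops is 3: both operands were visited
```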

§3.3. `executeRuleset(registry, event, state, rule_version, epoch)` — orchestrator

1. budget0 = freshBudget()  — fresh tracker for the entire executeRuleset call.
2. allRules = registry.getAll()
3. groups: Map<Category, RuleNode[]> = group by category
4. For each category in CATEGORY_ORDER (Admission → StateTransition → Consequence → Promotion):
     a. rules = groups.get(category) ?? []
     b. Sort rules by rule.name with locale-independent ASCII compare.
     c. For each rule in sorted rules:
          i.   ctx = Context with fresh BudgetTracker (per-rule reset, matching extraction §5)
          ii.  try { result = evaluate(rule, ctx) }
               catch (RuleBudgetExceeded e) { result = { status: 'rejected', reason: 'budget:' + e.which } }
               catch (OverflowError e) { result = { status: 'rejected', reason: 'overflow:' + e.message } }
               catch (DivisionByZeroError e) { result = { status: 'rejected', reason: 'div_by_zero:' + e.message } }
               catch (Error e) { result = { status: 'rejected', reason: e.message } }
          iii. per_category_results.get(category).push(result)
          iv.  if (result.status === 'admitted') all_mutations.push(...result.mutations)
5. return { all_mutations, per_category_results }

Per-rule budget reset matches extraction §5’s node_budget = 0 line at the start of execute_rule. The contract chooses this over a per-executeRuleset shared budget because (a) extraction §5 says so, (b) it gives each rule a fair shot regardless of how many sibling rules ran before it, and (c) it matches the concept doc’s “per rule” wording in §Default budget constants.

The budget0 from step 1 is intentionally unused; it is kept as a placeholder anchor for any future ruleset-wide budget that Phase 2+ may add. (Removing it now and re-introducing it later would be a contract change.) Update during impl: prefer not to allocate budget0 if it remains unused, to keep the engine surface minimal. The packet will document the final decision.
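
The typed catch clauses in step 4.c.ii are pseudocode; in TypeScript the same boundary is usually written as one catch with instanceof checks. The sketch below shows one way to do that. The error classes are local stand-ins for the ones in engine.ts and ./integer-math.js, runAtRuleBoundary is a hypothetical name, and rethrowing non-Error throwables is an assumption.

```typescript
type RuleResult =
  | { status: 'admitted'; mutations: unknown[] }
  | { status: 'rejected'; reason: string };

// Stand-ins: the real classes live in engine.ts and ./integer-math.js.
class RuleBudgetExceeded extends Error {
  readonly which: string;
  constructor(which: string) {
    super(`budget:${which}`);
    this.which = which;
  }
}
class OverflowError extends Error {}
class DivisionByZeroError extends Error {}

function runAtRuleBoundary(run: () => RuleResult): RuleResult {
  try {
    return run();
  } catch (e) {
    // Most specific checks first; the plain-Error branch must come last.
    if (e instanceof RuleBudgetExceeded) {
      return { status: 'rejected', reason: 'budget:' + e.which };
    }
    if (e instanceof OverflowError) {
      return { status: 'rejected', reason: 'overflow:' + e.message };
    }
    if (e instanceof DivisionByZeroError) {
      return { status: 'rejected', reason: 'div_by_zero:' + e.message };
    }
    if (e instanceof Error) {
      return { status: 'rejected', reason: e.message };
    }
    throw e; // assumption: non-Error throwables are outside the contract
  }
}
```

Ordering matters here: every typed error is also an Error, so checking the base class first would collapse all reasons into the plain-message branch.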

§3.4. Determinism contract (load-bearing)

Two arbiters with the same (rule_version, registry, event, state, epoch) must produce bit-identical TransitionResult.all_mutations:

1. CATEGORY_ORDER is a constant.
2. Within each category, rules are sorted by rule.name using the comparator a < b ? -1 : a > b ? 1 : 0. This compares strings by UTF-16 code unit (plain ASCII order for ASCII names) and is stable across JS engines. String.prototype.localeCompare(other, 'en', { sensitivity: 'variant' }) must not be used: it is locale-dependent.
3. The evaluate walker visits nodes in the order they appear in the AST (which itself is deterministic from the parser).
4. Map/Set iteration order is insertion order, as guaranteed by the ECMA-262 spec (not a V8-specific behavior); the engine must not rely on Object.keys over a record-style object for ordering.
5. Short-circuit eval for and/or is deterministic given (a) deterministic operand order and (b) deterministic operand values; both hold by construction.

The inspectFunctionForbidden self-scan from determinism.ts will be applied to evaluate in the test suite; it must report zero hits.
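
The ordering rules in this section can be illustrated with a small sketch that flattens a rule list into execution order. The helper names compareNames and executionOrder, the Named shape, and the rule names are hypothetical; only CATEGORY_ORDER and the comparator expression come from the contract.

```typescript
type Category = 'Admission' | 'StateTransition' | 'Consequence' | 'Promotion';

const CATEGORY_ORDER: readonly Category[] = [
  'Admission',
  'StateTransition',
  'Consequence',
  'Promotion',
];

interface Named {
  name: string;
  category: Category;
}

// Code-unit comparison; no locale tables involved.
function compareNames(a: string, b: string): number {
  return a < b ? -1 : a > b ? 1 : 0;
}

// Hypothetical helper: flatten rules into their execution order.
function executionOrder(rules: readonly Named[]): string[] {
  const out: string[] = [];
  for (const category of CATEGORY_ORDER) {
    const inCategory = rules.filter((r) => r.category === category); // new array, input untouched
    inCategory.sort((x, y) => compareNames(x.name, y.name));
    for (const r of inCategory) out.push(r.name);
  }
  return out;
}

const order = executionOrder([
  { name: 'a_rule', category: 'Promotion' },
  { name: 'c_rule', category: 'Admission' },
  { name: 'a_check', category: 'Admission' },
  { name: 'B_check', category: 'Admission' },
]);
// order is ['B_check', 'a_check', 'c_rule', 'a_rule']
```

The uppercase name sorts first within Admission because 'B' (code unit 66) is below every lowercase letter, which is exactly where a locale-aware comparison would disagree.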

§4. Invariants

#	Statement	Enforcement
I1	Pure module — no I/O, no DB, no network, no env reads, no console writes	Code review + determinism harness self-scan in tests.
I2	Pure functions — no writes to `rule`, `context.event`, `context.state`, `context.bindings`	Frozen-input test (fixture F5): pass state and event frozen with `Object.freeze`; if the engine writes, JS throws a TypeError in strict mode (ES modules, and therefore TS modules, are strict by default).
I3	Determinism — same inputs ⇒ bit-identical outputs across N runs	Exposed via `assertDeterministic(evaluate, [rule, ctx], { iterations: 10 })` in tests.
I4	All bigint arithmetic uses `integer-math.js` for products; never bare `*` on bigints when overflow is plausible	Code review; tests assert `OverflowError` propagates correctly.
I5	Budget caps fire as `RuleBudgetExceeded` with the right `which`	Per-cap test; `which` value asserted.
I6	First-match-wins guard order	F2 fixture asserts.
I7	Category execution order = `CATEGORY_ORDER`	F3 fixture asserts.
I8	Alpha sort within category is locale-independent	Test with names `a`, `B`, `c` ⇒ ASCII sort = `B`, `a`, `c` (codepoint 66 < 97 < 99); a locale-aware sort would instead yield `a`, `B`, `c`, so the fixture distinguishes the two.
I9	Mutations collected, never applied	F5 fixture; engine test never re-reads state after evaluate.
I10	`evaluate(rule, ctx)` mutates only `ctx.budget`	Direct test: snapshot ctx properties before, deep-equal after.
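
The frozen-input enforcement for I2 relies on a write to a frozen object throwing in strict mode. The sketch below demonstrates that mechanism with a deliberately impure stand-in function; impureWrite and the balance field are hypothetical, and the function-level 'use strict' directive is there only so the example behaves the same outside ES modules.

```typescript
// A deliberately impure function, standing in for a buggy engine.
function impureWrite(state: Record<string, unknown>): void {
  'use strict'; // function-level directive; redundant inside an ES module
  state['balance'] = 0n; // throws TypeError when `state` is frozen
}

const frozenState: Record<string, unknown> = Object.freeze({ balance: 10n });

let threwTypeError = false;
try {
  impureWrite(frozenState);
} catch (e) {
  threwTypeError = e instanceof TypeError;
}
// threwTypeError is true and frozenState['balance'] is still 10n
```

Note that Object.freeze is shallow: nested objects inside state or event would need to be frozen recursively for the test to catch deep writes.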

§5. Forbiddens (axiomatic)

#	Forbidden	Why
F1	`Math.*`	Determinism (`Math.random`); float semantics.
F2	`Date.*`, `new Date()`	Clock reads break consensus.
F3	`setTimeout` / `setInterval` / `setImmediate`	Async timers; non-deterministic ordering.
F4	`await` / `async function`	Engine is sync; async breaks budget tracking and determinism.
F5	`crypto.*`, `process.hrtime`, `process.nextTick`	Randomness, clock reads, and scheduler-dependent ordering; same rationale as F1 through F3.
F6	Float literals (e.g. `3.14`)	Integer-only. The determinism scanner enforces this.
F7	`JSON.parse` on user input	Engine doesn’t parse text — parser does.
F8	`Object.assign(state, ...)` / `state.foo = bar` etc.	Purity invariant.
F9	Bare `*` on two bigints (where product may exceed int64)	Use `safe_mul`.
F10	Throw plain `Error` for budget overruns	Use `RuleBudgetExceeded`.
F11	Mutate the AST passed in	Tree is shared across many evaluations.
F12	Sort with `localeCompare`	Locale-dependent; use ASCII compare.

§6. Out-of-scope (deferred)

Item	Owner
Built-in function bodies (`min`, `max`, `isqrt`, `bps_mul`, etc.)	P1.3.2
Effect application (writing mutations through to state)	P1.4.1
Conflict detection across mutations	P1.4.1
Merkle proof generation over `(state_root, new_state_root, mutations)`	P1.4.1 / η
Rule classification (deciding which rule is `Admission` vs `Promotion`)	P1.2.4
Specificity-based ordering (extraction §6 says alpha; concept doc says guard-term-count)	Engine uses alpha within category per extraction §6; specificity is registry-level concern (P1.2.4).
Validation that `RuleNode` is well-formed (`StringLiteral` not in arithmetic position, `VarRef` paths exist)	P1.2.3 (validator)
`audit_session_start` / `thought_record` integration	Out of κ scope.

§7. Failure & error model

Source	Behavior
Budget cap exceeded inside `evaluate`	Throws `RuleBudgetExceeded` (typed). `executeRuleset` catches and converts to `{ rejected, reason: 'budget:<which>' }`.
Integer-math overflow during evaluation	Throws `OverflowError` from `safe_mul` / `safe_div`. `executeRuleset` catches and converts to `{ rejected, reason: 'overflow:...' }`.
Divide-by-zero	Throws `DivisionByZeroError`. `executeRuleset` catches → `'div_by_zero:...'`.
`VarRef` resolves to undefined	Throws plain `Error` with message `'undefined_variable:<path>'`. `executeRuleset` catches → reason is the message.
`FuncCall` with unknown name	Throws plain `Error` with message `'undefined_function:<name>'`. Same conversion.
Type mismatch on operator	Throws plain `Error` with message `'type_mismatch:<details>'`. Same conversion.
Explicit `reject "..."` guard	Returns `{ rejected, reason: <verbatim from rule> }` directly from `evaluate`; no exception.

evaluate itself does not catch errors — it lets typed errors propagate; the catch boundary is executeRuleset. This makes evaluate testable in isolation: tests can assert expect(() => evaluate(rule, ctx)).toThrow(RuleBudgetExceeded) for budget cases, and expect(evaluate(...)).toEqual(...) for non-throw cases.

§8. Acceptance criteria → contract clauses

Acceptance criterion (dispatch packet)	Contract clause
Recursive AST walker with immutable context	§2.4 (Context readonly types), §3.2 (walker returns), I2
Execution order: Admission → StateTransition → Consequence → Promotion	§2.2 (`CATEGORY_ORDER`), §3.3 step 4, I7
Within each category: alphabetical by rule name (stable)	§3.3 step 4b, §3.4 point 2, I8
First-match-wins guard evaluation	§3.1 step 1, I6
Mutations collected, not applied during evaluate	§3.1 step 3, I9
`MAX_INTEGER_OPS=10000` → `RuleBudgetExceeded("integer_ops")`	§2.1, §2.6, §3.2 walk-bumps, §3.3 catch boundary
`MAX_CALL_DEPTH=16` → `RuleBudgetExceeded("call_depth")`	Same as above; check fires before recursion into FuncCall
`MAX_ARG_COUNT=8` → `RuleBudgetExceeded("arg_count")`	Same; check fires before evaluating any FuncCall arg
`evaluate` is pure	I2 + frozen-input test fixture F5
`npm run build && npm run lint && npm test` ALL THREE green	Verification doc
No regressions on 1467-test baseline	Verification doc baseline check

Ready for Step 3 (Packet). Signed-off contract for engine.ts. Imports declared, types pinned, semantics specified, error model decided, determinism load-bearing.