P1.3.4 — κ Policy Gating / Pre-guards — Execution Packet

Step 3 of the 5-step executor chain. Builds on docs/audits/p1-3-4-policy-gate-audit.md and docs/contracts/p1-3-4-policy-gate-contract.md. Sequences the implementation in Step 4. The packet gates implementation — Step 4 must not begin until Step 3 lands per CLAUDE.md §6.

§P0. Plan summary

Two new files, ~250 LOC + ~250 LOC of tests. Strict-additive: no existing source touched. Five-pass implementation:

  1. P1: Module skeleton — types, enum, helpers.
  2. P2: POLICIES table + module-init parser pass.
  3. P3: check_policy core.
  4. P4: check_all_policies short-circuit loop.
  5. P5: Test suite (10 fixtures, ~50 cases).

§P1. Module skeleton

src/domains/rules/policy-gate.ts:

/**
 * Colibri — Phase 1 κ Rule Engine — Policy Gating / Pre-guards (P1.3.4).
 *
 * P1–P13 constitutional pre-guards that run BEFORE named rule evaluation in
 * the admission flow (P1.4.1 next wave wires this in). A policy-gate denial
 * means the action never enters rule evaluation, never produces mutations,
 * and never lands in the audit trail as "considered". Policies are the
 * cheapest filter (early exit) AND legitimacy-bearing (constitutional, not
 * contingent).
 *
 * Pure module — no I/O, no DB access, no network, no env reads, no console
 * output, no clock, no RNG, no async. Reuses the P1.2.2 parser + P1.3.1
 * evaluator end-to-end (no separate evaluator).
 *
 * Public surface (per docs/contracts/p1-3-4-policy-gate-contract.md §2):
 *   - PolicyId enum (P1..P13, string-valued)
 *   - Policy interface
 *   - PolicyResult discriminated union
 *   - POLICIES table (frozen Record<PolicyId, Policy>)
 *   - check_policy(id, actor, context): PolicyResult
 *   - check_all_policies(action_name, actor, context): PolicyResult
 *
 * Canonical references:
 *   - docs/reference/extractions/kappa-rule-engine-extraction.md §9
 *   - docs/audits/p1-3-4-policy-gate-audit.md
 *   - docs/contracts/p1-3-4-policy-gate-contract.md
 *   - docs/packets/p1-3-4-policy-gate-packet.md (this)
 */

import { parse } from './parser.js';
import type { Expression } from './parser.js';
import { evaluateExpr } from './engine.js';
import type { Context } from './engine.js';

// §1. PolicyId enum
export enum PolicyId { P1 = 'P1', /* ... */ P13 = 'P13' }

// §2. Policy interface
export interface Policy { /* per contract §2.2 */ }

// §3. PolicyResult union
export type PolicyResult = { /* per contract §2.3 */ };

// §4. internal helpers (parsePredicate, makeContext, …)

// §5. POLICIES — module-init seed
export const POLICIES: Readonly<Record<PolicyId, Policy>> = /* ... */;

// §6. POLICY_ORDER — declaration-order array for deterministic iteration
const POLICY_ORDER: readonly PolicyId[] = [PolicyId.P1, /* ... */ PolicyId.P13] as const;

// §7. check_policy
export function check_policy(...): PolicyResult { /* ... */ }

// §8. check_all_policies
export function check_all_policies(...): PolicyResult { /* ... */ }

Acceptance: file compiles standalone (no test references); all types exported.
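
The shapes elided in the skeleton can be approximated from how later sections of this packet use them. The following is a minimal sketch, not the contract's text: the field names predicate_source, predicate_ast, rejection_reason and applicable_actions are taken from their uses elsewhere in this packet, while the Expression stand-in, the examplePolicy value and the describeResult helper are hypothetical.

```typescript
// Stand-in for the parser's Expression type; the real type lives in parser.ts.
type Expression = { readonly type: string };

// Assumed shape, inferred from the fields this packet references.
// Contract §2.2 remains the authoritative definition.
interface Policy {
  readonly predicate_source: string;
  readonly predicate_ast: Expression;
  readonly rejection_reason: string;
  readonly applicable_actions: readonly string[];
}

// Discriminated on `admitted`, so `reason` exists only on the rejecting branch.
type PolicyResult =
  | { readonly admitted: true }
  | { readonly admitted: false; readonly reason: string };

// Hypothetical stub entry in the permissive shape the packet describes.
const examplePolicy: Policy = {
  predicate_source: 'true',
  predicate_ast: { type: 'BoolLiteral' },
  rejection_reason: 'EXAMPLE_REJECTED',
  applicable_actions: ['*'],
};

// Hypothetical helper showing how the discriminant narrows the union.
function describeResult(result: PolicyResult): string {
  return result.admitted ? 'admitted' : `rejected: ${result.reason}`;
}
```

Because the union is discriminated on a boolean literal, a caller that forgets to check admitted cannot read reason without a compile error.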

§P2. POLICIES seed implementation

The 13 stubs use "true" predicate source. Module-init parses each via parse(...) and extracts the first rule’s first guard’s condition — but parse produces RuleNode[], and a bare "true" is not a valid rule (it expects rule NAME { guards { … } effects { … } }).

Resolution: wrap predicate sources in a synthetic rule template at parse time:

function parsePredicate(predicate_source: string): Expression {
  // The κ parser entry point parses full rules. To extract a bare expression
  // we wrap the source in a one-rule, one-guard, no-effect template, then
  // pluck the guard's condition. This synthesizes a valid RuleNode every time
  // the predicate text is itself a valid expression.
  const wrapped = `rule __policy_synthetic { guards { ${predicate_source} -> admit } effects {} }`;
  const parsed = parse(wrapped);
  if (parsed.errors.length > 0) {
    const messages = parsed.errors.map(e => `${e.kind}:${e.message}`).join('; ');
    throw new Error(
      `policy-gate module-init: predicate ${JSON.stringify(predicate_source)} failed to parse: ${messages}`,
    );
  }
  if (parsed.ast.length !== 1) {
    throw new Error(`policy-gate module-init: expected 1 rule, got ${parsed.ast.length}`);
  }
  const rule = parsed.ast[0]!;
  if (rule.guards.length !== 1) {
    throw new Error(`policy-gate module-init: expected 1 guard, got ${rule.guards.length}`);
  }
  const guard = rule.guards[0]!;
  if (guard.condition === null) {
    throw new Error(`policy-gate module-init: predicate produced an else-clause; expected condition`);
  }
  return guard.condition;
}

This synthesizes a valid κ DSL ruleset around the bare predicate, then re-extracts. It costs one parse per policy at module load — 13 parses total, all ≤ 100 chars each, well below MAX_AST_NODES_PER_RULE. The cost is paid once.

Acceptance: all 13 entries in POLICIES have valid predicate_ast; module imports without error.
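
One way the seed table could be assembled and frozen at module load is sketched below. This is an illustration under stated assumptions: parsePredicateSketch is a stand-in that only understands the literals "true" and "false" (the real helper goes through the κ parser), the `${id}_REJECTED` reasons are placeholders, and string ids replace the PolicyId enum to keep the block self-contained.

```typescript
type Expression = { readonly type: 'BoolLiteral'; readonly value: boolean };

interface Policy {
  readonly predicate_source: string;
  readonly predicate_ast: Expression;
  readonly rejection_reason: string;
  readonly applicable_actions: readonly string[];
}

const POLICY_IDS = [
  'P1', 'P2', 'P3', 'P4', 'P5', 'P6', 'P7',
  'P8', 'P9', 'P10', 'P11', 'P12', 'P13',
] as const;
type PolicyIdSketch = (typeof POLICY_IDS)[number];

// Stand-in for parsePredicate: throws at init on anything it cannot parse.
function parsePredicateSketch(source: string): Expression {
  if (source === 'true') return { type: 'BoolLiteral', value: true };
  if (source === 'false') return { type: 'BoolLiteral', value: false };
  throw new Error(
    `policy-gate module-init: predicate ${JSON.stringify(source)} failed to parse`,
  );
}

// Builds one permissive stub per id, then freezes each entry and the table.
function buildSeed(): Readonly<Record<PolicyIdSketch, Policy>> {
  const table = {} as Record<PolicyIdSketch, Policy>;
  for (const id of POLICY_IDS) {
    table[id] = Object.freeze({
      predicate_source: 'true',
      predicate_ast: parsePredicateSketch('true'),
      rejection_reason: `${id}_REJECTED`, // placeholder, not a contract value
      applicable_actions: Object.freeze(['*']),
    });
  }
  return Object.freeze(table);
}

const SEED = buildSeed();
```

Object.freeze is shallow, which is why the sketch freezes each entry and each applicable_actions array as well as the outer table.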

§P3. check_policy core

const EMPTY_BINDINGS: ReadonlyMap<string, never> = new Map();

function makeContext(
  actor: Readonly<Record<string, unknown>>,
  context: Readonly<Record<string, unknown>>,
): Context {
  return {
    event: { actor },
    state: context,
    rule_version: 'policy-gate',
    epoch: 0n,
    bindings: EMPTY_BINDINGS,
    budget: { integer_ops: 0, call_depth: 0, current_arg_count: 0 },
  };
}

export function check_policy(
  id: PolicyId,
  actor: Readonly<Record<string, unknown>>,
  context: Readonly<Record<string, unknown>>,
): PolicyResult {
  const policy = POLICIES[id];
  const ctx = makeContext(actor, context);
  let value: unknown;
  try {
    value = evaluateExpr(policy.predicate_ast, ctx);
  } catch {
    return { admitted: false, reason: 'POLICY_EVAL_ERROR' };
  }
  if (value === true) {
    return { admitted: true };
  }
  if (value === false) {
    return { admitted: false, reason: policy.rejection_reason };
  }
  return { admitted: false, reason: 'POLICY_TYPE_MISMATCH' };
}

Acceptance:

  • check_policy(PolicyId.P1, {}, {}) → {admitted: true} (predicate is "true").
  • A test policy with predicate "false" → {admitted: false, reason: <its reason>}.
  • A test policy with predicate that throws (e.g. "$undefined.x > 0") → {admitted: false, reason: 'POLICY_EVAL_ERROR'}.
  • A test policy with predicate that evaluates to a bigint (e.g. "42") → {admitted: false, reason: 'POLICY_TYPE_MISMATCH'}.
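
The four acceptance outcomes above reduce to one mapping from an evaluation outcome to a PolicyResult. The sketch below isolates that mapping; the evaluate thunk is a hypothetical stand-in for evaluateExpr(policy.predicate_ast, ctx), and toPolicyResult is not a name from the contract.

```typescript
type PolicyResult =
  | { readonly admitted: true }
  | { readonly admitted: false; readonly reason: string };

// Maps "what the evaluator did" onto the result shape:
//   threw -> POLICY_EVAL_ERROR, true -> admit,
//   false -> the policy's own reason, anything else -> POLICY_TYPE_MISMATCH.
function toPolicyResult(evaluate: () => unknown, rejectionReason: string): PolicyResult {
  let value: unknown;
  try {
    value = evaluate();
  } catch {
    // Bare catch: no detail from the thrown error reaches the result.
    return { admitted: false, reason: 'POLICY_EVAL_ERROR' };
  }
  if (value === true) return { admitted: true };
  if (value === false) return { admitted: false, reason: rejectionReason };
  return { admitted: false, reason: 'POLICY_TYPE_MISMATCH' };
}

const admitted = toPolicyResult(() => true, 'R');
const rejected = toPolicyResult(() => false, 'R');
// The engine would yield a bigint here; a plain number keeps the sketch
// independent of the compile target.
const mismatched = toPolicyResult(() => 42, 'R');
const errored = toPolicyResult(() => { throw new Error('undefined_variable'); }, 'R');
```

Strict equality against true and false is what keeps truthy non-boolean values such as 42 from being admitted.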

§P4. check_all_policies short-circuit loop

const POLICY_ORDER: readonly PolicyId[] = [
  PolicyId.P1, PolicyId.P2, PolicyId.P3, PolicyId.P4,
  PolicyId.P5, PolicyId.P6, PolicyId.P7, PolicyId.P8,
  PolicyId.P9, PolicyId.P10, PolicyId.P11, PolicyId.P12,
  PolicyId.P13,
] as const;

export function check_all_policies(
  action_name: string,
  actor: Readonly<Record<string, unknown>>,
  context: Readonly<Record<string, unknown>>,
): PolicyResult {
  for (const id of POLICY_ORDER) {
    const policy = POLICIES[id];
    const applies =
      policy.applicable_actions.indexOf('*') !== -1 ||
      policy.applicable_actions.indexOf(action_name) !== -1;
    if (!applies) {
      continue;
    }
    const result = check_policy(id, actor, context);
    if (!result.admitted) {
      return result;
    }
  }
  return { admitted: true };
}

Acceptance:

  • Iteration is in P1..P13 order (verified via spy test).
  • First failure short-circuits — fixture chains a failing P3 with admitting P4..P13; only the first 3 are evaluated.
  • Empty applicable set returns {admitted: true} (action_name not in any policy’s list AND no '*').
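
The applicability filter and the empty-applicable-set case can be exercised in isolation. In this sketch the verdict field is a hypothetical stand-in for evaluating predicate_ast, and the table and names are illustrative only.

```typescript
type PolicyResult =
  | { readonly admitted: true }
  | { readonly admitted: false; readonly reason: string };

interface SketchPolicy {
  readonly applicable_actions: readonly string[];
  readonly rejection_reason: string;
  readonly verdict: boolean; // stand-in for the predicate's evaluation result
}

// A policy applies when it lists the action by name or lists the '*' wildcard.
function applies(policy: SketchPolicy, actionName: string): boolean {
  return (
    policy.applicable_actions.indexOf('*') !== -1 ||
    policy.applicable_actions.indexOf(actionName) !== -1
  );
}

function checkAll(
  table: Readonly<Record<string, SketchPolicy>>,
  order: readonly string[],
  actionName: string,
): PolicyResult {
  for (const id of order) {
    const policy = table[id];
    // Skip ids absent from the table (sketch-only guard) and non-applicable policies.
    if (policy === undefined || !applies(policy, actionName)) continue;
    if (!policy.verdict) return { admitted: false, reason: policy.rejection_reason };
  }
  return { admitted: true };
}

const YIELD_ONLY: Readonly<Record<string, SketchPolicy>> = {
  strict: { applicable_actions: ['Yield'], rejection_reason: 'YIELD_BLOCKED', verdict: false },
  unused: { applicable_actions: [], rejection_reason: 'NEVER_SEEN', verdict: false },
};

const forYield = checkAll(YIELD_ONLY, ['strict', 'unused'], 'Yield');
const forOther = checkAll(YIELD_ONLY, ['strict', 'unused'], 'AcceptCommitment');
```

Here forOther is admitted even though both policies would reject, because neither applies to 'AcceptCommitment'.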

§P5. Test suite

src/__tests__/domains/rules/policy-gate.test.ts. ~50 cases organized as 10 fixtures (matrix from contract §3):

F1 — Module-init invariants (5 cases)

  • F1.1: All 13 PolicyIds present in POLICIES.
  • F1.2: Every Policy.predicate_ast is non-null and is an Expression node (has .type field matching one of the 8 Expression node types).
  • F1.3: Every Policy.rejection_reason is non-empty string.
  • F1.4: All 13 rejection_reason strings are distinct.
  • F1.5: POLICIES is frozen (Object.isFrozen returns true).

F2 — check_policy totality (4 cases)

  • F2.1: Each of the 13 stubs returns {admitted: true} for empty actor + context.
  • F2.2: check_policy never throws on any of {empty, populated, missing-key, deeply-nested} actors.
  • F2.3: check_policy with each PolicyId returns the exact same shape (admitted: true).
  • F2.4: check_policy is deterministic — 100 calls with same inputs return same result (sample test).

F3 — Short-circuit semantics (5 cases)

  • F3.1: Build a synthetic POLICIES override where P3 fails and P4..P13 succeed, then assert that the predicate ASTs of P4..P13 are NOT visited. Approaches considered, in order:
    • Spy on evaluateExpr through a synthetic policy table swapped in at the test level. Rejected: the contract uses the module-level POLICIES, which cannot be swapped.
    • Add a test-only injection point. Rejected: the contract forbids dynamic policy tables.
    • Count evaluations via context bindings, using a predicate such as "$counter.value > 0" to force engine evaluation of a VarRef. Rejected: context is read-only, so a pure evaluator cannot increment a counter.
    • Test short-circuit via the ordering of rejection reasons, building a second seed table inside the test from a test-only export of parsePredicate under a __test__ namespace. Rejected: that violates the contract’s clean public surface.
    • Have the test fixture set up a separate test module that builds its own table from the same primitives (parse, evaluateExpr) and runs an analogue of check_all_policies. This proves that the algorithm short-circuits, not that the module does. A supplementary test against the real module would need a failing policy (predicate "false") ahead of a non-trivial one (predicate "$count.x"), but every stub predicate is "true", so the real POLICIES short-circuit cannot be tested directly today.
    • Adopted: export a table-parameterized function that takes the table as a parameter; the public check_all_policies calls it with POLICIES as the table. This makes short-circuit testable against ad-hoc tables. It was first drafted as __internal_check_all_policies_with_table(table, action, actor, context), with the __internal_ prefix signalling that downstream callers should not use it; the updated public surface below publishes it without the prefix.

    This is the canonical approach used by other κ slices (e.g. engine.ts has evaluate and executeRuleset that take all dependencies as parameters; evaluateExpr is similarly composable). Adopt it.

    Updated public surface:

    // Public — uses the module's POLICIES table.
    export function check_all_policies(action_name, actor, context): PolicyResult;
    
    // Public — table-parameterized variant for testing AND for future runtime
    // table-replacement (admin tooling, governance change). Signature:
    export function check_all_policies_with_table(
      table: Readonly<Record<PolicyId, Policy>>,
      order: readonly PolicyId[],
      action_name: string,
      actor: Readonly<Record<string, unknown>>,
      context: Readonly<Record<string, unknown>>,
    ): PolicyResult;
    

    The original check_all_policies(...) is then a one-line delegate: return check_all_policies_with_table(POLICIES, POLICY_ORDER, action_name, actor, context);.

    Same pattern for check_policy:

    export function check_policy_with_table(
      table: Readonly<Record<PolicyId, Policy>>,
      id: PolicyId,
      actor: Readonly<Record<string, unknown>>,
      context: Readonly<Record<string, unknown>>,
    ): PolicyResult;
    

    with check_policy(id, actor, context) = check_policy_with_table(POLICIES, id, actor, context).

    This doubles the public surface but cleanly enables short-circuit testing without exposing a private __internal_ marker. The _with_table variant is also forward-useful for governance-driven policy replacement (deferred).

  • F3.1 (revised): Build a synthetic table where P_a admits, P_b rejects, and P_c acts as a sentinel. Call check_all_policies_with_table with order [a, b, c] and assert the result is P_b’s rejection. A pure evaluator offers no spy hook, so P_c’s predicate_ast is instead one that, if evaluated, would produce POLICY_TYPE_MISMATCH (e.g. an IntLiteral whose value is 42, which evaluates to a bigint rather than a boolean). If the result is P_b’s reason (not 'POLICY_TYPE_MISMATCH'), P_c’s outcome never reached the result, which is the observable evidence that the loop stopped at P_b.
  • F3.2: Three-policy table where all admit → final result is {admitted: true}.
  • F3.3: Three-policy table where the first policy rejects → result is the first’s reason; never enters the loop’s body for indices 1, 2.
  • F3.4: Three-policy table where only the last rejects → result is the last’s reason; loop runs to completion.
  • F3.5: Empty table (zero entries, empty order array) → {admitted: true}.
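
The sentinel technique from F3.1 (revised) can be sketched against a table-parameterized loop. Everything here is illustrative: the two-node Expr type and evaluateSketch stand in for the real parser and engine, and the table, ids and reasons are hypothetical.

```typescript
// Two-node stand-in AST. The engine's IntLiteral would carry a bigint;
// a number is used here so the sketch does not depend on the compile target.
type Expr =
  | { readonly type: 'BoolLiteral'; readonly value: boolean }
  | { readonly type: 'IntLiteral'; readonly value: number };

interface SketchPolicy {
  readonly predicate_ast: Expr;
  readonly rejection_reason: string;
}

type PolicyResult =
  | { readonly admitted: true }
  | { readonly admitted: false; readonly reason: string };

function evaluateSketch(expr: Expr): boolean | number {
  return expr.value;
}

function checkAllWithTable(
  table: Readonly<Record<string, SketchPolicy>>,
  order: readonly string[],
): PolicyResult {
  for (const id of order) {
    const policy = table[id];
    if (policy === undefined) continue; // sketch-only guard for unknown ids
    const value = evaluateSketch(policy.predicate_ast);
    if (value === true) continue;
    if (value === false) return { admitted: false, reason: policy.rejection_reason };
    return { admitted: false, reason: 'POLICY_TYPE_MISMATCH' };
  }
  return { admitted: true };
}

const TABLE: Readonly<Record<string, SketchPolicy>> = {
  a: { predicate_ast: { type: 'BoolLiteral', value: true }, rejection_reason: 'A_REJECTED' },
  b: { predicate_ast: { type: 'BoolLiteral', value: false }, rejection_reason: 'B_REJECTED' },
  // Sentinel: evaluating it can only produce POLICY_TYPE_MISMATCH.
  c: { predicate_ast: { type: 'IntLiteral', value: 42 }, rejection_reason: 'C_REJECTED' },
};

const sentinelAfterReject = checkAllWithTable(TABLE, ['a', 'b', 'c']);
const sentinelBeforeReject = checkAllWithTable(TABLE, ['a', 'c', 'b']);
```

Running the second order as a control is a design choice of this sketch: it confirms the sentinel really does surface when reached, so its absence in the first order is meaningful.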

F4 — applicable_actions filtering (4 cases)

  • F4.1: Synthetic policy with applicable_actions: ['Yield'] rejects when called with action_name 'Yield'; admits when called with 'AcceptCommitment'.
  • F4.2: Policy with applicable_actions: ['*'] rejects for all actions (the standard stub case shape).
  • F4.3: Policy with applicable_actions: [] is never invoked — its predicate’s failure mode would not affect the result.
  • F4.4: Mixed table with one ['*'] policy and one ['Yield'] policy — invoked count for 'AcceptCommitment' is 1, for 'Yield' is 2.

F5 — Iteration order (3 cases)

  • F5.1: POLICY_ORDER literal equals ['P1', 'P2', ..., 'P13'] (string-comparison).
  • F5.2: For a synthetic table whose every policy rejects with a unique numeric reason, calling check_all_policies_with_table(table, [P10, P2, P1], ...) returns P10’s reason (the first listed and applicable).
  • F5.3: All stubs admit, so the default check_all_policies cannot surface a 'P1_NOT_AUTHORIZED'-style reason. Instead, assert that POLICY_ORDER is exported (visibility check) and equals [P1..P13]. This requires exporting POLICY_ORDER, which §P1 and §P4 declare as a module-private const.

F6 — Stub permissive baseline (2 cases)

  • F6.1: check_all_policies('Yield', {}, {}) → {admitted: true}; confirms that upstream admission-flow tests don’t break at boot.
  • F6.2: check_all_policies('AcceptCommitment', { reputation: { execution: 0n } }, { epoch: 100n }) → {admitted: true}.

F7 — Rejection-reason fidelity (4 cases)

  • F7.1: Synthetic policy with predicate_source: "false", rejection_reason: "TEST_REASON_X" → check_policy_with_table(...) returns {admitted: false, reason: 'TEST_REASON_X'}.
  • F7.2: Same with rejection_reason: "" (empty) — implementation note: contract I3 forbids empty reasons in module init for the real POLICIES, but synthetic test tables can have any string. Empty is preserved verbatim in the result.
  • F7.3: Synthetic table with two failing policies — only the first’s reason surfaces (short-circuit confirmation).
  • F7.4: rejection_reason contains special chars ("REASON: with quotes 'and' newlines\n") — preserved verbatim.

F8 — Type-mismatch translation (3 cases)

  • F8.1: predicate "42" (parses as IntLiteral, evaluates to 42n) → {admitted: false, reason: 'POLICY_TYPE_MISMATCH'}.
  • F8.2 (dropped): predicate "\"hello\"" (StringLiteral, would evaluate to "hello"). A top-level expression cannot be a bare string in the DSL guard syntax: the parser’s effectArg rule allows StringLiteral, but the expression chain in parser.ts (expression → orExpr → andExpr → … → primary) has no StringLiteral alternative in primary, so parsePredicate('"hello"') fails at parse time. A value that is neither boolean nor bigint is therefore unreachable via the DSL surface, and F8 keeps only bigint-valued cases (the IntLiteral test plus the replacement below).
  • F8.2 (replacement): a more complex arithmetic predicate "1 + 2" → bigint result → POLICY_TYPE_MISMATCH.

F9 — Eval-error translation (5 cases)

  • F9.1: predicate "$undefined.var > 0": undefined VarRef → engine throws Error('undefined_variable:undefined.var') → {admitted: false, reason: 'POLICY_EVAL_ERROR'}.
  • F9.2: predicate "unknown_func(1)" — undefined FuncCall → engine throws → 'POLICY_EVAL_ERROR'.
  • F9.3: predicate "1 / 0" — DivisionByZeroError → 'POLICY_EVAL_ERROR'.
  • F9.4: predicate that overflows multiplication, e.g. "9223372036854775807 * 2" (max int64 * 2) — OverflowError → 'POLICY_EVAL_ERROR'.
  • F9.5: predicate that exhausts integer_ops budget — synthesize a deeply nested "$x and $x and $x and …" chain with > 10000 nodes via test-time policy_ast construction (without going through parser, since parser caps at 10000). Use a hand-built AST.
    • Easier alternative: craft a synthetic Policy whose predicate_ast is a hand-built deeply-nested LogicalOp tree with > 10000 AST nodes; check_policy_with_table catches the RuleBudgetExceeded → 'POLICY_EVAL_ERROR'.
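
A hand-built AST for F9.5 can be produced with a short loop, as sketched below. The node shapes follow the LogicalOp{op: 'and', operands: [...]} form named in this packet's risk register; visitWithBudget is a hypothetical stand-in for the engine's budget accounting (it charges one unit per node visited, which may differ from how evaluateExpr counts), and it walks iteratively so the deep left-nested tree cannot exhaust the call stack in the sketch itself.

```typescript
type Expr =
  | { type: 'BoolLiteral'; value: boolean }
  | { type: 'LogicalOp'; op: 'and'; operands: [Expr, Expr] };

const NODE_CAP = 10000; // the cap quoted in this packet

// Each link wraps the accumulator in one LogicalOp and adds one BoolLiteral,
// so a chain of `links` links has 1 + 2 * links nodes.
function buildAndChain(links: number): Expr {
  let acc: Expr = { type: 'BoolLiteral', value: true };
  for (let i = 0; i < links; i++) {
    acc = {
      type: 'LogicalOp',
      op: 'and',
      operands: [acc, { type: 'BoolLiteral', value: true }],
    };
  }
  return acc;
}

// Stand-in budget check: throws once more than `cap` nodes have been visited.
function visitWithBudget(root: Expr, cap: number): number {
  const stack: Expr[] = [root];
  let visited = 0;
  while (stack.length > 0) {
    const node = stack.pop() as Expr;
    visited += 1;
    if (visited > cap) throw new Error('RuleBudgetExceeded');
    if (node.type === 'LogicalOp') {
      stack.push(node.operands[0], node.operands[1]);
    }
  }
  return visited;
}

// Mirrors the bare catch in check_policy: any throw becomes POLICY_EVAL_ERROR.
function gateSketch(root: Expr): string {
  try {
    visitWithBudget(root, NODE_CAP);
    return 'WITHIN_BUDGET';
  } catch {
    return 'POLICY_EVAL_ERROR';
  }
}

const overBudget = gateSketch(buildAndChain(5000)); // 10001 nodes
const underBudget = gateSketch(buildAndChain(4999)); // 9999 nodes
```

If the real evaluator recurses, a left-nested chain of this depth may also be worth checking against the runtime's stack limit; that is outside what this sketch demonstrates.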

F10 — Determinism / no-leak self-scan (3 cases)

  • F10.1: inspectFunctionForbidden(check_policy) returns [].
  • F10.2: inspectFunctionForbidden(check_all_policies) returns [].
  • F10.3: Read policy-gate.ts source via fs.readFileSync in the test (engine test files do this) and assert it imports evaluateExpr from ./engine.js. This satisfies I15 (reuse, not re-implement).
    • Pattern check: verify with regex /from\s+'\.\/engine\.js'/ — must match.
    • Negative check: assert no local re-implementation by scanning for substrings characteristic of an evaluator (switch (expr.type), case 'BinaryOp', etc.).
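
The F10.3 pattern and negative checks amount to a small scan over source text. The sketch below runs on inline strings; in the real test the text would come from fs.readFileSync on policy-gate.ts. scanSource and the sample strings are hypothetical, while the regex and marker substrings are the ones listed above.

```typescript
// Pattern check from F10.3: the module must import from ./engine.js.
const ENGINE_IMPORT = /from\s+'\.\/engine\.js'/;

// Substrings characteristic of a locally re-implemented evaluator.
const EVALUATOR_MARKERS = ['switch (expr.type)', "case 'BinaryOp'"];

interface ScanReport {
  readonly importsEngine: boolean;
  readonly markersFound: readonly string[];
}

function scanSource(source: string): ScanReport {
  const markersFound: string[] = [];
  for (const marker of EVALUATOR_MARKERS) {
    if (source.indexOf(marker) !== -1) markersFound.push(marker);
  }
  return { importsEngine: ENGINE_IMPORT.test(source), markersFound };
}

// Illustrative inputs only; neither is the real module source.
const compliantSource = [
  "import { parse } from './parser.js';",
  "import { evaluateExpr } from './engine.js';",
  'export function check_policy() { /* ... */ }',
].join('\n');

const offendingSource = [
  "import { parse } from './parser.js';",
  "function localEval(expr) { switch (expr.type) { case 'BinaryOp': return 0; } }",
].join('\n');

const compliantReport = scanSource(compliantSource);
const offendingReport = scanSource(offendingSource);
```

A substring scan like this is a heuristic: it can miss a re-implementation written in a different style, so it complements the import check and does not replace review.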

Optional: F11 — Module-init failure mode (1 case)

  • F11.1: parsePredicate("rule X") (intentionally invalid DSL) throws Error with descriptive message. Confirms the throw-at-init contract behavior. Caveat: the function is not exported (per contract §6); accessing it from tests requires either exporting it under __test_only__ or restructuring. Decision: export parsePredicate for testability. Adds it to public surface in §2 of the contract — minor amendment to the contract, documented inline in policy-gate.ts.

§P6. Implementation order

  1. Write skeleton with PolicyId enum + Policy interface + PolicyResult union.
  2. Write parsePredicate helper (export it for testing per §P5 F11).
  3. Build POLICIES seed using parsePredicate.
  4. Write check_policy_with_table + check_policy.
  5. Write check_all_policies_with_table + check_all_policies.
  6. Run npm run build — should compile clean.
  7. Run npm run lint — fix any TypeScript ESLint complaints.
  8. Write the 10 test fixtures, run incrementally.
  9. Run npm test — all green.
  10. git commit per Step 4 (single commit with both source + test).

§P7. Test count budget

10 fixtures × ~5 cases avg = ~50 cases. Add ~10 cases for edge-case coverage (overflow, division-by-zero literal, frozen-table assertions). Final budget: ~60 test cases. Test count delta from 1658 → ~1718. Memory note pattern: “+~60 tests” in writeback.

§P8. Verification gate (Step 4 → Step 5 transition)

Step 4 is complete when ALL of:

  • npm run build exits 0 with no TypeScript errors.
  • npm run lint exits 0 with no ESLint errors.
  • npm test exits 0 with all 60+ new tests passing AND zero regressions in pre-existing 1658.
  • inspectFunctionForbidden(check_policy) returns [] (asserted in test).
  • inspectFunctionForbidden(check_all_policies) returns [].

Step 5 records test evidence in the verification doc. Writeback follows.

§P9. Risk register

  • Risk: Parser surprise: parsePredicate('"true"') fails because expressions can’t be bare strings.
    Likelihood: Confirmed. Bare strings ARE NOT primary AST entries (per parser.ts §5, Literals.StringLiteral is an alt only inside effectArg).
    Mitigation: The synthetic-rule wrapper sidesteps this: "true" is parsed as a guard condition (where it’s a BoolLiteral keyword via Keywords.True). Already addressed in §P2 design.
  • Risk: Engine error sub-typing leaks into POLICY_EVAL_ERROR reason.
    Likelihood: Low. The catch is a bare catch {}, so no error data leaks.
    Mitigation: Verified in §P3 implementation block.
  • Risk: Iteration order via Object.values(POLICIES) returns insertion order in modern V8, but enum-string keys may be reordered if the runtime treats them as integer-like ("P1" is non-numeric so safe; "P10" is also non-numeric).
    Likelihood: Low. Strict use of the explicit POLICY_ORDER array sidesteps this entirely.
    Mitigation: Already addressed: never iterate Object.values(POLICIES); always iterate POLICY_ORDER.
  • Risk: evaluateExpr requires non-empty bindings Map for some paths.
    Likelihood: None. Bindings are optional and resolveVarRef checks bindings.has(head) only when path.length === 1; multi-segment paths go through event/state.
    Mitigation: Verified by reading engine.ts §2 resolveVarRef.
  • Risk: Rebase pain when state-access.ts merges first.
    Likelihood: None. Disjoint files; no symbol overlap.
    Mitigation: Confirmed.
  • Risk: Test hand-built AST for budget exhaustion. The evaluateExpr cap is 10_000, and building a 10_001-node nested LogicalOp at test time may be slow.
    Likelihood: Low. Building once, evaluating once; ~50 ms expected.
    Mitigation: Use a loop to build nested LogicalOp{op: 'and', operands: [accumulator, BoolLit_true]}.
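
The key-ordering concern in the risk register can be observed directly. ECMAScript lists integer-like property keys first in ascending numeric order and then string keys in insertion order, so the sketch below contrasts the two cases; the object literals are illustrative and unrelated to the real POLICIES table.

```typescript
// Integer-like keys are reordered numerically, regardless of insertion order.
const integerLikeKeys = Object.keys({ 10: 'a', 2: 'b', 1: 'c' });

// Keys such as "P10" are not integer-like, so insertion order is preserved.
const policyLikeKeys = Object.keys({ P10: 'a', P2: 'b', P1: 'c' });

// An explicit order array makes the intended sequence independent of either rule.
const explicitOrder: readonly string[] = ['P1', 'P2', 'P10'];
```

Here integerLikeKeys is ['1', '2', '10'] while policyLikeKeys stays ['P10', 'P2', 'P1'], which is why insertion order alone would still depend on how the table literal happens to be written.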

Step 3 of 5 complete. Step 4 (implement) begins now.



Colibri — documentation-first MCP runtime. Apache 2.0 + Commons Clause.
