Integrity Monitor (μ)

μ is the advisory layer. It watches for patterns that the deterministic engine cannot see — drift between surfaces, subtly circular decisions, agents being trapped into coercive choice sets. μ never mutates state. Its output is signal, not action: reports that a human operator or the π governance process can respond to.

Phase 0 reality: μ is deferred to Phase 4. No μ code runs in Phase 0. The spec below exists so the schema can carry the advisory records μ will eventually write.

Authoritative spec: ../../../spec/s14-integrity-monitor.md.

The three detection classes

μ monitors for three kinds of dysfunction. Each has a different signature, a different false-positive profile, and a different recommended operator response.

1. Circular logic

A decision chain whose justification loops: record A cites record B, which cites record C, which cites record A. The chain is formally valid (every hash matches) but the reasoning is empty. Detection is a graph walk over the thought_records table looking for cycles in the “cites” relation.

False-positive profile: low, because true citation cycles are rare. When they appear, they often indicate an agent defending a pre-committed position by selective re-reading of its own prior reflections.

Algorithm (DFS cycle detection):

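fn find_cycles(records):
    # Edges come from parent_hash and from every entry in refs[].
    # The graph may contain cycles, so it is not assumed to be a DAG.
    graph = build_graph(records, edge_sources = ["parent_hash", "refs[]"])
    cycles = []
    visited = {}    # shared across start nodes so each node is explored once
    for start in graph.nodes:
        if visited.get(start) != DONE:
            dfs(start, graph, visited, [], cycles)
    return cycles

fn dfs(node, graph, visited, path, cycles):
    if visited.get(node) == IN_PROGRESS:
        # node is already on the current path: the slice from node onward is a cycle
        cycles.append(path[path.index(node):] + [node])
        return
    if visited.get(node) == DONE:
        return
    visited[node] = IN_PROGRESS
    path.append(node)
    for successor in graph.edges_from(node):
        dfs(successor, graph, visited, path, cycles)
    path.pop()
    visited[node] = DONE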

Cross-rule cycles (rule A depends on rule B which depends on rule A via different parameters) are covered by the same algorithm with rule-dependency edges added. Threshold: any cycle → immediate advisory with severity=HIGH. Measured false-positive rate <1% on the existing spec-phase test corpus.
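
As an illustration, the cycle search can be written as a small runnable Python sketch. It assumes the "cites" relation has already been loaded into an adjacency mapping; the function name and the `citations` data are hypothetical and not taken from the spec.

```python
from typing import Dict, Hashable, List

WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on the current DFS path, fully explored


def find_cycles(edges: Dict[Hashable, List[Hashable]]) -> List[List[Hashable]]:
    """Return one cycle per back edge found in a directed 'cites' graph."""
    colour: Dict[Hashable, int] = {}
    path: List[Hashable] = []
    cycles: List[List[Hashable]] = []

    def dfs(node: Hashable) -> None:
        colour[node] = GREY
        path.append(node)
        for nxt in edges.get(node, []):
            state = colour.get(nxt, WHITE)
            if state == GREY:
                # Back edge: nxt is on the current path, so the slice is a cycle.
                cycles.append(path[path.index(nxt):] + [nxt])
            elif state == WHITE:
                dfs(nxt)
        path.pop()
        colour[node] = BLACK

    for start in list(edges):
        if colour.get(start, WHITE) == WHITE:
            dfs(start)
    return cycles


# Hypothetical data: A cites B, B cites C, C cites A; D cites A but is not in a cycle.
citations = {"A": ["B"], "B": ["C"], "C": ["A"], "D": ["A"]}
print(find_cycles(citations))  # [['A', 'B', 'C', 'A']]
```

The recursive form is fine for a sketch; a very long citation chain would need an explicit stack to stay under Python's recursion limit.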

2. Coercion trap

A choice set presented to an agent where every option produces the same outcome, or where the “safe” options have been silently filtered before the agent sees them. Detection requires comparing the options recorded in a decision-type record against the options available at the time (reconstructed from the audit trail). A trap is a choice that wasn’t.

False-positive profile: medium. Legitimate filtering (e.g. skill-tier permission gates) can look like coercion. μ reports a suspected trap as an advisory, not an alarm.

Algorithm (option-set enumeration):

fn detect_coercion(decision_record):
    presented = set(decision_record.options)             # what the agent saw
    available = set(enumerate_available_actions(         # what κ would have admitted
        agent = decision_record.actor,
        context = decision_record.context,
    ))
    # Simulate each available option through the rule engine
    outcomes = {opt: simulate_effects(opt, decision_record.context) for opt in available}

    negative  = {opt for opt, eff in outcomes.items() if eff.reputation_delta < 0}
    obligates = {opt for opt, eff in outcomes.items() if eff.obligation_beyond_capacity}

    # Set equality: the harmful subset covers every available option
    if negative == available or obligates == available:
        emit ADVISORY(
            check="coercion_trap",
            severity="HIGH",
            evidence=[presented, available, outcomes],
        )

If every available option produces negative reputation OR every option obligates the agent beyond current capacity, μ flags the situation. Advisory only — cannot veto the decision. The signal exists so π governance can remediate the context that produced the trap.
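
The same check can be sketched in runnable Python. Everything here is an assumption for illustration: the `Effect` type, the `simulate` callback standing in for the rule-engine simulation, the decision to return nothing for an empty option set, and the `withheld` field, which records options that were available but not presented.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, Optional


@dataclass(frozen=True)
class Effect:
    """Hypothetical simulation result; field names follow the pseudocode."""
    reputation_delta: int
    obligation_beyond_capacity: bool


def detect_coercion(
    presented: Iterable[str],
    available: Iterable[str],
    simulate: Callable[[str], Effect],
) -> Optional[Dict[str, object]]:
    """Return an advisory dict if every available option is harmful, else None."""
    available_set = set(available)
    if not available_set:
        return None  # assumption: an empty option set is handled elsewhere
    outcomes = {opt: simulate(opt) for opt in available_set}
    all_negative = all(e.reputation_delta < 0 for e in outcomes.values())
    all_obligating = all(e.obligation_beyond_capacity for e in outcomes.values())
    if not (all_negative or all_obligating):
        return None
    return {
        "check": "coercion_trap",
        "severity": "HIGH",
        # Options that were available but never shown to the agent.
        "withheld": sorted(available_set - set(presented)),
    }


# Hypothetical outcomes: every option costs reputation, and "defer" was never shown.
effects = {
    "accept": Effect(-5, False),
    "decline": Effect(-2, False),
    "defer": Effect(-1, True),
}
advisory = detect_coercion(["accept", "decline"], effects, effects.__getitem__)
print(advisory)
# {'check': 'coercion_trap', 'severity': 'HIGH', 'withheld': ['defer']}
```

The function only returns a report; nothing in it can veto or alter the decision.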

3. Axiom drift

A pattern of decisions that each pass the κ rule engine individually but whose aggregate effect violates one or more constitutional axioms (AX-01 through AX-07 in ../constitution.md). Detection requires an ongoing statistical view, not a point check. Example: an arbiter whose every individual denial passes κ but who, over a month, denies every proposal from a particular class. No single κ evaluation would catch that bias.

False-positive profile: high in the absence of long time-series data. μ’s axiom-drift detector operates over epochs, not rounds.
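
To make the aggregate view concrete, here is a minimal Python sketch of the arbiter example: it tallies decisions per proposer class over one epoch and reports classes whose proposals were all denied. The function name, the `(class, denied)` record shape, and the `min_samples` guard against small-sample false positives are assumptions for illustration, not part of the spec.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def always_denied_classes(
    decisions: Iterable[Tuple[str, bool]],
    min_samples: int = 5,
) -> List[str]:
    """Return classes with at least min_samples decisions, all of them denials."""
    total: Counter = Counter()
    denied: Counter = Counter()
    for proposer_class, was_denied in decisions:
        total[proposer_class] += 1
        denied[proposer_class] += int(was_denied)
    return sorted(c for c, n in total.items() if n >= min_samples and denied[c] == n)


# Hypothetical epoch: (proposer class, was the proposal denied?)
epoch = (
    [("builder", False)] * 4
    + [("builder", True)] * 2
    + [("auditor", True)] * 6
    + [("scribe", True)] * 2   # too few samples to report
)
print(always_denied_classes(epoch))  # ['auditor']
```

The sample-size floor reflects the point about time-series length: with only two decisions, "scribe" is not reported even though both were denials.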

Algorithm (sliding window):

fn check_axiom_drift(domain, now):
    # AX-invariant check runs first so it applies regardless of cumulative position
    for proposal in staged_proposals(domain):
        if any(proposal.would_reduce_invariant(ax) for ax in [AX_01 .. AX_07]):
            emit ADVISORY(check="axiom_regression", severity="HIGH")
            return HARD_BLOCK(proposal)

    window    = 6_months
    changes   = parameter_changes(domain, since=now - window)
    magnitude = sum(abs(c.delta_bps) for c in changes)

    # AX-06-derived cap: cumulative change must stay within 1000 bps (10%).
    # The cap is tested first so that only one advisory is emitted per check.
    if magnitude >= 1000:  # 10% = cap
        emit ADVISORY(check="axiom_drift", severity="HIGH", domain=domain)
        return BLOCK_NEW_PROPOSALS(domain)
    if magnitude >= 800:   # 8% warning threshold
        emit ADVISORY(check="axiom_drift", severity="MED", domain=domain)

WARN at 8% cumulative change; HARD BLOCK at 10%. A proposal that would reduce any AX invariant is hard-blocked at governance intake (pre-ENACTED, see governance.md) regardless of cumulative position.
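
The threshold arithmetic is easy to misread because changes accumulate by absolute value: a +300 bps change followed by a -250 bps change counts as 550 bps of drift, not 50. The Python sketch below works through it. It assumes integer logical timestamps and an integer window length; the function names, the neutral labels `ok` / `warn` / `cap_reached`, and the sample data are illustrative only.

```python
from typing import Iterable, Tuple

WARN_BPS = 800   # 8% warning threshold
CAP_BPS = 1000   # 10% cap


def cumulative_drift_bps(changes: Iterable[Tuple[int, int]], now: int, window: int) -> int:
    """Sum |delta_bps| for changes whose timestamp lies in [now - window, now]."""
    return sum(abs(delta) for ts, delta in changes if now - window <= ts <= now)


def classify(magnitude_bps: int) -> str:
    """Map a cumulative magnitude to a label; the cap is tested before the warning."""
    if magnitude_bps >= CAP_BPS:
        return "cap_reached"
    if magnitude_bps >= WARN_BPS:
        return "warn"
    return "ok"


# (logical timestamp, delta in basis points); values are illustrative only
changes = [(10, 300), (40, -250), (70, 300), (95, 150)]
for now in (80, 100, 200):
    total = cumulative_drift_bps(changes, now=now, window=100)
    print(now, total, classify(total))
# 80 850 warn
# 100 1000 cap_reached
# 200 0 ok
```

At `now=200` every change has slid out of the window, so the cumulative position returns to zero.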

Three advisory roles

μ’s output is consumed by three roles, each with a different authority level:

| Role | Authority | Response |
|------|-----------|----------|
| Translator | read-only | Summarize advisory reports for a human operator; no recommendations of its own |
| Sentinel | read-only | Flag advisory reports that meet a severity threshold; may escalate to π |
| Guide | read-only | Suggest corrective actions for human review; the human decides whether to act |

All three are read-only. μ does not have a “mutator” role. An advisory report that prescribes action is still a report; execution requires a separate π governance proposal or a T0-human authorization.
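
As a small illustration of what "read-only" means in practice, a Sentinel-style filter can be sketched as a pure function over advisory records. The severity labels come from the advisory record schema; the function name, the default threshold, and the sample records are assumptions.

```python
from typing import Dict, List

# Severity labels as listed in the advisory record schema, ranked low to high.
SEVERITY_RANK = {"LOW": 0, "MED": 1, "HIGH": 2}


def sentinel_flags(
    advisories: List[Dict[str, str]],
    threshold: str = "HIGH",
) -> List[Dict[str, str]]:
    """Select advisories at or above the threshold; the input is never modified."""
    floor = SEVERITY_RANK[threshold]
    return [a for a in advisories if SEVERITY_RANK[a["severity"]] >= floor]


# Hypothetical advisory records
reports = [
    {"check": "axiom_drift", "severity": "MED"},
    {"check": "circular_logic", "severity": "HIGH"},
    {"check": "coercion_trap", "severity": "LOW"},
]
print([a["check"] for a in sentinel_flags(reports)])         # ['circular_logic']
print([a["check"] for a in sentinel_flags(reports, "MED")])  # ['axiom_drift', 'circular_logic']
```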

Advisory record schema

Every μ output is a structured record:

{
  "role": "Translator" | "Sentinel" | "Guide",
  "check": "circular_logic" | "coercion_trap" | "axiom_drift" | "axiom_regression",
  "result": "PASS" | "WARN" | "BLOCK",
  "severity": "LOW" | "MED" | "HIGH",
  "evidence": [ <references to records/events/rules> ],
  "recommendation": "<free-form human-readable>",
  "decision_hash": "SHA-256(role || check || canonical(input) || result)",
  "timestamp_logical": <uint64>
}

The decision_hash is the deduplication key; identical inputs produce identical advisories and are collapsed on write into mcp_advisories.
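
A minimal Python sketch of the deduplication key follows. The canonical form is not defined in this section, so the sketch assumes JSON with sorted keys and compact separators, UTF-8 encoding, and plain byte concatenation for `||`. The `write_advisory` helper and the in-memory `store` are hypothetical stand-ins for the table write.

```python
import hashlib
import json
from typing import Any, Dict


def canonical(value: Any) -> bytes:
    """Assumed canonical form: JSON with sorted keys and no extra whitespace."""
    text = json.dumps(value, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return text.encode("utf-8")


def decision_hash(role: str, check: str, input_value: Any, result: str) -> str:
    """SHA-256 over role || check || canonical(input) || result, hex-encoded."""
    payload = (
        role.encode("utf-8")
        + check.encode("utf-8")
        + canonical(input_value)
        + result.encode("utf-8")
    )
    return hashlib.sha256(payload).hexdigest()


def write_advisory(store: Dict[str, dict], advisory: dict, input_value: Any) -> str:
    """Insert the advisory unless an identical one is already stored."""
    key = decision_hash(advisory["role"], advisory["check"], input_value, advisory["result"])
    store.setdefault(key, {**advisory, "decision_hash": key})
    return key


store: Dict[str, dict] = {}
adv = {"role": "Sentinel", "check": "circular_logic", "result": "WARN", "severity": "HIGH"}
k1 = write_advisory(store, adv, {"records": ["A", "B"], "epoch": 3})
k2 = write_advisory(store, adv, {"epoch": 3, "records": ["A", "B"]})  # same input, different key order
print(k1 == k2, len(store))  # True 1
```

Sorting keys before hashing is what lets two structurally identical inputs collapse to one record regardless of the order in which their fields were produced.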

Escalation mapping

| Result | Action |
|--------|--------|
| PASS | Logged to ζ at thought_type=advisory; no further effect |
| WARN | Logged; surfaced in the operator console; no rule change |
| BLOCK | Denies the proposal at π intake (pre-ENACTED) |
| HARD BLOCK | Denies at α’s tool-lock admission (s10); downstream κ evaluation never runs |

A BLOCK is recoverable — the proposer can amend and re-submit. A HARD BLOCK requires a governance path to clear, usually because the proposal would have retroactively violated an axiom.

What μ is not

  • Not an enforcer. μ never denies a tool call, never closes a session, never penalizes an agent. The actual enforcement happens at α (gating), κ (rules), and π (governance).
  • Not a replacement for verification. β’s VERIFY state checks whether a task’s claimed output matches its acceptance criteria. μ watches for meta-patterns that VERIFY cannot see.
  • Not real-time. μ operates on accumulated records, often over epochs. It is built to be precise, not fast.

Phase 0 posture

  • No μ code in Phase 0; no advisors active.
  • Schema for mcp_advisories table exists as a stub (role, check, result, severity, decision_hash) but is never written to.
  • No integration with the Phase 0 tool surface. The 19 Phase 0 tools (ADR-004) do not include any integrity_* call.
  • First real μ activation target: R151+ (Phase 4) per ../../../5-time/roadmap.md.

Phase 4 scope (future)

Phase 4 brings μ online. Expected additions:

  • Three detection jobs, each on an independent schedule, writing to an integrity_advisories table.
  • A colibri-integrity-monitor skill (already extracted as a heritage SKILL.md in .agents/skills/) updated to Phase 4 semantics.
  • Optional: a -$$- DSL for declaring advisory queries. The DSL is specified in the heritage extractions and will be revisited when μ activates.


Colibri — documentation-first MCP runtime. Apache 2.0 + Commons Clause.
