Integrity Monitor (μ)
μ is the advisory layer. It watches for patterns that the deterministic engine cannot see — drift between surfaces, subtly circular decisions, agents being trapped into coercive choice sets. μ never mutates state. Its output is signal, not action: reports that a human operator or the π governance process can respond to.
Phase 0 reality: μ is deferred to Phase 4. No μ code runs in Phase 0. The spec below exists so the schema can carry the advisory records μ will eventually write.
Authoritative spec: ../../../spec/s14-integrity-monitor.md.
The three detection classes
μ monitors for three kinds of dysfunction. Each has a different signature, a different false-positive profile, and a different recommended operator response.
1. Circular logic
A decision chain whose justification loops: record A cites record B, which cites record C, which cites record A. The chain is formally valid (every hash matches) but the reasoning is empty. Detection is a graph walk over the thought_records table looking for cycles in the “cites” relation.
False-positive profile: low, because true citation cycles are rare. When they appear, they often indicate an agent defending a pre-committed position by selective re-reading of its own prior reflections.
Algorithm (DFS cycle detection):
    fn find_cycles(records):
        # Edges follow the "cites" relation: parent_hash plus every entry in refs[]
        graph = build_dag(records, edge_source = "parent_hash" OR refs[])
        cycles = []
        visited = {}              # shared across start nodes so each cycle is reported once
        for start in graph.nodes:
            if visited.get(start) != DONE:
                dfs(start, graph, visited, [], cycles)
        return cycles
    fn dfs(node, graph, visited, path, cycles):
        if node in path:          # back edge: the chain has returned to a record it already cites
            cycles.append(path[path.index(node):] + [node])
            return
        if visited.get(node) == DONE:
            return
        visited[node] = IN_PROGRESS
        path.append(node)
        for successor in graph.edges_from(node):
            dfs(successor, graph, visited, path, cycles)
        path.pop()
        visited[node] = DONE
Cross-rule cycles (rule A depends on rule B which depends on rule A via different parameters) are covered by the same algorithm with rule-dependency edges added. Threshold: any cycle → immediate advisory with severity=HIGH. Measured false-positive rate <1% on the existing spec-phase test corpus.
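The graph walk above can be tried out on a toy citation map. What follows is a minimal Python sketch, not the μ implementation: the adjacency-dict input and the record names A through D are assumptions made for illustration, and the three-colour marking plays the role of the IN_PROGRESS/DONE states in the pseudocode.

```python
from typing import Dict, Hashable, List

WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on the current path, fully explored


def find_cycles(edges: Dict[Hashable, List[Hashable]]) -> List[List[Hashable]]:
    """Return one cycle per back edge in a directed 'cites' graph."""
    color: Dict[Hashable, int] = {}
    path: List[Hashable] = []
    cycles: List[List[Hashable]] = []

    def dfs(node: Hashable) -> None:
        color[node] = GREY
        path.append(node)
        for succ in edges.get(node, []):
            state = color.get(succ, WHITE)
            if state == GREY:
                # succ is on the current path, so path[succ..node] closes a loop
                cycles.append(path[path.index(succ):] + [succ])
            elif state == WHITE:
                dfs(succ)
        path.pop()
        color[node] = BLACK

    for start in edges:
        if color.get(start, WHITE) == WHITE:
            dfs(start)
    return cycles


# Hypothetical records: A cites B, B cites C, C cites A; D cites A from outside the loop.
citations = {"A": ["B"], "B": ["C"], "C": ["A"], "D": ["A"]}
print(find_cycles(citations))  # [['A', 'B', 'C', 'A']]
```

Because the colour map is shared across start nodes, record D reaches the loop without reporting it a second time. The sketch is recursive, so a very long citation chain would need an explicit stack or a raised recursion limit.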
2. Coercion trap
A choice set presented to an agent where every option produces the same outcome, or where the “safe” options have been silently filtered before the agent sees them. Detection requires comparing the options recorded in a decision-type record against the options available at the time (reconstructed from the audit trail). A trap is a choice that wasn’t.
False-positive profile: medium. Legitimate filtering (e.g. skill-tier permission gates) can look like coercion. μ reports a suspected trap as an advisory, not an alarm.
Algorithm (option-set enumeration):
    fn detect_coercion(decision_record):
        presented = decision_record.options          # what the agent saw
        available = enumerate_available_actions(     # what κ would have admitted
            agent = decision_record.actor,
            context = decision_record.context,
        )
        # Simulate each available option through the rule engine
        outcomes = {opt: simulate_effects(opt, decision_record.context) for opt in available}
        negative = {opt for opt, eff in outcomes.items() if eff.reputation_delta < 0}
        obligates = {opt for opt, eff in outcomes.items() if eff.obligation_beyond_capacity}
        if negative == available or obligates == available:
            emit ADVISORY(
                check="coercion_trap",
                severity="HIGH",
                evidence=[presented, available, outcomes],
            )
If every available option produces negative reputation OR every option obligates the agent beyond current capacity, μ flags the situation. Advisory only — cannot veto the decision. The signal exists so π governance can remediate the context that produced the trap.
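The flagging condition can be isolated from the simulation step. The sketch below assumes the simulated outcomes are already available as a mapping from option name to effect; the Effect type, the option names, and the choice to treat an empty option set as "not a trap" are all illustrative assumptions, not part of the spec.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class Effect:
    """Hypothetical result of simulating one option through the rule engine."""
    reputation_delta: int
    obligation_beyond_capacity: bool


def is_coercion_trap(outcomes: Dict[str, Effect]) -> bool:
    """True when every option is reputation-negative, or every option over-obligates."""
    if not outcomes:
        return False  # assumption: an empty option set is handled elsewhere
    effects = list(outcomes.values())
    all_negative = all(e.reputation_delta < 0 for e in effects)
    all_obligating = all(e.obligation_beyond_capacity for e in effects)
    return all_negative or all_obligating


trapped = {
    "accept": Effect(reputation_delta=-5, obligation_beyond_capacity=False),
    "decline": Effect(reputation_delta=-2, obligation_beyond_capacity=True),
}
open_choice = {
    "accept": Effect(reputation_delta=-5, obligation_beyond_capacity=True),
    "defer": Effect(reputation_delta=0, obligation_beyond_capacity=False),
}
print(is_coercion_trap(trapped), is_coercion_trap(open_choice))  # True False
```

Note that the two conditions are evaluated separately, as in the text: a set where some options are negative and the rest over-obligate is not flagged by this rule.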
3. Axiom drift
A pattern of decisions that, taken individually, all pass the κ rule engine, but whose aggregate effect violates one or more constitutional axioms (AX-01 through AX-07 in ../constitution.md). Detection requires an ongoing statistical view, not a point check. Example: an arbiter whose denials each pass κ individually but who, over a month, denies every proposal from a particular class. No single κ evaluation would catch that bias.
False-positive profile: high in the absence of long time-series data. μ’s axiom-drift detector operates over epochs, not rounds.
Algorithm (sliding window):
    fn check_axiom_drift(domain, now):
        window = 6_months
        changes = parameter_changes(domain, since=now - window)
        magnitude = sum(abs(c.delta_bps) for c in changes)
        # AX-06-derived cap: cumulative change must stay within 1000 bps (10%)
        if magnitude >= 1000:                        # 10% = cap; checked first so only one advisory fires
            emit ADVISORY(check="axiom_drift", severity="HIGH", domain=domain)
            return BLOCK_NEW_PROPOSALS(domain)
        if magnitude >= 800:                         # 8% warning threshold
            emit ADVISORY(check="axiom_drift", severity="MED", domain=domain)
        # AX-invariant check: would any axiom invariant regress?
        for proposal in staged_proposals(domain):
            if proposal.would_reduce_invariant(AX_01 through AX_07):
                emit ADVISORY(check="axiom_regression", severity="HIGH")
                return HARD_BLOCK(proposal)
WARN at 8% cumulative change; HARD BLOCK at 10%. A proposal that would reduce any AX invariant is hard-blocked at governance intake (pre-ENACTED, see governance.md) regardless of cumulative position.
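The threshold arithmetic is easy to check by hand. The following sketch assumes the parameter changes inside the six-month window have already been collected as signed basis-point deltas; the function name and the sample values are hypothetical, and the severity labels follow the advisory record schema.

```python
from typing import Iterable, Optional

WARN_BPS = 800   # 8% cumulative change
CAP_BPS = 1000   # 10% cumulative change (the cap)


def classify_drift(deltas_bps: Iterable[int]) -> Optional[str]:
    """Map cumulative absolute change to a severity, or None if under the warning line."""
    magnitude = sum(abs(d) for d in deltas_bps)  # opposite-signed changes do not cancel
    if magnitude >= CAP_BPS:
        return "HIGH"
    if magnitude >= WARN_BPS:
        return "MED"
    return None


print(classify_drift([300, -250, 200]))  # None  (750 bps)
print(classify_drift([300, -250, 300]))  # MED   (850 bps)
print(classify_drift([500, -500]))       # HIGH  (1000 bps, net change of zero)
```

The last case shows why the sum is taken over absolute values: a parameter moved up 5% and back down 5% has a net change of zero but still reaches the cap.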
Three advisory roles
μ’s output is consumed by three roles, each with a different authority level:
| Role | Authority | Response |
|---|---|---|
| Translator | read-only | Summarize advisory reports for a human operator; no recommendations of its own |
| Sentinel | read-only | Flag advisory reports that meet a severity threshold; may escalate to π |
| Guide | read-only | Suggest corrective actions for human review; the human decides whether to act |
All three are read-only. μ does not have a “mutator” role. An advisory report that prescribes action is still a report; execution requires a separate π governance proposal or a T0-human authorization.
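As an illustration of the Sentinel role, the sketch below filters advisory reports by a severity threshold. It assumes reports are plain dictionaries carrying the schema's severity labels; the function name and the default threshold are hypothetical. The function only selects reports and changes nothing, in keeping with the read-only constraint.

```python
from typing import Dict, Iterable, List

# Ordering of the severity labels used in the advisory record schema
SEVERITY_RANK = {"LOW": 0, "MED": 1, "HIGH": 2}


def sentinel_flags(
    advisories: Iterable[Dict[str, str]], threshold: str = "HIGH"
) -> List[Dict[str, str]]:
    """Return the advisories whose severity is at or above the threshold."""
    floor = SEVERITY_RANK[threshold]
    return [a for a in advisories if SEVERITY_RANK[a["severity"]] >= floor]


reports = [
    {"check": "circular_logic", "severity": "HIGH"},
    {"check": "axiom_drift", "severity": "MED"},
    {"check": "coercion_trap", "severity": "LOW"},
]
flagged = sentinel_flags(reports, threshold="MED")
print([a["check"] for a in flagged])  # ['circular_logic', 'axiom_drift']
```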
Advisory record schema
Every μ output is a structured record:
    {
      "role": "Translator" | "Sentinel" | "Guide",
      "check": "circular_logic" | "coercion_trap" | "axiom_drift" | "axiom_regression",
      "result": "PASS" | "WARN" | "BLOCK",
      "severity": "LOW" | "MED" | "HIGH",
      "evidence": [ <references to records/events/rules> ],
      "recommendation": "<free-form human-readable>",
      "decision_hash": "SHA-256(role || check || canonical(input) || result)",
      "timestamp_logical": <uint64>
    }
The decision_hash is the deduplication key; identical inputs produce identical advisories and are collapsed on write into mcp_advisories.
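A minimal sketch of how such a deduplication key could be computed and used. The canonical form is not defined in this document, so the sketch assumes UTF-8 JSON with sorted keys and no insignificant whitespace; the in-memory dict stands in for the mcp_advisories table, and the "fees" domain and write_advisory helper are hypothetical.

```python
import hashlib
import json
from typing import Any, Dict


def canonical(value: Any) -> bytes:
    # Assumption: canonical form is sorted-key, compact, UTF-8 JSON.
    text = json.dumps(value, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return text.encode("utf-8")


def decision_hash(role: str, check: str, payload: Any, result: str) -> str:
    """SHA-256 over role || check || canonical(input) || result, hex-encoded."""
    material = (
        role.encode("utf-8") + check.encode("utf-8") + canonical(payload) + result.encode("utf-8")
    )
    return hashlib.sha256(material).hexdigest()


def write_advisory(store: Dict[str, dict], advisory: dict, payload: Any) -> bool:
    """Insert the advisory unless an identical one exists; return True if written."""
    key = decision_hash(advisory["role"], advisory["check"], payload, advisory["result"])
    if key in store:
        return False
    store[key] = {**advisory, "decision_hash": key}
    return True


store: Dict[str, dict] = {}
adv = {"role": "Sentinel", "check": "axiom_drift", "result": "WARN", "severity": "MED"}
first = write_advisory(store, adv, {"domain": "fees", "magnitude_bps": 850})
second = write_advisory(store, adv, {"magnitude_bps": 850, "domain": "fees"})
print(first, second, len(store))  # True False 1
```

The second write is collapsed because key order does not survive canonicalization. Plain concatenation, as written in the schema, does not delimit the fields; whether the real implementation adds separators or length prefixes is left to the authoritative spec.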
Escalation mapping
| Result | Action |
|---|---|
| PASS | Logged to ζ at thought_type=advisory; no further effect |
| WARN | Logged; surfaced in the operator console; no rule change |
| BLOCK | Denies the proposal at π intake (pre-ENACTED) |
| HARD BLOCK | Denies at α’s tool-lock admission (s10); downstream κ evaluation never runs |
A BLOCK is recoverable — the proposer can amend and re-submit. A HARD BLOCK requires a governance path to clear, usually because the proposal would have retroactively violated an axiom.
What μ is not
- Not an enforcer. μ never denies a tool call, never closes a session, never penalizes an agent. The actual enforcement happens at α (gating), κ (rules), and π (governance).
- Not a replacement for verification. β’s VERIFY state checks whether a task’s claimed output matches its acceptance criteria. μ watches for meta-patterns that VERIFY cannot see.
- Not real-time. μ operates on accumulated records, often over epochs. It is built to be precise, not fast.
Phase 0 posture
- No μ code in Phase 0; no advisors active.
- Schema for mcp_advisories table exists as a stub (role, check, result, severity, decision_hash) but is never written.
- No integration with the Phase 0 tool surface. The 19 Phase 0 tools (ADR-004) do not include any integrity_* call.
- First real μ activation target: R151+ (Phase 4) per ../../../5-time/roadmap.md.
Phase 4 scope (future)
Phase 4 brings μ online. Expected additions:
- Three detection jobs, each on an independent schedule, writing to an integrity_advisories table.
- A colibri-integrity-monitor skill (already extracted as a heritage SKILL.md in .agents/skills/) updated to Phase 4 semantics.
- Optional: a -$$-DSL for declaring advisory queries. The DSL is specified in the heritage extractions and will be revisited when μ activates.
See also
- governance.md: π, the process μ’s reports feed into
- ../laws/consensus.md: θ, whose equivocation detection is a deterministic sibling of μ’s probabilistic detection
- ../laws/rule-engine.md: κ, whose individual-pass outputs μ aggregates
- ../../execution/decision-trail.md: ζ, the substrate μ reads for advisory records
- ../constitution.md: the seven axioms μ watches for aggregate drift against
- ../../../spec/s10-admission.md: where HARD BLOCK advisories apply
- ../../../spec/s14-integrity-monitor.md: authoritative spec
- ../../../spec/s19-governance.md: π-intake BLOCK path