Building a Multi-Agent Compliance System
A practitioner's guide to eval-driven development, adversarial testing, and prompt engineering for high-stakes AI agents.
The Problem
Regulatory compliance review is a sequential decision process. An applicant's submission must pass through five independent checks before approval. Each check queries an external data source, interprets results against complex rules, and produces a reasoned decision.
The manual process takes 3–5 business days per case. The goal: an AI agent system that matches human analyst decisions in 30 seconds with auditable reasoning. Not replacing the human — producing a recommendation they review in minutes instead of days.
Regulated industries have zero tolerance for silent failures. A wrong denial means someone loses access to services. The system must know when it's uncertain and escalate rather than guess.
Architecture: Deterministic + Agentic
Phase 1 (Validation) — deterministic Python. 160 lines, <2ms, 62 tests. No LLM needed for questions with one right answer.
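A minimal sketch of what a deterministic validation phase can look like. The field names and checks here are hypothetical (the article doesn't show the real schema); the point is the shape: pure Python, no LLM, one right answer per check, trivially unit-testable.

```python
from dataclasses import dataclass

@dataclass
class ValidationResult:
    passed: bool
    errors: list

# Hypothetical required fields -- placeholders for the real submission schema.
REQUIRED_FIELDS = ("applicant_id", "submission_date", "jurisdiction")

def validate(submission: dict) -> ValidationResult:
    """Deterministic pre-check: reject malformed submissions before any agent runs."""
    errors = [f"missing field: {f}" for f in REQUIRED_FIELDS if f not in submission]
    return ValidationResult(passed=not errors, errors=errors)
```

Because every branch is deterministic, a test suite like the 62 tests mentioned above can cover it exhaustively, and failures are explainable by construction.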
Phase 2 (Compliance Review) — five specialist agents:
| Gate | Responsibility | Data Source | Difficulty |
|---|---|---|---|
| Gate 1 | Status verification | System A | Standard |
| Gate 2 | Criteria qualification | System B | Standard |
| Gate 3 | Risk assessment scoring | System B | Hardest |
| Gate 4 | Location compliance | System C | Standard |
| Gate 5 | Rule conflict detection | System A | Standard |
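One way to sketch the table above in code is a small immutable gate registry that the orchestrator iterates over. This is an illustrative structure, not the article's actual implementation; "System A/B/C" are the anonymized data sources from the table.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Gate:
    name: str
    responsibility: str
    data_source: str  # external system this gate queries

# Mirrors the gate table; order matters because review is sequential.
GATES = [
    Gate("gate_1", "Status verification", "System A"),
    Gate("gate_2", "Criteria qualification", "System B"),
    Gate("gate_3", "Risk assessment scoring", "System B"),
    Gate("gate_4", "Location compliance", "System C"),
    Gate("gate_5", "Rule conflict detection", "System A"),
]
```

Keeping the registry frozen and data-driven makes the gate order and responsibilities auditable in one place, which matters when the audit trail must explain why each check ran.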
- Specialists over a generalist. Five focused agents, each with a single responsibility; no attention dilution.
- Sequential with early exit. A FAIL with confidence ≥ 0.80 skips the remaining gates.
- Safe defaults. Any failure → ESCALATE with confidence 0.0. Never auto-deny.
- Compliance from day 1. Immutable audit trail, encryption, access logging.
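The early-exit and safe-default rules above can be sketched as a small orchestration loop. This is a hypothetical skeleton: `run_gate` stands in for the real per-gate LLM call, and the 0.80 threshold comes directly from the bullets above.

```python
# Hypothetical orchestrator: run_gate(gate, case) -> (verdict, confidence).
EARLY_EXIT_CONFIDENCE = 0.80

def review(case, gates, run_gate):
    """Run gates in order; early-exit on a confident FAIL, escalate on any error."""
    results = []
    for gate in gates:
        try:
            verdict, confidence = run_gate(gate, case)
        except Exception:
            # Safe default: an internal failure never becomes an auto-deny.
            results.append((gate, "ESCALATE", 0.0))
            return results
        results.append((gate, verdict, confidence))
        if verdict == "FAIL" and confidence >= EARLY_EXIT_CONFIDENCE:
            return results  # skip remaining gates
    return results
```

Note the asymmetry: a confident FAIL short-circuits for efficiency, but an exception never produces a denial, only an ESCALATE with confidence 0.0 for human review.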