Multi-agent systems, programs in which several AI agents divide tasks among themselves, have been production-ready since 2024 and are deployed by a growing number of mid-market companies. They promise to automate entire business processes: a "CEO" agent plans strategy, an "engineer" agent writes code, a "reviewer" agent reviews it. That sounds elegant, but it is legally demanding. This article shows how to build multi-agent systems that comply with the EU AI Act and GDPR Art. 22.
The twofold compliance problem
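Multi-agent systems fall under two frameworks that together demand more than either alone:
- EU AI Act Art. 14 and Art. 26: for high-risk AI systems, human oversight must be designed into the system (Art. 14) and assigned by the deployer to competent people (Art. 26).
- GDPR Art. 22: no solely automated individual decisions with legal or similarly significant effect unless an exception applies, such as explicit consent (opt-in).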
Both together: approval gates wherever money, people or customer-facing content is involved. Without an audit trail, no compliance proof. Without rollback, no error correction.
The five pillars of a compliant agent workflow
1. Audit trail per agent run
Every agent run is fully logged: input, model, output, timestamp, tool calls. A flat log file isn't enough because it is neither searchable nor auditable. Structured logging in Postgres with a search UI is the minimum standard:
CREATE TABLE agent_runs (
    id            UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    agent_name    TEXT NOT NULL,
    parent_run_id UUID REFERENCES agent_runs(id),  -- multi-agent hierarchy
    input         JSONB NOT NULL,
    model         TEXT NOT NULL,
    output        JSONB,
    tool_calls    JSONB,
    status        TEXT CHECK (status IN ('pending', 'running', 'ok', 'error', 'rejected')),
    duration_ms   INTEGER,
    approved_by   UUID REFERENCES auth.users(id),  -- reviewer; assumes an auth.users table (e.g. Supabase)
    created_at    TIMESTAMPTZ DEFAULT NOW()
);
-- Indexes for the common dashboard filters: per agent by time, and by status
CREATE INDEX idx_agent_runs_agent ON agent_runs (agent_name, created_at DESC);
CREATE INDEX idx_agent_runs_status ON agent_runs (status);
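To illustrate how rows for this table could be produced, here is a minimal Python sketch that wraps an agent call and records input, output, status and duration, whether the call succeeds or fails. The names AgentRun and run_logged are hypothetical, the model name is a placeholder, and the list stands in for the Postgres insert.

```python
import time
import uuid
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Callable, Optional


@dataclass
class AgentRun:
    """In-memory mirror of one agent_runs row (hypothetical helper type)."""
    agent_name: str
    model: str
    input: dict
    parent_run_id: Optional[str] = None
    id: str = field(default_factory=lambda: str(uuid.uuid4()))
    output: Optional[dict] = None
    status: str = "pending"
    duration_ms: Optional[int] = None
    created_at: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )


def run_logged(run: AgentRun, agent_fn: Callable[[dict], dict], sink: list) -> AgentRun:
    """Execute agent_fn and append the finished run to sink, success or failure."""
    run.status = "running"
    started = time.monotonic()
    try:
        run.output = agent_fn(run.input)
        run.status = "ok"
    except Exception as exc:  # record the failure instead of losing it
        run.output = {"error": str(exc)}
        run.status = "error"
    finally:
        run.duration_ms = int((time.monotonic() - started) * 1000)
        sink.append(run)  # in a real setup: INSERT INTO agent_runs ...
    return run


audit_log: list = []
parent = run_logged(
    AgentRun("ceo", "example-model", {"goal": "plan release"}),
    lambda inp: {"plan": ["write code", "review"]},
    audit_log,
)
child = run_logged(
    AgentRun("engineer", "example-model", {"task": "write code"},
             parent_run_id=parent.id),
    lambda inp: 1 / 0,  # simulated failure
    audit_log,
)
```

The failing child run still lands in the log with status 'error' and a link to its parent, which is the point of logging in a finally block.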
2. Approval gates at critical decisions
Before any action with external impact the system waits for human approval:
- Customer-facing content (blog post, email, invoice) → approval queue, not direct send.
- Money movement (invoice, refund) → approval queue with amount limit.
- HR decision (applicant pre-screening) → GDPR Art. 22 opt-in + manual review.
- Code deploy to production → human clicks Deploy.
Rule of thumb: "Would I let an inexperienced employee do this without review?" If not, an AI agent shouldn't do it without review either.
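The gate rules above can be sketched as a small routing function. This is one possible reading, not a definitive implementation: the action kinds, the 500 EUR limit and the ESCALATE outcome (amounts above the limit go to a second approver) are assumptions made for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class Decision(Enum):
    AUTO = "auto"              # agent may proceed on its own
    QUEUE = "approval_queue"   # a human must approve first
    ESCALATE = "escalate"      # assumption: above the limit, a second approver


@dataclass(frozen=True)
class Action:
    kind: str                  # e.g. "send_email", "refund", "deploy"
    amount_eur: float = 0.0
    affects_person: bool = False   # e.g. applicant pre-screening


# Hypothetical categories and limit; adjust to your own processes.
EXTERNAL_KINDS = {"publish_post", "send_email", "send_invoice", "deploy"}
MONEY_KINDS = {"send_invoice", "refund"}
MONEY_LIMIT_EUR = 500.0


def gate(action: Action) -> Decision:
    """Decide whether an action may run without a human in the loop."""
    if action.affects_person:
        return Decision.QUEUE
    if action.kind in MONEY_KINDS:
        if action.amount_eur > MONEY_LIMIT_EUR:
            return Decision.ESCALATE
        return Decision.QUEUE
    if action.kind in EXTERNAL_KINDS:
        return Decision.QUEUE
    return Decision.AUTO


decisions = {
    "tests": gate(Action("run_tests")),
    "email": gate(Action("send_email")),
    "small_refund": gate(Action("refund", amount_eur=50)),
    "large_refund": gate(Action("refund", amount_eur=5000)),
}
```

Only actions without external impact, such as running tests, pass through automatically; everything else waits in a queue.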
3. Memory system for continuous learning
Agents learn from their own findings. Best practice: separate memory collections for "insights" (what works) and "improvements" (what to do better on the next run). Both can be audited by humans:
CREATE EXTENSION IF NOT EXISTS vector;  -- pgvector must be enabled for the vector type
CREATE TABLE agent_memory (
    id         UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    agent_name TEXT NOT NULL,
    type       TEXT NOT NULL CHECK (type IN ('insight', 'improvement')),
    source_run UUID REFERENCES agent_runs(id),  -- the run this memory came from
    content    TEXT NOT NULL,
    embedding  vector(1536),  -- pgvector for semantic search; dimension must match the embedding model
    created_at TIMESTAMPTZ DEFAULT NOW()
);
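To make the semantic search concrete, the following Python sketch ranks memories by cosine distance, the same measure pgvector exposes through its <=> operator. The Memory type, the recall function and the three-dimensional toy embeddings are assumptions for illustration; real embeddings come from an embedding model and have the dimension declared in the table.

```python
import math
from dataclasses import dataclass


@dataclass
class Memory:
    agent_name: str
    mem_type: str          # "insight" or "improvement"
    content: str
    embedding: list


def cosine_distance(a: list, b: list) -> float:
    """1 - cosine similarity; smaller means more similar."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm


def recall(memories: list, agent_name: str, mem_type: str,
           query: list, k: int = 3) -> list:
    """Return the k memories of one agent and type closest to the query vector."""
    candidates = [
        m for m in memories
        if m.agent_name == agent_name and m.mem_type == mem_type
    ]
    return sorted(candidates, key=lambda m: cosine_distance(m.embedding, query))[:k]


store = [
    Memory("engineer", "insight", "Small commits pass review faster", [1.0, 0.0, 0.0]),
    Memory("engineer", "improvement", "Run tests before committing", [0.0, 1.0, 0.0]),
    Memory("engineer", "insight", "Reviewer rejects untested code", [0.6, 0.8, 0.0]),
    Memory("ceo", "insight", "Weekly plans beat daily plans", [1.0, 0.0, 0.0]),
]
hits = recall(store, "engineer", "insight", [0.9, 0.1, 0.0], k=1)
```

Filtering by agent and type before ranking keeps the two collections separate, so an agent never retrieves another agent's memories by accident.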
4. Rollback paths
Every agent action must be reversible. Concretely: a blog post sits in the DB as a draft before it is published; an invoice has a cancellation path built in before it is sent; every code deploy is preceded by a backup branch. Nothing is "irreversible by design".
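One way to enforce this in code is to accept an action only together with its undo step. The sketch below assumes a hypothetical ReversibleRun helper and uses in-memory sets in place of a real CMS or database.

```python
from typing import Callable


class ReversibleRun:
    """Collects undo callbacks so a failed or rejected run can be rolled back."""

    def __init__(self) -> None:
        self._undo: list = []

    def do(self, action: Callable[[], None], undo: Callable[[], None]) -> None:
        """Run an action; its undo step is registered only if the action succeeded."""
        action()
        self._undo.append(undo)

    def rollback(self) -> None:
        """Undo all recorded actions in reverse order."""
        while self._undo:
            self._undo.pop()()


published: set = set()
drafts = {"post-1": "Draft text"}  # the draft stays in the DB either way

run = ReversibleRun()
run.do(
    action=lambda: published.add("post-1"),
    undo=lambda: published.discard("post-1"),
)
was_published = "post-1" in published
run.rollback()  # the post is unpublished again, the draft survives
```

Because do() has no variant without an undo argument, an action without a rollback path cannot be registered in the first place.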
5. EU-hosted models by default
Sensitive workloads (HR files, health data, strategy papers) stay in the EU. My default: Mistral models hosted on Scaleway for text, Pixtral for multimodal tasks. US models (GPT-4, Claude) are used only where the use case justifies it, and then with Standard Contractual Clauses and a Transfer Impact Assessment.
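This default can be expressed as a routing rule. The sketch below is an assumption about how such a rule might look: the sensitivity levels, parameter names and model identifiers are placeholders, not real endpoint names.

```python
from enum import Enum


class Sensitivity(Enum):
    PUBLIC = 1
    INTERNAL = 2
    SENSITIVE = 3   # HR files, health data, strategy papers


# Placeholder identifiers; replace with the model names of your provider.
EU_TEXT_MODEL = "eu-hosted-text-model"
EU_MULTIMODAL_MODEL = "eu-hosted-multimodal-model"
US_MODEL = "us-hosted-model"


def choose_model(sensitivity: Sensitivity, multimodal: bool = False,
                 needs_us_model: bool = False,
                 transfer_cleared: bool = False) -> str:
    """EU-hosted by default; a US model only if justified and cleared.

    transfer_cleared stands for "SCCs signed and Transfer Impact
    Assessment done" (an assumption about how that is tracked).
    """
    eu_model = EU_MULTIMODAL_MODEL if multimodal else EU_TEXT_MODEL
    if sensitivity is Sensitivity.SENSITIVE:
        return eu_model                      # sensitive data never leaves the EU
    if needs_us_model and transfer_cleared:
        return US_MODEL
    return eu_model


hr_model = choose_model(Sensitivity.SENSITIVE, needs_us_model=True,
                        transfer_cleared=True)
```

Even with both flags set, the HR workload in the example still resolves to the EU-hosted model, because the sensitivity check comes first.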
Tool rules per agent
Each agent may only do what is explicitly allowed. Example for a code-engineer agent:
agent: engineer
allowed_tools:
  - read_file
  - write_file
  - run_tests
  - create_git_commit
forbidden_tools:
  - delete_file      # only via approval
  - push_to_remote   # only via approval
  - send_email       # not their job
  - create_invoice   # not their job
data_access:
  - source_code: read-write
  - test_data: read
  - production_db: NONE
Tool rules aren't a recommendation — they're a compliance requirement. An agent allowed to do too much is a compliance risk.
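A config file only helps if something enforces it at call time. The following Python sketch checks each tool call against the engineer rules above. The ToolPolicy type is hypothetical, and splitting the forbidden tools into a machine-readable approval_only set is an assumption, since the example config expresses that distinction only in comments.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToolPolicy:
    agent: str
    allowed: frozenset
    approval_only: frozenset   # forbidden unless a human approves (assumption)


ENGINEER = ToolPolicy(
    agent="engineer",
    allowed=frozenset({"read_file", "write_file", "run_tests", "create_git_commit"}),
    approval_only=frozenset({"delete_file", "push_to_remote"}),
)


def check_tool(policy: ToolPolicy, tool: str) -> str:
    """Return 'allow', 'needs_approval' or 'deny' for a requested tool call."""
    if tool in policy.allowed:
        return "allow"
    if tool in policy.approval_only:
        return "needs_approval"
    return "deny"   # default deny: anything not listed is forbidden


verdicts = {
    tool: check_tool(ENGINEER, tool)
    for tool in ("read_file", "push_to_remote", "send_email", "drop_database")
}
```

The default-deny branch matters most: a tool that nobody thought to list, such as drop_database here, is refused rather than silently allowed.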
Human oversight dashboard
The admin sees all agent outputs before they go to production and can filter by agent, date and status. Approving or rejecting takes one click. On reject, a reason is stored in memory so the agent can learn from it.
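The reject path could look like the sketch below. The review function and the dictionary shapes are hypothetical; storing the rejection reason as an 'improvement' memory is an assumption about which collection it belongs in.

```python
from typing import Optional


def review(run: dict, reviewer_id: str, approve: bool,
           memory: list, reason: Optional[str] = None) -> dict:
    """Apply a human decision to a finished run and return the updated copy."""
    if not approve and not reason:
        raise ValueError("a rejection needs a reason the agent can learn from")
    updated = dict(
        run,
        status="ok" if approve else "rejected",
        approved_by=reviewer_id if approve else None,
    )
    if not approve:
        memory.append({
            "agent_name": run["agent_name"],
            "type": "improvement",   # assumption: rejections feed 'improvement'
            "source_run": run["id"],
            "content": reason,
        })
    return updated


memory_rows: list = []
draft = {"id": "run-1", "agent_name": "writer", "status": "ok", "approved_by": None}

rejected = review(draft, "admin-1", approve=False, memory=memory_rows,
                  reason="Tone too salesy for an invoice reminder")
approved = review(draft, "admin-1", approve=True, memory=memory_rows)
```

Requiring a reason on reject is a deliberate choice here: a rejection without one gives the agent nothing to learn from.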
Retention periods
AI prompts and outputs are deleted after 30 days unless a legal retention requirement applies (e.g. invoices 8 years, see e-invoice article). Don't forget that the memory system needs retention periods too: insights are kept longer (e.g. 2 years), improvements shorter (e.g. 6 months).
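A cleanup job needs these periods in machine-readable form. The sketch below uses the example values from the text, approximates 6 months as 182 days, and models a legal retention requirement as a simple legal_hold flag; the names are hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Example periods from the text; "6 months" is approximated as 182 days.
RETENTION = {
    "prompt_output": timedelta(days=30),
    "insight": timedelta(days=730),
    "improvement": timedelta(days=182),
}


def is_expired(kind: str, created_at: datetime, now: datetime,
               legal_hold: bool = False) -> bool:
    """True if a record may be deleted; legal_hold overrides the period."""
    if legal_hold:   # e.g. invoice data under a statutory retention period
        return False
    return now - created_at > RETENTION[kind]


now = datetime(2025, 6, 1, tzinfo=timezone.utc)
old_prompt = is_expired("prompt_output", now - timedelta(days=31), now)
held_invoice = is_expired("prompt_output", now - timedelta(days=3000), now,
                          legal_hold=True)
```

A scheduled job would run this check over agent_runs and agent_memory and delete whatever it returns True for.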
What I do concretely
For a multi-agent setup on your project I deliver:
- Architecture with clear agent roles (e.g. CEO/CTO/engineer pattern) and tool rules per role.
- Memory system with separate insights/improvements collections.
- Approval workflow before production actions, with admin UI and email notification.
- Audit logging in Postgres with search UI.
- EU-hosted models by default.
- AiBadge integration on every AI output.
- Dashboard with agent run history.
More at /compliance/ai-agents.



