🤖

Agentic Fraud Lab

When the fraud agent writes its own rules — inspired by CommBank's autonomous rule generation system.

Static rule-based fraud systems are slow by construction. By the time a senior analyst notices a new pattern, drafts a rule, runs it past compliance and pushes it to production, fraudsters have moved on. CommBank's agentic fraud system flipped this: 75% of card fraud rules are now agent-authored, and fraud losses fell 20% in H1 FY26 against the prior period.

This lab walks through the 8-agent rule synthesis pipeline — Signal Mining → Hypothesis → Rule Drafting → Backtest → Compliance → Analyst Review → Staged Deploy → Decay Monitor — on 5 emerging fraud patterns. You see the actual rule DSL the agent drafts, the backtest it runs, the compliance findings, the human reviewer's decision, and the live deployment trace.
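The stage ordering above can be sketched as a simple chain of handlers. This is a minimal illustration, not CommBank's implementation: only the eight stage names come from the text, while `Trace`, `run_pipeline` and the handler signature are hypothetical names introduced here. The assumption is that each stage either passes the candidate rule on or halts the run.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple

# Stage names are taken from the lab text; everything else is a hypothetical sketch.
STAGE_NAMES = [
    "Signal Mining", "Hypothesis", "Rule Drafting", "Backtest",
    "Compliance", "Analyst Review", "Staged Deploy", "Decay Monitor",
]

@dataclass
class Trace:
    pattern: str
    steps: List[Tuple[str, bool]] = field(default_factory=list)
    halted_at: Optional[str] = None  # name of the stage that stopped the run, if any

def run_pipeline(pattern: str,
                 handlers: Dict[str, Callable[[str], bool]]) -> Trace:
    """Run the stages in order; stop at the first stage whose handler returns False.

    Stages without a registered handler are treated as passing (an assumption
    made so the sketch runs without real agents behind it).
    """
    trace = Trace(pattern)
    for name in STAGE_NAMES:
        handler = handlers.get(name, lambda _pattern: True)
        passed = handler(pattern)
        trace.steps.append((name, passed))
        if not passed:
            trace.halted_at = name
            break
    return trace

if __name__ == "__main__":
    # A compliance rejection stops the rule before analyst review or deployment.
    trace = run_pipeline("voice-clone wire", {"Compliance": lambda _p: False})
    for stage, passed in trace.steps:
        print(f"{stage}: {'pass' if passed else 'halt'}")
```

Modelling each stage as a pass/halt gate keeps the human review step inside the same trace as the automated ones, which matches the single end-to-end trace the lab displays.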

Agentic AI · Autonomous Rule Generation · Pattern Mining · Human-in-the-Loop · Champion/Challenger · Rule Decay

Reference: superml.dev/commbank-agentic-fraud-rule-generation-2026 ↗ · Architecture showcase only — no LLM, no DB. All numbers are illustrative of the pattern.

8-Agent Rule Synthesis Pipeline

Click Run Pipeline to trace how a fraud pattern becomes a deployed rule

Production Metrics — CommBank-style autonomous rule generation

📡

80M

Daily signals analysed

cross-channel fusion

🛡️

−20%

Fraud-loss reduction

H1 FY26 vs prior period

🤖

75%

Rules agent-authored

of card fraud rules

⚡

~2 days

Time-to-deploy a new rule

down from 6 weeks

⏱️

12,400/mo

Analyst hours saved

across 4 fraud teams

↺

3.1%

Auto-rollbacks triggered

rules that failed canary

Interactive Pattern → Rule Trace

Choose an emerging fraud pattern

🎙️

AI-Enabled Social Engineering

growing · critical

Voice-cloned vishing → wire transfer

Scammer clones a relative's voice, calls victim from spoofed number, walks them through a "verification" wire transfer.

$142k/day loss

victims/day

14,320 signals

Threat narrative

Over the last 11 days, fraud analysts have received an unusual cluster of complaints: wires of $4k–18k authorised by long-tenure customers, who afterwards reported that the caller sounded exactly like their adult child. Three different victims described being asked to read out an OTP "to cancel a fraudulent charge." This pattern does not match any existing rule: the wires are individually within velocity limits, the device is the customer's own, and step-up auth passes because the genuine customer completes it while still on the call.

📡

Signal Mining Agent

Streams 80M signals/day. Flags suspicious clusters.

Agent 1 / 8

→ Real-time stream: transactions, logins, device events, payee adds (80M signals/day)
→ Recently-confirmed fraud labels from analyst queue
→ Customer complaints + chargeback feed
→ External intel: scam phone DBs, stolen card markets, breach notifications

LLM role: GPT-4-class model summarises each cluster into plain-English pattern names like "voice-clone wire after spoofed inbound call"
Tools: Snowflake (cross-channel data) · sentence-transformers · HDBSCAN · Redis stream

Cluster

Voice-clone wire after recent inbound spoofed call

CLU-2026-0411

Novelty

14,320 signals · emerged 11d ago

Similar to known archetype?

Closest archetype: traditional vishing (cosine similarity 0.61), well below the 0.85 reuse threshold, so the cluster is treated as a new pattern rather than a variant of a known one
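The archetype check can be illustrated with a plain cosine-similarity comparison against the 0.85 reuse threshold quoted above. This is a sketch under assumptions: the two-dimensional vectors are toy values chosen to reproduce a similarity of 0.61, and `cosine` and `is_novel` are hypothetical helper names, not part of the system described.

```python
import math

REUSE_THRESHOLD = 0.85  # from the text: similarity needed to reuse a known archetype

def cosine(a, b):
    """Cosine similarity between two equal-length numeric vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)

def is_novel(cluster_vec, archetype_vecs, threshold=REUSE_THRESHOLD):
    """Return (novel, closest archetype name, similarity to that archetype)."""
    best_name, best_sim = max(
        ((name, cosine(cluster_vec, vec)) for name, vec in archetype_vecs.items()),
        key=lambda pair: pair[1],
    )
    return best_sim < threshold, best_name, best_sim

# Toy embeddings (assumed): unit vectors built so the closest match scores 0.61.
cluster = [1.0, 0.0]
archetypes = {
    "traditional vishing": [0.61, math.sqrt(1 - 0.61 ** 2)],
    "card-not-present": [0.10, math.sqrt(1 - 0.10 ** 2)],
}

novel, name, sim = is_novel(cluster, archetypes)
print(f"closest={name} cosine={sim:.2f} novel={novel}")
```

In practice the vectors would come from an embedding model such as the sentence-transformers library listed under Tools; the comparison logic stays the same.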

Channels

Mobile banking app, Phone banking

Geo spread

Sydney metro, Melbourne metro

Top deviating features

Feature	In cluster	Baseline	Lift
time_since_inbound_call_min	4–18 min	0 calls before wire	+∞ (new feature)
caller_number_in_scam_intel_db	74% of cases	0.4% of all wires	+185×
wire_to_new_payee_added_today	91%	12%	+7.6×
customer_on_call_during_wire	88%	6%	+14.7×
session_otp_read_aloud_inferred	63% (mic-pattern signal)	<0.1%	novel signal
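The multipliers in the feature list are consistent with a simple lift ratio: the rate of a feature inside the cluster divided by its rate across all wires. The sketch below reproduces the three numeric multipliers from the rates shown. The `lift` function is a hypothetical helper, and treating a zero baseline as infinite lift is an assumption that mirrors the "+∞ (new feature)" entry.

```python
import math

def lift(cluster_rate: float, baseline_rate: float) -> float:
    """Ratio of in-cluster prevalence to baseline prevalence."""
    if baseline_rate <= 0:
        # Assumption: a feature never seen in the baseline has unbounded lift.
        return math.inf
    return cluster_rate / baseline_rate

# (cluster rate, baseline rate) pairs copied from the feature list above.
features = {
    "caller_number_in_scam_intel_db": (0.74, 0.004),
    "wire_to_new_payee_added_today": (0.91, 0.12),
    "customer_on_call_during_wire": (0.88, 0.06),
}

for name, (cluster_rate, baseline_rate) in features.items():
    print(f"{name}: +{lift(cluster_rate, baseline_rate):.1f}x")
# caller_number_in_scam_intel_db: +185.0x
# wire_to_new_payee_added_today: +7.6x
# customer_on_call_during_wire: +14.7x
```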

Example signal vectors (anonymised)

→ TXN-08831 — $9,400 wire to new payee 7 min after inbound call from spoofed family number
→ TXN-08902 — $14,200 wire, customer audibly on call during transaction (mic open)
→ TXN-09011 — $5,750, payee added 2 min before wire, OTP read aloud (inferred)
→ TXN-09155 — $11,800, caller number matched scam intel feed within 24h

Impact Analysis

Manual Detection vs. Agentic Rule Synthesis

📋

Manual analyst-driven approach

SLOW

Time to detect

32 days

Time to deploy rule

+14 days

Analyst hours

96 hrs

$ leaked while waiting

$6.50M

Without the agent, this pattern would surface only after 30+ chargebacks had been filed and a senior analyst had spotted the cluster manually. Drafting and reviewing the rule by hand would take another 2 weeks. An estimated $6.5M would leak in the meantime, a real cost of doing fraud detection by hand.
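The $6.5M estimate lines up with the pattern's $142k/day loss rate applied over the full manual window of 32 days to detect plus 14 days to deploy. The sketch below works through that arithmetic. It assumes a constant daily loss rate, which is a simplification, and `leaked` is a hypothetical helper name.

```python
DAILY_LOSS = 142_000     # $/day, from the pattern card
DETECT_DAYS = 32         # manual time to detect
DEPLOY_DAYS = 14         # manual time to deploy the rule

def leaked(daily_loss: float, days: float) -> float:
    """Loss accumulated while no rule is live, assuming a flat daily rate."""
    return daily_loss * days

manual_window_days = DETECT_DAYS + DEPLOY_DAYS
manual_leak = leaked(DAILY_LOSS, manual_window_days)

print(f"manual window: {manual_window_days} days")   # manual window: 46 days
print(f"manual leak:   ${manual_leak / 1e6:.2f}M")   # manual leak:   $6.53M
```

The result, about $6.53M, rounds to the $6.50M shown in the panel.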

🤖

Agentic rule synthesis

46 days → 4 hrs

Value delivered

The 8-agent pipeline mines, drafts, backtests, reviews and deploys a rule in hours — not weeks. The human analyst spends their time on the 10% that actually requires judgement: novel hypotheses, policy choices, hardship-routing decisions. Routine rule maintenance is automated end-to-end.

−20%

fraud loss

75%

rules agent-authored

~2d

pattern → live

Active Rule Repository

Total active rules

7/9

Agent-authored

$6.94M

Monthly $ saved

Sunset queued

Rule	Rule ID	Category	Author	Status	Precision	Recall	Triggers/day	$ saved/mo	Age	Trend
PayID mule chain ≥4 hops in 30 min	R-2026-0329	Mule	agent	live	0.81	0.74	119	$2,240k	33d	→
Voice-clone wire after spoofed inbound call	R-2026-0341	APP scam	agent	live	0.92	0.71	84	$1,840k	18d	→
Synthetic ID dormancy → drain 24h	R-2026-0337	Synthetic ID	agent	live	0.88	0.66	31	$920k	22d	↑
IP-BIN country mismatch + e-commerce	R-2025-1184	Card-not-present	analyst	live	0.71	0.49	142	$690k	318d	↓
BNPL stacking — 6+ accounts in 72h	R-2026-0312	BNPL fraud	agent+analyst	live	0.84	0.61	22	$410k	49d	→
OTP relay — fast keystroke pattern	R-2026-0359	ATO	agent	canary	0.86	0.55	41	$380k	6d	↑
Refund-loop ring on premium electronics	R-2026-0301	Chargeback ring	agent	live	0.79	0.58	17	$350k	56d	↓
High-MCC after card add via mobile	R-2025-0987	Card-not-present	agent	sunsetting	0.62	0.41	73	$110k	184d	↓
Crypto on-ramp after dormant 90d	R-2026-0361	ATO	agent	shadow	0.83	0.62	28	—	3d	↑
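The summary figures above the table (7/9 agent-authored, $6.94M monthly savings) can be reproduced directly from the rows, and the same data shows how a decay check might single out a rule for sunsetting. The sketch below copies rule IDs, authors, statuses, precision, savings and trends from the table. The `Rule` type and the 0.65 precision floor are assumptions introduced for illustration; the source does not state the actual decay criterion.

```python
from typing import NamedTuple, Optional

class Rule(NamedTuple):
    rule_id: str
    author: str
    status: str
    precision: float
    saved_k: Optional[int]  # $ saved per month in thousands; None while in shadow
    trend: str              # "up", "flat" or "down"

# Rows copied from the repository table.
RULES = [
    Rule("R-2026-0329", "agent", "live", 0.81, 2240, "flat"),
    Rule("R-2026-0341", "agent", "live", 0.92, 1840, "flat"),
    Rule("R-2026-0337", "agent", "live", 0.88, 920, "up"),
    Rule("R-2025-1184", "analyst", "live", 0.71, 690, "down"),
    Rule("R-2026-0312", "agent+analyst", "live", 0.84, 410, "flat"),
    Rule("R-2026-0359", "agent", "canary", 0.86, 380, "up"),
    Rule("R-2026-0301", "agent", "live", 0.79, 350, "down"),
    Rule("R-2025-0987", "agent", "sunsetting", 0.62, 110, "down"),
    Rule("R-2026-0361", "agent", "shadow", 0.83, None, "up"),
]

PRECISION_FLOOR = 0.65  # assumed threshold, not stated in the source

# Rules authored by the agent alone (the co-authored rule is not counted).
agent_authored = sum(1 for r in RULES if r.author == "agent")
# Shadow rules have no savings yet, so None counts as zero.
monthly_saved_k = sum(r.saved_k or 0 for r in RULES)
# Decay candidates: falling trend and precision under the assumed floor.
decay_candidates = [
    r.rule_id for r in RULES
    if r.trend == "down" and r.precision < PRECISION_FLOOR
]

print(f"agent-authored: {agent_authored}/{len(RULES)}")    # agent-authored: 7/9
print(f"monthly saved:  ${monthly_saved_k / 1000:.2f}M")   # monthly saved:  $6.94M
print(f"decay candidates: {decay_candidates}")             # decay candidates: ['R-2025-0987']
```

With this assumed floor, the only rule flagged is the one the table already lists as sunsetting; the two other downward-trending rules stay live because their precision is still above it.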

The SuperML Take

The biggest unlock isn't that the agent can write rules; it's that the agent maintains them. Pattern mining → backtest → deploy → decay-monitor is a closed loop, and the agent runs that loop every day. Human analysts move from a rule-authoring sweatshop to work as policy reviewers and edge-case curators.
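One way to picture the closed loop is as a small rule lifecycle state machine. The statuses "shadow", "canary", "live" and "sunsetting" appear in the rule table, and the page mentions auto-rollbacks for rules that fail canary. The "rolled_back" and "retired" states, the transition map, and the 0.65 precision floor are assumptions made for this sketch.

```python
# Status names "shadow", "canary", "live" and "sunsetting" come from the rule table;
# "rolled_back", "retired" and the transitions themselves are assumptions.
ALLOWED_TRANSITIONS = {
    "shadow": {"canary"},
    "canary": {"live", "rolled_back"},
    "live": {"sunsetting"},
    "sunsetting": {"retired"},
}

def advance(status: str, target: str) -> str:
    """Move a rule to a new status, rejecting transitions the map does not allow."""
    if target not in ALLOWED_TRANSITIONS.get(status, set()):
        raise ValueError(f"illegal transition: {status} -> {target}")
    return target

def resolve_canary(passed: bool) -> str:
    """Promote a canary rule that passed, or roll it back automatically."""
    return advance("canary", "live" if passed else "rolled_back")

def daily_decay_check(status: str, precision: float, floor: float = 0.65) -> str:
    """Queue a live rule for sunset once precision drops under the assumed floor."""
    if status == "live" and precision < floor:
        return advance(status, "sunsetting")
    return status

print(resolve_canary(passed=False))          # rolled_back
print(daily_decay_check("live", 0.62))       # sunsetting
print(daily_decay_check("live", 0.92))       # live
```

Keeping the transitions in one explicit map means a rule cannot skip shadow or canary on its way to live, which is the property that makes a daily unattended loop safer to run.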