Table of Contents
Executive Summary
Agentic AI red teaming (adversarial testing for AI agents) is now an essential pre-deployment control because traditional QA was built for systems that produce outputs, not systems that take actions.
Agentic systems plan, choose tools, and execute, which means the test question shifts from “does it produce the right output?” to “what else can it be made to do?”.
Firms that integrate red teaming into the design phase will deploy higher-autonomy agents safely; firms that bolt traditional QA onto agentic AI will be surprised in production and slow to scale.
This article sets out what changed, who wins and loses, what good looks like, the four capabilities that turn red teaming into a repeatable discipline, and the regulatory and frontier-lab evidence that the direction of travel is clear.
Agentic Risks has fully integrated red teaming; the Agentic AI Readiness Assessment surfaces gaps in 90 minutes.
What Changed: Agents Act, Not Just Output
Agentic AI is no longer a research curiosity. It is in production across regulated industries: investment research, portfolio analytics, lead qualification, due diligence, trade reconciliation, client service, compliance monitoring, code generation pipelines, internal IT and HR. These agents do not merely generate text for a human to act upon. They make decisions, call tools, write to systems, send communications, and escalate. They act.
That single change collapses several assumptions that traditional software testing was built on:
- Traditional QA assumes deterministic behaviour: same input, same output. Agentic systems are probabilistic and context-dependent. The same instruction at 9am and 11am, with different memory or different upstream context, may produce materially different actions.
- Traditional QA assumes the threat surface stops at the application boundary. Agentic systems extend the threat surface to every input the agent ingests: user prompts, retrieved documents, tool outputs, third-party data, even the agent’s own scratchpad. Each is a potential prompt injection attack vector against AI agents.
- Traditional QA assumes a human stands between the system’s output and the real-world consequence. Agentic systems remove that buffer: the model decides, the agent acts, and the owner takes the credit / responsibility.
Most importantly, traditional QA optimises for “the system does what it is supposed to do”. But agentic risk management must go further: “what else can the system be made to do?” That question cannot be answered with unit tests – only by trying.
This is what red teaming is. It is structured adversarial testing, performed by people (and increasingly other agents) whose job is to make your agent fail in ways your developers did not anticipate. It is the discipline of finding the failures before they find you.

Winners and Losers
Firms that treat agentic AI testing as an extension of traditional QA will lose ground in three predictable ways.
They will be surprised in production
The first time an agent is tricked by a prompt embedded in a customer email, leaks data through a tool call, or chains permissions to escalate beyond its scope, the incident will land on a desk that was not expecting it. Surprise incidents in regulated firms cost more than the incident itself: they cost trust with the board, with regulators, and with the business sponsors who championed the technology.
They will lose adoption velocity
Risk and compliance teams that cannot get evidence of how agents fail will, correctly, refuse to approve them at meaningful autonomy. The agentic transformation stalls at low-stakes use cases. The promised efficiency gains never materialise.
They will not be defensible
Under the EU AI Act, NIS2, DORA, and equivalent frameworks emerging globally, “we tested it the way we test our other software” is not a defence for a high-risk autonomous system. Auditors and regulators are converging on a clear expectation: adversarial testing, documented, repeatable, and proportionate to autonomy. We now expect the EU AI Act’s adversarial robustness testing obligations will come into force for high-risk systems in December 2027 and that it will be a hard deadline, making this a board-level agenda item, not a discretionary investment.
The winners look different. They are the firms that:
- Build red teaming into the design phase, not the launch phase.
- Maintain a living library of attack patterns and failure modes specific to their agents.
- Run continuous adversarial testing as a continuous control, not a one-time gate.
- Tie red team findings directly into their controls, KRIs, and incident response.
- Produce documentation an auditor can read, understand, and trust.
These firms will deploy higher-autonomy agents earlier, with confidence, and at lower total cost of ownership.
Agentic AI Red Teaming: What Good Looks Like
Picture the agentic transformation done well.
Every agent on the roadmap is classified by autonomy level, AI agent threat modelling profile, and risk tier. For the riskier tiers, there is a defined adversarial testing standard: which categories of attack must be tested, by whom, with what evidence, and at what frequency.
Before any agent reaches production, it goes through a structured red team exercise. The exercise produces a documented report of attempted attacks, the agent’s responses, the failures observed, and the controls implemented to address them. The report is reviewed and signed off by an independent function.
After deployment, red teaming continues. New attacks emerging from the wider community (frontier labs and security researchers publish them; regulators flag them) are added to the firm’s test library. Agents are re-tested when their tools, prompts, models, or context change. KRIs monitor for the behaviours red teams have surfaced as warning signs.
When something does go wrong, you will have the documentation to show the regulator: this is what we tested for, this is what we found, this is what we fixed, this is how we monitor for it now. The conversation is not “why didn’t you anticipate this?”. The conversation is “this was a novel attack; here is how our framework caught it; here is what we are adding to the library.”
That is the destination and it does not require a heroic effort: it is an achievable and deliberate new capability firms will need to develop.

Four Capabilities That Turn Red Teaming into a Discipline
1. A structured agentic threat library
AI agent threat models are not yet common knowledge inside enterprise risk and engineering teams.
The right answer is an organised catalogue of attack patterns: prompt injection, indirect prompt injection through retrieved content, tool poisoning, goal hijacking, scope creep, escalation through chained actions, multi-agent collusion, jailbreaks, context exfiltration, and more.
The library tells testers what to try. Without it, red teaming is improvisation.
Multi-agent orchestration risk deserves a dedicated section of that library: when one agent can instruct another, the trust boundary between them becomes an attack surface that single-agent testing will not reach.
2. Tooling that combines automation with human creativity
Hand-running every test on every agent does not scale. Red teaming needs a tooling layer that automates the test categories that lend themselves to automation, and reserves human red teamers for the creative, context-aware attacks that automation cannot replicate.
Frontier labs already operate this way. Enterprises can adopt the same model.
3. Calibration to autonomy and risk tier
Not every agent needs the same depth of red teaming. A read-only research assistant warrants less than an agent with payment authority.
A calibration framework ties adversarial testing depth to autonomy level and risk tier: low-risk agents get a baseline, medium-risk agents get a proportionate review, high-risk agents get full adversarial scrutiny including human red teamers and signed-off remediation.
This is the same risk-based logic Agentic Risks recommends for your broader adoption strategy.
4. Auditor-ready documentation built into the workflow
Red teaming that does not produce auditor-ready documentation is half a control.
Templated outputs should mirror what regulators and auditors want to see: test scope, test plan, attack scenarios attempted, results, residual risk, and the controls or KRIs implemented to mitigate.
These outputs are not separate from the rest of the agentic AI risk controls framework. They feed directly into the workflow risk assessment, the controls library, and the KRIs.
Together, these four capabilities (threat library, automation-plus-human tooling, autonomy-calibrated depth, and auditor-ready documentation) convert red teaming from a heroic ad hoc exercise into a repeatable, scalable, defensible discipline.

The Direction of Travel Is Clear
The case for agentic AI red teaming in financial services as a required pre-deployment AI agent security testing control is not theoretical. Three lines of evidence converge.
Frontier labs already do it, and tell us why
Anthropic, OpenAI, Google DeepMind and other frontier labs publish increasingly detailed research on agentic misalignment, prompt injection, and tool misuse. They invest in red teams because they have observed, in their own systems, behaviours that no functional test would surface.
Anthropic’s March 2026 submission to NIST’s CAISI explicitly identifies adversarial evaluation as foundational to managing agentic AI risk. If the firms building the models treat red teaming as non-negotiable, deploying firms cannot reasonably skip it.
Regulators are converging
The EU AI Act requires high-risk AI systems to demonstrate accuracy, robustness, and cybersecurity. NIST’s AI Risk Management Framework calls for adversarial testing as part of its “Measure” function. NIS2 and DORA both require firms to test the resilience of their digital systems.
None of these frameworks were written with agentic AI front of mind, and yet each implicitly requires the practice red teaming provides. Where the standards bodies have not caught up explicitly, supervisors are already asking the question in regulatory engagements.
Real incidents are emerging
Indirect prompt injection (where instructions are embedded in retrieved content) has been demonstrated against production AI assistants. Tool misuse incidents have been disclosed where agents were manipulated into actions outside their intended scope.
The scale of the problem is now quantified: a large-scale red-teaming competition run by Gray Swan AI and the UK AI Security Institute submitted 1.8 million adversarial attacks against 22 frontier AI agents across 44 realistic deployment scenarios and achieved a 100% policy violation rate – meaning every agent tested could be made to breach its own deployment policies under adversarial conditions.
Separately, EY’s Responsible AI Pulse Survey (975 C-suite leaders, August–September 2025) found that 99% of large organisations have already suffered financial losses from AI-related risks, with 64% losing more than $1 million and average losses estimated at $4.4 million per organisation.
The pattern, therefore, is clear: agents that have been red-teamed before deployment have visible, managed failure modes, while agents that have not been red-teamed surface their failure modes in live incidents.
Red Teaming at Agentic Risks
Agentic Risks integrates red teaming into the Enterprise-Wide Agentic AI Risk Control Framework. Adversarial testing is not a separate workstream: it sits inside the Pre-Deployment Agentic AI Risk Assessment, draws from the 32 Agentic AI Risk Flags, and feeds our agentic KRIs. We have helped risk teams move from “we don’t know how our agents fail” to a documented, repeatable testing practice in weeks rather than months.
The urgency is not uniform. In regulated financial services, the agents that most need pre-deployment red teaming are those with the ability to act externally: trade execution, client communications, as well as any multi-agent orchestration layer where a compromised node could propagate instructions downstream. The starting point is knowing which of your agents fall into which category so you perform the testing and create the evidence you may need to show a regulator.
If your firm is deploying agentic AI but red teaming is not yet part of your design-phase controls, you are deploying with the buffer between system and consequence removed, and without the discipline that takes its place.
The Agentic AI Readiness Assessment surfaces this gap (and the others that come with it) in a single 90-minute session, with a written report inside 24 hours.
The shift to agentic AI is real. The regulatory and incident pressure is real. The practice that addresses both is well-defined. Adopt it the easy way – proactively – while it is a competitive advantage. Deploy agentic AI without it, and we believe you will probably end up adopting it the hard way.
Frequently Asked Questions
Agentic AI red teaming is structured adversarial testing of AI systems that act autonomously – systems that plan, call tools, write to external systems, and execute multi-step tasks. It goes beyond traditional model red teaming by testing the full agent pipeline: prompt injection through retrieved content, tool misuse, goal hijacking, privilege escalation through chained actions, and inter-agent trust boundary exploitation. The test question shifts from “does the agent produce the right output?” to “what else can the agent be made to do?”
Traditional red teaming targets systems with deterministic behaviour and bounded AI agent governance controls. Agentic AI systems are probabilistic, context-dependent, and extend their threat surface to every input they ingest – user prompts, retrieved documents, tool outputs, third-party data, and the agent’s own memory. Critically, agentic systems remove the human buffer between system output and real-world consequence: the agent acts directly. Red teaming agentic AI therefore requires multi-level threat modelling across the full autonomous pipeline, not just evaluation of individual model responses.
Financial services firms deploying AI agents face a specific risk profile: agents with trade execution authority, compliance monitoring write-back capability, client communications access, or positions within multi-agent orchestration layers can cause irreversible, regulated harm if compromised. A single successful prompt injection or goal hijacking attack against such an agent can result in unauthorised transactions, data exfiltration, or regulatory breaches. EY’s Responsible AI Pulse Survey (975 C-suite leaders, 2025) found 99% of large organisations have already suffered financial losses from AI-related risks, averaging $4.4 million. Red teaming is the mechanism that surfaces these failure modes before they surface in incidents.
A structured agentic threat library should cover: direct prompt injection; indirect prompt injection through retrieved content (documents, emails, web pages); tool poisoning and tool misuse; goal hijacking; privilege escalation through chained actions; multi-agent collusion and inter-agent trust boundary exploitation; context and credential exfiltration; memory manipulation; and jailbreaks. Multi-agent orchestration systems require a dedicated testing layer: when one agent can instruct another, a compromised orchestrator can propagate malicious instructions downstream across the entire pipeline.
Red teaming should begin before production deployment and continue as an ongoing control. Pre-deployment, every agent should be classified by autonomy level and risk tier, with adversarial testing depth calibrated accordingly – agents with external action authority (payments, communications, regulatory filings) require full red teaming with signed-off remediation before go-live. Post-deployment, agents must be re-tested when their tools, prompts, underlying models, or operational context change. New attack patterns published by frontier labs, security researchers, and regulators should be added to the firm’s threat library continuously.
The EU AI Act requires high-risk AI systems to demonstrate robustness and cybersecurity, with adversarial robustness testing obligations expected to apply from late 2027. NIST’s AI Risk Management Framework includes adversarial testing as part of its Measure function. NIS2 and DORA require firms to test the resilience of their digital systems. None of these frameworks was written with agentic AI front of mind, but each implicitly requires what red teaming provides. Regulators in financial services are already asking adversarial testing questions in supervisory engagements ahead of formal obligations.
Best practice comprises four capabilities: a structured agentic threat library mapped to frameworks including OWASP ASI 2026 and MITRE ATLAS; tooling that combines automated testing at scale with human red teamers for context-aware, creative attacks; calibration of testing depth to autonomy level and risk tier; and auditor-ready documentation – test scope, attack scenarios attempted, results, residual risk, and controls implemented. Red team findings should feed directly into the firm’s controls library, KRIs, and incident response framework, not sit as a separate workstream.


