AI agent from Tenzai ranks in top 1% of global CTFs
Tenzai said its autonomous penetration-testing system ranked in the top 1% across a set of global Capture-the-Flag (CTF) hacking competitions, ahead of more than 99% of human participants.
It called the result the first time an autonomous system has reached that level in CTF events that attract experienced security researchers and professional penetration testers. In the security community, CTF competitions are a common practical measure of exploitation skill, with participants solving vulnerability challenges under fixed scoring rules.
CTF benchmarks
The claim covers six platforms: websec.fr, dreamhack.io, websec.co.il, hack.arrrg.de, pwnable.tw, and Lakera's Agent Breaker. The sites host challenges ranging from web application flaws to multi-step exploitation problems.
Tenzai said it focused on larger competitions with tens of thousands of participants and a wide spread of difficulty. It also included Agent Breaker, which centres on security issues in AI agents. Tenzai argued this mix reduced the risk of overfitting to narrow or contrived test sets because the platforms are not presented as vendor benchmarks.
Many CTF platforms publish challenge write-ups after competitions, while some restrict access to solutions. Tenzai said a portion of the evaluated challenges had gated write-ups, making it less likely the answers appeared in public datasets used to train its system.
How it works
CTF contests tend to reward depth over breadth. The number of challenges is limited, and each problem carries a point value linked to difficulty. A high ranking usually depends on solving harder tasks that only a small subset of competitors complete.
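The scoring logic described above can be sketched in a few lines. This is an illustrative model only: the challenge names, point values, and leaderboard scores are hypothetical and are not taken from any of the platforms Tenzai named, and real platforms may use different rules, such as dynamic scoring.

```python
from bisect import bisect_left

def total_score(solved, points):
    """Sum the point values of the challenges a competitor solved."""
    return sum(points[name] for name in solved)

def percentile_rank(score, all_scores):
    """Percentage of participants whose score is strictly below `score`."""
    ordered = sorted(all_scores)
    below = bisect_left(ordered, score)  # count of scores < score
    return 100.0 * below / len(ordered)

# Hypothetical challenge board: harder challenges carry more points.
points = {"warmup": 50, "sqli": 100, "auth-bypass": 200, "chain": 500}

# Two easy solves score less than one hard solve.
breadth = total_score(["warmup", "sqli"], points)
depth = total_score(["chain"], points)

# Hypothetical leaderboard of final scores.
leaderboard = [50, 150, 150, 200, 500]
rank = percentile_rank(depth, leaderboard)
```

In this toy board, a single hard solve outscores two easy ones, which is why rankings tend to hinge on the challenges few competitors finish.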
Tenzai said its agent solved challenges across several vulnerability categories, including authentication bypass, insecure direct object reference, application logic flaws, and multi-stage exploitation chains. These themes are common in incident investigations and penetration tests, though conditions vary widely by environment.
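For readers unfamiliar with the term, an insecure direct object reference arises when an application looks up a record by a client-supplied identifier without checking that the caller is allowed to see it. The sketch below is a generic illustration, not a reconstruction of any challenge Tenzai's agent solved; the `Invoice` type, the in-memory store, and both handler functions are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Invoice:
    invoice_id: int
    owner: str
    amount: float

# Hypothetical in-memory store standing in for a database.
INVOICES = {
    1: Invoice(1, "alice", 120.0),
    2: Invoice(2, "bob", 75.5),
}

def get_invoice_vulnerable(current_user: str, invoice_id: int) -> Invoice:
    # Flaw: the lookup trusts the client-supplied ID and never
    # checks that the record belongs to current_user.
    return INVOICES[invoice_id]

def get_invoice_checked(current_user: str, invoice_id: int) -> Invoice:
    # Fix: enforce ownership before returning the record, and give the
    # same error for "missing" and "not yours" to avoid leaking which IDs exist.
    invoice = INVOICES.get(invoice_id)
    if invoice is None or invoice.owner != current_user:
        raise PermissionError("invoice not found or not accessible")
    return invoice
```

With the vulnerable handler, a user logged in as "alice" can read the invoice owned by "bob" simply by changing the ID. The checked handler rejects the same request.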
Several completed challenges required chaining multiple weaknesses in one system, Tenzai said. The approach mirrors attacker behaviour, where individual bugs may be low impact on their own but become serious when combined. Tenzai said its system relies less on brute-force enumeration and more on analysing application behaviour, including reasoning about identity and trust boundaries.
The announcement comes amid growing interest in machine-learning-based automated security testing. Teams already use automation for tasks such as dependency scanning, asset discovery, and alert triage. Offensive testing has historically been harder to automate beyond basic scanning and template-based checks, partly because exploitation often depends on context and complex system logic.
Industry context
Bug bounty programmes and CTF competitions have become proving grounds for tools and techniques. Bug bounties measure outcomes against real products in live environments, with rules set by the programme operator. CTFs use purpose-built targets with known scoring. Both can demonstrate practical skill, though they do not always map cleanly to enterprise environments with legacy systems, bespoke integrations, and strict operational constraints.
Security leaders also report ongoing difficulty hiring and retaining skilled offensive testers. Organisations often commission penetration tests periodically, such as ahead of product launches or compliance assessments. That cadence can leave gaps as systems change between assessments, particularly for teams that ship software frequently.
Tenzai said autonomous systems could run continuously and test new releases as they ship, potentially shortening compliance validation cycles and expanding coverage across more applications. These are long-standing goals in application security, though many firms still struggle to balance test depth against developer productivity and operational risk.
Tenzai was founded in 2025 by Pavel Gurvich, Ariel Zeitlin, Ofri Ziv, Itamar Tal, and Aner Mazur. It has raised $75 million in seed funding from Greylock Partners, Battery Ventures, and Lux Capital.
Gurvich framed the result as a sign that autonomous offensive testing is approaching the level of top human performers in select settings.
"At this point, our agent performs better than roughly 99% of people who participate in these competitions," said Pavel Gurvich, CEO and co-founder of Tenzai. "That's more than 125,000 human experts. There is still a small group of exceptional hackers who outperform current AI systems, and closing that gap remains an important challenge. But AI already allows us to bring elite offensive capabilities to organizations on demand and at a scale the industry has never had before."
Tenzai said it plans to keep improving the agent against the hardest CTF challenges, including scenarios that require longer exploitation chains and more specialised techniques.