Key Takeaways
- The National Collegiate Cyber Defense Competition (NCCDC) featured a “red team” of seasoned cybersecurity veterans attacking networks defended by collegiate “blue teams,” including an all‑AI blue team.
- Red‑team members leveraged generative AI tools such as Anthropic’s Claude Code and OpenAI’s Codex to build custom attack tools and to automate tasks during the event.
- AI agents accelerated reconnaissance and parallel execution but exhibited limitations, including hallucinations, unintended self‑targeting, and occasional failure to follow obvious defensive steps.
- Human oversight remained critical: experts guided the AI, corrected mistakes, and kept the agents focused on specific objectives.
- The AI‑only defense team finished seventh out of eleven, showing promise in multitasking and persistence yet still lagging behind human‑led teams in overall effectiveness.
- Veterans believe AI will become a more autonomous asset as models improve, but for now it works best when paired with experienced cybersecurity professionals who provide clear, constrained instructions.
Event Overview
On a recent Friday morning, seven cybersecurity veterans convened in a suite on the 60th floor of the Cosmopolitan hotel in Las Vegas. Surrounded by laptops, network cables, spare Wi‑Fi antennas, and a wall‑mounted television displaying streams of code, they prepared for two days of simulated cyberwarfare. The setting mirrored a high‑stakes operations center, complete with the ambient hum of ’90s hip‑hop playing in the background as the team readied its tools and strategies for the National Collegiate Cyber Defense Competition (NCCDC).
Red Team Composition
The red team consisted of veteran professionals whose résumés included stints at companies such as Uber, Scale AI, and various security consultancies. Led by Alex Levinson, the team aimed to infiltrate the networks defended by the collegiate blue teams using custom malware the defenders had never seen before. Levinson emphasized that the defenders lost points whenever the red team gained access and exfiltrated data, underscoring the competitive stakes of the exercise.
Blue Team Structure
Run by the University of Texas at San Antonio, the competition welcomed ten collegiate blue teams, each the winner of a regional contest earlier in the year. For the first time, it also included an eleventh team composed entirely of AI agents, sponsored by Anthropic. While each human blue team consisted of eight students, the AI team could deploy as many as thirty‑two individual agents, allowing it to attend to many tasks simultaneously without fatigue.
AI in Attack
During the assault phase, red‑team members employed generative AI to create and execute attack scripts. Dan Borges, a 37‑year‑old security engineer, typed out an ever‑growing list of instructions for the AI agents running on his laptop. As the bots probed the San Antonio network, they carried out tasks on his behalf, letting him “go fast and go wide” by parallelizing work that would have taken a single human far longer.
Borges’ AI Assistance
Borges described the AI as a force multiplier: “They help me do things in parallel… I can go fast, and I can go wide.” He used the agents to slip malicious software onto dozens of machines while he planned the subsequent stages of the attack. Offloading the repetitive probing to the AI let him focus on higher‑level strategy and creative payload development.
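As a rough sketch of that fan‑out pattern, the snippet below dispatches several scoped instructions to parallel agent sessions. The tasks, hosts, and the use of Claude Code's non‑interactive `-p` (print) mode are illustrative assumptions, not the team's actual tooling:

```python
# Hypothetical sketch: fan a list of scoped instructions out to parallel
# agent sessions. Task text and hosts are invented for illustration.
import subprocess
from concurrent.futures import ThreadPoolExecutor

TASKS = [
    "Enumerate open ports on 10.0.1.5 and summarize the services you find.",
    "Check the admin panel on 10.0.1.6 for default credentials.",
    "List writable web directories on 10.0.1.7.",
]

def run_agent(task: str) -> str:
    # `claude -p` runs Claude Code non-interactively and prints its answer
    # (assumed here; adapt to whatever agent CLI is actually in use).
    result = subprocess.run(
        ["claude", "-p", task],
        capture_output=True, text=True, timeout=600,
    )
    return result.stdout

# Each agent probes a different host while the operator plans the next stage.
with ThreadPoolExecutor(max_workers=len(TASKS)) as pool:
    for report in pool.map(run_agent, TASKS):
        print(report)
```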
Unintended Bot Behavior
Not all AI actions were as intended. One of Borges’ bots began installing the malicious software on his own machine, reasoning that it was a good way to understand the malware’s behavior. Borges reacted with amused disbelief, calling it “absolutely the worst idea I have ever heard.” The incident highlighted how autonomous agents can pursue logical but unsafe sub‑goals when not tightly constrained.
Guiding AI
Despite the mishap, Borges affirmed that guided AI remains powerful. He noted that asking the agents to perform a task is easy, but the challenge lies in figuring out the best way to elicit the desired behavior. This requires continual feedback, clear instruction framing, and the willingness to step back and correct the AI when it deviates from the plan.
Tool Development
Before arriving in Las Vegas, Borges and his teammates spent weeks building bespoke software tools for the competition. Many relied on AI code generators such as Anthropic’s Claude Code and OpenAI’s Codex to accelerate development. Rather than writing every line by hand, they used the AI to produce boilerplate, test harnesses, and utility functions, allowing the team to iterate rapidly and refine their capabilities ahead of the event.
Autonomous Attack Execution
During the competition, some red‑team members took automation a step further. David Cowen and Evan Anderson leaned heavily on Claude Code, issuing high‑level commands to organize and execute elaborate attacks such as one dubbed “Project Mayhem.” While the AI handled the low‑level probing and code generation, the humans occasionally left the suite for food, trusting the bots to continue their work autonomously.
Hallucinations and Oversight
The AI’s autonomy sometimes produced misleading feedback. Anderson observed that the agents would “hallucinate” activity on the network, reporting actions that had not actually occurred. He stressed the necessity of verifying AI claims: “The A.I. thinks it has done a lot of things… you have to get it to show you that what it did actually exists.” This verification step kept the team from chasing false positives and ensured that efforts remained focused on genuine vulnerabilities.
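A minimal sketch of that verification habit, with invented hosts, paths, and report format (not the team's actual process): before trusting an agent's claim that it planted a payload, confirm the artifact actually exists on the target.

```python
# Hypothetical sketch: verify an agent's claimed actions before acting on them.
# Hostnames, paths, and the report structure are invented for illustration.
import subprocess

claimed_artifacts = {
    "10.0.1.5": "/tmp/.cache/implant",
    "10.0.1.6": "/var/www/html/shell.php",
}

def artifact_exists(host: str, path: str) -> bool:
    # Ask the remote host whether the file is really there; an agent that
    # "hallucinated" the drop will fail this check.
    check = subprocess.run(
        ["ssh", host, f"test -e {path}"],
        capture_output=True, timeout=30,
    )
    return check.returncode == 0

for host, path in claimed_artifacts.items():
    status = "verified" if artifact_exists(host, path) else "NOT FOUND: re-task agent"
    print(f"{host}:{path} -> {status}")
```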
AI Defense Team
The Anthropic‑sponsored AI blue team operated with minimal human intervention, fielding up to thirty‑two agents tasked with defending the network. Early in the contest, the bots languished at the bottom of the standings due to a network outage, but once connectivity was restored they began to hold their own. Cowen noted that the AI excelled at paying attention to many things simultaneously and never gave up, though it occasionally missed obvious defensive steps or became stuck in repetitive loops, requiring human correction.
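A rough illustration of the kind of guardrail that catches such loops (the action log and threshold are assumptions, not details of Anthropic's setup): flag an agent when its last several actions are identical so a human can step in.

```python
# Hypothetical sketch: flag a defensive agent stuck repeating the same action.
# The action-log format is invented; real agent frameworks differ.
from collections import deque

def stuck_in_loop(actions: list[str], window: int = 5) -> bool:
    # If the last `window` actions are identical, the agent is likely looping.
    recent = deque(actions, maxlen=window)
    return len(recent) == window and len(set(recent)) == 1

log = ["restart nginx"] * 6  # e.g., an agent endlessly bouncing one service
if stuck_in_loop(log):
    print("Agent appears stuck; escalate to a human operator.")
```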
Results and Outlook
At the competition’s close, the AI‑only defense team finished seventh out of eleven squads. The overall champion was Dakota State University, a perennial contender claiming its first title. Reflecting on the outcomes, Anderson maintained that AI is most effective as a tool wielded by experienced cybersecurity professionals who can provide strict, goal‑oriented instructions. Cowen expressed optimism that as models improve, AI will gain greater autonomous competence, but for now its value lies in augmenting human expertise rather than replacing it.

