In a demonstration of artificial intelligence capabilities, researchers have successfully infiltrated over half of their test websites using autonomous teams of GPT-4 bots. These bots, which coordinate their efforts and spawn new bots as needed, exploited previously unknown real-world zero-day vulnerabilities.
Autonomous Exploits and the Evolution of AI
Just a few months ago, a research team made headlines by leveraging GPT-4 to autonomously exploit one-day (N-day) vulnerabilities: security flaws that have been publicly disclosed but remain unpatched. Provided with the description from the Common Vulnerabilities and Exposures (CVE) entry, GPT-4 could exploit 87% of critical-severity CVEs independently. This result was significant, showcasing the potential of AI to exploit known vulnerabilities.
Fast forward to this week, and the same group of researchers has released a follow-up study. They claim to have successfully exploited zero-day vulnerabilities—previously unknown security flaws—using a team of autonomous, self-replicating Large Language Model (LLM) agents. This was achieved through a method known as Hierarchical Planning with Task-Specific Agents (HPTSA).
HPTSA: A New Paradigm in AI Coordination
HPTSA represents a significant shift in how AI systems tackle complex tasks. Rather than relying on a single LLM agent to manage numerous intricate tasks, HPTSA employs a “planning agent” that orchestrates the entire process. This planning agent deploys multiple task-specific “subagents,” each an expert in its designated function. This hierarchical approach mimics a traditional business structure where a manager delegates tasks to specialized subordinates, thereby alleviating the burden on any single agent.
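The manager-and-specialists pattern described above can be sketched in a few lines of Python. This is only a minimal illustration of hierarchical delegation, not the researchers' implementation: the class names `PlanningAgent` and `SubAgent`, the `plan` logic, and the placeholder "survey" and "report" specialties are all assumptions made for the example, and the `handler` callables stand in for what would be LLM calls in a real system.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class SubAgent:
    """Hypothetical task-specific worker; `handler` stands in for an LLM call."""
    specialty: str
    handler: Callable[[str], str]

    def run(self, task: str) -> str:
        # Tag the result with the specialty so the planner can tell outputs apart.
        return f"[{self.specialty}] {self.handler(task)}"


@dataclass
class PlanningAgent:
    """Hypothetical planner that delegates work instead of doing it itself."""
    subagents: Dict[str, SubAgent] = field(default_factory=dict)

    def register(self, agent: SubAgent) -> None:
        self.subagents[agent.specialty] = agent

    def plan(self, goal: str) -> List[Tuple[str, str]]:
        # Placeholder planning step: one task per registered specialty.
        return [(name, f"{name} pass over {goal!r}") for name in self.subagents]

    def execute(self, goal: str) -> List[str]:
        # Dispatch each planned task to the matching specialist and collect results.
        return [self.subagents[name].run(task) for name, task in self.plan(goal)]


planner = PlanningAgent()
planner.register(SubAgent("survey", lambda task: f"completed {task}"))
planner.register(SubAgent("report", lambda task: f"completed {task}"))

results = planner.execute("demo target")
for line in results:
    print(line)
```

The point of the structure is that the planner never needs to hold the details of every task at once; it only decides who should handle what, and each specialist sees a narrow, focused task.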
AI Teamwork
When benchmarked against 15 real-world web-focused vulnerabilities, HPTSA was reported to be 550% more efficient than a solitary LLM. The autonomous team successfully exploited 8 of the 15 zero-day vulnerabilities, while a single LLM managed only 3. The two headline numbers measure different things: 8 versus 3 is a comparison of raw success counts, whereas the basis of the 550% efficiency figure is not spelled out here.
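For readers who want to put the success counts quoted above side by side, the arithmetic is short. The sketch below uses only the 8-of-15 and 3-of-15 figures from the text; the variable names are illustrative, and the calculation says nothing about how the separate 550% efficiency figure was derived.

```python
TOTAL_VULNERABILITIES = 15
team_successes = 8      # multi-agent team
single_successes = 3    # single LLM agent

team_rate = team_successes / TOTAL_VULNERABILITIES
single_rate = single_successes / TOTAL_VULNERABILITIES

# How many times more vulnerabilities the team exploited.
ratio = team_successes / single_successes
# The same comparison expressed as a percentage increase.
relative_increase_pct = (ratio - 1) * 100

print(f"Team success rate:   {team_rate:.1%}")              # 53.3%
print(f"Single success rate: {single_rate:.1%}")            # 20.0%
print(f"Ratio:               {ratio:.2f}x")                 # 2.67x
print(f"Relative increase:   {relative_increase_pct:.0f}%") # 167%
```

On raw counts alone, then, the team exploited roughly 2.7 times as many vulnerabilities as the single agent.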