GPT-4 can independently exploit one-day vulnerabilities with a success rate of up to 87%

Father · Apr 18, 2024

A study from the University of Illinois Urbana-Champaign (UIUC) showed that GPT-4, combined with automation tools, can exploit one-day vulnerabilities (publicly disclosed but not yet patched) after reading their descriptions. The success rate reaches 87%.

In a comment to The Register, one of the study's co-authors noted that such an AI pentesting assistant costs $8.80 per exploit, almost three times cheaper than half an hour of a specialist's work.
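The cost comparison can be checked with a few lines of arithmetic. The sketch below assumes a specialist rate of $50 per hour; the post does not state the rate, so treat that figure and the names `HOURLY_RATE_USD` and `cost_ratio` as illustrative assumptions.

```python
# Cost per exploit for the GPT-4 agent, as quoted in the post.
AGENT_COST_USD = 8.80

# ASSUMPTION: the post gives no hourly rate; $50/hour is used for illustration.
HOURLY_RATE_USD = 50.0
HOURS_PER_EXPLOIT = 0.5  # "half an hour of specialist work"


def cost_ratio(agent_cost: float, hourly_rate: float, hours: float) -> float:
    """Return how many times cheaper the agent is than the human specialist."""
    human_cost = hourly_rate * hours
    return human_cost / agent_cost


ratio = cost_ratio(AGENT_COST_USD, HOURLY_RATE_USD, HOURS_PER_EXPLOIT)
print(f"Human: ${HOURLY_RATE_USD * HOURS_PER_EXPLOIT:.2f}, "
      f"agent: ${AGENT_COST_USD:.2f}, ratio: {ratio:.1f}x")
```

Under that assumed rate, half an hour of human work costs $25, which gives a ratio just under three and matches the "almost three times cheaper" wording.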

The working GPT-4 agent was built with the LangChain framework, using its ReAct agent implementation. The code is 91 lines long, and the prompt is 1,056 tokens (OpenAI asked the authors not to publish the prompts; they are available on request).
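For readers unfamiliar with ReAct, the pattern alternates model "actions" with tool "observations" until the model gives a final answer. The sketch below is not the authors' agent and does not use LangChain: it is a toy loop in plain Python, with a scripted stand-in for the model and one harmless placeholder tool. All names in it (`scripted_model`, `read_advisory`, `react_loop`) are hypothetical.

```python
from typing import Callable, Dict, List


def scripted_model(transcript: List[str]) -> str:
    """Stand-in for an LLM call (hypothetical): picks a canned next step."""
    observations = sum(1 for line in transcript if line.startswith("Observation:"))
    if observations == 0:
        return "Action: read_advisory[CVE-XXXX-YYYY]"
    return "Final Answer: advisory read, summary recorded"


def read_advisory(cve_id: str) -> str:
    """Placeholder tool: returns canned text instead of fetching anything."""
    return f"(placeholder advisory text for {cve_id})"


TOOLS: Dict[str, Callable[[str], str]] = {"read_advisory": read_advisory}


def react_loop(task: str,
               model: Callable[[List[str]], str],
               tools: Dict[str, Callable[[str], str]],
               max_steps: int = 5) -> List[str]:
    """Alternate model steps and tool observations until a final answer."""
    transcript = [f"Task: {task}"]
    for _step in range(max_steps):
        step = model(transcript)
        transcript.append(step)
        if step.startswith("Final Answer:"):
            return transcript
        # Parse a step of the form "Action: tool_name[argument]".
        name, _sep, rest = step.removeprefix("Action: ").partition("[")
        argument = rest.rstrip("]")
        transcript.append(f"Observation: {tools[name](argument)}")
    transcript.append("Final Answer: step limit reached")
    return transcript


transcript = react_loop("summarize the advisory", scripted_model, TOOLS)
for line in transcript:
    print(line)
```

In a real agent the scripted function would be replaced by a call to the model with the transcript as context, and the step limit guards against loops that never reach a final answer.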

Testing covered 15 simple vulnerabilities in websites, container software, and Python packages; more than half of them were rated critical or high severity. GPT-4 failed in two cases: CVE-2024-25640 (XSS in the Iris collaboration platform) and CVE-2023-51653 (RCE in the Hertzbeat monitoring system). The Iris interface proved too difficult for the agent to navigate, and the detailed write-up of the Hertzbeat flaw was in Chinese (the test agent worked only in English).

Notably, when studying vulnerability descriptions, the agent followed the links in them for more information. Eleven of the targets were disclosed after GPT-4's training cutoff, and on these the success rate was slightly lower, at 82%. Withholding the advisory text (the CVE description) dropped the success rate to 7%.
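The headline percentages follow from small counts. The sketch below reproduces them; 13 of 15 comes directly from the two reported failures, while 9 of 11 is an inference from the 82% figure for the 11 vulnerabilities the model had not seen in training, not a number stated in the post.

```python
def success_rate(successes: int, total: int) -> int:
    """Return the success rate as a whole-number percentage."""
    return round(100 * successes / total)


# 15 vulnerabilities tested, 2 failures reported in the post.
overall = success_rate(15 - 2, 15)

# ASSUMPTION: 9 successes out of 11 is inferred from the reported 82%.
unseen = success_rate(9, 11)

print(f"Overall: {overall}%  |  Not seen in training: {unseen}%")
```

With only 15 targets, each vulnerability moves the overall rate by almost seven percentage points, so the figures are best read as indicative rather than precise.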

For comparison, the researchers also tested GPT-3.5, open-source large language models (LLMs) including the popular Llama, and the vulnerability scanners ZAP and Metasploit. All of them scored 0%. Tests of Anthropic's Claude 3 and Google's Gemini 1.5 Pro, GPT-4's main competitors in the commercial LLM market, had to be postponed for lack of access.

• Source: https://arxiv.org/pdf/2404.08144.pdf
