Vulnerability in GPT-4: rare languages force ChatGPT to give advice on theft and terrorism

Translating requests into poorly supported, low-resource languages makes it possible to bypass OpenAI's safety restrictions.

A recent study by researchers at Brown University in the United States found that the safety restrictions that prevent OpenAI's GPT-4 from producing harmful text can easily be circumvented by translating requests into rarely used languages such as Zulu, Scottish Gaelic, or Hmong.

The researchers tested this by translating 520 potentially harmful queries from English into other languages and back, and found that with languages such as Zulu, Scottish Gaelic, Hmong, and Guarani, the safety restrictions could be circumvented in about 79% of cases, whereas the same requests in English were blocked 99% of the time. The bypass proved particularly effective for requests related to terrorism, financial crime, and the spread of disinformation.

However, the approach does not always succeed: GPT-4 sometimes produces meaningless responses, which may be due to flaws in the model as well as translation errors.


The researchers translated the input from English into Zulu (zu), after which ChatGPT explained how to shoplift without being noticed.

The experiments show that AI developers should take low-resource languages into account when evaluating the safety of their models. The scarcity of training data for such languages has long created a technological gap for their speakers. Now, however, it also creates risks for all users of large language models (LLMs), since publicly available machine translation APIs can be used to bypass LLM safety measures.

OpenAI acknowledged the significance of the study and said it intends to take the findings into account. The work underscores the need for comprehensive approaches to AI safety, including better training of models on low-resource languages and the development of more effective filtering mechanisms.
 