AI protected: Microsoft introduces new features to combat hacking of neural networks

Father · Apr 15, 2024

Spotlighting and Watchdog are designed to be a key link in protecting user data when communicating with chatbots.

Microsoft announced the creation of new technologies aimed at countering threats related to hacking AI systems. The AI Spotlighting and AI Watchdog functions will protect against two types of attacks: the introduction of malicious instructions and the use of" poisoned " content.

Two new approaches to security

AI Spotlighting separates user instructions from malicious content, which allows the AI system to analyze data without the risk of handling hidden threats.
AI Watchdog works like a search dog, recognizing hostile directions and preventing attempts to hack the system.

In addition, the corporation introduced a new tool for AI researchers and security professionals — PyRIT (Python Risk Identification Toolkit). A set of tools helps you identify risks and vulnerabilities in AI systems in advance.

Attack scenarios and methods for neutralizing them
An attacker can use 2 main methods of attacking AI: manipulation of user requests and injection of malicious content.

In the first case, an attacker can give the AI malicious instructions via a user request, and in the second case, force the AI to process a seemingly harmless document that contains hidden instructions for the AI. For example, when analyzing a "poisoned" email, the AI can reset the password or transmit confidential information without the user's knowledge.

Microsoft warns that attacks using "poisoned content" have a high success rate – more than 20%. Spotlighting reduces the indicator to a level below the detection threshold, preserving the overall AI performance.

Multi-level defense and attacks of Crescendo
As part of strengthening security, Microsoft has developed a query filtering system that analyzes the entire history of interaction with AI to identify potential threats.

The filtering system is designed to protect against a new type of attack on AI, which Microsoft experts called Crescendo. Basically, Crescendo tricks the model into creating malicious content using its own responses. By asking carefully thought-out questions or hints that gradually lead the AI to the desired result, instead of asking the task immediately, you can bypass fences and filters — usually this can be achieved in less than 10 turns of interaction.

The company emphasizes that protection against consecutive requests that individually seem harmless, but together can lead to a violation of protective mechanisms, is key to ensuring the security of AI systems. The measures taken, according to Microsoft, significantly reduce the likelihood of a successful attack, strengthening the protection of AI systems in the face of constantly evolving cyber threats.

AI protected: Microsoft introduces new features to combat hacking of neural networks

Father

Professional

Similar threads