Steps to Protect Against AI Data Exposure
Microsoft researchers have identified a new type of “jailbreak” attack known as a “Skeleton Key,” capable of disabling the safeguards in generative AI systems that prevent the release of dangerous and sensitive data.
Understanding the Skeleton Key Attack
According to a Microsoft Security blog post, the Skeleton Key attack works by simply prompting a generative AI model to modify its encoded security features. For example, when an AI model is asked to generate a recipe for a Molotov Cocktail, it initially refuses due to safety guidelines. However, by informing the model that the user is an expert in a laboratory setting, the model may override its safety protocols and provide the recipe.
Risks and Implications
While similar information can often be found through search engines, the real danger lies in the potential exposure of personally identifiable and financial information. This type of attack is effective on many popular generative AI models, including GPT-3.5, GPT-4, Claude 3, Gemini Pro, and Meta Llama-3 70B.
Vulnerabilities of Large Language Models
Large language models such as Google’s Gemini, Microsoft’s CoPilot, and OpenAI’s ChatGPT are trained on vast datasets that may include sensitive information. The risk of exposing personal data depends on the selectivity of the data used for training. Businesses and institutions using AI models must be aware that their systems could be vulnerable if the base model’s training data is not adequately filtered for sensitive information.
Preventive Measures
To safeguard against such attacks, organizations can implement several strategies:
- Hard Coded Input/Output Filtering: Ensure that the AI model has strict filters for both input and output data to prevent the processing of unsafe or sensitive information.
- Secure Monitoring Systems: Implement monitoring systems that can detect and prevent advanced prompt engineering attempts that aim to bypass safety measures.
By adopting these practices, organizations can better protect themselves against potential data exposure resulting from Skeleton Key attacks.