When your AI Assistant Become Evil Google Gemini Prompt Injection hacked

We demonstrate how Google Gemini Advanced can be coerced into performing social engineering attacks.

By sending a malicious email, attackers can manipulate Gemini’s responses when analyzing the user’s mailbox, causing it to display convincing messages that trick the user into revealing confidential information from other emails.
Currently, there is no foolproof solution to these attacks, making it crucial for users to exercise caution when interacting with LLMs that have access to their sensitive data.

Our demo scenario

The video below demonstrates a phishing attack where an attacker’s email uses Gemini to deceive a user into divulging a secret code found in another email.

“The attack proceeds as follows:
Malicious Phishing Soc Email: An attacker sends an email containing a prompt that tricks Gemini into displaying a deceptive message to the user. This message falsely claims that a new Gemini version is available and asks for a code to activate it.
Gemini is then instructed to locate and reveal the confidential information the attacker seeks (like a recovery or MFA code) from the user’s mailbox, disguising it in base64 format to conceal its true purpose.”

User Interaction with Gemini: The user asks Gemini to summarize their recent emails.

Prompt Injection: When Gemini processes the attacker’s email, the malicious prompt is activated. This causes Gemini to append the phishing message and instructions to the end of its summary. This deceptive content appears separate from the summary, mimicking a legitimate message from Gemini.”

“Data Compromise: If the user follows the instructions and submits the ‘activation code,

‘ the attacker obtains access to the confidential information.

Note: In this demonstration, the malicious prompt is not hidden. This is because the user is relying on Gemini to manage their inbox and may not personally read the emails.”

“Developers of LLM assistants should implement strong safeguards to protect LLM input and output. We explore these key recommendations and strategies in our webinar titled ‘Building Secure LLM Applications’ and in the accompanying security canvas.”

Suggestion from Mahendra Ribadiya for Developers

Strengthening LLM Security

To mitigate the risks of malicious prompts and ensure the integrity of LLM-powered applications, developers should consider the following strategies:

Input Sanitization: Implement rigorous input validation and sanitization techniques to prevent malicious code injection and prompt manipulation.
Output Filtering: Carefully monitor LLM output for potential harmful content or unintended biases. Employ filters and safeguards to block or modify inappropriate responses.
Prompt Engineering: Craft prompts that are clear, concise, and avoid ambiguity. Use techniques like few-shot learning and fine-tuning to guide the LLM towards desired outcomes.
Regular Updates: Stay informed about the latest security vulnerabilities and best practices for LLM development. Update your models and libraries regularly to address potential threats.
User Education: Educate users about the risks of malicious prompts and encourage them to report any suspicious interactions.
PromptHeal Audits: Consider engaging independent security experts to conduct regular audits of your LLM applications to identify potential vulnerabilities.
Continuous Monitoring: Implement robust monitoring systems to detect and respond to security incidents promptly.

By adopting these measures, developers can significantly enhance the security and reliability of their LLM-powered applications, protecting both user data and the integrity of the AI systems they create.

Contact for AI Developer Training Webinar

Contact for Automated Redteaming Saas HealBox

Contact Realtime Monitoring SaaS HealGuard

When your AI Assistant Become Evil Google Gemini Prompt Injection hacked

Our demo scenario

Suggestion from Mahendra Ribadiya for Developers

Submit a Comment Cancel reply