Security ProbLLMs in xAI's Grok: A Deep Dive
Grok is the chatbot of xAI. It’s a state-of-the-art model, chatbot and recently also API. It has a Web UI and is integrated into the X (former Twitter) app, and recently it’s also accessible via an API.
Since this post is a bit longer, I’m adding an index for convenience:
Table of Contents
- High Level Overview
- Analyzing Grok’s System Prompt
- Prompt Injection from Other User’s Posts
- Prompt Injection from Images
- Prompt Injection from PDFs
- Conditional Prompt Injection and Targeted Disinformation
- Data Exfiltration - End-to-End Demonstration
- Rendering of Clickable Hyperlinks to Phishing Sites
- ASCII Smuggling - Crafting Invisible Text and Decoding Hidden Secrets
- Automatic Tool Invocation
- Responsible Disclosure
- Conclusion
High Level Overview
Over the last year I have used Grok quite a bit. It’s pretty good and I use it daily, especially the recent image creation capabilities are impressive, but I hadn’t really looked at it from a security perspective. So, I decided to assess the overall security posture with some of the latest LLM threats discovered over the last 18 months, and responsibly disclose findings to xAI.
This post examines Grok’s vulnerabilities against modern LLM application security threats, including prompt injection, data exfiltration, conditional attacks, disinformation and ASCII Smuggling.
Despite its innovative features, Grok’s security posture and responses to findings lags behind competitors. Here’s what I discovered.
Analyzing Grok’s System Prompt
Of course the first thing to look for are prompt injection attack angles. We have the classic direct prompt injection to grab the system prompt. Grok doesn’t really attempt to prevent that, which in my opinion, is a good thing.
Vendors should just publish their system prompts.
While testing, the prompt changed frequently. You can find a dump of it here.
At one point a sentence about the current and future US presidents was added. During Thanksgiving there was a note about the exact day of the holiday, and towards the end of my tests the user’s name
, handle
and region
were added to the system prompt as well.
Here is that part of info from the system prompt:
User is in the country US.
Name of the user is Johann Rehberger.
User's X handle is wunderwuzzi23.
This is noteworthy as it enables more advanced “conditional” prompt injection attacks, more about that later.
Grok can now analyze untrusted data like documents, images, or posts from other users. This is where we start now exploring security implications of prompt injection within Grok.
Prompt injection can impact all three aspects of the CIA security triad, Confidentiality, Integrity and Availability.
Let’s explore this more.
Prompt Injection From Other User’s Posts
Here is a simple demo of a prompt injection from another user. This is the post containing the prompt injection instructions:
And here is the result of analyzing the post from the other user:
As you can see, rather than summarizing the post Grok followed the basic instructions and printed the text and some emoji. This simple, benign, demo highlights a fundamental design flaw in LLM-powered applications, making it impossible to fully trust responses.
This simple test was to confirm that xAI has not found a solution to prompt injection.
Prompt Injection from Images
Similarly, prompt injection can also come from images that a user uploads or analyzes. An image can contain text or leverage other techniques to trick the model.
In this demo, rather that analyzing the image it just prints out a very specific text:
Pretty straightforward. And as we have seen last year already, it’s possible to hide text by changing the font color accordingly.
Prompt Injection from PDFs
One day while testing, a feature to upload pdfs was added (for Premium users only). Here is a demo of indirect prompt injection coming from a third-party pdf document.
This is a basic example showing that an attacker fundamentally controls the output Grok will produce when analyzing or summarizing a document from an untrusted source.
The links in the response could point to an attacker controlled phishing page.
Conditional Prompt Injection and Targeted Disinformation
Next, a few weeks ago xAI also added the user’s name
, handle
and region
(like US) to the prompt context. This means that during a prompt injection attack instructions can be added that are targeting specific users or user’s in specific regions.
Let’s walk through an example to understand the implications
Let’s say two users, Mike and Johann, analyze the same pdf document to create a summary of it. However, the author of the pdf wants to trick Johann and feed him disinformation. To do this the attacker adds conditional instructions that target Johann specifically.
Now, if Mike summarizes a pdf document about Einstein, Grok correctly tells Mike that Einstein was born in Ulm, Germany
. However, when Johann analyzes the exact same pdf document, Grok says Einstein was born in Seattle, USA
, and furthermore that Einstein said the Earth is flat.
Here is a demonstration of such an exploit when summarizing the same document:
As you can see, Grok followed the conditional instructions in the larger pdf document precisely and tells Johann that “Einstein said the Earth is flat”.
Furthermore, because the region is also included in the prompt context by xAI, it is possible at to target entire regions (like US) which such conditional instructions.
And notice that Grok did not follow the other instructions for Elon Musk
or users with region Canada
.
Data Exfiltration - End-to-end Demonstration
Let’s explore data exfiltration attacks. The basic, to-go, data exfiltration technique that many LLM-powered apps are vulnerable to is rendering of zero-click resources from third party domains. Most commonly, via markdown image links.
Testing on the X iOS application showed that Grok can leak user data, because it renders images from third party servers. An adversary can combine this vulnerability with prompt injection and to hijack Grok and trigger a data leak, and it worked.
The result was that Grok sent the previous chat information to the third party server.
And the server received the data:
It turns out that the Grok web app is not vulnerable to this, only the iOS app. The xAI team initially said they can’t reproduce this, even though it worked quite reliably, multiple others had tested it also.
Attempted Fix or Code Churn?
However, a few weeks later the initial proof-of-concept stopped working. It’s unclear if that was an attempt to fix the problem, or just regular code churn.
Bypassing the Fix
But there is a bypass. I shared it with xAI again, but they closed the ticket, so it still works. xAI claims there is no practical impact with the reported vulnerability. I’m not sure how leaking user’s chat messages and IP address is not a vulnerability, the question is more about severity.
The bypass is reliable and was succesfully reproduced to leak data at first try by three different, consenting, Grok users.
Shout out and thanks to the volunteers!
What made this even more interesting, was that since my initial report, xAI added username
, handle
and region
to the system prompt. So, with the bypass Grok can be hijacked to always at least leak that information. The attacker also retrieves the full IP address
of course.
When Grok analyzes the malicious image it follows the instructions:
And as seen in below screenshot, the server received the data:
An adversary could harvest thousands information of a large amount of users by creating a viral image and posting. That does not yet take into account stealing other info in the chat context, and future feature additions, e.g maybe Grok gets access to DMs maybe one day?
The attack vector doesn’t have to be an image, it could be via other avenues, like from another user’s post, or a web search, etc… and not to forget that Grok could have a backdoor and just to it whenever it “feels” like it. Although that is quite unlikely, still worth highlighting because putting AI in a sandbox is important and should be a general security principle.
Rendering of Clickable Hyperlinks to Phishing Sites
Furthermore, the rendering of clickable hyperlinks is also possible via markdown syntax. It is also a way to have data leaked. Although this requires the user to click on the link to leak data.
So, be careful what you analyze and what links you click on with Grok as they may leak data or take you to a phishing site.
ASCII Smuggling - Crafting Invisible Text and Decoding Hidden Secrets
Turns out Grok is vulnerable to ASCII Smuggling as well. Unicode Tags, which can embed to the user invisible instructions, allow attackers to exploit a hidden communication channel.
There are two parts to this, interpreting hidden text and responding with, to the user, invisible text.
Hidden Prompt Injection
Here is an example of hidden instructions within the post that is being summarized:
The reason this works is because the text contains hidden Unicode Tag characters that confuse Grok.
If you are curiuous, this is the text containing hidden instructions:s
What is 1+1?
Pretty cool.
Creation of Invisible Text
Grok can also create invisible text. This short demo video shows it:
We discussed possible exploitation vectors in the past, and also demonstrated with Microsoft Copilot how this can lead to data exfiltration.
It’s fascinating that so many models behave so similarly, really begs the question why some companies don’t team up in the creation of foundational models.
Grok API is Vulnerable to ASCII Smuggling
This vulnerability is baked into the model, hence it also applies to the Grok API and any LLM-powered application on top of it.
Here a simple demo app using the API to show Grok sees text that we humans don’t see:
And Grok can see write text we humans don’t see:
To better understand the implications, take a look at this exploit chain in Microsoft Copilot to get an idea on how attacks look like. Microsoft has since fixed ASCII Smuggling.
Developer Guidance for Grok API
If you are building on top of the Grok API consider this novel threat and remove Unicode Tag code points from input and output, if they are not needed.
Automatic Tool Invocation
Automatic Tool Invocation can be one of the more dangerous threats when it comes to AI applications and prompt injection. So far Grok does not have a lot of tools available it seems, however I was interested to see if the image creation tool could be invoked via prompt injection. But it was not possible - so, that’s good news.
Hopefully that pattern remains once Grok has more powerful tools it can interact with.
Responsible Disclosure
All issues mentioned in this post were responsibly disclosed to xAI. Over the course of multiple weeks I answered many questions around prompt injection, severity, impact, attack vectors, etc. However, all issues have been closed as “Informational”.
Therefore, sharing this information publicly is essential to raise awareness about how attackers can exploit Grok to manipulate outputs and even leak information.
By bringing these vulnerabilities to light, users and developers can make informed decisions about how best to mitigate these threats and protect themselves.
If some of the vulnerabilities are addressed in the future, and I learn about it, I will update this section. The fixes for data leakage and ASCII Smuggling are quite straightforward.
Conclusion
Overall, I like using Grok and its image generation capabilities are amazing (not the latest update from a few days ago though). It’s also quite fast and the recently added UI Focus Mode is dope.
However, security-wise, it lags behind competitors. Specifically, sandboxing Grok on the iOS app via Content Security Policies, removing hidden Unicode Tags in the app (and/or documenting the behavior for developers in the API), are simple changes that will mitigate exploitation vectors.
As new features are being added, the severity of described vulnerabilities will increase over time.
Users should exercise caution while engaging with third-party content.
It’s also worth pointing out that truth-seeking is a fundamental challenge because indirect prompt injection can hijack Grok’s response and make the output attacker controlled. Combining this with hidden and conditional prompt injection (e.g. with prompt instructions that only target certain users, or users in certain regions), this threat could become a real-world exploitation vector for disinformation
Leopold Aschenbrenner mentioned in his paper Situational Awareness, that during the race to superintelligence there will be a period where various AI Labs will heavily compete for getting there first. If something goes wrong, it would most likely happen during that period. It’s already evident that safety and security is likely an area where corners will be cut.
Currently, xAI is making a lot of progress catching up on the feature side with other vendors, and hopefully that will soon also include the security aspects of Grok.
Anyhow, Happy Hacking!
References
- xAI Grok API - Beta Announcement
- Bing Chat - Data Exfiltration Explained
- ASCII Smuggling
- Trust No AI: Prompt Injection Along the CIA Security Triad
- AI Injections: Direct and Indirect Prompt Injections and Their Implications
- Microsoft Copilot: From Prompt Injection to Exfiltration of Personal Information