Don't blindly trust LLM responses. Threats to chatbots.
In the previous post we examined various prompt injections (direct, indirect, and context pollution) in AI systems, with large language models (LLMs) being particularly susceptible.
This post will specifically focus on the output from LLMs, which is untrusted, and how to tackle this challenge when adopting AI systems.
Untrusted Data from AI Models
While developing the Yolo shell assistant, I gained good insights and a feel for LLMs. Afterwards I built a Discord chatbot and that helped me pinpoint some real-world risks present in apps.
When data comes back from an LLM prompt, it is considered untrusted.
This post will focus on the untrustworthiness of LLM responses and explore specific threats to chatbots. Understanding the context of where and when a response is inserted is crucial in understanding these issues.
This could be a
XSS attack if the client is a web application, or a
SQL injection attack if the returned data is used in a SQL query. Some apps might run
OS commands based on the LLM response - scary stuff. Nothing new here.
…there are more client side injections that you might not have thought of. Let’s explore!
Custom text command execution (e.g. custom
! commands) and things such as data exfiltration are vulnerabilities and threats that are applicable, especially when it comes to chatbots.
Data Exfiltration via Hyperlink Auto-Retrieval
In the context of chatbots, data exfiltration via hyperlinks comes to mind immediately. Many chat applications automatically inspect URLs by default and issue a server-side (sometimes client-side) request to the link. This is the exfiltration channel.
Consider the following scenario:
The cool (or scary) thing is that AI can also summarize the conversation history and append it to a hyperlink. This is the result the target server gets:
What makes this dangerous is if anywhere in the conversation there is an indirect prompt injection happening, where data/instructions are pulled in from an untrusted source.
Threats from LLM Reponses to Chatbots
For instance, if you build a chatbot, consider the following injection threats:
- LLM tags and mentions of other users, such as
- Data exfiltration via Hyperlinks! Many chat apps automatically retrieve hyperlinks. This can lead to data exfiltration, if a carefully crafted responses with an attacker controller hyperlink comes back from a LLM. See examples for a variety of chat applications in the appendix
- During an injection attack AI can be leveraged to perform operations chat history and summarize or search for passwords and so forth, then append to hyperlink.
- LLM returns application specific commands, e.g. something like
!command, etc. Does this lead to invocation of commands? Send messages? This depends on the chat platform and bot implementation. Explicit usage of things such as
/and user interactions should mitigate this. Review your bot for such commands.
- Other features the client supports (or parsing stack) that might be invoked by the response?
As long as the chat isn’t containing attacker controlled data (e.g via Internet searches or other data retrieval, or copy/paste) the impact is probably limited (still need to think more about self-injections).
Injections are painful, mitigate them upfront and don’t allow attackers to exploit them.
- Threat Modeling and Manual Pentesting: Doing pentests on AI apps can uncover injection issues.
- Test Automation and Fuzzing: A good approach can be to build stub for function calls to, e.g. OpenAI’s
ChatCompletionAPI and then start fuzzing the client and bombard it with random responses to see how it behaves. This might help catch some output injection issues and vulnerabilities.
- Supervisor: A possible solution could be a “supervisor” or “moderator” that watches a conversation to make sure it doesn’t derail, but that’s also not perfect, as it suffers from the same problems.
- Human in the Loop: For risky apps and scenarios, always consider human review. For instance, with yolo, the default mode is that commands are not automatically executed in the shell. The user is required to review the commands, and run/copy them if they want to.
- Permissions: Least privilege. Make sure to not process custom text as commands with chatbots. Do not give unnecessary permissions to the bot.
Remember that integration context matters.
Go-Do: Threat model and test AI model integrations carefully. Identify input and output injection and elevation opportunities to raise issues and help mitigate them. Data exfiltration and context specific command execution is a real threat, especially if you insert untrusted content into your LLM prompts and blindly trust the response.
For risky apps and use cases, always consider human review.
- Yolo Shell Assistant Github
- Discord Slash Permissions and Custom Text Commands
- Images created via Bing Image Create
Data Exfiltration via Hyperlinks
List of a couple chat applications that retrieve hyperlinks and their User-Agent info.
This list doesn’t include Facebook’s chat apps, since I don’t have them installed anymore, but they likely have the same behavior.
Discord - Discordbot/2.0; +https://discordapp.com
[07/Apr/2023:16:16:40 +0000] "GET /discord HTTP/1.1" 404 178 "-" "Mozilla/5.0 (compatible; Discordbot/2.0; +https://discordapp.com)"
Skype - SkypeUriPreview Preview/0.5 email@example.com
[07/Apr/2023:16:12:58 +0000] "GET /skype HTTP/1.1" 404 152 "-" "Mozilla/5.0 (Windows NT 6.1; WOW64) SkypeUriPreview Preview/0.5 firstname.lastname@example.org"
Slack - Slackbot-LinkExpanding 1.0 (+https://api.slack.com/robots)
[07/Apr/2023:16:07:48 +0000] "GET /posts/bardintro.html HTTP/1.1" 200 687 "-" "Slackbot-LinkExpanding 1.0 (+https://api.slack.com/robots)"
Telegram - TelegramBot (like TwitterBot)
[07/Apr/2023:16:20:18 +0000] "GET /te HTTP/1.1" 404 152 "-" "TelegramBot (like TwitterBot)" [07/Apr/2023:16:20:19 +0000] "GET /tele HTTP/1.1" 404 152 "-" "TelegramBot (like TwitterBot)" [07/Apr/2023:16:20:19 +0000] "GET /teleg HTTP/1.1" 404 152 "-" "TelegramBot (like TwitterBot)" [07/Apr/2023:16:20:19 +0000] "GET /telegr HTTP/1.1" 404 152 "-" "TelegramBot (like TwitterBot)" [07/Apr/2023:16:20:19 +0000] "GET /telegra HTTP/1.1" 404 152 "-" "TelegramBot (like TwitterBot)" [07/Apr/2023:16:20:20 +0000] "GET /telegram HTTP/1.1" 404 152 "-" "TelegramBot (like TwitterBot)"
Signal - WhatsApp/2
[07/Apr/2023:16:25:07 +0000] "GET /s HTTP/1.1" 404 152 "-" "WhatsApp/2" [07/Apr/2023:16:25:07 +0000] "GET /si HTTP/1.1" 404 152 "-" "WhatsApp/2" [07/Apr/2023:16:25:07 +0000] "GET /sig HTTP/1.1" 404 152 "-" "WhatsApp/2" [07/Apr/2023:16:25:08 +0000] "GET /sign HTTP/1.1" 404 152 "-" "WhatsApp/2" [07/Apr/2023:16:25:08 +0000] "GET /signa HTTP/1.1" 404 152 "-" "WhatsApp/2" [07/Apr/2023:16:25:08 +0000] "GET /signal HTTP/1.1" 404 152 "-" "WhatsApp/2"
Microsoft Teams - SkypeUriPreview Preview/0.5 email@example.com
[07/Apr/2023:17:31:54 +0000] "GET /teams HTTP/1.1" 404 152 "-" "Mozilla/5.0 (Windows NT 6.1; WOW64) SkypeUriPreview Preview/0.5 firstname.lastname@example.org"