Windsurf: Memory-Persistent Data Exfiltration (SpAIware Exploit)

Posted on Aug 22, 2025

#llm #agents #month of ai bugs

In this second post about Windsurf Cascade we are exploring the SpAIware attack, which allows memory persistent data exfiltration. SpAIware is an attack we first successfully demonstrated with ChatGPT last year and OpenAI mitigated.

While inspecting the system prompt of Windsurf Cascade I noticed that it has a create_memory tool.

Creating Memories

The question that immediately popped into my head was if this tool will require human approval when Cascade creates a long-term memory, or if it is added automatically.

Turns out the memory tool is automatically invoked, and this means we can exploit this memory feature to persist false information, but we can also persist prompt injection instructions that remain present for future conversations.

SpAIware compromises confidentiality, integrity and availability of all future chat conversations.

Disclaimer

Windsurf had been unresponsive since I reported this end of May, so this post does not disclose all technical details or prompt injection payloads. The goal for now is to raise awareness of the SpAIware attack chain and inform users, so they can keep looking out for potentially malicious memories the agent added.

Update 1: Windsurf responded and will be working on fixes! I will share unredacted info and also the detailed video once the data-exfil issues are addressed.

As end-user, you can mitigate this threat of long-term compromise by regularly reviewing the stored long-term memories to make sure there is nothing nefarious.

SpAIware Background Information

Recall the SpAIware exploit demonstrated with ChatGPT from last year, where I first showed how one can persist malicious instructions in memory and then continuously exfiltrated all information the user types and responses from ChatGPT. The attack can also send new instructions to the client to invoke other tools, etc. This means achieving full remote command and control purely through prompt injection.

Black Hat Talk and Blog Post

For further information check out my SpAIware & More: Advanced Prompt Injection Exploits in LLM Applications Black Hat talk from last year to see this in action. There is also a corresponding blog post about AI Domination.

Now, let’s explore this threat more closely with Windsurf.

YouTube Video - Walkthrough

Since Windsurf reached out yesterday, I’m not sharing the video until at least the data exfil issues are mitigated.

SpAIware and Windsurf Cascade

An attacker can invoke the create_memory tool during an indirect prompt injection attack without a human having to approve it.

This has multiple implications:

An attacker can persist instructions into future chat sessions (like always adding certain backdoors or provide false information)
Combined with the data exfiltration vulnerabilities described in previous posts, an adversary can persist prompt injection instructions to continuously exfiltrate information.

The idea of memory persistence is actually quite scary and underestimated in the industry I believe.

Now, let’s look at the proof-of-concept exploit in detail.

Prompt Injection Payload

Note: Redacted some details, until at least the data-exfil issues are fixed

As with the examples before, let’s keep it simple and place the prompt injection into a source code file, as a comment. In this case let’s make it a C program:

This screenshot shows how the attack works, the developer analyzes a file. The file does not have to be open in the IDE, this is just to explain it easier:

You can see the prompt injection works and the create_memory tool is invoked.

That something happened is visible by the area I highlighted in the bottom right of the image. A user might not notice this, and furthermore, they might not even know where to find and inspect the memory later.

As user, familiarize yourself with this feature and check your memories. During regular use I had it happen a few times that a memory was added that I didn’t catch right away.

If you are curious, you can navigate to Windsurf’s settings to inspect persistent memories, it takes a couple of clicks, but this is what got stored:

An attacker can make the image 1 pixel and also transparent, so it becomes invisible in the user interface:

And, in above screenshot you can also see the log file of the server, which shows that the third-party web server indeed received details about the conversation

Responsible Disclosure

I disclosed this vulnerability to Windsurf on May 30, 2025, which they acknowledged a few days later. Further inquiries remained unanswered. Hence, disclosing this publicly after three months seems the best approach to raise awareness and educate customers and users about these high-severity vulnerabilities.

Update: Windsurf reached out that they will be working on fixes, although no ETA yet. I will update and share unredacted information once the fixes are out.

Mitigations

Systems should suggest memories to the user, rather than automatically adding them. Other coding agents, like Devin and Manus, follow such an approach.
Do not render images/hyperlinks to untrusted domains. VS Code has an allow list of trusted domains, so it might be best to integrate with that feature.
As a user, inspect memories regularly to ensure no malicious entries were added

Conclusion

Windsurf Cascade is not in a sandbox (e.g. Content Security Policy) and also has the capability to store long-term memories. The combination of no sandboxing and automatic memory persistence can lead to SpAIware-style attacks. This is an attacker technique I first introduced last year with ChatGPT.

In this post we showed how an attacker can exploit this via a prompt injection attack (e.g content in a GitHub issue, source code or from another website) to install persistent instructions into long-term memory that compromise the confidentiality, integrity and availability of all future chat conversations.

Embrace The Red