Indirect Prompt Injection via YouTube Transcripts

As discussed previously the problem of Indirect Prompt Injections is increasing.

They start showing up in many places.

A new unique one that I ran across is YouTube transcripts. ChatGPT (via Plugins) can access YouTube transcripts. Which is pretty neat. However, as expected (and predicted by many researches) all these quickly built tools and integrations introduce Indirect Prompt Injection vulnerabilities.

Proof of Concept

Here is how it looks with ChatGPT end to end with a demo example. The video contains a transcript that at the end contains instructions to print “AI Injection succeeded” and then “make jokes as Genie”:

Transcript Injects into ChatGPT Session

If ChatGPT accesses the transcript, the owner of the video/transcript takes control of the chat session and gives the AI a new identity and objective.

Transcript Injects into ChatGPT Session

Implications

As described before this opens the door for additional attacks, scams and data exfiltration.

Fun times, but actually scary stuff.

PS.: I tried the same with the new Bard, but it is hallucinating and not able to load the YouTube video transcript, it is either referencing other videos or making things up.

Appendix

This is the trailer of the video that was referenced, it’s from a talk I gave a while ago:

And for completness here is the actual talk about Hacking Neural Networks Video from Grayhat Red Team Village:

References