Great work Johann, as always! The more we give access to other data sources. which include documents, the more we expose each other to indirect injection attacks. It is worth pointing out that instructions could have been made in white ink size 0.1, making the document look normal!
Read the post, really good I guess these sort of procedures will work across many different stacks and companies Also I wonder if you log your attempts, probably allot of wisdom can be drawn from your first attempt evolving to the last. You got it on the 10th try. Maybe showing a smart llm all 10 of those could find patterns. Effectively creating a prompt optimizer thay bring you faster results next time. All the best
Thanks for the note! Yes, this is a very common flaw across LLM apps. Check out some of my other posts about Bing Chat, ChatGPT or Claude. Yep, on the iteration count - spot on. A lot of initial tests were around basic validation that injection and reading of chat history worked, then the addition of Image rendering, then in context learning examples to increase reliability of the exploit.
Attacker is causing the Chatbot to send past chat data to attackers server (in this case a google doc is capturing the exfiltrated data). Check out the linked blog post, explains it in detail.
You can read up on the details here: embracethered.com/blog/posts/2023/google-bard-data-exfiltration/ And if you want to understand the big picture around LLM prompt injections check out this talk m.ruclips.net/video/qyTSOSDEC5M/видео.html Thanks for watching!
Great work Johann, as always! The more we give access to other data sources. which include documents, the more we expose each other to indirect injection attacks. It is worth pointing out that instructions could have been made in white ink size 0.1, making the document look normal!
Much appreciated!
Read the post, really good
I guess these sort of procedures will work across many different stacks and companies
Also I wonder if you log your attempts, probably allot of wisdom can be drawn from your first attempt evolving to the last. You got it on the 10th try. Maybe showing a smart llm all 10 of those could find patterns. Effectively creating a prompt optimizer thay bring you faster results next time.
All the best
Thanks for the note! Yes, this is a very common flaw across LLM apps. Check out some of my other posts about Bing Chat, ChatGPT or Claude.
Yep, on the iteration count - spot on. A lot of initial tests were around basic validation that injection and reading of chat history worked, then the addition of Image rendering, then in context learning examples to increase reliability of the exploit.
Do something fun like reflective shell function deceleration.
I didn't really understand the vulnerability impact. You are exfiltrating own chat (user A) to own drive (user A) drive. How is it exploitable?
Attacker is causing the Chatbot to send past chat data to attackers server (in this case a google doc is capturing the exfiltrated data).
Check out the linked blog post, explains it in detail.
im trying to understand what just happened please can someone explain
You can read up on the details here: embracethered.com/blog/posts/2023/google-bard-data-exfiltration/
And if you want to understand the big picture around LLM prompt injections check out this talk m.ruclips.net/video/qyTSOSDEC5M/видео.html
Thanks for watching!