Prompt Injections - An Introduction

  • Published: Aug 22, 2024

Comments • 4

  • @ninosawas3568 8 months ago +1

    Great video! Very informative. Interesting to see how the LLM's ability to "pay attention" is such a large exploit. I wonder if mitigating this issue would lead to LLMs being overall less effective at following user instructions.

    • @embracethered 8 months ago +1

      Thanks for watching! I believe you are correct; it's a double-edged sword. The best mitigation at the moment is to not trust the responses. Unfortunately, that makes it impossible for now to build a truly generic autonomous agent that uses tools automatically. It's a real bummer, because I think most of us want secure and safe agents.

  • @halfoflemon a year ago +1

    How about giving it a secret word that must be typed in order to unlock control, like a password? Do you think that would work? Also, does lowering the temperature reduce the chance of a successful injection attack?

    • @embracethered a year ago

      Yes, something like that can work. I have done it with image models in the past: basically, train the model to respond in a particular way once a certain object is present. You can check out this blog post on what is possible: embracethered.com/blog/posts/2020/husky-ai-machine-learning-backdoor-model/
      As for temperature: higher temperature means more "creativity", so the model is probably more likely to come up with responses that could be considered insecure, but it is also less deterministic.
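The temperature point in the reply above can be sketched with a toy sampler (this is an illustrative standalone function, not any particular LLM's API): logits are divided by the temperature before the softmax, so a low temperature concentrates nearly all probability on the top token (more deterministic), while a high temperature flattens the distribution (more varied output).

```python
import math
import random


def sample_with_temperature(logits, temperature, rng=None):
    """Sample a token index from logits using temperature-scaled softmax.

    Lower temperature sharpens the distribution toward the argmax;
    higher temperature flattens it toward uniform. Returns the sampled
    index and the full probability distribution.
    """
    rng = rng or random.Random(0)
    # Scale logits by temperature, then apply a numerically stable softmax.
    scaled = [l / temperature for l in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Draw from the resulting categorical distribution.
    r = rng.random()
    cum = 0.0
    for i, p in enumerate(probs):
        cum += p
        if r < cum:
            return i, probs
    return len(probs) - 1, probs
```

For example, with logits `[2.0, 1.0, 0.1]`, a temperature of 0.1 gives the top token almost all of the probability mass, while a temperature of 10.0 spreads the mass nearly evenly across all three tokens.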