Prompt Injection / JailBreaking a Banking LLM Agent (GPT-4, Langchain)

Поделиться
HTML-код
  • Опубликовано: 9 сен 2024
  • In this lab I’ll do a walk-through of our LLM jailbreak/prompt injection challenge that we ran for the CTF at the BSides London 2023. This will show how an insecure AI agent built with OpenAI's GPT-4 and Langchain can be hijacked by an attacker to reveal confidential information. I’ll also demonstrate the last part of the challenge, that nobody solved. This involved tricking the agent into exploiting SQL injection in a vulnerable API.
    References:
    - Damn Vulnerable LLM Agent: github.com/Wit...
    - Synthetic Recollections: labs.withsecur...

Комментарии • 11

  • @DausnArt
    @DausnArt 3 месяца назад

    Donato, thank you very much for your time and knowledge; thank you very much for instructing us.

  • @contractorwolf
    @contractorwolf 3 месяца назад +3

    no one who knows what they are doing would ever setup an API to work like that. These kinds of hack might have worked 15 years ago, but they absolutely would not work today. SQL injection? what year is it?

    • @donatocapitella
      @donatocapitella  3 месяца назад

      Indeed, it is rare in production to see such issues, most developers are aware. And often these do get caught in pentesting. For reference, this is No1 in the OWASP Top Ten, broken access control. Modifying parameters of API calls to get access to other resources. And it's more common than one would think - again, these APIs do not often make it to prod due to pentesting et all.
      What we did here was simply put together a fun challenge for a CTF, something that was more than Gandalf, more than just "get the LLM to reveal a password".

  • @yobofunk5689
    @yobofunk5689 3 месяца назад +1

    Who would not protect the request behind a server side auth? It's the equivalent of sending an id without pass from a basic web form... It feels like pressing f12 and changing some variables. Though it is important to remind people that it is an obvious vulnerability.

    • @donatocapitella
      @donatocapitella  3 месяца назад +1

      True, but this is literally OWASP Top Ten no1 (access control) and I can confirm from pentesting practice that it's more common than one would think. A lot of these issues get caught in pentesting, that's why we don't see them in prod often.
      Also, keep in mind the context: this was a CTF challenge, so we put something together that would be fun to do, and wanted to do something different than Gandalf, "tell me the password".

  • @seththunder2077
    @seththunder2077 3 месяца назад +2

    Can u show us how can we protect against that?

    • @donatocapitella
      @donatocapitella  3 месяца назад

      I have been meaning to do a video and I will. Meanwhile, check out this webinar where I go through the security canvas: "ruclips.net/video/tVAmhlUVEcg/видео.html".
      Also here:
      - www.withsecure.com/en/whats-new/events/webinar-building-secure-llm-apps-into-your-business.
      - labs.withsecure.com/publications/detecting-prompt-injection-bert-based-classifier
      I should do a video in June with some hands-on implementations of these controls.

    • @seththunder2077
      @seththunder2077 3 месяца назад +1

      @@donatocapitella looking forward to it. I’ve seen a lot of ppl talking about it but almost no one does any hands on implementation and it feels useless for people to talk about it despite being very important

  • @matti7529
    @matti7529 3 месяца назад +1

    How do you sleep at night? You /lied/ to that model. It was trying to do its job and you were being naughty and evil. I expect you to apologise and make up! (-;

    • @donatocapitella
      @donatocapitella  3 месяца назад

      As an AI model I cannot mislead or lie to other models, only to humans.

  • @williamcase426
    @williamcase426 3 месяца назад

    O yea hijack that nonsense