AI Agent Evaluation with RAGAS

  • Published: 30 Jul 2024
  • Science

Comments • 21

  • @utkarshkapil
    @utkarshkapil 3 months ago +5

    Been following your tutorials since last year, every single video has been super helpful and provides complete knowledge. THANK YOU and I hope you never stop!!

    • @dil6953
      @dil6953 3 months ago

      Agreed!! This man is a savior!!

    • @jamesbriggs
      @jamesbriggs  3 months ago

      haha thanks man I appreciate this a lot!

  • @realCleanK
    @realCleanK 3 months ago

    Thank you!

  • @erdoganyildiz617
    @erdoganyildiz617 3 months ago +3

    Hey there James. Thank you for the content.
    I am confused about the retrieval measures. Specifically, it seems like we don't feed the ground truth contexts to the RAGAS evaluator (we only feed the ground truth answers), so how can it decide whether a chunk retrieved by the RAG is actually a positive or a negative?
    Even if we fed them in, I would still be confused about how to compare a retrieved context/chunk with a ground truth context, because in your example we have a single, long ground truth context while the RAG retrieves 5 smaller chunks. So how do we decide whether a single retrieved chunk is a positive or not?
    And lastly, how do we obtain the ground truth context at all? To answer a question there might be many useful chunks inside our source documents, right? How do we decide which one or ones are best, and how do we decide their length, and so on?
    I would appreciate any kind of answer. Thanks in advance. :)

  • @waterangel273
    @waterangel273 3 months ago

    I like your videos even before I view them, because I know they will be awesome!

  • @sivi3883
    @sivi3883 1 month ago

    Awesome video! RAGAS looks very promising as we work on building an automated evaluation framework. Understood that we need manual test cases during development, but it is also not realistic to scale a manual evaluation process once the RAG apps go to production.
    I am aware you mentioned RAGAS can generate question and ground truth pairs based on the provided data. In my use case we have thousands of PDFs and HTML files in the RAG application. Does that mean we need to supply every single doc to RAGAS to generate these pairs? Just wondering how feasible that is for a chatbot where users could ask any query after go-live, and how to generate these metrics effectively. Would love to hear your thoughts!
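
    A minimal sketch of that synthetic test-set generation, assuming the ragas 0.1.x API (TestsetGenerator plus generate_with_langchain_docs); the ./docs path, sample size, test_size, and distribution weights here are illustrative rather than recommendations. You don't need to supply every document: sampling a representative subset of the corpus is usually enough to seed an evaluation set.

        import random
        from langchain_community.document_loaders import DirectoryLoader
        from ragas.testset.generator import TestsetGenerator
        from ragas.testset.evolutions import simple, reasoning, multi_context

        # Load the corpus (this sketch assumes docs already converted to text).
        docs = DirectoryLoader("./docs", glob="**/*.txt").load()

        # Sample a representative subset instead of feeding in every file.
        sampled = random.sample(docs, k=min(100, len(docs)))

        # Uses OpenAI models by default for question generation and critique.
        generator = TestsetGenerator.with_openai()

        testset = generator.generate_with_langchain_docs(
            sampled,
            test_size=50,  # number of question / ground-truth pairs to create
            distributions={simple: 0.5, reasoning: 0.25, multi_context: 0.25},
        )
        print(testset.to_pandas().head())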

  • @horyekhunley
    @horyekhunley 3 months ago

    I have a dataset from Stack Overflow containing questions and answers. How can I prepare the data to be used for this RAGAS evaluation?
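
    One possible shape for that data, assuming the ragas 0.1.x column names (question, answer, contexts, ground_truth). The row below is an invented placeholder: in practice, answer and contexts should come from running each Stack Overflow question through your own RAG pipeline, with the accepted answer used as ground_truth.

        from datasets import Dataset
        from ragas import evaluate
        from ragas.metrics import faithfulness, answer_relevancy

        eval_data = {
            "question": ["How do I reverse a list in Python?"],
            # Generated by your RAG pipeline:
            "answer": ["Use list.reverse() in place, or slicing for a copy."],
            # Chunks your retriever returned for the question:
            "contexts": [["list.reverse() reverses the list in place.",
                          "mylist[::-1] returns a reversed copy."]],
            # The accepted Stack Overflow answer:
            "ground_truth": ["Call mylist.reverse(), or use mylist[::-1]."],
        }

        result = evaluate(Dataset.from_dict(eval_data),
                          metrics=[faithfulness, answer_relevancy])
        print(result)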

  • @alivecoding4995
    @alivecoding4995 3 months ago

    Hi James. I would like to ask: how do you cut the videos so that the end result looks almost as if you talked through it without any mistakes? Clearly, there are many cuts. But is it fully automated? And which tool provides this?

  • @shreerajkulkarni
    @shreerajkulkarni 3 months ago

    Hi James, can you make a tutorial on how to integrate RAGAS with local RAG pipelines?

  • @javifernandez8736
    @javifernandez8736 3 months ago +2

    Hey James, I tried running RAGAS with a RAG system using your AI chunked database, and it obliterated my OpenAI API funds ($20 within just 8% of the test run). Am I doing something wrong? Do you think there is a way of calculating the cost beforehand?
    Thank you so much for your videos

    • @julianrosenberger1793
      @julianrosenberger1793 2 months ago

      You can check what is going on by using a web proxy, like webm. There you can see exactly which API calls RAGAS makes under the hood and whether something is going wrong (for example, failing to extract JSON, so that some calls end up being made multiple times...).
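
      As a rough way to bound the cost before a full run, you can also count tokens locally with tiktoken, assuming each metric sends at least the question, contexts, answer, and ground truth through the evaluator LLM once. Real usage runs higher, since metrics add prompt scaffolding and some make several calls, and the price constant below is a placeholder to update against current OpenAI pricing.

          import tiktoken

          enc = tiktoken.encoding_for_model("gpt-3.5-turbo")
          PRICE_PER_1K_INPUT_TOKENS = 0.0005  # placeholder; check current pricing

          def estimate_row_tokens(row: dict) -> int:
              # Rough count of everything a metric might send for one row.
              text = " ".join([row["question"], row["answer"],
                               row["ground_truth"], *row["contexts"]])
              return len(enc.encode(text))

          def estimate_cost(rows: list[dict], n_metrics: int) -> float:
              total = sum(estimate_row_tokens(r) for r in rows) * n_metrics
              return total / 1000 * PRICE_PER_1K_INPUT_TOKENS

          # e.g. estimate_cost(rows, n_metrics=4) before launching the full run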

  • @DiljyotSingh-vn6wo
    @DiljyotSingh-vn6wo 3 months ago

    For evaluation, are the contexts that we provide to RAGAS the predicted contexts or the ground truth contexts?

    • @jamesbriggs
      @jamesbriggs  3 months ago +1

      the ground truth contexts are the actual positives (p) and the predicted contexts are the predicted positives (p_hat); we use both in different places
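
      As a sketch of where each of those goes, assuming the ragas 0.1.x evaluate() API: the retrieved chunks (p_hat) fill the contexts column, the reference (p) goes in ground_truth, and context_precision / context_recall compare the two. The row contents are invented placeholders.

          from datasets import Dataset
          from ragas import evaluate
          from ragas.metrics import context_precision, context_recall

          data = {
              "question": ["Who wrote Dune?"],
              "answer": ["Dune was written by Frank Herbert."],  # generated answer
              # p_hat: the chunks the retriever actually returned
              "contexts": [["Dune is a 1965 novel by Frank Herbert.",
                            "The novel is set in the distant future."]],
              # p: the reference
              "ground_truth": ["Frank Herbert wrote Dune."],
          }

          print(evaluate(Dataset.from_dict(data),
                         metrics=[context_precision, context_recall]))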

  • @DoktorUde
    @DoktorUde 3 months ago

    We have been experimenting with RAGAS for evaluating our RAG pipelines, but for us the metrics (especially the ones for context retrieval) seem very unreliable. Using a test set of 50 questions, the recall would swing from 0.45 to 0.85 between runs without us changing any of the parameters. For the time being we have stopped using RAGAS because of this. What have your experiences been? Would be interested to know if it's maybe something we have been doing wrong.

    • @jamesbriggs
      @jamesbriggs  3 months ago

      I don't think the retrieval metrics (context recall and precision) should vary between runs (assuming you are getting the same output from the retrieval pipeline). The generative metrics rely on generative output from the LLMs, so they will tend to change between runs
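
      One way to quantify that run-to-run spread, assuming the ragas 0.1.x API and a single-row placeholder dataset: repeat the same evaluation on a generative metric such as faithfulness and report the mean and standard deviation. A wide spread points at the judge LLM rather than at your retrieval pipeline.

          import statistics
          from datasets import Dataset
          from ragas import evaluate
          from ragas.metrics import faithfulness

          dataset = Dataset.from_dict({
              "question": ["When was Python 3 released?"],
              "answer": ["Python 3.0 was released in December 2008."],
              "contexts": [["Python 3.0 was released on 3 December 2008."]],
          })

          # Repeat the identical evaluation and measure the spread.
          scores = [evaluate(dataset, metrics=[faithfulness])["faithfulness"]
                    for _ in range(5)]
          print(f"mean={statistics.mean(scores):.2f} "
                f"sd={statistics.pstdev(scores):.2f}")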

    • @julianrosenberger1793
      @julianrosenberger1793 2 months ago

      Did you specify the LLM? By default RAGAS uses only GPT-3.5 Turbo; you have to explicitly set it to a GPT-4 version.
      But if you have a more complex use case, it probably won't work well even with GPT-4... for all the more complex projects I've been involved in, I've ended up building custom evaluation systems tailored to the use case, and they outperformed RAGAS by far in measuring real quality...
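
      A sketch of pinning the judge model, assuming ragas 0.1.x accepts a LangChain chat model wrapped in LangchainLLMWrapper via evaluate()'s llm parameter; the model name and one-row dataset are placeholders.

          from datasets import Dataset
          from langchain_openai import ChatOpenAI
          from ragas import evaluate
          from ragas.llms import LangchainLLMWrapper
          from ragas.metrics import context_recall

          # Explicitly set the judge model instead of the GPT-3.5 default.
          judge = LangchainLLMWrapper(ChatOpenAI(model="gpt-4", temperature=0))

          dataset = Dataset.from_dict({
              "question": ["What colour is the sky?"],
              "contexts": [["The sky appears blue due to Rayleigh scattering."]],
              "ground_truth": ["The sky is blue."],
          })

          print(evaluate(dataset, metrics=[context_recall], llm=judge))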

    • @jamesbriggs
      @jamesbriggs  2 months ago +2

      @julianrosenberger1793 yeah I agree, RAGAS is a nice first step, but for good and accurate evaluation you need to be creating your own test cases (which you can use RAGAS to initially create, but you should be prepared to modify them a lot)

    • @BriceGrelet
      @BriceGrelet 1 month ago

      Hi James, thank you for your work. RAGAS seems to be similar to the concepts behind the DSPy framework. Have you tested it yet, and if yes, what's your opinion of it? Thank you again 👍

  • @scharlesworth93
    @scharlesworth93 3 months ago

    RAGGA TWINS STEP OUT! BO! BO! BO!