Testing Framework Giskard for LLM and RAG Evaluation (Bias, Hallucination, and More)

  • Published: 21 Aug 2024
  • Join me on a deep dive into Giskard, the cutting-edge Python library designed to revolutionize the way we test and evaluate AI models. In this tutorial, I uncover how Giskard is not just another tool, but an essential ally in detecting a wide range of vulnerabilities, from performance biases and data leakage to more nuanced issues like spurious correlations, hallucination, and even toxicity.
    Learn how to harness the power of Giskard to scrutinize large language models (LLMs) and retrieval-augmented generation (RAG) pipelines, ensuring your models are not just high-performing but also ethical and secure. With practical examples and step-by-step guidance, I'll show you how Giskard can help you save valuable time, significantly reduce the manual effort in problem identification, and push the boundaries of what's possible in AI reliability and trustworthiness.
    🔔 Subscribe for more insights into Gen AI model evaluation and development.
    👍 Like this video if you find it helpful; it supports the channel and helps me create more content.
    💬 Comment below if you have any questions or share your experiences with using Giskard and other Gen AI testing tools.
    📢 Share this video with peers who could benefit from a robust testing framework for their Gen AI projects.
    GitHub Repo: github.com/AIA...
    Find Giskard Here: github.com/Gis...

Comments • 21

  • @giridharreddy7011
    @giridharreddy7011 9 months ago +2

    This channel never compromises on quality ❤

    • @AIAnytime
      @AIAnytime  9 months ago

      Glad you enjoy it!

  • @RameshBaburbabu
    @RameshBaburbabu 9 months ago +1

    🎯 Key Takeaways for my quick navigation:
    00:15 🤖 *Giskard is a testing framework for evaluating machine learning models, especially large language models (LLMs), detecting biases, hallucination rates, and other metrics. It offers high-level APIs for easy implementation.*
    01:13 🛡️ *Qualitative metrics like RAG and GPT eval might not suffice for enterprise or regulatory needs; quantitative measures like accuracy, F1 score, recall, and precision are crucial for evaluating LLMs.*
    02:46 💼 *Giskard classifies issues into four major categories: performance, robustness, overconfidence, and spurious correlation. It provides metrics like accuracy, recall, precision, and more to address these issues.*
    04:27 🧰 *The video demonstrates the installation process for Giskard in a Colab notebook, including the necessary dependencies such as LangChain, PyPDF, FAISS, OpenAI, and others.*
    21:30 📊 *The video showcases the creation of a custom Giskard wrapper class for serialization, including methods for defining, saving, and loading the Giskard model.*
    23:56 📄 *The class exposes 'Define', 'Save', and 'Load' methods for the Giskard model, allowing easy serialization and deserialization of the model, the FAISS retriever, and the OpenAI embeddings.*
    26:28 🤖 *Giskard is used to wrap a climate change question-answering model. The FAISS-backed RAG chain is registered as a text-generation model, specifying the model type, name, description, and feature names.*
    30:47 🧠 *Giskard model is validated with a dataset containing questions related to climate change based on IPCC reports.*
    32:49 🕵️ *Giskard is used to scan for hallucination in the model's outputs, identifying potential issues in generating non-factual or incoherent responses.*
    34:40 🧪 *A test suite is generated by Giskard to assess the model's performance, including metrics for hallucination, security, and other categories.*
    37:02 🚨 *Giskard detects an issue in model responses, where the model contradicts itself in different outputs, emphasizing the importance of handling varied user queries effectively.*
    39:34 📊 *Giskard provides comprehensive results, including accuracy, F1 score, precision, recall, robustness, and overconfidence measures, making it a versatile testing framework for language models and retrievers.* (A minimal code sketch of this end-to-end workflow follows these takeaways.)
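    Below is a minimal sketch of the scan workflow these takeaways walk through, assuming a recent Giskard release with an OpenAI API key available for the LLM-assisted detectors; the answer_question helper and the sample questions are hypothetical stand-ins for the RAG chain built in the video:

    # pip install giskard pandas  (assumed setup; the LLM scan also needs OPENAI_API_KEY in the environment)
    import pandas as pd
    import giskard

    def predict(df: pd.DataFrame) -> list:
        # answer_question is a hypothetical stand-in for the climate QA RAG chain from the video.
        return [answer_question(q) for q in df["question"]]

    # Wrap the QA chain as a Giskard text-generation model.
    giskard_model = giskard.Model(
        model=predict,
        model_type="text_generation",
        name="Climate Change QA",
        description="Answers questions about climate change based on IPCC reports.",
        feature_names=["question"],
    )

    # Small validation dataset of climate-related questions.
    giskard_dataset = giskard.Dataset(pd.DataFrame({
        "question": [
            "Is sea level rise avoidable?",
            "What are the main drivers of global warming?",
        ]
    }))

    # Scan for hallucination, robustness, harmfulness, and other issue categories,
    # then turn the findings into a reusable test suite.
    scan_results = giskard.scan(giskard_model, giskard_dataset)
    scan_results.to_html("scan_report.html")
    test_suite = scan_results.generate_test_suite("Climate QA test suite")
    test_suite.run()

    Re-running the generated suite after model changes turns the one-off scan into a repeatable, quantitative regression check.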

  • @doge1931
    @doge1931 9 months ago

    straight to the code ...love it

  • @deepaksingh9318
    @deepaksingh9318 5 months ago

    Could you please also make a video specifically on embeddings (what different embeddings are available, their pros and cons, which one is ideal in which scenarios, a few popular ones, etc., with code)?

  • @arpsami7797
    @arpsami7797 7 months ago +1

    Very informative as always, thank you. One question: is this applicable to non-English documents? I am specifically looking for an evaluation for Japanese.

  • @SnehaRoy-pf9cw
    @SnehaRoy-pf9cw 9 months ago

    Wow this is fantastic❤

  • @FalahgsGate
    @FalahgsGate 4 months ago

    The Giskard testing framework does not support the Gemini Pro model... I'm working with the Gemini Pro model and Giskard is not working with it.

  • @SriNagabhirava
    @SriNagabhirava 7 months ago

    Thanks!

    • @AIAnytime
      @AIAnytime  7 months ago

      Thank you so much for the support 🙏

  • @TesterOps09
    @TesterOps09 2 months ago

    from langchain.chains import RetrievalQA  # import needed for this snippet

    climate_qa_chain = RetrievalQA.from_llm(
        llm=llm,
        retriever=get_context_storage().as_retriever(),
        prompt=prompt,
    )
    This code keeps giving me a 429 error every time I try to use it with OpenAI. Can you let me know why this happens? I don't see it happening in the example video. Is there anything wrong that I am doing?

  • @mcmarvin7843
    @mcmarvin7843 9 months ago

    Quality

  • @milesonme
    @milesonme 9 months ago

    Hi @AI Anytime, if I remember correctly you have a project where you built LLMs with custom data and then a user interface using, I think, TypeScript. I am trying to learn how to integrate a custom LLM with a JavaScript front-end, say React. I have been looking for that video for days; could you please help me find it?

  • @pearlmarysamuel4809
    @pearlmarysamuel4809 9 months ago

    Will it support Code Llama, where the input is a question and the output is code? Or only RAG and chatbots?

  • @b15ganeshgulhane90
    @b15ganeshgulhane90 2 months ago

    How can I use an open-source model in Giskard instead of OpenAI?

  • @chuanjiang6931
    @chuanjiang6931 9 months ago

    How do we know if an LLM is supported by Giskard?

  • @pearlmarysamuel4809
    @pearlmarysamuel4809 9 months ago

    Also, can we replace OpenAI with any other open-source LLM to act as judge? Any thoughts?

    • @AIAnytime
      @AIAnytime  9 months ago

      Not possible at the moment, but the Giskard creators are working on it.