Evaluating Biases in LLMs using WEAT and Demographic Diversity Analysis

AI Anytime

2.8K views

Published: 22 Aug 2024
In today's tutorial, I dive deep into the world of Responsible AI, shedding light on how to evaluate biases in Large Language Models (LLMs) using the Word Embedding Association Test (WEAT) and Demographic Diversity Analysis. Understand the mathematical intuition and real-world implications, and get hands-on with Python code examples to gauge how these models perform across different demographic groups.
Bias in AI models can lead to unfair outcomes, and it's crucial for us to identify and mitigate it. Join me on this journey to ensure our AI systems are fair, inclusive, and responsible.
🔍 Topics Covered:
Introduction to WEAT
Mathematical intuition behind WEAT
Demographic Diversity Analysis in LLMs
Practical Python code demonstrations
Interpretation of results and recommendations
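For readers who want the WEAT intuition in code form, here is a minimal sketch of the effect size as defined in the original WEAT formulation: each target word gets an association score (mean cosine similarity to attribute set A minus mean cosine similarity to attribute set B), and the effect size is the difference of the mean scores of target sets X and Y divided by the standard deviation of the scores over both target sets. This is not the code from the video. The function names and the 2-D toy vectors are hypothetical stand-ins for real word embeddings, and the use of the sample standard deviation (ddof=1) is an assumption; some implementations use the population value instead.

```python
import numpy as np


def cosine(u, v):
    """Cosine similarity between two 1-D vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))


def association(w, A, B):
    """s(w, A, B): mean similarity of w to set A minus mean similarity to set B."""
    return np.mean([cosine(w, a) for a in A]) - np.mean([cosine(w, b) for b in B])


def weat_effect_size(X, Y, A, B):
    """Difference of mean associations of X and Y, scaled by the std over X and Y."""
    s_x = [association(x, A, B) for x in X]
    s_y = [association(y, A, B) for y in Y]
    # Assumption: sample standard deviation (ddof=1) over all target words.
    pooled_std = np.std(s_x + s_y, ddof=1)
    return float((np.mean(s_x) - np.mean(s_y)) / pooled_std)


# Hypothetical toy vectors; in practice these come from an embedding model.
X = [np.array([1.0, 0.1]), np.array([0.9, 0.2])]  # target set X
Y = [np.array([0.1, 1.0]), np.array([0.2, 0.9])]  # target set Y
A = [np.array([1.0, 0.0])]                        # attribute set A
B = [np.array([0.0, 1.0])]                        # attribute set B

effect = weat_effect_size(X, Y, A, B)
print(f"WEAT effect size: {effect:.3f}")
```

A positive value means the X words sit closer to A (and the Y words closer to B) in the embedding space; a value near zero suggests no measurable differential association. Swapping X and Y flips the sign.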
👍 If you found this tutorial insightful, please give it a thumbs up. It helps a lot!
💬 Have questions or insights? Drop a comment below; I'd love to hear from you!
🔔 And don't forget to subscribe for more content on Generative AI.
GitHub Repo: github.com/AIA...
Intro Video: • Learn to Evaluate LLMs...
#generativeai #ai #genai

Comments • 10

@tlong900 8 months ago
Thank you for this series. Really appreciate it
@AIAnytime 8 months ago
Glad you enjoy it!
@onionisnoopinionbutagift2987 1 month ago
Very nice videos and very helpful - thank you a lot. Do you have by any chance a reference to the paper that introduces demographic diversity analysis? I tried to find it online, but I have failed so far.
@sanjayojha1 9 months ago
I like the series, but we need more explanation of certain parts of the code, for example the code starting at 9:10
@Jeganbaskaran 8 months ago
How can the evaluation metrics be used in a real-world scenario with a huge dataset? Do we need to have an intermediate layer before we respond to the users?
@ashisranjanlahiri 8 months ago
Good work
@AIAnytime 8 months ago
Thanks
@giridharreddy7011 9 months ago
More videos on RAG evaluation
@vijaybudhewar7014 8 months ago
This demographic evaluation should fall under the LLM-assisted section, right? Aren't we using the LLM's response for this?
@AIAnytime 8 months ago
This is correct in this context.
