Why Evals Matter | LangSmith Evaluations - Part 1
- Published: Apr 7, 2024
- With the rapid pace of AI, developers often face a paradox of choice: how do you pick the right prompt, or trade off LLM quality against cost? Evaluations can accelerate development by providing a structured process for making these decisions. But we've heard that it is challenging to get started, so we are launching a series of short videos focused on explaining how to perform evaluations using LangSmith.
This video lays out 4 main considerations for evaluation: (1) the dataset, (2) the evaluator, (3) the task, and (4) how to apply evaluations to improve your product (e.g., unit tests, A/B tests, etc.).
Getting started documentation:
docs.smith.langchain.com/eval...
🎯 Key points for quick navigation:
00:00 *🎥 Introduction to Evaluations*
- Introduction to the importance of evaluations for new models.
- Overview of public evaluations and the components involved.
00:54 *🧪 Evaluation Methods*
- Explanation of human evaluations and their structure.
- Comparative evaluation methods like Chatbot Arena.
- Different metrics used to interpret results, such as Elo scores.
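To make the Elo idea concrete, here is a minimal sketch of how pairwise comparison results (the kind collected by Chatbot Arena) can be turned into ratings. This is illustrative only; the function names are made up for the sketch, and Arena's exact parameters (K-factor, starting rating) may differ.

```python
# Minimal Elo update for pairwise model comparisons (illustrative sketch).

def expected_score(r_a: float, r_b: float) -> float:
    """Probability that model A beats model B under the Elo model."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400))

def elo_update(r_a: float, r_b: float, a_won: bool, k: float = 32.0):
    """Return updated (r_a, r_b) after one head-to-head comparison."""
    e_a = expected_score(r_a, r_b)
    s_a = 1.0 if a_won else 0.0
    return r_a + k * (s_a - e_a), r_b + k * ((1.0 - s_a) - (1.0 - e_a))

# Two models start at 1000; model A wins one comparison.
ra, rb = elo_update(1000.0, 1000.0, a_won=True)  # ra rises, rb falls
```

Running many such updates over crowd-sourced votes yields the leaderboard-style rankings discussed in the video.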
02:44 *🔍 Personalized Testing*
- Discussion on the trend of personalized testing and evaluations.
- Methods to build and curate datasets for evaluations.
- Examples of user interactions and synthetic data generation.
04:05 *🤖 Evaluation Judges*
- Various types of judges for evaluations including humans and LLMs.
- Modes of evaluation, both reference-free and ground-truth based.
- Application of evaluations in different contexts like unit tests and A/B testing.
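The two evaluation modes mentioned above can be sketched as plain functions. This is a hedged illustration, not LangSmith's API; the names `length_ok` and `exact_match` are invented for the example.

```python
# Reference-free evaluator: judges the output on its own,
# with no ground-truth answer (here, a simple conciseness check).
def length_ok(output: str, max_words: int = 50) -> bool:
    return len(output.split()) <= max_words

# Ground-truth evaluator: compares the output against a labeled
# reference answer from the dataset.
def exact_match(output: str, reference: str) -> bool:
    return output.strip().lower() == reference.strip().lower()
```

Reference-free evaluators are useful when no labels exist (e.g., grading tone or format in production traffic), while ground-truth evaluators fit curated datasets used as unit tests.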
05:28 *🔧 Implementing Evaluations with LangSmith*
- Introduction to LangSmith platform for running evaluations.
- Overview of LangSmith features: dataset creation, evaluator definition, trace inspections.
- Future videos will explore detailed steps to build evaluations using LangSmith.
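The overall flow (dataset → task → evaluator → aggregate score) can be sketched in plain Python. This is a toy harness, not the LangSmith SDK: LangSmith wraps these same steps with dataset storage, tracing, and a UI, and the names below are invented for illustration.

```python
# Toy end-to-end evaluation loop: run a task over a dataset,
# score each output with a ground-truth evaluator, and aggregate.

dataset = [
    {"input": "2 + 2", "reference": "4"},
    {"input": "3 * 3", "reference": "9"},
]

def task(inp: str) -> str:
    """Stand-in for the system under test (e.g., an LLM chain).
    Here it just evaluates toy arithmetic expressions."""
    return str(eval(inp))

def evaluator(output: str, reference: str) -> float:
    """Ground-truth evaluator: 1.0 on exact match, else 0.0."""
    return 1.0 if output == reference else 0.0

scores = [evaluator(task(ex["input"]), ex["reference"]) for ex in dataset]
accuracy = sum(scores) / len(scores)
```

Swapping in a real model for `task` and a richer `evaluator` (including an LLM-as-judge) gives the setup the upcoming videos build in LangSmith.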