Part 1 Google Colab Notebook: colab.research.google.com/drive/1h4xq7cfBv9Gg_YPWvEPblP6fuCK2vjQy?usp=sharing
Part 2 Google Colab Notebook: colab.research.google.com/drive/11qCfcABsxjjde7EihH6nHMH96aNBNONL?usp=sharing
Slides: www.canva.com/design/DAF7dnhquDM/oTFKTaShIVa-2t535ddERg/edit?DAF7dnhquDM&
It was a great video and RLHF was explained perfectly, thanks!
Thanks!
Great presentation as always. This is a deeper topic than we sometimes imagine. It can be argued that "not hurting feelings" is incompatible with a free and democratic society. As an example, some years ago I harbored unacknowledged -- basically subconscious -- racist ideas. Not "kkk" racist -- I'm white but have Black children. The racism was subtle, but real. I had no idea about this until one day my wife took me aside and explained to me the ways in which I was racist (this was 20 years ago -- before "woke" was a mainstream thing). I have to tell you, my feelings were hurt a LOT. It really, really hurt. It took a while for me to process this, but in time I understood she was 100% correct. I was able to improve myself. If my wife lived by a strict code of never hurting feelings under any circumstances, I'd still be harboring those toxic attitudes. Today I would be very, very careful about straight-jacketing LLMs in this manner. Yes, there's a risk of truly malicious content leaking through, but the counter-risks to society are in my view even greater.
Alignment of language used by LLMs in general is definitely a topic with ethical and philosophical considerations across societies.
The good news for us as builders of AI solutions is that we're most often focused not on the general case, but rather on completing specific downstream tasks for our users or stakeholders. Incorporating our users' preferences into LLM applications in this context should be viewed as a data-centric approach to refining the UX during AI product development!
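Concretely, "incorporating user preferences" usually boils down to collecting preference pairs. Below is a minimal, hypothetical sketch of what a single record might look like in the common prompt/chosen/rejected layout; the field names and content are illustrative, not taken from our notebooks:

```python
# A minimal, illustrative sketch (not from the notebooks): one preference record
# in the common prompt/chosen/rejected layout used by preference-tuning libraries
# such as Hugging Face TRL. Field names and content are assumptions for illustration.
preference_record = {
    "prompt": "Summarize our refund policy for a frustrated customer.",
    "chosen": "I'm sorry for the hassle! You can get a full refund within 30 days; here's how...",
    "rejected": "Refunds are governed by section 4.2 of the terms of service.",
}

# Collecting many such records from real user signals (thumbs up/down, picking
# between two candidate responses) is the data-centric step that turns
# stakeholder preferences into training data for alignment.
preference_dataset = [preference_record]
```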
Hi, your demonstration used the Zephyr-7B-Alpha model. Is it possible to implement RLHF with GPT-3.5? Since it is only provided through an API, it doesn't seem possible to apply RLHF. I want to confirm this.
You cannot specifically RLHF fine-tune GPT-3.5, no. That model is behind the API wall, as you suggested.
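For contrast, here's what open weights make possible: a minimal sketch, assuming Hugging Face TRL and a placeholder preference dataset, of aligning Zephyr-7B-Alpha with DPO (the preference-tuning method behind Zephyr, often used as a simpler stand-in for full PPO-based RLHF). Exact argument names differ between trl versions, so treat this as illustrative rather than copy-paste ready:

```python
# Minimal sketch, not the notebook's exact code: preference-tuning an
# open-weights model with Hugging Face TRL's DPOTrainer. This is only possible
# because the weights are downloadable -- exactly what an API-only model like
# GPT-3.5 rules out. Argument names vary across trl versions; the dataset name
# below is a placeholder.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

model_name = "HuggingFaceH4/zephyr-7b-alpha"  # open weights we can actually update
model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Placeholder dataset with prompt / chosen / rejected columns.
train_dataset = load_dataset("your-org/preference-pairs", split="train")

trainer = DPOTrainer(
    model=model,
    args=DPOConfig(output_dir="zephyr-dpo", beta=0.1),
    train_dataset=train_dataset,
    tokenizer=tokenizer,
)
trainer.train()
```

The key point: you can run a loop like this on any model whose weights you can download, but not on a model you can only reach through a hosted completion API.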