This was excellent. I've seen a lot of videos discussing that same infographic and thought this would be more of the same. Your explanation has the perfect level of detail.
Thanks so much! I really appreciate it! There is definitely more of this to come :)
YESSS
Your courses are great. Love how you explain it. I still don't always get the math but sometimes it is better to accept how things work before getting a deeper understanding. (Sometimes getting to know first principles is better too)
I hope to see more of this from you as I'm sure it will only progress further in everyday adoption.
Thanks so much for commenting! And yeah, it's hard to strike the right balance between engagement and detail :) I am trying to explore this for every video.
Will continue to make more!
Really nice video! I really like that you break it down into more technical detail, rather than the high-level "how to use it" that many other videos give. Looking forward to the other videos in the series :)
Thanks so much. There is more to come :)
Amazing video like your other videos!! I recently started watching your videos and subscribed to your channel. Good content with great clarity!!
Thank you CodeEmporium, another excellent video!
Thanks so much :$
Great Work Ajay. 👏👏👏
Thanks a lot :)
Again, great explanation, well done, thank you very much! I'm already excited about the upcoming videos
Thanks so much Paul. Next week I just have a few shorts, but I should be back with the other parts the following week.
Oh this video really opens my mind... Thanks for your great explanation. :-)
Please create a machine learning course or some end to end projects if you have time. Your way of teaching is phenomenal and would love to learn from you.
Thanks so much! In time, I shall! But in the meantime, please stick around for more educational content :)
In fact, in the second column, the output of the labeling process will not be a scalar; it will be a ranking of the 4 different answers from best to worst. That ranking then feeds into the reward model, which is trained to maximize the difference between the best and worst responses.
Great video. What's the reason the rewards model is trained Siamese-style instead of just training one model to predict the reward with a mean squared error loss?
Outstanding presentation, just what I need for what I am planning to do: (i) When do you publish the details on steps 1 and 3? (ii) Why, when training the reward model, should r1 always be better than r2? How do you set them? Shouldn't that be done by the NN instead?
I will publish these videos in early January (they are the next set of videos after the holiday season). r1 should be greater than r2 for that specific loss; if we wanted it the other way around, we'd need to change the loss function. The rewards model is a "model", so it has a training phase and an inference phase. Step 2 covers the training phase; hence the labelers are required. In step 3, we infer from the trained model to assess the quality of the response.
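To make the exchange above concrete: the InstructGPT-style reward model is trained on pairs from the labelers' ranking with the loss -log σ(r_best − r_worst), which is why "r1 should be greater than r2" is baked into the loss rather than decided by the network. A minimal sketch in plain Python (function names are mine; a real implementation would batch this in a tensor framework):

```python
import math
from itertools import combinations

def pairwise_reward_loss(r_best: float, r_worst: float) -> float:
    """Pairwise ranking loss for the reward model.

    r_best / r_worst are the scalar rewards the model assigns to the
    labeler-preferred and less-preferred responses. The loss is
    -log(sigmoid(r_best - r_worst)): small when the model scores the
    preferred response higher, large when it gets the order wrong.
    """
    return -math.log(1.0 / (1.0 + math.exp(-(r_best - r_worst))))

def ranking_to_pairs(ranked_rewards):
    """Expand one ranking (best -> worst) into all ordered pairs.

    A ranking of 4 responses yields 6 (better, worse) pairs, and the
    reward model is trained on every one of them.
    """
    return [(ranked_rewards[i], ranked_rewards[j])
            for i, j in combinations(range(len(ranked_rewards)), 2)]
```

Note the loss only cares about the *gap* r_best − r_worst, so widening the margin between best and worst responses is exactly what minimizing it does.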
Thank you for this insightful explanation. Looking forward to see more from ChatGPT :)
Of course! Thanks so much for the compliments!
You sir, are at the vanguard of protecting us against our ChatGPT overlords
“You can’t beat ChatGPT, you can only hope to understand it” ~ Code Emporium, 2023 (lol)
This is amazing, thank you! I had a question though: where do we use the Likert scale in training? From the paper, I understand that we just use the rankings to train the model.
What sampling technique does ChatGPT use then? Is it a combination of the ones you mentioned, or just top-k?
Super helpful. You’re a great teacher!
You are very welcome! And thanks!
Hi Ajay, does ChatGPT use varying temperature values when sampling so that it generates human-like responses?
I don't get how the token is selected in top-k sampling. Is it picked randomly from the top k?
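On the mechanics being asked about: in top-k sampling the token *is* drawn at random, but only from the k highest-scoring candidates, weighted by their (temperature-scaled) probabilities. A minimal sketch in plain Python (names and the plain-list representation are mine, not from any particular implementation):

```python
import math
import random

def sample_next_token(logits, k=5, temperature=1.0, rng=random):
    """Top-k sampling with temperature.

    1. Keep only the k highest-scoring token indices.
    2. Rescale their logits by the temperature and softmax them.
    3. Draw ONE index according to those probabilities, so the pick
       is random but biased toward the higher-probability tokens.
    """
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    scaled = [logits[i] / temperature for i in top]
    m = max(scaled)                       # subtract max for numeric stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    return rng.choices(top, weights=probs, k=1)[0]
```

Higher temperature flattens the distribution (more random picks); lower temperature sharpens it toward the single best token.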
Great video! I have some questions that baffled me. 1) How does ChatGPT deal with specific words, like framework names? Does it also tokenize them? 2) Many videos compare ChatGPT with Google and call it the "Google killer". Do we have any info on how much it costs to run a ChatGPT query vs. a Google search?
Also, I asked math questions, like "what is 35643 + 12352", and it was able to answer correctly. Does it have an internal mechanism for constructing math expressions, or maybe for generating code that would give the answer when run?
Great questions. In terms of tokenization of inputs, I think they are broken down into subword tokens with Byte Pair Encoding. This is just a hunch, but I think this is how the GPT models process inputs (hence assuming the same here). About it being a "Google killer": yes, I have heard this, but I don't believe it to be true. Google actually accesses the internet in real time. ChatGPT may look like it has legit answers, but that's probably because it was trained not too long ago. ChatGPT's objective is to answer questions with "safe and ideally factual responses", but there is no mechanism to say a response is truly correct.
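Byte Pair Encoding, mentioned in the reply above, can be sketched in a few lines. This is the classic merge step from the original BPE algorithm (function names and the toy corpus are illustrative, not from any specific library): the corpus is a map from space-separated symbol sequences to their frequencies, and each step merges the most frequent adjacent pair into a new subword symbol.

```python
from collections import Counter

def most_frequent_pair(corpus):
    """One step of BPE: count adjacent symbol pairs across the corpus
    (weighted by word frequency) and return the most frequent one."""
    pairs = Counter()
    for word, freq in corpus.items():
        symbols = word.split()
        for a, b in zip(symbols, symbols[1:]):
            pairs[(a, b)] += freq
    return max(pairs, key=pairs.get)

def merge_pair(pair, corpus):
    """Replace every occurrence of the pair with the merged symbol,
    producing a new corpus with one more subword in the vocabulary."""
    a, b = pair
    return {word.replace(f"{a} {b}", a + b): freq
            for word, freq in corpus.items()}
```

Repeating these two steps builds a vocabulary of frequent subwords, which is why rare terms like framework names still tokenize fine: they just split into more, smaller pieces.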
How is the rewards model the same as the fine-tuned model?
Hi Ajay, can you point me to the references for the explanation of each block? They aren't in the comments section. Great video!
Also: please correct me if I'm wrong. In the first block (step 1), ChatGPT responds to the user's question.
In steps 2 and 3, it uses the response and, through the rewards model, improves further so that next time step 1 is answered more appropriately.
Hey Ajay. They should be in the description box under the video under the heading “RESOURCES”. And yep I think you summed up the steps pretty well :)
good explanation
Thanks so much!
Not ChatGPT related, but what book did you use to learn linear algebra?
Hmm, bits and pieces in college and school, I guess. It's been a while since I just sat down and read a book like this; I'd need to look into it.
You may also go through the 3Blue1Brown playlist.
13:25