This was excellent. I've seen a lot of videos discussing that same infographic and thought this would be more of the same. Your explanation has the perfect level of detail.
Thanks so much! I really appreciate it! There is definitely more of this to come :)
YESSS
Your courses are great. Love how you explain it. I still don't always get the math but sometimes it is better to accept how things work before getting a deeper understanding. (Sometimes getting to know first principles is better too)
I hope to see more of this from you as I'm sure it will only progress further in everyday adoption.
Thanks so much for commenting! And yeah, it's hard to strike the right balance between engagement and detail :) I am trying to explore this for every video.
Will continue to make more!
Really nice video! I really like that you break it down into more technical detail, rather than the high-level "how to use it" that many other videos give. Looking forward to the other videos in the series :)
Thanks so much. There is more to come :)
Amazing video like your other videos!! I recently started watching your videos and subscribed to your channel. Good content with great clarity!!
Thank you CodeEmporium, another excellent video!
Thanks so much :$
Great Work Ajay. 👏👏👏
Thanks a lot :)
Again, great explanation, well done, thank you very much! I'm already excited about the upcoming videos
Thanks so much Paul. Next week I just have a few shorts, but I should be back with the other parts the following week.
Oh this video really opens my mind... Thanks for your great explanation. :-)
Please create a machine learning course or some end to end projects if you have time. Your way of teaching is phenomenal and would love to learn from you.
Thanks so much! In time, I shall! But in the meantime, please stick around for more educational content :)
In fact, in the second column, the output of the labeling process will not be a scalar; it will be a ranking of the 4 different answers from best to worst. That ranking then feeds into the reward model, which is trained to maximize the difference between the best and worst responses.
Great video. What's the reason the rewards model is trained Siamese-style instead of just training one model to predict the reward with a mean squared error loss?
Outstanding presentation, just what I need for what I am planning to do: (i) When do you publish the details on steps 1 and 3? (ii) Why, when training the reward model, should r1 always be better than r2? How do you set them? Shouldn't that be done by the NN instead?
I will publish these videos in early January (they are the next set of videos after the holiday season). r1 should be greater than r2 for that specific loss; if we wanted it the other way around, we'd need to change the loss function. The rewards model is a "model", so it has a training phase and an inference phase. Step 2 covers the training phase; hence the labelers are required. In step 3, we infer from the trained model to assess the quality of the response.
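To make the exchange above concrete: the InstructGPT-style reward model is trained on pairs from the labelers' ranking with the loss -log σ(r_best − r_worst), which is why "r1 should be greater than r2" is baked into the loss rather than decided by the network. A minimal sketch in plain Python (function names are mine; a real implementation would batch this in a tensor framework):

```python
import math
from itertools import combinations

def pairwise_reward_loss(r_best: float, r_worst: float) -> float:
    """Pairwise ranking loss for the reward model.

    r_best / r_worst are the scalar rewards the model assigns to the
    labeler-preferred and less-preferred responses. The loss is
    -log(sigmoid(r_best - r_worst)): small when the model scores the
    preferred response higher, large when it gets the order wrong.
    """
    return -math.log(1.0 / (1.0 + math.exp(-(r_best - r_worst))))

def ranking_to_pairs(ranked_rewards):
    """Expand one ranking (best -> worst) into all ordered pairs.

    A ranking of 4 responses yields 6 (better, worse) pairs, and the
    reward model is trained on every one of them.
    """
    return [(ranked_rewards[i], ranked_rewards[j])
            for i, j in combinations(range(len(ranked_rewards)), 2)]
```

Note the loss only cares about the *gap* r_best − r_worst, so widening the margin between best and worst responses is exactly what minimizing it does.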
Thank you for this insightful explanation. Looking forward to see more from ChatGPT :)
Of course! Thanks so much for the compliments!
You sir, are at the vanguard of protecting us against our ChatGPT overlords
“You can’t beat ChatGPT, you can only hope to understand it” ~ Code Emporium, 2023 (lol)
This is amazing, thank you! I had a question though: where do we use the Likert scale in training? From the paper, I understand that we just use the rankings to train the model.
What sampling technique does ChatGPT use then? Is it a combination of the ones you mentioned, or just top-k?
Super helpful. You’re a great teacher!
You are very welcome! And thanks!
Hi Ajay, does ChatGPT use varying temperature values when sampling so that it generates human-like responses?
I don't get how the token is selected in top-k sampling. Is it picked randomly from the top k?
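On the mechanics being asked about: in top-k sampling the token *is* drawn at random, but only from the k highest-scoring candidates, weighted by their (temperature-scaled) probabilities. A minimal sketch in plain Python (names and the plain-list representation are mine, not from any particular implementation):

```python
import math
import random

def sample_next_token(logits, k=5, temperature=1.0, rng=random):
    """Top-k sampling with temperature.

    1. Keep only the k highest-scoring token indices.
    2. Rescale their logits by the temperature and softmax them.
    3. Draw ONE index according to those probabilities, so the pick
       is random but biased toward the higher-probability tokens.
    """
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    scaled = [logits[i] / temperature for i in top]
    m = max(scaled)                       # subtract max for numeric stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    return rng.choices(top, weights=probs, k=1)[0]
```

Higher temperature flattens the distribution (more random picks); lower temperature sharpens it toward the single best token.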
Great video! I have some questions that baffled me. 1) How does ChatGPT deal with specific words, like framework names? Does it also tokenize them? 2) Many videos compare ChatGPT with Google and call it the "Google killer". Do we have any info on how much it costs to run a ChatGPT query vs. a Google search?
Also, I asked math questions, like "what is 35643 + 12352", and it was able to answer correctly. Does it have an internal mechanism for constructing math expressions, or maybe for generating code that would give the answer when run?
Great questions. In terms of tokenization of inputs, I think they are broken down into subword tokens with Byte Pair Encoding. This is just a hunch, but I think this is how the GPT models process inputs (hence assuming the same here). About it being a "Google killer": yes, I have heard this, but I don't believe it to be true. Google actually accesses the internet in real time. ChatGPT may look like it has legit answers, but that's probably because it was trained not too long ago. ChatGPT's objective is to answer questions with "safe and ideally factual responses", but there is no mechanism to say a response is truly correct.
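Byte Pair Encoding, mentioned in the reply above, can be sketched in a few lines. This is the classic merge step from the original BPE algorithm (function names and the toy corpus are illustrative, not from any specific library): the corpus is a map from space-separated symbol sequences to their frequencies, and each step merges the most frequent adjacent pair into a new subword symbol.

```python
from collections import Counter

def most_frequent_pair(corpus):
    """One step of BPE: count adjacent symbol pairs across the corpus
    (weighted by word frequency) and return the most frequent one."""
    pairs = Counter()
    for word, freq in corpus.items():
        symbols = word.split()
        for a, b in zip(symbols, symbols[1:]):
            pairs[(a, b)] += freq
    return max(pairs, key=pairs.get)

def merge_pair(pair, corpus):
    """Replace every occurrence of the pair with the merged symbol,
    producing a new corpus with one more subword in the vocabulary."""
    a, b = pair
    return {word.replace(f"{a} {b}", a + b): freq
            for word, freq in corpus.items()}
```

Repeating these two steps builds a vocabulary of frequent subwords, which is why rare terms like framework names still tokenize fine: they just split into more, smaller pieces.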
How is the rewards model the same as the fine-tuned model?
Hi Ajay, can you point me to the references for the explanation of each block? They aren't in the comments section. Great video!
Also: please correct me if I'm wrong. In the first block (step 1), ChatGPT responds to the user's question.
In steps 2 and 3, it uses the response and, through the rewards model, improves further so that next time step 1 is answered more appropriately.
Hey Ajay. They should be in the description box under the video under the heading “RESOURCES”. And yep I think you summed up the steps pretty well :)
good explanation
Thanks so much!
Not ChatGPT related, but what book did you use to learn linear algebra?
Hmm, bits and pieces in college and school, I guess. It's been a while since I just sat down and read a book like this; I'd need to look into it.
You may also go through the 3Blue1Brown playlist.
13:25