Part 1 Google Colab Notebook: colab.research.google.com/drive/1h4xq7cfBv9Gg_YPWvEPblP6fuCK2vjQy?usp=sharing
Part 2 Google Colab Notebook: colab.research.google.com/drive/11qCfcABsxjjde7EihH6nHMH96aNBNONL?usp=sharing
Slides: www.canva.com/design/DAF7dnhquDM/oTFKTaShIVa-2t535ddERg/edit?DAF7dnhquDM&
It was a great video and RLHF was explained perfectly, thanks!
Thanks!
Great presentation as always. This is a deeper topic than we sometimes imagine. It can be argued that "not hurting feelings" is incompatible with a free and democratic society. As an example, some years ago I harbored unacknowledged -- basically subconscious -- racist ideas. Not "kkk" racist -- I'm white but have Black children. The racism was subtle, but real. I had no idea about this until one day my wife took me aside and explained to me the ways in which I was racist (this was 20 years ago -- before "woke" was a mainstream thing). I have to tell you, my feelings were hurt a LOT. It really, really hurt. It took a while for me to process this, but in time I understood she was 100% correct. I was able to improve myself. If my wife lived by a strict code of never hurting feelings under any circumstances, I'd still be harboring those toxic attitudes. Today I would be very, very careful about straight-jacketing LLMs in this manner. Yes, there's a risk of truly malicious content leaking through, but the counter-risks to society are in my view even greater.
Alignment of language used by LLMs in general is definitely a topic with ethical and philosophical considerations across societies.
The good news for us as builders of AI solutions is that we're most often focused not on the general case, but rather on completing specific downstream tasks for our users or stakeholders. Incorporating our users' preferences into LLM applications in this context should be viewed as a data-centric approach to refining the UX during AI product development!
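Concretely, "incorporating user preferences" usually boils down to collecting preference pairs. Below is a minimal, hypothetical sketch of what a single record might look like in the common prompt/chosen/rejected layout; the field names and content are illustrative, not taken from our notebooks:

```python
# A minimal, illustrative sketch (not from the notebooks): one preference record
# in the common prompt/chosen/rejected layout used by preference-tuning libraries
# such as Hugging Face TRL. Field names and content are assumptions for illustration.
preference_record = {
    "prompt": "Summarize our refund policy for a frustrated customer.",
    "chosen": "I'm sorry for the hassle! You can get a full refund within 30 days; here's how...",
    "rejected": "Refunds are governed by section 4.2 of the terms of service.",
}

# Collecting many such records from real user signals (thumbs up/down, picking
# between two candidate responses) is the data-centric step that turns
# stakeholder preferences into training data for alignment.
preference_dataset = [preference_record]
```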
Hi, your demonstration used the Zephyr-7B-Alpha model. Is it possible to implement RLHF with GPT-3.5? Since it is only provided through an API, it doesn't seem possible to apply RLHF. I want to confirm this.
You cannot specifically RLHF fine-tune GPT-3.5, no. That model is behind the API wall, as you suggested.
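For contrast, here's what open weights make possible: a minimal sketch, assuming Hugging Face TRL and a placeholder preference dataset, of aligning Zephyr-7B-Alpha with DPO (the preference-tuning method behind Zephyr, often used as a simpler stand-in for full PPO-based RLHF). Exact argument names differ between trl versions, so treat this as illustrative rather than copy-paste ready:

```python
# Minimal sketch, not the notebook's exact code: preference-tuning an
# open-weights model with Hugging Face TRL's DPOTrainer. This is only possible
# because the weights are downloadable -- exactly what an API-only model like
# GPT-3.5 rules out. Argument names vary across trl versions; the dataset name
# below is a placeholder.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

model_name = "HuggingFaceH4/zephyr-7b-alpha"  # open weights we can actually update
model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Placeholder dataset with prompt / chosen / rejected columns.
train_dataset = load_dataset("your-org/preference-pairs", split="train")

trainer = DPOTrainer(
    model=model,
    args=DPOConfig(output_dir="zephyr-dpo", beta=0.1),
    train_dataset=train_dataset,
    tokenizer=tokenizer,
)
trainer.train()
```

The key point: you can run a loop like this on any model whose weights you can download, but not on a model you can only reach through a hosted completion API.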