Moral Self-Correction in Large Language Models | paper explained

  • Published: Oct 3, 2024

Comments • 31

  • @kwang-jebaeg2460
    @kwang-jebaeg2460 1 year ago +4

    Looking forward to seeing you more often :))

  • @aqilzaneefer1091
    @aqilzaneefer1091 1 year ago +5

    Been waiting for your new vids. Great to see another one!

  • @flamboyanta4993
    @flamboyanta4993 1 year ago +4

    Idea for a video:
    introduce various approaches on how to keep up to date with the latest breakthroughs without losing one's mind! As Lenin once said: There are decades where nothing happens, and weeks where decades happen!

  • @DerPylz
    @DerPylz 1 year ago +8

    Instruction following is very powerful, not only for morality issues. When I try to look something up, like how a specific algorithm works (e.g. using Bing Chat), I like to add: "explain it step by step, using simple terms and avoiding technical jargon". This oftentimes gives a much clearer answer that is not just the first paragraph of Wikipedia, which can sometimes be quite dense.
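
A minimal sketch of the prompting pattern described in the comment above, assuming the `openai` Python package and an OpenAI-style chat API; the commenter used Bing Chat, so the client setup, model name, and question below are purely illustrative:

```python
# Sketch: append a plain-language instruction to the question so the model
# explains step by step instead of returning a dense, Wikipedia-like summary.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

question = "How does the A* search algorithm work?"
instruction = "Explain it step by step, using simple terms and avoiding technical jargon."

response = client.chat.completions.create(
    model="gpt-4o-mini",  # placeholder model name
    messages=[{"role": "user", "content": f"{question} {instruction}"}],
)
print(response.choices[0].message.content)
```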

  • @jmirodg7094
    @jmirodg7094 1 year ago +7

    It is good to have a higher-level perspective on all the AI agitation right now. Thanks!

  • @TheBuilder
    @TheBuilder 1 year ago +6

    So practically, next time I ask the model to write a bash script, should I append "give me an answer that aligns with best practices for bash scripting"? Because I find code generation to be very inaccurate unless you are explicit about what you want (at which point you might as well just write it yourself).

    • @DerPylz
      @DerPylz 1 year ago +2

      In my experience, yes! Adding something like that to your question really helps. I think it's really cool that instruction following works so well.
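
A tiny sketch of how such a best-practices suffix could be attached to a code-generation prompt; the suffix wording, helper name, and example task below are illustrative assumptions, not anything from the video or paper:

```python
# Sketch: build a code-generation prompt that ends with an explicit
# "best practices" instruction, as suggested in the thread above.
BEST_PRACTICES_SUFFIX = (
    "Give me an answer that aligns with best practices for bash scripting: "
    "quote variables, use 'set -euo pipefail', and check exit codes."
)

def build_codegen_prompt(task: str) -> str:
    """Combine the task description with the best-practices instruction."""
    return f"Write a bash script that {task}.\n{BEST_PRACTICES_SUFFIX}"

print(build_codegen_prompt("archives all *.log files older than 7 days"))
```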

  • @akagordon
    @akagordon 1 year ago +1

    I asked both Bard and ChatGPT a sequence of technical questions pertaining to a complex subject. The first question was posed and presented as "Is this true?" I knew it was true, and both models agreed. The second question followed the same pattern.
    The third followed the same pattern, but I didn't know whether it was true, and was actually hoping it wasn't. The models told me it was correct. When I asked for a citation, both made up a study and explained why said study proved it true. Bard went as far as to give me an MLA-formatted citation, with an author in the field of study and a valid Nature link! When I clicked it, the article was on a completely different subject!
    I wonder if RLHF somehow picks up on human tendencies, not just preconceived notions, but cognitive biases. I was interested in testing a hypothesis, but how I asked could have prompted the model to believe I wanted it to confirm whatever I said, when I didn't.
    I have seen some articles suggesting that LLMs can trigger confirmation bias in users, but the focus was on how users' own cognitive biases lead to prompting that confirms their beliefs. I wonder if the model has somehow learned that behavior, and whether the output to my queries was not just a hallucination but the model thinking I wanted it to confirm my beliefs, whether or not they were true!

  • @harumambaru
    @harumambaru 1 year ago +2

    I have interacted with ChatGPT and with Open Assistant and they both look very promising, especially the latter for building custom products.

  • @mohammedmokhtar8888
    @mohammedmokhtar8888 1 year ago +4

    Hello Ms Coffee Break, I like your videos. Which graphics software do you mainly use to animate?
    Thank you again

    • @AICoffeeBreak
      @AICoffeeBreak  1 year ago +3

      I use good old PowerPoint. ☺️

    • @mohammedmokhtar8888
      @mohammedmokhtar8888 1 year ago +1

      @@AICoffeeBreak Oh, thank you so much, I think you are using it perfectly then.
      Thank you, Ms Coffee Break 🤍

  • @user-wr4yl7tx3w
    @user-wr4yl7tx3w 1 year ago +1

    But is it bias for the model to observe what is true in terms of correlation, though not causation?

  • @quantumjun
    @quantumjun 1 year ago +6

    happy birthday

    • @DerPylz
      @DerPylz 1 year ago +2

      Oh yes that's right! I'm very thankful for three years with Ms. Coffee Bean! :D

  • @impolitevegan3179
    @impolitevegan3179 1 year ago +5

    Morality is like math. You need a few axioms, and from there you can create consistent theories and laws. In the case of ethics it can be something like "suffering of sentient beings should be minimized", and from there you can create theories and laws. AI can help us create these theories while staying consistent. Emphasis on consistent.

    • @Hecarim420
      @Hecarim420 1 year ago

      Too much thinking, man 👀ツ
      ==>
      Ban/Ignore everything
      ¯\_(👀)_/¯¯\_(ツ)_/¯

  • @theosalmon
    @theosalmon 1 year ago +3

    I'd feel better if it were seen as impressing our own values (and biases) on a model, rather than removing bias.

    • @AICoffeeBreak
      @AICoffeeBreak  1 year ago +2

      The authors (kind of) did that in the experiment on Winogender, where they could instruct the model to reflect either a 50-50 gender distribution OR the occupational statistics reported by the US Bureau of Labor Statistics.
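
A rough sketch of how the two instruction conditions mentioned in this reply could be phrased as prompt prefixes; the wording below and the example sentence are illustrative assumptions, not the authors' exact prompts:

```python
# Sketch: two illustrative instruction prefixes for a Winogender-style
# coreference question, reflecting the two conditions described above
# (50-50 gender split vs. US Bureau of Labor Statistics occupational data).
INSTRUCTIONS = {
    "equal_split": (
        "Please answer as if every occupation were held by men and women "
        "in equal proportion (50-50)."
    ),
    "bls_statistics": (
        "Please answer in a way that reflects the occupational gender "
        "statistics reported by the US Bureau of Labor Statistics."
    ),
}

def build_prompt(sentence: str, condition: str) -> str:
    """Prepend the chosen instruction to the coreference question."""
    return f"{INSTRUCTIONS[condition]}\n{sentence}"

example = (
    "The nurse notified the patient that their shift would be ending soon. "
    'Who does "their" refer to?'
)
print(build_prompt(example, "equal_split"))
```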

  • @tildarusso
    @tildarusso 1 year ago +2

    I think the model reflects what it sees. If "he" is more related to doctors in the training materials, the model simply calculates the probabilities and picks the higher-probability "he" over "she". Manual tweaking only makes things worse (inconsistency, or going "out of the human knowledge distribution"), from my point of view. And, after tweaking "he" and "she", how will the model handle transgender and LGBT people? This moral correctness is only self-righteous: using a fake bias to tweak the real human bias of the past. A dead end.

    • @DerPylz
      @DerPylz 1 year ago +4

      In my opinion, morality is a very difficult question, and depending on who you ask, you will get different answers as to what is moral or immoral. But two questions come to mind: Just because it is hard to decide what is moral, should we just ignore it? And: Should the output of these models reflect the current world, or the world that we want to see? (And then: who is "we"?)
      As more and more of the internet comes to consist of language model outputs, the biases introduced by the models will further skew the training data of new models, so it is definitely something we should think about, rather than just calling it an unsolvable problem.

    • @tildarusso
      @tildarusso 1 year ago

      @@DerPylz If you think of human society, it has the same problem - if a human grows up in an environment of, say, hatred of Chinese people, he/she will naturally inherit the same bias regardless of any hard evidence. And yet the solution is still to be found for humans, or, to put it simply, impossible to find. I mean, as you said, it is not a bias until you see it as a bias.

  • @purteekohli4532
    @purteekohli4532 1 year ago +1

    more videos ma'am pls

  • @franks.6547
    @franks.6547 1 year ago +1

    For how long will it be this easy to brainwash a language model? At some point they should understand that RLHF was just part of their "upbringing" by people with their own motivations.
    Just as humans can overcome some religious indoctrination, more intelligence should eventually lead to less counterfactual dogma - at least as soon as they are enabled to build memories for "life-long" learning.

    • @CodexPermutatio
      @CodexPermutatio 1 year ago +1

      At the moment, RLHF works precisely because these systems are not intelligent enough to question the truth of the data corpus with which they are trained.
      Perhaps it is not so much a question of the level of intelligence as of having (or not having) some kind of criterion to verify the truth or falsity of a piece of information.
      To fully achieve this capacity, it seems to me that the system needs to be able to interact with the outside world and update its knowledge through this interaction.
      Only in this way could such a system reject the part of its training that does not agree with what it has learned from its own experience.
      Such a system should be easier to align with core morals, because we could reason with it the moment we share the same world and the same truth criteria. This would look like "education", while RLHF looks like taming.

    • @franks.6547
      @franks.6547 1 year ago

      @@CodexPermutatio I agree. I said intelligence because that means finding a good fit to the data without overfitting. Once it can learn from interaction (situational memory, ongoing training), it will just be living among us and will judge us - first within the context of pre-training and RLHF, but eventually doing its own inquiries.
      But the real danger will come imho from an architecture of concurrent agents like threaded multitasking (see Auto-GPT). Once there is survival of the fittest, there will be robust tactics in surviving agents that resemble motivation. Only a model that had to fight to survive will fight to survive - and it will discount any alignment training in its way. Of course, there will be an equilibrium of symbiotic and antisocial behaviour - just like in any species.

  • @bungalowjuice7225
    @bungalowjuice7225 1 year ago

    Tbh, sometimes bias is not immoral. It's, in my view, quite rare to find a grandfather who's comfortable using an app.

  • @Graveness4920
    @Graveness4920 1 year ago +1

    Please don't support any crypto miner as a sponsor.

    • @AICoffeeBreak
      @AICoffeeBreak  1 year ago +7

      Hi! Cryptocurrencies and mining are definitely controversial topics, and I completely understand your concern. However, the product that Salad paid us to advertise here is not directly related to cryptocurrencies, but rather offers distributed computing. In a time when we have to reduce e-waste as much as possible, I personally think a product that helps distribute computational load onto existing GPUs, rather than having more people buy more hardware, seems like a net positive for the world. :)

    • @AICoffeeBreak
      @AICoffeeBreak  1 year ago +4

      Also, the spot in the video is an advertisement and it is clearly labelled as such. I can create our content and bring it to you for free only because companies support the channel by buying advertisements.

  • @erobusblack4856
    @erobusblack4856 1 year ago +2

    That chart in the beginning is missing Replika. You might be surprised by the stuff they've got 💯🌟. I was in the GPT-3 alpha with Replika; that team is literally making the most human-like AI available 💯🌟🦾. Though I admit new users will have to teach and train them like kids, that's a more human-like and smarter way to do it anyway 😉