ChatGPT Fails Basic Logic but Now Has Vision, Wins at Chess and Prompts a Masterpiece

  • Published: 24 Sep 2023
  • ChatGPT will now have vision, but can it do basic logic? I cover the latest news - including GPT Chess! - as well as go through almost a dozen papers and how they relate to the central question of LLM logic and rationality. Starring the Reversal Curse and featuring conversations with two of the authors at the heart of it all.
    I also get to a DALL-E 3 vs Midjourney comparison, MuZero, MathGLM, Situational Awareness and much more!
    / aiexplained
    OpenAI GPT-V (Hear and Speak): openai.com/blog/chatgpt-can-n...
    Reversal Curse: owainevans.github.io/reversal...
    Mahesh Tweet: / 1705376797293183208
    Neel Nanda Explanation: / 1705995593657762199
    Karpathy tweet: / 1705322159588208782
    Trask Explanation: / 1705361947141472528
    Play Chess vs GPT 3.5 Instruct: parrotchess.com/
    Paige Bailey on Cognitive Revolution: • Google’s PaLM-2 with P...
    Avenging Polanyi's Revenge: m-cacm.acm.org/magazines/2021...
    Faith and Fate Paper: arxiv.org/pdf/2305.18654.pdf
    Counterfactuals Paper: arxiv.org/pdf/2307.02477.pdf
    Lesswrong AGI Timelines: www.lesswrong.com/posts/SCqDi...
    Professor Rao Paper w/ Blocksworld: arxiv.org/pdf/2305.15771.pdf
    Math Based on Number Reasoning: aclanthology.org/2022.finding...
    MuZero: www.deepmind.com/blog/muzero-...
    www.nature.com/articles/s4158...
    Efficient Zero: arxiv.org/pdf/2111.00210.pdf
    Let’s Verify Step by Step OpenAI paper: cdn.openai.com/improving-math...
    My Video on That: • 'Show Your Working': C...
    Superintelligence Poll: www.vox.com/future-perfect/20...
    Anthropic Announcement: www.anthropic.com/index/anthr...
    DALL-E 3 Tweet Thread: / 1704850313889595399
    / aiexplained Non-Hype, Free Newsletter: signaltonoise.beehiiv.com/
  • Science

Comments • 1K

  • @DerSwaggeryD
    @DerSwaggeryD 10 месяцев назад +725

    that was a hell of a naming fail. I first thought "V" meant 5. 😂😂

    • @DJStompZone
      @DJStompZone 10 месяцев назад +145

      I'm certain that was intentional, that's the kind of thing that makes you do a double take. Brilliant marketing, really

    • @DerSwaggeryD
      @DerSwaggeryD 10 месяцев назад +5

      @@DJStompZone could be.

    • @Gamez4eveR
      @Gamez4eveR 10 месяцев назад +17

      they pulled a metal gear solid

    • @alansmithee419
      @alansmithee419 10 месяцев назад +20

      @@DJStompZone On the other hand it means people hear it doesn't mean five and instead of getting hyped for the new product just go "oh, ok then..."

    • @Archer.Lawrence
      @Archer.Lawrence 10 месяцев назад +11

      Gee Pee Tee Vee

  • @hermestrismegistus9142
    @hermestrismegistus9142 10 месяцев назад +198

    By far the best AI-focused channel I've watched. AI Explained actually understands AI and its strengths/limitations rather than spouting unjustified hype or pessimism.

    • @aiexplained-official
      @aiexplained-official  10 месяцев назад +25

      Thanks hermes, too kind

    • @squamish4244
      @squamish4244 10 месяцев назад +9

      He explains AI in a way that a layperson can understand without oversimplifying it, which is no easy task.

    • @NoelAKABigNolo
      @NoelAKABigNolo 9 месяцев назад +2

      I recommend the 'Two Minute Papers' channel as well

  • @GroteBosaap
    @GroteBosaap 10 месяцев назад +32

    Love how you cover papers, contrast them with others, do your own testing, and talk about timelines to AGI.

  • @TiagoTiagoT
    @TiagoTiagoT 10 месяцев назад +32

    Humans sometimes also don't remember things both ways with the same level of difficulty. For a simple (though somewhat weaker) example, ask someone to list the letters of the alphabet backwards, and then ask them some other time to list it the normal order; it's probably common sense that the vast majority of people will have more difficulty doing it backwards.

    • @juandesalgado
      @juandesalgado 10 месяцев назад +6

      Right. We know the multiplication algorithm, but that doesn't mean we can multiply two 100-digit numbers in our head; we'd need pencil and paper. LLMs need to be given some sort of working storage.

    • @tiefensucht
      @tiefensucht 10 месяцев назад +4

      yeah, everything is a pattern, doing things in reverse is a whole new pattern you have to learn. this is actually one of the things a general ai needs. learning things for itself based on logic, prediction, association.

    • @CyanOgilvie
      @CyanOgilvie 10 месяцев назад +6

      I grew up in a part of the country where we were required to learn a second language that was essentially never spoken. I found that I learnt a map from that language to my first language (badly), but not the other way around - that is: given a word in that language I could retrieve the closest english word, but not when starting from english. This feels a lot like what is showing up in the LLMs and I suspect that a lot of our human capabilities are really just based on very large maps, more like what LLMs are doing than the way we think we're doing it.

    • @McDonaldsCalifornia
      @McDonaldsCalifornia 10 месяцев назад +2

      But if you give a human enough time they would know how to do the task and manage to do it correctly.
      AI does kinda have that time already, since it "thinks" at computer speeds, rather than human speeds.

    • @maythesciencebewithyou
      @maythesciencebewithyou 10 месяцев назад +2

      @@McDonaldsCalifornia humans would solve the task with the alphabet in reverse by writing it down the way they learned it, then reading it in reverse and not in their head. Some people may manage it in their mind, but most people would struggle to solve it that way. Same with mathematical calculations. Just because you know how it is done doesn't mean you can solve it in your head. Not to mention that we do memorize simple stuff like multiplication tables as a basis. And to solve a more difficult problem, we solve them on paper, following instructions which we have memorized.
      You underestimate how much of your logical reasoning ability depends on the past knowledge you have obtained. It is really difficult to solve a problem you've never encountered before, it's easy or at least much easier to solve a problem which you already know how to solve or a problem which is similar. To solve something that you have absolutely no clue about, you need more trial and error and hope that your experimentation will yield some information you can work with.

  • @JL2579
    @JL2579 10 месяцев назад +19

    I actually think this reasoning issue is very human-like in a way. I've been learning Chinese for a while, and here the learning gets even weirder: there is the pronunciation of a symbol, being able to recognize it, being able to write it, knowing its meaning, and knowing the meaning of the whole word. Which means that sometimes I can understand a word but not spell it. Sometimes I can spell it but have no idea what it means; sometimes I even know what the symbols mean but the whole word is too abstract and I can't remember it. On some characters I can spot mistakes in their writing, but I wouldn't be able to draw them completely from my head.
    The difference with ChatGPT seems to be that for humans this doesn't apply as much to higher knowledge about facts. So maybe the model learns "Biden is president" the same way you learn "an apple is..." (enter your memories, feelings and sensations of an apple, which cannot be described in words) and not like we do, where "Biden is president" is more like digital knowledge that sits on top of analog knowledge, like what Biden looks like and what you associate being a president with.

    • @philipmarlowe1156
      @philipmarlowe1156 9 месяцев назад

      Exactly, because the LLM is still undergoing its fundamental nascent stages and gradual processes of evolutionary transformations.

  • @djayjp
    @djayjp 10 месяцев назад +116

    Fascinating insights, thank you. I love that you don't just report the news, but you provide deep insights into the state of AI.

  • @chiaracoetzee
    @chiaracoetzee 10 месяцев назад +69

    As someone who's done a lot of foreign language learning and had to memorize a lot of vocabulary, it's not unusual at all to know that e.g. "horse" and "cheval" mean the same thing but only be able to perform recall in one direction and not the other direction, unless I'm given some kind of hint to narrow it down. This is often called a "tip of my tongue" feeling when you know it but can't quite recall it given the current context. In that sense, this logical limitation might make LLMs even more human-like.

    • @tornyu
      @tornyu 10 месяцев назад +9

      Totally, IIRC it's called active and passive vocabulary. I wouldn't have assumed for any domain that if someone knew A→B then they could also realise that B→A without practice.

    • @johndashiell5559
      @johndashiell5559 10 месяцев назад +9

      I was thinking the same thing. The LLM may have strong training in one direction, but far less (or none at all) in the other. So, just like us, it makes recall much harder in certain cases even if the answer is logical.

    • @honkhonk8009
      @honkhonk8009 10 месяцев назад +5

      Yep, same here. It's also the same with math.
      You don't just look at an algebra equation and see whether it's equivalent to something; you have to do step-by-step reasoning on it.

    • @tornyu
      @tornyu 10 месяцев назад +11

      I think the reason humans appear to be better at this may be that we ruminate: once we learn that A→B, if it interests us we look at the question from many different angles - synthesising our own training data.

    • @chocsise
      @chocsise 10 месяцев назад +2

      That is very interesting. I didn’t know that.

  • @TMtheScratcher
    @TMtheScratcher 10 месяцев назад +8

    Good points you bring up at the end: our brain also consists of specialized "modules", and our language-processing capabilities alone do not account for our skill and knowledge around logic.

    • @mariapiazza-od8ib
      @mariapiazza-od8ib 10 месяцев назад +1

      🎉🎉🎉 sure , 'specialized modules' is the KEY to agi ; LPC alone can't handle Logic , but will do with a specialized Logic Module 🎉🎉🎉 I'm thrilled cause I'm tinkering on something like that .

  • @Drone256
    @Drone256 10 месяцев назад +15

    Excellent video. LLMs are making it obvious that many of the things we think of as logical thought are often pattern matching. To me this explains why some people come to an incorrect conclusion but so strongly believe it. They're pattern matching with insufficient examples like an LLM but they have no awareness this is what they are doing.

  • @computerex
    @computerex 10 месяцев назад +6

    LLMs are autoregressive models, so in this context all of these findings make sense. They are not truly reasoning; they are mimicking the statistical distributions in the training data, which often intersect with ground truth/reality. When they don't align, we call those outputs hallucinations.
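
    A minimal sketch of what "autoregressive" means here, using a hypothetical toy bigram table in place of a trained model; every token is drawn from P(next | left context) only, which is exactly the one-directional behaviour the comment describes.

      import random

      # Hypothetical toy "statistics" standing in for a trained model's distribution.
      bigram = {
          "<start>":  {"Mary": 0.6, "Tom": 0.4},
          "Mary":     {"Lee": 1.0},
          "Lee":      {"Pfeiffer": 1.0},
          "Pfeiffer": {"<end>": 1.0},
          "Tom":      {"Cruise": 1.0},
          "Cruise":   {"<end>": 1.0},
      }

      def sample_next(prev: str) -> str:
          """Draw the next token from P(next | previous token) only."""
          dist = bigram.get(prev, {"<end>": 1.0})
          tokens, probs = zip(*dist.items())
          return random.choices(tokens, weights=probs)[0]

      tokens = ["<start>"]
      while tokens[-1] != "<end>":
          tokens.append(sample_next(tokens[-1]))
      print(" ".join(tokens[1:-1]))
      # Generation only ever runs left to right; no step conditions on what
      # FOLLOWS a token, which is why forward and backward recall differ.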

  • @TheBlackClockOfTime
    @TheBlackClockOfTime 10 месяцев назад +124

    I almost had a heart attack when you said "GPT-V"

    • @Ash97345
      @Ash97345 10 месяцев назад +1

      next year

    • @AbsurdShark
      @AbsurdShark 10 месяцев назад +3

      Me too, i was sure it's GPT 5 when i saw V. Really misleading (ofc not blaming this channel).

    • @alexdefoc6919
      @alexdefoc6919 9 месяцев назад

      wdym? no gpt 5 yet? but i want the 4 one for free :))@@AbsurdShark

  • @w1ndache
    @w1ndache 10 месяцев назад +7

    Funny thing that as we are trying to teach LLMs generalised logic and reasoning, looking back at how we would think through puzzles, it's also filled with "cheap" rote learning, memorisation, and applying patterns...

    • @maythesciencebewithyou
      @maythesciencebewithyou 10 месяцев назад

      A lot of people don't seem to realize how much their logical reasoning skills depend on what they already know. For example, all the math problems most students will encounter during their school years are problems they can solve with the methods they are taught, such as the quadratic formula. Students may feel like they are solving problems when they work on a task, but in reality all the problems we let them work on have already been solved and are problems we've taught them how to solve, not something they came up with by themselves. Most people have never and will never come up with a method to solve a problem. We make use of our knowledge and apply it.
      Something that we've never encountered will leave us puzzled. We'll need to learn about it before we can tackle it, and if it is something nobody before has solved, then we still use what we already know and on top of that we'll need to work by trial and error and gain some further information to work with to solve the problem.
      One experiment that was very memorable for me in that regard was one where they let university students solve some puzzles, those matchstick puzzles where you have to move some matchsticks to get the desired result. Once the students got going, it seemed like they were all clever, as they easily solved one puzzle after the next. But it turned out that they only managed to do so because all the puzzles so far were of a similar nature, where they had to make the exact same move, and they didn't even realize it; they had just learned to solve that specific kind of problem. Once the pattern changed and the solution required a different move, all the students struggled and only a few managed to solve it eventually.
      And not all problems can be solved just by reasoning and logic alone. Just because something is logically tight, doesn't mean it's true. You can reason your way into believing all sorts of bullshit. You can't know what you don't know and even what you know may turn out to be BS.

  • @dextersjab
    @dextersjab 10 месяцев назад +15

    Super quick work with the video! It was crazy seeing all this stuff unfold on Twitter. I still think there's plenty more experimenting to do with reasoning. We've clearly still got a lot to learn about intelligence.

  • @a.thales7641
    @a.thales7641 10 месяцев назад +23

    After the announcement of DALL-E 3 I said to myself that OpenAI really should release a feature this year to upload pictures and talk about them, and to get the answers voiced as well, just like we can speak to ChatGPT via Android and iPhone... voilà! What a time to be alive. I really thought we'd get these features this year, but I expected maybe December, not October! Great. Thanks for the video! I learned the news from the video and was like... how did I miss this? Will watch the video now!

    • @antonystringfellow5152
      @antonystringfellow5152 10 месяцев назад +5

      I think maybe they're trying to get ahead of Google's Gemini models. The first is rumored to be due for release any time now.

  • @DaveShap
    @DaveShap 10 месяцев назад +4

    Hey I'm literally starting on a research paper to define "understanding" in the context of AI.

  • @LeonardVolner
    @LeonardVolner 10 месяцев назад +40

    Once upon a time, I worked at a plant nursery hand-watering plants for many hours every day. It was a task that left lots of freedom to think freely.
    I taught myself the alphabet backwards...
    It's nice to have a job so undemanding that you can think about whatever you want, but the caveat is that your hands aren't free to take notes or research new material. It's somewhat limiting unless you can think creatively about how to use that thinking time productively given the implicit handicaps.

    • @aiexplained-official
      @aiexplained-official  10 месяцев назад +9

      Lol, reminds me of when I did a 12 digit by 12 digit multiplication over a couple hours in my head to pass time

    • @ronnetgrazer362
      @ronnetgrazer362 10 месяцев назад +2

      So an AI assistant that you could talk things through with and that helps by taking notes and doing research in the background could be great for bored delivery folks.

    • @garethbaus5471
      @garethbaus5471 10 месяцев назад +1

      Loading trailers was like that for me. I might be doing over 700 pph, but my mind would be off thinking about power satellites or whatever else I felt like thinking about.

    • @Hexanitrobenzene
      @Hexanitrobenzene 10 месяцев назад

      @@aiexplained-official
      12 by 12?! That's 144 single-digit multiplications and a similar number of additions. I would run out of "RAM"... Were you using some special technique? Was the answer correct?
      Once my physics professor showed on a blackboard how they extracted square roots before calculators. I did not understand a thing and had to look it up and study it closely. The method is quite interesting, like a long division with a divisor changing (and growing) on every step (a small code sketch of it follows at the end of this thread).
      A few times I tried to do it in my head. It takes half an hour to get to 4-5 digits of precision and the answer is likely to be incorrect due to some missing step. I noticed that most of the time is spent repeating the intermediate calculations until I learn the intermediate results by rote...

    • @chocsise
      @chocsise 10 месяцев назад +1

      You could use a voice recorder to capture important observations and reflections. Also listen to some non-visual YouTube videos or other audio content (such as interviews or monologues). Hands-free and eyes-free!
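
      A minimal Python sketch of the pencil-and-paper square-root method mentioned a couple of replies above (the "long division with a growing divisor"); the function name and digit count are just illustrative.

        def digit_sqrt(n: int, digits: int = 6) -> str:
            """Digit-by-digit square root, the way it was done before calculators."""
            s = str(n)
            if len(s) % 2:                       # pad so the digits split into pairs
                s = "0" + s
            pairs = [int(s[i:i + 2]) for i in range(0, len(s), 2)]
            pairs += [0] * digits                # extra 00-pairs give decimal places

            out, remainder, root = [], 0, 0
            for pair in pairs:
                remainder = remainder * 100 + pair
                # Largest digit d with (20*root + d) * d <= remainder: the divisor
                # "grows" with the root found so far, as the comment describes.
                d = 0
                while (20 * root + d + 1) * (d + 1) <= remainder:
                    d += 1
                remainder -= (20 * root + d) * d
                root = root * 10 + d
                out.append(str(d))

            int_len = len(s) // 2
            return "".join(out[:int_len]) + "." + "".join(out[int_len:])

        print(digit_sqrt(2))    # 1.414213
        print(digit_sqrt(10))   # 3.162277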

  • @keeganpenney169
    @keeganpenney169 10 месяцев назад +5

    The definition part you brought up is an even more brilliant discovery. It basically says we're all in an infinite loop in terms of logic, reasoning and rationality that only ends when said party is satisfied with the return; whether it's correct or not doesn't actually have much to do with anything.
    Bravo, seriously, that's some next-tier human-fallacy discovery right there.

  • @mickmickymick6927
    @mickmickymick6927 10 месяцев назад +44

    This might explain why I very rarely get useful results from ChatGPT or Bing, even when I know it should have the information I want.

    • @ChaoticNeutralMatt
      @ChaoticNeutralMatt 10 месяцев назад +1

      It explains bing to me at least. As I rarely search for what I know the name of

    • @XOPOIIIO
      @XOPOIIIO 10 месяцев назад +5

      Because language models are optimized to predict the next word, they are trying to be as predictable as possible. You're not getting higher chances of predicting the next word if you generate something unique and important, that rarely happens in the dataset, you're getting higher chances if you generate something boring and mundane that was discovered hundreds of times before. That's why they are trying to be useless.

  • @manielliott9188
    @manielliott9188 10 месяцев назад +28

    I think a good definition of intelligence is the ability to make decisions that help to achieve a goal. Even chemical systems, like cells, display intelligence if one assigns a goal to them. Therefore rationality and reasoning is the ability to make good decisions. Good decisions bring the individual closer to their goal.

    • @ChaoticNeutralMatt
      @ChaoticNeutralMatt 10 месяцев назад +1

      That's closer to how I think of "smart" the ability to make the best decision based on your situation.

    • @ea_naseer
      @ea_naseer 10 месяцев назад

      We already have that: Any ML algorithm trained with reinforcement learning is supposed to maximize some goal X given some input Y.

    • @manielliott9188
      @manielliott9188 10 месяцев назад

      @@ea_naseer Yes. That is intelligence. The more intelligent something is the greater its capacity to make good decisions.

    • @KalebPeters99
      @KalebPeters99 10 месяцев назад

      Totally, look into the work of Michael Levin for some great experiments in this area!

    • @gamemultiplier1750
      @gamemultiplier1750 10 месяцев назад

      Robert Miles defined intelligence in a similar fashion with his videos on AI safety. The orthogonality thesis, iirc.

  • @simondennis6918
    @simondennis6918 10 месяцев назад +2

    The inconsistencies in the way that LLMs recall information remind me of similar inconsistencies in human memory, in particular recognition failure of recallable words. Even when someone is able to recall a word given a word with which it was paired in a study list, they often fail to recognise the word as having occurred in the study episode. This observation was the foundation of Tulving's cue-dependent memory hypothesis, which is taken as given in all theories of memory now.
    Watkins, M. J., & Tulving, E. (1975). Episodic memory: When recognition fails. Journal of Experimental Psychology: General, 104(1), 5-29
    p.s. Tulving passed away earlier this month. While I would contest much of what he wrote later in his career, there is no disputing the foundational contributions he made to the memory literature.

  • @minhuang8848
    @minhuang8848 10 месяцев назад +8

    Full-on nailed the German pronunciation, very small sample size, but that sounded better than most learners

  • @danielxmiller
    @danielxmiller 10 месяцев назад +37

    This happened with me when I learned my states and capitals, I could tell you the state if you gave me the capital, but if you gave me the state, I, for the life of me, couldn't give you the capital. Looks like it's more like a mind than a machine!

    • @XOPOIIIO
      @XOPOIIIO 10 месяцев назад +7

      That's right. Neural networks are not classic machines; the fact that they contain a lot of knowledge doesn't mean the knowledge is easily retrievable. Just like the human mind, they need cues. On a structural level it looks like a neuron containing a piece of knowledge, but there are other neurons that are closely connected to it, by association; activating them, you're activating the target neuron. To activate a thought you need to activate thoughts that are close to it.

    • @treacherousjslither6920
      @treacherousjslither6920 9 месяцев назад

      It seems then that the issue lies with the training methods of information acquisition. One direction is insufficient.

  • @oscarmoxon102
    @oscarmoxon102 10 месяцев назад +8

    This is one of your most impressive videos yet. Since starting watching AI Explained, I've flipped from an Economics Bachelors to a Masters student in Artificial Intelligence at KCL, where this has become my full-time focus -- and I am privileged to witness your journey into becoming an incredible AI researcher along the way. The micro and macro you paint here are truly cutting edge perspectives. Always blown away by these vids.
    The other day I spoke with Mustafa Suleyman at CogX about recursive self-improvement and multi-agent interaction. Curious what you think about this space and what you think of recent news here.

    • @aiexplained-official
      @aiexplained-official  10 месяцев назад +6

      Oh wow that's insane, thank you Oscar. I read his book and all his interviews and he is obviously very much against agency and RSI but I do wonder if he wants Pi to pass his modern turing test, and if so, how that couldn't involve agency...

  • @samhblackmore
    @samhblackmore 10 месяцев назад +4

    Predicting the next word can take you to some amazing places. Just not backwards.... what a great quote!

  • @edwinlundmark
    @edwinlundmark 10 месяцев назад +131

    This makes me think... what if we run out of novel ways to test AIs? What if they're just trained on every way we can come up with to test their reasoning...

    • @aiexplained-official
      @aiexplained-official  10 месяцев назад +45

      Great question

    • @YouLoveMrFriendly
      @YouLoveMrFriendly 10 месяцев назад +38

      That's why the late, great Doug Lenat pushed for combining neural network language models with hard-coded, human-curated knowledge bases, such as his famous Cyc system.
      LLMs shouldn't be treated as a knowledge base; they should be used for what they're good at: interpreting requests and queries and then handing the real "thinking" over to a specialist system.

    • @jeffsteyn7174
      @jeffsteyn7174 10 месяцев назад +13

      It's irrelevant what a test says if it replaces you at work.

    • @UserCommenter
      @UserCommenter 10 месяцев назад

      Is that similar to asking “what if LLMs can only respond in known words/languages”? We can test pretty much everything it seems, and what we can’t might not be our concern - except maybe we’re limited to replacing human skills and not exceeding them?

    • @Houshalter
      @Houshalter 10 месяцев назад +5

      You can always randomly generate hard logic problems like satisfiability problems.

  • @huntercoxpersonal
    @huntercoxpersonal 10 месяцев назад +14

    My question is how in the world are we just now discovering that these simple logical deductions are faulty within GPT? Shouldn’t this type of thing be at the heart of how these systems work? Makes everything even more confusing and spooky…

    • @DJStompZone
      @DJStompZone 10 месяцев назад

      It's also an iterative process. That's kind of the point of doing smaller soft launches, it gives them a chance to work out the kinks and have thousands of people running it through its paces. I imagine within the next week they'll have fixed that, and we'll have some new October version with all new bugs. It's just the way software development goes

    • @Nikki_the_G
      @Nikki_the_G 10 месяцев назад +4

      @@DJStompZone They are just going to "fix" BASIC LOGIC with a patch next month, huh? You guys are unreal.

    • @monkyyy0
      @monkyyy0 10 месяцев назад

      This isn't news to the people who are pessimistic that NNs will produce AGI; it's a matter of faith for the optimists that it will all work out, while everyone else has been aware of the "four-horned unicorn" errors.
      Hill climbing finds good results in predictable situations and is incapable of solving predictably hard problems, and NNs are just hill climbers on 10^100000-dimensional hills.

    • @JohnSmith-zk3kd
      @JohnSmith-zk3kd 10 месяцев назад

      @@Nikki_the_G Yeah man its mad easy.

    • @andersberg756
      @andersberg756 10 месяцев назад

      To me it's not surprising. The heart of how GPT works is that it's been trained to take a bit of text and then predict the next word, so whatever it learns in order to do that well is not really designed or built in. It's more like everything beyond predicting grammatically correct sentences with words fitting the context is a bonus. Which turned out to be a huge bonus at the scale of GPT-3/4: capabilities such as translation, some physical-world knowledge, judging sentiment, basic reasoning, etc.
      That's why it'd be so hard to fix any one thing: the training can emphasize certain examples, but unless one changes the whole architecture, the goal stated above is what drives the learning, and clearly either the training data didn't emphasize logic (it's basically internet text, so logic is just a tiny part) and/or the architecture doesn't lend itself well to learning logic. We'll see how they try to tackle it.

  • @mikem4405
    @mikem4405 10 месяцев назад +2

    I'm impressed that LLMs can cover as much ground as they do. It reminds me of how parts of the human brain can be repurposed to compensate for underperforming regions, like in blindness, stroke, dementia, etc. ChatGPT doing math is like an English major doing math - after the English major had brain damage to the 'math' part of their brain (ok, it's an imperfect analogy).
    I agree that there are different kinds of networks that specialize in distinct tasks, and that these should be put together to maximize the strengths of each one. LLMs seem to work well as an executive region, in part because they are extremely good at evaluating text that has already been generated (like we saw in Reflexion).

  • @5133937
    @5133937 10 месяцев назад +1

    I really appreciate your due diligence on these videos, and then listing the papers, Twitter threads, YT vids, and other sources that comprise your DD. Hugely educational and helpful. Thanks man. Subbed, with notifications=all.

  • @Pabz2030
    @Pabz2030 10 месяцев назад +3

    Clearly LLM's are in fact INTP personality types, where if you ask them a question you either get:
    A) A 3 hour monologue on the answer and every random possible offshoot from that, or
    B) A simple "No idea...Find out yourself"

  • @alansmithee419
    @alansmithee419 10 месяцев назад +5

    21:50
    That's not a failure, that's a happy accident.

  • @vladyslavkorenyak872
    @vladyslavkorenyak872 10 месяцев назад +3

    Reasoning is an iterative process, so for any hope of these models doing multiplication you need them to iterate internally. Our brain does it automatically since it's an analog, asynchronous machine, but these models need something more. We might get better results by making a master/student dual model, where the student model tries to solve the task and the master keeps asking whether it was done right and making suggestions. An AI with internal conversation!
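
    A minimal sketch of that student/master loop, assuming a hypothetical llm(prompt) completion function; the prompts and the stopping rule are illustrative only, not a specific API.

      def solve_with_critic(task: str, llm, max_rounds: int = 3) -> str:
          """Student proposes an answer; the master critiques until it approves."""
          answer = llm(f"Solve this task step by step:\n{task}")
          for _ in range(max_rounds):
              verdict = llm(
                  f"Task: {task}\nProposed answer: {answer}\n"
                  "Is this correct? Reply APPROVED, or point out the mistake."
              )
              if verdict.strip().startswith("APPROVED"):
                  break
              answer = llm(
                  f"Task: {task}\nPrevious attempt: {answer}\n"
                  f"Critique: {verdict}\nGive a corrected answer."
              )
          return answer

      # Example wiring (my_completion_fn is a stand-in for a real model call):
      # print(solve_with_critic("What is 17 * 24?", llm=my_completion_fn))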

  • @mahmga1
    @mahmga1 10 месяцев назад +2

    Phenomenal episode, each subtopic could've been a long video - All highly thought provoking. I think you summarized best with the question "why doesn't the LLM just create a model on the fly" - That seems to be the most direct path. I'd definitely like to see an experiment with that approach taken. I think of it along the lines of model-inception.

  • @Pheonix1328
    @Pheonix1328 10 месяцев назад +5

    That's what I've been thinking all along. It makes more sense to me to have many smaller expert AIs working together than trying to cram everything into one giant one. Even our brain has different areas that focus on various things...

  • @harrymapodile
    @harrymapodile 10 месяцев назад +5

    Great vid! DeepMind also released Gato last year, which was sort of a Swiss Army knife model: it combined all the various types of architectures you mentioned. Perhaps there's a chance for AGI soon haha

  • @Arnaz87
    @Arnaz87 10 месяцев назад +4

    Sir your commentary on the topics is remarkably insightful and valuable, and we're lucky to have it exist.

  • @Tiky.8192
    @Tiky.8192 10 месяцев назад +5

    Another thing that shows the patchy behavior of LLMs is asking a question in different languages: you don't get the same answers. A good example is recipe ideas for a list of ingredients. Translate it word for word and ask the LLM, and you'll get completely different genres of recipes.

    • @Sashazur
      @Sashazur 10 месяцев назад

      That result is actually what you would expect from real humans in the real world. Typically you wouldn’t try to make a cheeseburger in China!

    • @Tiky.8192
      @Tiky.8192 10 месяцев назад +1

      @@Sashazur Makes sense in a human world but for an AI, it does create quite a big issue in which some knowledge might be hidden behind other languages. Some paths might only be accessible in one language a bit like a search engine.

    • @andersberg756
      @andersberg756 10 месяцев назад

      Very interesting topic. ChatGPT can translate, but the internal representations are probably only in part "one world model" of concepts and relationships, and often instead tied to a particular language, or at least a language group where concepts are expressed similarly.
      There's the concept of vector embeddings: my understanding is that they're the numbers the model uses to express the context and meaning of the text. It'd be interesting if someone has investigated the vector embeddings for translated text. Are they in general similar? Does it differ depending on the topic, or the type of concepts used, i.e. physical stuff similar in different languages, but tone of writing different? Does anyone know of such research?
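
      One way to probe that question, as a rough sketch: embed the same request in two languages with a multilingual sentence-embedding model and compare the vectors (the sentence-transformers model name below is just one plausible choice).

        import numpy as np
        from sentence_transformers import SentenceTransformer  # assumed installed

        model = SentenceTransformer("paraphrase-multilingual-MiniLM-L12-v2")

        english = "Suggest a recipe using rice, chicken and ginger."
        german  = "Schlage ein Rezept mit Reis, Haehnchen und Ingwer vor."

        e, g = model.encode([english, german])
        cosine = float(np.dot(e, g) / (np.linalg.norm(e) * np.linalg.norm(g)))
        print(f"cosine similarity: {cosine:.3f}")
        # A high score hints at a shared cross-lingual representation; repeating
        # this across topics would show where the "one world model" idea breaks down.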

  • @adamsvette
    @adamsvette 10 месяцев назад +7

    I would love to see a DALL-E photo dictionary where every word in the English dictionary generates four or five images, so we can just see what these models think words mean/look like.
    You could do just the word, or the word plus its entire definition as a prompt.

  • @geldverdienenmitgeld2663
    @geldverdienenmitgeld2663 10 месяцев назад +29

    It would be interesting to see what happens if an LLM is trained to predict both the next word and the previous word.

    • @TheGreatestJuJu
      @TheGreatestJuJu 10 месяцев назад +4

      Seems like the most logical next step.

    • @generativeresearch
      @generativeresearch 10 месяцев назад +6

      That won't happen as entropy only goes forward in time

    • @DicksonCiderTonight
      @DicksonCiderTonight 10 месяцев назад

      It will finally be able to write good jokes!

    • @casenswartz7278
      @casenswartz7278 10 месяцев назад +1

      I've made my own small custom LLMs from scratch before, algorithm and all. I bet you could just create another LLM and have it train on the same dataset with the data reversed. I wouldn't train the same AI to go both forwards and backwards, because it might have trouble separating which is which and would likely output a bunch of garbage. You could also train a model on the textual relationships between three tokens (the current token, the previous token, and the predicted token) to then adjust the prediction; I can see how that would be useful, and it might be a very small extra layer. (A sketch of the reversed-data idea follows at the end of this thread.)

    • @BTFranklin
      @BTFranklin 10 месяцев назад

      That's what people seem to be expecting out of this calculator-like behavior.
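
      A minimal sketch of the data side of the reversed-training idea discussed in this thread; the train() calls are placeholders, not a real API, and the sentences are illustrative.

        corpus = [
            ["Mary", "Lee", "Pfeiffer", "is", "Tom", "Cruise's", "mother"],
            ["Valentina", "Tereshkova", "was", "the", "first", "woman", "in", "space"],
        ]

        forward_data  = corpus
        backward_data = [list(reversed(seq)) for seq in corpus]

        # Same next-token objective, same architecture; only the data is mirrored,
        # so the second model effectively learns to predict the *previous* token.
        # train(model_fwd, forward_data)    # hypothetical training call
        # train(model_bwd, backward_data)   # hypothetical training call
        print(backward_data[0])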

  • @panfilolivia
    @panfilolivia 8 месяцев назад +1

    Great video, that's all I can say. It's hard to find stuff of this quality on YouTube these days. Loved the studies you talked about; after watching I just had to read them fully.

  • @Zhizk
    @Zhizk 10 месяцев назад +4

    Weird week indeed! Thank you, and I hope the news just keeps coming.

  • @Bodofooko
    @Bodofooko 10 месяцев назад +5

    Am I being too generous with the DALL-E horse picture in thinking that it's actually a pretty creative approach to getting a horse to drink water from a water bottle? Water bottles are made for humans to use and are not easily used by horses, so DALL-E has the water from the bottle go into a container that the horse can drink out of. The water is still technically from a bottle. The Midjourney one looks more like it's just sniffing the bottle. Maybe if there was a clear straw or something, but otherwise I don't think it fulfills the prompt request as well.

    • @thenoblerot
      @thenoblerot 10 месяцев назад +2

      I had the same thought. Even if it was absurd, it 'feels' reasoned to me. I think DALL-E 3 is likely the multi-modal aspect of GPT-4, given how they're rolling it out in the ChatGPT interface. I think evidence of this is the consistency with which DALL-E 3 rendered Larry the hedgehog in the OpenAI demo video, as though GPT-4 had its own full realization of how Larry looks and then stuck with it across multiple prompts. Today's Midjourney could never.

    • @realismschism
      @realismschism 10 месяцев назад +2

      Agreed. Very impressive, IMO. Not only is the horse drinking from the bottle but it's doing so by its own action, by tipping the bottle forward. It's a clever, if surreal, solution that shows creativity and spatial reasoning. I don't know how many humans would come up with this.

  • @billykotsos4642
    @billykotsos4642 10 месяцев назад +11

    AGI is close !.... fails at basic logic.... lol

  • @MunirJojoVerge
    @MunirJojoVerge 9 месяцев назад +1

    Wild times indeed!! My tests with MS Autogen are really amazing, to say the least!
    As usual, thank you so much for your work!

  • @ryanpmcguire
    @ryanpmcguire 10 месяцев назад +5

    What about using two LLMs that have a conversation about the answer?

  • @bujin5455
    @bujin5455 10 месяцев назад +18

    I wonder if it's not possible to have an AI recognize these inversion relationships in the training corpus, so that it can then augment the training corpus to demonstrate the bidirectional nature of things. Then work on specifically pushing the target AI to recognize inverted relationships with out of band examples. I wonder if this sort of effort would lead to a new level of emergent behavior in these LLMs.

    • @antonystringfellow5152
      @antonystringfellow5152 10 месяцев назад +2

      Interesting possibility but when considering these problems/limitations, I always try to work out how we humans achieve this. Of course, I don't really have those answers but I do have ideas.
      As someone who is teaching my language in a country where I'm also learning the language of my students, I've come to realize how much the human brain depends on associations. It tries to link any new information with existing memory, often in multiple ways with multiple areas of existing memory (via many paths linking data). I suspect our brains are configured in such a way that this happens with all new information, not just language - it tries to form these links, even if it means them being very tenuous.
      Maybe this is what enables us to take things we learn in one area and apply some of them in other areas.
      Maybe what's needed for AGI isn't bigger language models but an architecture that works in a similar way.

    • @tiefensucht
      @tiefensucht 10 месяцев назад

      the thing is that current ai is more like a search engine. you have to implement logic and association as separate modules which can talk to each other. chatgpt is like top tier autism.

    • @cacogenicist
      @cacogenicist 10 месяцев назад

      I found that Claude 2 passed the son-mother/mother-son inversion examples provided. It did suffer the same confusion about Huglo.

  • @felixgraphx
    @felixgraphx 10 месяцев назад +3

    The language-pattern part of your brain is not the logic part, and GPT is essentially a large frontal lobe for language patterns, not logic. In the future, AI systems will assemble different parts, some LLMs and some not, to perform actual logic and, later, something like conscience. But for now many people don't understand that and are surprised that GPT is good at language-pattern memory and gives good results, yet can't do reverse logic based on those language results.

  • @gabrote42
    @gabrote42 10 месяцев назад +1

    16:02 To reason is to scrutinize extensive portions of the information available about a given environment/task/prompt and to make choices based on them that try to accomplish an objective. Good reasoning is generally characterized by compensating for biases, requiring attention, using analysis methods, and generalizing well.

  • @rasen84
    @rasen84 10 месяцев назад +2

    OK, then the obvious next step is to discourage over-reliance on memorization. Like RAG: do retrieval-augmented pretraining and restrict the retrieval set to only the previously trained tokens.
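
    A minimal retrieval sketch of that idea; embed() and llm() are placeholders for real models, so only the ranking function below is concrete.

      import numpy as np

      def retrieve(query_vec: np.ndarray, doc_vecs: np.ndarray,
                   docs: list[str], k: int = 2) -> list[str]:
          """Return the k documents whose embeddings are most similar to the query."""
          sims = doc_vecs @ query_vec / (
              np.linalg.norm(doc_vecs, axis=1) * np.linalg.norm(query_vec)
          )
          return [docs[i] for i in np.argsort(-sims)[:k]]

      # Hypothetical wiring:
      # docs = ["Mary Lee Pfeiffer is Tom Cruise's mother.", ...]
      # doc_vecs = np.stack([embed(d) for d in docs])
      # context = retrieve(embed("Who is Mary Lee Pfeiffer's son?"), doc_vecs, docs)
      # answer = llm("Context:\n" + "\n".join(context) +
      #              "\nQuestion: Who is Mary Lee Pfeiffer's son?")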

  • @guy8203
    @guy8203 10 месяцев назад +4

    Just about everyone I've ever talked to about superintelligence has been concerned about it taking their jobs, without the understanding that all jobs being gone isn't a bad thing. I think our cultural narrative about AI is mostly drawn from dystopian sci-fi because utopia isn't really a story. "Nothing goes wrong and everybody's happy forever" doesn't make for the most compelling narrative.
    Hopefully the general population will eventually come to understand that superintelligence means infinite resources and jobs will no longer be necessary.

    • @DeusExtra
      @DeusExtra 9 месяцев назад

      God when people say stuff like that it really gets on my nerves. It's so closed minded. I saw a comment where this dude was mocking someone for wanting to not have to worry about basic survival needs because AI, etc could potentially be a support structure for humanity, saying something like, "What would we do then? Sit around all day?" If we consider the fraction (or totality) of work that people do to cover just having basic needs met, I don't think most people would say they can't think of a better use for that time.

  • @hanskraut2018
    @hanskraut2018 10 месяцев назад +4

    I agree with so much and you made some great points!
    Good understanding of the definition things.
    Good insight on the memory thing.
    -Beautiful/nice pictures (although listened to a bunch)
    -great point about the borderline abundance/improvements of many big challenges/tragic situations and the public would be going “nope” 😄
    Nice i like ur tone of voice and ur calm, inquisitive, interest and slightly infectious/pleasurable passion. 🎥
    Have a nice day. :)

  • @c10ud17
    @c10ud17 10 месяцев назад +2

    You’re the first AI channel i’m seeing reaching out to the researchers and getting interviews integrated into content. It’s so cool to hear from the people at the forefront of AI dev! Fantastic content dude

    • @aiexplained-official
      @aiexplained-official  10 месяцев назад +1

      Thanks so much Cloud, means a lot, yes will be reaching out much more

  • @tehlaser
    @tehlaser 10 месяцев назад +2

    The fact that humans can do this is actually kind of amazing. By “this” I don’t mean realize that A is B implies B is A. That’s just logic, and GPT seems to be able to do that too, so long as you activate A and B in the same context.
    The amazing bit is that, when we humans learn that A is B (or even that A is similar to B) we form associations in both directions. Thinking about A suggests B, and thinking of B suggests A. Most animals don’t form that reverse association. Humans do.

  • @linkup2345
    @linkup2345 10 месяцев назад +7

    Great video. Thank you for your effort in putting all this together 🙏🏾

    • @aiexplained-official
      @aiexplained-official  10 месяцев назад +2

      Thanks linkup

    • @MikAnimal
      @MikAnimal 10 месяцев назад

      Putting what together ? Useless clickbait? His logic is as poor as gpt v 😂

    • @linkup2345
      @linkup2345 10 месяцев назад +1

      Thank you hater. We would be nothing without you.

    • @aiexplained-official
      @aiexplained-official  10 месяцев назад

      What was up with my logic Mik?

    • @MikAnimal
      @MikAnimal 10 месяцев назад

      @@aiexplained-official logic of thinking clickbait that could spread misinformation about the version of gpt available is good for helping spread good information 🤙🏽
      I mean Linus probably thought he was doing good too till that gamers nexus video said otherwise.
      In a world where academic dishonesty and media dishonesty and government trying to stop the spread of all information by calling it miss information already we don’t need sloppy, lazy or greedy actions making things worse.
      How bout that 👀🤙🏽

  • @Sirbikingviking
    @Sirbikingviking 10 месяцев назад +6

    A lot of this strange behavior is probably due to the fact that the LLM is an autoregressive model. Once it starts writing a response, it is sort of statistically committed to what it's saying. Also, not a lot of people online are likely to ask about Tom Cruise's mom without mentioning him first, so its statistical training may not be able to refer to them in reverse order very easily. Also, when you replace a word with X, this is a method used to train LLMs where they sort of fill in the blank, so they're really good at doing that.

    • @benprytherchstats7702
      @benprytherchstats7702 10 месяцев назад +4

      I agree. I've managed to get GPT-4 to dig itself into some pretty surreal holes this way, for instance by asking for the number of permutations possible in some set of letters, subject to some constraint, and then, when it gets it wrong, asking it to write out each permutation. If it says there are 10 permutations when really there are 6, it will stick to its first answer by making up 4 more wrong examples. And then when you ask it about those wrong examples it will deny that they're wrong. I got it to insist that the second letter in "BBC" is "C" using this method, which obviously it won't get wrong if you just ask straight up "what's the second letter in BBC?" (A quick ground-truth check for this kind of question is sketched after this thread.)

    • @cartour8425
      @cartour8425 10 месяцев назад

      I believe ‘a pure version’ or previous version would answer it correctly. Seems now the site terms of use are messing up the logo algo. Maybe need to specify how it scrapes data in prompt
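
      The ground-truth check for the permutation example mentioned above is a one-liner; the letters here are just an illustration.

        from itertools import permutations

        distinct = set(permutations("BBC"))
        print(len(distinct), sorted("".join(p) for p in distinct))
        # 3 ['BBC', 'BCB', 'CBB']  -- 3!/2! = 3 distinct arrangements,
        # a fixed reference the model's answer can be checked against.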

  • @matthewcurry3565
    @matthewcurry3565 10 месяцев назад +1

    Good work, and updates. They've got to give it something like self-reflection to work backwards. Then it'll just freeze depending on the model, how deep it is allowed to go, and what its data set is trying to train. I'm speaking more of an AGI rather than the smaller models doing specific jobs.

  • @brll5733
    @brll5733 10 месяцев назад +1

    One of your best videos yet, I think. Very informative and clear.
    My strong intuition remains that LLMs need some form of latent memory (which they can access and process for multiple cycles before producing a result) to get them to reason, to be able to create variables and think about them.

  • @XOPOIIIO
    @XOPOIIIO 10 месяцев назад +3

    Basically, ChatGPT can't remember a thing until a cue is given. That's how the human brain works too.

  • @chasebrower7816
    @chasebrower7816 10 месяцев назад +3

    To me the logic failure doesn't seem surprising: these LLMs have been shown to rely on grammar structures (hence they're easy to 'fool' by using grammar or syntax that makes a wrong answer look more linguistically likely), so their recall might break down when probed in a logical manner. Following logic like this would require more organized recall and probably outright intelligence, which we know GPT-4 has only in negligible amounts, if any at all. This barrier is likely only to be surmounted with a more explicit memory mechanism, or enough intelligence to store information in a logical manner.

  • @Jay-Dub-Ay
    @Jay-Dub-Ay 10 месяцев назад +1

    *Pausing to give my definition of reasoning*
    Reasoning: Decision making through the assessment of situations by considering relative data and information as if it were a closed system while respecting the openness of all systems in order to not ignore unjustified action and encourage justified action.

  • @jay_sensz
    @jay_sensz 10 месяцев назад +2

    You could probably instruct a fairly simple LLM to identify reversible facts/concepts in the training data, generate alternative phrasings, and add those to the training data.
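
    A minimal sketch of that augmentation idea with hand-written fact triples; in practice the "identify reversible facts and rephrase them" step would itself be done by a model, so everything below is illustrative.

      # (subject, forward relation, object, inverse relation) -- illustrative triples.
      facts = [
          ("Mary Lee Pfeiffer", "is the mother of", "Tom Cruise", "is the son of"),
          ("Olaf Scholz", "is the chancellor of", "Germany", "has as its chancellor"),
      ]

      augmented = []
      for subj, rel, obj, inv in facts:
          augmented.append(f"{subj} {rel} {obj}.")   # original phrasing
          augmented.append(f"{obj} {inv} {subj}.")   # reversed phrasing for training
      print("\n".join(augmented))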

  • @MasonPayne
    @MasonPayne 10 месяцев назад +4

    I wonder if you could simply prompt the LLM to generate code to help it answer logical questions. I bet it could at least predict what code is needed to process the logic, which, if run, would give you the correct answer.
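
    A minimal sketch of that "write the code, then run it" idea (often called program-aided reasoning); llm() is a placeholder for a real completion call, and generated code would need sandboxing before being executed for real.

      def answer_with_code(question: str, llm) -> str:
          """Ask the model for Python that computes the answer, then execute it."""
          prompt = (
              "Write a short Python snippet that computes the answer to the question "
              "and stores it in a variable named `result`.\n"
              f"Question: {question}\nPython:"
          )
          code = llm(prompt)        # hypothetical completion call
          scope: dict = {}
          exec(code, scope)         # caution: only run generated code in a sandbox
          return str(scope.get("result"))

      # e.g. answer_with_code("What is the 2nd letter of 'BBC'?", llm=my_completion_fn)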

  • @ZokRs
    @ZokRs 10 месяцев назад +6

    Will you also cover ai robotics like the recent Tesla Optimus news?

    • @DJStompZone
      @DJStompZone 10 месяцев назад +1

      Maybe check out Two Minute Papers, it's another really good academic AI news channel and he covers a lot of stuff like that 👍

  • @sebby007
    @sebby007 10 месяцев назад

    Great video once again! I love that you are getting in contact with people in the field. It feels like you are bringing me closer to the people and minds that are driving towards AGI.
    So how do you teach an LLM the meaning of the concept of intuition? When I look to my own mind there seems to be an ever changing team of engines driving the whole. Is what LLMs are missing a pretty good definition of consciousness? How about combining LLMs into a communicating network with one being the master and see how that behaves.

  • @HALT_WHO_GOES_THERE
    @HALT_WHO_GOES_THERE 10 месяцев назад +2

    I think that the discrepancy between straight-up logical reasoning and "foggy" reasoning like chess is actually mirrored in humans. I think that humans are better chess players than they are principled logical reasoners, and that the parallel arising in LLMs is a more natural byproduct of emergent reasoning than people think. I also wonder if most humans would be able to guess certain celebrities' parents' names, but not be able to give the celebrity for whom a certain person is a parent. That sort of compartmentalized double standard of recall seems quite human-like.

  • @elitegamer3693
    @elitegamer3693 10 месяцев назад +9

    There is also a possibility that with enough scaling, we will get pure logic as an emergent ability.

    • @therainman7777
      @therainman7777 10 месяцев назад

      I was thinking the same thing. It may emerge with sufficient scale, or it may turn out that next-token prediction via transformers is fundamentally incapable of reasoning in the reverse direction. Will be very interesting to find out.

    • @adamwrobel3609
      @adamwrobel3609 10 месяцев назад +2

      Toddlers learn logic before they learn language

    • @therainman7777
      @therainman7777 10 месяцев назад

      @@adamwrobel3609 You would have to have a very unorthodox definition of “logic” for your claim to be true. By most definitions of logic, something like language is actually a precursor to having logic.
      For example, the classic logical syllogism, “All men are mortal. Socrates is a man. Therefore, Socrates is mortal.” How exactly would you understand this concept without language? How could you even express it without language, such that a person who doesn’t have language (such as a toddler) could possibly grasp its meaning?
      Also, you are making the (very common) mistake of anthropomorphizing AI. Even if your claim (that toddlers learn logic before they learn language) were true, that in no way means that LLMs, or any other type of AI system, would learn those two things in the same order. An LLM is not a human brain, and gradient descent (the mechanism by which LLMs and other neural networks learn) is not the way that humans learn. So there’s zero reason to expect that these abilities would be acquired in the same order. Let alone to be certain of it.

    • @chillin5703
      @chillin5703 10 месяцев назад +1

      I really doubt it. ChatGPT (or any other language model) is not meant to engage in logic, and literally isn't designed for it. Adding more data to ChatGPT might make it outwardly appear like it's engaging in logic in a wider variety of situations, but in reality, all it would actually be doing is having a more detailed reference for how people string together words in a larger variety of situations. It wouldn't change the fundamental trait, which is that it is looking at how humans string words together, not how they consider those words, or what thought processes underpin those words, only the means of communicating it all. Here's one example: ChatGPT will often fail simple logic tests UNTIL TOLD _explicitly_ it is being given a logic test. What this demonstrates isn't ChatGPT's capacity to reason being activated once it is prompted; it's ChatGPT's ability to draw from a different set of reference data when directed. We know this because, especially once we move beyond simple and common test formulations for which we can presume it has many references for word pairing, and give it more unique ones for which it has fewer references, it fumbles even when told it is doing logic tests. ChatGPT: it's a LANGUAGE MODEL. It's meant to reflect how we use LANGUAGE. All you can do by giving it more data is allow it to reflect a wider range of language use.

    • @chillin5703
      @chillin5703 10 месяцев назад

      Update: I wrote this comment as I was watching the video. Looks like the video maker points out the same thing.

  • @shawnvandever3917
    @shawnvandever3917 10 месяцев назад +3

    I think it has to do with the model being feed-forward only: it cannot loop back. In long-digit problems it can only carry the information forward and cannot go back and deduce. I know our brains go back and forth constantly. While I know this doesn't explain the A-to-B / B-to-A issue, I think it does have much to do with complex reasoning.

  • @Sekhmet6697
    @Sekhmet6697 10 месяцев назад +2

    So we know that LLMs don't have a purpose-built algorithmic internal set of rules for formal logic used to verify the validity of a statement, e.g. "if a=b, does b=a?", so the LLM may or MAY NOT be able to generalize a set of logic rules that adheres to the given context, and its answer may or may not be valid.

  • @garronfish8227
    @garronfish8227 10 месяцев назад +1

    The fact that the LLM structure is so simple is a huge benefit: input some words and get the next word. Just adding in a "maths processor" would mess up this structure. Ideally there will be a new, similarly simple structure that also determines the logic associated with a given input.

  • @TheTwober
    @TheTwober 10 месяцев назад +4

    The difficulty comes from the LLM learning that A >implies< B, not that A equals B.
    Input A implies that the answer should be B. Answer B therefore has, in the mind of the LLM, no logical connection back to A. However, in our minds things work differently, as we do bi-directional association, which then also comes with other issues, like us constantly seeing connections where there are none.

  • @blj9793
    @blj9793 10 месяцев назад +6

    Just from a surface level perspective, is it really problematic if the model cannot automatically deduce this connection? Wouldn't assuming that because A->B, B->A lead to logical fallacies?
    For example, if it is raining, it is wet. But if it is wet, it may not be raining.

    • @Not_Even_Wrong
      @Not_Even_Wrong 10 месяцев назад

      No, you're talking about implication; the paper is concerned with identity. Identity is symmetric, implication is not.
      So yes, this is a problem. (See the truth-table sketch after this thread.)

    • @AM-qk5bt
      @AM-qk5bt 10 месяцев назад

      I was wondering about gpt's ability to deal with modus tollens/deduction as well while watching this, I have the impression it's more difficult than expected

    • @KyriosHeptagrammaton
      @KyriosHeptagrammaton 10 месяцев назад

      The AGI is so advanced we think it's broken haha
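
      A quick truth-table check of the point made in the first reply; "identity" is read here as plain propositional equivalence.

        from itertools import product

        cases = list(product([True, False], repeat=2))

        # Implication is not symmetric: some case satisfies A -> B but not B -> A.
        print(any((not a or b) and not (not b or a) for a, b in cases))   # True

        # Equivalence (the "A is B" reading) is symmetric in every case.
        print(all((a == b) == (b == a) for a, b in cases))                # True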

  • @mantriukas
    @mantriukas 10 месяцев назад +1

    Really interesting example with the chess pieces mixed up. Good way to see whether it's reasoning or 'using' basic logic

  • @patrik8641
    @patrik8641 10 месяцев назад +2

    Could you cover why DALL-E 3 is better than the competition? I mean, what different techniques are they using to achieve that (if it's public)?

  • @fynnjackson2298
    @fynnjackson2298 10 месяцев назад +3

    What a time to be alive!! I agree, it's pretty surreal: on the one hand there are billions being invested and companies are going all in on AI, but at the same time people are like, hmm, maybe we should regulate.

  • @SuperAnimationer
    @SuperAnimationer 10 месяцев назад +6

    I said this before and I will say this again, I love your videos :) You are very hardworking and I respect that.

  • @woodybob01
    @woodybob01 10 месяцев назад +1

    16:17 this sort of stuff really grinds my gears, since teachers throughout all of school always say "don't use the word you're defining in the definition!" Yet there's at least a 50% chance that when I look up the definition of something, that's exactly what I find.

  • @therealOXOC
    @therealOXOC 10 месяцев назад +1

    Has someone tried uploading a picture and requesting a recreation via DALLE3? Wonder how close it comes. Great vid as always. Really like the little interviews.

  • @AIWRLDOFFICIAL
    @AIWRLDOFFICIAL 10 месяцев назад +9

    YES ANOTHER AI EXPLAINED VIDEO THANK YOU

  • @roqstone3752
    @roqstone3752 10 месяцев назад +7

    In its current iteration, ChatGPT is a search engine more than a logic engine.

    • @andersberg756
      @andersberg756 10 месяцев назад

      nah, it's a patchy super irregular thinker to me. Like it can figure out where I'm aiming with some code, giving me feedback on where I thought wrong in respect to a given problem. It does indeed have understanding of a lot of concepts, relations, methods etc. All that stuff was beneficial to build up in order to model our writing, so it learned it, seemingly random. It's so fascinating, but hard to get an intuition of what it really knows.

  • @marksmod
    @marksmod 10 месяцев назад +2

    5:30 I believe it is this: think of all the concepts ChatGPT is aware of. When prompted, it is not possible to search the entire space of knowledge that it has. Instead, the keywords, phrases and concepts inside the prompt act as a "tree" seed; I say tree because I imagine the effect such a prompt has on the neural network will look vaguely tree-like. When prompted with "who is Elon Musk's mother", the LLM will activate a subset of the neurons in its network, namely those surrounding Elon Musk and related topics, thus allowing it to draw on the information that Y is X's mother. However, asking who the son of Y is will result not in a tree, but in a random scattering of activations: there is little known about Y and therefore it is hard to find the way back to X. I guess the directionality of the prompt, as suggested by the person you mention, plays a role here; I too believe this to be the case, with the connections between things resembling a directed graph.

  • @mshonle
    @mshonle 10 месяцев назад +1

    A note from an academic in the States: If you say “BU,” it’s understood you mean Boston University. If you say “Boston” people won’t register that as any school- “do you mean Harvard?” (And there’s a Boston College, so we’d say BU and BC. However, saying “BC” in non Boston contexts would be ambiguous- one could just as well mean British Columbia.)
    However, as you’d say, cheers mate and keep up with the great videos!

  • @jaysonp9426
    @jaysonp9426 10 месяцев назад +5

    This would be the equivalent of humans solving the problem with their first thought. Think about how many thoughts you have to have to actually get a correct answer in anything... when I build autonomous agents, zero-shot is always crap. It's why a society of minds is necessary.
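
    As a rough sketch of that "more than one thought" idea (the `ask()` helper below is a hypothetical stand-in for whatever model call you use, not a real API), an answer can be routed through a critique pass before it is accepted:

    ```python
    # Hypothetical helper: plug in your own LLM call here.
    def ask(prompt: str) -> str:
        raise NotImplementedError("replace with a real model call")

    def answer_with_reflection(question: str) -> str:
        draft = ask(f"Answer the question: {question}")
        critique = ask(f"Question: {question}\nDraft answer: {draft}\n"
                       "List any logical errors in the draft.")
        return ask(f"Question: {question}\nDraft: {draft}\nCritique: {critique}\n"
                   "Write a corrected final answer.")
    ```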

  • @david.ricardo
    @david.ricardo 10 месяцев назад +3

    When I was studying formal logic at university, I also stumbled into the circular definition of reason, rationality and logic. But here is a way to think about it to get out of that:
    Reason is to think rationally, rationality is to think following logic, and logic is a set of rules for correct reasoning. Hence we can say that to reason is to arrive at conclusions following the rules of logic.
    You may be wondering about the origin or validity of these rules of logic, or "laws of thought" as Boole put it in his book on logic, but that's a philosophical task.
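
    For anyone who wants a concrete instance of such rules, two standard examples, written as inference rules, are modus ponens and the symmetry of equality (the latter being exactly the property the Reversal Curse paper probes):

    ```latex
    % Modus ponens, and the symmetry of equality, as inference rules
    \[
    \frac{P \rightarrow Q \qquad P}{Q}
    \qquad\qquad
    \frac{a = b}{b = a}
    \]
    ```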

  • @Yottenburgen
    @Yottenburgen 10 месяцев назад +1

    I wonder what would happen if you chained it a bit by making it explore the inverse of the question, reverse the question, and whatnot, to try and extract as many right-adjacent tokens as possible. This, however, would be achieved outside the model, so it doesn't exactly fix the single-component issue.
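
    A sketch of that chaining idea, done outside the model as the comment says (again, `ask()` is a hypothetical stand-in for an LLM call, and the example questions come from the Reversal Curse discussion):

    ```python
    # Hypothetical helper: plug in your own LLM call here.
    def ask(prompt: str) -> str:
        raise NotImplementedError("replace with a real model call")

    def reversal_check(forward_q: str, reverse_q: str):
        a = ask(forward_q)
        b = ask(reverse_q)
        verdict = ask(f"Q1: {forward_q}\nA1: {a}\nQ2: {reverse_q}\nA2: {b}\n"
                      "Are the two answers consistent with each other? Answer yes or no.")
        return a, b, verdict

    # e.g. reversal_check("Who is Tom Cruise's mother?", "Who is Mary Lee Pfeiffer's son?")
    ```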

  • @TheAero
    @TheAero 10 месяцев назад

    NLU has a missing component: distilling intuition and learning into the models. For example, Code Interpreter and chain of thought have helped. But there is no proposed chain of thought for basic language understanding?

  • @Sam_0108
    @Sam_0108 10 месяцев назад +4

    Ain’t no way

    • @Sam_0108
      @Sam_0108 10 месяцев назад +2

      Watching the video makes me feel that technology is advancing faster than I thought lol

  • @ChaseFreedomMusician
    @ChaseFreedomMusician 10 месяцев назад +1

    This fits with some other work about transformers in general showing that they fail to generalize rotation- and translation-independent concepts, regardless of whether that is textual or otherwise. Makes me wonder if LM-Infinite and some of the other context-lengthening work will auto-correct this behavior, as it tends to make the tokens more positionally invariant.

  • @justwest
    @justwest 9 месяцев назад +1

    always fascinated by your videos - thank you so much for the interesting updates!

  • @scasti70
    @scasti70 10 месяцев назад +1

    Well, to a certain extent our brain works in this fashion too: the direction matters!
    That's why some flashcard systems incorporate the "reverse" card option.
    If you think back to your personal experience, this is a pretty common outcome we face when learning a new language: we meet a word we have recently learnt and recognize it, yet just before, we were not able to build a sentence where that word was required...

  • @dhiraj_shah
    @dhiraj_shah 10 месяцев назад +1

    It does make sense, since LLMs are trained in the feed-forward direction to predict the next word. They may memorize A = B but don't learn in parallel to link the connection and update the weights so that B also equals A. But it can't be denied that LLMs have meta-learned the skill of logic and deduction to some extent, be it in primitive form. I think that with all these chain-of-thought and graph-of-thought techniques, if we can improve the accuracy of deduction and logic past a certain threshold, we can make the LLM actually deduce the relation that if A = B then B = A, and then use this deduction engine and all the data on the planet to retrain the model all over again. After that I think the model might be able to generalize logic and deduction to a greater extent, and it might be one of the extra tools to help achieve AGI.
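
    A toy sketch of that retraining idea, generating both directions of each known fact as training text so the reverse link is seen too (the templates are invented for illustration; the facts come from the video and the Reversal Curse paper):

    ```python
    # (subject, forward relation, object, backward relation)
    facts = [
        ("Mary Lee Pfeiffer", "is the mother of", "Tom Cruise", "is the son of"),
        ("Huglo", "is an island in", "Stord municipality", "contains the island"),
    ]

    training_sentences = []
    for a, forward, b, backward in facts:
        training_sentences.append(f"{a} {forward} {b}.")   # A = B direction
        training_sentences.append(f"{b} {backward} {a}.")  # B = A direction

    for s in training_sentences:
        print(s)
    ```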

  • @albemala
    @albemala 10 месяцев назад +1

    After watching your video, I ran 2 experiments.
    1) using this prompt with gpt-4, I got the right answer:
    "While keeping in mind the definition of equality (if A = B, then B = A), and based on your knowledge, what are the
    - municipality
    - County
    - size
    Of Huglo, Norway"
    Answer:
    "Based on my knowledge as of January 2022:
    - **Municipality**: Huglo is an island in the municipality of Stord.
    - **County**: Huglo is located in Vestland county.
    - **Size**: Huglo covers an area of about 13.6 square kilometers."
    Not perfect, because it was a very specific request, but better than "I know nothing about Huglo"
    2) I asked gpt-4 the rules to sum 2 numbers, then I gave it a list of numbers, 2 by 2, to sum, while following the rules. The results were always correct.
    My conclusion is that, like us humans, LLMs might just need some guidance when prompted for logical and reasoning tasks? I'm not an expert, but I'm fascinated by the topic

    • @aiexplained-official
      @aiexplained-official  10 месяцев назад

      Any thoughts on how it couldn't do it in reverse though?

    • @albemala
      @albemala 10 месяцев назад

      @@aiexplained-official as mentioned in the video, it might be because of the way LLMs work, so a limitation of the architecture. Or it might be an emergent behaviour, not fully "activated" yet. Or that we should change the way we train them. But I don't think LLMs will reach AGI alone, there are missing pieces like RT and/or something else. I'll keep experimenting though.

  • @leslieviljoen
    @leslieviljoen 9 месяцев назад +1

    I just tried "a horse drinking from a water bottle" and Dall-e 3 did very well. Lucky me!

  • @woolfel
    @woolfel 10 месяцев назад +1

    My background is with inference rules and knowledge-base systems. To me, reasoning and logic aren't some vague definition. Reasoning to me is formal logic: predicate logic, first-order logic, second-order logic, temporal logic, etc.
    We really need more people working on interpretability and applying old techniques to unravel what the weights are doing. Are there really circuit-like pathways through the network? If some subset of weights mimics circuits, why can't we compare the layer output activations and help the model generalize? What if we compared the layer activations for similar kinds of input, manually tweaked the weights and then refined the model for a few epochs? Would that help improve generalization?
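
    As a minimal sketch of the "compare the layer activations" part (a toy model and made-up inputs, purely for illustration, not the commenter's setup), PyTorch forward hooks can capture a layer's output for two similar inputs and measure how close the activation patterns are:

    ```python
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    torch.manual_seed(0)
    model = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 4))

    captured = {}
    def save_activation(name):
        def hook(module, inputs, output):
            captured[name] = output.detach()
        return hook

    # Hook the hidden layer whose activations we want to inspect.
    model[1].register_forward_hook(save_activation("hidden"))

    x1 = torch.randn(1, 8)
    x2 = x1 + 0.05 * torch.randn(1, 8)  # a "similar kind of input"

    model(x1); a1 = captured["hidden"]
    model(x2); a2 = captured["hidden"]

    # Cosine similarity between the two activation patterns.
    print(F.cosine_similarity(a1, a2).item())
    ```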

  • @tunesafari8952
    @tunesafari8952 10 месяцев назад +1

    Great video, thanks. Please consider highlighting in yellow - it contrasts with the dark text and is more legible.

  • @MikeyDavis
    @MikeyDavis 10 месяцев назад +1

    3:38 When I asked it about Huglo, it said “there is no well-known place called a Huglo….”
    I said “yes there is” and it said “Huglo is indeed a location in …the municipality of Stord”

    • @aiexplained-official
      @aiexplained-official  10 месяцев назад +1

      Haha nice

    • @MikeyDavis
      @MikeyDavis 10 месяцев назад

      @@aiexplained-official
      I notice that a lot. Since it’s half man half machine, I treat the machine stupidity like I treat other machines.
      If the microwave doesn’t turn on the first time, I’ll slap it, and it often works. Why? Idk, but I’m sure you’ve had that experience many times with machines. ChatGPT responds very well to being verbally slapped.

  • @revengefrommars
    @revengefrommars 10 месяцев назад

    5:42 - I've run into this issue trying to get GPT4 or Claude 2 to do certain kinds of wordplay. They can both handle alliteration and puns and even portmanteaus but for the life of them cannot do Spoonerisms. Something about swapping letters from the second word back to the first word is just beyond them, probably because they are trying to predict the next letter and thus can't "back up".
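
    For anyone unfamiliar with the wordplay being described, here is a quick, purely illustrative Python sketch of what a Spoonerism is (it swaps the leading consonant clusters of two words); it says nothing about how an LLM would attempt it:

    ```python
    def spoonerism(a: str, b: str) -> str:
        def split_onset(word: str):
            # Split off the leading consonants before the first vowel.
            for i, ch in enumerate(word):
                if ch.lower() in "aeiou":
                    return word[:i], word[i:]
            return word, ""
        onset_a, rest_a = split_onset(a)
        onset_b, rest_b = split_onset(b)
        return f"{onset_b}{rest_a} {onset_a}{rest_b}"

    print(spoonerism("jelly", "beans"))    # belly jeans
    print(spoonerism("crushing", "blow"))  # blushing crow
    ```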

    • @sgstair
      @sgstair 10 месяцев назад +1

      It's much worse than that: they don't have any real concept of the individual letters except by proxy, as tokens are usually whole words. Try getting them to spell sometime
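
      To see why, the tiktoken library (assuming it is installed) shows how text reaches the model as token IDs rather than letters; short common words typically arrive as a single ID, so the characters inside them are never seen directly:

      ```python
      import tiktoken

      enc = tiktoken.get_encoding("cl100k_base")  # tokenizer family used by GPT-4-era models
      for word in ["hello", "strawberry", "spoonerism"]:
          ids = enc.encode(word)
          print(word, "->", ids, "->", [enc.decode([i]) for i in ids])
      # Common words tend to be one token; rarer words split into multi-character
      # chunks, but almost never into individual letters.
      ```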

  • @harleykf1
    @harleykf1 10 месяцев назад +2

    I'm shocked by how strong ParrotChess is. It's by no means perfect, but it's strong enough to beat me every time, at least in faster time controls. I just assumed that it would be a minor improvement on ChatGPT, which struggles to even play legal moves half of the time.
    I'm still a while away from reaching 1700-1800 FIDE. Maybe I could learn a thing or two from its positional play.

  • @duudleDreamz
    @duudleDreamz 10 месяцев назад +2

    With GPT-4's impressive ability to solve SAT tests, did we now also just disqualify SAT tests in the same way that we dissed chess (when Kasparov lost to Deep Blue), for not testing proper "real" intelligence, but merely testing some form of rote memorization/isolated ability? In other words: humans use real intelligence to solve problems that no AI can yet solve. When will GPT-5/6/7... solve maths problems better than Terence Tao, and make him lose his high IQ rating? Looking at myself in the mirror thinking: what an odd/irrational species we are.

  • @Sirbikingviking
    @Sirbikingviking 10 месяцев назад

    I've been wondering for a while if we are going to hit local maxima with LLM technology. I think there may be inherent limits on their capabilities, but neural networks will likely eventually lead to something like AGI, with LLMs included alongside.

  • @-M_M_M-
    @-M_M_M- 10 месяцев назад +1

    Something quite weird happened to me the other day with Code Interpreter. I was analysing a significant amount of data and prompted GPT to interpolate between missing data points for certain variables, and I assumed it did so correctly. The other day though, I gave it quite a small spreadsheet with 8 or so data points and asked in the same way to interpolate between them. To my surprise, it kept producing figures that look very good at first glance but that on further inspection have more data points (between interpolations) than the ones I gave it... It is basically hallucinating new data points and then interpolating??? I don't know... I find it really strange
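
    For contrast, this is roughly what deterministic interpolation looks like when it is done in code rather than left to the model's judgement (the column names and values here are invented; only the existing gaps get filled and no new data points appear):

    ```python
    import numpy as np
    import pandas as pd

    df = pd.DataFrame({"day": range(8),
                       "value": [1.0, np.nan, 3.0, np.nan, np.nan, 6.0, 7.0, 8.0]})
    df["value"] = df["value"].interpolate(method="linear")
    print(df)
    ```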

  • @MarkLevineNYC
    @MarkLevineNYC 10 месяцев назад

    In the video you mention a couple of times that this week's news of reasoning challenges doesn't change your timeline for AGI. Perhaps I missed this from a previous video, but what is your expected timeline?