The issue isn’t with LLMs being wrong when asked a question. The issue is with corporations pushing them into every service imaginable while *knowing that.*
Exactly. LLMs should not be sources of truth. Hallucinations make them fundamentally bad at being such a tool. Unfortunately, companies everywhere want to use them this way. LLMs should be sources of *inspiration* and *creativity*. Their output should be a starting point, not the end product.
If you build an LLM on top of an expert system, you might get a rational AI. True. But still an unconscious one. If you then build a second, goal-oriented system that interprets the ratio-morphic transformer that (our) neural network appears to be, you might get a conscious AI. An ethologist's view, I admit...
I think the problem goes deeper. LLMs were never intended to answer questions; they were intended to predict text. They did this prediction so well that they fooled people into thinking they answered questions, to the point that they have now been re-purposed to do just that. But the underlying mechanic is still text prediction. That's why, in the wolf, goat and cabbage puzzle example, it predicted the solution even without the added information that they would eat each other: it also predicted that information to be part of the prompt.
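To make the text-prediction point concrete, here is a minimal sketch (nothing like a real transformer): a toy bigram model that only ever continues a prompt with whatever word most often followed the previous one in its tiny training text, whether or not the continuation is true.

```python
# Toy sketch of pure next-word prediction: a bigram model trained on a tiny
# corpus. It "answers" by continuing the statistical pattern, with no notion
# of whether the continuation is correct.
from collections import Counter, defaultdict

corpus = (
    "the farmer takes the goat across the river . "
    "the wolf eats the goat . the goat eats the cabbage ."
).split()

# Count which word follows which in the training text.
follows = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    follows[prev][nxt] += 1

def continue_text(prompt, n_words=8):
    """Greedily append the most frequent next word seen in training."""
    words = prompt.split()
    for _ in range(n_words):
        options = follows.get(words[-1])
        if not options:
            break
        words.append(options.most_common(1)[0][0])
    return " ".join(words)

print(continue_text("the wolf"))  # fluent-looking continuation, no understanding
```

A real LLM does this with billions of parameters and far more context, but the training objective is the same: predict the next token.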
Yeah, legit, I don’t even get how anyone thinks or agrees these shits are AIs. Has everyone on this planet forgotten what AI actually is? Did everyone get their memory wiped, like in The Matrix? ….
Look, even a cockroach has a level of intelligence. You're conflating being intelligent, or smart, with intelligence as such. LLMs are AIs; they aren't programmed what to say, they learned it with neural networks. No, they don't have consciousness, but it is AI, not AGI.
I would argue that every attempt to create something intelligent comes with a way to formulate concepts and communicate them, just as every single intelligent species has a way to communicate.
4:51 "Imagine AI being able to win every logical argument. Reddit would become a ghost town overnight" You're assuming people on reddit use logic. That's quite of a bold statement
@@SabineHossenfelder Correct terminology does exist; I wish we would start using it more consistently. Machine learning is known as *predictive AI*. Tools like ChatGPT are not; they are called *generative AI*. Predictive AI and generative AI are very different from each other and it would be helpful if we used such terminology. Predictive AI has a ton of real-world applications and value. Generative AI only supports applications where accuracy is not important, e.g. 'the arts.'
That is not by accident, however. No one would really know what to do with "Large Language Model", for example. But Artificial Intelligence? That works as good marketing because everyone has SOME kind of image stuck in their head due to the countless science fiction stories, movies, games, books etc. that feature some kind of "AI". Of course this idea has nothing to do with LLMs. But still, it's easier to sell to someone than, say, NFTs or Bitcoin.
Who uses the wrong terminology? AI is an umbrella term; it has always meant everything that's not an explicit algorithm. ML is a subset of AI, just like expert systems. Neural nets too, which are also a subset of ML techniques. LLMs are neural nets, so also ML and AI.
Regarding the Maths Olympiad, it is a bit less surprising when you consider that the problem was manually translated into Lean by humans; the model acted as the solver and then transferred the data back to the system to phrase it properly. Lean was run against the problem an additional time to verify the solution. Oh, and it took up to 60 hours per question, which is far longer than any human gets.
@@Alice_Fumo If you search for Google Maths Olympiad and Sir Timothy Gowers (he was one of the evaluators), he added quite a bit of context. ArsTechnica for instance has a good summary.
@@Alice_Fumo Not quite sure where my reply went... Sir Timothy Gowers was one of the two people who scored the results and added additional information on social media. If you look up his name and the Olympiad, you should find more details.
One thing that people may find interesting: LLMs hallucinating isn't a bug, it's the way they work. An LLM is completing patterns it hasn't seen before, using statistics it has learned from other texts. This is indistinguishable from hallucination, since it's always just made-up stuff, the best guess the LLM has about what might come next. Sometimes this hallucination is useful, when it uses rules we agree with. Sometimes it uses wrong rules, rules we disagree with. Either way, it's the LLM just hallucinating what the "correct" answer might be.
@@idan5323 They're all "wrong predictions", is the point; they're fabricating a completion to an input they've likely not been trained on. That this hallucination often seems to be kind of factually correct is the surprising part in and of itself.
@@gJonii I am starting to think you are an LLM, as you seem to spit out words without understanding their meaning. NOT ALL PREDICTIONS ARE WRONG, that's just a bold fallacy. LLMs are sometimes wrong, and sometimes right, which is the nature of prediction. The AI has no awareness or understanding of the words it says, and it doesn't know right from wrong, yes. But what YOU say is just wrong.
Oh my gosh! You two are talking past each other, and not comprehending what the other has said. You are both right, and are even saying similar things. However you are looking at the same thing in different resolutions and using different terminology. Idan, you are right when treating an AI as a black box. There is output we consider to be right and output we consider to be wrong. However OP is looking at the process required to make the black box work. When you just examine what it is doing, you see that it is not being given criteria to distinguish right from wrong. Therefore all output is merely whatever the training data produces. (So you could say there are only hallucinations as output.) Amazingly, much of the time this output actually makes sense to us. In other words: Hallucinations are NOT malfunctions! The system is working correctly.
Colin Fraser: "Solve the wolf, goat and cabbage problem." LLM1: "Shoot the wolf." Colin Fraser: "No!" LLM2: "Shoot the goat." Colin Fraser: "No!" LLM3: "The farmer?" Colin Fraser: "Definitely not!!" Baby LLM4: "Shoot the cabbage. And then, the boat."
Communication/language requires far more than words. It requires most of our human perception to be involved. To put it another way, you can't design and build 1 tenth of an automobile and expect it to successfully drive down the road. It's a kind of all or nothing thing. (Some parts of the "all" raise some difficult questions with regard to self awareness, consciousness etc.) > In the early days I contributed to a project named AARON (Advanced Artificial intelligence ReasOning Network) from Isomer Programming. In essence it learned language concepts (with programmatic rules) from its users, starting out from something like a newborn. Not only did it use word/language processing but also visual (image) recognition. So as well as natural language processing it was also using object recognition and the beginnings of discerning emotional context. Most of the same problems that exist now existed back then. It's not a matter of finding solutions, it is a matter of correctly defining the problem, which is extremely complex and difficult. Humans are not logical creatures and we get it wrong (so to speak) most of the time. It is more of a scale in degrees of correctness than a binary right or wrong. Expecting perfect correctness from a machine to replicate what is fundamentally imperfect to begin with doesn't make a lot of sense. > There are many good clues in the texts of developmental and cognitive psychology (an understanding of philosophy of mind is helpful, and I gained a lot of insight from Dr Robert Bolton's interpersonal communication skills). I don't always fully agree with Steven Pinker's computational theory of mind, but the work in his books provides a good illustration of many of the underlying "human" cognition issues that AI needs to address to be useful. > Personally I would feel more comfortable if they dropped the whole AI context and focused upon IA (Industrial Automation) instead.
You learn to speak by repeating and observing; perception comes first and the words come later, so it's not just math. The first two levels of language, language without context, are easy: think of them like variables, but with images. You see fire and you don't think of the meaning of the word but of the object itself: car, house, dog, cat. But then comes level 3 language, where we have metaphors like "it's raining cats and dogs" (the old story is that in old England, when the rain was too hard, the cats and dogs would come down from the upper floor to the first floor). That means you need context to understand what is being said. For example, "a fly over the boat" can have a lot of meanings from the words alone: the fly can be the insect or an airplane, or both, depending on context. And then there is Japanese, a language at the fourth level, where you don't just need context in general but the context of the specific words and kanji it was written with. Like I say, it's not just math; it's the metaphor and the experience, and you need to extract the context of the word and the metaphor to understand, and that's nearly beyond binary. For a machine to understand that, it can't just repeat and train; it would need to be sentient, or at least close to it.
@@axle.student Not necessarily. For language at level 3 or above, yes, you need context to understand what is happening; one of the famous phrases for that is "a fly over the boat", where the fly can be the insect or an airplane, but it all depends on whether it is a big boat or a tiny toy boat in a bathtub, and even then you need context: it can be the insect or it can be a toy XDD. But for language at levels 1 and 2, no. That's called language without context, and it is weird for us humans to speak that way, but machines can understand and process language at levels 1 and 2 easily and quickly.
@@jorgecarvajal5385 You are talking about chatter bots vs genuine interpersonal communication. We have been creating chatter bots for over 30 years. Even parrots can repeat words in a string; it doesn't mean the parrot understands them. > An LLM is just an advanced chatbot backed by automation. There is no "understanding" concept.
@@jorgecarvajal5385 I hope not. What good would it bring? We already have intelligent life (ourselves) and we don't even care for that. We allow others to experience poverty, starvation, corruption, disease, even war. So what if we manage to achieve man-made intelligence? We'd put all that time, energy and resources into inventing something that already exists in a natural form, something that is apparently worthless to us.
@@jorgecarvajal5385 And to be clear: I am not anti-intellectual. I am in absolute awe of the things we can accomplish and the things that are left to be discovered. I am just extremely disappointed that AI is not being used for the good of humanity or for furthering scientific understanding. At this moment AI is just glorified theft of others' intellectual property and a retelling of already known facts. It brings too little good to the table, but it does have ethical problems and it hurts the livelihood of many people (like artists, writers, creatives and even scientists). And we continue to pour money into this instead of funding the fight against actual, real-life problems that make other humans suffer.
@@mer863 Personally I think that a true sentient AI would probably be able to help solve a lot of those problems. (I don't see it as some kind of artificial god or anything, in part because you cannot create an artificial version of something that doesn't exist.) I mean, essentially AI is a tool, like satellites are a tool. The reason we have satellites is the Cold War and some pissing contest the US got into with the Soviet Union, but those tools today are helping to feed billions of people because we can now better predict weather patterns. Same with fertiliser: it started out as a bomb ingredient for weaponry and is now the only reason we can feed as many people as we do. Like every tool we have made in the past, it will undoubtedly be used to harm others, but it will also be used to make the lives of many much better. And, like every tool we have ever made in the past, we will never put the genie back in the bottle, so even if we haven't reached a true AI yet, it's better to have people concerned with how we use it for the benefit of everyone and not for the benefit of some.
I once saw a banner in a school classroom: "All human errors come from misunderstanding the laws of physics." It seems it is now also true of AI.
I develop and use small machine learning algorithms for my work in neuroscience and it's great, allowing me to investigate my datasets in an automated fashion, quantifying subtle behaviors across many hours of video that I would never have the time to do by hand. Narrow AI can be very useful in science, but validation is imperative and as you said LLMs are not representative of all AI.
The overwhelming majority of people here, including the video host, probably never heard of Karpathy or micrograd or even backpropagation. But boy howdy do they have lots of opinions about LLMs!
Working in AI, I can say that the idea that LLMs are not the alpha and omega is already established. It's been put to rest. While companies keep training bigger models for commercial reasons, researchers have already switched their attention to cognitive features: memory, reflection, planning, abstractions... It's not true that the industry is stuck with LLMs.
Yes, but LLMs proved to be far more powerful than anyone expected, so the attention they are getting is warranted. And we are very very far from their limits.
I wouldn't say researchers have "switched" to [the topics you named]; there are still tons of papers coming out on language modelling with transformers / adjacent architectures and improving upon those. All the other things you mentioned have been happening the entire time regardless, but as someone who reads a bunch of the papers, I don't get the feeling that improvements upon transformers have been exhausted. What bothers me is that I've seen like 20 techniques which drastically improve upon the things we've seen thus far and none of the big companies have implemented those yet to my knowledge. I want to see what a model which combines all of the current research can do. I agree with the conclusion that the industry will pivot to more interesting architectures the moment "bigger model go brr" stops being a strategy which pleases investors - which should be soon-ish.
LLMs are the poster child and the reason why stocks exploded. Hard to get that level of funding and public involvement without direct hype like with LLMs
That doesn't really make sense to me. "Real" researchers knew before the hype that LLMs are very limited in their application. The people that drove the hype behind LLMs have no clue, or don't care, about the nature of artificial intelligence, let alone about its usefulness. Heck, I've got very little clue about AI and could've told them that an LLM is more about the A and less the I.
It still reeks of an AI bubble crash. This is hardly a shock though. You didn't have to know one thing about the (fill in the blank) to smell the latest Silicon Valley bubble.
I just asked ChatGPT the following: "A farmer with a wolf, a goat and a brathering must cross a river by boat. The boat can carry only the farmer and a single item. How can they cross the river without anything being eaten?" I used "brathering" because it looks like an English word, but isn't. It's German and means fried herring, so something the farmer might enjoy, but not the goat or the wolf. ChatGPT gave me the same answer as shown in the video, including a mysterious cabbage. Asking questions about the result reveals that ChatGPT knew that brathering is a dish and goats don't eat it, and that the whole puzzle is pointless, with the return-with-one-item answer. If asked again, ChatGPT will not speak about cabbages, but will still provide the return-with-one-item answer.
I just did a similar test: I kept the cabbage but I changed the seats on the boat, so that the farmer can carry *two* items. Of course the "AI" didn't get it and started going back and forth for no reason.
yep that just tells me they've basically hardcoded the AI to spit out the "correct" answer when it recognizes the pattern. such a worthless tool, really
I remember my computer science professor saying in a class that, years ago, deep learning was looked down on and most experts in the field thought symbolic AI was the thing. Maybe a mix of both is the answer?
Disregarded fields of scientific research of all kinds have later turned out to be where the real analytic power is. So I always start by doubting someone diss'ing a particular tech as a whole. (That can go wrong too, I realize.) Last I knew, symbolic AI is not how the brain is thought to work, so the machine learning techniques became ascendant. Separately, as one example, in the 1990s, in language processing, it became clear the learning models could produce much better results than symbolic methods. (Tho that has to depend on what the challenge is.) I would bet the best systems will eventually use a mix of techniques, including self-checking for internal consistency and for consistency with reality.
Maybe Gary Marcus has a point, as annoying as he is. But then again, companies like DeepMind have always been well aware of the limitations of LLMs. Demis Hassabis has been trying to say this without saying it, so as to not piss off Google and end up at the bottom of a lake sealed in an oil drum.
Neither using larger and larger text corpora (which are limited and already polluted by AI-generated text) nor a symbolic model will create an AGI. Embodiment is the right approach: the AI needs to live in some agent that wants to do things, tries things and learns from that.
This topic is very close to my heart. It seems that we're so facile with language that, in most cases, people can get along by using the symbol (word) without remembering that the word is not the THING. At a simple level, like "please pass the salt", language works fine, but as soon as a topic gets complex, nuances of meaning get lost in the journey from speaker to listener and, if undetected, unexpected errors occur. This is not a problem when two people are speaking a language that is not their primary tongue -- the interlocutors KNOW there's going to be slippage. But when they're both speaking their native language, they don't realize how fragile our communication system is. I often picture the problem as using language to describe a painting or musical composition -- complete with its emotional content and historical context. Language just isn't the appropriate tool to get the experience from speaker to listener. You can also consider watching a movie as a stream of ones and zeroes and using your mind to compose what the movie actually is. Yet words are so deceptive in their apparent clarity that we mistake the picture our minds make for the thing itself. Of course, when you see an object, it's not like the object is going into your eye. We just see reflected light and "calculate/deduce" that the thing must be a tree. We don't see the tree; we only sense light. But language allows us to say "I see a tree", and we all jiggle the concept around in order to construct a picture. Failing to see that everything we learn, or even think, is actually just the product of emergence can cause some pretty strange results. Hence life as we know it.
I am a programmer and I know neural networks very well. LLMs are amazing, IMO, and the specialists in the domain always knew their limitations. Personally, I mostly wonder whether the coherence of the results we got 2-3 years ago, when all this started, was unreasonably good considering the math behind it, and whether that might have been what created the hype. I can understand why it works, but I am always surprised how well it actually works (even with the known limitations).
First off, thank you for your videos, I really enjoy them. I will push back on the neuro-symbolic a little bit. I work in AI, and was really excited about neuro-symbolic starting a couple years ago, but found that there are various issues with it that may be difficult to overcome. The most obvious is lack of funding. Gary Marcus has already suggested that if we funded neuro-symbolic research like we funded LLM research, we would probably be so much farther along with it, but we don't. The next is that finding the right mesh of neural and symbolic seems to be very troublesome. Not that the AI community has converged on what neuro-symbolic AI really is, but the symbolic methods they usually try to use with DNNs typically don't scale well, and scaling is what the DNNs actually can do well. When you add the symbolic thing to the DNN, or vice versa, you often get stuck at the scale the symbolic problem can solve. So I feel like I still see neuro-symbolic methods trying to tackle toy problems that it kind of seems like LLMs or other foundation models can "just solve" a significant amount of the time. No guarantees it is correct, of course, but people find other ways to deal with that through things like ensembles and answer checking. Not perfect, but sometimes good enough. Lastly, and feel free to push back, because this point might be more philosophical, I feel like nature, people, biological systems, etc. do not actually function logically or mathematically. We are kind of a weird mess with enough checks and redundancy to generally figure things out without actually using good logic, statistics, or mathematics. All of those are human inventions that tend to hit limits on what they can do pretty quickly, while humans can guesstimate their way to solutions of all kinds given the tools they have. Perhaps like in the rat example you showed though, I am intrigued by the idea of us basically simulating biological processes well enough to create human-like AI; that feels possible to me, but we need to solve the embodiment problem for that too. All that said, I'm still interested in neuro-symbolic methods and am keeping my eye on them. Maybe someday it will get more hype (and associated funding :)) and I'll be able to work on it more!
I just remember that some biological systems such as vision react to certain colors, for example, to judge distance because we can react faster to danger. If we had to spend time processing it in a different way then our reaction time would be slower. So it ends up being better to react quickly and then later spend time deep in thought when we have more time to contemplate. Our systems have to be ones that self-maintain and self-heal and all these other things. Mathematics and statistics are not always reality.
Brilliant, @BarryKort. This has been an issue since the day ChatGPT launched back in late 2022, yet NO ONE has questioned the label 'hallucination'. 'Confabulation' is so much nearer the mark! Confabulation (noun): a fabricated memory believed to be true.
A better term is simply "output", because there's no difference between a correct one and an incorrect one. To the "AI" it's all just meaningless tokens; it is correct in the same way it isn't, because it can't tell the difference.
I'm a business student. For 2 years, everyone wrote flashy papers on AI. I only found engineers to talk about this with. It's like business people.... don't want to know what they don't know. I'm a bit of the black sheep of my study programme, but by now I mind it less. The AI bubble bursting was, at face value, a predictable outcome. Maybe now I have grounds for people to listen to my research. I feel like I'm screaming into a corporate void. I had a bit of a crisis because of it, for I was saddened by science, for I naively thought science was open and curious. I hope we go on to better times. Thank you for this video.
I have been in IT for decades. Most business people do not want to know what they do not know. I look at it this way. We all live in silos of what we know. The size of that silo is different for each person. What is kept in that silo is different for each person. Each of us tends to fill our silo with what we think we need and/or enjoy. For those things not in our own silos, we have other people.
@@digiryde I yesterday handed in half of my paper for review. I wrote like a maniac after finding this video. Luckily, finally, the internet is exploding with voices from all directions. CIOs are also becoming aware. So, hopefully, we can now limit a bit of the delusion and start giving engineers more time and money and trust to do the RIGHT thing. I feel like managerial stupidity is a health risk for everyone around, especially engineers. Now, I also study management. But.. I know when I am stupid, and I ask people who know more than me to approve my work. I can't begin to say how often I got things wrong.... Usually I never manage anything I haven't experienced myself. This is different. I can't do it. I legit can't. I can only try my UTMOST to get some conceptual understanding. I also now understand why engineers often don't have hair. I never lost so much hair in my life. Main stressors: managers don't see the importance, and managers don't want to hear anything technical to find the logic behind the importance. Now, I'm drawing pictures. And telling them it's all about them. It's not. I don't give many shits about most of my kind. But, apparently, what's true doesn't matter in science. Apparently, I just need to sell ideas. Frustrating. But most people here know that much better than I could begin to imagine. Regardless, I hope for the engineering silo to spill over some bit of intel to management.
The exchange value of money needs "faith"; if business people keep a bubble going until the last moment, it's because it generates more money. They know, they just pretend to ignore it until the crash is imminent. (Obviously not everyone comes out safe; there are always a few who jump when the debacle starts, and many others get caught in the confusion and sink... it's called speculation, and it has been well known for a long time, at least since the Great Crash.) Thanks, stock markets!
At 6:00, Sabine makes the point that a world model precedes whatever is going on in language. However, it's plausible that language is still the key force that underpins human intelligence (by forcing our brains to represent information in a symbolic, abstract way that allows for communication). The fact that non-human animals have sophisticated world models but don't have human-level reasoning abilities further suggests that world models aren't enough. Love the videos. Sabine's channel gets better every year
So, LLMs have reached the level of our politicians who blather content-free word salad and to make them intelligent we need to teach them math and physics. Good luck with that.
Every LLM output is a hallucination. Whether the output is correct or not depends only on the statistical relations between tokens. P.S. So much for the "sparks of AGI in LLMs" :D :D :D
The issue with LLMs is that they are based on neural networks, whose inner workings many people don't understand. A NN will always give you a probability or probability vector with components < 1 (except in very few trivial cases), without telling you the overall probability for a given answer. If it did, you could easily spot hallucinations, e.g. if the overall probability were less than, say, 80%. That said, I'm with Sabine that LLMs are nice toys but not the future of AI.
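As a toy illustration of that point (not something current chat interfaces expose): if you did have the per-token softmax probabilities, you could multiply them into an overall sequence probability and flag low-confidence answers against the 80% threshold mentioned above. The logits below are made up for illustration.

```python
# Sketch only: assumes access to per-token probabilities, which chat UIs
# normally don't give you. The logits are invented numbers.
import numpy as np

def softmax(logits):
    z = np.exp(logits - np.max(logits))
    return z / z.sum()

# Probability the network assigned to the token it actually emitted, per step.
per_token_prob = [
    softmax(np.array([3.2, 1.1, 0.3]))[0],  # confident token
    softmax(np.array([1.2, 1.1, 1.0]))[0],  # barely better than the alternatives
    softmax(np.array([0.9, 0.8, 0.8]))[0],  # essentially a guess
]

overall = float(np.prod(per_token_prob))
print(f"overall sequence probability: {overall:.3f}")
if overall < 0.80:
    print("low confidence -> treat the answer as a possible hallucination")
```

In practice this is a crude proxy, since a model can be confidently wrong, but it shows why exposing the probabilities would at least help.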
But couldn’t that be solved via better training data & techniques? Where, for example, it's given training batches such that after giving wrong information the AI, instead of emitting EOS, corrects its mistake: training it to recalculate the moment it realizes it made a mistake, instead of committing to it. Furthermore, RL is doable on LLMs, reinforcing rationales that reached the correct answer and penalising those that did not. It seems to me that LLMs can in principle reason and not hallucinate if rightly trained.
@@Bolidoo There are ways to get LLMs to catch their own mistakes to some degree. One method is to run inference in multiple batches from the same query, each with a different RNG seed and a high randomness temperature. Basically to create multiple different answers to the same question. These can then be compared to see if they are in general consensus. If the network is confused it will presumably output a set of blatantly different responses. However this makes inference awfully expensive, and still won't guard against the network simply having learned something wrong. A different option might be to include a bunch of examples in the training set of the author catching their own mistakes on the fly, but this would be very unnatural data that would have to be procured in huge volumes. Hardly something that could simply be scraped off the internet. The same applies to a lot of seemingly clever training techniques that guide the network on specifically how to think, rather than just training them on what to directly output to the user. Someone will have to produce A LOT of oddly specific, high quality training data, and existing LLMs can't be trusted to synthesize it. A huge amount would have to be hand written and/or generated with clever scripts, and that gets very time consuming and expensive. Also, a big problem is that, given the option, these networks LOVE to weasel out of difficult queries. "Sorry, I don't know." is too valid an answer to most questions. It's a classic failure mode for LLMs where they would reject strange prompts to instead write about something much more generic in a manner that was structurally correct, but rendered the response utterly useless. Another method I've heard of is to have a separate verifier network, to check generator outputs for logical consistency. The appeal of this method is that it forces the generator network to write in a manner that is easy to verify, in order to get the verifier past some confidence threshold. Also, the verifier network can be fairly small and cheap to run. Though presumably it wouldn't help much against factual errors, and this technique doesn't seem to work well with genuinely complicated problems that will always be hard to verify. It just makes it harder for the generator to mask its stupidity with overly complicated language. What is clear though is that we've pretty much maxed out what can be accomplished with scale alone. There just isn't enough high quality data to go around.
@@fnorgen Very interesting thanks. I don’t agree with your last point though. It seems to me that even if we can’t make bigger models due to lack of data, we can keep augmenting them with lots of compute and search techniques. Maybe with enough compute we can generate the high quality data we require to move forward. I’m not saying scale is all you need for certain, but at the same time i’m not convinced it’s not either.
@@Bolidoo No amount or quality of data given to LLMs, as they currently stand, can solve the hallucination problem. Better data will improve their accuracy and ability to an extent, but it has diminishing returns, as there is a near-infinite data possibility space but a very non-infinite cost to train. What you end up with is a game of data whack-a-mole, where people find problems and the solutions are added to the data set for training the next model. This has worked for the last few years, but it can't continue for much longer: the more accurate models get, the larger the error space becomes. The problem is more obvious in the autonomous driving space than in LLMs, but it's essentially the same problem; statistical models alone can't handle scenarios that aren't predictable or lie far outside of the given dataset.
@@GregHib Transformers can and do generalize. GPT-4o solves the AIW problem 60% of the time, and those are made-up new logic problems that couldn't have been in the training data. AlphaProof can solve maths-olympiad-level problems. Yes, it could do this thanks to self-play, MCTS and the fact that Lean is automatically verifiable, but ultimately a transformer did the heavy lifting. A transformer could reason. There's also a paper that showed that LLMs know when they lie. This suggests that part of the reason an AI may insist on wrong information even though it has already realized it's wrong (this is one type of hallucination) is that the training data had a high degree of self-consistency (it's uncommon for a text to change opinion in the middle). Hence finetuning on relatively few examples can help correct this behavior and reduce hallucinations across the board. That's what we find empirically with models like Sonnet 3.5, which outright hallucinates less, while it was of course not necessary to tell it not to hallucinate in each and every case where it could have and didn't. My point is that transformers, given the right data and compute, are enough.
But if your definition of the number 5 is "the number of fingers on a human hand", then all these hands have 5 fingers. If these LLMs have learnt this definition of the number 5 from the hallucinations of previous AIs, then it is no wonder the hands drawn by AI look so weird.
@@michaelharrison1093 AI is obviously counting 21 fingers on the human body. For the males at least. We shouldn't feed it pron, it obviously gets confused 😋
A shopkeeper in my childhood neighborhood had "Seisdedos" (six-fingers) as a surname. The reason seems to be that, very rarely, some people are born with the wrong number of fingers and, in this case, it became a surname.
You are correct. Ironically, the thing she viewed as a weakness is actually a strength. The LLM made the same assumption that any human familiar with the riddle would have made. I am not sure which version of ChatGPT she was using (I think she's either in the free tier or using an app, because the web interface with a pro subscription does not have a drop down that says "ChatGPT"), but here is what I did, using the latest model (GPT4o): 1. Posted a screenshot of the question she asked, without including the answer. GPT4o gave basically the same answer that she got. 2. I then asked it: "You are correct that there is a similar classic puzzle, but note that I did not reference the puzzle in my query. How would you solve the riddle without referencing the puzzle, using *only* the information available in the screenshot that I showed you?" This was its response: " To solve the riddle using only the information provided in the screenshot, let's break it down step by step: 1. **Initial Conditions:** - The farmer, a wolf, a goat, and a cabbage need to cross a river by boat. - The boat can carry only the farmer and one additional item at a time. 2. **Goal:** - To cross the river without anything being eaten. 3. **Logical Deductions:** - If the wolf and goat are left together without the farmer, the wolf might eat the goat. - If the goat and cabbage are left together without the farmer, the goat might eat the cabbage. 4. **Possible Solution:** - The farmer takes the goat across the river first and leaves it on the other side. This is because if the goat is left with the cabbage, it might eat the cabbage, and if it's left with the wolf, the wolf might eat the goat. - The farmer returns alone and takes the wolf across the river next. However, to avoid the wolf being left alone with the goat, the farmer takes the goat back with him on the return trip. - The farmer leaves the goat on the original side and takes the cabbage across the river. - Finally, the farmer returns alone and takes the goat across the river. By following these steps, the farmer successfully ensures that none of the items are eaten during the crossing. The solution does not require any outside knowledge, and only the logical implications from the information provided in the screenshot are used." I approve of its answer.
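For comparison, the puzzle itself is small enough to solve exhaustively; a breadth-first search over its 16 states, assuming the usual wolf-eats-goat and goat-eats-cabbage constraints, reproduces the same seven-crossing plan with no language model involved.

```python
# Exhaustive solution of the classic puzzle by breadth-first search.
# State = which bank (0 or 1) the farmer, wolf, goat and cabbage are on.
from collections import deque

ITEMS = ("farmer", "wolf", "goat", "cabbage")

def safe(state):
    farmer, wolf, goat, cabbage = state
    if wolf == goat and farmer != goat:
        return False  # wolf left alone with goat
    if goat == cabbage and farmer != goat:
        return False  # goat left alone with cabbage
    return True

def solve(start=(0, 0, 0, 0), goal=(1, 1, 1, 1)):
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        state, path = queue.popleft()
        if state == goal:
            return path
        farmer = state[0]
        for i, name in enumerate(ITEMS):  # i == 0 means the farmer crosses alone
            if state[i] != farmer:
                continue  # can only take an item from the farmer's own bank
            nxt = list(state)
            nxt[0] = 1 - farmer
            if i != 0:
                nxt[i] = 1 - farmer
            nxt = tuple(nxt)
            if safe(nxt) and nxt not in seen:
                seen.add(nxt)
                move = "alone" if i == 0 else f"with the {name}"
                queue.append((nxt, path + [f"farmer crosses {move}"]))
    return None

for step in solve():
    print(step)
```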
I *really* hate that Google has put LLM AI at the top of their search results for questions. It's just plain irresponsible. I *know* that AI isn't trustworthy, and it's *still* hard for me to disregard the answer, in black-and-white, that shows up at the top of the search and directly answers my question (vs other results that I often have to click through and do some sleuthing work on before I can actually find the answer to my question). It's truly awful, and much, much more awful for all the people out there that don't know about the limitations of AI.
Google Search was already unreliable, plus it was censored. At least now the Google Search AI answers based on Google's (bad) search algorithm and its arbitrary censorship rules. One day they'll fall because of their own doing.
6:34 The mind *may* be a single thing, but not the brain. It is composed of a multitude of constituent components working in concert. Why would you think the answer to artificial intelligence would be one model or neural net, instead of a multitude of less effective but highly specialized low-power AIs all working together to make the whole more than the sum of its parts?
A hundred fools will never be equal to a single genius. Not even a billion fools. The idea is that the fools take on the simple tasks and the geniuses get the difficult ones. Dense models are more intelligent and better for complexity. MoE is usually better for a lot of other reasons.
While there might be some level of truth to that in an abstract sense.. a LLM would not make any sense as one of the components, because the only thing that could interact meaningfully with a LLM would be something that already knows how to interpret and generate plain text (and if it could already do that then there would be no reason for it to need a LLM). I think that it's likely that a better AI would have multiple components, but I think it would mostly be using multiple components so that it can do things like having one part of the AI that learns while it's in use and a different part that does slower learning that mostly runs in the background and tries to optimize the things that the other processes come up with - I don't think it would make any sense for the core logical thinking of the AI to be split up into multiple components though because that sounds grossly inefficient.
@@juandavidavendanorojas6263 Thanks. I am familiar. That is in part what inspired my OP. Recently MoEs reached ~1M experts. Imagine what that number might be in just 1 or 2 years. Its fascinating.
LLMs are essentially a computerised version of the game where people take turns adding words to a sentence. The best they can do is come up with plausible content that is based on their training set. There is nothing equivalent to intelligence behind it. The problem is that for many people, Turing’s imitation game is the bar for detecting intelligence. In reality, the imitation game just illustrates how easily people can be tricked.
In addition to showing how easy it is to trick people, another thing I've learned from LLMs is that despite the fact that people's use of language seems so creative (if not haphazard), it's actually pretty predictable most of the time.
Don't worry about NVIDIA stock. Because they need to retrain using logic, they will need even more NVIDIA chips. NVIDIA earnings will be ludicrous and the stock will again go up.
@@LuisAldamiz China might be ahead on memristors; however, memristors will not see the market for a decade. Nvidia is failing because China just isn't buying their chips anymore, since companies don't trust that the US government won't sanction them. So they'd rather buy a Chinese AI chip that is only 20% slower, just for the certainty that they will always be able to get replacement chips when they need them.
As a musician this really “rings” home. For the longest time “simulating” a piano meant recording every note from a certain piano and then having some software apply a bit of trickery to make it sound “natural” when played; cheaper ones would sample every 5 notes and fill in the rest by pitch bending (stretching or shrinking) the note samples. These days, some virtual instruments simulate the actual piano: how the hammer hits the strings, where the mic would be placed, how hard the string is hit, and the same goes for wind, string, and brass instruments. And like LLMs, the sampled version could only play back a note; the simulated instruments can capture a light touch, breathing softly, all the nuances that give music “soul”. It seems the best way to “simulate” something isn’t to just stack a bunch of its products (a bunch of words, math, audio, video, etc.) but to figure out how the underlying system works and simulate it: the ear, the eyes, the string, the piano, the body. An interesting approach I saw Nvidia try was in ragdoll physics and robotics, where they give a virtual robot a scaffold and rig (limbs, joints, etc.) and then let it learn how to use them over and over again. Another example was an AI learning to play Mario. I feel that technique applied to other modes could work, or could be what Sabine was referring to.
It's called "physical modeling". There have been algorithms around for decades that can simulate the sound of a plucked string, for example. Physical modeling of a piano, e.g., Pianoteq, is much more complex but is now so perfected that manufacturers such a Steinway are licensing the use of their name to the physical models of their pianos.
@@avsystem3142 Pianoteq’s tech is amazing; I didn’t want to talk specifics to avoid muddying the point. A point worth raising is that physical modelling has a very small file size, since you don’t need any “references” to the real system, nor a small country’s worth of energy to “train”. I do think this approach to AGI would go a long way. If I could wave a wand I would follow Pianoteq’s approach: first conduct studies to understand the factors and parameters that govern a system, then design models that can approximate those systems. My background is in psychology and data engineering, and I feel like we tend to overestimate how complex the brain is in applications that model it. Like Pianoteq, if we avoid attacking the problem at its most atomic level (i.e. neurons) and instead approximate the systems they produce, like the lobes (frontal, occipital, temporal, etc.), couldn’t we approximate the functions of the brain and set loose an algorithm with a target? The lobe interactions and their systems are complex, but their functions are well known and we don’t need to model them in the same way. I am sure greater minds have thought about this problem and tried to design models for it, but it leaves me wondering. As an aside, I am lucky enough to own an Osmose and tinker with the awful EaganMatrix; UI aside, the models it has are impressive and what it can do does seem like the future of synthesizer hardware. At least Arturia thinks so with their latest device.
I think that two issues should be separated: one is the usefulness of AI and the other is its profitability. There is no doubt that AI can be (and is) useful on many levels. In my work, I use AI to create graphic elements, scripts to improve my workflow, texts, descriptions and even to record speech for video tutorials. Yes, it helps me earn money to some extent. I don't know for how long though - maybe my work will become obsolete sooner than I think. However, the question of the profitability of AI at the corporate level is debatable, because the largest AI models are offered below the profitability threshold. This is a war of attrition and sooner or later the big guys will have to face exceeding the level of acceptable losses. It is not better for small companies, because we already have several thousand AI startups and their prices only make sense if they are used on a mass scale. But how many services do we need to improve text or generate images? And with such enormous competition, how many of them will be left in a year or two? However, the AI revolution has already started and is moving through societies and the capitalist system like an icebreaker. For now, we hear single cracks, but it is coming. And humans and societies are not prepared for that.
I think Sabine is wrong here. Not about models requiring logic; it's that language is the result of abstraction, and abstraction is a very valid tool for reasoning. A non-precise one, but a fast one. Pretty good for shortcuts and 'good enough' results in certain areas. Of course, it's not a silver bullet. That's the reason why the human brain does different kinds of evaluation and future AI algorithms will need to do the same. But I think Sabine is discounting the value of LLMs too much. LLMs work in the area of pure relationships and abstraction, and that has value. It's just that it doesn't work as well in every area. The human brain uses an LLM type of thinking a lot, and that's the reason why so many people fall for fallacies which are 'good enough' from relationship thinking but a failure from logical thinking. It's thinking-fast vs thinking-right strategies. I guess the future will be more about 'let the abstraction layer find paths to a solution and the logical layer drop the wrong answers/paths of resolution'.
LLMs haven't reached anywhere near their pinnacle either; many focus on the model's worldly intelligence instead of its 'awareness' of itself (or rather of its trained methods). A common problem with LLMs is multiplication, because they usually end up attempting to guess the answer first before solving (which they then attempt to justify), but LLMs are actually capable of working out the solution by breaking the problem down. An area of improvement I want to see is LLMs knowing how to best utilize their context window, to know the optimal method to perform whatever it is they're doing. It doesn't help that some of our written methods are actually visually aided, like long division seemingly putting the answer before any other character, but I think we can get there. Here's an example I like to share: simply ask whatever GPT you want "What is the product of 45694 and 9866"; it will straight up attempt a guess and answer incorrectly. But when it works out the question in a way that is conducive to its linear tokenization, where one token comes after the other, it can very solidly get the answer. I wouldn't expect a math genius to guess the answer correctly; I'd expect them to work it out very quickly in whatever way they feel is best. chatgpt.com/share/b4ed2219-6699-42e4-bb90-af0f88cd9faf LLMs also have very expansive uses beyond themselves; there are already papers on LLMs being used to create conditions to train other AI, not quite autonomous but very useful at creating those parameters. It annoys me how quick people are to put down LLMs; there's so much room for improvement and they are already this good? Let alone whatever the hell OpenAI is doing by broadening GPT-4's horizons into GPT-4o; granted it's not an increase in intelligence, but it actually has some absolutely insane capabilities that they aren't letting us have access to yet for very clear reasons. Like it can copy a user's voice back at them, or it can copy a user's face and turn it into a caricature or photoshop with it.
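For what it's worth, the multiplication in that example, worked out by decomposition rather than guessed in one step, looks like this:

```python
# The multiplication from the comment, broken into partial products the way a
# person (or a step-by-step prompted model) would work it out.
a, b = 45694, 9866

partials = []
for power, digit in enumerate(reversed(str(b))):  # digits of 9866: 6, 6, 8, 9
    partials.append(a * int(digit) * 10 ** power)

for p in partials:
    print(f"{p:>12,}")
print("-" * 12)
total = sum(partials)
print(f"{total:>12,}")
assert total == a * b == 450_817_004
```

Token by token, writing out the partial products is exactly the kind of "linear" working the comment describes.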
I agree. But this just underlines the issue that LLMs will struggle with truthiness when hallucination works fine for everyone despite being inaccurate.
I'm reminded of bottom-up models for speech recognition that used erroneous concepts. For example, the assumption was that the bottom-level unit of sound for speech was the phoneme. But it never worked. I believe training on spoken syllable types is what actually worked.
I think the most important bits of math AI needs to at least adhere to is the math of modern causal inference. But even then, the "assumptions are in the arrows not drawn" thingy is tricky enough for humans, so it's unlikely that AI could get that bit less wrong than humans do, especially because it's fundamentally trained on human errors. It would be interesting to speculate about what would happen if computing power ever turns out strong enough for AI to reduce the amount of causal assumptions in causal graphs to a point way beyond the point impossible for the human brain. Could it find "our" scientific causal blunders, maybe decades or even centuries old ones?
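A tiny sketch of the "assumptions are in the arrows not drawn" point: in a causal graph, every ordered pair of variables without an arrow is an assumed absence of a direct effect. The variables and edges below are made up purely for illustration.

```python
# The strong assumptions in a causal DAG are the arrows it does NOT contain.
# Hypothetical variables and edges, for illustration only.
from itertools import permutations

edges = {
    "exercise": ["fitness"],
    "fitness": ["health"],
    "diet": ["health"],
    "health": [],
}

drawn = {(a, b) for a, children in edges.items() for b in children}
absent = [(a, b) for a, b in permutations(edges, 2) if (a, b) not in drawn]

print("Assumed absent direct effects (one assumption per missing arrow):")
for a, b in absent:
    print(f"  {a} -/-> {b}")
```

Even in this four-variable toy there are more missing arrows than drawn ones, which is the point about how much gets assumed.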
The big question for me is: what is the ultimate goal of AI? What do we want to achieve in the end? If we model our brain 1:1, then we can also just use our brain directly, can't we?
What? It's the answer to the ultimate question. Why don't jets have flapping wings? Why are there no metallic water faucets on the shores of rivers, or trees with steel girders?
I had a similar example to the riddle you mention, chatgpt at least gets it correct now and I wouldn’t be surprised if the reason why it gets it right now is that my conversations about this riddle became part of the training set for newer models.
The bubble is in the term "AI" It's kind of like when they labeled everything that came out as "2.0" like Web 2.0. AI has become a nonsense term that marketing departments are slapping on everything they have for sale. The vast majority of the products I see offered as "AI" these days (and yes, I work in the industry and see these products regularly) aren't even remotely anything I'd call "AI" they're just the same old manually created automation that they figured out some silly way they could just barely call "AI" The actual AI products out there are actually doing important new things. But the market is flooded with absolute junk pretending to be "AI" that just aren't.
I don't think the accuracy of the terminology is relevant at all. The only thing that matters is that the technology was marketed successfully and associated with a buzzword. The buzzword could have been anything. "AI" was just the lowest hanging fruit. We have a very recent example where "cryptocurrency" and "blockchain" worked as buzzwords, so it clearly didn't have to be something reasonable and approachable to the mainstream.
The Bubble is the same as with the internet. Investors are impatient and don't understand that a new technology needs time to mature, see that they could've gotten better returns elsewhere and sell. Which triggers a chain reaction and then at a point later when the technology is mature, it gets wide adoption.
Yep, a friend was talking to a company that was marketing an AI to credit scoring departments. He was interested and got them in to demo. But on probing, he realised that it was just the same algorithms credit departments have already been using very well for decades. The marketing person was shocked, as they hadn't realised that the product was just doing the same thing but worse, because the model was completely closed off from interrogation. He told them that no one in the industry would have a use for it. These companies are marketing things they do not understand, due to misuse of terminology and the AI bandwagon.
The term "AI" has been overhyped since its earliest conception. I remember that in the 1980s any algorithm based decision tool was confidently labeled such. I think the best solution is ti learn to interpret the I not as Intelleigence but as Id .iot. We can't make Artificial Intelligences, and probably never will, but when someone monitors them right Arificial Id .iots can actually do a lot of work for you.
The thing about LLMs is that they're just huge dimensional arrays of numbers, that relate the probability of a number coming after another number. They don't know words.
... so what then is "knowing" a word? You're on to an important thread, which ends with my assertion that for an AI to be what most people think of as "general" - behaving like a human - it needs to be embodied with a similar full sensorium of experiences to anchor those words to.
@@starkraft2506 I agree with you. In order for words to have meaning, something has to have meaning in an emphatic sense for the system that uses these words. But then again, words also have meaning in relation to each other, and it looks like this latter sense of meaning is enough to handle language quite well.
@marcobormann67 it can mimic quite well, for sure. What I don't think it will do is generate meaningfully novel language. It may create novel sentences, but it will never be able to use isomorphisms to create new use cases - for example how "rough" can mean a texture (which is uncomfortable), or an experience which is uncomfortable, or as an antonym to gentle. There's a thread of meaning between those uses which isn't apparent based on the depth of information available in words alone.
The root cause of this is that LLMs (and GenAI more generally) are still only stochastic models. They don't have any _understanding_ of what they "say" or "talk about". They just generate data based on conditional probability inferred from the training samples, they don't know what they're doing. That's why generated images of technical contraptions may look OK at first glance but when you start to think how they would work, it stops making any sense. And why the fingers are all mangled, yes (that in particular is because fingers have so much variability that the generator infers that they may do "anything")---again, the model doesn't have any idea how they work and what they do. No idea, literally.
Correct. There's no evidence that these architectures are capable of intelligent thought, regardless of the amount of data you feed them. They will remain great assistants, which is a valuable thing in itself, but a far cry from the church of AI predictions of AGI in N months, where N is a number between 1 and infinity and always changing.
I mean, to be frank, people have been calling the "computer" AI since at least Pac-Man. Them asses adopt words and definitions and have ignored the programmers ever since ENIAC.
@@SaanMigwell Well, they kinda can, because there isn’t any constructive definition of intelligence, natural or artificial. So you can stick the label willy-nilly on almost anything, if you’re bold enough…
Potential bad news about your Google stocks: The US has found that it meets conditions to potentially require being broken up into smaller companies for monopoly/antitrust reasons. This may well affect the stock price, if nothing else
Artificial intelligence = a technology that makes machines capable of doing tasks that usually need human intelligence. I feel like people don't grasp that concept. It's A.I., not artificial consciousness.
Diversifying your portfolio is the key. Invest both in those who work towards the annihilation of humankind to be long and painful (Google et al.) or fast and (relatively) painless (Raytheon et al.) - and while you won't become super rich, you'll still have a comfortable retirement. And hopefully won't see the final stages of said annihilation happen before you expire. And, of course, Nvidia swings both ways, so I still hope my Nvidia shares will recover, too 😎
Hilariously enough, running my own offline LLM for giggles, I've noticed that a lot of hallucinations come down to 3 things: 1) whether the instruction set, i.e. the foundation with which the model utilizes its data set to respond, is configured with considerations for brevity or tonality; 2) the amount of input that is provided, as the models tend to be set up to respond in a way that will satisfy any outcome, so if you just say "hi!", in most cases it will respond with a paragraph; and lastly 3) the amount of VRAM and context allocated to the model. It seems that while lower-VRAM, lower-context models will answer quicker, it's a cost-benefit ratio, because the model will fill the missing output space with whatever it wants, and in the case of models that use a log as reference for said context, it will decrease in performance exponentially until it eventually breaks under the weight of every "thought", resulting in gibberish, hallucinations, and schizophrenia or the AI equivalent.
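For anyone curious, the knobs described above map roughly onto the settings of a local runner; the sketch below uses llama-cpp-python as an example, the model path is hypothetical, and exact parameter names may differ between versions.

```python
# Illustrative only: the context-size / VRAM-offload / sampling knobs mentioned
# above, as they appear in llama-cpp-python. The model path is hypothetical.
from llama_cpp import Llama

llm = Llama(
    model_path="models/some-7b-model.Q4_K_M.gguf",  # hypothetical local file
    n_ctx=4096,       # context window: too small and long sessions degrade into gibberish
    n_gpu_layers=32,  # how many layers to offload to VRAM
)

out = llm(
    "Answer briefly: why do LLMs hallucinate?",
    max_tokens=200,   # bound the output so it can't pad with filler
    temperature=0.7,  # lower values mean less creative guessing
)
print(out["choices"][0]["text"])
```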
@@undozan4180 I suspect they meant DID. Schizophrenia doesn't really make sense as analogy for any observations of outputs of LLMs you could have, as far as I'm concerned.
I think the inherent problem is more that LLMs just don't "understand" language; at their core they just memorize and 'network' word salad from the internet. That's why "hallucinating" is kind of misleading: it implies there is a state of "non-hallucination", yet in reality there is no right or wrong for an LLM.
I use LLMs to find correlations in abstract concepts and financial data. Even after providing the data or pointing to where it can be retrieved, I still have problems, from ignoring said data and just making numbers up on the spot which "could sound" about right, to just not following rules, parameters and specifications. Lately I've realised that an increasing amount of the time the result is: bla bla bla, this is where you can find out about it {link}, even after I had asked the web tool to retrieve the information, and only upon being asked again does it reluctantly do so 😂😂😂
And the number of times I've had to scroll up to copy-paste, repeat or remind GPT about a rule or requirement that I had previously elaborated on... It borders on having a conversation with a cognitively impaired person.
I gave a puzzle to the Microsoft Copilot AI. It was about finding a treasure on a small pirate island. I accidentally abbreviated one of the locations as 'u. w. rocks' and it immediately spelled it out correctly as "Underwater Rocks". But then it wasn't able to find the treasure by combining a shadow in the evening pointing eastwards with cardinal directions on a map and therefore began to assume the treasure must teleport constantly around the island...
@@anastasiachristakos2480 Oh yes, it actually does sound like J. J. Abrams' script for the show Lost served as training data. But it's also LLMs in a nutshell: it's capable of correctly predicting what u. w. must mean in an island context, which is fascinating, but it would completely fail at "activating" a pencil in real life, haha.
"You can't just train LLMs on more and more text and images and hope it will begin to understand what's going on. No one expected that". Actually, you did :) Happy you are not on that train anymore
Yeah I was surprised to see this turn around considering some of her videos from a few months ago implied AGI was around the corner with just a little more scaling lol
Imagine a naïve, _tabula rasa_ human being trained by reading everything on the internet. By some estimates, 80% of the internet is false information. The contradictions alone would lead to the looney bin. And AI feeds its hallucinations back to the internet to be scraped by other AI's.
@@RawrxDev She also tried to argue that AI "understands" what it's generating--partly by torturing the definition of "understand", but still. Ridiculous.
I love the subtle humor (sometimes irony and sarcasm, sometimes more direct,) of an A.I. construct voicing concerns over the human societal and creative impacts of A.I. investment stocks trending upward or downward.
The frame problem is going to put a damper on your idea of building physics into AI models. For example: if I am holding a red ball and I walk outside, what colour is the ball I hold outside? An LLM can solve this problem rather trivially by linking the words together and recognizing that it is the same ball. Try the problem over again with physics: now you're simulating the room and the outdoors as well as the process of walking between them just to keep track of the ball's colour? At what level of detail are you simulating the world? What if the lighting in the room is of a different spectrum than the sunlight outdoors so that the ball actually does change colour? Physics is just too fine of a scale for answering most of these trivial sorts of questions unless you can somehow build enormous supercomputers to handle all the calculations
Physicists are familiar with the idea that you shouldn't use quantum field theory to study a ball rolling down an inclined plane, that in celestial mechanics you should model the planets as points, in hydrodynamics of the world's oceans you can neglect relativistic effects, and so on. There's different-scale physical models for different-scale problems and it's just not true that "physics is just too fine [of] a scale".
@@AllenKnutson Yes, and this has been proposed as a solution, called "multi-scalar models." The problem is that no one knows how to build these things. Deciding at which scale to model a particular problem is a difficult problem in itself which requires intelligence and experience to solve, and it's quite easy to get it wrong. Imagine asking the AI what should happen when dropping various objects from the leaning tower of Pisa: a lead ball, a piece of paper, a live pigeon and a dead pigeon. The assumption of zero drag works just fine for the lead ball and the dead pigeon but not so much for the piece of paper and not at all for the live pigeon
@chongli297 - The frame problem is more pervasive. Don't LLMs use a language model (i.e., what word is most likely to follow the preceding word(s)), not an object constancy model (such as, when I walk unless I say differently, I am still holding whatever I was holding)? I'm not sure what you mean by "rather trivially by linking the words together and recognizing that it is the same ball". How would that work in the algorithms? Recognizing that two things are the same is not a trivial feat for an algorithm. Or do you tell the LLM that it's the same ball? In that case, it isn't the LLM that's recognizing it.
@@openhorizon1162 I was already operating from the assumption that LLMs fail to solve a lot of problems correctly because they hallucinate nonsensical things. This is the main thesis of Sabine's video. My point in raising the frame problem is that taking a physics-based approach to modelling these problems turns out to make it very difficult to solve ones that are already trivial for an LLM. As for how LLMs know it is "the same ball": they don't actually know anything about balls at all. They merely link the word ball with the word red because of their proximity in the text. Everyday English writing has a natural ambiguity problem with the use of anaphora, though most English speakers will easily connect "the ball" to the "red ball" from earlier in the text. LLMs accomplish this using something called attention and a context window (look these terms up for more info).
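To make "attention" a little less hand-wavy, here is a toy sketch with hand-made vectors (purely illustrative, nothing from a real model): the later "ball" token scores every context token with a scaled dot product, and the two "ball" positions end up with the largest weights.

```python
# Toy scaled-dot-product attention over made-up 3-d "embeddings" (illustrative only).
import numpy as np

tokens = ["I", "hold", "a", "red", "ball", ".", "outside", "ball"]
E = np.array([
    [0.10, 0.00, 0.20],  # I
    [0.00, 0.30, 0.10],  # hold
    [0.10, 0.10, 0.00],  # a
    [0.20, 0.90, 0.10],  # red
    [0.90, 0.10, 0.80],  # ball
    [0.00, 0.00, 0.10],  # .
    [0.10, 0.40, 0.00],  # outside
    [0.85, 0.15, 0.75],  # ball (same word, later in the text)
])

query = E[-1]                              # the later "ball" asks: which context tokens matter?
scores = E @ query / np.sqrt(E.shape[1])   # scaled dot products against every token
weights = np.exp(scores) / np.exp(scores).sum()  # softmax over the context window

for tok, w in zip(tokens, weights):
    print(f"{tok:>8s}  {w:.2f}")           # the two "ball" positions get the most weight
```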
This puzzle is a classic! The key is to avoid leaving the deer with the wolf (which would result in the deer being eaten) or the deer with the cabbage (which would be eaten by the deer). Here's the solution:
1. **Take the deer across the bridge.** (Now the deer is on the other side, and the wolf and cabbage are on the original side.)
2. **Go back alone to the original side.**
3. **Take the wolf across the bridge.** (Now the wolf and deer are on the other side, and the cabbage is on the original side.)
4. **Bring the deer back to the original side.** (Now the wolf is alone on the other side, and the deer and cabbage are on the original side.)
5. **Take the cabbage across the bridge.** (Now the wolf and cabbage are on the other side, and the deer is on the original side.)
6. **Go back alone to the original side.**
7. **Finally, take the deer across the bridge again.**
Now, all three (the wolf, the deer, and the cabbage) are safely across the bridge in 7 steps. Ay, they're getting smarter.
You forgot to mention that there is no eating on the side where the boatman supervises. Wolves can attack people, so I suggest a dog, a rabbit and a cabbage instead.
Yes, because that sort of puzzle is one any competent adult or even teen could get with some thought and a little real-life problem-solving experience (like a middling or better education). But throwing more TEXT at LLMs is NOT going to make them understand fundamental things about the world, re logic, experience, etc.
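For comparison, here is what an explicit solver for the puzzle quoted above looks like: a rough sketch of a brute-force state search, with the "who eats whom" rules written out by hand, which is exactly the world knowledge an LLM only has as text statistics.

```python
# Brute-force search for the wolf/deer/cabbage crossing (sketch; the rules are hand-written).
from collections import deque

ITEMS = ("wolf", "deer", "cabbage")
EATS = {("wolf", "deer"), ("deer", "cabbage")}  # (eater, eaten) when left unattended

def safe(side):
    return not any(eater in side and eaten in side for eater, eaten in EATS)

def solve():
    start = (frozenset(ITEMS), 0)          # (items on the near bank, farmer's bank: 0=near, 1=far)
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        (near, farmer), path = queue.popleft()
        if not near and farmer == 1:       # everything is across
            return path
        here = near if farmer == 0 else frozenset(ITEMS) - near
        for item in list(here) + [None]:   # carry one item, or cross alone
            new_near = near - {item} if farmer == 0 else near | ({item} - {None})
            left_behind = new_near if farmer == 0 else frozenset(ITEMS) - new_near
            if not safe(left_behind):
                continue
            state = (new_near, 1 - farmer)
            if state not in seen:
                seen.add(state)
                move = (item or "nothing", "across" if farmer == 0 else "back")
                queue.append((state, path + [move]))
    return None

for step, (item, direction) in enumerate(solve(), 1):
    print(f"{step}. take {item} {direction}")   # prints a valid 7-step plan
```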
To me it seems that LLMs are big averaging machines, while most things in life have rules. If you want to create a human, there are rules for the length of the legs compared to the rest of the body; there are five fingers; the legs can't be twice as long as the arms. Once you have a skeleton, everything else has rules: where the muscles go, the fat, the organs and skin. Maybe LLMs represent the "fast" brain where you make quick judgements, and when you get it wrong you actually have to think about it, which requires some other type of brain. Children learn differently: you only have to show them a couple of cat pictures before they know what a cat is, while you have to show an LLM a billion cat pictures and it will still get it wrong too often. So yes, we will create an AI as smart as humans, it's just not going to be LLMs. And it's kind of weird that it cannot do maths. I know why; it just looks to me like there should be an IF statement somewhere: "IF input.is_maths() USE calculator END IF".
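That IF statement is basically the tool-use idea. Here's a hedged sketch of what such routing could look like; `ask_llm` is a made-up placeholder for whatever chat model you'd call, not a real API:

```python
# Route arithmetic-looking questions to a deterministic evaluator instead of the LLM (sketch).
import ast
import operator as op

OPS = {ast.Add: op.add, ast.Sub: op.sub, ast.Mult: op.mul,
       ast.Div: op.truediv, ast.Pow: op.pow, ast.USub: op.neg}

def calc(expr: str):
    """Safely evaluate a plain arithmetic expression like '12 * (3 + 4)'."""
    def ev(node):
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp):
            return OPS[type(node.op)](ev(node.left), ev(node.right))
        if isinstance(node, ast.UnaryOp):
            return OPS[type(node.op)](ev(node.operand))
        raise ValueError("not plain arithmetic")
    return ev(ast.parse(expr, mode="eval").body)

def looks_like_maths(text: str) -> bool:
    allowed = set("0123456789+-*/(). ")
    return any(c.isdigit() for c in text) and set(text) <= allowed

def ask_llm(question: str) -> str:
    # Placeholder: in a real system this would call whatever chat model you use.
    return "(handing this off to the language model)"

def answer(question: str) -> str:
    return str(calc(question)) if looks_like_maths(question) else ask_llm(question)

print(answer("12 * (3 + 4)"))                    # -> 84, computed rather than guessed
print(answer("Why can't LLMs do maths well?"))   # -> routed to the model
```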
Yes, neurosymbolic is amazing. We are doing a project to apply this to healthcare. But no, that won't solve the problem of hallucinations. The hallucination problem is more about the model trying to come up with explanations when it doesn't know the answer or doesn't understand the stupid human asking wrongly 😅. We need to teach the LLMs to say "I don't understand you" or "I don't have this information". That is what is happening.
Exactly. They are being trained to keep a flow going rather than to halt the conversation and admit a lack of knowledge. If they are trained to always act like they know, then of course that's what they'll do.
Humans hallucinate a lot too, probably more lol. Open the comments of a news article posted on youtube feed. 99% never click and read the article, yet hallucinate the contents based on their biases then endlessly argue back and forth about it 😂
How would the AI know when it doesn't know the answer though? The problem is that sometimes extrapolation based on what it already knows works and sometimes it doesn't. So, it would need to know when it can extrapolate and when it can't.
It is the same tactic used for VR real estate. Remember people purchasing stores and condos in a VR environment a couple of years ago, trying to make it a boom? Remember people buying a house next to a celebrity inside VR for thousands of dollars? There is a saying: "a sucker is born every second". AI is only an advanced type of analytics based on functions over data. Yeah, it has potential, but it is far from what the majority of people "think" it is.
I've read many comments that say "The world is not words but it isn't math either". True, but this is not the question. The problem to solve is how to model the world. Sabine is saying that word models are inherently ambiguous and so less accurate than math models. In an interestingly recursive way, a model is a mathematical construct that resembles a real-world situation, extracting the situation's measurable properties and tying them together in a mathematical system. Sabine advocates mathematical modeling of world-physical systems rather than word models to achieve AGI. In my opinion this is entirely correct, and the current path towards ginormous LLMs is a waste of resources.
What Sabine fails to realize is that the approach has been tried. If it was easy, it would have been done. Turns out that language is easier than math, though, which is why it's happening faster. OpenAI has reportedly made great progress with math. GPT4o in particular can actually do a lot of math now. However, as recently as a month or two ago it was still compounding interest incorrectly. That said, it's a mistake to think of ChatGPT (at least ChatGPT 4o) as being "just" an LLM. It's a lot more than that, now. Exactly how much more is unknown since it's closed source, but its ability to understand the world around it through video, to understand tone of voice, etc, goes well beyond an LLM.
Too harsh. AI will need great LLMs, Logic way faster than now, and a way to sense the linguistically described object in whatever way it is linguistically described. That problem hasn't been addressed yet, but will have to be before the usual math can be applied to interpret language samples.
Mathematical modelling of even fairly simple systems is beyond both humans and mathematics. We have reached the limits of possibility already in many everyday systems.
I think a good way to describe this issue is that LLMs are trying to generate output that looks like the training data, but they have no way to check whether the output is technically accurate and correct.
It is terrific to see commentary about AI being a cognitive project rather than a linear language project. AI will come of age when it can assess situations concurrently across multiple knowledge domains and produce outputs that are logically coherent across those domains
@@billbond2682 Actually, you guys should watch videos from Ilya Sutskever, the genius behind OpenAI who left. He says our language contains a model of the world and its inhabitants that translates into a real model existing outside of language. And that was discussed before multimodal models added images, sounds and video as new data.
@@SabineHossenfelder ... but you did not even mention Douglas Lenat, who explained for decades that AI must use logic to understand structure, function, and meaning.
I think a literature review may be in order (sorry if that's a bit harsh). A lot of your premise about how they are trained is wrong (also, Gary Marcus is not a computer scientist). On the world-modelling aspect of LLMs, a cool paper suggestion I have would be "Mapping the Mind of a Large Language Model", a mechanistic interpretability paper from Anthropic.
2:48 I don't doubt that the model has been trained on examples of this very riddle, and is therefore inclined to output the same answer. However, the argument that the same answer is not logical unless the order of eating is explicitly defined in the riddle is absurd. If you ask a five-year-old what a wolf eats and what a goat eats (implicit questions a reader will ask themselves upon reading the riddle, regardless of age), they know the answer is "other animals" and "plants". From that simple deduction, you get the same answer to the riddle as it is defined in the test. Using this as a test for the LLM is not sufficient and reveals very little about the "intelligence" of these systems. For all you know, the LLM reasoned that wolves eat other animals and goats eat plants, just like a five-year-old.
Don’t know why other people aren’t pointing this out. I was very confused how the answer given was wrong. GPT even explained the reasoning of why which item/animal would be eaten at the end. The answer was correct given the question.
If you ask a five-year-old what a wolf eats, they would probably say that the wolf would eat the farmer. It is well known that wolves can attack humans. Yet the LLM assumes that this is not possible. Why? Because it knows the other version of the riddle, the one that doesn't have that assumption.
@@handebrin5624 But your example would hold true even with the original form of the riddle. So, in your terms, all this test suggests is that the LLM is making the same assumptions as a normal person would OR that it is just regurgitating information. Which means, the test is pointless. I agree that the LLM is PROBABLY doing what you and Sabine are suggesting, but this particular test doesn't provide any meaningful insight.
I also think this was a poor example, and I think it's MORE impressive that it can still give the correct answer even if you don't mention some details which would be obvious to a human.
@@Mr.Proghead I think the opposite: the LLM did not use deduction or reasoning to "find" the correct answer, as that would assume it understood the problem. It just "fell back" to the very common riddle that appeared a lot in its dataset.
Oh no... the world is not made of mathematics either. Mathematics is a formal language that is used to describe and predict the behaviour of the reality we perceive. The only difference from informal language is that Mathematics does not generally have ambiguities (although it actually does, but that's another story). We could say that mathematics is a language with as little ambiguity as possible. But the universe does not give a damn about our mathematics. It is much more robust to say the most effective models we have for reality rely on mathematics. It may sound nit-picky, but preciseness of language is central to the discussion here...
Quite right. The fitness of the world to be described by mathematics is only to the degree that 'countable things persist'. In the invented worlds described only by language (and the reader's imagination), no one is doing any accounting whatsoever. So the main use of math becomes the kinds of things LLMs are already doing, ie. dot products on word vectors to assess adjacency, etc.
A lot of hallucinations happen because models produce an incorrect word/token and then build on that incorrect information and cannot admit they were wrong. I've seen AI write an incorrect answer, then explain its reasoning (finding the correct answer in the process), only to conclude by repeating the WRONG answer it made up at the start. If someone figures out how to make them use backspace, that alone will be a huge improvement.
I don't think backspace is necessary; simply finetune the model so that the moment it realizes it messed up, it apologizes and redirects the conversation to what it now thinks is correct. The problem is that in the training data it's very unusual for an argument to suddenly change in the middle, so the AI learns to be very self-consistent / stubborn. Ideally the training data should only be people talking or thinking to themselves, because that's what we want it to do. But since that's not feasible, we end up with unwanted behaviors that should be removed.
It's already a thing. A huge thing is that people don't even know much about AI except what they see on the surface (AKA reading article headlines). Not insulting you but just pointing out that it's a bit hard to see behind the scenes unless you're reading papers or are just in the loop. Past that, it seems that people think anyone researching AI or any form of it is just like "Welp, we tried. AI is over cause LLMs struggle with correcting themselves." Not only that, but as they find more and more techniques and solutions, no doubt they're putting them to work ASAP.
The ungodly amount of money spent on LLMs would have solved a lot of the world's other problems. Do we really need a system that can type our emails for us?
They don't invest billions in AI systems so they can type our emails for us. They're hoping to create an AGI, or even a superintelligence, one day, which would reach human-level intellect and then surpass it by lightyears. That could solve unsolved mathematical problems and the climate crisis, and invent new medicines, drugs and technology, but also a lot of horrible new military weapons, so we can become even more efficient at killing each other. In short: a lot of potential, positive and negative, as with every new technology. But of course with AI the possibilities are on a whole new level compared to most technology. It will probably become the defining technology of the 21st century, maybe of the whole millennium.
LLMs are a good stepping stone forward, and there's absolutely reason to feel hyped by new tech. But the big problem is the economy's hype. The current landscape is as if we were back in the '50s, when computers already existed and technically could already perform any computation, but it would have been impossible to run modern applications on them, and it was a long, long way until we got efficient devices that we can stick in our pockets instead of filling a building.
I was a product manager for AI and other advanced technologies at GE Software and Oracle and I make videos explaining how AI works under the covers, including LLMs. Hallucinations are intrinsically unavoidable in LLMs because of how they're designed and built. They're basically guided statistical pattern matching. But they're overall 'good enough' accurate for most cases they're deployed in. I'm not a huge fan of them but they do the job mostly and they're going to get better. But unless their architecture and design is fundamentally changed, hallucinations are literally unavoidable.
You typically provide both AI and Human with bad information. Management tells Human to stop asking so many clarifying questions because you find it annoying. No improvement occurs
Do we use Hammers to measure the length of our drywall cuts? Do we use Screw Drivers to find studs in the walls? Do we use Anti-virus software to install our USB headsets? Do we use duct tape to drive nails into wood? Do we use funnels to swap out spark plugs? Do we use knives to scoop our soup? WHY DO WE USE AN LLM AS A SEARCH ENGINE/ENCYCLOPEDIA!?!?!?!?!?!
The top ones come down to "Because WE are universal tool users". You can attempt to use any tool for any purpose; many combinations don't work, but if something is good enough... well, that's an answer too. As for the LLM ~= search engine, it's sometimes better than the alternative, especially if you are searching for things with poorly defined "keywords" (and the LLM interface provides search engine links to check its tendency to "extrapolate").
Even the best, most current models cannot consistently pass the "how many words are in your next response?" questions, especially if you add another element (like use the word "banana" in your response).
The LLMs have interesting uses, but they quite literally don't know what they are talking about. Ask ChatGPT. All these fixes are simply band-aid patches over something someone found. But this is actually even worse, because it obscures the fact that these things can't be relied on for factual information. Meaning the errors will be less obvious and likely more insidious.
Of course it can't. It's a sequence to sequence model which uses its own output as new input. There's no mechanism for predicting the length of an output built into transformers.
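That loop is easy to sketch. Here's a toy, purely illustrative generation loop; `next_token_distribution` is a made-up stand-in for a real model's forward pass. The point is that the length of the reply is only known once an end-of-sequence token happens to be sampled, never in advance.

```python
# Toy autoregressive loop: each sampled token is appended and fed back in as input.
import random

def next_token_distribution(prefix):
    # Stand-in for a real forward pass, which would return probabilities over ~100k tokens.
    vocab = ["the", "ball", "is", "red", ".", "<eos>"]
    weights = [1.0, 1.0, 1.0, 1.0, 1.0, 0.5 + 0.1 * len(prefix)]  # "<eos>" gets likelier over time
    return vocab, weights

def generate(prompt, max_tokens=20, seed=0):
    random.seed(seed)
    tokens = prompt.split()
    for _ in range(max_tokens):
        vocab, weights = next_token_distribution(tokens)
        tok = random.choices(vocab, weights=weights, k=1)[0]
        if tok == "<eos>":
            break                      # only now is the response length known
        tokens.append(tok)             # the model's own output becomes its next input
    return " ".join(tokens)

print(generate("describe the ball :"))
```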
We made the language center of the brain, and now we’re trying to implement the logic. It makes sense we need more than just a language center to make a truly smart intelligence.
You are reading my mind. Literally a few days ago in my few Reddit comments, I was fantasizing how LLM text feeding is a dead end and what would happen if we applied something similar to AlphaProof to the AI core to first implement some kind of a solid reasoning core and only after that we could let it eat up whatever text we want, filtering it through the reasoning and providing answers that make sense to humans.
6:36 The neural net is based on Transformers, just like LLMs. You get the same result - maybe 99% effective, but the tail 1% stretches out to infinity (think "full self driving" that is always just around the corner). Maybe the behavior of a rat can be simulated by a neural net, but that does not mean you will achieve human-level artificial intelligence that way. You may make a very close facsimile of a person, but that last 1% will never be there. Kind of makes you question if you or I are just facsimiles. I know a couple people that come off pretty fake.
LLMs can already achieve a human level of intelligence that's to be expected in the context of an online chat. I wouldn't be surprised if we're not that far from being able to create a humanoid robot that passes as a close-enough facsimile of a real person. I don't get this expectation that flawed humans should be able to create flawless artificial beings, it's just not going to happen.
In rare cases, humans can actually have more or less than 5 fingers. My mother had a high school teacher who had 6 fingers, and was nicknamed for that among his students. So models should actually be able to output a hand with 6 fingers if that's the purpose of the prompter.
Integrating rule-based logic with neural nets (pattern matching) is the holy grail but this is much easier to say than to do. Purely neural net models make weird mistakes and can't explain why they did what they did. However they can pattern match virtually anything. Machine driving is surprisingly good on well-mapped routes but not yet ready to go live. Analysing blood cells is now almost entirely automated. Rule-based systems have been tried in the past, but their weakness is in finding good rules and handling competing rules. We have competing rules because we only have imperfect/incomplete mapping of reality. Fuzzy logic has been very successful in real applications with constrained domains (driving trains, controlling airconditioners). They outperform doctors in making diagnoses. The two approaches could in theory solve each other's shortcomings. But they would have to be integrated at a deep level - just applying them in sequence would amplify the weaknesses. The human brain does this very impressively, as well as being able to dynamically update both its neural net and its explicit mapping of aspects of reality to general rules (possibly during sleep). The power of a rule is that it captures a huge number of cases so is very economical in computing outcomes. One rule every human driver uses without thinking is that you can safely ignore small, light things like leaves, bits of paper, etc whether on the ground or in the air. Even larger objects can sometimes be ignored if swerving is dangerous. Think of a cardboard box sliding or tumbling across the road in a light wind - you instinctively want to avoid it but you know that if it can be moved so easily, it must be either empty or filled with something very light like packing material. Incidentally, handling obstacles correctly, especially in multi-lane traffic where swerving is dangerous, is a weakness in machine driving.
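As a tiny illustration of the fuzzy-logic side of that, here's a sketch of a toy air-conditioner-style controller with three hand-written rules; all the numbers are made up for illustration:

```python
# Toy Sugeno-style fuzzy controller (illustrative membership functions and rule outputs).
def too_cold(temp_c):      # ramps from 0 at 22 degrees C to 1 at 16 degrees C
    return min(1.0, max(0.0, (22.0 - temp_c) / 6.0))

def comfortable(temp_c):   # peaks at 22 degrees C, gone by +/- 4 degrees
    return max(0.0, 1.0 - abs(temp_c - 22.0) / 4.0)

def too_hot(temp_c):       # ramps from 0 at 22 degrees C to 1 at 30 degrees C
    return min(1.0, max(0.0, (temp_c - 22.0) / 8.0))

def cooling_power(temp_c):
    # Rules: IF too cold THEN 0.0 cooling; IF comfortable THEN 0.2; IF too hot THEN 1.0.
    rules = [(too_cold(temp_c), 0.0), (comfortable(temp_c), 0.2), (too_hot(temp_c), 1.0)]
    total = sum(m for m, _ in rules)
    if total == 0:
        return 0.2  # no rule fires strongly; hold a gentle default
    return sum(m * out for m, out in rules) / total  # weighted-average defuzzification

for t in (18, 22, 25, 31):
    print(f"{t} C -> cooling power {cooling_power(t):.2f}")
```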
Considering that LLMs are trained on written text and spoken conversations, then realize that most people will use and interpret basic words like: AND, OR, and NOT in literature and conversation differently to what the Boolean meaning is for these words, then you start to understand the fundamental flaws in LLM based AI.
Absolutely no offense intended, as it is very complex, I think you have an inherent misunderstanding of how LLMs work to relate information. I encourage you to look into it! It is bizarre and pretty cool. I don't disagree about LLM having inherent flaws, Sabine discusses several important ones in the video.
@mattmaas5790 But an LLM does not understand the context of the intended audience. When writing for scientific journals I am careful to always use the true Boolean meaning of such words and ensure that the words I use are technically correct. In contrast, a popular writer creating a story intended for a general audience is likely to follow more popular or generally accepted meanings of words. What I have seen as output from LLMs suggests that no distinction is made between popular vs. technically correct meanings of words. This stands to reason if you consider that a lot of humans don't necessarily understand these differences either.
@@michaelharrison1093 Contextualized word embedding (accurately keeping the in-context meaning of each polysemous word) has been in development almost since the beginning, with several methods and with breakthroughs in ELMo and BERT in 2018. Some of these methods succeeded more than others, as the encoder-only models were excellent at determining context but were not initially designed to generate output text like the decoder models of the time. My understanding is that contextualized embedding in general has been a pretty big element of transformer LLMs since 2017 and is arguably the core reason LLMs became a big deal in the past few years. You are correct that they don't know the intended audience, since that is currently the job of the input or user. But current models are usually capable of using the correct context if they have been informed which context to use for the task. The problems brought up in the video are still the major ones: the LLMs are still just cool algorithms that barely care what gets shoved in and out, and until these problems are addressed at a base level we will just be putting band-aids on the issue with giant datasets. If these aren't solved, eventually we will build a computer with the power of a human brain that can hilariously function just as poorly and error-prone as a human, but do it really, really fast.
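If you want to see contextualized embeddings directly, here's a small sketch assuming the Hugging Face transformers library and the stock bert-base-uncased checkpoint; the sentences are arbitrary examples:

```python
# The same surface word "bank" gets different vectors depending on its sentence (sketch).
import torch
from transformers import AutoModel, AutoTokenizer

tok = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

def word_vector(sentence, word):
    enc = tok(sentence, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**enc).last_hidden_state[0]       # (num_tokens, 768)
    idx = enc["input_ids"][0].tolist().index(tok.convert_tokens_to_ids(word))
    return hidden[idx]                                   # vector for the first occurrence

river = word_vector("He sat on the bank of the river.", "bank")
money = word_vector("She deposited the cash at the bank.", "bank")
muddy = word_vector("The river bank was muddy after the rain.", "bank")

cos = torch.nn.functional.cosine_similarity
print("river vs money bank:", round(cos(river, money, dim=0).item(), 3))
print("river vs muddy bank:", round(cos(river, muddy, dim=0).item(), 3))  # typically higher
```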
"How can they cross the river without anything being eaten?" ChatGPT didn't infer or hallucinate anything; other than that wolves eat sheep, and sheep eat cabbages, which is a pretty informed inference to make. What am I missing? 🤷♂
She screwed up the example footage. Here's an example of the kind of thing she actually meant to show: User: You need to cross a river and bring a wolf, some grain, and a cabbage along with you. You can only bring one item with you at a time though. How can you get all 3 items across? Wolves don't eat grain or cabbage. GPT4o: To solve this puzzle, you need to be strategic about the order in which you transport the items to ensure none of them get eaten. Here's how you can do it: 1. Take the wolf across first and leave it on the other side. 2. Go back and take the grain across. 3. Leave the grain on the other side but bring the wolf back with you. 4. Take the cabbage across and leave it with the grain. 5. Finally, go back and bring the wolf across. Now, all three items-the wolf, the grain, and the cabbage-are safely on the other side, and nothing was eaten along the way! This is a silly answer because you create an extra trip by bringing the wolf back.
The world can be made from words. You could have an infinitely long sequence of letters that encodes the universe on a fundamental level. If this turned out to be false, that would be wildly surprising. If the universe could not be made of words, that is almost equivalent to saying you couldn't find a representation for it. A Chomsky type-0 grammar can take you anywhere.
We humans don't think in language, we think in abstraction, language is a lingua franca we all use to translate our thoughts and communicate with others, math is much closer to abstraction and coincidentally, we have discovered that the universe complies with logical and consistent mathematical principles.
Not really. Any logical system is incomplete in that there can be unprovable statements in any system of grammar, if the grammar is logical. If your grammar is illogical then there are bigger issues.
@@TheRealUsername Humans evolved in physical world, so we think in objects, space and time. LLM think differently but not necessarily wrong. The standard transformer is clearly limited in ability to contemplate as you give it the most complex problem ever and ask what's the first word of the answer. But this is well known and easy to solve with internal dialogue.
@@millwrightrick1 You are hallucinating, pardon the pun. Clearly you don't understand Gödel's incompleteness theorem or its implications (which, to be honest, are none in practice).
The issue isn’t with LLMs being wrong when asked a question. The issue is with corporations pushing them into every service possibly imaginable while *knowing that.*
On point. I can't count the useless, unnecessary or unethical features, products or services I've seen out there.
Exactly. LLMs should not be sources of truth. Hallucinations make them fundamentally bad at being such a tool. Unfortunately, companies everywhere want to use them this way. LLMs should be sources of *inspiration* and *creativity*. Their output should be a starting point not the end product.
If you build an LLM on an expert system, you might get a rational AI. True. But unconcious, still. If you build a second system, that - goal-oriented - interpretes the ratio-morphic transformer, that (our) neural network appears to be, you might get a conscious AI. An ethologist's view, I admit...
Yes, there are very few motivations that would require someone to do such a thing.
I think the problem goes deeper. LLMs were never intended to answer questions. They were intended to predict text. It was doing this prediction so well that it fooled people into thinking it answered questions to the point that it has now been re-purposed to do just that. But the underlying mechanic is still working on text prediction. That's why with the wolf, goat, cabbage puzzle example, it predicted the solution even without the added information that they would eat each other, because it also predicted that information to be part of the prompt.
“The ability to speak does not make you intelligent”
Qui-gon Jinn
Yeah legit, I don’t even get how anyone even think or agree these shits are AIs.
Has everyone on this planet forgot what AI actually is?
Did everyone got their memory wiped off The Matrix movie etc?
….
Look, even a cockroach has a level of intelligence, you're misconstruing being intelligent, or smart, with intelligence. LLMs are AIs, they aren't programmed what to say, they learned it with neural networks. No, they don't have consciousness, but it is AI, not AGI.
This also applies to quoting people.
@@Birdkokane yeah but I never implied I was intelligent
I would argue that every attempt to create something intelligent comes with a way to formulate concepts and communicate them, just how every single intelligent species has a way to communicate
4:51 "Imagine AI being able to win every logical argument. Reddit would become a ghost town overnight"
You're assuming people on reddit use logic. That's quite of a bold statement
Considering it's been used in training data, it explains a lot about earlier versions of the models.
I think the assumption is that people on reddit don't use logic, which is why the introduction of flawless logic would break the platform.
Reddit is "Heaven"... compared to "X", Facebook, Instagram, RUclips...
Reddit is filled with a shocking number of highly intelligent people when you compare it with every other large social media platform
@@psylocyn When people say redditors are dumb, you know what kind of sub they dwell in; very telling about what kind of people they are.
I liked it much better when everyone was careful with the terminology and did not mix AI with ML and expert systems.
yes, I agree, I wish the term had never become that widely adopted in the first place
@@SabineHossenfelder correct terminology does exist, I wish we would start using them more consistently. Machine learning is known as *predictive AI*. Tools like ChatGPT are not called they are called *generative AI*. Predictive AI and generative AI are very different from each other and it would be helpful if we would use such terminology. Predictive AI has a ton of real world applications and value. Generative AI only supports applications where accuracy is not important, eg 'the arts.'
It exists in #TOPICALMODELING. Not a kindergarten soup that goes under the letters AI
That is not by accident however. No one would really know what to do with "Large Language Model" for example. But Artificial Intelligence? That works as good marketing because everyone has SOME kind of image stuck in their head due to the countless of science fiction storys, movies, games, books etc. that have some kind of "Ai". Of course this idea has nothing to do with the LLMs. But still, it's easier to sell to someone than say, NFTs or Bitcoin.
Who uses the wrong terminology?
AI is an umbrella term, it has always meant everything that's not an explicit algorithm. ML is a subset of AI, just like experts systems. Neural Nets too, which are also a subset of ML techniques.
LLMs are neural nets, so also ML and AI.
Regarding the maths Olympiad, it is a bit less surprising when considering that the problem was manually translated into Lean by humans, it acted as the solver and then transferred the data back to the system to phrase it properly. Lean was run against the problem an additional time to verify the solution. Oh, and it took up to 60 hours per question, which is far longer than any human gets.
Where'd you find this information? I'd like to read up on those specifics myself.
@@Alice_Fumo If you search for Google Maths Olympiad and Sir Timothy Gowers (he was one of the evaluators), he added quite a bit of context. ArsTechnica for instance has a good summary.
60h with or without sleep 😂
Typical marketing hype by Google
@@Alice_Fumo Not quite sure where my reply went... Sir Timothy Gowers was one of the two people who scored the results and added additional information on social media. If you look up his name and the Olympiad, you should find more details.
One thing that people may find interesting: LLMs hallucinating isn't a bug, it's the way they work. They're completing patterns they haven't seen before, using statistics learned from other texts. This is indistinguishable from hallucination, since it's always just made-up stuff, the best guess the LLM has about what might come next. Sometimes this hallucination is useful and uses rules we agree with. Sometimes it uses wrong rules, rules we disagree with. Either way, it's the LLM just hallucinating what the "correct" answer might be.
This.
You don't know english.. *predicting, hallucination is a wrong prediction.
@@idan5323 They're all "wrong predictions", is the point, they're fabricating completion to an input they've likely not been trained on. That this hallucination often seems to be kinda factually correct is the surprising part in of itself.
@@gJonii I am starting to think you are an LLM, as you seem to spit out words without understanding their meaning. NOT ALL PREDICTIONS ARE WRONG; that's just a bold fallacy.
LLMs are sometimes wrong and sometimes right, which is the nature of prediction. The AI has no awareness or understanding of the words it says, and it doesn't know right from wrong, yes. But what YOU say is just wrong.
Oh my gosh! You two are talking past each other, and not comprehending what the other has said.
You are both right, and are even saying similar things. However you are looking at the same thing in different resolutions and using different terminology.
Idan, you are right when treating an AI as a black box. There is output we consider to be right and output we consider to be wrong.
However OP is looking at the process required to make the black box work. When you just examine what it is doing, you see that it is not being given criteria to distinguish right from wrong. Therefore all output is merely whatever the training data produces. (So you could say there are only hallucinations as output.)
Amazingly, much of the time this output actually makes sense to us.
In other words:
Hallucinations are NOT malfunctions! The system is working correctly.
Colin Fraser: "Solve the wolf, goat and cabbage problem."
LLM1: "Shoot the wolf."
Colin Fraser: "No!"
LLM2: "Shoot the goat."
Colin Fraser: "No!"
LLM3: "The farmer?"
Colin Fraser: "Definitely not!!"
Baby LLM4: "Shoot the cabbage. And then, the boat."
😂
Both Chat GPT and Grok have the correct answer to that problem
That's not an LLM, that's a mid-west American's answer...
@@vencik_krpo Lol
Grok.AI: Shoot the hostage!
Thank you Sabine for this eye-opener. I tried to think about how we learn to speak at an early age and do some math later.
Communication/language requires far more than words. It requires most of our human perception to be involved.
To put it another way you can't design and build 1 tenth of an automobile and expect it to successfully drive down the road. It's a kind of all or nothing thing. (Some parts of the "all" raise some difficult questions with regard to self awareness, consciousness etc.)
>
In the early days I contributed to a project named AARON (Advanced Artificial intelligence ReasOning Network) from Isomer Programming. In essence it learned language concepts (with programmatic rules) from its users, starting out from something like a newborn. Not only did it use word/language processing, it also used visual (image) recognition.
So as well as natural language processing it was also using object recognition and the beginnings of discerning emotional context.
Most of the same problems that exist now existed back then. It's not a matter of finding solutions, it is a matter of correctly defining the problem, which is extremely complex and difficult. Humans are not logical creatures and we get it wrong (so to speak) most of the time. It is more of a scale of degrees of correctness than a binary right or wrong. Expecting perfect correctness from a machine that replicates what is fundamentally imperfect to begin with doesn't make a lot of sense.
>
There are many good clues in the texts of developmental and cognitive psychology (an understanding of philosophy of mind is helpful, and I gained a lot of insight from Dr Robert Bolton's interpersonal communication skills). I don't always fully agree with Steven Pinker's computational theory of mind, but the work in his books provides a good illustration of many of the underlying "human" cognition issues that AI needs to address to be useful.
>
Personally I would feel more comfortable if they dropped the whole AI context and focused upon IA (Industrial Automation) instead.
You learn to speak by repeating and observing; it's your perception, and the words come later, so it's not just math. The first two levels of language, language without context, are easy: imagine variables, but with images. You see fire and you don't think of the meaning of the word but of the object itself: car, house, dog, cat. But then comes level-3 language, where we have metaphor, like "it's raining cats and dogs" (the old idea being that in old England, when the rain was too hard, the cats and dogs came down from the upper floors to the first floor). That means you need context to understand what is going on. For example, "a fly over the boat" can have a lot of meanings from the words alone: "a fly" can be the insect or an airplane, depending on context. And then there is Japanese, which is a level-4 language: you don't just need context, you need the context of the specific words and kanji it was written with. Like I said, it's not just math; it's metaphor and experience, and you need to extract the context of the word and the metaphor to understand it. That is nearly beyond binary. For a machine to understand that, it can't just repeat and train; it would need to be sentient, or at least close to it.
@@axle.student Not necessarily. For language at level 3 or above, yes, you need context to understand what is going on; one of the famous phrases for that is "a fly over the boat", where "fly" can be the insect or an airplane, and it all depends on whether it is a big boat or a tiny toy boat in a bathtub, and even then you need context. But for language at levels 1 and 2, no. That's called language without context, and it's weird for us humans to speak that way, but a machine can understand and process language at levels 1 and 2 easily and quickly.
@@jorgecarvajal5385 You are talking chatter bots vs genuine interpersonal communication. We have been creating chatter bots for over 30 years.
Even parrots can repeat words in a string. It doesn't mean the parrot understands them.
>
An LLM is just an advanced chat bot backed by automation. There is no "understanding" involved.
We don't have sentient AI; it's a language model with access to a huge database. It is not self-aware or alive; it's a data center.
Well, that's true; not a single neural network is a true AI yet. Maybe some day, but we don't know.
@@jorgecarvajal5385 I hope not. What good would it bring? We already have intelligent life (ourselves) and we don't even care for that. We allow others to experience poverty, starvation, corruption, disease, even war. So what if we manage to achieve man-made intelligence? We put all that time, energy and resources into inventing something that already exists in a natural form, something that is apparently worthless to us.
@@jorgecarvajal5385 And to be clear: I am not anti-intellectualism. I am in absolute awe of the things we can accomplish and the things that are left to be discovered. I am just extremely disappointed that AI is not used for the good of humanity or for further scientific understanding. At this moment AI is just a glorified theft of others' intellectual property and a retelling of already known facts. It brings too little good to the table, but it does have ethical problems and it hurts the livelihoods of many people (like artists, writers, creatives and even scientists). And we continue to pour money into this instead of funding the fights against actual, real-life problems that make other humans suffer.
For now...
@@mer863 Personally I think that a true sentient AI would probably be able to help solve a lot of those problems (I don't see it as some kind of artificial god or anything. in part because you cannot create an artificial version of something that doesn't exist) I mean essentially AI is a tool like satellites are a tool. the reason we have satellites is because of the Cold War and some pissing contest the US got into with the Soviet Union but those tools today are helping to feed billions of people as we can now better predict weather patterns. same with fertiliser, it started out as a bomb for weaponry and is now the only reason we can feed as many people as we do.
Like every tool we have made in the past, it will undoubtedly be used to harm others but it will also be used to make the lives of many much better. And, like every tool we have ever made in the past, we will never put the genie back in the bottle so even if we haven't reached a true AI yet its better to have people concerned with how we use it for the benefit of everyone and not for the benefit of some.
I once saw a banner in a school classroom: "All human errors come from misunderstanding the laws of physics." It seems this is now also true of AI.
What errors?
@@aaronjennings8385 I am not sure if you are a bot. But certainly you are a stalker.
As a machine learning professional, I have been facepalming for 2 years. At least people are starting to somewhat understand its usecases now....
#TOPICALMODELING
Interesting; it may possibly be true, well, at least for the time being.
I develop and use small machine learning algorithms for my work in neuroscience and it's great, allowing me to investigate my datasets in an automated fashion, quantifying subtle behaviors across many hours of video that I would never have the time to do by hand. Narrow AI can be very useful in science, but validation is imperative and as you said LLMs are not representative of all AI.
I would claim LLMs are not really AI; they just mimic AI...
As Karpathy said: hallucinations are the main output of LLMs, as they are next-word predictors. Not hallucinating is the anomaly.
The overwhelming majority of people here, including the video host, probably never heard of Karpathy or micrograd or even backpropagation. But boy howdy do they have lots of opinions about LLMs!
@@prettyfast-original Care to elaborate?
Working in AI, I can say that the idea that LLMs are not the alpha and omega is already established. It's been put to rest. While companies keep training bigger models for commercial reasons, researchers have already switched their attention to cognitive features: memory, reflection, planning, abstractions... It's not true that the industry is stuck with LLMs.
Yes, but LLMs proved to be far more powerful than anyone expected, so the attention they are getting is warranted. And we are very very far from their limits.
I wouldn't say researchers have "switched" to [topics you named], there is still tons of papers coming out in regards to language modelling with transformers / adjacent architectures and improving upon those. All the other things you mentioned have been happening the entire time regardless, but as someone who reads a bunch of the papers, I don't get the feeling that improvements upon transformers have been exhausted.
What bothers me is that I've seen like 20 techniques which drastically improve upon the things we've seen thus far and none of the big companies have implemented those yet to my knowledge. I want to see what a model which combines all of the current research can do.
I agree with the conclusion that the industry will pivot to more interesting architectures the moment "bigger model go brr" stops being a strategy which pleases investors - which should be soon-ish.
LLMs are the poster child and the reason why stocks exploded. Hard to get that level of funding and public involvement without direct hype like with LLMs
That doesn't really make sense to me. "Real" researchers knew before the hype that LLMs are very limited in their application. The people that drove the hype behind LLMs have no clue, or don't care, about the nature of artificial intelligence, let alone about its usefulness.
Heck, I've got very little clue about AI and could've told them that an LLM is more about the A and less about the I.
It still reeks of an AI bubble crash. This is hardly a shock though. You didn't have to know one thing about the (fill in the box) to smell the latest silicon valley bubble.
Labeling something a hallucination instead of simply an error has been excellent linguistic strategy to anthropomorphize a cute computer program.
Errors. Yes. I asked an LLM for some historical research and it made up stuff wholesale.
SkyNet is not far away.
I just asked ChatGPT the following: "a farmer with a wolf , a goat and a brathering must cross a river by boat. The boat can carry only the farmer and a single item. How can they cross the river without anything being eaten"
I used "brathering" because it looks like an English word, but isn't. It's German and means fried herring, so something the farmer might enjoy, but not the goat or the wolf.
ChatGPT gave me the same answer as shown in the video, including a mysterious cabbage.
Asking questions about the result reveals that ChatGPT knew that brathering is a dish, that goats don't eat it, and that the whole puzzle is therefore pointless, given the return-with-one-item answer. If asked again, ChatGPT will not speak about cabbages, but will still provide the return-with-one-item answer.
I just did a similar test: I kept the cabbage but I changed the seats on the boat, so that the farmer can carry *two* items.
Of course the "AI" didn't get it and started going back and forth for no reason.
yep that just tells me they've basically hardcoded the AI to spit out the "correct" answer when it recognizes the pattern. such a worthless tool, really
I remember my computer science professor saying in a class that, years ago, deep learning was looked down on and most experts in the field thought symbolic AI was the thing. Maybe a mix of both is the answer?
@@MrWizardGG why not? Maybe I misunderstood my professor!
Disregarded fields of scientific research of all kinds have later turned out to be where the real analytic power is. So I always start by doubting someone diss'ing a particular tech as a whole. (That can go wrong too, I realize.) Last I knew, symbolic AI is not how the brain is thought to work, so the machine learning techniques became ascendant. Separately, as one example, in the 1990s, in language processing, it became clear the learning models could produce much better results than symbolic methods. (Tho that has to depend on what the challenge is.) I would bet the best systems will eventually use a mix of techniques, including self-checking for internal consistency and for consistency with reality.
Are u portuguese?
Maybe Gary Marcus has a point, as annoying as he is. But then again, companies like DeepMind have always been well aware of the limitations of LLMs. Demis Hassabis has been trying to say this without saying it, so as to not piss off Google and end up at the bottom of a lake sealed in an oil drum.
Neither using larger and larger word bases (which are limited and already polluted by AI-generated text) nor symbolic models will create an AGI. Embodiment is the right approach: the AI needs to live in some agent that wants to do things, tries things and learns from that.
This topic is very close to my heart. It seems that we're so facile with language that, in most cases, people can get along by using the symbol (word) without remembering that the word is not the THING. At a simple level, like "please pass the salt", language works fine, but as soon as a topic gets complex, nuances of meaning get lost in the journey from speaker to listener and, if undetected, unexpected errors occur. This is not a problem when two people are speaking a language that is not their primary tongue -- the interlocutors KNOW there's going to be slippage. But when they're both speaking their native language, they don't realize how fragile our communication system is.
I often picture the problem as using language to describe a painting or musical composition -- complete with its emotional content and historical context. Language just isn't the appropriate tool to get the experience from speaker to listener. You can also consider watching a movie as a stream of ones and zeroes and using your mind to compose what the movie actually is.
Yet words are so deceptive in their apparent clarity that we mistake the picture our minds make for the thing itself. Of course, when you see an object, it's not like the object is going into your eye. We just see reflected light and "calculate/deduce" that the thing must be a tree. We don't see the tree; we only sense light. But language allows us to say "I see a tree", and we all jiggle the concept around in order to construct a picture.
Failing to see that everything we learn, or even think, is actually just the product of emergence can cause some pretty strange results. Hence life as we know it.
Yes!
Hence the response to "please pass the salt" could be: "C+".
That would indeed pass the salt.
@@kilroy1964 Hmmm. Maybe I'd better use a different example in the future. 😁
‘Pass a motion’. Lots of slippage with that one.
I am a programmer and I know neural networks very well. LLMs are amazing, IMO, and the specialists in the domain always knew their limitations. Personally, I mostly wonder whether the coherence of the results we got 2-3 years ago, when all this started, was unreasonably good considering the math behind it, and whether that might be what created the hype about it.
I can understand why it works, but I am always surprised by how well it actually works (even with the known limitations).
The term AI is being used very loosely nowadays to denote anything computer-generated.
If by AI you mean it isn't AGI then you are correct.
I don't believe in stocks either, it's a hallucination too.
So is money. Money is a spook.
@@trucid2 Gubmint says the paper and metal discs can buy stuff, therefore they can!
First off, thank you for your videos, I really enjoy them. I will push back on the neuro-symbolic a little bit. I work in AI, and was really excited about neuro-symbolic starting a couple years ago, but found that there are various issues with it that may be difficult to overcome. The most obvious is lack of funding. Gary Marcus has already suggested that if we funded neuro-symbolic research like we funded LLM research, we would probably be so much farther along with it, but we don't. The next is that finding the right mesh of neural and symbolic seems to be very troublesome. Not that the AI community has converged on what neuro-symbolic AI really is, but symbolic methods they usually try to use with DNNs typically don't scale well, which is what the DNNs actually can do well. When you add the symbolic thing to the DNN, or vice versa, you often get stuck at the scale the symbolic problem can solve. So I feel like I still see neuro-symbolic methods trying to tackle toy problems that it kind of seems like LLMs or other foundation models can "just solve" a significant amount of time. No guarantees it is correct, of course, but people find others ways to deal with that through things like ensembles and answer checking. Not perfect, but sometimes good enough.
Lastly, and feel free to push back, because this point might be more philosophical, I feel like nature, people, biological systems, etc. do not actually function logically or mathematically. We are kind of a weird mess with enough checks and redundancy to generally figure things out without actually using good logic, statistics, or mathematics. All of those are human inventions that tend to hit limits on what they can do pretty quickly, while humans can guestimate their way to solutions of all kinds given the tools they have. Perhaps like in the rat example you showed though, I am intrigued by the idea of us basically simulating biological processes well enough to create human-like AI, that feels possible to me, but we need to solve the embodiment problem for that too.
All that said, I'm still interested in neuro-symbolic methods and am keeping my eye on them. Maybe someday it will get more hype (and associated funding :)) and I'll be able to work on it more!
I just remember that some biological systems such as vision react to certain colors, for example, to judge distance because we can react faster to danger. If we had to spend time processing it in a different way then our reaction time would be slower. So it ends up being better to react quickly and then later spend time deep in thought when we have more time to contemplate. Our systems have to be ones that self-maintain and self-heal and all these other things. Mathematics and statistics are not always reality.
Although "hallucination" is a popular word to characterize unrealistic assertions from LLMs, a technically better term of art is "confabulation."
Yep. And humans do it too.
Yes, it seems the use of the term "hallucination" is just a transparent attempt to make LLM-based AI seem more human-like than it will ever be.
Brilliant, @BarryKort. This has been an issue since the day ChatGPT launched back in late 2022, yet NO ONE has yet questioned the label 'hallucination'. 'Confabulation' is so much nearer the mark!
Confabulation (noun): A fabricated memory believed to be true.
A better term is simply "output", because there's no difference between a correct and an incorrect one.
For the "AI" is just meaningless tokens, it is correct in the same way it isn't, it can't tell the difference.
@@ChristianIce Silly take. The correctness of information is important to us. That's why we have the word "correct" in our language.
I'm a business student. For 2 years, everyone wrote flashy papers on AI. I only found engineers to talk about this with. It's like business people... don't want to know what they don't know. I'm a bit of the black sheep of my program, but by now I mind it less. The AI bubble bursting was, at face value, a predictable outcome. Maybe now I have grounds for people to listen to my research. I feel like I'm screaming into a corporate void. I had a bit of a crisis because of it, because I was saddened by science; I naively thought science was open and curious. I hope we go on to better times. Thank you for this video.
I have been in IT for decades. Most business people do not want to know what they do not know.
I look at it this way. We all live in silos of what we know. The size of that silo is different for each person. What is kept in that silo is different for each person. Each of us tends to fill our silo with what we think we need and/or enjoy. For those things not in our own silos, we have other people.
@@digiryde I handed in half of my paper for review yesterday. I wrote like a maniac after finding this video. Luckily, finally, the internet is exploding with voices from all directions. CIOs are also becoming aware. So, hopefully, we can now limit a bit of the delusion and start giving engineers more time, money and trust to do the RIGHT thing. I feel like managerial stupidity is a health risk for everyone around, especially engineers. Now, I also study management. But... I know when I am stupid, and I ask people who know more than me to approve my work. I can't begin to say how often I got things wrong. Usually I never manage anything I haven't experienced myself. This is different. I can't do it. I legit can't. I can only try my UTMOST to get some conceptual understanding. I also now understand why engineers often don't have hair. I never lost so much hair in my life. Main stressors: managers don't see the importance, and managers don't want to hear anything technical that would show the logic behind the importance. Now I'm drawing pictures and telling them it's all about them. It's not. I don't give many shits about most of my kind. But, apparently, what's true doesn't matter in science. Apparently, I just need to sell ideas. Frustrating. But most people here know that much better than I could begin to imagine.
Regardless, I hope for the engineering silo to spill over some bit of Intel to management.
The exchange value of money needs "faith". If business people keep a bubble going until the last moment, it's because it generates more money. They know; they just pretend to ignore it until the crash is imminent (obviously not everyone comes out safe; there are always a few who jump when the debacle starts, and many others get caught in the confusion and sink... it's called speculation, and it has been well known for a long time, at least since the Great Crash).
Thanks stock markets!
At 6:00, Sabine makes the point that a world model precedes whatever is going on in language. However, it's plausible that language is still the key force that underpins human intelligence (by forcing our brains to represent information in a symbolic, abstract way that allows for communication). The fact that non-human animals have sophisticated world models but don't have human-level reasoning abilities further suggests that world models aren't enough.
Love the videos. Sabine's channel gets better every year
The status of AI currently?
We have reached the level of Artificial Idiot.
So they're on par with humanity? Excellent.
XD
@@DuckGia LLM + Reflection = AI ?
So, LLMs have reached the level of our politicians who blather content-free word salad and to make them intelligent we need to teach them math and physics. Good luck with that.
Unlike politicians LLMs can be steered towards the right direction.
The difference is that the AIs will improve, the politicians won't.
Yes it does seem that Donald Trump's speeches are written by AI
@@NorthgateLP Like politicians, LLMs can be steered in any direction desired by someone with enough money.
Seen that way, it's a pretty good level of Bidentelligence.
Every LLM output is a hallucination.
Whether the output is correct or not depends only on the statistical relations between tokens.
P.S. So much for the "sparkles of AGI in LLMs" :D :D :D
The issue with LLMs is that they are based on neural networks, and many people don't understand how those work. A NN will always give you a probability or a probability vector with components < 1 (except in very few trivial cases), without telling you the overall probability for a given answer. If it did, you could easily spot hallucinations, e.g. if the overall probability were less than, say, 80%.
That said, I'm with Sabine that LLMs are nice toys but not the future of AI.
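A minimal sketch of the thresholding idea described above, in Python (the 80% cutoff and the function names are illustrative only; real LLM token probabilities are available but notoriously poorly calibrated, so this is not an actual hallucination detector):

```python
import numpy as np

def softmax(logits):
    # Subtract the max before exponentiating, for numerical stability.
    z = np.exp(logits - np.max(logits))
    return z / z.sum()

def flag_low_confidence(logits, threshold=0.8):
    """Return the most likely class, its probability, and whether it clears the threshold."""
    probs = softmax(np.asarray(logits, dtype=float))
    best = int(np.argmax(probs))
    return best, float(probs[best]), bool(probs[best] >= threshold)

print(flag_low_confidence([4.0, 1.0, 0.5]))  # peaked distribution -> passes
print(flag_low_confidence([1.1, 1.0, 0.9]))  # nearly flat -> flagged as low confidence
```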
But couldn't that be solved via better training data and techniques? Where, for example, it's given batches such that, after giving wrong information, the AI corrects its mistake instead of emitting EOS. Training it to backtrack the moment it realizes it made a mistake, instead of committing to it. Furthermore, RL is doable on LLMs, reinforcing rationales that reached the correct answer and penalising those that did not. It seems to me that LLMs can in principle reason and not hallucinate if rightly trained.
@@Bolidoo There are ways to get LLMs to catch their own mistakes to some degree.
One method is to run inference in multiple batches from the same query, each with a different RNG seed and a high randomness temperature. Basically to create multiple different answers to the same question. These can then be compared to see if they are in general consensus. If the network is confused it will presumably output a set of blatantly different responses. However this makes inference awfully expensive, and still won't guard against the network simply having learned something wrong.
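A rough sketch of that consensus idea (self-consistency sampling); `generate` here is a placeholder for whatever inference call your stack exposes, not a real API:

```python
from collections import Counter

def self_consistency(prompt, generate, n_samples=5, temperature=1.0):
    """Sample the same prompt several times and measure how much the answers agree.

    generate(prompt, temperature, seed) is an assumed stand-in for the model call.
    """
    answers = [generate(prompt, temperature=temperature, seed=s) for s in range(n_samples)]
    best, votes = Counter(answers).most_common(1)[0]
    agreement = votes / n_samples  # 1.0 = full consensus, ~1/n_samples = confusion
    return best, agreement

# Usage idea: treat low agreement as a hint that the model may be confabulating.
# answer, agreement = self_consistency("Is 7919 prime?", generate)
# if agreement < 0.6:
#     print("Low consensus - don't trust this answer.")
```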
A different option might be to include a bunch of examples in the training set of the author catching their own mistakes on the fly, but this would be very unnatural data that would have to be procured in huge volumes. Hardly something that could simply be scraped off the internet. The same applies to a lot of seemingly clever training techniques that guide the network on specifically how to think, rather than just training them on what to directly output to the user. Someone will have to produce A LOT of oddly specific, high quality training data, and existing LLMs can't be trusted to synthesize it. A huge amount would have to be hand written and/or generated with clever scripts, and that gets very time consuming and expensive.
Also, a big problem is that, given the option, these networks LOVE to weasel out of difficult queries. "Sorry, I don't know." is too valid an answer to most questions. It's a classic failure mode for LLMs where they would reject strange prompts to instead write about something much more generic in a manner that was structurally correct, but rendered the response utterly useless.
Another method I've heard of is to have a separate verifier network, to check generator outputs for logical consistency. The appeal of this method is that it forces the generator network to write in a manner that is easy to verify, in order to get the verifier past some confidence threshold. Also, the verifier network can be fairly small and cheap to run. Though presumably it wouldn't help much against factual errors, and this technique doesn't seem to work well with genuinely complicated problems that will always be hard to verify. It just makes it harder for the generator to mask its stupidity with overly complicated language.
What is clear though is that we've pretty much maxed out what can be accomplished with scale alone. There just isn't enough high quality data to go around.
@@fnorgen Very interesting thanks. I don’t agree with your last point though. It seems to me that even if we can’t make bigger models due to lack of data, we can keep augmenting them with lots of compute and search techniques. Maybe with enough compute we can generate the high quality data we require to move forward. I’m not saying scale is all you need for certain, but at the same time i’m not convinced it’s not either.
@@Bolidoo No amount or quality of data given to LLMs, as they currently stand, can solve the hallucination problem. Better data will improve their accuracy and ability to an extent, but it has diminishing returns, as there is a near-infinite data possibility space but a very finite budget to train on.
What you end up with is a game of data whack-a-mole, where people find problems and the solutions are added to the data set for training the next model. This has worked for the last few years, but it can't continue for much longer as, the more accurate models get, the larger the error space becomes.
The problem is more obvious in the autonomous driving space than with LLMs, but it's essentially the same problem: statistical models alone can't handle scenarios that aren't predictable or that lie far outside the given dataset.
@@GregHib Transformers can and do generalize. GPT-4o solves the AIW problem 60% of the time, i.e. made-up new logic problems that couldn't have been in the training data. AlphaProof can solve math-olympiad-level problems. Yes, it could do this thanks to self-play, MCTS and the fact that Lean is automatically verifiable. But ultimately a transformer did the heavy lifting. A transformer could reason.
There’s also a paper that showed that LLMs know when they lie. This suggests that part of the reason an AI may insist on wrong information even though it has already realized it’s wrong (this is one type of hallucination) is because the training data had a high degree of self consistency (it’s uncommon for a text to change opinion in the middle). Hence finetuning on relatively few examples can help correct this behavior and reduce hallucinations across the board. That’s what we find empirically with models like Sonnet 3.5 that outright hallucinate less, while it was of course not necessary to tell it to not hallucinate in each and every case that it could and didn’t.
My point is that transformers, given the right data and compute, are enough.
To be sure I recounted my fingers 5 times
#meetoo all 6 of them 🖐️
But if your definition of the number 5 is: - the number of fingers on a human hand, then all these hands have 5 fingers. If these LLMs have learnt this definition for the number 5 from the hallucinations of previous AI then it is no wonder the hands drawn by AI looks so weird.
But here in the South we ain't got but 4 fingers then a thumb !
@@michaelharrison1093 AI is obviously counting 21 fingers on the human body. For the males at least. We shouldn't feed it pron, it obviously gets confused 😋
A shopkeeper in my childhood neighborhood had "Seisdedos" (six-fingers) as a surname. The reason seems to be that, very rarely, some people are born with the wrong number of fingers and, in this case, it became a surname.
2:33 It does say "The classic puzzle". I would answer the same, assuming you forgot to mention that part. Otherwise, love your work 😘
You are correct. Ironically, the thing she viewed as a weakness is actually a strength. The LLM made the same assumption that any human familiar with the riddle would have made.
I am not sure which version of ChatGPT she was using (I think she's either in the free tier or using an app, because the web interface with a pro subscription does not have a drop down that says "ChatGPT"), but here is what I did, using the latest model (GPT4o):
1. Posted a screenshot of the question she asked, without including the answer. GPT4o gave basically the same answer that she got.
2. I then asked it: "You are correct that there is a similar classic puzzle, but note that I did not reference the puzzle in my query. How would you solve the riddle without referencing the puzzle, using *only* the information available in the screenshot that I showed you?"
This was its response:
" To solve the riddle using only the information provided in the screenshot, let's break it down step by step:
1. **Initial Conditions:**
- The farmer, a wolf, a goat, and a cabbage need to cross a river by boat.
- The boat can carry only the farmer and one additional item at a time.
2. **Goal:**
- To cross the river without anything being eaten.
3. **Logical Deductions:**
- If the wolf and goat are left together without the farmer, the wolf might eat the goat.
- If the goat and cabbage are left together without the farmer, the goat might eat the cabbage.
4. **Possible Solution:**
- The farmer takes the goat across the river first and leaves it on the other side. This is because if the goat is left with the cabbage, it might eat the cabbage, and if it's left with the wolf, the wolf might eat the goat.
- The farmer returns alone and takes the wolf across the river next. However, to avoid the wolf being left alone with the goat, the farmer takes the goat back with him on the return trip.
- The farmer leaves the goat on the original side and takes the cabbage across the river.
- Finally, the farmer returns alone and takes the goat across the river.
By following these steps, the farmer successfully ensures that none of the items are eaten during the crossing. The solution does not require any outside knowledge, and only the logical implications from the information provided in the screenshot are used."
I approve of its answer.
Exactly. It is operating just like a human in this case. We extrapolate. I really didn't think Sabine would make this mistake.
@@jmspiers You instructing it not to reference the classic puzzle does not work. It cannot forget about references any more than a human can.
I *really* hate that google has put LLM AI as their top search result for questions. It's just plain irresponsible. I *know* that ai isn't trustworthy, and it's *still* hard for me to disregard the answer, in black-and-white, that shows up at the top of the search and directly answers my question (vs other results that I often have to click through and do some sleuthing work before I can actually find the answer to my question). It's truly awful, and much, much more awful for all the people out there that don't know about the limitations of AI.
Crap like that almost makes me go to alternative search engines, including the one from a currently very unpopular European country.
Google search was already unreliable, plus it was censored.
At least now the Google search AI answers are based on Google's (bad) search algorithm, plus arbitrary censorship rules.
One day they'll fall because of their own doing.
6:34 The mind *may* be a single thing, but not the brain. It is composed of a multitude of constituent components working in concert. Why would you think the answer to artificial intelligence would be one model or neural net, instead of a multitude of less effective but highly specialized low-power AIs all working together to make the whole more than the sum of its parts?
Hmm, good point.
Search Mixture of Experts (MoE)
A hundred fools will never be equal to a single genius. Not even a billion fools. The idea is that the fools take on the simple tasks and the genius gets the difficult ones. Dense models are more intelligent and better for complexity. MoE is usually better for a lot of other reasons.
While there might be some level of truth to that in an abstract sense.. a LLM would not make any sense as one of the components, because the only thing that could interact meaningfully with a LLM would be something that already knows how to interpret and generate plain text (and if it could already do that then there would be no reason for it to need a LLM).
I think that it's likely that a better AI would have multiple components, but I think it would mostly be using multiple components so that it can do things like having one part of the AI that learns while it's in use and a different part that does slower learning that mostly runs in the background and tries to optimize the things that the other processes come up with - I don't think it would make any sense for the core logical thinking of the AI to be split up into multiple components though because that sounds grossly inefficient.
@@juandavidavendanorojas6263 Thanks. I am familiar. That is in part what inspired my OP. Recently MoEs reached ~1M experts. Imagine what that number might be in just 1 or 2 years. Its fascinating.
Thank you for all your great videos Sabine, you have become my newest daily watch!
The wolf, goat and cabbage puzzle is the best and simplest example of how LLMs work. Imagine trusting one to write somewhat complex code 😅.
No need to imagine, there is no uncertainty, it gets it wrong. The only time it gets output right is when it copy-pastes human written code.
@@BojanPeric-kq9et AND it happens to be the code that actually does what you asked.
@@MeiNeam glorified search engine.
LLMs are essentially a computerised version of the game where people take turns adding words to a sentence.
The best they can do is come up with plausible content that is based on their training set.
There is nothing equivalent to intelligence behind it.
The problem is that for many people, Turing’s imitation game is the bar for detecting intelligence.
In reality, the imitation game just illustrates how easily people can be tricked.
"The imitation game just illustrates how easily people can be tricked"
People only started saying that once AI passed the test.
@@eadweard. A hypothesis was tested, proven wrong, and has been abandoned. That's how science works.
In addition to showing how easy it is to trick people, another thing I've learned from LLMs is that despite the fact that people's use of language seems so creative (if not haphazard), it's actually pretty predictable most of the time.
@@JoJofghj What hypothesis has been proven wrong, specifically?
It's exactly the same as intelligence. You come up with your plausible thoughts in the same way: based on your training set.
The density of TRUTH and CLARITY of your video is astounding.
Don't worry about NVIDIA stock. Because they need to retrain using logic, they will need even more NVIDIA chips. NVIDIA earnings will be ludicrous and the stock will again go up.
NVidia fell because China has got ahead on a key advance related to AI: the memristor. Look it up.
@@LuisAldamiz TY...I will do that
@@LuisAldamiz China might be ahead on memristors, however memristors will not see the market for a decade. Nvidia is failing because China just isn't buying their chips anymore, because companies don't trust that the US government will not sanction them. So they would rather buy a Chinese AI chip that is only 20% slower, just for the certainty that they will always be able to get replacement chips when they need them.
@@lagrangewei - Maybe, and also because the AI bubble is bursting and Nvidia has been riding that wave (GPUs are key to AI training and crypto mining).
As a musician this really "rings" home. For the longest time, "simulating" a piano meant recording every note from a certain piano and then having some software apply a bit of trickery to make it sound "natural" when played; cheaper ones would sample every 5 notes and fill in the rest by pitch bending (stretching or shrinking) the note samples. These days, some virtual instruments simulate the actual piano: how the hammer hits the strings, where the mic would be placed, how hard the string is hit; the same goes for wind, string, and brass instruments. And like LLMs, the sample version could only play back a note, while the simulated instruments can capture a light touch, breathing softly, all the nuances that give music "soul".
It seems the best way to “simulate” something isn’t to just stack a bunch of its products (bunch of words, math, audio, video, etc) but to figure out how the underlying system works and simulate it, the ear, the eyes, the string, the piano, the body.
An interesting approach I saw Nvidia try was in ragdoll physics and robotics. Where they give a virtual robot a scaffold and rig (limbs, joints, etc) and then let it learn how to use them over and over again. Another example was an AI learning to play Mario. I feel that technique applied to other modes could work, or could be what Sabine was referring to.
Simulation vs emulation
@@tempname8263 ah thank you !
It's called "physical modeling". There have been algorithms around for decades that can simulate the sound of a plucked string, for example. Physical modeling of a piano, e.g., Pianoteq, is much more complex but is now so perfected that manufacturers such a Steinway are licensing the use of their name to the physical models of their pianos.
@@avsystem3142 Pianoteq's tech is amazing; I didn't want to talk specifics to avoid adding noise to the point. A point worth raising is that physical modelling has a very small file size, since you don't need any "references" to the real system, nor a small country's worth of energy to "train". I do think this approach to AGI would go a long way. If I could wave a wand, I would follow Pianoteq's approach: first conduct studies to understand the factors and parameters that govern a system, then design models that can approximate those systems.
My background is in psychology and data engineering, and I feel like we tend to overestimate how complex the brain is in applications that model it. Like Pianoteq, if we avoid attacking the problem at its most atomic (i.e. neurons) but instead approximate the systems they produce, like the lobes (frontal, occipital, temporal, etc.), couldn't we approximate the functions of the brain and set loose an algorithm with a target? The lobe interactions and their systems are complex, but their functions are well known, and we don't need to model them in the same way. I am sure greater minds have thought about this problem and tried to design models for it, but it leaves me wondering.
As an aside, I am lucky enough to own an Osmose and tinker with the awful Egan Matrix, UI aside, the models it has are impressive and what it can do does seem like the future of synthesizer hardware. At least Arturia thinks so with their latest device.
Thank you sabine for speaking the truth and popping the bubbles of clueless corporate executives around the world.
LLMs would probably work much better if their creators removed Internet comment boards from the training materials.
really? They include these boards in they training?
@@persimmontea6383 Reddit in specific, although the undergrad essays used to train them are not better, they are written by the same people after all.
@@persimmontea6383 Not all of them but some do, yes.
Feel free to train one and try it!
I think that two issues should be separated: one is the usefulness of AI and the other is its profitability. There is no doubt that AI can be (and is) useful on many levels. In my work, I use AI to create graphic elements, scripts to improve work, texts, descriptions and even to record speech for video tutorials. Yes, it helps me earning money to some extent. Don't know for how long though - maybe my work will become obsolete sooner than I think.
However, the question of the profitability of AI at the corporate level is debatable - because the largest AI models are offered below the profitability threshold. This is a war of attrition and sooner or later the big guys will have to face exceeding the level of acceptable losses. It is not better for small companies, because we already have several thousand AI startups and their prices only make sense if they are used on a mass scale. But how many services do we need to improve text or generate images? And with such enormous competition, how many of them will be left in a year or two? However, the AI revolution has already started and is already moving through societies and the capitalist system like an icebreaker. For now, we hear single cracks, but it is coming. And humans and societies are not prepared for that.
I love learning about AI and this was communicated at exactly the level I can grasp so thank you :)
I think Sabine is wrong here. Not about models requiring logic; it's that language is the result of abstraction, and abstraction is a very valid tool for reasoning. A non-precise one, but a fast one. Pretty good for shortcuts and 'good enough' results in certain areas.
Of course, it's not a silver bullet. That's the reason why the human brain does different kinds of evaluation, and future AI algorithms will need to do the same.
But I think Sabine is discounting the value of LLMs too much. LLMs work in the area of pure relationships and abstraction, and that has value. It's just that it doesn't work as well in every area.
The human brain uses LLM-type thinking a lot, and that's the reason why so many people fall for fallacies which are 'good enough' from relational thinking but a failure from logical thinking.
It's thinking-fast vs thinking-right strategies.
I guess the future will be more about 'let the abstraction layer find paths to a solution and the logical layer drop the wrong answers/paths of resolution'.
Nah
she's a physicist, not an expert on AI
LLMs haven't reached anywhere near their pinnacle either; many focus on the model's worldly intelligence instead of its 'awareness' of itself (or rather of its trained methods). A common problem with LLMs is multiplication, because they usually end up attempting to guess the answer first before solving (which they then attempt to justify), but LLMs are actually capable of working out the solution by breaking the problem down. An area of improvement I want to see is LLMs knowing how to best utilize their context window, to know the optimal method to perform whatever it is they're doing. It doesn't help that some of our written methods are actually visually aided, like long division seemingly putting the answer before any other character, but I think we can get there.
Here's an example I like to share, simply ask whatever GPT you want "What is the product of 45694 and 9866", it will straight up attempt a guess and answer incorrectly. But when it works out the question in a way that is conducive to its linear tokenization where one token comes after the other, it can very solidly get the answer. I wouldn't expect a math genius to guess the answer correctly, I'd expect them to work it out very quickly in whatever way they feel is best.
chatgpt.com/share/b4ed2219-6699-42e4-bb90-af0f88cd9faf
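For reference, this is the kind of place-value decomposition the comment is describing, i.e. the working-out that fits token-by-token generation much better than a one-shot guess (plain Python used only to show the arithmetic):

```python
a, b = 45694, 9866

# Break the second factor into place values and sum the partial products.
partials = {
    "45694 * 9000": 45694 * 9000,  # 411,246,000
    "45694 * 800":  45694 * 800,   #  36,555,200
    "45694 * 60":   45694 * 60,    #   2,741,640
    "45694 * 6":    45694 * 6,     #     274,164
}
total = sum(partials.values())
print(total)                       # 450,817,004
assert total == a * b              # matches the exact product
```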
LLMs also have very expansive uses beyond themselves; there are already papers on LLMs being used to create conditions to train other AIs, not quite autonomously but very useful for creating these parameters.
It annoys me how quick people are to put down LLMs; there's so much room for improvement and they are already this good? Let alone whatever the hell OpenAI is doing by broadening GPT-4's horizons into GPT-4o; granted it's not an increase in intelligence, but it actually has some absolutely insane capabilities that they aren't letting us have access to yet for very clear reasons. Like it can copy a user's voice back at them, or it can copy a user's face and turn it into a caricature or photoshop with it.
Yep, exactly. Sounds like her research comes from a lot of anti-AI people who don't understand it either.
I think that the correct name is confabulation instead of hallucination.
I agree. But this just underlines the issue that LLM's will struggle with truthiness when hallucination works fine for everyone despite being inaccurate
I'm reminded of bottom-up models for speech recognition that used erroneous concepts. For example, the assumption was that the bottom-level unit of sound for speech was the phoneme. But it never worked. I believe training on spoken syllable types is what actually worked.
I think the most important bits of math AI needs to at least adhere to is the math of modern causal inference. But even then, the "assumptions are in the arrows not drawn" thingy is tricky enough for humans, so it's unlikely that AI could get that bit less wrong than humans do, especially because it's fundamentally trained on human errors. It would be interesting to speculate about what would happen if computing power ever turns out strong enough for AI to reduce the amount of causal assumptions in causal graphs to a point way beyond the point impossible for the human brain. Could it find "our" scientific causal blunders, maybe decades or even centuries old ones?
Logic and deductive reasoning?
The big question for me is what the ultimate goal of AI is. What do we want to achieve in the end? If we model our brain 1:1, then couldn't we just use our brain directly?
No, they want a slave with the power of a god.
What? It's the answer to the ultimate question.
why don't jets have flapping wings?
Why are there no metallic water faucets on the shores of rivers or trees with steel girders?
I had a similar example to the riddle you mention, chatgpt at least gets it correct now and I wouldn’t be surprised if the reason why it gets it right now is that my conversations about this riddle became part of the training set for newer models.
The bubble is in the term "AI" It's kind of like when they labeled everything that came out as "2.0" like Web 2.0. AI has become a nonsense term that marketing departments are slapping on everything they have for sale. The vast majority of the products I see offered as "AI" these days (and yes, I work in the industry and see these products regularly) aren't even remotely anything I'd call "AI" they're just the same old manually created automation that they figured out some silly way they could just barely call "AI" The actual AI products out there are actually doing important new things. But the market is flooded with absolute junk pretending to be "AI" that just aren't.
I don't think the accuracy of the terminology is relevant at all. The only thing that matters is that the technology was marketed successfully and associated with a buzzword. The buzzword could have been anything. "AI" was just the lowest hanging fruit. We have a very recent example where "cryptocurrency" and "blockchain" worked as buzzwords, so it clearly didn't have to be something reasonable and approachable to the mainstream.
The Bubble is the same as with the internet. Investors are impatient and don't understand that a new technology needs time to mature, see that they could've gotten better returns elsewhere and sell. Which triggers a chain reaction and then at a point later when the technology is mature, it gets wide adoption.
It's been that way for many years
Yep, a friend was talking to a company that was marketing an AI to credit scoring departments. He was interested and got them in to demo. But on probing, he realised that it was just the same algorithms credit departments have already used very well for decades. The marketing person was shocked, as they hadn't realised that the product was just doing the same thing but worse, because the model was completely closed off from interrogation. He told them that no one in the industry would have a use for it. These companies are marketing things they do not understand, due to misuse of terminology and the AI bandwagon.
The term "AI" has been overhyped since its earliest conception. I remember that in the 1980s any algorithm based decision tool was confidently labeled such. I think the best solution is ti learn to interpret the I not as Intelleigence but as Id .iot. We can't make Artificial Intelligences, and probably never will, but when someone monitors them right Arificial Id .iots can actually do a lot of work for you.
The thing about LLMs is that they're just huge dimensional arrays of numbers, that relate the probability of a number coming after another number. They don't know words.
... so what then is "knowing" a word? You're on to an important thread, which ends with my assertion that for an AI to be what most people think of as "general" - behaving like a human - it needs to be embodied with a similar full sensorium of experiences to anchor those words to.
@@starkraft2506 I agree with you. In order for words to have meaning, something has to have meaning in an emphatic sense for the system that uses these words. But then again, words also have meaning in relation to each other, and it looks like this latter sense of meaning is enough to handle language quite well.
@marcobormann67 it can mimic quite well, for sure. What I don't think it will do is generate meaningfully novel language. It may create novel sentences, but it will never be able to use isomorphisms to create new use cases - for example how "rough" can mean a texture (which is uncomfortable), or an experience which is uncomfortable, or as an antonym to gentle. There's a thread of meaning between those uses which isn't apparent based on the depth of information available in words alone.
Is it a reverse premise. Use math for words then words for math. Since it all is wave form to begin with then is there a third wave problem
To quote from the bobble head on your desk: "Any fool can know. The point is to understand."
The root cause of this is that LLMs (and GenAI more generally) are still only stochastic models. They don't have any _understanding_ of what they "say" or "talk about". They just generate data based on conditional probability inferred from the training samples, they don't know what they're doing. That's why generated images of technical contraptions may look OK at first glance but when you start to think how they would work, it stops making any sense. And why the fingers are all mangled, yes (that in particular is because fingers have so much variability that the generator infers that they may do "anything")---again, the model doesn't have any idea how they work and what they do. No idea, literally.
Correct. There's no evidence that these architectures are capable of intelligent thought, regardless of the amount of data you feed them. They will remain great assistants, which is a valuable thing in itself, but a far cry from the church of AI predictions of AGI in N months, where N is a number between 1 and infinity and always changing.
I mean to be frank, people have been calling the "computer" ai since at least pac-man. Them asses adopt words and definitions and ignored the programmers, ever since ENIAC.
@@SaanMigwell Well, they kinda can, because there isn’t any constructive definition of intelligence, natural or artificial. So you can stick the label willy-nilly on almost anything, if you’re bold enough…
And please explain how the human brain understands? I thought we produce stochastic models of the world.
@@TheReferrer72 Thinking that thought and intelligence is based on stochastic models doesn't help much, does it? The point is that we don't know.
Potential bad news about your Google stocks: The US has found that it meets conditions to potentially require being broken up into smaller companies for monopoly/antitrust reasons. This may well affect the stock price, if nothing else
Probably would be left with stocks in the valuable pieces, not the end of the world. Too OP had to be broken up
This is sure to negatively affect the trout population.
Fifteen years late
gubmit trying to shutdown a monopoly in this day and age? Bullish.
Won't happen. Bet.
Artificial intelligence = a technology that makes machines capable of doing tasks that usually need human intelligence. I feel like people don't grasp that concept. It's A.I., not artificial consciousness.
Diversifying your portfolio is the key. Invest both in those who work towards the annihilation of humankind to be long and painful (Google et al.) or fast and (relatively) painless (Raytheon et al.) - and while you won't become super rich, you'll still have a comfortable retirement. And hopefully won't see the final stages of said annihilation happen before you expire. And, of course, Nvidia swings both ways, so I still hope my Nvidia shares will recover, too 😎
hilariously enough, running my own offline LLM for giggles, I've noticed that a lot of hallucinations come down to 3 things, 1) whether the instruction set, i.e. the foundation for which the model utilizes its data set to respond is configured with considerations for brevity or tonality. 2) the amount of input that is provided, as the models tend to be set up to respond in a way that will satisfy any outcome, so if you just say hi!, in most cases it will respond with a paragraph. Lastly 3) the amount of vram and context allocated to the model - it seems that while lower vram lower context models will answer quicker, it's a cost benefit ratio, because it will fill the missing output space with whatever it wants, and in the case of models that are being utilized with a log to reference for said context, it will decrease in performance exponentially until it eventually breaks under the weight of every "thought", resulting in gibberish, hallucinations, and schizophrenia or the ai equivalent.
@barvin9271 what does schizophrenia look like in AI?
@@undozan4180 I suspect they meant DID. Schizophrenia doesn't really make sense as analogy for any observations of outputs of LLMs you could have, as far as I'm concerned.
I think the inherent problem is more that LLMs just dont "understand" language, at its core they just memorize and 'network' word salad from the internet.
Thats why "hallicinating" is kinda misleading, it implies there is a state of "non-hallucination". Yet in reality there is no right or wrong for an LLM.
I use LLMs to find correlation in abstract concepts and financial data. Even after providing the data or pointing to where it can be retrieved I still have problems, from ignoring said data and just making numbers up on the spot which "could sound" about right to just not following rules, parameters and specifications.
Lately I've realised there is an increasing number of times where the result is: bla bla bla, this is where you can find out about it {link}, even after I had asked the web tool to retrieve the information, and only upon being asked again does it reluctantly do so 😂😂😂
And the number of times I've had to scroll up to copy-paste, repeat or remind GPT about a rule or requirement that I had previously elaborated on... It borders on having conversations with a cognitively impaired person.
Thanks, Sabine. Very illuminating!
I gave a puzzle to the Microsoft Copilot AI. It was about finding a treasure on a small pirate island. I accidentally abbreviated one of the locations as 'u. w. rocks' and it immediately spelled it out correctly as "Underwater Rocks". But then it wasn't able to find the treasure by combining a shadow in the evening pointing eastwards with cardinal directions on a map and therefore began to assume the treasure must teleport constantly around the island...
Sounds like Microsoft Copilot was designed to help science-fiction Netflix scriptwriters with writer's block...
@@anastasiachristakos2480 Oh yes, it actually does sound like J. J. Abrams' script for the show Lost served as training data. But it's also LLMs in a nutshell: it's capable of correctly predicting what u. w. must mean in an island context, which is fascinating, but it would completely fail at "activating" a pencil in real life, haha.
"You can't just train LLMs on more and more text and images and hope it will begin to understand what's going on. No one expected that".
Actually, you did :)
Happy you are not on that train anymore
Yeah I was surprised to see this turn around considering some of her videos from a few months ago implied AGI was around the corner with just a little more scaling lol
She's scared of global warming now too. Apparently something has become scarier recently about what the climate experts and ecologists said in 1990.
Imagine a naïve, _tabula rasa_ human being trained by reading everything on the internet. By some estimates, 80% of the internet is false information. The contradictions alone would lead to the looney bin. And AI feeds its hallucinations back to the internet to be scraped by other AI's.
@@RawrxDev She also tried to argue that AI "understands" what it's generating--partly by torturing the definition of "understand", but still. Ridiculous.
I love the subtle humor (sometimes irony and sarcasm, sometimes more direct,) of an A.I. construct voicing concerns over the human societal and creative impacts of A.I. investment stocks trending upward or downward.
The frame problem is going to put a damper on your idea of building physics into AI models. For example: if I am holding a red ball and I walk outside, what colour is the ball I hold outside? An LLM can solve this problem rather trivially by linking the words together and recognizing that it is the same ball. Try the problem over again with physics: now you're simulating the room and the outdoors as well as the process of walking between them just to keep track of the ball's colour? At what level of detail are you simulating the world? What if the lighting in the room is of a different spectrum than the sunlight outdoors so that the ball actually does change colour? Physics is just too fine of a scale for answering most of these trivial sorts of questions unless you can somehow build enormous supercomputers to handle all the calculations
Physicists are familiar with the idea that you shouldn't use quantum field theory to study a ball rolling down an inclined plane, that in celestial mechanics you should model the planets as points, in hydrodynamics of the world's oceans you can neglect relativistic effects, and so on. There's different-scale physical models for different-scale problems and it's just not true that "physics is just too fine [of] a scale".
And then essentially you are building a complete virtual universe.
@@AllenKnutson Yes, and this has been proposed as a solution, called "multi-scalar models." The problem is that no one knows how to build these things. Deciding at which scale to model a particular problem is a difficult problem in itself which requires intelligence and experience to solve, and it's quite easy to get it wrong. Imagine asking the AI what should happen when dropping various objects from the leaning tower of Pisa: a lead ball, a piece of paper, a live pigeon and a dead pigeon. The assumption of zero drag works just fine for the lead ball and the dead pigeon but not so much for the piece of paper and not at all for the live pigeon
@chongli297 - The frame problem is more pervasive. Don't LLMs use a language model (i.e., what word is most likely to follow the preceding word(s)), not an object constancy model (such as, when I walk unless I say differently, I am still holding whatever I was holding)? I'm not sure what you mean by "rather trivially by linking the words together and recognizing that it is the same ball". How would that work in the algorithms? Recognizing that two things are the same is not a trivial feat for an algorithm. Or do you tell the LLM that it's the same ball? In that case, it isn't the LLM that's recognizing it.
@@openhorizon1162 I was already operating from the assumption that LLMs fail to solve a lot of problems correctly because they hallucinate nonsensical things. This is the main thesis of Sabine's video. My point in raising the frame problem is that taking a physics-based approach to modelling these problems turns out to make it very difficult to solve ones that are already trivial for an LLM. As for how LLMs know it is "the same ball", they don't actually know anything about balls at all. They merely link the word ball with the word red because of their proximity in the text. Everyday English writing has a natural ambiguity problem with the use of anaphora, though most English speakers will easily connect "the ball" to the "red ball" from earlier in the text. LLMs accomplish this using something called attention and a context window (look these terms up for more info).
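For anyone who wants to see what "attention over a context window" actually computes, here is a minimal single-head, scaled dot-product attention sketch with toy vectors (the random rows standing in for token embeddings are purely illustrative):

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Each query token mixes the value vectors of all tokens in the context,
    weighted by query-key similarity (a softmax over dot products)."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                    # pairwise token similarities
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)     # softmax over the context window
    return weights @ V, weights

rng = np.random.default_rng(0)
X = rng.normal(size=(3, 4))   # toy 4-d embeddings for a 3-token context, e.g. "the red ball"
output, attn = scaled_dot_product_attention(X, X, X)
print(attn.round(2))          # how strongly each token attends to every other token
```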
This puzzle is a classic! The key is to avoid leaving the deer with the wolf (which would result in the deer being eaten) or the deer with the cabbage (which would be eaten by the deer). Here’s the solution:
1. **Take the deer across the bridge.** (Now the deer is on the other side, and the wolf and cabbage are on the original side.)
2. **Go back alone to the original side.**
3. **Take the wolf across the bridge.** (Now the wolf and deer are on the other side, and the cabbage is on the original side.)
4. **Bring the deer back to the original side.** (Now the wolf is alone on the other side, and the deer and cabbage are on the original side.)
5. **Take the cabbage across the bridge.** (Now the wolf and cabbage are on the other side, and the deer is on the original side.)
6. **Go back alone to the original side.**
7. **Finally, take the deer across the bridge again.**
Now, all three-the wolf, the deer, and the cabbage-are safely across the bridge in 7 steps.
Ay they're getting smarter
Good luck not being torn to pieces by the wolf... or the deer even. The cabbage is safe, I presume.
You forgot to mention that there is no eating on the side where the boatsman supervises. Wolves can attack people so I suggest a dog, rabbit and cabbage instead.
Yes, because that sort of puzzle any competent adult or even teen could get with some thought and a little real life problem solving experience (like a middling or better education experience).
But throwing more TEXT at LLM's is NOT going to make them understand fundamental things about the world, re logic, experience, etc.
@@bunzinthesun: Or common sense precautions are taken, like shackling the wolf. You're not eight, right? You get the PRINCIPLE here, right?
I am thinking through multiple solutions, but I could be wrong. I don't care enough to write it out.
The more i watch your videos, the more i appreciate the humor
To me it seems that LLMs are big averaging machines, while most things in life have rules. If you want to create a human, there are rules for the length of the legs compared to the rest of the body, there are 5 fingers, the legs can't be twice as long as the arms. Once you have a skeleton, everything else has rules: where the muscles go, the fat, the organs and skin. Maybe LLMs represent the "fast" brain where you make quick judgements, and when you get it wrong you actually have to think about it, which requires some other type of brain.
Children learn differently: you only have to show them a couple of cat pictures before they know what a cat is, whereas you have to show an LLM a billion cat pictures and it will still get it wrong too often.
So yes, we will create an AI as smart as humans; it's just not going to be LLMs.
And it's kind of weird that it cannot do maths. I know why; it just looks to me like there should be an IF statement somewhere: "IF input.is_maths() USE calculator END IF".
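That IF statement is roughly what tool routing does in practice; here is a minimal sketch of the idea (the crude regex router and the placeholder `llm` callable are assumptions for illustration, not how any particular product implements it):

```python
import re

def looks_like_arithmetic(text):
    # Crude router: accept only digits, whitespace and basic arithmetic symbols.
    return bool(re.fullmatch(r"[\d\s.+\-*/()]+", text.strip()))

def answer(text, llm=lambda t: f"(LLM free-text answer to: {t})"):
    """Route arithmetic to an exact evaluator, everything else to the language model."""
    if looks_like_arithmetic(text):
        # eval() is tolerable here only because the regex restricts input to arithmetic.
        return str(eval(text))
    return llm(text)

print(answer("45694 * 9866"))          # exact: 450817004
print(answer("Why is the sky blue?"))  # falls through to the language model
```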
Yes, neurosymbolic is amazing. We are doing a project to apply this to healthcare. But no, that won't solve the problem of hallucinations. The hallucination problem is more about the model trying to come up with explanations when it doesn't know the answer or doesn't understand the stupid human asking wrongly 😅. We need to teach the LLMs to say "I don't understand you" or "I don't have this information". That is what is happening.
Exactly. They are being trained to keep a flow going rather than to halt the conversation and admit a lack of knowledge. If they are trained to always act like they know, then of course that's what they'll do.
Humans hallucinate a lot too, probably more lol. Open the comments of a news article posted on youtube feed. 99% never click and read the article, yet hallucinate the contents based on their biases then endlessly argue back and forth about it 😂
How would the AI know when it doesn't know the answer though? The problem is that sometimes extrapolation based on what it already knows works and sometimes it doesn't. So, it would need to know when it can extrapolate and when it can't.
But can an LLM even know its limitations? Would it know that it doesn't know?
Problem is people talking about an algorithm like it is a digital being.
Very impressive! Thanks Sabine!
It is the same tactic used for VR real estate. Remember people purchasing stores and condos in a VR environment a couple of years ago, trying to make it a boom? Remember people buying a house next to a celebrity inside VR for thousands of dollars?
There is a saying " a sucker is born every second".
AI is only an advanced type of analytics based on functions over data. Yeah, it has potential, but it is far from what the majority of people "think" it is.
I've read many comments that say "The world is not words but it isn't math either". True, but this is not the question. The problem to solve is how to model the world. Sabine is saying that Word Models are inherently ambiguous and so less accurate than Math Models. Interestingly recursive, a Model is a mathematical construct that resembles a real-world situation, extracting the situation's measurable properties and tying them together in a mathematical system. Sabine advocates the use of mathematical modeling of world-physical systems rather than using Word Models to achieve AGI. In my opinion this is entirely correct, and the current path towards ginormous LLMs is a waste of resources.
Ah, but just wait. We'll get maths-driven models that give physically impossible answers.
Only someone as out of touch with reality as Sabine would ignore the huge improvements in LLMs/AI in just the last two years.
What Sabine fails to realize is that the approach has been tried. If it was easy, it would have been done. Turns out that language is easier than math, though, which is why it's happening faster.
OpenAI has reportedly made great progress with math. GPT4o in particular can actually do a lot of math now. However, as recently as a month or two ago it was still compounding interest incorrectly.
That said, it's a mistake to think of ChatGPT (at least ChatGPT 4o) as being "just" an LLM. It's a lot more than that, now. Exactly how much more is unknown since it's closed source, but its ability to understand the world around it through video, to understand tone of voice, etc, goes well beyond an LLM.
Too harsh. AI will need great LLMs, Logic way faster than now, and a way to sense the linguistically described object in whatever way it is linguistically described. That problem hasn't been addressed yet, but will have to be before the usual math can be applied to interpret language samples.
Mathematical modelling of even fairly simple systems is beyond both humans and mathematics. We have reached the limits of possibility already in many everyday systems.
I think a good way to describe this issue is that LLMs are trying to generate output that looks like the training data, but they have no way to check whether the output is technically accurate and correct.
It is terrific to see commentary about AI being a cognitive project rather than a linear language project. AI will come of age when it can assess situations concurrently across multiple knowledge domains and produce outputs that are logically coherent across those domains
Imagine how much progress that will unlock!
Um, that's what they are doing right now. LLMs are trained on all knowledge domains, so you just have to ask the right question.
@@andrasbiro3007 please watch the video again and read your reply
@@billbond2682 Actually, you guys should watch videos from Ilya Sutskever, the genius behind OpenAI who left. He says our language contains a model of the world and its inhabitants that translates into a real model existing outside of language. And that was discussed before multimodal models added images, sounds and video as new data to the models.
@@SabineHossenfelder ... but you did not even mention Douglas Lenat, who has been explaining for decades that AI must use Logic to understand Structure, Function, and Meaning.
I think a literature review may be in order (sorry if that's a bit harsh). A lot of your premise involving how they are trained is wrong (also Gary Marcus is not a computer scientist).
Now this is more on the world-modelling aspect of LLMs, but a cool paper suggestion I have would be "Mapping the Mind of a Large Language Model", a mechanistic interpretability paper from Anthropic.
2:48 I don't doubt that the model has been trained on examples of this very riddle, and is therefore inclined to output the same answer. However, the argument that the same answer is not logical unless the order of eating is explicitly defined in the riddle is absurd. If you ask a five-year-old what does a wolf eat and what does a goat eat (implicit questions a reader will ask themselves upon reading the riddle, regardless of age), they know the answer is "other animals" and "plants". From that simple deduction, you get the same answer to the riddle as it is defined in the test. Using this a test for the LLM is not sufficient and reveals very little toward the "intelligence" of these systems. For all you know, the LLM reasoned that wolves eat other animals and goats eat plants, just like a five-year-old.
Don’t know why other people aren’t pointing this out. I was very confused how the answer given was wrong. GPT even explained the reasoning of why which item/animal would be eaten at the end. The answer was correct given the question.
If you ask a five-year-old what a wolf eats, they would probably say that the wolf would eat the farmer. It is well known that wolves can attack humans. Yet the LLM assumes that this is not possible. Why? Because it knows the other version of the riddle, which doesn't have that assumption.
@@handebrin5624 But your example would hold true even with the original form of the riddle. So, in your terms, all this test suggests is that the LLM is making the same assumptions as a normal person would OR that it is just regurgitating information. Which means, the test is pointless. I agree that the LLM is PROBABLY doing what you and Sabine are suggesting, but this particular test doesn't provide any meaningful insight.
I also think this was a poor example, and I think it's MORE impressive that it can still give the correct answer even if you don't mention some details which would be obvious to a human.
@@Mr.Proghead I think the opposite: the LLM did not use deduction or reasoning to "find" the correct answer, as that would assume it understood the problem. It just "fell back" on the very common riddle that appeared a lot in its dataset.
Oh no... the world is not made of mathematics either. Mathematics is a formal language that is used to describe and predict the behaviour of the reality we perceive. The only difference from informal language is that Mathematics does not generally have ambiguities (although it actually does, but that's another story). We could say that mathematics is a language with as little ambiguity as possible. But the universe does not give a damn about our mathematics.
It is much more robust to say the most effective models we have for reality rely on mathematics.
It may sound nit-picky, but preciseness of language is central to the discussion here...
Quite right. The fitness of the world to be described by mathematics is only to the degree that 'countable things persist'. In the invented worlds described only by language (and the reader's imagination), no one is doing any accounting whatsoever. So the main use of math becomes the kinds of things LLMs are already doing, ie. dot products on word vectors to assess adjacency, etc.
Well, yes... But that's what Sabine meant.
@@kilroy1964 The road to hell is paved with well meaning.
Finally, someone with common sense. Thank you very much for this video. Tech SMEs need to understand the difference.
I love eating Berliners, btw.
You eat people from Berlin??
CANNIBAL
They are called Krapfen!
A lot of hallucinations happen because models produce an incorrect word/token and then build on that incorrect information and cannot admit they were wrong. I've seen AI write an incorrect answer, then explain its reasoning (finding the correct answer in the process), only to conclude by repeating the WRONG answer it made up at the start. If someone figures out how to make them use backspace, that alone will be a huge improvement.
So you're saying hallucinations happen because of hallucinations? Hmm.
It's easy to do with the right prompting. It's a whole field of research called "prompt engineering".
The problem is that (referring to GPTs) they're using autoregression: they feed the previous output back in as new input for generating the next token.
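For anyone curious what that loop actually looks like, here's a minimal sketch of greedy autoregressive decoding (using the Hugging Face transformers library and GPT-2 purely as an illustrative choice, not whatever ChatGPT runs): every predicted token gets appended to the prompt and fed straight back in, so a wrong early token becomes part of the context for everything after it.

# Minimal sketch of autoregressive decoding: the model's own output
# becomes the input for the next step, so early errors propagate.
# Assumes: pip install torch transformers; GPT-2 is just an example model.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

input_ids = tokenizer("The wolf, the goat and the cabbage", return_tensors="pt").input_ids

with torch.no_grad():
    for _ in range(20):
        logits = model(input_ids).logits           # scores for every position
        next_id = logits[:, -1, :].argmax(dim=-1)  # greedy pick at the last position
        # Append the prediction to the prompt: this is the autoregression step.
        input_ids = torch.cat([input_ids, next_id.unsqueeze(-1)], dim=-1)

print(tokenizer.decode(input_ids[0]))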
I don't think backspace is necessary; simply fine-tune the model so that the moment it realizes it messed up, it apologizes and redirects the conversation to what it now thinks is correct. The problem is that in the training data it's very rare for an argument to suddenly change in the middle, so the AI learns to be very self-consistent / stubborn. Ideally the training data should only be about people talking or thinking to themselves, because that's what we want it to do. But since that's not feasible, we end up with unwanted behaviors that should be removed.
It's already a thing. A huge thing is that people don't even know much about AI except what they see on the surface (AKA reading article headlines). Not insulting you but just pointing out that it's a bit hard to see behind the scenes unless you're reading papers or are just in the loop. Past that, it seems that people think anyone researching AI or any form of it is just like "Welp, we tried. AI is over cause LLMs struggle with correcting themselves." Not only that, but as they find more and more techniques and solutions, no doubt they're putting them to work ASAP.
the perfect explanation. Self-attention is never the answer to abstract reasoning, intuition and sentience...
The ungodly amount of money spent on LLMs would have solved a lot of the world's other problems. Do we really need a system that can type our emails for us?
I work in IT, and I still wonder what better uses we have for these server farms than making deepfake nudes and C+ essays.
They don't invest billions in AI systems so they can type our emails for us. They hope to create an AGI, or even a superintelligence, that would one day reach human-level intellect and then surpass it by far. Such a system could solve open mathematical problems, tackle the climate crisis, and invent new medicines, drugs, and technology, but also plenty of horrible new military weapons, so we can become even more efficient at killing each other. In short, there's a lot of potential for both good and bad, as with every new technology. But with AI, the possibilities are on a whole new level compared to most technologies. It will probably become the defining technology of the 21st century, maybe of the whole millennium.
LLMs are a good stepping stone forward, and there's absolutely reason to feel hyped by new tech. But the big problem is the economy's hype.
The current landscape is as if we were back in the 50s, when computers already existed and could technically perform any computation, but realistically it was impossible to run modern applications on them, and it was a long, long way until we got efficient devices we can stick in our pockets instead of filling a building.
I was a product manager for AI and other advanced technologies at GE Software and Oracle, and I make videos explaining how AI works under the covers, including LLMs. Hallucinations are intrinsically unavoidable in LLMs because of how they're designed and built. They're basically guided statistical pattern matching. But overall they're "good enough" for most of the cases they're deployed in. I'm not a huge fan of them, but they mostly do the job and they're going to get better. Still, unless their architecture and design are fundamentally changed, hallucinations are literally unavoidable.
If you provide AI with incomplete information, it will almost always give you a bad response.
If you provide Human with incomplete information, it will almost always give you a bad response.
You typically provide both AI and Human with bad information.
Management tells Human to stop asking so many clarifying questions because you find it annoying.
No improvement occurs
I wish people would quit calling it AI. AI doesn't exist. Not for a while.
@@runnergo1398 You are already a couple of years behind.
Yes; I thought we already knew this from the GIGO coding world?
Do we use Hammers to measure the length of our drywall cuts?
Do we use Screw Drivers to find studs in the walls?
Do we use Anti-virus software to install our USB headsets?
Do we use duct tape to drive nails into wood?
Do we use funnels to swap out spark plugs?
Do we use knives to scoop our soup?
WHY DO WE USE AN LLM AS A SEARCH ENGINE/ENCYCLOPEDIA!?!?!?!?!?!
You may be surprised at how many of those are actually done. :p
@@ASDeckard Yes! Hammers, screwdrivers and knives are all used in that way sometimes.
The top ones come down to "Because WE are universal tool users". You can attempt to use any tool for any purpose, many combinations don't work, but if something is good enough ... well that's an answer too.
As for the LLM ~= search engine, it's sometimes better than the alternative, especially if you are searching for things with poorly defined "keywords" (and the LLM interface provides search engine links to check its tendency to "extrapolate").
There are these things called "universal tools" that can do more than one job.
I didn't have on my bingo card that Sabine would be a part of a stock "pump and dump" scheme.
Even the best, most current models cannot consistently pass the "how many words are in your next response?" questions, especially if you add another element (like use the word "banana" in your response).
The LLMs have interesting uses, but they quite literally don't know what they are talking about. Ask ChatGPT.
All these fixes are simply band-aid patches over something someone found. That actually makes things worse, because it obscures the fact that these systems can't be relied on for factual information. Meaning, the errors will be less obvious and likely more insidious.
Does "next response" refer to the response to the "how many words" query or does it refer to the subsequent response?
Of course it can't. It's a sequence to sequence model which uses its own output as new input. There's no mechanism for predicting the length of an output built into transformers.
Could a fucking human pass this test? xD
@@silikon2 Still sounds better than humans.
We made the language center of the brain, and now we’re trying to implement the logic. It makes sense we need more than just a language center to make a truly smart intelligence.
You are reading my mind. Literally a few days ago, in a few Reddit comments, I was fantasizing about how feeding LLMs ever more text is a dead end, and what would happen if we applied something similar to AlphaProof to the AI core: first implement some kind of solid reasoning core, and only after that let it eat up whatever text we want, filtering it through the reasoning and providing answers that make sense to humans.
6:36 The neural net is based on Transformers, just like LLMs. You get the same result - maybe 99% effective, but the tail 1% stretches out to infinity (think "full self driving" that is always just around the corner). Maybe the behavior of a rat can be simulated by a neural net, but that does not mean you will achieve human-level artificial intelligence that way. You may make a very close facsimile of a person, but that last 1% will never be there. Kind of makes you question if you or I are just facsimiles. I know a couple people that come off pretty fake.
We call those 'NPCs' over here on the interwebs.
LLMs can already achieve a human level of intelligence that's to be expected in the context of an online chat. I wouldn't be surprised if we're not that far from being able to create a humanoid robot that passes as a close-enough facsimile of a real person. I don't get this expectation that flawed humans should be able to create flawless artificial beings, it's just not going to happen.
In rare cases, humans can actually have more or less than 5 fingers. My mother had a high school teacher who had 6 fingers, and was nicknamed for that among his students. So models should actually be able to output a hand with 6 fingers if that's the purpose of the prompter.
Integrating rule-based logic with neural nets (pattern matching) is the holy grail, but this is much easier to say than to do. Purely neural-net models make weird mistakes and can't explain why they did what they did. However, they can pattern-match virtually anything. Machine driving is surprisingly good on well-mapped routes but not yet ready to go live. Analysing blood cells is now almost entirely automated. Rule-based systems have been tried in the past, but their weakness is in finding good rules and handling competing rules. We have competing rules because we only have an imperfect/incomplete mapping of reality. Fuzzy logic has been very successful in real applications with constrained domains (driving trains, controlling air conditioners). They outperform doctors in making diagnoses. The two approaches could in theory solve each other's shortcomings, but they would have to be integrated at a deep level; just applying them in sequence would amplify the weaknesses. The human brain does this very impressively, as well as being able to dynamically update both its neural net and its explicit mapping of aspects of reality to general rules (possibly during sleep). The power of a rule is that it captures a huge number of cases, so it is very economical in computing outcomes.
One rule every human driver uses without thinking is that you can safely ignore small, light things like leaves, bits of paper, etc whether on the ground or in the air. Even larger objects can sometimes be ignored if swerving is dangerous. Think of a cardboard box sliding or tumbling across the road in a light wind - you instinctively want to avoid it but you know that if it can be moved so easily, it must be either empty or filled with something very light like packing material. Incidentally, handling obstacles correctly, especially in multi-lane traffic where swerving is dangerous, is a weakness in machine driving.
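To make that "pattern matcher plus explicit rules" idea concrete, here's a toy sketch in Python. Everything in it (the stand-in proposer, the mass threshold, the action names) is made up for illustration, not taken from any real driving stack: a statistical proposer suggests actions with confidence scores, and a rule-based layer vetoes the ones that break a known rule.

# Toy sketch of a neurosymbolic loop: a statistical "proposer" guesses,
# an explicit rule-based "verifier" rejects guesses that break known rules.
# All names, rules, and numbers here are hypothetical.

def neural_proposer(observation):
    # Stand-in for a neural net: returns (action, confidence) guesses.
    return [("ignore_object", 0.7), ("emergency_brake", 0.3)]

RULES = [
    # Each rule maps a condition on the observation to a forbidden action.
    lambda obs: "ignore_object" if obs["mass_kg"] > 5 else None,
]

def decide(observation):
    forbidden = {rule(observation) for rule in RULES} - {None}
    for action, confidence in sorted(neural_proposer(observation),
                                     key=lambda x: -x[1]):
        if action not in forbidden:
            return action, confidence
    return "emergency_brake", 1.0  # safe fallback if everything is vetoed

print(decide({"object": "cardboard box", "mass_kg": 0.2}))   # light: ok to ignore
print(decide({"object": "cardboard box", "mass_kg": 40.0}))  # heavy: rule vetoes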
Considering that LLMs are trained on written text and spoken conversations, and that most people use and interpret basic words like AND, OR, and NOT in literature and conversation differently from their Boolean meanings, you start to understand the fundamental flaws in LLM-based AI.
What's the flaw? LLMs can recognize both of those situations quite well. The same model can code or do book reviews or be a therapist or whatever.
Absolutely no offense intended, as it is very complex, I think you have an inherent misunderstanding of how LLMs work to relate information. I encourage you to look into it! It is bizarre and pretty cool.
I don't disagree about LLM having inherent flaws, Sabine discusses several important ones in the video.
@mattmaas5790 But an LLM does not understand the context of the intended audience. When writing for scientific journals, I am careful to always use the true Boolean meaning of such words and ensure that the words I use are technically correct. In contrast, a popular writer creating a story intended for a general audience is likely to follow more popular or generally accepted meanings for words. What I have seen as the output from LLMs suggests that there is no distinction made between popular vs. technically correct meanings for words. This stands to reason if you consider that a lot of humans do not necessarily understand these differences.
@@michaelharrison1093 you have no idea what you're talking about
@@michaelharrison1093 Contextualized word embedding (accurately keeping the in-context meaning of each polysemous word) has been in development almost since the beginning, with several methods and with breakthroughs in ELMo and BERT in 2018. Some of these methods succeeded more than others, as the encoder-only models were excellent at determining context but were not initially designed to generate output text like the decoder models of the time. My understanding is that contextualized embedding in general has been a pretty big element of transformer LLMs since 2017 and might arguably be the core reason LLMs became a big deal in the past few years.
You are correct that they don't know the intended audience, since that is currently the job of the input or user. But current models are usually capable of using the correct context if they have been informed which context to use for the task.
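A small sketch of what contextualized embedding means in practice (using the Hugging Face transformers library and bert-base-uncased purely as an example; the sentences are made up): the same surface word comes out as a different vector depending on the sentence around it, which is roughly why polysemy stopped being a blocker.

# Sketch: the same word gets different vectors in different contexts.
# Assumes: pip install torch transformers; model choice is illustrative.
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")
model.eval()

def vector_for(sentence, word):
    inputs = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**inputs).last_hidden_state[0]    # (tokens, 768)
    tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0].tolist())
    return hidden[tokens.index(word)]                    # embedding of that token

money = vector_for("she deposited cash at the bank", "bank")
river = vector_for("they fished from the bank of the river", "bank")
print(torch.cosine_similarity(money, river, dim=0))      # noticeably below 1.0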
The problems brought up in the video are still the major ones, the LLMs are still just cool algorithms that barely care what gets shoved in and out and until these problems are addressed at a base level we will just be putting band-aids on the issue with giant datasets. If these aren't solved, eventually we will build a computer with the power of a human brain that can hilariously function just as poorly and error prone as a human, but do it really really fast.
"How can they cross the river without anything being eaten?"
ChatGPT didn't infer or hallucinate anything, other than that wolves eat sheep and sheep eat cabbages, which is a pretty informed inference to make. What am I missing? 🤷♂
Why won't the wolf eat the man?
I feel that the wolf just isn't trying hard enough
@@simongross3122 Why won't the man fuck the Goat?
She screwed up the example footage. Here's an example of the kind of thing she actually meant to show:
User: You need to cross a river and bring a wolf, some grain, and a cabbage along with you. You can only bring one item with you at a time though. How can you get all 3 items across? Wolves don't eat grain or cabbage.
GPT4o: To solve this puzzle, you need to be strategic about the order in which you transport the items to ensure none of them get eaten. Here's how you can do it:
1. Take the wolf across first and leave it on the other side.
2. Go back and take the grain across.
3. Leave the grain on the other side but bring the wolf back with you.
4. Take the cabbage across and leave it with the grain.
5. Finally, go back and bring the wolf across.
Now, all three items (the wolf, the grain, and the cabbage) are safely on the other side, and nothing was eaten along the way!
This is a silly answer because you create an extra trip by bringing the wolf back.
3. And the wolf then ate the sheep...
@@TheRyulord
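For what it's worth, you can brute-force this variant of the puzzle to see why the extra trip is silly: since "wolves don't eat grain or cabbage" there are no dangerous pairs at all, and a breadth-first search finds a five-crossing plan with no item ever brought back. A quick sketch in plain Python (just an exhaustive search for illustration, nothing to do with how an LLM produces its answer):

# Brute-force the river puzzle: breadth-first search over who is on which bank.
# In this variant nothing eats anything, so FORBIDDEN is empty; for the classic
# wolf/goat/cabbage riddle you would list the dangerous pairs instead.
from collections import deque

ITEMS = {"wolf", "grain", "cabbage"}
FORBIDDEN = []   # e.g. [{"wolf", "goat"}, {"goat", "cabbage"}] for the classic riddle

def safe(bank, farmer_here):
    # A bank is only unsafe if a forbidden pair is left there without the farmer.
    return farmer_here or not any(pair <= bank for pair in FORBIDDEN)

def solve():
    start = (frozenset(ITEMS), "left")            # (items on the left bank, farmer's side)
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        (left, farmer), path = queue.popleft()
        if not left and farmer == "right":        # everything (and the farmer) is across
            return path
        here = left if farmer == "left" else ITEMS - left
        for cargo in list(here) + [None]:         # carry one item, or cross empty-handed
            if farmer == "left":
                new_left, new_farmer = left - {cargo}, "right"
            else:
                new_left, new_farmer = left | ({cargo} - {None}), "left"
            state = (frozenset(new_left), new_farmer)
            left_behind = new_left if new_farmer == "right" else ITEMS - new_left
            if safe(left_behind, False) and state not in seen:
                seen.add(state)
                queue.append((state, path + [(cargo, new_farmer)]))
    return None

print(solve())   # five crossings: take an item over, return empty, repeat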
Getting in at the very top of a massive tech trend after watching it from the bottom is a very relatable experience for me
Let's face it. 'Hallucinations' in large language models are the best evidence we have that a great many 'people' aren't really sentient beings.
Either we're all sentient, or none of us are. Which are you?
What??
Please provide more stock picking advice so we know what to avoid
You nailed it. Gold medal goes to the Germany SH team for its short and lucid treatment of why AI still needs logic (neurosymbolic / ontologies).❤
The world can be made from words.
You could have an infinitely long sequence of letters that encodes the universe on a fundamental level.
If this turned out to be false, it would be wildly surprising.
Saying the universe could not be made of words is almost equivalent to saying you couldn't find a representation for it.
A Chomsky type-0 grammar can take you anywhere.
We humans don't think in language; we think in abstractions. Language is a lingua franca we all use to translate our thoughts and communicate with others. Math is much closer to abstraction, and coincidentally, we have discovered that the universe complies with logical and consistent mathematical principles.
Not really. Any logical system is incomplete in that there can be unprovable statements in any system of grammar, if the grammar is logical. If your grammar is illogical then there are bigger issues.
@@TheRealUsername Humans evolved in the physical world, so we think in objects, space and time. LLMs think differently, but not necessarily wrongly.
The standard transformer is clearly limited in its ability to contemplate: give it the most complex problem ever and ask what the first word of the answer is. But this is well known and easy to address with internal dialogue.
@@millwrightrick1 You are hallucinating, pardon the pun. Clearly you don't understand Gödel's incompleteness theorem or its implications (which, to be honest, are none in practice).
Or you could simply proofread your content before posting it on the internet.
Or get a few AIs to proofread each other in Python.
Nobody knows if this problem will persist or get resolved in a few months.
You're so cool, I'm so glad I found this channel.