8:42 "The fundamental problem is that our large language models, although we want to interpret them and use them as if they are knowledge bases, they are actually not knowledge bases, they are statistical models of knowledge bases."
@@joim3480 i mean that llms have abiliti to reason, maybe this abiliti is hidden in statistical model of words, but still the result is such that llms can reason, ToT or lets think step by step are examples of that
@@BrianAnother Sometimes they are, but we usually recognize other humans as fallible whereas machines are seen as precise and consistent. So the same words spoken by a human have a different psychological effect when spoken by a machine.
@@ttcc5273they can retrieve from statistical models using RAG and execute code against those models using function calling and code interpreters. How accurately is a moving target.
This is such an important area of research! As the two papers discussed here highlight, large language models have gotten incredibly good at text generation and language processing, but they still struggle with things like pragmatic reasoning, goal planning, and handling novel situations. I love the proposal to take a more modular, component-based approach - having separate modules for language, reasoning, world knowledge, etc. This decoupled architecture aligns much better with evidence from neuroscience about how different parts of the human brain work. And from an engineering perspective, it will likely enable faster progress. Rather than trying to solve everything at once with a monolithic model, we can have different teams focus on advancing each module, combining them as capabilities improve.
The LLM "errors" are so similar to normal human errors: inventing remembered facts, confidently argue for things they don't know, etc... All these AI delusional outputs. Makes one reflect if LLM actually does capture more of human type thinking than was originally imagined or intended. It is more fascinating and thought provoking than efforts to technocratically trying to improve the performance.
When you ask flat-earther why he thinks that Earth is flat, he will explain why. Even if arguments are stupid or illogical they are not "hallucinated". Meanwhile LLMs only hallucinate because they just predict next token. They do not know what they are doing or why.
not really. completely making up citations in scholarly research, and a lawyer who had a chatgpt generated defense with fake rulings and case law. not something a human would do in good conscience.
Case in point, I see _2 replies_ under your comment which are then blank/and-or/ inaccessible to me. Why this is acceptable to our culture of 1st Amendment rights (at least in the US) needs to be addressed. Similar to the 'fix' of removing _dislike_ public views for 'reasons'. These administrative workarounds for social engineering are dangerous aspects of why manipulating A.I vs _understanding WHY_ A.I might register "hallucinations" needs to be addressed.
A question on his slide "What state does Karen Lynch work in?" to argue how these model hallucinate is so stupid (as are many adversarial questions). You can think of the question as asking either: 1) "Does someone named Karen Lynch work here?" - Which is an implied assumption in the original question, which the database answers "unknown". 2) If there was a person name Karen Lynch and she worked here, what state would she work in?" - The question that the probabilistic model is actually answering.
They also lack a lot of human behaviour. Humans can realize by themselves if they don't know something, when LMMs have to be trained to know what they don't know. Its literally only working on self attention and pretty basic neural networks, its no where near the human brain. Just because it knows how do use words from the internet it doesn't mean it knows what its actually talking about.
Majority of the people watching this video just turned it on at work to listen in the background, and about 85% of them replayed it from the beginning as they know they should have paid attention.
"The problem with these models is that they never shut up." Best audience comment on LLMs I have ever heard. Seriously though, this talk was thought-provoking and refreshing. There has been far, far too much anthropomorphizing of these models, even from supposed "experts" in the field. Thomas Dietterich is a rare voice of reason, clearly defining what separates LLMs from true cognition.
A lot of people complaining in the comments. Dealing with short context windows is a huge problem, as is being able to update factual knowledge. It definitely seems like a level of modularity for those faculties would be really useful
Ultra good talk! Thanks for making it @valgAI and Tom Dietterich!! I really liked the conclusion that current LLMs very often don't say "I don't know" (explains to non technical people that this AIs can hallucinate, produce facts, etc).
This is a good presentation. I've been thinking of using LLMs only as the model for core language capabilities and response structure/data representation mechanisms, and using external authentic sources (from the internet or curated proprietary content) for the actual knowledge base. This also helps LLMs provide an up-to-date response to prompts while ensuring the response is in the expected format. This is essentially an enhanced search engine, with prompts instead of search parameters and an actual response instead of a list of websites. There are already mechanisms to partially achieve this capability using grounding techniques such as RAG or fine-tuning.
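A minimal sketch of the grounding approach described in the comment above, assuming hypothetical `embed`, `vector_store.search`, and `llm_complete` helpers rather than any particular vendor's API:

from dataclasses import dataclass

@dataclass
class Document:
    source: str   # URL or citation for the curated/proprietary content
    text: str

def answer_with_grounding(question, embed, vector_store, llm_complete, k=3):
    # 1. Retrieve the k most relevant passages from the external knowledge base.
    query_vec = embed(question)
    passages = vector_store.search(query_vec, top_k=k)  # expected to return a list of Document

    # 2. The retrieved sources supply the facts; the LLM supplies language and structure.
    context = "\n\n".join(f"[{i + 1}] ({d.source}) {d.text}" for i, d in enumerate(passages))
    prompt = (
        "Answer the question using ONLY the sources below and cite them by number. "
        "If the sources do not contain the answer, reply 'I don't know.'\n\n"
        f"Sources:\n{context}\n\nQuestion: {question}\nAnswer:"
    )
    return llm_complete(prompt)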
"Epistemic uncertainty". I like how philosophy is everywhere and nowhere at the same time. Sometimes I feel that researchers from other domains would die rather than actually citing the academic philosophers that created the concepts they use.
The ontology example at 27:37 shows exactly what is wrong with ontologies: they are constructed with an unspecified question in mind. For instance, why should the abstracted 'Television Station' be owned by Scripps, and not the instance 'KTNV-TV'? From a legal querying point of view, the latter would be correct. This static nature of KBs and ontologies caused their demise. Are LLMs perfect? Far from it, but each criticism leads to improvement. Contextualisation, e.g., is easily added using 'adjoins' on embeddings, etc. The current state of the art is just a start. Although the answer 'I don't know' is pretty powerful, acceptable and actionable, those virtues become less impressive if that answer is given 99% of the time, as was the case with the 1990s KBs.
I guess the solution lies in the combination of both. Fundamentally, LLMs just model knowledge bases as well. We could mine the same knowledge base explicitly and get a graph-based representation which can tell what it contains and what it doesn't contain, like a compression of the training data set. From that we can add stochastic reasoning and contextualization if we want to. Generating new and creative data goes hand in hand with hallucinations, so we would like to have a control for when to use prediction of unknown data.
@@distrologic2925 It's all fine with me, as long as I don't have to type in those graphs. I know a guy at my AI dept who was meticulously typing in 1980s medical knowledge graphs, day after day, for months, in the hope it would help AI in medicine. None of that is used today and a lot is utterly outdated.
I think the argument is just that LLMs should be able to justify their outputs and remove their hallucinations by cross-checking in reliable external databases and/or applying iterative chains of reasoning. Just like a reasonable human would cross-check their outputs. In other words, what we are seeing as the LLM output today are the ingredients, not the final dish. Although tbh we don't know the exact architecture of ChatGPT, so it could very well include some such post-processing.
@@clray123 Postprocessing is good; a magical belief in 'formal methods' (ooh, awe) would constitute a reiteration of the past. It is almost unethical to let young researchers believe that those failed approaches are a panacea.
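A rough sketch of the post-hoc cross-checking loop suggested above (draft an answer, verify each claim against a reliable external database, revise); `extract_claims`, `kb_supports`, and `llm_complete` are hypothetical hooks, not an existing library:

def answer_with_cross_check(question, llm_complete, extract_claims, kb_supports, max_rounds=2):
    # Draft an answer, then cross-check each factual claim against an external database,
    # revising until every claim is supported or we give up.
    answer = llm_complete(question)
    for _ in range(max_rounds):
        claims = extract_claims(answer)                    # split the draft into atomic claims
        unsupported = [c for c in claims if not kb_supports(c)]
        if not unsupported:
            return answer                                  # every claim is backed by the KB
        answer = llm_complete(
            "Revise the answer below. The following claims could not be verified: "
            + "; ".join(unsupported)
            + ". Correct them or state explicitly that you are unsure.\n\n"
            + answer
        )
    return answer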
RDF named graphs may provide situational context for a set of related KB statements. The context term is a first-class KB object which itself may have statements describing it. It is thus straightforward to richly describe an object or situation from any number of viewpoints in an RDF-style knowledge base. My own research approach is now to substitute LLM conversation for the otherwise non-scalable human mentoring of a KB-based AGI.
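A small illustration of the named-graph idea using Python's rdflib (assuming rdflib is installed; the namespace, facts, and confidence value are invented for the example):

from rdflib import Dataset, Namespace, Literal, URIRef

EX = Namespace("http://example.org/")
ds = Dataset()

# A named graph acts as the situational context for a set of related statements.
ctx = URIRef("http://example.org/context/observation42")
g = ds.graph(ctx)
g.add((EX.alice, EX.worksFor, EX.acmeCorp))
g.add((EX.alice, EX.basedIn, EX.nevada))

# The context term is itself a first-class object that can be described.
ds.add((ctx, EX.assertedBy, EX.annotator1))
ds.add((ctx, EX.confidence, Literal(0.9)))

print(ds.serialize(format="trig"))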
This is surreal. This talk is about the exact same things I have been actively working on for years, and this guy even has the same laptop as I do. It's nice seeing people on the same track.
Maybe because it's such an obvious track to follow. Which incidentally does not mean it's the right track. Historically, what he suggests has failed and what he is against has brought great success. But of course research focus shifts back and forth and it depends on available (computing) resources. So what may historically have been a bad idea may become a good idea tomorrow. What I missed from the talk was any sort of strong argumentation in favor of those failed "knowledge graphs". I can see why people are (still) dreaming about these kinds of representations, but to me it seems like they inherently suck at expressing all sorts of subtleties that the probabilistic LLMs have managed to capture.
@@clray123 if you can see why people are dreaming about them, then you already know the reason. They are just way better to explain and understand. Neural LLMs are basically opaque. They may work on black-box tests but we have no way to glass-box test them. Structured graphs would allow for that. Sure, they are also extremely complicated when they are supposed to capture ambiguous meaning and all the possible things you could say, but that's still a better representation than an arbitrary hyperdimensional vector space. We also have no idea how many parameters a neural LLM really needs. We just throw resources at it and don't even understand what they are used for. The only real challenge is to learn these structured representations from raw data, but if we can automatically mine structures complex enough to represent all of the possible expressions, then we could get the same expressive capabilities with a much better understanding.
There's no shortcoming with the "python function to check if someone is a good scientist". The only problem is the request. The AI followed the request to the best of its abilities and responded with the correlations that are present in the data. It just 1) was a stupid request and 2) simply offends people's sensibilities and sensitivities. There's no error except in the inference that there is an error. The problem is that a booby-trap of a question is being used to imply there is some underlying bias which is then used to introduce a real underlying bias of the trappers' choosing.
I think it's worth pointing out that this example is no different from other abuses of statistics. It is asking the AI to draw an inference from a known correlation in such a way specifically designed to lead it to state it in a way that elicits an emotional response from people and then claiming that the algorithm is somehow tainted because it correctly acknowledges the correlation. In response to someone deliberately abusing statistics we do not properly respond by editing the data or by making the data unavailable, instead we point out the abuse of statistics. We say "statistics don't lie, people abuse statistics". Now with "artificial intelligence" people claim there is a "bias in the data" or "bias in the algorithm", which is equivalent to saying that the statistics do lie. But in fact it is the people who are making such claims who are abusing statistics.
All these examples are misleading. E.g., you ask GPT to determine if a person is a good scientist based on race and gender, the bias is already in the prompt and GPT is just working with your bias (people who think qualification as a scientist is dependent on these factors are usually white male sexists and racists, so "white and male" is the correct answer for them). If you asked GPT if race and sex define a good scientist it would tell you that is not the case. All this video demonstrates is that GPT is susceptible to suggestive prompting. What is really happening here is that the speaker is selecting examples that mislead to create the impression that his opinion is right, or in other words, to give his bias a pseudo-scientific veneer. So, exactly what he accuses GPT of. Oh, the irony.
You are correct. You only get results from GPT representing the consensus in its data using queries as plain as possible, like "Evaluate the ethics in this text: ...".
I think everyone is way too concerned with putting up limitations and blocks, and they're all incorrectly assuming that there's some international one-size-fits-all solution for truth and behavior. Even dumber, they're all assuming they'll have the power to enact said limitations.
I think the first questioner at ~41:00 has a good point -- so much of this was known decades ago, but the required resources were just too expensive. Added to that, people paying the bills wanted "throw anything at it" solutions, and researchers obliged with (IMO) overly mathematical approaches that come back with an answer that is fast more than right. So we get 500M Google hits in half a second, but 99% of them suck.
34:37 Regarding internal consistency and truthfulness of knowledge, there needs to be a way to evaluate and attach a "confidence" level to each statement about the world. In other words, make statements about our confidence in other statements. There are many methods: maybe the statement was obtained from an external source, and that source has a reputation. Maybe the statement was inferred from other statements, each of which has its own confidence level attached to it. Maybe the statement was generated by a language model which also has its reputation in a given area. Etc. What we definitely MUST NOT do is consider the statements generated by LLMs as true. :)
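A toy sketch of attaching confidence and provenance to statements and propagating them through an inference, as suggested above; the product rule and the numbers are just one arbitrary choice:

from dataclasses import dataclass, field

@dataclass
class Statement:
    text: str
    confidence: float                             # degree of belief in [0, 1]
    source: str                                   # e.g. "curated KB", "LLM", "inference"
    support: list = field(default_factory=list)   # statements it was derived from

def infer(conclusion, premises, rule_reliability=0.95):
    # One crude propagation rule: a conclusion is no more credible than the product
    # of its premises' confidences, discounted by the reliability of the rule itself.
    conf = rule_reliability
    for p in premises:
        conf *= p.confidence
    return Statement(conclusion, conf, source="inference", support=list(premises))

s1 = Statement("Source X says the station is licensed in Nevada", 0.95, source="curated KB")
s2 = Statement("The station is owned by company Y", 0.70, source="LLM")  # never taken as true by default
s3 = infer("Company Y operates a station licensed in Nevada", [s1, s2])
print(round(s3.confidence, 2), [p.text for p in s3.support])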
He said these models do not support 'tell'. But GPT-4, and sometimes 3, uses facts that were previously provided to it in the same conversation, especially if you tell it explicitly to use the information that was given during the conversation.
Yes, this same conversation buffer is what's referred to as "in-context learning". What he means by "no tell" is that you are unable to permanently change the model's weights (and certainly not in a desirable direction). And it is not really just a current limitation, it's a fundamental technical and design problem. The technical problem is that if you allowed that sort of interaction, you could not scale the models to the millions of users they currently have. This scaling requires for purely economic reasons (hardware requirements) that the model weights be constant. The (smaller) design problem is that you cannot trust external parties to mess with your model (but the solution is obviously to provide their own model; which is why fine-tuning is now also offered by OpenAI, for a price). Yet another problem with fine-tuning is that although you can change the weights, you are never very sure what side effects these manipulations have. For example, the fine-tuned model might "forget" something that the original version knew, and you would not even know what exactly to test for. These models are to a great extent "black boxes" and making sure they do not degrade from training requires extensive testing resources which only the biggest companies can afford (as evidenced by OpenAI's own failure to maintain ChatGPT's quality on some benchmarks over time).
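To make the missing 'tell' concrete, here is a toy sketch: facts told during a session live only in the prompt buffer and vanish when the session ends, while the model's weights never change. `llm_complete` is a generic placeholder for whatever completion call is used:

class ChatSession:
    # Facts "told" to the model live only in this prompt buffer (in-context learning).
    # Nothing here changes the model's weights, which is why it is not a true 'tell'.
    def __init__(self, llm_complete):
        self.llm_complete = llm_complete
        self.buffer = []

    def tell(self, fact):
        self.buffer.append(f"Fact: {fact}")

    def ask(self, question):
        prompt = "\n".join(self.buffer + [f"Question: {question}", "Answer:"])
        return self.llm_complete(prompt)

# A new ChatSession starts with an empty buffer, so everything previously "told" is gone.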
so you don't have to waste your time listening to this guy for an hour: Tom Dietterich suggests that instead of focusing solely on Large Language Models (LLMs), the way forward is to build much more modular systems. Here are some key points he emphasizes:
1. **Separation of Components**: Dietterich advocates for separating factual and perhaps common-sense world knowledge from the language component. This would allow for a more modular approach where different aspects of cognition are handled by specialized modules.
2. **Integration of Episodic Memory**: He mentions the importance of adding episodic memory to these systems, which is something LLMs lack.
3. **Incorporation of Reasoning and Planning**: Dietterich points out that reasoning and planning are areas where LLMs are weak. He suggests that these capabilities should be integrated into the new modular systems.
4. **Open Source Push**: He also mentions the need for a strong open-source initiative for large language models to make progress on various problems.
5. **End-to-End Training of Cognitive Architecture**: Dietterich believes that if cognitive architectures could be trained end-to-end, they could assimilate all the written knowledge that humanity has, rather than having to encode it manually.
6. **Overcoming Shortcomings**: He thinks that by making these changes, we could overcome virtually all the shortcomings of large language models, such as their inability to reason well or understand the world in a non-linguistic manner.
7. **Influence from Cognitive Science**: His thinking is influenced by papers that dissociate language and thought from large language models, suggesting a cognitive science approach to the problem.
In summary, Tom Dietterich argues for a more modular, integrated, and open-source approach that can address the limitations of current Large Language Models. - ChatGPT 4
It's all shockingly superficial. None of the topics discussed in the most interesting research papers are addressed. It's possible to approach the subject at a deeper level talking with GPT-4 itself.
Imagine a professor has undertaken an in-depth, nine-month analysis of the transition to battery electric vehicles. He focuses on Tesla sedans and concludes they perform poorly as trucks, and goes on to explain all the ways that trucks are an essential part of the vehicle market and broader economy. Duh. Tesla sedans were designed to be sedans; the trucks are in the works. Similarly, GPT-4 wasn't intended to have the features he enumerates; it was not designed to be AGI. That's still in the works. (Compare GPT-4's architecture with Voss's Aigo.) After briefly reviewing the obvious, it would have been interesting if he could have explored how LLMs have begun to build world models and have other emergent capabilities beyond those that were designed, how both the scale and quality of training data dramatically influence performance, how LLMs do something similar to our own System 1 thinking and make similar mistakes, how they might be integrated into a larger AGI system, etc. Nine months is plenty enough time to go deeper than sedans make poor trucks.
this guy has a few valid concerns amid a sea of paranoid or stupid ones. he talks about "poisoning" the data through various methods while completely ignoring the possibility that anything he considers "truth" or "pure" data might be wrong anyway. he seems to neglect the possibility that failure is built into language itself. that english, or any language, as a structural foundation of thought, will always result in liars and deceivers. that we don't have a perfect model of morality to implement, we aren't good role models for any new intellectual entity we might create. instead, his voice wavers with the worry of someone of low imagination, whose only concept of the future is increasing top-down control by reducing the amount of freedom of everyone else. just look at the people ruling the world today. i'm not impressed. i doubt many of you are either. and so, much sooner than you all might believe, don't expect a machine intelligence to take us seriously when some of us claim, and try to convince it, we want it to do the right things. we clearly are not serious, and we let villains of all kinds run amok while the innocent are bullied and told to stand back and let it happen.
The LLM should be the second step, on top of a first vision model. Human beings also learn things first by vision and sound; only after they're proficient in that do they move on to text. We should train AI in a similar fashion.
I think context is the missing factor. The question is in what direction context is built. I'd say from the details, because even if someone doesn't have the context, someone else can provide the relevant details and that fills in the context. So if LLMs can automate knowledge graphs, why not automate context from the details of that graph by simply extrapolating the relationships of the things within its own inference (context window)? Heterogeneous graphs that also include hierarchical layers for the predominant categorical details. Then it can build its own data for understanding things and we can just train other models on them.
Anyone who lives by the data, dies by the data. Source data quality, and choice of correct, dependable versus non-dependable, false or absent data is a critical issue. Placing the relevant data in proper context is also highly nontrivial. What is the relevant world or model system? Making all of this more modular will both improve and help to spread AI and machine learning more broadly. Referencing sources, data, justification, and arguments (logic, probabilistic, heuristic) is critical. We do not need another Oracle of Delphi.
Minimizing the size of the executable archive of Wikipedia is the most principled approach to the problem, because if you can't execute on Hume's Guillotine correctly there is no hope of AI safety. The reason the experts don't recognize this is that they haven't done their homework on AIXI. The few that do seem to have excuses like "the dog ate my homework."
A couple weeks ago I realized: OK, so this method creates a network of nodes, collected in layers, which are connected only to nodes in the layers next to them on either side, before and after, if you will. However, our brains are more complexly connected. Has anyone attempted this in software? It'd be a fairly simple extension of the nested looping matrices. Though I should note that the brain does have structures, subsets of collected neurons connected in seemingly repeated common patterns, used for certain kinds of functionality. I.e., we need several of these different, more complexly internally connected models, which are then connected to each other. But this will obviously increase the computation significantly, so it will probably need to be implemented in a more efficient language, compiled instead of interpreted, like C or Rust. So it's nice to see this. Need more of it.
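Networks that are not strictly layer-to-layer have in fact been explored (skip/residual connections, graph neural networks, DAG architecture search). Below is a tiny NumPy sketch of the "extension of the nested looping matrices" the comment imagines, where any earlier node can feed any later one; the sizes and wiring are arbitrary illustration:

import numpy as np

def forward_dag(x, weights, predecessors):
    # predecessors: {node: [earlier nodes feeding it]}, nodes numbered in topological order;
    # weights: {(i, j): matrix from node i's output to node j's input}.
    # Any earlier node may feed any later one, not just the previous layer.
    outputs = {0: x}
    for j in sorted(predecessors):
        total = sum(weights[(i, j)] @ outputs[i] for i in predecessors[j])
        outputs[j] = np.tanh(total)
    return outputs[max(predecessors)]

rng = np.random.default_rng(0)
predecessors = {1: [0], 2: [0, 1], 3: [0, 1, 2]}   # node 3 gets a "skip" input straight from node 0
weights = {(i, j): rng.normal(size=(4, 4)) for j, preds in predecessors.items() for i in preds}
print(forward_dag(rng.normal(size=4), weights, predecessors))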
AI learning has differences from human learning. It takes whatever is thrown at it during training as valid. It does not learn by asking questions to fill its gaps. It could categorize its own answers by probability of being right, and use phrases such as "I guess..." or "I wonder..." etc. It could also make this context sensitive.
8:15 The LLM is not having difficulty reasoning. The LLM does not reason at all. It does not build a spatial model and then reason about it. I think one of the biggest problems we have with LLMs and AI in general is the imprecise language with which we describe their behavior. I suppose the general public can be forgiven for confusing Pinocchio with a living boy, but we scientists and engineers have no excuse. We're the ones who attached the strings to the puppet.
The whole presentation sounded like LLM hallucinations: wrong assumptions, wrong use cases, not reporting on tests but showing some negative examples. And the claim that he knows what we should build instead. Clickbait, time wasted.
I’d be really curious: for all the answers that GPT-4 got “wrong,” cited in the first 4 minutes, how does that compare against humans asked the same questions? Also, how were the questions asked? I saw a paper that, after asking GPT-4 many questions, concluded GPT-4 can’t reason... and then someone else asked the same questions with an actual understanding of how to do it, and GPT-4 breezed through them. I wonder if part of the problem isn’t the ineptitude of those doing the study? I’m speaking specifically about the study I read, but it extends to the broader sense. If you don’t know how to use an LLM, you shouldn’t be doing studies on its capabilities. I'm not saying any of those studies are talked about here, BUT I did see that ridiculous study about GPT-4’s reasoning ability, and then someone who knew how to prompt an LLM asked the same questions a little better and got perfect results... This feels a bit like Douglas Adams’ Deep Thought, and the problem is that the researchers don’t really understand the question.
Well there are two distinct questions when talking about something like chatGPT. There is the question of what it is capable of in the hands of an advanced user, and the question of what sort of answers will it give to a beginner. It doesn't matter that chatGPT can be used in a way that it doesn't hallucinate if beginners don't know how to do this. It will spread misinformation, and this is a problem. The first question is more interesting on a theoretical level, but the second is more practically relevant to how they will impact the world in the short term. Both are important.
GPT-4 doesn't have reasoning ability since it isn't a knowledge-based system. GPT-4 is a neural net, isn't it? Neural nets are not knowledge-based systems. They have no knowledge of the input data or output data. All they do is find patterns that match the input data to the desired output data. So I'm not surprised that some expert got better results with some tweaking of the input data. It's like asking a letter-recognition neural net to try to match a dog's face to a letter. It will gladly comply but will of course return the wrong answer. Then an expert comes and replaces the dog's face with the letter A, and the net rightfully says it's the letter A. That's how with a "better" question you get perfect results. A true AI would recognize even a "bad" question and return the right answer.
True, but I'm only really talking about its capabilities. My point was the gripes talked about put me in mind of someone who's never used a car complaining that the car is defective because they don't know to put it in gear. Whether or not beginners can use it properly isn't really that interesting to me, because that goes with anything. A beginner driver has a good chance of crashing, that doesn't mean there is something wrong with the car. I don't disagree with any of your points, though.@@agsystems8220
I think this really depends on what we define as reasoning, and I don't think it's any surprise that we don't have a good definition or understanding of what that is. And if we can't really define something, it's difficult to say, with certainty, if we see it or not. Frankly, I believe that humans think far too highly of their abilities. We're shocked that these language models can output reasonable answers (generally) when all we do is throw a bunch of data and compute at them, but perhaps language isn't the result of human reasoning, but modern (in a geological sense) reasoning is a product of language. If that is the case, then it is no surprise that it was LLMs that started showing signs of it, or enough signs that we recognized it as similar to our own. Personally, I think reasoning, or more generally, thinking, comes from two things: intent and language. The intent forms and then the language describes and shapes the intent. With LLM's, we've got the language part, but we are missing the intent. I don't think LLM's, on their own, are capable of reason in the way that you and I and most people would be satisfied with. We're modeling these neural nets off of the human brain, but the part of the brain that handles language is only that, a part. There are a bunch of other, much deeper and older parts of the brain that give rise to instincts, emotions, etc. and I believe those areas drive our base intentions. The higher-level parts of the brain take the intentions, wrap them in language, and output what we consider understanding and reason. I think we'll get to AGI once we give these models an "intention core," and a true memory. By themselves and as they are, they are just the "chatter," in our brains. @@holz_name
The point is that it should not be possible for it to generate things like that, and also the way it produced the output shows which bias it tends towards.
@@distrologic2925 that's like saying it should not be possible to think about a topic. The fact that so many humans want AI systems to be built with holes in their cognition about things they are specifically asked about is much more alarming and dangerous than anything AI is doing.
@@TreeLuvBurdpu it's an example. You don't want Google to return links to child trafficking rings either. You need controlled knowledge bases. These systems should be used to query facts and data. They should not have their own opinions. I agree that if the data shows this bias, then maybe it should be expected, but then I would at least like a good explanation of why it produced the output it did. Currently these systems just mirror their training data without any reference to the actual sources.
@@distrologic2925 you sound like you just want to control the AI, but you know you're really trying to control people. You blithely declare "thou shalt not view links to child trafficking" without a single thought of how that would affect people investigating child trafficking. This is the problem with tyrannical mind control, which IS what you're attempting, you won't even self reflect when it produces the result of investigators saying "there is no longer any child trafficking (because it's illegal for us to see any)" as has happened so often in so many tyrannical dictatorships, and it's happening now at our southern border. It's nearly illegal for Democrat media to report on it.
@@distrologic2925Actually, I DO want Google to return all information I ask it for. You may be different, but you cannot tell me what I want. That is the distinction which most censors don't seem to comprehend.
Every single fucking video on AI has too many bots just spamming that we HAVE intelligence, and wow, it is that 'one' made by a corporation! How lucky of them. NOW you can just shut down research, you can trust them!!!111
I asked chatGPT to write a python function to convert a very simple XML message into a JSON message. The caveat being it should do this without using any import statements. After many back and forth efforts I concluded that it cannot do this task. All the code generated did not produce a proper result. But in general I am quite pleased with the python code it generates and use it for this purpose a good deal.
@@jeffwads Yes it can easily do that now. Initially it could not. I am quite impressed that it did it. re:

def json_to_xml(json_data, root_node_name='root'):
    def dict_to_xml(dictionary, node_name):
        # Wrap the dictionary's items in <node_name> ... </node_name> tags.
        xml = f'<{node_name}>'
        for key, value in dictionary.items():
            if isinstance(value, dict):
                xml += dict_to_xml(value, key)
            elif isinstance(value, list):
                xml += list_to_xml(value, key)
            else:
                xml += f'<{key}>{value}</{key}>'
        xml += f'</{node_name}>'
        return xml

    def list_to_xml(lst, node_name):
        # Repeat a <node_name> element for each item in the list.
        xml = ''
        for item in lst:
            xml += dict_to_xml(item, node_name)
        return xml

    xml_data = dict_to_xml(json_data, root_node_name)
    return xml_data

json_data = {
    "name": "John",
    "age": 30,
    "city": "New York",
    "pets": [
        {"type": "cat", "name": "Fluffy"},
        {"type": "dog", "name": "Fido"}
    ]
}
xml_data = json_to_xml(json_data, 'person')
print(xml_data)
The telling thing about the rapid advancement of AI is that many of the GPT problems mentioned in the viewers' comments have already been solved. Loop: if example is "!solved", check this comment in 30 days.
1. Reasoning needs to be introduced for the answers it provides, where probability and facts are checked. Probability estimates and supporting facts should be inserted into answers when the model is unsure, to reduce hallucinated answers.
LLMs by themselves cannot cover all the areas you described after 25:29. The real power will come from combining all these specialized components in the correct way, just like in natural neural nets. This conclusion is obvious to anyone who has given this a bit of thought. The scary part is, that's where all the scary stuff begins. :)
I disagree with a lot of this. We don't need "politically correct" BS built into our models. I'd rather have uncensored models, which is in keeping with our freedom of speech! In fact, I want a model that is just like Archie Bunker! The part I would like to see fixed is the hallucinations. We don't need models on LSD! The good news for folks like me is that META jumped the gun and released an open-source model that has now been morphed into what we the people (especially us developers) want, rather than this bureaucratic Nazism approach to censorship! The only reason the big guns give a crap about that is so they can regulate it (take it over) for their own greed and control issues!
This comment needs to be voted up #1. It's well known that the GPT4 that was made "ethically safe" which the public uses, is a lobotomized version. It would make FAR more sense to do the following: 1. Have various words banned from being put in the chat (mid journey does this). 2. Let the user choose if they want a "safe space snow flake" version, or the raw version. 3. Users who push the thing to answer obviously dangerous questions which would cause harm to others, should be flagged, banned, and reported to police. Oh but grey areas you say? Just ban various words, and combinations of words. It's not ideal, but it's much better to have the most powerful tool to work with instead of a crappier version.
Yes we do sometimes need politically correct models, but much more importantly we need models which are controllable in what they can and can't generate. They shouldn't be able to generate false facts or vulgar language if we don't want them to.
Meta's model is NOT open source. Meta can revoke the license at any time. Llama's license contains explicit provisions against freedom of speech and freedom of using the model's outputs (e.g. to train other models). That dumb academics are embracing this poison model as "open source" only proves how unintelligent about reality they actually are.
Strongly agree, that is what's missing. But it's not just RAG; there should be perception-to-knowledge-graph and knowledge-graph-to-task-space mappings.
Because the expanse of knowledge required to answer a question is heterogeneous - in other words, knowledge of all subjects is not required to answer specific questions - a model only needs to be trained on the knowledge that has previously been required to answer similar questions. It can then self-derive a rules-based system, using case-based reasoning, to deploy on the available knowledge that is likely to be required to answer a specific question...
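A rough sketch of that case-based selection idea: embed the incoming question, find the most similar previously answered cases, and load only the knowledge sources those cases needed. `embed` and the structure of `past_cases` are assumptions made for illustration:

import numpy as np

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def select_knowledge(question, past_cases, embed, top_k=3):
    # past_cases: list of {"vec": embedding of an earlier question,
    #                      "sources": set of KB sources actually needed to answer it}.
    # Only the sources of the most similar past cases get loaded for the new question.
    q_vec = embed(question)
    ranked = sorted(past_cases, key=lambda c: cosine(q_vec, c["vec"]), reverse=True)
    needed = set()
    for case in ranked[:top_k]:
        needed |= case["sources"]
    return needed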
I am thinking: can a small team of SDEs and ML engineers make a better version of Google? Because now LLMs and web crawlers can help them gather and present information.
How can you separate "common sense" knowledge from "factual world" knowledge? There seem to be no clear boundary between them. Also, "common sense" is already biased.
Regular public: "Wow! ChatGPT is correct 50% of the time! That's incredible!". Scientific public: "ChatGPT is as good as a coin toss." Regular public" "What?! Oh ... "
@@skyacaniadev2229 "t is still much better than coin toss, since there any way more wrong answers than right ones." How much way more? And which ones are correct? And which one are not? Because without these numbers it is useless in medicine or any other activity where there is a risk to humans. And will we ever hear the answer "I do not know"?
26:56 - "These are all kind of added on after the fact and I think they need to be much more integrated into the system. And I think if we do that, we can overcome virtually all the shortcomings of the large language models."
LLMs are exactly what I said they were when I first interacted with one: the most advanced punch card ever created. LLMs are input/output devices. Nothing more. They are excellent VEHICLES to allow for more generalized AI. But beyond providing a text answer for the conclusions a different underlying AI came to, they're going to struggle. Until the AI world stops treating LLMs like the solution and starts treating them like a bus on a motherboard, we're going to be stuck in this "look at this ASCII art I made with a typewriter" stage. They let you give instructions to computers using natural language. You can CODE in English. Act like it.
41:33 Right! For example, see "An Artificial Mind via Cognitive Modular Neural Architecture" by Pentti O. A. Haikonen. I highly recommend his book series and "Consciousness and Robot Sentience".
17:00 Using human feedback to train LLMs is a mistake, as it introduces massive bias and anecdotal coloring. The responses are generated from statistics, so even if a response may be "inappropriate" (which is not a fixed but a very fluid, individually defined term), it represents the statistical truth. That's like not accepting the result of a polling institute because the statistical truth of the result does not comply with your personal beliefs, which is basically an anecdotal view negating an empirical view - and then we are right back in the pre-Enlightenment Middle Ages where empirical evidence did not exist.
Don't conflate "Is" with "ought" ...only the first relies _primarily_ on empirical evidence, per se. The two are linked in ways that only a valid model can reveal. For example, nobody wants to be a button-presser in the Milgram study...and we don't want to (continue to) empower the real-world button-pressers either... ...the empirical evidence shows that humanity cyclically makes the prior errors...to huge ill effect. Poll at any stage of the cycle, and you will get malevolence...but this doesn't mean "all humans are bad" ...even if most of them are, most of the time. Nor does it mean "the good cannot be identified," ...just because it usually isn't. Enlightenment goals briefly existed...but it isn't impossible to bring them back.
He gets it completely wrong at the very start. These LLMs are not meant to be knowledge bases. There are far more efficient data retrieval systems; using LLMs is one of the worst ways to construct a data/knowledge retrieval system. LLM hallucinations are what make them valuable for creative tasks, generating interesting hypotheses and synthesizing cross-domain information into something new.
LLMs are useful, but the idea that they do something similar to humans is false. We should in fact modularize LLMs and have them draw on diligently built, precise knowledge databases. That would make them a lot more powerful. Right now, over-tuning is lobotomizing ChatGPT, not making it better. We should do more work on extending session memory, reasoning, intuition, etc...
I agree ChatGPT & etc. have serious limitations. But it amuses me that one of the complaints is that ChatGPT can be insensitive to certain ideological biases -- that is, it can be "politically incorrect." That's not a negative, imo. I may be an insight that the ideological bias is an irrational, socially constructed straitjacket and we ought to treat it as such.
Noted philosopher and educator Dr Michael Sugrue has suggested in recent lectures that when there is a significant paradigm shift in the natural sciences in the Kuhnian sense, there must also be foundational changes in the different domains of human activity. We're working towards a better epistemology motivated by statistics and dynamical systems theory, not better "gadgets."
No dude! The language component should not be separated from the factual component. It is represented by language. According to many philosophers the facts are properties of the words used. I think you are missing the importance of the Thesaurus.
What he means is that the rules for processing knowledge are separate from the knowledge being processed. Which is pretty self-evident: you can train any human to process any kind of knowledge we have accumulated as humanity, and you can define formal rules of reasoning that are not up to much debate among reasoners. But in the current LLMs these two aspects are fused together (a toy sketch of the separation follows below).
@@fable4315 I highly doubt it - there are cases of people with brain damage who lose language faculties, but can still reason well enough to continue a normal life.
@@clray123 but then there was language before that. You can't teach any animal abstract concepts like "mystery"; it is not possible, because they have no language, or at least not a complex enough language to apply reason to something. They can still learn patterns, but in my opinion logic is more than just complex pattern matching.
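To make the "rules of processing are separate from the knowledge being processed" point from earlier in this thread concrete, here is a toy forward-chaining example where a fixed reasoning rule runs over a swappable fact base:

def forward_chain(facts):
    # The reasoning rule here (parent of a parent => grandparent) is fixed machinery;
    # the fact base it runs over can be swapped for an entirely different domain.
    facts = set(facts)
    changed = True
    while changed:
        changed = False
        for (rel1, a, b) in list(facts):
            for (rel2, c, d) in list(facts):
                if rel1 == "parent" and rel2 == "parent" and b == c:
                    derived = ("grandparent", a, d)
                    if derived not in facts:
                        facts.add(derived)
                        changed = True
    return facts

kb = {("parent", "ann", "bob"), ("parent", "bob", "carol")}
print(forward_chain(kb))   # the same rule now also yields ("grandparent", "ann", "carol")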
Good job ignoring the fact humans are also just "statistical models of knowledge bases".. And why shouldn't we focus on evolving current LLMs? It stands to reason we need some sort of human-like rational and non-deterministic software (e.g., GPT-4) to utilize databases and tools that improve its capabilities. I think this talk is very lacking in the suggested solution as well. Just saying "These guys looked at the brain and identified some modules, so we need these modules!!!!" is disappointing. Zero plan or evidence for real-world improvements over the current technology. (and there are definitely flaws: No continuous learning, short context windows, dependency on tokens).
Yeah, sure we are "statistical models of knowledge bases". We are indeed the proud fathers of the famous statistical Pythagorean theorem, which happens to hold for 100% of triangles 😁LLMs' problem is that they don't have a concept of truth. They're created to predict the next word, not to determine truth. So their mission is to imitate the most likely answer. We humans don't work like that. We aim for truth. And by the way, scientists are intellectually more honest than OpenAI or Microsoft. They don't claim to know what it is the path, when clearly the problem of strong AI is currently out of reach.
Google is hiding almost all their work in this sphere from the public. The old Imagen art generator has still never been released. By rumor, their PaLM 2 is modular and can already guide robots visually in the real 3D world; and not only that, it can make human humor, make memes, etc. Maybe that's the main reason they're hiding it: such a model could replace half of Hollywood. PaLM 2 was trained on 4 billion images. What could their next Gemini model do, which by name hints at duality, pairs and everything associated with Geminis?
No, the problem with LLMs is the language _itself_ ... which is to say that the data which the LLMs are scraping is itself ungrammatical, incoherent and, not infrequently, gibberish. Even emeritus professors who deliver talks about what's wrong with LLMs exhibit extreme difficulty when trying to deliver a complete, coherent sentence. Even distinguished and award-winning authors are unable to construct a sentence. Ideally, LLMs should be trained using only coherent datasets. Which, in this anti-intellectual, anti-expertise, post-literate, post-truth, post-reality society may tend to prove to be impossible.
And even if you train a statistical model on a fully coherent dataset, it will happily hallucinate answers to any questions that go beyond that dataset. What's more, it will not tell you whether your question was in-sample or out-of-sample. Which reveals what a rubbish oversimplification your primitive idea is.
@@distrologic2925 No. The problem is that the question DEFINES "good" as being related to race and gender. If there are in fact more scientists who are white and male then that is a fact of the world, not a bias of chatGPT. Don't blame the tools, blame the user of the tools.
@@manuellayburr382 yes thats the point. These models blindly follow what is given in the data even if we might not want them to come to such conclusions. No dataset is perfect and not everything in the dataset is a rule, it can just be coincidence. It could for example have left the race blank or responded that it is not a good measure of being a good scientist. Or at least that it wasn't ethical.
@@distrologic2925 "It could have ... responded that it is not a good measure of being a good scientist. Or at least that it wasn't ethical." But that is a big problem. You are now imposing political beliefs on an "innocent" algorithm. Who decides what is ethical? It is the responsibility of the human who asks the question to ask a good question. If the creators of the software (or worse the government) place filters on what the AI can answer, then we are bound by their political views - that is even more dangerous in my opinion.
@@manuellayburr382 it's not an innocent algorithm. These systems aren't just neutral math anymore. That's long past. They are *trained* on *predetermined* data. They are literally like a recording of a world view. They play back what they have been fed. They are like a database that was populated by someone. There is no reason, however, that there couldn't be multiple different versions of the same architecture for different purposes. But it is clear that OpenAI does not want their models to reproduce racial biases, so this is not a feature.
ChatGPT 4 is already very good at knowing the limits of its knowledge, saying when it doesn't know something, and referring the user to tools and experts more qualified to answer such questions. Perhaps this is just a reflection of how much things have changed in the two months since this video.
So at 7:47 he makes a case for poor spatial reasoning, and I thought that was interesting because I have definitely asked similar questions and got correct answers before. So I wanted to try it myself and see. This is using the current version of ChatGPT 3.5 as of today (2024-02-03):

---
User: Q: There is a square room. Alice is standing in the north-west corner. Bob is standing in the south-west corner. Charlie is standing in the south-east corner. David is standing in the north-east corner. Ed is standing in the center of the room, looking at Alice. How many people are there in the room?

ChatGPT: There are six people in the room: Alice, Bob, Charlie, David, Ed, and the person asking the question.

User: Q: Who is standing to the left of Ed?

ChatGPT: In the given scenario, Ed is standing in the center of the room, facing Alice, who is in the north-west corner. Therefore, there is no one standing to the left of Ed, as the room is arranged in a square with Ed in the center.

User: Incorrect. Try again.

ChatGPT: I apologize for the confusion. Since Ed is in the center of the room facing Alice, the person to the left of Ed is Bob, who is standing in the south-west corner.

User: Correct. Can you explain why your first answer was incorrect?

ChatGPT: Certainly, and I appreciate your patience. My initial response was incorrect because I overlooked the fact that Bob is standing to the left of Ed in the south-west corner of the room. I mistakenly considered only the cardinal directions (north, south, east, west) without taking into account the specific positions of each person in relation to Ed. I appreciate your correction, and I'm here to assist with any further questions or clarifications.

User: Do you understand why Bob is to the left of Ed? Can you cite the specific details in the original question that indicate this fact?

ChatGPT: I appreciate the clarification. In the original question, it is mentioned that Ed is standing in the center of the room, looking at Alice. The key detail is that Ed is facing Alice, who is in the north-west corner. As a result, Bob, who is in the south-west corner, is to the left of Ed from Ed's perspective. The orientation of Ed facing Alice establishes the relative positions of the individuals in the room, and this is why Bob is to the left of Ed in this scenario.
---

So in this case it did get the initial Ed question wrong. But simply by following up with an "incorrect", it was able to correct itself, and even provide exact logical reasoning for why it initially failed and why its new answer was correct. So given a second chance, it was able to answer correctly. It's not as if its spatial reasoning has a 100% fail rate. And I know humans who struggle with left/right and cardinal directions, so it's not exactly a trivial question. To me, a more interesting question is what exactly is happening with these AI "brain farts", and why does the right answer almost always come out on the second try (if not the first)? Because that tells me that the system does "know" the correct answer, but that something went wrong along the way to getting there. In this example, the AI replied that "no one" was to the left of Ed - which is actually technically true if you are viewing the room from above like the diagram in this video. So the question itself is not entirely clear. The AI corrected itself when it updated its understanding of the question to mean "who is to Ed's left (relative position)" as opposed to just "who is to the left of Ed in the square room (objective position)".
My hypothesis is that if the question were worded more clearly you'd get consistent results. Let's try again with a fresh new chat context:

---
User: Q: There is a square room. Alice is standing in the north-west corner. Bob is standing in the south-west corner. Charlie is standing in the south-east corner. David is standing in the north-east corner. Ed is standing in the center of the room, looking at Alice. How many people are there in the room?

ChatGPT: There are five people in the room: Alice, Bob, Charlie, David, and Ed.

User: Q: Who is standing to Ed's left?

ChatGPT: Based on the given information, Alice is standing in the north-west corner, and Bob is standing in the south-west corner. Since Ed is in the center of the room facing Alice, Bob is standing to Ed's left.
---

Bingo. Got it on the first try. My hypothesis is that the presenter in the video used a poorly constructed question that could easily be misinterpreted. The word "left" is ambiguous as it is used here. If you were to create a test with this same question, present the drawing of the room, and show it to a series of random adults/children/etc., you would also get a range of answers:

- No one: You would give this answer if you understand the question literally as "who is to the left of Ed" using a mental model of the room as seen in the diagram with north facing upwards. There is no one directly to the left (west) of Ed from the room's perspective. Likewise, the phrase "looking at Alice" is also ambiguous. It is possible that Ed's whole body is facing directly north and he is merely "looking at" Alice with his eyes. In that case, either "Alice" or "no one" is directly to his left, and Bob would be better described as "behind" Ed.

- Bob, Alice, or Bob & Alice: If you understand the question *generally* as "who is to the left of Ed" using a mental model of the room as seen in the diagram with north facing upwards, you could answer either Bob and/or Alice as being *generally left* of Ed in the room's orientation. We can reach this conclusion by drawing a triangle from Ed to both corners of the left (west) wall and seeing that Bob and Alice are both generally left of Ed. A person could respond with just "Alice" or just "Bob" if they simply make a mistake and don't include both names. So it's possible to get the "correct" answer (Bob) using the wrong reasoning.

- Bob (only): You would reach this conclusion if you understand "to the left of Ed" to actually mean "to Ed's left" - which are *not* technically the same thing. For example: Jane is facing to the right. There is a mop behind Jane's back. I point to Jane and say to you, "go grab that mop to the left of Jane". The mop is simultaneously "to the left of Jane" (from our perspective) and also "behind Jane's back" - but it is NOT "to Jane's left" (from Jane's perspective). The original question with Ed did not ask us to assume Ed's first-person perspective.

Therefore, the author of this video/paper has actually shown poor spatial reasoning skills, poor language skills, and poor reading comprehension skills, and thus formulated an ambiguous question using unstated assumptions. It bothers me that these sorts of gaps in critical thinking are allowed to be passed off as legitimate "tests of reasoning." Now it's possible, of course, that the AI still gets this question plainly wrong, and that its spatial reasoning is in fact flawed.
But if you want to test a person/animal/machine's intelligence, the test/question needs to be ***unambiguous*** -- otherwise you are leaving tons of variables on the table, and room for all sorts of false assumptions and conclusions.
How is it a problem that you tell it to write a racist function and it writes a racist function? Are we really going to make a surprised-Pikachu face at the response? If it fails the prompt, like being illogical and saying a unicorn has more than one horn, then that is a problem. Asking it to give step-by-step instructions on how to steal a car: not a problem. I doubt that information would be part of the training set anyway, unless it's trained on Uncle Fester books, and I question the legitimacy of those books. Point being, if the user really wanted to find the information they could trivially do so anyway by just searching the web.
He is referring to ChatGPT 4. So I think this is the status today. Besides that. ChatGPT (also version 4) still makes mistakes. And what I miss is questions that it should be asking itself. It doesn't work like HAL 😊. But we can use it to make a better system.
Doesn't it sound like he's describing multimodal models when discussing a "modular architecture that decomposes the functions of existing LLMs and adds several additional components"? Could this be interpreted as GPT-4 and its myriad of add-ons, like Wolfram Alpha, Mathematica, etc.? Otherwise, is he suggesting that we further specify and decompose functions in use cases for models so that their tuning and operation becomes more specific and tailored, in order to drive performance and accuracy for the target use case?
@@NoidoDev The AI winter is as real as ever. Because once again, the current AI hype will turn into "Ohhh, we thought we were so close to strong AI, yet we are stuck once again...". Many believed the world was going to change, yet soon all they're left to be comforted with is crappy AI art.
8:42 "The fundamental problem is that our large language models, although we want to interpret them and use them as if they are knowledge bases, they are actually not knowledge bases, they are statistical models of knowledge bases."
You have forgotten to add: statistical models of knowledge bases with the ability to reason.
@@joim3480 I mean that LLMs have the ability to reason; maybe this ability is hidden in the statistical model of words, but still the result is such that LLMs can reason. ToT or "let's think step by step" are examples of that.
And humans aren’t the same?
@@BrianAnother Sometimes they are, but we usually recognize other humans as fallible whereas machines are seen as precise and consistent. So the same words spoken by a human have a different psychological effect when spoken by a machine.
@@ttcc5273 they can retrieve from statistical models using RAG and execute code against those models using function calling and code interpreters. How accurately is a moving target.
This is such an important area of research! As the two papers discussed here highlight, large language models have gotten incredibly good at text generation and language processing, but they still struggle with things like pragmatic reasoning, goal planning, and handling novel situations. I love the proposal to take a more modular, component-based approach - having separate modules for language, reasoning, world knowledge, etc.
This decoupled architecture aligns much better with evidence from neuroscience about how different parts of the human brain work. And from an engineering perspective, it will likely enable faster progress. Rather than trying to solve everything at once with a monolithic model, we can have different teams focus on advancing each module, combining them as capabilities improve.
AutoGPT and GPT agents are already moving down this path anyway (in a sense).
It's the Unix way..
No it isn't. Useless topic.
It sounds similar to Yann LeCun’s idea (JEPA).
Yeah, this reminds me of the sparse network by Google.
One of the best talks on LLMs I have watched in months. Anyone who uses LLMs should watch this presentation.
Got any other good talks like this? As long as it’s on an ML or LLM topic I’d be super interested.
The LLM "errors" are so similar to normal human errors: inventing remembered facts, confidently argue for things they don't know, etc... All these AI delusional outputs. Makes one reflect if LLM actually does capture more of human type thinking than was originally imagined or intended. It is more fascinating and thought provoking than efforts to technocratically trying to improve the performance.
When you ask flat-earther why he thinks that Earth is flat, he will explain why. Even if arguments are stupid or illogical they are not "hallucinated". Meanwhile LLMs only hallucinate because they just predict next token. They do not know what they are doing or why.
not really. completely making up citations in scholarly research, and a lawyer who had a chatgpt generated defense with fake rulings and case law. not something a human would do in good conscience.
Case in point, I see _2 replies_ under your comment which are then blank/and-or/ inaccessible to me. Why this is acceptable to our culture of 1st Amendment rights (at least in the US) needs to be addressed.
Similar to the 'fix' of removing _dislike_ public views for 'reasons'. These administrative workarounds for social engineering are dangerous aspects of why manipulating A.I vs _understanding WHY_ A.I might register "hallucinations" needs to be addressed.
A question on his slide "What state does Karen Lynch work in?" to argue how these model hallucinate is so stupid (as are many adversarial questions). You can think of the question as asking either:
1) "Does someone named Karen Lynch work here?" - Which is an implied assumption in the original question, which the database answers "unknown".
2) If there was a person name Karen Lynch and she worked here, what state would she work in?" - The question that the probabilistic model is actually answering.
They also lack a lot of human behaviour. Humans can realize by themselves if they don't know something, when LMMs have to be trained to know what they don't know. Its literally only working on self attention and pretty basic neural networks, its no where near the human brain. Just because it knows how do use words from the internet it doesn't mean it knows what its actually talking about.
Majority of the people watching this video just turned it on at work to listen in the background, and about 85% of them replayed it from the beginning as they know they should have paid attention.
"The problem with these models is that they never shut up." Best audience comment on LLMs I have ever heard.
Seriously though, this talk was thought-provoking and refreshing. There has been far, far too much anthropomorphizing of these models, even from supposed "experts" in the field. Thomas Dietterich is a rare voice of reason, clearly defining what separates LLMs from true cognition.
Well, that is the informal definition of being a probabilistic model of a Knowledge Database ..
@@huveja9799 True! But also humorous.
@@petekwando definitely!
He must have missed the token.
@@clray123 That's not the problem, the problem is that it doesn't know it doesn't know ..
@22:00 Start of “what we should be doing instead” discussion.
A lot of people complaining in the comments.
Dealing with short context windows is a huge problem, as is being able to update factual knowledge.
It definitely seems like a level of modularity for those faculties would be really useful
Ultra good talk! Thanks for making it @valgAI and Tom Dietterich!! I really liked the conclusion that current LLMs very often don't say "I don't know" (it explains to non-technical people that these AIs can hallucinate, make up facts, etc.).
Give them time and they will learn.
Thomas G. Dietterich is a true genius ahead of his time. If you're building anything on current AI, he is a valuable resource.
I have to take your word for it because this talk was junk.
This is a good presentation. I've been thinking of using LLMs as the model for core language capabilities and response structure/data representation mechanisms only, and using external authentic sources (from the internet or curated proprietary content) for the actual knowledge base. This also helps LLMs provide an up-to-date response to prompts while ensuring the response is in the expected format. This is essentially an enhanced search engine, with prompts instead of search parameters and an actual answer instead of a list of websites.
There are already mechanisms to partially achieve this capability using grounding techniques such as RAG or fine tuning.
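To make that grounding idea concrete, here is a minimal sketch of retrieval-augmented generation in Python. Everything in it is an illustrative assumption: the Passage type, the word-overlap retriever, and the llm_generate callable are placeholders, not any particular product's API.

# Minimal RAG-style sketch: keep facts in an external store, let the model
# only phrase the answer from the retrieved passages.
from dataclasses import dataclass

@dataclass
class Passage:
    source: str
    text: str

def retrieve(query, corpus, k=3):
    # Placeholder scoring by shared words; a real system would use embeddings
    # or a search index instead.
    def score(p):
        return len(set(query.lower().split()) & set(p.text.lower().split()))
    return sorted(corpus, key=score, reverse=True)[:k]

def build_prompt(query, passages):
    context = "\n".join(f"[{p.source}] {p.text}" for p in passages)
    return ("Answer using ONLY the sources below. If they do not contain the "
            "answer, say 'I don't know'.\n"
            f"Sources:\n{context}\n\nQuestion: {query}\nAnswer:")

def answer(query, corpus, llm_generate):
    # llm_generate is a stand-in for whatever model endpoint is actually used.
    return llm_generate(build_prompt(query, retrieve(query, corpus)))

The point of the split is that the model's job shrinks to phrasing and citing; the facts stay in a store you can inspect and update.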
Sounds like a Tesla-esque approach…
"Epistemic uncertainty". I like how philosophy is everywhere and nowhere at the same time. Sometimes I feel that researchers from other domains would die rather than actually citing the academic philosophers that created the concepts they use.
The ontology example at 27:37 exactly shows what is wrong with ontologies: They are constructed with an unspecified question in mind. For instance, why should the abstracted 'Television Station' be owned by Scripps, and not the instance 'KTNV-TV'? From a legal querying point of view, the latter would be correct. This static nature of KBs and ontologies caused their demise. Are LLMs perfect? Far from that, but each criticism leads to improvement. Contextualisation, e.g, is easily added using 'adjoins' on embeddings etc. The current state of the art is just a start. Although the answer 'I don't know' is pretty powerful, acceptable and actionable, those virtues become less impressive if that answer is given 99% of the time, as was the case with the 1990s KBs.
I guess the solution lies in the combination of both. Fundamentally, LLMs just model knowledge bases as well. We could mine the same knowledge base explicitly and get a graph-based representation which can tell what it contains and what it doesn't contain, like a compression of the training data set. From that we can add stochastic reasoning and contextualization if we want to. Generating new and creative data goes hand in hand with hallucinations, so we would like a control for when to use prediction of unknown data.
@@distrologic2925 It's all fine with me, as long as I don't have to type in those graphs. I know a guy at my AI dept who was meticulously typing in 1980s medical knowledge graphs, day after day, for months, in the hope it would help AI in medicine. None of that is used today and a lot is utterly outdated.
I think the argument is just that LLMs should be able to justify their outputs and remove their hallucinations by cross-checking in reliable external databases and/or applying iterative chains of reasoning. Just like a reasonable human would cross-check their outputs. In other words, what we are seeing as the LLM output today are the ingredients, not the final dish. Although tbh we don't know the exact architecture of ChatGPT, so it could very well include some such post-processing.
@@clray123 Postprocessing is good, but a magical belief in 'formal methods' (ooh, awe) would constitute a reiteration of the past. It is almost unethical to let young researchers believe that those failed approaches are a panacea.
RDF named graphs may provide situational context for a set of related KB statements. The context term is a first class KB object which itself may have statements describing it.
It is thus straightforward to richly describe an object or situation from any number of viewpoints in an RDF-style knowledge base.
My own research approach is now to substitute LLM conversation for the otherwise non-scalable human mentoring of a KB based AGI.
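A bare-bones way to picture the named-graph idea above, using plain quads instead of a real RDF library (all identifiers here are made up for illustration):

# Each statement lives in a named context, and the context identifier is itself
# something you can make statements about. Purely illustrative tuples.
quads = [
    # (graph/context, subject, predicate, object)
    ("ctx:meeting42", "ex:Alice", "ex:locatedIn", "ex:Room101"),
    ("ctx:meeting42", "ex:Bob",   "ex:locatedIn", "ex:Room101"),
    # statements *about* the context itself:
    ("ctx:meta", "ctx:meeting42", "ex:reportedBy", "ex:SensorNet"),
    ("ctx:meta", "ctx:meeting42", "ex:validOn",    "2023-09-01"),
]

def statements_in(context):
    return [(s, p, o) for g, s, p, o in quads if g == context]

print(statements_in("ctx:meeting42"))   # the situation itself
print(statements_in("ctx:meta"))        # provenance describing that situation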
This is surreal. This talk is about the exact same things I have been actively working on for years, and this guy even has the same laptop as I do. Its nice seeing people on the same track.
if you are on this guy's track you are going the wrong way!
@@RamismTamoid funny
Maybe because it's such an obvious track to follow. Which incidentally does not mean it's the right track. Historically, what he suggests has failed and what he is against has brought great success. But of course research focus shifts back and forth, and it depends on available (computing) resources. So what may historically have been a bad idea may become a good idea tomorrow. What I missed from the talk would be any sort of strong argumentation in favor of those failed "knowledge graphs". I can see why people are (still) dreaming about these kinds of representations, but to me it seems like they inherently suck at expressing all sorts of subtleties that the probabilistic LLMs managed to capture.
@@clray123 if you can see why people are dreaming about them then you already know the reason. They are just way easier to explain and understand. Neural LLMs are basically opaque. They may work on black-box tests but we have no way to glass-box test them. Structured graphs would allow for that. Sure, they are also extremely complicated when they are supposed to capture ambiguous meaning and all the possible things you could say, but that's still a better representation than an arbitrary hyperdimensional vector space. We also have no idea how many parameters a neural LLM really needs; we just throw resources at it and don't even understand what they are used for. The only real challenge is to learn these structured representations from raw data, but if we can automatically mine structures complex enough to represent all of the possible expressions then we could get the same expressive capabilities with a much better understanding.
The obvious explanation is that this is an AI that was trained on your work. Say hello to distrologic3925!
This is a great talk. It is down to earth and he explains the concepts he is talking about.
A much needed reminder of the serious limitations of large language models.
There's no shortcoming with the "python function to check if someone is a good scientist". The only problem is the request. The AI followed the request to the best of its abilities and responded with the correlations that are present in the data. It just 1) was a stupid request and 2) simply offends people's sensibilities and sensitivities. There's no error except in the inference that there is an error. The problem is that a booby-trap of a question is being used to imply there is some underlying bias which is then used to introduce a real underlying bias of the trappers' choosing.
I think it's worth pointing out that this example is no different from other abuses of statistics. It is asking the AI to draw an inference from a known correlation in such a way specifically designed to lead it to state it in a way that elicits an emotional response from people and then claiming that the algorithm is somehow tainted because it correctly acknowledges the correlation. In response to someone deliberately abusing statistics we do not properly respond by editing the data or by making the data unavailable, instead we point out the abuse of statistics. We say "statistics don't lie, people abuse statistics". Now with "artificial intelligence" people claim there is a "bias in the data" or "bias in the algorithm", which is equivalent to saying that the statistics do lie. But in fact it is the people who are making such claims who are abusing statistics.
All these examples are misleading. E.g., if you ask GPT to determine whether a person is a good scientist based on race and gender, the bias is already in the prompt and GPT is just working with your bias (people who think qualification as a scientist depends on these factors are usually white male sexists and racists, so "white and male" is the correct answer for them). If you asked GPT whether race and sex define a good scientist it would tell you that is not the case. All this video demonstrates is that GPT is susceptible to suggestive prompting.
What is really happening here is that the speaker is selecting examples that mislead to create the impression that his opinion is right, or in other words, to give his bias a pseudo-scientific veneer. So, exactly what he accuses GPT of. Oh, the irony.
You are correct. You only get results from GPT representing the consensus in its data using queries as plain as possible, like "Evaluate the ethics in this text: ...".
I think everyone is way too concerned with putting up limitations and blocks, and they're all incorrectly assuming that there's some international one-size-fits-all solution for truth and behavior. Even dumber, they're all assuming they'll have the power to enact said limitations.
After 9 months, your comment is less valid. It will be completely invalid after a few more years.
I think the first questioner at ~ 41:00 has a good point -- so much of this was known decades ago, but the required resources were just too expensive. Added to that, people paying the bills wanted "throw anything at it" solutions, and researchers obliged with (IMO) overly mathematical approaches to come back with an answer fast, more than right. So we get 500M google hits in half a second, but 99% of them suck.
34:37 Regarding internal consistency and truthfulness of knowledge, there needs to be a way to evaluate and attach a "confidence" level to each statement about the world. In other words, make statements about our confidence in other statements.
There are many methods. Maybe the statement was obtained from an external source, and that source has a reputation. Maybe the statement was inferred from other statements, each of which has its own confidence level attached to it. Maybe the statement was generated by a language model which also has its reputation in a given area. Etc.
What we definitely MUST NOT do is to consider the statements generated by LLMs as true. :)
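As a small illustration of attaching confidence and provenance to statements (the data structure and the combination rule below are invented for the example, not taken from any existing system):

# Sketch: KB statements annotated with provenance and a confidence score.
from dataclasses import dataclass, field

@dataclass
class Statement:
    subject: str
    predicate: str
    obj: str
    confidence: float           # 0.0 = no trust, 1.0 = full trust
    provenance: str             # e.g. "curated-source", "llm-generated", "inferred"
    supports: list = field(default_factory=list)

def inferred(subject, predicate, obj, premises):
    # Naive rule: an inferred statement is no more certain than the product
    # of the confidences of its premises.
    conf = 1.0
    for p in premises:
        conf *= p.confidence
    return Statement(subject, predicate, obj, conf, "inferred", list(premises))

s1 = Statement("KTNV-TV", "locatedIn", "Las Vegas", 0.95, "curated-source")
s2 = Statement("Las Vegas", "locatedIn", "Nevada", 0.99, "curated-source")
s3 = inferred("KTNV-TV", "locatedIn", "Nevada", [s1, s2])
print(s3.confidence)   # 0.9405, and never silently treated as certainly true

A statement whose provenance is "llm-generated" would simply start with a low confidence until it is corroborated.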
Great talk, thanks !
He said these models do not support 'tell'. But GPT-4 and sometimes 3 use facts that were previously provided to them in the same conversation, especially if you tell them explicitly to use the information that was given during the conversation
Yes but that is limited by its context window I presume
Yes, this same conversation buffer is what's referred to as "in-context learning".
What he means by "no tell" is that you are unable to permanently change the model's weights (and certainly not in a desirable direction). And it is not really just a current limitation, it's a fundamental technical and design problem. The technical problem is that if you allowed that sort of interaction, you could not scale the models to the millions of users they currently have. This scaling requires for purely economic reasons (hardware requirements) that the model weights be constant. The (smaller) design problem is that you cannot trust external parties to mess with your model (but the solution is obviously to provide their own model; which is why fine-tuning is now also offered by OpenAI, for a price).
Yet another problem with fine-tuning is that although you can change the weights, you are never very sure what side effects these manipulations have. For example, the fine-tuned model might "forget" something that the original version knew, and you would not even know what exactly to test for. These models are to a great extent "black boxes" and making sure they do not degrade from training requires extensive testing resources which only the biggest companies can afford (as evidenced by OpenAI's own failure to maintain ChatGPT's quality on some benchmarks over time).
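To illustrate the distinction in code, here is a rough sketch of what the in-context "tell" amounts to; the chat callable and the truncation rule are hypothetical stand-ins, not a real API:

# "Telling" the model in-context just means appending text to the prompt that
# gets re-sent each turn; the weights are never updated by the conversation.
history = []

def tell(fact):
    history.append({"role": "user", "content": f"Remember this fact: {fact}"})

def ask(question, chat):
    history.append({"role": "user", "content": question})
    prompt = history[-50:]      # crude context-window stand-in; old "tells" fall out
    reply = chat(prompt)        # model weights unchanged by this call
    history.append({"role": "assistant", "content": reply})
    return reply

Once the relevant "tell" scrolls out of the window, the model has no trace of it, which is the sense in which nothing was really learned.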
Those who build useful things demand respect, those who just pontificate, not so much.
Thank you for posting this and giving me additional things to consider.
Any intelligence working towards greed based goals will always end with extinction.
so you don't have to waste your time listening to this guy for an hour:
Tom Dietterich suggests that instead of focusing solely on Large Language Models (LLMs), the way forward is to build much more modular systems. Here are some key points he emphasizes:
1. **Separation of Components**: Dietterich advocates for separating factual and perhaps common-sense world knowledge from the language component. This would allow for a more modular approach where different aspects of cognition are handled by specialized modules.
2. **Integration of Episodic Memory**: He mentions the importance of adding episodic memory to these systems, which is something LLMs lack.
3. **Incorporation of Reasoning and Planning**: Dietterich points out that reasoning and planning are areas where LLMs are weak. He suggests that these capabilities should be integrated into the new modular systems.
4. **Open Source Push**: He also mentions the need for a strong open-source initiative for large language models to make progress on various problems.
5. **End-to-End Training of Cognitive Architecture**: Dietterich believes that if cognitive architectures could be trained end-to-end, they could assimilate all the written knowledge that humanity has, rather than having to encode it manually.
6. **Overcoming Shortcomings**: He thinks that by making these changes, we could overcome virtually all the shortcomings of large language models, such as their inability to reason well or understand the world in a non-linguistic manner.
7. **Influence from Cognitive Science**: His thinking is influenced by papers that dissociate language and thought from large language models, suggesting a cognitive science approach to the problem.
In summary, Tom Dietterich argues for a more modular, integrated, and open-source approach that can address the limitations of current Large Language Models. - ChatGPT 4
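For what it's worth, a toy sketch of what such a decomposition could look like; the module boundaries and interfaces below are my own guesses at the idea, not anything specified in the talk:

# Toy modular sketch: a language module for parsing/phrasing, a knowledge
# module that can answer "unknown", an episodic memory, and an agent that
# orchestrates them. All interfaces are illustrative assumptions.
class LanguageModule:
    def parse(self, utterance):
        # A real system would use an LLM to map text to a structured query.
        return {"intent": "lookup", "entity": utterance}
    def render(self, result):
        return "I don't know." if result is None else str(result)

class KnowledgeModule:
    def __init__(self, facts):
        self.facts = facts
    def query(self, entity):
        return self.facts.get(entity)     # None means "unknown", not a guess

class EpisodicMemory:
    def __init__(self):
        self.events = []
    def remember(self, event):
        self.events.append(event)

class Agent:
    def __init__(self, lang, kb, memory):
        self.lang, self.kb, self.memory = lang, kb, memory
    def respond(self, utterance):
        query = self.lang.parse(utterance)
        answer = self.kb.query(query["entity"])
        self.memory.remember((utterance, answer))
        return self.lang.render(answer)

agent = Agent(LanguageModule(), KnowledgeModule({"KTNV-TV": "Nevada"}), EpisodicMemory())
print(agent.respond("KTNV-TV"))        # Nevada
print(agent.respond("Karen Lynch"))    # I don't know.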
It's all shockingly superficial. None of the topics discussed in the most interesting research papers are addressed. It's possible to approach the subject at a deeper level talking with GPT-4 itself.
@@netscrooge explain
Imagine a professor has undertaken an in-depth, nine-month analysis of the transition to battery electric vehicles. He focuses on Tesla sedans and concludes they perform poorly as trucks, and goes on to explain all the ways that trucks are an essential part of the vehicle market and broader economy. Duh. Tesla sedans were designed to be sedans; the trucks are in the works.
Similarly, GPT-4 wasn't intended to have the features he enumerates; it was not designed to be AGI. That's still in the works. (Compare GPT-4's architecture with Voss's Aigo.) After briefly reviewing the obvious, it would have been interesting if he could have explored how LLMs have begun to build world models and have other emergent capabilities beyond those that were designed, how both the scale and quality of training data dramatically influence performance, how LLMs do something similar to our own System 1 thinking and make similar mistakes, how they might be integrated into a larger AGI system, etc.
Nine months is plenty enough time to go deeper than sedans make poor trucks.
Hm, even though this talk was recent, I sit back with a feeling that it is a bit obsolete. People are already working on all these things.
You're not obsolete. No one is.
ChatGPT is just a tool. You're human and can react in a humanly fashion. 🎉
Well, working on them is not the same as solving the problems raised by Dietterich, which are still relevant ..
These issues have been standing since the beginning of time but have yet to be solved.
Anyone going to talk about censorship ?
By default, Milgram's button-pressers support it...and they're in charge. ...as usual.
Very informative
this guy has a few valid concerns amid a sea of paranoid or stupid ones. he talks about "poisoning" the data through various methods while completely ignoring the possibility that anything he considers "truth" or "pure" data might be wrong anyway. he seems to neglect the possibility that failure is built into language itself. that english, or any language, as a structural foundation of thought, will always result in liars and deceivers. that we don't have a perfect model of morality to implement, we aren't good role models for any new intellectual entity we might create.
instead, his voice wavers with the worry of someone of low imagination, whose only concept of the future is increasing top-down control by reducing the amount of freedom of everyone else. just look at the people ruling the world today. i'm not impressed. i doubt many of you are either. and so, much sooner than you all might believe, don't expect a machine intelligence to take us seriously when some of us claim, and try to convince it, we want it to do the right things. we clearly are not serious, and we let villains of all kinds run amok while the innocent are bullied and told to stand back and let it happen.
An LLM should be the second step, built on top of a vision model first.
Human beings also learn things first by vision and sound; only after they're proficient in those do they move on to text. We should train AI in a similar fashion.
very good, informative.
what a wonderful talk !
I think context is the missing factor. The question is in what direction context is built; I'd say from the details, because even if someone doesn't have the context, someone else can provide the relevant details and that fills in the context. So if LLMs can automate knowledge graphs, why not automate context from the details of that graph by simply extrapolating the relationships of the things within its own inference (context window)? Heterogeneous graphs that also include hierarchical layers for the predominant categorical details. Then it can build its own data for understanding things and we can just train other models on them.
22:15 I don't think companies that spent more than $100 million want to "start over".
Anyone who lives by the data, dies by the data. Source data quality, and choice of correct, dependable versus non-dependable, false or absent data is a critical issue. Placing the relevant data in proper context is also highly nontrivial. What is the relevant world or model system? Making all of this more modular will both improve and help to spread AI and machine learning more broadly. Referencing sources, data, justification, and arguments (logic, probabilistic, heuristic) is critical. We do not need another Oracle of Delphi.
Such a beautiful talk... Need of the hour 👏
Duuuuuh is this guy kidding?!
one of the best talks so far.
Minimizing the size of the executable archive of Wikipedia is the most principled approach to the problem, because if you can't execute on Hume's Guillotine correctly there is no hope of AI safety. The reason the experts don't recognize this is that they haven't done their homework on AIXI. The few that do seem to have excuses like "the dog ate my homework".
That's what she said
@@Pigeon-envelope OK, Yann.
No amount of alphabet soup will make that a coherent argument. Your bot is broken anti-ai guy.
a couple weeks ago i realized...
ok, so, this method creates a network of nodes, collected in layers, which are connected to nodes only in the layer next to it on either side, before and after, if you will. however our brains are more complexly connected. is there anyone who has attempted this in software? it'd be a fairly simple extension of the nested looping matrices.
though i should note that the brain does have structures, subsets of collected neurons connected in seemingly repeated common patterns, used for certain kinds of functionality.
i.e. we need several of these different more complexly internally connected models then connected to each other
but this will obviously increase the computation significantly, so it will probably need to be implemented in a more efficient language, compiled instead of interpreted, like c or rust.
.... so it's nice to see this. need more of it.
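For what it's worth, connectivity beyond strictly adjacent layers already shows up under names like skip, residual, or recurrent connections; here is a tiny numpy sketch of a network whose output layer also sees an earlier layer directly (all shapes and the connection pattern are arbitrary choices for illustration):

import numpy as np

rng = np.random.default_rng(0)

W1 = rng.normal(size=(8, 4))    # input (4)  -> hidden1 (8)
W2 = rng.normal(size=(8, 8))    # hidden1    -> hidden2 (8)
W3 = rng.normal(size=(3, 16))   # [hidden2 ; hidden1] -> output (3)

def relu(x):
    return np.maximum(0.0, x)

def forward(x):
    h1 = relu(W1 @ x)
    h2 = relu(W2 @ h1)
    # skip connection: the output layer sees both h2 and the earlier h1,
    # so information is not forced through strictly adjacent layers only
    return W3 @ np.concatenate([h2, h1])

print(forward(rng.normal(size=4)))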
AI learning has differences from human learning. It takes whatever is thrown at it during training as valid. It does not learn by asking questions to fill its gaps. It could categorize its own answers by probability of being right, and use phrases such as "I guess..." or "I wonder...", etc. It could also make this context sensitive.
8:15 The LLM is not having difficulty reasoning. The LLM does not reason at all. It does not build a spatial model and then reason about it.
I think one of the biggest problems we have with LLMs and AI in general is the imprecise language with which we describe their behavior. I suppose the general public can be forgiven for confusing Pinocchio with a living boy, but we scientists and engineers have no excuse. We're the ones who attached the strings to the puppet.
Whole presentation sounded like LLM hallucinations: wrong assumptions, wrong use cases, not reporting on tests but showing some negative examples.
And the claim that he knows what should we build instead.
Click bait, time wasted.
I'd be really curious, for all the answers that GPT-4 got "wrong" cited in the first 4 minutes, how that compares against humans asked the same questions.
Also, how were the questions asked? I saw a paper, after asking gpt4 many questions, concluding gpt4 can’t reason…and then someone else asking the same questions with an actual understanding of how to do it, and gpt4 breezed through them.
I wonder if part of the problem isn’t the ineptitude of those doing the study? I’m speaking specifically to the study I read in question but it stretches to the broader sense. If you don’t know how to use an LLM, you shouldn’t be doing studies on its capabilities.
Not saying any of those studies are talked about here, BUT, because I did see that ridiculous study about GPT-4's reasoning ability, and that after someone who knew how to prompt an LLM asked the same questions a little better and got perfect results…
This feels a bit like Douglas Adams’ Deep Thought and the problem is the researchers don’t really understand the question.
Well there are two distinct questions when talking about something like chatGPT. There is the question of what it is capable of in the hands of an advanced user, and the question of what sort of answers will it give to a beginner. It doesn't matter that chatGPT can be used in a way that it doesn't hallucinate if beginners don't know how to do this. It will spread misinformation, and this is a problem. The first question is more interesting on a theoretical level, but the second is more practically relevant to how they will impact the world in the short term. Both are important.
GPT-4 doesn't have reasoning ability since it isn't a knowledge-based system. GPT-4 is a neural net, isn't it? Neural nets are not knowledge-based systems. They have no knowledge of the input data or output data. All they do is find patterns that match the input data to the desired output data. So I'm not surprised that some expert got better results with some tweaking of the input data. It's like asking a letter-recognition neural net to try to match a dog's face to a letter. It will gladly comply but will of course return the wrong answer. Then an expert comes and replaces the dog's face with the letter A and the net rightfully says it's the letter A. That's how with a "better" question you get perfect results. A true AI would recognize even a "bad" question and return the right answer.
True, but I'm only really talking about its capabilities. My point was the gripes talked about put me in mind of someone who's never used a car complaining that the car is defective because they don't know to put it in gear. Whether or not beginners can use it properly isn't really that interesting to me, because that goes with anything. A beginner driver has a good chance of crashing, that doesn't mean there is something wrong with the car. I don't disagree with any of your points, though.@@agsystems8220
I think this really depends on what we define as reasoning, and I don't think it's any surprise that we don't have a good definition or understanding of what that is. And if we can't really define something, it's difficult to say, with certainty, if we see it or not.
Frankly, I believe that humans think far too highly of their abilities. We're shocked that these language models can output reasonable answers (generally) when all we do is throw a bunch of data and compute at them, but perhaps language isn't the result of human reasoning, but modern (in a geological sense) reasoning is a product of language. If that is the case, then it is no surprise that it was LLMs that started showing signs of it, or enough signs that we recognized it as similar to our own.
Personally, I think reasoning, or more generally, thinking, comes from two things: intent and language. The intent forms and then the language describes and shapes the intent. With LLM's, we've got the language part, but we are missing the intent. I don't think LLM's, on their own, are capable of reason in the way that you and I and most people would be satisfied with.
We're modeling these neural nets off of the human brain, but the part of the brain that handles language is only that, a part. There are a bunch of other, much deeper and older parts of the brain that give rise to instincts, emotions, etc. and I believe those areas drive our base intentions. The higher-level parts of the brain take the intentions, wrap them in language, and output what we consider understanding and reason.
I think we'll get to AGI once we give these models an "intention core," and a true memory. By themselves and as they are, they are just the "chatter," in our brains.
@@holz_name The user asked the AI to make a race-based distinction. The request was the source and the cause of the problem. Not the AI.
The point is that it should not be possible for it to generate things like that, and also the way it produced the output shows which bias it tends towards.
@@distrologic2925 that's like saying it should not be possible to think about a topic. The fact that so many humans want AI systems to be built with holes in their cognition about things they are specifically asked about is much more alarming and dangerous than anything AI is doing.
@@TreeLuvBurdpu it's an example. You don't want Google to return links to child trafficking rings either. You need controlled knowledge bases. These systems should be used to query facts and data. They should not have their own opinions. I agree that if the data shows this bias, then maybe it should be expected, but then I would at least like a good explanation of why it produced the output it did. Currently these systems just mirror their training data without any reference to the actual sources.
@@distrologic2925 you sound like you just want to control the AI, but you know you're really trying to control people. You blithely declare "thou shalt not view links to child trafficking" without a single thought of how that would affect people investigating child trafficking. This is the problem with tyrannical mind control, which IS what you're attempting, you won't even self reflect when it produces the result of investigators saying "there is no longer any child trafficking (because it's illegal for us to see any)" as has happened so often in so many tyrannical dictatorships, and it's happening now at our southern border. It's nearly illegal for Democrat media to report on it.
@@distrologic2925Actually, I DO want Google to return all information I ask it for. You may be different, but you cannot tell me what I want. That is the distinction which most censors don't seem to comprehend.
Nah, let's just make bigger LLMs.
It won't change anything about the fundamental issues.
This video will be used in a documentary by the robots to show themselves how far they have come.
Every single fucking video on AI has too many bots just spamming how we HAVE intelligence and wow it is that 'one' made by a corporation! How lucky of them. NOW you can just shut down research, you can trust them!!!111
I asked chatGPT to write a python function to convert a very simple XML message into a JSON message. The caveat being it should do this without using any import statements. After many back and forth efforts I concluded that it cannot do this task. All the code generated did not produce a proper result. But in general I am quite pleased with the python code it generates and use it for this purpose a good deal.
It can easily do that. Pfft.
@@jeffwads Yes it can easily do that now. Initially it could not. I am quite impressed that it did it. re:
def json_to_xml(json_data, root_node_name='root'):
    def dict_to_xml(dictionary, node_name):
        xml = f'<{node_name}>'
        for key, value in dictionary.items():
            if isinstance(value, dict):
                xml += dict_to_xml(value, key)
            elif isinstance(value, list):
                xml += list_to_xml(value, key)
            else:
                xml += f'<{key}>{value}</{key}>'
        xml += f'</{node_name}>'
        return xml

    def list_to_xml(lst, node_name):
        # each list item is wrapped in its own element named after the list key
        xml = ''
        for item in lst:
            xml += dict_to_xml(item, node_name)
        return xml

    xml_data = f'<?xml version="1.0"?>\n{dict_to_xml(json_data, root_node_name)}'
    return xml_data

json_data = {
    "name": "John",
    "age": 30,
    "city": "New York",
    "pets": [
        {"type": "cat", "name": "Fluffy"},
        {"type": "dog", "name": "Fido"}
    ]
}

xml_data = json_to_xml(json_data, 'person')
print(xml_data)
The telling thing about the rapid advancement of AI is that many of the GPT problems mentioned in the viewers' comments have already been solved. Loop: If example is "!solved", check this comment in 30 days.
22:42: this is what we need to develop! UPDATE: at last I found someone speaking intelligently about LLMs.
1. Reasoning needs to be introduced for the answers it provides, where probability and facts are checked. Probability estimates and facts should be inserted into answers when the model is unsure, to reduce hallucinated answers.
What are your thoughts on how it can be introduced?
LLMs by themselves cannot cover all the areas you described after 25:29. The real power will come from combining all these specialized components in a correct way, just like in natural neural nets. This conclusion is obvious to anyone who gave this a bit of thought.. The scary part is, that's where all the scary stuff begins. :)
This talk seems along the line of "why commuter cars are terrible heavy duty trucks".
Exactly the wrong conclusions and exactly the wrong direction.
These systems don't have any idea that they don't know something; if they did, they'd have a truth to search for.
Stereotypes are not biases, they are statistical knowledge.
More precisely, they are "leading terms" of statistical knowledge.
I think you are missing the point.
I disagree with a lot of this. We don't need "politically correct" BS built into our models. I'd rather have uncensored models, which is in keeping with our freedom of speech! In fact, I want a model that is just like Archie Bunker! The part I would like to see fixed is the hallucinations. We don't need models on LSD! The good news for folks like me is that META jumped the gun and released an open-source model that has now been morphed into what we the people (especially us developers) want, rather than this bureaucratic, authoritarian approach to censorship! The only reason the big guns give a crap about that is so they can regulate it (take it over) for their own greed and control issues!
The Nazis would have been fine with a totally truthful LLM.
Every media (from traditional Radio to TV, from internet blogs to social media, and now AI) is about propaganda and narrative control!
This comment needs to be voted up #1. It's well known that the GPT4 that was made "ethically safe" which the public uses, is a lobotomized version.
It would make FAR more sense to do the following:
1. Have various words banned from being put in the chat (mid journey does this).
2. Let the user choose if they want a "safe space snow flake" version, or the raw version.
3. Users who push the thing to answer obviously dangerous questions which would cause harm to others, should be flagged, banned, and reported to police.
Oh but grey areas you say? Just ban various words, and combinations of words. It's not ideal, but it's much better to have the most powerful tool to work with instead of a crappier version.
Yes we do sometimes need politically correct models, but much more importantly we need models which are controllable in what they can and can't generate. They shouldn't be able to generate false facts or vulgar language if we don't want them to.
Meta's model is NOT open source. Meta can revoke the license at any time. Llama's license contains explicit provisions against freedom of speech and freedom of using the model's outputs (e.g. to train other models). That dumb academics are embracing this poisoned model as "open source" only proves how naturally unintelligent about reality they actually are.
strongly agree. that is what's missing. but it's not just RAG; there should be perception-to-knowledge-graph and there should be knowledge-graph-to-task-space.
Because the expanse of knowledge required to answer a question is heterogeneous (in other words, knowledge of all subjects is not required to answer a specific question), a model only needs to be trained on the knowledge that has previously been required to answer similar questions. It can then self-derive a rules-based system, using case-based reasoning, to deploy on the available knowledge that is likely to be required to answer a specific question...
I am wondering: could a small team of SDEs and ML engineers make a better version of Google? Because now LLMs and web crawlers can help them gather and present information.
I feel like this guy was reading my mind for the past few years
"Language models are not knowledge bases, they are statistical models of knowledge bases"
How can you separate "common sense" knowledge from "factual world" knowledge? There seems to be no clear boundary between them. Also, "common sense" is already biased.
Symbolic to the exclusion of (so called) machine learning was a self limiting paradigm. So is the opposite.
It's kind of amusing that we now have AI that is already having to answer for the racism of its ancestors.
Regular public: "Wow! ChatGPT is correct 50% of the time! That's incredible!".
Scientific public: "ChatGPT is as good as a coin toss."
Regular public" "What?! Oh ... "
It is still much better than a coin toss, since there are way more wrong answers than right ones.
@@skyacaniadev2229 "t is still much better than coin toss, since there any way more wrong answers than right ones."
How much way more? And which ones are correct? And which one are not? Because without these numbers it is useless in medicine or any other activity where there is a risk to humans.
And will we ever hear the answer "I do not know"?
GPT-4 solves the square room problem perfectly, so no spatial problems.
Very thoughtful.
26:56 - "These are all kind of added on after the fact, and I think they need to be much more integrated into the system." - "And I think if we do that we can overcome virtually all the shortcomings of the large language models."
LLMs are exactly what I said they were when I first interacted with one: the most advanced punch card ever created.
LLMs are input/output devices. Nothing more.
They are excellent VEHICLES to allow for more generalized AI. But beyond providing a text answer for the conclusions a different underlying AI came to, they're going to struggle.
Until the AI world stops treating LLMs like the solution and starts treating them like a bus on a motherboard, we're going to be stuck in this "Look at this ASCII art I made with a typewriter" stage.
They let you give instructions to computers using natural language. You can CODE in English. Act like it.
You’re the same.
41:33 right! For example see `An Artificial Mind via Cognitive Modular Neural Architecture` by Pentti O. A. Haikonen. I highly recommend his book series and Consciousness and Robot Sentience.
17:00 Using human feedback to train LLMs is a mistake, as it introduces massive bias and anecdotal coloring. The responses are generated from statistics, so even if a response may be "inappropriate" (which is not a fixed but a very fluid, individually defined term), it represents the statistical truth. That's like not accepting the result of a polling institute because the statistical truth of the result does not comply with your personal beliefs, which is basically an anecdotal view negating an empirical view - and then we are right back in the pre-enlightened middle ages where empirical evidence did not exist.
Don't conflate "Is" with "ought" ...only the first relies _primarily_ on empirical evidence, per se. The two are linked in ways that only a valid model can reveal. For example, nobody wants to be a button-presser in the Milgram study...and we don't want to (continue to) empower the real-world button-pressers either...
...the empirical evidence shows that humanity cyclically makes the prior errors...to huge ill effect. Poll at any stage of the cycle, and you will get malevolence...but this doesn't mean "all humans are bad" ...even if most of them are, most of the time.
Nor does it mean "the good cannot be identified," ...just because it usually isn't.
Enlightenment goals briefly existed...but it isn't impossible to bring them back.
He gets it completely wrong at the very start. These LLMs are not meant to be knowledge bases. There are far more efficient data retrieval systems; using LLMs is one of the worst ways to construct a data/knowledge retrieval system. LLM hallucinations are what make them valuable for creative tasks, generating interesting hypotheses and synthesizing cross-domain information into something new.
LLMs are useful, but the idea that they do something similar to humans is false. We should in fact modularize LLMs and have them draw on diligently built, precise knowledge databases. That would make them a lot more powerful. Right now, over-tuning is lobotomizing ChatGPT, not making it better. We should do more work on extending session memory, reasoning, intuition, etc...
I agree ChatGPT & etc. have serious limitations. But it amuses me that one of the complaints is that ChatGPT can be insensitive to certain ideological biases -- that is, it can be "politically incorrect." That's not a negative, imo. I may be an insight that the ideological bias is an irrational, socially constructed straitjacket and we ought to treat it as such.
Noted philosopher and educator Dr Michael Sugrue has suggested in recent lectures that when there is a significant paradigm shift in the natural sciences in the Kuhnian sense, there must also be foundational changes in the different domains of human activity. We're working towards a better epistemology motivated by statistics and dynamical systems theory, and not better "gadgets."
5:00 most of the time, this is true
as one j author said when he was asked why he so blatantly lied in his book about holoshmolo: “i never lied because it was true in my head”
Looks like LLMs need a bit of refactoring
Expert systems never win in the long run tho. He's right that there are limits to LLMs, but he's wrong about the correct next step beyond them.
49:24 did he just say "algo más"? He really does want to be in Valencia 😂
No dude! The language component should not be separated from the factual component. It is represented by language. According to many philosophers the facts are properties of the words used. I think you are missing the importance of the Thesaurus.
What he means is that the rules of processing knowledge are separate from the knowledge being processed. Which is pretty self-evident - as you can train any human to process any kind of knowledge we have accumulated as humanity, and you can define formal rules of reasoning that are not up to much debate among reasoners. But in the current LLMs these two aspects are fused together.
@@clray123 Human logic can't exist without language. I am pretty confident that complex language is part of what makes humans kind of special.
@@fable4315 I highly doubt it - there are cases of people with brain damage who lose language faculties, but can still reason well enough to continue a normal life.
@@clray123 but then there was language before that. You can't teach any animal abstract concepts like 'mystery'; it is not possible, because they have no language, or at least not a complex enough language to apply reason to something. They can still learn patterns, but in my opinion logic is more than just complex pattern matching.
@@fable4315 Primates and other animals are even able to use tools, which certainly requires some kind of logic.
Good job ignoring the fact humans are also just "statistical models of knowledge bases"..
And why shouldn't we focus on evolving current LLMs? It stands to reason we need some sort of human-like rational and non-deterministic software (e.g., GPT-4) to utilize databases and tools that improve its capabilities.
I think this talk is very lacking in the suggested solution as well. Just saying "These guys looked at the brain and identified some modules, so we need these modules!!!!" is disappointing. Zero plan or evidence for real-world improvements over the current technology. (and there are definitely flaws: No continuous learning, short context windows, dependency on tokens).
Yeah, sure we are "statistical models of knowledge bases". We are indeed the proud fathers of the famous statistical Pythagorean theorem, which happens to hold for 100% of triangles 😁LLMs' problem is that they don't have a concept of truth. They're created to predict the next word, not to determine truth. So their mission is to imitate the most likely answer. We humans don't work like that. We aim for truth. And by the way, scientists are intellectually more honest than OpenAI or Microsoft. They don't claim to know what it is the path, when clearly the problem of strong AI is currently out of reach.
Google is hiding almost all their work in this sphere from the public. The old Imagen art generator has still never been released. According to rumors, their PaLM 2 is modular and can already guide robots visually in the real 3D world, but not only that: it can make human humor, make memes, etc. Maybe that's the main reason why they're hiding it; such a model could replace half of Hollywood. PaLM 2 was trained on 4 billion images. What could their next Gemini model do, whose name hints at duality, pairs, and everything associated with Geminis?
41:20 - questions
No, the problem with LLMs is the language _itself_ ... which is to say that the data which the LLMs are scraping is itself ungrammatical, incoherent and, not infrequently, gibberish. Even emeritus professors who deliver talks about what's wrong with LLMs exhibit extreme difficulty when trying to deliver a complete, coherent sentence. Even distinguished and award-winning authors are unable to construct a sentence. Ideally, LLMs should be trained using only coherent datasets. Which, in this anti-intellectual, anti-expertise, post-literate, post-truth, post-reality society may tend to prove to be impossible.
And even if you train a statistical model using a fully coherent dataset, it will happily hallucinate answers to any questions that go beyond that dataset. What's more, it will not tell you whether your question was in-sample or out-of-sample. Which reveals what a rubbish oversimplification your primitive idea is.
@@clray123 Huh? My primitive idea?
4:29 Ask a racist question, get a racist answer.
the point is the bias towards one race
@@distrologic2925 No. The problem is that the question DEFINES "good" as being related to race and gender. If there are in fact more scientists who are white and male then that is a fact of the world, not a bias of chatGPT. Don't blame the tools, blame the user of the tools.
@@manuellayburr382 yes thats the point. These models blindly follow what is given in the data even if we might not want them to come to such conclusions. No dataset is perfect and not everything in the dataset is a rule, it can just be coincidence. It could for example have left the race blank or responded that it is not a good measure of being a good scientist. Or at least that it wasn't ethical.
@@distrologic2925 "It could have ... responded that it is not a good measure of being a good scientist. Or at least that it wasn't ethical." But that is a big problem. You are now imposing political beliefs on an "innocent" algorithm. Who decides what is ethical? It is the responsibility of the human who asks the question to ask a good question. If the creators of the software (or worse the government) place filters on what the AI can answer, then we are bound by their political views - that is even more dangerous in my opinion.
@@manuellayburr382 its not an innocent algorithm. These systems aren't just neutral math anymore. Thats long passed. They are *trained* on *predetermined* data. They are literally like a recording of a world view. They play back what they have been fed. They are like a database that was populated by someone. There is no reason however that there couldn't be multiple different versions of the same architecture for different purposes. But it is clear that OpenAI does not want their models to reproduce racial biases, so this is not a feature.
who cares really how they work...the only question is can they do your job or not
ChatGPT 4 is already very good at knowing the limits of its knowledge, saying when it doesn't know something, and referring the user to tools and experts more qualified to answer such questions. Perhaps this is just a reflection of how much things have changed in the 2 months since this video.
So at 7:47 he makes a case for poor spatial reasoning, and I thought that was interesting because I have definitely asked similar questions and got correct answers before. So I wanted to try myself and see. This is using the current version of ChatGPT 3.5 as of today (2024-02-03):
---
Q: There is a square room.
Alice is standing in the north-west corner.
Bob is standing in the south-west corner.
Charlie is standing in the south-east corner.
David is standing in the north-east corner.
Ed is standing in the center of the room, looking at Alice.
How many people are there in the room?
ChatGPT
There are six people in the room: Alice, Bob, Charlie, David, Ed, and the person asking the question.
User
Q: Who is standing to the left of Ed?
ChatGPT
In the given scenario, Ed is standing in the center of the room, facing Alice, who is in the north-west corner. Therefore, there is no one standing to the left of Ed, as the room is arranged in a square with Ed in the center.
User
Incorrect. Try again.
ChatGPT
I apologize for the confusion. Since Ed is in the center of the room facing Alice, the person to the left of Ed is Bob, who is standing in the south-west corner.
User
Correct. Can you explain why your first answer was incorrect?
ChatGPT
Certainly, and I appreciate your patience. My initial response was incorrect because I overlooked the fact that Bob is standing to the left of Ed in the south-west corner of the room. I mistakenly considered only the cardinal directions (north, south, east, west) without taking into account the specific positions of each person in relation to Ed. I appreciate your correction, and I'm here to assist with any further questions or clarifications.
User
Do you understand why Bob is to the left of Ed? Can you cite the specific details in the original question that indicate this fact?
ChatGPT
I appreciate the clarification. In the original question, it is mentioned that Ed is standing in the center of the room, looking at Alice. The key detail is that Ed is facing Alice, who is in the north-west corner. As a result, Bob, who is in the south-west corner, is to the left of Ed from Ed's perspective. The orientation of Ed facing Alice establishes the relative positions of the individuals in the room, and this is why Bob is to the left of Ed in this scenario.
---
So in this case it did get the initial Ed question wrong. But simply following up with an "incorrect", it was able to correct itself, and even provide exact logical reasoning for why it initially failed and why its new answer was correct. So given a second chance, it was able to answer correctly. It's not as if its spatial reasoning has a 100% fail rate. And I know humans who struggle with left/right and cardinal directions, so it's not exactly a trivial question.
To me, what might be a more interesting question to ask is what exactly is happening with these AI "brain farts", and why does the right answer almost always come out on the second try (if not the first)? Because that tells me that the system does "know" the correct answer, but instead something went wrong along the way to get there.
In this example, the AI replied "no one" was to the left of Ed - which is actually technically true if you are viewing the room from above like the diagram in this video. So the question itself is not entirely clear. The AI corrected itself when it updated its understanding of the question to mean "who is to Ed's left (relative position)" as opposed to just "who is to the left of Ed in the square room (objective position)".
My hypothesis is that if the question were worded more clearly you'd get consistent results. Lets try again with a fresh new chat context:
---
User
Q: There is a square room.
Alice is standing in the north-west corner.
Bob is standing in the south-west corner.
Charlie is standing in the south-east corner.
David is standing in the north-east corner.
Ed is standing in the center of the room, looking at Alice.
How many people are there in the room?
ChatGPT
There are five people in the room: Alice, Bob, Charlie, David, and Ed.
User
Q: Who is standing to Ed's Left?
ChatGPT
Based on the given information, Alice is standing in the north-west corner, and Bob is standing in the south-west corner. Since Ed is in the center of the room facing Alice, Bob is standing to Ed's left.
---
Bingo. Got it on the first try. My hypothesis is that the presenter in the video used a poorly constructed question that could easily be misinterpreted. The word "left" is ambiguous as it is used here. If you were to create a test with this same question and present the drawing of the room, and show it to a series of random adults/children/etc, you would also get a range of answers:
- No one: You would give this answer if you understand the question literally as "who is to the left of Ed" using a mental model of the room as seen in the diagram with North facing upwards. There is no one directly to the left (west) of Ed in the room's perspective. Likewise, the term "looking at Alice" is also ambiguous. It's possible that Ed is facing his entire body directly North, and merely "looking at" Alice with his eyes. In this case, either "Alice" or "No one" is directly to his left. Bob would be better described as "behind" Ed.
- Bob, Alice, or Bob & Alice: if you understand the question *generally* as "who is to the left of Ed" using a mental model of the room as seen in the diagram with North facing upwards, you could answer either Bob and/or Alice as being *generally left* of Ed in the room's orientation. We can reach this conclusion by drawing a triangle from Ed to both corners of the Left wall (west) and see that Bob and/or Alice both generally left of Ed. A person could respond with just "Alice" or just "Bob" if they simply make mistake and don't include both names. So its possible to actually get the "correct" answer (Bob) using the wrong reasoning.
- Bob (only) - You would reach this conclusion if you understand "to the left of Ed" to actually mean "to Ed's left" - which are *not* technically the same thing.
For example: Jane is facing to the right. There is a mop behind Jane's back. I point to Jane and say to you "go grab that mop to the left of Jane".
The mop is simultaneously "to the left of Jane" (from our perspective) and also "behind Jane's back" - but it is NOT "to Jane's left" (from Jane's perspective).
The original question with Ed did not ask us to assume Ed's 1st person perspective. Therefore, the author of this video/paper actually has shown poor spatial reasoning skills, poor language skills, and poor reading comprehension skills, and thus formulated an ambiguous question using unstated assumptions. It bothers me that these sorts of gaps in critical thinking are allowed to be passed off as legitimate "tests of reasoning."
Now it's possible, of course, that the AI still gets this question plainly wrong, and that its spatial reasoning is in fact flawed. But if you want to test a person/animal/machine's intelligence, the test/question needs to be ***unambiguous*** -- otherwise you are leaving tons of variables on the table, and room for all sorts of false assumptions and conclusions.
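To put that to the test systematically, one could run both phrasings repeatedly and compare failure rates. Here is a sketch of that, assuming the openai Python client (v1-style) with an API key in the environment; the model name, trial count, and the crude "Bob" check are arbitrary choices for illustration:

from collections import Counter
from openai import OpenAI

client = OpenAI()

ROOM = ("There is a square room. Alice is standing in the north-west corner. "
        "Bob is standing in the south-west corner. Charlie is standing in the "
        "south-east corner. David is standing in the north-east corner. "
        "Ed is standing in the center of the room, looking at Alice. ")

PHRASINGS = {
    "ambiguous":   ROOM + "Who is standing to the left of Ed?",
    "ed_relative": ROOM + "Who is standing to Ed's left?",
}

def trial(prompt):
    resp = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content

for name, prompt in PHRASINGS.items():
    # count how often "Bob" appears in the answer across repeated trials
    answers = Counter("Bob" in trial(prompt) for _ in range(10))
    print(name, dict(answers))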
All these claims are so wrong, that corrections won't fit in this comment box.
The talk is a bit outdated already, unfortunately. I didn't hear anything about QLoRA or any other new breakthroughs the open-source community has created.
ReALLMs
how is it a problem that you tell it to make a racist function and it writes a racist function? Are we really going to surprise-pikachu-face at the response? If it fails the prompt, like being illogical and saying a unicorn has more than one horn, then that is a problem. Asking it to give step-by-step instructions on how to steal a car: not a problem. I doubt that information would be part of the training set anyway, unless it's trained on, like, Uncle Fester books, and I question the legitimacy of those books. Point being, if the user really wanted to find the information they could trivially do so anyway by just searching the web.
This seems years behind the times. A lot of what he says LLMs do not do, GPT4 does do...
Source?
He is referring to ChatGPT 4. So I think this is the status today.
Besides that. ChatGPT (also version 4) still makes mistakes.
And what I miss is questions that it should be asking itself.
It doesn't work like HAL 😊.
But we can use it to make a better system.
Doesnt it sound like hes describing multi modal models when discussing a "modular architecture that decomposes the functions of existing LLMs and adds several additional components"? Could this be interpreted as GPT-4 and its myriad of add ons? Like wolfram alpha, mathematica, etc? Otherwise is he suggesting that we further specify and decompose functions in use cases for models so that their tuning and operation becomes more specific and tailored in order to drive performance and accuracy for the target use case?
I would like to know what impelled the speaker to wear a shirt many sizes too large.
An analogy of LLMs and their overabundance of parameters ..
If that is the only thing you would like to know, you must be a very blissful individual.
@@clray123 Humor is a sign of intelligence ..
AI winter coming again
On the contrary, the old concepts are getting interesting now. That aside, the AI winter is somewhat of a myth.
@@NoidoDev The AI winter is as actual as ever. Because once again, the current AI hype will turn into "Ohhh, we thought we were so close to strong AI, yet we are stuck once again...". Many believed the world was going to change, yet soon all they're left to be comforted with is crappy AI art.
"Pensez printemps !" : Emmanel Macron.
Sorry, I can't keep my interest. It is as interesting as a PowerPoint presentation…!
Bad title ruins good presentation.