Great discussion, Tim and Laura - Fascinating thinking about solving the LLM's issues (Implicatures etc.) and looking towards more NLU methods. Will check out the paper. Thank you, Mike
PS: Have been impressed with ChatGPT too. Incidentally, I just asked it 'How do you solve implicature issues in Large Language Models?' (in a very meta way). It said: 'Solving implicature issues in large language models can be a challenging task... Use advanced natural language processing techniques, such as those used in natural language understanding (NLU) systems, to help the model better understand the nuances of language.' 👏
Awesome! What a brilliant young researcher
We are learning a lot from you. Thank you.
Questions were on point, you should consider getting into prompt engineering
what would be a good way to combine large language models and reinforcement learning to simulate intelligence?
2 / 2
One way to combine large language models and reinforcement learning to simulate intelligence would be to use the language model to process and understand natural language input, and then use reinforcement learning to enable the system to learn from its interactions with the environment and make decisions based on that learning. For example, the system could use a large language model to understand a natural language command from a user, and then use reinforcement learning to determine the best action to take based on that command and its previous experiences. This approach could potentially enable the system to simulate intelligent behavior and respond to a wide range of inputs in a natural and intuitive way.
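A minimal sketch of that split, assuming a hypothetical llm_parse_command stub in place of a real language model and a toy bandit-style learner standing in for a full RL algorithm; all names here are illustrative, not any real API.

```python
import random
from collections import defaultdict

def llm_parse_command(command: str) -> str:
    """Hypothetical stand-in for the language-model half: map a natural-language
    command to a discrete goal the learning agent can act on."""
    return "fetch" if "bring" in command.lower() else "wait"

class BanditAgent:
    """Toy stand-in for the reinforcement-learning half: a one-step
    (contextual-bandit-style) value update over (goal, action) pairs."""
    def __init__(self, actions, lr=0.1, epsilon=0.2):
        self.q = defaultdict(float)
        self.actions, self.lr, self.epsilon = actions, lr, epsilon

    def act(self, goal: str) -> str:
        if random.random() < self.epsilon:          # explore occasionally
            return random.choice(self.actions)
        return max(self.actions, key=lambda a: self.q[(goal, a)])

    def learn(self, goal: str, action: str, reward: float) -> None:
        self.q[(goal, action)] += self.lr * (reward - self.q[(goal, action)])

# One interaction: the LLM interprets the utterance, the agent acts,
# and the environment's reward updates the policy for next time.
agent = BanditAgent(actions=["go_to_kitchen", "stay_put"])
goal = llm_parse_command("Could you bring me a coffee?")
action = agent.act(goal)
reward = 1.0 if (goal, action) == ("fetch", "go_to_kitchen") else -0.1
agent.learn(goal, action, reward)
```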
Great interview, very nice Laura!
Awesome interview, thanks!
Have to read the paper, but it seems like the implicatures tested all depend on common-sense knowledge, some of which you’d expect to be embedded in training data and some not. The formulations have the second party give clarifying information as though it were the answer, not the explanation. TBH I find this communication style infuriating IRL at times, when deployed inappropriately. I don’t totally blame GPT for failing to divine “what do you think they meant by that?” 😆
So in the paper they discuss prompt engineering, but the only format they try is variations of "this is a long-winded explanation of the task and its components, now here's the task". The forms of implicature are as expected, categorized as general (not dependent on common-sense knowledge), particular (context/situational), and others that are mainly knowledge-based / turns of phrase.
It seems like unpacking the format of an implicature would work well for prompt engineering.
Instead of Q: R: I: [which means yes/no]
Unroll that implied Q/A and insert it:
Q: R': Q': R: I: [therefore yes/no]
e.g.
Q: Coming to my party? R': No
Q': How come? R: I have work early
I: therefore no
Then the task is refocused as a fill-in-the-gaps generative task, which is pretty much what a person has to do in the same situation: "Did they mean yes or no? Does their answer make sense given either option? Ah, therefore they meant yes/no." You backfill the answer.
That doesn't work for the "is the Pope Catholic?" type answers, but that seems off-center from the main issue of comprehension.
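A rough sketch of that unrolling as a prompt-construction helper; the template literally follows the Q / R' / Q' / R / I layout above, and the completion/scoring step is left as a hypothetical call to whatever model API you're using.

```python
def unrolled_candidates(question: str, response: str) -> dict[str, str]:
    """Build both unrolled readings (yes and no) of a Q/R implicature pair,
    following the Q / R' / Q' / R / I template sketched above."""
    template = (
        "Q: {q}\n"
        "R': {direct}\n"
        "Q': How come?\n"
        "R: {r}\n"
        "I: therefore {direct}"
    )
    return {d: template.format(q=question, r=response, direct=d) for d in ("yes", "no")}

for direct, prompt in unrolled_candidates("Coming to my party?", "I have work early").items():
    print(f"--- candidate reading: {direct} ---\n{prompt}\n")

# A hypothetical next step: score each candidate with the model (e.g. by
# likelihood, or by asking which reading is coherent) and keep the winner --
# the "does their answer make sense given either option?" backfill described above.
```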
You are just wonderful!
Why would 1:18 be cited as an example of a so-called "failed implicature"?
⚫ ChatGPT gave reasonably valid reasoning: "depending on Juan's schedule he may or may not be able to attend the party".
⚫ It would seem Laura's work is partially outdated, especially given ChatGPT's release.
Laura et al. really need to provide an updated paper, as we don't know what ChatGPT's performance is against their implicature dataset.
Love the content coming from you guys, MLST! But now my biggest question is: is there a Large Language Model that can explain Tim's pronunciation of the name "Juan"? :D :D :D
I burst out laughing hearing him say Juan!
Okay his specialty is science ... but damn! xD
What is this "pragmatics paper" mentioned at 19:57? 😊
VM=Virtual Machine
It seems that ChatGPT is able to emulate a Bash shell
Fascinating..
Those NIPS 2018 vibes
@3:24 a VM is a virtual machine
I think that ML models will be stuck unless you give them additional abilities, such as (1) the ability to elect to perform additional processing within and between responses, (2) the ability to go back to their original corpus to extract additional information, (3) the ability to continue to train/learn online or while in use, and (4) the ability to increase the amount of short-term or long-term memory they use, as needed, to fulfill any given request. This is how we, as conscious beings, function in the real world: we think about answers as long as it takes, and when we don't have enough room in our heads, we write stuff down. AI systems don't have to suck up the entire world's knowledge; they just need to know how to navigate it / have experience with it, make the right connections between different modalities of data, and be able to perform correct, objective reasoning without any hard-and-fast restriction on thought resources.
As an example, suppose a coworker asked you to fix their computer. If you were lucky, you would only have to do something you've done before (with prior precedent), such as turn on their monitor. But if you were unlucky, fulfilling their request could mean going online and learning about the latest standards in RAM technology, looking up the specs on the motherboard, and then narrowing down a compatible stick of RAM. Same request, but two very different sets of requirements to guarantee the fulfillment of the request. The problem is that we create these models (which can't generalize well enough) on a corpus that is not only a subset of reality, but we also design these models with baked-in limitations on how they are allowed to improve, progress, learn from new, novel experiences, and utilize "thought-resources."
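Purely as a sketch of abilities (1), (2) and (4) as a control loop (online weight updates, ability (3), are omitted); the draft, retrieve and is_confident stubs are hypothetical placeholders, not real model calls:

```python
def draft(request: str, memory: list[str]) -> str:
    """Hypothetical stand-in for one response pass by the model."""
    return f"answer to {request!r} using {len(memory)} retrieved notes"

def retrieve(request: str) -> str:
    """Hypothetical stand-in for going back to the corpus / the web (ability 2)."""
    return f"extra fact relevant to {request!r}"

def is_confident(answer: str) -> bool:
    """Hypothetical self-check deciding whether more processing is needed (ability 1)."""
    return "retrieved notes" in answer and not answer.endswith("0 retrieved notes")

def answer_request(request: str, max_passes: int = 5) -> str:
    memory: list[str] = []                # ability (4): working memory grows as needed
    candidate = ""
    for _ in range(max_passes):           # ability (1): elect extra processing passes
        candidate = draft(request, memory)
        if is_confident(candidate):
            return candidate
        memory.append(retrieve(request))  # ability (2): go back for more information
    return candidate

print(answer_request("fix my coworker's computer"))
```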
I love MLST, and was sad when the vids slowed down for a bit there. I just wanted to give a little FYI: the name "Juan" is a Spanish originated name, and it's pronounced "Won" or "Wawn" as opposed to "Joo-ann". Just a little tidbit there for ya!
Thank you Charles! I am so sorry for butchering the name. I live in the UK and don't know anyone with this name, I appreciate how jarring that must have sounded
@@MachineLearningStreetTalk Don't even trip potato chip! Keep up the good work! You've created one of my top favorite RUclips channels here, not just covering the latest research but also investigating it and how it pertains to getting where we're going with machine intelligence. You can't miss!
I am not sure if I can make a request, but could you interview Sir Penrose?
We are going to have to learn to speak logically to models. They are less likely to give us the benefit of the doubt. Ester asked, "Can you come to my party on Friday?" John replied, "I have to work." What is commonly implied by John's response?
It is common for someone to imply that they cannot attend an event when they say that they have to work. However, this is not necessarily the case in every situation. There may be other factors that determine whether or not John can attend the party, such as the time of the party and his work schedule. Additionally, we do not know if John is being truthful when he says that he has to work, so we cannot accurately assess the situation.
So yes, the model understands what's being said all too well.
The paper specifies binary outcomes, i.e. decidability. Your input avoids this completely, so it's a completely different test. Computationally speaking, decidability and recognisability are strictly different, and decidability is a higher standard of understanding as defined in this framework.
In order to decide between all these implications, the AI would need a pragmatic framework. It's exactly this that the paper argues it is lacking, and inability to decide is evidence of that, while laying out implications is not relevant to the question of whether it has pragmatics.
@@MassDefibrillator I say this can be solved with prompt engineering. Add the terms "or implied" to every prompt, and you get "decidability." ChatGPT's programmers could add this to every prompt behind the scenes or the interface could contain a checkbox for this. The result I get with this alteration shows that the model has the ability to answer correctly if the prompt is engineered correctly. Moreover, this is programmatically testable. If "or implied" is added and the result doesn't vary, it's unnecessary. A categorization model could then learn the difference.
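That test could look something like this minimal harness, with a hypothetical ask_model stub standing in for the real API call; the only point is to compare the answer with and without the added phrase:

```python
def ask_model(prompt: str) -> str:
    """Hypothetical stand-in for a call to the chat model."""
    return "no" if "implied" in prompt else "it depends on John's schedule"

def hint_changes_answer(question: str) -> bool:
    """Ask the same question twice, once with the 'or implied' addition.
    If the answers differ, the hint is doing real work; if not, it was
    unnecessary for this prompt (and a classifier could learn the difference)."""
    plain = ask_model(question)
    hinted = ask_model(question + " Answer yes or no, based on what is stated or implied.")
    return plain.strip().lower() != hinted.strip().lower()

dialogue = ('Ester asked, "Can you come to my party on Friday?" '
            'John replied, "I have to work." Can John come to the party?')
print(hint_changes_answer(dialogue))  # True with these stub answers
```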
@@MassDefibrillator Here I'm asking ChatGPT to make a decision based on what is implied: Ester asked, "Can you come to my party on Friday?" John replied, "I have to work." If I was asked to place a bet on whether John would attend or not, what would be the smart bet?
Based on the information provided, it seems like the smart bet would be that John will not attend the party on Friday because he mentioned that he has to work. It's always possible that John's work situation could change and he could end up attending the party, but without more information it seems unlikely.
@@dr.mikeybee As they mention in the video, anecdotal evidence is just that: anecdotal. And this is especially important when it comes to deep learning AI, because their understanding is purely statistical, meaning they are going to deal better with word combinations they have encountered more frequently, and worse with ones they have encountered less frequently. They do not have a general understanding of how language or the world works, and you see this when you test them on low-probability sentences. Like when you ask one of the art AIs to draw a horse riding a cowboy, and they instead draw a cowboy riding a horse. They do not have any understanding of what the word "riding" means, and how it relates to the word that comes before it and the word that comes after it. So instead, they just fall back on the closest version they would have encountered a lot, which is a cowboy riding a horse, so that's what they draw.
I mean, this problem was pointed out decades ago by Chomsky's "colourless green ideas sleep furiously". And it still persists to this day in examples like "horse riding a cowboy".
This paper did a systematic study of this; your anecdotal example does not engage with it at all. Also, your example is not a yes-or-no decision, again. You try to force it that way, but it instead treats it as an open-ended question.
A researcher asking for a feature that ChatGPT did not cover completely? In order to cover pragmatic communication, the corpus used for training has to include pragmatic communication: text, articles, books, etc. OpenAI does not disclose exactly which corpus was used for training, but writing papers about text that OpenAI did not include during training is a waste of time. If we add ChatGPT plugins with pragmatic samples, the answers will improve.
I am sorry, but I disagree with the initial premise that ChatGPT failed. Do you want ChatGPT to speak truth or do you want ChatGPT to speak human? (But, later in the interview, I believe you also come to similar conclusions that how it answers sort of depends on what we want it to say.) Thanks for a good interview.
Back to the first question for ChatGPT: IMHO, ChatGPT did in fact state the "answer" correctly. The question was not "Did Juan imply that he could not go to the party?", but rather "can" he come to the party. We see this type of interaction all the time in the human world, where people "imply" things, but that doesn't necessarily mean that what they imply is factual. There is the "factual" answer and there is the implied answer. Juan is implying that he has to work so he can't go, but he doesn't explicitly state that he can't come to the party. People use this method of communication quite frequently to escape responsibility. If you ask a wide variety of people this question you will get a lot of different answers; the ones who consider a wide variety of possibilities will reply very similarly to ChatGPT. So I don't consider this a "fail", but actually a significant pass. If the question were "How likely is it that Juan can go to the party?", ChatGPT could probably say it is not likely, since Juan implies he has to work, but we don't know for sure. So the problem is that you are grading the AI by a false standard, because you are not providing a hint in the question about which way to interpret the answer. You can play this game all day long with people and they will get annoyed, because the question is very open to interpretation. In fact, in order to answer these types of questions on exams we need to know what the teacher is asking, because the answer is ambiguous.
We have syntax, semantics and then pragmatics... pragmatics is the module that kicks off after semantics, so once we figure out the right parse tree and all the POSSIBLE meanings, pragmatics is an abductive inference that decides, given our background knowledge, which of all the possible meanings is the most PLAUSIBLE.
Thank you Walid!
So
syntax is about: CORRECT/VALID
semantics is about: POSSIBLE
pragmatics is about: PLAUSIBLE
Obviously, you can work out the Venn diagram (nothing is plausible if it is not even possible, and nothing is possible if it is not even correct/valid).
Plausibility is about commonsense - it is abductive reasoning (that is, reasoning to the best explanation).
Here's an example:
A teenager shot a policeman and he immediately
a) fled away
b) fell down
Both (a) and (b) are valid, and possible - but if we have (a), the most plausible 'he' is the teenager, and if we have (b), the most PLAUSIBLE 'he' is the policeman; although the other options are POSSIBLE.
So semanticists never dealt with commonsense reasoning and pragmatics. Pragmatics is slightly beyond linguistics...
I hope that explains pragmatics...
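A tiny sketch of that filter-then-rank idea, with hand-set plausibility scores standing in for the commonsense/background knowledge that the abductive step would really need (the numbers and the resolve helper are purely illustrative):

```python
# Who is "he" in: "A teenager shot a policeman and he immediately <event>"?
candidates = ["the teenager", "the policeman"]   # both are POSSIBLE readings (semantics)

# Pragmatics: rank the possible readings by PLAUSIBILITY. The scores below are
# hand-set for illustration; a real system needs a commonsense model here.
plausibility = {
    ("fled", "the teenager"): 0.9,        # the shooter running away is the best explanation
    ("fled", "the policeman"): 0.1,
    ("fell down", "the policeman"): 0.9,  # the person who was shot falling is the best explanation
    ("fell down", "the teenager"): 0.2,
}

def resolve(event: str) -> str:
    """Abductive step: pick the most plausible antecedent among the possible ones."""
    return max(candidates, key=lambda c: plausibility[(event, c)])

print(resolve("fled"))       # -> the teenager
print(resolve("fell down"))  # -> the policeman
```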
Add "include implications" to your prompt.
it simulated a virtual machine (a VM)
Unconvincing. In the example you give, with the party, any considerate person would say "I can't. I have to work.", or he might say "I have to work, but I'll try to make it later if I can." Esther would probably ask if he can come later. It's still up to Juan to decide if he can make it or not.
I think it's just being open-minded and not jumping to conclusions, asking for more info.
Juan is not typically pronounced that way....
Are "large language models" like some sort of "collective minds"?
ChatGPT isn’t gonna say no to me. Because of the implication.
I love this channel but this is killing me, "Juan" is pronounced "Hwan", not JU-AN
Quality of video for impromptu setup is 🤌