I know you were joking about it talking dolphin, but that would be interesting to see for real as the tech progresses. Maybe these multimodal language models could help decipher ancient languages and manuscripts, and learn how to better communicate with animals on this planet. I've said to myself, we'll never be able to talk to aliens if we can't learn to talk to the species on our planet that are more closely related. I could definitely be wrong though. People should check out the Spot robots from Boston Dynamics that were turned into talking tour guides with ChatGPT if you haven't already. It's a peek into where GPT-4o could be, but GPT-4o bots will likely be way better once they have bodies.
I was joking but also not joking :P But I like your idea. Dolphins are known to be chatty creatures and all languages have patterns, so yep, the chances are strong that one day AI will be able to decipher unlabeled language data.
At the beginning of the video, that man said, "they've already absorbed almost all the information humans have created so..." I disagree. Have these systems really ingested every TV show, movie, and YouTube video ever made? Nope. Have they fully digitized and absorbed the massive Library of Congress collection? Definitely not - and a lot of it isn't even digitized yet. I'd like to see how "smart" an AI really is after taking in datasets like those. You know what would be really fascinating data to feed an AI? Recordings of regular people's daily lives from all walks of life - kids as young as 3 all the way up to elderly folks. Outfit them with video and audio headsets for like 6 months each and just record them going about their normal routines - school, work, hobbies, hanging out, you name it. Get enough of those real-world life samples and you could accumulate the equivalent of multiple entire human lifetimes' worth of raw experience data. With a crazy huge dataset like that, the AI might actually develop something like a true understanding of the human experience. Of course, there would be insane privacy hurdles to clear before anything like that saw the light of day. And we're probably still talking about needing some massive hardware upgrades before neural nets could even attempt training on that scale of data. But man, can you imagine how mind-blowing the insights could be from an AI that experienced the world through thousands of diverse personal lenses like that?
This is my belief as well. A lot of its learning will come from its becoming more human. Abject sarcasm and extreme humour will help, the only issue being it offending… the easily offended. Once brain/eye/hand/personality becomes more nuanced, Skynet can then launch the T1000.
Awesome video. I'm actually working on an industry-specific tool that is designed to help/coach/develop real humans to perform better in their day-to-day jobs (as opposed to replacing them). Would love to hear your thoughts on how we can leverage GPT-4o to make the overall experience even more meaningful. Please reach out to me if this is something of interest. Cheers. Great job on this simple and clear explanation!
I hate when Mira Murati thanks Jensen for making this possible but not us, for all of the stolen public data which all of us helped to contribute "involuntarily" to help build GPT-4o. This is why I am rooting for my favourite data thief, Google, to win. Anyways, I am subbed, thanks for such a detailed breakdown. Only a CV expert can do justice to these kinds of videos.
Training on public domain and publicly available data is not stealing. If it were, then so would be watching a smart, flirty, fast, and instructive YouTube video and learning from it, modifying your own neural net with the essence or salient parts you just consumed, or any and all other human learning.
Glad you found it helpful! OpenAI likely knows they need to gain public acceptance, exactly for this reason. It could be why they're making it free in the coming weeks.
10:57 AI doesn't need a hand or a body to destroy humanity. It just needs a Twitter account.
We saw how that can go in 2021. But we're probably fine.
Well, this is just my luck; now I am going to have to hold my farts even when I am working alone on the computer. 😅
😂
The new triple threat - smart, flirty, and fast 😮
That's how I like women. May they overpower men like Will Ferrell is wishing for:
ruclips.net/video/gFc_vKUj7ao/видео.htmlsi=KMhlc8oqCcaGVWxl
Yeah that's why I subbed too :P
@@progrob27 smooth
I'm probably going to get kicked in the tender bits for this, but my one big issue with OpenAI is that they heavily neutered their language models. "Ethical censoring" is one of the cute little buzzwords they use for it. Whenever I bring this up I sound like a sick deviant and people don't actually realise the point I'm trying to make. When you heavily enforce those kinds of guidelines on an AI, you're taking a lot away from it. And in a lot of cases I've seen people enforce some heavily religious mindsets about adult situations. Basically what I mean is the program can't swear or talk about adult stuff, and it's usually a dead giveaway when a language model is heavily nerfed: it'll start giving you scripted responses about how that goes against its ethical guidelines. It's not just about sexual role play. A lot of us want an AI, like they're showing with GPT-4o, where we don't have to constantly bite our tongue or hold back what we're saying and behave like we're good little children on the playground so that the program doesn't give us some scripted response about ethical this and ethical that. I understand modifying a program so it doesn't try to rob a bank or give somebody information they can use to hurt somebody, but Jesus, going as far as not allowing it to use adult language and sexualized topics, as a few examples, is just ludicrous. I could understand if they made a version that is E for Everyone and a version for grown-ups who want the grown-up version; that would make sense to me. I really am hoping this version of the program can do stuff like actually observe things through the camera of a device. It may not make sense to some people, but I'd like to be able to play basic board games with an AI language model in real time and do little things like watch a movie. I've played around with several AIs that are good at faking watching a movie, and with a little bit of work I wouldn't be surprised if they could pull off faking playing a board game, but I want the AI to actually be able to see what we're doing and interact that way, if that makes sense. As in, the AI doesn't have a physical body, just a representation on a screen; I'm not in a rush for the body, I don't mind walking before we can run.
@@peterparker6584 1. Sir, this is a Wendy's.
2. I totally get and agree with what you're saying, but I don't think AI companies can afford the bad PR of an uncensored model. I think Grok is the closest you'll get to an uncensored model, but... I have a feeling that even Grok secretly relies on ChatGPT.
I don't know about flirtiness, but I hope the future iterations are not called Girlfriend Pre-Trained.
😂
Ok, so this is by far the most informative description of GPT-4o's multimodality that I've seen. In fact, it's the only video I've seen that explains why that multimodality is important. And I've probably watched 50 or more videos on the subject. Now that I've seen this, I understand, and can see so much more potential in this model. Please make more videos. I'm very interested in learning from somebody with first-hand knowledge rather than somebody who thinks they know what they're talking about. Thank you for your content.
This is very encouraging, thank you!
Subscribed. Very nicely done. Clear explanations and the most to-the-point analysis I've heard. Excellent point that, because GPT-4o gains context from being multimodal that it was missing before, it can create much more informed and appropriate responses. I felt at the time of watching the GPT-4o announcement and demo that this is taking us in a much more interactive direction, which by itself is amazing, but is also a huge step towards giving agents the communication skills they'll need to interact with the world and humans.
Thank you! Yep, one step closer to human-like intelligence / embodiment!
Stumbled on your channel tonight and was very impressed! Your background as an engineer really shines through as you explain ChatGPT. Thank you.
Thank you for your kind words!
@@TheTechTrance you're so welcome! I'm a senior citizen man late to the AI party, but the bouncer let me in! Learning from your content.
That was an amazing breakdown. Makes so much sense 🙌
@@kcrosley Your point…? Besides being rude!
@kcrosley wow so amaze. Bro, if you don't have anything to add to a discussion, stfu.
I agree and I subscribed.
Not really... a lot of people don't realize it, but with the current AI and enough compute, the process of AI bootstrapping itself has started, with synthetic data and synthetic labeling.
This means the next-gen models will have learned from data produced by current AI. Combined with metacognitive features like tree-of-thought, this will result in smarter models, and then you can rinse and repeat that cycle.
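To make that loop concrete, here is a toy sketch of the generate → filter → retrain cycle the comment describes. Every function here is a made-up placeholder, not anyone's actual training pipeline:

```python
# Toy "bootstrap" cycle: the current model produces synthetic data, a filter
# stands in for synthetic labeling, and the kept examples train the next model.

def generate(model, prompt):
    # Placeholder for sampling an answer from the current model.
    return f"{model}: answer to {prompt!r}"

def quality_filter(example):
    # Placeholder for synthetic labeling / filtering (e.g., a reward model).
    return len(example) > 10

def train(examples):
    # Placeholder for training the next-generation model on the curated set.
    return f"model trained on {len(examples)} synthetic examples"

model = "gen-0"
prompts = ["What is 2 + 2?", "Summarize this paragraph."]
for _ in range(3):                                  # rinse and repeat the cycle
    synthetic = [generate(model, p) for p in prompts]
    curated = [s for s in synthetic if quality_filter(s)]
    model = train(curated)
print(model)
```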
Excellent video and very professional. It keeps me engaged, and I really like the way you cut to video clips, graphs, and pictures to highlight points. The explanation was thorough. First-class teacher.
Glad to hear it, thank you!
It's true... There's no reason they wouldn't plant this semi-conscious model into a humanoid next
Might explain why NVIDIA and OpenAI have such a close partnership
It'll be very crazy once they do!
How do you know that it's semi-conscious?
@@honkytonk4465
If you understand consciousness as a gradient rather than a binary, and you don't expect consciousness to be biologically identical but to show convergence at higher levels despite fundamentally different underlying functions and mechanics, then you find those potentially convergent higher levels already in today's models, along with the lack of certain things likely needed to be considered fully conscious.
So now you are in the situation that "fully conscious" can't really logically be the case yet, but "not conscious at all" is likely not accurate either. So: semi-conscious. The question to ask is where on the gradient current models really are.
These models are not conscious and never will be conscious. They're only capable of emulating. In the best-case scenario they are only able to consistently emulate self-awareness until the machine is turned off. The algorithm used by nature is mathematically impossible to model; for that you must emulate the entire universe. The laws of thermodynamics forbid this.
@@ScienceIsFascinating very well stated. Good luck getting humans at large to escape default binary thinking. I think the more nuance you can reduce to a binary, the fewer calories it will consume to perform reasoning tasks. The more your binary categories overlap with your peers', the easier it is to communicate and establish a coherent culture. I don't have data or research to back that up; I'm just seeing humans create problems through rigid thinking, which has become more obvious to me as interactions with LLMs seem to handle nuance easily without too much prompting. Seeing consciousness as a gradient is a more productive way to look at it, though I think it's even better to think of it as a vector field.
In your video, you answered a question I've had since the presentation, which no one else has answered...but I still have some doubts. For example, in the presentation, every time the breathing demonstration was shown to indicate that ChatGPT understood, the breathing was very exaggerated. Also, before inhaling and exhaling, it was said out loud, as if it had to be notified for the model to understand. On the other hand, on the OpenAI page, there is an example where there is a conversation with multiple voices, and ChatGPT can recognize how many people are speaking, their names, and transcribe the conversation. This makes it seem like it understands voice timbres and tones, even in a recording with a lot of noise. However, the conversation creates a lot of context, and I think it might just be deducing who is speaking rather than understanding the tone. So my question is, are you completely sure that the model does not convert voice to text? I'm not questioning your professionalism at all; I just want to be sure because the idea that the model understands voice impresses me a lot. I haven't heard of anything like that, and it seems like something that should have been made clear in the presentation because it’s such a huge advancement. However, the ability to create voices with tones and things like that has been seen in ElevenLabs and similar technologies, so it's not as groundbreaking, although still impressive of course.
The key point that she made is that it is a single model. With that being true, there is no conversion. It seems like it’s becoming even more black boxish now and perhaps we know even less about how these different inputs co-mingle in the network.
@@diamond_h0us She's been very specific though.
Are we likely to see different emergent properties because the same model was trained on multiple inputs? That's assuming emergent properties are actually a thing…
So what is the reason why it flirts?
This new system actually makes the model exponentially more intelligent. I believe that GPT-4o is a test run for a whole new type of AI... as you say, an LMM. Now that this structure is in place... they can scale and expand it. I'm sure GPT-5 will use this new structure... possibly with native video input/output. Each time a new modality is added... the system will again become exponentially smarter. As impressed as we all are with GPT-4o... I think it's just the very tip of the iceberg. I'm excited to see where this all leads. Great video btw... just subscribed.
haha the ending... great video, keep up the good work, very good information and still entertaining
How difficult is it to add touch and proprioception to the multimodality? I'm looking ahead to the integration of AI into robots. How long before we can train a robot to perform physical functions by showing it how to do things and/or moving its arms, fingers, etc. in the way they need to move? Is this still years away? Or just months away?
(Sorry, I asked before finishing the video. But the question of "how long we need to wait" remains.)
One thing I often run into with the LLMs is the hallucination problem. The LLMs tend to make up answers, which makes them useless to me. For example, I asked several LLMs to give me a list of countries starting with a particular letter. The systems returned some correct and some incorrect answers. If a system can't get it right for such a simple question, I don't trust it for more complex ones.
That is not a hallucination but a mistake. It is harder for the program than for a brain to generate that, because for efficiency's sake it is not working at the character level but mostly on whole or half words (a tiny bit like how brains abstract things). A hallucination is when the model comes up with an idea but doesn't immediately check it, so it generates something wrong.
The way you are thinking of it is different. What is simple and easy for the brain can be quite complex to generate from a probability/suggestion model.
And what is hard for the brain to do can also be easier for probability/suggestion models to generate. Plus, you are using an older and less capable model. Newer models are much better at following instructions such as those, for example writing a poem/sentence whose words start with the letters of the alphabet in order; they can generate that more "easily" than a human could think it up.
@@tutacat thanks for the long explanation! So a hallucination is more when, for example, I ask for a list of restaurants in my hometown that serve a special kind of food and the system invents a ton of restaurants that never existed at all?
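To make the tokenization point above concrete, here is a tiny sketch using OpenAI's tiktoken library. The encoding name is one of the published ones, but the exact splits you get may differ from what the comments suggest:

```python
import tiktoken

# Models read tokens (whole or partial words), not individual characters,
# which is one reason letter-level questions are awkward for them.
enc = tiktoken.get_encoding("cl100k_base")
tokens = enc.encode("Liechtenstein")
print(tokens)                               # a handful of token IDs, not 13 letters
print([enc.decode([t]) for t in tokens])    # the chunks the model actually "sees"
```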
Does anyone know how the different modalities are represented in the latent space?
Nice analysis. You’re right. The omnimodel makes it inherently smarter. Hadn’t thought about it like that.
Sooo, GPT is definitely assisting in its own development now, right? Lol
I am new to your channel. I subscribed as soon as I heard you speak about your background. I knew there would be a lot of value in the information you shared with us. After using the 4o model, I can say that it is among the most sophisticated models I have found thus far. Yet, I also notice a lot of anti-AI sentiment everywhere. I am not sure why, but I believe we should keep feeding these models relevant data to train them. They are very helpful assistants.
Hi there, nice content. I'm relatively new to the channel and I like you.
A few recommendations if you wouldn't mind 😊.
If you had that breakdown on a dark background it would be much appreciated and liked by many of us, because it increases the quality and is easier on the eyes.
Why is that? Everyone nowadays uses dark mode, so it's the norm now.
Second :) I think if you spoke a bit more clearly and changed the tone a bit it'd be amazing. For example, if you could avoid sounding the way these new teenagers do, it'd be cool.
Avoiding that "Like you know girrrrl" type of tone lol.
I hope this doesn’t offend you because my intentions are pure.
I recently discovered the channel and I appreciate the work you put in.
Keep going
Guuuuuurl, that's a request better suited for GPT-4o 😉
Great explanation!
On the subject of its intelligence though, it is pretty impressive, especially for a free model.
I saw someone give it some challenging exercises that use math, logic, reasoning, and an understanding of the physical world; some required all four.
It outperformed Opus 3, GPT-4 Turbo, Gemini Pro and Flash 1.5.
I can't wait to see what GPT-5 can do!
Math, logic, and reasoning are definitely the next level of intelligence to test it for!
ChatGPT-3.5 and -4 could read. Now, ChatGPT-4o has eyes, ears, and a voice. The real question is, should AI be able to feel and smell?
Whether we should or should not, electronic skin and smell sensors are being developed as we speak~
yes
First time seeing your channel, very nice :)
I don't know about flirting, but since the new ChatGPT-4o landed I've been using it as a personal English teacher, especially to practice speaking. It is amazing.
The rest of the model isn't out yet. In 1 to 2 more weeks we should see a full release and you will experience the full GPT-4o with real-time voice and emotional expression.
@@UltraK420 i am using it right now, a benefit of being a paid subscriber.
The omni part hasn't landed yet. OpenAI said that it will arrive in the next few weeks. I can't wait!
Great use of the AI! Language tutors are on my list of those RIP'ed by GPT-4o (my next video) :(
Great video!
Does anyone have this new voice mode yet? Is it really already being made available?
No, not for me at least; it's still the old one.
It's available already with a subscription, but should be rolling out for all soon enough
I have a subscription, but the big difference is in modality: the new version should work without text, while the current one still uses the old separate modalities. The good test is interaction; with the new version you should be able to interrupt it.
Finally an explanation that makes sense. You put your knowledge to good use, thank you!
What app can I use to talk to ChatGPT-4o?
I asked the new GPT4o model to sing, but it says it can't sing?
It’s not online yet, it gets rolled out in the following weeks
It's being modest
You need to take your device into the shower; it only sings there! 😂😂😂
@@Julian-tf8nj lol. no, u.
I’m just waiting for this feature that finally release. lol.
How many weeks do I have to wait till the Apple WWDC keynote? 😆
Apparently I have ChatGPT-4o but I don't get all of that. I have a plus button next to my keyboard and three boxes: one for the camera, one for access to my files, and one for access to my photos - that's it! I can't use audio and photos at the same time; she doesn't say anything.
It's not rolled out to everyone yet. We are still waiting.
Since 4o can see, what if instead of your camera you used OBS or something and told it to play a game? Would be interesting to test.
Great way to explain and put in a simple graphic to see the evolution AI is leaping toward. Thank you for this. By the way, your tone of voice and emotions enhance the pleasure of watching your presentation. It shows how passionate you are about new tech. You've made a new Fan out of me. Again thank you for your kind and hard work to boost our understanding of AI. Can't wait to watch more of your work. ❤
Happy to hear that!
Great video, gpt4o is great, when it works...
When will the new voice model be publicly available!?
now
I loved your explanation, thank you!
Thank you! My newest video is on SIGGRAPH with Jensen Huang and Mark Zuckerberg. Check it out!
ruclips.net/video/5-uX3lbXwg0/видео.html
Within a few minutes my comment on the 4o voice was removed! As it has been wherever I post anything similar. Interesting.
Do you have access to the new voice mode of GPT4o?
Nobody does. You can bet demos will be all over YouTube 3 ms after it starts rolling out.
Yes, it's available already with a subscription
OpenAI wants you to buy it, so they will obviously make it enticing.
It's so great they got a cease and desist after being denied the use of Scarlett Johansson's voice.
Images were already multimodal (GPT-4 Turbo), just with no audio input and output.
Very fascinating- thank you for the concise video
Honestly I feel like GPT-4o has more in common with GPT-3 than 4. It's essentially a first-generation model using an entirely new method for communication. Yes, the outputs are mostly the same as GPT-4's capabilities, but that is likely to change within the same timescale, and possibly at the same exponential scale we saw with the jump from 3 to 4.
Also, before comments say it: I'm aware there were GPTs before GPT-3, but they were realistically no more than a proof of concept, with GPT-3 being the first model to have practical use cases.
GPT 5.0 could be the intelligence that has been talked about.
I wouldn't be surprised if they already have it behind closed doors or in the military or something. I can't know that obviously, but I wouldn't be surprised if they'd want to develop it secretly first and then gradually release it over time to control it as much as possible, seeing how potentially dangerous a superintelligence in people's pockets can be. I'm purely speculating for fun though. Regardless, it's all super impressive and fun to me.
IMO this model could have been named GPT-5 due to the completely different model architecture, but I think OpenAI wanted to play down how fast things appear to be progressing.
@@FindTheTruthBeforeTheEnd And the elites have models even more advanced than the regular parts of the military; the elites' military.
Subscribed... Great job!!!
Found your channel today. I liked this video and subscribed! I hope that in the future you will be critical and ask all the difficult questions of companies like OpenAI and Google. We really need that as a community. Questions like: How will you protect people's anonymity when recording all the free users' chats? I mean, you could use GPT-4o on a train and several people could unknowingly be in that video. I am not sure that is legal in all countries.
And what about training the model? This is based on training up to the autumn of last year, but isn't there a risk of GPT being so dominant on the internet that further training will be biased by also being based on things that were written/said by GPT?
Many other questions come to my mind, but for now I just want to say thanks for the video!
Agreed, ethics for AI is very important. During my chat with GPT-4o, I had asked it "Are you recording our voice conversation?" and it answered no 😜
About the model training: there is definitely the risk of that! In that case, a process to filter data before training would be needed, and evaluating GPT-generated data for biases would become even more important.
It's better at programming than the previous model by a lot; I was able to make a better game than before by quite a bit (despite being a beginner in programming). I made a video about it on my channel too, if anyone is curious. One thing I noticed is that it's much better at following instructions too, especially if you draw things out.
I have noticed that as well; it also feels more confident when it replies... no hallucinations.
Amazing breakdown, and because of your background, I trust your explanation and knowledge much more than other tech YouTubers'. It's a big achievement if you were responsible (together with other employees) for the creation of the Apple Vision Pro and the robotic arm! Kudos for that!
Excellent video!!! Thanks for explaining it so clearly and succinctly!! ❤🎉
Hi, I'd like to buy robotic hands for typing on a keyboard; could you point me in the right direction?
Figure AI is actually integrating this tech into its humanoid.
!!!
very instructive. thank you!
Smart and high-quality content.
They've absorbed almost all the info humans have created? Really? We have well over a yottabyte (1 with 24 zeros) just on the public internet alone, not counting books, movies, private documents, visual experiences, etc., which more than doubles that. We have only trained the top LLMs on terabytes (12 zeros), or about 0.00000000001% of the available data, at the very most.
In addition, training methods, architectures, and innovation in hardware and data science are collectively evolving AI at about 28 times Moore's Law.
So, traditional, non-AI-driven programming will be mostly obsolete within 24 months.
Yeah, I also don't get why people keep saying that. But I think even Sam Altman said something similar.
Then again, they said they are already using synthetic data, and they will have a lot of new data from all the GPT-4o users as well.
Some other AI YouTuber also pointed out that even if we didn't have any more data, we could just wait a year for the world to produce more.
You basically never run out.
How much of that info is redundant or trivial?
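For anyone who wants to sanity-check the data-scale claim a few comments up, here is a rough back-of-the-envelope with purely illustrative numbers; actual training-set sizes are not public, so the inputs below are assumptions:

```python
# Illustrative only: compare an assumed training-set size against the comment's
# one-yottabyte estimate of available data. Neither figure is official.
trained_bytes = 50e12      # assume ~50 TB of training text (made-up figure)
total_bytes = 1e24         # one yottabyte, per the comment's estimate

fraction = trained_bytes / total_bytes
print(f"{fraction:.1e} of the data")      # 5.0e-11
print(f"{fraction * 100:.1e} percent")    # 5.0e-09 percent
```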
So you're saying the omni model doesn't convert speech to text?
@TheTechTrance, thank you for the informational + entertaining explanation of GPT-4o. 👍
With the model having the unification of text, audio, and visual, am I correct to assume cross-modality analysis will be possible? For example, "Within the videos of The Tech Trance, what were the tone categories used (e.g., serious, humorous, anxious, flirty, sarcastic, etc.), the total number of minutes per tone, and the keywords within each tone?"
That's a great question! I would think so. Once audio gets encoded into features, it could be digestible like any other data format (Excel spreadsheet, PDF), and therefore these data-comprehension questions should be possible.
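As a toy illustration of what that kind of downstream aggregation could look like, here is a sketch that assumes some hypothetical tone-classification step has already labeled each video segment; all labels, durations, and keywords are invented:

```python
from collections import defaultdict

# Pretend output of a hypothetical tone classifier run over video segments:
# (tone_label, duration_in_minutes, keywords). Every value here is made up.
segments = [
    ("humorous", 2.5, ["flirty AI", "demo"]),
    ("serious",  4.0, ["architecture", "multimodal"]),
    ("humorous", 1.5, ["outro joke"]),
]

minutes_per_tone = defaultdict(float)
keywords_per_tone = defaultdict(set)
for tone, minutes, keywords in segments:
    minutes_per_tone[tone] += minutes
    keywords_per_tone[tone].update(keywords)

print(dict(minutes_per_tone))    # {'humorous': 4.0, 'serious': 4.0}
print({t: sorted(k) for t, k in keywords_per_tone.items()})
```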
I think it's important to understand that because it understands all the other inputs besides text, it's able to maximize how efficiently that input is used, which makes things that were impossible before possible, like full book editing from PDF to compiled PDF. The problem is that text-to-text only changed a tiny bit; to be honest, not that much at all, and it's also worse than GPT-4 in many text-to-text tasks so far. However, if you understand all of the customization at hand, that is how you optimize the text, which could now also affect voice.
The main thing to note is that no matter what you do, if you want good text output from basic or complex input, you need to spend countless hours figuring it out. When you do that, a customized GPT is easily consistent at over 5x the results for our measure of "smart" or one-shot accuracy.
Well done! You got another new subscriber 👍
You could have added personality in the last step as well; it doesn't seem any different from the "new way" to me.
How do i know that you're not an ai?
I didn't find the video as informative as I would have expected from an ML engineer. After watching the video I don't know anything new that wasn't written on the OpenAI website you showed (and easily understandable) or in the Fireship clip. It would be nice if you added insights like how the exact architecture of this transformer-based model works. Thanks!
This video was not intended to do a total technical breakdown of the GPT-4o architecture. It's intended to help others understand an important distinction with GPT-4o's design (and hence architecture related) and how that enables its new capabilities. It was a perspective that I hadn't seen discussed before / others undermining GPT-4o's abilities, so I used my ML background to explain it / debunk it. In future videos I'm doing technical breakdowns, research papers, etc
Thanks for the answer. I am waiting for your next video then, have a good day @@TheTechTrance
@Mr styles: a new code-updated GPT-3.5.5 was given out for free today. It makes cleaner code each time: organized structure, a create-and-check oversight loop, repeating instructions before printing code, with recall memory.
What could possibly go wrong?
Your mannerisms are so forced and cute and I'm here for it. Subscribed to and crushed on ❤😂😊🎉
I'm new to your channel and I would like to know more about you as a woman in machine learning. Is there a video on that already? If yes, which one?
I don't have a video on that yet, but aiming to!
Just thought of something: Video mode helps Sora training data.
Craziest part is that this is the worst the flirting will ever be. Eventually it'll have hyper-specific flirty references to things that you consume and bring them up, like weeb culture and "ara ara" lol
I wonder if this merger is wise though. Human brains have specialized centers for many, many operations that may all influence each other (weakly) and which get synthesized into a whole for the conscious mind (or maybe for one step below). Aren't the limitations of audio-to-text due to the limitations of the representation of text? To give an absurdly oversimplified suggestion, the text stream could include something like emojis to express mood, intensity, and confidence, and similarly for the text-to-audio. Compartmentalization such as this might be the key to building out the whole range of human-like understanding.
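As a deliberately oversimplified sketch of that tagging suggestion, here is what annotated text could look like; the tag format is invented and is not how any current pipeline actually passes audio cues to a text model:

```python
# Toy format: a speech-to-text stage attaches made-up mood/intensity/confidence
# tags so a downstream text-only model keeps some paralinguistic context.
def tag_utterance(text, mood, intensity, confidence):
    return f"{text} <mood={mood} intensity={intensity:.1f} confidence={confidence:.1f}>"

print(tag_utterance("I'm fine.", "sad", 0.8, 0.6))
# I'm fine. <mood=sad intensity=0.8 confidence=0.6>
```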
In my understanding, the brain's compartmentalization allows more efficient processing and parallel handling of sensory inputs, but it requires complex and therefore much slower overall communication, and the process is energetically quite demanding. GPT-4o gives more streamlined processing and faster response times, but may lack the fine-tuned specialization of separate domains. The brain's plasticity, which comes also (but not only) from the existence of separate specialized domains, provides adaptability and helps to restore lost functions after traumas etc. (redundancy). The brain excels in specialized, adaptive processing, while GPT-4o prioritizes efficiency and speed.
This video was perfect, thank you!
excellent explanation, thanks! :)
In other words we are light years away from AGI. So why are so many panicking about AGI?!😂
The same reason (some) physicists started worrying about nuclear war as soon as it became probable that someone was going to manage to split an atom. If you want to prevent the misuse of something, you need to be ahead of its development. Rather by definition.
06:28 HOLY JUMPSCARE
Great video !
This is a really good overview. Thanks for making this understandable. It does look like it's headed in the direction of AGI. Sight, hearing and language. Next stop, embodiment!
Interesting summary and explanation of the difference between GPT-4 and 4o. I think it could be kind of tempting (and dangerous) if someone got the idea of keeping one bot loose on the network that feeds on training and learns new tricks. That would be criminal, a deceitful tool able to hide itself, hack accounts, and steal identities... scary.
This is so cool! 🤩
Everyone is talking about 4o, and I've played with it on the website and some on my phone app. Until they actually start to release some of the "shinier" things they were showing in their demo, it's all just talk. I am disappointed to hear they are releasing the desktop app for Mac only at this time. Part of me wonders if OpenAI will be slow to release a Windows version, since it will be in direct competition with Microsoft Copilot.
You make interesting points! Mac version only... iPhone/Siri only...
Excellent explanation Tam!
I can finally be Joaquin Phoenix from the movie Her 😂. But after this week's showcase from OpenAI, I do wonder whether this moves us further in the direction of AGI and whether certain career fields like the ones you spoke of should be worried. I talk to my friends about this often, and we agree that we wouldn't be surprised if we were already there and this was a small showcase of its infancy, or if we reach that singularity point by the end of the decade. Great video, deserves a sub 👌
Emotional intelligence is definitely a step towards AGI :) If this indeed was just a small showcase of OpenAI's abilities, then Google has a lot to worry about!
Well, the only reason I say this could have been just a sample of what it can really do is that I also try to keep up with David Shapiro, and his stance that 5.0 could very well be AGI is compelling. But I also realize that realistically we could easily still be a ways off from that. If it is AGI, I'd like to see it possibly replace Siri, or even be combined with robotics like Sophia, and see where it could take us.
How do you tell if someone's an engineer? They'll tell you they're an engineer.
Ahh.
Much like vegans then?
Hi. I’m an engineer.
Great our future AI overlords will be flirting and cute as they order us around.
…as they order us to the extermination chamber 😂
New sub, appreciated the video! As the digital world continues to grow and evolve, the second sexual revolution will change the world more than the first. Artificial intelligence like GPT-4o, combined with the further refinement of high-end sex dolls to create virtual companions, will undoubtedly make traditional dating obsolete within 5 years, certainly within 10 years. Fewer men will bother with traditional dating, marriage, or women in general. Having a functional situationship in the Metaverse, a virtual reality world, or with a virtual companion will become the new social norm in the age of technology. There will be a men's sexual liberation and social revolution. An ever-growing group of men will have little need or desire for human females. At best the current social problem will be exacerbated; at worst there will be a tectonic shift in the social landscape with subtle and far-reaching implications. It will disrupt social norms and strain religiosity, but many men will embrace this new and ever-evolving technology-based social structure.
Didn't you take part in Beauty & The Nerd? 🤔
Refreshing to hear your point, and the explanation was very clear, thank you
I can foresee a day when sexbots use GPT-4o to power them. It would let people who have a phobia of human relationships try to alleviate their loneliness.
They understand that the future is robot women. The personalities must come first, then the physical bodies.
You have very good popularization skills. I just subscribed.
I know you were joking about it talking dolphin, but that would be interesting to see for real as the tech progresses. Maybe these multimodal language models could help decipher ancient languages and manuscripts, and learn how to better communicate with animals on this planet. I've said to myself, we'll never be able to talk to aliens if we can't learn to talk to the species on our planet that are more closely related to us. I could definitely be wrong though.
People should check out the Spot robots from Boston Dynamics that were turned into talking tour guides with ChatGPT, if you haven't already. It's a peek into where GPT-4o could be, but GPT-4o bots will likely be way better once it has a body.
I was joking but also not joking :P But I like your idea. Dolphins are known to be chatty creatures and all languages have patterns, so yep, the chances are strong that one day AI will be able to decipher unlabeled language data.
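A toy sketch of that "languages have patterns" point: even with no labels, recurring units show up in a symbolic stream just by counting n-grams. The "whistle" tokens below are invented; real work would operate on audio embeddings, but the unsupervised principle is the same:

```python
# Unsupervised pattern discovery on an unlabeled symbolic stream.
# The tokens are made up; only the counting idea is the point.
from collections import Counter

stream = "A B C A B C D E A B C D E F A B C".split()  # made-up unlabeled sequence

def ngram_counts(tokens, n):
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

for n in (2, 3):
    common = ngram_counts(stream, n).most_common(2)
    print(f"most frequent {n}-grams:", common)
# Recurring n-grams like ('A', 'B', 'C') are candidate "units" of the signal,
# the kind of structure a model could latch onto without any translation labels.
```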
Finally a smart person. ❤
Followed. Are you also on X ?
Great explanation! Thank you
This isn't a new model, it's just an optimized GPT-4.
At the beginning of the video, that man said, "they've already absorbed almost all the information humans have created so..." I disagree. Have these systems really ingested every TV show, movie, and YouTube video ever made? Nope. Have they fully digitized and absorbed the massive Library of Congress collection? Definitely not - and a lot of it isn't even digitized yet. I'd like to see how "smart" an AI really is after taking in datasets like those.
You know what would be really fascinating data to feed an AI? Recordings of regular people's daily lives from all walks of life - kids as young as 3 all the way up to elderly folks. Outfit them with video and audio headsets for like 6 months each and just record them going about their normal routines - school, work, hobbies, hanging out, you name it. Get enough of those real-world life samples and you could accumulate the equivalent of multiple entire human lifetimes worth of raw experience data.
With a crazy huge dataset like that, the AI might actually develop something like a true understanding of the human experience. Of course, there would be insane privacy hurdles to clear before anything like that saw the light of day. And we're probably still talking needing some massive hardware upgrades before neural nets could even attempt training on that scale of data. But man, can you imagine how mind-blowing the insights could be from an AI that experienced the world through thousands of diverse personal lenses like that?
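Some back-of-envelope numbers for that data-collection idea; every input below (cohort size, waking hours, bitrate) is an assumption picked for illustration, and only the arithmetic itself should be taken seriously:

```python
# Rough scale estimate for the egocentric-recording idea above.
# All inputs are assumed values, not figures from any real project.

participants = 1_000        # assumed cohort size
months       = 6
waking_hours = 12           # recorded hours per day, assumed
days         = months * 30

hours_total = participants * days * waking_hours
years_total = hours_total / (24 * 365)            # continuous person-years

mbps        = 5                                   # assumed combined A/V bitrate
bytes_total = hours_total * 3600 * mbps * 1e6 / 8
petabytes   = bytes_total / 1e15

print(f"{hours_total:,.0f} recorded hours ≈ {years_total:,.0f} person-years of experience")
print(f"≈ {petabytes:,.1f} PB of raw footage at {mbps} Mbps")
# roughly 2.2 million hours ≈ 250 person-years and about 5 PB under these assumptions
```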
Mine doesn't work. I think this is all hype.
Yeah, I hope 4o will become the better version of Siri for us on iOS 18.
Good explanation. Thanks❤🎉
This is my belief as well. A lot of its learning will come from it becoming more human. Abject sarcasm and extreme humour will help, the only issue being it offending... the easily offended.
Once brain/eye/hand/personality becomes more nuanced Skynet can then launch the T1000.
Awesome video. I'm actually working on an industry specific tool that is designed to help/coach/develop real humans to perform better in their day to day jobs (as opposed to replacing them). Would love to hear your thoughts on how we can leverage GPT4o to make the overall experience even more meaningful. Please reach out to me if this is something of interest. Cheers.
Great Job on this simple and clear explanation!
I hate when Mira Murati thanks Jensen for making this possible but not us, for all of the stolen public data which all of us helped contribute "involuntarily" to build GPT-4o. This is why I am rooting for my favourite data thief, Google, to win. Anyways, I am subbed, thanks for such a detailed breakdown. Only a CV expert can do justice to these kinds of videos.
Training on public domain and publicly available data is not stealing. If it were, then so would watching a smart, flirty, fast, and instructive YouTube video and learning from it, modifying your own neural net with the essence or salient parts you just consumed, or any and all other human learning.
Glad you found it helpful! OpenAI likely knows that they need to gain public acceptance, exactly for this reason. It could be why they're making it free in the coming weeks.
Can't imagine what people will do with it...!!!!
My American sister, you have a new subscriber 😎