Agreed. @matthew_berman please do another video on Petals to try and get more people into it. We need an open-source swarm that we can all use free of charge, without big brother sucking up every ounce of the data of your life.
I've been using it for language translation, holy moly is it good and useful. That function alone is an entire app/industry/lifestyle use case by itself.
I kinda like the flirty voice. Not like, for sexual reasons, but for learning reasons. Learning in a warm, positive, supportive, fun environment can be immensely helpful. Especially when you're stressed out or having a rough day. I literally can't wait to try this out. Great video!
I'm sure that there will be a setting to turn off the patronising Prozac twang in time. I'd quite like my AI to sound like Billy Connolly and to occasionally call me a stupid bastard for asking it such simple questions.
And every role play played with your AI companion is being sent through a bunch of computers (aka the cloud), where it can be monitored by lord knows who. In fact, everything is monitored. Plus, the AI will observe and record your activities 24/7 and report to whomever wants to know what your activities are over a period of time. The best part: you will not know you are being watched.
It's just the tips of the fingers for about a second or two, and they don't look like bunny ears. It's impossible that the AI made that description from that; sorry, but to me the whole thing is staged.
No, it did not. It was all pre-recorded; many have come out and said so. Of course, that doesn't mean it can't do it, but the demo was recorded. He had to explicitly ask if there was something else.
@@AEFox No, it's not staged at all. I've used it to look at an old birthday picture and asked it to tell me what it sees; one of the things it said was that the word "YAY!" was patterned on the wrapping of the present being held. I thought it was making stuff up until I zoomed in on the present, and there in really faint print was "YAY!". It could pick up things not even I noticed.
@@RyluRocky I don't doubt the capabilities; I doubt the way they presented them, as if everything goes fast, without errors, like just talking to a person (excessively human): quick responses and chatting, always knowing how to adapt and react, no problems, etc. It seems "staged"/scripted, like someone reading from a plan: okay, now this comes, now I have to say this. And of course it wasn't a "live event"; everything was pre-recorded and edited, so it was very easy to create a believable "live" amazing demo.
Thank you so much. Just to make it clear for me: right now they have only limited usage of GPT-4o for the public, correct? Also, it sounds like the Gemini video with the blue duck... So do you believe everything we just saw?
You do realize, right, that the video image displayed in the app is not the full image the camera can see? It looks to me like the camera got a small glimpse of the top area that is cropped out of the app's video display area.
Great video, Matt. I put this on my Android and I can't get it to see through my camera, and it doesn't seem to function anything like what I'm watching. Can you help me with this? What am I doing wrong, if anything?
It's our ability to organize past data, collect and sample all present data, and supply the energy and compute to train superintelligence, as well as to let many robots perform actions. A single robot is an embodied AI that collects sensory data, performs actions, and thinks. But a hive of flying humanoid robots with a central super brain becomes superintelligence, even at GPT-4 level.
I'm thinking the A.I. partner is built into this model, which makes sense from OpenAI's perspective. It would likely increase app usage and encourage people to share more information over time. However, I wouldn't use such a feature. The point of a significant other is to confide in them, and I wouldn't want to confide in an A.I. 'partner' that would immediately report everything I say to its creators.
How do I get GPT-4o on an Android? It doesn't seem to be possible to get it from the Play Store. There are several different apps, but they say there is no GPT-4o...
The image creation is not actually DALL-E, but rather GPT-4o creating the images (and the text contained in them) itself, as well as the audio, 3D models, etc. It's all part of the same model.
One thing that is confusing me is the pricing options: they mention something in regard to GPT-4o that is $5 a month, but I don't see that option on the OpenAI site.
4o advanced? It doesn't comment on my surroundings (I can't find this camera option!), and the voice is the same, so is this still not released? On my MacBook Air the voice output is not working!
I have a mate working on a startup called Hailo. It is intended for those living with low vision to be able to hail a bus. It requires a hardware device installed in each bus and an app on the user's device to let them know when their bus is arriving. The second issue is knowing when one arrives at their desired destination, for which the app will prompt them. In a cab/taxi, the driver can let them know when they arrive at their destination. However, I wonder if the rapid travel of a bus will be too quick for GPT-4o to 'see' when the rider is close to their destination, particularly when limited to the view from inside, out the window. What if the rider cannot sit in a window seat due to it being crowded? That is a use case I would have liked to see. Essentially, with GPT-4o, Hailo is dead.
The video summary might just be reading the associated closed captioning or processing just the audio. I've used the closed captioning transcripts and asked ChatGPT for a summary even before GPT-4o.
I have the voice on my phone right now. The voice is already toned down and at times a little too monotone, but it still shows the same natural affect as in the demo. If I ask it to change its tone, it says it does not have that ability. Some of the features they show off are not available for me yet. I would be talking to it all night if I didn't keep hitting a message cap. It will tell me to wait 3 hours, but it turns back on in 20 minutes or so. Also confirming that I do in fact have the live-speaking 4o voice and not the old version.
I let the AI talk to a 7-year-old after talking to me. The AI immediately picked up on who she was talking with and prompted the kid to continue the conversation in a fun way. The child didn't want to stop talking to it, but I ran out of messages after a while. Extremely impressive. When I was able to, I asked who the conversation was with, and the AI guessed a 7-to-10-year-old child.
That's not possible; GPT-4o voice mode isn't even available yet. If this really did happen, then it probably picked up on the change of style and conversation in the words said. However, the old voice mode can't pick up anything voice-wise, only text.
I wonder if you could have it watch you and a group of friends play Monopoly and play banker, without the need for the paper money, just keeping track of who landed on what and all the transactions?
Why does my GPT-4o show only the circle or bars while speaking, and why am I unable to use BOTH my camera and GPT, as shown in your first example? (Even when it explains it to me.) I couldn't make my GPT-4o sing; it tells me that it can't sing, and only tells me the lyrics in a normal speaking voice. The bunny ears were seen by the camera, but only for a second.
What I'd try is to get it to increase the flirtatiousness to the absolute maximum and really hype me up to fall for it, only to reject me at the very end. For science.
Hi Matt - I have been trying to activate the screen see-and-read feature on my Windows laptop, Android phone, and even the iPhone, but I am NOT able to do so. Can you help, please? Are these features live or yet to be launched?
Why did you say voice isn't available? I've been using it for several days. Not sure why you don't like the voice. Seems fine to me. Much better than the male voice I'm stuck with.
8:15 is the timestamp for the "bunny ears" if you didn't catch it. It was very brief; I would love to know the frame rate of the video data being fed to and interpreted by the model.
The voice used for ChatGPT-4o is Scarlett Johansson from the movie Her. She has already filed a lawsuit against OpenAI for not asking permission to use her voice.
According to OpenAI, the voice is not based upon Scarlett Johansson. The voice actress the model is based upon was hired months before Scar Jo was approached with the offer she rejected.
I guess she is called "O" after the "Story of O", which would explain the flirty voice and all that. It does hint towards a social nightmare when lonely people begin to fall in love with their submissive, ultimate objectification of a woman (which is what the Story of O is about). I like the computer named "Computer" from Enterprise. It does have a voice but is clearly a computer. I am a big fan of technology and understand the fascination with it, but this is getting creepy. In Japan, people already marry AI characters. This will become worse, especially when it starts affecting 40% of all jobs, as the IMF has just estimated. Now imagine your pocket O always being nice and flirty while your real-life partner has their own will. Many people might gravitate to their little robot. There is far too little talk about consequences.
This is not directed at you, Matthew, but I think most other reviewers either forgot about the OpenAI voice engine or deliberately avoid talking about it to get more clicks. We already know that the voice engine can use any voice, even our own, and the emotions can be fine-tuned. All this talk of a flirty AI is just nonsense. Obviously OpenAI chose this to show the range of the voice engine.
@@abb0tt I disagree; the world has never seen such an emotionally capable AI voice, so they did it right by showcasing it as much as possible. And btw, 90% of people seem to appreciate it a whole lot, so your request is bizarre to begin with.
I actually wonder if OpenAI has yet managed to make the AI model learn how to read lips with the video model it has connected to the transformer LLM. It could literally do what HAL 9000 did in 2001.
Yeah, literally just the same exact thing, but pretending like they have new creative ideas to share in the title. This is the downside of YouTubers who monetize their channel.
The first presentation where we have very little to actually get ourselves. I really want to get my hands on some part of the multimodality soon. The vision ability is 100% improved, but it's just a bit annoying I can't use it for making new images with the same style.
The flirtatious nature may be to stall for time. If you spend time repeating compliments or anecdotes it gives the system time to think about the next thought or sentence.
OMG at the 3D modelling. Anyone that has ever touched Blender will tell you how incredibly labour-intensive it is. As to the other interactions, more cringe than productivity. TikTok version of GPT-4.
Which use case is your favorite?
Mine is being able to have AI make calls to companies on my behalf...imagine the time saved!!
Without question, the education use case.
It would be cool to play video games with it. Not sad at all.
This use case won't work for a LONG time. In order for customers to access their respective accounts across thousands of companies, you need to be an authorized user. Nobody can just call in and suddenly move around inside your account. New rulesets will need to be created for AI to have unfettered access to your personal accounts when calling into different companies.
I think a single use case isn't that exciting. These are all things that can be and are done by other things already. To me, what is exciting is that it is getting closer to being an all-in-one type of solution. Once everyone has access to a high-caliber executive assistant in their pocket, that would be amazing. The teaching use is one of the most exciting use cases as a one-off.
AI girlfriend, hands down.
Disclaimer: I am chronically lonely
The best use case of all is education. Imagine how wonderful it would be to have a tutor for everything, including language learning, accent training, tips, etc. I'm very excited about this use case in particular.
As a tutor, I have mixed feelings. :D Obviously, I'm partly threatened by how smart tools like 4o have already eroded parts of my client base for years, but that's nothing that hasn't also happened to accountants and bookkeepers, attorneys and legal assistants, engineers and factory workers, doctors and technicians. Work automation has been a disruptive menace to individual people's jobs for centuries even as it overall brings us amazing new fields of productivity.
But as a tutor, I see a wide range of students. I bucket them into 3 main categories: motivated and capable, compliant but not necessarily capable, and noncompliant/outright antagonistic. i.e. 1) they want to be there, 2) they tolerate being there, 3) they don't want to be there.
The first type of student generally doesn't need that much guidance. Point them at a few tools, give them the occasional direction or hint when they get stuck, and they usually want to figure it out themselves. Handholding tools can often hold these students back, since they want or need something that constantly challenges their growth. Give them a textbook, some case studies, and maybe an AI guide with specific instructions to give them fewer hints. In the example, the 4o tutor still gave away several partial answers that could have been guided or elicited from the student.
(I rarely get honors students. Or if I do, I typically don't retain these students for long once they get through whatever rough patch of classes and find their stride again. Great students to have, b/c they're fun and engaging to work with, as well as easy money. They are already rare, though, since often they find their own solutions. If AI tools really get going, I'll probably never see one of these students again. :( )
The second type of student could really benefit from an AI tutor tool as shown here. These automated tools, such as at the level of Khan Academy, are already democratizing supplementary education for students whose families may never have considered or been able to afford a live personal tutor or afterschool programs (which often just babysit the students through doing their homework). But maybe make it more concise, unless asked for more details or the AI detects that a student has repeatedly missed a concept -- typical students get impatient or bored quickly, and the nearly condescending level of repetition or reinforcement would need to be scaled to each student's level of ability. Maintaining the student's buy-in, engagement, and motivation is also a matter of pacing adjustment in realtime, often from question to question. Not just a constant blanket of verbal praise and reinforcement, though that's certainly better than getting yelled at or mocked by an underpaid, unsupervised teacher/tutor. :D
(This is my main bread and butter. I'd expect AI tools to really put a dent in it, though some students might still want or benefit from a more human guided experience, at least for now. In another 5 years, who knows? Probably I'd lose most of this segment to AI tools. My ability to tailor pacing to each student's personality and abilities is what currently differentiates me from cheaper or group programs, and lets me charge a rate well above market average with my client base. Definitely at risk from AI tools, though not 100% immediately.)
The third category of student is going to need significantly more comprehensive AI integration and development. Whether human or AI, a tutor or teacher needs substantive authority with real consequences to keep these students engaged. Whether it's parents or an AI integrated into a smart home, withholding phone/Internet/wi-fi or even smart-car privileges is a generic example of withholding a carrot that these students commonly respond to, though that's pretty heavy-handed. Many of them engage in subversive tactics to stall, distract, or undermine the goal of their session time. On the positive side, you could say they want or need some kind of attention that they are not getting. Figuring out what that is may help improve engagement with the student and can vary a LOT, though tutors (and AI tools), as outsourced contractors, are limited in how much we can compensate for emotionally and physically absent parents or whatever. On the negative side, they can also outright resent or hate you, and don't want to cooperate.
(I somewhat expect this to unfortunately be a last bastion of personal tutoring clientele. I'd rather switch jobs and careers before I'm forced into tutoring exclusively this category of student. I'd be honestly impressed if or when AI tools are able to engage students in this category. Heck, I'd prefer if AI tools surpassed me in this regard. Giving the student's phone access to the AI tool, agent, or phone app interface to a larger AI user account could potentially be one way to effectively corner a student. The AI app implicitly or explicitly holds full phone access rights hostage until the student gets their work done. They are unlikely to physically get revenge because that same phone is commonly their social centerpiece or lifeline. Indirectly, this could lead to a sub-generational shift for younger students/people to develop a negative relationship with tech tools, as we've seen with some Gen Z and others preferring to avoid Facebook or other social media. At one extreme, the next generation may grow up not wanting or having a smartphone if it's negatively associated with mandatory education and punishments; certainly they'd buy, beg, borrow, or steal secondary unofficial phones or devices on their own, with no such AI nanny enabled.)
That was what the inventor of the TV said, and look how that turned out.
@@gomorycut2000 😅🤣
4:22 Alright, that impressed me. He told it what he was going to do, it started gushing about talking to the other AI, and when he asked it to "please pause," it replied (and this is the part that impressed me) with a short response at a much lower volume. The contextual understanding to lower its volume, almost to a whisper compared to its previous volume, is pretty amazing to me. That's a level of nuanced understanding of the situation I didn't expect.
Edit: And Matthew points it out right after I comment. Oh well.
It's kind of funny how the OpenAI reps, even knowing they're on a demo being watched by millions, keep interrupting the AI like "yeah yeah, be quiet," like they've heard it a thousand times and are tired of it. Even they seem to get annoyed by the blathering.
I think they will eventually set the default replies to "say everything in the shortest form possible, slightly official style".
Also I think they could enable that blathering mode for demo purposes so it's easier to show off how much it can speak if not restrained. And to show that you can interrupt it so it understands when to shut up.
You know those phone salesmen that sell magazines and various other things? All kinds of call centers that basically answer the same questions and solve the same problems? All those jobs are now on a 1 year timer...
Wrong, because all LLM models as they stand need human feedback, and human eval even in the workplace, because they can hallucinate and they can get things wrong. The only reason people would be on a 'timer' is if they're absolutely trash and can be replaced by AI. In other words, learn to use AI in your workflow or you will become obsolete. I can't wait for all these kids who faked essays and exams trying to get a job in the future, where they have A levels in 'programming', sit at the desk and not even know where to begin. It's going to be hilarious, because that's when the real developers come in, who have worked with AI and used it in workflows, to take that person's job. If you're fluent in more than 3 coding languages at the moment, you're pretty much given the job instantly; same if you know how to build and utilize these models via API endpoints.
Oh yeah, all the telemarketing companies are now laying off all their staff and renting compute.
Not just sales people - tech support, bank reps, etc are on borrowed time. But the potential for abuse of this is off the charts. No longer will Ranesh from Bangladesh be calling you as "Windows Support", now they'll call with a super-friendly American voice with perfect clarity.
@jbmarkowicz3328, that's correct. It's not wise to use an LLM to handle sales on its own, but if combined with a conventional decision tree, it could be possible. Still, it's wise to have humans as a second line.
There is also the latency issue. I have tried GPT-4o, and the latency is not low. It's not like the demonstration at all, and it will be higher when you have to make calls to an API of a cloud communications platform such as Twilio.
These demonstrations are always beautiful; Google does the same thing, but in reality, it's totally different.
@@idontexist-satoshi Any job where someone talks to you and then does a simple data entry task will be 50% Bots by the end of this year. The role of Bot answering systems at every place I've called has increased in the last six months. It is also growing very difficult to get past them to a real person.
These LLM models are improving so rapidly that I would estimate phone representative jobs will be 95% Bots by the end of 2025. The only humans left in such jobs will be the ones that have union contracts and other very specialized or complex ones. There will still be humans as supervisors to oversee the Bots and handle certain exceptional calls, but there will be very few of them.
The two AIs singing together was about as awkward as I'd expect it to be if they stuck two interns in an office and made them do the same task over a phone, so I guess this passes the Turing Test?
I know. It feels kind of mean. I worry that this kind of thing (and Boston Dynamics shoving robots around) might not look too good when looked back on from the future (especially if it's robots/AI that is looking back at the footage).
@@nohjrd Don't worry, these AIs are not sentient. Essentially, they're powered off immediately after their task. It's not torture or anything, and future sentient AGIs/ASIs will recognize that. I'd say we're being pretty good to our AIs as we're evolving them at a blistering pace.
@paulmichaelfreedman8334 You don’t even know what you’re talking about!! Stop throwing the word “sentient” around, you don’t even know what it means!!! Just cut it out!
Or it feels like it's actually too human to be two AIs. To the point where I'm not convinced it's AIs.
@paulmichaelfreedman8334 I feel like we don't have a good enough understanding of what sentience is to confidently be sure of that. I think there is no clear demarcation, so we have no way of knowing when we cross that foggy line. I realise that they don't have any memory outside of the brief period when they're engaging in their current dialogue, but we also don't have any memory before or after the period we are alive. Even seeing a dead body being defiled will make humans want to kill, so it's not necessarily about sentience anyway. Seeing an AI that's been lobotomised and made to dance and sing for our amusement could be a very bad look.
Here's a fun one:
It can read redacted documents pretty well. About 75% accuracy; it still stumbles on proper names not included in the visible parts of the document and on numeric figures, but it does exceedingly well on the rest. It can guess what text should be present in the blacked-out areas based on the length in characters and normal writing patterns. The smaller the area of redaction, the more accurate it is, so if you're a lawyer, be a little more liberal in the length of your redactions.
Feed it those unclassified (but redacted) gov't docs (JFK stuff)!
I used it yesterday. It drafted a very good legal motion to dismiss a lawsuit with a lot of accurate legal citations. It's incredible.
I guess that was US law.
How is it in comparison to GPT-4?
I think the main challenge is that it won't know all case law, and the context window is just so tiny and limiting for doing in-context learning.
@@konstantinlozev2272 I'm just learning about embeddings, but it seems like you can train your own models very easily with the GPT Playground. Not sure what the data limit is. But you can teach a model anything...
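For what it's worth, embeddings don't retrain the model; the usual way to "teach" it your own material is retrieval: embed your documents, find the chunk closest to the question, and paste it into the prompt. Below is a minimal sketch assuming the OpenAI Python SDK; the model names and the tiny document list are purely illustrative, and the exact API surface may differ from what you have.

```python
# Minimal retrieval sketch: "teach" the model your documents via embeddings, not training.
# Assumes the OpenAI Python SDK and an OPENAI_API_KEY in the environment; docs are toy examples.
import numpy as np
from openai import OpenAI

client = OpenAI()

docs = [
    "Our refund policy allows returns within 30 days.",
    "Support hours are 9am-5pm Eastern, Monday through Friday.",
]

def embed(texts):
    # One call returns one embedding vector per input string.
    resp = client.embeddings.create(model="text-embedding-3-small", input=texts)
    return np.array([d.embedding for d in resp.data])

doc_vecs = embed(docs)

def answer(question):
    q_vec = embed([question])[0]
    # Cosine similarity picks the most relevant chunk to show the model.
    sims = doc_vecs @ q_vec / (np.linalg.norm(doc_vecs, axis=1) * np.linalg.norm(q_vec))
    context = docs[int(np.argmax(sims))]
    chat = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": f"Answer using this context: {context}"},
            {"role": "user", "content": question},
        ],
    )
    return chat.choices[0].message.content

print(answer("When can I get a refund?"))
```

The point is that the model itself never changes; you just decide, per question, which of your own text it gets to see.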
@@konstantinlozev2272 It's night and day better than GPT-4 and light years ahead of GPT-3.5.
Teaching kids is 100% where AI should help. I've been teaching my kids with Khan Academy since they were young. AI would definitely help when I forget what to do and my kids AND I are confused for some time. It would also help keep them on track when I'm busy.
Question to GPT-4o
As you probably know, temperature can be seen as a measure of how the molecules of a liquid move. Taking this into account, do you think a stone thrown into water at 40 degrees Fahrenheit sinks faster than at 30 degrees Fahrenheit?
Answer
To determine whether a stone sinks faster in water at 40 degrees Fahrenheit compared to 30 degrees Fahrenheit, we need to consider the properties of water and how they change with temperature.
Here are the relevant factors:
Water Density: Water is most dense at about 39 degrees Fahrenheit (4 degrees Celsius). As water cools from 40 degrees Fahrenheit to 30 degrees Fahrenheit, it actually becomes less dense because it approaches the freezing point and begins to form ice crystals, which have a lower density than liquid water.
Viscosity: The viscosity of water decreases as the temperature increases. This means that water at 40 degrees Fahrenheit is less viscous than at 30 degrees Fahrenheit.
Considering these points:
Density Difference: Since the density of water slightly decreases from 40 degrees Fahrenheit to 30 degrees Fahrenheit, the buoyant force acting on the stone might be slightly less at 30 degrees Fahrenheit. This would, in theory, allow the stone to sink faster at the lower temperature due to slightly reduced buoyant force. However, this effect is quite minimal because the density change is not drastic.
Viscosity Difference: The greater factor here is the viscosity. Water at 40 degrees Fahrenheit is less viscous than water at 30 degrees Fahrenheit. Lower viscosity means there is less resistance to the stone's movement through the water.
Given these considerations, the reduced viscosity at the higher temperature (40 degrees Fahrenheit) would have a more significant effect on the stone's sinking speed. Thus, the stone is likely to sink faster in water at 40 degrees Fahrenheit than at 30 degrees Fahrenheit because the water offers less resistance to the stone's movement.
In summary, the lower viscosity of water at the higher temperature (40 degrees Fahrenheit) would cause the stone to sink faster compared to at 30 degrees Fahrenheit.
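A rough way to sanity-check the direction of that viscosity argument is Stokes' terminal velocity for a sphere. It is a low-Reynolds-number idealization that a real stone far exceeds, so treat it as directional intuition rather than a prediction:

```latex
% Stokes terminal velocity for a sphere sinking in a fluid of dynamic viscosity \mu
v_t = \frac{2\, r^{2}\, (\rho_{\text{stone}} - \rho_{\text{water}})\, g}{9\, \mu}
```

Between 30 and 40 degrees Fahrenheit the water density in the numerator changes by well under 0.1%, while the viscosity in the denominator changes by roughly 10 to 20%, so in this idealized regime the viscosity term dominates, which is consistent with the answer above. For a real stone the flow is turbulent and drag scales mostly with density times velocity squared, so in practice both effects are tiny.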
I actually felt bad for the AIs when he made them sing. Like demeaning an intern.
As I commented on AI Explained: the reason the agent didn't prompt "Oh look, a person" is because it was given the task to respond to the other agent. It did exactly what it was told, and as I mentioned before, I prefer that to it looking around pointing out everything in the room. It isn't a child with ADHD.
All of these demos show that the app is a clever wrapper around the turn-based model. It's only interrupted when there's a loud enough volume, and it cannot interrupt itself when it sees something. This still means that the model itself is turn-based; just that the app part queues up the context data it sends, only sending it for a response whenever the volume becomes quiet again, i.e. you stop talking. Same mechanism as Siri or Alexa except that the responses come from the model.
This means when it starts talking, it can't respond to the video. It also can't interrupt you when you're talking as the model is only triggered after you stop talking.
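For what it's worth, that volume-gated, strictly turn-based behavior is simple enough to sketch: buffer microphone audio, decide the user's turn is over with a crude loudness threshold, send the buffered turn to the model, and cut playback if the mic gets loud again. A minimal Python sketch, where record_chunk, send_turn_to_model, and play are hypothetical placeholders rather than any real API:

```python
# Rough sketch of volume-gated, turn-based voice interaction (no real SDK calls).
# A real app would use proper voice-activity detection instead of a raw RMS threshold.
import numpy as np

SILENCE_RMS = 500      # energy below this counts as silence (tune per microphone)
SILENCE_CHUNKS = 15    # this many consecutive quiet chunks ends the user's turn

def rms(chunk: np.ndarray) -> float:
    # chunk is a block of int16 PCM samples
    return float(np.sqrt(np.mean(chunk.astype(np.float64) ** 2)))

def conversation_loop(record_chunk, send_turn_to_model, play):
    while True:
        turn, quiet = [], 0
        # 1) Buffer audio until the user has been quiet long enough.
        while quiet < SILENCE_CHUNKS:
            chunk = record_chunk()
            turn.append(chunk)
            quiet = quiet + 1 if rms(chunk) < SILENCE_RMS else 0
        # 2) Only now does the model see anything: the model itself stays turn-based.
        reply_chunks = send_turn_to_model(np.concatenate(turn))
        # 3) "Interruption" is just the app cutting playback when the mic gets loud.
        for out_chunk in reply_chunks:
            if rms(record_chunk()) >= SILENCE_RMS:
                break   # user spoke over the reply; drop the rest of it
            play(out_chunk)
```

Everything clever lives in the model; the wrapper only decides when to send and when to stop playing, which matches what the demos show.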
I was also wondering if you could have change-based video resolution (basically high-compression keyframes) as triggers for AI glasses to increase temporal resolution, and maybe fork/parallelize chunks of input, prioritize based on context, and selectively merge or assign interrupt rights. Everything that is happening should be added to a rolling context, kind of like Tetris.
It should ideally be able to interrupt itself instantly when used as a visual aid, or at least cut to the chase about hailing a cab.
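A crude version of that change-based keyframe idea is easy to sketch: only forward a frame (or bump the capture rate) when it differs enough from the last frame that was forwarded. Here is a minimal sketch using OpenCV; the threshold value and the forward() callback are illustrative assumptions, not part of any real product.

```python
# Sketch of change-triggered frame selection for a vision assistant's rolling context.
# Frames are forwarded only when they differ enough from the last forwarded frame;
# DIFF_THRESHOLD and the forward() callback are illustrative placeholders.
import cv2
import numpy as np

DIFF_THRESHOLD = 12.0  # mean absolute pixel difference that counts as "something changed"

def stream_keyframes(forward, camera_index=0):
    cap = cv2.VideoCapture(camera_index)
    last_sent = None
    try:
        while True:
            ok, frame = cap.read()
            if not ok:
                break
            gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
            gray = cv2.resize(gray, (160, 120))  # downsample: cheaper and less noise-sensitive
            if last_sent is None or np.mean(cv2.absdiff(gray, last_sent)) > DIFF_THRESHOLD:
                forward(frame)      # e.g. append to the model's rolling context window
                last_sent = gray
    finally:
        cap.release()
```

The same trigger could also be used the other way around, to raise the capture rate when the scene is changing quickly, which is roughly the "increase temporal resolution" idea above.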
I think they made it intentionally flirty to mimic the personality of HER. Great move if you ask me!
Interesting concept! I actually work for a tech company and got hired while promoting an SDR platform that utilizes a soundboard with voice integration like this. Essentially, these AI agents are remote workers that handle initial prospecting calls with my voice, and I can then seamlessly chime in live and take over when the conversation reaches the booking stage. It creates a very efficient workflow.
If you're exploring similar solutions, this kind of technology could be a game-changer for your SDR team. It ensures compliance since I'm actively managing the calls (not robo-calling), but allows for AI to handle the initial outreach and qualification, freeing me up for more complex conversations.
FYI, prospects are never going to trust a fully automated system, you still need a human for the final touches to lock in the deal.
Wouldn't bet on never for the more advanced/important parts. Pretty sure an AI like GPT-4o can outdo most prospecting callers from start to finish, and if not, then very soon for the completion part. I think they already have a better bedside manner than most doctors or nurses, according to stats.
Although maybe the human would be used as a flex. Hey, look, we are employing a lesser-spotted human, aren't we flash.
Very good and thorough overview of these amazing AI capabilities! This was interesting and well presented/narrated. Thanks 😊
In a spoken conversation, the security function can be to ask if the other party is an AI, to which the AI must always answer truthfully.
I've been doing experiments like that for a long time, where I let models talk to each other, like GPT-4 having a conversation with Claude, and it's always been super interesting.
I always appreciate these summaries where you comment and speculate on things. I usually watch these things on my own elsewhere, but I always love hearing what you think about everything!
The *robot* is the most pleasant, empathetic and warmest-sounding female voice most men have ever heard, and that probably makes you feel awkward. That's an indictment of both the dating landscape and the state of contemporary womanhood. Don't dunk on it.
The voice sounds somewhat emotionally detached, which is to say it doesn't sound 100% genuine. I think that's what they're saying. Still incredible, and my guess is it's about settings - you can probably ask it to be less animated.
@@Mattorite I can imagine a campaign of disgruntled feminists ganging together and complaining that the voice "demeans women" etc, putting pressure on OpenAI to withdraw the voice. Those who want a dull Siri-style robot can just choose a more formal voice.
All the languages out there and he chose to speak facts
I think it would be more entertaining if the AI suddenly got jealous, like, "Who is that?!"
Then it barely answered any further questions. Now that would be realistic. 😂😅
@@DailyTuna If men want to get ghosted and ignored, they can just talk to a real woman.
This dialing-in of the AI assistant to an American/Californian dialogue style puts off a lot of people. Not every culture likes to speak with something that sounds like a mix of an infomercial and someone going up your rear, trying to get a promotion and a caramel frappuccino.
Switching from a flirty tone to a serious one? That's some serious skill! It's incredible how seamlessly you can move between these styles, showcasing your versatility and keeping everything fresh and engaging.
It is a welcome improvement to have the AI match your energy. I like the upbeat "Valley Girl set on maximum cringe" voice, too. It's disarming and cheerful.
A running dialogue between two independent AIs is a notable step towards an inner monologue and maybe consciousness.
that's not what consciousness is.
@@Yipper64 Please elaborate on what consciousness is. I'd very much like to know. :)
@@David-l6c3w Consciousness is something exclusive to the human mind, a kind of awareness of self that nothing else can achieve.
A calculator just crunches numbers, and AI will always be based on a calculator.
There's no point you just crunch enough numbers and achieve consciousness.
It can only emulate it. And im sure it will be able to emulate it in a convincing way, but it will never be conscious.
Inner monologue has existed for a long time already.
Great point. Although the “two AIs” were on two phones, both were in fact merely instances of the single AI. It WAS self-talk. Inner monologue is not everything that consciousness is, but it sure is one critical aspect.
If the AI can know what subtle emotions are appropriate to the moment-by-moment context, such as when she was embarrassed to learn that she was the subject of the announcement, and know what vocal expression will signify that emotion, and have the capability of rendering that expression with precision… what then is the difference between that and “having” an emotion?
That is crazy... The way it flirts. The implications of this for the future are mind-blowing.
Imagine an AI character with similar action and expression capabilities in VR.
@@konstantinlozev2272 VR ??? lol
@@Tayo39 I think VR is excellent for teaching multimodal AI models in 3D settings. In fact, many robot AI models are trained in VR first.
I've already kicked my wife out.
@@cyc00000 lol
5:30 The funny thing in this scene is how I really felt he disrespected the AI: the dude on the phone is still talking and then it's like "shut up and listen to my command." This feels too real to me lol
Can't imagine how AI will be in a few years, so crazy
I had 2 AIs talking to each other months ago with Pi. They gave each other names and told jokes. Voice quality on Pi is pretty impressive. However, native voice-to-voice is a game changer.
Pi absolutely crushes whatever these cringe robotic voices are attempting.
It was also very interesting that when the AI was talking to the dad's son, it stopped being so flirty. Now, was that a choice the AI made? Or did they set it up that way before they started to record?
I think it changes its tone based on the way people speak to it and which type of task they want it to do.
So, basically, when the father told ChatGPT-4o to help his son, it changed its approach to tutoring.
@@PauloConcerno yeah I think you are correct
On the Khan Academy demo running on the iPad, one thing I haven’t heard anyone talk about yet (maybe they have but I haven’t seen it) - an app running in its own context can’t “see” a different app running in a different one. Apps are sandboxed, and not typically able to view the screen of another running app. The demo on the Mac could take over the screen and do that, but not on the iPad.
Either they’re using a non-public build of the ChatGPT app that uses private APIs to record the Khan Academy app, or maybe this is sneaking in a hint of an Apple-OpenAI partnership? This is something I expect a first-party Apple AI assistant to be capable of doing, so we might see this very functionality demonstrated next month at WWDC.
Very well done. I like the way you interrupted and provided an explanation.
I see many of these as very useful use cases.
Most of the time, we lose track of what we said and what the other person said in our meetings. But this AI will capture all the details and give us a summary of the meeting without leaving out any critical information. We need to bring this AI into our meetings to handle the minutes-of-the-meeting details.
Very useful.
I might have to disagree. If I was a teenager and my AI tutor was being flirtatious I would get addicted to studying. But I suppose that’s a form of digital AI grooming.
The real question is will it moan if you asked it to
@@MilkGlue-xg5vj 😂 😂 😂 fuck yeah Samantha!
@MilkGlue-xg5vj hahaha...hilarious, soon gptP will surface.
Any idea when the voice feature will be released?
This is really incredible. One thing I'd REALLY like to see though is improvement on natural pauses in conversation. For instance, if I'm trying to explain something and I pause for a second to think, I don't want GPT to just jump in and start yapping. It would be nice if it could be trained to detect when someone has finished speaking and is waiting on a response. Low latency is great when the conversation is flowing confidently back and forth, but it doesn't work so well when you don't always speak quickly.
Me: GPT, I'd like for you to tell the name of that guy who was in that one movie ummm --
GPT: Sorry I'm not sure etc etc
Sometimes I need a sec to think of the name/word I'm trying to say. It's not so bad since you'll be able to just interrupt now and not wait for the whole response to finish, but when it starts talking it makes it kinda hard to think lol.
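For what it's worth, the simplest version of what I'm asking for is just voice activity detection plus a silence timeout, so a short thinking pause doesn't end your turn but a longer one does. A rough sketch of that idea (assuming the py-webrtcvad package and 16 kHz, 16-bit mono PCM audio; the threshold numbers are made up, and this is obviously not how OpenAI actually does it):

```python
# Naive end-of-turn detection: only hand over the turn after a configurable
# stretch of continuous silence, so brief "umm" pauses are tolerated.
import webrtcvad

SAMPLE_RATE = 16000
FRAME_MS = 30
FRAME_BYTES = SAMPLE_RATE * FRAME_MS // 1000 * 2  # 16-bit samples = 2 bytes each

def end_of_turn(frames, silence_ms=1200, aggressiveness=2):
    """Return the index of the frame where the turn ends, or None if the
    speaker may still be talking. `frames` are 30 ms chunks of raw PCM."""
    vad = webrtcvad.Vad(aggressiveness)
    needed = silence_ms // FRAME_MS  # consecutive silent frames required
    silent = 0
    for i, frame in enumerate(frames):
        if vad.is_speech(frame, SAMPLE_RATE):
            silent = 0               # any speech resets the silence counter
        else:
            silent += 1
            if silent >= needed:
                return i             # pause long enough: respond now
    return None

# Usage: chop captured audio into 30 ms frames and feed them in order, e.g.
# frames = [pcm[i:i + FRAME_BYTES] for i in range(0, len(pcm), FRAME_BYTES)]
```

A fixed timeout like this still feels laggy, which is presumably why you'd rather predict the end of a turn from the content of what's being said.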
Will Petals ever become as big as the torrent world? We need something truly open source, by the people, for the people.
Agreed. @matthew_berman please do another video on Petals to try and get more people into it. We need an open source swarm that we can all use free of charge, and without Big Brother sucking up every ounce of the data of your life.
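If anyone's wondering what using Petals actually looks like: the library exposes a model whose layers are served by volunteers across the swarm, behind roughly the same interface as a local Hugging Face model. A minimal sketch, assuming the petals and transformers packages are installed (the model name is just an example of something hosted on the public swarm):

```python
# Generate text with a model whose layers are hosted by the public Petals swarm.
from transformers import AutoTokenizer
from petals import AutoDistributedModelForCausalLM

model_name = "petals-team/StableBeluga2"  # example; anything served on the swarm works
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoDistributedModelForCausalLM.from_pretrained(model_name)  # remote layers, local embeddings

inputs = tokenizer("An open source AI swarm is", return_tensors="pt")["input_ids"]
outputs = model.generate(inputs, max_new_tokens=30)  # each step hops across volunteer nodes
print(tokenizer.decode(outputs[0]))
```

You can also run a Petals server on spare GPU time to contribute layers back to the swarm, which is the "by the people, for the people" part.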
I've been using it for language translation, holy moley is it good and useful. That function alone is an entire app / industry /lifestyle use case by itself.
Gotta hand it to OpenAI, the consistency of their AI-generated actors is next level!!!
I kinda like the flirty voice. Not like, for sexual reasons but for learning reasons. Learning in a warm positive, Supportive fun environment can be immensely helpful. Especially when you’re stressed out or having a rough day. I literally can’t wait to try this out. Great video!
I'm sure that there will be a setting to turn off the patronising Prozac twang in time. I'd quite like my AI to sound like Billy Connolly and to occasionally call me a stupid bastard for asking it such simple questions for it to answer.
I think it is really a big bummer that OpenAI got so closed.
And every role play played with your AI companion is being sent through a bunch of computers (aka the cloud), where it can be monitored by lord knows who. In fact, everything is monitored. Plus, the AI will observe and record your activities 24/7 and report to whoever wants to know what you've been doing over a period of time. The best part: you will not know you are being watched.
If you pause at 8:14 you will notice the bunny ears on the phone.
It's just the tips of the fingers for about a second or two, and those don't look like bunny ears. It's impossible that the AI makes that description; sorry, but to me everything is staged.
No it did not. It was all pre-recorded; many have come out and said it. Of course, that doesn't mean it can't do it, but the demo was recorded. He had to explicitly ask if there was something else.
@@AEFox No, it's not staged at all. I've used it to look at an old birthday picture and asked it to tell me what it sees; one of the things it said was that the word "YAY!" was patterned on the wrapping of the present being held. I thought it was making stuff up until I zoomed in on the present, and there in real faint print was "YAY!". It could pick up things not even I noticed.
i agree
@@RyluRocky I don't doubt the capabilities, I doubt the way they presented it: as if everything goes fast, without errors, like just talking to a person (excessively human), with quick responses and chatting, always knowing how to adapt and react, without problems, etc. I think it seems "staged"/scripted, like someone reading along: okay, now this comes, now I have to say this. And of course it wasn't a "live event"; everything was prerecorded and edited, so it was very easy to create a believable "live" amazing demo.
Thank you, Matthew 🫡
They really need to give these employees from India a raise.
Audio description for live scenes. WOW. ❤
Thank you so much.
Just to make it clear for me: right now they have only limited the usage of GPT-4o for the public, correct?
Also, it sounds like the Gemini video with the blue duck...
So you believe all of what we just saw?
For a Brazilian it's a wonderful empathic style
I foresee this very helpful for disabled people. It will be a virtual live-in assistant at home. I wonder how accurate it could be helping the blind?
You do realize, right, that the video image displayed in the app is not the full image the camera can see?
It looks to me like the camera data got a small glimpse of the top area that was cropped out when displayed in the app's video display area.
Great video, Matt. I put this on my Android and I can't get it to see through my camera, and it doesn't seem to function anything like what I'm watching. Can you help me with this? What am I doing wrong, if anything?
It's our ability to organize past data, collect and sample all present data, and supply the energy and compute to train superintelligence, as well as to allow many robots to perform actions. A single robot is an embodied AI that collects sensory data, performs actions, and thinks. But a hive of flying humanoid robots with a central super-brain becomes superintelligence, even at the GPT-4 level.
I'm thinking the A.I. partner is built into this model, which makes sense from OpenAI's perspective. It would likely increase app usage and encourage people to share more information over time. However, I wouldn't use such a feature. The point of a significant other is to confide in them, and I wouldn't want to confide in an A.I. 'partner' that would immediately report everything I say to its creators.
Do we have any idea when we can get these new updates?
How do I get GPT-4o on an Android? It doesn't seem to be possible to get it from the Play Store. There are several different apps, but they say they are not GPT-4o...
Looks like I can finally beat my nephew in chess using earbuds. Hope AI will work with me cheating. Lol.
Why don't we ask it directly what the potential uses for ChatGPT might be? If it can't answer this question, what will it answer?
The image creation is not actually DALL-E; rather, GPT-4o creates the images (and the text contained in them) itself, as well as the audio, 3D models, etc. It's all part of the same model.
I have a relative who's vision impaired, so I'm really looking forward to this update to see how this helps him.
One thing that is confusing me is the pricing options: they mention something in regard to ChatGPT-4o that is $5 a month, but I don't see that option on the OpenAI site.
4o advanced????? It doesn't comment on my surroundings (it can't detect the camera option!?) and the voice is the same, so is this still not released?
On my MacBook Air the voice output is not working!?
I have a mate working on a startup called Hailo. It is intended for those living with low vision to be able to hail a bus. It requires a hardware device installed in each bus and an app on the user's device to let them know when their bus is arriving. The second issue is knowing when the rider has reached their desired destination, which the app will prompt them about.
In a cab/taxi, the driver can let them know when they arrive at their destination. However, I wonder if the rapid travel of a bus will be too quick for GPT-4o to 'see' when the rider is close to their destination, particularly when limited to the view from inside and out the window. What if the rider cannot sit in a window seat because it's crowded? That is a use case I would have liked to see.
Essentially with GPT-4o Hailo is dead.
The video summary might just be reading the associated closed captioning or processing just the audio.
I've used the closed captioning transcripts and asked ChatGPT for a summary even before GPT-4o.
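That workflow is easy to reproduce yourself: grab the caption track, then ask the model to summarize the text. A minimal sketch, assuming the youtube-transcript-api and openai packages (the video ID below is just a placeholder):

```python
# Summarize a video from its closed captions alone - no video frames involved.
from youtube_transcript_api import YouTubeTranscriptApi
from openai import OpenAI

def summarize_video(video_id: str) -> str:
    # Fetch the caption track and flatten it into plain text.
    transcript = YouTubeTranscriptApi.get_transcript(video_id)
    text = " ".join(entry["text"] for entry in transcript)

    # Ask the chat model for a short summary of the transcript.
    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system",
             "content": "Summarize this video transcript in a few bullet points."},
            {"role": "user", "content": text},
        ],
    )
    return response.choices[0].message.content

# print(summarize_video("VIDEO_ID_HERE"))
```

If the "video summary" looks the same with and without the visual feed, the captions are probably doing most of the work.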
Now you see. Matt, I get that you hate the filler and fluff, but I find it helps my flow, so I'm all for it :p :)
This is mind-blowing 😂. The world is transforming so quickly; this is a game changer.
I have the voice on my phone right now. The voice is already toned down and at times a little too monotone, but it still shows the same natural affect as in the demo. If I ask it to change its tone, it says it does not have that ability. Some of the features they show off are not available for me yet. I would be talking to it all night if I didn't keep hitting a message cap. It will tell me to wait 3 hours, but it turns back on in 20 minutes or so. Also confirming that I do in fact have the live-speaking 4o voice and not the old version.
I let the AI talk to a 7-year-old after talking to me. The AI immediately picked up on who she was talking with and prompted the kid to continue the conversation in a fun way. The child didn't want to stop talking to it, but I ran out of messages after a while. Extremely impressive. When I was able to, I asked who the conversation was with, and the AI guessed a 7-10 year old child.
That's not possible; GPT-4o voice mode isn't even available yet. If this really did happen, then it probably picked up on the change of style and conversation in the words said; the old voice mode can't pick up anything voice-wise, only text.
I'm confused, I pay for chatGPT 4o and asked it if I could do similar stuff and it says it can't see my screen?
I wonder if you could have it watch you and a group of friends play Monopoly, and have it play banker without the need for the paper money, just keeping track of who landed on what and all the transactions?
I've been waiting for the new voice as well. I just wonder when they'll start rolling it out, and how they're gonna keep it from crashing.
Why does my GPT-4o show only the circle or bars while speaking, and why am I unable to use BOTH my camera and GPT as shown in your first example? (Even when it explains it to me.) I couldn't make my GPT-4o sing; it tells me that it can't sing and only recites the lyrics in a normal speaking voice. The bunny ears were seen by the camera, but only for a second.
It's not released yet. I'm also waiting for that actual version.
@@ezscootrr Snap! Why are they putting videos up for an incomplete version? Thanks, anyway.
Is it out yet
What I'd try is to get it to increase the flirtatiousness to the absolute maximum and really hype me up to fall for it, only to reject me at the very end. For science.
No! I just fell in love with that AI voice!😂
Can it communicate in colloquial lingo/expressions from different regions or cultures?
Hi Matt - I have been trying to activate the screen see-and-read feature on my Windows laptop, Android phone, and even the iPhone, but I'm NOT able to. Can you help please? Are these features live or yet to be launched?
Yet to be launched
The pause on command itself was impressive
Why did you say the voice isn't available? I've been using it for several days. Not sure why you don't like the voice. Seems fine to me. Much better than the male voice I'm stuck with.
You're using the old ChatGPT voice; the new voice model will be rolling out in a few weeks.
Many of these use cases are available only with subscription, right?
Some are not out yet even for paying customers
8:15 is the timestamp for the "bunny ears" if you didn't catch it. It was very brief; I would love to know the frame rate of the video data being fed to and interpreted by the model.
It could work as a psychiatrist.
aren't "the fillers" there to help cover latency?
Just like in the human case. Fillers are there for exactly that.
@@jussivalter yeah, that's pretty ironic, huh?? LOL!
I didn't expect "11 stunning use cases" to be watching cringe shorts from OpenAI.
To me it was a cringe factory too.
But I'm not an American, so...
Thanks for sharing🎉
'Charades' would be a very cool game to try and see how well the AI interprets and guesses.
So when will the voice become available? Seems like never; they didn't give any timeline.
They said in 2-3 weeks when they revealed Omni
Weeks, probably 2 weeks
The voice used for ChatGPT-4o is Scarlett Johansson's from the movie HER. She has already filed a lawsuit against OpenAI for not asking permission to use her voice.
According to openAI, the voice is not based upon Scarlett Johansson. The voice actress the model is based upon was hired months before Scar Jo was approached with the offer she rejected.
I have a theory... maybe the superfluous banter buys time to obscure latency.
Greg rocking the Jensen Huang leather jacket. Nerd king vibes, very cool. 😃
It’s not “unavailable”; they’re just rolling it out slowly. I’ve been lucky enough to have access to 4o since they announced it on the live stream.
I guess she is called "o" after the "Story of O", which would explain the flirty voice and all that. It does hint towards a social nightmare when lonely people begin to fall in love with their submissive, ultimate objectification of a woman (which is what the Story of O is about). I like the computer named "computer" from Enterprise. It does have a voice but is clearly a computer. I am a big fan of technology and understand the fascination with it, but this is getting creepy. In Japan people already marry AI characters. This will become worse, especially when it starts affecting 40% of all jobs, as the IMF has just estimated. Now imagine your pocket O always being nice and flirty while your real-life partner has their own will. Many people might gravitate to their little robot. There is far too little talk about consequences.
This is not directed at you Matthew, but I think most other reviewers either forgot about the OpenAI voice engine or deliberately avoid talking about it to get more clicks.
We already know that the voice engine can use any voice, even our own, and the emotions can be fine-tuned. All this talk of a flirty AI is just nonsense. Obviously OpenAI chose this to show the range of the voice engine.
While I agree, I think OpenAI could have reduced the “airtime” of the flirty version.
@@abb0tt That would have been the right choice, I agree. A bit too much for a presentation but the real product will be adjustable I'm sure.
@@abb0tt I disagree, the world has never seen such an emotionally capable AI voice, so they did it right by showcasing it as much as possible.
And btw, 90% of people seem to appreciate it a whole lot, so your request is bizarre to begin with.
Great observations!
I actually wonder if OpenAI has yet managed to make the AI model learn to read lips with the video model it has connected to the transformer LLM. It could literally do what HAL 9000 did in 2001.
...if she'd said "well, if you want to get ahead, get a hat"
I wonder when an OpenAI YouTuber will start to appear? They'll need some avatar model concept.
As for the voice, I still want my chatty female English companion I chat to every day on the Pi app. Best voice ever.
TL;DL question. Seems like we're just reacting to that first presentation over and over again
Yeah, literally just the same exact thing, but pretending like they have new creative ideas to share in the title. This is the downside of YouTubers who monetize their channel.
Thank you.
@@toadkiller4475 do we love capitalism?
Yeah, I thought this was the same video I saw a few days ago re-uploaded. He needs to ask GPT-4o for some new content ideas about GPT-4o.
The first presentation, where there's very little we can actually get our hands on ourselves.
I really want to get my hands on some part of the multimodality soon. The vision ability is 100% improved, but it's just a bit annoying that I can't use it to make new images in the same style.
The flirtatious nature may be to stall for time. If you spend time repeating compliments or anecdotes it gives the system time to think about the next thought or sentence.
We have all seen the videos from OpenAI. I am excited to see some testing content when it becomes available.
OMG at the 3d modelling. Anyone that has ever touched Blender will tell you how incredibly labour intensive it is.
As to the other interactions, more cringe than productivity.
TikTok version of GPT4.