First Look At GPT-4 With Vision
- Published: 28 Sep 2024
- To try everything Brilliant has to offer, free, for a full 30 days, visit brilliant.org/... . The first 200 of you will get 20% off Brilliant’s annual premium subscription!
Making this video was quite a rollercoaster! From DALL-E 3 not yet being released to a confirmed multimodal GPT-4 release, I cannot believe I landed on such funny timing.
Special thanks to bruhmoment for providing me with the Bard results, and Raphael for Be My Eyes access
[Dall-e 3 Blog] openai.com/dal...
[ChatGPT Multi-modal Blog] openai.com/blo...
[Be My Eyes] www.bemyeyes.com/
This video is supported by the kind Patrons & YouTube Members:
🙏Andrew Lescelius, alex j, Chris LeDoux, Alex Maurice, Miguilim, Deagan, FiFaŁ, Daddy Wen, Tony Jimenez, Panther Modern, Jake Disco, Demilson Quintao, Shuhong Chen, Hongbo Men, happi nyuu nyaa, Carol Lo, Mose Sakashita, Miguel, Bandera, Gennaro Schiano, gunwoo, Ravid Freedman, Mert Seftali, Mrityunjay, Richárd Nagyfi, Timo Steiner, Henrik G Sundt, projectAnthony, Brigham Hall, Kyle Hudson, Kalila, Jef Come, Jvari Williams, Tien Tien, BIll Mangrum, owned, Janne Kytölä, SO
[Discord] / discord
[Twitter] / bycloudai
[Patreon] / bycloud
[Music] massobeats - floral: • massobeats - floral (r...
[Profile & Banner Art] / pygm7
What is this channel
Just wanted to say, you're like the only AI 'tuber I've seen who isn't full of "THIS IS SO HYPE" and scammy vibes, or overly simplified tutorials. Awesome stuff man, good editing as well.
I recommend "AI Explained". My favourite one. He tends to read a crazy amount of up to date research papers on AI.
1:06 welp
so much for that
I think OpenAI has already started rolling out the image feature on its own platform for Plus users
OpenAI just tweeted about vision coming to ChatGPT
When I see how accurately it can describe random people's rooms, I can't help but think:
with this we've finally solved the problem of how to automatically transform our vacuum-robot-enabled mass surveillance data into an easily searchable format 😅
Wow. if only this API was released to the public...
on one hand, it is super impressive how much can be done within the current paradigm and with what level of precision, but on the other, don't you also feel like the promises of AGI and something that transcends 'use huge datasets to train transformer models to imitate said datasets and then further finetune and modify them to make them perform specific tasks that fall within the logic of those datasets' seem just as far off as they did 8 months ago? or do you think that the exponential curve is real after all?
"Several boxes of computer parts sitting on a table" seems pretty satisfactory to me.
I'm pretty tech oriented and I still had to squint to know what half of those boxes were all about lol :v
They're quite niche items, so I don't blame an AI as long as it can at least figure out what is represented in general.
That's cool, but can it tag those NH and danbooru works correctly compared to some of those lazy posters :v ?
The Be My Eyes app may be useful for non-blind people too, for example in cases where companies save paper by shrinking an instruction manual the size of a phone book onto a small 2x2 inch sheet that unfolds into a strip about 12 inches long.
If you've ever bought some small tech, like one of those pocket-sized USB hard drives by Seagate, you've undoubtedly seen those matchbook-sized instruction manuals.
It's just like how some of the assistive features in modern operating systems can help everyone: for example, the feature that turns the number keypad into a mouse can be used for precise mouse movements in graphics and sound editing.
I think there need to be safeguards, because Be My Eyes could encourage photographing critical infrastructure.
damn
I was quite impressed with the Chinese multimodal model; I noticed you didn't compare with it.
Will ChatGPT-4 make a great COMEBACK?
A.I HYPE NEVER DYING DOWN
What is this channel
With all those H100s, reduced training time, and massive $, I bet they run $100M experiments and have something huge that's just too expensive to run inference on at scale, or too dangerous. Basically, nothing prevents large corps from building giga AI models, but everything prevents them from releasing them
Nice
My bet is on them achieving AGI first. But no way will the open source community be more than a year behind. And AGI is AGI. It’s the singularity. Once that’s unlocked we are going to really see some mind bending stuff. Computers will start being designed from the ground up by AI for AI
gpt is good but you've really bought into the hype train a bit too hard.
@@Jordan-fg9cc it’s ok as is, if you give it a lot of context. But I’m just extending out the timeline more than a couple months. It shouldn’t take much imagination to picture the confluence of all these models coming together
@@nuvotion-live which definition of AGI are you going by? because I don't think that a confluence of LLMs would be close enough to true strong AI to be considered an AGI, but if you are just going by 'better than humans for a wide range of tasks', sure
@@Jordan-fg9cc no, not just LLMs. I’m talking about ChatGPT4, not the neutered public version but the one they are keeping internal for now. So GPT4 LLM + DALL-E 3 + Be My Eyes. Look up each of those and realize they are all one and the same. Then add a few years of iteration.
@@Jordan-fg9cc then put all that into a Tesla Bot with Eleven Labs quality speech, Whisper, and a 100M token context window. That’s what I mean by confluence
Soon humans (the next generation) will not have a personality, they will have a personal AI. DOOMSDAY.
Bing has had this for a few months already? Why does no one mention that?
Noice
I am sorry but I read it, GPT is Black.
BRO SO EARLY 1 VIEW LMAO
im 4
Finally a life changing innovation that comes from using AI
Fr tho
@@torontoyes Believe me, I am not someone who praises image generators, I do not praise ChatGPT, and I actually hate the very idea of humanizing technology to the point where we are so tied to it that no teenager today can live 10 minutes without a phone, nor learn a single topic without the internet or opening a book.
But this really is a good application of ChatGPT. Even though fully or partially blind people need someone to help them at all times (and really never go out on the streets alone), this is a really good use of these developing technologies, since some virtual assistants like Siri and Google's can easily fall short on some tasks.
@torontoyes Rather than insult and berate you could actually try to provide useful information to learn from...
@CombustibleL3mon you're right. I'll delete that comment. I'm not as excited about AI as I'd like to be. Not all innovations will benefit us in the long term. What we are witnessing is similar to the water being drawn back while we are fascinated with the seashells. Wait till the water exhales.
Future Image captioning for datasets is going to be absolutely insane!
insanely EXPENSIVE lol
@@llmtime2178 Yeah, fair enough
@@llmtime2178 give them 800 T USD like the US military
@@llmtime2178 Wait. You think human captioning is cheaper?
LOL
If this could be fitted into specs, it would become Jarvis-level technology; we could all become Iron Man
I wonder if it can help out with electrical circuits
I've been researching the multimodal LLM field for a while, and I have an idea why open-source models perform poorly compared to GPT-4. Most of the models are based on augmenting LLMs with vision transformers, such as CLIP (EVA) or a plain ViT, and these are very simple models that can only operate on images of at most 336x336. So I think they aren't able to distinguish text and labels because the letters are compressed into a blob of pixels that even a human cannot recognize
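To put rough numbers on the resolution argument above, here is a small back-of-the-envelope sketch. It is only an illustration under stated assumptions: the 1920x1080 screenshot and the 24 px text height are made-up example values, the resize is simplified to "scale the longer side down to 336" (real preprocessing pipelines often resize the shorter side and center-crop instead), and the patch size of 14 is the one used by CLIP-style ViT-L/14 encoders at 336 px.

```python
def downscaled_glyph_height(src_width: int, src_height: int,
                            glyph_height_px: float, target: int = 336) -> float:
    """Approximate text height after shrinking the image so its longer side equals `target`.

    Simplifying assumption: aspect ratio is preserved and nothing is cropped.
    """
    scale = target / max(src_width, src_height)
    return glyph_height_px * scale


def patch_grid(target: int = 336, patch: int = 14) -> tuple:
    """Return (patches per side, total patches) for a square ViT input."""
    per_side = target // patch
    return per_side, per_side * per_side


# Hypothetical example: a 1920x1080 screenshot containing 24 px tall text.
glyph = downscaled_glyph_height(1920, 1080, 24)
side, total = patch_grid()

print(f"text height after resize: {glyph:.1f} px")
print(f"patch grid: {side} x {side} = {total} patches")
```

Under these assumptions the text ends up only about four pixels tall, which is smaller than a single 14 px patch, so it is plausible that small labels turn into an unreadable smear before the language model ever sees them.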
I've just discovered this channel today after searching for a good AI news coverage channel. Great content overall.
My suggestion would be to slow down a bit and maybe provide more in-depth as well as simple explanations for some of the concepts. You go through a lot of details quickly and it's kind of hard to follow at times (maybe not in this video specifically, but previous ones definitely suffer from information overload). More background information and context would be helpful for viewers who are new to the topic. Other than that, keep up the good work. Looking forward to more.
You sound so different compared to how you look in the thumbnail but hey, we shouldn't judge books by their covers right? haha
I'd like gpt 4 to be prompted to create a randomised infinite sequence of visual prompts that are fed into dall-e 3 so that there is a constant output of random images in high resolution.
That sounds interesting!
Your eye already does that
Just smoke some weed with it
Cool idea
@MrMessa45 maybe you should smoke less weed then 🤣🤣🤣
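The "endless GPT-4 prompts feeding DALL-E 3" idea from this thread could be sketched roughly as a pair of lazy generators. This is a minimal sketch, not a working integration: `fake_prompt` and `fake_render` are hypothetical stand-ins for a text-model call and an image-model call, and no real API is contacted. The point is only the shape of the pipeline, where prompts and images are produced on demand so the "infinite" sequence never has to exist in memory.

```python
import itertools
import random
from typing import Callable, Iterable, Iterator, List, Tuple


def prompt_stream(make_prompt: Callable[[random.Random], str],
                  seed: int = 0) -> Iterator[str]:
    """Yield an endless sequence of prompts driven by a seeded random source."""
    rng = random.Random(seed)
    while True:
        yield make_prompt(rng)


def image_stream(prompts: Iterable[str],
                 render: Callable[[str], bytes]) -> Iterator[Tuple[str, bytes]]:
    """Lazily pair each prompt with the image rendered for it."""
    for prompt in prompts:
        yield prompt, render(prompt)


def fake_prompt(rng: random.Random) -> str:
    """Hypothetical stand-in for asking a language model for a visual prompt."""
    subjects = ["a lighthouse", "a fox", "a city skyline"]
    styles = ["watercolor", "pixel art", "film photo"]
    return f"{rng.choice(subjects)}, {rng.choice(styles)}"


def fake_render(prompt: str) -> bytes:
    """Hypothetical stand-in for an image-generation call; returns placeholder bytes."""
    return f"<image for: {prompt}>".encode("utf-8")


def first_n(n: int, seed: int = 42) -> List[Tuple[str, bytes]]:
    """Take the first n (prompt, image) pairs from the otherwise endless stream."""
    stream = image_stream(prompt_stream(fake_prompt, seed), fake_render)
    return list(itertools.islice(stream, n))


if __name__ == "__main__":
    for prompt, image in first_n(3):
        print(prompt, len(image))
```

Swapping the two stand-ins for real model calls would keep the rest unchanged; in practice the loop would also need rate limiting and a budget cap, since every iteration costs money.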
and I'll challenge u AI CHAT DEEPAI and DEEPAI and u and dalle2 and 3 and GPT4 pls
AI oops. At 3:25 the "assistant" wrongly says, "When about to land, pull the brake on right." But the brake is on the left under the pilot's left hand. Specifically this is the speed brake, which at constant airspeed controls the angle of descent. (Also, while rolling out pulling fully against the backstop at varying pressure applies the wheel brake to that amount.)
What about audio? Have any of the LLMs been pointed towards automatically translating speech recordings into other languages?
imagine being one of the patrons shouted out at the end of the video...
credible and concise digests
Hasn't Bing AI been doing this multimodal thing for a while already?
They use a different AI to give the image a text description. This one will actually be multimodal, so it can understand things that text just can't describe
You're a legend man, keep on uploading
re-upload?
HE DID THE BRAZILIAN LAUGH HAHAHAHAHA
Where?