First Look At GPT-4 With Vision

  • Published: 28 Sep 2024
  • To try everything Brilliant has to offer, free for a full 30 days, visit brilliant.org/... . The first 200 of you will get 20% off Brilliant’s annual premium subscription!
    Making this video was quite a rollercoaster! From Dall-e 3 not yet being released to the confirmed multi-modal GPT-4 release, I cannot believe I have hijacked such funny timing.
    Special thanks to bruhmoment for providing me with the Bard results, and Raphael for BeMyEyes access
    [Dall-e 3 Blog] openai.com/dal...
    [ChatGPT Multi-modal Blog] openai.com/blo...
    [Be My Eyes] www.bemyeyes.com/
    This video is supported by the kind Patrons & YouTube Members:
    🙏Andrew Lescelius, alex j, Chris LeDoux, Alex Maurice, Miguilim, Deagan, FiFaŁ, Daddy Wen, Tony Jimenez, Panther Modern, Jake Disco, Demilson Quintao, Shuhong Chen, Hongbo Men, happi nyuu nyaa, Carol Lo, Mose Sakashita, Miguel, Bandera, Gennaro Schiano, gunwoo, Ravid Freedman, Mert Seftali, Mrityunjay, Richárd Nagyfi, Timo Steiner, Henrik G Sundt, projectAnthony, Brigham Hall, Kyle Hudson, Kalila, Jef Come, Jvari Williams, Tien Tien, BIll Mangrum, owned, Janne Kytölä, SO
    [Discord] / discord
    [Twitter] / bycloudai
    [Patreon] / bycloud
    [Music] massobeats - floral: • massobeats - floral (r...
    [Profile & Banner Art] / pygm7

Comments • 69

  • @bycloudAI
    @bycloudAI  1 year ago +10

    To try everything Brilliant has to offer, free for a full 30 days, visit brilliant.org/bycloud . The first 200 of you will get 20% off Brilliant’s annual premium subscription!
    Making this video was quite a rollercoaster! From Dall-e 3 not yet being released to the confirmed multi-modal GPT-4 release, I cannot believe I have hijacked such funny timing.

  • @Clybius
    @Clybius 1 year ago +83

    Just wanted to say, you're like the only AI 'tuber I've seen who isn't full of "THIS IS SO HYPE" and scammy vibes, or overly simplified tutorials. Awesome stuff man, good editing as well.

    • @Destragond
      @Destragond 11 months ago +1

      I recommend "AI Explained". My favourite one. He tends to read a crazy amount of up to date research papers on AI.

  • @Lumegrin
    @Lumegrin 1 year ago +2

    1:06 welp
    so much for that

  • @Taireyn
    @Taireyn 1 year ago +6

    I think OpenAI has already started rolling out the image feature on its own platform for Plus users

  • @amallukose3763
    @amallukose3763 1 year ago +2

    OpenAI just tweeted about vision coming to ChatGPT

  • @LostMekkaSoft
    @LostMekkaSoft 1 year ago +1

    when i see how accurately it can describe random people's rooms, i can't help but think:
    with this we finally solved the problem of how to automatically transform our vacuum-robot-enabled mass surveillance data into an easily searchable format 😅

  • @le9038
    @le9038 1 year ago +3

    Wow. If only this API was released to the public...

  • @Y0UT0PIA
    @Y0UT0PIA 1 year ago +1

    on one hand, it is super impressive how much can be done within the current paradigm and with what level of precision, but on the other - don't you also feel like the promises of AGI and something that transcends 'use huge datasets to train transformer models to imitate said datasets and then further finetune and modify them to make them perform specific tasks that fall within the logic of those datasets' seem just as far off as they did 8 months ago? or do you think that the exponential curve is real after all?

  • @sharpcircle6875
    @sharpcircle6875 1 year ago

    "Several boxes of computer parts sitting on a table" seems pretty satisfying to me.
    I'm pretty tech oriented and I still had to squint to know what half of those boxes were all about lol :v
    They're quite niche items, so I don't blame an AI if it's at least able to figure out what is represented in general.

  • @sharpcircle6875
    @sharpcircle6875 1 year ago

    That's cool and all, but can it correctly tag those NH and danbooru works compared to some of those lazy posters :v ?

  • @ejonesss
    @ejonesss 10 months ago

    the be my eyes app may be useful for the non-blind too, for example in cases where companies want to save paper by taking an instruction manual the size of a phone book and putting it on a small 2x2 inch paper that unfolds to a strip like 12 inches long.
    if you ever bought some small tech like one of them pocket sized usb hard drives by seagate you've undoubtedly seen those matchbook sized instruction manuals
    some of the assistive features in modern operating systems can help in the same way: for example, the feature that turns the number keypad into a mouse can be used for precise mouse movements in graphics and sound editing.
    i think there need to be safeguards because be my eyes could encourage photographing critical infrastructure.

  • @matvey_minaev
    @matvey_minaev 1 year ago +3

    damn

  • @zyxwvutsrqponmlkh
    @zyxwvutsrqponmlkh 1 year ago

    I was quite impressed with the Chinese multimodal, and noticed you didn't compare with it.

  • @DrW1ne
    @DrW1ne 1 year ago

    ChatGPT 4 will make a great COMEBACK?

  • @LeBeautiful
    @LeBeautiful 1 year ago

    A.I HYPE NEVER DYING DOWN

  • @LavaCreeperPeople
    @LavaCreeperPeople 1 year ago +1

    What is this channel

  • @JazevoAudiosurf
    @JazevoAudiosurf 1 year ago

    with all those H100s and reduced training time and massive $, I bet they run $100M experiments and have something huge that's just too expensive to run inference on at scale, or too dangerous. basically nothing is preventing large corps from building giga AI models, but everything is preventing them from releasing them

  • @mrrespected5948
    @mrrespected5948 1 year ago

    Nice

  • @nuvotion-live
    @nuvotion-live 1 year ago

    My bet is on them achieving AGI first. But no way will the open source community be more than a year behind. And AGI is AGI. It’s the singularity. Once that’s unlocked we are going to really see some mind bending stuff. Computers will start being designed from the ground up by AI for AI

    • @Jordan-fg9cc
      @Jordan-fg9cc 1 year ago +4

      GPT is good but you've really bought into the hype train a bit too hard.

    • @nuvotion-live
      @nuvotion-live 1 year ago

      @Jordan-fg9cc it’s ok as is, if you give it a lot of context. But I’m just extending out the timeline more than a couple months. It shouldn’t take much imagination to picture the confluence of all these models coming together

    • @Jordan-fg9cc
      @Jordan-fg9cc 1 year ago +1

      @nuvotion-live which definition of AGI are you going by? because I don't think that a confluence of LLMs would be close enough to true strong AI to be considered an AGI, but if you are just going by 'better than humans for a wide range of tasks', sure

    • @nuvotion-live
      @nuvotion-live 1 year ago

      @Jordan-fg9cc no not just LLMs. I’m talking about ChatGPT4, not the neutered public version but the one they are keeping internal for now. So GPT4 LLM + DALL-E 3 + Be My Eyes. Look up each of those and realize they are all one and the same. Then add a few years of iteration.

    • @nuvotion-live
      @nuvotion-live 1 year ago

      @Jordan-fg9cc then put all that into a Tesla Bot with Eleven Labs quality speech, Whisper, and a 100M token context window. That’s what I mean by confluence

  • @nemod3338
    @nemod3338 11 months ago

    Soon humans (the next generation) will not have a personality, they will have personal AI. DOOMSDAY.

  • @laupoke
    @laupoke 1 year ago

    Bing has had this for a few months already? Why does no one mention that?

  • @MaThMaTa1000
    @MaThMaTa1000 1 year ago

    Noice

  • @mrtony3152
    @mrtony3152 1 year ago

    I am sorry but I read it, GPT is Black.

  • @ygg278
    @ygg278 1 year ago +4

    BRO SO EARLY 1 VIEW LMAO

  • @L_QTx3
    @L_QTx3 1 year ago +40

    Finally a life changing innovation that comes from using AI

    • @TopCuby
      @TopCuby 1 year ago +1

      Fr tho

    • @L_QTx3
      @L_QTx3 1 year ago

      @torontoyes Believe me, I am not someone who praises image generators, I do not praise ChatGPT, and I actually hate the very idea of humanizing technology to the point where we are so linked to it that no teenager today can live 10 minutes without a phone, nor learn a single topic without the internet or opening a book.
      But really, this is a good application of ChatGPT. Even though fully or partially blind people need someone to help them at all times (and really never go out on the streets alone), this is a really good use of these developing technologies, since some virtual assistants like Siri and the Google one can easily fall short in some tasks.

    • @CombustibleL3mon
      @CombustibleL3mon 1 year ago

      @torontoyes Rather than insult and berate, you could actually try to provide useful information to learn from...

    • @torontoyes
      @torontoyes 1 year ago

      @CombustibleL3mon you're right. I'll delete that comment. I'm not as excited about AI as I'd like to be. Not all innovations will benefit us in the long term. What we are witnessing is similar to the water being drawn back while we are fascinated with the seashells. Wait till the water exhales.

  • @nilaier1430
    @nilaier1430 1 year ago +24

    Future image captioning for datasets is going to be absolutely insane!

    • @llmtime2178
      @llmtime2178 1 year ago

      insanely EXPENSIVE lol

    • @nilaier1430
      @nilaier1430 1 year ago

      @llmtime2178 Yeah, fair enough

    • @carkawalakhatulistiwa
      @carkawalakhatulistiwa 1 year ago

      @llmtime2178 give them 800 T USD like the US military

    • @cesar4729
      @cesar4729 1 year ago

      @llmtime2178 Wait. You think human captioning is cheaper?
      L
      O
      L

  • @ChandravijayAgrawal
    @ChandravijayAgrawal 1 year ago +6

    If this could be fitted into specs, it would become Jarvis-level technology; we could all become Iron Man

  • @balanse01
    @balanse01 1 year ago +4

    I wonder if it can help out with electrical circuits

  • @ВадимМарченко-ы4е
    @ВадимМарченко-ы4е 1 year ago +2

    I've been researching the multimodal LLM field for a while, and I have an idea why open-source models perform poorly compared to GPT-4. Most of the models are based on augmenting LLMs with vision transformers, such as CLIP (EVA) or pure ViT, and those are very simple models that can operate only on images of 336x336 at most. So I think they aren't able to distinguish text and labels because the letters are compressed to just a blob of pixels that even a human cannot recognize
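
A rough back-of-the-envelope sketch of the 336x336 point above. The 1920x1080 screenshot and the 20-pixel text height are illustrative assumptions, not numbers from the video, and the helper `scaled_text_height` is hypothetical. It also assumes a simple aspect-preserving resize that fits the longer side into 336 pixels; real preprocessing pipelines differ (some resize the shorter side and center-crop instead).

```python
def scaled_text_height(src_w: int, src_h: int, text_px: float, target: int = 336) -> float:
    """Approximate height (in pixels) of a line of text after the image
    is shrunk so that its longer side equals `target`."""
    # Assumption: aspect-preserving resize driven by the longer side.
    scale = target / max(src_w, src_h)
    return text_px * scale


if __name__ == "__main__":
    # Hypothetical full-HD screenshot with 20-pixel-tall text.
    height = scaled_text_height(1920, 1080, 20)
    print(f"text height after resize: {height:.2f} px")
```

Under these assumptions the text ends up only a few pixels tall, which fits the comment's guess that small labels turn into unreadable blobs before the language model ever sees them.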

  • @acousticdoodling4765
    @acousticdoodling4765 1 year ago +2

    I've just discovered this channel today after searching for a good AI news coverage channel. Great content overall.
    My suggestion would be to slow down a bit and maybe provide more in-depth as well as simple explanations for some of the concepts. You go through a lot of details quickly and it's kind of hard to follow at times (maybe not this video specifically, but previous ones definitely suffer from information overload). More background information and context would be helpful for viewers who are new to the topic. Other than that, keep up the good work. Looking forward to more.

  • @back2thenature2
    @back2thenature2 11 months ago

    You sound so different compared to how you look in the thumbnail but hey, we shouldn't judge books by their covers right? haha

  • @TheAkdzyn
    @TheAkdzyn 1 year ago +8

    I'd like GPT-4 to be prompted to create a randomised infinite sequence of visual prompts that are fed into DALL-E 3 so that there is a constant output of random images in high resolution.

  • @messengercreator
    @messengercreator 1 year ago

    and I'll challenge u AI CHAT DEEPAI and DEEPAI and u and dalle2 and 3 and GPT4 pls

  • @lonlipscomb813
    @lonlipscomb813 1 year ago

    AI oops. At 3:25 the "assistant" wrongly says, "When about to land, pull the brake on right." But the brake is on the left, under the pilot's left hand. Specifically, this is the speed brake, which at constant airspeed controls the angle of descent. (Also, while rolling out, pulling fully against the backstop at varying pressure applies the wheel brake by that amount.)

  • @bennguyen1313
    @bennguyen1313 1 year ago

    What about audio? Have any of the LLMs been pointed towards automatically translating speech recordings into other languages?

  • @tatacraft791
    @tatacraft791 1 year ago

    imagine being one of the patrons shouted out at the end of the video...

  • @oryxchannel
    @oryxchannel 11 months ago

    credible and concise digests

  • @diegomoralessepulved
    @diegomoralessepulved 1 year ago

    Hasn't Bing AI been doing this multimodal thing for a while already?

    • @jackslocum2559
      @jackslocum2559 1 year ago +1

      They use a different AI to give the image a text description. This one will be actually multimodal, so it can understand stuff that text just can't explain

  • @TopCuby
    @TopCuby 1 year ago

    You're a legend man, keep on uploading

  • @digletwithn
    @digletwithn 1 year ago

    re-upload?

  • @JaredQueiroz
    @JaredQueiroz 1 year ago

    HE DID THE BRAZILIAN LAUGH KKKKKKKK