Language or Vision - What's Harder? (Ilya Sutskever) | AI Podcast Clips

  • Published: Aug 22, 2024

Comments • 63

  • @darshantank554
    @darshantank554 3 years ago +17

    "where the vision ends, language begins" this line touches my heart!

  • @JonKroeker
    @JonKroeker 1 year ago +6

    Not only is this guy brilliant, he’s just such a nice guy

  • @bubelevakalisa7313
    @bubelevakalisa7313 3 years ago +12

    Vision ends when the viewer (agent 1) sees the words. Language begins when the viewer (agent 1) combines the words it has seen with "prior knowledge" and then communicates "value added" information to a listener (agent 2). For example, when agent 1 "sees" (vision) the name Lewis Hamilton, it must be able to use its knowledge about Hamilton to engage in a coherent conversation with an expert about this great F1 driver. At the moment, state-of-the-art models like GPT-3 can fake a coherent conversation only when communicating with non-experts.

    • @sidgirase
      @sidgirase 2 years ago

      Vision takes the visual input, the brain caches the sentences, and then NLP begins? If the cache runs out of memory, vision goes back and queries the same input again?

  • @DamianReloaded
    @DamianReloaded 4 years ago +14

    This is about semantic interpretation: whether image recognition and natural language processing could share the same "back end" for semantic interpretation and abstraction. I wonder if one could train a convolutional NN and a transformer to spit out the same semantic vector, so that a natural language description of a picture and the picture itself would be compressed into the same (or similar) vector space coordinates? :/

    • @Gyringag
      @Gyringag 4 years ago +1

      There are already some datasets for this task: you build a net for NLP and a net for CV, and minimize the KL divergence between the two hidden spaces.
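
      The idea in this reply can be sketched in a few lines. This is only an illustration under stated assumptions, not the commenter's actual setup: the hidden vectors below are made-up numbers standing in for the outputs of an image encoder and a text encoder, each vector is turned into a probability distribution with a softmax, and the names `softmax`, `kl_divergence`, and `alignment_loss` are hypothetical helpers. In real training the loss would be minimized by gradient descent over both networks.

```python
import math


def softmax(logits):
    """Turn a hidden vector into a probability distribution."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def alignment_loss(image_hidden, text_hidden):
    """KL divergence between the softmaxed image and text hidden vectors."""
    return kl_divergence(softmax(image_hidden), softmax(text_hidden))


# Hypothetical hidden activations (assumed values, not real encoder outputs).
image_hidden = [2.0, 0.5, -1.0]       # image encoder output for a picture
caption_hidden = [1.8, 0.7, -0.9]     # text encoder output for its caption
unrelated_hidden = [-1.0, 0.2, 2.5]   # text encoder output for unrelated text

matched = alignment_loss(image_hidden, caption_hidden)
mismatched = alignment_loss(image_hidden, unrelated_hidden)
print(matched < mismatched)  # a well-aligned pair should give the smaller loss
```

      One design note: KL divergence is asymmetric, so `alignment_loss(a, b)` and `alignment_loss(b, a)` generally differ; a symmetric variant would average the two directions.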

  • @nikhilvarmakeetha3917
    @nikhilvarmakeetha3917 4 years ago +27

    The question "Where does vision end and language start?" was intriguing. It shows a potential final destination that needs to be achieved for DL-based AI.

    • @breakawaybooks4752
      @breakawaybooks4752 4 years ago

      ✔️John Venn liked this.

    • @holgerjrgensen2166
      @holgerjrgensen2166 4 years ago

      Windows start, should be, On/off, it is here Your illiteracy begin,
      better open some windows and get some fresh air, and understand the nature of dictator-principle.
      AI, is illiteracy and superstition, - intelligence can never be artificial.
      Repeating dead mantras, is Not individual thinking.
      The development of Consciousness and Language is two sides of the very same development, based on Eternal Principles.

    • @olivercroft5263
      @olivercroft5263 4 years ago

      @@holgerjrgensen2166 more like returnal than eternal 🤔😘

    • @holgerjrgensen2166
      @holgerjrgensen2166 4 years ago

      What do You mean, if You know what You're saying.

    • @holgerjrgensen2166
      @holgerjrgensen2166 4 years ago

      Ru-Mu,
      okai can mean almost anything; in Danish, it is åvkæj, just a sound-combination.

  • @adambrickley1119
    @adambrickley1119 4 years ago +4

    "i am going to explain why"...opens by asking a question, nice!

  • @user-my5qk5xu1d
    @user-my5qk5xu1d 4 years ago +12

    0:49 The Word is "Interdisciplinary"

    • @ssssssstssssssss
      @ssssssstssssssss 4 years ago +2

      Man. I hate that word.... It stems from artificial boundaries that we've created due to historical happenstance.

  • @TimoNineSix
    @TimoNineSix 4 years ago +4

    once the vision can read the language, the loop is complete

  • @chrisbarry9345
    @chrisbarry9345 1 year ago +1

    Man this is going to finally get watched by people

  • @justinkiff4159
    @justinkiff4159 4 years ago +3

    I think the wife example is quite bad because there is a sexual component in the perception of the other; with a friend there will probably be more objectivity.
    Also, yes, if you have human-level speech recognition and understanding, you'll get vision for free. Understanding text is just a primitive form of acquiring information: replace the objects in a picture with words, and voilà.

  • @Ross-nd6xi
    @Ross-nd6xi 4 years ago +2

    You should get a linguist on, Lex. It might be interesting to talk about the hermeneutic aspect of language learning and interpretation for AGI.

  • @timdh100
    @timdh100 4 years ago +6

    Lex, how about a podcast with Shai Ben-David on advances on the theoretical side of ML?

  • @NoOne-uz4vs
    @NoOne-uz4vs 4 years ago +7

    0:54 - Does anyone know what those principles are??

    • @nobodykid23
      @nobodykid23 4 years ago +1

      This is just my ballpark guess, but I think it should be empirical risk minimization and something around the no-free-lunch theorem. I don't know the third one.

  • @FromFame
    @FromFame 4 years ago +2

    I literally suffer from the same cosmetic matter this respectable person suffers from.
    I use a solution daily, I understand how you can get used to it but please for the sake of other people research a solution too. I felt embarrassed to mention it and not many will, but I care about AI and those pushing it forward. Beyond being highly intelligent, you are an attractive person👍

  • @jonomichi2262
    @jonomichi2262 9 months ago

    I thought the interviewer was smart, but Ilya is on a different level.

  • @joshuaerkman1444
    @joshuaerkman1444 2 years ago +1

    Language has much higher dimensionality than vision. Vision has three basic dimensions and that could probably be abstracted up to thousands or millions. Language has over 6,500 basic dimensions. The abstraction of these basic dimensions may go into the trillions

  • @jamesblankenship3077
    @jamesblankenship3077 4 years ago +1

    This conversation really seemed to enlighten me on how language would have been impossible without sight and hearing. I can see that a word can have many definitions without the presence of a visual or tone of voice. So for the computer to learn, we would relate these few things in the algorithm so that the computer can learn as we did. If the computer is a rigid piece of electronics, isn't that how life began billions of years ago? Maybe with a better architect.

  • @burkebaby
    @burkebaby 1 year ago

    This was an interesting conversation! Lex - I wonder if the title should be "Language vs. Vision" instead. 6:56 - In terms of Generative AI, can Language and Vision both work to improve each other, like an arms race? How will the AI model and algorithm decide when to determine a pass or fail result for either/or?

  • @mohammadaminparchami7462
    @mohammadaminparchami7462 4 years ago +5

    Hey Lex, one cool thing would be to add some more media to the conversations. Show the guests some clips, read them news, and then we would like to hear their opinion. Great job ✋🏻👏🏻

  • @johnniefujita
    @johnniefujita 4 years ago +1

    I believe CNNs and NLP should serve as inputs for decision-making systems, and reinforcement learning should explore the space of actions, states, and target states. So the first two act more as perception constructors and the last as a decision-space explorer.

  • @stevee5718
    @stevee5718 1 year ago

    So interesting to look back at this interview now, in the wake of GPT4.

  • @leecharlie2513
    @leecharlie2513 3 years ago +2

    Which field has more jobs (NLP or CV)? It seems to me that so far there are a lot more applications for CV, and therefore CV has more job opportunities than NLP. Simply search "computer vision job USA" and "NLP jobs USA" on Google; comparing the results will show that CV has more jobs. I wonder what your two cents are on it. Maybe I am wrong?

    • @MrSchweppes
      @MrSchweppes 3 years ago +2

      It will change this year or maybe in 2022.

    • @nn-sv5vi
      @nn-sv5vi 2 months ago

      It's 2024 and your opinions are still correct so far

  • @BenIsraelSeatriz
    @BenIsraelSeatriz 1 month ago

    Lex found an opportunity to sneak in his interest in monogamous marriage then smiled

  • @henrikbergman4055
    @henrikbergman4055 4 years ago

    Throwing out a question here, as there are some clever people in the thread. Anyone care to help me understand why "natural language" (and does that exclude body language and tone of voice?) would be important for AI? As an example; IKEA furniture assembly instructions don't need words to explain stuff to humans. And being a poet is not a requirement for human level intelligence, right?

    • @seo95
      @seo95 4 years ago +2

      Your examples are more about language generation. Even if that is important, the hot topic nowadays is language understanding. Understanding language hides a lot of very difficult challenges. Among them, reasoning about entities is one of the most difficult ones. Each time we speak we refer to events that happened in the past and in the present, make implicit relations between entities, and talk about abstract things. Language is the description of the world in which we live and the abstract world we have created (the concept of nations, politics, jokes, etc.). To understand language, a machine first needs to understand the world we have built. We are far from achieving something like that with AI.
      How can we pretend to have an "intelligent" machine if it cannot understand us?

  • @umberto488
    @umberto488 2 months ago

    Beautiful hair

  • @shreeyatyagi
    @shreeyatyagi 4 years ago

    Yes, in the man-made world (physicality), our thought and action are primarily governed by language. So, language is fundamental.

  • @AM-qx3bq
    @AM-qx3bq 3 years ago

    I don't understand the difficulty in the "Where vision ends and language starts" question. I imagine an advanced enough vision system can just recognize that a particular assortment of pixels represents text. From that point it can be converted to raw text (which is a decades-old solved problem) and then fed to an NLP pipeline for interpretation. Imu, it's not a vision system's role to accomplish language understanding, but it would be ideal if it could at least identify what is text and relay it to the NLP component.
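
    The hand-off described in this comment can be laid out as a three-stage pipeline. The sketch below is purely illustrative and every name in it (`Region`, `detect_text_regions`, `ocr`, `interpret`, `vision_to_language`) is hypothetical: the "image" is a list of pre-labelled regions rather than pixels, the OCR step simply returns a stored string, and the "NLP" step is a lowercase tokenizer. It only shows where the boundary between the vision side and the language side would sit, not how any stage would actually be implemented.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Region:
    kind: str     # "text" or "object"; a real detector would have to predict this
    content: str  # stand-in for pixels: the string an OCR step would recover


def detect_text_regions(regions: List[Region]) -> List[Region]:
    """Vision stage: keep only the regions that look like text."""
    return [r for r in regions if r.kind == "text"]


def ocr(region: Region) -> str:
    """OCR stage (stub): convert a text region to raw text."""
    return region.content


def interpret(text: str) -> List[str]:
    """Language stage (stub): a trivial lowercase tokenizer."""
    return text.lower().split()


def vision_to_language(regions: List[Region]) -> List[List[str]]:
    """Run every detected text region through OCR and the language stage."""
    return [interpret(ocr(r)) for r in detect_text_regions(regions)]


# Assumed toy scene: one non-text object and one piece of visible text.
scene = [Region("object", "car"), Region("text", "Lewis Hamilton")]
result = vision_to_language(scene)
print(result)  # [['lewis', 'hamilton']]
```

    Note that the non-text region never reaches the language stage here, which is exactly the separation of roles the comment argues for.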

  • @IsmaelAlvesBr
    @IsmaelAlvesBr 4 years ago

    The problem is that we are trying to make a robotic brain from scratch. Maybe the solution is to give it initial steps so that it doesn't start from 0. It's like when you learn another language. You already know what a dog is, but need to learn how to say it in another "way" and when you should say it.

    • @maxsnts
      @maxsnts 4 years ago +1

      How does that apply? When a baby is born he does not know what a dog is.
      The only thing he starts with are unconscious behaviors, like "cry if hungry".
      In that sense starting from scratch seems very similar.

  • @pawarboy7
    @pawarboy7 2 years ago

    I think vision lags language because it doesn't have a lot of labeled data

  • @pratik245
    @pratik245 2 years ago

    Great Ilya

  • @Priyanka-us8rw
    @Priyanka-us8rw 1 year ago +1

    Computer vision is more fascinating

  • @BaikalLV
    @BaikalLV 4 years ago +4

    8:15 such a blue pilled Lex

  • @danielcogzell4965
    @danielcogzell4965 4 years ago

    man.. I find it interesting how I really respect Ilya for what he achieved but I just don't agree with his views on things most of the time.

  • @styles9783
    @styles9783 4 years ago

    Hey Lex

  • @olivercroft5263
    @olivercroft5263 4 years ago +3

    Rezpect ze russians🇷🇺

  • @ko95
    @ko95 1 year ago

    hmmm

  • @chocolategolemofroidgutand2839
    @chocolategolemofroidgutand2839 4 years ago

    JUST

  • @luisselvera9878
    @luisselvera9878 2 years ago

    Vision ends when language starts.

  • @henrychoy2764
    @henrychoy2764 3 years ago

    hav 2 say that the dumbest animals hav vision but not langwage

  • @shreeyatyagi
    @shreeyatyagi 4 years ago

    Language

    • @leecharlie2513
      @leecharlie2513 3 years ago

      Why?

    • @shreeyatyagi
      @shreeyatyagi 3 years ago

      @@leecharlie2513 because language is a representation.

    • @leecharlie2513
      @leecharlie2513 3 years ago +2

      @@shreeyatyagi But isn't the recent GPT-3 demonstrating very promising results in generating meaningful text and dialog?

  • @michaelpetronzio6557
    @michaelpetronzio6557 4 years ago +1

    You are the most nicest cutest thing!

  • @enriquemartinez5647
    @enriquemartinez5647 4 years ago

    Read what Lacan says about language. Not Chomsky.

  • @jefferysherwood7424
    @jefferysherwood7424 4 years ago +1

    🐸🐸🐸🐸🐸🐸