Learn How he reproduced Karpathy's GPT-2 for Audio!!!

  • Published: 14 Jun 2024
  • 🔗 Links 🔗
    Building GPT2o - Part 1 : Audio
    / building-gpt2o-part-1-...
    GPT-2 for Audio - github.com/nivibilla/build-na...
    Srinivas Billa Twitter
    x.com/sbeastwindy
    Srinivas Billa LinkedIn
    / srinivasbilla
    Andrej Karpathy's GPT-2 Video - • Let's reproduce GPT-2 ...
    ❤️ If you want to support the channel ❤️
    Support here:
    Patreon - / 1littlecoder
    Ko-Fi - ko-fi.com/1littlecoder
    🧭 Follow me on 🧭
    Twitter - / 1littlecoder
    LinkedIn - / amrrs
  • Science

Comments • 33

  • @Srinivas_Billa 20 days ago +19

    Thanks for having me!

  • @CubicPostcode 19 days ago +2

    I was thinking about how OpenAI could come up with nice voices without being exposed to lawsuits. I came up with this idea: generate voices randomly, and provably so, so that it would be possible to prove the voices were generated randomly. People could then upvote or downvote voices, and the most popular ones according to the crowdsourced polling would be the ones featured in the app. Since the voices were randomly generated, no one could say they were an imitation of someone. It would also be no fault of OpenAI's that people preferred some of them. This also seems a better approach than just allowing users to upload a sample of their desired voice, since it avoids misuse such as deepfaking.
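    The "provably random, then crowd-ranked" idea in the comment above could be sketched with a commit-reveal scheme: publish a hash of a secret seed first, derive every voice deterministically from that seed, and reveal the seed later so anyone can recompute the voices. This is only an illustrative sketch, not anything OpenAI is known to do. The parameter names (`pitch_hz`, `rate`, `breathiness`) and all function names are hypothetical stand-ins; a real system would more likely use the seed to sample a speaker embedding.

```python
import hashlib
import random
from collections import Counter


def commit(seed: bytes) -> str:
    """Publish this hash BEFORE generating voices; reveal the seed afterwards."""
    return hashlib.sha256(seed).hexdigest()


def voice_params(seed: bytes, index: int) -> dict:
    """Derive hypothetical voice parameters deterministically from (seed, index)."""
    digest = hashlib.sha256(seed + index.to_bytes(4, "big")).digest()
    rng = random.Random(int.from_bytes(digest, "big"))
    return {
        "pitch_hz": round(rng.uniform(80.0, 260.0), 1),
        "rate": round(rng.uniform(0.8, 1.2), 2),
        "breathiness": round(rng.uniform(0.0, 1.0), 2),
    }


def verify(revealed_seed: bytes, commitment: str, index: int, claimed: dict) -> bool:
    """Anyone can check that a voice really came from the committed seed."""
    return (commit(revealed_seed) == commitment
            and voice_params(revealed_seed, index) == claimed)


def top_voices(votes, k: int) -> list:
    """Rank voices by net score; votes are (voice_index, +1 or -1) pairs."""
    score = Counter()
    for voice_index, delta in votes:
        score[voice_index] += delta
    ranked = sorted(score.items(), key=lambda kv: (-kv[1], kv[0]))
    return [voice_index for voice_index, _ in ranked[:k]]


seed = b"example-secret-seed"          # hypothetical; would be kept secret until reveal
commitment = commit(seed)              # published up front
voices = [voice_params(seed, i) for i in range(3)]
featured = top_voices([(0, 1), (1, 1), (1, 1), (2, -1)], k=2)
```

    The commitment only shows that the voices were fixed by a seed chosen in advance; it does not by itself show that the seed was chosen fairly, which would need something like a public randomness source.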

  • @mshonle 20 days ago

    Very cool! I wonder if you could leverage available text models to do something like the model mashups or Franken-merges? For example, you could do a LoRA-like fine-tuning, but focused on all layers, with additional layers added at both ends (to translate from the audio encodings to the pretrained model’s hidden embeddings and then back to decodable audio again).
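    A rough sketch of what this comment describes, under stated assumptions: one frozen weight matrix stands in for a layer of a pretrained text model, a low-rank LoRA-style update is added beside it, and two new projection layers translate from audio encodings into the hidden size and back. Plain Python lists are used instead of a tensor library to keep it self-contained. All names and sizes (`W_in`, `W_out`, `lora_A`, `lora_B`, `AUDIO_DIM`, and so on) are hypothetical, and this is not the approach used in the video.

```python
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def add(a, b):
    """Element-wise sum of two same-shaped matrices."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def rand_matrix(rows, cols, rng, scale=0.1):
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


def zeros(rows, cols):
    return [[0.0] * cols for _ in range(rows)]


AUDIO_DIM, HIDDEN_DIM, RANK = 4, 8, 2      # hypothetical sizes
rng = random.Random(0)

# Frozen weight standing in for one layer of a pretrained text model.
W_frozen = rand_matrix(HIDDEN_DIM, HIDDEN_DIM, rng)

# Trainable pieces: adapters at both ends plus a low-rank update.
W_in = rand_matrix(AUDIO_DIM, HIDDEN_DIM, rng)     # audio encoding -> hidden
W_out = rand_matrix(HIDDEN_DIM, AUDIO_DIM, rng)    # hidden -> decodable audio
lora_A = rand_matrix(HIDDEN_DIM, RANK, rng)
lora_B = zeros(RANK, HIDDEN_DIM)                   # zero init: update starts as a no-op


def forward(audio_frames):
    """audio_frames: list of rows, each of length AUDIO_DIM."""
    h = matmul(audio_frames, W_in)
    base = matmul(h, W_frozen)                     # frozen pretrained path
    update = matmul(matmul(h, lora_A), lora_B)     # low-rank trainable path
    return matmul(add(base, update), W_out)


frames = rand_matrix(3, AUDIO_DIM, rng)
out = forward(frames)
```

    Because `lora_B` starts at zero, the model initially behaves exactly like the frozen layer wrapped in the two adapters; training would then adjust only `W_in`, `W_out`, `lora_A`, and `lora_B`.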

  • @christaylor-gz6mi 18 days ago +1

    Thank you! This is fantastic content!

  • @chickenp7038 20 days ago

    awesome video.

  • @maulikmadhavi 16 days ago +1

    thanks for sharing 😊

  • @user-en4ek6xt6w 20 days ago +1

    Mind blowing

  • @petersobolewski1354 19 days ago

    How about training it on animal sounds? Will it learn to speak with them?

  • @1msirius 20 days ago

    cool bro!

  • @satyamtiwari3839 20 days ago

    It is inspiring

  • @haileycollet4147 16 days ago

    HF datasets/jhu-clsp/seamless-align-expressive
    The English half of this is ~3500 hours I think.

  • @WebWizard977 20 days ago +1

    It feels like speech-to-text that converts audio to text and then sends it to a GPT server 🤔

    • @1littlecoder 20 days ago

      This is one single native audio model

    • @WebWizard977 20 days ago

      @1littlecoder Wow, that's amazing

    • @efexzium 20 days ago

      It's a Large Multimodal Model?

    • @WebWizard977 20 days ago +1

      @efexzium Yeah, sure, brother. I have my own LLM too, but thanks for the theory update

    • @efexzium 20 days ago

      @WebWizard977 Me too, but I just cherry-pick the best model from the internet.

  • @xXWillyxWonkaXx 16 days ago

    Forgive me for saying this, but why is this "a great project"? I fail to understand. So he built an audio-in, audio-out model similar to GPT-4o's audio feature? Is that it? We're getting an open-source model.

  • @TheRealUsername 20 days ago

    GPT-4o?

    • @1littlecoder 20 days ago

      He mentions that's his motivation

    • @TheRealUsername 20 days ago

      @1littlecoder I mean, it could be the same tech behind GPT-4o

    • @Srinivas_Billa 20 days ago +1

      This is a very naive way to do it, but yeah. There are probably other optimisations to make, but I'd rather have something to play with than not, I guess?

    • @TheRealUsername 20 days ago

      @Srinivas_Billa So that does mean the open-source community is capable of building something similar to GPT-4o

    • @Srinivas_Billa 20 days ago +2

      @TheRealUsername Of course! I plan to try video next and then ultimately combine them. The tools are there to do it. Compute is the issue.