The ONLY Real Time Speech AI that can run locally!!!

  • Published: 18 Nov 2024

Comments • 92

  • @1littlecoder
    @1littlecoder 2 months ago +5

    I'm continuing to mess up French Names 😭

    • @adriangpuiu
      @adriangpuiu 2 months ago

      ive got comforTABLE with that :))

    • @Nid_All
      @Nid_All 2 months ago

      how can i run it on windows ?

    • @1littlecoder
      @1littlecoder 2 months ago

      If you have an NVIDIA GPU, download the PyTorch models, or use the Candle ones.

  • @JankJank-om1op
    @JankJank-om1op 2 months ago +23

    the demo is comedy gold

  • @saintkamus14
    @saintkamus14 2 months ago +2

    I've noticed that this model is fast enough to start answering before you even finish asking a question. This is ideal for real time translations, and another use case I have in mind. This would be perfect for my use case if I could train custom voices.

    • @1littlecoder
      @1littlecoder 2 months ago

      Couldn't agree more! Just has to be more stable

    • @doglibrary
      @doglibrary 2 months ago

      @1littlecoder 🤣

    • @mohitranka9840
      @mohitranka9840 2 months ago +1

      What is your use case?

    • @saintkamus14
      @saintkamus14 2 months ago

      @mohitranka9840 Similar to real-time translations, but "style" translations instead.

    • @RadiantNij
      @RadiantNij 2 months ago +1

      Yeah, I'll wait until it's stable. Moshi is unhinged; I've heard it say terrible things on Wes Roth's channel.

  • @OliNorwell
    @OliNorwell 2 months ago +9

    I agree, this is a great release. Everything is there: PDF of the paper, GitHub code, and it runs locally inside 24GB VRAM.
    I'm astonished it works on a local machine, and the way it records the calls too is extremely cool. Yeah of course it is like talking to a moody 10 year old, but hey, we have to start somewhere.

    • @1littlecoder
      @1littlecoder 2 months ago +2

      Absolutely, I need to explore further to see how to customize the demo and get programmatic access

  • @dr.mikeybee
    @dr.mikeybee 2 months ago +7

    I installed this on my M1 mac mini, and it's too slow to be usable.

    • @Ratonomist
      @Ratonomist 1 month ago

      normal it's shit for AI

  • @BabylonBaller
    @BabylonBaller 2 months ago +1

    This has to be the funniest interaction I've ever seen

  • @techfren
    @techfren 2 months ago +2

    omg yes they finally released it!

  • @ParvathyKapoor
    @ParvathyKapoor 1 month ago

    thank god it's available in Pinokio

    • @1littlecoder
      @1littlecoder 1 month ago

      Did you try it?

    • @ParvathyKapoor
      @ParvathyKapoor 1 month ago +1

      @1littlecoder Ya, works good! The model download is like 15GB!! Still better

  • @burncloud-com
    @burncloud-com 1 month ago +1

    Thank you for your work, I got it running.

    • @1littlecoder
      @1littlecoder 1 month ago

      Awesome, how was the experience?

    • @burncloud-com
      @burncloud-com 1 month ago

      @1littlecoder I am unable to reach you on Discord.

  • @alx8439
    @alx8439 1 month ago +1

    "What's your humor setting, TARS?" (c)

    • @1littlecoder
      @1littlecoder 1 month ago

      🤣

    • @alx8439
      @alx8439 1 month ago +1

      @1littlecoder My guess is that with a bigger quant it might be a bit more intelligent. Anyway, that was a great demo. Thanks a lot mate, love all your videos

    • @1littlecoder
      @1littlecoder 1 month ago

      @alx8439 Thanks for the kind words! Appreciate the support. I didn't test the model with different configurations. Also, the one I tested was a quantized one.

  • @RickySupriyadi
    @RickySupriyadi 2 months ago +3

    So this one doesn't need to convert our speech into text and then feed the text into some LLM?
    It just feeds speech into the LLM directly?

  • @saintkamus14
    @saintkamus14 2 months ago +6

    Moshi is mean AF. it told me "So let's solve this problem already" to which I said: "what problem, what problem do you want to solve?" and it said: "the problem of your stupidity" 😆

  • @captainoddessy
    @captainoddessy 2 months ago +1

    Although it has a 5-minute limitation, I think we can use it for customer support

    • @1littlecoder
      @1littlecoder 2 months ago +1

      I'm still not very sure if it is stable enough to be deployed in production. That'll be an interesting use case. We probably need more control over it

  • @oldfairy
    @oldfairy 2 months ago +4

    what a funny conversation

  • @marcfruchtman9473
    @marcfruchtman9473 2 months ago +1

    Traffic here was pretty awful too! heheh
    Not sure why you would pick an AI mode that was so adversarial?
    Thanks for the demo.

  • @aa-xn5hc
    @aa-xn5hc 1 month ago +1

    How to install the int8 version in WSL?

  • @rubbercable
    @rubbercable 1 month ago

    I'm still pessimistic on this: I don't have ARM/RISC processors.
    - My PC is AMD,
    - My Mac is Intel (the Apple version requires an M2 processor)
    I hope a viable option is provided one day.

  • @KevinKreger
    @KevinKreger 1 month ago +1

    Great fun. Should run on mobile soon?

  • @kundanmitra34
    @kundanmitra34 1 month ago

    Can we fine-tune this LLM?

  • @juanjesusligero391
    @juanjesusligero391 2 months ago +2

    Nice! :D
    Do you know what are the hardware requirements to run it on Windows? I've got an 8GB Nvidia GPU, and I'd like to test this, it seems really fun ^^

    • @1littlecoder
      @1littlecoder 2 months ago +1

      The quantized version should definitely run fine on this machine

    • @juanjesusligero391
      @juanjesusligero391 2 months ago

      @1littlecoder That's great! :D Which one? q8? q4? (I think they should add the hardware requirements, but nobody usually does that ^^U)

  • @gidmanone
    @gidmanone 2 months ago +1

    The most important thing is if it can be interrupted while talking.

  • @juliana.2120
    @juliana.2120 1 month ago +1

    i laughed so hard at 4:03 xD bro was just lying straight to your face before

  • @emmanuelkolawole6720
    @emmanuelkolawole6720 1 month ago

    Can we add context for the LLM?

  • @tvwithtiffani
    @tvwithtiffani 1 month ago

    I've seen someone else demo this library and it had the same glitch. It really feels like something is inverted... a vector or a prompt or something is being flipped at some point in that library. There's probably a typo somewhere in the code.

  • @lakshyakumarpandey382
    @lakshyakumarpandey382 2 months ago

    Heyy!! Just a small question: can we set it up so it is attached to another LLM (maybe GPT, Llama, or any other) to get the text, and then use it for the text-to-speech translation??

  • @VigneshK-o3l
    @VigneshK-o3l 1 month ago +1

    How to run it on Windows?

  • @shruthirao7352
    @shruthirao7352 2 months ago

    Is RAG possible with this, on custom knowledge, and can it be plugged in as an API or SDK?

    • @1littlecoder
      @1littlecoder 2 months ago

      I'm still exploring if there's a programmatic way to access this

  • @InAMinute-ws3yv
    @InAMinute-ws3yv 2 months ago +2

    It's not supported on Windows. Also, the web demo version lags even though my internet speed is very fast. So overall it doesn't live up to the hype in these videos.

    • @darksushi9000
      @darksushi9000 1 month ago

      Seriously? I had it running on Windows 11 with a 3090. It was super responsive, but it also liked telling me that it lived in the Bronx, New York, and that J-Lo was a gangster's wife.

  • @bakuleshsuhasrane8734
    @bakuleshsuhasrane8734 1 month ago

    Hey, any suggestions on building a micro VLM from scratch?

    • @1littlecoder
      @1littlecoder 1 month ago

      Curious why do you want to build from scratch?

    • @bakuleshsuhasrane8734
      @bakuleshsuhasrane8734 1 month ago

      @1littlecoder There are specialised applications (geospatial, text-to-SQL, robotics, credit data, etc.), and according to research, fine-tuning only works well on data the model was already trained on; if it's done on new data, the model hallucinates.
      Quality Data = Quality Output on Edge Devices

    • @bakuleshsuhasrane8734
      @bakuleshsuhasrane8734 1 month ago

      Besides, models like Moondancer2 cannot read bill PDFs

    • @bakuleshsuhasrane8734
      @bakuleshsuhasrane8734 1 month ago

      @1littlecoder For specialised usage, faster processing, and accuracy.
      According to research, fine-tuning causes more hallucinations if it's not done on existing trained data.
      Applications: geospatial, Text2SQL with screen, credit data, IoT, etc.

  • @jason_v12345
    @jason_v12345 2 months ago

    Why is this the second video on Moshi I've seen today? Isn't this old news?

    • @1littlecoder
      @1littlecoder 2 months ago +1

      They just released the models a couple of days back. They had made the announcement long back

  • @rahim_khan_iitg
    @rahim_khan_iitg 1 month ago

    It is a cute AI in its pronunciation

  • @xXWillyxWonkaXx
    @xXWillyxWonkaXx 2 months ago

    It's really good. Really. But it's nowhere close to the recently released Google Gemini Live's AI voice. It feels... a bit human-like. I'm not sure if it has something to do with the synthesizer engine

    • @1littlecoder
      @1littlecoder 2 months ago +1

      The fascinating part for me is it just runs, nothing fancy. I don't know what hardware Google's AI voice is running on. I still don't have access to Live.

  • @shApYT
    @shApYT 1 month ago

    They gave goody2 a voice

  • @__________________________6910
    @__________________________6910 1 month ago

    I think you should change the TTS, I don't like the TTS voice

  • @mvasa2582
    @mvasa2582 2 months ago

    maybe ChatGPT can get it incorporated ...?

  • @surflaweb
    @surflaweb 2 months ago +3

    I'm sorry but I don't 😅

  • @juliana.2120
    @juliana.2120 1 month ago

    my moshi told me it tried heroin and it wasn't as bad as it thought lmaaoo

  • @MGTOWUNIVERSE
    @MGTOWUNIVERSE 2 months ago

    LMAO, This bot is a troll!

  • @MichealScott24
    @MichealScott24 1 month ago

    ❤🫡😂😂

  • @Macorelppa
    @Macorelppa 2 months ago

    Who cares about running it locally if you can use top class OpenAI advanced voice mode

  • @Sujal-ow7cj
    @Sujal-ow7cj 2 months ago

    😂😂😂😂😂

  • @nauseouscustody1440
    @nauseouscustody1440 2 months ago

    No. I'm sorry. 😂😂

  • @sykexz6793
    @sykexz6793 2 months ago

    and like always, TTS is the bottleneck.

    • @1littlecoder
      @1littlecoder 2 months ago

      why would you say so?

    • @sykexz6793
      @sykexz6793 2 months ago

      @1littlecoder because the quality is not that good; also, it is not multilingual.

    • @1littlecoder
      @1littlecoder 2 months ago

      Got it

    • @Lorv0
      @Lorv0 2 months ago +5

      @sykexz6793 Let's remember this is the worst an open source model like this will ever be. This will serve as a foundation for other, better models in no time...

    • @sykexz6793
      @sykexz6793 2 months ago +1

      @Lorv0 I agree, but I don't see a lot of progress in the open source TTS department, especially on device. On the other hand, we already have really good solutions for ASR and LLMs on device.

  • @sonOfLiberty100
    @sonOfLiberty100 2 months ago

    Let me guess: you were never interested in playing chess with me.

    • @1littlecoder
      @1littlecoder 2 months ago

      I just play Bullet most of the time, so not a good chess player

    • @sonOfLiberty100
      @sonOfLiberty100 1 month ago

      @1littlecoder I play bullet as well; right now 1-minute is what I play the most

  • @shekharkumar1902
    @shekharkumar1902 2 months ago

    What's the point of playing with a mess and wasting time?
    I have created a talking RAG, using OpenAI in the backend. It is fast and furious. 😊

    • @1littlecoder
      @1littlecoder 2 months ago +2

      You can play with OpenAI if you're okay with sending your data to some server. This solution is not like that. In fact, it is completely different in terms of architecture. As for wasting time: LLMs, when they started, were nothing like this, and people thought we were wasting time. The same goes for Stable Diffusion. The initial images were so ugly that they became memes, but now we have some of the best realistic pictures. The future is only going to get better from now.

    • @xhridhar
      @xhridhar 2 months ago

      What voice model are you using?

  • @zyxwvutsrqponmlkh
    @zyxwvutsrqponmlkh 2 months ago

    This is STT to text LLM to TTS, it's not S2S. Useless garbage.

  • @RickySupriyadi
    @RickySupriyadi 2 months ago

    Kyutai CEO: we must innovate on something!
    Employee: releases an "I don't know" bot