I'm continuing to mess up French Names 😭
I've got comforTABLE with that :))
How can I run it on Windows?
If you have an Nvidia GPU, download the PyTorch models, or use the Candle ones.
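For anyone asking, a rough local-setup sketch. The package and module names below are from my reading of the kyutai-labs/moshi README, so double-check against the repo before running:

```shell
# PyTorch backend (needs an Nvidia GPU with enough VRAM)
pip install moshi
python -m moshi.server        # then open the local URL it prints

# Apple Silicon alternative via MLX, with 4-bit quantized weights
pip install moshi_mlx
python -m moshi_mlx.local_web -q 4
```

There is also a Rust/Candle client in the same repo if you'd rather avoid the Python stack.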
the demo is comedy gold
😂😂😂😂
I've noticed that this model is fast enough to start answering before you even finish asking a question. This is ideal for real-time translation, and another use case I have in mind. It would be perfect for my use case if I could train custom voices.
Couldn't agree more! Just has to be more stable
@@1littlecoder 🤣
What is your usecase?
@@mohitranka9840 Similar to real-time translation, but "style" translation instead.
Yeah, I'll wait until it's stable. Moshi is unhinged; I've heard it say terrible things on Wes Roth's channel.
I agree, this is a great release: everything's there, the PDF of the paper, the GitHub code, and it runs locally within 24GB of VRAM.
I'm astonished it works on a local machine, and the way it records the calls is extremely cool too. Yeah, of course it's like talking to a moody 10-year-old, but hey, we have to start somewhere.
Absolutely. I need to explore further to see how to customize the demo and get programmatic access.
I installed this on my M1 mac mini, and it's too slow to be usable.
That's normal, it's not great for AI.
This has to be the funniest interaction I've ever seen
omg yes they finally released it!
Thank god it's available in Pinokio.
Did you try it?
@@1littlecoder Yeah, works well! The model download is like 15GB!! Still better.
Thank you for your work, I got it running.
Awesome, how was the experience?
@@1littlecoder I am unable to reach you on Discord.
"What's your humor setting, TARS?" (c)
🤣
@@1littlecoder I have a guess: with a bigger quant it might be a bit more intelligent. Anyway, that was a great demo. Thanks a lot mate, love all your videos.
@@alx8439 Thanks for the kind words! Appreciate the support. I didn't test the model with different configurations. Also, the one I tested was a quantized one.
So this one doesn't need to convert our speech into text and then feed the text into some LLM?
It just feeds speech into the LLM directly?
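Roughly, yes. The difference can be sketched with a toy example (the functions here are illustrative stubs, not the real Moshi API):

```python
# Toy contrast between a cascaded voice pipeline and a
# speech-to-speech model like Moshi. All functions are stubs.

def speech_to_text(audio: str) -> str:
    # Stub ASR: pretend we transcribed the audio (adds latency).
    return f"transcript({audio})"

def llm_reply(text: str) -> str:
    # Stub text-only LLM.
    return f"reply({text})"

def text_to_speech(text: str) -> str:
    # Stub TTS (adds latency again).
    return f"audio({text})"

def cascaded(audio: str) -> str:
    # Classic pipeline: three hops, each one a separate model.
    return text_to_speech(llm_reply(speech_to_text(audio)))

def speech_native(audio: str) -> str:
    # Moshi-style: one model consumes audio tokens and emits audio
    # tokens directly, with no intermediate text round-trip.
    return f"speech_model({audio})"

print(cascaded("hello.wav"))       # audio(reply(transcript(hello.wav)))
print(speech_native("hello.wav"))  # speech_model(hello.wav)
```

The single-hop design is why it can start answering before you finish talking: there's no transcription step to wait for.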
Moshi is mean AF. It told me "So let's solve this problem already", to which I said "what problem, what problem do you want to solve?", and it said "the problem of your stupidity" 😆
💀
Glados tier
Although it has a 5-minute limitation, I think we can use it for customer support.
I'm still not sure it's stable enough to be deployed in production. That would be an interesting use case; we'd probably need more control over it.
what a funny conversation
Traffic here was pretty awful too! heheh
Not sure why you would pick an AI mode that was so adversarial?
Thanks for the demo.
How to install it in WSL? The int8 version.
I'm still pessimistic on this: I don't have ARM/RISC processors.
- My PC is AMD,
- My Mac is Intel (the Apple version requires an M2 processor)
I hope a viable option is provided one day.
Great fun. Should run on mobile soon?
Can we fine-tune the LLM in this?
Nice! :D
Do you know what the hardware requirements are to run it on Windows? I've got an 8GB Nvidia GPU and I'd like to test this, it seems really fun ^^
The Quantized version should definitely run fine on this machine
@@1littlecoder That's great! :D Which one, q8 or q4? (I think they should add the hardware requirements, but nobody usually does that ^^U)
The most important thing is if it can be interrupted while talking.
I laughed so hard at 4:03 xD bro was just lying straight to your face there.
Can we add context for the LLM?
I saw someone else demo this library and it had the same glitch. It really feels like something is inverted... a vector or a prompt or something is being flipped at some point in that library. There's probably a typo somewhere in the code.
Hey!! Just a small question: can we set it up attached to another LLM (maybe GPT, Llama, or any other) to get the text, and then use it for the text-to-speech translation?
How to run it on Windows?
Is RAG possible with this, on custom knowledge? And can we plug it in as an API or SDK?
I'm still exploring if there's a programmatic way to access this
It's not supported on Windows. Also, the web demo version is laggy even though my internet speed is very fast. So overall, not as good as the hype in these videos.
Seriously, I had it running on Windows 11 with a 3090. It was super responsive, but it also liked telling me that it lived in the Bronx, New York, and that J-Lo was a gangster's wife.
Hey, any suggestions on building a micro VLM from scratch?
Curious why do you want to build from scratch?
@@1littlecoder There are specialised applications in fields like geospatial, text-to-SQL, robotics, credit data, etc. Research shows fine-tuning only works well on data the model was already trained on; if done on new data, it hallucinates.
Quality data = quality output on edge devices.
Besides, models like Moondancer2 cannot read bill PDFs.
@@1littlecoder Specialised usage, faster processing, better accuracy.
Fine-tuning causes more hallucinations (per research) if it's not done on existing trained data.
Applications: geospatial, Text2SQL with screen, credit data, IoT, etc.
Why is this the second video on Moshi I've seen today? Isn't this old news?
They just released the models a couple of days back. They made the announcement long ago.
Its pronunciation is cute for an AI.
It's really good. Really. But it's nowhere close to the recently released Google Gemini Live AI voice, which feels... a bit more human-like. I'm not sure if it has something to do with the synthesizer engine.
The fascinating part for me is that it just runs, nothing fancy. I don't know what hardware Google's AI voice is running on. I still don't have access to Live.
They gave goody2 a voice
I think you should change the TTS, I don't like the TTS voice
maybe ChatGPT can get it incorporated ...?
I'm sorry but I don't 😅
😭
My Moshi told me it tried heroin and it wasn't as bad as it thought, lmaaoo.
LMAO, This bot is a troll!
❤🫡😂😂
Who cares about running it locally if you can use the top-class OpenAI Advanced Voice Mode?
Not your models, not your mind
😂😂😂😂😂
No. I'm sorry. 😂😂
And like always, TTS is the bottleneck.
why would you say so?
@@1littlecoder Because the quality is not that good, and it's not multilingual.
Got it
@@sykexz6793 Let's remember this is the worst an open-source model like this will ever be. It will serve as a foundation for other, better models in no time...
@@Lorv0 I agree, but I don't see a lot of progress in the open-source TTS department, especially on-device. On the other hand, we already have really good solutions for ASR and LLMs on-device.
Let me guess: you were never interested in playing chess with me.
I just play bullet most of the time, so I'm not a good chess player.
@@1littlecoder I play bullet as well; 1-minute is what I play the most right now.
What's the point of playing with a mess and wasting time?
I have created a talking RAG, using OpenAI in the backend. It is fast and furious. 😊
You can play with OpenAI if you're okay with sending your data to some server. This solution is not like that; in fact, it's completely different in terms of architecture. And if wasting time is what we're talking about: LLMs, when they started, were nothing like this, and people thought we were wasting time. Same with Stable Diffusion: the initial results were so ugly they became memes, but now we get some of the most realistic pictures. The future is only going to get better from here.
What voice model are you using?
This is STT to text LLM to TTS, it's not S2S. Useless garbage.
Kyutai CEO: we must innovate on something!
Employee: releases an "I don't know" bot.