Wow, that local TTS sounds so natural, really cool.
To improve the response time, you need to stream the LLM response and have MeloTTS start reading as soon as the first words arrive in the stream. That way MeloTTS isn't waiting for the full LLM response before it starts speaking.
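A minimal sketch of the sentence-level streaming this comment describes: buffer streamed LLM tokens and flush each complete sentence to the TTS callback as soon as it arrives. The function and callback names here are illustrative, not VERBI's actual API.

```python
import re

# Treat ., !, ? followed by whitespace as a sentence boundary.
SENTENCE_END = re.compile(r"[.!?]\s")

def stream_to_tts(token_stream, speak):
    """Hand complete sentences to the TTS callback `speak` as soon as
    they arrive, instead of waiting for the full LLM response."""
    buffer = ""
    for token in token_stream:
        buffer += token
        match = SENTENCE_END.search(buffer)
        while match:
            sentence, buffer = buffer[:match.end()], buffer[match.end():]
            speak(sentence.strip())
            match = SENTENCE_END.search(buffer)
    if buffer.strip():  # flush whatever is left after the stream ends
        speak(buffer.strip())
```

With this, the first sentence reaches the TTS engine while the LLM is still generating the rest of the reply.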
Or just use rhasspy3 as a ready-to-use modular engine; it supports streaming for everything, and you can plug an LLM into it.
I’m so glad that I’m not the only one who forgets to activate the environment before installing anything. 🎉🎉 It is so important not to cut something like this out! Feel hugged! 🤗
Why do we need to activate a venv pls
I suppose I can't interrupt its speech right? Since it is not truly multimodal?
You could do that
Have you looked into Open WebUI as an interface?
I like VERBI. What about including the “normal” Google TTS as an option as well? It’s currently way faster, with the drawback that it can’t talk as beautifully. Maybe Apple’s “say” could also be an option, especially the Siri voice, if no voice is specified.
VERY cool! What do you use for embeddings, Triplex? Have you tried / will you add a global context search capability?
On Windows11 I used Docker to install MeloTTS and after waiting about an hour, I got the following errors. Can you help me?
15.92 AttributeError: module 'botocore.exceptions' has no attribute 'HTTPClientError'
ERROR: failed to solve: process "/bin/sh -c python melo/init_downloads.py" did not complete successfully: exit code: 1
Same here. Did you manage to deal with this issue?
Interesting. Definitely going to keep an eye out.
How hard would it be to add RAG support to this?
When is this going to get streaming to voice capabilities???
What two chips make 96 GB of VRAM? 2x A6000?
PS: much lighter and overall better is "uv" instead of conda. uv activates the env automatically, and you can manage it along with all the Python modules you use.
Streaming and interruptions would be great before the UI
It works well. I get fast responses up to the part where it sends to the sound generator; then it takes a long time before she finally speaks. How can I tell the GPU is working? I'm using Windows. Thanks.
Great example. How can you integrate a RAG engine to accept Word and Excel files?
Great project! What about also having a speech-to-text option? Whisper is very good at dictation; the only problem is that it's not real-time. As far as I know, nobody has done real-time speech-to-text based on Whisper. We would need to track the pauses between sentences by sound-wave analysis somehow, to cut the recording, start a new recording, and transcribe them one by one...
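A rough sketch of the sound-wave analysis this comment describes: an energy-based silence detector that cuts a recording into chunks at sustained pauses, which you could then transcribe with Whisper one by one. The thresholds and frame sizes here are made-up values you would tune; real setups often use a proper VAD model instead.

```python
import numpy as np

def split_on_silence(samples, rate=16000, frame_ms=30,
                     threshold=0.01, min_silence_frames=10):
    """Split a mono float waveform into chunks at sustained pauses,
    using per-frame RMS energy as a crude voice-activity detector."""
    frame_len = rate * frame_ms // 1000
    chunks, current, silent = [], [], 0
    for start in range(0, len(samples), frame_len):
        frame = samples[start:start + frame_len]
        rms = float(np.sqrt(np.mean(frame ** 2)))
        current.append(frame)
        silent = silent + 1 if rms < threshold else 0
        # Enough consecutive quiet frames: emit the speech before them.
        if silent >= min_silence_frames and len(current) > silent:
            chunks.append(np.concatenate(current[:-silent]))
            current, silent = [], 0
    if current:
        leftover = np.concatenate(current)
        if float(np.sqrt(np.mean(leftover ** 2))) >= threshold:
            chunks.append(leftover)
    return chunks
```

Each returned chunk would go to Whisper as its own transcription call, so text appears sentence by sentence instead of after the whole recording ends.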
I'm new, and it's crashing every time. Can you guide me on how to use this with only a local LLM?
Did verbi get a UI yet?
Can I use another language? How can I change it?
But the progress is good. Kudos!
Minimum requirements?
Awesome project! But the distance between the Sun and the Moon seems to be off by a factor of around 400. Just thought I'd point that out. Keep up the great work!
Please make more videos on local TTS.
Is this suited for everyone, or does the installation require heavy programming skills?
nice, looks good :)
Nice! ❤🔥
I am using XTTS v2 and Whisper large-v3 combined with Llama 3.1 8B via vLLM.
The TTS sounds a bit robotic, but I like it ❤ Appreciate the hard work. Keep it 100 💪
It needs wake-word detection.
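A trivial text-side version of this idea: check the STT transcript for a wake phrase before passing anything to the LLM. The phrases below are hypothetical; a real setup would run a small always-on audio model (e.g. Porcupine or openWakeWord) on the microphone stream instead of matching transcribed text.

```python
# Hypothetical wake phrases for illustration only.
WAKE_WORDS = ("hey verbi", "ok verbi")

def strip_wake_word(transcript):
    """Return the command that follows a wake phrase, or None if no
    wake phrase starts the transcript (so stray speech is ignored)."""
    text = transcript.lower().strip()
    for phrase in WAKE_WORDS:
        if text.startswith(phrase):
            return text[len(phrase):].lstrip(" ,")
    return None
```

The assistant would only forward the returned command to the LLM and stay silent whenever this returns None.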
This is cool. There's a speech-to-speech Windows app for that which also enables your LM Studio models to search the internet for up-to-date information. It recognizes speech in 90 languages and has over 1400 voices in 90 languages to choose from: ruclips.net/video/l1uYTuZoB6Q/видео.html
Why not just say in the title: "if you use Windows, don't bother"? We will direct you to install MeloTTS, which specifically will not run on Windows. At the end. After you install a bunch of other crap. And waste your time.
Seriously. You 'skipped' clicking on the link in the video for MeloTTS because that's the first thing you see: no Windows. You actually SAY 'for Windows it's a little more involved' and juuuuuust skip past that. This is now CLICKBAIT (hint: not everyone wants WSL2, it's not that great in production).