Local and Open Source Speech to Speech Assistant

  • Published: 25 Dec 2024
  • Science

Comments

  • @RickySupriyadi
    @RickySupriyadi 3 months ago +4

    Wow, that local TTS sounds so natural, really cool.

  • @emmanuelkolawole6720
    @emmanuelkolawole6720 3 months ago +4

    To improve the response time, you need to stream the LLM response and set MeloTTS to start reading as soon as the first word is generated in the stream. This way MeloTTS is not waiting for the full LLM response before speaking.

    • @alx8439
      @alx8439 3 months ago +1

      Or just take rhasspy3 as a ready-to-use modular engine that supports streaming for everything, and plug an LLM into it.
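
The streaming suggestion in this thread can be sketched roughly as follows. This is a minimal illustration, not Verbi's or MeloTTS's actual code: it buffers streamed tokens and hands each complete sentence to the speech engine as soon as it is available, instead of waiting for the whole reply. It chunks by sentence rather than by single word, on the assumption that a TTS engine sounds more natural with whole phrases. `fake_llm_stream` and `speak` are hypothetical placeholders for a real streaming LLM client and a real TTS call.

```python
import re
from typing import Iterable, Iterator

# Sentence boundary: '.', '!' or '?' followed by whitespace.
_SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def sentences_from_stream(tokens: Iterable[str]) -> Iterator[str]:
    """Yield complete sentences as soon as they appear in a token stream."""
    buffer = ""
    for token in tokens:
        buffer += token
        parts = _SENTENCE_END.split(buffer)
        # Every part except the last one is a finished sentence.
        for sentence in parts[:-1]:
            if sentence.strip():
                yield sentence.strip()
        buffer = parts[-1]
    if buffer.strip():
        yield buffer.strip()  # flush the remainder when the stream ends


def fake_llm_stream() -> Iterator[str]:
    """Hypothetical stand-in for a streaming LLM client."""
    yield from ["The Moon ", "is close. ", "The Sun ", "is far", "! Bye"]


def speak(text: str) -> None:
    """Hypothetical stand-in for a TTS call; here it only prints."""
    print(f"[TTS] {text}")


if __name__ == "__main__":
    for sentence in sentences_from_stream(fake_llm_stream()):
        speak(sentence)
```

With this shape, the first sentence can be spoken while the model is still generating the rest, so perceived latency depends on the length of the first sentence rather than the full response.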

  • @MeinDeutschkurs
    @MeinDeutschkurs 3 months ago +10

    I'm so glad that I'm not the only one who forgets to activate the environment before installing anything. 🎉🎉 It is so important not to cut something like this out! Feel hugged! 🤗

    • @AB-cd5gd
      @AB-cd5gd 12 days ago

      Why do we need to activate a venv, please?
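
On the venv question: `pip install` puts packages into whichever interpreter is currently active, so without activation they land in the system or base installation instead of the project's environment. A small sketch for checking which environment a script is actually running in is shown below. The helper name `active_environment` is made up for this example; the conda check relies on the `CONDA_DEFAULT_ENV` variable that `conda activate` sets.

```python
import os
import sys


def active_environment() -> str:
    """Describe which Python environment the current interpreter runs in."""
    # Inside a venv, sys.prefix points at the environment while
    # sys.base_prefix still points at the base installation.
    if sys.prefix != sys.base_prefix:
        return f"venv at {sys.prefix}"
    # conda environments do not change base_prefix, so check the
    # variable that `conda activate` exports instead.
    conda_env = os.environ.get("CONDA_DEFAULT_ENV")
    if conda_env:
        return f"conda env '{conda_env}'"
    return "no environment active (system or base interpreter)"


if __name__ == "__main__":
    print("interpreter:", sys.executable)
    print("environment:", active_environment())
```

Running this before installing dependencies shows at a glance whether the packages would go into the intended environment.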

  • @Cingku
    @Cingku 3 months ago +2

    I suppose I can't interrupt its speech, right? Since it is not truly multimodal?

    • @AB-cd5gd
      @AB-cd5gd 12 days ago

      You could do that

  • @johnkintree763
    @johnkintree763 3 months ago +1

    Have you looked into Open WebUI as an interface?

  • @MeinDeutschkurs
    @MeinDeutschkurs 3 months ago

    I like VERBI. What about including the "normal" Google TTS as an option as well? Currently it's way faster, with the downside that it does not sound as nice. Maybe Apple "say" could also be an option, especially the Siri voice, if no voice is specified.

  • @themax2go
    @themax2go 3 months ago

    VERY cool! What do you use for embeddings, Triplex? Have you tried, or will you add, global context search capability?

  • @abbarue
    @abbarue 2 months ago +1

    On Windows 11 I used Docker to install MeloTTS, and after waiting about an hour I got the following errors. Can you help me?
    15.92 AttributeError: module 'botocore.exceptions' has no attribute 'HTTPClientError'
    ERROR: failed to solve: process "/bin/sh -c python melo/init_downloads.py" did not complete successfully: exit code: 1

    • @JohnsonNong
      @JohnsonNong 2 months ago

      Same here. Did you manage to resolve this issue?

  • @EmminiX
    @EmminiX 3 months ago

    Interesting. Definitely going to keep an eye out.

  • @preben01
    @preben01 3 months ago

    How hard would it be to add RAG support to this?

  • @emmanuelkolawole6720
    @emmanuelkolawole6720 2 months ago

    When is this going to get streaming to voice capabilities???

  • @zeusconquers
    @zeusconquers 3 months ago

    What two chips make 96 GB of VRAM? 2x A6000?

  • @themax2go
    @themax2go 3 months ago

    P.S. "uv" is much lighter and overall better than conda: it activates the env automatically, and you can use it to manage the environment, including all the Python packages used.

  • @aa-xn5hc
    @aa-xn5hc 3 months ago +2

    Streaming and interruptions would be great before the UI

  • @berniemovlab8323
    @berniemovlab8323 3 months ago

    It works well. I get fast responses up to the point where the text is sent to the sound generator, but then it takes a long time for her to finally speak. How can we tell whether the GPU is being used? I'm on Windows. Thanks.

  • @kryptonic010
    @kryptonic010 3 months ago

    Great example. How can you integrate a RAG engine to accept Word and Excel files?

  • @yurijmikhassiak7342
    @yurijmikhassiak7342 3 months ago

    Great project! What about also having a speech-to-text option? Whisper is very good at dictation; the only problem is that it's not real-time. As far as I know, nobody has done real-time speech to text based on Whisper. We would somehow need to track the pauses between sentences by sound wave analysis, cut the recording there, start a new recording, and transcribe the pieces one by one...
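
The pause-tracking idea in this comment can be sketched as a simple energy-based splitter. This is an illustration under stated assumptions, not how Verbi or Whisper works: samples are floats in [-1, 1] at an assumed 16 kHz, and the threshold and pause length are placeholder values that would need tuning per microphone. Each returned segment could then be sent to a transcriber on its own; that call is left out here.

```python
import math
from typing import List, Optional, Sequence, Tuple


def rms(frame: Sequence[float]) -> float:
    """Root-mean-square energy of one frame of samples."""
    return math.sqrt(sum(s * s for s in frame) / len(frame)) if frame else 0.0


def split_on_pauses(
    samples: Sequence[float],
    frame_size: int = 160,            # 10 ms at 16 kHz (assumed sample rate)
    silence_threshold: float = 0.01,  # assumed value; tune per microphone
    min_silence_frames: int = 30,     # 300 ms of quiet ends an utterance
) -> List[Tuple[int, int]]:
    """Return (start, end) sample indices of speech segments."""
    segments: List[Tuple[int, int]] = []
    start: Optional[int] = None  # where the current utterance began
    last_voiced_end = 0          # end of the most recent voiced frame
    quiet = 0                    # consecutive silent frames seen so far
    for pos in range(0, len(samples), frame_size):
        frame = samples[pos:pos + frame_size]
        if rms(frame) >= silence_threshold:
            if start is None:
                start = pos
            last_voiced_end = pos + len(frame)
            quiet = 0
        elif start is not None:
            quiet += 1
            if quiet >= min_silence_frames:
                # Pause was long enough: close the utterance.
                segments.append((start, last_voiced_end))
                start = None
                quiet = 0
    if start is not None:
        segments.append((start, last_voiced_end))
    return segments


if __name__ == "__main__":
    # Synthetic signal: 0.5 s of sound, 0.5 s of silence, 0.5 s of sound.
    signal = [0.5] * 8000 + [0.0] * 8000 + [0.5] * 8000
    for begin, end in split_on_pauses(signal):
        print(f"utterance from sample {begin} to {end}")
```

A fixed energy threshold is the crudest form of voice activity detection; it struggles with background noise, which is why a dedicated VAD model is often used for the same job.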

  • @LEGACY1417
    @LEGACY1417 3 months ago

    I'm new. It's crashing every time. Can you guide me on how to use it with only a local LLM?

  • @lakergreat1
    @lakergreat1 3 months ago

    Did verbi get a UI yet?

  • @arsenordian76
    @arsenordian76 2 months ago

    Can I use another language? How can I change it?

  • @alx8439
    @alx8439 3 months ago

    But the progress is good. Kudos!

  • @VaibhavShewale
    @VaibhavShewale 3 months ago +1

    Minimum requirements?

  • @Wacken2030
    @Wacken2030 3 months ago

    Awesome project! But the distance between the Sun and the Moon seems to be off by a factor of around 400. Just thought I'd point that out. Keep up the great work!

  • @naeemulhoque1777
    @naeemulhoque1777 3 months ago

    Please make more videos on local TTS.

  • @donbelisario8811
    @donbelisario8811 2 months ago

    Is this suited for everyone, or does the installation require heavy programming skills?

  • @youtubeccia9276
    @youtubeccia9276 2 months ago

    nice, looks good :)

  • @TomanswerAi
    @TomanswerAi 1 month ago

    Nice! ❤‍🔥

  • @saxtant
    @saxtant 3 months ago

    I am using XTTS v2 and Whisper large-v3 combined with Llama 3.1 8B via vLLM.

  • @brto
    @brto 3 months ago

    The TTS sounds a bit robotic, but I like it ❤ Appreciate the hard work. Keep it 100 💪

  • @alx8439
    @alx8439 3 months ago

    It needs wake word detection.

  • @DannyC777
    @DannyC777 2 months ago

    This is cool. There's also a speech-to-speech Windows app that enables your LM Studio models to search the internet for up-to-date information. It recognizes speech in 90 languages and has over 1400 voices in 90 languages to choose from: ruclips.net/video/l1uYTuZoB6Q/видео.html

  • @joepropertykey3612
    @joepropertykey3612 3 months ago

    Why not just say in the title: 'If you use Windows, don't bother. We will direct you to install MeloTTS, which specifically will not run on Windows. At the end. After you install a bunch of other crap. And waste your time'?
    Seriously. You 'skipped' clicking on the link in the video for MeloTTS because that's the first thing you see: no Windows. You actually SAY 'for Windows it's a little more involved' and juuuuuust skip past that. This is now CLICKBAIT (hint: not everyone wants WSL2, it's not that great in production).