Creating JARVIS - Python Voice Virtual Assistant (ChatGPT, ElevenLabs, Deepgram, Taipy)

  • Published: Nov 11, 2024

Comments • 177

  • @joeternasky
    @joeternasky 10 months ago +8

    Fantastic project. Love how you connected these services and packages together. Thanks for going over the project, posting this video, etc. I learned quite a bit.

  • @dwilson7230
    @dwilson7230 9 months ago +2

    Bro this is sick as hell! Thanks for posting a video about it.

  • @isagiyoichi5207
    @isagiyoichi5207 9 months ago +2

    this is actually really incredible thanks for the video

  • @iandanforth
    @iandanforth 11 months ago +2

    Impressive! One key bit of the UX of ChatGPT mobile are the "clicks" that indicate when the model has 1. Stopped listening and 2. Stopped talking. A very small touch that makes a world of difference.

    • @alexandresajus
      @alexandresajus  11 months ago

      Yes, I should definitely find better ways to convey to the user that they are being listened to

  • @gr8tbigtreehugger
    @gr8tbigtreehugger 7 months ago +2

    Many thanks for this super helpful tutorial! My next step is voice ID, so the AI knows it's me!

  • @rodrigodifederico
    @rodrigodifederico 9 months ago +1

    I did the same a few months ago, but I made it all work through a real phone number, so you can actually call a number and an assistant will pick up the call and talk to you about the shop services or clinic procedures, etc. Pretty nice lab.

    • @alexandresajus
      @alexandresajus  9 months ago +1

      That is a great use case. Were there any issues surrounding the latency? Were there any customer complaints from people who found the delay in answering too long or did not want to talk to an AI?

    • @rodrigodifederico
      @rodrigodifederico 9 months ago +2

      @@alexandresajus I reduced the delay by 90% by running all the systems locally: the speech-to-audio generator, audio transcription, the language model, etc. The only remote API I used was for the phone number (Twilio). If you run everything through remote APIs, the delay will be a real problem; it won't work as an assistant over the phone because it may take up to 10 seconds for an answer. But running everything locally, it's almost instant. For the voice part, both to text and back, I don't generate an audio file, I stream it, so there is no delay. With a few tricks, you can make it almost real time 🙂

    • @alexandresajus
      @alexandresajus  9 months ago

      @@rodrigodifederico Great! Is there anywhere I could take a look at that project? Which text-to-speech model are you using?

    • @rodrigodifederico
      @rodrigodifederico 9 months ago +1

      @@alexandresajus I am planning to turn it into a product, so for now I won't share the code, but I'll record a live interaction video and upload it to YouTube soon; I'll drop the link here if you are interested. About the text to speech: I created my own model, pretty similar to ElevenLabs. But I have to say that if you use ElevenLabs streaming, this part of the process will have a similar delay, so I might switch to ElevenLabs streaming in the future, unless I want to keep it 100% free of costs, in which case I would keep my model.

    • @alexandresajus
      @alexandresajus  9 months ago

      @@rodrigodifederico Sure, I'd love to see a demo

  • @chrsl3
    @chrsl3 10 months ago +1

    Fantastic work and video, thank you!!

  • @oldspammer
    @oldspammer 7 months ago +1

    Some operating-system APIs for text-to-speech are free and can respond instantly, without having to route information through the internet to some central system that might get bogged down with excess usage. I have noticed that if one becomes dependent upon something or someone, a monopoly situation may well result, and you end up potentially having to pay, pay, pay for things that your local PC could have done for free on its own without any network data interactions. Often the distant server has a better-sounding voice and mispronounces fewer words, but soon you are outsourcing too many things to outside entities and become too dependent on them.
    If a set of 10 or so words is known to be mispronounced by the local speech API on your PC, is there a way to have your PC handle those exception words with specialized processing, where a syllable at a time is custom-handled for each of the 10 exception words, to save you from having to use an API key that can be withdrawn from handy use at the flick of a switch by the third-party provider?

  • @taylorsmith1720
    @taylorsmith1720 7 months ago +2

    🎯 Key Takeaways for quick navigation:
    01:02 *🚀 Overview of Voice Virtual Assistant Development*
    - Explanation of building a voice virtual assistant similar to Jarvis from Iron Man.
    - Overview of the backend workflow involving voice input, transcription, response generation, and audio output.
    - Introduction to third-party services like Deepgram, OpenAI, ElevenLabs, and Taipy used in the development process.
    03:21 *🔧 Installation Instructions for the Voice Virtual Assistant*
    - Cloning the GitHub repository and installing necessary requirements.
    - Setting up API keys for Deepgram, OpenAI, and ElevenLabs.
    - Creating an environment file to store API keys securely.
    - Executing installation commands and waiting for requirements to install.
    08:33 *🛠️ Running the Voice Virtual Assistant*
    - Instructions for running the display interface (`display.py`) and the main script (`main.py`).
    - Description of how the assistant listens, transcribes, generates responses, and displays conversations.
    - Example interaction demonstrating the assistant's response to user input.
    09:28 *💡 Customization and Modification of the Voice Virtual Assistant*
    - Guidance on modifying the assistant for specific use cases.
    - Suggestions for changing context, models, and voices for customization.
    - Discussion of potential improvements, such as integrating news, adding memory, and overcoming latency limitations.
    Made with HARPA AI

    • @alexandresajus
      @alexandresajus  7 months ago +1

      Now THAT is how you should advertise a product. Great summary!

  • @Threecommaaclub
    @Threecommaaclub 9 months ago +1

    Hey Alex, I'm using a Linux device running a Python 3.11 venv. When I try to run main.py I get the following error: "no module named pyaudio". I go about using the simple command pip install pyaudio; however, when running that command I am greeted with this error: "could not build wheels for pyaudio, which is required to install pyproject.toml-based projects". I was hoping you may be able to share some insight into why this may be happening. Great video btw, I await your speedy response :)

    • @alexandresajus
      @alexandresajus  9 months ago +1

      Were you able to solve this by creating a new virtual environment? Otherwise, I have no idea how to fix this; let me know if you find a solution

    • @Threecommaaclub
      @Threecommaaclub 9 months ago +1

      @@alexandresajus yeah man, we were able to make it happen once we used the virtual env, thanks again

    • @alexandresajus
      @alexandresajus  9 months ago

      @@Threecommaaclub Perfect!

  • @painperdu6740
    @painperdu6740 10 months ago +1

    LETS GOOO NEW ALEXANDRE SAJUS VIDEO I CLICK LIKE I SUBSCRIBEEE

  • @edbayliss1862
    @edbayliss1862 10 months ago +1

    This really interested me. I modified it a bit to add a listen button to the UI so it only listens when you select Listen; this is easier than a "wake word".
    Then I thought: integration. I use macOS.
    I built a folder called modules, added a second step that parses the text through GPT again to match a dictionary, and then GPT decides which function in the dictionary matched and runs it.
    It worked great for checking calendar events etc., and if no matches were found it defaulted to the GPT chat response, but the extra layer added more latency and just isn't scalable

    • @alexandresajus
      @alexandresajus  10 months ago

      Incredible! Good work! Is there anywhere we could check out your project?

    • @edbayliss1862
      @edbayliss1862 10 months ago +1

      @@alexandresajus sure, is your GitHub open to branches? I can just push it as a branch for you to check out on Monday

    • @alexandresajus
      @alexandresajus  10 months ago

      @@edbayliss1862 I'm not sure, I think it is open to fork then pull request. I think I need to manually add you as a collaborator if you want to directly push to a branch. Your call. Or you could just share the link of your repo if it is public.

  • @DalazG
    @DalazG 6 months ago +1

    Incredible material! Thanks bro, your tutorials are super helpful for those learning to code. I'm trying to follow along.
    Not sure if you've taken any subscriber requests. I've really wanted to find a tutorial on creating a machine learning model in Python that can figure out its own strategy for successfully trading forex and integrating it with MQL4 or 5.
    Definitely possible, but there's next to no tutorials on this anywhere I noticed

    • @alexandresajus
      @alexandresajus  6 months ago

      Thanks! Glad to know the video is helpful. This indeed seems to be a niche topic. I don’t think I could help you with this unfortunately since I don’t know anything about forex or mql.

    • @DalazG
      @DalazG 6 months ago +1

      @alexandresajus no worries, this tutorial was super useful anyway! Subscribed.
      Curious, would the APIs you used for this JARVIS application cost a lot of money though? I know the ChatGPT API isn't free (just the free credits)

    • @alexandresajus
      @alexandresajus  6 months ago +1

      @@DalazG The APIs did not cost that much: for the whole project I talked for about 2 hours with JARVIS. It cost less than a dollar for both Deepgram and OpenAI. ElevenLabs cost me $5 only because they have a subscription-based fee.

    • @DalazG
      @DalazG 6 months ago

      @@alexandresajus gotcha, ElevenLabs has a brilliant voice API. But just because it adds up, I would probably prefer to use a cheaper, worse one 😅 .

  • @xgodwhitex
    @xgodwhitex 10 months ago +1

    Amazing job!

  • @mikew2883
    @mikew2883 10 months ago +1

    Good stuff! 👍

  • @marouane9682
    @marouane9682 10 months ago +1

    i love it maaaaaaaan, thank u for sharing.. pls keep sharing ur magic with us

    • @alexandresajus
      @alexandresajus  10 months ago +1

      Thank you!

    • @marouane9682
      @marouane9682 10 months ago +1

      @@alexandresajus brother, help me pls with my question.. how can I make JARVIS able to transcribe and talk in French instead of English?

    • @alexandresajus
      @alexandresajus  10 months ago +1

      @@marouane9682 This should not be too hard; you just need to add a few parameters for Deepgram and ElevenLabs. For ElevenLabs, just change the voice parameter to "Pierre" or another French voice at line 116 of main.py. For Deepgram it is a bit more complicated: you will have to add a PrerecordedOptions parameter at line 72 of main.py which contains a language="fr" parameter. It's a bit too much to write in a comment, so I invite you to take a look at the Deepgram docs (github.com/deepgram/deepgram-python-sdk/blob/main/README.md). Let me know if you need more help
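
      For reference, a minimal sketch of those two changes (the exact signatures are assumptions from that era's deepgram-sdk and elevenlabs packages; verify against your installed versions, and assume API keys are already set):

          import elevenlabs
          from deepgram import PrerecordedOptions

          # Deepgram: request French transcription (applies around line 72 of main.py)
          options = PrerecordedOptions(model="nova-2", language="fr")

          # ElevenLabs: swap the voice for a French one (around line 116 of main.py)
          audio = elevenlabs.generate(text="Bonjour, je suis JARVIS", voice="Pierre")
          elevenlabs.play(audio)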

    • @marouane9682
      @marouane9682 10 months ago +1

      @@alexandresajus thank you so much chief

  • @FantasyDark-ub3xh
    @FantasyDark-ub3xh 8 months ago +1

    Sir, I want to do something like this. Is there any free API available? If not OpenAI, please tell me some other AI APIs to do these AI tasks, sir!

    • @alexandresajus
      @alexandresajus  8 months ago

      Sir! If you search for them online, there should be free alternatives for the models I used in the video! I recommend looking at HuggingFace for an OpenAI alternative, sir! For example, the Mistral model has a free inference API that is only rate-limited, sir!
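
      As an illustration, a hedged sketch using the Hugging Face Inference API (the model id and hf_... token are placeholders; the free tier is rate-limited and model availability changes over time):

          from huggingface_hub import InferenceClient

          client = InferenceClient("mistralai/Mistral-7B-Instruct-v0.2", token="hf_...")
          # free-tier text generation, subject to rate limits
          print(client.text_generation("Who are you?", max_new_tokens=100))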

  • @nightmare6159
    @nightmare6159 6 months ago +1

    I need help: when I do "pip install -requirements.txt" it says there is no such directory, even though I see the file

    • @alexandresajus
      @alexandresajus  6 months ago

      Make sure that you are in the right directory in your terminal. You can use ls in the terminal to check the contents of the directory you are in. You can switch directory using cd in the terminal or using "Open Folder..." in VSCode.
      In general, the syntax should be "pip install -r [PATH-TO-TXT]"

  • @GameXnationOfficial
    @GameXnationOfficial 6 months ago +1

    "You exceeded your current quota, please check your plan and billing details" - it's showing something like this, and JARVIS is not replying after the error

    • @alexandresajus
      @alexandresajus  6 months ago

      You've exceeded your free quota on one of the APIs, check on which function call this error gets triggered to see which API needs billing

  • @tomasrochaakemi
    @tomasrochaakemi 10 months ago +2

    hey alex! can you help me with this error? "ERROR: Failed building wheel for webrtcvad
    Failed to build webrtcvad
    ERROR: Could not build wheels for webrtcvad, which is required to install pyproject.toml-based projects"

    • @alexandresajus
      @alexandresajus  10 months ago

      Sure! This is because you don't have Microsoft Visual C++ installed properly. I have written a guide on how to fix this here:
      github.com/AlexandreSajus/JARVIS/issues/3

    • @tomasrochaakemi
      @tomasrochaakemi 10 months ago +1

      @@alexandresajus hey man, it worked, but now I got another error. While running python main.py this error appears: line 17, in set_api_key
      os.environ["ELEVEN_API_KEY"] = api_key
      ~~~~~~~~~~^^^^^^^^^^^^^^^^^^
      File "", line 684, in __setitem__
      File "", line 744, in check_str
      TypeError: str expected, not NoneType

    • @alexandresajus
      @alexandresajus  10 months ago

      @@tomasrochaakemi This means that Python has tried to find a .env file with ELEVEN_API_KEY but has not found either the file or the key in the file. You'll need to create a .env file at the same level as main.py containing ELEVENLABS_API_KEY=[your-API-key]
      Please follow the Requirements and the How to Install Step 3 of my repository ( github.com/AlexandreSajus/JARVIS ). I mention these steps at 4:06 and 6:06 of the video.
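
      As a sanity check, a minimal sketch of what the code expects (key name per the repo README; the explicit error is an addition for debugging):

          import os
          from dotenv import load_dotenv

          load_dotenv()  # reads the .env file next to main.py
          if os.getenv("ELEVENLABS_API_KEY") is None:
              raise RuntimeError("ELEVENLABS_API_KEY missing from .env")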

    • @tomasrochaakemi
      @tomasrochaakemi 10 months ago +1

      @@alexandresajus I did it, it still shows this

    • @alexandresajus
      @alexandresajus  10 months ago

      @@tomasrochaakemi Hmmm, weird issue. As a workaround, just replace the 3 lines of os.getenv("...") with the API key as a string. For example:
      OPENAI_API_KEY = os.getenv("OPENAI_API_KEY") -> OPENAI_API_KEY = "YOUR-API-KEY"

  • @crprp4769
    @crprp4769 9 months ago +1

    Awesome video! Thanks for sharing, but I've got a question. How can I implement a pre-trained OpenAI assistant into Taipy?

    • @alexandresajus
      @alexandresajus  9 months ago +1

      Thanks! It should be quite simple. Just replace the model variable at line 53 (shown at 12:52) with your own model ("ft:gpt-3.5-turbo:my-org:custom_suffix:id") and it should work. Let me know if you need more help.
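
      In other words, something like this (old openai<1.0 style to match the video; the model id is the placeholder format from OpenAI's fine-tuning docs):

          import openai  # assumes openai.api_key was already set from the .env file

          response = openai.ChatCompletion.create(
              model="ft:gpt-3.5-turbo:my-org:custom_suffix:id",  # your fine-tuned model
              messages=[{"role": "user", "content": "Hello JARVIS"}],
          )
          print(response["choices"][0]["message"]["content"])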

  • @Firebabys89
    @Firebabys89 6 months ago +1

    u are amazing dude

  • @PenguinjitsuX
    @PenguinjitsuX 9 months ago +2

    This is awesome! I am wondering, though, how much is this project costing you in API calls (if you were to use this daily and pretty often)? I'm planning to build a home assistant that can control all of my home gadgets and perform actions on my computer, but I'm trying to decide whether I should use all local models (Whisper, Coqui, and Mistral) instead of the paid online services. The quality and speed are a bit lower locally, but it's free, so I'm thinking about the tradeoff. Please let me know what you think, thanks!

    • @alexandresajus
      @alexandresajus  9 months ago +1

      Hey! Thanks, glad you liked it! I recommend going the paid online route. ElevenLabs is a paid subscription at $5/month for 30,000 characters. OpenAI and Deepgram are pay-per-request but are dirt cheap: for this whole project, I probably talked for an entire hour with JARVIS, and it cost me 12 cents on OpenAI and 40 cents on Deepgram. If you want to lower cost, find an ElevenLabs equivalent that is pay-per-request, and you'll be good.
      Going local will drastically reduce performance and speed unless you have proper hardware, i.e., a dedicated GPU cluster at home. You'll have to use open-source models quantized to fit in 8 GB. If you have adequate hardware, though, going local might be a good idea since you'll keep performance, and you can reduce latency by half by hosting locally, doing code shenanigans to parallelize each task instead of running them sequentially, and generally optimizing the pipeline.
      Latency is the biggest drawback; JARVIS is at 4 seconds of latency. Even if it were 2 seconds, it would still be too awkward for a conversation.

    • @PenguinjitsuX
      @PenguinjitsuX 9 months ago +1

      @@alexandresajus Thanks for the in-depth reply! That's awesome to see that it's so cheap. I was actually really lucky and got a 4090 last week. I've been running tests - on Whisper and LLM inference, I got performance at almost real-time,

    • @alexandresajus
      @alexandresajus  9 months ago

      @@PenguinjitsuX Wow, you already made a lot of progress! Yeah, unfortunately I think we are just a few years away from solving that performance-latency tradeoff for TTS; then we'll be able to have a proper conversational Jarvis. Is your project open-source? I would love to take a look if you'd let me. I don't have a Discord server, but I'd love to keep in touch on Discord. Here's my username: alex_1337

  • @tismine
    @tismine 6 months ago +1

    Hey Alex! Thanks a lot for the video, can you please explain a good way to create a neat requirements.txt file after I'm done with a project?

    • @alexandresajus
      @alexandresajus  6 months ago

      Sure! Use « pip list » in the terminal to check which package versions you are using. Then create a requirements.txt at the root of your project with one line per package, « package_name==version », for only the packages you import within the code (not their dependencies)
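
      If you want to script that step, a small sketch (the package list here is just an example; adapt it to what your code imports):

          from importlib.metadata import version  # Python 3.8+

          for pkg in ["taipy", "deepgram-sdk", "elevenlabs", "openai", "python-dotenv"]:
              print(f"{pkg}=={version(pkg)}")  # paste the output into requirements.txt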

  • @shawnmuok542
    @shawnmuok542 5 months ago

    hello, I have a problem: when I try to run main.py it shows me "no module named deepgram"

  • @aashishkumarlohra277
    @aashishkumarlohra277 6 months ago

    When I run python main.py, I get this error:
    Traceback (most recent call last):
    File "E:\JARVIS_TEST\JARVIS\main.py", line 15, in
    from record import speech_to_text
    File "E:\JARVIS_TEST\JARVIS
    ecord.py", line 8, in
    from rhasspysilence import WebRtcVadRecorder, VoiceCommand, VoiceCommandResult
    ModuleNotFoundError: No module named 'rhasspysilence'

    • @alexandresajus
      @alexandresajus  6 months ago

      Check this issue:
      github.com/AlexandreSajus/JARVIS/issues/4
      Also try creating a new clean virtual env before installing requirements. Check if there are no errors during installation. Check that you are running main.py from that env. Check that rhasspysilence is installed with pip list

  • @eyoutube1
    @eyoutube1 3 months ago +1

    12PM: Jarvis, what time is it?
    Tomorrow: It is 7AM.
    jk jk, this is a fantastic tool!

  • @omjondhalefyco-9953
    @omjondhalefyco-9953 7 months ago +1

    What alternative can be used for ElevenLabs?

    • @alexandresajus
      @alexandresajus  7 months ago

      I have not tried anything apart from Elevenlabs and google_tts. I was not impressed with the quality of google_tts, but it was way faster. I'm sure you'll find better answers online

  • @edosetiawan9589
    @edosetiawan9589 7 months ago +1

    Awesome!! How can I make this project access custom data?

    • @alexandresajus
      @alexandresajus  7 months ago

      A quick way to do this would simply be adding the data as a string to the context. This has its limitations (the context has a max length). If you want a chatbot that knows information from documents, I suggest you look into RAG models
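
      A quick-and-dirty sketch of that idea (the context string here stands in for the one defined in main.py):

          custom_data = "Opening hours: 9am-6pm. Address: 12 Example Street."

          context = (
              "You are JARVIS, a helpful voice assistant. "
              "Use the following facts when answering:\n" + custom_data
          )
          messages = [{"role": "system", "content": context}]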

  • @justincampbell5584
    @justincampbell5584 2 months ago

    Is it possible to use Ollama as opposed to OpenAI?

  • @s.gveeronstart4794
    @s.gveeronstart4794 9 months ago +1

    Sir, can you teach us how to make it?
    I mean to say: could you make a playlist on this topic?

    • @alexandresajus
      @alexandresajus  9 months ago

      Unfortunately, I won't be making an extended tutorial on this in the near future. But I'm sure there are many tutorials on the tools I used on YouTube. You can just look up "ElevenLabs tutorial" or "OpenAI API tutorial".

  • @EnnoAI431
    @EnnoAI431 9 months ago +1

    Great project!!
    Would it also run on a Raspberry Pi?
    Recently I ran a project also called Jarvis on a Pi. You don't need the APIs from Deepgram & ElevenLabs, and the latency is pretty good, although the voice was horrible... unless you like robots :-).

    • @alexandresajus
      @alexandresajus  9 months ago

      Thanks! Sure, this should be able to run on a Raspberry Pi, since all of the heavy stuff is third-party services that are hosted, so barely anything runs locally. Cool! Where can I take a look at your project?

  • @undeadgaming2102
    @undeadgaming2102 8 months ago +1

    I want to ask: can you make a video on how we can make it do different tasks???

    • @alexandresajus
      @alexandresajus  8 months ago

      What task are you thinking about? If it's just asking about the weather, you can add the current weather to the context so Jarvis knows about it
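
      A hedged sketch of that trick, using wttr.in as an example of a free weather endpoint:

          import requests

          # one-line weather report, prepended to the system prompt
          weather = requests.get("https://wttr.in/Paris?format=3", timeout=10).text
          context = f"You are JARVIS. Current weather: {weather}. Answer accordingly."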

    • @undeadgaming2102
      @undeadgaming2102 8 months ago

      @@alexandresajus I was thinking of something like a Google Assistant

  • @muhammadilyasrasyid5817
    @muhammadilyasrasyid5817 10 months ago +1

    thank you very much sir

  • @olakunleogunseye9657
    @olakunleogunseye9657 10 months ago +1

    aye, this is so cool, but there is no wake-up key and end key; still, this is the greatest, and I know you know

  • @PandaLorian14
    @PandaLorian14 5 months ago

    Does no one get the same code on Deepgram? Me and you don't get the same code

  • @sebaperalta2001
    @sebaperalta2001 10 months ago +1

    Nice work! Is it possible to have it answer only on an activation word? Like, if you don't say "Jarvis", then it would not answer. So the program is always listening, but activates on context.

    • @alexandresajus
      @alexandresajus  10 months ago +1

      Thanks! Yes, this should be easy to do; just add a condition: if the activation word is not in the transcript, continue (restart the loop without answering)
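
      A sketch of that condition (speech_to_text is the repo's transcription helper; the loop details here are assumptions):

          WAKE_WORD = "jarvis"

          def should_answer(transcript: str) -> bool:
              # only respond when the activation word was heard
              return WAKE_WORD in transcript.lower()

          # inside the main loop:
          # transcript = speech_to_text()
          # if not should_answer(transcript):
          #     continue  # restart the loop without answering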

  • @felipemartinez1924
    @felipemartinez1924 9 months ago +1

    How do I change the speech recognition to Spanish? Btw, amazing work!

    • @alexandresajus
      @alexandresajus  9 months ago +1

      Thanks! I have not tried another language, but there does seem to be an option in Deepgram's API to transcribe Spanish voice by using their nova-2 model and adding the parameter "language=es" to the query
      developers.deepgram.com/docs/language
      developers.deepgram.com/docs/models-languages-overview

    • @felipemartinez1924
      @felipemartinez1924 9 months ago +1

      @@alexandresajus Thanks, you're amazing! You should do a series of these kinds of videos, maybe a Jarvis like this one but that is able to take action, like opening a program or saving reminders, stuff like that. Thank you very much, and looking forward to more videos. :)

    • @jan-peterbornsen8506
      @jan-peterbornsen8506 8 months ago

      @@felipemartinez1924 Hey, were you able to change the language of Deepgram's API? I want to change it to German, but all my attempts have failed so far... I tried just adding language=de but it's not helping in any way...

  • @charliepersonalaccount5276
    @charliepersonalaccount5276 6 months ago +1

    Great stuff, man! What's the best way to chat with you? I have an MVP I want to run by you and maybe have you help me build it out

    • @alexandresajus
      @alexandresajus  6 months ago

      Thanks. Feel free to reach out on Linkedin:
      www.linkedin.com/in/alexandre-sajus/
      I don't have much time because of work, but I can take a look.

  • @Jingjing-f5l
    @Jingjing-f5l 8 months ago

    When I run display.py to start the web interface, it shows "ModuleNotFoundError: No module named 'taipy'". But then after I install taipy (version 3.0.0), it still gives me the same error message. I have tried to uninstall and reinstall taipy, but I get the same error message...

    • @alexandresajus
      @alexandresajus  8 months ago +1

      Are you sure you are running display.py from the Python environment where taipy is installed? Use `pip list` to check that taipy is installed and then `python display.py` to run the file. If this does not work, I suggest creating a new virtual environment and re-installing the requirements. Bear in mind that taipy only works with Python 3.8 to 3.11

    • @Jingjing-f5l
      @Jingjing-f5l 8 months ago

      Thanks! Instead of clicking to run display.py, I typed in "python display.py" and it opened the website! @@alexandresajus
      One more question: when I ran "python main.py", I got the error message "TypeError: 'ABCMeta' object is not subscriptable". I am using Python 3.8.10 in Visual Studio.

  • @blazzycrafter
    @blazzycrafter 9 months ago +2

    YOU STOLE MY WORK?........
    ......
    ......
    .....
    .....
    ......
    HOW THE HEK DID IT WORK?
    XD

  • @PilotsPitstop
    @PilotsPitstop 6 months ago +1

    What exactly did u purchase on the OpenAI API thing for it not to return "exceeded current quota"? I paid for the ChatGPT "hobbyist" plan and thought that would help, but nah, I wasted $20. And u should def start a Discord, good stuff

    • @alexandresajus
      @alexandresajus  6 months ago +1

      Ah, I see: you're not supposed to pay for a ChatGPT subscription. OpenAI has a separate website for their API where you just have to enter billing details and maybe add a dollar of credit. They charge per request and not on a subscription basis. It should be on the same site where you got your API key

    • @PilotsPitstop
      @PilotsPitstop 6 months ago

      @@alexandresajus AH, MY HERO, SO FAST. So I just add some money to my account and boom, it works?

  • @handlepersonthing
    @handlepersonthing 10 months ago +1

    Awesome work! I wonder if using the GPT-4 model would speed things up a bit?

    • @alexandresajus
      @alexandresajus  10 months ago +2

      Thank you very much! Unfortunately, I don't think switching the model would do a lot. Profiling here is 1s for transcribing, 1s for GPT, and 2s for generating audio. The best way to reduce latency would be using smaller/quantized models or streaming data instead of doing each task sequentially
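
      For anyone who wants to reproduce that profiling, a small self-contained helper (the commented stage names map to the corresponding calls in main.py):

          import time

          def timed(label, fn, *args):
              # time one pipeline stage and report it
              t0 = time.perf_counter()
              out = fn(*args)
              print(f"{label}: {time.perf_counter() - t0:.2f}s")
              return out

          # e.g. text = timed("STT", transcribe, audio)    # ~1s with Deepgram
          #      reply = timed("GPT", ask_gpt, text)       # ~1s with OpenAI
          #      audio = timed("TTS", synthesize, reply)   # ~2s with ElevenLabs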

    • @serenditymuse
      @serenditymuse 10 months ago +2

      @@alexandresajus Larger models often take longer to think.

  • @adben001
    @adben001 5 months ago

    Will that generate costs through the API, or is it free?

  • @AndroidePulpico
    @AndroidePulpico 7 months ago +1

    The latency is pretty bad; have you tried Whisper JAX or Faster Whisper?

    • @alexandresajus
      @alexandresajus  7 months ago

      Yeah, the latency issue is currently the worst one. I have not tried those services; let me know if they speed things up. Currently, the consensus for reducing latency seems to be streaming data, running the tasks in parallel instead of sequentially, and hosting smaller models locally.

  • @NotZymsYT
    @NotZymsYT 6 months ago

    Can anyone help me? I keep getting "ERROR: Failed building wheel for pyarrow"

    • @alexandresajus
      @alexandresajus  6 months ago +1

      Switch to Python 3.8 to 3.11. The Taipy version I am using is old and does not support Python 3.12. You can also try changing to taipy==3.1.0 in requirements.txt
      github.com/AlexandreSajus/JARVIS/issues/7

    • @NotZymsYT
      @NotZymsYT 6 months ago +1

      @alexandresajus you are awesome thank you so much !!!!

    • @NotZymsYT
      @NotZymsYT 6 months ago

      @@alexandresajus hey, sorry to be a pest. The original issue is fixed, but now it seems like the api_key variable obtained from os.getenv("ELEVENLABS_API_KEY") is None, and the set_api_key function from the elevenlabs module is trying to set this None value as the value of the ELEVEN_API_KEY environment variable. However, environment variables must be strings, so attempting to assign None raises a TypeError. I'm really new to all this, and any help is super appreciated

    • @alexandresajus
      @alexandresajus  6 months ago

      @@NotZymsYT os.getenv("ELEVENLABS_API_KEY") should not get None. Please make sure you properly do step 3 of the installation as described at 6:04: make sure you have a .env file at the same level as main.py and make sure it is filled with the API keys using the syntax described in the README

    • @NotZymsYT
      @NotZymsYT 6 months ago

      @@alexandresajus I ran through the whole video on extra slow and now it's giving me Traceback (most recent call last):
      File "main.py", line 59, in
      file_name: Union[Union[str, bytes, PathLike[str], PathLike[bytes]], int]
      TypeError: 'ABCMeta' object is not subscriptable

  • @ibrahimqadirmustafa
    @ibrahimqadirmustafa 10 months ago +1

    Amazing bro, I want to create something like this but in the Kurdish language. Do you know how I can make it understand and speak Kurdish?

    • @alexandresajus
      @alexandresajus  10 months ago

      Thanks! Unfortunately, this might be harder to do in Kurdish. You need to find services that support the Kurdish language, which are quite rare: both Deepgram and ElevenLabs do not support Kurdish currently. I'd guess that OpenAI does support Kurdish, but I am not sure; even if it does not, you can use a service to do the English-Kurdish translation in the middle of the pipeline.

    • @ibrahimqadirmustafa
      @ibrahimqadirmustafa 10 months ago +1

      @@alexandresajus
      Can I use the Google Translate package in Python to translate the response content from the AI?

    • @alexandresajus
      @alexandresajus  10 months ago

      @@ibrahimqadirmustafa Yes, this would solve part of the problem
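
      For instance, a sketch with the deep-translator package (one of several Google Translate wrappers; whether the "ku" code covers the Kurdish variant you need should be checked against the library's docs):

          from deep_translator import GoogleTranslator

          english_reply = "Hello, how can I help you?"
          kurdish_reply = GoogleTranslator(source="en", target="ku").translate(english_reply)
          print(kurdish_reply)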

    • @ibrahimqadirmustafa
      @ibrahimqadirmustafa 10 months ago

      @@alexandresajus ok, thank you; if I need help I can contact u 😁

  • @pntra1220
    @pntra1220 10 months ago +1

    Nice project, bro! Do you know how I can use Deepgram to transcribe Spanish voice? I already figured it out for ElevenLabs but not for Deepgram. Thank you for taking the time to read this, and keep making these videos!

    • @alexandresajus
      @alexandresajus  10 months ago +1

      Thanks! I have not tried, but there does seem to be the option to transcribe Spanish voice by using their nova-2 model and adding the parameter "language=es" to the query
      developers.deepgram.com/docs/language
      developers.deepgram.com/docs/models-languages-overview

  • @niyatibalsara9409
    @niyatibalsara9409 8 months ago

    I'm encountering a webrtcvad installation error.. please let me know what to do.. it's urgent.. I need it for my project

    • @niyatibalsara9409
      @niyatibalsara9409 8 months ago

      @alexandresajus

    • @alexandresajus
      @alexandresajus  8 months ago +1

      Please refer to this fix, let me know if it works:
      github.com/AlexandreSajus/JARVIS/issues/3

    • @niyatibalsara9409
      @niyatibalsara9409 8 months ago

      PS C:\Users\HP\Desktop\JARVIS2> & c:/Users/HP/Desktop/JARVIS2/myvenv/Scripts/python.exe c:/Users/HP/Desktop/JARVIS2/JARVIS/main.py
      Traceback (most recent call last):
      File "c:\Users\HP\Desktop\JARVIS2\JARVIS\main.py", line 8, in
      from dotenv import load_dotenv
      ModuleNotFoundError: No module named 'dotenv'
      PS C:\Users\HP\Desktop\JARVIS2> pip install python-dotenv
      Collecting python-dotenv
      Downloading python_dotenv-1.0.1-py3-none-any.whl (19 kB)
      Installing collected packages: python-dotenv
      Successfully installed python-dotenv-1.0.1
      [notice] A new release of pip available: 22.3.1 -> 24.0
      [notice] To update, run: python.exe -m pip install --upgrade pip
      PS C:\Users\HP\Desktop\JARVIS2> .\venv\Scripts\Activate
      (venv) PS C:\Users\HP\Desktop\JARVIS2> python JARVIS\main.py
      pygame 2.5.2 (SDL 2.28.3, Python 3.11.2)
      Hello from the pygame community. www.pygame.org/contribute.html
      Traceback (most recent call last):
      File "C:\Users\HP\Desktop\JARVIS2\JARVIS\main.py", line 13, in
      import elevenlabs
      File "C:\Users\HP\Desktop\JARVIS2\venv\Lib\site-packages\elevenlabs\__init__.py", line 2, in
      from .simple import * # noqa F403
      ^^^^^^^^^^^^^^^^^^^^^
      File "C:\Users\HP\Desktop\JARVIS2\venv\Lib\site-packages\elevenlabs\simple.py", line 113, in
      elevenlabs.set_api_key(os.getenv("ELEVENLABS_API_KEY"))
      ^^^^^^^^^^^^^^^^^^^^^^
      AttributeError: partially initialized module 'elevenlabs' has no attribute 'set_api_key' (most likely due to a circular import)
      Please solve this error.. it's urgent, not working.. please help

  • @anirvindhch1209
    @anirvindhch1209 7 months ago +1

    What are you using to code this, Alexandre??

    • @alexandresajus
      @alexandresajus  7 months ago +1

      What do you mean? I'm coding in Python using VSCode. I used external APIs like ElevenLabs, OpenAI, and Deepgram, and libraries like Taipy for the interface. I use GitHub Copilot to help me code faster as well.

  • @Jordan-tr3fn
    @Jordan-tr3fn 10 months ago +1

    hey, cool vids! Why not use OpenAI for transcription instead of Deepgram? You could stream the audio and not have audio files

    • @alexandresajus
      @alexandresajus  10 months ago +1

      This is indeed probably a better approach. I was not aware of it at the time

    • @tismine
      @tismine 6 months ago

      Are you sure OpenAI supports streamed audio input? I looked around all over the place; no one was able to do that...

    • @Jordan-tr3fn
      @Jordan-tr3fn 6 months ago

      @@tismine « openai stream audio » on Google …

  • @ezzeldinhany7301
    @ezzeldinhany7301 8 months ago

    hi alex, it says "no module named 'deepgram'" after running python main.py in the terminal. What should I do?

    • @ezzeldinhany7301
      @ezzeldinhany7301 8 months ago

      I also tried pip install deepgram and it did not work

    • @alexandresajus
      @alexandresajus  8 months ago

      @@ezzeldinhany7301 Using the same terminal where you ran "python main.py", run "pip list" and check if deepgram is properly installed. I suggest you reinstall the requirements into a clean environment for this. Let me know if this works.

    • @ezzeldinhany7301
      @ezzeldinhany7301 8 months ago

      @@alexandresajus I did reinstall the requirements during the process of trying to solve this problem

    • @alexandresajus
      @alexandresajus  8 months ago

      @@ezzeldinhany7301 Did the terminal say that deepgram was successfully installed? Can you check with "pip list" if deepgram is installed? Can you check if you are running main.py from the environment where you installed deepgram? Once again, I strongly recommend creating a fresh Python environment using venv, installing the requirements there, and checking everything above

    • @ezzeldinhany7301
      @ezzeldinhany7301 8 months ago

      @@alexandresajus I have now fixed the deepgram issue, but it says it cannot download rhasspysilence; I tried with pip also

  • @AdeniranFrancis
    @AdeniranFrancis 4 months ago

    Whenever I see videos like these, I clone the repos and I am never, ever able to successfully install all the dependencies or requirements.txt. Makes me want to give up writing code altogether.

  • @ashrafulislamemon8782
    @ashrafulislamemon8782 4 months ago

    I am stuck at git clone

  • @_GIGABYTES
    @_GIGABYTES 10 months ago

    Traceback (most recent call last):
    File "F:\va\New folder (3)\JARVIS\display.py", line 5, in
    from taipy.gui import Gui, State, invoke_callback, get_state_id
    ModuleNotFoundError: No module named 'taipy'

    • @alexandresajus
      @alexandresajus  10 months ago

      Are you sure you installed the requirements of the project (5:33)?

    • @Threecommaaclub
      @Threecommaaclub 9 months ago +1

      Hey, I'm not sure if you're still running into this issue; however, I was able to solve it by creating a virtual environment as stated in the video. Try creating a virtual environment, and if you need help there is another video on YouTube that should solve that issue.

  • @GreggHoush
    @GreggHoush 11 months ago +7

    You should disable those API keys and blur them in videos like these. Everybody wants free API keys.

    • @alexandresajus
      @alexandresajus  11 months ago +2

      Good advice. I disabled these keys right after recording, and they all have a hard rate limit

  • @tchen8124
    @tchen8124 10 months ago

    What's the point of using ElevenLabs? Without careful finetuning, the voice sounds robotic anyway. Kind of a waste of money

    • @alexandresajus
      @alexandresajus  10 months ago

      What do you suggest I use? I looked for fast TTS AI services, stumbled upon ElevenLabs, and did not ask too many questions. The whole point was trying to recreate Jarvis from Iron Man, which has a robotic voice. It cost me a dollar for 30,000 characters

    • @kyouko5363
      @kyouko5363 10 months ago

      ​@@alexandresajus I'm tempted to make a suggestion here but.. if it gets too popular I might not be able to use it anymore. I can't afford API keys, and rely on it every day to ingest documentation and large pieces of text without interrupting my programming. Even made a private Neovim plugin for it.. as for LLMs.. I am *this* close to saying to hell with it and writing a daemon or local webserver or something that'll instruct Selenium to forward queries and responses on a headless Chromium instance. I'm tired of there being no free API keys for LLMs, not even rate limited ones, when the browser experience is free to begin with, but the moment I want to see the text in my terminal and respond in my terminal, it suddenly costs money, despite me technically having reduced their server load by skipping all the unnecessary CSS, HTML and JS every time I want to just send and receive a goddamned string? I *thought* ChatGPT had a free rate limited API key, and conveniently around the time it became part of my workflow, the API credits equivalent of a free trial runs out, almost as if to give you a cake and then take it right back after the first bite. I'm rambling. But hey, at least I've got good TTS for free.

  • @PHG_Team
    @PHG_Team 10 months ago

    bruh
    note: This error originates from a subprocess, and is likely not a problem with pip.
    ERROR: Failed building wheel for pyarrow
    Failed to build pyarrow
    ERROR: Could not build wheels for pyarrow, which is required to install pyproject.toml-based projects

    • @alexandresajus
      @alexandresajus  10 months ago

      This is probably due to a Python version issue: you are probably using Python 3.12 and this project uses Taipy which only supports Python 3.8 to 3.11. Please try using another Python version. If this does not help, do not hesitate to give more details on the issue here: github.com/AlexandreSajus/JARVIS/issues

    • @PHG_Team
      @PHG_Team 10 months ago +1

      @@alexandresajus thx bro. If I delete display.py, will the assistant still work? I want to create my own GUI

    • @alexandresajus
      @alexandresajus  10 months ago

      @@PHG_Team Yes, you can delete display.py; both programs are independent.

    • @PHG_Team
      @PHG_Team 10 months ago

      @@alexandresajus I'm Italian and I want to change the speaking language. How can I do that?

  • @Mirkolinori
    @Mirkolinori 4 months ago

    Good idea, but ElevenLabs is too expensive; the price is horrible for live TTS… better to use the built-in OpenAI TTS. You can also use the OpenAI API for Whisper, the assistant GPT, and TTS… quick, cheap, and easy

  • @n00ter99
    @n00ter99 11 months ago +1

    That latency is painful

    • @alexandresajus
      @alexandresajus  11 months ago +2

      Agreed; unfortunately, that latency is very hard to shave off. We could probably reduce it a bit by hosting locally, using quantized/smaller models, and streaming the data instead of doing each task sequentially
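
      As one concrete example of the streaming idea, a sketch against the old elevenlabs SDK used in the video (verify the calls against your installed version; API key assumed set):

          import elevenlabs

          audio_stream = elevenlabs.generate(
              text="Streaming cuts the time to first sound dramatically.",
              voice="Adam",
              stream=True,
          )
          elevenlabs.stream(audio_stream)  # plays chunks as they arrive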

    • @chrsl3
      @chrsl3 10 months ago +1

      It works so wonderfully, I wouldn't be bothered at all by the small latency.

    • @n00ter99
      @n00ter99 10 months ago +1

      @@alexandresajus Measure the latencies of the things you mentioned - you'll find that implementing streaming all the way across the stack will solve most of it. I have spent the last year building low-latency streaming models in order to get sub-100-millisecond latencies for various audio/speech startups; it's the only way to get speeds and responsiveness that feel natural

    • @alexandresajus
      @alexandresajus  10 months ago +1

      @@n00ter99 I did profiling on each task, and we are at about 1s for transcribing, 1s for GPT, and 2s for generating audio. Really? Where can I find out how to do this? What models/services were you using?