- 33 videos
- 16,730 views
Tech Giant
Nigeria
Joined Aug 5, 2023
- Tech reviews
- Unboxing of cool gadgets and drones
- AI & coding tutorials
🤖💻🚁
Don't forget to subscribe and ring the bell to never miss an update!
Stay Tech-Savvy🤟🏻
Fish Speech v1.4 by @FishAudio High Quality Voice Cloning TTS Model
NOTE: This is just an update to my previous video on setting up Fish Speech v1.2 on Macbook M1 Pro
I'll be setting up Fish Speech v1.4 by @FishAudio. This is a great TTS model trained on 700k hours of audio data in multiple languages (English, Japanese, German, French, Spanish, Korean, Arabic, and Chinese), and it performs wonderfully at voice cloning and TTS generation. The only downside: it is CPU intensive, and I recommend you get a beefy GPU if you intend to use it as part of your AI system's TTS engine. In this video I set it up on my MacBook M1 Pro, and inference took a while, which I had to speed up so as not to waste your time.
🔗 LINKS
Code Repo: github.com/brainia...
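Since the model runs on CPU but is much happier on a GPU, the usual pattern is to pick the device string once at startup. Here is a minimal sketch of that fallback logic; the CUDA → MPS → CPU order and the `torch` wiring shown in the comments are my assumptions, not the exact code from the video:

```python
def pick_device(cuda_available: bool, mps_available: bool) -> str:
    """Choose the best inference device, preferring GPU backends over CPU."""
    if cuda_available:
        return "cuda"   # NVIDIA GPU
    if mps_available:
        return "mps"    # Apple Silicon GPU
    return "cpu"        # slow fallback, as used on the MacBook in the video

# With PyTorch installed, this would typically be wired up as:
# import torch
# device = pick_device(torch.cuda.is_available(),
#                      torch.backends.mps.is_available())
```

Passing the resulting string wherever the code currently hard-codes device="cpu" is what the device="cuda" comments below are asking about.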
202 views
Videos
High Quality Voice Cloning TTS Model - Fish Speech by Fish Audio
740 views · 1 month ago
NOTE: This video is part of the Text-to-speech Comparison Series. I'll be setting up Fish Speech by Fish Audio. This is a great TTS model trained on 300k hours of English, Japanese, and Chinese audio data, and it performs wonderfully at voice cloning and TTS generation. The only downside: it is CPU intensive, and I recommend you get a beefy GPU if you intend to use it as part of your AI system...
Setting up a Realistic Text-to-speech; Bark (by Suno AI) locally
1.1K views · 2 months ago
NOTE: This video is part of the Text-to-speech Comparison Series. I'll be setting up Bark, a transformer-based text-to-audio model created by Suno AI. Bark can generate highly realistic, multilingual speech as well as other audio, including music, background noise, and simple sound effects. The model can also generate nonverbal cues, such as laughter, sighing, and crying. 🔗 LINKS Code Repo: github...
Simple AI Agent/Chatbot | MegaMind | I/O with Whisper.cpp & Piper TTS
333 views · 3 months ago
MegaMind AI is a barebones AI assistant split into three sections, STT | LLM/MLLM | TTS, and built as a structured chat agent with the Langchain framework, together with a few Python libraries and whisper.cpp to enable fast transcription of the user's speech depending on the user's PC hardware. 🔗 LINKS Project's Github: github.com/brainiakk/megamind.ai Whisper CPP (CoreML Support) Example Git...
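The STT | LLM | TTS split described above can be sketched as a simple one-turn pipeline. The three stage functions here are stand-in stubs, not MegaMind's actual components:

```python
from typing import Callable

def run_assistant(
    stt: Callable[[bytes], str],
    llm: Callable[[str], str],
    tts: Callable[[str], bytes],
    audio_in: bytes,
) -> bytes:
    """Run one assistant turn: speech -> text -> reply -> speech."""
    user_text = stt(audio_in)      # e.g. whisper.cpp transcription
    reply_text = llm(user_text)    # e.g. a Langchain structured chat agent
    return tts(reply_text)         # e.g. Piper TTS synthesis

# Stub usage; real implementations would call whisper.cpp, an LLM, and a TTS engine:
audio_out = run_assistant(
    stt=lambda audio: "hello",
    llm=lambda text: f"you said: {text}",
    tts=lambda text: text.encode(),
    audio_in=b"\x00\x01",
)
```

Keeping each stage behind a plain function like this is what lets the STT, LLM, and TTS engines be swapped independently.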
Speech to text with Whisper CPP in a Python Project (with CoreML/Apple Silicon Support)
757 views · 3 months ago
Let's set up Whisper.cpp locally in a Python project and run audio transcription, converting our speech to text, with Python's subprocess package. 🔗 LINKS Github Repo: github.com/brainiakk/Whisper-CPP-CoreML-Example Whisper CPP Github: github.com/ggerganov/whisper.cpp Main Video: ruclips.net/video/PDz_9wChvcs/видео.html 🔗 MY LINKS Twitter: x.com/alhajibrain Instagram: techgia...
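The subprocess approach described above can be sketched like this. The binary path, model path, and flags follow whisper.cpp's common CLI usage (`-m` model, `-f` audio file, `-nt` no timestamps), but the exact paths are assumptions about your local checkout:

```python
import subprocess

def build_whisper_cmd(audio_path: str,
                      model_path: str = "models/ggml-base.en.bin",
                      binary: str = "./main") -> list[str]:
    """Assemble the whisper.cpp CLI invocation; -nt suppresses timestamps."""
    return [binary, "-m", model_path, "-f", audio_path, "-nt"]

def transcribe(audio_path: str) -> str:
    """Run whisper.cpp on a 16 kHz WAV file and return the transcript text."""
    result = subprocess.run(build_whisper_cmd(audio_path),
                            capture_output=True, text=True, check=True)
    return result.stdout.strip()
```

Separating command construction from execution keeps the invocation easy to adjust for CoreML or CUDA builds of the binary.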
Gemini 1.5 Pro (latest) with Langchain's ChatVertexAI Package
265 views · 4 months ago
In this video, we'll set up the Langchain Google Vertex AI (Python) package, and we'll also go through the steps to create a Google Cloud console project and service account and enable the Vertex AI API. We'll also build a pretty simple chatbot while trying out the different versions of @Google's Gemini AI (Gemini 1.5 Pro 001, Gemini 1.5 Flash 001, Gemini Pro). 🔗 LINKS Langchain Structured Chat Agent...
Vision Tool & Screenshot Tool for Langchain Structured Chat Agent (Powered by Gemini 1.5 Pro)
320 views · 4 months ago
NOTE: This is a basic overview of a structured chat agent using the Langchain framework; the multimodal LLM powering the agent and tools is @google's Gemini 1.5 Pro preview. You can play around with the code after cloning the GitHub repository below. 🔗 LINKS Setting up Text To Speech (Piper TTS): ruclips.net/video/hQe861JElXc/видео.html Github Repo: github.com/brainiakk/langchain-structured-chat-a...
Installing Piper Text To Speech Engine (on a Macbook w/ Apple Silicon)
825 views · 4 months ago
NOTE: This video is part of the Text-to-speech Comparison Series, but it also addresses the issue of installing Piper TTS (which requires Piper phonemize) on a MacBook arm64 🔗 LINKS Github Repo: github.com/brainiakk/youtube/tree/main/tts-comparison Original Piper Phonemize issue link: github.com/rhasspy/piper-phonemize/issues/14 🔗 MY LINKS Twitter: x.com/alhajibrain Instagram: _al...
Setting up Openvoice version 2 and MeloTTS for AI voice cloning
6K views · 5 months ago
NOTE: This video is part of the Text-to-speech Comparison Series. I'll be setting up OpenVoice version 2 and MeloTTS by MyShell AI. We'll follow their documentation closely and set up MeloTTS to work independently as a text-to-speech engine and as the base speaker TTS for the OpenVoice voice cloning engine, so you can easily integrate it into your AI application. 🔗 LINKS Code Repo: github.com/brainia...
WizardLM2 function call - Using Llama 3 Tokenizer & Langchain's Pydantic OpenAI function converter
113 views · 5 months ago
LLAMA 3: function calling review using llama index framework and Ollama locally.
555 views · 5 months ago
Port Harcourt City (Nigeria) | A Brief Cinematic OVER-view 🤣
97 views · 6 months ago
CFly Faith 2 Pro Drone Review: Advanced Features, 540º Obstacle Avoidance & Impressive Performance!
524 views · 8 months ago
Speedybee Bee35 Pro FPV Frame Review: Durable, Protective, and Feature-Packed!
1K views · 9 months ago
Honest Walksnail Avatar HD Pro Kit Review: Disappointing Range, and a Watery Demise 💔
110 views · 10 months ago
Freestyle using the @flyfishrc VoladoR II VD5 and video shot with the @CADDXFPV Walnut
16 views · 10 months ago
Exploring the Landscapes of Uniport with the CFLY Faith 2 Pro Drone
296 views · 10 months ago
@flyfishrc Volador II VD5 Maiden flight
15 views · 10 months ago
Bro, are you Nigerian?
Curious, what is this explorer window that shows the content being populated as you run the commands?
What are the required machine configs for this? I'm running out of memory on a Tesla T4 (16 GB) with CUDA, and on my 28 GB of RAM with CPU.
I ran it on my MacBook M1 Pro, but it also supports CUDA if your GPU supports it; in the video I switched to CPU to run it on my Mac.
Does it support Bahasa Indonesia?
Does it have a docker setup?
not sure
Hello pro, big big fan! It finally runs perfectly for me 😍 thank you for this awesome tutorial. Just one more question: I have a 4090 GPU on my laptop; what do I need to run this on the GPU? I tried changing device="cpu" to device="cuda" in the code. I have 2 GPUs, the Intel built-in one and an Nvidia GeForce RTX 4090. Thank you in advance, pro 😀
I use a MacBook, that's why I switched from CUDA to CPU. You can reach out to Nvidia support if you're having trouble with their hardware, but make sure you've installed the necessary drivers for that graphics card so you can use CUDA before reaching out to them.
@@techgiantt I have CUDA set up and it works fine with many other projects, but what changes do I need to make to run this on the GPU?
Please, pro, make a full, clear tutorial.
Hello pro, I followed your tutorial on Fish Speech. Please make a full, clear tutorial about this model, v1.4.
We're still waiting for a clear tutorial on this model, please, pro.
Hello Sir, thank you so much for this tutorial. I'm wondering if you can do the cloning using Bark (the one you used in the last video), so the reading also sounds more human?
Hello, I tried all the solutions you gave me, still the same errors. Please, I need your help. Best regards.
Hello pro, good tutorial, but I have a problem; can you review it with me? I did almost everything like you did, but it still shows me: No module named 'fish_speech.conversation'. I checked the path and everything is good.
Did you use version 1.2 (that's what I used)? There is a new version (v1.4).
@@techgiantt Yes, I used 1.4, still the same problem. Can I have your contact number? I need your help, pro.
@@techgiantt I need your contact number, I need your help.
@@mahaltech Just get version 1.2 from their github releases page, and follow the other steps I showed in the video.
@@techgiantt There is some difference.
The project can also read PDF documents, and I'm willing to connect it all into a super all-in-one project with a UI this month.
I would also love a UI to select languages between your outputs and select characters between your character .TXT prompts. And lastly, a selector for your vault .TXT files, to tell the LLM what you expect the AI to know about your needs.
Thanks. I think you need a better 🎤 mic; your voice is so low it will probably put people to sleep. Voice quality is more important than video. Good luck.
Sorry about the inconvenience; I had some problems with the mic I/O. I'll switch it out in the next video.
Great job! By the way, did you try vLLM to load the Mistral LLM? vLLM can start an OpenAI API-compatible server with: python -m vllm.entrypoints.openai.api_server --model mistralai/Mistral-7B-Instruct-v0.3. Then, with from langchain.chat_models.base import BaseChatModel, create a custom class CustomVLLM(BaseChatModel) and use llm = CustomVLLM(base_url="localhost:8080/v1").
@@baoxinping3081 I’ll check it out, I’ve been postponing checking it out for a while. I’m also working on updating the project, maybe I’ll cram it into one video.
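As an alternative to subclassing BaseChatModel, the OpenAI-compatible endpoint that vLLM serves can be called directly over HTTP. A stdlib-only sketch, assuming a server like the one in the comment above is running locally (note vLLM's default port is 8000; adjust base_url to match your server flags):

```python
import json
from urllib import request

def chat_payload(model: str, user_msg: str) -> dict:
    """Build a minimal OpenAI-compatible /chat/completions request body."""
    return {"model": model, "messages": [{"role": "user", "content": user_msg}]}

def ask_vllm(user_msg: str,
             model: str = "mistralai/Mistral-7B-Instruct-v0.3",
             base_url: str = "http://localhost:8000/v1") -> str:
    """POST a chat request to a locally running vLLM OpenAI-compatible server."""
    req = request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(chat_payload(model, user_msg)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```

Because the wire format matches OpenAI's, any OpenAI-style client (including Langchain's OpenAI chat model pointed at a custom base_url) should work the same way.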
awesome
Having issues with piper, not able to find the module regardless of my attempts.
I didn't set up piper in this video
@@techgiantt No worries, I was having some bizarre dependency issues with my venv; I did a clean pip install of the requirements and everything ran smoothly besides some ALSA issues, but I cleaned it all up. Thanks.
Guys, make sure you're using Python 3.10 or it gets weird. But also, this was an amazing tutorial! Thanks!!
is it utilizing the GPU of the apple silicon?
♥
As far as I understand, this project is for Linux and macOS only, since coremltools only works on them. But the video is still good.
For those who don't want to mess around with a complex installation, use Pinokio
Nice work man
Thanks man
While executing python -m main, I'm getting the following error: cannot import name 'TTS' from 'melo'
@@SurrenderToAction Did you install MeloTTS? It's in the modules directory, and you need to install it in the virtual environment you set up.
@@techgiantt As you do at 3:48 in the video, right? Yes, I did that, but I had a bunch of issues. Maybe because I was using the latest version of Python, and went to 3.9 later..
@@SurrenderToAction Check ruclips.net/video/UsuuSgnOJxg/видео.html
@@techgiantt Yeah, I did rename it to "melo". Maybe it's better to do everything from scratch. One more day..
@@techgiantt I'm getting this error when installing melo: ERROR: Failed building wheel for tokenizers
I got this error when generating the audio file: ../lib/libespeak-ng.dylib' (no such file)
Why did you mount the GPS like that......... it's too bad, the frame comes with the mount, bro.
The GPS mount is too small for the standard GPS size.
Wow, This is amazing 😮😮
Thanks! Can I run this on CPU? Because I don't have a GPU.
I think by default it runs on CPU, just ignore the CoreML support bit.
A bit confusing. What's the relation between MeloTTS and OpenVoice V2?
MeloTTS can act as a standalone text-to-speech engine or as the base speaker for OpenVoice v2. OpenVoice is both a TTS and a voice cloning engine. OpenVoice v1 can do without MeloTTS as the base speaker.
@@techgiantt Thanks for your reply. I'm able to play English voice without any issue. But when I play Chinese, I got the following error message: RuntimeError: Placeholder storage has not been allocated on MPS device! Any suggestion? Thanks.
Does it work well on audio in other languages? I have tried Bark and Tacotron 2 but did not get good results for Hindi. Thanks for the video, keep giving good content 😊
I think it's mostly English, Japanese, Chinese, French, Spanish, and Korean that are supported, but it also has an Indian accent.
Hello brother, can you share the code?
For the text-to-speech or the AI agent using tools? I already have a video on the TTS, with a GitHub repo in the description of that video.
Is there a way to make these TTS models more expressive?
Yes, but you need a beefy GPU to use it with an AI model, since you won't want extra latency. I'll create a video for that.
@@techgiantt I think it would be amazing if they could act, expressing emotions (anger, sadness, sorrow, compassion, confidence, hesitation, shyness, embarrassment, bravado, whisper, fear, shout, laugh, etc.): moods and personality expressed via voice.
@@komakaze1 exactly my thought. Is there any other alternative out there, one that doesn't cost 20 bucks per month?
Try making it run with Ollama, Phi-3, and LLaVA for vision (or Phi-3 Vision) if possible. Also, where can we get the code? I also have a set of tools that can be used with Ollama and Phi-3; hit me up to talk about it if you want!!!
Use Wizard, it's better as an assistant. I'm looking for a project that shares the computer screen, and it describes it and helps with code.
good bro!
Thanks 🔥
Cool campus
I'd like to see more on how to achieve this. Is each agent a different LLM instance? Thank you!
I can complete the build and install, all good. But when I run Piper to generate speech, I cannot get past this: Error processing file '/usr/share/espeak-ng-data/phontab': No such file or directory. And macOS will not allow installing anything into /usr/share (yes, I know of the workaround for that, but I'm not willing to do it).
I mentioned a different way to install piper locally if you don't want to use Virtualenv, I think it's in the readme.md.
@@techgiantt THANK YOU! I missed that part.
I get an error when I build with make after running cmake -Bbuild -DCMAKE_INSTALL_PREFIX=install: "make: cmake: No such file or directory" and "make: *** [all] Error 1".
When installing Piper phonemize?
No, that all seems to work. It is when building by using "make."
There's no dot after the "make" command. It's piper-phonemize that I set up with that command, after checking out to a new branch from a commit ID.
You can remove the build directory and also the install directory in the pp folder if you have issues with cmake.
Bro, you need to create your own model. First choose your model type (me, I use Mistral) and stay with it: merge your way up and train for all your use cases, hence becoming a powerful model. It's a personal AI in the end; you can do all your tutorials with your model, so do not spend money on outside models. I made the swahilibots to begin, but they also need training; I used my super model as the base, so it's just about giving the model enough samples in Swahili for the brain to convert itself. We cannot start from scratch (that wastes money too, bro). Good work on the tutorial! These are just components to add onto your model until you train it to perform these tasks internally. So if you create examples of functions, function calls, and completions in the prompts, the model will later use the Python functions you created inside the prompt in its other methods. Essentially, we can train the model to perform a single function that we create in Python, as long as we give it many examples of its execution (inputs and outputs, as well as requests for data), and the function will be recalled in the brain and used to produce the answer, like an internal app (so we can secretly train these functions in). We can use these function calls with their verbose outputs to create examples of their use, and use this verbose output as data to fine-tune the model on the task. huggingface.co/collections/LeroyDyer/swahilibots
Could you share the code please
Definitely, I'm currently creating a video breaking it down, because there are different ways to achieve this. Once I'm done I'll share the link to the repo here as well.
@@techgiantt Thank you brother, write me if you have time and get money ))
Could you share the code
github.com/brainiakk/langchain-structured-chat-agent
Is there a download link for the zip file?
Yup, the link to the GitHub repo is in the description
great bro!
Great job. Can you show how a PDF or TXT file can be uploaded and used instead of cutting and pasting or typing text? Most videos show short phrases, but if you want a paper or a document made into speech, how would you go about doing this? Thanks again.
Interesting 🤔 I might do a video on it, but it's as simple as adding a function that parses the text from the PDF or TXT and passes it on to the text-to-speech function. It can be passed in chunks, maybe paragraph by paragraph. You could also accept the document's file path as input when you run the Python script, or make it more lively by using the TTS function to say "please provide the file you want me to read" while a file dialog opens your file explorer. It all depends on what you want, but it's possible.
What operating system are you running? Will this work for Windows?
I'm using OSX (Apple Macbook), I think it should work fine on windows
@@techgiantt Appreciate the response! I'm going to give it a shot. Subbed!
@@justindaniels5923 Thanks for the sub
Thanks for the library that made it possible for us to do Llama 3 ❤
Hello, first of all, thank you for the tutorial. Currently there are not many out there 🙂 But when running, I get this error: Loaded checkpoint 'modules/openvoice/checkpoints_v2/converter/checkpoint.pth' missing/unexpected keys: [] []. Any idea what might be wrong? Thank you!
Did you set the directories up exactly as I did? Also, make sure you copied the downloaded checkpoints_v2 folder to the openvoice directory properly and if you did all that, you could go to their huggingface page and redownload the Openvoice V2 converter/checkpoint.pth to replace the old one. I'll add their huggingface link in the description.
Hold on, do you mean: [ ] ? Because what I'm seeing in your comment is this: [] []
@@techgiantt Yes, exactly! Maybe a copy and paste issue.
@@eucharisticadoration Then that's not an issue. I don't know why they didn't hide that output in this version. Just let it run; if there were a missing key, it would be written in the square brackets like a list, but since they're empty, that means everything is okay.
@@techgiantt Ok, thank you very much! Finally I realized that I had to change some more things (paths) in voice.py, and now it is running 🙂
Great. Thank you.
You are welcome
The video is really dim, unviewable.
My apologies; I think something went wrong in post-production. I'll keep that in mind when uploading the next video, because I'll be updating the codebase. Thanks for letting me know.
hey, cool video! do you have source code published somewhere?
github.com/brainiakk/youtube/tree/main/llama3-function-calling
@@techgiantt Thanks, much appreciated :)