The BEST, Local Text-to-Speech Generator - AI Voice Cloning (Tortoise TTS)

  • Published: Aug 23, 2024
  • If you wanna take a look: git.ecker.tech...
    Running the Voices - 3:00
    TTS for LLMs Talking - 7:15
    Audio Book Reader - 8:15
    Come join The Learning Journey!
    Discord - / discord
    Github - github.com/Jar...
    TikTok - / jarodsjourney
    If you found anything helpful, please consider supporting me and the content I am trying to produce!
    www.buymeacoff...
    Hardware for my PC:
    Graphics Card - amzn.to/3pcREux
    CPU - amzn.to/43O66Ir
    Cooler - amzn.to/3p98TwX
    RAM - amzn.to/3NBAsIq
    SSD Storage - amzn.to/42NgMFR
    Power Supply (PSU) - amzn.to/3NBAsIq
    PC Case - amzn.to/447499T
    Mother Board - amzn.to/3CziMXI
    Alternative prebuilds:
    Corsair Vengeance i7400 - amzn.to/3p64r22
    MSI MPG Velox - amzn.to/42MnJHl
    Cheapest and minimum specs recommended:
    Cyberpower 3060 - amzn.to/3XjtZoP

Comments • 166

  • @waldevv
    @waldevv 1 year ago +11

    Glad this sort of thing is available locally. A lot of the AI stuff is awesome but 11labs for example has pretty steep character limits if you really want to mess around. Messed around with the starter version of 11labs for quite a while and the 40000 characters per month really isn't that much
    I think we're not far from being able to run all the crazy AI models out there locally, I mean of course a supercomputer can always work better but at least we're not doomed with api token costs for everything

    • @Jarods_Journey
      @Jarods_Journey  1 year ago +1

      Totally agree, I love the quality of 11labs, but if you're innovative with local stuff, we can now get very close to 11labs quality.

  • @BobDoyleMedia
    @BobDoyleMedia 1 year ago +6

    Wow, so glad you made this! I used Tortoise a lot a while back with the original colab. I LOVE that there is a new interface. This is the find of the day!

    • @Jarods_Journey
      @Jarods_Journey  1 year ago

      Appreciate it! The GUI fork opens up a lot of knobs and values previously hidden, so it's definitely pretty awesome in my eyes!

    • @BobDoyleMedia
      @BobDoyleMedia 1 year ago

      @Jarods_Journey took me a bit to get it going, but for whatever reason, all of these installations take forever on my system, even though I’ve got a fast GPU and lots of RAM. Anyway, I’m awaiting your video on how to train; meanwhile I am using the quick and dirty utilities to create quick voice clones and it’s working great.

    • @Jarods_Journey
      @Jarods_Journey  1 year ago

      @BobDoyleMedia that's awesome haha! If you're already training voices, I'm not sure I can provide much additional value beyond what's on the GUI. If anything, the author was gracious enough to write up what almost everything means, and it's quite comprehensive. I'd take a look at that if I were you to get a head start!

    • @BobDoyleMedia
      @BobDoyleMedia 1 year ago

      @Jarods_Journey Well, I'm not so much training voices as I am dragging and dropping a :60 clip into the Utilities area and creating a quick voice like we did with the old Tortoise. I haven't trained anything with the Train tab.

    • @Jarods_Journey
      @Jarods_Journey  1 year ago

      @BobDoyleMedia Gotcha gotcha! Well, tutorial soon, so we'll see how that goes

  • @mattfarmerai
    @mattfarmerai 1 year ago +6

    Great video man. Inspired me to jump into Tortoise and make a short reviewing it and teaching people how to set it up. I just used demo voices, but your examples are amazing.

    • @Jarods_Journey
      @Jarods_Journey  1 year ago +1

      Glad I could inspire ya, the short, 1 minute tutorials being nice and concise are perfect xD!
      The one used here is a forked version of neonbjb's that incorporated training into it, allowing for the examples you heard.

  • @Userguy931
    @Userguy931 1 year ago +8

    Awesome video! Can't wait for the guide.

  • @darksydeflow
    @darksydeflow 1 year ago +15

    Hey man, can you do more videos on the different/best ways to do Speech-to-Speech voice cloning, and how to train custom models? Some of the current workflows are kind of complicated/inconsistent, and I'm hoping that the tech becomes more streamlined in the near future with better GUIs and workflows.

    • @Jarods_Journey
      @Jarods_Journey  1 year ago +5

      The workflows are kinda clunky right now, and when errors are encountered it's even harder to debug, but some things are in the pipeline. I'm currently testing and working with RVC, so that'll be the next thing

    • @LucidFirAI
      @LucidFirAI 1 year ago

      Speech to speech is way easier now at least, RVC kicks ass :) I'm going backwards and trying and struggling to learn how to do TTS ruclips.net/video/qZ12-Vm2ryc/видео.html&ab_channel=p3tro

  • @jokermitsu7863
    @jokermitsu7863 1 year ago +3

    A Gradio UI guide for voice clone training is a must-have.
    Thanks for these videos, they help a lot.

  • @Neamerjell
    @Neamerjell 2 months ago

    I've been toying with the idea of creating an AI assistant and have been dismayed at the amount of data I would need to send to multiple remote servers just to make it work. I found your video and "Dude! This can mimic any voice you train it on and it runs LOCALLY??? **Gasp** The possibilities..." The audio book functionality is particularly intriguing.

  • @MrKiingpin
    @MrKiingpin 1 year ago +5

    I'd love a guide. I'm used to using Tortoise-TTS through cmd, and training voices using the version in your description is hard for me, but I think it would make my voices better

    • @Jarods_Journey
      @Jarods_Journey  1 year ago

      Finished the latest tutorial on Tortoise-TTS GUI, let me know what you think! It's pretty fantastic!

  • @RobertJene
    @RobertJene 1 year ago +1

    2:55 I will be back for the setup video!

  • @micahblakeslee
    @micahblakeslee 7 months ago

    Thanks for all of this! As a writer, it's really useful for me to get good TTS audio readings for proofing. TTS will just read your typos straight, unlike me when I read my own work, so it's a cool tool to have. That this is available to make the voice listenable, unlike most TTS apps, and that it's free is really great.
    Also, at 4:50, I'm on a 1070 and am able to get this to work. It's not even that slow. On a 1070, I was able to generate a reasonable-quality 20-minute reading in about 1-2 hours with your audiobook tool.

  • @rettbull9100
    @rettbull9100 1 year ago +1

    You need longer audio samples for ElevenLabs; 4 secs usually isn't long enough. You need about 10 seconds, and with the slow pace of the speaker maybe even longer. Also, you can do the same with changing and adding words.
    Edit: but EL doesn't have the training. I will have to look at the TTS.

    • @Jarods_Journey
      @Jarods_Journey  1 year ago

      Agreed, but even still, on longer samples, it messes up on the prosody (a fancy new word I learned lol).
      Because you can "train" the tortoise models, it allows for more accuracy in this area which is what I've come to find. I also spent like 5k characters just trying to get a sentence sounding correct on Eleven Labs, which is definitely not cost efficient 😅😂

  • @Alfita04
    @Alfita04 1 year ago +2

    Now, I want to do it too! Thanks for your work!

  • @beetlejuss
    @beetlejuss 1 month ago

    Tortoise samples sounded more expressive but there is a weird vibrato in some cases that is very artificial...

  • @Avitarzims
    @Avitarzims 1 year ago

    Nah ngl I saw ur wallpaper and immediately had this massive sigh of relief and went okay this guys based

  • @LiteLiger
    @LiteLiger 1 year ago +1

    What's the status on non-English languages? Can I feed it samples of a Chinese voice actor so that it will use that voice for English?

  • @WEPNewsEntertainment
    @WEPNewsEntertainment 1 year ago +5

    Sorry, but this does not even come close to 11 labs

    • @Jarods_Journey
      @Jarods_Journey  1 year ago +2

      Eleven Labs vs Tortoise TTS

    • @chaos0987654321
      @chaos0987654321 4 months ago

      You really went "nuh-uh"
      It's supposed to be a meme
      Not genuine discourse💀🤣

  • @thivu5793
    @thivu5793 6 months ago +1

    Hi Jarod, awesome video! I'm new to all this and I'm hoping to learn more about Tortoise TTS. It would be nice if you made a video to help beginners install Tortoise!!

    • @aryanhussain7583
      @aryanhussain7583 13 days ago

      Bro, I saw your comment is 6 months old, but I am new now. Tell me how to install and use it. Can you share anything that helped you the most?

    • @thivu5793
      @thivu5793 13 days ago

      @aryanhussain7583 I got it installed but I had some minor issues. I followed this YouTube video for installation. ruclips.net/video/hOSsGOmDC3w/видео.html

  • @rickarroyo
    @rickarroyo 1 year ago

    I really liked the result, it even encouraged me to learn how to use it :))
    I think the advantage of EL is the ability to generate in multiple languages.
    I tested it in Portuguese (which is a bit complicated) and it worked very well
    Thanks!

    • @Jarods_Journey
      @Jarods_Journey  1 year ago

      Indeed, it's much more straightforward using EL and the results are still really awesome. Looking right now to see how to use different languages with Tortoise TTS so hopefully I can get that working.

    • @igorc.alonsorocha
      @igorc.alonsorocha 1 year ago

      Portuguese is my country's language, so I will test it too.

  • @dm4life579
    @dm4life579 1 year ago

    That's more than acceptable audio quality and honestly verges on the outstanding.

    • @Jarods_Journey
      @Jarods_Journey  1 year ago

      I agree, and I've been playing around with new techniques that further it even more :O

  • @canberkguitar-sg6qu
    @canberkguitar-sg6qu 4 months ago

    It’s good, but as you can hear it’s a little bit robotic and idk how we can fix it 😅

  • @greenockscatman
    @greenockscatman 1 year ago

    Tortoise TTS is very solid! The only downside is the models eat up your HD space quickly, with each snapshot clocking in at 3 gigs!

    • @Jarods_Journey
      @Jarods_Journey  1 year ago +1

      Yuuuppp, which is why I really only save the latest model and delete all of the checkpoints etc. Eats up about 2GB per model for me

    • @androidgameplays4every13
      @androidgameplays4every13 1 year ago

      @Jarods_Journey how do you delete the checkpoints?
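
For anyone with the same question, here is a minimal Python sketch of the "keep only the latest model" housekeeping described above. It is not the GUI's own cleanup feature: the `*.pth` pattern and the idea that all snapshots sit in one folder are assumptions for illustration, so check your own training output directory first. It defaults to a dry run so nothing is deleted by accident.

```python
from pathlib import Path


def prune_checkpoints(directory, pattern="*.pth", keep=1, dry_run=True):
    """Return every checkpoint except the `keep` newest; delete them unless dry_run.

    `pattern` is a hypothetical file pattern, adjust it to your own snapshots.
    """
    # Newest first, judged by file modification time.
    files = sorted(
        Path(directory).glob(pattern),
        key=lambda p: p.stat().st_mtime,
        reverse=True,
    )
    stale = files[keep:]
    if not dry_run:
        for path in stale:
            path.unlink()
    return stale
```

Call it once with the default `dry_run=True` to see what would be removed, then again with `dry_run=False` once the list looks right.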

  • @RobertJene
    @RobertJene 1 year ago +1

    4:49 I am curious how much GPU VRAM is consumed during a generation.
    You can check by monitoring Task Manager -> Performance -> GPU memory

    • @Jarods_Journey
      @Jarods_Journey  1 year ago +1

      100% ish, depends. I've clipped my audio samples so that they fit within the voice chunks option, which I set to 0. If you don't, you have to increase the voice chunks in order to chunk the audio

    • @RobertJene
      @RobertJene 1 year ago

      @Jarods_Journey I use a Python script I made that shows me the number of gigabytes in use, so I can gauge "Oh, this Stable Diffusion thing, I can still edit videos while it runs"
      ~or~
      "I have room to increase the batch size maybe"

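A script like the one described could look roughly like the sketch below. This is not the commenter's actual script, just one way to do it, assuming an NVIDIA card with `nvidia-smi` on the PATH. The function names are made up for illustration; the `--query-gpu` and `--format` flags are standard `nvidia-smi` options that report memory in MiB.

```python
import shutil
import subprocess

QUERY = [
    "nvidia-smi",
    "--query-gpu=memory.used,memory.total",
    "--format=csv,noheader,nounits",
]


def parse_memory_csv(text):
    """Turn lines like '8123, 24564' (MiB) into (used_gib, total_gib) tuples."""
    readings = []
    for line in text.strip().splitlines():
        used_mib, total_mib = (float(part) for part in line.split(","))
        readings.append((used_mib / 1024, total_mib / 1024))
    return readings


def gpu_memory():
    """Query every visible NVIDIA GPU; returns [] when nvidia-smi is not installed."""
    if shutil.which("nvidia-smi") is None:
        return []
    result = subprocess.run(QUERY, capture_output=True, text=True, check=True)
    return parse_memory_csv(result.stdout)
```

Looping over `gpu_memory()` while a generation runs gives one `(used, total)` pair per GPU, which is enough to judge whether there is headroom for a bigger batch size.
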
    • @Jarods_Journey
      @Jarods_Journey  1 year ago +1

      Smartttt. I usually do one or the other though, if I'm training a model, that means now I get time to chill instead of editing a video haha.

  • @RobertJene
    @RobertJene 1 year ago

    1:20 Eleven labs is smoother / less glitchy sounding but also less emoted

    • @Jarods_Journey
      @Jarods_Journey  1 year ago

      Yup yup, it produces a much more "refined" voice, however, it's a little too clean. It loses the emotion and emphasis, as well as any dialect you might feed it.

  • @reyvn77
    @reyvn77 10 months ago

    Thy strength befits a crown!

  • @M4rt1nX
    @M4rt1nX 1 year ago +1

    Can't wait

  • @Bicyclesidewalk
    @Bicyclesidewalk 8 months ago

    Tortoise sounds so much better~

  • @TheSheakhaamir
    @TheSheakhaamir 1 year ago

    REALLY APPRECIATED, WONDERFUL WORK.

  • @Denno876
    @Denno876 6 days ago

    So if you're not a big coder etc. etc., how do you install this simply?

  • @AiDominance1
    @AiDominance1 1 year ago

    That honda accord part 🤣🤣

  • @cloudboysmusic5223
    @cloudboysmusic5223 1 year ago

    "Have you heard of the high elves?"

  • @Mart-E12
    @Mart-E12 1 year ago

    Cool Rem wallpaper yo

  • @smarthalayla6397
    @smarthalayla6397 1 year ago

    Thanks for sharing this with us. Please explain to us why all of this AI software does not come with an exe file that can be double-clicked and just work, like other portable or non-portable software that runs locally on the computer without a need for internet or a graphics card. Many of us do not have the programming knowledge to build an exe file that can be double-clicked and run locally on the computer.

  • @gu9838
    @gu9838 10 months ago

    nice to know i might "cheat" and "clone" voices from eleven labs lol

  • @giuseppedaizzole7025
    @giuseppedaizzole7025 2 months ago

    Hi... is there any way to control the speed of the generated audio? Thanks

  • @LucidFirAI
    @LucidFirAI 1 year ago

    Do you have a video tutorial for setting up Local LLM? I played around briefly with Kobold and didn't get very far. I've had success with a bunch of things but local LLM and TTS are proving challenging.

  • @SpudHead42
    @SpudHead42 1 year ago

    What would you suggest using to create AUDIOBOOKS? Every system seems to have very limited output lengths.

  • @aniversext
    @aniversext 6 months ago

    Will it work on other languages?

  • @justwhatever9217
    @justwhatever9217 1 year ago

    Very cool stuff! Do you have a link to the AI Audio Book Generator? Is there a way I could give the generator a full novel in a .TXT file and it automatically generates a .WAV or .MP3 of the full novel without having to run the sentences one at a time? Thanks!

    • @Jarods_Journey
      @Jarods_Journey  1 year ago

      Yup! I am working on a release for this, so hopefully when I find the time, I can both upload it and post a YouTube video about it
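
Until that release is out, the first step of a "whole novel in, audio out" workflow can be sketched in Python like this. It is not the audiobook tool from the video, only an assumption about how such a tool might prepare its input: split the text into sentences and pack them into chunks short enough for a TTS model to handle one at a time. The function name and the 200-character limit are arbitrary choices for illustration.

```python
import re


def split_into_chunks(text, max_chars=200):
    """Pack whole sentences into chunks of at most max_chars characters.

    A single sentence longer than max_chars is kept whole rather than cut mid-word.
    """
    # Split after sentence-ending punctuation followed by whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    chunks, current = [], ""
    for sentence in sentences:
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > max_chars:
            chunks.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks
```

Each chunk would then be sent to the TTS engine in turn and the resulting clips joined into one file, so nobody has to paste sentences in by hand.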

  • @YannMetalhead
    @YannMetalhead 4 months ago

    Good video!

  • @ksk5058
    @ksk5058 1 month ago

    Hey, I want that bookmaker repo

  • @zafkieldarknesAnimation
    @zafkieldarknesAnimation 1 year ago

    Hello, please help me with this error. When I start training, I get:
    (result, consumed) = self._buffer_decode(data, self.errors, final)
    UnicodeDecodeError: 'utf-8' codec can't decode byte 0x84 in position 0: invalid start byte
    If you use the learning rate scheduler (calling scheduler.step()) before the optimizer’s update (calling optimizer.step()), this will skip the first value of the learning rate schedule. If you are unable to reproduce results after upgrading to PyTorch 1.1.0, please check if you are calling scheduler.step() at the wrong time.

  • @producer8587
    @producer8587 1 year ago

    Where is the how to actually make the model to run? I have models but how the hell do you get to this step. There’s no set up video still.

    • @Jarods_Journey
      @Jarods_Journey  1 year ago +1

      Skip to 17:00 of this video here: ruclips.net/video/6sTsqSQYIzs/видео.html&lc=UgwYIku4QJPhO6nPvgR4AaABAg

    • @producer8587
      @producer8587 11 months ago

      @Jarods_Journey thanks. Do you have the link for the Google Colab to start the Gradio program? Mine only has 3 tabs and I can’t seem to find the way to get the TTS part to work. It only clones my voice or an a cappella; I can’t seem to do TTS. It always just says error or "this is a sample, hope you like my voice model". If you have the Colab link for the one in this video to load my models in, that would be amazing. Great work, I just wish I was a tech genius like you. It’s annoying

  • @candyman3537
    @candyman3537 4 months ago

    Is Tortoise supporting multiple languages?

  • @realjgerard
    @realjgerard 1 year ago

    Awesome video! Quick question: If my goal was to create music (singing) with AI trained voices, could I use Tortoise TTS for that? And, second quick question lol, is this best achieved natively with a PC or could I use my non-M1 MacBook Pro? Thanks again for the great video! 🙏🏾

    • @Jarods_Journey
      @Jarods_Journey  1 year ago +1

      Singing uses completely different architectures ATM, so you have to use SVC for singing. As for the computer question, it really depends on your system's specs! I believe it has to be Nvidia due to CUDA for the speed, but I can't verify for other systems 😅

    • @realjgerard
      @realjgerard 1 year ago

      @Jarods_Journey Thank you soooo much!

    • @igorc.alonsorocha
      @igorc.alonsorocha 1 year ago +1

      RVC is recommended

  • @joantheringo1192
    @joantheringo1192 2 months ago

    I need Ranni's voice telling the lore of Elden Ring :D

  • @jason_v12345
    @jason_v12345 1 year ago

    I don't quite understand how this experiment works when Eleven Labs provides so many adjustable parameters.

    • @mrschneebly85
      @mrschneebly85 1 year ago

      It doesn't. You only have 2 parameters on EL and both are garbage tbh.

    • @Jarods_Journey
      @Jarods_Journey  1 year ago

      This repository gives you access to a lot of parameters that are hidden from user facing interfaces. Makes it much more customizable, which is how you can narrow it down to be better than EL imo

  • @kolimuttai
    @kolimuttai 8 months ago

    Is it possible to train my voice into this, give it a script, and export the scripted voice as an mp3?

  • @Foryourkids123
    @Foryourkids123 1 year ago

    Hi, can you create a tutorial on how to install the TTS GUI? Thank you

  • @ckm5528
    @ckm5528 1 year ago

    I have models that I downloaded from Hugging Face meant for RVC. Do they not work on Tortoise TTS? If they do, have you done a video tutorial on how to utilize them? Thanks

  • @testales
    @testales 1 year ago

    Really impressive, even more so when the two were talking to each other! Any chance you make your trained models downloadable? Btw. I also got a few of those delete-me-folders. :-D

    • @Jarods_Journey
      @Jarods_Journey  1 year ago +1

      Appreciate it! Can't share the models unfortunately, but I will be sharing on how you can train your own model. The deleteme directory is becoming its own directory by now lmao, but at least I know I won't feel bad if I delete it 😂

    • @testales
      @testales 1 year ago

      @Jarods_Journey It was worth a try, I mean "Traveller from beyond the fog. I am Melina." That's just EPIC and I'd like to have an epic assistant. ;-) Though I'm afraid that there's no way yet to plug Tortoise TTS into Oobabooga or Silly Tavern. I'd really like to dive into scripting AI stuff but my hamster wheel keeps me too busy and there is sooo much to learn. So aside from some minor fixes in the Stable Diffusion batch processing I couldn't do much so far. Btw. my delete-me folders are actually the successor of my various temp folders, because every time I intended to do some clean-up I'd already forgotten if there was something important in them, and I'm somewhat of a data messy. ;)

  • @AMDSTT
    @AMDSTT 1 year ago

    I want you to help me. I need to clone my voice to use it in Excel tutorial videos so that it speaks English well. Which GitHub repo do you recommend?

  • @jonathaningram8157
    @jonathaningram8157 1 year ago

    What is this video player?
    Anyway, that tech is insane. I imagine modders or indie game devs being able to create fully voiced custom quests in games.

    • @Jarods_Journey
      @Jarods_Journey  1 year ago

      Yeah, 100% crazy. The video player is mpv: mpv.io/

  • @muoity4418
    @muoity4418 1 year ago

    How many languages does the Tortoise tool support? I live in Southeast Asia

  • @scottfossil7731
    @scottfossil7731 1 year ago

    Thanks for the video. The instructions were very clear and it was easy to set up. Despite training and adjusting the settings I don't think it's quite as accurate as 11Labs, maybe 75% there. Have you shared the code for the audiobook reader?

    • @Jarods_Journey
      @Jarods_Journey  1 year ago +2

      Clarity wise, 11L all the way as it's much more crisp. But for prosody... Tortoise TTS does a much better job at this (though less clear)
      I'll get to uploading the audiobook reader to GitHub today, thanks for the reminder 😅

    • @wenwu669
      @wenwu669 1 year ago

      @Jarods_Journey Thanks Jarod! But I can't seem to find the audiobook reader in your repo?

  • @themeisgames7784
    @themeisgames7784 6 months ago

    Please tell me: if I generate speech from text, can I use it for commercial purposes?

  • @brendanfernes61
    @brendanfernes61 9 months ago

    Hi Jarod, for the two AIs talking to each other, how do you get them to have no latency when responding to each other? When I use the 11 labs API, there are a few seconds of silence while the audio is being generated, then the AI starts talking.

    • @Jarods_Journey
      @Jarods_Journey  9 months ago

      Since everything is local, back when I did this my GPU was fast enough to keep up once the audio queue was full, meaning the audio was being generated faster than it could be played. It's gotta be done in some type of thread or queue so that the code isn't being blocked while audio is being read
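
The thread-or-queue idea above can be sketched in Python as a small producer/consumer loop. This is not the code used in the video: `synthesize` is a hypothetical stand-in for the real TTS call, and `play` is whatever playback function you supply. The point is only that generation runs in a background thread and fills a queue while the main loop plays clips in order.

```python
import queue
import threading


def synthesize(sentence):
    """Hypothetical stand-in for a TTS call; returns a fake 'audio clip'."""
    return f"<audio:{sentence}>"


def run_pipeline(sentences, play, max_buffered=4):
    """Generate audio in a background thread while the caller's `play` consumes it."""
    audio_queue = queue.Queue(maxsize=max_buffered)
    done = object()  # sentinel marking the end of the stream

    def producer():
        for sentence in sentences:
            audio_queue.put(synthesize(sentence))  # blocks when the buffer is full
        audio_queue.put(done)

    worker = threading.Thread(target=producer, daemon=True)
    worker.start()
    while True:
        clip = audio_queue.get()  # blocks until the next clip is ready
        if clip is done:
            break
        play(clip)
    worker.join()
```

As long as generation is faster than playback, the queue stays full and the listener only waits for the very first clip.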

  • @digivagrant
    @digivagrant 1 year ago

    Any chance you have tutorials on how to install tortoise on W10?

  • @gareththomas3234
    @gareththomas3234 1 year ago

    Hey, what happens if we run this on superior hardware? I am into publishing and want to make audiobooks, so I need hyperfast. Don't mind spending a bit on compute if the end product is cheaper than 11labs. Does it speed up on paid Colab plans?

    • @Jarods_Journey
      @Jarods_Journey  1 year ago

      It does indeed speed up on superior hardware, but I can't give you concrete numbers on by how much. You might wanna check out my video on the 3060 vs 4090: the 4090 only runs inference about 2x faster than the 3060, whereas its training speed is 4-5x faster.
      Beyond that comparison, I'm not sure where the bottleneck in the system is for faster speeds as I haven't dived that deep into it yet.

  • @christosmelissourgos2757
    @christosmelissourgos2757 5 months ago

    Hey there! Does it work for multilingual??

  • @ced.studios
    @ced.studios 1 year ago

    lost my sh*t on Honda Accord my man!!

  • @maroindefinitlyhuman6857
    @maroindefinitlyhuman6857 1 year ago

    Is there a way I can get the voices other people trained?

  • @dustinsuburbia
    @dustinsuburbia 11 months ago

    What is the inference speed? I know there are wild variables - but to use with a local chat llm using a 4090, is it anywhere near real time?

    • @Jarods_Journey
      @Jarods_Journey  11 months ago

      Depends on the size of the LLM, but you'll be getting 5-10 seconds of delay on speech with tortoise

  • @CaptainSnackbar
    @CaptainSnackbar 1 year ago

    I've been searching for an option to train a model for TTS

    • @Jarods_Journey
      @Jarods_Journey  1 year ago +1

      And this is it :)! A tutorial on its usage should be coming later this week

  • @21tribes46
    @21tribes46 1 year ago

    Do you know if it's possible to switch to different languages, like German? Thanks for the video!

    • @Jarods_Journey
      @Jarods_Journey  1 year ago +1

      It is, but takes a little bit of tokenization knowledge. I have no clue how to do that but you might wanna check out this channel here: youtube.com/@nanonomad
      I believe he does train in German

  • @naturalbeauty19964
    @naturalbeauty19964 1 year ago

    Dude, can you write in more detail how to install it? What do you need to download? I'm completely ignorant. I see you have Visual Studio Code; how do I install it, what language do I choose there, etc.? More details, please

    • @Jarods_Journey
      @Jarods_Journey  1 year ago

      More details granted: ruclips.net/video/6sTsqSQYIzs/видео.html

  • @RobertJene
    @RobertJene 1 year ago

    SMH I can't keep a straight face for "finger maidens" If I played that game I would make up NSFW lore as the game progressed

    • @Jarods_Journey
      @Jarods_Journey  1 year ago +1

      LOL well I'm sure you're not alone in that, probably entire reddit groups dedicated to Elden Ring Alt. Lore lol

    • @RobertJene
      @RobertJene 1 year ago

      @Jarods_Journey LOL I do that when I'm playing games or watching movies when I see something weird or funny

  • @kamalkamals
    @kamalkamals 1 year ago

    What about other languages?

  • @hemanthedaoo8601
    @hemanthedaoo8601 1 year ago

    Great video! I want to use Tortoise TTS instead of Eleven Labs because I don't want to pay. Could you help me implement or tell me how to implement Tortoise TTS in the AI assistant script? It will be very helpful. I am eagerly waiting for new videos. Have a nice day.

    • @Jarods_Journey
      @Jarods_Journey  1 year ago

      A video for that is in the pipeline, if you wanna see more you could try watching some of my Livestreams until I get the chance to talk about it more in a video

    • @hemanthedaoo8601
      @hemanthedaoo8601 1 year ago

      If I am unable to complete the task on my own, I will use your video as a last resort to succeed.

  • @AbuMan77
    @AbuMan77 1 year ago

    How much do you charge me for cloning a voice in Spanish?

    • @Jarods_Journey
      @Jarods_Journey  1 year ago

      Unfortunately, Spanish would require a bit of work for tokenization andddd I don't sell my services atm. I can point you in the right direction though; this guy seems to have done some tokenization for other languages: youtube.com/@nanonomad

  • @user-mohammwd
    @user-mohammwd 1 year ago

    Does it work with different languages such as Arabic, German, etc.?

    • @Jarods_Journey
      @Jarods_Journey  1 year ago

      It can train, but you need custom tokenizers

  • @TheRonoxcz
    @TheRonoxcz 1 year ago

    Hey man! Thanks for the great guide! I have a question, in case you or someone else knows how to train TTS on another language. I want to clone my voice in Czech so the TTS will also be in Czech. Can you point me in the right direction? Thanks a bunch!

    • @Jarods_Journey
      @Jarods_Journey  1 year ago

      You'll need a tokenizer and you'll have to train the model to understand a new language. Might wanna check out another guy called nano nomad who has done this

  • @luke5306
    @luke5306 1 year ago

    Any fixes for "CUDA out of memory" issue?

  • @oprelia-ai
    @oprelia-ai 1 year ago

    any other alternatives with better interfaces?

  • @igorc.alonsorocha
    @igorc.alonsorocha 1 year ago

    Is it better than RVC? That's my question.

    • @Jarods_Journey
      @Jarods_Journey  1 year ago

      They're completely different architectures and software unfortunately, so the comparison is nearly impossible to make. For text to speech, definitely my favorite that I've come across so far.

  • @hughsilva5655
    @hughsilva5655 1 year ago

    I have an idea for a new video. Well, since I'm an absolute dunce, could you make a video explaining all this AI voice stuff, and maybe a separate one for AI image stuff as well?
    Like, I know squat about coding, and most of these things seem to require some level of coding to truly understand, with all the errors and GitHub nonsense. Maybe you could explain how GitHub or similar sites work, since whenever I go to one of those sites, I don't see a big glowing shiny button that says "DOWNLOAD" so I'm instantly lost. And when I finally download something, there are never instructions on how to actually set up the program. With this one, for example, I downloaded it but cannot figure out how to start it up on my own.
    Perhaps you can also explain how you learn and know how to do this stuff, which to you seems like second nature, but to dummies like me is like solving calculus equations. Maybe also explain how to make the voice training sets for these voice AIs too? I don't really know how to go about doing something like that, or whether I need a specific file to store them in or something else.

    • @Jarods_Journey
      @Jarods_Journey  1 year ago +1

      The step-by-steps on my channel will come, they just take many more hours to make than the showcase ones (which is this one). I didn't show it here, but the author of the Git.ecker page was friendly enough to give us a step-by-step guide to follow: git.ecker.tech/mrq/ai-voice-cloning/wiki/Installation. You better bet I asked ChatGPT plenty of questions about what many of those things meant.
      We all start from somewhere! In the beginning, everything was quite grueling and rough, and it took a lot of time to get used to these things, but the hard work I put in has seemingly paid off! With the free availability of ChatGPT, I really recommend you just ask it anything and everything that is confusing, and then ask it to simplify things even more. I like to ask it "Explain it to me like I'm a 5 year old" and this usually helps me get the gist of topics quicker than normal so that I can have a relative understanding of how it works. Haven't gotten around to my workflow for learning new things yet, but I plan to so that others might be able to follow suit!

    • @hughsilva5655
      @hughsilva5655 1 year ago

      @Jarods_Journey Thank you so much for the feedback. Yeah, I somehow forget how useful ChatGPT is, so thank you for reminding me. It really helped me out when I was struggling in my classes so I should've thought of using it in this situation lol.

    • @Jarods_Journey
      @Jarods_Journey  1 year ago

      @hughsilva5655 Absolutely, the tool is a godsend. Helps so much with literally EVERYTHING 😂

  • @limbertacha4742
    @limbertacha4742 11 months ago

    Is there any AI like this for singing?

  • @Donxzy
    @Donxzy 8 months ago

    Not bad video, but for a video about audio, your own audio is too sharp, kinda hurts my ears, (pop filter, noise gate or compressor should help)

  • @Sport90979
    @Sport90979 11 months ago

    Does it work with Indonesian voices, sir?

    • @Jarods_Journey
      @Jarods_Journey  11 months ago

      It would need an Indonesian tokenizer, as it would for any language other than English. That's a bit beyond what I've tried out so far.

  • @knmohitkumar7332
    @knmohitkumar7332 1 year ago

    Can we train a Hindi language model in it?

    • @Jarods_Journey
      @Jarods_Journey  1 year ago +1

      It's possible, but you need a custom tokenizer. Check out the issues area on the gitecker page or go check out nanonomad on youtube.

    • @knmohitkumar7332
      @knmohitkumar7332 1 year ago

      @Jarods_Journey thanks mate. I'll try.

  • @ParkerVVII
    @ParkerVVII 1 year ago

    Better than Eleven Labs? How?

    • @Jarods_Journey
      @Jarods_Journey  1 year ago

      If you're looking for matching prosody, Tortoise TTS does a much better job. If you're looking for clarity, well, Eleven Labs gets the job done. I've spent more than 50k characters in Eleven Labs just to get a voice sounding "right", so that is quite expensive tbh.

  • @nielsieboy19
    @nielsieboy19 1 year ago

    I use tortoise a lot and your voices sound really bad, why not turn up the iterations? You will get a much better audio quality that way

    • @Jarods_Journey
      @Jarods_Journey  1 year ago

      Bad :o, in what way?! Where is this sorcery for better settings 😂?
      No amount of iterations is gonna save my 2 minutes of training data 😂, in my testing, iterations above 100 don't add much to the voices. On a case of 10 samples each, 32 & 64 sounded better than 200 and 512. As well, the higher I go, the more prosody I seem to lose
      This seems to be more a dataset issue than settings imo. Crisp data samples are better than noisy samples, more is better, etc etc. For my use case processing for an additional 4-5 seconds is too long as I need to get it as close to realtime as possible

  • @mrlunatic2022
    @mrlunatic2022 1 year ago

    Noice

  • @fjccommish
    @fjccommish 8 months ago

    You have audio content, but you pollute it with bad background music. Awful.

  • @jasongray6698
    @jasongray6698 29 days ago

    Free?

  • @Mart-E12
    @Mart-E12 1 year ago

    Clean audio huh? I'm here trying to work with Shodan lol

    • @Jarods_Journey
      @Jarods_Journey  1 year ago +1

      Lol, cleaner the better xD. If not, it'll mimic the "noise" or reverb.

    • @Mart-E12
      @Mart-E12 1 year ago

      I'm just curious if it will preserve the glitches and noise in her voice

  • @ayaanm0min
    @ayaanm0min 1 year ago

    Is it completely free?

  • @korbpw
    @korbpw 1 year ago +1

    Better than EL? are you joking me?
    If you are talking about intonation, sure, but Tortoise sounds like shat

    • @Jarods_Journey
      @Jarods_Journey  1 year ago

      Prosody wise, Tortoise is better by a long shot. EL misses out a lot there. Clarity wise? EL is much higher than Tortoise TTS. My bank account wise? I think Tortoise takes the cake 😂.

    • @korbpw
      @korbpw 1 year ago

      @Jarods_Journey EL can pass as a real voice, Tortoise cannot.

    • @realtyrone
      @realtyrone 1 year ago

      @korbpw Buddy, EL is BASED off of Tortoise. They literally forked the project and made their own model for it... Not going to go into the technicalities, but the ONLY reason Tortoise sounds "bad" is that the author of the project willingly tuned it down to avoid misuse. Want to learn more? Use Google... It's not that hard.

    • @korbpw
      @korbpw 1 year ago

      @realtyrone Bruh, let me make my own project worse.

    • @DoubleBob
      @DoubleBob 1 year ago

      @korbpw Well, he did. Maybe he was paid by EL.

  • @linuxtuxvolds5917
    @linuxtuxvolds5917 1 year ago

    Tortoise is terrible, why would you actually recommend this

    • @Jarods_Journey
      @Jarods_Journey  1 year ago +1

      Free and quality is just fine ruclips.net/user/shortsKkm8ICjFG-k

    • @linuxtuxvolds5917
      @linuxtuxvolds5917 1 year ago

      @Jarods_Journey I've yet to see anything beat Eleven Labs; they are the best on the market right now until their source code gets leaked