F5-TTS: how to train a new language (best open-source text-to-speech)

  • Published: Jan 31, 2025

Comments • 61

  • @petrkolacek8958
    @petrkolacek8958 2 months ago

    Thank you. Your video helped me a lot. I had previously tried to train a language from scratch without success, so I'll try your guide.

  • @Agesilas2
    @Agesilas2 13 days ago +1

    19:39 that went so smoothly... 😂

  • @NineSevenPictures
    @NineSevenPictures 2 months ago +3

    Hello. Thanks for this very informative video, not to mention that accent straight from back home. ;-)

    • @RaspiAudio
      @RaspiAudio  2 months ago +3

      Link updated. In the latest version of F5-TTS, select "custom" in the web interface and enter these paths:
      MODEL_CKPT: hf://RASPIAUDIO/F5-French-MixedSpeakers-reduced/model_last_reduced.pt
      VOCAB_FILE: hf://RASPIAUDIO/F5-French-MixedSpeakers-reduced/vocab.txt
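
The hf:// paths above appear to follow an hf://<user>/<repo>/<file> pattern that points at a Hugging Face Hub repository. As a minimal sketch under that assumption (the helper name split_hf_path is hypothetical and not part of F5-TTS), such a path can be split into a repo ID and a filename, which is what you would pass to huggingface_hub.hf_hub_download(repo_id=..., filename=...) if you wanted to fetch the files yourself:

```python
def split_hf_path(path: str) -> tuple[str, str]:
    """Split 'hf://<user>/<repo>/<file...>' into (repo_id, filename).

    Hypothetical helper: assumes the first two path segments form the repo ID
    and everything after them is the file path inside the repo.
    """
    prefix = "hf://"
    if not path.startswith(prefix):
        raise ValueError(f"not an hf:// path: {path!r}")
    parts = path[len(prefix):].split("/")
    # Need at least user, repo, and one file segment, none of them empty.
    if len(parts) < 3 or not all(parts):
        raise ValueError(f"expected hf://<user>/<repo>/<file>: {path!r}")
    repo_id = "/".join(parts[:2])
    filename = "/".join(parts[2:])
    return repo_id, filename


if __name__ == "__main__":
    for p in (
        "hf://RASPIAUDIO/F5-French-MixedSpeakers-reduced/model_last_reduced.pt",
        "hf://RASPIAUDIO/F5-French-MixedSpeakers-reduced/vocab.txt",
    ):
        print(split_hf_path(p))
```

For example, the MODEL_CKPT path above splits into the repo RASPIAUDIO/F5-French-MixedSpeakers-reduced and the file model_last_reduced.pt.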

    • @NineSevenPictures
      @NineSevenPictures 2 months ago

      @@RaspiAudio Thank you very much.

  • @tomschelsen-qb5ls
    @tomschelsen-qb5ls 1 month ago

    Epoch -> "ipok" in proper English, for next time 😉 Thanks for the tutorial 👍

  • @Sockeye404
    @Sockeye404 1 month ago +2

    You can reference any custom model in the Gradio UI by clicking the "custom" radio button and entering the path in the "model:" text-input field (it looks like a dropdown, but it is actually a text field). No need to edit the path in the code. ;-)

    • @shailendrarathore445
      @shailendrarathore445 14 days ago

      I need help: how can I train on a song in any language and generate new lyrics with the same emotion, pitch, and aura as that song? Please let me know.

  • @AndrasEliassen
    @AndrasEliassen 2 months ago

    Thank you for this video - very informative! I laughed so hard at the mistake: "stupid female voice" 🤣 but I think it's probably safe from the "Internet police" 🚔
    I will use your tutorial to see if I can train a new language with this tech 👍

  • @FoXMaSteR001
    @FoXMaSteR001 17 days ago +1

    Hello, I'm using the regular version, but I don't understand how to save my progress. I uploaded many emotions, but if I close the window or shut down the PC, everything is lost and I have to do it all again. How do I save? We can talk in French. Thanks.

  • @Cocina_animal
    @Cocina_animal 2 months ago

    Great video and great explanation! I hope you do more tutorials like these in the future :) Would you say F5 is the best open-source TTS on the market?

    • @RaspiAudio
      @RaspiAudio  2 months ago +1

      I think so, as it is a bit more flexible than XTTS for adding different tones. By the way, I'm not associated with the F5-TTS team; I'm just a random guy trying to find a good TTS.

  • @TheMame82
    @TheMame82 2 months ago

    Thank you for this work. Your result seems closer to zero-shot voice cloning than the one Jarod trained in his video tutorial (he used ~10 hours from a single speaker). Just to get it right: were the 80k samples you used all from the same reader (single speaker)?
    This would mean:
    1) A few hours, single speaker --> the model speaks the new language, but only in the voice of the reference speaker from the training data
    2) Many hours, single speaker --> the model generalizes the new language (zero-shot capability)
    3) Many hours, multi-speaker, multilingual (as for the base model) --> proper voice cloning and code switching within a single text

    • @RaspiAudio
      @RaspiAudio  2 months ago +3

      @@TheMame82 It's hard to draw conclusions at this point, as there is not enough data. After training on 80k samples from one speaker for consistent learning, I'm now fine-tuning on 90k samples from multiple speakers, hoping it will help with zero-shot flexibility. I will publish the results.

  • @Burka_Tech6330
    @Burka_Tech6330 2 months ago

    I like your video, thank you.

  • @EdTimTVLive
    @EdTimTVLive 1 month ago

    Thank you very much. It works.

  • @naveennoelj
    @naveennoelj 2 months ago

    Good video, thanks for the contribution. One quick question: this is for adding a new language, but suppose you want to use it for voice cloning. How would that work?

  • @gabdofuturo
    @gabdofuturo 4 days ago

    Hm, sounds nice.

  • @Universeal13
    @Universeal13 27 days ago

    Well, for me it doesn't work. With a 1-hour audio file, it takes about 5 seconds, says training is complete, and doesn't create anything. I tried many times and reinstalled: same thing. I'm out of ideas.

  • @Deewayne94
    @Deewayne94 22 days ago +1

    How can I install this? Could you make a video about it, please?

  • @simin-f6t
    @simin-f6t 24 days ago

    Hello, thank you very much for sharing. I have a problem: F5-TTS cannot pronounce English abbreviations such as "AIGC" and "GPU". I would like to fine-tune it to fix this. Do you have any suggestions for data collection?

    • @olivier9529
      @olivier9529 24 days ago

      Try "G P U" or "A I G C" (with spaces).
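
The spacing trick can be applied automatically before the text is sent to the model. A minimal sketch, assuming that any word of two or more capital ASCII letters should be spelled out letter by letter; space_acronyms is a hypothetical name, and this is plain text preprocessing, not an F5-TTS feature:

```python
import re

# Whole words made of 2+ uppercase ASCII letters, e.g. "GPU", "AIGC".
_ACRONYM = re.compile(r"\b[A-Z]{2,}\b")


def space_acronyms(text: str) -> str:
    """Insert spaces between the letters of all-caps words: 'GPU' -> 'G P U'."""
    return _ACRONYM.sub(lambda m: " ".join(m.group(0)), text)


if __name__ == "__main__":
    print(space_acronyms("AIGC runs on a GPU."))
```

This also splits ordinary words typed entirely in capitals (e.g. "HELLO"), so in practice it may need an exception list.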

  • @sopriojang
    @sopriojang 1 month ago

    Can you help me? I get a device/cuda (empty) error when using the test model at 19:00.

  • @gabdofuturo
    @gabdofuturo 4 days ago

    Do we need a specific tokenizer for our language? I collected audio samples and tried to train Bark for Portuguese, but it didn't work because I needed a Portuguese tokenizer.

    • @pastuh
      @pastuh 1 day ago

      I think you need one for every language.
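
For F5-TTS specifically, folder names like french_char (mentioned in this thread) and the vocab.txt file linked in the model paths suggest a character-level tokenizer. The assumption here is that vocab.txt lists one symbol per line, so a new language mainly needs its characters covered. A minimal sketch of collecting those characters from your transcripts (build_char_vocab and vocab_to_text are hypothetical names; check the dataset preparation scripts in your F5-TTS version for the exact format they expect):

```python
import unicodedata


def build_char_vocab(transcripts: list[str]) -> list[str]:
    """Return the sorted set of characters used in the transcripts.

    Assumes a character-level tokenizer: one vocabulary entry per character.
    Text is NFC-normalized so 'a' + combining accent and precomposed 'á'
    end up as the same symbol.
    """
    chars: set[str] = set()
    for line in transcripts:
        chars.update(unicodedata.normalize("NFC", line.rstrip("\n")))
    return sorted(chars)


def vocab_to_text(vocab: list[str]) -> str:
    """Serialize as one symbol per line (assumed vocab.txt layout)."""
    return "".join(symbol + "\n" for symbol in vocab)


if __name__ == "__main__":
    vocab = build_char_vocab(["olá mundo", "ação"])
    print(vocab)
```

Comparing this list against the vocab.txt of the checkpoint you fine-tune from shows which characters of your language the base model has never seen.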

  • @Pacifier1222
    @Pacifier1222 2 months ago +1

    Hi! I'm currently training a French model on the Mozilla corpus of 800k files. I've completed 20 of 40 epochs. I'll keep you posted.
    However, F5-TTS has some bugs. I had to create folders such as "french" even though I had already created french_char.

    • @Pacifier1222
      @Pacifier1222 2 months ago

      I also have a sample of 8k files in Québécois French to make it more regional!

    • @RaspiAudio
      @RaspiAudio  2 months ago

      @@Pacifier1222 It would be really cool if you could train on top of my checkpoint; that way we could combine our efforts rather than starting from scratch each time.

    • @Pacifier1222
      @Pacifier1222 2 months ago

      @@RaspiAudio Actually, I had already done 20 epochs. I decided to do another 20 because I found the intonation on some words incorrect.
      I've already put a week into it, so I'd really rather not start over.

    • @RaspiAudio
      @RaspiAudio  2 months ago

      What hardware are you using?

    • @Pacifier1222
      @Pacifier1222 2 months ago

      @@RaspiAudio NVIDIA 3090, AMD 5950X, and 64 GB of RAM

  • @EdTimTVLive
    @EdTimTVLive 1 month ago

    Is the quality better if the larger .pt model file is used instead of last_reduced? Both seem to work; I'm just wondering.

    • @RaspiAudio
      @RaspiAudio  1 month ago +1

      Same quality, but only the larger one lets you continue training from it.

    • @EdTimTVLive
      @EdTimTVLive 1 month ago

      @ Got it. The quality seems fine; I've tried several voices. Thank you.

  • @tuannv9119
    @tuannv9119 24 days ago

    *This software has errors:*
    *1. If a sentence has no ending, it reads everything together without a break.*
    *2. It reads numbers incorrectly: for example, 23 is read as "2-3" rather than "twenty-three".*
    *I deleted it*
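
Reading "23" as "2-3" is a text-normalization issue, and a common workaround (not something this thread says F5-TTS does) is to spell numbers out before synthesis. A minimal English-only sketch for whole numbers from 0 to 999,999; int_to_words and spell_numbers are hypothetical names, and a library such as num2words covers more languages and cases:

```python
import re

_ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven", "eight",
         "nine", "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen",
         "sixteen", "seventeen", "eighteen", "nineteen"]
_TENS = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy",
         "eighty", "ninety"]


def int_to_words(n: int) -> str:
    """Spell out 0 <= n < 1_000_000 in English, e.g. 23 -> 'twenty-three'."""
    if not 0 <= n < 1_000_000:
        raise ValueError("only 0..999999 is supported in this sketch")
    if n < 20:
        return _ONES[n]
    if n < 100:
        tens, ones = divmod(n, 10)
        return _TENS[tens] + ("-" + _ONES[ones] if ones else "")
    if n < 1000:
        hundreds, rest = divmod(n, 100)
        head = _ONES[hundreds] + " hundred"
        return head + (" " + int_to_words(rest) if rest else "")
    thousands, rest = divmod(n, 1000)
    head = int_to_words(thousands) + " thousand"
    return head + (" " + int_to_words(rest) if rest else "")


def spell_numbers(text: str) -> str:
    """Replace standalone runs of 1 to 6 digits with English words."""
    return re.sub(r"\b\d{1,6}\b", lambda m: int_to_words(int(m.group(0))), text)


if __name__ == "__main__":
    print(spell_numbers("I have 23 cats and 105 dogs."))
```

Decimals and thousands separators ("3.5", "1,000") are not handled; each digit group would be spelled separately and come out wrong.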

  • @lullu3467
    @lullu3467 2 months ago

    Hello, I'd like to train the model on a very, very large dataset (LibriSpeech, which is over 100 GB). How could I do that in the cloud? I think streaming is complicated, and I didn't understand anything in the original training code...

    • @RaspiAudio
      @RaspiAudio  2 months ago +1

      If your audio files are already transcribed to text, you just need to put them in the right format; otherwise, run them through Whisper.
      I was thinking of making a video on doing this in the cloud, but training on 100 GB will be very expensive!
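
For the case where transcripts already exist: as far as I know, F5-TTS's custom-dataset preparation reads a pipe-separated metadata.csv with an audio_file|text header and one line per clip. Treat that layout as an assumption and check the dataset preparation script in your F5-TTS version, including whether file paths should be relative to the wavs folder. A minimal sketch that builds such a file from (file, transcript) pairs; make_metadata and the clip names are hypothetical:

```python
def make_metadata(pairs: list[tuple[str, str]]) -> str:
    """Build 'audio_file|text' lines (assumed F5-TTS custom-dataset layout)."""
    lines = ["audio_file|text"]
    for audio_file, text in pairs:
        # '|' is the field separator and each clip must stay on one line,
        # so replace pipes and collapse all whitespace (including newlines).
        clean = " ".join(text.replace("|", " ").split())
        if not clean:
            raise ValueError(f"empty transcript for {audio_file}")
        lines.append(f"{audio_file}|{clean}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(make_metadata([
        ("clip_0001.wav", "Bonjour tout le monde."),
        ("clip_0002.wav", "Ceci est un test."),
    ]), end="")
```

Writing the result to metadata.csv with encoding="utf-8" keeps accented characters intact.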

    • @lullu3467
      @lullu3467 2 months ago

      @@RaspiAudio I'd be happy to fund that. Would you be willing to train a multilingual, multi-speaker model (with a language token; I noticed the model struggles with cross-lingual synthesis...)?
      Do you have a contact?

    • @RaspiAudio
      @RaspiAudio  2 months ago

      @@lullu3467 Yes, you can use info@raspiaudio.com

  • @muhammadshawon54174
    @muhammadshawon54174 1 month ago

    Can I use or add the Bengali language to train this?

  • @321123580
    @321123580 2 months ago

    What computer specs are required to train the model?

    • @RaspiAudio
      @RaspiAudio  2 months ago +3

      I'm using an RTX 4090, but I would like to make a Google Colab so anyone could train in the cloud on a pay-per-use basis.

    • @321123580
      @321123580 2 months ago +1

      @RaspiAudio OK, thanks

  • @cyberbol
    @cyberbol 2 months ago

    How long do I need to record my voice? What do you think is the minimum amount of training data?

    • @RaspiAudio
      @RaspiAudio  2 months ago +1

      @@cyberbol The reference recording (the voice to clone) can be very short, around 10 s.
      But to train a new language, I think you will need at least 20 hours of audio.
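
To check whether a dataset reaches the rough 10 to 20 hours mentioned in this thread, you can sum the clip durations. A minimal sketch using Python's standard wave module, assuming uncompressed PCM .wav files (compressed formats such as MP3 would need another library); wav_seconds and total_hours are hypothetical names:

```python
import wave
from pathlib import Path


def wav_seconds(path: Path) -> float:
    """Duration of one PCM WAV file in seconds (frames / sample rate)."""
    # wave.open expects a filename string or a file object, not a Path.
    with wave.open(str(path), "rb") as w:
        return w.getnframes() / w.getframerate()


def total_hours(folder: str) -> float:
    """Sum the durations of all *.wav files under folder (recursively), in hours."""
    seconds = sum(wav_seconds(p) for p in Path(folder).rglob("*.wav"))
    return seconds / 3600.0
```

For example, total_hours("data/my_speak_char/wavs") would report the total for a project folder laid out like the one in the log further down this thread.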

    • @cyberbol
      @cyberbol 2 months ago

      @@RaspiAudio Ohh. Yes, I want to train, thank you. The problem with cloning is that it doesn't work for languages other than English and Chinese. I want to use Polish, so I don't have an option; I think I need to make a model.

  • @jonathanoostenbrink6783
    @jonathanoostenbrink6783 2 months ago

    I get this in my info:
    transcribe complete samples : 0
    path : C:\F5-TTS\F5-TTS\src\f5_tts\..\..\data\my_speak_char\wavs
    error files : 5

  • @MsBowner
    @MsBowner 26 days ago +1

    Baguette

  • @normioffi
    @normioffi 2 months ago

    Original French?

  • @SyamsQbattar
    @SyamsQbattar 2 months ago

    Unfortunately, it does not support Indonesian language.

    • @RaspiAudio
      @RaspiAudio  2 months ago +1

      Find large audiobooks or audio files totaling at least 10 hours in your language and train on them.

  • @mulagraphics
    @mulagraphics 2 months ago

    Don't waste your time, F5-TTS is horrible, I'm sorry.

    • @RaspiAudio
      @RaspiAudio  2 months ago +3

      @@mulagraphics It's not. What else do you recommend?

    • @bomar920
      @bomar920 2 months ago

      Actually, I trained a new language with under 2 hours of data. It's very good 👍. I don't know which script could do that

    • @christopherandrew1720
      @christopherandrew1720 2 months ago

      @@bomar920 Which language did you use? Was it single-speaker or multi-speaker?

    • @KUSHAGRAKUMAR-e2k
      @KUSHAGRAKUMAR-e2k 29 days ago

      @@bomar920 It's taking too much time.

    • @fabiano8888
      @fabiano8888 28 days ago

      Since he didn't propose a better alternative, I'd say he's just a troll trying to attract viewers to his channel by being controversial. Thanks for the video, RaspiAudio.