[Part 2] Voice Deepfake with Tacotron 2 for beginners tutorial

  • Published: 24 Dec 2024

Comments • 791

  • @TheLastSitcom
    @TheLastSitcom 2 years ago +70

    A lot of people are pointing out the TensorFlow problem. Apparently, Colab no longer supports the version of TensorFlow that Tacotron 2 runs on. After hours of searching, I found a solution on a message board somewhere. In "Download Tacotron," replace "%tensorflow_version 1.x" with "!pip install tensorflow==1.15". That got everything running smoothly, and I was able to train. The Synthesis notebook has another line that reads "%tensorflow_version 1.x", so I made the same replacement there and it ran fine. Hope this helps, folks.
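The replacement described in this comment has to be made in two notebooks, so it can be scripted. The sketch below is only an illustration: it assumes the notebooks are ordinary `.ipynb` JSON files with a `cells` list, and `patch_notebook` is a hypothetical helper name, not part of the tutorial. Whether TensorFlow 1.15 still installs on a current Colab runtime is not something the thread confirms.

```python
import json

OLD_MAGIC = "%tensorflow_version 1.x"
NEW_COMMAND = "!pip install tensorflow==1.15"


def patch_notebook(notebook: dict) -> int:
    """Swap the TensorFlow 1.x magic for a pip install in every code cell.

    Returns the number of lines that were replaced.
    """
    replaced = 0
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") != "code":
            continue
        source = cell.get("source", [])
        # nbformat stores cell source as a list of lines or as one string.
        lines = source if isinstance(source, list) else source.splitlines(keepends=True)
        new_lines = []
        for line in lines:
            if line.strip() == OLD_MAGIC:
                line = line.replace(OLD_MAGIC, NEW_COMMAND)
                replaced += 1
            new_lines.append(line)
        cell["source"] = new_lines
    return replaced


if __name__ == "__main__":
    demo = {
        "cells": [
            {"cell_type": "code",
             "source": ["%tensorflow_version 1.x\n", "import os\n"]},
        ]
    }
    print(patch_notebook(demo), "line(s) replaced")
    print(json.dumps(demo["cells"][0]["source"]))
```

For a real notebook, load it with `json.load`, call `patch_notebook`, and write it back with `json.dump`.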

    • @GiusePooP
      @GiusePooP 2 years ago +6

      I tried it and it worked, but for some reason I'm also having an issue with an "UnpicklingError: invalid load key, '

    • @TheLastSitcom
      @TheLastSitcom 2 years ago +1

      @GiusePooP I'm definitely not an expert here, but did you make sure to convert the wav files to npy? I've gotten unpacking errors lots of times before, and can usually find a solution on a message board somewhere.

    • @coldcase666
      @coldcase666 2 years ago

      THX

    • @coldcase666
      @coldcase666 2 years ago +1

      @GiusePooP Me too

    • @agreatchannelguys3263
      @agreatchannelguys3263 2 years ago +6

      I just replaced the "1" in "%tensorflow_version 1.x" with "2", ran it, and it worked. In the boring code part, though, I ran into errors that keep the code from running correctly, and I don't know how to solve them, so maybe replacing 1 with 2 is a bad idea.

  • @ethan.aproductions5159
    @ethan.aproductions5159 3 years ago +49

    This is the future of YTP sentence mixing.

    • @DirtyPotter
      @DirtyPotter 3 years ago +2

      I'm gonna use it and put myself out of a job

    • @DirtyPotter
      @DirtyPotter 3 years ago +4

      @8 haha yeah I was kidding. Nothing beats sentence mixing

    • @Moony_ultimate
      @Moony_ultimate 3 years ago +2

      _In fact, I just thought the same. It is more difficult, but if you plan to do a 30-minute YTP and can't cut it by hand, training an AI would not be a bad idea. Actually, I plan to use it to create videos like that. I don't do "YTP" but "YTPH", "YouTube Poop Hispano": basically YTP, but in Spanish. So I plan to do a YTPH of the president of Mexico, and training an AI is not a bad idea; it would save time._

    • @zenosyeetgalvus
      @zenosyeetgalvus 3 years ago +1

      shitposting 2077

  • @PhilXavierSierraJones
    @PhilXavierSierraJones 3 years ago +32

    I had an interesting result. I fed the entire dialog of Reinhardt from Overwatch into it (with the breath noises and other laughter removed), but the neural network really latched onto the remaining shouting parts. So the entire result is him just making various shouting noises. I thought it was funny, but it didn't quite work in the end.
    It was at Epoch 250, and I think the shouting noises are actually the vowel "ai". How fitting!

    • @PhilXavierSierraJones
      @PhilXavierSierraJones 3 years ago +1

      The graph seems to go between "line goes limp after a few pixels" and "kind of linear", and the loss fluctuates between 0.068635 and 0.069028. What is going on, and how do I tune this to move forward?

    • @sravanidandu794
      @sravanidandu794 3 years ago

      When I run check_dataset(params) I get this error: FileNotFoundError: [Errno 2] No such file or directory: 'wavs/1.npy'. I uploaded the files in wav format. It is renamed to npy. Help!

    • @vidyasagar8715
      @vidyasagar8715 3 years ago

      @sravanidandu794 convert the text file txt to npy

    • @7thday321
      @7thday321 3 years ago

      I really need help here. Can you walk me through the installation? His video does not look like what is in the training manual now. I understand Audacity and all that; I've been using it for years. I just can't figure out this program.

  • @jwknight
    @jwknight 3 years ago +15

    For anyone wondering how long this takes: around an hour, but you should train longer, 2 or 3 hours, for better quality. That applies to about 30 audio files with longer transcripts. It is wrong to assume that the number of iterations, the number of epochs, or the validation loss matters more than having a good, solid diagonal line on the graph. Don't stop until the graph is complete or near to it. You also want to train the validation loss as close to zero as possible, but if you overtrain, the model starts overfitting and the validation loss rises sharply above where you had brought it down to. Still, if you simply treat iteration 60 as the end of training, you likely won't be near where it needs to be. Every session is different.
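The overfitting behaviour described here (validation loss dropping, then climbing again) is what a patience-based stopping rule is meant to catch. The following is a minimal sketch under stated assumptions: the validation losses are collected by hand from the training output, and `should_stop` and `patience` are hypothetical names, not part of the Tacotron 2 notebook.

```python
def should_stop(val_losses, patience=3):
    """Return True once the validation loss has not improved for `patience` checks.

    val_losses: validation losses in the order they were reported.
    patience:   how many checks after the best value we tolerate before stopping.
    """
    if not val_losses:
        return False
    # Index of the lowest validation loss seen so far (first one on ties).
    best_index = min(range(len(val_losses)), key=val_losses.__getitem__)
    checks_since_best = len(val_losses) - 1 - best_index
    return checks_since_best >= patience


if __name__ == "__main__":
    history = [0.50, 0.30, 0.20, 0.15, 0.16, 0.18, 0.20]
    print(should_stop(history, patience=3))
```

The checkpoint worth keeping is the one saved at the best validation loss, not the last one. This rule complements the alignment graph the comment talks about; it does not replace looking at it.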

    • @elixstrations7147
      @elixstrations7147 3 years ago

      So you're saying to disregard the whole, "Stop training when the number gets to 0.15 or lower"? What we should be looking for instead is a diagonal graph?

    • @7thday321
      @7thday321 3 years ago

      I really need help here. Can you walk me through the installation? His video does not look like what is in the training manual now. I understand Audacity and all that; I've been using it for years. I just can't figure out this program.

    • @Nekotico
      @Nekotico 2 years ago

      How can you get the training save if it's in a Colab notebook? Where does it get saved? On Drive?

    • @tim3780
      @tim3780 2 years ago

      Can anybody help me with this error?
      FP16 Run: False
      Dynamic Loss Scaling: True
      Distributed Run: False
      cuDNN Enabled: True
      cuDNN Benchmark: False
      % Total % Received % Xferd Average Speed Time Time Time Current
      Dload Upload Total Spent Left Speed
      100 1555 100 1555 0 0 101k 0 --:--:-- --:--:-- --:--:-- 108k
      Warm starting model from checkpoint 'pretrained_model'
      ---------------------------------------------------------------------------
      UnpicklingError Traceback (most recent call last)
      in ()
      5 print('cuDNN Benchmark:', hparams.cudnn_benchmark)
      6 train(output_directory, log_directory, checkpoint_path,
      ----> 7 warm_start, n_gpus, rank, group_name, hparams, log_directory2)
      3 frames
      in train(output_directory, log_directory, checkpoint_path, warm_start, n_gpus, rank, group_name, hparams, log_directory2)
      275 os.path.isfile("pretrained_model")
      276 download_from_google_drive("1c5ZTuT7J08wLUoVZ2KkUs_VdZuJ86ZqA","pretrained_model")
      --> 277 model = warm_start_model("pretrained_model", model, hparams.ignore_layers)
      278 # download LJSpeech pretrained model if no checkpoint already exists
      279
      in warm_start_model(checkpoint_path, model, ignore_layers)
      133 assert os.path.isfile(checkpoint_path)
      134 print("Warm starting model from checkpoint '{}'".format(checkpoint_path))
      --> 135 checkpoint_dict = torch.load(checkpoint_path, map_location='cpu')
      136 model_dict = checkpoint_dict['state_dict']
      137 if len(ignore_layers) > 0:
      /usr/local/lib/python3.7/dist-packages/torch/serialization.py in load(f, map_location, pickle_module, **pickle_load_args)
      606 return torch.jit.load(opened_file)
      607 return _load(opened_zipfile, map_location, pickle_module, **pickle_load_args)
      --> 608 return _legacy_load(opened_file, map_location, pickle_module, **pickle_load_args)
      609
      610
      /usr/local/lib/python3.7/dist-packages/torch/serialization.py in _legacy_load(f, map_location, pickle_module, **pickle_load_args)
      775 "functionality.")
      776
      --> 777 magic_number = pickle_module.load(f, **pickle_load_args)
      778 if magic_number != MAGIC_NUMBER:
      779 raise RuntimeError("Invalid magic number; corrupt file?")
      UnpicklingError: invalid load key, '

  • @RespectTheJeff
    @RespectTheJeff 2 years ago +4

    I get "ValueError: Tensorflow 1 is unsupported in Colab." any ideas on how to get around this? I'm assuming this tutorial is just out of date, but not sure how to get past this step now.

    • @omerozer7000
      @omerozer7000 2 years ago

      I get the same error too, could you find a way to solve this?

    • @muffinman2134
      @muffinman2134 2 years ago +2

      @omerozer7000 You just need to change the first line of the code that says 'tensorflow_version 1.x' to 'tensorflow_version 2.x'

  • @Agnostic_Asi
    @Agnostic_Asi 2 years ago +21

    Tensorflow 1 is no longer supported in Colab. So this tutorial sadly doesn't work anymore!

    • @plejra
      @plejra 2 years ago

      Is it possible to download the code and make it locally on pc?

  • @BigDraco-So
    @BigDraco-So 3 years ago +7

    If your model is moaning, simply change False to True in the synthesis notebook (in the last cell, where you put the text in).

  • @OmriSama
    @OmriSama 3 years ago +34

    This is hands down the best video I've seen on how to get started with Tacotron2 and WaveGlow for people who aren't running Linux. Did you write these Colab Notebooks? They were really clear too.

    • @RS-tz9fu
      @RS-tz9fu 2 years ago

      in the notebook, there's a link to ruclips.net/video/LQAOCXdU8p8/видео.html which is not Cherry Studios's video so probably he didn't write the Colab notebooks.

    • @tonygosling2592
      @tonygosling2592 2 years ago

      @RS-tz9fu I've tried several times and keep getting an error at the {Check Data} cell: FileNotFoundError. I wish I could post a screenshot.
      What does this mean: list index out of range?

    • @RS-tz9fu
      @RS-tz9fu 2 years ago

      @tonygosling2592 It's possibly due to an error in the transcript text file of your dataset. Check whether the text file has any empty lines, or a line that contains only the path to the audio file or only the transcript.

  • @NowThatsAnime
    @NowThatsAnime 3 years ago +1

    Thank you man. I was looking for something like this for a year now.

  • @pansinquesopro
    @pansinquesopro 1 year ago +2

    Hey, I was using the bored button shown at 2:30 and it gives me this error: "AttributeError: module 'tensorflow' has no attribute 'contrib'". Any help?

    • @pansinquesopro
      @pansinquesopro 1 year ago

      Also, in the next 2 steps (name the model and set the parameters) it says: "name 'hparams' is not defined"

  • @TheDylandProductions
    @TheDylandProductions 3 years ago +16

    Yeah, I don't think you can get anything other than K80s anymore on colab... At least not on the free version. Supposedly (I don't have it, so can't test) even Pro+ members are getting K80s a lot too. :(
    Please help update this!!!

    • @Markiplier2
      @Markiplier2 2 years ago +1

      The synthesis notebook linked in this description now has the updated (working) one linked at the top, and as for the training notebook, it does actually still work with K80s.
      You probably realised that over the last 2 months, though.

    • @coruys2025
      @coruys2025 2 years ago

      Restart Runtime. System32/error.py

  • @eleanordare9350
    @eleanordare9350 2 years ago

    Brilliant description of how to make a model, thanks a lot

  • @TrenchMobbs
    @TrenchMobbs 2 years ago

    i got a runtime error in MEL spectrograms! any reason why?

  • @thegeek5890
    @thegeek5890 3 years ago +8

    Hello, I have a problem in colab [Errno 2] No such file or directory: 'wavs/2.npy' can you help me ?

    • @scottbarrett1073
      @scottbarrett1073 3 years ago

      I think I can help. The '.npy' file is what is created from your list.txt, so check that your 2.wav is named correctly and that your list.txt has the entry for 2 labeled correctly. Sorry it's been 3 months without a response. I'm stuck at the missing WaveGlow repository :(
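The list.txt checks suggested in this thread can be automated before running the dataset cell. This is a rough sketch that assumes each line of list.txt has the form `path|transcript`, the filelist layout used by NVIDIA's Tacotron 2 code; `find_filelist_problems` is a hypothetical helper, not a function from the notebook. A line with no `|` separator is one plausible source of the "list index out of range" error reported elsewhere in the comments.

```python
EMPTY = "empty line"
BAD_SEPARATOR = "expected exactly one '|' separator"
NO_PATH = "missing audio path"
NO_TEXT = "missing transcript"


def find_filelist_problems(lines):
    """Return (line_number, problem) pairs for malformed filelist lines."""
    problems = []
    for number, raw in enumerate(lines, start=1):
        line = raw.rstrip("\n")
        if not line.strip():
            problems.append((number, EMPTY))
            continue
        parts = line.split("|")
        if len(parts) != 2:
            problems.append((number, BAD_SEPARATOR))
            continue
        path, text = parts
        if not path.strip():
            problems.append((number, NO_PATH))
        if not text.strip():
            problems.append((number, NO_TEXT))
    return problems


if __name__ == "__main__":
    sample = ["wavs/1.wav|Hello there.\n", "\n", "wavs/2.wav\n", "wavs/3.wav|\n"]
    for number, problem in find_filelist_problems(sample):
        print(f"line {number}: {problem}")
```

To check a real file, pass `open("list.txt", encoding="utf-8").readlines()`. The sketch does not verify that the audio files exist; `os.path.isfile` on each path would cover the FileNotFoundError case.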

  • @dorjzodovsurenbatjargal224
    @dorjzodovsurenbatjargal224 3 years ago +13

    Is anyone else having a problem downloading the pre-trained WaveGlow model? I think the current link can no longer be used.

    • @robertovalentino70
      @robertovalentino70 3 years ago +2

      As said before, you have to download the WaveGlow model from the NVIDIA page, put it in your Drive, copy the model's share link (as you did for your trained model, making it accessible to anyone who has the link), and paste that part of the link into the Colab cell.

    • @JaxonPham
      @JaxonPham 3 years ago +2

      @robertovalentino70 I tried downloading the WaveGlow model from the NVIDIA page and it solved the problem, but now in the last "# load waveglow" part I'm getting:
      waveglow = torch.load(waveglow_pretrained_model)['model']
      waveglow.cuda().eval().half()
      for k in waveglow.convinv:
          k.float()
      KeyError: 'model'
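A KeyError on `'model'` means the loaded checkpoint is a dictionary without that key, so the first step is to look at what it does contain. The sketch below is an illustration only: `describe_checkpoint` is a hypothetical helper, and the idea that a differently packaged checkpoint keeps its weights under `'state_dict'` is an assumption that the thread does not confirm.

```python
def describe_checkpoint(checkpoint):
    """Return a short hint about how a loaded checkpoint object is laid out."""
    if not isinstance(checkpoint, dict):
        return "not a dict: the file may already be the model object"
    if "model" in checkpoint:
        return "has 'model': checkpoint['model'] should work"
    if "state_dict" in checkpoint:
        # Assumption: weights only, so the model must be built before loading.
        return "has 'state_dict': build the model first, then load_state_dict"
    keys = ", ".join(sorted(str(key) for key in checkpoint))
    return "unknown layout, keys: " + keys


if __name__ == "__main__":
    # In the notebook this would be called on the real file, e.g.
    #   describe_checkpoint(torch.load("waveglow.pt", map_location="cpu"))
    print(describe_checkpoint({"state_dict": {}, "config": {}}))
```

The replies below sidestep the layout question by pointing to a different checkpoint file that the notebook's `['model']` lookup accepts.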

    • @trickster444
      @trickster444 3 years ago +1

      @JaxonPham Hey, did you somehow solve this?

    • @JaxonPham
      @JaxonPham 3 years ago +4

      @trickster444 Try using 1rpK8CzAAirq9sWZhe9nlfvxMF1dRgFbF as your pasted WaveGlow code.

  • @jaogregori
    @jaogregori 2 years ago +2

    HELLLOO ARE YOU HERE???!?!?!? @Cherry Studios we need help. Why can't the program find "Load WaveGlow", and how can we fix this? When I click on the "Permission denied:" link, my browser opens a new tab that says "Not Found, Error 404".

  • @Jayk3M
    @Jayk3M 3 years ago +9

    NICE EXPLANATION, does it work with other languages?

    • @MrGTAmodsgerman
      @MrGTAmodsgerman 3 years ago +1

      I tried making an American voice speak a German text. It sounds the same as a real American trying to speak German. So I guess it does.

    • @Unknown-rx3br
      @Unknown-rx3br 3 years ago

      @MrGTAmodsgerman Is it possible with Russian too?

    • @MrGTAmodsgerman
      @MrGTAmodsgerman 3 years ago

      @Unknown-rx3br I never said it's possible. I said I guess so.

    • @kerokerr8138
      @kerokerr8138 3 years ago

      @Unknown-rx3br Yes, I've seen a video about it.

  • @cwdoby
    @cwdoby 3 years ago +23

    Hey Cherry looks like the waveglow pretrained model file being linked from drive is no longer there. Would you perhaps have an update or know where we could look? Thanks for all your help my man

    • @TechnologyGuyOfficial
      @TechnologyGuyOfficial 3 years ago +16

      1rpK8CzAAirq9sWZhe9nlfvxMF1dRgFbF
      Use this as an alternative.

    • @hardwareclinic2650
      @hardwareclinic2650 3 years ago

      @TechnologyGuyOfficial Works perfectly, but I can't download the audio at all. The three-dot menu doesn't show.

    • @hardwareclinic2650
      @hardwareclinic2650 3 years ago

      @TechnologyGuyOfficial Hey brother, isn't this quite a bit smaller than the file at the original link?

    • @MemeGuy6900
      @MemeGuy6900 3 years ago

      @TechnologyGuyOfficial Thank you very much.

    • @nirvanarestored4396
      @nirvanarestored4396 3 years ago +2

      When I play the audio I don't hear anything. I have tried different models but I still don't hear anything :(. Even when using the new WaveGlow link.

  • @larrykoopa64dshacker64
    @larrykoopa64dshacker64 3 years ago +10

    It's telling me permission is denied on WaveGlow stuff, and when I click the link it says it is not found...

  • @TheDanksNewGroove
    @TheDanksNewGroove 3 years ago +5

    Came for the tutorial, stayed for your awesome voice.

  • @LunarFlareStudios
    @LunarFlareStudios 2 years ago +2

    On the last step of the training notebook it won't even begin, it's giving me an Unpickling Error.

  • @XiuGeorge
    @XiuGeorge 3 years ago +5

    Hello sir, when I execute Select Tacotron model in Tacotron Synthesis Notebook.ipynb, I cannot download waveglow Model (Tacotron2 Model has been downloaded successfully). Is there a solution?

  • @ajitkumar15
    @ajitkumar15 2 years ago

    This is the best video with a code walkthrough of Tacotron 2 that I have seen on YouTube to date. Thank you so much.
    One question: which languages can it be trained on?

  • @Crazystunts4190
    @Crazystunts4190 3 years ago +9

    Is there a way to choose a person whose voice you're trying to emulate, then change the tone in which they're saying it?
    Like, is it possible to sing a song in your own voice, run this process, and somehow get the other person to sound like they are singing your song instead? Basically, replace your vocals and the way you sing with someone else's voice.
    Thanks :) great video

    • @skedyt
      @skedyt 3 years ago

      Yes with talknet

  • @0TheDarkness0
    @0TheDarkness0 3 years ago +7

    I am getting a "Permission denied" error when trying to download the Waveglow model.

    • @0TheDarkness0
      @0TheDarkness0 3 years ago +4

      Solved by downloading the latest Waveglow model from NVidia and putting it in my Drive, then changing the download link to the one generated in my own Drive.

    • @deepfake6821
      @deepfake6821 3 years ago

      @0TheDarkness0 Thank you for confirming my earlier post.

    • @nicoperez8720
      @nicoperez8720 3 years ago

      @0TheDarkness0 How do you do it?

    • @0TheDarkness0
      @0TheDarkness0 3 years ago +4

      @nicoperez8720 Go to the NVIDIA models catalog and look for the "Waveglow for PyTorch" pretrained weights (ngc.nvidia.com/catalog/models?orderBy=modifiedDESC&pageNumber=0&query=%20label%3A%22Speech%20Synthesis%22&quickFilter=models&filters=).
      Next, unzip the model, put it in your Google Drive and rename it "waveglow.pt".
      From there, copy the share link and make it publicly accessible, just like you do with the Tacotron model. Put the link in the WaveGlow download line of code so that it points to your copy of WaveGlow instead of the original one, and you should be good to go.
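Other comments on this page paste only the file ID into the notebook (for example `1rpK8CzAAirq9sWZhe9nlfvxMF1dRgFbF`) rather than the whole share link. As a sketch, the ID can be pulled out of a Drive link like this; `extract_drive_id` is a hypothetical helper, and the two link shapes it handles (`/file/d/<id>/...` and `?id=<id>`) are assumed to be the ones in use.

```python
import re
from urllib.parse import parse_qs, urlparse


def extract_drive_id(link):
    """Return the file ID from a Google Drive link, or None if none is found."""
    if "://" not in link:
        # Links are often pasted without a scheme; urlparse needs one.
        link = "https://" + link
    parsed = urlparse(link)
    # Shape 1: .../file/d/<id>/view?usp=sharing
    match = re.search(r"/file/d/([^/]+)", parsed.path)
    if match:
        return match.group(1)
    # Shape 2: .../uc?id=<id>&export=download
    ids = parse_qs(parsed.query).get("id")
    return ids[0] if ids else None


if __name__ == "__main__":
    print(extract_drive_id(
        "drive.google.com/u/0/uc?id=1rpK8CzAAirq9sWZhe9nlfvxMF1dRgFbF&export=download"
    ))
```

The file still has to be shared as "anyone with the link", as the comment says, or the notebook's download will fail regardless of the ID.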

    • @nicoperez8720
      @nicoperez8720 3 years ago

      @0TheDarkness0 tysm

  • @ashtondelarosa7290
    @ashtondelarosa7290 2 years ago +2

    I keep running into a unpickling error with a invalid load key of "

    • @ashtondelarosa7290
      @ashtondelarosa7290 2 years ago +1

      I still find myself running into the same errors as a day before. Any suggestions?
      FP16 Run: False
      Dynamic Loss Scaling: True
      Distributed Run: False
      cuDNN Enabled: True
      cuDNN Benchmark: False
      % Total % Received % Xferd Average Speed Time Time Time Current
      Dload Upload Total Spent Left Speed
      100 1555 100 1555 0 0 86388 0 --:--:-- --:--:-- --:--:-- 86388
      Warm starting model from checkpoint 'pretrained_model'
      ---------------------------------------------------------------------------
      UnpicklingError Traceback (most recent call last)
      in ()
      5 print('cuDNN Benchmark:', hparams.cudnn_benchmark)
      6 train(output_directory, log_directory, checkpoint_path,
      ----> 7 warm_start, n_gpus, rank, group_name, hparams, log_directory2)
      3 frames
      /usr/local/lib/python3.7/dist-packages/torch/serialization.py in _legacy_load(f, map_location, pickle_module, **pickle_load_args)
      918 "functionality.")
      919
      --> 920 magic_number = pickle_module.load(f, **pickle_load_args)
      921 if magic_number != MAGIC_NUMBER:
      922 raise RuntimeError("Invalid magic number; corrupt file?")
      UnpicklingError: invalid load key, '

    • @ruben5440
      @ruben5440 2 years ago +1

      Running the same problem here. Have you solved it? Thanks in advance

    • @ashtondelarosa7290
      @ashtondelarosa7290 2 years ago

      @ruben5440 Sorry, despite making multiple attempts, I haven't been able to.

    • @lucastamatescu6090
      @lucastamatescu6090 2 years ago +1

      Hey, this is an issue with downloading the file from the Google Drive link. Drive shows a warning message asking whether you're sure you want to download it. I just downloaded the file manually from the link and then loaded it. The download link was drive.google.com/uc?export=download&confirm={confirm_text}&id=1c5ZTuT7J08wLUoVZ2KkUs_VdZuJ86ZqA. You then need to rename the file to "pretrained_model" (no file extension) and upload it to the tacotron2 folder.
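If the explanation above is right, the file saved as `pretrained_model` is a small HTML warning page rather than a checkpoint. The curl progress lines in the tracebacks posted on this page show only 1555 bytes transferred, which is consistent with that. A quick way to test the theory is to look at the first bytes of the file; `looks_like_html` and `file_looks_like_html` are hypothetical helper names.

```python
def looks_like_html(first_bytes):
    """Return True if the leading bytes look like an HTML page."""
    head = first_bytes.lstrip().lower()
    return head.startswith(b"<!doctype html") or head.startswith(b"<html")


def file_looks_like_html(path, probe_size=512):
    """Read the start of a file and report whether it looks like HTML."""
    with open(path, "rb") as handle:
        return looks_like_html(handle.read(probe_size))


if __name__ == "__main__":
    print(looks_like_html(b"<!DOCTYPE html><html><body>warning</body></html>"))
    # In the notebook: file_looks_like_html("pretrained_model")
```

If the check returns True, downloading the file by hand and uploading it, as described in the comment, is the fix the thread suggests.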

  • @mp_reviews
    @mp_reviews 3 years ago +6

    Hi, great tutorial so far! Thanks for sharing your know-how. Once in the Synthesis Notebook tab, the program can't find "Load WaveGlow"; how can I fix this? When I click on the "Permission denied:" link. my browser opens a new tab which states "Not Found, Error 404".

    • @beratguray3338
      @beratguray3338 2 years ago

      Bro, have you got a new link for WaveGlow?

    • @iliatugushi
      @iliatugushi 2 years ago

      Did you get a new link?

    • @coruys2025
      @coruys2025 2 years ago

      @iliatugushi WaveGlow is hard

  • @charlescostain8066
    @charlescostain8066 3 years ago +1

    What is the best format for good results? Short lines in the list (and many small audio clips) or long ones (fewer audio clips)? If I know that I want my character to be able to say specific things very well, is there any benefit to putting that word or phrase on its own line? I am guessing the accuracy goes down the longer the line is. Say I am duping a sports broadcaster. I want him to be able to say the names of the sports and teams very well, if nothing else.
    I have 100 samples so far. I can only run 12 : 1000 epochs. I OOM @ 15:500. Sorry for so many questions. My last one for now is about the syntax of the punctuation. Is it better to put a comma directly after a word, or with a space in between ("Comma," vs "Comma ,")? I am curious whether the algorithm treats a word the same when it has punctuation attached.

  • @clunkster
    @clunkster 3 years ago +11

    I keep getting a K80, even though I've factory reset the runtime a lot of times. Do you have any tips to help me?

    • @monkadude15
      @monkadude15 3 years ago

      I am too. Did you ever figure it out?

    • @clunkster
      @clunkster 3 years ago

      @monkadude15 Nope.

    • @monkadude15
      @monkadude15 3 years ago +1

      @clunkster Dang it. I guess you didn't find a workaround lol

    • @nimishbansal4752
      @nimishbansal4752 3 years ago

      @CookieBoy Were you able to solve it? Without a Tesla T4, no sound is being generated.

    • @clunkster
      @clunkster 3 years ago

      @nimishbansal4752 No.

  • @geejota
    @geejota 2 years ago +4

    Hey! Awesome tutorial! Very clear.
    I've got an error, and I'm really confused. When I run the Start Training part, it stops on the line "train(output directory.....". It shows me this error: UnpicklingError: invalid load key, '

    • @lufunotshivhase7810
      @lufunotshivhase7810 2 years ago

      Hi! Me too, have you found a solution yet?

    • @justenhodge
      @justenhodge 2 years ago

      Same, same error as well

    • @aliali-sj9jp
      @aliali-sj9jp 2 years ago

      Hi, me too. Have you solved it yet?

    • @tardigrade184
      @tardigrade184 2 years ago

      For those getting the pickle error UnpicklingError: invalid load key, '

    • @kripc
      @kripc 2 years ago

      @tardigrade184 What did you upload the file as? pretrained_model.pt, or without the extension?

  • @angelpichu1
    @angelpichu1 3 years ago +10

    Question: I've made about four different models already. When you say "You have to wait until .15 or lower", I've been waiting until .144-ish. Is there a benefit of it being much lower than that such as .10 or .08? Can it get that low?

    • @elixstrations7147
      @elixstrations7147 3 years ago

      Have you tested this yourself? I'm wondering that too.

    • @angelpichu1
      @angelpichu1 3 years ago

      @elixstrations7147 It actually works way better if you go below .15. I usually stopped around .08 because, back when I was still making them, my internet would cut out around then and I'd lose my progress. But yes, it works a lot better.

    • @MrTony2371
      @MrTony2371 3 years ago +3

      Since these values represent the neural network's error, a lower value means a better output, because you always want to keep the error at its minimum. I've created a lot of different NN architectures and I personally consider values lower than 0.1 an acceptable result. But bear in mind that these error values are not on a universal scale. Some NN architectures and their optimizers can give you good results with values like 0.95.

    • @angelpichu1
      @angelpichu1 3 years ago

      @MrTony2371 I'll have to try and see what kind of limits I can put on Tacotron next time I use it.

    • @coruys2025
      @coruys2025 2 years ago +2

      I'm in a HUGE challenge with UnpicklingError "

  • @ilhanmertalan640
    @ilhanmertalan640 3 years ago +6

    Any advice on doing it in a different language? E.g., Turkish has letters like ö/ü/ı that differ from English. Do we also need to go deep into phonemes? Thanks, by the way, for the great tutorial.

    • @lobato87
      @lobato87 3 years ago

      at the last step, there is a parameter named 'english_cleaners', I changed this to... uhm, I don't remember, gonna have to look it up and come back to you later; but it worked for spanish.

    • @ilhanmertalan640
      @ilhanmertalan640 3 years ago +1

      @lobato87 Thanks, I found it. I tried it in Turkish and it also works: transliteration_cleaners.

    • @lobato87
      @lobato87 3 years ago +1

      @ilhanmertalan640 Exactly! transliteration_cleaners will interpret UTF-8 characters. Cheers!

    • @sabah8312
      @sabah8312 3 years ago +2

      Also change symbols.py to include your language's alphabet.

    • @kitkorn4782
      @kitkorn4782 3 years ago

      @lobato87 Hi, I wanted to try with Thai. What do I need to change it to?
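To make the cleaner discussion concrete: a transliteration cleaner maps accented Latin letters onto plain ASCII before the text reaches the model. The sketch below is a standard-library approximation of that idea, not the Tacotron 2 repository's actual `transliteration_cleaners` code; `to_ascii` and `FALLBACKS` are hypothetical names, and the fallback table is an assumption added for letters that Unicode cannot decompose.

```python
import unicodedata

# Letters with no Unicode decomposition need an explicit mapping.
FALLBACKS = {"ı": "i"}


def to_ascii(text):
    """Approximate transliteration: strip diacritics, drop what cannot be mapped."""
    pieces = []
    for char in text:
        if char in FALLBACKS:
            pieces.append(FALLBACKS[char])
            continue
        # NFKD splits "ö" into "o" plus a combining mark, which is then dropped.
        decomposed = unicodedata.normalize("NFKD", char)
        pieces.append(decomposed.encode("ascii", "ignore").decode("ascii"))
    return "".join(pieces)


if __name__ == "__main__":
    for word in ("gözlük", "ışık", "niño"):
        print(word, "->", to_ascii(word))
```

In this sketch a non-Latin script such as Thai loses every character, which suggests why the advice about editing symbols.py matters for such languages; how the real cleaner treats Thai is not something this example shows.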

  • @ThalesPo
    @ThalesPo 2 years ago +1

    Does the training run on your machine or the server?

  • @AilurusFungens
    @AilurusFungens 2 years ago +11

    Great tutorial, thank you! I get an error when running the training model: UnpicklingError: invalid load key, '

  • @AndrejusDovidaitis
    @AndrejusDovidaitis 3 years ago +3

    It gives me this error
    ValueError: Caught ValueError in DataLoader worker process 0.
    Original Traceback (most recent call last):
    File "/usr/local/lib/python3.7/dist-packages/torch/utils/data/_utils/worker.py", line 202, in _worker_loop
    data = fetcher.fetch(index)
    File "/usr/local/lib/python3.7/dist-packages/torch/utils/data/_utils/fetch.py", line 44, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
    File "/usr/local/lib/python3.7/dist-packages/torch/utils/data/_utils/fetch.py", line 44, in
    data = [self.dataset[idx] for idx in possibly_batched_index]
    File "/content/tacotron2/data_utils.py", line 61, in __getitem__
    return self.get_mel_text_pair(self.audiopaths_and_text[index])
    File "/content/tacotron2/data_utils.py", line 34, in get_mel_text_pair
    mel = self.get_mel(audiopath)
    File "/content/tacotron2/data_utils.py", line 49, in get_mel
    melspec = torch.from_numpy(np.load(filename))
    File "/usr/local/lib/python3.7/dist-packages/numpy/lib/npyio.py", line 444, in load
    raise ValueError("Cannot load file containing pickled data "
    ValueError: Cannot load file containing pickled data when allow_pickle=False
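`np.load` raises this error when the file does not start with the `.npy` magic bytes and it is not allowed to fall back to pickle. One possible cause, offered here as a guess and not as a diagnosis, is a file that carries the `.npy` name but holds something else, such as a renamed wav. A standard-library check of the first bytes can tell the two apart; `classify_audio_file` is a hypothetical helper name.

```python
NPY_MAGIC = b"\x93NUMPY"


def classify_audio_file(head):
    """Classify a file from its leading bytes as 'npy', 'wav' or 'unknown'."""
    if head.startswith(NPY_MAGIC):
        return "npy"
    # A RIFF/WAVE file starts with b"RIFF", 4 size bytes, then b"WAVE".
    if head[:4] == b"RIFF" and head[8:12] == b"WAVE":
        return "wav"
    return "unknown"


def classify_path(path):
    """Read the first 12 bytes of a file and classify them."""
    with open(path, "rb") as handle:
        return classify_audio_file(handle.read(12))


if __name__ == "__main__":
    print(classify_audio_file(b"\x93NUMPY\x01\x00"))
    print(classify_audio_file(b"RIFF\x24\x00\x00\x00WAVEfmt "))
    # In the notebook: classify_path("wavs/1.npy")
```

If a `.npy` file reports "wav", the mel-generation step was probably skipped or failed, and rerunning it is the first thing to try.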

  • @thegreatawakening3601
    @thegreatawakening3601 3 years ago +2

    Bump. Does it work with other languages? The Algo is the same. Just need info on models for different languages.

  • @Mrree1078
    @Mrree1078 3 years ago +3

    I have a pretrained model that I downloaded, but the voice comes out like he's gasping for breath, and I noticed the image on the right shows up more as lines than as a diagonal shape.

  • @JaxonPham
    @JaxonPham 3 years ago +3

    Help! I've tried downloading the WaveGlow model from the NVIDIA catalog, since the one listed in the code gives me "permission denied". But in the "# load waveglow" part, I'm getting:
    waveglow = torch.load(waveglow_pretrained_model)['model']
    waveglow.cuda().eval().half()
    for k in waveglow.convinv:
        k.float()
    KeyError: 'model'

    • @nicoperez8720
      @nicoperez8720 3 years ago +1

      I've been getting the same exact thing

    • @Byoncnc
      @Byoncnc 3 years ago

      Same here, someone pls HELP

    • @Byoncnc
      @Byoncnc 3 years ago +1

      "1rpK8CzAAirq9sWZhe9nlfvxMF1dRgFbF"
      use this for waveglow same as another model

    • @nicoperez8720
      @nicoperez8720 3 years ago

      @Byoncnc HOLY CRAP TYSM

    • @JaxonPham
      @JaxonPham 3 years ago

      I downloaded this file for the waveglow and it worked. my results sound terrible though so i gotta go back to training it i think. drive.google.com/u/0/uc?id=1rpK8CzAAirq9sWZhe9nlfvxMF1dRgFbF&export=download

  • @franz1242
    @franz1242 2 years ago +1

    Question: does it work with other languages too? Or does only with English?

  • @nathanbecks5081
    @nathanbecks5081 3 years ago +3

    When I get to the synthesis part, I get this when I load the Tacotron 2 model, even when I reset the factory runtime a few times, it still occurs:
    ---------------------------------------------------------------------------
    RuntimeError Traceback (most recent call last)
    in ()
    4 hparams.gate_threshold = 0.1 # Model must be 90% sure the clip is over before ending generation (the higher this number is, the more likely that the AI will keep generating until it reaches the Max Decoder Steps)
    5 model = Tacotron2(hparams)
    ----> 6 model.load_state_dict(torch.load(tacotron2_pretrained_model)['state_dict'])
    7 _ = model.cuda().eval().half()
    1 frames
    /usr/local/lib/python3.7/dist-packages/torch/serialization.py in __init__(self, name_or_buffer)
    240 class _open_zipfile_reader(_opener):
    241 def __init__(self, name_or_buffer) -> None:
    --> 242 super(_open_zipfile_reader, self).__init__(torch._C.PyTorchFileReader(name_or_buffer))
    243
    244
    RuntimeError: [enforce fail at inline_container.cc:145] . PytorchStreamReader failed reading zip archive: failed finding central directory
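PyTorch 1.6 and later save checkpoints as zip archives by default, and "failed finding central directory" is what a zip reader reports when the end of the archive is missing. That usually points to an incomplete or corrupted download, though the thread does not confirm the cause. The sketch below checks whether a blob of bytes is a complete zip archive; `is_complete_zip` and `make_demo_zip` are hypothetical helper names.

```python
import io
import zipfile


def is_complete_zip(data):
    """Return True if the bytes contain a zip end-of-central-directory record."""
    return zipfile.is_zipfile(io.BytesIO(data))


def make_demo_zip():
    """Build a tiny in-memory zip archive, used only for demonstration."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("data.pkl", b"placeholder")
    return buffer.getvalue()


if __name__ == "__main__":
    good = make_demo_zip()
    truncated = good[: len(good) // 2]  # simulate an interrupted download
    print(is_complete_zip(good), is_complete_zip(truncated))
    # For a real file: zipfile.is_zipfile("path/to/your_model")
```

If the check fails on the downloaded Tacotron 2 model, downloading it again and comparing file sizes is a reasonable next step.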

  • @JonathanSantosDeveloper
    @JonathanSantosDeveloper 2 years ago +1

    Hi! Thank you for your explanations!
    I'm facing a problem while trying to generate the mels:
    RuntimeError: min(): Expected reduction dim to be specified for input.numel() == 0. Specify the reduction dim with the 'dim' argument.
    Have you experienced this problem too? Thank you.

    • @zelfacel1563
      @zelfacel1563 2 years ago

      That sounds like something may have been deleted from your file system somehow, I'd reset and try again.

  • @progspro12
    @progspro12 2 years ago +1

    The training is giving me an unpickling error.

  • @KrabbyPatty99Archive
    @KrabbyPatty99Archive 2 years ago +3

    UnpicklingError: invalid load key, '

  • @RellekEarth
    @RellekEarth 3 years ago +4

    I can't get anything other than K80 and I don't know how to change it. Even after the factory reset suggestion, the GPU is exactly the same. Would really suck if that's the only hurdle and it makes this unusable for me... Does anyone have any tips?

    • @itsranchy6846
      @itsranchy6846 3 years ago

      i have same problem

    • @SootyWill
      @SootyWill 3 years ago +1

      Same problem here too. It was working a couple months ago... can anyone help us out?

  • @VET54
    @VET54 2 years ago +3

    Every time I get to the training part I get an error, even though I'm doing everything the same as you.
    Everything else passes when I click on it and let it run, but when I get to the last bit, actually training the model, it fails straight away.
    FP16 Run: False
    Dynamic Loss Scaling: True
    Distributed Run: False
    cuDNN Enabled: True
    cuDNN Benchmark: False
    % Total % Received % Xferd Average Speed Time Time Time Current
    Dload Upload Total Spent Left Speed
    100 1555 100 1555 0 0 6973 0 --:--:-- --:--:-- --:--:-- 7004
    Warm starting model from checkpoint 'pretrained_model'
    ---------------------------------------------------------------------------
    UnpicklingError Traceback (most recent call last)
    in ()
    5 print('cuDNN Benchmark:', hparams.cudnn_benchmark)
    6 train(output_directory, log_directory, checkpoint_path,
    ----> 7 warm_start, n_gpus, rank, group_name, hparams, log_directory2)
    3 frames
    /usr/local/lib/python3.7/dist-packages/torch/serialization.py in _legacy_load(f, map_location, pickle_module, **pickle_load_args)
    918 "functionality.")
    919
    --> 920 magic_number = pickle_module.load(f, **pickle_load_args)
    921 if magic_number != MAGIC_NUMBER:
    922 raise RuntimeError("Invalid magic number; corrupt file?")
    UnpicklingError: invalid load key, '

  • @Health-maintenance
    @Health-maintenance 3 года назад +4

    This does not work anymore. Can you make another one if possible? Thanks. I'm useless at coding and correcting my mistakes lol, and there are no in-depth tutorials on RUclips for beginners.

  • @sabatmuhamad7179
    @sabatmuhamad7179 3 года назад +1

    I have an error in synthesis part: No such file or directory: 'waveglow.pt' . What should I do?

  • @maximmk3130
    @maximmk3130 3 года назад +2

    I have a question. after training, can i add more samples to old ones that they will train together?

  • @lobato87
    @lobato87 3 года назад +3

    I want to do this in Latin American Spanish. Is this notebook pretrained in English?

    • @lobato87
      @lobato87 3 года назад +2

      update: yes it can talk in other language, tested with about a minute of audio samples and it was able to talk back to me the phrases I gave to the training notebook

    • @IanPaulBrossard
      @IanPaulBrossard 3 года назад

      @@lobato87 So it's not pre-trained in English? I thought it must be pre-trained with something, so it can infer the sounds of the letters you don't include in the training set.

    • @lobato87
      @lobato87 3 года назад +1

      @@IanPaulBrossard there's a parameter near the end called 'english_cleaners' that must be changed, and symbols.py must also be changed for the language you need

    • @IanPaulBrossard
      @IanPaulBrossard 3 года назад

      @@lobato87 thank you! Then I guess it would be better if I just ignore all the grammar rules and use á é í ó ú in every stressed syllable. Also, I'll try to use a separate set of exclamation sentences, so I can choose regular speech and exclamations as we please (for example, use model 2 for every word between ¡ and ! and use model 1 for everything else)!

    • @lobato87
      @lobato87 3 года назад

      @@IanPaulBrossard those are excellent things to try!

  • @RavenclawNimbus
    @RavenclawNimbus 3 года назад +3

    I finally got my ai thing to work! I spent like 5 hours on it, and it's finally finished! Thanks for the help! ❤️
    Edit: I was thinking of doing Homestar Runner characters! XD

    • @tuvstarr5157
      @tuvstarr5157 3 года назад

      No way!!! Another homestar runner fan? And this was commented RECENTLY XD ! I was actually watching this so i could do it for Coach Z's voice!! Hahaha

    • @RavenclawNimbus
      @RavenclawNimbus 3 года назад

      @@tuvstarr5157 That’s the one I was gonna do! XD

    • @scottbarrett1073
      @scottbarrett1073 3 года назад

      I have the same problem,
      the link that needs to be 'anyone with link' is the trained model, not waveglow - so why is it asking me to change permission on waveglow? isn't that coming from a different google drive?

    • @RavenclawNimbus
      @RavenclawNimbus 3 года назад

      @@tuvstarr5157 Oh my gosh, someone made a Homestar Runner on Uberduck Ai the same day you and me made our comments..

    • @nimishbansal4752
      @nimishbansal4752 3 года назад +1

      Hey can you help with this, after doing everything i am not getting audio in the end, do you know how to resolve this? Also i am not able to use T4 gpu coz no matter how many times i do factory reset it always show k80

  • @sgra5373
    @sgra5373 2 года назад +5

    As many other comments state, I reckon there's an error with TensorFlow: "Tensorflow 1 is deprecated, and support will be removed on August 1, 2022". Any idea how to update that?

    • @Agnostic_Asi
      @Agnostic_Asi 2 года назад +1

      Yepp I also got "Tensorflow 1 is unsupported in Colab." So this tutorial sadly doesn't work anymore!

    • @moanxion9102
      @moanxion9102 2 года назад

      @@Agnostic_Asi just edit the code to Tensorflow 2 as easy as that :)

    • @slinnyboi
      @slinnyboi 2 года назад

      @@moanxion9102 thank you

    • @slinnyboi
      @slinnyboi 2 года назад +2

      @@moanxion9102 I have a problem, under "A bunch of boring code and stuff" I get this error:
      AttributeError Traceback (most recent call last)
      in
      375
      376 # ---- DEFAULT PARAMETERS DEFINED HERE ----
      --> 377 hparams = create_hparams()
      378 model_filename = 'current_model'
      379 hparams.training_files = "filelists/clipper_train_filelist.txt"
      /content/tacotron2/hparams.py in create_hparams(hparams_string, verbose)
      6 """Create model hyperparameters. Parse nondefault from given string."""
      7
      ----> 8 hparams = tf.contrib.training.HParams(
      9 ################################
      10 # Experiment Parameters #
      AttributeError: module 'tensorflow' has no attribute 'contrib'
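
The traceback above comes from `tf.contrib`, which was removed in TensorFlow 2, so changing the version line to 2 gets past the import and then fails in `hparams.py`. Another comment on this page reports that pinning TensorFlow 1.15 with pip works instead. If that is not an option, a possible workaround is a small stand-in for the container. The sketch below is an assumption, not the notebook's code: the class, the `create_hparams_sketch` name and the field values are illustrative only, and the `parse()` method of the original class is not implemented.

```python
class HParams:
    """Tiny stand-in for tf.contrib.training.HParams (hypothetical helper)."""

    def __init__(self, **kwargs):
        # Store every keyword argument as a plain attribute.
        for name, value in kwargs.items():
            setattr(self, name, value)

    def values(self):
        """Return the hyperparameters as a plain dict."""
        return dict(vars(self))


def create_hparams_sketch():
    # Only a few illustrative fields; the real hparams.py defines many more.
    return HParams(
        epochs=500,
        batch_size=32,
        text_cleaners=["english_cleaners"],
        training_files="filelists/clipper_train_filelist.txt",
    )


hparams = create_hparams_sketch()
hparams.batch_size = 16  # attributes can be overridden like on the original
print(hparams.values())
```

This only covers attribute access, which is how the traceback shows the notebook using `hparams`; anything relying on string parsing of overrides would still need extra work.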

  • @Miuzi
    @Miuzi 3 года назад +2

    for some reason it gives me "FileNotFoundError: [Errno 2] No such file or directory: '/wavs/1.npy'" error on check dataset phase, any ideas?

    • @VinxGD
      @VinxGD 3 года назад

      you have to make sure that you dont have any blank lines in your txt file

    • @Miuzi
      @Miuzi 3 года назад

      @@VinxGD I don't, what's wrong then?

    • @VinxGD
      @VinxGD 3 года назад

      @@Miuzi Hm, im sorry idk what else could be wrong. I had the same problem but it worked for me.

    • @BigDraco-So
      @BigDraco-So 3 года назад

      I had the same issue and fixed it by setting the sample rate to 22050 Hz in Audacity; then it worked for me.

    • @Miuzi
      @Miuzi 3 года назад

      @@BigDraco-So it was 22050 so it doesnt work for me

  • @lenny_Videos
    @lenny_Videos 3 года назад +1

    Very good explanation 🤩 Do you know if there are models for sale? I mean models that are created by other people, that could be used in the Synthesis Notebook?

  • @RackerTheRascalMashup
    @RackerTheRascalMashup 3 года назад +7

    Is there no other way to get access to the good gpus?
    no matter how much i try, it keeps giving me a k80

    • @Cmanflip
      @Cmanflip 3 года назад +1

      Same

    • @trickster444
      @trickster444 3 года назад

      Dude, have you solved this problem?

    • @Cmanflip
      @Cmanflip 3 года назад

      @@trickster444 I did get k80 but I moved on and it worked out well

    • @trickster444
      @trickster444 3 года назад

      @@Cmanflip Okay, let's break some rules. Thanks.

    • @Cmanflip
      @Cmanflip 3 года назад

      @@trickster444 Yw

  • @ThatRandomEncounterGuy
    @ThatRandomEncounterGuy 2 года назад +1

    I have a question: the voices I’m using don’t have a lot of sources to draw from-I think it’s only about 3-4 minutes worth of original [usable] dialogue. Would I be able to cheat the system by just copy-pasting the amount I have until I get about 10 minutes or so? Or does it all have to be original dialogue for the program to read it?

  • @VocaPigeon
    @VocaPigeon 3 года назад +1

    Very cool!! Have you got a reclist you'd recommend for making your own library?

  • @DogBeef
    @DogBeef 3 года назад +20

    If I have 100 wavs, what do you reckon is a good batch size and epoch number? I tried 30 and 500 and many other combinations and it's always OOMing.

    • @DogBeef
      @DogBeef 3 года назад

      @VillaGG alright ill try again thanks

    • @DogBeef
      @DogBeef 3 года назад

      yup, still getting OOM, the training process starts but only does 1 cycle, then gives me an OOM error on the 2nd cycle, no idea why.

    • @DogBeef
      @DogBeef 3 года назад

      @VillaGG yea for sure

    • @DogBeef
      @DogBeef 3 года назад

      @VillaGG its working right now with 50 samples, so maybe the other 50 had some issues, so ill try what you recommended, thanks a lot!

    • @DogBeef
      @DogBeef 3 года назад +2

      @VillaGG "let it train for 10 hours", so if it reaches ≤0.15 on the training part that doesn't matter? I just let it go for 10 hours? because its reaching 0.15 pretty fast.
      also my previous issue was that I exported the previous batch of samples from adobe premiere with different quality settings than the new batch, it works fine now :)

  • @nicoperez8720
    @nicoperez8720 3 года назад +2

    It keeps saying Permission Denied when I try. It used to work well but now it doesn't.

  • @Utsukushiku9183
    @Utsukushiku9183 10 месяцев назад

    Hi, Google Collab is saying that the tacotron2 folder doesn't exist?
    Here's the error code: FileNotFoundError: [Errno 2] No such file or directory: 'tacotron2'

  • @tardigrade184
    @tardigrade184 2 года назад +8

    For those getting the pickle error UnpicklingError: invalid load key, '

    • @Notoriousseditz
      @Notoriousseditz 2 года назад +1

      Hi could you solve this problem it seems all the synthesis notebooks are not working all of a sudden "[Errno 2] No such file or directory: 'merged.dict.txt'" The Gdown Id is not working it's saying this
      "Access denied with the following error:
      Cannot retrieve the public link of the file. You may need to change
      the permission to 'Anyone with the link', or have had many accesses.
      You may still be able to access the file from the browser:"
      any solution?

    • @tardigrade184
      @tardigrade184 2 года назад

      @@Notoriousseditz You need the pronunciation dictionary. The other one is broken. Get it by:
      # Setup Pronounciation Dictionary
      !gdown --id '1E12g_sREdcH5vuZb44EZYX8JjGWQ9rRp'

    • @tardigrade184
      @tardigrade184 2 года назад

      Use these updated colabs:
      Training: gist.github.com/Oct4Pie/61781515d3e97f70b52dfef0648d71e7
      Synthesis: gist.github.com/Oct4Pie/4e56fa3d5d2c5a4313bdf664597eefc2
      If there are any issues, simply comment under the gist.

    • @TheLastSitcom
      @TheLastSitcom 2 года назад

      Thank you so much! This was such a huge help! You've got a new subscriber :)

    • @effekt_de
      @effekt_de 2 года назад

      @@tardigrade184 Hello bro, in your training notebook, training seems to start from the pretrained model, not from my own audio files and texts. What do I need to do to train on my own audio? Thanks

  • @existentialcrisis9757
    @existentialcrisis9757 3 года назад +5

    Decent result for 30 wav files 🤯

  • @Nesggy
    @Nesggy 3 года назад +3

    I'm having a problem
    In the Tacotron Synthesis notebook: the panel just after Initialize Tacotron and Waveglow
    I get this error
    model.load_state_dict(torch.load(tacotron2_pretrained_model)['state_dict'])
    RuntimeError: [enforce fail at inline_container.cc:145] . PytorchStreamReader failed reading zip archive: failed finding central directory
    I have reset the runtime and still doesn't work

    • @brat-b8h
      @brat-b8h 3 года назад

      Same thing for me and this is a game changer when someone finally decides to drop knowledge

    • @Nesggy
      @Nesggy 3 года назад +2

      @@brat-b8h I found the solution. This error means that your trained model file is corrupted. You have to train again your model and you should get a file that weighs around 323Mb.
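
One way to check a model file before loading it is sketched below. It assumes the file is on local disk; the function name and the 1 MB threshold are made up for illustration. The thread mentions roughly 323 MB for a trained model, about 101 MB for the pretrained one, and about 1 KB for a broken download. `zipfile.is_zipfile` looks for the end-of-archive record, which is the piece the "failed finding central directory" message complains about.

```python
import os
import zipfile


def describe_checkpoint(path, min_bytes=1_000_000):
    """Return a short, best-effort diagnosis of a checkpoint file."""
    if not os.path.isfile(path):
        return "missing"
    size = os.path.getsize(path)
    with open(path, "rb") as handle:
        raw = handle.read(64)
    if size < min_bytes:
        # A few KB is far too small for a model; it may be an error page.
        if raw.lstrip()[:1] == b"<":
            return "tiny file that looks like HTML, not a model"
        return "suspiciously small (%d bytes)" % size
    if raw[:2] == b"PK" and not zipfile.is_zipfile(path):
        # Zip-style checkpoint whose end-of-archive record is missing.
        return "zip header present but archive is incomplete (likely truncated)"
    return "plausible checkpoint (%.1f MB)" % (size / 1e6)
```

A "plausible" result only means the file passes these coarse checks; it does not prove the weights load.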

    • @bedtime285
      @bedtime285 3 года назад

      @@brat-b8h that means its corrupt and you have to train again

  • @Sparkyh
    @Sparkyh 2 года назад +1

    are we able to get a 2022 version? i keep getting tensorflow 2 errors or something. trying to experiment i had this working like 2 years ago now it doesnt work for me. please help xD

  • @robloxfan75000
    @robloxfan75000 3 года назад +2

    What if I keep on going on the start training identifiying and generating the sounds of 500 epoch before it’s finished?

  • @chispun2
    @chispun2 Год назад +1

    I run out of RAM 12.68GB as soon as I try to make it for a full paragraph

  • @Gosmokeless28
    @Gosmokeless28 3 года назад +1

    4:36 What happens if the model isn't stopped?

  • @TheWWDproductions
    @TheWWDproductions 2 года назад +1

    I keep getting UnpicklingError: invalid load key, '

  • @vampirecatnifedipine318
    @vampirecatnifedipine318 2 года назад +2

    why can't I Donwload Tacotron

  • @scottbarrett1073
    @scottbarrett1073 3 года назад +4

    I jumped up to pro and now have a good GPU everytime. The only model I can get to generate sentences that sound like a 'voice' is the hal-9000. All the others produce a weird inter-dimensional singing sound but unintelligible.

  • @tvolow499
    @tvolow499 3 года назад +2

    Is there a Jupyter version of the training and synthesize project files?
    Does anyone know if there is one instead of the Google Colab version?

  • @ibuddywolfie7882
    @ibuddywolfie7882 3 года назад +2

    When I get to the "Create MEL spectrograms" part, I run it and it gives me this error:
    "RuntimeError: shape '[1, 1, 94241]' is invalid for input of size 188482"
    With the traceback:
    ---------------------------------------------------------------------------
    RuntimeError Traceback (most recent call last)
    in ()
    1 if generate_mels:
    ----> 2 create_mels()
    3 frames
    /content/tacotron2/stft.py in transform(self, input_data)
    82
    83 # similar to librosa, reflect-pad the input
    ---> 84 input_data = input_data.view(num_batches, 1, num_samples)
    85 input_data = F.pad(
    86 input_data.unsqueeze(1),

    • @ibuddywolfie7882
      @ibuddywolfie7882 3 года назад +1

      I fixed this! But I won't be a dickhead who doesn't tell you how.
      The problem was with the exported audio files. I did split them, but I had one mono track and one stereo track, and I accidentally exported the stereo one. To be extra cautious, I just deleted the stereo track in Audacity so it only exported the mono track.
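
The numbers in the error fit that explanation: 188482 is exactly 2 x 94241, i.e. two channels where one was expected. Below is a small sketch for checking clips before uploading, using only the standard-library `wave` module. The function names are made up, and the 22050 Hz default is taken from another comment in this thread rather than from the notebook itself.

```python
import wave


def wav_info(path):
    """Return (channels, sample_rate_hz, duration_seconds) for a PCM WAV file."""
    with wave.open(path, "rb") as wav:
        channels = wav.getnchannels()
        rate = wav.getframerate()
        duration = wav.getnframes() / float(rate)
    return channels, rate, duration


def find_problem_wavs(paths, expected_rate=22050):
    """List (path, reason) pairs for clips that are not mono at the expected rate."""
    problems = []
    for path in paths:
        channels, rate, _ = wav_info(path)
        if channels != 1:
            problems.append((path, "%d channels (expected mono)" % channels))
        elif rate != expected_rate:
            problems.append((path, "%d Hz (expected %d)" % (rate, expected_rate)))
    return problems
```

Calling `find_problem_wavs` on the list of clip paths returns an empty list when every file is mono at the expected rate.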

    • @TrenchMobbs
      @TrenchMobbs 2 года назад

      @@ibuddywolfie7882 how do i check if any is mono or stereo?

  • @PARAAA
    @PARAAA 3 года назад +1

    6:33 I have a problem. It keeps repeating "Maybe you need to change permission over 'Anyone with the link'?"
    I followed every step, so I don't know what to do now

    • @PARAAA
      @PARAAA 3 года назад

      Update: I followed the instructions written by Antonio Origlia here in the comments. Now everything works fine again

    • @robertovalentino70
      @robertovalentino70 3 года назад

      Did you by any chance manage to get it working well in Italian? I tried with about 300 clips taken from an audiobook, but there's always that robotic tone

    • @PARAAA
      @PARAAA 3 года назад

      @@robertovalentino70 Not well in Italian, unfortunately. I think there's still something else to change. The problem is that I don't understand much of it, haha; it's just something I like doing in my spare time. If I find anything out I'll let you know

    • @robertovalentino70
      @robertovalentino70 3 года назад +1

      @@PARAAA Let me know if you need a hand with anything. Even though I'm a programmer, this is a very complex topic. From what I understand, the software is pretrained with an English model, so accented characters and their pronunciations are missing entirely, but it should still work for other languages. In that case, though, I think you need to give it a lot more audio so that it learns as many spoken syllables as possible. For example, I noticed that it pronounces many words correctly, albeit with a robotic tone, while it pronounces others the way an English speaker would.

  • @Notoriousseditz
    @Notoriousseditz 2 года назад

    Hi when i'm using the synthesis notebook i'm getting this error "[Errno 2] No such file or directory: 'merged.dict.txt'" The Gdown Id is not working it's saying this
    "Access denied with the following error:
    Cannot retrieve the public link of the file. You may need to change
    the permission to 'Anyone with the link', or have had many accesses.
    You may still be able to access the file from the browser:"
    any solution?

  • @Feths
    @Feths 3 года назад

    Hello, I have a problem: in "The actual synthesis part" I enter the text I want spoken, but when I play it back I don't hear anything. Any solution?

    • @Health-maintenance
      @Health-maintenance 3 года назад

      Hi, are you still having this problem? I think there might be a problem with the waveglow model

  • @johangrostkerck6046
    @johangrostkerck6046 2 года назад

    Hey I've got a small question. Would this be possible to do in other languages?

  • @conyponey
    @conyponey 2 года назад

    My textfile and wavs are all correct but im getting this please help :(
    Generating Mels
    21%
    104/500 [00:02

  • @conyponey
    @conyponey 3 года назад +5

    it says CUDA out of memory. Tried to allocate 20.00 MiB (GPU 0; 15.90 GiB total capacity; 14.91 GiB already allocated; 29.75 MiB free; 14.99 GiB reserved in total by PyTorch) :(

    • @BannerlordOrgblog
      @BannerlordOrgblog 3 года назад +2

      same with me, even if i set batch size and epoch to 1 i always get out of memory

    • @fani7388
      @fani7388 3 года назад

      try lowering the batch size

    • @выпускмысль
      @выпускмысль 3 года назад +7

      Maybe it's late but I found what causes this problem, some of my audio files were apparently too long, I decided to use the audio files that were 3-10 seconds long and the problem was solved
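
A quick way to spot over-long clips before training is sketched below. It assumes the clips are plain PCM WAV files in one folder; the 3 to 10 second window is the range this commenter reported working, not a documented limit, and the function names are made up.

```python
import glob
import os
import wave


def clip_seconds(path):
    """Duration of a PCM WAV file in seconds."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / float(wav.getframerate())


def clips_outside_range(wav_dir, min_s=3.0, max_s=10.0):
    """Return (filename, seconds) for every clip shorter or longer than the window."""
    flagged = []
    for path in sorted(glob.glob(os.path.join(wav_dir, "*.wav"))):
        seconds = clip_seconds(path)
        if not (min_s <= seconds <= max_s):
            flagged.append((os.path.basename(path), round(seconds, 2)))
    return flagged
```

Longer clips mean longer padded batches, so trimming or splitting the flagged files is a cheaper first step than shrinking the batch size further.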

    • @aidanh4228
      @aidanh4228 3 года назад

      @@выпускмысль you fixed my problem, thank you

    • @выпускмысль
      @выпускмысль 3 года назад

      @@aidanh4228
      I'm happy to hear that(:

  • @giulianoonica7253
    @giulianoonica7253 2 года назад

    For some reason, when I run the cells in both notebooks, it doesn't seem to work. I do have my 30 audio files and transcript, as I have already uploaded them, but I don't have those other files that you have. Could you solve this problem?

  • @undeaf5860
    @undeaf5860 3 года назад +2

    When I try to set the parameters it just loads instantly no matter what i set the settings as though it's not doing anything. Then in data check there is an error for every single file. (57 of them). I've completely reset everything 5 times and even changed accounts.

  • @yawsampenebuadu9918
    @yawsampenebuadu9918 3 года назад +4

    How do I resume training from where I left off?

    • @BigDraco-So
      @BigDraco-So 3 года назад +1

      Just put in the same model name

    • @yawsampenebuadu9918
      @yawsampenebuadu9918 3 года назад +1

      @@BigDraco-So Thanks, but after doing that, is the epoch still supposed to start from one? or continue?

  • @Cloud9ChroniclesAutomation
    @Cloud9ChroniclesAutomation 9 месяцев назад

    hey is there any way for me to set this up on my local pc to train.

  • @massiveeyebrows4482
    @massiveeyebrows4482 3 года назад +2

    Please help!! I did everything right but when i do the check data module it gives me this error "[WARNING] wavs/1.wav in filelist while expecting .npy ." for every wav in the list.

    • @massiveeyebrows4482
      @massiveeyebrows4482 3 года назад

      To those who liked this comment and also have this problem, go to the text file and replace all the .wav files with .npy . idk why you have to do this now but thats how you get it to work
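
That edit can be scripted. The sketch below assumes each filelist line has the form `path|transcript` (the `|` separator is an assumption about the notebook's format) and also drops blank lines, which another commenter named as a cause of missing-file errors. The `clean_filelist` name is made up.

```python
def clean_filelist(text):
    """Drop blank lines and point each entry at the .npy file instead of the .wav."""
    cleaned = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # blank lines would be read as entries with no path
        path, sep, transcript = line.partition("|")
        if path.endswith(".wav"):
            path = path[:-4] + ".npy"
        # Only the path is touched, so ".wav" inside a transcript is left alone.
        cleaned.append(path + sep + transcript)
    return "\n".join(cleaned) + "\n" if cleaned else ""


example = "wavs/1.wav|Hello there.\n\nwavs/2.wav|Second line.\n"
print(clean_filelist(example))
```

The .npy files themselves still have to exist, so this only helps after the mel-generation step has run.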

    • @lobato87
      @lobato87 3 года назад

      I got this error and I found out I had a misspelling error in one of the wav files

    • @addit6212
      @addit6212 3 года назад

      @@lobato87 what did you misspell because I am having the same issue

    • @addit6212
      @addit6212 3 года назад

      @@massiveeyebrows4482 to your original text file?

    • @massiveeyebrows4482
      @massiveeyebrows4482 3 года назад

      @@addit6212 yeah

  • @editsofthearrowverse3558
    @editsofthearrowverse3558 3 года назад +2

    The words came out sounding like a muffled cow, but the tone of voice was pretty accurate

  • @thefreesoulchannel
    @thefreesoulchannel 2 года назад

    Do both procedures still work? I have Colab Pro, but when I use GPT-Neo or OpenAI, Google Colab crashes.

  • @pimphatwaggoner1655
    @pimphatwaggoner1655 2 года назад +1

    I keep getting an unpickling error when I start training. It fails after about 15 seconds. I read that this is an issue with the newest iteration of Torch, so is there a way to circumvent the issue?

    • @sabah8312
      @sabah8312 2 года назад

      There will be a pretrained_model file whose size should be around 101 MB. Check that it is that size; if it is around 1 KB, you have to download it manually and use that

    • @WeLoveSpigotApi
      @WeLoveSpigotApi 2 года назад

      @@sabah8312 where i can download the pretrained_model file?

    • @WeLoveSpigotApi
      @WeLoveSpigotApi 2 года назад

      @@sabah8312 every time when i'm uploading the 101MB pretrained_model the size goes to 1kb

    • @sabah8312
      @sabah8312 2 года назад

      @@WeLoveSpigotApi drive.google.com/file/d/1rpK8CzAAirq9sWZhe9nlfvxMF1dRgFbF/view

  • @brandonbreault6243
    @brandonbreault6243 3 года назад +3

    Is there a way to run this in real-time? Such as with an offline home assistant? Currently using Python and have it running great, but I want to change the pyttsx3 voice to a custom one. Could I do such a thing with this?

    • @brandonbreault6243
      @brandonbreault6243 3 года назад +1

      Ok, I followed both part one and two precisely but used 81 samples from my own personal project (rushed just to test), and I got absolutely incredible results! It definitely needs some tweaking, but your walkthrough of this is absolutely fantastic. Thank you so much! If you know how I could now use this as a real-time TTS voice with pyttsx3, as mentioned in my last comment, that would be so dang helpful. Thank you so much once again!

    • @williamjustus2654
      @williamjustus2654 3 года назад

      @Brandon Breault were you able to come up with a real-time solution? I am looking at the same need with an animatronics project.

    • @выпускмысль
      @выпускмысль 3 года назад

      @@williamjustus2654
      Maybe this could help you
      github.com/CorentinJ/Real-Time-Voice-Cloning

    • @mdfaizal2289
      @mdfaizal2289 3 года назад

      Hi, did you find a solution for it?

  • @ONEON26
    @ONEON26 2 года назад

    hello, can I make it work with a new language other than mainstream international language? like a dialect?

  • @sickglee3808
    @sickglee3808 2 года назад +1

    Not sure what's wrong, but I got "UnpicklingError: invalid load key".

    • @okru2288
      @okru2288 2 года назад

      Find "train" function in that long chunk of code and replace:
      download_from_google_drive("1c5ZTuT7J08wLUoVZ2KkUs_VdZuJ86ZqA","pretrained_model")
      model = warm_start_model("pretrained_model", model, hparams.ignore_layers)
      with:
      !pip install gdown
      import gdown
      gdown.download('drive.google.com/u/0/uc?export=download&confirm=kZ1A&id=1c5ZTuT7J08wLUoVZ2KkUs_VdZuJ86ZqA', "pretrained_model", quiet=False);
      model = warm_start_model("pretrained_model", model, hparams.ignore_layers)

  • @editsofthearrowverse3558
    @editsofthearrowverse3558 3 года назад +2

    I have 54 wavs do you think it is enough?

  • @manasgaba8452
    @manasgaba8452 3 года назад +1

    If I have 20 wavs, what would be a good epoch number?

  • @eeveeears1589
    @eeveeears1589 2 года назад

    On the put file name here, the set parameters, the check data, AND the training, I get the error: NameError: name 'hparams' is not defined, on the bunch of boring code I get the error: AttributeError: module 'tensorflow' has no attribute 'contrib', on the MEL i get the error: NameError: name 'generate_mels' is not defined. Please help!

  • @giftheck
    @giftheck 3 года назад +2

    I get warnings like this when I check files. Does it really matter?
    "[WARNING] wavs/IOMET003.wav in filelist while expecting .npy ."
    Even restarting, I'm now plagued by further errors. Won't generate mels, file lists come up as missing... Seems I am destined to not get this to work at all.

    • @SSingh-nr8qz
      @SSingh-nr8qz 3 года назад

      They have to be MONO files in order to work. It's very sensitive to that. I ran into the same problem, and after some research found that using stereo causes this issue, as well as not using the right bitrate format.

    • @freehaven-junprince2376
      @freehaven-junprince2376 3 года назад

      I had this after I fixed some errors in the lists.txt. Fix the initial errors in your lists.txt, upload the new version. Delete the wavs folder (or just rename to something else and create a new empty wavs folder), re-import your wavs, and then run everything after downloading tacotron2 again. For me this fixed the problem.

  • @gudguhgii1886
    @gudguhgii1886 2 года назад

    Hi, I have an unusual error. I've been using tacotron2 on Colab for about a year and have only been experiencing this lately. When I'm training a model (I have Colab Pro), it eventually stops and says I've been disconnected from the runtime. So I start everything up again, upload the wavs and filelist, and use the same model name so it resumes from the last checkpoint. It loads to the epoch I was at, but it doesn't warm start the model from checkpoint 'pretrained_model'; it says 0%. Do you know how to fix this error? I would appreciate it, thanks

  • @nice6360
    @nice6360 3 года назад +2

    Is it possible to continue the training later, if so how? Do I need to keep the PC on or the notebook? What happens if I do shutdown the PC or notebook?

    • @freehaven-junprince2376
      @freehaven-junprince2376 3 года назад

      I want to know this too.

    • @hecko-yes
      @hecko-yes 3 года назад

      super late but yes you'll have to keep your pc on, and in fact it'll ask you to do a captcha every few hours (and force-disconnect after 6-12 hours)
      it does however save its progress to google drive every few minutes, if you want to resume training simply restart the notebook with the same model name and folder

    • @hecko-yes
      @hecko-yes 3 года назад +1

      @@bobajerry3397 yeah you do unfortunately
      though for the dataset part, you could make a directory in your google drive (let's say called `dataset`) with a `wavs` directory and a `filelists` directory (which would have the `list.txt`)
      then in your own copy of the training notebook (so it saves) double-click step 3 to access the code and put in the following line of code:
      *!rsync -aP /content/drive/MyDrive/dataset/ /content/tacotron2/*
      (i _think_ that's the right code)
      that way you wouldn't have to reupload the dataset every time (because it'd copy from google drive), you could just run all the cells in order and only have to worry about google drive mounting
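
For anyone who prefers Python to rsync in that cell, here is a sketch of the same copy. It assumes the Drive layout described above (a `dataset` folder containing `wavs` and `filelists`) and a runtime on Python 3.8 or newer, since `dirs_exist_ok` was added in that version. The `sync_dataset` name is made up.

```python
import shutil
from pathlib import Path


def sync_dataset(src, dst):
    """Copy everything under src into dst and return the copied relative paths.

    Same-named files in dst are overwritten; other files in dst are left alone.
    """
    src, dst = Path(src), Path(dst)
    if not src.is_dir():
        raise FileNotFoundError("dataset folder not found: %s" % src)
    shutil.copytree(src, dst, dirs_exist_ok=True)  # requires Python 3.8+
    return sorted(p.relative_to(src).as_posix() for p in src.rglob("*") if p.is_file())


# In Colab, after mounting Drive (paths taken from the comment above):
# sync_dataset("/content/drive/MyDrive/dataset", "/content/tacotron2")
```

Unlike `rsync -a`, this recopies every file on each run, which is fine for a few hundred short clips.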

  • @belowthedot8903
    @belowthedot8903 3 года назад +2

    I'm retraining with more audio files, and when i try and generate spectrograms it gives me:
    "RuntimeError: shape '[1, 1, 51868]' is invalid for input of size 103736"

    • @yoliyanda2860
      @yoliyanda2860 3 года назад +4

      Make sure your audio file is mono/one channel.

    • @bastoscc
      @bastoscc 3 года назад

      @@yoliyanda2860 thank you it worked

    • @TrenchMobbs
      @TrenchMobbs 2 года назад

      @@yoliyanda2860 what do u mean exactly?

  • @mitschcrafter6766
    @mitschcrafter6766 3 года назад +2

    does this work with other languages?

  • @therefinedapple9210
    @therefinedapple9210 2 года назад +1

    The synthesis notebook says that it is broken and to use the updated link. However, when I use the updated one and paste my journal into the textfield and hit the play button, it gives me several errors:
    "NameError: name 'initilized' is not defined"
    "RuntimeError: PytorchStreamReader failed reading zip archive: failed finding central directory"
    What should I do to solve these errors? Any help or feedback from anyone would be greatly appreciated.

    • @nilryth
      @nilryth 2 года назад

      same issue :( you ever figure it out?

    • @therefinedapple9210
      @therefinedapple9210 2 года назад

      @@nilryth I figured it out. You have to change the batch size to under 10 unless you're using a thousand or so clips. That way it doesn't overtrain it and kill it. That's what worked for me at least

  • @K98postman
    @K98postman 3 года назад

    Very good tutorial, you deserved a Like!

  • @RS-tz9fu
    @RS-tz9fu 2 года назад

    When I tried to train the model, it said "UnpicklingError: invalid load key, '

  • @hilmiyalfaruq
    @hilmiyalfaruq 2 года назад +2

    The synthesis notebook doesn't work anymore, it failed to load the warm start model from the LJSpeech pretrained model
    ---------------------------------------------------------------------------
    UnpicklingError Traceback (most recent call last)
    in ()
    5 print('cuDNN Benchmark:', hparams.cudnn_benchmark)
    6 train(output_directory, log_directory, checkpoint_path,
    ----> 7 warm_start, n_gpus, rank, group_name, hparams, log_directory2)
    3 frames
    in train(output_directory, log_directory, checkpoint_path, warm_start, n_gpus, rank, group_name, hparams, log_directory2)
    275 os.path.isfile("pretrained_model")
    276 download_from_google_drive("1c5ZTuT7J08wLUoVZ2KkUs_VdZuJ86ZqA","pretrained_model")
    --> 277 model = warm_start_model("pretrained_model", model, hparams.ignore_layers)
    278 # download LJSpeech pretrained model if no checkpoint already exists
    279
    in warm_start_model(checkpoint_path, model, ignore_layers)
    133 assert os.path.isfile(checkpoint_path)
    134 print("Warm starting model from checkpoint '{}'".format(checkpoint_path))
    --> 135 checkpoint_dict = torch.load(checkpoint_path, map_location='cpu')
    136 model_dict = checkpoint_dict['state_dict']
    137 if len(ignore_layers) > 0:
    /usr/local/lib/python3.7/dist-packages/torch/serialization.py in load(f, map_location, pickle_module, **pickle_load_args)
    606 return torch.jit.load(opened_file)
    607 return _load(opened_zipfile, map_location, pickle_module, **pickle_load_args)
    --> 608 return _legacy_load(opened_file, map_location, pickle_module, **pickle_load_args)
    609
    610
    /usr/local/lib/python3.7/dist-packages/torch/serialization.py in _legacy_load(f, map_location, pickle_module, **pickle_load_args)
    775 "functionality.")
    776
    --> 777 magic_number = pickle_module.load(f, **pickle_load_args)
    778 if magic_number != MAGIC_NUMBER:
    779 raise RuntimeError("Invalid magic number; corrupt file?")
    UnpicklingError: invalid load key, '