A lot of people are pointing out the Tensorflow problem. Apparently, Colab no longer supports the version of Tensorflow on which Tacotron2 runs. After hours of searching, I found a solution in a message board somewhere. In "Download Tacotron," replace "%tensorflow_version 1.x" with "!pip install tensorflow==1.15." That got everything to run smoothly, and I was able to train. In the Synthesis notebook, there's another line that reads "%tensorflow_version 1.x," so I made the same replacement and got it running fine. Hope this helps, folks.
@@GiusePooP I'm definitely not an expert here, but did you make sure to convert wav to npy? I've gotten unpacking errors a lot of times before, and can usually find a solution on a message board somewhere
I just replaced the "1" in "%tensorflow_version 1.x" with "2" and then ran it, and it worked. Though for the boring code part, I've run into errors that won't let the code run correctly, and I don't know how to solve them, so maybe replacing 1 with 2 is a bad idea.
_In fact, I just thought the same. It's more work, but if you plan to do a 30-minute ytp and you can't cut it, training an AI wouldn't be a bad idea. Actually, I plan to use it to create videos like that. In fact, I don't do "ytp" but "ytph", "RUclips Poop Hispano", basically ytp but in Spanish. So I plan to do a ytph of the president of Mexico, and training an AI is not a bad idea; it would save time._
I had an interesting result. I fed the entire dialog of Reinhardt from Overwatch into it (with the breath noises and other laughter removed), but the neural network really latched onto the remaining shouting parts. So the entire result is him just making various shouting noises. I thought it was funny, but it didn't quite work in the end. It was at Epoch 250, and I think the shouting noises are actually the vowel "ai". How fitting!
The graph seems to go between "line goes limp after a few pixels" and "kind of linear", and the loss fluctuates between 0.068635 and 0.069028. What is going on, and how do I tune this to move forward?
When I run check_dataset(params) I get this error: FileNotFoundError: [Errno 2] No such file or directory: 'wavs/1.npy'. I uploaded the files in wav format, and they got renamed to npy. Help!!!!
I really need help here. Can you walk me through installation? His video does not look like what is in the training manual now. I understand Audacity and all that; been using it for years. I just can't figure out this program.
For anyone searching or wondering how long this takes: it's around an hour, but you should train it longer, like two or three hours, for better quality. That applies to 30 audio files with longer transcripts. It's wrong to assume the number of iterations, epochs, or the validation loss matters more than getting a good solid diagonal incline on the graph. Don't stop running until the graph is complete or nearly so. Also, you want to train the loss as close to zero as possible, but if you overtrain it, it will start overfitting, and the validation loss will drastically rise back above where you had dropped it to. And if you just treat iteration 60 as the end of training, you still likely won't be anywhere near where it needs to be. Every session is different.
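One rough way to operationalize "stop before overfitting", rather than chasing a fixed loss number, is to watch whether validation loss has stopped improving. This is just a sketch of that idea; the function name and the patience value are my own, not from the video:

```python
def should_stop(val_losses, patience=3):
    # Stop once the best validation loss is `patience` or more checks old,
    # i.e. the model hasn't improved recently and may be starting to overfit.
    if len(val_losses) <= patience:
        return False
    best_index = val_losses.index(min(val_losses))
    return best_index <= len(val_losses) - 1 - patience

# Loss bottomed out three checks ago, so it's time to stop:
print(should_stop([0.50, 0.40, 0.30, 0.31, 0.32, 0.33]))  # True
```

You would call this on the validation losses the notebook prints, and still eyeball the alignment graph as described above.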
So you're saying to disregard the whole, "Stop training when the number gets to 0.15 or lower"? What we should be looking for instead is a diagonal graph?
I get "ValueError: Tensorflow 1 is unsupported in Colab." any ideas on how to get around this? I'm assuming this tutorial is just out of date, but not sure how to get past this step now.
This is hands down the best video I've seen on how to get started with Tacotron2 and WaveGlow for people who aren't running Linux. Did you write these Colab Notebooks? They were really clear too.
In the notebook, there's a link to ruclips.net/video/LQAOCXdU8p8/видео.html, which is not Cherry Studios' video, so he probably didn't write the Colab notebooks.
@@RS-tz9fu I've tried several times; I keep getting an error at the Check Data cell: FileNotFoundError. I wish I could post a screenshot. What does this mean: list index out of range?
@@tonygosling2592 It's possibly due to an error in the transcript text file of your dataset. Check whether there are any empty lines in the file, or a line that contains only the audio file path or only the transcript instead of both.
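A quick way to spot those broken lines, assuming the usual "audio_path|transcript" filelist format (the helper name here is made up for illustration):

```python
def find_bad_lines(lines):
    # Flag empty lines and lines that don't have exactly
    # "audio_path|transcript" with both fields non-empty.
    bad = []
    for lineno, line in enumerate(lines, start=1):
        parts = line.strip().split("|")
        if len(parts) != 2 or not parts[0] or not parts[1]:
            bad.append(lineno)
    return bad

lines = ["wavs/1.npy|Hello there.", "", "wavs/2.npy|"]
print(find_bad_lines(lines))  # [2, 3]
```

Run it over your list.txt before training and fix whatever line numbers it reports.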
Hey, I was using the boring-code button at 2:30 and it gives me this error: "AttributeError: module 'tensorflow' has no attribute 'contrib'". Any help?
Yeah, I don't think you can get anything other than K80s anymore on colab... At least not on the free version. Supposedly (I don't have it, so can't test) even Pro+ members are getting K80s a lot too. :( Please help update this!!!
The synthesis notebook linked in this description now has the updated (working) one linked at the top, and as for the training notebook, it does actually still work with K80s. You probably realised that over the last 2 months, though.
I think I can help. The '.npy' file is what gets created from your list.txt, so check that your 2.wav is named correctly and also that your list.txt has the entry for 2 labeled correctly. Sorry it's been 3 months since you've had a response. I'm stuck at the missing WaveGlow repository :(
As said before, you have to download the WaveGlow model from the NVIDIA page, put it in your Drive, copy the model's share link (just as you did for your trained model, making it accessible to anyone with the link), and paste that part of the link into the Colab cell.
@@robertovalentino70 I tried downloading the WaveGlow model from the NVIDIA page and it solved that problem, but now in the last # Load WaveGlow part, I'm getting:
waveglow = torch.load(waveglow_pretrained_model)['model']
waveglow.cuda().eval().half()
for k in waveglow.convinv: k.float()
KeyError: 'model'
HELLLOO, ARE YOU HERE???!?!?!? @Cherry Studios, we need help. The program can't find "Load WaveGlow"; how can we fix this? When I click on the "Permission denied:" link, my browser opens a new tab that says "Not Found, Error 404".
Hey Cherry looks like the waveglow pretrained model file being linked from drive is no longer there. Would you perhaps have an update or know where we could look? Thanks for all your help my man
Hello sir, when I execute Select Tacotron model in the Tacotron Synthesis Notebook.ipynb, I cannot download the WaveGlow model (the Tacotron2 model downloads successfully). Is there a solution?
This is the best video with a code walkthrough on Tacotron2 that I have seen to date on RUclips. Thank you so much. One query: which languages can it be trained on?
Is there a way to choose a person whose voice you're trying to emulate, then change the tone in which they're saying it? Like, is it possible to sing a song using your own voice, then run this process, and somehow get the person to sound like they are singing your song instead? Basically, replace your vocals and the way you sing with someone else's voice. Thanks :) great video
Solved by downloading the latest Waveglow model from NVidia and putting it in my Drive, then changing the download link to the one generated in my own Drive.
@@nicoperez8720 go to the NVIDIA models catalog and look for the "Waveglow for PyTorch" pretrained weights (ngc.nvidia.com/catalog/models?orderBy=modifiedDESC&pageNumber=0&query=%20label%3A%22Speech%20Synthesis%22&quickFilter=models&filters=) Next, unzip the model, put it in your Google Drive and rename it "waveglow.pt". From there, copy the share link and make it publicly accessible, just like you do with the Tacotron model. Put the link in the Waveglow download line of code so that it will point to your copy of Waveglow instead of the original one, and you should be good to go.
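If it helps anyone following those steps, here is a small sketch for turning a Drive share link into the direct-download form the notebook expects. The regexes are my assumption about Drive's current URL shapes, and the function name is made up:

```python
import re

def drive_direct_link(share_url):
    # Pull the file id out of either .../file/d/<ID>/view or ...?id=<ID>
    match = (re.search(r"/d/([\w-]+)", share_url)
             or re.search(r"[?&]id=([\w-]+)", share_url))
    if match is None:
        raise ValueError("No Drive file id found in: " + share_url)
    return "https://drive.google.com/uc?export=download&id=" + match.group(1)

print(drive_direct_link("https://drive.google.com/file/d/abc123XYZ/view?usp=sharing"))
# https://drive.google.com/uc?export=download&id=abc123XYZ
```

Paste the resulting URL into the WaveGlow download line, as described above.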
Hey, this is an issue with downloading the file from the Google Drive link. There is a warning message asking whether you're sure you want to download it. I just manually downloaded the file from the link and then loaded it. The download link was drive.google.com/uc?export=download&confirm={confirm_text}&id=1c5ZTuT7J08wLUoVZ2KkUs_VdZuJ86ZqA; then you need to manually rename it to "pretrained_model" (no file extension) and upload it to the tacotron2 folder.
Hi, great tutorial so far! Thanks for sharing your know-how. Once in the Synthesis Notebook tab, the program can't find "Load WaveGlow"; how can I fix this? When I click on the "Permission denied:" link, my browser opens a new tab that says "Not Found, Error 404".
What is the format for best results: short lines in the list (and many small audio clips) or long ones (fewer audio clips)? If I know that I want my character to be able to say specific things very well, is there any benefit to putting that word/phrase alone on its own line? I am guessing the accuracy goes down the longer the line is. Say I am duping a sports broadcaster: I want him to be able to say the names of the sports/teams very well, if nothing else. I have 100 samples so far. I can only run 12 : 1000 epochs; I OOM @ 15:500. Sorry for so many questions. My last one for now is about the syntax of the punctuation. Is it better to put a comma directly after a word, or with a space in between? ("Comma," vs "Comma ,") I am curious whether the algorithm treats words the same when punctuation is attached.
Hey! Awesome tutorial! Very clear. I've got some error, and I'm really confused. When I run the Start Training part, it stops on the line "train(output_directory.....)". It shows me this error: UnpicklingError: invalid load key, '
Question: I've made about four different models already. When you say "You have to wait until .15 or lower", I've been waiting until .144-ish. Is there a benefit of it being much lower than that such as .10 or .08? Can it get that low?
@@elixstrations7147 It actually works way better if you go below .15. I usually stopped around .08 because, back when I was still doing them, my internet would cut out around then and I'd lose my progress. But yes, it works a lot better.
Since these values are representations of the neural network's error, a lower value means a better output, because you always want to keep the error at its minimum. I've created a lot of different NN architectures, and I personally consider values lower than 0.1 an acceptable result. But bear in mind that these error values are not on a universal scale. Some NN architectures and their optimizers can give you good results with values like 0.95.
Any advice for doing it with a different language? E.g., Turkish has letters like ö/ü/ı that differ from English. Do we also need to dig into phonemes? Thanks, btw, for the great tutorial.
At the last step, there is a parameter named 'english_cleaners'. I changed this to... uhm, I don't remember; I'll have to look it up and come back to you later, but it worked for Spanish.
It gives me this error:

ValueError: Caught ValueError in DataLoader worker process 0.
Original Traceback (most recent call last):
  File "/usr/local/lib/python3.7/dist-packages/torch/utils/data/_utils/worker.py", line 202, in _worker_loop
    data = fetcher.fetch(index)
  File "/usr/local/lib/python3.7/dist-packages/torch/utils/data/_utils/fetch.py", line 44, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/usr/local/lib/python3.7/dist-packages/torch/utils/data/_utils/fetch.py", line 44, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/content/tacotron2/data_utils.py", line 61, in __getitem__
    return self.get_mel_text_pair(self.audiopaths_and_text[index])
  File "/content/tacotron2/data_utils.py", line 34, in get_mel_text_pair
    mel = self.get_mel(audiopath)
  File "/content/tacotron2/data_utils.py", line 49, in get_mel
    melspec = torch.from_numpy(np.load(filename))
  File "/usr/local/lib/python3.7/dist-packages/numpy/lib/npyio.py", line 444, in load
    raise ValueError("Cannot load file containing pickled data "
ValueError: Cannot load file containing pickled data when allow_pickle=False
I have a pretrained model which I downloaded, but the voice comes out like he's gasping for breath, and I noticed the right image shows up as lines instead of a diagonal shape.
Help! I've tried downloading the WaveGlow model from the NVIDIA catalog, since the one listed in the code gives me permission denied. But in the # Load WaveGlow part, I'm getting:
waveglow = torch.load(waveglow_pretrained_model)['model']
waveglow.cuda().eval().half()
for k in waveglow.convinv: k.float()
KeyError: 'model'
I downloaded this file for WaveGlow and it worked. My results sound terrible, though, so I think I have to go back to training it. drive.google.com/u/0/uc?id=1rpK8CzAAirq9sWZhe9nlfvxMF1dRgFbF&export=download
When I get to the synthesis part, I get this when I load the Tacotron 2 model; even when I factory-reset the runtime a few times, it still occurs:

RuntimeError Traceback (most recent call last)
<ipython-input> in <module>()
      4 hparams.gate_threshold = 0.1 # Model must be 90% sure the clip is over before ending generation (the higher this number is, the more likely that the AI will keep generating until it reaches the Max Decoder Steps)
      5 model = Tacotron2(hparams)
----> 6 model.load_state_dict(torch.load(tacotron2_pretrained_model)['state_dict'])
      7 _ = model.cuda().eval().half()

/usr/local/lib/python3.7/dist-packages/torch/serialization.py in __init__(self, name_or_buffer)
    240 class _open_zipfile_reader(_opener):
    241     def __init__(self, name_or_buffer) -> None:
--> 242         super(_open_zipfile_reader, self).__init__(torch._C.PyTorchFileReader(name_or_buffer))

RuntimeError: [enforce fail at inline_container.cc:145] . PytorchStreamReader failed reading zip archive: failed finding central directory
Hi! Thank you for your explanations! I'm facing a problem when trying to generate the mels: RuntimeError: min(): Expected reduction dim to be specified for input.numel() == 0. Specify the reduction dim with the 'dim' argument. Have you run into this problem too? Thank you.
I can't get anything other than K80 and I don't know how to change it. Even after the factory reset suggestion, the GPU is exactly the same. Would really suck if that's the only hurdle and it makes this unusable for me... Does anyone have any tips?
Every time I get to the training part, I'm getting an error. I'm doing everything the same as you; everything else passes when I run it, but the last bit, actually training the model, fails straight away:

FP16 Run: False
Dynamic Loss Scaling: True
Distributed Run: False
cuDNN Enabled: True
cuDNN Benchmark: False
Warm starting model from checkpoint 'pretrained_model'

UnpicklingError Traceback (most recent call last)
<ipython-input> in <module>()
      5 print('cuDNN Benchmark:', hparams.cudnn_benchmark)
      6 train(output_directory, log_directory, checkpoint_path,
----> 7       warm_start, n_gpus, rank, group_name, hparams, log_directory2)

/usr/local/lib/python3.7/dist-packages/torch/serialization.py in _legacy_load(f, map_location, pickle_module, **pickle_load_args)
    918             "functionality.")
    919
--> 920     magic_number = pickle_module.load(f, **pickle_load_args)
    921     if magic_number != MAGIC_NUMBER:
    922         raise RuntimeError("Invalid magic number; corrupt file?")

UnpicklingError: invalid load key, '
This doesn't work anymore; can you make another one if possible? Thanks. I'm useless at coding and correcting my mistakes, lol, and there are no in-depth tutorials on RUclips for beginners.
Update: yes, it can talk in another language. I tested with about a minute of audio samples, and it was able to speak back the phrases I gave to the training notebook.
@@lobato87 So it's not pre-trained in English? I thought it must be pre-trained on something, so it can infer the sounds of the letters you don't include in the training set.
@@IanPaulBrossard There's a parameter near the end called 'english_cleaners' that must be changed, and also symbols.py must be changed to the language you need.
@@lobato87 Thank you! Then I guess it would be better if I just ignore the grammar rules and use á é í ó ú on every stressed syllable. Also, I'll try to use a separate set of exclamation sentences, so we can choose regular speech or exclamations as we please (for example, use model 2 for every word between ¡ and !, and use model 1 for everything else).
I finally got my ai thing to work! I spent like 5 hours on it, and it's finally finished! Thanks for the help! ❤️ Edit: I was thinking of doing Homestar Runner characters! XD
I have the same problem. The link that needs to be 'anyone with link' is the trained model, not WaveGlow, so why is it asking me to change permission on WaveGlow? Isn't that coming from a different Google Drive?
Hey, can you help with this? After doing everything, I am not getting audio at the end. Do you know how to resolve this? Also, I am not able to use a T4 GPU, because no matter how many times I do a factory reset, it always shows a K80.
As many other comments state, I reckon there's an error with TensorFlow: "Tensorflow 1 is deprecated, and support will be removed on August 1, 2022". Any idea how to update that?
@@moanxion9102 I have a problem; under "A bunch of boring code and stuff" I get this error:

AttributeError Traceback (most recent call last)
<ipython-input> in <module>
    375
    376 # ---- DEFAULT PARAMETERS DEFINED HERE ----
--> 377 hparams = create_hparams()
    378 model_filename = 'current_model'
    379 hparams.training_files = "filelists/clipper_train_filelist.txt"

/content/tacotron2/hparams.py in create_hparams(hparams_string, verbose)
      6     """Create model hyperparameters. Parse nondefault from given string."""
      7
----> 8     hparams = tf.contrib.training.HParams(
      9         ################################
     10         # Experiment Parameters #

AttributeError: module 'tensorflow' has no attribute 'contrib'
Very good explanation 🤩 Do you know if there are models for sale? I mean models created by other people that could be used in the Synthesis Notebook?
I have a question: the voices I'm using don't have a lot of sources to draw from; I think it's only about 3-4 minutes' worth of original [usable] dialogue. Would I be able to cheat the system by just copy-pasting what I have until I get about 10 minutes or so? Or does it all have to be original dialogue for the program to read it?
@VillaGG "Let it train for 10 hours": so if it reaches ≤0.15 on the training part, that doesn't matter? I just let it go for 10 hours? Because it's reaching 0.15 pretty fast. Also, my previous issue was that I had exported the previous batch of samples from Adobe Premiere with different quality settings than the new batch; it works fine now :)
Hi, Google Colab is saying that the tacotron2 folder doesn't exist. Here's the error code: FileNotFoundError: [Errno 2] No such file or directory: 'tacotron2'
Hi, could you solve this problem? It seems all the synthesis notebooks stopped working all of a sudden: "[Errno 2] No such file or directory: 'merged.dict.txt'". The gdown ID is not working; it's saying "Access denied with the following error: Cannot retrieve the public link of the file. You may need to change the permission to 'Anyone with the link', or have had many accesses. You may still be able to access the file from the browser". Any solution?
@@Notoriousseditz You need the pronunciation dictionary; the other one is broken. Get it with:

# Setup Pronounciation Dictionary
!gdown --id '1E12g_sREdcH5vuZb44EZYX8JjGWQ9rRp'
Use these updated colabs: Training: gist.github.com/Oct4Pie/61781515d3e97f70b52dfef0648d71e7 Synthesis: gist.github.com/Oct4Pie/4e56fa3d5d2c5a4313bdf664597eefc2 If there are any issues, simply comment under the gist.
@@tardigrade184 Hello bro, in your training notebook, training starts from the pretrained model, not from my own audio files and texts. What do I need to do to train from my own audio? Thanks.
I'm having a problem in the Tacotron Synthesis notebook: in the panel just after Initialize Tacotron and WaveGlow, I get this error on model.load_state_dict(torch.load(tacotron2_pretrained_model)['state_dict']): RuntimeError: [enforce fail at inline_container.cc:145] . PytorchStreamReader failed reading zip archive: failed finding central directory. I have reset the runtime and it still doesn't work.
@@brat-b8h I found the solution. This error means that your trained model file is corrupted. You have to train your model again, and you should get a file that weighs around 323 MB.
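A quick sanity check along those lines: a failed or truncated Drive download is often just a tiny HTML error page instead of the real checkpoint. This is a sketch; the 300 MB threshold is my guess based on the ~323 MB figure above, and the function name is made up:

```python
import os

def looks_like_full_checkpoint(path, min_mb=300):
    # A truncated/failed download is typically a few KB of HTML,
    # while a real Tacotron2 checkpoint is hundreds of MB.
    return os.path.getsize(path) >= min_mb * 1024 * 1024

# Example with a deliberately tiny file standing in for a bad download:
with open("fake_model.pt", "wb") as f:
    f.write(b"not a real checkpoint")
print(looks_like_full_checkpoint("fake_model.pt"))  # False
```

If this returns False on your downloaded model, re-download (or retrain) before trying torch.load again.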
Are we able to get a 2022 version? I keep getting TensorFlow 2 errors or something while trying to experiment. I had this working like 2 years ago; now it doesn't work for me. Please help xD
I jumped up to Pro and now get a good GPU every time. The only model I can get to generate sentences that sound like a 'voice' is the HAL 9000 one. All the others produce a weird inter-dimensional singing sound, but unintelligible.
When I get to the "Create MEL spectrograms" part, I run it and it gives me this error: "RuntimeError: shape '[1, 1, 94241]' is invalid for input of size 188482", with the traceback:

RuntimeError Traceback (most recent call last)
<ipython-input> in <module>()
      1 if generate_mels:
----> 2     create_mels()

/content/tacotron2/stft.py in transform(self, input_data)
     82
     83     # similar to librosa, reflect-pad the input
---> 84     input_data = input_data.view(num_batches, 1, num_samples)
     85     input_data = F.pad(
     86         input_data.unsqueeze(1),
I fixed this! But I won't be a dickhead who doesn't tell you how. The problem was with the exported audio files. I did split them, but I had 1 mono and 1 stereo, and I accidentally exported the stereo one. To be extra cautious, I just deleted the stereo track in Audacity so it only exported the mono track.
6:33 I have a problem. It keeps repeating "Maybe you need to change permission over 'Anyone with the link'?" I followed every step, so I don't know what to do now.
@@robertovalentino70 Well, in Italian, unfortunately no. I think there's still something else to change. The problem is I don't understand much of this, hahaha; it's just something I like to do in my spare time. If I find anything out, I'll keep you posted.
@@PARAAA If you need a hand with anything, let me know. Even though I'm a programmer, this is a very complex topic. From what I understand, the software is pre-trained with an English model, so accented characters and their pronunciations are completely missing, but it should still work for other languages. In that case, though, I think you need to feed it many more audio clips so it learns as many pronounced syllables as possible. For example, I've noticed it pronounces many words correctly, if with a robotic tone, while others it pronounces the way an English speaker would.
Hi, when I'm using the synthesis notebook I'm getting this error: "[Errno 2] No such file or directory: 'merged.dict.txt'". The gdown ID is not working; it's saying "Access denied with the following error: Cannot retrieve the public link of the file. You may need to change the permission to 'Anyone with the link', or have had many accesses. You may still be able to access the file from the browser". Any solution?
Hello, I have a problem: when I'm at The Actual Synthesis Part, I enter the text I want it to say, and when I play it back I hear nothing. Any solution?
It says: CUDA out of memory. Tried to allocate 20.00 MiB (GPU 0; 15.90 GiB total capacity; 14.91 GiB already allocated; 29.75 MiB free; 14.99 GiB reserved in total by PyTorch) :(
Maybe it's late, but I found what causes this problem: some of my audio files were apparently too long. I decided to use only the audio files that were 3-10 seconds long, and the problem was solved.
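For anyone wanting to automate that length check, here is a sketch using only the stdlib wave module. The 3-10 second range comes from the comment above; the helper names are my own, and this assumes plain PCM WAV files:

```python
import wave

def clip_seconds(path):
    # Duration in seconds = frame count / sample rate.
    with wave.open(path, "rb") as w:
        return w.getnframes() / w.getframerate()

def keep_clip(path, lo=3.0, hi=10.0):
    return lo <= clip_seconds(path) <= hi

# Demo: write a 2-second silent mono clip and check it.
with wave.open("demo.wav", "wb") as w:
    w.setnchannels(1)
    w.setsampwidth(2)
    w.setframerate(22050)
    w.writeframes(b"\x00\x00" * 22050 * 2)
print(clip_seconds("demo.wav"), keep_clip("demo.wav"))  # 2.0 False
```

You could loop this over the wavs folder and drop anything keep_clip rejects (remembering to remove the same entries from list.txt).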
For some reason, when I run the cells in both notebooks, it doesn't seem to work. I do have my 30 audio files and transcript, as I have already uploaded them, but I don't have those other files that you have. Could you solve this problem?
When I try to set the parameters, it finishes instantly no matter what I set the settings to, as though it's not doing anything. Then in the data check there is an error for every single file (57 of them). I've completely reset everything 5 times and even changed accounts.
Please help!! I did everything right, but when I run the Check Data module, it gives me this error for every wav in the list: "[WARNING] wavs/1.wav in filelist while expecting .npy ."
To those who liked this comment and also have this problem: go to the text file and replace all the .wav filenames with .npy. Idk why you have to do this now, but that's how you get it to work.
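A throwaway sketch of that search-and-replace, assuming the usual "path|transcript" filelist format (it changes only the path field, so a ".wav" inside a transcript is left alone; the function name is made up):

```python
def point_filelist_at_npy(lines):
    # Swap the extension only in the path field, leaving transcripts alone.
    fixed = []
    for line in lines:
        path, sep, text = line.partition("|")
        if path.endswith(".wav"):
            path = path[:-4] + ".npy"
        fixed.append(path + sep + text)
    return fixed

print(point_filelist_at_npy(["wavs/1.wav|Hello.", "wavs/2.wav|A .wav test."]))
# ['wavs/1.npy|Hello.', 'wavs/2.npy|A .wav test.']
```

Read list.txt, run it through this, and write the result back before re-running Check Data.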
I keep getting an unpickling error when I start training. It fails after about 15 seconds. I read that this is an issue with the newest iteration of Torch, so is there a way to circumvent the issue?
There will be a pretrained_model file whose size is around 101 MB. Check that it's about that size; if it is around 1 KB, then you have to manually download it and use that.
Is there a way to run this in real-time, such as with an offline home assistant? I'm currently using Python and have it running great, but I want to change the pyttsx3 voice to a custom one. Could I do such a thing with this?
Ok, I followed both part one and two precisely, but used 81 samples from my own personal project (rushed, just to test), and I got absolutely incredible results! It definitely needs some tweaking, but your walkthrough of this is absolutely fantastic. Thank you so much! If you know how I could now use this as a real-time TTS voice with pyttsx3, like I mentioned in my last comment, that would be so dang helpful. Thank you so much once again!
Find the "train" function in that long chunk of code and replace:

download_from_google_drive("1c5ZTuT7J08wLUoVZ2KkUs_VdZuJ86ZqA","pretrained_model")
model = warm_start_model("pretrained_model", model, hparams.ignore_layers)

with:

!pip install gdown
import gdown
gdown.download('https://drive.google.com/u/0/uc?export=download&confirm=kZ1A&id=1c5ZTuT7J08wLUoVZ2KkUs_VdZuJ86ZqA', "pretrained_model", quiet=False)
model = warm_start_model("pretrained_model", model, hparams.ignore_layers)
On the Put File Name Here, Set Parameters, Check Data, AND Training cells, I get the error: NameError: name 'hparams' is not defined. On the bunch of boring code, I get the error: AttributeError: module 'tensorflow' has no attribute 'contrib'. On the MEL cell, I get the error: NameError: name 'generate_mels' is not defined. Please help!
I get warnings like this when I check files; does it really matter? "[WARNING] wavs/IOMET003.wav in filelist while expecting .npy ." Even after restarting, I'm now plagued by further errors: it won't generate mels, file lists come up as missing... It seems I am destined not to get this to work at all.
They have to be MONO files in order to work; it's very sensitive to that. I ran into the same problem, and after some research found that using stereo causes this issue, as does not using the right bitrate format.
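If re-exporting everything from Audacity is a pain, the usual naive downmix is just averaging the two channels. This is only an illustrative sketch on already-decoded sample pairs (the function name is made up, and real WAV files would still need decoding/encoding around it):

```python
def to_mono(frames):
    # frames: list of (left, right) sample pairs -> list of averaged samples.
    return [(left + right) / 2 for left, right in frames]

print(to_mono([(1, 3), (2, 4)]))  # [2.0, 3.0]
```

The same averaging idea is what audio editors apply when you choose "mix down to mono".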
I had this after I fixed some errors in list.txt. Fix the initial errors in your list.txt and upload the new version. Delete the wavs folder (or just rename it and create a new empty wavs folder), re-import your wavs, and then run everything after downloading tacotron2 again. For me this fixed the problem.
Hi, I have an unusual error. I've been using Tacotron2 on Colab for about a year and have only been experiencing this lately. When I'm training a model (I have Colab Pro), it will eventually run out, saying I've been disconnected from the runtime. So I start everything up again, upload the wavs and filelist with the same model name so it resumes from the last checkpoint. It loads to the epoch I was at, but it doesn't warm-start the model from the 'pretrained' checkpoint; it says 0%. Do you know how to fix this error? I would appreciate it, thanks.
Is it possible to continue the training later, if so how? Do I need to keep the PC on or the notebook? What happens if I do shutdown the PC or notebook?
Super late, but yes, you'll have to keep your PC on, and in fact it'll ask you to do a captcha every few hours (and force-disconnect after 6-12 hours). It does, however, save its progress to Google Drive every few minutes; if you want to resume training, simply restart the notebook with the same model name and folder.
@@bobajerry3397 Yeah, you do, unfortunately. For the dataset part, though, you could make a directory in your Google Drive (let's say called `dataset`) with a `wavs` directory and a `filelists` directory (which would have the `list.txt`). Then, in your own copy of the training notebook (so it saves), double-click step 3 to access the code and put in the following line: *!rsync -aP /content/drive/MyDrive/dataset/ /content/tacotron2/* (i _think_ that's the right code). That way you wouldn't have to re-upload the dataset every time (because it'd copy from Google Drive); you could just run all the cells in order and only have to worry about Google Drive mounting.
I'm retraining with more audio files, and when i try and generate spectrograms it gives me: "RuntimeError: shape '[1, 1, 51868]' is invalid for input of size 103736"
The synthesis notebook says that it is broken and to use the updated link. However, when I use the updated one, paste my journal into the text field, and hit the play button, it gives me several errors: "NameError: name 'initilized' is not defined" and "RuntimeError: PytorchStreamReader failed reading zip archive: failed finding central directory". What should I do to solve these errors? Any help or feedback from anyone would be greatly appreciated.
@@nilryth I figured it out. You have to change the batch size to under 10 unless you're using a thousand or so clips. That way it doesn't overtrain and kill the model. That's what worked for me, at least.
I tried it and it worked, but for some reason I'm also having an issue with an "UnpicklingError: invalid load key, '
THX
@@GiusePooP Me too
This is the future of YTP sentence mixing.
I'm gonna use it and put myself out of a job
@8 haha yeah I was kidding. Nothing beats sentence mixing
shitposting 2077
I had an interesting result. I fed the entire dialog of Reinhardt from Overwatch into it (with the breath noises and other laughter removed), but the neural network really latched onto the remaining shouting parts. So the entire result is him just making various shouting noises. I thought it was funny, but it didn't quite work in the end.
It was at Epoch 250, and I think the shouting noises are actually the vowel "ai". How fitting!
The graph seems to go between "line goes limp after few pixels" and "Kind of linear" and the loss fluctuates between 0.068635 and 0.069028. What is going on and how do I tune this to move forward?
When I run check_dataset(params) I get this error: FileNotFoundError: [Errno 2] No such file or directory: 'wavs/1.npy'. I uploaded the files in wav format, but it's looking for npy. Help!!!!
@@sravanidandu794 convert the text file txt to npy
I really need help here. Can you walk me through installation? His video does not look like what is in the training manual now. I understand Audacity and all that; been using it for years. I just can't figure out this program.
For anyone wondering how long this takes: around an hour, but you should train longer, like two or three hours, for better quality. That applies to about 30 audio files with longer transcripts. It's wrong to treat the number of iterations, epochs, or the validation loss as what matters most until you have a good, solid diagonal on the alignment graph; don't stop running until the graph is complete or near it. You also want to train the loss as close to zero as possible, but if you overtrain, it starts overfitting and the validation loss will rise drastically above where you had dropped it. And if you just follow "iteration 60 is the end of training", you still likely won't be near where it needs to be. Every session is different.
So you're saying to disregard the whole, "Stop training when the number gets to 0.15 or lower"? What we should be looking for instead is a diagonal graph?
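The stopping rule being debated here can be sketched as a simple patience check (a rough illustration, not the notebook's actual logic; all names are made up): stop once validation loss hasn't improved for a few consecutive checks, since overfitting makes it rise again.

```python
# Rough early-stopping sketch matching the advice above (names are made up,
# not from the notebook): stop once validation loss hasn't improved for
# `patience` consecutive checks, since overfitting makes it rise again.
def should_stop(val_losses, patience=3):
    if len(val_losses) <= patience:
        return False
    best_before = min(val_losses[:-patience])
    # every recent check is at least as bad as the earlier best -> plateau/rise
    return all(v >= best_before for v in val_losses[-patience:])

print(should_stop([0.30, 0.20, 0.16, 0.17, 0.18, 0.19]))  # → True
print(should_stop([0.30, 0.20, 0.16, 0.15, 0.14, 0.13]))  # → False
```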
How can you get the training save if it's on a Colab notebook? Where does it get saved? On Drive?
Can anybody help me with this error?
FP16 Run: False
Dynamic Loss Scaling: True
Distributed Run: False
cuDNN Enabled: True
cuDNN Benchmark: False
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 1555 100 1555 0 0 101k 0 --:--:-- --:--:-- --:--:-- 108k
Warm starting model from checkpoint 'pretrained_model'
---------------------------------------------------------------------------
UnpicklingError Traceback (most recent call last)
in ()
5 print('cuDNN Benchmark:', hparams.cudnn_benchmark)
6 train(output_directory, log_directory, checkpoint_path,
----> 7 warm_start, n_gpus, rank, group_name, hparams, log_directory2)
3 frames
in train(output_directory, log_directory, checkpoint_path, warm_start, n_gpus, rank, group_name, hparams, log_directory2)
275 os.path.isfile("pretrained_model")
276 download_from_google_drive("1c5ZTuT7J08wLUoVZ2KkUs_VdZuJ86ZqA","pretrained_model")
--> 277 model = warm_start_model("pretrained_model", model, hparams.ignore_layers)
278 # download LJSpeech pretrained model if no checkpoint already exists
279
in warm_start_model(checkpoint_path, model, ignore_layers)
133 assert os.path.isfile(checkpoint_path)
134 print("Warm starting model from checkpoint '{}'".format(checkpoint_path))
--> 135 checkpoint_dict = torch.load(checkpoint_path, map_location='cpu')
136 model_dict = checkpoint_dict['state_dict']
137 if len(ignore_layers) > 0:
/usr/local/lib/python3.7/dist-packages/torch/serialization.py in load(f, map_location, pickle_module, **pickle_load_args)
606 return torch.jit.load(opened_file)
607 return _load(opened_zipfile, map_location, pickle_module, **pickle_load_args)
--> 608 return _legacy_load(opened_file, map_location, pickle_module, **pickle_load_args)
609
610
/usr/local/lib/python3.7/dist-packages/torch/serialization.py in _legacy_load(f, map_location, pickle_module, **pickle_load_args)
775 "functionality.")
776
--> 777 magic_number = pickle_module.load(f, **pickle_load_args)
778 if magic_number != MAGIC_NUMBER:
779 raise RuntimeError("Invalid magic number; corrupt file?")
UnpicklingError: invalid load key, '
I get "ValueError: Tensorflow 1 is unsupported in Colab." any ideas on how to get around this? I'm assuming this tutorial is just out of date, but not sure how to get past this step now.
I get the same error too, could you find a way to solve this?
@@omerozer7000 You just need to change the first line of the code that says 'tensorflow_version 1.x' to 'tensorflow_version 2.x'
Tensorflow 1 is no longer supported in Colab. So this tutorial sadly doesn't work anymore!
Is it possible to download the code and make it locally on pc?
If your model is moaning, just change False to True in the synthesis notebook (the last cell, where you put the text in).
This is hands down the best video I've seen on how to get started with Tacotron2 and WaveGlow for people who aren't running Linux. Did you write these Colab Notebooks? They were really clear too.
in the notebook, there's a link to ruclips.net/video/LQAOCXdU8p8/видео.html which is not Cherry Studios's video so probably he didn't write the Colab notebooks.
@@RS-tz9fu I've tried several times and I keep getting an error at the {Check Data} cell: FileNotFoundError. I wish I could post a screenshot.
What does this mean: list index out of range?
@@tonygosling2592 It's possibly due to an error in the transcript text file of your dataset. Check whether the text file has any empty lines, or a line that contains only the path to the audio file or only the transcript.
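A quick way to hunt down the bad lines described above: a sketch assuming the usual `path|transcript` filelist format (`find_bad_lines` is a made-up helper, not notebook code).

```python
# Sketch of a filelist checker for the "path|transcript" format these
# notebooks use ("find_bad_lines" is a made-up helper). Empty lines, or
# lines missing either half, are what trigger "list index out of range"
# when the loader splits each line on "|".
def find_bad_lines(lines):
    bad = []
    for i, line in enumerate(lines, start=1):
        parts = line.rstrip("\n").split("|")
        if len(parts) < 2 or not parts[0] or not parts[1]:
            bad.append(i)
    return bad

sample = ["wavs/1.wav|Hello there.\n", "\n", "wavs/2.wav\n"]
print(find_bad_lines(sample))  # → [2, 3]
```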
Thank you man. I was looking for something like this for a year now.
Hey, I was using the boring-code button at 2:30 and it gives me this error: "AttributeError: module 'tensorflow' has no attribute 'contrib'". Any help?
Also, in the next 2 steps (name the model and set the parameters) it says: "name 'hparams' is not defined"
Yeah, I don't think you can get anything other than K80s anymore on colab... At least not on the free version. Supposedly (I don't have it, so can't test) even Pro+ members are getting K80s a lot too. :(
Please help update this!!!
The synthesis notebook linked in this description now has the updated (working) one linked at the top, and as for the training notebook, it does actually still work with K80s.
You probably realised that over the last 2 months, though.
Restart Runtime. System32/error.py
Brilliant description of how to make a model, thanks a lot
i got a runtime error in MEL spectrograms! any reason why?
Hello, I have a problem in colab [Errno 2] No such file or directory: 'wavs/2.npy' can you help me ?
I think I can help. the '.npy' file is what is created by your list.txt, so check to make sure your 2.wav is named correctly and also that your list.txt has the 2 text correctly labeled. Sorry it's been 3 months since you've had a response. I'm stuck at the missing waveglow repository :(
Does anyone else have a problem downloading the pre-trained waveglow model? I think the current link can't be used anymore.
As said before, you have to download the waveglow model from the NVIDIA page, put it in your Drive, copy the link of the model (making it accessible to anyone with the link, as you did for your trained model), and paste the relevant part of the link into the Colab cell.
@@robertovalentino70 I tried downloading the waveglow model from the NVIDIA page and it solved that problem, but now in the last # load waveglow part I'm getting:
waveglow = torch.load(waveglow_pretrained_model)['model']
waveglow.cuda().eval().half()
for k in waveglow.convinv:
k.float()
KeyError: 'model'
@@JaxonPham Hey, did you somehow solve this?
@@trickster444 try using 1rpK8CzAAirq9sWZhe9nlfvxMF1dRgFbF as your pasted waveglow code
HELLLOO ARE YOU HERE???!?!?!? @Cherry Studios we need help. The program can't find "Load WaveGlow"; how can we fix this? When I click on the "Permission denied:" link, my browser opens a new tab which states "Not Found, Error 404".
NICE EXPLANATION, does it work with other languages?
I tried making an American voice speak a German text, which sounds the same as a real American trying to speak German. I guess it does.
@@MrGTAmodsgerman Is it possible with Russian too?
@@Unknown-rx3br I never said it's possible. I said I guess so.
@@Unknown-rx3br yes, i've seen a video about it
Hey Cherry looks like the waveglow pretrained model file being linked from drive is no longer there. Would you perhaps have an update or know where we could look? Thanks for all your help my man
1rpK8CzAAirq9sWZhe9nlfvxMF1dRgFbF
Use this as an alternative.
@@TechnologyGuyOfficial Works perfectly, but I can't download the audio at all; the three-dot menu doesn't show.
@@TechnologyGuyOfficial hey brother, isn't this quite smaller than the file linked to the original link
@@TechnologyGuyOfficial Thank you very much.
When I play the audio I don't hear anything. I have tried different models, but I don't hear anything :(. Even when using the new waveglow link.
It's telling me permission is denied on WaveGlow stuff, and when I click the link it says it is not found...
I'm having this same issue here!
@@amantedabahia same
same
Came for the tutorial, stayed for your awesome voice.
On the last step of the training notebook it won't even begin, it's giving me an Unpickling Error.
Hello sir, when I execute Select Tacotron model in Tacotron Synthesis Notebook.ipynb, I cannot download waveglow Model (Tacotron2 Model has been downloaded successfully). Is there a solution?
This is the best video with code walkthrough on Tacotron2 , I have seen till date in RUclips. Thank you so much.
One query I have: which languages can it be trained on?
Is there a way to choose a person whose voice you're trying to emulate, then change the tone in which they say it?
Like, is it possible to sing a song in your own voice, run this process, and somehow get the person to sound like they're singing your song instead? Basically, replace your vocals and the way you sing with someone else's.
Thanks :) great video
Yes with talknet
I am getting a "Permission denied" error when trying to download the Waveglow model.
Solved by downloading the latest Waveglow model from NVidia and putting it in my Drive, then changing the download link to the one generated in my own Drive.
@@0TheDarkness0 Thank you for confirming my earlier Post.
@@0TheDarkness0 How do you do it?
@@nicoperez8720 go to the NVIDIA models catalog and look for the "Waveglow for PyTorch" pretrained weights (ngc.nvidia.com/catalog/models?orderBy=modifiedDESC&pageNumber=0&query=%20label%3A%22Speech%20Synthesis%22&quickFilter=models&filters=)
Next, unzip the model, put it in your Google Drive and rename it "waveglow.pt".
From there, copy the share link and make it publicly accessible, just like you do with the Tacotron model. Put the link in the Waveglow download line of code so that it will point to your copy of Waveglow instead of the original one, and you should be good to go.
@@0TheDarkness0 tysm
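If it helps, extracting the file ID from a Drive share link (the part you paste into the notebook's download cell) can be done like this; a sketch where the URL is a placeholder, not the real model link, and `drive_file_id` is a made-up helper.

```python
# Sketch: pulling the file ID out of a Google Drive share link so it can be
# pasted into the notebook's download cell. The URL below is a placeholder,
# not the real Waveglow link.
import re

def drive_file_id(share_url):
    # Drive links carry the ID either after "/d/" or in an "id=" parameter
    m = re.search(r"(?:/d/|id=)([A-Za-z0-9_-]+)", share_url)
    return m.group(1) if m else None

print(drive_file_id("https://drive.google.com/file/d/ABC123xyz/view?usp=sharing"))
# → ABC123xyz
```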
I keep running into an unpickling error with an invalid load key of "
I still find myself running into the same errors as a day before. Any suggestions?
FP16 Run: False
Dynamic Loss Scaling: True
Distributed Run: False
cuDNN Enabled: True
cuDNN Benchmark: False
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 1555 100 1555 0 0 86388 0 --:--:-- --:--:-- --:--:-- 86388
Warm starting model from checkpoint 'pretrained_model'
---------------------------------------------------------------------------
UnpicklingError Traceback (most recent call last)
in ()
5 print('cuDNN Benchmark:', hparams.cudnn_benchmark)
6 train(output_directory, log_directory, checkpoint_path,
----> 7 warm_start, n_gpus, rank, group_name, hparams, log_directory2)
3 frames
/usr/local/lib/python3.7/dist-packages/torch/serialization.py in _legacy_load(f, map_location, pickle_module, **pickle_load_args)
918 "functionality.")
919
--> 920 magic_number = pickle_module.load(f, **pickle_load_args)
921 if magic_number != MAGIC_NUMBER:
922 raise RuntimeError("Invalid magic number; corrupt file?")
UnpicklingError: invalid load key, '
Running into the same problem here. Have you solved it? Thanks in advance
@@ruben5440 Sorry, despite making multiple attempts, I haven't been able to.
Hey, this is an issue with downloading the file from the Google Drive link. There is some warning message to alert you whether you're sure you want to download it. I just manually downloaded the file from the link, and then loaded it. The download link was drive.google.com/uc?export=download&confirm={confirm_text}&id=1c5ZTuT7J08wLUoVZ2KkUs_VdZuJ86ZqA and then you need to manually rename it to "pretrained_model" (no file format) then upload it to the tacotron2 folder
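A hedged sanity check for that failure mode (`looks_like_html` is a made-up helper): if the start of the downloaded file is an HTML page, you got Drive's confirmation screen instead of the checkpoint, which is exactly what produces the UnpicklingError people are hitting.

```python
# Made-up helper: sniff whether a downloaded "checkpoint" is actually
# Google Drive's HTML confirmation page rather than the binary model file.
def looks_like_html(first_bytes):
    head = first_bytes[:64].lstrip().lower()
    return head.startswith(b"<!doctype html") or head.startswith(b"<html")

print(looks_like_html(b"<!DOCTYPE html><html><head>..."))  # → True
print(looks_like_html(b"\x80\x02}q\x00"))                  # → False
```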
Hi, great tutorial so far! Thanks for sharing your know-how. Once in the Synthesis Notebook tab, the program can't find "Load WaveGlow"; how can I fix this? When I click on the "Permission denied:" link. my browser opens a new tab which states "Not Found, Error 404".
bro. have you got new link for waveglow
Did you get new link ?
@@iliatugushi Waveglow is hard
What is the format for best results? Short lines in the list (and many small audio clips) or long ones (fewer audio clips)? If I know that I want my character to be able to say specific things very well, Is there any benefit to putting that word / phrase alone on it's own line? I am guessing the accuracy goes down, the longer the line is. Say I am duping a sports broadcaster. I want him to be able to say the names of the sports / teams very well; if nothing else.
I have 100 samples so far. I can only run 12 : 1000 epoches. I OOM @ 15:500. Sorry for so many questions. My last one for now is about the syntax of the punctuation. Is it better to comma directly after a word, or with a space in between. ( Comma, vs Comma , ) I am curious if the algorithm treats words the same if it has punctuation attached.
I keep on getting K80, even though I reset the factory runtime a lot of times. Do you have any tips to help me?
I am too. Did you ever figure it out?
@@monkadude15 Nope.
@@clunkster danget. I guess you didn’t find a workaround lol
@CookieBoy were you able to solve it, coz without tesla t4 no sound is being generated?
@@nimishbansal4752 No.
Hey! Awesome tutorial! Very clear.
I've got some error, and I'm really confused. When I run the Start Training part, it stops on the line "train(output_directory.....)". It shows me this error: UnpicklingError: invalid load key, '
Hi! Me too, have you gotten a solution yet?
Same, same error as well
Hi, me too. Have you solved it yet?
For those getting the pickle error UnpicklingError: invalid load key, '
@@tardigrade184 What did you upload the file as? pretrained_model.pt, or without the file extension?
Question: I've made about four different models already. When you say "You have to wait until .15 or lower", I've been waiting until .144-ish. Is there a benefit of it being much lower than that such as .10 or .08? Can it get that low?
Have you tested this yourself because I'm wondering that too.
@@elixstrations7147 It actually works way better if you go lower than .15. I usually stop around .08 because, back when I was still doing them, my internet would cut out around then and I'd lose my progress. But yes, it works a lot better.
Since these values represent neural-network error, a lower value means a better output, because you always want to keep the error at its minimum. I've created a lot of different NN architectures, and I personally consider values lower than 0.1 an acceptable result. But bear in mind that these error values are not on a universal scale: some NN architectures and their optimizers can give you good results with values like 0.95.
@@MrTony2371 I'll have to try and see what kind of limits I can put on Tacotron next time I use it.
I'm in a HUGE challenge with UnpicklingError "
Any advice for doing this with a different language? E.g., Turkish has letters like ö/ü/ı that differ from English. Do we also need to dig into phonemes? Thanks btw for the great tutorial.
at the last step, there is a parameter named 'english_cleaners', I changed this to... uhm, I don't remember, gonna have to look it up and come back to you later; but it worked for spanish.
@@lobato87 Thanks, I found it. Tried it in Turkish and it also works: transliteration_cleaners.
@@ilhanmertalan640 exactly! transliteration_cleaners will interpret UTF-8 characters. Cheers!
Also change symbols.py to use your language's alphabet.
@@lobato87 Hi, I wanted to try it with the Thai language. What do I need to change?
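The symbols.py change discussed above amounts to extending the symbol set; a toy illustration (the real edit goes in tacotron2/text/symbols.py, where the symbol set is built from a letters string, and the letters shown here are just examples).

```python
# Toy illustration of extending the symbol set as described above; the real
# edit goes in tacotron2/text/symbols.py. The letters here are examples only.
_letters = "abcdefghijklmnopqrstuvwxyz"
_extra = "öüı"  # add your language's characters (Turkish shown here)
symbols = list(_letters + _extra)
print("ö" in symbols)  # → True
```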
Does the training run on your machine or the server?
Great tutorial, thank you! I get an error when running the training model: UnpicklingError: invalid load key, '
I got it too
Same here
@@tim3780 I think it stopped working
same here. Anyone has a solution?
Yep, same to me.
It gives me this error
ValueError: Caught ValueError in DataLoader worker process 0.
Original Traceback (most recent call last):
File "/usr/local/lib/python3.7/dist-packages/torch/utils/data/_utils/worker.py", line 202, in _worker_loop
data = fetcher.fetch(index)
File "/usr/local/lib/python3.7/dist-packages/torch/utils/data/_utils/fetch.py", line 44, in fetch
data = [self.dataset[idx] for idx in possibly_batched_index]
File "/usr/local/lib/python3.7/dist-packages/torch/utils/data/_utils/fetch.py", line 44, in
data = [self.dataset[idx] for idx in possibly_batched_index]
File "/content/tacotron2/data_utils.py", line 61, in __getitem__
return self.get_mel_text_pair(self.audiopaths_and_text[index])
File "/content/tacotron2/data_utils.py", line 34, in get_mel_text_pair
mel = self.get_mel(audiopath)
File "/content/tacotron2/data_utils.py", line 49, in get_mel
melspec = torch.from_numpy(np.load(filename))
File "/usr/local/lib/python3.7/dist-packages/numpy/lib/npyio.py", line 444, in load
raise ValueError("Cannot load file containing pickled data "
ValueError: Cannot load file containing pickled data when allow_pickle=False
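A common cause in these threads is a .wav that was renamed to .npy instead of being converted. Real NumPy files start with the magic bytes `b"\x93NUMPY"`, so a quick sniff test (`is_real_npy` is a made-up helper; the demo file is created locally):

```python
# Made-up helper: check whether a ".npy" file is a real NumPy array file
# or just a renamed .wav. NumPy files begin with the magic b"\x93NUMPY".
def is_real_npy(path):
    with open(path, "rb") as f:
        return f.read(6) == b"\x93NUMPY"

with open("1.npy", "wb") as f:
    f.write(b"RIFF....WAVE")     # a renamed .wav header, not an array
print(is_real_npy("1.npy"))     # → False
```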
Bump. Does it work with other languages? The algorithm is the same; I just need info on models for different languages.
I have a pretrained model that I downloaded, but the voice comes out like he's gasping for breath, and I noticed the right image shows up more as lines instead of a diagonal shape.
help! I've tried downloading the waveglow model from the NVIDIA catalog, since the one listed in the code gives me permission denied. But in the # load waveglow part, I'm getting:
waveglow = torch.load(waveglow_pretrained_model)['model']
waveglow.cuda().eval().half()
for k in waveglow.convinv:
k.float()
KeyError: 'model'
I've been getting the same exact thing
Same here, someone pls HELP
"1rpK8CzAAirq9sWZhe9nlfvxMF1dRgFbF"
use this for waveglow, the same as the other model
@@Byoncnc HOLY CRAP TYSM
I downloaded this file for the waveglow and it worked. my results sound terrible though so i gotta go back to training it i think. drive.google.com/u/0/uc?id=1rpK8CzAAirq9sWZhe9nlfvxMF1dRgFbF&export=download
Question: does it work with other languages too? Or does only with English?
When I get to the synthesis part, I get this when I load the Tacotron 2 model, even when I reset the factory runtime a few times, it still occurs:
---------------------------------------------------------------------------
RuntimeError Traceback (most recent call last)
in ()
4 hparams.gate_threshold = 0.1 # Model must be 90% sure the clip is over before ending generation (the higher this number is, the more likely that the AI will keep generating until it reaches the Max Decoder Steps)
5 model = Tacotron2(hparams)
----> 6 model.load_state_dict(torch.load(tacotron2_pretrained_model)['state_dict'])
7 _ = model.cuda().eval().half()
1 frames
/usr/local/lib/python3.7/dist-packages/torch/serialization.py in __init__(self, name_or_buffer)
240 class _open_zipfile_reader(_opener):
241 def __init__(self, name_or_buffer) -> None:
--> 242 super(_open_zipfile_reader, self).__init__(torch._C.PyTorchFileReader(name_or_buffer))
243
244
RuntimeError: [enforce fail at inline_container.cc:145] . PytorchStreamReader failed reading zip archive: failed finding central directory
Hi! Thank you for your explanations!
I'm facing a problem when trying to generate the mels:
RuntimeError: min(): Expected reduction dim to be specified for input.numel() == 0. Specify the reduction dim with the 'dim' argument.
Have you experienced this problem too? Thank you.
That sounds like something may have been deleted from your file system somehow, I'd reset and try again.
The training is giving me an unpickling error:
UnpicklingError: invalid load key, '
I can't get anything other than K80 and I don't know how to change it. Even after the factory reset suggestion, the GPU is exactly the same. Would really suck if that's the only hurdle and it makes this unusable for me... Does anyone have any tips?
I have the same problem
Same problem here too. It was working a couple months ago... can anyone help us out?
Every time I get to the training part I'm getting an error, even though I'm doing everything the same as you.
Everything else passes when I run it, but when I get to the last bit of actually training the model, it fails straight away.
FP16 Run: False
Dynamic Loss Scaling: True
Distributed Run: False
cuDNN Enabled: True
cuDNN Benchmark: False
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 1555 100 1555 0 0 6973 0 --:--:-- --:--:-- --:--:-- 7004
Warm starting model from checkpoint 'pretrained_model'
---------------------------------------------------------------------------
UnpicklingError Traceback (most recent call last)
in ()
5 print('cuDNN Benchmark:', hparams.cudnn_benchmark)
6 train(output_directory, log_directory, checkpoint_path,
----> 7 warm_start, n_gpus, rank, group_name, hparams, log_directory2)
3 frames
/usr/local/lib/python3.7/dist-packages/torch/serialization.py in _legacy_load(f, map_location, pickle_module, **pickle_load_args)
918 "functionality.")
919
--> 920 magic_number = pickle_module.load(f, **pickle_load_args)
921 if magic_number != MAGIC_NUMBER:
922 raise RuntimeError("Invalid magic number; corrupt file?")
UnpicklingError: invalid load key, '
This does not work anymore; can you make another one if possible? Thanks. I'm useless at coding and correcting my mistakes lol, and there are no in-depth tutorials on RUclips for beginners.
I have an error in synthesis part: No such file or directory: 'waveglow.pt' . What should I do?
I have a question: after training, can I add more samples to the old ones so that they train together?
I want to do this in Latin American Spanish. Is this notebook pretrained in English?
update: yes it can talk in other language, tested with about a minute of audio samples and it was able to talk back to me the phrases I gave to the training notebook
@@lobato87 So it's not pre-trained in English? I thought it must be pre-trained with something, so it can infer the sounds of the letters you don't include in the training set.
@@IanPaulBrossard there's a parameter near the end called 'english_cleaners' that must be changed and also the symbols.py that must be changed to the language you need
@@lobato87 thank you! Then I guess it would be better if I just ignore the grammar rules and use á é í ó ú in every stressed syllable. Also, I'll try to use a separate set of exclamation sentences, so we can choose regular speech or exclamations as we please (for example, use model 2 for every phrase between ¡ and ! and model 1 for everything else)!
@@IanPaulBrossard those are excellent things to try!
I finally got my ai thing to work! I spent like 5 hours on it, and it's finally finished! Thanks for the help! ❤️
Edit: I was thinking of doing Homestar Runner characters! XD
No way!!! Another homestar runner fan? And this was commented RECENTLY XD ! I was actually watching this so i could do it for Coach Z's voice!! Hahaha
@@tuvstarr5157 That’s the one I was gonna do! XD
I have the same problem,
the link that needs to be 'anyone with link' is the trained model, not waveglow - so why is it asking me to change permission on waveglow? isn't that coming from a different google drive?
@@tuvstarr5157 Oh my gosh, someone made a Homestar Runner on Uberduck Ai the same day you and me made our comments..
Hey, can you help with this? After doing everything, I'm not getting audio at the end; do you know how to resolve this? Also, I'm not able to use the T4 GPU because no matter how many times I do a factory reset, it always shows K80.
As many other comments state, I reckon there's an error with TensorFlow: "Tensorflow 1 is deprecated, and support will be removed on August 1, 2022". Any idea how to update that?
Yepp I also got "Tensorflow 1 is unsupported in Colab." So this tutorial sadly doesn't work anymore!
@@Agnostic_Asi just edit the code to Tensorflow 2 as easy as that :)
@@moanxion9102 thank you
@@moanxion9102 I have a problem, under "A bunch of boring code and stuff" I get this error:
AttributeError Traceback (most recent call last)
in
375
376 # ---- DEFAULT PARAMETERS DEFINED HERE ----
--> 377 hparams = create_hparams()
378 model_filename = 'current_model'
379 hparams.training_files = "filelists/clipper_train_filelist.txt"
/content/tacotron2/hparams.py in create_hparams(hparams_string, verbose)
6 """Create model hyperparameters. Parse nondefault from given string."""
7
----> 8 hparams = tf.contrib.training.HParams(
9 ################################
10 # Experiment Parameters #
AttributeError: module 'tensorflow' has no attribute 'contrib'
For some reason it gives me a "FileNotFoundError: [Errno 2] No such file or directory: '/wavs/1.npy'" error in the check dataset phase. Any ideas?
You have to make sure you don't have any blank lines in your txt file.
@@VinxGD I don't, what's wrong then?
@@Miuzi Hm, I'm sorry, I don't know what else could be wrong. I had the same problem and that worked for me.
I had the same issue and fixed it by setting the sample rate to 22050 Hz in Audacity; then it worked for me.
@@BigDraco-So it was already 22050, so that doesn't work for me.
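Checking the properties discussed above can be done with the stdlib `wave` module; a sketch where a tiny 22050 Hz mono file is generated just so the check has input (`wav_info` is a made-up helper).

```python
# Checking sample rate and channel count with the stdlib `wave` module; a
# tiny 22050 Hz mono file is written here just to demonstrate.
import struct
import wave

def wav_info(path):
    with wave.open(path, "rb") as w:
        return w.getframerate(), w.getnchannels()

with wave.open("demo.wav", "wb") as w:
    w.setnchannels(1)            # mono
    w.setsampwidth(2)            # 16-bit samples
    w.setframerate(22050)        # the rate Tacotron2's defaults expect
    w.writeframes(struct.pack("<h", 0) * 100)

print(wav_info("demo.wav"))  # → (22050, 1)
```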
Very good explanation 🤩 Do you know if there are models for sale? I mean models that are created by other people, that could be used in the Synthesis Notebook?
Is there no other way to get access to the good gpus?
no matter how much i try, it keeps giving me a k80
Same
Dude, have you solved this problem?
@@trickster444 I did get k80 but I moved on and it worked out well
@@Cmanflip Okay, let's break some rules. Thanks.
@@trickster444 Yw
I have a question: the voices I'm using don't have a lot of sources to draw from; I think it's only about 3-4 minutes worth of original [usable] dialogue. Could I cheat the system by just copy-pasting the amount I have until I get about 10 minutes or so? Or does it all have to be original dialogue for the program to read?
Very cool!! Have you got a reclist you'd recommend for making your own library?
If I have 100 wavs, what do you reckon is a good batch size and epoch number? I tried 30 and 500 and many other combinations, and it's always OOMing.
@VillaGG alright ill try again thanks
Yup, still getting OOM. The training process starts but only does one cycle, then gives me an OOM error on the second cycle; no idea why.
@VillaGG yea for sure
@VillaGG its working right now with 50 samples, so maybe the other 50 had some issues, so ill try what you recommended, thanks a lot!
@VillaGG "Let it train for 10 hours" — so if it reaches ≤0.15 on the training part, that doesn't matter? I just let it go for 10 hours? Because it's reaching 0.15 pretty fast.
Also, my previous issue was that I exported the previous batch of samples from Adobe Premiere with different quality settings than the new batch; it works fine now :)
It keeps saying Permission Denied when I try; it used to work well but now it doesn't.
Hi, Google Collab is saying that the tacotron2 folder doesn't exist?
Heres the error code: FileNotFoundError: [Errno 2] No such file or directory: 'tacotron2'
For those getting picke error UnpicklingError: invalid load key, '
Hi, could you solve this problem? All the synthesis notebooks seem to have stopped working all of a sudden: "[Errno 2] No such file or directory: 'merged.dict.txt'". The gdown ID is not working; it says:
"Access denied with the following error:
Cannot retrieve the public link of the file. You may need to change
the permission to 'Anyone with the link', or have had many accesses.
You may still be able to access the file from the browser:"
any solution?
@@Notoriousseditz You need the pronunciation dictionary; the other one is broken. Get it with:
# Setup Pronunciation Dictionary
!gdown --id '1E12g_sREdcH5vuZb44EZYX8JjGWQ9rRp'
Use these updated colabs:
Training: gist.github.com/Oct4Pie/61781515d3e97f70b52dfef0648d71e7
Synthesis: gist.github.com/Oct4Pie/4e56fa3d5d2c5a4313bdf664597eefc2
If there are any issues, simply comment under the gist.
Thank you so much! This was such a huge help! You've got a new subscriber :)
@@tardigrade184 Hello bro, in your training notebook, training warm-starts from the pretrained model, not from my own audio files and texts. What do I need to do to train from my own audio? Thanks.
Decent result for 30 wav files 🤯
I'm having a problem
In the Tacotron Synthesis notebook: the panel just after Initialize Tacotron and Waveglow
I get this error
model.load_state_dict(torch.load(tacotron2_pretrained_model)['state_dict'])
RuntimeError: [enforce fail at inline_container.cc:145] . PytorchStreamReader failed reading zip archive: failed finding central directory
I have reset the runtime and it still doesn't work.
Same thing for me and this is a game changer when someone finally decides to drop knowledge
@@brat-b8h I found the solution. This error means your trained model file is corrupted. You have to train your model again; you should get a file that weighs around 323 MB.
@@brat-b8h That means it's corrupt and you have to train again.
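A sanity check matching that fix (`checkpoint_looks_complete` is a made-up helper; the several-hundred-MB figure comes from the comment above): a corrupt or truncated download is usually tiny compared to a complete checkpoint.

```python
# Made-up helper: a healthy Tacotron2 checkpoint is a few hundred MB, while
# a truncated or corrupt download is usually only a few KB.
import os
import tempfile

def checkpoint_looks_complete(path, min_mb=100):
    return os.path.getsize(path) / 1e6 >= min_mb

with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(b"x" * 1024)         # simulate a truncated download
tmp_path = f.name
print(checkpoint_looks_complete(tmp_path))  # → False
```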
Are we able to get a 2022 version? I keep getting TensorFlow 2 errors or something. Trying to experiment; I had this working like 2 years ago and now it doesn't work for me. Please help xD
What if I keep on going on the start training identifiying and generating the sounds of 500 epoch before it’s finished?
I run out of RAM 12.68GB as soon as I try to make it for a full paragraph
4:36 What happens if the model isn't stopped?
I keep getting UnpicklingError: invalid load key, '
why can't I Donwload Tacotron
I jumped up to Pro and now get a good GPU every time. The only model I can get to generate sentences that sound like a 'voice' is the HAL 9000. All the others produce a weird, inter-dimensional singing sound, but unintelligible.
Is there a Jupyter version of the training and synthesize project files?
Do anyone know if there is instead of the Google Colab version?
When I get to the "Create MEL spectrograms" part, I run it and it gives me this error:
"RuntimeError: shape '[1, 1, 94241]' is invalid for input of size 188482"
With the traceback:
---------------------------------------------------------------------------
RuntimeError Traceback (most recent call last)
in ()
1 if generate_mels:
----> 2 create_mels()
3 frames
/content/tacotron2/stft.py in transform(self, input_data)
82
83 # similar to librosa, reflect-pad the input
---> 84 input_data = input_data.view(num_batches, 1, num_samples)
85 input_data = F.pad(
86 input_data.unsqueeze(1),
I fixed this! But I won't be a dickhead who doesn't tell you how.
The problem was with the exported audio files. I had split them, but I had one mono and one stereo track, and I accidentally exported the stereo one. To be extra cautious, I just deleted the stereo track in Audacity so it only exported the mono track.
@@ibuddywolfie7882 how do i check if any is mono or stereo?
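A quick way to check is with Python's built-in wave module; the path in the usage note is just a placeholder:

```python
import wave

def channel_count(path):
    """Return the number of audio channels in a WAV file (1 = mono, 2 = stereo)."""
    with wave.open(path, "rb") as w:
        return w.getnchannels()

# Usage (placeholder path):
#   print("mono" if channel_count("wavs/1.wav") == 1 else "stereo")
```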
6:33 I have a problem. It keeps repeating "Maybe you need to change permission over 'Anyone with the link'?"
I followed every step, so I don't know what to do now.
Update: I followed the instructions written by Antonio Origlia here in the comments. Now everything works fine again.
Did you by any chance manage to get it working well in Italian? I tried with about 300 clips taken from an audiobook, but there's always that robotic tone.
@@robertovalentino70 Well in Italian, unfortunately no. I think there's still something else that needs changing. The problem is I don't understand much of this, hahah; it's just something I like doing in my spare time. If I figure anything out, I'll let you know.
@@PARAAA If you need a hand with anything, let me know. Even though I'm a programmer, this is a very complex topic. From what I understand, though, the software comes pretrained on an English model, so the accented characters and their pronunciations are completely missing, but it should still work for other languages. In that case, though, I think you need to feed it a lot more audio so it learns as many pronounced syllables as possible. For example, I've noticed that it pronounces many words correctly, if with a robotic tone, while others it pronounces the way an English speaker would.
Hi, when I'm using the synthesis notebook I'm getting this error: "[Errno 2] No such file or directory: 'merged.dict.txt'". The gdown ID is not working; it's saying this:
"Access denied with the following error:
Cannot retrieve the public link of the file. You may need to change
the permission to 'Anyone with the link', or have had many accesses.
You may still be able to access the file from the browser:"
any solution?
Hello, I have a problem: when I'm in "The actual synthesis part" and I put in the text I want it to say, nothing is audible on playback. Any solution?
Hi, are you still having this problem? I think there might be a problem with the WaveGlow model.
Hey I've got a small question. Would this be possible to do in other languages?
My text file and wavs are all correct, but I'm getting this, please help :(
Generating Mels: 21% | 104/500 [00:02
It says CUDA out of memory. Tried to allocate 20.00 MiB (GPU 0; 15.90 GiB total capacity; 14.91 GiB already allocated; 29.75 MiB free; 14.99 GiB reserved in total by PyTorch) :(
Same with me; even if I set the batch size and epochs to 1, I always get out of memory.
Try lowering the batch size.
Maybe it's late, but I found what causes this problem: some of my audio files were apparently too long. I switched to using only the audio files that were 3-10 seconds long, and the problem was solved.
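To find which clips fall outside that 3-10 second range, here's a small sketch using Python's built-in wave module (the wavs folder name matches the tutorial's layout; adjust it if yours differs):

```python
import glob
import wave

def duration_seconds(path):
    """Length of a WAV file in seconds."""
    with wave.open(path, "rb") as w:
        return w.getnframes() / w.getframerate()

def clips_outside_range(folder="wavs", lo=3.0, hi=10.0):
    """Return the paths of clips shorter than `lo` or longer than `hi` seconds."""
    return [p for p in sorted(glob.glob(f"{folder}/*.wav"))
            if not (lo <= duration_seconds(p) <= hi)]

# Usage: print(clips_outside_range())  # lists the clips worth trimming or dropping
```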
@@выпускмысль you fixed my problem, thank you
@@aidanh4228
I'm happy to hear that(:
For some reason, when I run the cells in both notebooks, it doesn't seem to work. I do have my 30 audio files and the transcript, as I've already uploaded them, but I don't have those other files that you have. Could you help me solve this?
When I try to set the parameters, it just loads instantly no matter what I set the settings to, as though it's not doing anything. Then in the data check there's an error for every single file (57 of them). I've completely reset everything 5 times and even changed accounts.
How do I resume training from where I left off?
Just put in the same model name
@@BigDraco-So Thanks, but after doing that, is the epoch still supposed to start from one, or continue?
Hey, is there any way for me to set this up on my local PC to train?
Please help!! I did everything right, but when I run the check data module it gives me this error for every wav in the list: "[WARNING] wavs/1.wav in filelist while expecting .npy ."
To those who liked this comment and also have this problem: go to the text file and replace all the .wav extensions with .npy. Idk why you have to do this now, but that's how you get it to work.
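A small sketch of that find-and-replace in Python, touching only the path before the first '|' so transcripts that happen to contain ".wav" are left alone (list.txt is the filename from the tutorial; back it up first):

```python
def wav_to_npy(filelist_in, filelist_out):
    """Rewrite a Tacotron2 filelist so each audio path ends in .npy instead of .wav.
    Only the path segment before the first '|' separator is modified."""
    with open(filelist_in, encoding="utf-8") as f:
        lines = f.read().splitlines()
    fixed = []
    for line in lines:
        path, sep, rest = line.partition("|")
        if path.endswith(".wav"):
            path = path[:-4] + ".npy"
        fixed.append(path + sep + rest)
    with open(filelist_out, "w", encoding="utf-8") as f:
        f.write("\n".join(fixed) + "\n")

# Usage: wav_to_npy("filelists/list.txt", "filelists/list.txt")  # in-place rewrite
```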
I got this error and found out I had a misspelling in one of the wav filenames.
@@lobato87 what did you misspell because I am having the same issue
@@massiveeyebrows4482 to your original text file?
@@addit6212 yeah
The words were generated like a muffled cow, but the tone of voice was pretty accurate.
Do both procedures still work? I have Colab Pro, but when I use GPT-Neo or OpenAI, Google Colab crashes.
I keep getting an unpickling error when I start training; it fails after about 15 seconds. I read that this is an issue with the newest iteration of Torch, so is there a way to circumvent it?
There will be a pretrained_model file; its size is around 101 MB. Check if it's that size; if it's around 1 KB, then you have to manually download it and use that.
@@sabah8312 where can I download the pretrained_model file?
@@sabah8312 every time I upload the 101 MB pretrained_model, the size goes to 1 KB
@@WeLoveSpigotApi drive.google.com/file/d/1rpK8CzAAirq9sWZhe9nlfvxMF1dRgFbF/view
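Once it's downloaded, a quick sanity check that the file isn't actually a tiny error page (the ~101 MB figure comes from the comment above; the path and threshold are just placeholders):

```python
import os

def looks_complete(path, min_bytes=100 * 1024 * 1024):
    """Heuristic check: the LJSpeech pretrained model is roughly 101 MB, so a
    file much smaller than that (e.g. a 1 KB HTML error page that Google Drive
    sometimes returns instead of the file) is almost certainly a bad download."""
    return os.path.exists(path) and os.path.getsize(path) >= min_bytes

# Usage: if not looks_complete("pretrained_model"), re-download before training.
```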
Is there a way to run this in real time, such as with an offline home assistant? I currently have it running great using Python, but I want to change the pyttsx3 voice to a custom one. Could I do such a thing with this?
Ok, I followed both part one and two precisely, but used 81 samples from my own personal project (rushed, just to test), and I got absolutely incredible results! It definitely needs some tweaking, but your walkthrough of this is absolutely fantastic. Thank you so much! If you know how I could use this as a real-time TTS voice with pyttsx3, like I mentioned in my last comment, that would be so dang helpful. Thank you so much once again!
@Brandon Breault were you able to come up with a real-time solution? I am looking at the same need with an animatronics project.
@@williamjustus2654
Maybe this could help you
github.com/CorentinJ/Real-Time-Voice-Cloning
Hi, did you find a solution for it?
Hello, can I make it work with a language other than a mainstream international one, like a dialect?
Not sure what's wrong, but I got "UnpicklingError: invalid load key".
Find the "train" function in that long chunk of code and replace:
download_from_google_drive("1c5ZTuT7J08wLUoVZ2KkUs_VdZuJ86ZqA","pretrained_model")
model = warm_start_model("pretrained_model", model, hparams.ignore_layers)
with:
!pip install gdown
import gdown
gdown.download('https://drive.google.com/u/0/uc?export=download&confirm=kZ1A&id=1c5ZTuT7J08wLUoVZ2KkUs_VdZuJ86ZqA', "pretrained_model", quiet=False)
model = warm_start_model("pretrained_model", model, hparams.ignore_layers)
I have 54 wavs; do you think that's enough?
If I have 20 wavs, what would be a good epoch number?
On "put file name here", "set parameters", "check data", AND the training, I get the error "NameError: name 'hparams' is not defined"; on the bunch of boring code I get "AttributeError: module 'tensorflow' has no attribute 'contrib'"; and on the MEL step I get "NameError: name 'generate_mels' is not defined". Please help!
I get warnings like this when I check files. Does it really matter?
"[WARNING] wavs/IOMET003.wav in filelist while expecting .npy ."
Even after restarting, I'm now plagued by further errors: it won't generate mels, and file lists come up as missing... It seems I'm destined not to get this to work at all.
They have to be MONO files in order to work; it's very sensitive to that. I ran into the same problem, and after some research found that using stereo causes this issue, as does not using the right bitrate format.
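If you don't want to re-export everything from Audacity, here's a rough sketch of downmixing in Python with the built-in wave module; it assumes 16-bit PCM and a little-endian host (matching WAV's sample byte order), and the filenames are placeholders:

```python
import array
import wave

def stereo_to_mono(src, dst):
    """Average the two channels of a 16-bit stereo WAV into a mono WAV.
    Assumes a little-endian host, which matches WAV's sample byte order."""
    with wave.open(src, "rb") as w:
        assert w.getsampwidth() == 2, "this sketch handles 16-bit PCM only"
        params = w.getparams()
        frames = array.array("h", w.readframes(w.getnframes()))
    if params.nchannels == 1:
        mono = frames  # already mono; just copy through
    else:
        left, right = frames[0::2], frames[1::2]  # samples are interleaved L, R
        mono = array.array("h", ((l + r) // 2 for l, r in zip(left, right)))
    with wave.open(dst, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)
        w.setframerate(params.framerate)
        w.writeframes(mono.tobytes())

# Usage: stereo_to_mono("wavs/1.wav", "wavs_mono/1.wav")
```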
I had this after I fixed some errors in lists.txt. Fix the initial errors in your lists.txt and upload the new version. Delete the wavs folder (or just rename it to something else and create a new empty wavs folder), re-import your wavs, and then run everything after downloading Tacotron2 again. For me this fixed the problem.
Hi, I have an unusual error. I've been using Tacotron2 on Colab for about a year and have only been experiencing this lately. When I'm training a model (I have Colab Pro), it will eventually stop, saying I've been disconnected from the runtime. So I start everything up again, upload the wavs and filelist, and use the same model name so it resumes from the last checkpoint. It loads to the epoch I was at, but it doesn't warm-start the model from the 'pretrained' model checkpoint; it just says 0%. Do you know how to fix this error? I would appreciate it, thanks.
Is it possible to continue the training later, and if so, how? Do I need to keep the PC or the notebook on? What happens if I do shut down the PC or notebook?
I want to know this too.
Super late, but yes, you'll have to keep your PC on, and in fact it'll ask you to do a captcha every few hours (and force-disconnect after 6-12 hours).
It does, however, save its progress to Google Drive every few minutes. If you want to resume training, simply restart the notebook with the same model name and folder.
@@bobajerry3397 yeah, you do, unfortunately
though for the dataset part, you could make a directory in your google drive (let's say called `dataset`) with a `wavs` directory and a `filelists` directory (which would have the `list.txt`)
then in your own copy of the training notebook (so it saves) double-click step 3 to access the code and put in the following line of code:
*!rsync -aP /content/drive/MyDrive/dataset/ /content/tacotron2/*
(i _think_ that's the right code)
that way you wouldn't have to reupload the dataset every time (because it'd copy from google drive), you could just run all the cells in order and only have to worry about google drive mounting
I'm retraining with more audio files, and when i try and generate spectrograms it gives me:
"RuntimeError: shape '[1, 1, 51868]' is invalid for input of size 103736"
Make sure your audio file is mono/one channel.
@@yoliyanda2860 thank you it worked
@@yoliyanda2860 what do u mean exactly?
does this work with other languages?
The synthesis notebook says that it is broken and to use the updated link. However, when I use the updated one and paste my journal into the textfield and hit the play button, it gives me several errors:
"NameError: name 'initilized' is not defined"
"RuntimeError: PytorchStreamReader failed reading zip archive: failed finding central directory"
What should I do to solve these errors? Any help or feedback from anyone would be greatly appreciated.
same issue :( you ever figure it out?
@@nilryth I figured it out. You have to change the batch size to under 10 unless you're using a thousand or so clips; that way it doesn't overtrain and kill it. That's what worked for me, at least.
Very good tutorial, you deserved a Like!
uhhhhhhhhhh...
When I tried to train the model, it says "UnpicklingError: invalid load key, '
The synthesis notebook doesn't work anymore; it fails to load the warm-start model from the LJSpeech pretrained model.
---------------------------------------------------------------------------
UnpicklingError                           Traceback (most recent call last)
<ipython-input> in <module>()
      5 print('cuDNN Benchmark:', hparams.cudnn_benchmark)
      6 train(output_directory, log_directory, checkpoint_path,
----> 7       warm_start, n_gpus, rank, group_name, hparams, log_directory2)

3 frames
<ipython-input> in train(output_directory, log_directory, checkpoint_path, warm_start, n_gpus, rank, group_name, hparams, log_directory2)
    275         os.path.isfile("pretrained_model")
    276         download_from_google_drive("1c5ZTuT7J08wLUoVZ2KkUs_VdZuJ86ZqA","pretrained_model")
--> 277         model = warm_start_model("pretrained_model", model, hparams.ignore_layers)
    278     # download LJSpeech pretrained model if no checkpoint already exists
    279

<ipython-input> in warm_start_model(checkpoint_path, model, ignore_layers)
    133     assert os.path.isfile(checkpoint_path)
    134     print("Warm starting model from checkpoint '{}'".format(checkpoint_path))
--> 135     checkpoint_dict = torch.load(checkpoint_path, map_location='cpu')
    136     model_dict = checkpoint_dict['state_dict']
    137     if len(ignore_layers) > 0:

/usr/local/lib/python3.7/dist-packages/torch/serialization.py in load(f, map_location, pickle_module, **pickle_load_args)
    606             return torch.jit.load(opened_file)
    607         return _load(opened_zipfile, map_location, pickle_module, **pickle_load_args)
--> 608     return _legacy_load(opened_file, map_location, pickle_module, **pickle_load_args)
    609
    610

/usr/local/lib/python3.7/dist-packages/torch/serialization.py in _legacy_load(f, map_location, pickle_module, **pickle_load_args)
    775                     "functionality.")
    776
--> 777     magic_number = pickle_module.load(f, **pickle_load_args)
    778     if magic_number != MAGIC_NUMBER:
    779         raise RuntimeError("Invalid magic number; corrupt file?")

UnpicklingError: invalid load key, '