Thank you for watching this video. I appreciate your time being here.
Subscribe to the channel to get more high-quality videos soon!
@DataScienceGarage
The best place to learn Data Science with the best in the industry - Turing College.
READ MORE HERE: turingcollege.org/DataScienceGarage
- see you there!
Very nicely presented! Have subscribed to your channel and am eager to explore and learn!
Thanks for such feedback, really appreciated! :)
At 15:01, executing the code with the `Seq2SeqTrainingArguments` function now throws this error:
ImportError: Using the `Trainer` with `PyTorch` requires `accelerate>=0.19.0`: Please run `pip install transformers[torch]` or `pip install accelerate -U`
Please help. P.S. I am using Google Colab.
I have the same problem. Do you remember how you fixed it? @DataScienceGarage
Yeah, can you help me? I got the same problem.
Hi! I had the same issue. You should execute `pip install accelerate -U`, and after that you need to restart the session.
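For anyone hitting this in Colab, a sketch of the fix described above (the version constraint is the one named in the error message itself):

```shell
# Install/upgrade the packages the Trainer needs (run in a Colab cell)
pip install -U accelerate             # the error asks for accelerate>=0.19.0
pip install -U "transformers[torch]"  # the alternative the same error suggests
# Then: Runtime -> Restart runtime, so the newly installed version is the one imported
```

The restart matters because Colab keeps the old `accelerate` module loaded in the running kernel until the session is restarted.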
Thank you for the solution 😃 @@football-uj4yg
@@football-uj4yg I got an error at the step after training; I can't run the model (push_to_hub=False). Can you help me? :(
Hello, thank you for the great video! Did the Lithuanian training dataset already exist, or did you upload your own dataset?
Hi! Good question. You can prepare the dataset yourself, or you can download it from somewhere. For this tutorial, I took it from huggingface.co/datasets/mozilla-foundation/common_voice_11_0, where you can choose the language you want from the list. Then you can define the URL for that dataset in your Python code (6:23).
@@DataScienceGarage Hi, thank you for the great video, but I am getting this error during evaluation at training step 10/50 (my eval step is 10 and max step is 50):
RuntimeError: Given groups=1, weight of size [768, 80, 3], expected input[8, 70, 3000] to have 80 channels, but got 70 channels instead
Thanks!
What is the best WER obtained in the process?
Hello, I further trained the Whisper model and obtained the checkpoint file. Now I want to convert it to the final ggml format. Please tell me how to do this.
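One common route (an assumption about your target, since ggml is the whisper.cpp format and this workflow is not covered in the video) is the conversion script shipped with whisper.cpp; the paths below are placeholders for your own directories:

```shell
# whisper.cpp ships a script that converts a Hugging Face Whisper checkpoint to ggml
git clone https://github.com/ggerganov/whisper.cpp
# The script also needs a checkout of the original openai/whisper repo (for the mel filters)
git clone https://github.com/openai/whisper
# Arguments: <fine-tuned HF model dir> <path to openai/whisper checkout> <output dir>
python whisper.cpp/models/convert-h5-to-ggml.py ./my-finetuned-whisper ./whisper ./output
```

Check the whisper.cpp `models/` README for the exact arguments in your version of the repo, as the script has changed over time.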
Can you provide the Colab notebook or the source code from your video?
Have you tried fine-tuning the large-v3 model? I tried it on large-v3 for non-English languages such as Chinese, but the fine-tuned model does not transcribe Chinese; it auto-translates to English even though I specified Chinese transcription on actual Chinese audio files.
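One thing worth checking (a sketch using the standard transformers API, not something verified against your setup): pin the language and task at generation time so the decoder is not left to guess, which is a common cause of unwanted translation. The checkpoint name below is a placeholder for your fine-tuned model:

```python
from transformers import WhisperProcessor, WhisperForConditionalGeneration

# Placeholder checkpoint -- substitute the path/name of your fine-tuned model
processor = WhisperProcessor.from_pretrained("openai/whisper-large-v3")
model = WhisperForConditionalGeneration.from_pretrained("openai/whisper-large-v3")

# Force Chinese transcription (not translation) in the decoder prompt
forced_ids = processor.get_decoder_prompt_ids(language="chinese", task="transcribe")
model.generation_config.forced_decoder_ids = forced_ids
```

Recent transformers versions also accept `language=` and `task=` directly in `model.generate(...)` for Whisper. It is also worth confirming that the fine-tuning data itself carried the Chinese language token and the transcribe task token, since the model learns whatever prompt pattern it was trained with.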
Why do I get this error when I try fine-tuning with my own data:
RuntimeError: Given groups=1, weight of size [768, 80, 3], expected input[8, 70, 3000] to have 80 channels, but got 70 channels instead
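This error usually means the feature extractor produced a different number of mel bins (70) than the encoder's first conv layer expects (80) — typically because the processor was built with a non-default `feature_size`, or loaded from a different checkpoint than the model. A small pure-Python sanity check (the shape convention `[batch, mel_bins, frames]` comes from the error message itself; the helper name is made up for illustration):

```python
def mel_bins_match(batch_shape, expected_bins=80):
    """Whisper encoders take input_features of shape [batch, mel_bins, frames].

    Returns True when the mel-bin axis matches what the model's first conv expects.
    """
    return len(batch_shape) == 3 and batch_shape[1] == expected_bins

# The failing batch from the error message: 70 mel bins instead of 80.
print(mel_bins_match((8, 70, 3000)))  # prints False -- triggers the RuntimeError above
print(mel_bins_match((8, 80, 3000)))  # prints True -- a correctly built batch

# The usual fix is to load the processor from the *same* checkpoint as the model,
# e.g. WhisperProcessor.from_pretrained("openai/whisper-small"), rather than
# constructing a feature extractor with a custom feature_size.
```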
I tried running it locally on a medium-sized dataset, but I only have an 8 GB GPU, and the medium model requires 12 GB for training. What can be changed to run on 8 GB?
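A sketch of the usual memory-saving knobs (these are standard `Seq2SeqTrainingArguments` options in transformers; the specific values are assumptions to tune, not settings tested on an 8 GB card):

```python
from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="./whisper-medium-finetuned",  # placeholder path
    per_device_train_batch_size=1,       # smallest batch; trade speed for memory
    gradient_accumulation_steps=16,      # keep the effective batch size at 16
    gradient_checkpointing=True,         # recompute activations instead of storing them
    fp16=True,                           # half-precision training
    optim="adafactor",                   # lighter optimizer state than AdamW
    max_steps=4000,
)
```

If it still does not fit, falling back to the small model or parameter-efficient fine-tuning (e.g. LoRA via the PEFT library) are common alternatives.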
Thanks for sharing this tutorial. Would you know if this is replicable on an M2 GPU?
This is great! Do you have any advice on training it on mixed-language audio? A lot of our meetings are held in at least two spoken languages. Would I have to create my own dataset for that?
Did you find anything?
Can you provide resources or code showing how to create and upload a custom dataset of your own instead of the Common Voice dataset? Thank you.
What about preprocessing the transcriptions? Isn't it important, or is it handled by the Whisper processor?
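The WhisperProcessor handles audio featurization and tokenization, but normalizing the reference text (casing, punctuation) is up to you, and mainly affects WER scoring rather than training itself. A minimal sketch of the kind of normalization often applied before computing WER (the function is illustrative, not from the video):

```python
import re
import string

def normalize_transcription(text: str) -> str:
    """Lowercase, strip punctuation, and collapse whitespace for WER scoring."""
    text = text.lower()
    text = text.translate(str.maketrans("", "", string.punctuation))
    text = re.sub(r"\s+", " ", text).strip()
    return text

print(normalize_transcription("  Hello,   WORLD!  "))  # prints: hello world
```

Whether to apply this depends on the benchmark: Common Voice references keep casing and punctuation, so normalizing both predictions and references the same way is what matters for a fair WER.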
This was a really great video, as short as possible without losing any of the important content. Can you please give some directions? I cannot use the Mozilla datasets for my language (not built yet), so I would like to use a dataset I found elsewhere and downloaded locally to my computer. Every row contains a sentence and the name of the corresponding .wav file.
The wav files need to be resampled to a 16000 Hz sample rate and turned into a spectrogram format, but I am not familiar with the whole `datasets` package environment (I mostly use pandas with numpy). I don't expect you to guide me through the whole process; I just want to know where my code would deviate from yours.
I can change the sample rate myself, I can probably find a package that will create a spectrogram from those files, and I can create a pandas DataFrame / CSV file in which every row is a pair of an audio array (spectrogram) and the tokenized sentence it corresponds to. Could I use the same DataCollatorSpeechSeq2SeqWithPadding class on that format and just continue from there?
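A sketch under those assumptions: the stdlib part below (parsing the CSV into records) is runnable as-is, while the `datasets`-specific steps are shown as comments since they depend on the library. Note that with the `datasets` library you should not need to resample or build spectrograms yourself — casting the path column to `Audio(sampling_rate=16000)` resamples on the fly when a row is accessed, and the WhisperProcessor builds the log-mel features, so the same data collator class (spelled `DataCollatorSpeechSeq2SeqWithPadding` in the tutorial) should work unchanged from there. The CSV contents here are made-up placeholders:

```python
import csv
import io

# Stand-in for your local CSV: one sentence per row plus the matching wav filename.
raw_csv = """sentence,path
labas rytas,clip_0001.wav
kaip sekasi,clip_0002.wav
"""

records = list(csv.DictReader(io.StringIO(raw_csv)))
print(records[0])  # {'sentence': 'labas rytas', 'path': 'clip_0001.wav'}

# With the `datasets` library installed, the rest would look roughly like:
#   from datasets import Dataset, Audio
#   ds = Dataset.from_list(records)
#   ds = ds.cast_column("path", Audio(sampling_rate=16000))  # decodes + resamples lazily
# After that, map the WhisperProcessor over the rows exactly as in the video and
# reuse DataCollatorSpeechSeq2SeqWithPadding unchanged.
```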
Wow, amazing! One question: how can we use the trained model in Whisper?
Thanks for the feedback! Using the trained model is an idea for another video. I will keep that in mind.
Thanks for this video, super helpful! If I want to train the Whisper model to transcribe Arabic audio into Arabic text and also translate it into English, is this possible within one trained model by feeding it such a dataset? Or will I have to train two separate models?
Did you find anything?
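For what it's worth, vanilla Whisper already treats "transcribe" and "translate to English" as two tasks of a single model, switched by a decoder prompt token, so one fine-tuned checkpoint can in principle do both if the training rows carry the right task token. A hedged sketch of switching tasks at inference (standard transformers API; the checkpoint name is a placeholder for your fine-tuned model):

```python
from transformers import WhisperProcessor, WhisperForConditionalGeneration

processor = WhisperProcessor.from_pretrained("openai/whisper-small")  # placeholder
model = WhisperForConditionalGeneration.from_pretrained("openai/whisper-small")

# Same model, two tasks -- only the decoder prompt changes:
transcribe_ids = processor.get_decoder_prompt_ids(language="arabic", task="transcribe")
translate_ids = processor.get_decoder_prompt_ids(language="arabic", task="translate")
# model.generate(input_features, forced_decoder_ids=transcribe_ids)  # Arabic text out
# model.generate(input_features, forced_decoder_ids=translate_ids)   # English text out
```

Note the built-in translate task only goes into English; Arabic-to-Arabic transcription plus Arabic-to-English translation therefore fits one model, provided your fine-tuning data is labeled with the matching task tokens.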
Hello, this video is very helpful. Can you please post the link to the full notebook after training, testing, and making the predictions?
Hello! For now, I don't have the full notebook after training, since I did not wait the 6 hours. What I will do is move all the code to a Google Cloud VM with a GPU and see how it goes there. I will post an update on this channel.
@@DataScienceGarage Ok, please let me know when it's done. Thank you.
@@DataScienceGarage That would be really great, since trying to type everything manually is pretty error-prone.
How do I add my own voice to the training dataset?
Did you get anything?