Could you please make a video series on reducing background noise in data, then audio data preprocessing, audio feature extraction, and training a model on audio? Please!
I already have a few series that, one way or another, address audio feature extraction and model training. I'm planning future videos on Audio ML in Production, covering the whole pipeline from an MLOps perspective. Stay tuned!
Thanks a lot Valerio, this channel has taught me a lot. Could you give us an example of applying augmentation, especially with audiomentations, to a whole dataset in Keras? Thanks a lot once again.
I was wondering the same thing here. What is the best way to implement audio augmentation in Keras? Is it possible to integrate it into a flow, similar to what can be done with the class ImageDataGenerator (native to Keras), or do we have to apply audio augmentation to some files, store them, and then use the new dataset as a regular dataset in Keras?
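One way to get an ImageDataGenerator-like flow for audio is a plain Python generator that augments each waveform every time it is drawn, so nothing has to be stored on disk. The following is a minimal sketch under stated assumptions, not a definitive implementation: `augment`, `augmented_batches`, the gain and noise ranges, and the 16 kHz toy data are all hypothetical, and the augmentation is written in NumPy so the example runs without extra libraries.

```python
import numpy as np


def augment(signal, rng):
    """Apply a random gain and additive Gaussian noise to one waveform."""
    gain = rng.uniform(0.8, 1.2)          # hypothetical gain range
    noise_std = rng.uniform(0.0, 0.01)    # hypothetical noise level
    noise = rng.normal(0.0, noise_std, size=signal.shape)
    return (gain * signal + noise).astype(np.float32)


def augmented_batches(signals, labels, batch_size, seed=0):
    """Yield (batch_x, batch_y) forever, augmenting each waveform anew."""
    rng = np.random.default_rng(seed)
    n = len(signals)
    while True:
        order = rng.permutation(n)        # reshuffle at the start of every pass
        for start in range(0, n - batch_size + 1, batch_size):
            idx = order[start:start + batch_size]
            batch_x = np.stack([augment(signals[i], rng) for i in idx])
            yield batch_x, labels[idx]


# Toy data: 8 one-second clips at a hypothetical 16 kHz sample rate.
signals = np.random.default_rng(1).normal(size=(8, 16000)).astype(np.float32)
labels = np.arange(8) % 2
batch_x, batch_y = next(augmented_batches(signals, labels, batch_size=4))
```

A generator like this can be passed to Keras `model.fit` together with `steps_per_epoch`, since it never stops yielding. The body of `augment` could presumably be swapped for an audiomentations pipeline called with `samples=` and `sample_rate=`.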
1. Is it possible to hide a photo in an audio file? 2. Convert audio to a spectrogram image and extract features. Can I convert a spectrogram image back to an audio file?
Thank you for your video. I have a question regarding audio augmentation. In my project, the test speaker is not in the training data (2000 samples), so my model performs pretty badly on the test set, with only 50% accuracy. I tried using a pitch shift (shift 2*2*(np.random.uniform())) on the training data, but it still doesn't work well. How should I use audio augmentation for this dataset?
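A side note on the shift expression quoted above: `np.random.uniform()` with no arguments draws from [0, 1), so `2*2*(np.random.uniform())` only ever produces upward shifts in [0, 4). The sketch below compares it with a symmetric range. The symmetric range of ±2 is an assumption about what may have been intended, and this concerns only the sampled values, not the unseen-speaker problem itself.

```python
import numpy as np

rng = np.random.default_rng(0)

# The expression from the comment: uniform() lies in [0, 1), so this lies in [0, 4).
upward_only = 2 * 2 * rng.uniform(size=10_000)

# A symmetric alternative (assumed intent): shifts drawn from [-2, 2).
symmetric = rng.uniform(-2.0, 2.0, size=10_000)

print(upward_only.min() >= 0.0)   # every sampled shift is upward
print(symmetric.min() < 0.0)      # the symmetric range also shifts down
```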
Dear Valerio, thanks again for a great tutorial. One problem with pitch shifting by -8/+8 is that the whole spectrum gets shifted, whereas the formants of natural sounds normally stay in place as the pitch changes (which is why the shift sounds like Mickey Mouse upwards and like Darth Vader downwards). In spectrograms of natural pitch shifts you can in fact clearly see these formants staying stable in some of the bands (sometimes with additional relationships between them), while the rest of the non-formant spectrum is shifted. I therefore recommend a shifting algorithm with formant correction. For example, when working on voices, it makes sense to batch-process from Python with professional pitch-shifting algorithms that offer formant correction. Just my 5 cts.
You're absolutely right, Bruno. I would suggest using a professional pitch-shifting solution like Rubber Band. However, if the pitch shift is limited, a less complex solution (e.g., librosa) would work just fine for DL audio applications.
Hello, this video series is very informative and amazing, like the others. Thanks a lot. I have a question: if I want to apply data augmentation to the entire dataset rather than a single audio file (and save the augmented dataset), would a for loop be sufficient?
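For offline augmentation, a simple loop over the files is usually all that is needed. Here is a minimal sketch, assuming 16-bit mono PCM WAV files and a placeholder noise augmentation. The helper names (`read_wav`, `write_wav`, `add_noise`, `augment_dataset`), the noise level, and the output naming scheme are hypothetical. It uses only the standard-library `wave` module plus NumPy.

```python
import tempfile
import wave
from pathlib import Path

import numpy as np


def read_wav(path):
    """Read a 16-bit mono PCM WAV file into float32 samples in [-1, 1]."""
    with wave.open(str(path), "rb") as f:
        sample_rate = f.getframerate()
        frames = f.readframes(f.getnframes())
    samples = np.frombuffer(frames, dtype="<i2").astype(np.float32) / 32768.0
    return samples, sample_rate


def write_wav(path, samples, sample_rate):
    """Write samples in [-1, 1] as a 16-bit mono PCM WAV file."""
    pcm = (np.clip(samples, -1.0, 1.0) * 32767.0).astype("<i2")
    with wave.open(str(path), "wb") as f:
        f.setnchannels(1)
        f.setsampwidth(2)
        f.setframerate(sample_rate)
        f.writeframes(pcm.tobytes())


def add_noise(samples, rng, noise_std=0.005):
    """Placeholder augmentation: additive Gaussian noise (hypothetical level)."""
    return samples + rng.normal(0.0, noise_std, size=samples.shape)


def augment_dataset(in_dir, out_dir, copies_per_file=2, seed=0):
    """Write `copies_per_file` augmented versions of every WAV in `in_dir`."""
    rng = np.random.default_rng(seed)
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for wav_path in sorted(Path(in_dir).glob("*.wav")):
        samples, sample_rate = read_wav(wav_path)
        for k in range(copies_per_file):
            out_path = out_dir / f"{wav_path.stem}_aug{k}.wav"
            write_wav(out_path, add_noise(samples, rng), sample_rate)
            written.append(out_path)
    return written


# Demo on a throwaway directory containing one synthetic 440 Hz tone.
with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "clean"
    src.mkdir()
    t = np.arange(8000) / 8000.0
    write_wav(src / "tone.wav", 0.5 * np.sin(2 * np.pi * 440.0 * t), 8000)
    written = augment_dataset(src, Path(tmp) / "augmented")
    n_written = len(written)
    reloaded, reloaded_rate = read_wav(written[0])
```

The augmented folder can then be loaded like any other dataset. `add_noise` is only a stand-in, and any function that maps a waveform to a waveform fits in its place.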
You're an awesome teacher. Thank you, Sir!
Thanks for your selflessly made tutorials, it's really fantastic!
You're welcome!
Very useful! Thanks for this great playlist Valerio 😍
This channel is GOLD. Thanks for the content man :0
You're welcome!
Thanks a lot, a very informative video, great job
Great content. Please consider covering more voice-related topics, such as speaker recognition.
Class, as always
How can I use it in a PyTorch dataset and apply it on the fly instead of offline?
You should look into torch-audiomentations which specifically addresses that need ;)
@ValerioVelardoTheSoundofAI is it possible to do it for a single item instead of for each batch? Because the output of the dataset is a spectrogram.
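One possible way to augment per item when the dataset returns spectrograms is to apply the waveform augmentation inside `__getitem__`, before the spectrogram is computed. The sketch below is written without torch so it runs on its own. The class only follows the `__len__`/`__getitem__` protocol that PyTorch map-style datasets use, and every name in it (`AugmentedSpectrogramDataset`, `magnitude_spectrogram`, `random_gain`, the frame and hop sizes) is a hypothetical choice, not part of torch-audiomentations.

```python
import numpy as np


def magnitude_spectrogram(signal, frame_size=512, hop=256):
    """Magnitude STFT with a Hann window, shape (freq_bins, frames)."""
    window = np.hanning(frame_size)
    n_frames = 1 + (len(signal) - frame_size) // hop
    frames = np.stack([signal[i * hop:i * hop + frame_size] * window
                       for i in range(n_frames)])
    return np.abs(np.fft.rfft(frames, axis=1)).T


class AugmentedSpectrogramDataset:
    """Map-style dataset: augment one waveform, then return its spectrogram."""

    def __init__(self, signals, labels, augment=None, seed=0):
        self.signals = signals
        self.labels = labels
        self.augment = augment
        self.rng = np.random.default_rng(seed)

    def __len__(self):
        return len(self.signals)

    def __getitem__(self, index):
        signal = self.signals[index]
        if self.augment is not None:
            # Fresh randomness on every access, so each epoch sees new variants.
            signal = self.augment(signal, self.rng)
        return magnitude_spectrogram(signal), self.labels[index]


def random_gain(signal, rng):
    """Placeholder waveform augmentation (hypothetical gain range)."""
    return signal * rng.uniform(0.5, 1.5)


signals = np.random.default_rng(1).normal(size=(4, 4096))
dataset = AugmentedSpectrogramDataset(signals, labels=[0, 1, 0, 1],
                                      augment=random_gain)
spec_a, label = dataset[0]
spec_b, _ = dataset[0]   # same item, different random gain
```

Because the augmentation happens on the raw waveform, the spectrogram is always computed from the augmented signal, and asking for the same index twice gives two different spectrograms.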