These videos are very well made, thank you.
I am learning so much from you. I'm going to learn and write a blog based on these videos. Thanks, Valerio, for such concise and on-point videos.
Thank you, Valerio, for the video and the impressive theoretical discussion. I am not a musician, but I love (and daily use) advanced math and programming. This series is brilliant and applicable to other domains. The recommended paper looks awesome and has to be read. I wish you all a nice day.
Thank you Markus!
Thanks for sharing, it's really helpful.
Hi. I was trying to reach you via LinkedIn, but could not. Can you please elaborate a bit on audio data balancing techniques? I have a bunch of data that is skewed towards a specific class, so I need to balance it before applying a machine learning model.
Hey Valerio. Thank you for covering a very hot and, so far for me, challenging topic in audio processing. When will you post the following parts of this video? I am looking forward to them...
Today's the next installment. Stay tuned!
I always appreciate your excellent videos and your hard work, Valerio. I have a question for you. I want to train my model using the STFT matrix (not the spectrogram) and apply some augmentation techniques. Should I augment the audio files (.wav) first and then convert them to STFT, or should I convert them to STFT first and then augment them?
Both options are possible. I usually prefer doing augmentation on the STFT. In my experience, it tends to work better for most use cases.
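For what it's worth, here is a minimal sketch of what augmentation directly on the STFT can look like, assuming librosa and NumPy are available; the SpecAugment-style mask widths (max_freq_mask, max_time_mask) are illustrative values, not recommendations from the video.

```python
import numpy as np
import librosa

def augment_stft(path, n_fft=2048, hop_length=512,
                 max_freq_mask=16, max_time_mask=20):
    signal, sr = librosa.load(path, sr=None)
    stft = librosa.stft(signal, n_fft=n_fft, hop_length=hop_length)  # complex matrix (freq_bins, frames)

    # Frequency mask: zero out a random band of frequency bins.
    f_width = np.random.randint(1, max_freq_mask + 1)
    f_start = np.random.randint(0, max(1, stft.shape[0] - f_width))
    stft[f_start:f_start + f_width, :] = 0

    # Time mask: zero out a random span of frames.
    t_width = np.random.randint(1, max_time_mask + 1)
    t_start = np.random.randint(0, max(1, stft.shape[1] - t_width))
    stft[:, t_start:t_start + t_width] = 0

    return stft, sr
```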
So if you were going to use time stretching for spectrograms, you'd stretch some samples and leave others alone. But wouldn't you get a dataset with mismatched shapes? Also, can you combine, say, time stretching with noise addition or pitch augmentation, or is it recommended to apply only one augmentation type to a spectrogram? So basically, make more copies and apply one type of augmentation to each?
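A minimal sketch of that last idea, assuming librosa: chain time stretching and noise addition on the waveform, then pad or crop to a fixed length so every resulting spectrogram has the same shape. The stretch rate and noise factor are illustrative assumptions.

```python
import numpy as np
import librosa

def augment_and_fix_length(signal, sr, target_len, stretch_rate=1.1, noise_factor=0.005):
    # Time stretching changes the number of samples...
    stretched = librosa.effects.time_stretch(signal, rate=stretch_rate)
    # ...so add noise on top, then pad or crop back to a fixed length
    # so every spectrogram ends up with the same number of frames.
    noisy = stretched + noise_factor * np.random.normal(size=len(stretched))
    if len(noisy) < target_len:
        noisy = np.pad(noisy, (0, target_len - len(noisy)))
    else:
        noisy = noisy[:target_len]
    return librosa.stft(noisy, n_fft=2048, hop_length=512)
```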
Hi Valerio, I have a question. If I have already acquired a sound, sampled at a certain frequency fs, with a certain number of points N, how does time stretching work?
Which techniques are good for music boundary detection?
Are you going to apply this to an ASR system? I'm really looking forward to seeing this kind of tutorial.
Not for now. These techniques can be used on any audio use case.
@@ValerioVelardoTheSoundofAI Thanks for your quick reply. I found ASR quite complicated and hard to understand.
@@yangwang9688 it can be, depending on the resource you learn it from :)
You are great, man!
Thank you for the great explanations! I have a question: for the spectrogram-based techniques, is it feasible to apply them to a waveform-based model, e.g., wav2vec2, after manipulating the spectrogram? If so, how can it be brought back to the time domain? In the absence of phase information, I don't know how to do it; are there techniques for this?
Reconstructing the phase is highly complicated and definitely not a solved problem. A simple approach is to reuse the original phase.
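A minimal sketch of the "reuse the original phase" approach, assuming librosa; the file name is hypothetical and the augmentation step is a placeholder for any magnitude-only manipulation. Griffin-Lim is shown as an alternative when no usable phase is available.

```python
import numpy as np
import librosa

signal, sr = librosa.load("example.wav", sr=None)        # hypothetical file
stft = librosa.stft(signal, n_fft=2048, hop_length=512)
magnitude, phase = np.abs(stft), np.angle(stft)

augmented_magnitude = magnitude  # placeholder for a magnitude-only augmentation

# Option 1: recombine the augmented magnitude with the original phase.
reconstructed = librosa.istft(augmented_magnitude * np.exp(1j * phase), hop_length=512)

# Option 2: estimate a plausible phase from the magnitude alone (Griffin-Lim).
reconstructed_gl = librosa.griffinlim(augmented_magnitude, hop_length=512)
```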
Just a random comment, but at 8:16 your voice gets augmented with octaves of itself, like Si Si La Sol ("What are like we-").
Sir, I have followed your channel, and the content is amazing. I was wondering whether you have written a book?
I'm currently writing a book called "AI Music Revolution". It's not technical, as it is for the general public. It should be launched by my publishing house approximately in a year or a little bit more. Stay tuned!
Amazing video! Thank you, Valerio! Right now I'm working on a project where I use a GAN to transfer timbre between speakers, and this theory is pretty useful. Which augmentations would you recommend trying in my case? I was thinking of applying time stretching and time shifting to keep the speaker's identity the same.
I agree on time stretching / shifting. I would suggest pitch scaling.
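A minimal sketch (not code from this thread) of the two identity-preserving augmentations mentioned above, random time shifting and time stretching, assuming librosa; the shift and stretch ranges are illustrative.

```python
import numpy as np
import librosa

def shift_and_stretch(signal, sr, max_shift_s=0.2, stretch_range=(0.9, 1.1)):
    # Random circular time shift of up to max_shift_s seconds.
    max_shift = int(max_shift_s * sr)
    shift = np.random.randint(-max_shift, max_shift + 1)
    shifted = np.roll(signal, shift)
    # Random time stretch; rate > 1 speeds the signal up, rate < 1 slows it down.
    rate = np.random.uniform(*stretch_range)
    return librosa.effects.time_stretch(shifted, rate=rate)
```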
Can you please share the code for data augmentation with a GAN? I hope you have already done it.
You are great. I have a question. I'm almost 60 years old and not familiar with math, but I am interested in sound and have some ideas about it. How long do you think it would take me to get familiar with linear algebra and work in this field? Best regards.
Thank you for the great question. If you are consistent, it'll probably take you 1 year to build all the math skills to work in the field (e.g., linear algebra, calculus, probability, Fourier Transform...). Of course, you don't need to master all the mathematical concepts behind AI Music to start playing around with it. You can gradually acquire the theoretical knowledge while you experiment with cool applications and coding. I believe that's the most compelling learning path. That's what I use to teach AI in this channel.
@@ValerioVelardoTheSoundofAI I am so thrilled by your comprehensive response. This will encourage me even more. I don't know whether the following idea has already been realized or not, but I'll share it with you. One of my ideas is that if we have a long audio file and we want to check whether a certain word or sentence occurs in it, we could type the desired word or sentence into the software, and it would find the location where that word was said, for example by the speaker. I hope it doesn't seem silly. Kind regards.