These videos are very well made, thank you.
I am learning so much from you. I'm going to learn and write a blog based on these videos. Thanks, Valerio, for such concise and on-point videos.
Thank you, Valerio, for the video and the impressive theoretical discussion. I am not a musician, but I love (and daily use) advanced math and programming. This series is brilliant and applicable to other domains. The recommended paper looks awesome and has to be read. I wish you all a nice day.
Thank you Markus!
Thanks for sharing, it's really helpful.
Hi. I was trying to reach you via LinkedIn, but could not. Can you please elaborate a bit on audio data balancing techniques? I have a bunch of data that is skewed towards a specific class, so I need to balance it before applying a machine learning model.
Hey Valerio. Thank you for covering a very hot and, so far for me, challenging topic in audio processing. When will you post the following parts of this video? I am looking forward to them...
Today's the next installment. Stay tuned!
I always appreciate your excellent videos and your hard work, Valerio. I have a question for you. I want to train my model using the STFT matrix (not the spectrogram) and apply some augmentation techniques. Should I augment the audio files (.wav) first and then convert them to STFT, or should I convert them to STFT first and then augment them?
Both options are possible. I usually prefer doing augmentation on the STFT. In my experience, it tends to work better for most use cases.
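For what it's worth, here is a minimal sketch of what augmentation directly on the STFT can look like, assuming librosa and NumPy are available; the SpecAugment-style mask widths (max_freq_mask, max_time_mask) are illustrative values, not recommendations from the video.

```python
import numpy as np
import librosa

def augment_stft(path, n_fft=2048, hop_length=512,
                 max_freq_mask=16, max_time_mask=20):
    signal, sr = librosa.load(path, sr=None)
    stft = librosa.stft(signal, n_fft=n_fft, hop_length=hop_length)  # complex matrix (freq_bins, frames)

    # Frequency mask: zero out a random band of frequency bins.
    f_width = np.random.randint(1, max_freq_mask + 1)
    f_start = np.random.randint(0, max(1, stft.shape[0] - f_width))
    stft[f_start:f_start + f_width, :] = 0

    # Time mask: zero out a random span of frames.
    t_width = np.random.randint(1, max_time_mask + 1)
    t_start = np.random.randint(0, max(1, stft.shape[1] - t_width))
    stft[:, t_start:t_start + t_width] = 0

    return stft, sr
```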
So if you were going to use time stretching for spectrograms, you'd stretch some samples and leave others alone. But wouldn't you get a dataset with mismatched shapes? Also, can you combine, say, time stretching with noise addition or pitch augmentation, or is it recommended to apply only one augmentation type to a spectrogram? So basically, make more copies and apply one type of augmentation to each?
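A minimal sketch of that last idea, assuming librosa: chain time stretching and noise addition on the waveform, then pad or crop to a fixed length so every resulting spectrogram has the same shape. The stretch rate and noise factor are illustrative assumptions.

```python
import numpy as np
import librosa

def augment_and_fix_length(signal, sr, target_len, stretch_rate=1.1, noise_factor=0.005):
    # Time stretching changes the number of samples...
    stretched = librosa.effects.time_stretch(signal, rate=stretch_rate)
    # ...so add noise on top, then pad or crop back to a fixed length
    # so every spectrogram ends up with the same number of frames.
    noisy = stretched + noise_factor * np.random.normal(size=len(stretched))
    if len(noisy) < target_len:
        noisy = np.pad(noisy, (0, target_len - len(noisy)))
    else:
        noisy = noisy[:target_len]
    return librosa.stft(noisy, n_fft=2048, hop_length=512)
```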
Hi Valerio, I have a question. If I have already acquired a sound, sampled at a certain frequency fs, with a certain number of points N, how does time stretching work?
Which techniques are good for music boundary detection?
Are you going to apply this to an ASR system? I'm really looking forward to seeing this kind of tutorial.
Not for now. These techniques can be used on any audio use case.
@@ValerioVelardoTheSoundofAI Thanks for your quick reply. I found ASR quite complicated and hard to understand.
@@yangwang9688 it can be, depending on the resource you learn it from :)
You are great, man!
Thank you for the great explanations! I have a question: for the spectrogram-based techniques, is it feasible to apply them to a waveform-based model, e.g., wav2vec2, after manipulating the spectrogram? If so, how can it be brought back to the time domain? In the absence of phase information, I don't know how to do it; are there techniques for this?
Reconstructing the phase is highly complicated and definitely not a solved problem. A simple approach is to reuse the original phase.
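A minimal sketch of the "reuse the original phase" approach, assuming librosa; the file name is hypothetical and the augmentation step is a placeholder for any magnitude-only manipulation. Griffin-Lim is shown as an alternative when no usable phase is available.

```python
import numpy as np
import librosa

signal, sr = librosa.load("example.wav", sr=None)        # hypothetical file
stft = librosa.stft(signal, n_fft=2048, hop_length=512)
magnitude, phase = np.abs(stft), np.angle(stft)

augmented_magnitude = magnitude  # placeholder for a magnitude-only augmentation

# Option 1: recombine the augmented magnitude with the original phase.
reconstructed = librosa.istft(augmented_magnitude * np.exp(1j * phase), hop_length=512)

# Option 2: estimate a plausible phase from the magnitude alone (Griffin-Lim).
reconstructed_gl = librosa.griffinlim(augmented_magnitude, hop_length=512)
```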
Just a random comment, but at 8:16 your voice gets augmented with octaves of itself, like Si Si La Sol ("What are like we-").
Sir, I have followed your channel, and the content is amazing. I was wondering whether you have written a book?
I'm currently writing a book called "AI Music Revolution". It's not technical, as it is for the general public. It should be launched by my publishing house approximately in a year or a little bit more. Stay tuned!
Amazing video! Thank you, Valerio! Right now I'm working on a project where I use a GAN to transfer timbre between speakers, and this theory is pretty useful. Which augmentations would you recommend trying in my case? I was thinking of applying time stretching and time shifting to keep the speaker's identity the same.
I agree on time stretching / shifting. I would suggest pitch scaling.
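A minimal sketch (not code from this thread) of the two identity-preserving augmentations mentioned above, random time shifting and time stretching, assuming librosa; the shift and stretch ranges are illustrative.

```python
import numpy as np
import librosa

def shift_and_stretch(signal, sr, max_shift_s=0.2, stretch_range=(0.9, 1.1)):
    # Random circular time shift of up to max_shift_s seconds.
    max_shift = int(max_shift_s * sr)
    shift = np.random.randint(-max_shift, max_shift + 1)
    shifted = np.roll(signal, shift)
    # Random time stretch; rate > 1 speeds the signal up, rate < 1 slows it down.
    rate = np.random.uniform(*stretch_range)
    return librosa.effects.time_stretch(shifted, rate=rate)
```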
Can you please share the code for data augmentation with a GAN? I hope you have already done it.
You are great. I have a question. I'm almost 60 years old and not familiar with math, but I am interested in sound and have some ideas about it. How long do you think it would take me to get familiar with linear algebra and work in this field? Best regards.
Thank you for the great question. If you are consistent, it'll probably take you 1 year to build all the math skills to work in the field (e.g., linear algebra, calculus, probability, Fourier Transform...). Of course, you don't need to master all the mathematical concepts behind AI Music to start playing around with it. You can gradually acquire the theoretical knowledge while you experiment with cool applications and coding. I believe that's the most compelling learning path. That's what I use to teach AI in this channel.
@@ValerioVelardoTheSoundofAI I am so thrilled by your comprehensive response. This will encourage me even more. I don't know whether the following idea has already been realized or not, but I'll share it with you. One of my ideas is that if we have a long audio file and we want to check whether a certain word or sentence occurs in it, we could type the desired word or sentence into the software, and it would find the location where that word was said, for example by the speaker. I hope it doesn't seem silly. Kind regards.