How to Extract Spectrograms from Audio with Python

  • Published: 9 Sep 2020
  • Learn how to extract spectrograms from an audio file with Python and Librosa using the Short-Time Fourier Transform. Learn about different types of spectrograms and compare the spectrograms of music across different genres (a minimal code sketch follows below).
    Code:
    github.com/musikalkemist/Audi...
    Join The Sound Of AI Slack community:
    valeriovelardo.com/the-sound-...
    Interested in hiring me as a consultant/freelancer?
    valeriovelardo.com/
    Follow Valerio on Facebook:
    / thesoundofai
    Connect with Valerio on Linkedin:
    / valeriovelardo
    Follow Valerio on Twitter:
    / musikalkemist
  • Science
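
A minimal sketch of the workflow described above, assuming librosa, numpy, and matplotlib are installed; the file name audio.wav and the frame/hop sizes are placeholders rather than the exact values from the video:

    import numpy as np
    import matplotlib.pyplot as plt
    import librosa
    import librosa.display

    FRAME_SIZE = 2048   # samples per STFT window
    HOP_SIZE = 512      # samples between successive windows

    # load the audio file (placeholder path)
    signal, sr = librosa.load("audio.wav")

    # Short-Time Fourier Transform -> complex matrix of shape (1 + FRAME_SIZE/2, n_frames)
    S = librosa.stft(signal, n_fft=FRAME_SIZE, hop_length=HOP_SIZE)

    # power spectrogram, then convert to decibels for a log-amplitude plot
    Y = np.abs(S) ** 2
    Y_db = librosa.power_to_db(Y)

    # plot with time on the x-axis and a logarithmic frequency axis
    plt.figure(figsize=(15, 8))
    librosa.display.specshow(Y_db, sr=sr, hop_length=HOP_SIZE, x_axis="time", y_axis="log")
    plt.colorbar(format="%+2.0f dB")
    plt.show()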

Comments • 87

  • @professorbalthazar82 · 3 years ago +9

    All the videos in this series are very helpful, well made and well explained, thank you so much!

  • @nmirza2013 · 1 year ago

    Such a comprehensive yet easy to understand series, hats off

  • @faramirchevlonski6152 · 3 years ago +1

    The best explanation on the entire internet! Thanks, man.

  • @ourissueanniversary · 1 year ago

    Extremely clear explanation!
    Thanks a lot!

  • @Sumit_Sharma_Music · 1 year ago +2

    Thank you for making these amazing videos and putting in so much effort. This series has cleared up my doubts better than any other signal processing and audio processing videos. I have one question: in the plot at 12:48, you compute the spectrogram from the square of the magnitude; if I plot the magnitude without squaring, I can still visualize the frequencies. In the previous videos in the series, we computed the Fourier transform from the amplitude (when computing the full Fourier transform).

  • @saucyyy4508 · 1 year ago

    This helps so much for my final project idea thank you!

  • @juanhernandez-up4pg · 2 years ago

    Thank you for this video, you're a real hero.

  • @giavinhvoquang4456 · 2 years ago

    Thank you so so much for your dedication.

  • @thuancollege5594 · 1 year ago

    very helpful. Thank you very much!

  • @leonardofreua3084 · 3 years ago

    Excellent, thanks for this video.

  • @ang3dang2 · 1 year ago

    You are going to put so many lecturers, top universities included, out of a job!

  • @riobale · 3 years ago +2

    Hi Valerio, in the above spectrograms there is always a strong and constant low frequency component. What does it depend on? Is it relevant or is it just an artefact? Thank you

  • @ryandaputra2056 · 3 years ago +1

    THANKS BRO, THIS HELPED WITH MY ASSIGNMENT

  • @girishrane9926 · 3 years ago

    Thank you sir🥳🙌🙌

  • @lahcenekabour7542 · 2 years ago

    Thanks for your video, very helpful and well explained!

  • @sandipandhar1668 · 3 years ago +2

    Great content

  • @user-co6pu8zv3v · 1 year ago

    Thank you :)

  • @sohampatil6539 · 3 years ago

    Thank you thank you thank you!!!!!!!!

  • @cloudhuang700 · 3 years ago

    Quick questions: what is the purpose of doing Y_scale = S_scale ** 2? Why 2 and not a different number? What effect does this power parameter have on the generated spectrogram?

  • @zahraroozbehi734 · 3 years ago +5

    Thank you so much for your fascinating course.
    At about 7:33, when you explain how to get #frames, which is 342 here, I cannot calculate it myself based on the formula in the last video:
    #frames = ((#samples(of scale array)-FRAME_SIZE)/HOP_SIZE)+1 = ((174943-2048)/512)+1 = 338.68 and not 342.
    Can you please clarify this for me?
    Thank you
    Zahra

  • @nezardasan5015 · 3 years ago

    Very, very useful, thank you.

  • @Pianistprogrammer · 3 years ago +1

    Can you please explain the similarity matrix, if possible with Python?

  • @ronespu · 3 years ago

    Why do I get this error when I try to run "ipd.Audio(scale_file)": ValueError: rate must be specified when data is a numpy array or list of audio samples
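
    That error shows up when ipd.Audio is given a NumPy array (for example the signal returned by librosa.load) instead of a file path; for raw samples IPython needs the sampling rate. A small sketch, with audio.wav as a placeholder path:

      import librosa
      import IPython.display as ipd

      signal, sr = librosa.load("audio.wav")

      # a file path can be played directly
      ipd.Audio("audio.wav")

      # a NumPy array of samples needs the sampling rate passed explicitly
      ipd.Audio(signal, rate=sr)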

  • @sharonmcgavin1666 · 17 days ago

    I notice in the librosa docs they don't square the magnitude but then use amplitude_to_db; does anyone know if this makes any difference to the final results? Guess I'll have to try to understand everything properly later! Great video!

  • @Underscore_1234 · 2 months ago

    Hi, great video again. Could anyone explain to me why we square the result of np.abs(Y)? Not squaring doesn't change the result much, since we'll use logarithmic scales, but is it correct to use it?

  • @dr.nandkishordhawale3 · 1 year ago

    Hello! The series is extremely interesting. Thank you for creating the channel and sharing the knowledge with both theory and hands-on examples. By the way, I want to report a tiny error: in this video, around 3:15, the audio track played via IPython was not audible. The same problem happened when you played the same audio in one of your previous videos. Thank you.

  • @rekreator9481 · 3 years ago +5

    Is there any way to recreate the audio after we get the log-amplitude spectrogram? Of course, first we would convert dB back to power, for which there is a function in Librosa, but what then? How do we invert the "np.abs(S_scale) ** 2" part back to audio?

    • @gvcallen · 1 year ago +1

      Once you've taken the magnitude, you unfortunately lose the phase information, and therefore lose information in general. You would have to have stored the phase somewhere.
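
      If the phase was not stored, an approximate reconstruction is still possible with the Griffin-Lim algorithm, which estimates a plausible phase from the magnitude alone. A sketch under the same STFT settings as the video (audio.wav is a placeholder path):

        import numpy as np
        import librosa
        import soundfile as sf

        FRAME_SIZE, HOP_SIZE = 2048, 512
        signal, sr = librosa.load("audio.wav")

        S = librosa.stft(signal, n_fft=FRAME_SIZE, hop_length=HOP_SIZE)
        Y_log = librosa.power_to_db(np.abs(S) ** 2)   # the log-power spectrogram that gets plotted

        # invert: dB -> power -> magnitude, then estimate the missing phase
        S_mag = np.sqrt(librosa.db_to_power(Y_log))
        signal_rec = librosa.griffinlim(S_mag, n_iter=32, hop_length=HOP_SIZE, win_length=FRAME_SIZE)

        sf.write("reconstructed.wav", signal_rec, sr)

      The result is audible but not identical to the original, since the phase is only estimated.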

  • @MsBalajiv · 3 years ago +1

    Hi Valerio,
    Thanks for this video. Very useful.
    At 7:30 of this video, when displaying the shape of the STFT output matrix, it looks like #frames is calculated as (samples / hop_size) + 1 in my example code. I understand the equation from your previous video, but the librosa output is slightly different.
    In my example, samples = 220100, frame_size = 512, hop_size = 160.
    The output STFT matrix has second dimension: 1379.
    Can you please clarify?

    • @sac6496 · 3 years ago +2

      The librosa source code shows that, if unspecified, pad_mode is 'reflect' and center is True:
      if center:
          y = np.pad(y, int(n_fft // 2), mode=pad_mode)
      So the n_frames calculation Valerio described and librosa's output agree.
      Your samples + padding = 220100 + (256 * 2) = 220612, and n_frames = (samples - frame_size) / hop_size + 1 = (220612 - 512) / 160 + 1 = 1376, so I think your second dimension of 1379 may be a typo. My computation gives (257, 1376).
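
      A small check of the two formulas, assuming librosa's default center=True (the signal is reflection-padded by n_fft // 2 on each side before framing), with audio.wav as a placeholder path:

        import numpy as np
        import librosa

        FRAME_SIZE, HOP_SIZE = 2048, 512
        signal, sr = librosa.load("audio.wav")
        n_samples = len(signal)

        S = librosa.stft(signal, n_fft=FRAME_SIZE, hop_length=HOP_SIZE)

        # formula without padding (center=False)
        frames_no_pad = (n_samples - FRAME_SIZE) // HOP_SIZE + 1

        # formula with center=True padding; for an even FRAME_SIZE this
        # simplifies to 1 + n_samples // HOP_SIZE
        frames_padded = (n_samples + 2 * (FRAME_SIZE // 2) - FRAME_SIZE) // HOP_SIZE + 1

        print(S.shape[1], frames_padded, 1 + n_samples // HOP_SIZE)   # these three agree

      With the numbers from this thread, 1 + 220100 // 160 = 1376, and with the 174943 samples mentioned in the earlier comment, 1 + 174943 // 512 = 342, matching librosa's output in both cases.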

  • @habibrekik4009 · 2 years ago

    Hey Valerio, I tried to run it on an audio file that I have, but I get an error; it says that "figsize" is not defined... can you please help? :)

  • @Trivimania · 2 years ago

    Sir, is it also possible to save such a spectrogram to an image file?
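
    Yes; the spectrogram plot is an ordinary matplotlib figure, so it can be written out with savefig. A sketch, assuming the same plotting approach as in the video (file names are placeholders):

      import numpy as np
      import matplotlib.pyplot as plt
      import librosa
      import librosa.display

      signal, sr = librosa.load("audio.wav")
      S = librosa.stft(signal, n_fft=2048, hop_length=512)
      Y_db = librosa.power_to_db(np.abs(S) ** 2)

      plt.figure(figsize=(15, 8))
      librosa.display.specshow(Y_db, sr=sr, hop_length=512, x_axis="time", y_axis="log")
      plt.colorbar(format="%+2.0f dB")
      plt.savefig("spectrogram.png", dpi=150, bbox_inches="tight")   # instead of (or before) plt.show()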

  • @seathru1232 · 2 years ago

    Dear Valerio, how can I convert from the magnitude spectrogram (here with the ** 2 exponent) back to audio? The istft requires complex numbers, but we lose them when taking the magnitude.

    • @codeparity · 1 year ago

      Just save the variable before taking the magnitude.
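
      In code that looks roughly like this: keep the complex STFT for inversion and only use the squared magnitude for plotting (a sketch, with audio.wav as a placeholder path):

        import numpy as np
        import librosa

        signal, sr = librosa.load("audio.wav")
        S = librosa.stft(signal, n_fft=2048, hop_length=512)   # complex: magnitude and phase

        Y = np.abs(S) ** 2                                     # power spectrogram, for plotting only

        # the complex matrix still carries the phase, so it inverts cleanly
        signal_rec = librosa.istft(S, hop_length=512)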

  • @desryan1603 · 3 years ago

    Was there a reason why you changed from using a continuous colour scale in the first (non-log) plot to a diverging scale for the log plot?

    • @desryan1603 · 3 years ago

      I suspect it demonstrates harmonics better than the continuous scale does.

  • @nikkatalnikov · 3 years ago +1

    Great video!
    What is the advantage of power over magnitude, e.g. why do np.abs(S_scale) ** 2 rather than np.abs(S_scale)?

    • @slok930 · 1 year ago

      In signal processing practice it's common to plot power against time (rather than simply amplitude/magnitude vs. time) in order to make the plots more pronounced (more noticeable). And, as it happens, the time-averaged power of a sine wave is proportional to the square of its amplitude.

  • @user-nw7rv3oq3p · 5 months ago

    Why doesn't the Debussy sound file look like a copy from the center?

  • @shubhamkapoor5152 · 1 year ago

    Hi, how do we do this if we have 150 audio files?

  • @roaaeb1894 · 3 years ago

    Hi Valerio, I want to extract features for a set of audio files stored in a folder; there are 200 of them. How can I load the audio files and apply the feature extraction to all of them at once? What should the code that handles all the audio files look like?

    • @debabratagogoi9038 · 2 years ago

      I am also looking for the same thing, extracting spectrograms from a large number of audio files. Did you find a solution that I can follow?
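
      One common pattern is to loop over the folder and store each spectrogram as a NumPy array; a sketch assuming a folder of .wav files (folder names and STFT parameters are placeholders):

        from pathlib import Path
        import numpy as np
        import librosa

        AUDIO_DIR = Path("audio_folder")        # folder containing the .wav files
        OUT_DIR = Path("spectrograms")
        OUT_DIR.mkdir(exist_ok=True)

        for wav_path in sorted(AUDIO_DIR.glob("*.wav")):
            signal, sr = librosa.load(wav_path)
            S = librosa.stft(signal, n_fft=2048, hop_length=512)
            Y_db = librosa.power_to_db(np.abs(S) ** 2)
            # one .npy file per input file, reloadable later with np.load
            np.save(OUT_DIR / (wav_path.stem + ".npy"), Y_db)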

  • @basmam8542 · 2 years ago

    nice hair style :)

  • @enkaibi2756 · 3 years ago +1

    Why does the spectrogram only span a few seconds while the raw audio is a few hours long?

  • @ont7126 · 2 years ago

    Hi, I'm a bit confused. In the DL series we compute the spectrogram with spectrogram = np.abs(stft) and log_spectrogram = librosa.amplitude_to_db(spectrogram). Is there any difference from the way we compute it in this video (spectrogram = np.abs(stft) ** 2 and Y_log_scale = librosa.power_to_db(Y_scale))?

    • @ValerioVelardoTheSoundofAI · 2 years ago

      In the first case, we have an amplitude spectrogram. In the second, a power spectrogram.

    • @ont7126 · 2 years ago

      @ValerioVelardoTheSoundofAI Thank you! But I still can't understand when we use the first case and when the second one, since you call both of them spectrograms.

    • @ValerioVelardoTheSoundofAI · 2 years ago

      @ont7126 It depends on the task. Sometimes, amplitude spectrograms work better than power spectrograms. Sometimes, it's the reverse. Unfortunately, there isn't a "rule". You'll have to try both of them and see which representation works best for your problem.

    • @ont7126 · 2 years ago

      @ValerioVelardoTheSoundofAI Ok, got it! Since I am new to speech recognition and still practicing on datasets, would it be better to use scipy.signal.spectrogram() instead?
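
      For reference, the two variants discussed above differ only in whether the magnitude is squared before converting to decibels; a sketch of both, under the same STFT settings as the video (audio.wav is a placeholder path):

        import numpy as np
        import librosa

        signal, sr = librosa.load("audio.wav")
        S = np.abs(librosa.stft(signal, n_fft=2048, hop_length=512))

        # amplitude spectrogram (DL-series style): 20 * log10 of the magnitude
        log_amplitude = librosa.amplitude_to_db(S)

        # power spectrogram (this video): square first, then 10 * log10 of the power
        log_power = librosa.power_to_db(S ** 2)

      With the default reference value the two dB matrices come out essentially the same, since amplitude_to_db squares the magnitude internally before converting; the choice between amplitude and power matters more when the linear (pre-dB) values are used directly as features.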

  • @0Freguenedy0 · 3 years ago

    Are we only able to work with .wav files in librosa? I've been using only .wav so far.
    If needed, I'll convert MP3s to .wav using pydub. I'll try that next week.

  • @schibob1 · 3 years ago

    Thank you so much for the great content!
    I followed every video up to now and learned so much.
    At this point I have, for the first time, a problem that I can't solve:
    In Sublime Text, when I use the plot_spectrogram function, no spectrogram window pops up as usual. If I put a print() around the call (I don't know if that is the right way to do it), the output shows "None". Apart from that, no errors occur. Does anyone know how to visualize the spectrogram in Sublime Text?
    Hope that someone knows the solution to this. Thanks in advance :)

    • @idontevenknow3707 · 2 years ago

      Getting the same issue, did you end up solving it? Thanks.

    • @idontevenknow3707 · 2 years ago

      Ah, found it. Add plt.show() at the end of the function!
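
      For anyone hitting the same thing outside Jupyter: the function only builds the figure, so a plain script needs an explicit plt.show(). A sketch of the helper with that line added (the signature and figure size follow the video, but treat the details as an approximation):

        import matplotlib.pyplot as plt
        import librosa.display

        def plot_spectrogram(Y, sr, hop_length, y_axis="linear"):
            """Display a (log-)spectrogram; Y is the dB matrix from power_to_db."""
            plt.figure(figsize=(25, 10))
            librosa.display.specshow(Y, sr=sr, hop_length=hop_length, x_axis="time", y_axis=y_axis)
            plt.colorbar(format="%+2.f dB")
            plt.show()   # required when running as a script (e.g. from Sublime Text)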

  • @blaze-pn6fk · 3 years ago +1

    Can you make videos on voice cloning?

    • @ValerioVelardoTheSoundofAI · 3 years ago +2

      Thank you for the suggestion! I haven't planned to cover this topic soon, but I'll put it in my backlog as it's quite interesting.

  • @seeking9145 · 1 year ago

    Which piece by Debussy was it?

  • @nowrozimohammad · 3 years ago

    How can I get your code?

  • @iioiggtrt9085 · 3 years ago

    How do I save this to a CSV file after processing?

    • @ValerioVelardoTheSoundofAI · 3 years ago

      You could use Pandas for that. It has a super convenient to_csv function.

    • @iioiggtrt9085 · 3 years ago

      @ValerioVelardoTheSoundofAI Can you make a video covering the whole process? As you know, there are no resources for this on the internet; it would be a great reference. Thanks a lot.

    • @iioiggtrt9085 · 3 years ago +1

      Building a dataset from audio from scratch and saving it as a CSV file would be great. I'd appreciate that effort.

    • @ValerioVelardoTheSoundofAI · 3 years ago

      @iioiggtrt9085 I'll put this in my backlog! Thank you for the suggestion :)
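
      A minimal sketch of the pandas route suggested above (file names are placeholders); each row is one frequency bin and each column one frame:

        import numpy as np
        import pandas as pd
        import librosa

        signal, sr = librosa.load("audio.wav")
        S = librosa.stft(signal, n_fft=2048, hop_length=512)
        Y_db = librosa.power_to_db(np.abs(S) ** 2)

        # rows = frequency bins, columns = frames
        pd.DataFrame(Y_db).to_csv("spectrogram.csv", index=False)

        # read it back later as a NumPy array
        Y_loaded = pd.read_csv("spectrogram.csv").to_numpy()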

  • @subhashachutha7413 · 3 years ago +1

    Please do one on MFCCs as well. Up to now there has been no good resource on them, so please cover MFCCs too 🙏

    • @ValerioVelardoTheSoundofAI · 3 years ago +1

      I've planned to cover MFCCs (theory + code) over the next few weeks. Stay tuned!

    • @subhashachutha7413 · 3 years ago +1

      @ValerioVelardoTheSoundofAI Thank you, waiting for the video.

  • @achalcharantimath5603 · 3 years ago

    Hi Valerio, I have a question: if we have 100 genres with 100 training examples each, how do we compute the spectrograms and store them (or is there a way to generate the image data and feed it in at run time)? Each spectrogram will have varying dimensions, so how do we get uniform input for the network to train on? Would using rectangular windows of the spectrogram be better for training? Can you suggest some links for reading more about audio augmentation?
    Is Y_scale.shape, a 2D array, enough for training, or do we need the RGB version of it? Which is more efficient?
    Please have a look at this Kaggle competition, it might interest you; for bird audio recognition, what should the input be, a spectrogram or a mel spectrogram?
    www.kaggle.com/c/birdsong-recognition

    • @ValerioVelardoTheSoundofAI · 3 years ago

      Achal, you've asked a lot of good questions! I have a couple of videos in "DL for Audio with Python" with a use case similar to yours, i.e., 100 samples in 10 musical genres. You can refer to those videos.
      It's important that you always have the same input shape. For that, you can segment the songs so that they have the same number of samples. Usually, for music genre classification, 15-30 seconds' worth of audio should be good.
      If by "rectangular windows of the spectrogram" you mean applying a Mel filter bank, you're on the right path. Mel spectrograms, or even better CQT, are valuable approaches when dealing with music data.
      For training, you'll need the equivalent of a greyscale image if you decide to go with CNN architectures, which I suggest you do. In other words, you'll have to add a 3rd dimension. Once again, you can refer back to my videos in DL for Audio to see how to do that.

    • @achalcharantimath5603 · 3 years ago

      @ValerioVelardoTheSoundofAI
      Hi, I have been watching the series (DL for Audio with Python) and referring to it a lot, thank you for this channel.
      So for a CNN we have to set the third dimension to 1, right? Is that what you meant, like (120, 120, 1), adding the extra dimension? By windowing I meant: if there is a 1-minute bird call, taking a 5-second input, then the next 5-second input, and so on. Is there a way to do that with the spectrogram?

    • @ValerioVelardoTheSoundofAI · 3 years ago

      @achalcharantimath5603 (120, 120, 1) works. You can use the whole minute's worth of audio, or segment it into, say, 15-second chunks.
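
      A sketch of those two steps, cutting fixed-length chunks and adding the channel axis a CNN expects (the chunk length, file name, and STFT settings are illustrative):

        import numpy as np
        import librosa

        CHUNK_SECONDS = 15
        FRAME_SIZE, HOP_SIZE = 2048, 512

        signal, sr = librosa.load("audio.wav")
        samples_per_chunk = CHUNK_SECONDS * sr

        chunks = []
        for start in range(0, len(signal) - samples_per_chunk + 1, samples_per_chunk):
            chunk = signal[start:start + samples_per_chunk]
            S = librosa.stft(chunk, n_fft=FRAME_SIZE, hop_length=HOP_SIZE)
            Y_db = librosa.power_to_db(np.abs(S) ** 2)
            chunks.append(Y_db[..., np.newaxis])   # (freq, frames) -> (freq, frames, 1)

        X = np.stack(chunks)                       # (n_chunks, freq, frames, 1), ready for a CNN
        print(X.shape)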

  • @drjfilix3192 · 2 years ago

    Hi Valerio, do you speak Italian? ;-)

    • @ValerioVelardoTheSoundofAI · 2 years ago

      Yes, I'm Italian

    • @drjfilix3192 · 2 years ago

      @ValerioVelardoTheSoundofAI Perfect! I'd like to ask you a million things! :-D In the meantime, I'll start watching your videos, which interest me a lot!

  • @foxyonfire · 7 months ago

    0:16 Debussy

  • @noahdrisort2005 · 2 years ago

    You looked so young with the new hair :))