Mel Frequency Cepstral Coefficients (MFCC) Explained

  • Published: 19 Oct 2024

Comments • 45

  • @florin-andreirusu6424
    @florin-andreirusu6424 1 year ago +6

    Oh wow, this is very well explained. Thank you!

  • @datamlistic
    @datamlistic  1 year ago +3

    If you want to find out more about the Fourier Transform and the maths behind it, make sure to check out this video: ruclips.net/video/7Tk6BAJ3mm8/видео.html

  • @nayanvats3424
    @nayanvats3424 1 year ago +1

    Very crisp and precise explanation of MFCC :)

    • @datamlistic
      @datamlistic  1 year ago

      Many thanks! Glad it helped! :)

    • @nayanvats3424
      @nayanvats3424 1 year ago

      @@datamlistic Would you consider covering features like Single Frequency Filtering and other filter bank approaches?

    • @datamlistic
      @datamlistic  1 year ago

      @@nayanvats3424 Most likely not in the very near future, but they are on my list. Probably at some point I will make a whole series about speech processing as I did for object detection. :)

  • @anne.nijenhuis
    @anne.nijenhuis 19 days ago

    A great explanation! Thanks!

    • @datamlistic
      @datamlistic  17 days ago

      Thanks! Glad it was helpful! :)

  • @iankeck3419
    @iankeck3419 1 year ago +1

    Thanks! Nice explanation. They were used in the early days of speech recognition, when most recognizers were HMM-based.

    • @datamlistic
      @datamlistic  1 year ago

      Thank you! MFCCs are still widely used in neural-network speech recognizers that do not take raw audio as input and need some kind of pre-processing step (DeepSpeech, LAS, etc.). However, the community is shifting towards wav2vec2-like models that take raw audio as input, so yeah, MFCCs are starting to become less relevant, but they are an interesting case study nevertheless.

  • @profdrmea
    @profdrmea 1 year ago +1

    Thank you for your effort👍

    • @datamlistic
      @datamlistic  1 year ago +1

      No problem. I hope you enjoyed the explanation.

  • @cgyh68748
    @cgyh68748 2 months ago

    quick and simple!

  • @shanybarhom4395
    @shanybarhom4395 1 year ago +1

    The best explanation I've heard 👏

    • @datamlistic
      @datamlistic  1 year ago +1

      Sweet thanks! I am happy you found it helpful! :)

  • @fernandovldrs
    @fernandovldrs 1 year ago

    Great video

  • @ethancooper4154
    @ethancooper4154 1 year ago

    So when using the triangular filterbank, you store just one scalar for each bank rather than a whole array of data?

    • @datamlistic
      @datamlistic  1 year ago +1

      Exactly, that scalar is the weighted sum of energies as given by the filterbank. Please let me know if you have any more questions. :)
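
To make the "one scalar per filter" point concrete, here is a minimal sketch for a single frame and a single triangular filter. The power values and the filter weights are made up for illustration; they do not come from the video.

```python
import numpy as np

# Hypothetical power spectrum of one frame: 9 frequency bins (illustrative values).
power = np.array([0.0, 1.0, 4.0, 9.0, 16.0, 9.0, 4.0, 1.0, 0.0])

# One hypothetical triangular filter: zero outside bins 3..5, peak weight 1.0 at bin 4.
weights = np.array([0.0, 0.0, 0.0, 0.5, 1.0, 0.5, 0.0, 0.0, 0.0])

# The filterbank energy for this filter is a single scalar:
# the weighted sum of the bin powers under the triangle.
energy = float(np.dot(weights, power))
print(energy)  # 0.5 * 9 + 1.0 * 16 + 0.5 * 9 = 25.0
```

With K filters you would repeat this K times per frame, so each frame shrinks from one value per frequency bin to K values.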

  • @dexnug
    @dexnug 1 year ago

    Hi, nice presentation. To be honest, applying the filterbank is the hardest part to understand.
    1. After we compute the Fourier transform of each frame/segment of the signal, do we convert the signal to the mel scale and then apply the triangular filterbank? And what is the output of the mel filterbank?
    2. By the way, can I have the references that you used?
    3. Can I get the code that generates the plots at 1:59?

    • @datamlistic
      @datamlistic  1 year ago

      Thank you for your feedback! Yeah, the filterbanks part also didn't come easy to me. Maybe I should have provided more details there. Regarding your questions:
      1. Almost correct. We don't convert the spectrum to the mel scale directly; we apply the triangular filterbank to obtain the filterbank energies. We could have simply selected the frequencies that correspond to equally spaced points on the mel scale, but then we would have lost quite a bit of spectral information. You can think of each filterbank triangle as taking a weighted average of the spectral power at the frequencies under that triangle.
      2. practicalcryptography.com/miscellaneous/machine-learning/guide-mel-frequency-cepstral-coefficients-mfccs/ and wiki.aalto.fi/display/ITSP/Cepstrum+and+MFCC
      3. I don't have that code anymore.
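
As a rough illustration of "equal points on the mel scale", the sketch below places filter centers that are evenly spaced in mel and shows that they spread out in Hz. The conversion formula is the commonly used one, m = 2595 * log10(1 + f / 700); the frequency range and the number of filters are assumptions chosen for the example, not values from the video.

```python
import numpy as np

def hz_to_mel(f_hz):
    """Common Hz-to-mel formula (one of several variants in use)."""
    return 2595.0 * np.log10(1.0 + np.asarray(f_hz, dtype=float) / 700.0)

def mel_to_hz(m):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (np.asarray(m, dtype=float) / 2595.0) - 1.0)

# Assumed settings for illustration only.
f_min, f_max, n_filters = 0.0, 8000.0, 10

# n_filters triangles need n_filters + 2 edge points: each triangle spans
# from the previous point to the next one, with its peak at its own point.
mel_points = np.linspace(hz_to_mel(f_min), hz_to_mel(f_max), n_filters + 2)
hz_points = mel_to_hz(mel_points)
centers_hz = hz_points[1:-1]

print(np.round(centers_hz, 1))      # centers are equally spaced in mel ...
print(np.round(np.diff(hz_points), 1))  # ... so the gaps in Hz keep growing
```

The growing gaps are why the triangles get wider at high frequencies: each one covers the same distance in mel, but more Hz.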

  • @devdexvils7602
    @devdexvils7602 1 year ago

    To be honest, I don't fully understand the triangular filterbank (03:23). Why don't we keep the highest frequency component? What do W_{h,k}, x_h and the red dots on the frequency axis mean?

    • @datamlistic
      @datamlistic  1 year ago

      Thank you for your question! Basically, you can see at 02:35 all the possible filterbanks we can apply to the spectrum. Some of them capture the highest frequency component.
      W_{h,k} - the weight given by filterbank k at frequency h in the spectrogram
      x_h - the power of the frequency h
      red dots - each discrete frequency in the signal
      I hope that this helps you in better understanding the video material. Please let me know if you have any other questions.
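
Written out with the symbols above, the energy of filterbank k is the weighted sum over the discrete frequencies. The name E_k is an assumption used here for illustration; the other symbols are the ones defined in this reply.

```latex
E_k = \sum_{h} W_{h,k} \, x_h
```

Since W_{h,k} is zero outside the k-th triangle, only the frequencies under that triangle contribute to E_k.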

    • @devdexvils7602
      @devdexvils7602 1 year ago

      @@datamlistic I have learned from many tutorials, so I have written a summary. Please correct me if something is missing. By the way, my project is about the mel spectrogram, which is the processing that happens before the DCT step in MFCC.
      1. Pre-process the audio signal: Depending on the application, the audio signal may need to be pre-processed before computing the Mel spectrogram. This can include steps such as removing silence or background noise, downsampling the signal, or normalizing the signal level.
      2. Compute the STFT of the audio signal: To compute the STFT of the audio signal, the signal is divided into overlapping frames using a window function. The Fourier Transform is then applied to each frame to compute the frequency spectrum of the signal at that point in time.
      3. Convert the frequency axis to the Mel scale: The frequency spectrum is typically expressed in Hz, but the Mel scale is based on the way that the human auditory system perceives pitch. The mapping from Hz to mel is nonlinear (roughly logarithmic) rather than a simple scale factor; a common formula is m = 2595 * log10(1 + f / 700). In practice this mapping is used to position the filters of step 4 instead of being applied to the spectrum as a separate multiplication.
      4. Apply the Mel-weighted filter bank: The Mel-weighted filter bank consists of a series of filters that are spaced equally on the Mel scale. Each filter is designed to pass a specific range of frequencies and the output of each filter is a measure of the energy in the signal at those frequencies. The filters are typically designed using a triangular shape, with the center frequency of each filter being the peak of the triangle and the width of the triangle being determined by the bandwidth of the filter.
      5. Compute the Mel spectrogram: The output of the Mel-weighted filter bank is a set of filterbank energies, which represent the energy of the signal at each frequency band on the Mel scale. These filterbank energies can be plotted over time to create a Mel spectrogram, which is a visual representation of the frequency spectrum of the signal over time.
      6. Post-process the Mel spectrogram: Depending on the application, the Mel spectrogram may need to be post-processed before it can be used. This can include steps such as smoothing the spectrogram, applying a logarithmic scale, or normalizing the spectrogram.
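
Steps 2 to 5 can be sketched in plain NumPy as follows. This is a minimal illustration under stated assumptions, not the code behind the video: the frame length, hop size, Hann window, number of filters, and the helper names (`mel_filterbank`, `mel_spectrogram`) are all choices made for the example, and pre- and post-processing (steps 1 and 6) are left out.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular weights, shape (n_filters, n_fft // 2 + 1)."""
    # Edge points equally spaced on the mel scale, mapped back to FFT bin indices.
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sample_rate).astype(int)
    fbank = np.zeros((n_filters, n_fft // 2 + 1))
    for k in range(1, n_filters + 1):
        left, center, right = bins[k - 1], bins[k], bins[k + 1]
        for h in range(left, center):          # rising edge of the triangle
            fbank[k - 1, h] = (h - left) / (center - left)
        for h in range(center, right):         # peak and falling edge
            fbank[k - 1, h] = (right - h) / (right - center)
    return fbank

def mel_spectrogram(signal, sample_rate, n_fft=512, hop=256, n_filters=26):
    """Return filterbank energies, shape (n_frames, n_filters)."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(signal) - n_fft) // hop
    # Step 2: overlapping windowed frames, then the power spectrum of each frame.
    frames = np.stack([signal[i * hop: i * hop + n_fft] * window
                       for i in range(n_frames)])
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2 / n_fft
    # Steps 4 and 5: apply every filter to every frame (weighted sums).
    return power @ mel_filterbank(n_filters, n_fft, sample_rate).T

# Usage with a synthetic 440 Hz tone (1 second at 16 kHz).
sample_rate = 16000
t = np.arange(sample_rate) / sample_rate
mel_spec = mel_spectrogram(np.sin(2 * np.pi * 440.0 * t), sample_rate)
print(mel_spec.shape)  # (number of frames, number of filters)
```

The matrix product in the last line of `mel_spectrogram` is where the filterbank is applied to each frame produced by the STFT step.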

    • @datamlistic
      @datamlistic  1 year ago

      @@devdexvils7602 That's pretty much the algorithm used to compute the Mel Spectrogram. The only remark I have is that you should make it a little clearer that at step 5 you apply step 4 to each frame resulting from step 2.

  • @anikaroy8311
    @anikaroy8311 6 months ago

    Amazing!

  • @billylee7758
    @billylee7758 1 year ago

    What is the difference between MFCC and MFE?

    • @datamlistic
      @datamlistic  1 year ago

      I've never personally used MFE features, but as far as I am aware you just extract the frequencies and convert them to the Mel scale using the triangular coefficients (the first two steps in MFCC). You don't apply the logarithm and the (inverse) Fourier transformation again (the last two steps in MFCC).
      Please let me know if this info was useful.
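
The two extra steps that turn filterbank energies into MFCCs can be sketched like this. The final transform is written as a DCT-II, which is the usual choice in MFCC pipelines and the "DCT part" mentioned elsewhere in this thread. The input values, the 26 filters, the 13 kept coefficients, and the helper name `dct_ii` are assumptions for illustration.

```python
import numpy as np

def dct_ii(x, n_coeffs):
    """Unnormalized DCT-II along the last axis, keeping the first n_coeffs outputs."""
    n = x.shape[-1]
    k = np.arange(n_coeffs)[:, None]   # output coefficient index
    i = np.arange(n)[None, :]          # input (filter) index
    basis = np.cos(np.pi * k * (2 * i + 1) / (2 * n))  # shape (n_coeffs, n)
    return x @ basis.T

# Hypothetical filterbank energies: 5 frames, 26 filters (random positive values).
rng = np.random.default_rng(0)
mfe = rng.uniform(0.1, 10.0, size=(5, 26))

# Last two MFCC steps: logarithm, then the cosine transform across the filters.
log_energies = np.log(mfe + 1e-10)   # small offset guards against log(0)
mfcc = dct_ii(log_energies, n_coeffs=13)
print(mfcc.shape)  # (frames, kept coefficients)
```

Stopping before the logarithm leaves you with the filterbank energies themselves; running the last two lines gives the cepstral coefficients.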

    • @billylee7758
      @billylee7758 1 year ago

      @@datamlistic Great answer. Many thanks!

    • @datamlistic
      @datamlistic  1 year ago

      @@billylee7758 My pleasure! :)

  • @quinxx12
    @quinxx12 23 days ago

    Talking about windowing. Sounds like you windowed your voice recording in this video. The chopping is quite irritating

    • @datamlistic
      @datamlistic  17 days ago

      Sorry for that! That's an older video and I didn't know how to correctly record my voice back then. The sound quality in the newer videos should be much higher. Thanks for the feedback! :)

  • @saranshduharia6156
    @saranshduharia6156 10 months ago +1

    Nh iutt

    • @datamlistic
      @datamlistic  10 months ago

      :)

    • @saranshduharia6156
      @saranshduharia6156 10 months ago

      Heyyy thanks for video bud

    • @datamlistic
      @datamlistic  10 months ago

      @@saranshduharia6156 You are welcome! Glad you enjoyed it! :)

  • @FreehuntX93
    @FreehuntX93 1 year ago

    too complicated 😞

    • @datamlistic
      @datamlistic  1 year ago

      Which part do you think is too complicated?

    • @FreehuntX93
      @FreehuntX93 1 year ago +1

      @@datamlistic The math behind it. Would be cool to make it understandable with low math knowledge. But nvm 😅

    • @datamlistic
      @datamlistic  1 year ago +2

      @@FreehuntX93 Well, to understand MFCCs you need to be familiar with how the Fourier Transform works. You can take a look here to get a better intuition about it: ruclips.net/video/spUNpyF58BY/видео.html. The maths should come easier afterwards.

    • @root55
      @root55 1 year ago

      @@datamlistic This link helped a lot. Thanks!

    • @datamlistic
      @datamlistic  1 year ago

      @@root55 Glad it was helpful!