@@dnaviap there are a few ways to work that back. Time of node = Time of signal divided by 2^J, and by that you can track the time that the node represents. Alternatively, use a sliding window approach: e.g., for a signal of 10 seconds, chop it into 1-second segments and analyze these individually. That way you know that the specific node of interest and its coefficients belong to that 1-second window. Alternatively, use the CWT, as it generates a time-frequency plot. Or just ask ChatGPT and it will give you examples.
@@AlaphBeth according to this, is it correct to think that, using the DWT, at each level the entire time domain of the signal is covered by the corresponding level coefficients? I mean, the coefficients of a certain level cover the entire time domain of the signal regardless of the number of coefficients?
Assalamualaikum sir, this video is ma sha allah wonderful. Thank you very much for the video. Well explained. Please bring more similar types of videos in future. May Allah bless you.
Can you recommend any books/papers on the wavelet transform/feature extraction, maybe with algorithms showing how to do it, or more, deeper information? I need it for my master's thesis
Thanks for your feedback. That has been actually done by many researchers, see for example www.hindawi.com/journals/mpe/2019/1340174/ Or this one www.researchgate.net/publication/334519126_LSTM_with_Wavelet_Transform_Based_Data_Preprocessing_for_Stock_Price_Prediction
This video is very informative, and the Q&A cleared up a lot of doubts. It would be highly appreciated if you could make a video on the scattering transform in detail. Thanks in advance
Thanks a lot! Great work! It'd be awesome and so appreciated if you made a video and explained the fuzzy entropy and mutual information and understanding them. Thanks again.
Why is there localisation: because what you are doing here is sliding a wavelet with a certain scale (scale is related to frequency) across your entire signal in the time domain, performing convolution, and looking for large wavelet coefficients (coefficients resulting from convolving your signal with the wavelet). If that happens then you have localised the feature of interest across time (by sliding) and frequency (by virtue of the selected scale, which is related to frequency). Both approximately (CWT is more precise). As for downsampling: we said the LPF and HPF are each followed by downsampling. Say your signal has frequency content in the range 0-8Hz. The first LPF gets you 0-4Hz, but how? Well, remember the ideal filter shape (rectangles for the pass area and zeros for no pass). You get your 0-4Hz followed by 0s. Think of downsampling as keeping the 0-4Hz and throwing away the 0s. Now for the HPF, we get 4-8Hz but preceded by a few 0s. So we flip that, downsample to keep the 8-4Hz and throw the 0s away, and then reorder the frequencies. This is what I mentioned in one of the comments below about maintaining the natural order of frequencies. I really recommend this book books.google.com.au/books/about/Ripples_in_Mathematics.html?id=nMIPBwAAQBAJ&printsec=frontcover&source=kp_read_button&hl=en&newbks=1&newbks_redir=1&redir_esc=y Hope that helps :)
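To make the localisation point concrete, here is a minimal sketch using the Haar wavelet, the simplest possible filter pair, written in plain Python rather than with any toolbox. The 16-sample signal and the position of the transient are made up for illustration. The low pass branch averages neighbouring samples, the high pass branch differences them, and downsampling keeps one output per pair of inputs, so a large detail coefficient at index k points back to samples 2k and 2k+1 of the original signal.

```python
import math

def haar_dwt(x):
    """One level of the Haar DWT: filter, then keep every second output."""
    s = 1.0 / math.sqrt(2.0)
    approx = [s * (x[i] + x[i + 1]) for i in range(0, len(x) - 1, 2)]  # low pass
    detail = [s * (x[i] - x[i + 1]) for i in range(0, len(x) - 1, 2)]  # high pass
    return approx, detail

signal = [0.0] * 16
signal[10] = 1.0   # hypothetical transient at sample 10

approx, detail = haar_dwt(signal)

# Index of the largest detail coefficient
k = max(range(len(detail)), key=lambda i: abs(detail[i]))
print(len(approx), len(detail))   # 8 8
print(f"largest detail coefficient: index {k}, covering samples {2 * k} and {2 * k + 1}")
```

Each branch has 8 coefficients, and both branches still span the whole 16-sample duration at half the time resolution. The coefficient index is a (coarser) time axis, not a frequency axis.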
@@AlaphBeth Thanks so much for the detailed reply. Just one more thing: Understood about the localization but at the same time not really. Suppose we have a time signal with 16 elements. After the decimation, the first 8 elements are low frequency followed by 8 elements from the HPF. So the question is - what is the time axis now? I imagine it would be tau instead of t just like normal convolution so its like tau = 1,2,3 ... 16. But if we were to plot the 8 low frequency followed by 8 high frequencies, it would mean the first 8 tau instant is low frequency i.e. low frequency is always localized at the front part of the signal while high frequency is at the back. Why would that be the case? I don't see how this can be the same as discretized form of continuous wavelet transform.
@@semcify the major concept is similar, which is convolution with some wavelets (fathers or mothers), but the DWT is NOT the same as the CWT. The DWT employs discrete scale values that are powers of 2 (with discrete values for the translation parameter too), while the CWT has much finer control over the scale value. The DWT is computationally more efficient than the CWT because it does not provide you with the same level of detail as the CWT. You can localize transients in your signal, or characterize oscillatory behavior, better with the CWT than with the DWT due to the way the scale values are selected. On the other hand, the DWT can capture some important features of many signals in a few coefficients and is also an orthonormal transform (a desired property in many applications). As for your question about why low frequencies come before the high frequencies: isn’t the frequency spectrum usually ordered from 0Hz to Fs/2? So when you start partitioning this range, it’s natural for the low frequencies to come before the higher ones. You should probably watch Mallat’s videos on YouTube. I didn’t want to add that level of detail to this lecture as I was introducing a complex concept and aiming mainly to simplify it, but you definitely have a good selection of questions :) The book I mentioned earlier does discuss these topics with graphical depictions (Google might get you a PDF).
Thx a lot bro, I just wanna ask how you used high and low pass filters as a band filter with a limited range of frequency (min-max)... (Basima raba aziza )
Thanks for your feedback. It’s the repeated division into 2 parts (lower half with the LPF and higher half with the HPF) that eventually generates the required subbands. For example, if you have a signal sampled at 64Hz, with a valid frequency range of 0-32Hz, then the wavelet transform divides that in the first decomposition level into 0-16Hz and 16-32Hz, then divides 0-16Hz into 0-8Hz and 8-16Hz, and keeps doing this by simply dividing the frequency range into 2 equal bands as you keep decomposing. So you started from 0-32Hz and now, by using the LPF & HPF, you have 8-16Hz (a limited min-max range) as one of the bands/subbands. (Raba pshena)
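The halving described above can be tabulated with a few lines of Python. This is only a sketch of the nominal band edges under the ideal-filter assumption; real wavelet filters overlap at the band edges. The function name `dwt_subbands` and the D/A labels are illustrative, not taken from any toolbox.

```python
def dwt_subbands(fs, levels):
    """Nominal (ideal-filter) frequency bands of a `levels`-deep DWT.

    Returns a list of (name, low_hz, high_hz), finest detail first.
    """
    bands = []
    high = fs / 2.0          # Nyquist frequency: top of the valid range
    for level in range(1, levels + 1):
        low = high / 2.0     # each split halves the current low pass band
        bands.append((f"D{level}", low, high))   # high pass (detail) half
        high = low           # keep decomposing the low pass half
    bands.append((f"A{levels}", 0.0, high))      # final approximation
    return bands


for name, low, high in dwt_subbands(fs=64, levels=2):
    print(f"{name}: {low:g}-{high:g} Hz")
# D1: 16-32 Hz
# D2: 8-16 Hz
# A2: 0-8 Hz
```

The same function can be called with any other sampling rate and depth to see which level lands closest to a band of interest.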
Dear mr rami, can i use the discrete wavelet transform to extract and plot the EEG frequency bands? In other words, can i represent the EEG signal in the frequency domain using DWT? Or when it comes to representing the frequency domain do i have to use the Fourier transform? Please, i need a practical answer
Hi Hassan. Let’s take this one by one. EEG is a non-stationary signal. By using the FFT on EEG, you may lose some info, and you have to work on small segments of the EEG signal during which one can assume that the EEG is stationary. The FFT can, though, show you the power spectrum with frequency on the x axis and power on the y axis, that is, the contribution of every single frequency. On the other hand, the DWT can deal with non-stationary signals like EEG. It chops the frequency spectrum into bands or segments and keeps dividing these bands into smaller portions that allow us to zoom in on specific events of interest. In the video, I already showed you how to extract the EEG band related features. So if you ask whether you can plot EEG bands, then I would say watch the video again, as you obviously missed this part. On your second question: the DWT is not a frequency analysis tool like the FFT; it’s a time-frequency (or time-scale) analysis tool. Hence you can get localised info in time and frequency. As for which one to use, I would say use whatever works best for your example.
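A small experiment shows what "you may lose some info" means for a non-stationary signal. The sketch below uses a direct DFT in plain Python instead of an FFT library, and the two test signals are made up: one has a 4 Hz tone followed by a 12 Hz tone, the other is the same recording played backwards (12 Hz first, then 4 Hz). Their magnitude spectra come out identical, so the spectrum alone cannot say which tone came first.

```python
import cmath
import math

def dft_magnitudes(x):
    """Magnitude of the discrete Fourier transform (direct O(N^2) form)."""
    n = len(x)
    return [abs(sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)))
            for k in range(n)]

fs, n = 64, 64   # hypothetical: 1 second sampled at 64 Hz

# 4 Hz during the first half second, 12 Hz during the second half
sig_a = [math.sin(2 * math.pi * (4 if t < n // 2 else 12) * t / fs) for t in range(n)]
# Same samples in reverse order: 12 Hz first, then 4 Hz
sig_b = sig_a[::-1]

mag_a = dft_magnitudes(sig_a)
mag_b = dft_magnitudes(sig_b)

# Largest difference between the two magnitude spectra (rounding error only)
print(max(abs(a - b) for a, b in zip(mag_a, mag_b)))
```

The timing information is still in the phase of the transform, but a power spectrum plot discards it, which is why short segments or a time-frequency method are used for EEG.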
Rami sir.. I became ur fan... really superb video with great explanation.. I m having one question.. at last u have mentioned about dimensionality reduction using PCA, LDA etc.. I need to ask u: suppose we have done feature extraction first, then we apply PCA for dimensionality reduction, then we apply any feature selection and last classification, is this flow right?? or any suggestions from ur side
Firstly, thanks a lot for your feedback, much appreciated. About your question: if you have many channels, you may want to do some sort of tree optimisation first for each tree (this is feature selection per tree) and then get the features from all resulting trees and pass them through PCA. Another method is to simply use any variant of LDA feature projection, and in this case don’t worry about optimising individual trees (nothing wrong if you also do that). Just get all the features from all the trees and send these to an LDA based dim reduction. This is because LDA considers the class labels while PCA does not; so PCA might need some help from individual tree optimisation to guide it, but LDA does not. Hence, the typical flow is whatever works best for you; there is no right and wrong. Try as many scenarios as you can and pick the one that works best. I presume though that the LDA approach might work best.
EDF is a standard file format designed for exchange and storage of medical time series, it has nothing to do with the ability of DWT to process the signals. So the short answer is YES, but you need an edf reader to first bring the signals into Python or Matlab so you can apply any processing step on it. Simply think of this as the type of packaging in which your post arrives within, it does not matter if it is an amazon box or simple normal mail box, you have to open the box first to get your mail contents to think of what to do with it.
@@niroshadas4649 look here for instructions pywavelets.readthedocs.io/en/latest/ref/dwt-discrete-wavelet-transform.html, where cA is the low pass filter side and cD is the high pass filter side I mentioned in the video. You just need to practice with examples and relate to this video
The answer to your question depends on many parameters, like: which transform are you going to use, DWT or WPT? How many decomposition levels are you using? How many features are you planning to extract from each node? Are you going to optimise the trees or just use the whole lot? As you can see, once you know the answers to these (which pretty much depend on your problem) then you can figure out how many features you end up with.
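As a rough sketch of that bookkeeping, the count can be written as nodes x features per node x channels. The assumptions here are mine: the DWT keeps one detail node per level plus the final approximation, the WPT keeps only the leaf nodes of the full tree, and no tree optimisation is applied. The function name `count_features` is made up for illustration.

```python
def count_features(transform, levels, features_per_node, channels=1):
    """Size of the feature vector when every retained node is used.

    Assumes the DWT keeps `levels` detail nodes plus one approximation node,
    and the WPT keeps only the 2**levels leaf nodes of the full tree.
    """
    if transform == "DWT":
        nodes = levels + 1
    elif transform == "WPT":
        nodes = 2 ** levels
    else:
        raise ValueError("transform must be 'DWT' or 'WPT'")
    return nodes * features_per_node * channels


# Hypothetical setup: 8 channels, 5 levels, 2 features (e.g. energy, variance) per node
print(count_features("DWT", levels=5, features_per_node=2, channels=8))  # 96
print(count_features("WPT", levels=5, features_per_node=2, channels=8))  # 512
```

Optimising the trees or keeping intermediate WPT nodes changes the node count, so the numbers above are only a starting estimate.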
There are lots of Python implementations using pywavelets; I can’t really recommend anything unless I try it myself, and in this direction I tend to write my own code. Just use pywavelets as a starting point, decompose to one or a few levels with the DWT, get the cA (low pass) and cD (high pass) coefficients, extract some basic features from the cA and each cD like energy, variance, etc., and take it from there to make it more complex. You can also search GitHub for some ready libraries.
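Here is a minimal, dependency-free sketch of that recipe. To keep it runnable without installing anything it uses a hand-written Haar decomposition as a stand-in for the library call; with PyWavelets the decomposition step would be something like `pywt.wavedec(signal, "db4", level=3)`, which returns the coefficients in the same `[cA_n, cD_n, ..., cD_1]` order. The test signal and the choice of features are made up for illustration.

```python
import math
import statistics

def haar_step(x):
    """One Haar DWT level: returns (cA, cD), each half the input length."""
    s = 1.0 / math.sqrt(2.0)
    cA = [s * (x[i] + x[i + 1]) for i in range(0, len(x) - 1, 2)]
    cD = [s * (x[i] - x[i + 1]) for i in range(0, len(x) - 1, 2)]
    return cA, cD

def haar_wavedec(x, levels):
    """Multi-level Haar DWT, returned as [cA_n, cD_n, ..., cD_1]."""
    details = []
    cA = list(x)
    for _ in range(levels):
        cA, cD = haar_step(cA)   # only the low pass side is decomposed again
        details.append(cD)
    return [cA] + details[::-1]

def node_features(coeffs):
    """Two basic features per node: energy and (population) variance."""
    return {"energy": sum(c * c for c in coeffs),
            "variance": statistics.pvariance(coeffs)}

# Hypothetical test signal: a slow sine plus a weaker, faster one (64 samples)
signal = [math.sin(2 * math.pi * 2 * t / 64) + 0.5 * math.sin(2 * math.pi * 20 * t / 64)
          for t in range(64)]

nodes = haar_wavedec(signal, levels=3)
names = ["cA3", "cD3", "cD2", "cD1"]
features = {name: node_features(c) for name, c in zip(names, nodes)}

for name, f in features.items():
    print(name, round(f["energy"], 3), round(f["variance"], 3))
```

Because the Haar transform used here is orthonormal, the node energies add up to the energy of the original signal, which is a handy sanity check before moving on to more elaborate features.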
@@AlaphBeth I ran a 3-iteration forward and inverse wavelet transform and reconstructed the image. How can we find the approximate number of operations carried out, that is, additions and multiplications? And how can the downsampling operator reduce the number of operations?
@@Dreams365as what you posted here is not a doubt about anything in the video, but an independent question on computing what is known as the computational complexity (big O notation) when applying DWT forward and backward across several decomposition levels. Generally, the number of additions, multiplications and subtraction is what the Big O notation looks at, so you don’t have to calculate individual number of operations but just report the notation like O(N). This is out of the context of this video, as you can easily find other resources for that.
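For anyone curious where the O(N) comes from, here is a back-of-the-envelope sketch. It counts multiplications only, for a 1D forward transform, ignoring edge handling; the inverse transform and the row/column passes of a 2D image transform follow the same pattern. The signal length and filter length are hypothetical. Each level runs two filters of length L and, thanks to downsampling, only computes half as many outputs as it has inputs, so the work shrinks geometrically from level to level.

```python
def dwt_multiplications(n_samples, filter_len, levels, downsample=True):
    """Approximate multiplication count for a forward DWT (edge effects ignored)."""
    total = 0
    n = n_samples
    for _ in range(levels):
        outputs_per_filter = n // 2 if downsample else n
        total += 2 * outputs_per_filter * filter_len   # low pass + high pass
        n = outputs_per_filter                         # next level works on cA
    return total


N, L = 1024, 8    # hypothetical signal length; db4 has 8 filter taps
print(dwt_multiplications(N, L, levels=3))                    # 14336
print(dwt_multiplications(N, L, levels=3, downsample=False))  # 49152
print(2 * L * N)                                              # 16384
```

With downsampling the total is L*N*(1 + 1/2 + 1/4 + ...), which stays below 2*L*N no matter how many levels are used, hence O(N). Without it, every level costs the same as the first and the total grows with the number of levels.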
Hi, great video. But I'm a little confused about the decomposition tree. If each LPF and HPF downsamples by 2, then how can D5 = [4 8]Hz contain 8 Hz? Its sampling rate has been reduced to 8Hz (256/2^5) and so due to Nyquist it can only represent 0-4Hz, yet it somehow captures the 4 - 8 Hz BW? Any help clearing up this confusion? Thank you
Hi, thanks for the feedback. Good question :) I will have to simplify this as I can’t draw here. Due to the downsampling shown in slide 9, all the high pass parts are mirrored. So that 0-8Hz passes through 1) the LPF to get 0-4Hz, which gets downsampled without mirroring, and 2) the HPF to get 4-8Hz, which gets mirrored (flip the 4-8Hz into 8-4Hz and shift this to be located at 0-4Hz). So the width of the range is 4Hz; which 4Hz it is, is the point. Also please note that, as a result of this mirroring, you might wonder what happens to the subsequent HPF parts, and in that case you have to note that the ordering of the nodes can be either according to the filter bank ordering OR according to the natural frequency ordering. The second ordering is what most toolboxes will give you to keep it more intuitive for the users. Hope that helps.
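The mirroring can be checked numerically with a single tone. In this sketch the sampling rate (16 Hz, so the valid range is 0-8 Hz) and the 6 Hz tone are hypothetical values chosen to match the 4-8 Hz example. After keeping every second sample, the 6 Hz tone is sample-for-sample identical to a 2 Hz tone at the new 8 Hz rate, i.e. it has been flipped around and now sits at 8 - 6 = 2 Hz.

```python
import math

fs = 16.0      # hypothetical sampling rate, so the valid range is 0-8 Hz
f_high = 6.0   # a tone inside the 4-8 Hz high pass band
n = 32

x = [math.cos(2 * math.pi * f_high * t / fs) for t in range(n)]
x_down = x[::2]               # downsample by 2: the new rate is fs / 2 = 8 Hz

f_mirrored = fs / 2 - f_high  # 6 Hz lands at 8 - 6 = 2 Hz
reference = [math.cos(2 * math.pi * f_mirrored * t / (fs / 2))
             for t in range(len(x_down))]

# Largest sample-wise difference (rounding error only)
print(max(abs(a - b) for a, b in zip(x_down, reference)))
```

So the downsampled high pass branch does live in 0-4 Hz numerically, but its content is the 4-8 Hz band of the original in reversed order: 7 Hz maps to 1 Hz, 5 Hz maps to 3 Hz, and so on.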
If you want to understand how wavelets work, I would highly recommend a book titled: Ripples in Mathematics: The Discrete Wavelet Transform. It’s accompanied by lots of Matlab examples and code too.
I think the metaphor of the train platform vs. seat choice is interesting - it has some confusing aspects though since the frequencies make no choices on where to sit :D There is a spatio-temporal uncertainty in spectral/temporal analysis, but it is still a deterministic affair without any random or choice elements :) But there is something to it about the signal passing by, the "window" the train represents and people getting in and out of the train and so on :D Still wrapping my head around it
Thank you for your feedback, much appreciated. I tried to draw an intuitive example that a new starter in the field can relate to and couldn’t find something easier from the real world rather than the train example :)
@@AlaphBeth Your video is impressively detailed and the effort you've put in is clear. Thank you! The train metaphor was a decent attempt but it was a bit confusing. Have you thought about using audio/music as an example instead? Consider a song, where each instrument contributes distinct frequency components unique to its timbre, which can be represented as Fourier constituents in a signal. Instruments don't always play throughout the entire song - they enter and exit. Similarly, our signal's frequency content varies temporally. This variation can be visualised in a spectrogram, correlating to temporal analysis.
@@dreamdrifter Good one 👍 For me the idea of train/people seemed more appealing at the time as you get to see people sitting in the different seats but you can tell when they went inside the train (if you are sitting already), and from outside the train you don’t know what seats they will take without being inside, which paved the way for time frequency view (standing near the door). I guess with some nice presentation the idea of song would look great in this topic.
@@AlaphBeth Indeed. However, it is not clear how the distribution of the people standing on the platform has any temporal element - I perceive that as a spatial distribution. And to be difficult, if I am sitting on the train I can turn my head towards the door and see who is coming in and out. I understand the analogy assumes otherwise but this fact is likely to cause some cognitive dissonance (at least, it did for me)
I don't understand how Fourier or Wavelet transform works in images (2D). I understand how the sinusoids/daughter wavelets analyze a signal (1D) but how exactly they can analyze an image? I'm searching for a brief explanation for so long. Looking forward for your next video:) Hopefully explaining this ^^
Sure, will talk about that in the next video. Basically, consider the coefficients generated by convolution of the signal or image with the wavelet family filters. In many cases, people report that they can throw away many coefficients (resulting from the convolution with the wavelet families) and keep a tiny number of these to represent signals/images, and this is the image compression application (storing a small number of coefficients that represent the image is better in terms of memory than storing the original image, if the quality of the reconstructed images is still good). In image processing, in one example you can look for coefficients that are big in a neighbourhood of small coefficients to detect edges. In image classification, if you apply some shifting to the image, or some weird morphology change, and you still get nearly the same coefficients after all of these steps, then you have a robust representation of the images which will make learning these images much easier for any classification model. I quote from the Mathworks wavelet scattering page: “For classification problems, it is often useful to map the data into some alternative representation which discards irrelevant information while retaining the discriminative properties of each class.” Hope this helps till the next video :)
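To give a flavour of the 2D case before the next video, here is a small sketch in plain Python. It assumes the common separable scheme (filter along the rows, then along the columns) and uses an unnormalised Haar pair (averages and differences), which is the simplest choice rather than what any particular toolbox does. The 4x4 "image" is made up: it is dark in the first column and bright elsewhere, so it has one vertical edge.

```python
def haar_pairs(values):
    """Unnormalised Haar step: pairwise averages (low) and differences (high)."""
    low = [(values[i] + values[i + 1]) / 2.0 for i in range(0, len(values), 2)]
    high = [(values[i] - values[i + 1]) / 2.0 for i in range(0, len(values), 2)]
    return low, high

def transpose(m):
    return [list(row) for row in zip(*m)]

def haar_2d(image):
    """One level of a separable 2D Haar transform: rows first, then columns."""
    row_low, row_high = zip(*(haar_pairs(row) for row in image))

    def along_columns(m):
        col_low, col_high = zip(*(haar_pairs(col) for col in transpose(m)))
        return transpose(col_low), transpose(col_high)

    ll, lh = along_columns(row_low)    # smooth part; changes between rows
    hl, hh = along_columns(row_high)   # changes between columns; diagonal changes
    return ll, lh, hl, hh


# Hypothetical 4x4 image with a vertical edge between column 0 and column 1
image = [[0, 9, 9, 9] for _ in range(4)]

ll, lh, hl, hh = haar_2d(image)
print(ll)   # [[4.5, 9.0], [4.5, 9.0]]
print(hl)   # [[-4.5, 0.0], [-4.5, 0.0]]
print(lh)   # [[0.0, 0.0], [0.0, 0.0]]
print(hh)   # [[0.0, 0.0], [0.0, 0.0]]
```

The only non-zero detail coefficients sit where the edge is, which is the edge detection idea in miniature, and most of the other coefficients are zero, which is what makes compression possible.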
@@AlaphBeth It helps a lot and thank you :) In my thesis i use PyWavelets library of Python and apply wavelet transform in images and then extract features from the approximation (LL). I don't understand how the low and high pass are produced (maybe because my math background is not strong). The idea is to use a mother wavelet, translate it in time and scale it up to compare/find similarities between the wavelet and a region of the image, am i right? How these filters are derived? Are they just kernels/masks for the convolution? Basically matrix multiplication between 2 kernels (low and high pass filters) and image? I am looking forward for your next video :)
Thank you for a brilliant recording like this. Absolutely happy to have watched this knowledgeable video. I have a question. @20min in the video we see how the signal is decomposed to different eeg bands. Am working on - kaggle's confused student eeg brainwave data. The data in the Excel file (for example if I just consider Delta column) has values like - 301963, 73787, 758353, 2012240. For gamma waves it's like - 8293, 2740, 25354, 33932. These values don't match with the frequency bands. Am wondering how are these brain waves values represented in this file. Or in what units are these values. Am unable to interpret them. Sorry for a lengthy question.
Thanks for your feedback. I am not familiar with that specific EEG dataset you mentioned from Kaggle (google BCI competition datasets for proper EEG signals). However, if they have already extracted delta and gamma then what’s left is to extract the features. Remember that in the video I started from a “raw” EEG signal, decomposed it to get the bands of interest, and then said that once you have the signals from these bands you can extract the features you want. As for your statement “these values don’t match with frequency bands”, I am not sure what you are trying to match! Each node in the tree represents a specific frequency band, say for example 0-4Hz, but that does not mean the signal from that node would have values going from 0 to 4. So obviously they have already decomposed the signals for you into the frequency bands, and what’s left is feature extraction, dimensionality reduction and classification. More intuitively, simply imagine that someone gives you a block of land and asks you to build a few houses on it. When you build the houses (nodes of the tree), each will have an address or a house number (frequency range of the specific node) and each house would be occupied by different people (different signals). Hope that helps
I think I have a better analogy. Say someone is fishing all day and they make a note on the hour, every hour, of how many fish they caught during that hour. For example, they caught two fish at noon and three fish at 1:00 p.m. and four fish at 2:00 p.m. and two fish at 5:00 p.m. and no fish at 6:00 p.m.. if somebody drew a graph of fish caught per hour, that would be a Time domain graph. If instead, they made a graph of how many hours where they caught one fish and how many hours where they caught two fish and how many hours where they caught three fish and so on. That would be the frequency domain. A histogram. The bin for two fish would have two counts. A few bins, including the zero bin, would have one count, and any other bins shown would have zero counts.
@@AppliedCryogenics Nice one, indeed there are many possible representations. One question, how would you account for the time-frequency domain? In my example, people on platform don’t know who takes which seat inside the train (from time you cant tell which bins are occupied), and from frequency domain (inside the train/carriage) you can see all bins occupied or not, but you don’t know at what time that bin got occupied as people come inside the carriage. Only those standing near the doors (time frequency domain) can see to some extent when a person is boarding the train and which seat he/she might take, assuming the design of trains used in Australia :).
It depends on the application. So far it’s been implemented in so many applications including image denoising, compression, feature extraction, etc. If you are more interested in feature extraction from images then I would recommend you take a short cut and head to the next video on wavelet scattering transform as the scattering transform is achieving very impressive results on extracting the important features from images regardless of rotation, translation and deformations. Both Kymatio (python) and Matlab toolboxes provide you with lots of examples to show you how.
Thanks for your feedback. This presentation was originally made for some product managers who know nothing at all about signal processing and time/frequency analysis, so in that case we have to keep it as naive and as simple as possible by relating it to something everyone knows and understands. After that the slides were extended for those who have just started their PhDs, so again some of them may not know anything about the field. I am sure you can find some other explanation of time/frequency that meets your own needs on YouTube, among hundreds of other videos 👍
@@AlaphBeth Thank you so much, Doctor, for this fascinating presentation. I also saw the second half. It was also amazing. I wish you continued success.
the beauty of this video is that it explains both the general overall idea, using understandable terms that suit both experts and new researchers in this domain, and the technical aspects. Many academic papers and videos assume you are already an engineer or expert; they jump immediately to advanced topics and use jargon that makes me stop and search multiple times in order to understand. But this one is really smooth and clear, and more importantly it links the concept with purpose (why do I need to do these steps). Thank you very much!! I will reference your fuzzy wavelet paper in my MSc research.
Thanks a lot, appreciate your feedback.
Such an Amazing Explanation❤❤.
I am unable to thank you in words.
This channel is a gem for all Biomedical Signal Processing Researchers and Learners.✨✨
Thanks, highly appreciate your feedback.
Love this. Read so many posts that made the subject of Wavelets feel esoteric and unnecessarily complex.
This is so simple and clear.
Thanks a lot, appreciate your feedback
Such an interesting, easy and a clear explanation. So grateful to have found it on the right time! Thanks!
This is the most well-explained video about wavelet transform and its application in EEG analysis I have seen.
Thank you very much for the great work!
Thank you, appreciate the feedback :)
I have watched many videos about wavelets, tried with articles, even in my native language, but this video is the best and easiest to understand I have found. Thank you
Thank you, appreciate the feedback 🙏
Thanks for the great vid. The frequency content diagram really helped my understanding.
Thank you very much, please be motivated to continue to teach and develop your confidence. You have explained something very complex to an undergraduate and will allow me to perform an experiment for a project I need to do at uni. I am super grateful and wish you well in your career.
Thanks a million, highly appreciate your feedback.
I am speechless....the way you explained is truly awesome....stay blessed
Thank you, appreciate your feedback.
Thank you so much for the detail and clear explanation, i was really confused how eeg signals get decomposed and the features can be extracted using wavelet transform this video gave a wonderful explanation
The best video on this topic, thank you so much for sharing this with us!
This is very helpful... I was having some problems understanding the WT, but after this, it's all clear.
Thanks and keep on.🙂
Thanks for your presentation. You made the concept very easy to understand.
Very Great Presentation...
Thank you very much for clarifying concept so clearly with examples...
Happy to have had a great few minutes with this video ....
Thank you, appreciate the feedback
Excellent presentation as always Rami!
Thanks Evan, appreciated
Hi there! Thanks for the video. Is it possible to provide a reference(book, papers, etc) for the part you talk about how wavelets are decomposed into LPF and HPF and the iterative process of breaking the signal down?
Hi there, thanks for your feedback. Have a look at this one link.springer.com/book/10.1007/978-3-642-56702-5
The video is the result of books, papers and work experience, so you may or may not find everything in the books. However, this specific book is one of the best IMHO.
@@AlaphBeth Thank you very much! This helps a LOT!
Amazing explanation....Could you please explain why do we recursively divide the low pass output ?
Thank you for your feedback. This is the way it was designed originally, as the goal here is to separate the high frequency fluctuations from the rest of the signal. We keep decomposing the low pass side till we reach the frequency range of interest or to the point where we can’t decompose any further (there are known criteria to check that). However, the wavelet packet transform goes further and asks: why decompose only the low pass side? Let’s do that for the high pass side too, thereby providing a better picture of the time-frequency contents of the signal.
dear mr rami, i want to decompose an EEG signal (sampling frequency is 500 Hz) into its frequency bands: delta, alpha, theta, beta, gamma. i will be using the discrete wavelet transform and db4 as a family, what should be the decomposition level to cover all the frequency spectrum ? if taking into consideration the nyquist frequency the levels is 6 ? if not is it 7 ?
hello hassansaad, "I am currently working on a thesis similar to your task. May I see your guidelines on how to understand what DWT, signal decomposition, and db4 are?"
Many thanks , Excellent presentation and an Amazing Explanation
Thanks, appreciate the feedback
Very nice sharing. Thanks for your great work.
Thanks a lot, highly appreciate your feedback
One point that might confuse people: at 28:00 the same value, 128, is used for both the number of channels (128) and the sampling frequency (128 Hz), so new learners might think these are the same thing.
I have one question: what is the difference between these two methods (wavelet transform and wavelet packet transform) and "wavelet convolution", where a dot product between the signal and a wavelet (e.g. a Morlet wavelet) is performed? Is this the same thing or something different?
Thank you for the feedback. Generally, EEG caps used to come in packages with 2^n electrodes, with n = 0, 1, 2, 3, …, so the number of electrodes in many papers in the literature might be close to the sampling frequency. A bit confusing, I understand, but we have to learn that anyway to move forward. I will make sure to select my examples more carefully in the future :)
As for your question, watch the next video (part 2), where I discuss that to some extent. The discrete wavelet transform focuses on decomposing the lower frequencies (left side of the tree), while the wavelet packet transform decomposes both the high and low frequencies (left and right). However, both use scale values that are integer powers of 2 (au.mathworks.com/help/wavelet/gs/continuous-and-discrete-wavelet-transforms.html). The continuous wavelet transform addresses this with a finer selection of scale values, but it ends up doing more computation (more differences are discussed in the link above). Finally, there is the wavelet scattering transform, the latest branch of these models, which I explain in the next video.
Hope that helps.
Hi, Rami. Amazing video, amazing explanation cleared things up for me. Do you mind sharing the references you used? I would like to learn more. Thanks!
Thank you for your feedback. I would recommend this book link.springer.com/book/10.1007/978-3-642-56702-5. However, there are many more out there. For me, it was mostly the papers and the book chapter I read from here and there plus work experience.
Thank you very much!
How are the coefficients associated to time in case I want to relate the frequencies from a specific level to the original signal?
@@dnaviap there are a few ways to work that back. The number of coefficients in a node at level J is roughly the number of samples in the signal divided by 2^J, so each coefficient covers about 2^J samples of the original signal, and from that you can track the time that each coefficient represents. Alternatively, use a sliding-window approach: for a signal of 10 seconds, chop it into 1-second segments and analyze these individually. That way you know that the node of interest and its coefficients belong to that 1-second window. Alternatively, use the CWT, as it generates a time-frequency plot.
Or, just ask chatGPT and it will give you examples.
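To make the "each coefficient covers about 2^J samples" idea concrete, here is a minimal sketch. The function name `coefficient_time_span` and the example numbers (256 Hz, 10 seconds, level 5) are only illustrative assumptions, and the sketch ignores filter length and boundary effects, so treat the spans as approximate.

```python
def coefficient_time_span(index, level, fs):
    """Approximate time span (in seconds) covered by one DWT coefficient.

    Assumption: plain dyadic downsampling, with filter length and
    boundary effects ignored, so a level-J coefficient summarises
    about 2**J samples of the original signal.
    """
    samples_per_coeff = 2 ** level
    start = index * samples_per_coeff / fs
    end = (index + 1) * samples_per_coeff / fs
    return start, end


fs = 256                       # hypothetical sampling frequency in Hz
n_samples = 10 * fs            # a 10-second signal
level = 5

# Roughly N / 2**J coefficients at level J, all together spanning the whole signal.
n_coeffs = n_samples // 2 ** level

# Time window represented by the coefficient with index 8 at level 5.
start, end = coefficient_time_span(index=8, level=level, fs=fs)
print(n_coeffs, start, end)
```

Note that at every level the coefficients together still span the full duration of the signal; deeper levels simply describe it with fewer, coarser time steps.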
@@AlaphBeth according to this, is correct to think that using DWT at each level the entire time domain of the signal is covered by the corresponding level coefficients? I mean the coefficients of a certain level are covering the entire time domain of the signal regardless of the number of coefficients
Assalamualaikum sir,
This video is ma sha Allah wonderful. Thank you very much for the video. Well explained. Please bring more videos like this in future. May Allah bless you.
Thank you, appreciate your feedback.
Thank you so much! Excellent presentation.
Thanks a lot for the presentation. The explanations are really good. :)
Thank you for the feedback, much appreciated.
Can you recommend any books/papers on the wavelet transform and feature extraction, maybe with algorithms showing how to do it, or with deeper information?
I need it for my master thesis
@@123_thenumber5 there are a lot of books on wavelets, check this for example link.springer.com/book/10.1007/978-3-642-56702-5
Also check the thesis of Naoki Saito, I believe it’s the first one here scholar.google.com.vn/citations?user=IWIK32cAAAAJ&hl=vi
great video! Can I use DWT for preprocessing data before forecasting stock prices using LSTM?
Thanks for your feedback. That has been actually done by many researchers, see for example www.hindawi.com/journals/mpe/2019/1340174/ Or this one www.researchgate.net/publication/334519126_LSTM_with_Wavelet_Transform_Based_Data_Preprocessing_for_Stock_Price_Prediction
This video is very informative, and the Q&A cleared a lot of doubts. It would be highly appreciated if you could make a detailed video on the scattering transform. Thanks in advance
Thanks, appreciate the feedback. Already preparing for the next video on scattering :)
very good video and so helpful to me although still not fully understand
Thanks a lot! Great work! It'd be awesome and so appreciated if you made a video and explained the fuzzy entropy and mutual information and understanding them. Thanks again.
Thanks for your feedback, much appreciated. Your suggestion is in my plan too, stay tuned :)
@@AlaphBeth Thanks! I am very much looking forward to it ;)
Thank you for this presentation. If it's not a problem, may I know which application you prepared the presentation through?
Thanks for the feedback, I used PowerPoint for creating the slides and zoom to create the presentation video through recording a presentation.
@@AlaphBeth Thats great! thank you for your quick response.
So what is the time axis after downsampling? Do you just remove the alternate values? If that is the case, why is there localization in time?
Why is there localisation? Because what you are doing here is sliding a wavelet with a certain scale (scale is related to frequency) across your entire signal in the time domain, performing convolution, and looking for large wavelet coefficients (the coefficients that result from convolving your signal with the wavelet). When you find them, you have localised the feature of interest in time (by sliding) and in frequency (by virtue of the selected scale). Both are approximate (the CWT is more precise).
As for downsampling: we said the LPF and HPF are each followed by downsampling. Suppose your signal has frequency content in the range 0-8Hz. The first LPF gets you 0-4Hz, but how? Remember the ideal filter shape (a rectangle over the pass band and zeros elsewhere). You get your 0-4Hz followed by zeros. Think of downsampling as keeping the 0-4Hz and throwing away the zeros. For the HPF, we get 4-8Hz preceded by zeros. So we flip that, downsample to keep the 8-4Hz and throw the zeros away, and then reorder the frequencies. This is what I mentioned in one of the comments below about maintaining the natural order of frequencies.
I really recommend this book books.google.com.au/books/about/Ripples_in_Mathematics.html?id=nMIPBwAAQBAJ&printsec=frontcover&source=kp_read_button&hl=en&newbks=1&newbks_redir=1&redir_esc=y
Hope that helps :)
@@AlaphBeth Thanks so much for the detailed reply. Just one more thing: Understood about the localization but at the same time not really. Suppose we have a time signal with 16 elements. After the decimation, the first 8 elements are low frequency followed by 8 elements from the HPF. So the question is - what is the time axis now? I imagine it would be tau instead of t just like normal convolution so its like tau = 1,2,3 ... 16. But if we were to plot the 8 low frequency followed by 8 high frequencies, it would mean the first 8 tau instant is low frequency i.e. low frequency is always localized at the front part of the signal while high frequency is at the back. Why would that be the case? I don't see how this can be the same as discretized form of continuous wavelet transform.
@@semcify the major concept is similar, namely convolution with some wavelets (fathers or mothers), but the DWT is NOT the same as the CWT. The DWT employs discrete scale values that are integer powers of 2 (and discrete values for the translation parameter too), while the CWT has much finer control over the scale value. The DWT is computationally more efficient than the CWT because it does not provide the same level of detail as the CWT. You can localize transients in your signal, or characterize oscillatory behavior, better with the CWT than with the DWT because of the way the scale values are selected. On the other hand, the DWT can capture important features of many signals in a few coefficients and is also an orthonormal transform (a desired property in many applications).
As for your question about why low frequencies come before high frequencies: the frequency spectrum is usually ordered from 0Hz to Fs/2, so when you start partitioning this range it is natural for the lower frequencies to come before the higher ones. You should probably watch Mallat's videos on YouTube. I didn’t want to add that level of detail to this lecture, as I was introducing a complex concept and aiming mainly to simplify it, but you definitely have a good selection of questions :)
The book I mentioned earlier does discuss these topics with graphical depictions (Google might get you a PDF).
@@AlaphBeth Thank you very much for the help. I will check out more online resources. Great video :)
Thx a lot bro, I just wanna ask how you used high and low pass filters as a band filter with a limited range of frequency (min-max)... (Basima raba aziza )
Thanks for your feedback. It’s the repeated division into 2 parts (lower half with the LPF and upper half with the HPF) that eventually generates the required subbands. For example, if you have a signal sampled at 64Hz, with a valid frequency range of 0-32Hz, then the wavelet transform divides that at the first decomposition level into 0-16Hz and 16-32Hz, then divides 0-16Hz into 0-8Hz and 8-16Hz, and you keep doing this, simply dividing the frequency range into 2 equal bands as you keep decomposing. So you started from 0-32Hz, and now by using the LPF & HPF you have 8-16Hz (a limited min-max range) as one of the bands/subbands. (Raba pshena)
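The halving described above can be written down in a few lines. This is only a sketch of the nominal band edges, assuming ideal (brick-wall) half-band filters; real wavelet filters overlap at the edges. The helper name `dwt_bands` is made up for illustration.

```python
def dwt_bands(fs, levels):
    """Nominal frequency band (Hz) of each DWT node, assuming ideal filters."""
    bands = {}
    high = fs / 2.0  # Nyquist frequency: top of the valid range
    for level in range(1, levels + 1):
        mid = high / 2.0
        bands[f"D{level}"] = (mid, high)  # detail node: upper half (HPF side)
        high = mid                        # lower half (LPF side) is split again
    bands[f"A{levels}"] = (0.0, high)     # final approximation node
    return bands


# The 64 Hz example from the comment above, decomposed to 2 levels.
bands = dwt_bands(fs=64, levels=2)
for name, (lo, hi) in bands.items():
    print(f"{name}: {lo:g}-{hi:g} Hz")
```

Changing `fs` and `levels` shows how deep you need to go before the lowest band is narrow enough for the rhythm you care about.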
Dear mr rami, can i use the discrete wavelet transform to extract and plot the EEG frequency bands ? In other words can i represent the EEG signal in frequency domain using DWT ? Or when it comes to representing the frequency domain i have to use the fourier tranform ? Please i need a practical answer
Hi Hassan
Let’s take this one by one. EEG is a non-stationary signal. If you use the FFT on EEG you may lose some information, and you have to work on short segments of the signal during which the EEG can be assumed stationary. The FFT can, though, show you the power spectrum, with frequency on the x axis and power on the y axis, that is, the contribution of every single frequency. The DWT, on the other hand, can deal with non-stationary signals like EEG. It chops the frequency spectrum into bands and keeps dividing these bands into smaller portions, which allows us to zoom in on specific events of interest. In the video I already showed how to extract features related to the EEG bands, so if you are asking whether you can plot the EEG bands, I would say watch the video again, as you obviously missed this part. As for your second question, the DWT is not a frequency analysis tool like the FFT; it is a time-frequency (or time-scale) analysis tool, so you can get information localised in both time and frequency. As for which one to use, I would say use whatever works best for your example.
Hi Rami , Would you mind if I asked a few questions regarding application of Wavelet Filter Bank to Acceleration data? How Can I contact you ?
Hi Adam, thanks for reaching out. Feel free to add me on LinkedIn or write them here.
@@AlaphBeth Thanks Rami, added you just now on Linkedin. Hopefully catch you soon. A
Rami sir.. I became ur fan... really superb video with great explaination.. I m having one question.. at last u have mentioned about dimensionality reduction using PCA, LDA etc.. I need to ask u that suppose we have done feature extraction first the we apply PCA for dimensionality reduction then we apply any feature selection and last classification, is this flow right?? or any suggestions from ur side
Firstly, thanks a lot for your feedback, much appreciated. About your question: if you have many channels, you may want to do some sort of tree optimisation first for each tree (this is feature selection per tree), then collect the features from all the resulting trees and pass them through PCA. Another method is to simply use any variant of LDA feature projection, in which case you don't need to worry about optimising individual trees (though there is nothing wrong with doing that too): just take all the features from all the trees and send them to an LDA-based dimensionality reduction. This is because LDA considers the class labels while PCA does not, so PCA might need some help from individual tree optimisation to guide it, but LDA does not. Hence the typical flow is whatever works best for you; there is no right and wrong. Try as many scenarios as you can and pick the one that works best. I presume, though, that the LDA approach might work best.
Sir, can I perform DWT on an .edf format EEG signal?
EDF is a standard file format designed for the exchange and storage of medical time series; it has nothing to do with the ability of the DWT to process the signals. So the short answer is YES, but you first need an EDF reader to bring the signals into Python or Matlab so you can apply any processing step to them. Simply think of it as the type of packaging in which your post arrives: it does not matter whether it is an Amazon box or a normal mail box, you have to open the box first to get at the contents before you decide what to do with them.
@@AlaphBeth Sir, Thanks for your response. Sir I have used mne python to read the data but I am stuck with the data value in pywt.wavedec function.
@@niroshadas4649 look here for instructions: pywavelets.readthedocs.io/en/latest/ref/dwt-discrete-wavelet-transform.html, where cA (the approximation coefficients) is the low-pass filter side and cD (the detail coefficients) is the high-pass filter side I mentioned in the video. You just need to practice with examples and relate them to this video.
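For anyone stuck at the same point, here is a minimal sketch of passing multi-channel data to `pywt.wavedec`. The random array is only a stand-in for real EEG; the assumption is that your reader (for example MNE's `raw.get_data()`) hands you an array shaped (n_channels, n_times). The sampling rate, wavelet and level are illustrative choices, not recommendations.

```python
import numpy as np
import pywt

fs = 256                                   # hypothetical sampling frequency in Hz
rng = np.random.default_rng(0)

# Stand-in for EEG loaded from an EDF file: 5 channels, 4 seconds each,
# shaped (n_channels, n_times).
data = rng.standard_normal((5, 4 * fs))

# Decompose every channel along the time axis (the last axis).
coeffs = pywt.wavedec(data, "db4", level=4, axis=-1)

# wavedec returns [cA_n, cD_n, ..., cD_1]: one approximation (low-pass side)
# followed by the details (high-pass side) from the deepest level to the first.
cA4, cD4, cD3, cD2, cD1 = coeffs
for name, c in zip(["cA4", "cD4", "cD3", "cD2", "cD1"], coeffs):
    print(name, c.shape)
```

Each array keeps the channel axis, so `cD1[0]` holds the level-1 detail coefficients of the first channel.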
Amazing! Just amazing.
If my device has 5 channels and a sampling frequency of 256 Hz, how many features do I have to extract?
The answer to your question depends on many parameters: which transform are you going to use, DWT or WPT? How many decomposition levels are you using? How many features are you planning to extract from each node? Are you going to optimise the trees or just use the whole lot? Once you know the answers (which pretty much depend on your problem), you can figure out how many features you end up with.
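Once those choices are fixed, the count is simple arithmetic. The sketch below assumes no tree optimisation, that a DWT to level L gives L + 1 nodes per channel (L detail nodes plus the final approximation), and that for the WPT only the 2^L nodes at the deepest level are used. The function name `feature_count` and the example numbers are made up for illustration.

```python
def feature_count(n_channels, level, features_per_node, transform="dwt"):
    """Total feature count, assuming every node of every channel is kept."""
    if transform == "dwt":
        nodes = level + 1        # cD1..cDL plus the final cA
    elif transform == "wpt":
        nodes = 2 ** level       # terminal nodes of a full packet tree
    else:
        raise ValueError("transform must be 'dwt' or 'wpt'")
    return n_channels * nodes * features_per_node


# 5 channels, 4 levels, 2 features per node (e.g. energy and variance).
print(feature_count(5, 4, 2, "dwt"))
print(feature_count(5, 4, 2, "wpt"))
```

Any tree optimisation or dimensionality reduction then shrinks these totals.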
Excellent video
Thank you, appreciate the feedback
thank you so much for this🤩
Amazing and Great Work
Thank you, appreciate the feedback
Have you published any papers on it?
Can u please suggest the python code for reference to extract features from the DEAP dataset?
There are lots of Python implementations using PyWavelets, but I can't really recommend anything unless I have tried it myself, and in this area I tend to write my own code. Just use PyWavelets as a starting point: decompose to one or a few levels with the DWT, get the cA (low pass) and cD (high pass) coefficients, extract some basic features such as energy and variance from the cA and each cD, and take it from there to make it more complex. You can also search GitHub for ready-made libraries.
Good presentation. I have a doubt.
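As a starting point along those lines, here is a small sketch that computes energy and variance per node. The function name `dwt_features`, the db4 wavelet, the 3 levels and the synthetic 6 Hz test tone are all assumptions for illustration; this is not code for the DEAP dataset specifically.

```python
import numpy as np
import pywt


def dwt_features(signal, wavelet="db4", level=3):
    """Energy and variance of the cA and each cD from a multilevel DWT."""
    coeffs = pywt.wavedec(signal, wavelet, level=level)
    # wavedec order: [cA_level, cD_level, ..., cD_1]
    names = [f"cA{level}"] + [f"cD{j}" for j in range(level, 0, -1)]
    features = {}
    for name, c in zip(names, coeffs):
        features[f"{name}_energy"] = float(np.sum(c ** 2))
        features[f"{name}_variance"] = float(np.var(c))
    return features


# Synthetic stand-in for one EEG channel: a 6 Hz tone sampled at 128 Hz.
t = np.arange(512) / 128.0
signal = np.sin(2 * np.pi * 6 * t)

features = dwt_features(signal)
for key, value in features.items():
    print(key, round(value, 3))
```

Running this per channel and concatenating the dictionaries gives one feature vector per trial, which can then go to dimensionality reduction and a classifier.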
Thank you for your feedback. Please post any question here and I will do my best to reply within time.
@@AlaphBeth I applied 3 iterations of the forward and inverse wavelet transform and reconstructed the image. How can we find the approximate number of operations carried out, that is, additions and multiplications? And how does the downsampling operator reduce the number of operations?
@@AlaphBeth it would be helpful if you could give your email ID
@@Dreams365as what you posted here is not a doubt about anything in the video but an independent question about computing what is known as the computational complexity (big O notation) of applying the DWT forward and backward across several decomposition levels.
Generally, big O notation summarises how the number of additions, multiplications and subtractions grows with the input size, so you don’t have to count individual operations; you just report the order, e.g. O(N). This is outside the scope of this video, and you can easily find other resources on it.
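For a rough feel of where O(N) comes from, here is a back-of-the-envelope sketch for the 1D forward transform. It assumes direct convolution with a filter of length K, counts multiplications only, and ignores boundary handling; the function name and the example sizes are hypothetical. "Without downsampling" here means computing every filter output and discarding half afterwards.

```python
def dwt_multiplications(n, filter_len, levels, downsample=True):
    """Rough multiplication count for a 1D forward DWT (two filters per level)."""
    total = 0
    length = n
    for _ in range(levels):
        # With downsampling folded in, only every second output is computed.
        outputs = length // 2 if downsample else length
        total += 2 * filter_len * outputs   # low-pass and high-pass filter
        length //= 2                        # next level works on the half-length approximation
    return total


n, k, levels = 1024, 8, 3                   # e.g. db4 has 8 filter taps
with_ds = dwt_multiplications(n, k, levels)
without_ds = dwt_multiplications(n, k, levels, downsample=False)
print(with_ds, without_ds, 2 * k * n)
```

Because each level works on half as many samples as the previous one, the total stays below 2·K·N however many levels you add, which is the O(N) behaviour; skipping the discarded outputs halves the work again. A 2D image transform applies the same idea along rows and columns.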
Hi, great video. But I'm a little confused about the decomposition tree. If each LPF and HPF downsamples by 2, then how can D5 = [4 8]Hz contain 8 Hz? Its sampling rate has been reduced to 8Hz (256/2^5), so due to Nyquist it can only represent 0-4Hz, yet it somehow captures the 4-8 Hz bandwidth? Any help clearing up this confusion? Thank you
Hi, thanks for the feedback. Good question :)
I will have to simplify this as I can’t draw here. Due to the downsampling shown in slide 9, all the high-pass parts are mirrored. So the 0-8Hz range passes through 1) the LPF to get 0-4Hz, which gets downsampled without mirroring, and 2) the HPF to get 4-8Hz, which gets mirrored (the 4-8Hz is flipped into 8-4Hz and shifted to sit at 0-4Hz). So the bandwidth is 4Hz in both cases; the point is which 4Hz it represents.
Also note that, as a result of this mirroring, you might wonder what happens to the subsequent HPF parts. In that case you have to note that the nodes can be ordered either according to the filter bank ordering OR according to the natural frequency ordering. The second ordering is what most toolboxes will give you, to keep things more intuitive for users.
Hope that helps.
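The mirroring can be checked numerically. This sketch skips the HPF itself and simply uses a pure 7 Hz tone, which already sits inside the 4-8Hz band of a signal sampled at 16 Hz, and then keeps every second sample. The helper `dominant_frequency` is a plain DFT written out for illustration, and all the numbers are assumptions chosen to match the 0-8Hz example above.

```python
import cmath
import math


def dominant_frequency(x, fs):
    """Frequency (Hz) of the largest DFT bin between 0 and fs/2."""
    n = len(x)
    mags = []
    for k in range(n // 2 + 1):
        s = sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        mags.append(abs(s))
    return mags.index(max(mags)) * fs / n


fs = 16                       # valid range 0-8 Hz
n = 64
tone = [math.sin(2 * math.pi * 7 * t / fs) for t in range(n)]  # 7 Hz, inside 4-8 Hz

decimated = tone[::2]         # downsample by 2: new sampling rate is 8 Hz

f_before = dominant_frequency(tone, fs)
f_after = dominant_frequency(decimated, fs / 2)
print(f_before, f_after)      # the 7 Hz tone shows up at 8 - 7 = 1 Hz
```

So the node still only spans 4 Hz of bandwidth after downsampling, but that bandwidth stands for the flipped 4-8Hz content, which is why the ordering of the nodes matters.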
@@AlaphBeth Ah thanks, I think this explains strange results I was seeing in plotting 0-4 and 4-8 spectrums in Matlab
If you want to understand how wavelets work, I would highly recommend a book titled Ripples in Mathematics: The Discrete Wavelet Transform. It’s accompanied by lots of Matlab examples and code too.
@@AlaphBeth thank you!
nice explanation sir, thank you so much
Thanks for the feedback, much appreciated.
Great explanations, thanks!
Thank you, appreciate the feedback.
Great work bro
Thank you for the feedback, much appreciated.
I think the metaphor of the train platform vs. seat choice is interesting - it has some confusing aspects though since the frequencies make no choices on where to sit :D There is a spatio-temporal uncertainty in spectral/temporal analysis, but it is still a deterministic affair without any random or choice elements :)
But there is something to it about the signal passing by, the "window" the train represents and people getting in and out of the train and so on :D Still wrapping my head around it
Thank you for your feedback, much appreciated. I tried to draw an intuitive example that a new starter in the field can relate to and couldn’t find something easier from the real world rather than the train example :)
@@AlaphBeth Your video is impressively detailed and the effort you've put in is clear. Thank you! The train metaphor was a decent attempt but it was a bit confusing. Have you thought about using audio/music as an example instead? Consider a song, where each instrument contributes distinct frequency components unique to its timbre, which can be represented as Fourier constituents in a signal. Instruments don't always play throughout the entire song - they enter and exit. Similarly, our signal's frequency content varies temporally. This variation can be visualised in a spectrogram, correlating to temporal analysis.
@@dreamdrifter Good one 👍
For me the idea of train/people seemed more appealing at the time as you get to see people sitting in the different seats but you can tell when they went inside the train (if you are sitting already), and from outside the train you don’t know what seats they will take without being inside, which paved the way for time frequency view (standing near the door).
I guess with some nice presentation the idea of song would look great in this topic.
@@AlaphBeth Indeed. However, it is not clear how the distribution of the people standing on the platform has any temporal element - I perceive that as a spatial distribution. And to be difficult, if I am sitting on the train I can turn my head towards the door and see who is coming in and out. I understand the analogy assumes otherwise but this fact is likely to cause some cognitive dissonance (at least, it did for me)
I don't understand how Fourier or Wavelet transform works in images (2D). I understand how the sinusoids/daughter wavelets analyze a signal (1D) but how exactly they can analyze an image? I'm searching for a brief explanation for so long. Looking forward for your next video:) Hopefully explaining this ^^
Sure, will talk about that in the next video.
Basically, consider the coefficients generated by convolving the signal or image with the wavelet family filters. In many cases, people report that they can throw away many of these coefficients and keep only a tiny fraction of them to represent signals/images; this is the image compression application (storing a small number of coefficients that represent the image is better in terms of memory than storing the original image, as long as the quality of the reconstructed image is still good). In image processing, as one example, you can look for coefficients that are large within a neighbourhood of small coefficients to detect edges. In image classification, if you shift the image or apply some odd morphological change and still get nearly the same coefficients after all of these steps, then you have a robust representation of the images, which makes learning them much easier for any classification model. I quote from the Mathworks wavelet scattering page: “For classification problems, it is often useful to map the data into some alternative representation which discards irrelevant information while retaining the discriminative properties of each class.”
Hope this helps till the next video :)
@@AlaphBeth It helps a lot and thank you :) In my thesis i use PyWavelets library of Python and apply wavelet transform in images and then extract features from the approximation (LL). I don't understand how the low and high pass are produced (maybe because my math background is not strong). The idea is to use a mother wavelet, translate it in time and scale it up to compare/find similarities between the wavelet and a region of the image, am i right? How these filters are derived? Are they just kernels/masks for the convolution? Basically matrix multiplication between 2 kernels (low and high pass filters) and image? I am looking forward for your next video :)
Thank you for a brilliant recording like this. Absolutely happy to have watched this knowledgeable video.
I have a question. @20min in the video we see how the signal is decomposed to different eeg bands. Am working on - kaggle's confused student eeg brainwave data. The data in the Excel file (for example if I just consider Delta column) has values like - 301963, 73787, 758353, 2012240.
For gamma waves it's like - 8293, 2740, 25354, 33932. These values don't match with the frequency bands. Am wondering how are these brain waves values represented in this file. Or in what units are these values. Am unable to interpret them.
Sorry for a lengthy question.
Thanks for your feedback.
I am not familiar with the specific Kaggle EEG dataset you mentioned (google the BCI competition datasets for proper EEG signals). However, if they have already extracted delta and gamma, then what’s left is to extract the features. Remember that in the video I started from a “raw” EEG signal, decomposed it to get the bands of interest, and then said that once you have the signals from these bands you can extract the features you want.
As for your statement that “these values don’t match with frequency bands”, I am not sure what you are trying to match! Each node in the tree represents a specific frequency band, say 0-4Hz, but that does not mean the signal from that node has values going from 0 to 4. So evidently they have already decomposed the signals into the frequency bands for you, and what’s left is feature extraction, dimensionality reduction and classification.
More intuitively, simply imagine that someone gives you a block of land and asks you to build a few houses on it. When you build the houses (the nodes of the tree), each will have an address or house number (the frequency range of that node), and each house will be occupied by different people (different signals).
Hope that helps
Very well explained sir. (y)
keep up the good work
Thanks, appreciate the feedback
Thanks can I have the slide?
great video!!
Thank you, appreciate the feedback.
I think I have a better analogy. Say someone is fishing all day and they make a note on the hour, every hour, of how many fish they caught during that hour. For example, they caught two fish at noon and three fish at 1:00 p.m. and four fish at 2:00 p.m. and two fish at 5:00 p.m. and no fish at 6:00 p.m.. if somebody drew a graph of fish caught per hour, that would be a Time domain graph. If instead, they made a graph of how many hours where they caught one fish and how many hours where they caught two fish and how many hours where they caught three fish and so on. That would be the frequency domain. A histogram. The bin for two fish would have two counts. A few bins, including the zero bin, would have one count, and any other bins shown would have zero counts.
@@AppliedCryogenics Nice one, indeed there are many possible representations. One question, how would you account for the time-frequency domain?
In my example, people on the platform don’t know who takes which seat inside the train (from the time domain you can’t tell which bins are occupied), and from the frequency domain (inside the train/carriage) you can see whether each bin is occupied or not, but you don’t know at what time it became occupied as people came into the carriage. Only those standing near the doors (the time-frequency domain) can see, to some extent, when a person is boarding the train and which seat he/she might take, assuming the design of trains used in Australia :).
Nice tutorial, thank you
Thanks Ali, appreciated
How to use for images?
It depends on the application. So far it has been used in many applications, including image denoising, compression, feature extraction, etc. If you are more interested in feature extraction from images, then I would recommend you take a shortcut and head to the next video on the wavelet scattering transform, as the scattering transform achieves very impressive results in extracting the important features from images regardless of rotation, translation and deformation. Both the Kymatio (Python) and Matlab toolboxes provide lots of examples to show you how.
Thank you so much
thanks a lot
Terrible explanation of the concept of time/frequency domains, but good insight into wavelet transform.
Thanks for your feedback. This presentation was originally made for some product managers who knew nothing at all about signal processing and time/frequency analysis, so in that case we had to keep it as naive and as simple as possible by relating it to something everyone knows and understands. After that, the slides were extended for people who had just started their PhDs, some of whom, again, may not know anything about the field.
I am sure you can find another explanation of time/frequency that meets your own needs among the hundreds of other videos on YouTube 👍
Thank you so much! Excellent presentation.
Thanks, highly appreciate the feedback. Part2 was also released which discusses the wavelet scattering transform, if you need more details about that.
@@AlaphBeth Thank you so much, Doctor, for this fascinating presentation. I also saw the second half. It was also amazing. I wish you continued success.
Thank you so much