Thank you, David. This is the best explanation of dendrograms that I have found. I am so grateful to you for not overloading this video with math concepts. Using letters to represent the data points made it even easier to understand. God bless you, sir.
Thank you so much for taking the time to leave this comment, and for your kind words. This type of feedback gives me huge motivation to make further videos.
Thank you, sensei. This is the clearest explanation of hierarchical clustering I've come across. I'm interested in learning more about how to find the optimum number of clusters for a dataset. Thank you.
Thanks for your positive feedback. My next video will discuss the optimal number of clusters.
Thank you for helping me understand dendrograms!
Happy to hear the video helped :)
Thank you so much, this is really easy to understand.
Glad it was helpful!
Thank you! This helped a lot
Thank you for the wonderful video! I had a very vague understanding of this concept before watching it. However, after going through the video, everything became crystal clear, and I experienced a profound moment of enlightenment. Your exceptional teaching skills and ability to break down complex ideas into understandable components have truly been an eye-opener for me. I am deeply grateful for your efforts in creating such an informative and insightful resource.
Thank you so much!
Great video!
Thank you very much, great video.
I'm glad you liked it. I appreciate the feedback. Thanks!
Thanks for the great video! It would be much appreciated if you could discuss how to select the optimum number of clusters in a future video. 🙂
I appreciate your feedback. I'll make a note to make a video about identifying the optimal number of clusters - thanks for the suggestion.
Thank you so much for the explanation! But I wanted to ask you something: what should I do when I have, for instance, a and b close, b and c close, but a and c distant? Should I still put the three of them together? Or should I treat them separately, and in that case, which group should I aggregate first?
That's an interesting question. We first aggregate the points that are closest - so that would be a and b. You then say a and c are distant. Relative to what? We really need a point d to decide that, I think. Then it's a question of whether c is closer to a&b or closer to d.
But there are a couple of further details that we need to think about, and that weren't discussed in this video (I hope I will address these in some new year videos!). First, when we ask about the distance of c from a&b, what do we mean? We have to define distance more carefully. In the video I used the average location (think of a centre of mass). So we are not comparing the distance of c to a, but the distance of c to the midpoint of a and b. There are alternative ways of measuring the distance to a cluster (often called linkage criteria) - we could take the closest point in a cluster, in which case we would look at the distance of c to b; or we could take the furthest point in a cluster, in which case we would look at the distance of c to a. Different choices allow us to describe different shapes of clusters (think of spherical clusters versus elongated string-like clusters). In all these cases we would compare these distances against the distance from c to d.
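To make the three options above concrete, here is a minimal Python sketch. The coordinates for a, b, c and d are invented for illustration (they are not from the video), and the rule names "centroid", "closest" and "furthest" are just labels I have chosen for the average-location, closest-point and furthest-point options.

```python
import math

# Hypothetical coordinates: a-b and b-c are close, a-c is more distant.
points = {"a": (0.0, 0.0), "b": (1.0, 0.0), "c": (2.5, 0.0), "d": (4.8, 0.0)}

def distance_to_cluster(name, cluster, rule):
    """Distance from one point to a cluster under the given rule."""
    target = points[name]
    members = [points[m] for m in cluster]
    if rule == "centroid":
        # Average location of the cluster members (the "centre of mass").
        centre = tuple(sum(axis) / len(members) for axis in zip(*members))
        return math.dist(target, centre)
    distances = [math.dist(target, m) for m in members]
    if rule == "closest":
        return min(distances)
    if rule == "furthest":
        return max(distances)
    raise ValueError(f"unknown rule: {rule}")

def next_partner_for_c(rule):
    """After a and b have merged, decide whether c joins a&b or d."""
    to_ab = distance_to_cluster("c", ["a", "b"], rule)
    to_d = distance_to_cluster("c", ["d"], rule)
    return "a&b" if to_ab < to_d else "d"

for rule in ("centroid", "closest", "furthest"):
    print(rule, "->", next_partner_for_c(rule))
```

With these made-up coordinates, c joins a&b under the centroid and closest-point rules, but pairs with d under the furthest-point rule, so the choice of rule can change the shape of the dendrogram.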
And finally, perhaps we want to stop the agglomeration process and keep d and/or c separate from a&b because we think that we have reached the optimal number of clusters. To do that we need to define what we mean by "optimal" - but a rule of thumb is to look for the point where we transition from small distances to large distances - and this is something I will discuss in my next video (probably early January because I am decorating the room that I use for recordings!).
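As a rough illustration of that rule of thumb, the sketch below agglomerates a handful of one-dimensional values by centroid distance, records the distance at each merge, and stops just before the biggest jump. The data values and the function names are assumptions made up for this example, and "largest jump" is only one simple way to read "small distances to large distances".

```python
from itertools import combinations

def centroid(cluster):
    """Average location of a cluster of 1-D values."""
    return sum(cluster) / len(cluster)

def merge_heights(values):
    """Agglomerate 1-D values by centroid distance; return each merge distance."""
    clusters = [[v] for v in values]
    heights = []
    while len(clusters) > 1:
        # Find the pair of clusters whose centroids are closest.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: abs(centroid(clusters[ij[0]]) - centroid(clusters[ij[1]])),
        )
        heights.append(abs(centroid(clusters[i]) - centroid(clusters[j])))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return heights

def suggested_cluster_count(heights):
    """Stop just before the largest jump between consecutive merge distances."""
    n_points = len(heights) + 1
    gaps = [later - earlier for earlier, later in zip(heights, heights[1:])]
    last_small_merge = gaps.index(max(gaps))  # index of the last merge before the jump
    return n_points - (last_small_merge + 1)

# Hypothetical data: three tight values, two tight values, one isolated value.
data = [1.0, 1.2, 1.5, 8.0, 8.3, 15.0]
heights = merge_heights(data)
print(heights)
print(suggested_cluster_count(heights))
```

For these invented values the first three merges happen at small distances and the fourth at a much larger one, so the sketch suggests stopping at three clusters. Real data rarely shows such a clean jump, so treat this as a starting point rather than a definition of "optimal".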
Thanks a bunch for this simplified and clear explanation. It would be a pleasure if you could share with us how we could make dendrograms from pulsed-field gel electrophoresis, thank you :)
Funny you should ask that ... the following paper is next on my reading list:
"Pulsed-field gel electrophoresis (PFGE) analysis of Listeria monocytogenes isolates from different sources and geographical origins and representative of the twelve serovars"
www.academia.edu/111312006/Pulsed_field_gel_electrophoresis_PFGE_analysis_of_Listeria_monocytogenes_isolates_from_different_sources_and_geographical_origins_and_representative_of_the_twelve_serovars
I've looked at this in a bit more detail, and to be honest, handling this type of data is beyond my area of expertise. I did find some general information that I found helpful:
A guide to interpreting electrophoresis gels:
bento.bio/resources/bento-lab-advice/interpreting-electrophoresis-gels-with-bento-lab/#:~:text=The%20smallest%20bands%20are%20at,is%20up%20the%20ladder%20scale.
(pulsed-field addresses larger DNA molecules, but I presume the principles of interpretation remain the same).
Any analytical technique requires digital data. I found this:
Data processing of pulsed-field gel electrophoresis images
www.ncbi.nlm.nih.gov/pmc/articles/PMC6940661/
The data processing would seem to me to be the critical step, which will ultimately result in the generation of tabulated data that would be amenable to cluster analysis. The columns of this tabulation would correspond to metrics that describe the banding, with each sample being represented by a row.
I would guess that this data processing is integrated into most laboratory systems that produce pulsed-field electrophoresis gels?
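To show what such a tabulation might look like, here is a small sketch with one row per sample and one column per band position. Everything in it is an assumption for illustration: the sample names, the band columns and the 0/1 values are invented, and the Dice coefficient is simply one similarity measure often used for band presence/absence data, not something taken from the references above.

```python
from itertools import combinations

# Hypothetical table: 1 = band present at that position, 0 = band absent.
band_columns = ["band_1", "band_2", "band_3", "band_4", "band_5"]
band_table = {
    "sample_A": [1, 1, 0, 1, 0],
    "sample_B": [1, 1, 0, 1, 1],
    "sample_C": [0, 1, 1, 0, 1],
}

def dice_distance(row_x, row_y):
    """1 minus the Dice similarity, 2 * shared bands / total bands in both rows."""
    shared = sum(x and y for x, y in zip(row_x, row_y))
    total = sum(row_x) + sum(row_y)
    return 1.0 - (2.0 * shared / total) if total else 0.0

# Pairwise distances between samples; the smallest one is the first merge.
pair_distances = {
    pair: dice_distance(band_table[pair[0]], band_table[pair[1]])
    for pair in combinations(band_table, 2)
}
closest_pair = min(pair_distances, key=pair_distances.get)
print(pair_distances)
print("first merge:", closest_pair)
```

Once the pairwise distances exist, the agglomeration proceeds exactly as for any other data: merge the closest pair first, then recompute distances to the new cluster.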
Thanks
You're welcome :)