If all computer science lectures were presented the way this gentleman presented this video, all students would excel without a doubt. Great job on this video.
Indeed!!
It is probably because they break down the concept for non-technical users. He doesn't explain like this during lectures; I know this because he is actually my professor.
In case anyone is a bit lost as to what's going on with the dot product. Basically, it's a way of comparing two vectors for similarity. So if you've got two identical vectors of length 1 - so both pointing the same way - the dot is 1. Meaning identical.
Turn one round 180 degrees and the dot product gives -1. If it's perpendicular you get zero. It's a lot like the cosine if you remember your trig.
There's a bit more to it if they're not unit vectors with a length of one: it basically gives you the ratio if one were projected onto the other. Imagine a clock face with the long hand at 12. The dot product gives you how far the short hand reaches up the long hand, as a ratio. So if you shine a light from the side and the shadow of one hand goes half way up the other, then the dot product is 0.5.
Hope that helps.
It's so much like the cosine that the value of the dot product is the cosine of the angle between the two vectors multiplied by their lengths. The fact you can take two unit vectors and work out the angle between them just by multiplying matching components, summing those values, and taking the inverse cosine of the result is a bit weird, but it falls out of the cosine rule.
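If it helps, here's the same relationship checked numerically (NumPy, my own throwaway example):
```python
import numpy as np

a = np.array([3.0, 0.0])   # length 3, pointing along x
b = np.array([1.0, 1.0])   # length sqrt(2), at 45 degrees to a

dot = np.dot(a, b)                                          # 3.0
cos_angle = dot / (np.linalg.norm(a) * np.linalg.norm(b))   # cos(45°) ≈ 0.707
angle = np.degrees(np.arccos(cos_angle))                    # ≈ 45.0

# The "shadow" picture: the projection of b onto a has length |b|·cos(angle),
# which is the same as dot / |a|.
projection_length = dot / np.linalg.norm(a)                 # 1.0
print(dot, cos_angle, angle, projection_length)
```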
I like this. I'd watch more of this guy explaining graph kernels and beyond!
Me too! But I wish he would do them nude!!
Nice presentation. More content on graph optimizations
can you say "please", please? 🙂️
Finally something I can understand!!
I hate other explanations that either ask you to ignore the math or immediately dive into the code. As a beginner in computer science, I have no idea how different concepts get connected. This guy bridges that missing link in the knowledge for me!🙏🙏😆
Thank you so much!! Could he also explain the kernel trick?
This guy just oozes passion for computer science!
Really interesting. For anyone interested in this direction, I recommend looking into graph neural networks and, specifically for molecules, topological neural networks!
Nice to see David on here!
You can use graph convolutional networks to classify certain kinds of abuse by the users of your product. It's really flexible when you have sparse bits of information about the relationships between people and products. Graphs are absolutely everywhere.
The kernel trick is simply a matrix multiplication. You have an embedding set A of shape (m, n), and you get your kernel (Gram matrix) by computing A@A.T (A matmul A transpose), which will be of shape (m, m) and tells you how similar each vector is to every other vector in the batch. If you flip it, A.T@A, you get a matrix of shape (n, n); this is used to compute the covariance matrix, which tells you how two features vary with respect to each other.
All this leads into the SVD, where the U matrix captures the eigenvectors of the AA^T matrix (observations) and the V matrix captures the eigenvectors of the A^TA matrix (features).
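For anyone who wants to see those relationships concretely, a small NumPy sketch (shapes and names are my own, and this is the plain linear kernel rather than the kernel trick in full generality):
```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 3))   # m = 5 observations (embeddings), n = 3 features

gram = A @ A.T      # (m, m): inner product of every observation with every other
scatter = A.T @ A   # (n, n): feature-by-feature inner products
                    # (the covariance matrix, up to centring and a 1/(m-1) factor)

U, s, Vt = np.linalg.svd(A)

# Columns of U are eigenvectors of A @ A.T, and rows of Vt (columns of V) are
# eigenvectors of A.T @ A, both with eigenvalues s**2.
print(np.allclose(gram @ U[:, 0], s[0] ** 2 * U[:, 0]))     # True
print(np.allclose(scatter @ Vt[0], s[0] ** 2 * Vt[0]))      # True
```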
I'd think the similarity between two objects is the % of overlapping arrangements. Identical objects would have similarity 1, while objects that share a large pattern of elements would have about 0.5 similarity. The problem is that the number of arrangements to go through becomes astronomical right away.
The "plots", distinguished from graphs, which Sean raises at 2:30, are those plots as in a Cartesian coordinate system, like a grid with x and y axes? I assume these algorithms don't apply to plots b/c coordinates are positional, ie, the distance, angle, orientation or other scalar properties of any given coordinate is fixed to the plane itself, and not just relative to other nodes in a graph. Is that more or less why?
So, it seems Dr David is my favorite now :)
I think we need more videos on Graph theory.
Graphs ❤
@08:35, “We’re not going to go into the details of the kernel method…”.
Do you have a recommendation for someone who wants to go into the details? A reference maybe?
Search for the course "Machine learning with kernel methods, Spring 2021"; thanks to the pandemic, they put it online.
Sean's really into it this time
So text language models use a coordinate-mapping system to find which words are most likely to be near each other... it would work with the graph triangle demos easily.
Is it not feasible to use an encoder-decoder model to build a low dimension embedding of an adjacency matrix, and then perform cosine similarity?
google "Kipf 2016 Variation Graph Autoencoders". the short answer is that graph convolutions have to be permutation invariant which means using a typical CNN or dense layers on the adjacency matrix doesn't work
wouldn't the indexing of vertices affect the result?
What happens when you use this technique with a distance of 3 and you start having to consider loops?
@15:46, is this correct? A value of 2 for the third entry for WL1(G2) seems to contradict everything.
Yes, it's correct. The two red vertices in the triangle at the top of G2 each have one red neighbour and one blue. (Well, that specific entry is correct -- I didn't check all of them.)
@@beeble2003 got it. I was looking for neighboring shapes as opposed to individual neighboring vertices. You're right.
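For anyone following along at home, here's a minimal sketch of one round of that Weisfeiler-Leman-style relabelling on a made-up graph (my own toy example, not the G1/G2 from the video): each vertex's new label is its own colour plus the multiset of its neighbours' colours, and the graph's feature vector is just the histogram of those labels.
```python
from collections import Counter

# A toy coloured graph: vertex -> colour, plus an edge list.
colour = {0: "red", 1: "red", 2: "blue", 3: "blue"}
edges = [(0, 1), (0, 2), (1, 2), (2, 3)]

neighbours = {v: set() for v in colour}
for u, v in edges:
    neighbours[u].add(v)
    neighbours[v].add(u)

# One refinement round: new label = (own colour, sorted multiset of neighbour colours).
new_label = {
    v: (colour[v], tuple(sorted(colour[u] for u in neighbours[v])))
    for v in colour
}

# The feature vector is the count of each refined label, e.g.
# ('red', ('blue', 'red')) counts red vertices with one blue and one red neighbour.
print(Counter(new_label.values()))
```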
A general computerphile comment: work is slowly being done to build Babbage's analytical engine. Maybe you can feature the project and give it some momentum!
Instead of three red and three blue nodes, what if we had three blue and one brown? 😁
13:52 It's Leman, not Lehman. Andrei Leman was Russian, not German. But probably half the papers in the literature make this mistake! His co-author, Boris Weisfeiler, at some point emigrated to the US and disappeared in Chile in mysterious circumstances during the Pinochet dictatorship.
So what did AlphaFold do differently?
Why write on paper instead of using PPT-style slides?
Shouldn't the similarity be the difference between blue and red for each vector? I.e., the vectors with similar differences are the same?
The problem seems to be not with the SVM but with how to create a vector that meaningfully represents the graphs.
In the first example, only counting vertex colours reduces a graph to blue marbles and red marbles in a bag. That is a very poor representation, as it loses all the edges and connections.
👍🏼
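To make that concrete, a throwaway sketch (my own toy example): two structurally different graphs get the identical colour-count vector, because the edges never enter the feature at all.
```python
from collections import Counter

# Two clearly different graphs over the same four coloured vertices:
colours = ["red", "red", "blue", "blue"]
graph1_edges = [(0, 1), (1, 2), (2, 3)]   # a path
graph2_edges = [(0, 2), (1, 3)]           # two disjoint edges

def colour_counts(colours, edges):
    # The naive "marbles in a bag" feature: count vertices per colour, ignore edges.
    return Counter(colours)

print(colour_counts(colours, graph1_edges))   # Counter({'red': 2, 'blue': 2})
print(colour_counts(colours, graph2_edges))   # identical: all structure is lost
```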
@Computerphile : David Kohan Marzagão
6:13 Are you telling us that we cannot establish a hash function that gives a unique number for each possible graph?
This is weird, 'cause I think I already did that in my life.
And yes it worked even without specifying any starting point in the graph. It was symmetry-asymmetry-based.
Applying the most tedious and unsmart of naive approaches, we could simply compare the respective sets of all possible paths in each graph. It's very slow, but it proves my point, doesn't it?
Nor is it very difficult to define a distance between graphs by minimizing the number of reversible transformations required to transform one graph into another.
So, how do we prove it cannot be done in the general case? 🤨
You can produce a hash function that will give a unique number for each possible graph -- this is called "graph canonization". However, there is no known way to do this efficiently (i.e., in polynomial time), because it's computationally equivalent to the graph isomorphism problem. If you think you have an efficient algorithm for graph canonization, there are basically three possibilities (listed in decreasing order of probability):
1. it's not actually a canonization algorithm -- either it assigns the same number to two graphs that are not isomorphic, or it assigns different numbers to two different presentations of the same graph;
2. it's not actually as efficient as you think;
3. you've made a major breakthrough in computer science.
To take your specific example of comparing the set of all possible paths, that fails both point 1 and point 2 (as you say -- it's very slow). For point 1, consider a cycle of three vertices and a cycle of four vertices: each contains paths of all possible lengths. For point 2, a graph can have factorially many paths, even if you restrict to non-repeating ones.
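To put a number on point 2, here's a quick brute-force count of simple paths in the complete graph on n vertices (my own sketch; "path" here means an ordered sequence of distinct vertices with at least one edge):
```python
from itertools import permutations

def count_simple_paths_in_complete_graph(n):
    """Brute-force count of ordered simple paths (>= 1 edge) in K_n."""
    total = 0
    for k in range(2, n + 1):                      # k = number of vertices on the path
        total += sum(1 for _ in permutations(range(n), k))
    return total

for n in range(2, 8):
    print(n, count_simple_paths_in_complete_graph(n))
# 2 2, 3 12, 4 60, 5 320, ... the count grows factorially, so any scheme that
# enumerates all paths cannot run in polynomial time.
```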
13:10 Oh those pens are going to dry out. Put their lids on, man!
Awesome. More graphs!
The one thing I really learned is that dense embeddings get learned. These all seem to be calculated...
We need a word2vec-style fake task to get something dense and semantically similar?
How can the inner product show similarity if you do not normalize the vectors? If you multiply a vector by a factor k, the inner product will also be multiplied by k.
he literally said that
@@raspberry_picker395 But then he literally continued to use it as a measure of similarity, which is just nonsense.
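For what it's worth, the scaling issue is easy to see numerically, and normalising (i.e. using cosine similarity) removes it; a quick sketch of my own:
```python
import numpy as np

def cosine(a, b):
    # Inner product of the normalised vectors.
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

a = np.array([1.0, 2.0])
b = np.array([2.0, 1.0])
k = 10.0

print(np.dot(a, b), np.dot(k * a, b))   # 4.0 vs 40.0: the raw inner product scales with k
print(cosine(a, b), cosine(k * a, b))   # 0.8 vs 0.8: the normalised version does not
```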
Just watched David's lecture material for AIN and noticed he's so Christopher Hampson coded so I Googled him... and what do you know - they both have a video with Computerphile where the comments are praising them on how well they explain things and how passionate they are - rightfully so.
They both broke down complex material into such bite-sized chunks that it was exciting to follow along.
Great work, glad I had you guys teaching this stuff!
here's an idea: in the edge vectors, each node should be added to all simpler shapes as well. so a blue connected to 2 blues would be added to both the b2b and the b1b options; all blue nodes would be added to a b0 option as well. This measure of similarity would be more robust in the face of nodes with a new neighbor
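If I'm reading the idea right, something like this (a rough sketch with made-up bucket names, showing how each node would be added to all the simpler options too):
```python
from collections import Counter

# Toy data: each vertex has a colour and a count of blue neighbours.
vertices = {
    "v1": ("blue", 2),   # blue vertex with 2 blue neighbours
    "v2": ("blue", 1),   # blue vertex with 1 blue neighbour
    "v3": ("blue", 0),   # blue vertex with no blue neighbours
}

features = Counter()
for colour, n_blue in vertices.values():
    for k in range(n_blue + 1):
        # A "blue with 2 blue neighbours" also counts towards the ">=1" and ">=0" buckets.
        features[f"{colour}_with_at_least_{k}_blue"] += 1

print(features)
# A vertex that gains one extra neighbour still shares all of its old buckets
# with its previous self, so the similarity degrades gracefully.
```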
Isn't that molecule Methanol?
Close, but it's missing the other three hydrogens.
@@S_t_r_e_s_s you can just leave the "H" if it's hydrogen on an edge (but only if it's hydrogen)
although you would normally do it for all cases
@kuhluhOG he drew a line; if you know some chem, that looks like hydrogen
tert-butanol since lines ending without a symbol typically represent carbon. It would be methanol with hydrogens at the end of those lines though.
obrigado, Dr. Davi!
A linked list is also a graph
New color labeling feels like graph derivation.
I understood everything
After office hours teaching me further about kernel Hilbert spaces, my ML professor said with disappointment (paraphrased): "But all of this doesn't matter practically, because deep learning, while its inner workings are not understood rigorously, performs so much better..." Do you mathematicians have any input? It makes the work I study and teach in discrete math seem fascinating but, in the end, not practically useful...
Unclosed bracket 🤯
Here you go: >
He just like me frfr
Network topology
I wish my brain was larger. Or denser. Or something.
Don't we all 😂
the molecule is methanol...
i think you should apologize to dogs and cats
Hydrogen Peroxide and Hydrogen Peroxide Water
I say it's rather simple: treat shapes as a tree of shapes, even the simplex shapes. You start by identifying the simplex shapes, then repeat the same comparisons with the simplex shapes as the new nodes, then again with the complex shapes as the new nodes. You keep going until all your comparisons start returning less than a 50% probability of being the right deal, then you just use the most complex shape that gave a >= probability of being correct (>= to the last recorded shape).
**Edit:** To be clear, you're building the tree from the leaves, hence the most complex shape taking priority over the simpler shapes, then the match rate over the last match rate.
**Edit 2:** Here's a simplistic example using a photo of a cat's head as the assumed image to analyse for a shape:
Every strand of fur is a LINE of colours (treated as spheres).
Every edge is treated as a LINE of strands of fur.
Each ear is treated as a TRIANGLE of edge LINEs; the head center is treated as a SPHERE/CIRCLE of edge LINEs.
The cat head is treated as a combination of a SPHERE/CIRCLE with 2 triangles either side, those having been identified by the previous loop of analysis.
So in other words, it's an exponential problem that can be done fast ONLY with lots of chips/brain cells doing dedicated analysis of a node set vs all other node sets (the initial spheres of colour being the only node sets with just 1 node).
stop interrupting the guy over and over. let him explain!
This must be related to Moleds. Everything is Moleds after all
I read all the comments and apparently I was the only one who was distracted by the absence of color in this man's lips. Maybe they go white when he is nervous, or maybe it was a cold room.
:?
Is it really that unusual? It doesn’t seem like it to me.
Ok, looking, yes, mine do appear to be a fair bit more red than his (though largely hidden by my mustache),
But like.. still?
maybe ur the only one staring at his lips
@@kellysmith7357 Must be, since I am the only one who commented on it.
Couldn't continue watching. The off-camera man kept interrupting the speaker.