First, great video! I was following part 2 of the tutorial (Tabular dataset -> Temporal Graph dataset) and I got this error in the last part in creating the StaticGraphTemporalSignal: TypeError: __init__() takes 5 positional arguments but 6 were given
Hi! The additional arguments need to be passed as keyword arguments. So maybe try to pass them like y_index = y_index. Will upload the next video today or tomorrow :)
Ah man, I was struggling for the past two days with this. You are a savior!
I hope it helps :)
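A small illustration of the keyword-argument fix discussed in this thread. The class below is a hypothetical stand-in written for this example, not the real StaticGraphTemporalSignal; it only assumes a constructor with four named parameters plus **kwargs, which would produce the same "takes 5 positional arguments but 6 were given" message. The toy values are made up.

```python
class SignalStandIn:
    """Hypothetical stand-in, NOT the real torch_geometric_temporal class.

    It only mimics a constructor with four named parameters plus **kwargs.
    """

    def __init__(self, edge_index, edge_weight, features, targets, **kwargs):
        self.edge_index = edge_index
        self.edge_weight = edge_weight
        self.features = features
        self.targets = targets
        # Any extra attribute is stored under the keyword name it was passed with.
        for name, value in kwargs.items():
            setattr(self, name, value)


# Made-up toy values, only here to have something to pass in.
edge_index = [[0, 1], [1, 0]]
edge_weight = [1.0, 1.0]
features = [[[0.1], [0.2]]]
targets = [[1.0, 0.0]]
y_index = [[0, 1]]

try:
    # Five values plus self is six positional arguments, one more than accepted.
    SignalStandIn(edge_index, edge_weight, features, targets, y_index)
    error_message = None
except TypeError as exc:
    error_message = str(exc)

# Passing the extra value by keyword is accepted and stored as an attribute.
signal = SignalStandIn(edge_index, edge_weight, features, targets, y_index=y_index)
```

If the real class follows the same pattern, every argument after the fourth has to be named at the call site.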
This is such a beautiful tutorial. I cannot thank you enough. I had been searching for a solution to this problem for months, and only just now did I come across your video. You have saved my final year project. Thank you so much
Could you help me 🙏
Can you help me? I have some confusion about graph neural networks
You deserve a subscriber. Your content is next level in terms of knowledge and the way you explain things. Thankyou so much
This is some of the best learning material I have found on youtube, thank you so much!
I am really looking forward to watching the second video about the temporal graph datasets!
Exceptionally clear! Thank you for making this!
I really appreciate the information that you shared in this video/playlist. Do you have an example of where you used the heterogeneous graph data to create a GNN or GCN?
Thank you for your time making these videos and sharing information, which are not clearly described in manuscripts.
I really needed this video, thank you so much man!! Appreciate your work❤
Thanks for your content. It helped me a lot!
Very good video! One of the biggest problems is always how to prepare the dataset, and this helped a lot... Can you make a video explaining how to convert PDB file information into graphs to feed in a GCN? I'm having a lot of trouble understanding this...
Thank you so very much. It has been really useful for me.
Hi, thank you for your tutorial. It's beautiful. I have a project that consists of converting an image dataset to a graph representation. Can you help with this? I searched a lot on Google but found nothing. Do you have any suggestions for me?
thank u a lot for all ur videos :D can u do one about graphsage ?
I am working on a similar node regression problem on a homogeneous graph. I tried loading the data in the same manner but I keep getting a Key value error. I checked whether the dimensionality is appropriate and there is no problem with that. What could possibly be causing this problem? Thank you.
Awesome tutorial . Thanks a lot .
Very good tutorial! Is there a tutorial that converts csv files to PyG data objects with multiple graphs for graph classification?
Many thanks for your great videos and notebooks, they have been very helpful!
I was wondering when you would recommend the approach for creating a dataset as explained in the video 'GNN Project #2 - Creating a Custom Dataset in Pytorch Geometric' and when you would recommend this approach?
Hi! I think the Dataset class approach is always the one I would prefer (in the long run). In the end everything related to your dataset (download, preprocessing, transformations, ...) should go there so that you have things separated.
This is also much cleaner than multiple cells in the notebook :)
For quick tests and at early stages you can use this easier approach
You can simply calculate the distance between the instances
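A minimal sketch of that idea, assuming "instances" means rows of a feature table and that each row gets connected to its k nearest rows by Euclidean distance. The function name knn_edge_index, the choice of k and the four iris-style rows are illustrative assumptions, not something from the video.

```python
import math


def knn_edge_index(points, k):
    """Connect every node to its k nearest neighbours (Euclidean distance).

    Returns [sources, targets], i.e. the (2, num_edges) layout used for edge_index.
    """
    sources, targets = [], []
    for i, p in enumerate(points):
        # Distances from node i to every other node, nearest first.
        neighbours = sorted(
            (math.dist(p, q), j) for j, q in enumerate(points) if j != i
        )
        for _, j in neighbours[:k]:
            sources.append(j)  # the message flows from neighbour j ...
            targets.append(i)  # ... into node i
    return [sources, targets]


# Four iris-style measurement rows, used purely as illustration.
measurements = [
    [5.1, 3.5, 1.4, 0.2],
    [4.9, 3.0, 1.4, 0.2],
    [7.0, 3.2, 4.7, 1.4],
    [6.4, 3.2, 4.5, 1.5],
]
edge_index = knn_edge_index(measurements, k=1)
```

Whether such distance-based edges actually help is a separate question: they are derived from the same features the model already sees, so they add no new relational information.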
Great effort! Could you please kindly do this using an example of healthcare data? I couldn't follow because I don't know anything about football. I've just realised how important the choice of example is in teaching and learning. Thanks again.
Great video! Thanks
Thank you sooooo much bro !!!
This is a great video I was searching for. Most of the problems when learning GNN are when preparing the dataset. Could you provide an explanation using a very familiar dataset, like "iris_dataset.csv"? I'm sure, this will provide a very clear explanation for those of us who are just starting to learn GNN. Thank you in advance.
Hi! Thank you!
The problem with iris is that there is no real relational information contained. It only makes sense to build graphs if you can build edges. :) or do you know a way to connect the flowers ;-)
@@DeepFindr Oh I see, so do you mean not all the datasets can be solved using GNN, am I right? Btw thanks for replying to me.
Yes correct. Only datasets with relational information (some sort of connection between the entities)
Thanks for sharing!!
Really great content and undoubtedly GOATED 🐐. I wondered if we could still use domain knowledge in a specific field like medicine or mechanics where we have just a data set on users and drugs with their ingredients but a third-party dataset on the chemical components and treatment to map relationships with the drugs. Another example could be users and company stocks and their sectors, but we have a third-party economic dataset about the sectors and their contribution to GDP. So just wondering about how to integrate third-party datasets that have a relationship/edge with initial node. Thanks
Love this!
What if your dataset is time-series based, will it work the same way?
Great video. I have a question. I want to create multiple graphs in one dataset, like the ESOL dataset (water solubility data). Right now I can only create a single graph. Could you suggest some ways to create a dataset like ESOL, please? Thank you.
Hi, regarding multiple graphs I have another video in my GNN project series called "Custom dataset" or something like that. This explains how to create multiple graphs :)
Great video! Do you have an idea how we can make the iteration loop over the teams faster (run in parallel)? It seems to me like it would be very slow to iterate over a large dataset with a significant number of individual graphs. Thank you!
Sure, running in parallel is a reasonable approach. Simply use multiprocessing and calculate per-type subsets of the dataframe individually.
There is almost always a way to speed up trivial for loops on dataframes :) It might also be that some advanced explode / melt operations allow faster processing
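A rough sketch of the multiprocessing suggestion, using plain dictionaries instead of a dataframe so it runs without extra packages. The column names "team" and "player", the helper names, and the choice to fully connect the players of a team are all assumptions made for this example.

```python
from collections import defaultdict
from concurrent.futures import ProcessPoolExecutor
from itertools import permutations


def build_team_graph(item):
    """Build one graph for one team; item is a (team, players) pair."""
    team, players = item
    # Assumption: fully connect the players of a team, both directions, no self-loops.
    pairs = list(permutations(range(len(players)), 2))
    edge_index = [[s for s, _ in pairs], [t for _, t in pairs]]
    return {"team": team, "nodes": players, "edge_index": edge_index}


def group_by_team(rows):
    """Split the table into one independent subset per team."""
    groups = defaultdict(list)
    for row in rows:
        groups[row["team"]].append(row["player"])
    return sorted(groups.items())


def build_all_graphs(rows, workers=2):
    """Map the per-team subsets across worker processes."""
    with ProcessPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(build_team_graph, group_by_team(rows)))


rows = [
    {"team": "A", "player": "p1"},
    {"team": "A", "player": "p2"},
    {"team": "B", "player": "p3"},
    {"team": "A", "player": "p4"},
]

# Sequential reference run; build_all_graphs(rows) should give the same list.
graphs = [build_team_graph(item) for item in group_by_team(rows)]
```

When calling build_all_graphs from a script, put the call under an `if __name__ == "__main__":` guard so the worker processes can start cleanly. For small tables the process start-up cost can easily outweigh the gain, so it is worth timing both versions.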
Thanks for the awesome video! I have a couple of questions.
1. What is the difference between having unidirectional or bidirectional edges in recommender systems? (heterographs)
2. If I work with a bidirectional hetero graph in a task of link prediction (predict existence of edges between 2 nodes), should I negative sample both adjacency matrices or just sample one of the directions?
Thanks again!
Hi!
1. The direction of the edges mainly influences what information you share and which nodes are updated. If you only have Person -> Item, then the items are aggregating from persons, but the person node features will never change because they have no incoming connections. If you connect both directions, you also allow the model to learn about neighbors (i.e. learning about other persons via the item). I would always go with the bidirectional approach to learn both embeddings (persons and items) :)
2. I would do the sampling in both directions as this probably makes the model more robust.
By the way - what speaks against having only one edge_index instead of two?
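To make the direction argument concrete, here is a small sketch that stores edge indices in a plain dictionary keyed by (source type, relation, destination type) triplets. The relation name "rates", the "rev_" prefix and both helper functions are assumptions for illustration, not taken from the video.

```python
def reverse_edge_type(edge_type, edge_index):
    """Return the flipped triplet and the flipped edge index."""
    src, rel, dst = edge_type
    sources, targets = edge_index
    return (dst, "rev_" + rel, src), [targets, sources]


def updated_node_types(edges):
    """A node type only receives messages if some edge type points at it."""
    return {dst for (_, _, dst) in edges}


# Persons 0 and 1 are connected to items 0 and 1 (each type has its own index space).
edges = {("person", "rates", "item"): [[0, 0, 1], [0, 1, 1]]}
only_forward = updated_node_types(edges)

rev_type, rev_index = reverse_edge_type(
    ("person", "rates", "item"), edges[("person", "rates", "item")]
)
edges[rev_type] = rev_index
both_directions = updated_node_types(edges)
```

With only the forward edge type, just the items are updated. After adding the reverse type, persons receive messages too. Because persons and items are indexed separately, each direction keeps its own edge index here.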
Greatttttt from you!
Awesome video. Can you create a video comparing performance of graph model vs say a boosted model on Anime dataset?
Hi, thanks for your excellent explanations.
In the first part, we have multiple disjoint graphs, as we build one graph per team?
I mean we have the same number of graphs as teams. Am I correct?
Exactly :)
@@DeepFindr 🌺
@@DeepFindr Hi, hope you're doing well.
Sorry I have a question.
Is there any multivariate dataset on the internet in which the variables are labeled?
As far as I've checked, the multivariate datasets I've seen are labeled per observation (for example, observation 1 suffers from cancer, observation 2 does not, and so on).
Now I want the variables to have labels.
Is there any such dataset?
I'll be very thankful if you help me, as always.
Thanks in advance🌸
Thank you so much ❣️
Thank you so much! Will there be a tutorial on timing diagrams? you are really awesome!!!
Hi! Not sure if I'm the right person for timing diagrams. In which context are you using them?
Can the HeteroData edge index be multi dimensional?
I have like (13283, 16) shape of the node features, but when I pass it in Data as x , it shows NoneType. Can you please tell why is it giving a NoneType return?
Hi! What is the exact error message?
Thank you ! Do you have any code for a link prediction model?
Hi! There is a link prediction example on pytorch geometric: github.com/pyg-team/pytorch_geometric/blob/master/examples/link_pred.py
Also check out the other examples, there might be more :)
Good video.
Okay, now I have the graph, how do I use it for a GNN? If someone knows, please tell me
Hi !
Thanks for the tutorial
1- What if in a hetero graph we want to add not only one edge type but 3 edge types, should we create 3 triplets and assign an edge index for each edge type?
2- How to model a hyperlink in hetero?
Can I convert the Pima Indians diabetes dataset to graph data?
Thank You
Hi, thank you very much for the video. I am still struggling with some errors during the dataset generation. I have the node matrix with size (25, 12), the label matrix with size (25), and edge_idx with size (2, 28). I tried to put them together like this: data = Data(x=x, edge_index=edge_index, y=labels). Then I load my data into a DataLoader: loader = DataLoader(data, batch_size=10, shuffle=True). However, I got some errors. Could you suggest some resources that would help me solve this problem? Thank you very much.
Hello! Which errors did you get? :)
One suggestion is to search the Github issues for the error message you get.
But maybe I have also seen your error before :)
@@DeepFindr Hi, I searched everywhere but could not find any explanation. I guess I have a problem with the edge matrix. Could I have your email address? I could send you a screenshot of my error. I really appreciate your kind help.
Sure! Send it to deepfindr@gmail.com :)
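Since the actual error message was never posted in this thread, the following is only a guess at a useful first check: a small validator for the three inputs before they go into Data. The function name check_graph_inputs and the toy values (25 nodes, 12 features, 28 edges, matching the sizes mentioned above) are assumptions for this sketch.

```python
def check_graph_inputs(x, edge_index, y):
    """Return a list of human-readable problems; an empty list means the shapes agree."""
    problems = []
    num_nodes = len(x)
    if len(y) != num_nodes:
        problems.append(f"y has {len(y)} labels for {num_nodes} nodes")
    if len(edge_index) != 2 or len(edge_index[0]) != len(edge_index[1]):
        problems.append("edge_index must have shape (2, num_edges)")
    else:
        # Every entry must be a valid row number of the node matrix.
        bad = [i for row in edge_index for i in row if not 0 <= i < num_nodes]
        if bad:
            problems.append(f"edge_index refers to missing nodes: {sorted(set(bad))}")
    return problems


# Toy inputs with the sizes from the comment: 25 nodes, 12 features, 28 edges.
x = [[0.0] * 12 for _ in range(25)]
y = [0] * 25
edge_index = [[i % 25 for i in range(28)], [(i + 1) % 25 for i in range(28)]]
problems = check_graph_inputs(x, edge_index, y)

# Node ids start at 0, so with 25 nodes the id 25 is out of range.
broken = check_graph_inputs(x, [[0, 25], [1, 2]], y)
```

Another possible cause, again only a guess: as far as I know the PyG DataLoader expects a dataset or a list of Data objects rather than a single Data object, so DataLoader([data], batch_size=1) may be worth trying. With a single graph and node-level labels, the Data object can also be passed to the model directly without a loader.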
Thank you for your video, but I don't know how to load data from the loader (in the homogeneous graphs part of this video).
Hello! For this please have a look at the video "custom dataset" in my GNN project video series. Let me know if it helped :)
@@DeepFindr Thank you so much. It is helpful for me