Converting a Tabular Dataset to a Graph Dataset for GNNs

  • Published: Nov 2, 2024

Comments • 69

  • @pranayreddy2190
    @pranayreddy2190 2 years ago +2

    Ah man, I was struggling for the past two days with this. You are a savior!

    • @DeepFindr
      @DeepFindr 2 years ago +1

      I hope it helps :)

  • @govindaagrawal816
    @govindaagrawal816 1 year ago +2

    This is such a beautiful tutorial. I cannot thank you enough. I had been searching for a solution to this problem for months, and only now did I come across your video. You have saved my final year project. Thank you so much.

    • @sarikasaxena5567
      @sarikasaxena5567 11 months ago

      Could you help me 🙏

    • @raqeebshah5693
      @raqeebshah5693 8 months ago

      Can you help me? I have some confusion about graph neural networks.

  • @adarshsingh401
    @adarshsingh401 1 year ago +1

    You deserve a subscriber. Your content is next level in terms of knowledge and the way you explain things. Thank you so much.

  • @lola-jp5zs
    @lola-jp5zs 1 year ago +1

    This is some of the best learning material I have found on YouTube, thank you so much!

  • @jakubpiatek9667
    @jakubpiatek9667 2 years ago +1

    I am really looking forward to watching the second video about the temporal graph datasets!

  • @princessdickens2220
    @princessdickens2220 2 years ago +1

    Exceptionally clear! Thank you for making this!

  • @stevechesney9334
    @stevechesney9334 2 months ago

    I really appreciate the information that you shared in this video/playlist. Do you have an example where you used the heterogeneous graph data to create a GNN or GCN?

  • @zabedecora3028
    @zabedecora3028 2 years ago

    Thank you for taking the time to make these videos and share information that is not clearly described in manuscripts.

  • @2theorists
    @2theorists 2 years ago

    I really needed this video, thank you so much, man!! Appreciate your work ❤

  • @saeidkazemi5446
    @saeidkazemi5446 1 year ago

    Thanks for your content. It helped me a lot!

  • @joanaaraujo4226
    @joanaaraujo4226 9 months ago

    Very good video! One of the biggest problems is always how to prepare the dataset, and this helped a lot... Can you make a video explaining how to convert PDB file information into graphs to feed into a GCN? I'm having a lot of trouble understanding this...

  • @edlec2565
    @edlec2565 1 year ago

    Thank you so very much. It has been really useful for me.

  • @aboudramanediarra7086
    @aboudramanediarra7086 9 months ago

    Hi, thank you for your tutorial. It's beautiful. I have a project that consists of converting an image dataset to a graph representation. Can you help with this? I did many searches on Google but found nothing. Do you have any suggestions for me?

  • @imadOualid
    @imadOualid 4 months ago

    Thank you a lot for all your videos :D Can you do one about GraphSAGE?

  • @phunparpasis5611
    @phunparpasis5611 1 year ago

    I am working on a similar node regression problem on a homogeneous graph. I tried loading the data in the same manner, but I keep getting a Key value error. I checked whether the dimensionality is appropriate, and there is no problem with that. What could be causing this problem? Thank you.

  • @jonathanjeremierandriariso8818
    @jonathanjeremierandriariso8818 2 years ago

    Awesome tutorial. Thanks a lot.

  • @EvanTian-nk7kx
    @EvanTian-nk7kx 6 months ago

    Very good tutorial! Is there a tutorial that converts csv files to PyG data objects with multiple graphs for graph classification?

  • @maximecroft3803
    @maximecroft3803 2 years ago +1

    Many thanks for your great videos and notebooks, they have been very helpful!
    I was wondering when you would recommend the approach for creating a dataset as explained in the video 'GNN Project #2 - Creating a Custom Dataset in Pytorch Geometric' and when you would recommend this approach?

    • @DeepFindr
      @DeepFindr 2 years ago

      Hi! I think the Dataset class approach is always the one I would prefer (in the long run). In the end, everything related to your dataset (download, preprocessing, transformations...) should go there so that you have things separated.
      This is also much cleaner than multiple cells in the notebook :)
      For quick tests and at early stages you can use this easier approach.

  • @nazarzaki44
    @nazarzaki44 2 years ago +2

    You can simply calculate the distance between the instances
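
A minimal sketch of this distance-based idea, assuming each table row becomes a node and is linked to its `k` nearest rows by Euclidean distance. The feature values, `k`, and the helper name `knn_edges` are illustrative assumptions, not taken from the video, and whether such edges carry useful signal depends on the dataset.

```python
import math

# Hypothetical toy table: one row per instance, two numeric features each.
features = [[5.1, 3.5], [4.9, 3.0], [6.7, 3.1], [6.3, 2.5]]


def knn_edges(rows, k):
    """Connect every row to its k nearest rows (Euclidean distance).

    Returns [sources, targets], the same two-row layout as a PyG edge_index.
    """
    sources, targets = [], []
    for i, row in enumerate(rows):
        # Distance from row i to every other row, paired with that row's index.
        dists = [(math.dist(row, other), j) for j, other in enumerate(rows) if j != i]
        for _, j in sorted(dists)[:k]:
            sources.append(i)
            targets.append(j)
    return [sources, targets]


edge_index = knn_edges(features, k=1)
print(edge_index)
```

The resulting list could be converted to a tensor and used as `edge_index`. Edges built this way are directed (i points to its neighbor), so they are not necessarily symmetric.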

  • @Best101Bits
    @Best101Bits 1 year ago

    Great effort! Could you please kindly do this using an example of healthcare data? I couldn't follow because I don't know anything about football. I've just realised how important the choice of example is in teaching and learning. Thanks again.

  • @nazarzaki44
    @nazarzaki44 2 years ago

    Great video! Thanks

  • @sharonqin5911
    @sharonqin5911 1 year ago

    Thank you sooooo much bro !!!

  • @sunaryaseo
    @sunaryaseo 2 years ago +3

    This is a great video I was searching for. Most of the problems when learning GNNs come up when preparing the dataset. Could you provide an explanation using a very familiar dataset, like "iris_dataset.csv"? I'm sure this will provide a very clear explanation for those of us who are just starting to learn GNNs. Thank you in advance.

    • @DeepFindr
      @DeepFindr 2 years ago +2

      Hi! Thank you!
      The problem with Iris is that it contains no real relational information. It only makes sense to build graphs if you can build edges :) Or do you know a way to connect the flowers? ;-)

    • @sunaryaseo
      @sunaryaseo 2 years ago

      @DeepFindr Oh I see, so do you mean that not all datasets can be handled with a GNN, am I right? Btw thanks for replying to me.

    • @DeepFindr
      @DeepFindr 2 years ago +2

      Yes, correct. Only datasets with relational information (some sort of connection between the entities).

  • @EnsignerTV
    @EnsignerTV 2 years ago

    Thanks for sharing!!

  • @amacodes7347
    @amacodes7347 1 year ago

    Really great content and undoubtedly GOATED 🐐. I wondered if we could still use domain knowledge in a specific field like medicine or mechanics, where we have just a dataset on users and drugs with their ingredients, but also a third-party dataset on the chemical components and treatments to map relationships with the drugs. Another example could be users and company stocks and their sectors, where we have a third-party economic dataset about the sectors and their contribution to GDP. So I am just wondering how to integrate third-party datasets that have a relationship/edge with the initial nodes. Thanks

  • @oliver5356
    @oliver5356 2 years ago

    Love this!

  • @sunnyarora4916
    @sunnyarora4916 9 months ago

    What if your dataset is time-series based, will it work the same way?

  • @ninhhoang616
    @ninhhoang616 1 year ago

    Great video. I have a question. I want to create multiple graphs in one dataset, like the ESOL dataset (water solubility data). Right now, I can only create a single graph. Could you suggest some ways to create a dataset like ESOL, please? Thank you.

    • @DeepFindr
      @DeepFindr 1 year ago

      Hi, regarding multiple graphs I have another video in my GNN project series called "Custom dataset" or something like that. This explains how to create multiple graphs :)

  • @robertchamoun7914
    @robertchamoun7914 2 years ago

    Great video! Do you have an idea how we can make the iteration loop over the teams faster (run in parallel)? It seems to me like it would be very slow to iterate over a large dataset with a significant number of individual graphs. Thank you!

    • @DeepFindr
      @DeepFindr 2 years ago +1

      Sure, running in parallel is a reasonable approach. Simply use multiprocessing and calculate per-type subsets of the dataframe individually.
      There is almost always a way to speed up trivial for loops on dataframes :) It might also be that some advanced explode / melt operations allow faster processing.
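
As one hedged illustration of replacing the per-team loop with a dataframe operation, the sketch below uses a pandas self-merge rather than multiprocessing or explode/melt. The column names `player_id` and `team` and the toy rows are assumptions for illustration, not the columns used in the video.

```python
import pandas as pd

# Hypothetical toy table: one row per player, grouped by team.
df = pd.DataFrame({
    "player_id": [0, 1, 2, 3, 4],
    "team": ["A", "A", "A", "B", "B"],
})

# A self-merge on the team column pairs every player with every teammate
# in a single vectorized step, with no Python-level loop over teams.
pairs = df.merge(df, on="team", suffixes=("_src", "_dst"))

# Drop self-loops; both directions (i -> j and j -> i) are kept.
pairs = pairs[pairs["player_id_src"] != pairs["player_id_dst"]]

# Shape (2, num_edges): row 0 holds sources, row 1 holds destinations.
edge_index = pairs[["player_id_src", "player_id_dst"]].to_numpy().T
print(edge_index.shape)
```

This produces one combined edge list. If one graph per team is needed, `pairs` still carries the `team` column and could be split with `groupby("team")` afterwards. The merge materializes all within-team pairs, so memory grows quadratically with team size.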

  • @leo.y.comprendo
    @leo.y.comprendo 2 years ago

    Thanks for the awesome video! I have a couple of questions.
    1. What is the difference between having unidirectional or bidirectional edges in recommender systems? (heterographs)
    2. If I work with a bidirectional hetero graph in a link prediction task (predicting the existence of edges between 2 nodes), should I negative sample both adjacency matrices or just sample one of the directions?
    Thanks again!

    • @DeepFindr
      @DeepFindr 2 years ago +1

      Hi!
      1. The direction of the edges mainly influences what information you share and which nodes are updated. If you only have Person -> Item, then the items aggregate from persons, but the person node features will never change because they have no incoming connections. If you connect both directions, you also allow nodes to learn about their neighbors (i.e. learning about other persons via the item). I would always go with the bidirectional approach to learn both embeddings (persons and items) :)
      2. I would do the sampling in both directions, as this probably makes the model more robust.
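
The effect of edge direction described in point 1 can be sketched with a single round of mean aggregation in plain Python. This is a simplified stand-in for a message-passing layer, not PyG code, and the node features, edge lists, and the helper `mean_aggregate` are illustrative assumptions.

```python
def mean_aggregate(edges, src_feats, num_dst):
    """Return, per destination node, the mean of its incoming source features.

    Destinations without incoming edges get None: no message reaches them.
    """
    incoming = [[] for _ in range(num_dst)]
    for src, dst in edges:
        incoming[dst].append(src_feats[src])
    return [sum(vals) / len(vals) if vals else None for vals in incoming]


# Hypothetical toy data: two persons and two items, one scalar feature each.
person_feats = [1.0, 3.0]
item_feats = [10.0, 20.0]

# Person -> Item edges as (person, item) pairs, plus the reversed direction.
person_to_item = [(0, 0), (1, 0), (1, 1)]
item_to_person = [(item, person) for person, item in person_to_item]

# Items always receive messages from persons.
item_messages = mean_aggregate(person_to_item, person_feats, num_dst=2)

# With only Person -> Item edges, persons have no incoming edges at all.
person_messages_one_way = mean_aggregate([], item_feats, num_dst=2)

# With the reverse edges added, persons aggregate from their items as well.
person_messages_two_way = mean_aggregate(item_to_person, item_feats, num_dst=2)

print(item_messages, person_messages_one_way, person_messages_two_way)
```

In the one-way case the person side receives nothing, which matches the point that person features never change without incoming connections.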

    • @DeepFindr
      @DeepFindr 2 years ago

      By the way - what speaks against having only one edge_index instead of two?

  •  2 years ago

    Greatttttt from you!

  • @jonimatix
    @jonimatix 2 years ago

    Awesome video. Can you create a video comparing performance of graph model vs say a boosted model on Anime dataset?

  • @nastaranmarzban1419
    @nastaranmarzban1419 2 years ago

    Hi, thanks for your excellent explanations.
    In the first part, we have multiple disjoint graphs, since we build one graph per team?
    I mean we have as many graphs as teams. Am I correct?

    • @DeepFindr
      @DeepFindr 2 years ago

      Exactly :)

    • @nastaranmarzban1419
      @nastaranmarzban1419 2 years ago

      @DeepFindr 🌺

    • @nastaranmarzban1419
      @nastaranmarzban1419 2 years ago

      @DeepFindr Hi, hope you're doing well.
      Sorry, I have a question.
      Is there any multivariate dataset on the internet in which the variables are labeled?
      As far as I've checked, the multivariate datasets I've seen are labeled based on observations (for example, observation 1 suffers from cancer, 2 does not, and so on).
      Now I want the variables to have labels.
      Is there any such dataset?
      I'll be very thankful if you help me, as always.
      Thanks in advance🌸

  • @nouraboub4805
    @nouraboub4805 2 years ago

    Thank you so much ❣️

  • @徐琨鹏
    @徐琨鹏 2 years ago

    Thank you so much! Will there be a tutorial on timing diagrams? You are really awesome!!!

    • @DeepFindr
      @DeepFindr 2 years ago

      Hi! Not sure if I'm the right person for timing diagrams. In which context are you using them?

  • @desrucca
    @desrucca 1 year ago +1

    Can the HeteroData edge index be multi dimensional?

  • @factsandfun4371
    @factsandfun4371 2 years ago

    My node features have shape (13283, 16), but when I pass them to Data as x, it shows NoneType. Can you please tell me why it returns NoneType?

    • @DeepFindr
      @DeepFindr 2 years ago

      Hi! What is the exact error message?

  • @EnsignerTV
    @EnsignerTV 2 years ago

    Thank you! Do you have any code for a link prediction model?

    • @DeepFindr
      @DeepFindr 2 years ago +2

      Hi! There is a link prediction example in the PyTorch Geometric repository: github.com/pyg-team/pytorch_geometric/blob/master/examples/link_pred.py
      Also check out the other examples, there might be more :)

  • @mmk8072
    @mmk8072 1 year ago

    Good video.

  • @lynnbella5180
    @lynnbella5180 1 year ago

    Okay, now I have the graph, how do I use it for a GNN? If someone knows, please tell me.

  • @ghaithmqawass282
    @ghaithmqawass282 2 years ago

    Hi!
    Thanks for the tutorial.
    1- What if in a hetero graph we want to add not only one edge type but 3 edge types? Should we create 3 triplets and assign an edge index to each edge type?
    2- How do we model a hyperlink in a hetero graph?
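
On the first question: PyG's `HeteroData` stores one `edge_index` per (source type, relation, destination type) triplet, so three edge types would mean three such keys. The plain-Python sketch below mimics that keying without requiring PyG. The node types, relation names, indices, and the helper `check_edge_types` are hypothetical and only illustrate the layout.

```python
# Hypothetical node counts per node type.
num_nodes = {"user": 3, "anime": 4}

# One edge list per (source_type, relation, destination_type) triplet.
# Row 0 holds source indices, row 1 holds destination indices.
edge_index_by_type = {
    ("user", "rates", "anime"): [[0, 1, 2], [0, 0, 3]],
    ("user", "watches", "anime"): [[0, 2], [1, 2]],
    ("user", "follows", "user"): [[0, 1], [1, 2]],
}


def check_edge_types(edges, num_nodes):
    """Verify that every index stays within the range of its node type."""
    for (src_type, _, dst_type), (src, dst) in edges.items():
        assert len(src) == len(dst), "source and destination rows differ in length"
        assert all(0 <= i < num_nodes[src_type] for i in src)
        assert all(0 <= j < num_nodes[dst_type] for j in dst)
    return True


is_valid = check_edge_types(edge_index_by_type, num_nodes)
print(is_valid, len(edge_index_by_type))

# With PyG installed, each entry would be assigned under the same triplet key,
# for example: data["user", "rates", "anime"].edge_index = torch.tensor(edges)
```

Each relation keeps its own index space per node type, which is why the check looks up the source and destination counts separately.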

  • @raqeebshah5693
    @raqeebshah5693 8 months ago

    Can I convert the Pima Indians Diabetes dataset to graph data?

  • @kudjoeerik1622
    @kudjoeerik1622 2 years ago

    Thank You

  • @dothiduyen511
    @dothiduyen511 2 years ago

    Hi, thank you very much for the video. I am still struggling with some errors during the dataset generation. I have a node matrix of size (25, 12), a label matrix of size (25), and an edge_idx of size (2, 28). I tried to put them together like this: data = Data(x=x, edge_index=edge_index, y=labels). Then I loaded my data into a DataLoader: loader = DataLoader(data, batch_size=10, shuffle=True). However, I got some errors. Could you suggest some resources to help solve this problem? Thank you very much.

    • @DeepFindr
      @DeepFindr 2 years ago +1

      Hello! Which errors did you get? :)
      One suggestion is to search the Github issues for the error message you get.
      But maybe I have also seen your error before :)

    • @cloverduyen9603
      @cloverduyen9603 2 years ago

      @DeepFindr Hi, I tried searching everywhere, but I could not find any explanation. I guess I have a problem with the edge matrix. Would it be possible to have your email address? I could send you a screenshot of my error. I really appreciate your kind help.

    • @DeepFindr
      @DeepFindr 2 years ago

      Sure! Send it to deepfindr@gmail.com :)

  • @HieuLe-kl2us
    @HieuLe-kl2us 2 years ago

    Thank you for your video, but I don't know how to load data from the loader (in the homogeneous graphs part of this video).

    • @DeepFindr
      @DeepFindr 2 years ago +1

      Hello! For this, please have a look at the video "custom dataset" in my GNN project video series. Let me know if it helped :)

    • @HieuLe-kl2us
      @HieuLe-kl2us 2 years ago

      @DeepFindr Thank you so much. It is helpful for me.

  • @PriscyllaSS
    @PriscyllaSS 2 years ago

    First, great video! I was following part 2 of the tutorial (Tabular dataset -> Temporal Graph dataset) and I got this error in the last part in creating the StaticGraphTemporalSignal: TypeError: __init__() takes 5 positional arguments but 6 were given

    • @DeepFindr
      @DeepFindr 2 years ago

      Hi! The additional arguments need to be passed as keyword arguments. So maybe try to pass them like y_index = y_index.
      Will upload the next video today or tomorrow :)
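
Why a fifth positional argument triggers this TypeError can be reproduced with a small stand-in class. `SignalStandIn` is hypothetical: it only mirrors a constructor that takes four positional parameters plus keyword-only extras, as the reply suggests, and is not the library's actual implementation.

```python
class SignalStandIn:
    """Hypothetical stand-in: four positional parameters, extras by keyword."""

    def __init__(self, edge_index, edge_weight, features, targets, **kwargs):
        self.edge_index = edge_index
        self.edge_weight = edge_weight
        self.features = features
        self.targets = targets
        # Additional attributes are only accepted by name.
        self.extras = kwargs


# Toy placeholder values (assumed, for illustration only).
edge_index, edge_weight = [[0, 1], [1, 0]], [1.0, 1.0]
features, targets, y_index = [[0.5, 0.2]], [[1.0, 0.0]], [[0, 1]]

try:
    # A fifth positional argument has no parameter to bind to.
    SignalStandIn(edge_index, edge_weight, features, targets, y_index)
    positional_ok = True
except TypeError as err:
    positional_ok = False
    print(err)

# Passing the extra argument by keyword is accepted.
signal = SignalStandIn(edge_index, edge_weight, features, targets, y_index=y_index)
print(signal.extras)
```

The count in the error message includes `self`, which is why four declared parameters are reported as five positional arguments.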