How to build custom Datasets for Text in Pytorch

  • Published: 4 Feb 2025

Comments • 55

  • @deepshankarjha5344
    @deepshankarjha5344 4 years ago +1

    This is the best PyTorch tutorial on the internet, even better than the docs provided by the website.

  • @sachavanweeren9578
    @sachavanweeren9578 3 years ago +9

    Thanks for the tutorial. It might be worthwhile to show intermediate results of what the different parts do earlier in the video, to show exactly what certain code snippets do.

  • @simoneparvizi775
    @simoneparvizi775 2 years ago +1

    The video is great, really. Just one thing that (personally) would make everything literally perfect: could you explain literally everything? For example, when you mention transform at 5:50 and say that you set it to None, explain why. The same goes for the rest. Basically, do what you did at 6:40 when explaining the "csv" function. Again, this is only my personal opinion, and it would help me so much.
    Keep up the great work!

  • @aboalifan123
    @aboalifan123 3 years ago +1

    Thanks Aladdin, best Pytorch tutorials on the web

  • @tomkohler1285
    @tomkohler1285 4 years ago +7

    This will save my life. I have been trying to load data for a week and only failing.

  • @foobar1231
    @foobar1231 3 years ago +4

    Very useful source code. No need to learn it by heart, but it is worth understanding.
    Thank you!

  • @haideralishuvo4781
    @haideralishuvo4781 4 years ago +4

    Awesome tutorial, best channel on PyTorch :D

  • @PaAGadirajuSanjayVarma
    @PaAGadirajuSanjayVarma 4 years ago +2

    Give this man a Nobel Prize

  • @sayedathar2507
    @sayedathar2507 3 years ago

    Thanks a lot for your videos; they are helping me a lot in learning PyTorch. I am trying to build an image captioning model on a custom dataset, so your videos on image captioning will be very useful :) Thanks a lot again.

  • @curatorsshelf393
    @curatorsshelf393 1 year ago

    Thanks a lot for the video. Update for the spaCy configuration:
    spacy_eng = spacy.load("en_core_web_sm") is now the correct way to load the model :)

  • @vijayendrasdm
    @vijayendrasdm 4 years ago +1

    Thanks for the video. This is helpful.
    Waiting for the next.

  • @user-or7ji5hv8y
    @user-or7ji5hv8y 4 years ago +3

    In future videos, maybe you could also add an explanation of why you architected the objects this way.

  • @samufit9839
    @samufit9839 3 months ago

    In the collate function you pad samples to the "local" maximum length (within the same batch), which results in different lengths for different batches. You should have computed the maximum over the whole dataset, instead of over a single batch, before padding all the batches.
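The two padding strategies this comment contrasts can be compared with a small framework-free sketch. It uses plain Python lists rather than tensors, and the `pad_batch` helper, the toy captions, and `PAD_IDX = 0` are all assumptions made for illustration, not code from the video.

```python
def pad_batch(batch, pad_idx, length=None):
    """Right-pad every sequence in `batch` to `length`.

    If `length` is None, the longest sequence in this batch sets the length.
    """
    if length is None:
        length = max(len(seq) for seq in batch)
    return [seq + [pad_idx] * (length - len(seq)) for seq in batch]


# Hypothetical numericalized captions (1 = <SOS>, 2 = <EOS> in this toy setup).
dataset = [[1, 5, 2], [1, 7, 8, 9, 2], [1, 4, 2], [1, 6, 6, 2]]
PAD_IDX = 0
batches = [dataset[0:2], dataset[2:4]]

# Batch-local padding: each batch is only as long as its own longest caption.
local = [pad_batch(b, PAD_IDX) for b in batches]
print([len(b[0]) for b in local])  # [5, 4]

# Dataset-wide padding: every batch is padded to the global maximum length.
global_max = max(len(seq) for seq in dataset)
fixed = [pad_batch(b, PAD_IDX, global_max) for b in batches]
print([len(b[0]) for b in fixed])  # [5, 5]
```

Batch-local padding spends fewer positions on pad tokens, while dataset-wide padding gives every batch the same shape. Which one is appropriate depends on whether the downstream model needs a fixed sequence length.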

  • @supervince110
    @supervince110 3 years ago

    Dude, you are simply awesome!

  • @zhenyuqiu1480
    @zhenyuqiu1480 3 years ago +1

    Helpful to me! Thank you!

  • @2mitable
    @2mitable 2 years ago

    We can also use the torchtext Field class for the EOS and SOS tokens; the same class can build the vocab too.

  • @sayedathar2507
    @sayedathar2507 3 years ago

    This was really helpful, thanks a lot bro. Your videos are a lifesaver. Love you :)

  • @thecros1076
    @thecros1076 4 years ago +1

    Really enjoyed the learning journey with you ❤️❤️

  • @takagisa4928
    @takagisa4928 3 years ago

    It's a really nice tutorial, thanks a lot!

  • @gaurikmukherjee9692
    @gaurikmukherjee9692 3 years ago

    Amazing video! I had one question: does spaCy remove punctuation and whitespace? It is not doing that when I try it.
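On the question above: spaCy's tokenizer keeps punctuation marks as tokens of their own rather than dropping them, so any filtering has to happen as a separate step. With spaCy tokens that would normally mean checking `token.is_punct` and `token.is_space`. The sketch below shows the same idea on a plain list of strings so it runs without spaCy installed; the helper name and the sample tokens are made up for illustration.

```python
import string


def drop_punct_and_space(tokens):
    """Return the tokens that are neither pure punctuation nor pure whitespace."""
    cleaned = []
    for tok in tokens:
        stripped = tok.strip()
        if not stripped:
            continue  # whitespace-only token
        if all(ch in string.punctuation for ch in stripped):
            continue  # punctuation-only token such as "," or "!"
        cleaned.append(tok)
    return cleaned


# Hypothetical tokenizer output for "A dog, running  fast!"
tokens = ["A", "dog", ",", "running", " ", "fast", "!"]
print(drop_punct_and_space(tokens))  # ['A', 'dog', 'running', 'fast']
```

Whether punctuation should be removed at all is a modeling choice; for captioning it is often left in so the model can learn to produce it.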

  • @kirtipandya4618
    @kirtipandya4618 3 years ago +2

    One should not add more augmentations or processing in the custom Dataset class; it makes training very, very slow. Do you know any hack to fix this? Here you open the image file and numericalize the text, which makes data loading very slow.
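One possible workaround for the text side of this concern is to numericalize every caption once, when the dataset is constructed, so that each lookup is a plain list access. The sketch below is framework-free and assumes the captions fit in memory; the `CachedCaptions` class, the whitespace tokenizer, and the toy `stoi` mapping are hypothetical stand-ins, not the video's code. Image decoding is not addressed here; for that, the DataLoader's `num_workers` argument is the usual first thing to try.

```python
class CachedCaptions:
    """Numericalize all captions once in __init__; __getitem__ is a list lookup."""

    def __init__(self, captions, stoi, unk="<UNK>"):
        self.stoi = stoi
        self.unk_idx = stoi[unk]
        # The expensive work happens here, once, not on every access.
        self.encoded = [self._encode(c) for c in captions]

    def _encode(self, caption):
        # Simple lowercase whitespace tokenizer, used only for this sketch.
        ids = [self.stoi["<SOS>"]]
        ids += [self.stoi.get(w, self.unk_idx) for w in caption.lower().split()]
        ids.append(self.stoi["<EOS>"])
        return ids

    def __len__(self):
        return len(self.encoded)

    def __getitem__(self, index):
        return self.encoded[index]


# Hypothetical vocabulary mapping for illustration.
stoi = {"<PAD>": 0, "<SOS>": 1, "<EOS>": 2, "<UNK>": 3, "a": 4, "dog": 5}
ds = CachedCaptions(["A dog", "a cat"], stoi)
print(ds[0])  # [1, 4, 5, 2]
print(ds[1])  # [1, 4, 3, 2]  ("cat" is not in the vocabulary)
```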

  • @adesiph.d.journal461
    @adesiph.d.journal461 4 years ago +3

    Amazing tutorial, thanks for it! I don't understand why .unsqueeze(0) is needed for each item in the batch when assigning it to imgs. Any input on that would be much appreciated. Thanks!

  • @hrithicksen3644
    @hrithicksen3644 4 years ago +2

    Bro, please implement more papers. Make a video on how to use YOLO in torch... Please, dude.

  • @ahmetsuna4117
    @ahmetsuna4117 3 years ago

    Great video, Aladdin, thanks. I have one question: at the end of the video, the sequence lengths seem different. Why are they not equal to [26, 32]? Isn't that a mistake?

  • @mohitsinghpawar9387
    @mohitsinghpawar9387 4 years ago +1

    Thanks a lot 😊

  • @midnightphantom4787
    @midnightphantom4787 15 days ago

    Ngl, the errors did make me feel kinda fine cuz they cleared up some goofy doubts

  • @TheFotbollen10
    @TheFotbollen10 4 years ago

    Great video! Do you have an idea of how to translate from English to Python code (with a custom dataset) using a transformer?

  • @andrewwilliam2209
    @andrewwilliam2209 3 years ago

    Hey Aladdin, is there any advantage of doing this over using let's say the Vocab that torchtext provides? I'm currently exploring PyTorch for a project.

    • @AladdinPersson
      @AladdinPersson  3 years ago

      I prefer to build things myself whenever possible so that there are no gaps in my understanding, but torchtext is great too. Perhaps this is more low-level and isn't actually what you would use for larger projects, but it can be useful for understanding.

    • @andrewwilliam2209
      @andrewwilliam2209 3 years ago +1

      @AladdinPersson That's a great mindset to have! I'm actually pretty early in my deep learning journey and have just started a small project using BERT. Thanks for the video, I will try to apply what I learned here to my own project :D

  • @ameerhamza111
    @ameerhamza111 4 years ago

    Hi, could you make a tutorial on captioning individual frames of a video? Please!

  • @sulavojha8322
    @sulavojha8322 4 years ago

    Excellent content as always.
    Any plans for GANs (pix2pix, CycleGAN)?

    • @AladdinPersson
      @AladdinPersson  4 years ago +2

      I've seen that a lot of people want videos on it, so I will definitely do it, but the question is when, since I have a couple of videos planned :)

    • @haideralishuvo4781
      @haideralishuvo4781 4 years ago

      @AladdinPersson CycleGAN please :(

    • @hrithicksen3644
      @hrithicksen3644 4 years ago

      Bro, please, more paper implementations

  • @soumyajahagirdar6708
    @soumyajahagirdar6708 4 years ago

    What if I have two text files, meaning the input to the model is an image plus some text and the output is also text ((image + text_data) --> (text_data))? Do I have to create two vocabularies?

    • @AladdinPersson
      @AladdinPersson  4 years ago +2

      If we are doing machine translation, for example, then we need two different vocabularies, one for each language. However, if both the input text and the output text are in the same language, then you can reuse the vocabulary.

    • @soumyajahagirdar6708
      @soumyajahagirdar6708 4 years ago

      Thank you so much!
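The answer in this thread (one shared vocabulary when input and output text are in the same language, one per language for translation) can be sketched in a few lines. The `build_vocab` helper, the four special tokens, and the sample sentences are assumptions chosen for illustration; a real project would likely use a proper tokenizer instead of `str.split`.

```python
from collections import Counter

# Special tokens placed first so their indices stay fixed across vocabularies.
SPECIALS = ["<PAD>", "<SOS>", "<EOS>", "<UNK>"]


def build_vocab(sentences, freq_threshold=1):
    """Map each word seen at least `freq_threshold` times to an integer index."""
    counts = Counter(w for s in sentences for w in s.lower().split())
    itos = SPECIALS + sorted(w for w, c in counts.items() if c >= freq_threshold)
    return {word: index for index, word in enumerate(itos)}


# Same language on both sides: build one vocabulary from all the text.
input_text = ["what is the dog doing"]
output_text = ["the dog is running"]
shared = build_vocab(input_text + output_text)
print(len(shared))  # 10

# Translation: build one vocabulary per language.
src_vocab = build_vocab(["the dog runs"])
tgt_vocab = build_vocab(["der hund rennt"])
print("hund" in src_vocab, "hund" in tgt_vocab)  # False True
```

With a shared vocabulary, the same index refers to the same word on the input and output side, which also makes it possible to share an embedding table between them.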

  • @ArunKumar-sg6jf
    @ArunKumar-sg6jf 2 years ago

    What value should pad_idx take as the padding value? What is the valid range of values?

  • @sahil-7473
    @sahil-7473 4 years ago

    Hello Sir,
    Can you explain what 'pin memory' and 'collate function' actually mean? I went through the documentation but didn't fully understand it. Could you explain it in a simpler way? That would be helpful.
    I understood the rest very well.
    Thanks

    • @AladdinPersson
      @AladdinPersson  4 years ago +2

      pin_memory=True should be set by default, as it is going to speed up the model by pinning the video memory for the model computations, but as for the internals of how that works, I'm as clueless as you. The collate function is for additional processing you want to do on the batch you've collected. In this case we set up how to load all of these captions, but once we actually have the batch we need to make sure they are all padded to an equal number of time steps; this is done using the collate function.

    • @sahil-7473
      @sahil-7473 4 years ago

      @AladdinPersson Thanks.
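To make the collate step discussed in this thread concrete, here is a framework-free sketch of a callable that receives one batch as a list of (image, caption) pairs and pads the captions to a common length. The `PadCollate` name, the string placeholders for images, and `pad_idx=0` are assumptions for illustration. In PyTorch, an object like this would be passed to `DataLoader` through its `collate_fn` argument, and the padding itself is typically done on tensors with `torch.nn.utils.rnn.pad_sequence`. For reference, `pin_memory=True` asks the DataLoader to place batches in page-locked host memory, which can speed up copies from the host to the GPU.

```python
class PadCollate:
    """Turn a list of (image, caption_ids) samples into one padded batch."""

    def __init__(self, pad_idx):
        # Index that the vocabulary assigns to the padding token.
        self.pad_idx = pad_idx

    def __call__(self, batch):
        images = [image for image, _ in batch]
        captions = [caption for _, caption in batch]
        # Pad to the longest caption in this batch only.
        longest = max(len(c) for c in captions)
        padded = [c + [self.pad_idx] * (longest - len(c)) for c in captions]
        return images, padded


# Hypothetical samples: strings stand in for image tensors.
batch = [("img0", [1, 4, 2]), ("img1", [1, 4, 5, 6, 2])]
images, caps = PadCollate(pad_idx=0)(batch)
print(caps)  # [[1, 4, 2, 0, 0], [1, 4, 5, 6, 2]]
```

Because the target length is computed per batch, two batches from the same dataset can come out with different sequence lengths.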

  • @feravladimirovna1044
    @feravladimirovna1044 4 years ago

    Can I ask what is the secret behind this beautiful intro?

  • @DEEPUSNDS
    @DEEPUSNDS 4 years ago

    How can we build a custom dataset for a machine translation corpus?

    • @AladdinPersson
      @AladdinPersson  4 years ago

      I think you can do something very similar to what we did in the video, just for the two languages used in the translation. To make it easier, you could also check out torchtext, which could make loading datasets a lot easier.

  • @robertnakano2596
    @robertnakano2596 2 years ago

    Link to the loader_customtext.py file in Aladdin's repo: github.com/aladdinpersson/Machine-Learning-Collection/blob/master/ML/Pytorch/Basics/custom_dataset_txt/loader_customtext.py

  • @siennypoole4366
    @siennypoole4366 3 years ago

    Hi! Does anyone know how to fix this?
    imgs = [item[0].unsqueeze(0) for item in batch]
    raise AttributeError(name)
    AttributeError: unsqueeze

  • @physicsmadness
    @physicsmadness 3 years ago

    Amazing content, I've learnt so much from your channel. However, your coding style is a bit strange. I am no one to judge, but you code in an inverted fashion, which makes it difficult to follow. For example, you write the function calls first and then go on to define the classes and methods, so the structure of your code is not clear initially. Everything does come together by the end, though. Anyway, thanks for all the effort you have put in, and all the best!