Pytorch Torchtext Tutorial 1: Custom Datasets and loading JSON/CSV/TSV files

  • Published: Oct 24, 2024

Comments • 49

  • @buithanhlam3726
    @buithanhlam3726 3 years ago +2

    I found it very difficult to get used to torchtext docs, but then I found your video :) Many thanks!

  • @salihbalci19
    @salihbalci19 1 year ago +2

    Could you make a video for the new version of torchtext?

  • @subhasish661411
    @subhasish661411 3 years ago +1

    I somehow found the potato quote very inspiring 🤔.

    • @AladdinPersson
      @AladdinPersson  3 years ago +1

      Definitely mislabeled samples in that dataset

  • @dhawalsalvi8079
    @dhawalsalvi8079 2 years ago +1

    Yo, new version of torchtext (0.12) does not have Fields

  • @henricbohm8455
    @henricbohm8455 3 years ago +1

    Very helpful tutorial.
    Could you make a tutorial on how to load data that is stored in a SQL database?

  • @ximingdong503
    @ximingdong503 3 years ago

    I have a question: how can I save the train or test data from the IMDB dataset (train_data, test_data = datasets.IMDB.splits(TEXT, LABEL)) as a CSV file? Every time I run it, I need 15 minutes to load the data. Thanks, bro.
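One way to avoid re-parsing the dataset on every run is to write the examples to a CSV file once and reload that file afterwards. The following is only a minimal standard-library sketch: the helper names `save_examples_csv` and `load_examples_csv` are hypothetical, and it assumes each example can be reduced to a (tokens, label) pair whose tokens contain no spaces.

```python
import csv
import os
import tempfile


def save_examples_csv(pairs, path):
    """Write (tokens, label) pairs to a CSV file with a header row."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["text", "label"])
        for tokens, label in pairs:
            # Join tokens with single spaces so the text fits in one CSV cell.
            writer.writerow([" ".join(tokens), label])


def load_examples_csv(path):
    """Read the CSV back into a list of (tokens, label) pairs."""
    with open(path, newline="", encoding="utf-8") as f:
        reader = csv.DictReader(f)
        return [(row["text"].split(" "), row["label"]) for row in reader]


# Stand-in data. With the legacy API the pairs might be built as
# ((ex.text, ex.label) for ex in train_data.examples), which is an assumption.
pairs = [(["a", "great", "movie"], "pos"), (["too", "long", ",", "boring"], "neg")]
cache_path = os.path.join(tempfile.mkdtemp(), "imdb_train.csv")
save_examples_csv(pairs, cache_path)
reloaded = load_examples_csv(cache_path)
```

The cached file could then be loaded like any other CSV, which should be much cheaper than tokenizing the raw reviews again.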

  • @stephennfernandes
    @stephennfernandes 3 years ago

    Very nice tutorial, but I get warnings saying BucketIterator, Field and TabularDataset are being deprecated... Also, I can't scale BucketIterator to TPUs and multi-GPU setups. Any better alternatives?

  • @sabarishwarang6212
    @sabarishwarang6212 1 year ago

    Field and TabularDataset are deprecated now. Are there any alternatives?
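As a rough illustration of what an alternative could look like, the core of what Field did (building a vocabulary and turning tokens into indices) can be written in a few lines of plain Python. This is a sketch under assumptions, not the torchtext replacement API: `build_vocab`, `numericalize`, and the special tokens are hypothetical names chosen for the example.

```python
from collections import Counter

PAD, UNK = "<pad>", "<unk>"


def build_vocab(token_lists, min_freq=1):
    """Map each sufficiently frequent token to an integer index."""
    counts = Counter(tok for toks in token_lists for tok in toks)
    # Special tokens first, then the remaining tokens in sorted order
    # so that the mapping is deterministic.
    itos = [PAD, UNK] + sorted(t for t, c in counts.items() if c >= min_freq)
    stoi = {tok: i for i, tok in enumerate(itos)}
    return stoi, itos


def numericalize(tokens, stoi):
    """Replace tokens by their indices, falling back to the UNK index."""
    return [stoi.get(tok, stoi[UNK]) for tok in tokens]


corpus = [["the", "cat", "sat"], ["the", "dog"]]
stoi, itos = build_vocab(corpus)
ids = numericalize(["the", "bird"], stoi)  # "bird" is unseen, so it maps to UNK
```

Index lists like these can be padded and batched in a custom collate function passed to a regular PyTorch DataLoader, which is one common way to replace the legacy iterators.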

  • @MasterMan2015
    @MasterMan2015 2 years ago

    Thanks for doing that. How do we save it once we have created it?

  • @FEARESSERES
    @FEARESSERES 1 year ago

    10:20 I felt that pain :D, happened to me recently, too

  • @jeremiahjohnson6052
    @jeremiahjohnson6052 3 years ago +1

    Can you do a video on the updated Torchtext 0.9.0? I think they revamped much of this and the new features look pretty awesome with subword tokenization implemented. (i.e. 'sub', '_word')

  • @finix7419
    @finix7419 6 months ago

    torchtext had some changes it seems, can't import these modules with recent version

  • @AhmedIqbal
    @AhmedIqbal 4 years ago +2

    Please create a video on semantic segmentation using a PyTorch CNN. The dataset should contain cancer images plus ground-truth images, and the trained model should report the best IoU and accuracy of the proposed model.

    • @AladdinPersson
      @AladdinPersson  4 years ago +1

      I will try to keep that in mind, thank you for the comment!

  • @sagsriv
    @sagsriv 3 years ago

    Your examples are left padded but when I use the same bucket iterator on IMDB dataset, they are right padded. This is a bit confusing
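The difference between the two layouts comes down to which side the padding index is added on. The sketch below shows both with a hypothetical `pad_batch` helper; the `pad_first` flag name is borrowed from memory of the legacy Field options, so it is worth checking whether that option differs between the two setups being compared.

```python
def pad_batch(sequences, pad_id=0, pad_first=False):
    """Pad every sequence to the length of the longest one in the batch."""
    max_len = max(len(seq) for seq in sequences)
    padded = []
    for seq in sequences:
        filler = [pad_id] * (max_len - len(seq))
        # pad_first=True puts the padding on the left, otherwise on the right.
        padded.append(filler + list(seq) if pad_first else list(seq) + filler)
    return padded


batch = [[4, 5, 6], [7]]
right_padded = pad_batch(batch)                 # [[4, 5, 6], [7, 0, 0]]
left_padded = pad_batch(batch, pad_first=True)  # [[4, 5, 6], [0, 0, 7]]
```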

  • @ugestacoolie5998
    @ugestacoolie5998 4 months ago

    hi, it seems that torchtext got quite a bit of a changeover and this tutorial's contents are outdated, any chances you might wanna update it?

  • @le-0ne
    @le-0ne 3 years ago

    Very nice tutorial! While i was looking at torchtext, I actually came across the libraries torchnlp and allennlp. I couldn't really tell what the differences between them were. Have you worked with them ?

  • @salihbalci19
    @salihbalci19 1 year ago

    I keep getting this error, can you help me please...
    AttributeError: module 'torchtext.data' has no attribute 'Field'
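This AttributeError is consistent with what other comments in this thread report: Field was moved into torchtext.legacy and later removed altogether (0.12 is mentioned as no longer having it). A hedged way to find out which situation applies is to probe both import locations. The helper name `find_field_module` is hypothetical, and the sketch simply returns None when neither module provides Field.

```python
import importlib


def find_field_module():
    """Return the first importable torchtext module that defines Field, or None."""
    candidates = ("torchtext.legacy.data", "torchtext.data")
    for name in candidates:
        try:
            module = importlib.import_module(name)
        except Exception:
            # torchtext is not installed, or this release does not ship the module.
            continue
        if hasattr(module, "Field"):
            return module
    return None


data_module = find_field_module()
if data_module is None:
    print("No Field class found (torchtext missing or too new for the legacy API).")
else:
    print(f"Using Field from {data_module.__name__}")
```

If the result is None, the remaining options are pinning an older torchtext release or rewriting the data loading without Field.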

  • @shoebjoarder
    @shoebjoarder 4 years ago

    Great video! I wanted to ask, how to use TabularDataset to split train, validation and test?
    Should I use something like this below?
    train_data, valid_data = TabularDataset.splits(
        path="data", train="train.json", validation="valid.json", format="json", fields=fields
    )
    # splits() returns a tuple even for a single file, so unpack it
    (test_data,) = TabularDataset.splits(
        path="data", test="test.json", format="json", fields=fields
    )

    • @AladdinPersson
      @AladdinPersson  4 years ago

      Yes, that looks fine to me. From my understanding, and I believe this is how I showed it in the video, you need separate JSON/CSV files, and then in TabularDataset you specify those files using train, test, and optionally validation.

    • @shoebjoarder
      @shoebjoarder 4 years ago

      @@AladdinPersson Hi Aladdin, I probably missed the part where you talked about the validation split. Thank you for replying, the code is working. :)

  • @mehuljain4920
    @mehuljain4920 4 years ago

    Hi,
    I am trying to iterate over the training data and it's working fine, but for the test data it shows me an error.
    Could you please help me resolve that error?
    I would really appreciate it. Thanks

    • @beach2550
      @beach2550 4 years ago

      How? Imagine if you could do everything except for reading someone's mind, how would you answer that question?

  • @idobooks909
    @idobooks909 3 years ago

    Thanks! Which version of torchtext?

  • @fabianboro4686
    @fabianboro4686 1 year ago

    what a fancy math intro

  • @lingfengshen8559
    @lingfengshen8559 4 years ago

    Great video, I really learned a lot, thanks. When I run BucketIterator, it fails with the error "'int' object is not subscriptable". I checked my code but still have no idea where the fault is.

    • @pl4117
      @pl4117 3 years ago

      Make sure you are using batch_size as the argument name instead of batch_sizes when calling the bucket iterator. I had this issue too.

  • @m.j.8527
    @m.j.8527 3 years ago

    How can I pass a pandas DataFrame into this process (instead of loading a file)?
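Since the loader in the video reads from files, one workaround is to dump the in-memory rows to a JSON-lines file (one JSON object per line) and point the loader at that file. The sketch below uses a plain list of dicts as a stand-in for the DataFrame; with pandas, `df.to_dict("records")` should give the same shape of data. The helper name `write_json_lines` and the column names are assumptions made for the example.

```python
import json
import os
import tempfile


def write_json_lines(records, path):
    """Write one JSON object per line, the layout a JSON-lines reader expects."""
    with open(path, "w", encoding="utf-8") as f:
        for record in records:
            f.write(json.dumps(record) + "\n")


# Stand-in for df.to_dict("records") on a pandas DataFrame (assumption).
records = [
    {"quote": "good product", "score": 0.9},
    {"quote": "leaky", "score": -0.7},
]
json_path = os.path.join(tempfile.mkdtemp(), "train.json")
write_json_lines(records, json_path)

# Read the file back to confirm every line parses on its own.
with open(json_path, encoding="utf-8") as f:
    round_trip = [json.loads(line) for line in f]
```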

  • @mirambikasikdar5655
    @mirambikasikdar5655 4 years ago +1

    thank you!

  • @user-or7ji5hv8y
    @user-or7ji5hv8y 4 years ago

    Is BucketIterator being phased out in future releases?

    • @AladdinPersson
      @AladdinPersson  4 years ago

      Yes, I'm going to wait until that release and then update the torchtext tutorials. Hopefully the code from the Seq2Seq videos will still run, though.

  • @user-or7ji5hv8y
    @user-or7ji5hv8y 4 years ago

    Why can't we go directly from the word to word embedding, without the need to indexify?

    • @AladdinPersson
      @AladdinPersson  4 years ago

      Hm, not sure exactly what you're suggesting. Could you share some code (or state which part of the video you think is unnecessary)?

    • @user-or7ji5hv8y
      @user-or7ji5hv8y 4 years ago

      @@AladdinPersson Sorry, I think I understand now, after watching your Seq2Seq video. Thanks!
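For readers with the same question: an embedding layer is essentially a table with one row of numbers per vocabulary entry, and rows are selected by integer position, which is why words are converted to indices first. The toy sketch below illustrates the idea with plain lists; the vocabulary, the dimension, and the random values are made up for the example, and a real layer would learn its rows during training.

```python
import random

random.seed(0)

vocab = {"<unk>": 0, "the": 1, "cat": 2}
embedding_dim = 4

# One row of floats per vocabulary index.
embedding_table = [
    [random.uniform(-1, 1) for _ in range(embedding_dim)] for _ in vocab
]


def embed(tokens):
    """Look up each token's index, then select the matching table row."""
    ids = [vocab.get(tok, vocab["<unk>"]) for tok in tokens]
    return [embedding_table[i] for i in ids]


vectors = embed(["the", "cat", "dog"])  # "dog" is unknown and shares the <unk> row
```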

  • @lazypunk794
    @lazypunk794 4 years ago +1

    Great tutorial, but sadly I think it's already outdated. Torchtext has deprecated 'Field' and some other classes, and printing the keys and values of the dict (10:27) no longer gives proper representations of the objects. They probably broke something while updating the code.

    • @AladdinPersson
      @AladdinPersson  4 years ago +3

      Will look into it more but I think you might be right about this unfortunately... I just really hope all Seq2Seq tutorials that build on this will still be able to run :/

    • @lazypunk794
      @lazypunk794 4 years ago +1

      @@AladdinPersson Sorry, false alarm. The mistake in my case was that I used TabularDataset.splits() when I had just a train.csv and no test. There is no need of splits method in that case. I was just mindlessly copying from the video. My bad. The deprecation thing is just a warning as of now. The deprecated stuff will go into torchtext.legacy, so I guess your code will still work in 0.8 with some import changes.

  • @rishadkt837
    @rishadkt837 2 years ago

    Field has become legacy

  • @user-or7ji5hv8y
    @user-or7ji5hv8y 4 years ago

    Is the GitHub page for this available?

    • @AladdinPersson
      @AladdinPersson  4 years ago +2

      Yes: github.com/aladdinpersson/Machine-Learning-Collection/blob/master/ML/Pytorch/more_advanced/torchtext/torchtext_tutorial1.py

  • @doyourealise
    @doyourealise 4 years ago

    Hey, please create a video on the legacy torchtext.data.Field. Also, with the upcoming update, how can we load Field in torchtext 0.8?

    • @AladdinPersson
      @AladdinPersson  4 years ago

      Yeah I saw that torchtext is going to update, I will wait a little bit to make sure they aren't going to make any additional changes and then update my previous videos to the new use of the API

  • @ZobeirRaisi
    @ZobeirRaisi 4 years ago

    Thanks

  • @mummyskitchen5311
    @mummyskitchen5311 4 years ago

    JSONDecodeError: Expecting value: line 1 column 1 (char 0)
    ______________
    Got this error at:
    JSONDecodeError Traceback (most recent call last)
    in ()
    4 test ='test.json',
    5 format = 'json',
    ----> 6 fields=fields
    7 )
    8
    6 frames
    /usr/lib/python3.6/json/decoder.py in raw_decode(self, s, idx)
    355 obj, end = self.scan_once(s, idx)
    356 except StopIteration as err:
    --> 357 raise JSONDecodeError("Expecting value", s, err.value) from None
    358 return obj, end
    ____________
    my dataframe:
       quote                                                 score
    0  good product pretty satisfied quality diaper g...      0.9
    1  leak extremely leaky doesnt soak                      -0.7
    2  version good leak nee version leak night gud s...     -0.6
    3  good kid nice product daughter loving hair shi...      0.9
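"Expecting value: line 1 column 1 (char 0)" is the message the json module raises when the text it is given does not start with a JSON value at all. One possible cause, offered only as a guess, is that the file passed with format = 'json' is actually CSV or begins with a header or blank line, since the loader parses the file line by line as JSON. The sketch below uses a hypothetical helper, `first_invalid_json_line`, to find the first offending line.

```python
import json


def first_invalid_json_line(lines):
    """Return (line_number, error_message) for the first non-JSON line, else None."""
    for number, line in enumerate(lines, start=1):
        try:
            json.loads(line)
        except json.JSONDecodeError as err:
            return number, str(err)
    return None


# A well-formed JSON-lines sample and a sample that starts with a CSV header.
good_lines = ['{"quote": "good product", "score": 0.9}']
bad_lines = ["quote,score", '{"quote": "leak", "score": -0.7}']

good_result = first_invalid_json_line(good_lines)
bad_result = first_invalid_json_line(bad_lines)
```

To check a real file, the helper can be called on an open file handle, for example `first_invalid_json_line(open("test.json", encoding="utf-8"))`, where the file name is taken from the traceback above.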

  • @vijaypalmanit
    @vijaypalmanit 3 years ago

    Field is deprecated, so the tutorial is no longer relevant...