Pytorch Torchtext Tutorial 1: Custom Datasets and loading JSON/CSV/TSV files

Aladdin Persson

29K views

Published: Oct 24, 2024

Comments • 49

@buithanhlam3726 3 years ago ⁺²
I found it very difficult to get used to the torchtext docs, but then I found your video :) Many thanks!
@salihbalci19 1 year ago ⁺²
Could you make a video for the new version of torchtext?
@subhasish661411 3 years ago ⁺¹
I somehow found the potato quote very inspiring 🤔.
@AladdinPersson 3 years ago ⁺¹
Definitely mislabeled samples in that dataset
@dhawalsalvi8079 2 years ago ⁺¹
Yo, the new version of torchtext (0.12) does not have Fields
@henricbohm8455 3 years ago ⁺¹
Very helpful tutorial.
Could you make a tutorial on how to load data that is stored in a SQL database?
@ximingdong503 3 years ago
I have a question: how can I save the train or test data from the IMDB dataset (train_data, test_data = datasets.IMDB.splits(TEXT, LABEL)) as a CSV file? Every time, I need 15 minutes to load the data. Thanks bro
@stephennfernandes 3 years ago
Very nice tutorial, but I get warnings saying BucketIterator, Field and TabularDataset are being deprecated... Also, I can't scale BucketIterator to TPUs and multiple GPUs. Any better alternatives?
@sabarishwarang6212 1 year ago
Field and TabularDataset are deprecated now. Are there any alternatives?
@MasterMan2015 2 years ago
Thanks for doing that. How do we save it once we have created it?
@FEARESSERES 1 year ago
10:20 I felt that pain :D, happened to me too recently
@jeremiahjohnson6052 3 years ago ⁺¹
Can you do a video on the updated torchtext 0.9.0? I think they revamped much of this and the new features look pretty awesome with subword tokenization implemented. (i.e. 'sub', '_word')
@finix7419 6 months ago
torchtext has had some changes, it seems; I can't import these modules with a recent version
@AhmedIqbal 4 years ago ⁺²
Please create a video on semantic segmentation using a PyTorch CNN. The dataset must contain cancer images plus ground-truth images, and the trained model should return the best IoU and accuracy of the proposed model.
@AladdinPersson 4 years ago ⁺¹
I will try to keep that in mind, thank you for the comment!
@sagsriv 3 years ago
Your examples are left-padded, but when I use the same BucketIterator on the IMDB dataset, they are right-padded. This is a bit confusing
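[Editor's note] A minimal sketch of the two padding layouts this comment contrasts, in plain Python rather than torchtext. The names pad_batch and PAD_IDX are hypothetical and chosen for illustration only; which side a real iterator pads on depends on how the field was configured (in legacy torchtext this is, as far as I recall, the Field argument pad_first), so treat this as an illustration and not as the video's code.

```python
PAD_IDX = 1  # assumed index of the padding token (hypothetical value)


def pad_batch(sequences, pad_idx=PAD_IDX, pad_first=False):
    """Pad every sequence of token indices to the length of the longest one.

    pad_first=False appends padding (right-padded);
    pad_first=True prepends padding (left-padded).
    """
    max_len = max(len(seq) for seq in sequences)
    padded = []
    for seq in sequences:
        filler = [pad_idx] * (max_len - len(seq))
        padded.append(filler + list(seq) if pad_first else list(seq) + filler)
    return padded


batch = [[5, 8, 2], [7], [4, 9]]
right_padded = pad_batch(batch)                  # padding after the tokens
left_padded = pad_batch(batch, pad_first=True)   # padding before the tokens
```

Both layouts carry the same tokens; they differ only in where the padding index sits, which matters when a model reads the last time step of each sequence.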
@ugestacoolie5998 4 months ago
Hi, it seems that torchtext has changed quite a bit and this tutorial's contents are outdated. Any chance you might want to update it?
@le-0ne 3 years ago
Very nice tutorial! While I was looking at torchtext, I came across the libraries torchnlp and allennlp. I couldn't really tell what the differences between them were. Have you worked with them?
@salihbalci19 1 year ago
I always get this error, can you help me please...
AttributeError: module 'torchtext.data' has no attribute 'Field'
@shoebjoarder 4 years ago
Great video! I wanted to ask, how to use TabularDataset to split train, validation and test?
Should I use something like this below?
train_data, valid_data = TabularDataset.splits(
path="data", train="train.json", test="valid.json", format="json", fields=fields
)
test_data = TabularDataset.splits(
path="data", test="test.json", format="json", fields=fields
)
@AladdinPersson 4 years ago
Yes, that looks fine to me. From my understanding, and I believe this is how I showed it in the video, you need separate JSON/CSV files, and then in TabularDataset you specify those files using train, test and possibly also validation.
@shoebjoarder 4 years ago
@@AladdinPersson Hi Aladdin, I probably missed you talking about the validation split. Thank you for replying, the code is working. :)
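[Editor's note] Since the reply above says the splits are expected as separate files, here is a hedged sketch of one way to produce them from a single CSV using only the standard library. The function name split_csv, the file names and the 10%/10% fractions are assumptions made for illustration; nothing here comes from the video.

```python
import csv
import random
import tempfile
from pathlib import Path


def split_csv(src, out_dir, valid_frac=0.1, test_frac=0.1, seed=0):
    """Shuffle the rows of src and write train.csv, valid.csv and test.csv.

    Each output file repeats the header row. Returns the row count per file.
    """
    with open(src, newline="", encoding="utf-8") as f:
        reader = csv.reader(f)
        header = next(reader)
        rows = list(reader)

    random.Random(seed).shuffle(rows)  # fixed seed keeps the split reproducible
    n_valid = int(len(rows) * valid_frac)
    n_test = int(len(rows) * test_frac)
    parts = {
        "valid.csv": rows[:n_valid],
        "test.csv": rows[n_valid:n_valid + n_test],
        "train.csv": rows[n_valid + n_test:],
    }

    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    for name, part in parts.items():
        with open(out_dir / name, "w", newline="", encoding="utf-8") as f:
            writer = csv.writer(f)
            writer.writerow(header)
            writer.writerows(part)
    return {name: len(part) for name, part in parts.items()}


# Demo on a small synthetic file (made-up rows, for illustration only).
with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "all.csv"
    with open(src, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["quote", "score"])
        writer.writerows([[f"sample text {i}", i % 2] for i in range(20)])
    counts = split_csv(src, Path(tmp) / "data")
    written = sorted(p.name for p in (Path(tmp) / "data").iterdir())
```

The three resulting files could then be named in the train, validation and test arguments that the reply mentions.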
@mehuljain4920 4 years ago
Hi,
I am trying to iterate over the training data and it's working fine, but for the test data it is showing me an error.
Could you please help me resolve that error?
I would really appreciate it. Thanks
@beach2550 4 years ago
How? Imagine if you could do everything except for reading someone's mind, how would you answer that question?
@idobooks909 3 years ago
Thanks! Which version of torchtext?
@fabianboro4686 1 year ago
what a fancy math intro
@lingfengshen8559 4 years ago
Great video, I really learned a lot, thanks. When I run BucketIterator, it raises the error 'int' object is not subscriptable. I checked my code but still have no idea where the fault is.
@pl4117 3 years ago
Make sure you are using batch_size as the argument name instead of batch_sizes when calling the bucket iterator. I had this issue too.
@m.j.8527 3 years ago
How can I pass a pandas DataFrame into this process (instead of loading the file)?
@mirambikasikdar5655 4 years ago ⁺¹
thank you!
@user-or7ji5hv8y 4 years ago
Is BucketIterator being phased out in future releases?
@AladdinPersson 4 years ago
Yes, I'm going to wait until that release and then update the torchtext tutorials. Hopefully the code from Seq2Seq will still be able to run, though.
@user-or7ji5hv8y 4 years ago
Why can't we go directly from the word to the word embedding, without the need to indexify?
@AladdinPersson 4 years ago
Hm, not sure exactly what you're suggesting. Could you share some code (or state which part of the video you think is unnecessary)?
@user-or7ji5hv8y 4 years ago
@@AladdinPersson Sorry, I think I understand now, after watching your Seq2Seq video. Thanks!
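[Editor's note] For readers with the same question, a rough sketch of why the index step exists: an embedding layer is essentially a table of vectors addressed by integer row, so each word first needs a row number. The sketch uses plain Python lists instead of a real embedding layer; build_vocab, numericalize, EMBED_DIM and the example sentences are all hypothetical names and data, not taken from the video.

```python
import random


def build_vocab(tokenized_sentences, specials=("<unk>", "<pad>")):
    """Map each distinct token to a row index; special tokens come first."""
    stoi = {tok: i for i, tok in enumerate(specials)}
    for sentence in tokenized_sentences:
        for tok in sentence:
            stoi.setdefault(tok, len(stoi))  # new tokens get the next free index
    return stoi


def numericalize(tokens, stoi):
    """Replace tokens with their indices; unknown tokens map to <unk>."""
    return [stoi.get(tok, stoi["<unk>"]) for tok in tokens]


sentences = [["the", "potato", "is", "wise"], ["the", "quote", "is", "short"]]
stoi = build_vocab(sentences)

EMBED_DIM = 4  # assumed embedding size, for illustration
rng = random.Random(0)
# One randomly initialised vector per vocabulary entry; row i belongs to index i.
embedding_table = [[rng.uniform(-1, 1) for _ in range(EMBED_DIM)] for _ in stoi]

indices = numericalize(["the", "unseen", "potato"], stoi)
vectors = [embedding_table[i] for i in indices]  # the lookup needs integers
```

The word "unseen" is not in the vocabulary, so it falls back to the row reserved for `<unk>`; that fallback is one reason the mapping is kept as an explicit step.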
@lazypunk794 4 years ago ⁺¹
Great tutorial, but sadly I think it's already outdated. Torchtext has deprecated 'Field' and some other classes, and printing the keys and values of the dict (10:27) doesn't give proper representations of the objects anymore. They probably broke something while updating the code
@AladdinPersson 4 years ago ⁺³
Will look into it more but I think you might be right about this unfortunately... I just really hope all Seq2Seq tutorials that build on this will still be able to run :/
@lazypunk794 4 years ago ⁺¹
@@AladdinPersson Sorry, false alarm. The mistake in my case was that I used TabularDataset.splits() when I had just a train.csv and no test file. There is no need for the splits method in that case. I was just mindlessly copying from the video. My bad. The deprecation thing is just a warning as of now. The deprecated stuff will go into torchtext.legacy, so I guess your code will still work in 0.8 with some import changes.
@rishadkt837 2 years ago
Field has become legacy
@user-or7ji5hv8y 4 years ago
Is the GitHub page for this available?
@AladdinPersson 4 years ago ⁺²
Yes: github.com/aladdinpersson/Machine-Learning-Collection/blob/master/ML/Pytorch/more_advanced/torchtext/torchtext_tutorial1.py
@doyourealise 4 years ago
Hey, please create a video on the legacy torchtext.data.field. Also, with the upcoming update, how can we load Field in torchtext 0.8?
@AladdinPersson 4 years ago
Yeah, I saw that torchtext is going to update. I will wait a little bit to make sure they aren't going to make any additional changes, and then update my previous videos to the new use of the API
@ZobeirRaisi 4 years ago
Thanks
@AladdinPersson 4 years ago
Happy you found it useful :)
@mummyskitchen5311 4 years ago
JSONDecodeError: Expecting value: line 1 column 1 (char 0)
______________
Got this error at:
JSONDecodeError Traceback (most recent call last)
in ()
4 test ='test.json',
5 format = 'json',
----> 6 fields=fields
7 )
8
6 frames
/usr/lib/python3.6/json/decoder.py in raw_decode(self, s, idx)
355 obj, end = self.scan_once(s, idx)
356 except StopIteration as err:
--> 357 raise JSONDecodeError("Expecting value", s, err.value) from None
358 return obj, end
____________
my dataframe:
quote score
0 good product pretty satisfied quality diaper g... 0.9
1 leak extremely leaky doesnt soak -0.7
2 version good leak nee version leak night gud s... -0.6
3 good kid nice product daughter loving hair shi... 0.9
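[Editor's note] One possible cause of the error above, offered as a guess and not a diagnosis: as far as I know, the legacy TabularDataset with format='json' parses each line of the file as its own JSON object, so a file that is really CSV (or that starts with a blank line) fails at line 1, column 1. The sketch below uses only the standard library; write_json_lines and check_json_lines are hypothetical helper names, and the second record is made up for illustration.

```python
import json
import tempfile
from pathlib import Path

records = [
    {"quote": "leak extremely leaky doesnt soak", "score": -0.7},
    {"quote": "illustrative made-up review", "score": 0.5},  # not from the thread
]


def write_json_lines(records, path):
    """Write one JSON object per line (the JSON Lines layout)."""
    with open(path, "w", encoding="utf-8") as f:
        for record in records:
            f.write(json.dumps(record) + "\n")


def check_json_lines(path):
    """Return (line_number, message) for every line that is not valid JSON."""
    problems = []
    with open(path, encoding="utf-8") as f:
        for lineno, line in enumerate(f, start=1):
            try:
                json.loads(line)
            except json.JSONDecodeError as err:
                problems.append((lineno, err.msg))
    return problems


with tempfile.TemporaryDirectory() as tmp:
    good = Path(tmp) / "test.json"
    write_json_lines(records, good)
    good_problems = check_json_lines(good)   # no problems expected

    # A CSV file saved under a .json name reproduces "Expecting value".
    bad = Path(tmp) / "bad.json"
    bad.write_text("quote,score\nleak extremely leaky doesnt soak,-0.7\n",
                   encoding="utf-8")
    bad_problems = check_json_lines(bad)
```

If the data already lives in a pandas DataFrame, `df.to_json(path, orient="records", lines=True)` writes the same one-object-per-line layout.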
@vijaypalmanit 3 years ago
Field is deprecated, so the tutorial is no longer relevant...