I found it very difficult to get used to torchtext docs, but then I found your video :) Many thanks!
Could you make a video for the new version of torchtext?
I somehow found the potato quote very inspiring 🤔.
There are definitely mislabeled samples in that dataset.
Yo, the new version of torchtext (0.12) does not have Fields.
Very helpful tutorial.
Is it possible for you to make a tutorial on how to load data that is stored in a SQL database?
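A minimal sketch of one way to start, assuming an SQLite database with a hypothetical `reviews` table that has `text` and `label` columns. It uses only Python's built-in `sqlite3` module and returns plain `(text, label)` tuples; turning those into a torchtext or plain PyTorch dataset is a separate step not shown here.

```python
import sqlite3
from typing import List, Tuple


def load_text_label_rows(conn: sqlite3.Connection,
                         table: str = "reviews") -> List[Tuple[str, str]]:
    """Read (text, label) pairs from a table with `text` and `label` columns.

    The table and column names are hypothetical; adjust them to your schema.
    `table` is formatted into the SQL string, so only pass trusted names.
    """
    cursor = conn.execute(f"SELECT text, label FROM {table} ORDER BY rowid")
    return [(text, label) for text, label in cursor.fetchall()]


if __name__ == "__main__":
    # Build a tiny in-memory database so the example runs on its own.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE reviews (text TEXT, label TEXT)")
    conn.executemany(
        "INSERT INTO reviews (text, label) VALUES (?, ?)",
        [("a great movie", "pos"), ("a boring plot", "neg")],
    )
    print(load_text_label_rows(conn))
    # [('a great movie', 'pos'), ('a boring plot', 'neg')]
    conn.close()
```

For other databases the same pattern applies with their DB-API driver; only the connection line changes.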
I have a question: how can I save the train or test data from the IMDB dataset (train_data, test_data = datasets.IMDB.splits(TEXT, LABEL)) as a CSV file? Every time, it takes me 15 mins to load the data. Thanks bro
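A rough sketch, assuming the legacy torchtext API used in the video, where each item in `train_data.examples` has a tokenized `.text` list and a `.label`. The helper name `examples_to_csv` is hypothetical. Because tokens are re-joined with spaces, the CSV will not reproduce the original review text byte for byte.

```python
import csv
from typing import Iterable


def examples_to_csv(examples: Iterable, path: str) -> int:
    """Write examples exposing `.text` (a token list) and `.label` to a CSV file.

    Assumes the legacy torchtext Example layout; returns the number of rows written.
    """
    count = 0
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["text", "label"])  # header row
        for ex in examples:
            # Re-join tokens with spaces; csv.writer handles quoting of commas.
            writer.writerow([" ".join(ex.text), ex.label])
            count += 1
    return count
```

You would call something like `examples_to_csv(train_data.examples, "imdb_train.csv")` once, and on later runs load it with `TabularDataset(path="imdb_train.csv", format="csv", skip_header=True, fields=[("text", TEXT), ("label", LABEL)])`. That skips reading the raw IMDB files, but if most of the 15 minutes is spent in a slow tokenizer, re-tokenizing the CSV will cost that time again.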
Very nice tutorial, but I get warnings saying BucketIterator, Field and TabularDataset are being deprecated... Also, I can't scale BucketIterator to TPUs and multi-GPU setups. Any better alternatives??
Field and TabularDataset are deprecated now. Are there any alternatives?
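One plain-PyTorch direction, sketched under assumptions: replace BucketIterator with a list of index batches grouped by sequence length, and hand it to `torch.utils.data.DataLoader` through its `batch_sampler` argument. This is a simplified version of the bucketing idea, not BucketIterator's exact algorithm, and `bucket_batches` is a hypothetical helper.

```python
import random
from typing import List, Sequence


def bucket_batches(lengths: Sequence[int], batch_size: int,
                   shuffle: bool = True, seed: int = 0) -> List[List[int]]:
    """Group example indices into batches of similar sequence length.

    Sorting by length keeps padding per batch small; shuffling the order of
    the batches (not their contents) avoids feeding lengths in strictly
    increasing order.
    """
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    order = sorted(range(len(lengths)), key=lambda i: lengths[i])
    batches = [order[i:i + batch_size]
               for i in range(0, len(order), batch_size)]
    if shuffle:
        random.Random(seed).shuffle(batches)
    return batches
```

For example, `DataLoader(dataset, batch_sampler=bucket_batches(lengths, 32, seed=epoch), collate_fn=pad_fn)` with your own padding `collate_fn`, rebuilt each epoch so the batch order changes. For multiple processes, one option (an assumption, not covered in the video) is giving each rank its own slice such as `batches[rank::world_size]`, keeping the batch count equal across ranks.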
Thanks for doing that. How do we save it once we've created it?
10:20 I felt that pain :D It happened to me too, recently.
Can you do a video on the updated torchtext 0.9? I think they revamped much of this, and the new features, such as subword tokenization (e.g. 'sub', '_word'), look pretty awesome.
It seems torchtext has had some changes; I can't import these modules with the recent version.
Please create a video on semantic segmentation using a PyTorch CNN. The dataset should contain cancer images plus ground-truth images, and the trained model should report the best IoU and accuracy.
I will try to keep that in mind, thank you for the comment!
Your examples are left-padded, but when I use the same BucketIterator on the IMDB dataset, they are right-padded. This is a bit confusing.
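The legacy `Field` takes a `pad_first` argument that defaults to `False`, meaning padding is appended on the right; `pad_first=True` prepends it instead. Whether that explains the difference from the video depends on how its `Field` was configured, which is an assumption here. A pure-Python sketch of the two behaviours (`pad_batch` is a hypothetical helper):

```python
from typing import List, Sequence


def pad_batch(seqs: Sequence[Sequence[str]], pad_token: str = "<pad>",
              pad_first: bool = False) -> List[List[str]]:
    """Pad token lists to the longest length in the batch.

    pad_first=False appends padding on the right; pad_first=True prepends it,
    mirroring the meaning of the legacy Field's `pad_first` flag.
    """
    max_len = max((len(s) for s in seqs), default=0)
    padded = []
    for s in seqs:
        pads = [pad_token] * (max_len - len(s))
        padded.append(pads + list(s) if pad_first else list(s) + pads)
    return padded


print(pad_batch([["a", "b"], ["c"]]))                  # [['a', 'b'], ['c', '<pad>']]
print(pad_batch([["a", "b"], ["c"]], pad_first=True))  # [['a', 'b'], ['<pad>', 'c']]
```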
Hi, it seems that torchtext has had quite a changeover and this tutorial's contents are outdated. Any chance you might wanna update it?
Very nice tutorial! While I was looking at torchtext, I came across the libraries torchnlp and allennlp. I couldn't really tell what the differences between them were. Have you worked with them?
I keep getting this error, can you please help me...
AttributeError: module 'torchtext.data' has no attribute 'Field'
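That AttributeError is what you get when the installed torchtext no longer has `Field` under `torchtext.data`. A hedged sketch of an import that tries the legacy location first; the version ranges in the comments are approximate. On torchtext 0.12 or newer none of these classes exist, so the names end up as `None`, and you would need an older torchtext release matching your PyTorch version, or the newer API.

```python
# Field and friends moved to torchtext.legacy.data (roughly torchtext 0.9-0.11)
# and were removed entirely in 0.12, so try the legacy location first.
try:
    from torchtext.legacy.data import Field, TabularDataset, BucketIterator
except ImportError:
    try:
        # Older releases (0.8 and earlier) still expose them here.
        from torchtext.data import Field, TabularDataset, BucketIterator
    except ImportError:
        # torchtext 0.12+ (or torchtext not installed): no legacy classes.
        Field = TabularDataset = BucketIterator = None

HAS_LEGACY_FIELD = Field is not None
print("legacy Field available:", HAS_LEGACY_FIELD)
```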
Great video! I wanted to ask, how to use TabularDataset to split train, validation and test?
Should I use something like this below?
train_data, valid_data = TabularDataset.splits(
    path="data", train="train.json", test="valid.json", format="json", fields=fields
)
test_data = TabularDataset.splits(
    path="data", test="test.json", format="json", fields=fields
)
Yes that looks fine to me, from my understanding and this is I believe how I showed it in the video is that you need separate json/csv files and then in TabularDataset you specify those files using train, test and also possibly validation data
@AladdinPersson Hi Aladdin, I probably missed the part where you talked about the validation split. Thank you for replying, the code is working. :)
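Since each split needs its own file, here is a sketch of producing train/validation/test lists from a single list of records, assuming you start with one dataset. `split_records` is a hypothetical helper using a seeded shuffle; rounding leftovers go to the test split.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_records(records: Sequence[T], ratios=(0.8, 0.1, 0.1),
                  seed: int = 0) -> Tuple[List[T], List[T], List[T]]:
    """Shuffle once, then cut into train/valid/test by the given ratios."""
    if len(ratios) != 3 or abs(sum(ratios) - 1.0) > 1e-9:
        raise ValueError("ratios must be three numbers summing to 1")
    items = list(records)
    random.Random(seed).shuffle(items)  # fixed seed -> reproducible splits
    n_train = int(len(items) * ratios[0])
    n_valid = int(len(items) * ratios[1])
    return (items[:n_train],
            items[n_train:n_train + n_valid],
            items[n_train + n_valid:])
```

Write each list to train.json, valid.json and test.json (one JSON object per line). The legacy `TabularDataset.splits` also accepts a `validation` argument, so a single call such as `train_data, valid_data, test_data = TabularDataset.splits(path="data", train="train.json", validation="valid.json", test="test.json", format="json", fields=fields)` should return all three datasets. That also avoids a separate `splits(test=...)` call, which returns a one-element tuple rather than a dataset.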
Hi,
I am trying to iterate over the training data and it's working fine, but for the test data it shows me an error.
Could you please help me resolve that error?
I would really appreciate it. Thanks
How? Imagine if you could do everything except for reading someone's mind, how would you answer that question?
Thanks! Which version of torchtext?
what a fancy math intro
Great video, I really learned a lot, thanks. When I run BucketIterator, I get the error 'int' object is not subscriptable. I checked my code but still have no idea where the fault is.
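Make sure you are using batch_size as the argument name instead of batch_sizes when calling the BucketIterator. In BucketIterator.splits, batch_sizes expects one size per split, e.g. (32, 64, 64), so passing a plain int there raises exactly this error. I had this issue too.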
how can I pass pandas dataframe into this process (instead of loading the file)?
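One low-effort route, assuming the legacy `TabularDataset` from the video: write the DataFrame rows out as JSON lines (one object per line, which is what `format="json"` reads) and load that file as usual. pandas can do this directly with `df.to_json(path, orient="records", lines=True)`; the hypothetical helper below shows the same layout with the standard library, taking records such as `df.to_dict("records")`. `json.dumps` may reject NumPy scalar types, in which case the pandas method is the safer choice.

```python
import json
from typing import Dict, Iterable


def write_json_lines(records: Iterable[Dict], path: str) -> int:
    """Write one JSON object per line; returns the number of lines written."""
    count = 0
    with open(path, "w", encoding="utf-8") as f:
        for record in records:
            # ensure_ascii=False keeps non-ASCII text readable in the file.
            f.write(json.dumps(record, ensure_ascii=False) + "\n")
            count += 1
    return count
```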
thank you!
Is BucketIterator being phased out in future releases?
Yes, I'm going to wait until that release and then update the torchtext tutorials. Hopefully the Seq2Seq code will still be able to run, though.
Why can't we go directly from the word to word embedding, without the need to indexify?
Hm, not sure exactly what you're suggesting. Could you share some code (or say which part of the video you think is unnecessary)?
@AladdinPersson Sorry, I think I understand now, after watching your Seq2Seq video. Thanks!
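On the question of skipping the index step: an embedding layer such as `torch.nn.Embedding` is a lookup table whose rows are addressed by integer ids (its input is a tensor of indices), so words have to be mapped to ids first. A pure-Python sketch of that two-step lookup, with a list of lists standing in for the weight matrix; `build_vocab` and `embed` are hypothetical helpers, not torchtext functions.

```python
from typing import Dict, List, Sequence


def build_vocab(tokens: Sequence[str], specials=("<unk>", "<pad>")) -> Dict[str, int]:
    """Map each distinct token to an integer id, with special tokens first."""
    stoi = {tok: i for i, tok in enumerate(specials)}
    for tok in tokens:
        if tok not in stoi:
            stoi[tok] = len(stoi)
    return stoi


def embed(tokens: Sequence[str], stoi: Dict[str, int],
          table: List[List[float]]) -> List[List[float]]:
    """Look up one embedding row per token; unknown words fall back to <unk>."""
    ids = [stoi.get(tok, stoi["<unk>"]) for tok in tokens]  # the "indexify" step
    return [table[i] for i in ids]  # row i of the table is token i's vector


stoi = build_vocab("the cat sat".split())
table = [[float(i), float(i)] for i in range(len(stoi))]  # toy 2-d vectors
print(embed(["cat", "dog"], stoi, table))  # [[3.0, 3.0], [0.0, 0.0]]
```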
Great tutorial, but sadly I think it's already outdated. Torchtext has deprecated 'Field' and some other classes, and printing the keys and values of the dict (10:27) no longer gives proper representations of the objects; they probably broke something while updating the code.
Will look into it more but I think you might be right about this unfortunately... I just really hope all Seq2Seq tutorials that build on this will still be able to run :/
@AladdinPersson Sorry, false alarm. My mistake was using TabularDataset.splits() when I had only a train.csv and no test file; there's no need for the splits method in that case. I was just mindlessly copying from the video. My bad. The deprecation is only a warning for now. The deprecated classes will move to torchtext.legacy, so I guess your code will still work in 0.8 with some import changes.
Field has become legacy
Is the GitHub page for this available?
Yes: github.com/aladdinpersson/Machine-Learning-Collection/blob/master/ML/Pytorch/more_advanced/torchtext/torchtext_tutorial1.py
Hey, please create a video on the legacy torchtext.data.Field. Also, with the upcoming update, how can we load Field in torchtext 0.8?
Yeah I saw that torchtext is going to update, I will wait a little bit to make sure they aren't going to make any additional changes and then update my previous videos to the new use of the API
Thanks
Happy you found it useful :)
JSONDecodeError: Expecting value: line 1 column 1 (char 0)
______________
Got this error at:
JSONDecodeError Traceback (most recent call last)
in ()
4 test ='test.json',
5 format = 'json',
----> 6 fields=fields
7 )
8
6 frames
/usr/lib/python3.6/json/decoder.py in raw_decode(self, s, idx)
355 obj, end = self.scan_once(s, idx)
356 except StopIteration as err:
--> 357 raise JSONDecodeError("Expecting value", s, err.value) from None
358 return obj, end
____________
my dataframe:
quote score
0 good product pretty satisfied quality diaper g... 0.9
1 leak extremely leaky doesnt soak -0.7
2 version good leak nee version leak night gud s... -0.6
3 good kid nice product daughter loving hair shi... 0.9
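For `format="json"`, the legacy `TabularDataset` parses the file one line at a time, so "line 1 column 1 (char 0)" refers to the start of whichever line failed, not necessarily the first line of the file. It means the first character of that line is not valid JSON, which typically happens when the file actually contains CSV text (for example, written with `to_csv`) instead of one JSON object per line. That is a likely cause, not a confirmed one for this case. A small stdlib check to locate the offending line; `first_invalid_json_line` is a hypothetical helper name.

```python
import json
from typing import Optional, Tuple


def first_invalid_json_line(path: str) -> Optional[Tuple[int, str]]:
    """Return (line_number, start_of_line) for the first line json.loads rejects.

    Mirrors a line-by-line JSON reader: each line must be one complete JSON
    value. Returns None if every line parses.
    """
    with open(path, encoding="utf-8") as f:
        for line_no, line in enumerate(f, start=1):
            try:
                json.loads(line)
            except json.JSONDecodeError:
                return line_no, line.rstrip("\n")[:40]
    return None
```

If it reports something like `(1, 'quote,score')`, re-saving the DataFrame with `df.to_json("train.json", orient="records", lines=True)` should give the expected layout.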
Field is deprecated, so this tutorial is no longer relevant...