This semester, I don't know how many times I've watched your videos. Thanks!
I'm so glad I found your YouTube channel. Is that you, Snowden? I'm really glad because I have a natural language processing presentation in class and a demo for the test. I really want to say thanks to you.
Would love to see more videos on NLP, keep up the great work! :)
Corpus (Singular)
Corpuses/Corpora (Plural)
Putting this out here for confused learners like me. I hope it helps!
These tutorials are great, thanks for sharing!
New follower/subscriber, and still really enjoying these old videos! Thank you! Do you have any videos or suggestions on how to take tokenized/stemmed words and put them in a dataframe for analytics use? Thanks again for the great content!
"Whoa. Everybody settle down." 😆
Wow, thanks man! Quick question: what if we do stemming first and then follow with PunktSentenceTokenizer? The past tense / present continuous forms would probably be stemmed down to present simple, so the tense might not be recognized by PunktSentenceTokenizer?
I am writing a program that recursively defines a word to a set recursion depth. The problem is that almost all words have multitudes of definitions, so I need to eliminate definitions that are not appropriate to the context. Knowing the POS will be handy to narrow down the possible definitions.
all of your tutorials are totally great :)
Hey I am quite perplexed with the following part of code :-
custom_sent_tokenizer = PunktSentenceTokenizer(train_text)
tokenized = custom_sent_tokenizer.tokenize(sample_text)
I mean, first "custom_sent_tokenizer = PunktSentenceTokenizer(train_text)" separates the text into sentences and stores the result in custom_sent_tokenizer; then what is "tokenized = custom_sent_tokenizer.tokenize(sample_text)" doing, and how does it work?
Please help!
'PunktSentenceTokenizer' is a class.
'custom_sent_tokenizer = PunktSentenceTokenizer(train_text)' creates an instance of 'PunktSentenceTokenizer' with the parameter 'train_text'; you can say it trains the tokenizer.
Then calling the method, i.e. 'tokenized = custom_sent_tokenizer.tokenize(sample_text)', tokenizes the sample text and stores the returned list in the variable 'tokenized'.
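That two-step usage can be sketched like this (the training and sample strings here are hypothetical stand-ins for the speeches used in the video):

```python
from nltk.tokenize import PunktSentenceTokenizer

# Hypothetical stand-ins; in the video these are the 2005 and 2006
# State of the Union speeches loaded from nltk.corpus.state_union.
train_text = "The economy grew last year. We will keep working. Thank you all."
sample_text = "Tonight I report real progress. Our plan is working."

# Creating the instance trains the unsupervised Punkt model on train_text.
custom_sent_tokenizer = PunktSentenceTokenizer(train_text)

# Calling .tokenize() on the trained instance splits sample_text into sentences.
tokenized = custom_sent_tokenizer.tokenize(sample_text)
print(tokenized)
```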
Hey, really enjoy your videos on Python; they have helped me out a lot! One question though: do you have a page on your site with the definitions for the different letter codes (like what you pasted in around 6:09)? That would be helpful to keep as a comment in some of my code for reference without needing to hunt for it each time. If you could post the URL, it'd be appreciated. Thanks!
+Anthony R Barberini pythonprogramming.net/part-of-speech-tagging-nltk-tutorial/ list is posted there.
Thanks
I downloaded all code examples used in videos from github.com/pythonprogramming, which was very convenient :-)
Hi! Really liked your videos! Some questions though -
We could have used the simple nltk tokenizer for sentence tokenizing the way you did in video #2 right?
Also, I didn't get how PunktSentenceTokenizer is an unsupervised method if you provide it training data here. I don't see how train_text is used for training when it is essentially similar to the 2006 txt file. How does PunktSentenceTokenizer train on it, or is it a pre-trained model being used now?
More than anything, I'm amazed by your typing speed. :))
Can I ask why you catch the exception in the function you have written?
For the first time I didn't get a sentdex lecture. You should have used simple texts and explained POS tagging based on them.
I tried pos_tag() on sample lines, but the output shows letters tagged instead of words. Super confused. :(
Will we learn about using improper datasets in NLTK? Because most uses I have planned won't have people using "proper" English. For example, people could use kute for cute, or kewl for cool. Would this be handled in the stemming step?
Very, very interesting video series... I really love your tutorials. Please keep up the good work. Regards.
Thank you for the kind words!
Your videos are always great! I was wondering if you could use pos_tag in real time text input like with pynput?
Is it better to use PunktSentenceTokenizer rather than the default sentence tokenizer?
If yes, then why?
And what data should I train it with to tokenize a specific type of data (for example, medical records)?
Thanks for the video. I am new to all this NLP/NLTK with Python. I got an assignment asking me to work with two tagsets (Brown and Universal), divide the data into a train set and a test set, find the most common tag (NN), and use it as a baseline. All good there, but I am struggling to understand the second part, which is creating taggers that I should train on the training set and evaluate on the test set. How do I create a tagger? A random one? Then it says the taggers I am going to work with are default/affix/unigram taggers, etc., which confuses me a lot. Then it says these taggers can be created using the nltk.DefaultTagger(), nltk.AffixTagger(), nltk.UnigramTagger(),
nltk.BigramTagger(), and nltk.TrigramTagger() functions respectively. Finally: create four taggers, as isolated taggers and as cascaded taggers? It's confusing. Please help!
I want to know what is rule-based and what is machine learning. Were the tokenizer and stemmer in the previous videos rule-based? It's hard to find out since none of it is really labeled. What else in NLTK is rule-based? What else is model-based? I'd really appreciate a thorough and complete answer!
There is nltk.org, where you can find documentation for all the modules and even the source code itself.
Can you make a video showing how to train the pos tagger? Currently, the default one has a limited number of words in its dictionary, and most of them are not accurate. Is there a way to expand this dictionary and make it more accurate? If so can you link me to site which shows this or can you make a video on it? thanks
Why are we using PunktSentenceTokenizer here? We can simply tokenize the text and use nltk.pos_tag on it.
It would work great. Actually, NLTK's default sent_tokenize uses a pre-trained PunktSentenceTokenizer internally; he used a new sentence tokenizer just to show that there are more options.
okk, got it.
Is it possible to do this without sent_tokenize or PunktSentenceTokenizer: use word_tokenize only once, store the result in a list, and then do the tagging in the function?
Is corpus just the sample database? Because I've been curating my own.
Love you, sir, you are doing a very good job!
Hey HS,
thanks for the great content. I was searching but was not able to find the POS tag list for that corpus. It seems NLTK 3.0 changed things a bit.
Thanks for any advice.
Nevermind I have found it(www.ling.upenn.edu/courses/Fall_2003/ling001/penn_treebank_pos.html). Thanks anyway
Or here :
pythonprogramming.net/part-of-speech-tagging-nltk-tutorial/
love u sentdex bro
Could we tokenize the doc using the word tokenizer directly, rather than first using the sentence tokenizer and then the word tokenizer?
Hahaha, I was wondering the same thing.
It might make it harder to create chunks later if all the sentences are mixed together
Why should I use PunktSentenceTokenizer? A sentence tokenizer is already there. And what actually happens when we train using PunktSentenceTokenizer?
Really awesome work. This vid could also have started with a sample sentence and then moved on to state_union. Anyways, you rock; keep up the awesome stuff.
Great suggestion!
@@sentdex Hi, could you please tell me where I can find the files 2005-GWBush.txt and 2006-GWBush.txt?
Why should we use both sample and train texts? What would happen if we used only one of them?
I used only the train text and got a big output, and I couldn't compare the two outputs. Please help. ✌🏼 peace
Hi, thanks for sharing your videos on NLTK; I've found them super useful so far. Quick question: I'm completely new to programming and I'm trying to teach myself as I go along, so maybe for someone experienced this is a quick fix. When I ran the function at 5:10, I got a list of numbers instead of words. Why might that be?
Thanks! :D
Hmm, I am not sure. Try comparing your code to mine: pythonprogramming.net/part-of-speech-tagging-nltk-tutorial/
Thanks! I did catch a few mistakes I made, and now I get the POS tags! But I still get the numbers before the tags, haha. Will try again. Thank you for such a prompt reply!
Hi,
Can you make an extension of the video and create a tabular view of the POS tags?
Hi! Thanks for your videos! Do you teach how to do distribution of collocations and keywords from a big dataset (excel)?
I did not get anything printed, nor did I get an error message. Any suggestions?
Please help me. I got the error " 'module' object is not callable"
When I run the program it simply says 'global name 'tagged' is not defined'. Any ideas on what this may be?
I don't get an error, but the output sequence is displayed as numbers, not words.
Any solution?
Me too. Is there any solution for that type of error?
You must have forgotten to call the function!
I'm kind of confused about the use of the two texts (2005-GWBush.txt and 2006-GWBush.txt).
Q : what is the use of PunktSentenceTokenizer() ?
A : It identifies sentence boundaries, just like a sentence tokenizer used in the previous videos.
Q : Why are two texts used? I couldn't understand the use of the 2005 & 2006 texts.
A : Well you want to train the tokenizer first, so you have to train it on different data than you want to use it. In order to get valid results, training set and testing set have to be different.
... still confused.
Where do you find the corpus to see all the .txt articles/documents inside it, so we can reference or use them as needed? Thanks for the vid series.
Got an error while tokenizing a different text:
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all()
I don't know how to solve it. Need your help, please.
What is the use of train_text? I'm passing train_text as None and I still get the same, correct output on the sample text.
Sir, why is 2005-GWBush.txt not on my computer? Or is it built into Python?
Love the content, but I couldn't figure out how to use this with a dataframe as opposed to a raw text file. I assume you would need a train_test_split, but even then I didn't get it all together.
How can I extract all POS-tagged words by noun, adjective, etc., based on which are most frequently shown?
process_content() is showing
"'module' object is not callable".
How do I resolve this?
It is a bit confusing why we are using PunktSentenceTokenizer and what it does.
Sir, can you tell me what the syntax and rules in Python are for transforming sentences?
And if you feed in a text.txt with misspelled words, what happens? Thanks.
Hi! Is it possible to create my own custom tags? Also, I am trying to extract certain relations based on my own entities, how exactly can I do that? Thanks!
Just wondering, is there any natural language processing library for the C language?
Why didn't you tokenize by words at the beginning?
Can anyone tell me where he got that POS tag content he pasted, which shows the tags and their corresponding meanings?
stackoverflow.com/a/15389153/12645703
How do I filter or find only the words with a specific tag? Say I only want to find or show the words with the 'NN' tag. Is there a function that handles that already, or is there a way to write an if statement to accomplish it?
+Mystic Bane nvm found it
print([i for i in tagged if i[1] == 'NNP'])
Why have you used "for i in tokenized[:5]" instead of just using the tokenized variable? Why take just the first five sentences?
What is the exception code here? Can you explain the except block?
Isn't there a switch to print out the full parts of speech in the tuple?
Thank you so much for sharing!
Hi, i have this sentence for POS tagging - "What would a Trump presidency mean for current international master’s students on an F1 visa?". When I pass it through PunktSentenceTokenizer, it shows me "math domain error".... Not able to find out why :(
And why are two texts used? I couldn't understand the use of the 2005 & 2006 texts.
+Torrtuga Nooh well you want to train the tokenizer first, so you have to train it on different data than you want to use it. In order to get valid results, training set and testing set have to be different.
Matus Miklos, what if I get words that I didn't train on? Will those be considered new words?
@@harikrishnamalyala6214 Maybe that's why he trained on a similar kind of data, so that most words are already recognized.
Is there a way to go a level higher and start to guess what the true subject and object of a sentence is? So that you could start to tell what the Context is, and when the Context changes?
LazerPotatoe Yes. Subject / object can be done pretty simply with named entity recognition on a basic level. It gets complex fast, but people do it.
Thank you very much for you awesome videos! Where can I find resources to perform a POS tagging on other languages? Such as italian?
Sketch Engine will do POS tagging for you. It is a web-based application.
So I was trying different sentences with pos_tag() and I noticed that in sentences like "Check my email" or "Climb the tree", the verbs "Check" and "Climb" are never identified; instead they are tagged as NNP or NN.
Is there a way to solve or improve this?
Hey, I am trying to run your part-of-speech tagging code. Unfortunately, it gave me a URLError. I looked it up online; people were saying it is due to NLTK 3.2, and they suggested forcing the version back to 3.1. I am not sure if that is the right way to make the code work. Wondering if you can help me with that. I am using Windows 10 with Python 3.5 and NLTK 3.2.
I met the same issue. Deleting NLTK 3.2 and installing NLTK 3.1 solved the problem.
stackoverflow.com/questions/35827859/python-nltk-pos-tag-throws-urlerror
What method is this tagger using? Is it a log-linear model or an HMM?
After 5 minutes of this video, instead of a stream of words I am getting this error when I run the code:
"expected string or bytes-like object"
Please help me out. Thanks.
Can you do it with the nltk NgramTagger method? I'm getting some words tagged as None; is that a training set problem?
I am not getting output in the form of words but in the form of single letters, like [('G', 'VB'), ('O', 'NN')] instead of 'GO'.
Is it possible to achieve multilingual NLP? I mean, in places where I come from, we have too many regional languages and people usually mix different languages even in a simple sentence. So I want a system that can understand what the speaker is trying to say irrespective of the language.
Hey, how do you get the list of all the part-of-speech tags and their descriptions? What code did you use to get that output at 6:11? Thanks!
I am trying to extract only the noun words from the tagged words.
The error being shown is 'too many values to unpack'.
Do you have working code?
Great videos... thanks for all your efforts.
I have one issue: the graph is not displaying for me even after installing matplotlib. Basically nothing happens when I call "chunked.draw()". Do I need to do anything after installing matplotlib? Do I need to refresh anything or something? Please let me know.
Update the libraries and modules.
Hi, I am using Python 3.3.2 but it doesn't support PunktSentenceTokenizer. What should I do?
What if you need to do this in a different (human, not programming) language? It's nice to have resources for English, but what if you don't have any resources for another language? You would need to create these resources, which is insane for the full vocabulary of a language.
So what's the PunktSentenceTokenizer?
Hi, thanks for all the tutorials. I get the error message "unindent does not match any outer indentation level". Why? Thank you.
+M.Ahmet demirtaş indentation is very important with Python. If you aren't following indentation rules, you get errors like this. Compare your indentations to mine, sample code can be found here: pythonprogramming.net/part-of-speech-tagging-nltk-tutorial/
Thanks
What is the use of POS tagging?
tokenizer= custom_sent_tokenizer.tokenize(sample_text)
What does this line do?
Is it necessary to train the PunktSentenceTokenizer?
Because I used sent_tokenize and got the same result!
So what exactly is the difference?
+Mohanish Nehete the sent_tokenizer is pre-trained already, sentdex just wanted to show you how to train it by yourself, if you need it.
@@WillakeMattis What do you mean by pre-trained? Thanks anyway.
Where can I get those two txt files?
Does nltk.pos_tag() look forward and backward in the list to figure out whether a word like "running" is a gerund (noun) or a verb?
Just tried it, and, yes, Punkt does seem to recognize the difference. w00t.
what is the use of PunktSentenceTokenizer() ?
It identifies sentence boundaries, just like a sentence tokenizer used in the previous videos.
Ergo... a tokenizer, lol. Just a module within the tokenize side of the lib.
It's late, but let me clarify, so that it can be used by others.
stackoverflow.com/questions/35275001/use-of-punktsentencetokenizer-in-nltk
^that was useful thanks
Hello, nice explanation, thank you.
What is the use of part-of-speech tagging?
Do you know where we can find the files 2005-GWBush.txt and 2006-GWBush.txt used in this video?
Thanks for sharing. In the case where the tag is "None", how can you add the unknown word?
Why is Punkt unsupervised? It uses a training data set, right? So it should be a supervised method!
This is not a "real" training set because it's just text, not delimited into sentences. So no one told Punkt where the correct sentence boundaries are; it only receives raw input and trains itself on it. I don't understand how it does that; it's a mystery to me. Harrison should have explained it more in depth in the video, IMHO.
Here training_text is just raw text without any information or LABELS. If it were supervised, we would need to give it information or LABELS (machine learning jargon).
Labels are feature information, like "if there is a period, it's the end of a sentence".
Thanks for the responses, everyone. Understood now!
How can I print a complete list of tags for reference?
No module named 'nltk.corpus'; 'nltk' is not a package
Did you name your file, any local file, or folder nltk? If so, change the name because you're importing that instead of the actual NLTK package.
How do I add a new word to the tagger? Let's say an organization is recognized as NNP but not as ORG. How can I add that organization to the default NLTK list?
For that, try using the named entity module.
Hi, how can we apply this to all the rows in a dataframe?
How can we use tagging to count the number of verbs, nouns, etc. in a sentence?
Use a for loop to iterate through the tagged list. For each (word, tag) pair, compare the tag (not the word): if it is a verb tag, do verb_count += 1.
Just loop and count the tags in a sentence for the specified part of speech (verb/noun).
I have a problem with state_union; it says there's no such file or directory.
nltk.download(), did you run this?
Yes, I did.
nltk.download('state_union')
Traceback (most recent call last):
File "H:/Python programs/speech_tagging.py", line 9, in
tokenized=custom_sent_tokenizer(sample_text)
TypeError: 'PunktSentenceTokenizer' object is not callable
This is the error I'm getting. Can someone help me with this?
Wow, thanks so much man! I don't even know what this is, but I'll give it a try XD
Why did you tokenize twice?
What was the use of PunktSentenceTokenizer?
I didn't understand the code inside the function definition.
Great video, thanks! Does anyone here know how to search within a tagged corpora? For example, if, after processing the text so it is all tagged, I wanted to find every instance of noun-"is"-noun, how would I do that? Thanks :)
For some reason, a series of numbers of this type was printed before the tagged speech: (0.0006539153179663233 0.0029498525073746312 0.0005192107995846313 6117 339 4 1)