For my spaCy playlist, see: ruclips.net/p/PL2VXyKi-KpYvuOdPwXR-FZfmZ0hjoNSUo
Hi. Thanks for the clear tutorials. FYI, the latest version of textacy uses token_matches instead of matches (which now raises a "'module' object is not callable" error).
I'm getting that error rn haha
Any advice on how to resolve it?
@@hafiz7611 Others have described the solution -- use textacy.extract.token_matches(doc, patterns=patterns)
Thank you. That helped me a lot
Thank you a lot, that really helped!
I wish they had spaCy in Hebrew. I am trying to learn Hebrew, and it would be cool to have this tool to help me learn. I am sure it will come eventually. It would be super cool to go through the Torah with spaCy, especially since there is a lot of significance behind certain Hebrew letters, numbers, etc. in the Torah. It would be super easy to visualize the patterns with a tool like spaCy. It would be cool with the English too, obviously, but even better with the original Hebrew text. Thanks again for these videos! They are super helpful, and I am looking forward to using spaCy in my first ever major programming project!
Hebrew and Arabic represent two of the greatest challenges in NLP. There are a few hurdles that need to be overcome. First is labeled data for part-of-speech tagging. Second is labeled data for named entity recognition. For Hebrew, the lack of vowels and the textual ambiguity of words pose a real problem for toponym resolution. There is also the fact that texts are written right to left. Here is a good resource on Hebrew NLP (github.com/iddoberger/awesome-hebrew-nlp). I work with the CLTK and am developing the NER pipeline. The purpose of the CLTK is to provide NLP for ancient Eurasian languages, which aligns with Hebrew and Arabic. But the real solution for Hebrew will likely come with the advent of BERT models. (spaCy 3.0 will be BERT-based, and I will be doing a series on it as soon as it is released.)
Update: textacy no longer has "textacy.extract.matches( )". Instead, they added a new function, "token_matches", so the correct syntax now is:
textacy.extract.token_matches( )
According to the new textacy documentation it is:
verb_phrases = textacy.extract.token_matches(doc, patterns = patterns)
Otherwise, if it is just matches, we get the error "'module' object is not callable".
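For anyone wondering why the old call raises a TypeError rather than an AttributeError: in newer textacy, the name matches refers to a submodule rather than a function, and calling a module object produces exactly this message. A minimal pure-Python sketch of the mechanism (fake_matches is a stand-in; no textacy needed):

```python
import types

# Simulate a name that used to be a function but is now a submodule:
# calling a module object raises TypeError.
fake_matches = types.ModuleType("matches")

try:
    fake_matches()  # analogous to calling textacy.extract.matches(...)
except TypeError as err:
    message = str(err)

print(message)  # 'module' object is not callable
```

That is why switching to textacy.extract.token_matches(doc, patterns=patterns), which is a function, makes the error go away.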
Thanks, it runs normally, but PyCharm gives me a "Cannot find reference 'extract' in '__init__.py'" warning. Just curious, how do I get rid of it?
Use textacy.extract.token_matches(doc, patterns)
@@suparnamondal4354 Thanks, you saved my day!
here's the function:
pattern = [{'POS':'VERB'}]
def tagging(txt):
    verb_phrases = textacy.extract.matches(txt, patterns=pattern)
    return verb_phrases
dataset['Verbs'] = dataset['Sentences'].apply(lambda x: tagging(x))
excellent lesson, thank you!
You are welcome!
verb_phrases = textacy.extract.matches(doc, patterns=patterns) is giving me a "'module' object is not callable" error. Why is that?
Use textacy.extract.token_matches(doc, patterns)
@@surelyiamjoking Thank you! That worked
@@surelyiamjoking Thanks Alexey, that worked well!
@@surelyiamjoking thanks you saved my time
Thank you, this fix saved me time
textacy.extract.token_matches to be used instead of textacy.extract.matches
Great tutorial, thanks!
Can we use spaCy for extracting nouns or verbs from an Excel file? The data in each cell may be a word or a sentence/multiple sentences.
Yes, but you will want to use pandas to load the data into Python and then send that data as a string to spaCy. I have a pandas series on this channel.
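A minimal sketch of that loading step. The file name "data.xlsx" and the column name "Text" are hypothetical; here an inline DataFrame stands in for the spreadsheet so the snippet is self-contained:

```python
import pandas as pd

# In practice you would load your own spreadsheet, e.g.:
# df = pd.read_excel("data.xlsx")   # "data.xlsx" is a hypothetical file name
df = pd.DataFrame({"Text": ["Run fast.", "Blue sky over the hills."]})

# Coerce every cell to a string so single words and full sentences
# are handled uniformly before passing each one to spaCy's nlp().
texts = df["Text"].astype(str).tolist()
print(texts)
```

Each string in texts can then be fed to the nlp pipeline exactly as in the tagging function discussed below in this thread.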
Thank you so much for the walk-through.
No problem! Glad you found it helpful!
What do I do if textacy won't install properly with pip? I've tried a few times, but it keeps saying an error occurred while building cytoolz.
Hi, this tutorial has been really helpful. Is it possible to extract custom words instead of verbs or verb phrases?
Thanks! Indeed it is. You would use the spaCy PhraseMatcher or EntityRuler, depending on how you wanted to extract the words.
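A minimal PhraseMatcher sketch for extracting custom words. The terms and sentence are made up, and spacy.blank is used so no trained model download is required:

```python
import spacy
from spacy.matcher import PhraseMatcher

nlp = spacy.blank("en")  # tokenizer-only pipeline; no model download needed
matcher = PhraseMatcher(nlp.vocab)

# Hypothetical custom terms to look for.
terms = ["machine learning", "neural network"]
matcher.add("CUSTOM_TERMS", [nlp.make_doc(term) for term in terms])

doc = nlp("I study machine learning and build a neural network at home.")
found = [doc[start:end].text for _, start, end in matcher(doc)]
print(found)
```

The EntityRuler works similarly but attaches the matches to doc.ents as labeled entities instead of returning spans.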
I have made a function for extracting verbs from sentences, but it's not working for me. The output is something like this:
I would really appreciate it if you could help.
I am not on my computer right now as I am moving. But this is the output you receive when you try to print a spaCy object without converting it to a string. You can either do str(insert object name here) or objectname.text.
Let me know if that solves the problem.
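The unhelpful output was most likely a generator repr: textacy's extract functions return generators, so returning one without consuming it prints as an object rather than as words. A pure-Python sketch of that behavior (extract_verbs is a made-up stand-in; no spaCy needed):

```python
def extract_verbs():
    # Stand-in for a textacy extract call, which returns a generator.
    yield "ran"
    yield "jumped"

result = extract_verbs()
print(result)         # prints something like <generator object extract_verbs at 0x...>

words = list(result)  # consuming the generator yields the actual items
print(words)          # ['ran', 'jumped']
```

Converting to a list (or iterating and collecting each item's .text, as in the fix below in this thread) is what turns the opaque object into readable results.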
@@python-programming Thank you so much for your time, but it's not working :( I have created the above-mentioned function and then applied it to the column of my dataset containing sentences. I am new to this field and have tried almost everything I know, but no progress!
Sorry, I was replying on my phone before and did not see your function. Okay, this is a simple fix. The problem is: 1. You are not passing the txt object through the nlp model. 2. You need to convert these objects to lists or strings before you return them.
d = {"Sentences": ["I am tom.", "I like shoes."]}
dataset = pd.DataFrame(data=d)
pattern = [{'POS':'VERB'}]
def tagging(txt):
    doc = nlp(txt)
    # Option 1 returns a list of spaCy objects. This means you have all the spaCy data saved with them
    verb_phrases_01 = list(textacy.extract.matches(doc, patterns=pattern))
    # Option 2 iterates over the spaCy objects and converts them into strings
    verb_phrases_02 = textacy.extract.matches(doc, patterns=pattern)
    results = []
    for verb_phrase in verb_phrases_02:
        results.append(verb_phrase.text)
    return results
dataset['Verbs'] = dataset['Sentences'].apply(lambda x: tagging(x))
print(dataset["Verbs"])
@@python-programming Thank you sooooo very much. May God bless you and ease things for you as have eased them for me.
@@zainabamjad5647 Not a problem at all!! Glad I could help! Have a great day!
Bro, I want to get the index of a chunk. What function should I use?
Can you explain this a bit more? I'm not entirely sure I understand the question.
@@python-programming I have extracted verb chunks, but I want the index of each token.
I know how to get the index of the chunk, but I want the index of the text inside the chunk, meaning the token.
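If the question is how to get the document-level index of each token inside a matched span: every spaCy Token has .i, and a Span has .start and .end. A minimal sketch with a blank pipeline (no model download; the sentence and span are made up):

```python
import spacy

nlp = spacy.blank("en")  # tokenizer-only pipeline; enough to demonstrate indices
doc = nlp("She quickly ran home today.")

span = doc[1:3]  # pretend this is an extracted chunk: "quickly ran"
chunk_start = span.start             # index of the chunk's first token in the doc
token_indices = [t.i for t in span]  # doc-level index of each token in the chunk

print(chunk_start, token_indices)
```

So span.start gives the chunk's position, and token.i gives each token's own position in the full document.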