For my spaCy playlist, see: ruclips.net/p/PL2VXyKi-KpYvuOdPwXR-FZfmZ0hjoNSUo
Hi. Thanks for the clear tutorials. FYI, the latest version of textacy uses token_matches instead of matches (which now raises a "'module' object is not callable" error).
I'm getting that error rn haha
Any advice on how to resolve it?
@@hafiz7611 Others have described the solution -- use textacy.extract.token_matches(doc, patterns=patterns)
Thank you. That helped me a lot
Thank you a lot, that really helped!
I wish they had spaCy in Hebrew. I am trying to learn Hebrew, and it would be cool to have this tool to help me learn. I am sure it will come eventually. It would be super cool to go through the Torah with spaCy, especially since there is a lot of significance behind certain Hebrew letters, numbers, etc. in the Torah. It would be super easy to visualize the patterns with a tool like spaCy. It would be cool with the English too, obviously, but even better with the original Hebrew text. Thanks again for these videos! They are super helpful, and I am looking forward to using spaCy in my first ever major programming project!
Hebrew and Arabic represent two of the greatest challenges in NLP. There are a few hurdles that need to be overcome. First is labeled data for part-of-speech tagging. Second is labeled data for named entity recognition. For Hebrew, the lack of vowels and the textual ambiguity of words pose a real problem for toponym resolution. There is also the fact that texts are written right to left. Here is a good resource on Hebrew NLP (github.com/iddoberger/awesome-hebrew-nlp). I work with the CLTK and am developing the NER pipeline. The purpose of the CLTK is to provide NLP for ancient Eurasian languages, which aligns with Hebrew and Arabic. But the real solution for Hebrew will likely come with the advent of BERT models. (spaCy 3.0 will be BERT-based, and I will be doing a series on it as soon as it is released.)
Update: textacy no longer has "textacy.extract.matches( )". Instead, they added a new function, "token_matches", so the correct syntax now is:
textacy.extract.token_matches( )
According to the new textacy documentation it is:
verb_phrases = textacy.extract.token_matches(doc, patterns = patterns)
Otherwise, if it is just matches, we get the error "'module' object is not callable".
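For anyone wondering why the old call raises a TypeError rather than an AttributeError: in newer textacy, the name matches refers to a submodule rather than a function, and calling a module object produces exactly this message. A minimal pure-Python sketch of the mechanism (fake_matches is a stand-in; no textacy needed):

```python
import types

# Simulate a name that used to be a function but is now a submodule:
# calling a module object raises TypeError.
fake_matches = types.ModuleType("matches")

try:
    fake_matches()  # analogous to calling textacy.extract.matches(...)
except TypeError as err:
    message = str(err)

print(message)  # 'module' object is not callable
```

That is why switching to textacy.extract.token_matches(doc, patterns=patterns), which is a function, makes the error go away.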
Thanks, it runs normally, but PyCharm gives me a "Cannot find reference 'extract' in '__init__.py'" warning. Just curious, how do I get rid of it?
Use textacy.extract.token_matches(doc, patterns)
@@suparnamondal4354 Thanks, you saved my day!
here's the function:
pattern = [{'POS':'VERB'}]
def tagging(txt):
    verb_phrases = textacy.extract.matches(txt, patterns=pattern)
    return verb_phrases
dataset['Verbs'] = dataset['Sentences'].apply(lambda x: tagging(x))
excellent lesson, thank you!
You are welcome!
verb_phrases = textacy.extract.matches(doc, patterns=patterns) is giving me a "'module' object is not callable" error. Why is that?
Use textacy.extract.token_matches(doc, patterns)
@@surelyiamjoking Thank you! That worked
@@surelyiamjoking Thanks Alexey, that worked well!
@@surelyiamjoking thanks you saved my time
Thank you, this fix saved me time
textacy.extract.token_matches to be used instead of textacy.extract.matches
Great tutorial, thanks!
Can we use spaCy for extracting nouns or verbs from an Excel file? The data in each cell may be a word or a sentence/multiple sentences.
Yes, but you will want to use pandas to load the data into Python and then send that data as a string to spaCy. I have a pandas series on this channel.
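A minimal sketch of that loading step. The file name "data.xlsx" and the column name "Text" are hypothetical; here an inline DataFrame stands in for the spreadsheet so the snippet is self-contained:

```python
import pandas as pd

# In practice you would load your own spreadsheet, e.g.:
# df = pd.read_excel("data.xlsx")   # "data.xlsx" is a hypothetical file name
df = pd.DataFrame({"Text": ["Run fast.", "Blue sky over the hills."]})

# Coerce every cell to a string so single words and full sentences
# are handled uniformly before passing each one to spaCy's nlp().
texts = df["Text"].astype(str).tolist()
print(texts)
```

Each string in texts can then be fed to the nlp pipeline exactly as in the tagging function discussed below in this thread.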
Thank you so much for the walk-through.
No problem! Glad you found it helpful!
What do I do if textacy won't install properly with pip? I've tried a few times, but it keeps saying an error occurred while building cytoolz.
Hi, this tutorial has been really helpful. Is it possible to extract custom words instead of verbs or verb phrases?
Thanks! Indeed it is. You would use the spaCy PhraseMatcher or EntityRuler, depending on how you wanted to extract the words.
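A minimal PhraseMatcher sketch for extracting custom words. The terms and sentence are made up, and spacy.blank is used so no trained model download is required:

```python
import spacy
from spacy.matcher import PhraseMatcher

nlp = spacy.blank("en")  # tokenizer-only pipeline; no model download needed
matcher = PhraseMatcher(nlp.vocab)

# Hypothetical custom terms to look for.
terms = ["machine learning", "neural network"]
matcher.add("CUSTOM_TERMS", [nlp.make_doc(term) for term in terms])

doc = nlp("I study machine learning and build a neural network at home.")
found = [doc[start:end].text for _, start, end in matcher(doc)]
print(found)
```

The EntityRuler works similarly but attaches the matches to doc.ents as labeled entities instead of returning spans.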
I have made a function for extracting verbs from sentences, but it's not working for me. The output is something like this:
I would really appreciate it if you could help.
I am not on my computer right now as I am moving. But this is the output you receive when you try to print a spaCy object without converting it to a string. You can either do str(insert object name here) or objectname.text.
Let me know if that solves the problem.
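The unhelpful output was most likely a generator repr: textacy's extract functions return generators, so returning one without consuming it prints as an object rather than as words. A pure-Python sketch of that behavior (extract_verbs is a made-up stand-in; no spaCy needed):

```python
def extract_verbs():
    # Stand-in for a textacy extract call, which returns a generator.
    yield "ran"
    yield "jumped"

result = extract_verbs()
print(result)         # prints something like <generator object extract_verbs at 0x...>

words = list(result)  # consuming the generator yields the actual items
print(words)          # ['ran', 'jumped']
```

Converting to a list (or iterating and collecting each item's .text, as in the fix below in this thread) is what turns the opaque object into readable results.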
@@python-programming Thank you so much for your time, but it's not working :( I have created the above-mentioned function and then applied it to the column of my dataset containing sentences. I am new to this field and have tried almost everything I know, but no progress!
Sorry, I was replying on my phone before and did not see your function. Okay, this is a simple fix. The problem is: 1. You are not passing the txt object through the nlp model. 2. You need to convert these objects to lists or strings before you return them.
d = {"Sentences": ["I am tom.", "I like shoes."]}
dataset = pd.DataFrame(data=d)
pattern = [{'POS':'VERB'}]
def tagging(txt):
    doc = nlp(txt)
    # Option 1 returns a list of spaCy objects. This means you have all the spaCy data saved with them
    verb_phrases_01 = list(textacy.extract.matches(doc, patterns=pattern))
    # Option 2 iterates over the spaCy objects and converts them into strings
    verb_phrases_02 = textacy.extract.matches(doc, patterns=pattern)
    results = []
    for verb_phrase in verb_phrases_02:
        results.append(verb_phrase.text)
    return results
dataset['Verbs'] = dataset['Sentences'].apply(lambda x: tagging(x))
print(dataset["Verbs"])
@@python-programming Thank you sooooo very much. May God bless you and ease things for you as have eased them for me.
@@zainabamjad5647 Not a problem at all!! Glad I could help! Have a great day!
Bro, I want to get the index of a chunk. What function should I use?
Can you explain this a bit more? I'm not entirely sure I understand the question.
@@python-programming I have extracted verb chunks, but I want the index of each token.
I know how to get the index of the chunk, but I want the index of the text inside the chunk, meaning the token.
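If the question is how to get the document-level index of each token inside a matched span: every spaCy Token has .i, and a Span has .start and .end. A minimal sketch with a blank pipeline (no model download; the sentence and span are made up):

```python
import spacy

nlp = spacy.blank("en")  # tokenizer-only pipeline; enough to demonstrate indices
doc = nlp("She quickly ran home today.")

span = doc[1:3]  # pretend this is an extracted chunk: "quickly ran"
chunk_start = span.start             # index of the chunk's first token in the doc
token_indices = [t.i for t in span]  # doc-level index of each token in the chunk

print(chunk_start, token_indices)
```

So span.start gives the chunk's position, and token.i gives each token's own position in the full document.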