Check out our premium machine learning course with 2 Industry projects: codebasics.io/courses/machine-learning-for-data-science-beginners-to-advanced
I love the way you explain other NLP concepts - customizing the pipeline, for example!!!
you are my teacher and i am proud of you
Thanks 🙏
There is a quiz now!! Thank you for your awesome work ♥♥♥
Stemming (removing suffixes) vs Lemmatization (mapping to the base word) 4:50
Note: spaCy doesn't have support for stemming.
Code: stemming
import nltk
import spacy
from nltk.stem import PorterStemmer

stemmer = PorterStemmer()
words = ["eating", "eats", "eat", "ate", "adjustable", "rafting", "ability", "meeting"]
for word in words:
    print(word, "|", stemmer.stem(word))
--------------------------------------------------------------------------------
Code: lemmatization
nlp = spacy.load("en_core_web_sm")
doc = nlp("eating eats eat ate adjustable rafting ability meeting better")
for token in doc:
    print(token, "|", token.lemma_, "|", token.lemma)  # lemma_ is the string form, lemma is the hash ID
-----------------------------------------------------------------------------------------
Custom lemmatization
Code:
ar = nlp.get_pipe("attribute_ruler")
ar.add([[{"TEXT": "Bro"}], [{"TEXT": "Brah"}]], {"LEMMA": "Brother"})  # map both slang forms to the lemma "Brother"
doc = nlp("Bro, you wanna go? Brah, don't say no! I am exhausted")
for token in doc:
    print(token.text, "|", token.lemma_)
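With this rule in place, both "Bro" and "Brah" should come out with the lemma "Brother", while the other tokens keep their default lemmas.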
Fantastic... you make complex NLP topics simple!!!
Very helpful! Looking forward to the rest of the series! Thank you!
This is some quality content.
Thank you!
What is Behavioural data science?
8:36 I noticed that the prebuilt language pipelines return an unexpected lemma for "ate". I assumed that lg and trf pipelines would produce ate -> eat while the sm and md pipelines would produce ate -> ate, but that doesn't seem to be the case.
def eat_lemma(lang_pipeline):
    nlp = spacy.load(lang_pipeline)
    doc = nlp("ate")
    print(lang_pipeline, "|", [token.lemma_ for token in doc])

lp = ["en_core_web_sm", "en_core_web_md", "en_core_web_lg", "en_core_web_trf"]
for lang_pipeline in lp:
    eat_lemma(lang_pipeline)
en_core_web_sm | ['eat']
en_core_web_md | ['ate']
en_core_web_lg | ['eat']
en_core_web_trf | ['ate']
Update: I see that when "ate" is used in the context of a sentence each pipeline produces a lemma of "eat".
doc = nlp("The person ate an apple.")
en_core_web_sm | ['the', 'person', 'eat', 'an', 'apple', '.']
en_core_web_md | ['the', 'person', 'eat', 'an', 'apple', '.']
en_core_web_lg | ['the', 'person', 'eat', 'an', 'apple', '.']
en_core_web_trf | ['the', 'person', 'eat', 'an', 'apple', '.']
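One likely explanation (my guess, not from the video): spaCy's lemmatizer is sensitive to the part-of-speech tag, and a bare "ate" with no context may not get tagged as a verb, while inside a sentence it does. A minimal sketch to check this, assuming en_core_web_sm is installed:

import spacy

nlp = spacy.load("en_core_web_sm")
for text in ["ate", "The person ate an apple."]:
    doc = nlp(text)
    print(text)
    for token in doc:
        # the lemma depends on the POS tag the tagger assigns
        print("   ", token.text, "|", token.pos_, "|", token.lemma_)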
Excellent Series👌👌🔥🔥
You are the excellent. Fullstop.
Do you want to learn technology from me? codebasics.io is my website for video courses. First course going live in the last week of May, 2022
Very helpful
Thanks a bunch ❤
Hey guys, when we use stemming and lemmatization before training, we just change the words. So after training, how can the model generate words that differ from the lemmatized forms? I mean, we teach the model `eat`, yet it also learns `ate` - how?
If possible, try to do some live sessions; it would be helpful.
amazing videos
Sir, it would be very helpful if you made an NLP project like a chatbot at the end of the series. Thanks for making this series.
Yes, I will be making a few projects.
Thanks so much
Hey!
Firstly, this is a very good series. But for the exercise, in the last part using lemmatization, some of my words such as cooking were converted into cook and playing into play, while running stayed as it was. Do you know what the issue could be?
Or do you have any explanation to this?
Thank you.
It just might be how that specific spaCy model you used performs, maybe, idk.
Hello sir, if I want to stem and lemmatize my string at the same time, how would I do that? spaCy doesn't allow stemming, and NLTK doesn't allow lemmatization. Pls answer asap.
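One possible approach (just a sketch, not an official recipe): run the text through spaCy to get lemmas and pass each token's text to NLTK's PorterStemmer for stems, assuming en_core_web_sm and NLTK are installed. (For what it's worth, NLTK does also ship a WordNetLemmatizer.)

import spacy
from nltk.stem import PorterStemmer

nlp = spacy.load("en_core_web_sm")
stemmer = PorterStemmer()

doc = nlp("eating eats ate adjustable rafting ability meeting")
for token in doc:
    # stem from NLTK, lemma from spaCy, side by side
    print(token.text, "| stem:", stemmer.stem(token.text), "| lemma:", token.lemma_)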
I was unable to install the Ai4bharat package on my PC. Is there a solution for that error?
thank you, sir
very nice
Which one are you? Marc Spector or Steven Grant??
I am Dhaval, Marc and Steven are my alter egos 😎
How do you write a lemmatizer from scratch?
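Not covered in the video, but as a rough illustration, a lemmatizer from scratch is usually an exception lookup table plus suffix rules. A toy sketch (the table and rules below are made up for demonstration and nowhere near complete):

# toy lemmatizer: exception lookup first, then crude suffix rules
EXCEPTIONS = {"ate": "eat", "better": "good", "went": "go"}
SUFFIX_RULES = [("ies", "y"), ("ing", ""), ("ed", ""), ("s", "")]

def toy_lemma(word):
    word = word.lower()
    if word in EXCEPTIONS:  # irregular forms come from the lookup table
        return EXCEPTIONS[word]
    for suffix, replacement in SUFFIX_RULES:  # regular forms via suffix stripping
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)] + replacement
    return word

for w in ["eating", "ate", "studies", "played", "better", "meeting"]:
    print(w, "|", toy_lemma(w))

A real lemmatizer (like spaCy's) also uses the POS tag to pick the right rules, which this toy version ignores.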
Sir, one year ago my PC was hacked by .gujd ransomware. Please, how do I get my data back? 🙏 Please help me, some important data is there.
Hi sir, a request for you to make some videos on Python.
I have a Python tutorial playlist with more than 40 videos. On YouTube, search "codebasics python tutorial".
🤩
Hey, aren't you the moon knight?
Ha ha you are the third person to say this 🤣😎😎😎
pleeeeeeeeeease try hindi speaking