Stemming - Natural Language Processing With Python and NLTK p.3
- Published: Feb 5, 2025
Another form of data pre-processing in natural language processing is called "stemming."
This is the process of removing affixes from the ends of words.
We do this so that we do not need to store the meaning of every single form of a word. For example:
Reader
Reading
Read
Aside from tense (and one of these is even a noun), they all share the meaning of their "root" stem (read).
This way, we store one single value for the root stem "read." Then, when we wish to learn more, we can look at the affixes that were on the end: "ing" marks an ongoing action, "reader" is someone who reads, and plain "read" can be either past or present tense.
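A minimal sketch of the idea, assuming NLTK is installed and using its `PorterStemmer` class. The word list is illustrative and not taken from the description above. Note that a stem is not guaranteed to be a dictionary word.

```python
from nltk.stem import PorterStemmer

# Create a stemmer instance; stem() is a method on the instance.
ps = PorterStemmer()

# Illustrative word list (an assumption, chosen to show words sharing a stem).
example_words = ["python", "pythoning", "pythoned", "pythonly"]

# Map each word to the stem the Porter algorithm produces for it.
stems = {word: ps.stem(word) for word in example_words}

for word, stem in stems.items():
    print(f"{word} -> {stem}")
```

The last word comes out as "pythonli" rather than "python", which is the behavior several comments below ask about.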
sample code: pythonprogrammi...
hkinsley.com
/ sentdex
sentdex.com
seaofbtc.com
Your channel is one of the most informative I have watched. Keep at it!
ruclips.net/video/QNS_jDdmQtc/видео.html
Thanks for this. It eased a lot of my worries about implementation. I thought I would use MATLAB, as I used to do image processing there, but since it doesn't have any NLP functions, this is good. Now I can focus exclusively on feature selection. Thanks, mate.
u combine humour and machine learning.....thanx sentdex.
why would someone dislike this very very useful video?
some people just like to watch the world burn
brother i love your lectures
OMG you got 1M Congo my man...keep doing the great work. :)
This is so so so so cool. Thank you sir.
I was waiting for "pythonic". So sad. :( Anyway, amazing tutorial! Can't wait for the rest!
and pythonista
And pythoneer
and pythough
"It is very important to import the importer" => After p.2 and p.3 => "It import import import"
gr8 work man :)….hope you get 1M soon
Ha ha, I see your face more than my girlfriend's face :)))
404 , GF not found !!!
Is your girlfriend single?
Thanks for uploading such informative videos. I really enjoy the way you make tutorials.
sentdex: I appreciate your videos very much - thank you for posting them! Just curious why you're not a fan of dark theme for your IDE. It kills my eyes staring at all the white screen. LOL.
yeah, I wanted to ask the same question lol.
I love your tutorials. Good job, keep it up.
I have a question!
If I use the approach you show, the result is:
python
pythonli //Check This one (pythonly).
python
python
and if I use the stemming library:
>>> from stemming.porter2 import stem
>>> for w in example_words:
...     print(stem(w))
result:
python
python //different Result than above.
python
python
Should I go for nltk or stemming?
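One way to explore this question without a second package, assuming NLTK is installed: NLTK ships both the original Porter algorithm (`PorterStemmer`) and the later Porter2 algorithm (`SnowballStemmer("english")`), so the two can be compared side by side. This is a sketch for comparison, not a recommendation of either library.

```python
from nltk.stem import PorterStemmer, SnowballStemmer

example_words = ["python", "pythoner", "pythoning", "pythoned", "pythonly"]

porter = PorterStemmer()                 # original Porter algorithm
snowball = SnowballStemmer("english")    # Porter2 ("Snowball") English stemmer

porter_stems = [porter.stem(w) for w in example_words]
snowball_stems = [snowball.stem(w) for w in example_words]

# Print the two results next to each other for each input word.
for word, p, s in zip(example_words, porter_stems, snowball_stems):
    print(f"{word:10} porter={p:10} snowball={s}")
```

The differing result reported above for "pythonly" comes from the two algorithms handling the "-ly" ending differently, not from a bug in either library.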
@@gulcheeracademy so quick
Stemming can be useful for basic tasks, but you lose a lot of information when doing so, which makes more advanced NLP tasks pretty much impossible. But that's another discussion. Thanks for the hands-on demonstration, sentdex. Subscribed to your account; there's a bunch of stuff I want to check out. Keep up the good work!
Great tutorial dude!!!!!!
Hey Sentdex, thanks a ton for the tuts.
One question: The stemmer here has produced "import" for "important" which entirely changes the meaning. The information is lost here. How are we handling this?
Import actually is the root of Important, when not used to refer to the transfer of goods it is defined as "The meaning or significance of something."
So really the issue is that stemming does not account for homonyms (From what I can tell)
Lemmatization is used to keep the real meaning, but in practice stemming is used.
awesome tutorials!
I observed something peculiar. For words ending in "-er" it doesn't always return the root form. For example, it returns "beginn" for "beginner", "forgiv" for "forgiver", or, as in your example, "python" for "pythoner", but it returns "maker" for "maker", "eater" for "eater", and so on for many other cases. Why does this happen? I do understand we don't always want to cut off the "-er" at the end, as in "father".
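A possible explanation, sketched in code: the Porter algorithm strips "-er" only when the remaining stem has a "measure" greater than 1, where the measure counts vowel-to-consonant transitions. The helper names below are hypothetical and this is a simplified illustration of that single rule, not NLTK's implementation.

```python
def is_consonant(word, i):
    """Porter's definition: 'y' counts as a vowel when it follows a consonant."""
    ch = word[i]
    if ch in "aeiou":
        return False
    if ch == "y":
        return i == 0 or not is_consonant(word, i - 1)
    return True


def measure(stem):
    """Count vowel-to-consonant transitions (the 'm' in Porter's rules)."""
    pattern = "".join(
        "c" if is_consonant(stem, i) else "v" for i in range(len(stem))
    )
    return sum(1 for a, b in zip(pattern, pattern[1:]) if a == "v" and b == "c")


def drops_er(word):
    """True if the 'remove -er when m > 1' rule would fire for this word."""
    return word.endswith("er") and measure(word[:-2]) > 1


for w in ["beginner", "forgiver", "pythoner", "maker", "eater", "father"]:
    print(w, measure(w[:-2]), drops_er(w))
```

Under this rule "mak", "eat", and "fath" each have measure 1, so the suffix stays, while "beginn", "forgiv", and "python" have measure 2, so it is removed.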
i liked ur video its amazing.
Doesn't it completely strip the text of meaning? Isn't that going to affect our analysis?
Very informative, but how do we resolve the problem of the word 'important' being stemmed to 'import', 'once' to 'onc', etc.? Is using another stemming algorithm a solution to the problem?
Thank you
Can anyone use nltk on other languages, in particular Bengali? I can perform POS tagging for Bengali but can't seem to do stemming. Can it be done? Any ideas, anyone?
why does it not work with ["happi" , "Happier" , "Happiest" , "Happened" , "Happily"]
Why do we go for stemming?
Is stemming used to avoid duplication of data in a database?
I think the only purpose it serves is to get rid of the different tenses of the same word, as they pretty much hold the same meaning during our analysis. This would help us save a lot of space in our list of words. However, I also think it might make us lose some information as well. For instance, it converted "important" to "import".
its like taking the common part in all the words?
Hey Sentdex love your videos. Please I want to know if it is possible to develop an app for nlp using english and another language, that is not yet supported by any translator. Thank you
Hi, do you know how I would use stemming and lemmatization with swear words? I have tried it with many test cases and have not had much success. The same goes for abbreviations like u, ur, ure, etc. Any help would be greatly appreciated.
Could you specify in what order we perform stemming, tokenization, and stop word removal? Kind of confused about it.
I think it's:
1. Tokenization
2. Stopwords
3. Stemming
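The order suggested above can be sketched as follows, assuming NLTK is installed. To keep the example free of corpus downloads, tokenization is a simple regular expression and the stop word set is a tiny hand-written list; both are stand-ins (assumptions) for `nltk.word_tokenize` and NLTK's stop word corpus. The function name `preprocess` is hypothetical.

```python
import re

from nltk.stem import PorterStemmer

# Tiny hand-written stop word set (an assumption for this sketch only).
STOP_WORDS = {"i", "am", "the", "a", "an", "is", "of", "and", "to"}


def preprocess(text):
    # 1. Tokenization: lowercase the text and keep runs of letters.
    tokens = re.findall(r"[a-z]+", text.lower())
    # 2. Stop word removal: drop tokens found in the stop word set.
    content = [t for t in tokens if t not in STOP_WORDS]
    # 3. Stemming: reduce each remaining token to its Porter stem.
    ps = PorterStemmer()
    return [ps.stem(t) for t in content]


result = preprocess("I am learning Python")
print(result)
```

Filtering before stemming is a reasonable default because stop word lists usually contain unstemmed words, so a stemmed token might no longer match its list entry.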
I have a question:
sample_text = "I am learning python using pythonlearning being pythonly"
final_sentence = [ps.stem(w) for w in token]
Output: [u'I', u'am', u'learn', u'python', u'use', u'pythonlearn', u'be', u'pythonli']
Please let me know why there is an extra 'u' in front of every word.
I got the correct answer. Try once again.
There is an issue with text encoding here. What is the default encoding set up on your system? Look at the documentation of the unicodedata package.
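A note on the `u` prefix, assuming the output quoted above came from Python 2: there, `u'...'` is how a unicode string is displayed when a whole list is printed, and it is not part of the word itself. The sketch below (run under Python 3, where no prefix appears) shows the difference between printing a list and printing its joined contents; the list is copied from the output in the question.

```python
# Stems copied from the output quoted in the question, as plain strings.
stems = ["I", "am", "learn", "python", "use", "pythonlearn", "be", "pythonli"]

# Printing the list shows each element's repr(), quotes included.
print(stems)

# Joining the elements prints the words themselves, with no quotes or prefixes.
sentence = " ".join(stems)
print(sentence)
```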
How can I comment the lines so quickly?
Hey sentdex, I'm working on a project, "Text To Speech Synthesis with Expression". Will you please help me out?
Hi, how can I apply stemming to Arabic Excel files?
the python jokes crack me up
I am getting an error:
stem() missing 1 required positional argument: 'word'
while executing it.
You need to instantiate a PorterStemmer object, as he did on line 3.
see this for details: stackoverflow.com/questions/53140392/stem-function-error-stem-required-one-positional-argument
How do I make this work on my CSV file?
can we implement a stemmer for an un-supported language in NLTK?
thank you a lot ;)
Why not lemmatize? It provides meaningful 'stems' (lemmas)
Why does it seem not to work with "pythonly"?
This has probably been said, or not. "onc" could be related to "oncology", etc.
thank
actively doing a python
Hi! My name's Steven and I'm from Colombia. First, your videos are very good and practical. Could you give me a hand with my task? It's about making a treebank using NLTK with Python. I don't understand how to make it. I have seen many videos on this point, but I can't do it.
I'm using Python 3.4 on Windows 7 and the text is: "La publicación de artículos científicos en Journals internacionales indexados es un valioso indicador de generación de nuevo conocimiento". I know how to tokenize, but I can't make a treebank with the text.
Thanks, I hope for an answer.
Best Regards:
Steven Serrato.
I want to do stemming for Indonesian. How?
4:36 pythonly pythoned!
from the example code...
for w in example_words:
    print(ps.stem(w))
this is not working if i use...
from nltk.stem import PorterStemmer as ps
without the...
ps = PorterStemmer()
why is that?
thanks in advance.
from nltk.stem import PorterStemmer
imports a class definition. However, it's not "usable" until you instantiate it.
ps = PorterStemmer()
instantiates the class. More detailed info here:
docs.python.org/3/tutorial/classes.html
By typing
from nltk.stem import PorterStemmer as ps
you're just assigning an alias to the PorterStemmer class definition, not instantiating it.
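The difference between an alias and an instance can be sketched like this, assuming NLTK is installed. The alias `PS` is chosen here for illustration. Calling `stem` on the class itself raises the "missing 1 required positional argument: 'word'" error that other comments report, because the string is consumed as `self`.

```python
from nltk.stem import PorterStemmer as PS  # PS is only another name for the class

# Calling stem() on the class: "pythoning" is taken as `self`, so `word` is missing.
try:
    PS.stem("pythoning")
    error = None
except TypeError as exc:
    error = exc
print("class call failed:", error)

# Calling stem() on an instance works as intended.
ps = PS()
stemmed = ps.stem("pythoning")
print(stemmed)
```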
SnowballStemmer can be better than PorterStemmer...
I learnt that WordNet replaces stemming nowadays.
Sir, you look like "Edward snowden"
So I've heard :P
Exactly
Traceback (most recent call last):
  File "C:\Users\Asma\AppData\Local\Programs\Python\Python38\senti-text-classifier.py", line 23, in <module>
    print(ps.stem(t))
TypeError: stem() missing 1 required positional argument: 'word'
Hmm, it is not a very efficient method, I would say. I know that the video might be pretty old and new methods might already exist, but stemming is a bit weird. Valuable information such as "important" and "once" is utterly destroyed by this stemming. I know that for the word Python it might help us a lot, but generally speaking, the sentence loses a lot of value.
stemming is a way to extract further information. It's not the first and only pre-processing step, it's just a pre-processing step. You still take account of tenses, ..etc, but you use stemming mainly for dictionary lookups.
He handles himself pythonly. Lol!
Pythonly.HAHAHAHAH :D
Haha! But I'm learning Turkish, and to me it means something like "like Python" or "how Python does it".
use snowball, that would be more accurate
Lemmatization is better than stemming
pythonely
The way he says "tokenize" sounds like a stoner! 😆😎
Oncology.
you haven't answered me yet. will you marry me???? :P
Zero explanation of why pythonly returned pythonli and zero dislike did not go well together so...
Thanks for explaining why you decided to dislike the video.
Thanks for choosing the less bitchy reply.
At least one of us has to try to be mature about things. I'm working on trying to still serve a positive response, even in the face of situations like one where someone offers free high quality tutorials, and, when something that is obvious to most isn't obvious to that one person, they decide to still dislike the entire video, rather than simply asking a question. Undoubtedly, this is still a work in progress. Good luck with your stuff.
Wait the reason it returned "pythonli" was obvious?
I disliked the whole video because 0 dislike (should) means perfect video which this video fails to be and btw not only because of this pythonli thing. You are just too slow. Your videos are(/seem to be) unprepared and too long for such little information. And very shallow. I kinda get a feeling that you read one tutorial and you are just voicing it.
You could have made 100 really good videos but instead you decided to do a lot more crap? That's what angers me. You could have been more valuable but you did not manage your time (and ours by making us watch 20% waiting, 30% coding 20% watch you not knowing something, 20% just unimportant talk, 10% real info) properly so. Maybe this is only true for NLTK series. I'll watch your other videos too and find out.
Anyways, I think your videos are worth some good value for some people so...
Hey Sentdex. Don't listen to the idiotic comments.
ps.stem(w)?? how do you use that ?