Stemming - Natural Language Processing With Python and NLTK p.3

sentdex

200K views

Published: Feb 5, 2025
Another form of data pre-processing in natural language processing is called "stemming."
This is the process of removing affixes from the ends of words.
We do this so that we do not need to store the meaning of every single tense of a word. For example:
Reader
Reading
Read
Aside from tense (and one of these is even a noun), they all share the meaning of their "root" stem (read).
This way, we store one single value for the root stem of "read." Then, when we wish to learn more, we can look at the affixes that were on the end: "ing" marks an ongoing action, "reader" is someone who reads, and plain "read" can be either past or present tense.
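A minimal sketch of the idea described above, assuming NLTK is installed. The word list is an assumption: it mixes the three words from the description with the made-up "python" words that come up in the comments. Note that a rule-based stemmer does not always collapse every form to the same stem, so it is worth printing the results rather than assuming them.

```python
from nltk.stem import PorterStemmer

# PorterStemmer is purely rule-based, so no corpus download is needed.
ps = PorterStemmer()

# Assumed word list: the description's examples plus the "python" family.
words = ["reader", "reading", "read", "python", "pythoning", "pythoned"]

# Map every word to its stem so the results can be inspected or reused.
stems = {w: ps.stem(w) for w in words}

for word, stem in stems.items():
    print(f"{word:>10} -> {stem}")
```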
sample code: pythonprogrammi...
hkinsley.com
/ sentdex
sentdex.com
seaofbtc.com

Comments • 116

@KarateLizard 8 years ago ⁺⁹
Your channel is one of the most informative I have watched. Keep at it!
@gulcheeracademy 3 years ago
ruclips.net/video/QNS_jDdmQtc/видео.html
@sadenb 8 years ago ⁺⁴
Thanks for this. It brought down a lot of worries regarding implementation. I thought I would use MATLAB, as I used to do image processing there. But since it doesn't have any NLP functions, this is good. Now I can focus on feature selection exclusively. Thanks mate.
@TJ-wo1xt 3 years ago ⁺¹
u combine humour and machine learning.....thanx sentdex.
@Finance-tips2.0 6 years ago ⁺²
why would someone dislike this very very useful video?
@sentdex 6 years ago ⁺⁴
some people just like to watch the world burn
@oriabnu1 6 years ago
brother i love your lectures
@zarghamkhan6039 4 years ago
OMG you got 1M Congo my man...keep doing the great work. :)
@syremusic_ 4 years ago
This is so so so so cool. Thank you sir.
@CSryand2m 8 years ago ⁺³⁰
I was waiting for "pythonic". So sad. :( Anyway, amazing tutorial! Can't wait for the rest!
@farhan119 7 years ago ⁺³
and pythonista
@jackleone4150 5 years ago
And pythoneer
@agentNirmites 5 years ago
and pythough
@rh2926 5 years ago ⁺³
"It is very important to import the importer" => After p.2 and p.3 => "It import import import"
@zarghamkhan6039 4 years ago
gr8 work man :)….hope you get 1M soon
@leloc9067 8 years ago ⁺⁵²
Ha ha, I see your face more than my girlfriend's face :)))
@VishwajeetPol 7 years ago ⁺¹⁰
404 , GF not found !!!
@heinzguderian9980 6 years ago
Is your girlfriend single?
@luqmanali4851 7 years ago
thanks for uploading such informative videos, i really enjoy the way you are making tutorials
@danwest9900 4 years ago ⁺³
sentdex: I appreciate your videos very much - thank you for posting them! Just curious why you're not a fan of dark theme for your IDE. It kills my eyes staring at all the white screen. LOL.
@lesserknownfacts7849 3 years ago
yeah, I wanted to ask the same question lol.
@maissainfo8565 6 years ago ⁺¹
i love your tuto .good job keep up
@shivGGG 9 years ago ⁺⁵
I have a question!
If I do it the way you show, the result is:
python
pythonli // check this one (pythonly)
python
python
And if I use the stemming library:
>>> from stemming.porter2 import stem
>>> for w in example_words:
...     print(stem(w))
Result:
python
python // different result than above
python
python
Should I go for nltk or stemming?
@gulcheeracademy 3 years ago
ruclips.net/video/QNS_jDdmQtc/видео.html
@shivGGG 3 years ago
@@gulcheeracademy so quick
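The difference asked about in this thread can be reproduced inside NLTK itself, since NLTK ships both the original Porter stemmer and the newer English Snowball ("Porter2") stemmer. This is a minimal sketch, assuming NLTK is installed; the contents of `example_words` are an assumption based on the words quoted in the comments.

```python
from nltk.stem import PorterStemmer, SnowballStemmer

porter = PorterStemmer()
snowball = SnowballStemmer("english")  # English Snowball, also known as Porter2

# Assumed word list, based on the words quoted in this thread.
example_words = ["python", "pythoner", "pythoning", "pythoned", "pythonly"]

# Collect (word, Porter stem, Snowball stem) triples for a side-by-side view.
comparison = [(w, porter.stem(w), snowball.stem(w)) for w in example_words]

for word, p_stem, s_stem in comparison:
    print(f"{word:>10}  porter={p_stem:<10} snowball={s_stem}")
```

The two stemmers apply different rule sets to the "-ly" ending, which is why "pythonly" comes out differently. Neither result is "wrong"; which one to prefer depends on how the stems are used downstream.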
@FatalMojo 9 years ago ⁺³
Stemming can be useful for basic tasks, but you lose a lot of the information when doing so, rendering more advanced NLP tasks pretty much impossible. But that's another discussion, thanks for the hands-on demonstration sentdex, subscribed to your account, there's a bunch of stuff I want to check out, keep up the good work!
@DanielOteroRobles 7 years ago
Great tutorial dude!!!!!!
@prakul619 9 years ago ⁺⁵
Hey Sentdex, thanks a ton for the tuts.
One question: The stemmer here has produced "import" for "important" which entirely changes the meaning. The information is lost here. How are we handling this?
@danielpitti5692 7 years ago ⁺⁴
Import actually is the root of Important, when not used to refer to the transfer of goods it is defined as "The meaning or significance of something."
@danielpitti5692 7 years ago
So really the issue is that stemming does not account for homonyms (From what I can tell)
@gulcheeracademy 3 years ago
Lemmatize is used to keep the real meaning. But in practice stemming is used.
@shauryakhurana1743 7 years ago
awesome tutorials!
@shrinivasiyengar5799 6 years ago
I observed something peculiar. For words ending in "-er" it doesn't always return the root form. For example, it returns "beginn" for "beginner" or "forgiv" for "forgiver", or, as in your example "python" for "pythoner", but returns "maker" for "maker" or "eater" for "eater" and so on for many other cases. Why does this happen? I do understand we don't always want to cut off the "-er", like for "father".
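One likely explanation for the pattern observed above: in the original Porter algorithm, the "-er" suffix is only stripped when the part that remains has a "measure" greater than 1, where the measure counts vowel-consonant sequences. The sketch below is a simplified, standalone illustration of that one condition, not NLTK's implementation; the helper names `is_consonant` and `measure` are made up for this example, and the real stemmer runs several other steps before this one.

```python
import re

VOWELS = "aeiou"

def is_consonant(word: str, i: int) -> bool:
    """Porter's definition: 'y' counts as a vowel only when it follows a consonant."""
    ch = word[i]
    if ch in VOWELS:
        return False
    if ch == "y":
        return i == 0 or not is_consonant(word, i - 1)
    return True

def measure(stem: str) -> int:
    """Count the vowel-consonant (VC) sequences in a stem (Porter's m)."""
    pattern = "".join("C" if is_consonant(stem, i) else "V" for i in range(len(stem)))
    collapsed = re.sub(r"(.)\1+", r"\1", pattern)  # e.g. "CVCCVC" -> "CVCVC"
    return collapsed.count("VC")

for word in ["beginner", "forgiver", "pythoner", "maker", "eater", "father"]:
    remainder = word[:-2]  # what would be left after dropping "-er"
    m = measure(remainder)
    verdict = "strip -er" if m > 1 else "keep -er"
    print(f"{word:>9}: m({remainder}) = {m} -> {verdict}")
```

Under this rule, short remainders such as "mak", "eat" and "fath" have a measure of 1, so the suffix stays; longer ones such as "beginn" and "python" have a measure of 2, so it is removed.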
@shivsambchonde3333 7 years ago
i liked ur video its amazing.
@123ankit321 6 years ago
Doesn't it completely strip the text of meaning? Is that not gonna affect our analysis?
@sureshkhatri7321 5 years ago
very informative but how to resolve the problem of the word 'important' stemmed to 'import', 'once' to 'onc' etc.? is using another stemming algorithm a solution to the problem?
Thank you
@bingochipspass08 8 years ago ⁺¹
can anyone use nltk on other languages? in particular bengali...i can perform pos tagging for bengali but cant seem to do stemming..can it be done?,,,any ideas anyone?
@ruchiagrawal600 5 years ago
why does it not work with ["happi" , "Happier" , "Happiest" , "Happened" , "Happily"]
@Racers_Club 4 years ago
Why do we go for stemming?
@shivsambchonde3333 7 years ago
is stemming used to avoid the duplication of data in a database?
@TheGamerGuy201 7 years ago
I think that the only purpose it serves is to get rid of the different tenses of the same word, as they pretty much hold the same meaning during our analysis. This would help us to save a lot of space in the list of our words. However, I also think that it might make us lose some information as well. For instance, it converted "important" to "import".
@s1th626 6 years ago
its like taking the common part in all the words?
@kyomdonalddogo5775 4 years ago
Hey Sentdex love your videos. Please I want to know if it is possible to develop an app for nlp using english and another language, that is not yet supported by any translator. Thank you
@theCanadian808 8 years ago
Hi, do you know how I would use stemming and lemmatization with swear words? I have tried it with many test cases and I have not had much success. Also abbreviations like u, ur, ure etc. Any help would be greatly appreciated
@RafiyaShaikhrafiya 7 years ago ⁺¹
Could you specify in what order we perform stemming, tokenization and stopwords removal? kinda confused about it
@muhammadmuinmundzir9981 5 years ago
I think it's:
1. Tokenization
2. Stopwords
3. Stemming
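The order suggested above can be sketched as a small pipeline. This is only an illustration, assuming NLTK is installed: the regex tokenizer and the tiny `STOP_WORDS` set are stand-ins chosen so the example runs without downloading NLTK data, and `preprocess` is a made-up helper name. One reason to remove stop words before stemming is that stop-word lists contain unstemmed words, so they match the raw tokens more reliably.

```python
import re
from nltk.stem import PorterStemmer

# Tiny hand-written stop-word list, for illustration only (an assumption);
# NLTK's own list lives in nltk.corpus.stopwords and needs a one-time download.
STOP_WORDS = {"i", "am", "a", "an", "the", "is", "are", "of", "to", "and"}

ps = PorterStemmer()

def preprocess(text: str) -> list:
    tokens = re.findall(r"[a-z']+", text.lower())      # 1. tokenize (regex stand-in)
    kept = [t for t in tokens if t not in STOP_WORDS]  # 2. remove stop words
    return [ps.stem(t) for t in kept]                  # 3. stem what is left

print(preprocess("I am learning Python"))
```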
@28gakhar 8 years ago
I have a question:
sample_text = "I am learning python using pythonlearning being pythonly"
final_sentence = [ps.stem(w) for w in token]
Output: [u'I', u'am', u'learn', u'python', u'use', u'pythonlearn', u'be', u'pythonli']
Please let me know why there is an extra 'u' in all the words.
@Karan-ow4wl 8 years ago
I got correct answer. Try once again
@jaspindersingh7866 7 years ago
There is an issue with text encoding here. What is the default encoding setup in your system? Look at the documentation of the unicodedata package
@mariacamiladurangobarrera2821 4 years ago
How can I comment the lines so quickly?
@monismaqsood1982 6 years ago
hey sentdex i'm working on a project "Text To Speech Synthesis with Expression" will you please help me out
@bilalshakir1212 6 years ago
hi, how can i do stemming on arabic excel files
@makwelewishbert555 2 years ago
the python jokes crack me up
@nabhavlogs371 6 years ago
I am getting an error:
stem() missing 1 required positional argument: 'word'
while executing it.
@RamakrishnaAppicharla 6 years ago ⁺²
You need to instantiate a PorterStemmer object like he did on line 3.
see this for details: stackoverflow.com/questions/53140392/stem-function-error-stem-required-one-positional-argument
@muthupandynagendran 6 years ago
How do I do this on my csv file
@rajjad 7 years ago
can we implement a stemmer for an un-supported language in NLTK?
@jaddabliz 8 years ago
thank you a lot ;)
@fro4e 6 years ago
Why not lemmatize? It provides meaningful 'stems' (lemmas)
@charlesenglebert8226 7 years ago
why does it seem not to work with "pythonly" ?
@itsgoubie 7 years ago
This has probably been said, or not. "onc" could be related to "oncology", etc.
@СолодушкинСвятослав 2 years ago
thank
@0oj4m1t 5 years ago ⁺¹
actively doing a python
@eulenstevenserrato202 9 years ago
Hi!!!... My name's Steven and I'm from Colombia. First, your videos are very good and practical. Could you give me a hand with my task? It's about making a treebank using NLTK with Python. I don't understand how to make it, and I have seen many videos about this point, but I can't do it.
I'm using Python 3.4 on Windows 7 and the text is: "La publicación de artículos científicos en Journals internacionales indexados es un valioso indicador de generación de nuevo conocimiento". I know how to tokenize, but I can't make a treebank from the text.
Thanks, I hope for an answer.
Best Regards:
Steven Serrato.
@stevenroy5851 6 years ago
i wanna stemming Indonesia, how?
@danielniels22 3 years ago
4:36 pythonly pythoned!
@denissetiawan3645 7 years ago
from the example code...
for w in example_words:
    print(ps.stem(w))
this is not working if i use...
from nltk.stem import PorterStemmer as ps
without the...
ps = PorterStemmer()
why is that?
thanks in advance.
@DmitrijSchmidt 6 years ago ⁺¹
from nltk.stem import PorterStemmer
imports a class definition. however, it's not "usable" until you instantiate it
ps = PorterStemmer()
instantiates the class. more detailed info here
docs.python.org/3/tutorial/classes.html
by typing
from nltk.stem import PorterStemmer as ps
you're just assigning an alias to the PorterStemmer class definition, but not instantiating it
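The class-versus-instance point above can be checked directly. This is a minimal sketch, assuming NLTK is installed and Python 3; the alias name `ps_class` is made up for the example. Calling `stem` on the class itself passes the string as `self`, which leaves the `word` parameter unfilled.

```python
from nltk.stem import PorterStemmer
from nltk.stem import PorterStemmer as ps_class  # alias for the class, not an instance

try:
    # No instance exists here, so "pythoning" is bound to `self`
    # and the required `word` argument is missing.
    ps_class.stem("pythoning")
except TypeError as err:
    error_message = str(err)
    print("TypeError:", error_message)

ps = PorterStemmer()          # instantiate first
print(ps.stem("pythoning"))   # now `word` is filled correctly
```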
@joaomateusemilio5467 7 years ago ⁺³
SnowballStemmer can be better than PorterStemmer...
@cameron.stewart 6 years ago
I learnt that WordNet replaces stemming nowadays
@VishwajeetPol 7 years ago ⁺¹
Sir, you look like "Edward snowden"
@sentdex 7 years ago
So I've heard :P
@danusharumugam2723 5 years ago
Exactly
@shaikasmazabi1359 4 years ago ⁺¹
Traceback (most recent call last):
  File "C:\Users\Asma\AppData\Local\Programs\Python\Python38\senti-text-classifier.py", line 23, in <module>
    print(ps.stem(t))
TypeError: stem() missing 1 required positional argument: 'word'
@AdventureForgeStudio 7 years ago
hmmm it is not a very efficient method I would say. I know that the video might be pretty old and new methods might already be present but: Stemming is a bit weird. Valuable information such as "important" and "once" is utterly destroyed by this stemming :S. I know that for the word Python it might help us a lot but generally speaking, the sentence loses a lot of value.
@sentdex 7 years ago ⁺¹
stemming is a way to extract further information. It's not the first and only pre-processing step, it's just a pre-processing step. You still take account of tenses, ..etc, but you use stemming mainly for dictionary lookups.
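One way to read the reply above: the stem is used as a lookup key, while the original words are kept alongside it so nothing is thrown away. The following is a hedged sketch of that idea, assuming NLTK is installed; `build_stem_index` and the word list are made up for illustration and are not taken from the video.

```python
from collections import defaultdict
from nltk.stem import PorterStemmer

ps = PorterStemmer()

def build_stem_index(words):
    """Map each stem to the set of original words that produced it."""
    index = defaultdict(set)
    for w in words:
        index[ps.stem(w)].add(w)
    return index

# Assumed sample words, chosen from examples discussed in the comments.
index = build_stem_index(["important", "import", "imports", "once", "python", "pythoned"])

for stem, forms in sorted(index.items()):
    print(stem, "<-", sorted(forms))
```

With an index like this, "important" and "import" share a lookup key, but the original forms remain available when the distinction matters.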
@musiclover21187 8 years ago
He handles himself pythonly. Lol!
@karanbirchahal3268 9 years ago ⁺⁴
Pythonly.HAHAHAHAH :D
@araslanrus 5 years ago
Haha! But I'm learning Turkish and for me it means like: "like Python", "how do Python"
@SHASHANKRUSTAGII 6 years ago
use snowball, that would be more accurate
@danialmalik80 5 years ago
Lemmatization is better than stemming
@gmemetics 9 years ago
pythonely
@themaggattack 6 years ago ⁺³
The way he says "tokenize" sounds like a stoner! 😆😎
@fulliculli 8 years ago
Oncology.
@kiriakipoursaitidou2732 6 years ago
you haven't answered me yet. will you marry me???? :P
@Gioeufshi 8 years ago
Zero explanation of why pythonly returned pythonli and zero dislike did not go well together so...
@sentdex 8 years ago ⁺¹
Thanks for explaining why you decided to dislike the video.
@Gioeufshi 8 years ago
Thanks for choosing the less bitchy reply.
@sentdex 8 years ago ⁺⁵
At least one of us has to try to be mature about things. I'm working on trying to still serve a positive response, even in the face of situations like one where someone offers free high quality tutorials, and, when something that is obvious to most isn't obvious to that one person, they decide to still dislike the entire video, rather than simply asking a question. Undoubtedly, this is still a work in progress. Good luck with your stuff.
@Gioeufshi 8 years ago
Wait the reason it returned "pythonli" was obvious?
I disliked the whole video because 0 dislike (should) means perfect video which this video fails to be and btw not only because of this pythonli thing. You are just too slow. Your videos are(/seem to be) unprepared and too long for such little information. And very shallow. I kinda get a feeling that you read one tutorial and you are just voicing it.
You could have made 100 really good videos but instead you decided to do a lot more crap? That's what angers me. You could have been more valuable but you did not manage your time (and ours by making us watch 20% waiting, 30% coding 20% watch you not knowing something, 20% just unimportant talk, 10% real info) properly so. Maybe this is only true for NLTK series. I'll watch your other videos too and find out.
Anyways, I think your videos are worth some good value for some people so...
@Karan-ow4wl 8 years ago
Hey Sentdex. Dont listen to the idiotic comments.
@piyushraj9561 6 years ago ⁺¹
ps.stem(w)?? how do you use that ?
