Analyzing Text Data with R on Windows
- Published: Oct 2, 2024
- Provides an introduction to text mining with R on a Windows computer. Text analytics topics covered include:
reading a txt or csv file
cleaning text data
creating a term-document matrix
making word clouds and bar plots.
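The four steps listed above can be sketched roughly as follows. This is a minimal illustration, not the code from the video: it assumes the tm package (whose tm_map and TermDocumentMatrix functions come up in the comments), and the sample sentences, the temp file and the object names are made up for the example.

```r
# Minimal sketch of the listed steps using the tm package.
# The sample text and object names are hypothetical.
library(tm)

# 1. Read a text file (written to a temp file here so the example is self-contained)
path <- tempfile(fileext = ".txt")
writeLines(c("The battery life is great, really great!",
             "Battery drains fast after 2 updates.",
             "Great screen, poor battery."), path)
lines <- readLines(path)

# 2. Build a corpus (one document per line) and clean it
corpus <- VCorpus(VectorSource(lines))
corpus <- tm_map(corpus, content_transformer(tolower))
corpus <- tm_map(corpus, removePunctuation)
corpus <- tm_map(corpus, removeNumbers)
corpus <- tm_map(corpus, removeWords, stopwords("english"))
corpus <- tm_map(corpus, stripWhitespace)

# 3. Term-document matrix: rows are terms, columns are documents
tdm <- TermDocumentMatrix(corpus)
m <- as.matrix(tdm)

# 4. Term frequencies, most frequent first, and a bar plot
termFrequency <- sort(rowSums(m), decreasing = TRUE)
barplot(termFrequency, las = 2, col = rainbow(20))
```

For a word cloud, the separate wordcloud package could take names(termFrequency) as the words and termFrequency as the frequencies. For a csv file, read.csv would replace readLines, with the text column passed to VectorSource.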
R is a free software environment for statistical computing and graphics, and is widely used in both academia and industry. R runs on both Windows and macOS. It was ranked no. 1 in a KDnuggets poll on top languages for analytics, data mining, and data science. RStudio is a popular, user-friendly environment for R.
Excellent Video Sir,
Very useful, thank you Professor.
+Zhiyou Pang 👍👍👍
Please, Sir, make a video on building a Convolutional Recurrent Neural Network model for text recognition.
Thanks for the suggestion, I'm adding it to my list.
Thanks for this Gr8 and simple video.
I have 200k rows in my dataset. The 1st and 2nd columns contain sentences, and I have to compute the cosine similarity between columns 1 and 2 and store it in a 3rd column.
Example: 1st column: who is ramesh, 2nd column: I'm not a ramesh singh, 3rd column: 0.70 (the cosine similarity value).
How should I approach this problem?
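One possible starting point for the question above is a plain bag-of-words cosine similarity in base R. This is only a sketch under stated assumptions: the function names tokenize and cosine_similarity, the column names and the second sample row are invented for illustration, and the scores depend on how sentences are turned into vectors, so they will not necessarily match the 0.70 quoted in the question.

```r
# Bag-of-words cosine similarity between two sentences (base R only).
# Function and column names are hypothetical.
tokenize <- function(s) {
  words <- unlist(strsplit(tolower(s), "[^a-z0-9']+"))
  words[nzchar(words)]                      # drop empty strings
}

cosine_similarity <- function(a, b) {
  ta <- tokenize(a)
  tb <- tokenize(b)
  vocab <- union(ta, tb)                    # shared vocabulary of both sentences
  va <- as.numeric(table(factor(ta, levels = vocab)))
  vb <- as.numeric(table(factor(tb, levels = vocab)))
  denom <- sqrt(sum(va^2)) * sqrt(sum(vb^2))
  if (denom == 0) return(NA_real_)          # at least one sentence has no tokens
  sum(va * vb) / denom
}

df <- data.frame(col1 = c("who is ramesh", "good morning"),
                 col2 = c("I'm not a ramesh singh", "good morning"),
                 stringsAsFactors = FALSE)

# Row-wise similarity stored in a third column
df$similarity <- mapply(cosine_similarity, df$col1, df$col2, USE.NAMES = FALSE)
```

Weighting schemes such as TF-IDF, or removing stop words first, would change the values.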
You're such an excellent and clear teacher ! thank you.
Question: how do you deal with names? First names and last names are split into 2 different words. How can I merge them into one so that the bar plot does not show them separately?
could you please share the code files and data file
email?
In the bar plot some words run outside the plot area. I tried cex.axis, but it doesn't fix it. I also tried axis(1, cex.axis = 0.5), but it still cuts off some letters of the words. So is it an RStudio problem, or is there a way to fix this?
Try las = 2
I tried that, but it doesn't work. It cuts off one or two letters of some long words. It even did so in your code at 14:16. Is there a way we could fix it?
And thank you for considering my doubt ❤️ you are amazing
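Clipped labels like the ones described in this thread are often a margin issue rather than an RStudio problem. A minimal sketch, assuming made-up frequencies: widen the bottom margin with par(mar = ...) before plotting, and shrink the bar labels with cex.names (in barplot, cex.axis only affects the numeric axis).

```r
# Long bar labels get clipped when the bottom margin is too small.
# The words and counts below are hypothetical.
termFrequency <- c(communication = 12, infrastructure = 9,
                   recommendation = 7, price = 5)

old_par <- par(mar = c(9, 4, 2, 1))   # bottom, left, top, right margins, in lines
barplot(termFrequency, las = 2, cex.names = 0.8, col = rainbow(20))
par(old_par)                          # restore the previous margins
```

The bottom margin value of 9 is a guess; longer words may need a larger value.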
Dear Dr. Rai,
Thank you for another excellent tutorial. I have gained many skills from your tutorials.
While running the code dtm
You can try just this:
dtm
Can you make a video on tokenization in R?
Yes, hopefully soon.
Hello sir, I have followed the same process and want to make a Sankey and node diagram, but I am getting an error. Can you help me out in making the plot?
Dear professor, first of all I want to say a big thanks to you for all the videos you have posted here. I need a small favor from you. I want "text categorization" code in R using NLP. I found Python code for text categorization but nothing in R, and then I remembered you. Please help me with the code, sir. If you can help me with the code as soon as possible, I will be very happy.
Related Python code : aqibsaeed.github.io/2016-07-26-text-classification/
Thanks for your feedback! I'll keep this for future due to time constraint.
Thanks for your great channel. I am wondering, could you please teach us about regular expressions (regex)? For example, how to search for questions in a text file and save them in other formats like CSV.
Thanks, I've added this to my list.
Hi
I am a little lost. My question is how to find the data that you mention you downloaded from the codes website. Can you please help me by explaining how to obtain it?
thanks
For steps to get data directly from Twitter, you can use this link:
ruclips.net/video/QETCjkQ3CBw/видео.html
Hello Bharat,
Greetings for the day!
First, I need to congratulate you on your efforts in creating these videos.
Thank you very much.
I need your help.
I was trying to use "TermDocumentMatrix" but I am facing an error. The error is below, FYI.
"Error in simple_triplet_matrix(i, j, v, nrow = length(terms), ncol = length(corpus), : 'i, j' invalid"
Hope you will help me in this.
Best Regards,
Murali
Hey Bharat,
Got the solution.
Once again thanks for sharing the knowledge.
Regards,
Murali
That's great!
Sometimes R doesn't return a corpus. Use:
cleantxt
Sir, please make a video on tweet polarity, ggplot, and maps showing where tweets originate.
Try this link it may help with some questions that you may have:
ruclips.net/video/otoXeVPhT7Q/видео.html
while executing code
dtm
I've sent you the code file. You can review that.
I ran into the same problem. How did you solve it?
Hi Sir,
Thank you very much. It's a great tutorial.
I have two questions:
1) How do I fix a misspelled word in the corpus and replace it with the correct word?
2) Is R able to handle it if I have 5 lakh (500,000) comments to analyse?
1) You will have to indicate in the code which mistake should be replaced by which word. This video includes an example of how to replace words with new ones.
2) 5 lakh comments can be handled very easily in R.
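The idea in answer 1, spelling out which mistake maps to which word, might look like the sketch below. It is not the replacement code from the video: the corrections vector, the fix_spelling helper and the sample comments are hypothetical, and it uses base R's gsub with word boundaries so that only whole words are replaced.

```r
# Map each known misspelling to its correction (hypothetical examples).
corrections <- c(recieve = "receive", teh = "the", servce = "service")

fix_spelling <- function(text, corrections) {
  for (wrong in names(corrections)) {
    # \\b marks a word boundary, so "teh" inside a longer word is left alone
    pattern <- paste0("\\b", wrong, "\\b")
    text <- gsub(pattern, corrections[[wrong]], text, perl = TRUE)
  }
  text
}

comments <- c("did not recieve teh order", "servce was fine")
fixed <- fix_spelling(comments, corrections)
```

With a tm corpus, a helper like this could be applied through tm_map wrapped in content_transformer, for example tm_map(corpus, content_transformer(fix_spelling), corrections).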
thank you sir
Hello sir,
I am getting the following warning message, please assist:
Warning message:
In tm_map.SimpleCorpus(corpus, tolower) : transformation drops documents
In R warning messages are ok. It's not an error.
@@bkrai I have the same problem
💐💐👌👌
Thanks!
At the last moment, when I run the last code I received the following error:
Error in if (grepl(tails, words[i])) ht
Please share the link to download text file and share the code as well
Here is the link to data file:
sites.google.com/site/raibharatendra/home/text-analytics
sir i am getting : Error in barplot(termFrequency, las = 2, col = rainbow(20)) :
object 'termFrequency' not found
Check code or spelling of 'termFrequency' in the previous lines.
Thank you so much for your prompt response. Do you have any other video about Data analysis using R?
You can find a wide range of data analysis using R topics from my following playlist:
ruclips.net/p/PL34t5iLfZddv8tJkZboegN6tmyh2-zr_T
For some advanced classification and prediction methods in R you can use this:
ruclips.net/p/PL34t5iLfZddu8M0jd7pjSVUjvjBOBdYZ1
Bharatendra Rai
why dont u add the files here?
R and data files are available with this one:
ruclips.net/video/otoXeVPhT7Q/видео.html
Any idea how to remove emoticons and smileys from the reviews using the tm_map() function?
Dear Sir,
I am getting the following error. could you please check. Thanks
> cleanset dtm
For the following line:
dtm
Hello Sir, I am getting this error:
dtm
There is probably an issue in the earlier commands. I would suggest looking at the lines before this one and running them again.
Error in simple_triplet_matrix(i, j, v, nrow = length(terms), ncol = length(corpus), :
'i, j' invalid
I am getting this error. I also checked the syntax, and everything matches the video.
I am facing the same error. Did you get a fix for this?
Sometimes R doesn't return a corpus. Use:
cleantxt
Thank you sir for a great tutorial
Thanks for comments!
Sir, may I ask: what if the text file contains special characters, like (" \ /)? I tried the suggested commands, but they don't seem to work properly.
Read about gsub() and regex patterns, and use them like this:
------------------------------------------------------------------------------------------------------------
replace
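The code in the reply above did not survive on this page, so the following is an independent sketch of the gsub() approach it points to, not a reconstruction of it. The helper name strip_special and the sample strings are hypothetical. It replaces every character that is neither alphanumeric nor whitespace with a space, then collapses repeated spaces.

```r
# Remove special characters such as quotes, backslashes and slashes (base R).
strip_special <- function(x) {
  x <- gsub("[^[:alnum:][:space:]]", " ", x)  # anything not a letter, digit or space
  x <- gsub("[[:space:]]+", " ", x)           # collapse runs of whitespace
  trimws(x)
}

raw <- c('He said "yes" \\ no / maybe', "path/to\\file")
clean <- strip_special(raw)
```

Replacing with a space rather than an empty string keeps words such as "path/to" from being glued together. With tm, the same function could be passed to tm_map through content_transformer.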
Hi Sir, are the terms "creating a corpus" and "tokens (tokenization)" one and the same?
A corpus is a collection of documents. For example, if you are analyzing 1000 tweets, each tweet may be treated as a document, and by creating a corpus you are creating a collection of 1000 documents. A token, by contrast, is a single word or a group of words within a tweet.
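The distinction can be illustrated with a small base R sketch. The two sample tweets and the object names are made up, and a plain character vector stands in for a corpus here; a tm corpus would wrap the same documents with extra metadata.

```r
# Corpus: a collection of documents (here, one element per tweet).
tweets <- c("R makes text mining easy", "Text mining with R is fun")
corpus <- tweets

# Tokens: the individual words within each document.
tokens <- strsplit(tolower(corpus), "[^a-z]+")

# A token can also be a group of words, e.g. pairs of adjacent words (bigrams).
bigrams <- lapply(tokens, function(w) {
  if (length(w) < 2) return(character(0))
  paste(head(w, -1), tail(w, -1))
})
```

Here the corpus has 2 documents, while tokens holds one word vector per document.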
Sir, please send me the .txt file and R code.
email id?
@@bkrai milan.majumder@outlook.com
all set.
thank you sir
Good but very basic and old technique in text analytics...
You can get related and more recent ones from here :
ruclips.net/p/PL34t5iLfZddt0tt5GdDy3ny6X5RQvwrp6
when executing this line
dtm
Are you using Mac or a Windows computer?
Windows OS, sir. I want to work with the sentiment package, but I can't install it. I have installed the sentimentr package. Please send me the link for sentiment analysis and finding word polarity, sir.
Please reply, sir. I am using Windows and I have to work with the sentiment package, but I am unable to install it. Please help me.
Digvijay kumar
Could you share the link for the data set file used in this video, please?
Here is the link:
sites.google.com/site/raibharatendra/home/text-analytics
Hi sir, please kindly share the R file and data.
all set.
I am getting this error....how to solve this on running dtm
Error in simple_triplet_matrix(i, j, v, nrow = length(terms), ncol = length(corpus), :
'i, j' invalid
There is probably a syntax error.
@@bkrai Hi Sir, this video of yours is very simple and helpful. I followed the same steps but am getting the same error as above. Please suggest the right way to resolve it at ashishs80@gmail.com
@@bkrai Sir, I am getting this same error. Please help. I can send you the code by email if you can provide your address.
Sir, can you give the code for the above?
email id?
akkimalhotra26@gmail.com
best regards