Analyzing Text Data with R on Windows

  • Published: Oct 2, 2024
  • Provides an introduction to text mining with R on a Windows computer. Text analytics topics include:
    reading a txt or csv file
    cleaning text data
    creating a term-document matrix
    making word clouds and bar plots.
    R is a free software environment for statistical computing and graphics, and is widely used in both academia and industry. R runs on both Windows and macOS. It was ranked no. 1 in a KDnuggets poll on top languages for analytics, data mining, and data science. RStudio is a user-friendly environment for R that has become popular.

Comments • 221

  • @NewITWorld
    @NewITWorld 6 years ago +1

    Excellent video, Sir.

  • @zhiyoupang5916
    @zhiyoupang5916 9 years ago +1

    Very useful, thank you Professor.

    • @bkrai
      @bkrai  8 years ago

      +Zhiyou Pang 👍👍👍

  • @sk93359
    @sk93359 4 years ago +1

    Please, Sir, make a video on building a Convolutional Recurrent Neural Network model for text recognition.

    • @bkrai
      @bkrai  4 years ago +1

      Thanks for the suggestion, I'm adding it to my list.

  • @devawratvidhate9093
    @devawratvidhate9093 5 years ago +1

    Thanks for this great and simple video.
    I have 200k rows in my dataset. The 1st and 2nd columns contain sentences, and I have to compute the cosine similarity between columns 1 and 2 and put it into a 3rd column.
    Example: 1st column: "who is ramesh", 2nd column: "I'm not a ramesh singh", 3rd column: 0.70 (which is their cosine value).
    How should I approach this problem?
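
A minimal sketch of one possible approach to the question above, assuming a plain bag-of-words representation (word counts) and base R only. The data frame `df`, its column names, and the helper `cosine_similarity` are hypothetical. The 0.70 in the comment is the commenter's illustration, and this sketch is not expected to reproduce it.

```r
# Cosine similarity between two sentences, based on simple word counts.
cosine_similarity <- function(a, b) {
  tokenize <- function(x) {
    x <- tolower(x)
    x <- gsub("[^a-z0-9' ]", " ", x)       # keep letters, digits, apostrophes
    words <- strsplit(x, "\\s+")[[1]]
    words[words != ""]
  }
  wa <- tokenize(a)
  wb <- tokenize(b)
  vocab <- union(wa, wb)                    # shared vocabulary for both sentences
  va <- as.numeric(table(factor(wa, levels = vocab)))
  vb <- as.numeric(table(factor(wb, levels = vocab)))
  denom <- sqrt(sum(va^2)) * sqrt(sum(vb^2))
  if (denom == 0) return(NA_real_)          # at least one sentence had no words
  sum(va * vb) / denom
}

# Hypothetical two-column data; the third column is filled row by row.
df <- data.frame(
  col1 = c("who is ramesh", "text mining in R"),
  col2 = c("I'm not a ramesh singh", "text mining in R"),
  stringsAsFactors = FALSE
)
df$similarity <- mapply(cosine_similarity, df$col1, df$col2, USE.NAMES = FALSE)
```

For 200k rows, the same function can be applied with `mapply` as shown. A weighting such as TF-IDF would give different values than raw counts.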

  • @ericrichard4940
    @ericrichard4940 5 years ago +1

    You're such an excellent and clear teacher! Thank you.
    Question: how do you deal with names? First names and last names are split into 2 different words. How can I merge them into one so that the bar plot does not show them as separate words?

  • @rajatbathla1
    @rajatbathla1 4 years ago +1

    Could you please share the code files and data file?

    • @bkrai
      @bkrai  4 years ago

      email?

  • @NiteshKumar-tj7bc
    @NiteshKumar-tj7bc 5 years ago +1

    In the bar plot some words run outside the box. I tried cex.axis but it doesn't fix it. I also tried axis(1, cex.axis=0.5), but it still cuts off some letters of the words. So is it an RStudio problem, or is there a way to fix this?

    • @bkrai
      @bkrai  5 years ago

      Try las = 2

    • @NiteshKumar-tj7bc
      @NiteshKumar-tj7bc 5 years ago

      I tried that, but it doesn't work. It cuts off one or two letters of some long words. It even happened in your code at 14:16. Is there a way we could fix it?

    • @NiteshKumar-tj7bc
      @NiteshKumar-tj7bc 5 years ago

      And thank you for considering my doubt ❤️ You are amazing.
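
A possible way to address the clipped labels discussed in this thread, sketched under the assumption that the labels are cut off because the bottom plot margin is too small. The `termFrequency` values below are invented stand-ins for the vector used in the video. In base R's `barplot`, `cex.names` scales the bar labels, while `cex.axis` only affects the numeric axis.

```r
# Hypothetical term counts standing in for the termFrequency vector from the video.
termFrequency <- c(analytics = 12, visualization = 9,
                   documentation = 7, preprocessing = 5, wordcloud = 4)

# Enlarge the bottom margin (first value, in lines of text) so that long
# vertical labels have room; the default bottom margin is about 5 lines.
old_par <- par(mar = c(9, 4, 2, 1))

# las = 2 rotates the labels; cex.names (not cex.axis) shrinks the bar labels.
mids <- barplot(termFrequency, las = 2, cex.names = 0.8,
                col = rainbow(length(termFrequency)))

# Restore the previous graphics settings.
par(old_par)
```

If labels are still clipped, increasing the first value of `mar` further or enlarging the Plots pane in RStudio may help.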

  • @ArjunCPArjun
    @ArjunCPArjun 4 years ago +1

    Dear Dr. Rai,
    Thank you for another excellent tutorial. I have gained many skills from your tutorials.
    While running the code dtm

    • @bkrai
      @bkrai  4 years ago

      You can try just this:
      dtm

  • @DnyaneshwarPanchaldsp
    @DnyaneshwarPanchaldsp 3 years ago +1

    Can you make a video on tokenization in R?

    • @bkrai
      @bkrai  3 years ago

      Yes, hopefully soon.

  • @shraykumar7366
    @shraykumar7366 1 year ago

    Hello Sir, I followed the same process and want to make a Sankey and node diagram, but I am getting an error. Can you help me make the plot?

  • @VenkateshDataScientist
    @VenkateshDataScientist 7 years ago

    Dear Professor, first of all I want to say a big thank you for all the videos you have posted here. I need a little help from you. I want "text categorization" code in R using NLP. I found Python code for text categorization but none in R, and then I remembered you. Please help me with the code, Sir. If you can help me with the code as soon as possible, I will be very happy.
    Related Python code: aqibsaeed.github.io/2016-07-26-text-classification/

    • @bkrai
      @bkrai  7 years ago

      Thanks for your feedback! I'll keep this for the future due to time constraints.

  • @Didanihaaaa
    @Didanihaaaa 5 years ago +1

    Thanks for your great channel. I am wondering, could you please teach us about the regex library? For example, how to search for questions in a text file and save them in other formats such as CSV.

    • @bkrai
      @bkrai  5 years ago +1

      Thanks, I've added this to my list.

  • @mdabusayeed6108
    @mdabusayeed6108 6 years ago +1

    Hi,
    I am a little lost. My question is how to find the data that you mention you downloaded from the codes website. Can you please help me by explaining how to obtain it?
    Thanks

    • @bkrai
      @bkrai  6 years ago

      For steps to get data directly from Twitter, you can use this link:
      ruclips.net/video/QETCjkQ3CBw/видео.html

  • @muralidhara2063
    @muralidhara2063 6 years ago +1

    Hello Bharat,
    Greetings for the day!
    First, I need to congratulate you on your efforts in creating these videos.
    Thank you very much.
    I need your help.
    I was trying to use "TermDocumentMatrix" but I am facing an error. Below is the error, FYI.
    "Error in simple_triplet_matrix(i, j, v, nrow = length(terms), ncol = length(corpus), : 'i, j' invalid"
    I hope you will help me with this.
    Best Regards,
    Murali
    Hey Bharat,
    Got the solution.
    Once again, thanks for sharing the knowledge.
    Regards,
    Murali

    • @bkrai
      @bkrai  6 years ago

      That's great!

    • @NiteshKumar-tj7bc
      @NiteshKumar-tj7bc 5 years ago

      Sometimes R doesn't return a corpus. Use:
      cleantxt

  • @KnowledgeADDA-n3c
    @KnowledgeADDA-n3c 5 years ago +1

    Sir, please make a video on tweet polarity, ggplot, and maps of where tweets originate.

    • @bkrai
      @bkrai  5 years ago

      Try this link; it may help with some of your questions:
      ruclips.net/video/otoXeVPhT7Q/видео.html

  • @MukeshKumar-mp5kc
    @MukeshKumar-mp5kc 7 years ago

    while executing code
    dtm

    • @bkrai
      @bkrai  7 years ago

      I've sent you the code file. You can review that.

    • @jasonyao3762
      @jasonyao3762 2 years ago

      I ran into the same problem. How did you solve it?

  • @kumarmithun2723
    @kumarmithun2723 6 years ago +1

    Hi Sir,
    Thank you very much. It's a great tutorial.
    I have two questions:
    1) How do I fix a spelling mistake in the corpus and replace it with the correct word?
    2) Can R handle it if I have 5 lakh (500,000) comments to analyze?

    • @bkrai
      @bkrai  6 years ago

      1) You will have to indicate in the code which misspelling should be replaced by which word. This video includes an example of how to replace words with new ones.
      2) 5 lakh comments can be handled very easily in R.
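
A rough sketch of the "say which mistake maps to which word" idea from the reply above, written in base R with `gsub`. The `corrections` lookup, the helper `fix_spelling`, and the sample comments are all hypothetical. The replacement code used in the video may look different.

```r
# Hypothetical lookup: names are misspellings, values are the corrected words.
corrections <- c(recieve = "receive", teh = "the", anlysis = "analysis")

# Replace each listed misspelling, matching whole words only (\\b = word boundary).
fix_spelling <- function(x, corrections) {
  for (wrong in names(corrections)) {
    pattern <- paste0("\\b", wrong, "\\b")
    x <- gsub(pattern, corrections[[wrong]], x, perl = TRUE)
  }
  x
}

comments <- c("teh package did not recieve updates", "text anlysis in R")
fixed <- fix_spelling(comments, corrections)
```

Because `gsub` is vectorized, the same call works on a character vector of several hundred thousand comments. Only the number of entries in `corrections` adds passes over the data.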

    • @kumarmithun2723
      @kumarmithun2723 6 years ago +1

      thank you sir

  • @goodyo1239
    @goodyo1239 6 years ago +1

    Hello Sir,
    I am getting the following error message, please assist:
    Warning message:
    In tm_map.SimpleCorpus(corpus, tolower) : transformation drops documents

    • @bkrai
      @bkrai  6 years ago

      In R, warning messages are OK. It's not an error.

    • @zahedarman6092
      @zahedarman6092 4 years ago

      @bkrai I have the same problem.
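
One way to check that the warning above is harmless is to compare the number of documents before and after the transformation. This is a minimal sketch, assuming the tm package is installed. The sample texts are made up. Wrapping `tolower` in `content_transformer()` follows the idiom in the tm documentation and is not something shown in this thread. Whether the warning itself appears may depend on the installed tm version.

```r
library(tm)

# Made-up sample documents standing in for the text file from the video.
texts <- c("Text Mining with R",
           "R on a WINDOWS computer",
           "Word clouds and bar plots")
corpus <- Corpus(VectorSource(texts))
n_before <- length(corpus)

# Wrap the base function so tm treats it as a transformation of document content.
corpus <- tm_map(corpus, content_transformer(tolower))
n_after <- length(corpus)

# If no documents were dropped, the matrix has one column per original document.
dtm <- TermDocumentMatrix(corpus)
```

If `n_before` and `n_after` match and `dtm` has one column per document, nothing was lost and the warning can be ignored, as the reply says.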

  • @DnyaneshwarPanchaldsp
    @DnyaneshwarPanchaldsp 3 years ago +1

    💐💐👌👌

    • @bkrai
      @bkrai  3 years ago

      Thanks!

  • @zahedarman6092
    @zahedarman6092 4 years ago

    At the very end, when I ran the last line of code, I received the following error:
    Error in if (grepl(tails, words[i])) ht

  • @avinash80522
    @avinash80522 8 years ago

    Please share the link to download the text file, and share the code as well.

    • @bkrai
      @bkrai  8 years ago

      Here is the link to the data file:
      sites.google.com/site/raibharatendra/home/text-analytics

  • @randomslugger
    @randomslugger 5 years ago

    Sir, I am getting: Error in barplot(termFrequency, las = 2, col = rainbow(20)) :
    object 'termFrequency' not found

    • @bkrai
      @bkrai  5 years ago

      Check the code or the spelling of 'termFrequency' in the previous lines.

  • @matharbarghi
    @matharbarghi 7 years ago +1

    Thank you so much for your prompt response. Do you have any other videos about data analysis using R?

    • @bkrai
      @bkrai  7 years ago

      You can find a wide range of data analysis topics using R in the following playlist:
      ruclips.net/p/PL34t5iLfZddv8tJkZboegN6tmyh2-zr_T
      For some advanced classification and prediction methods in R, you can use this:
      ruclips.net/p/PL34t5iLfZddu8M0jd7pjSVUjvjBOBdYZ1

    • @didacekamana7295
      @didacekamana7295 7 years ago

      Bharatendra Rai

  • @alestassi
    @alestassi 4 years ago +1

    Why don't you add the files here?

    • @bkrai
      @bkrai  4 years ago

      The R and data files are available with this one:
      ruclips.net/video/otoXeVPhT7Q/видео.html

  • @tanmaygawade1068
    @tanmaygawade1068 3 years ago

    Any idea how to remove emoticons and smileys from the reviews using the tm_map() function?
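
A possible starting point for the question above, sketched in base R. It assumes that dropping every non-ASCII character is acceptable, which also removes accented letters and not just emoji. The helper `remove_emoji`, the short list of typed smileys, and the sample review are hypothetical.

```r
remove_emoji <- function(x) {
  # Drop every character that has no ASCII equivalent (emoji included).
  x <- iconv(x, from = "UTF-8", to = "ASCII", sub = "")
  # Drop a small, hypothetical set of typed smileys such as :) :-( ;) :D
  x <- gsub(":-?\\)|:-?\\(|;-?\\)|:-?D", "", x)
  # Collapse the whitespace left behind.
  trimws(gsub("\\s+", " ", x))
}

review <- "Great video \U0001F44D\U0001F44D thanks :) :-D"
cleaned <- remove_emoji(review)
```

With the tm package, a function like this is usually passed to `tm_map()` wrapped in `content_transformer()`. That usage is an assumption about the commenter's setup and is not shown in this thread.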

  • @akkimalhotra26
    @akkimalhotra26 7 years ago

    Dear Sir,
    I am getting the following error. Could you please check? Thanks.
    > cleanset dtm

    • @bkrai
      @bkrai  7 years ago

      For the following line:
      dtm

  • @harman5981
    @harman5981 7 years ago

    Hello Sir, I am getting this error:
    dtm

    • @bkrai
      @bkrai  7 years ago

      There may be an issue in the earlier commands. I would suggest looking at the lines before this one and running them again.

  • @randomslugger
    @randomslugger 5 years ago

    Error in simple_triplet_matrix(i, j, v, nrow = length(terms), ncol = length(corpus), :
    'i, j' invalid
    I am getting this error. I also checked the syntax, but everything matches the video.

    • @girish5783
      @girish5783 5 years ago

      I am facing the same error... did you find a fix for this?

    • @NiteshKumar-tj7bc
      @NiteshKumar-tj7bc 5 years ago

      Sometimes R doesn't return a corpus. Use:
      cleantxt

  • @anigov
    @anigov 6 years ago +1

    Thank you, Sir, for a great tutorial.

    • @bkrai
      @bkrai  6 years ago +1

      Thanks for the comments!

  • @bbbaaa7131
    @bbbaaa7131 5 years ago

    Sir, may I ask what to do if the text file contains special characters like ("" \ /)? I tried the suggested commands, but they don't seem to work properly.

    • @NiteshKumar-tj7bc
      @NiteshKumar-tj7bc 5 years ago

      Read about gsub() and regex patterns, and use them like this:
      ------------------------------------------------------------------------------------------------------------
      replace
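
The code in the reply above is cut off, so the following is only a sketch of the general `gsub` idea, not a reconstruction of it. It replaces double quotes, backslashes, and forward slashes with spaces. The helper name `strip_special` and the sample string are hypothetical. Note that a literal backslash has to be written as four backslashes inside an R string that is used as a regular expression.

```r
strip_special <- function(x) {
  # Character class containing a double quote, a backslash, and a forward slash.
  x <- gsub("[\"\\\\/]", " ", x, perl = TRUE)
  # Collapse repeated whitespace and trim the ends.
  trimws(gsub("\\s+", " ", x))
}

raw <- "path\\to/file \"quoted\""   # the text: path\to/file "quoted"
cleaned <- strip_special(raw)
```

Other characters can be added inside the square brackets as needed.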

  • @vishnukowndinya
    @vishnukowndinya 7 years ago

    Hi Sir, are the terms "creating a corpus" and "tokens (tokenization)" one and the same?

    • @bkrai
      @bkrai  7 years ago +1

      A corpus is a collection of documents. For example, if you are analyzing 1000 tweets, each tweet may be treated as a document, and by creating a corpus you are creating a collection of 1000 documents. In contrast, each word or group of words in a tweet can be made a token.
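
The distinction in the reply above can be illustrated with a small base R sketch. The three "tweets" are invented, and splitting on whitespace is only one simple way to tokenize.

```r
# Three short "tweets": each one is a document, and together they form the corpus.
tweets <- c("R makes text mining easy",
            "Word clouds are fun",
            "Bar plots show term frequency")
n_documents <- length(tweets)

# Tokenization: split each document into lowercase word tokens.
tokens <- strsplit(tolower(tweets), "\\s+")
tokens_per_document <- lengths(tokens)
```

Here the corpus has 3 documents, while `tokens` holds one vector of word tokens per document.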

  • @send2milan
    @send2milan 5 years ago +1

    Sir, please send me the .txt file and R code.

    • @bkrai
      @bkrai  5 years ago

      email id?

    • @send2milan
      @send2milan 5 years ago +1

      @bkrai milan.majumder@outlook.com

    • @bkrai
      @bkrai  5 years ago

      all set.

    • @send2milan
      @send2milan 5 years ago +1

      Thank you, Sir.

  • @anuragsharma5208
    @anuragsharma5208 7 years ago

    Good, but a very basic and old technique in text analytics...

    • @bkrai
      @bkrai  7 years ago

      You can find related and more recent ones here:
      ruclips.net/p/PL34t5iLfZddt0tt5GdDy3ny6X5RQvwrp6

  • @digvijaykumar2498
    @digvijaykumar2498 8 years ago

    when executing this line
    dtm

    • @bkrai
      @bkrai  8 years ago

      Are you using Mac or a Windows computer?

    • @digvijaykumar2498
      @digvijaykumar2498 8 years ago

      Windows OS, Sir. I want to work with the sentiment package but I can't install it. I have installed the sentimentr package. Please send me the link for sentiment analysis to find word polarity. Please, Sir.

    • @digvijaykumar2498
      @digvijaykumar2498 8 years ago

      Please, Sir, reply to me. I am using Windows OS and I have to work with the sentiment package, but I am unable to install it. Please help me.

    • @didacekamana7295
      @didacekamana7295 7 years ago

      Digvijay kumar

  • @piyushpahwa7897
    @piyushpahwa7897 8 years ago

    Link for the data set file used in this video, please?

    • @bkrai
      @bkrai  8 years ago

      Here is the link:
      sites.google.com/site/raibharatendra/home/text-analytics

  • @bharathjc4700
    @bharathjc4700 7 years ago

    Hi Sir, please kindly share the R file and data.

    • @bkrai
      @bkrai  7 years ago

      all set.

  • @harendrakumar7647
    @harendrakumar7647 7 years ago

    I am getting this error when running dtm. How do I solve it?
    Error in simple_triplet_matrix(i, j, v, nrow = length(terms), ncol = length(corpus), :
    'i, j' invalid

    • @bkrai
      @bkrai  7 years ago

      There may be a syntax error.

    • @ashishsingh6329
      @ashishsingh6329 5 years ago

      @bkrai Hi Sir, this video of yours is very simple and helpful. I followed the same steps but I am getting the same error as above. Please suggest the right way to resolve it, at ashishs80@gmail.com

    • @salilsaurov3663
      @salilsaurov3663 5 years ago

      @bkrai Sir, I am getting this same error, please help. I can send you the code by mail if you can provide me with your address, please.

  • @akkimalhotra26
    @akkimalhotra26 7 years ago

    Sir, can you give the code for the above?

    • @bkrai
      @bkrai  7 years ago

      email id?

    • @akkimalhotra26
      @akkimalhotra26 7 years ago

      akkimalhotra26@gmail.com
      best regards