Tutorial on topic modelling in R

  • Published: 31 Jan 2025

Comments • 81

  • @wanwantang9971 · 1 year ago +2

    Thank you so much! You would never know how your video saved my life!! So thank you for your clear instructions!!! Have a nice day~

  • @umermuhammad826 · 1 year ago +1

    Awesome video!! I learned so much from this. Truly grateful for the awesome resource. Thanks a lot.

  • @ShahanShawkat · 2 years ago +1

    Good day! Thanks a lot for uploading this video it was really helpful to learn topic modeling using R! All the best to you!!! 🙋🏻‍♂️

  • @amandacole2224 · 2 years ago +2

    Hi Opal, thanks for such a clear demonstration of the process. Now to implement it in my own LDA models.

  • @daynerobinson3551 · 2 years ago +1

    Excellent Tutorial!

  • @lprayaga1 · 2 years ago

    Awesome to make it so simple and easy for beginners as well

  • @kiroiitori79 · 2 years ago

    Hi! Thank you for the video! I've tried it out and it finally worked! If you have time would really love your tutorial on how to conduct topic modeling using STM!!

  • @viniciussilva2997 · 1 year ago

    Your videos are so great!

  • @shreyesshah1082 · 1 year ago

    Thank you so much, your videos helped me a lot!!!

  • @arithamindula5956 · 1 year ago

    Thank you for this video, it helped a lot.

  • @kenyabolt9549 · 3 years ago +1

    Lovely tutorial 💕

  • @researchideas3434 · 10 months ago

    Hi Opal, thank you for the great video. I was able to follow along and produce the bar charts; however, some columns have a dash instead of words. Can you help me filter out those dashes? Thanks.

  • @chrismeletakos2964 · 2 years ago

    This was great. Thank you. Can you please do a video on the Quanteda package?

  • @andreapearce1346 · 2 years ago

    Thanks for the tutorial, great video. When creating the topic model charts, is there a way to add topic labels rather than numbering them?

  • @stevydibala3928 · 2 years ago +1

    Dear, could you assist with choosing the optimal number of topics (k)? Thanks

    • @DataCentricInc · 2 years ago

      There are several schools of thought on selecting k. I do not have a tutorial on it at the moment; however, see the link below for different recommendations on how to approach this.
      stackoverflow.com/questions/17421887/how-to-determine-the-number-of-topics-for-lda
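A hedged sketch of one data-driven way to compare candidate values of k, assuming the `ldatuning` package and an existing document-term matrix named `DTM` (these names are assumptions, not the tutorial's own code):

```r
# Sketch: score candidate topic counts with the ldatuning package.
# `DTM` is assumed to be a tm DocumentTermMatrix built earlier.
library(ldatuning)

scores <- FindTopicsNumber(
  DTM,
  topics  = seq(2, 10),                      # candidate values of k
  metrics = c("CaoJuan2009", "Deveaud2014"), # minimize / maximize, respectively
  method  = "Gibbs",
  control = list(seed = 1234)
)

# Inspect the curves: a plausible k has low CaoJuan2009 and high Deveaud2014.
FindTopicsNumber_plot(scores)
```

The metrics only suggest a range; the final choice of k is still a judgment call about interpretability.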

  • @mollymurphey4526 · 7 months ago

    What would I do if I have my text data as lines in a csv file?
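One way to handle this, as a sketch only (the file name `comments.csv` and the column name `text` are assumptions), is to read the CSV and treat each row as one document:

```r
# Sketch: turn rows of a CSV into a tm corpus, one document per row.
library(tm)

df <- read.csv("comments.csv", stringsAsFactors = FALSE)  # assumed file name
corpus <- Corpus(VectorSource(df$text))                   # assumed column name

# The usual cleaning and DocumentTermMatrix steps follow from here.
```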

  • @syedabdullah-jk6oe · 1 year ago

    Hello, respected Mrs./Ms., quite an informational video. I had a question: if we were to run LDA on an abstract corpus accessed in .csv format from academic databases such as Scopus or WoS, how can we run it? Could you please demonstrate it?

  • @stevydibala3928 · 2 years ago +1

    Hello, your video is very useful. I'd like to use topic modeling to understand the importance of the concept "climate justice" in social questions in companies' documents. If "climate justice" is a term, what should the topics be in this study?

    • @DataCentricInc · 2 years ago

      Hi Stevy Dibala, thank you for your feedback. If you are using topic modeling, the topics will emerge from the analysis of the documents. If you watch this video to the end, I discuss the topics that emerge from the documents.

    • @stevydibala3928 · 2 years ago

      @DataCentricInc Wow, what fast feedback! I'll watch it to the end. Thanks.

    • @stevydibala3928 · 2 years ago

      For the various documents to analyse, I'm planning to get my documents about the use of AI from « Factiva ». Since I'll have a lot of documents, is it best to load into R only the URLs of all the documents found, or each single document?

    • @DataCentricInc · 2 years ago

      I would suggest you load the documents similarly to how I have loaded them in R. You can visit my webpage for the code; it is available in the description of the video.

  • @alancelaya3123 · 1 year ago

    Did anyone else have a problem applying the LDA model? I did, and I am still trying to figure out how to fix it...

  • @ShahanShawkat · 2 years ago

    Hi again! I just noticed there is a new package available in CRAN 'fastTopics' if you manage some time will you please make a video on how to do Topic Modeling using fastTopics! Thanks again for your kind efforts! 🤝

  • @Meena_HSK · 2 years ago

    Hi~ thank you for your video. Could you tell me why the topic number is k = 4?

    • @DataCentricInc · 2 years ago

      I chose how many topics I wanted to look for based on the number of PDF documents I had.

  • @briantheworld · 1 year ago

    Hello! I have a question: is there a way to implement LDA in other languages? I'm trying to apply it to Italian reviews from the web.

  • @tobip7631 · 1 year ago

    Hi, thanks for your video! I'm trying to implement LDA for customer reviews. Would that be the same process in R?

  • @finix-z3o · 2 years ago +1

    Great tutorial, Opal. Is there a scientific way of properly determining k, the number of topics?

    • @DataCentricInc · 2 years ago

      There are several schools of thought on selecting k. I do not have a tutorial on it at the moment; however, see the link below for different recommendations on how to approach this.
      stackoverflow.com/questions/17421887/how-to-determine-the-number-of-topics-for-lda

    • @finix-z3o · 2 years ago

      Thanks Opal

  • @audukafwa1555 · 2 years ago +1

    Hi Opal, thank you for the nice work you are doing. Your code is simple and easy to work with.
    However, when plotting the chart for beta, I got this error: Error in `geom_col()`:
    ! `mapping` must be created by `aes()`. Please help. Thank you

    • @DataCentricInc · 2 years ago +1

      Hi Audu, you can access the full code from the link in my video description.

  • @Taika_ · 3 years ago +2

    Hi Opal, thank you for creating this content. I'm learning how to implement LDA on my dataset which consists of social media posts. Should I remove hashtags in the cleaning process or would those hashtags actually contribute to the resulting topics?

    • @DataCentricInc · 3 years ago +1

      I would include the hashtags as they contribute to the overall pattern of the topics.

  • @stevydibala3928 · 2 years ago

    Dear, I was trying LDA with my PDF documents, and the beta result is coming out with numbers rather than words in the term column. For information, the document had previously been cleaned.
    topic term beta

    • @DataCentricInc · 2 years ago

      Hi Stevy, Did you remove numbers when you were cleaning the document?

  • @finix-z3o · 2 years ago

    I used CSV Twitter data and followed the tutorial; however, when I run "beta_topics" I don't see terms in the "term" column, mine appear as numbers. Kindly assist me with this.

    • @DataCentricInc · 2 years ago

      Hi Banny,
      Did you remove numbers when cleaning the corpus? Refer to the code below to remove numbers.
      document <- tm_map(document, removeNumbers)

    • @finix-z3o · 2 years ago

      @DataCentricInc I did remove numbers and followed the code.

  • @ahmetkurtoglu6415 · 1 year ago

    How did you set the topic number as 4? Is it arbitrary?

  • @nejc8316 · 2 years ago

    Hi again, I also have one question: how do I add Slovenian stopwords in R? Do you maybe know? Thank you so much.

    • @DataCentricInc · 2 years ago

      You would have to specify the language for the stopwords; once the language is supported, you can add Slovenian stopwords. You can check out this link to one of my videos that speaks about the different languages as well. ruclips.net/video/oKTG5ulP3wQ/видео.html
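A sketch of what that can look like for Slovenian, assuming the `stopwords` package (its stopwords-iso source includes Slovenian, ISO code "sl"); this is an assumption, not the approach from the video:

```r
# Sketch: fetch Slovenian stop words and strip them from a tm corpus.
library(tm)
library(stopwords)

sl_stop <- stopwords::stopwords("sl", source = "stopwords-iso")

corpus <- Corpus(VectorSource(c("to je zelo dober primer besedila")))
corpus <- tm_map(corpus, content_transformer(tolower))  # lowercase first
corpus <- tm_map(corpus, removeWords, sl_stop)          # then drop stop words
```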

  • @wendyimamf · 2 years ago +1

    Hi Opal, great tutorials, thank you for sharing. Newbie here; this will enhance my knowledge of topic modeling. Just a quick, quite technical question about the DocumentTermMatrix function: my term column does not show any words, only numerical data in beta_topics. Could it come from the document matrix? I tried two different functions to create the DTM (1. TermDocumentMatrix and 2. DocumentTermMatrix), both with the same output. :(

    • @DataCentricInc · 2 years ago

      Hi Wendy Imam, you can try reducing the sparsity of the Document Term Matrix to show only terms that appear with a significant count. For example, in the code below I reduce the sparsity of the Document Term Matrix to only include terms that appear at least 100 times. You can reduce the number. This should get rid of the numbers from your analysis. Hope this helps.
      DTM <- DTM[, findFreqTerms(DTM, lowfreq = 100)]

  • @punchpartea · 2 years ago +1

    Thank you so much for this video! This is a HUGE help, since I need to do topic modeling for my dissertation. I followed the code directly from the blog and was able to replicate both plots with my own data.
    I have a question about term frequency (tf-idf values) and removal of sparse terms: are these already incorporated in the model? Sorry if this is such an obvious question. I'm still trying to wrap my head around all the R text mining tutorials and papers. I find your tutorials the simplest to follow, so I figured I might go ahead and ask. Thank you again!

    • @DataCentricInc · 2 years ago +1

      You are welcome, Cep Garcia. No, I did not remove sparse terms in this tutorial. However, you can achieve this by including the following after you create the Document Term Matrix. The first line finds the most frequent terms (here, those appearing at least 100 times); you can play around with that number based on what you are trying to achieve. This will reduce the sparsity in the text. Please note that each time you change the number you must recreate the DTM and then reduce it.
      freq_terms <- findFreqTerms(DTM, lowfreq = 100)
      DTM <- DTM[, freq_terms]

    • @punchpartea · 2 years ago +1

      @DataCentricInc Thank you so much, Dr. Opal, it did help! I like how you said to "play around" with it. It is definitely challenging, but also fun!
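For readers with the same question, a sketch (not the tutorial's own code) of two standard tm idioms for thinning a document-term matrix before fitting LDA; the `crude` example corpus that ships with tm stands in for real data:

```r
# Sketch: two common ways to thin a tm DocumentTermMatrix before LDA.
library(tm)

data("crude")  # 20-document example corpus shipped with tm
DTM <- DocumentTermMatrix(crude,
                          control = list(removePunctuation = TRUE,
                                         stopwords = TRUE))

# 1. Drop very sparse terms (keep terms present in enough documents).
DTM_dense <- removeSparseTerms(DTM, sparse = 0.8)

# 2. Keep only terms with a minimum overall frequency.
keep <- findFreqTerms(DTM, lowfreq = 5)
DTM_freq <- DTM[, keep]
```

tf-idf weighting, by contrast, is not part of LDA itself: LDA expects raw counts, so weighting is usually applied only for exploratory term ranking, not before fitting the model.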

  • @christianmoreno7060 · 2 years ago

    Hi Opal, wonderful video. I'm trying to determine whether an abstract or topic could be more common at one university than another. I have 90 abstracts, titles, and keywords from 4 universities. I want to know whether the abstracts have some relationship among them, or one based on the university where they were produced. For example, the topic modelling could group these abstracts into four groups: university 1, university 2... Sorry for my English, I'm not a native speaker.

  • @nejc8316 · 2 years ago

    Hi Opal, I have this trouble. When I want to run > DTM

    • @DataCentricInc · 2 years ago

      I have never observed that error, so I am not certain of the resolution; however, here is a link to a page discussing the Rcpp issue: support.rstudio.com/hc/en-us/articles/4415936301335-Resolving-Rcpp-precious-remove-error

  • @moonisshakeel6350 · 2 years ago

    Hi, first up, thank you for such a wonderful and extremely useful video.
    Now, my query is that I have 146 PDF files of research papers which I have merged together into a single file. Please help me to clean the file and to get a bigram analysis of it. Your help will be much appreciated, thank you.

    • @DataCentricInc · 2 years ago +1

      Hi Moonis, I will be working on content to show how to execute this, so look out for it.

    • @moonisshakeel6350 · 2 years ago

      @DataCentricInc thank you

  • @sahil_shrma · 3 years ago

    Hi, I liked your video. It is very useful, thank you so much.
    Could you help me with one more small thing?
    What code can I use to delete additional words from the corpus after I have applied stopwords("en")?

    • @DataCentricInc · 3 years ago +1

      Hi Sahil, thanks for your support. You can also watch this video, which shows how to create a custom list of stop words. The description of the video will also take you to the code on my webpage. ruclips.net/video/oKTG5ulP3wQ/видео.html

    • @sahil_shrma · 3 years ago

      @DataCentricInc Thank you so much!!

  • @zahidhussain-zf1iy · 2 years ago

    Most beautiful

  • @itumelengmosala5335 · 3 years ago +1

    I am finding your videos so instructive and helpful, but I am still having errors when I load documents:

    • @DataCentricInc · 3 years ago

      Hi itumeleng, what error are you getting and what exactly are you trying to do?

    • @itumelengmosala5335 · 3 years ago

      @DataCentricInc Thanks for replying. I am trying to follow your videos on text analytics on PDFs. The code is not loading them; the lapply function is giving an error.

    • @itumelengmosala5335 · 3 years ago

      I have an urgent academic paper I am writing on texts about land in the Bible and in South Africa. I have created a folder of PDF texts but cannot load it using the code you suggested.

    • @DataCentricInc · 3 years ago

      @itumelengmosala5335 Did you store the PDF files in the same folder you are working in for R? I really would need to see a screenshot of your RStudio and the code. If you want, you can email me at DataCentricInc@gmail.com

  • @ambarksaudi4051 · 2 years ago

    Hi, I am conducting text mining on academic articles and approved PDF docs from a government website, and I found some PDFs are protected. How can I perform text mining on them? Can we have a Zoom meeting to discuss this in more detail?
    Thanks

    • @DataCentricInc · 2 years ago +1

      By protected you mean that they have passwords?

    • @AsdEgyptAsdEgypt · 2 years ago

      @DataCentricInc yes, they have passwords

  • @harmandeepsingh8903 · 2 years ago

    Hi ma'am, will you create a video on PDF text analytics for specific keywords, for example "social responsibility" in a company annual report?
    Thanks

  • @ritunagpal6184 · 2 years ago

    Could you please identify the error?
    doc_gamma.df <- ...
    doc_gamma.df$chapter <- ...
    ggplot(data = doc_gamma.df, aes(x = chapter, y = gamma,
      group = factor(topic), color = factor(topic))) +
      geom_line() + facet_wrap(~factor(topic), ncol = 1)
    Error in FUN(X[[i]], ...) : object 'chapter' not found

    • @DataCentricInc · 2 years ago

      It means that the variable chapter does not exist. Ensure that the line that creates the variable was executed.
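A sketch of one way to create that column before plotting; the tidytext route here is an assumption, not necessarily the original approach, and it assumes `lda_model` is a fitted topicmodels::LDA object whose document names are numeric ids:

```r
# Sketch: rebuild the per-document gamma data frame so the `chapter`
# column exists before ggplot tries to map it. Assumes `lda_model` is
# a fitted topicmodels::LDA object with numeric document names.
library(topicmodels)
library(tidytext)
library(dplyr)
library(ggplot2)

doc_gamma.df <- tidy(lda_model, matrix = "gamma") %>%
  mutate(chapter = as.integer(document))  # this line creates 'chapter'

ggplot(doc_gamma.df, aes(x = chapter, y = gamma,
                         group = factor(topic), color = factor(topic))) +
  geom_line() +
  facet_wrap(~ factor(topic), ncol = 1)
```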

  • @nejc8316 · 2 years ago

    Hi, I find this video very useful. I have one question. In my research, I am analysing comments from social media, but I organised the data as follows, and I would be happy if you could help. In the Excel document, I have the authors of the comments written in one column and the content of the comments in the other column. So I have approx. 4,000 rows, and each row has two columns: one for the author and one for the comment. I had all of these comments separate in my document, but I wanted to combine them. I obtained each group of comments from individual web portals (e.g. Facebook posts, comments under articles, Reddit debates, ...) and combined all these documents of comments into two columns, so now all the comments are written in one column. That's my corpus (all the comments in one column now). Can I use LDA in R on this data set? Or do comment groups need to be separated into individual documents for the LDA method? I hope my question is clear, thank you so much.

    • @DataCentricInc · 2 years ago

      I believe you should be able to proceed with everything combined.