How to automatically extract major themes (topics) from your text data | Python | NLP

  • Published: 9 Feb 2025
  • #TopicModelling #Python #DataScience #LDA #Hands-on #Tutorial
    This video shows how to perform topic modelling in Python using LDA (Latent Dirichlet Allocation) from the Gensim library.
    GitHub: github.com/raa...
    Twitter: @dataraaga

Comments • 38

  • @ASHearn90
    @ASHearn90 3 years ago +4

    Seriously amazing. I've been doing DIY text analysis for 2 years, and this is hands down the best example I've seen. Thank you!!

  • @jessicap4527
    @jessicap4527 2 years ago +1

    I've been looking everywhere for this exact type of tutorial! Not only did you explain the needed libraries and processes, but ALSO how we can actually MODEL the data, all in one! You rock.
    P.S. I love you lol

  • @hh3739
    @hh3739 1 year ago

    This is a really good way of doing keyword clustering and text analysis.

  • @mohitagarwal3679
    @mohitagarwal3679 3 years ago

    Nice work, Raghav. Keep going!

  • @kannankarmegam-bne
    @kannankarmegam-bne 5 months ago +1

    You should share your journey/roadmap on how you started with Python.

    • @datasciencewithraghav
      @datasciencewithraghav 5 months ago

      Sure, I will work on that video. Thank you for your suggestion :)

  • @eggy7112
    @eggy7112 4 years ago +1

    Great video, the explanation is really clear, and thank you for posting the notebook on your GitHub too!

  • @KirillSimin
    @KirillSimin 4 years ago

    Thank you! This tutorial and the GitHub repo helped me with topic extraction. Very clear.

  • @bencampos8141
    @bencampos8141 4 years ago

    Thanks, Raghav, for the video. It was very informative and helpful.

  • @sinisterrblade449
    @sinisterrblade449 2 years ago

    Thanks for the explanation, but what software do you recommend?

  • @narijami
    @narijami 2 years ago

    Hey. Thank you for your nice video. I have a question. I have a course book in English with several chapters. Each chapter has a summary, but a few chapters do not have one. I want to create 20 short cast videos for this course book, and my goal is to create a title for each video from these summaries. How can I automate this process using NLP and Python? The point is that the keywords or titles should be unique for each video in each chapter and should not repeat any video title from other chapters. It would be great if you could help me.
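One possible approach, sketched under assumptions: score each summary's words with TF-IDF (so words shared by many summaries rank low) and build each title from the top-scoring words that no earlier title has already used, which keeps titles distinct across chapters. Note this is keyword extraction, not LDA; the stopword list, `words_per_title` and the sample summaries are hypothetical. Chapters without a summary could fall back to the chapter text itself.

```python
import math
import re
from collections import Counter

# Small illustrative stopword list; a real project would use a fuller one.
STOPWORDS = {"the", "and", "into", "through", "with", "from", "for", "this", "that"}


def tokenize(text):
    """Lower-case, keep alphabetic words longer than two letters, drop stopwords."""
    return [w for w in re.findall(r"[a-z]+", text.lower())
            if len(w) > 2 and w not in STOPWORDS]


def unique_keyword_titles(summaries, words_per_title=3):
    """Build one title per summary from its highest TF-IDF words, never reusing a word."""
    docs = [tokenize(s) for s in summaries]
    doc_freq = Counter(word for doc in docs for word in set(doc))
    used = set()
    titles = []
    for doc in docs:
        counts = Counter(doc)
        scores = {w: (c / len(doc)) * math.log(len(docs) / doc_freq[w])
                  for w, c in counts.items()}
        # Highest score first; ties broken alphabetically for stable output.
        ranked = sorted(scores, key=lambda w: (-scores[w], w))
        picked = [w for w in ranked if w not in used][:words_per_title]
        used.update(picked)
        titles.append(" ".join(picked).title())
    return titles


# Hypothetical chapter summaries.
summaries = [
    "Photosynthesis converts light energy into chemical energy in plants.",
    "Cellular respiration releases chemical energy stored in glucose.",
    "Plants transport water through xylem and sugars through phloem.",
]
for title in unique_keyword_titles(summaries):
    print(title)
```

The raw keyword titles would still need light human editing to read naturally.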

  • @derektathgur
    @derektathgur 3 years ago

    How could I integrate this with network theory, where each row of the dataframe has an author/publication/format/month-year? Can I find a similarity matrix between authors, publications, formats (i.e. newspaper, magazine, blog, etc.) and month-year?
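A sketch of one way to get such a matrix, assuming each document already has an LDA topic distribution (for example from Gensim's `get_document_topics`): average the topic vectors of all documents sharing a key (author, publication, format or month-year), then compare the averaged profiles with cosine similarity. The averaging and the cosine measure are assumptions (Hellinger distance is another common choice for probability vectors), and the metadata and topic values below are hypothetical.

```python
import math
from collections import defaultdict


def topic_vector(doc_topics, num_topics):
    """Turn a sparse list of (topic_id, probability) pairs into a dense list."""
    vec = [0.0] * num_topics
    for topic_id, prob in doc_topics:
        vec[topic_id] = prob
    return vec


def profiles_by_key(keys, vectors):
    """Average the topic vectors of all documents that share a key (e.g. an author)."""
    grouped = defaultdict(list)
    for key, vec in zip(keys, vectors):
        grouped[key].append(vec)
    return {key: [sum(col) / len(vecs) for col in zip(*vecs)]
            for key, vecs in grouped.items()}


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def similarity_matrix(profiles):
    """Return (names, matrix) where matrix[i][j] is the cosine similarity of names i and j."""
    names = sorted(profiles)
    return names, [[cosine(profiles[a], profiles[b]) for b in names] for a in names]


# Hypothetical per-document metadata and 2-topic LDA output.
authors = ["alice", "bob", "alice", "carol"]
doc_topic_lists = [[(0, 0.9), (1, 0.1)], [(0, 0.2), (1, 0.8)],
                   [(0, 0.7), (1, 0.3)], [(0, 0.85), (1, 0.15)]]

vectors = [topic_vector(dt, num_topics=2) for dt in doc_topic_lists]
names, matrix = similarity_matrix(profiles_by_key(authors, vectors))
for name, row in zip(names, matrix):
    print(name, [round(s, 2) for s in row])
```

The resulting matrix could then serve as a weighted adjacency matrix for a graph library; swapping `authors` for a list of publications or months gives the other matrices.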

  • @ammyarora5088
    @ammyarora5088 4 years ago

    It was great learning, but I didn't get the part about how to assign topics to each tweet document in my corpus after running LDA on it. I need to filter the tweet corpus topic-wise. Can you shed some light on this?
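A minimal sketch of assigning each document to its dominant topic and grouping documents by that topic. In Gensim, `lda.get_document_topics(bow, minimum_probability=0.0)` returns a list of `(topic_id, probability)` pairs per document; here those lists are hard-coded, hypothetical values so the example runs without a trained model.

```python
from collections import defaultdict


def dominant_topic(doc_topics):
    """Return the (topic_id, probability) pair with the highest probability."""
    return max(doc_topics, key=lambda pair: pair[1])


def group_by_topic(docs, doc_topic_lists):
    """Group documents under the id of their dominant topic."""
    groups = defaultdict(list)
    for doc, topics in zip(docs, doc_topic_lists):
        topic_id, _ = dominant_topic(topics)
        groups[topic_id].append(doc)
    return dict(groups)


tweets = ["new vaccine trial", "iphone camera review", "case numbers rise"]
# Hypothetical output of lda.get_document_topics(bow, minimum_probability=0.0)
# for each tweet's bag of words.
doc_topic_lists = [
    [(0, 0.91), (1, 0.09)],
    [(0, 0.12), (1, 0.88)],
    [(0, 0.75), (1, 0.25)],
]
by_topic = group_by_topic(tweets, doc_topic_lists)
print(by_topic)  # {0: ['new vaccine trial', 'case numbers rise'], 1: ['iphone camera review']}
```

With a trained model, `doc_topic_lists` would come from `[lda.get_document_topics(bow, minimum_probability=0.0) for bow in corpus]`, and `by_topic[k]` is then the tweet subset for topic `k`.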

  • @tuneinsight8497
    @tuneinsight8497 3 years ago

    Could this be applied to document intelligence, where I have to extract clauses specific to each topic, like discount, pricing, etc., from a newly uploaded document with a high accuracy score?
    I want to connect with you regarding this.

  • @billwallis30
    @billwallis30 3 years ago +1

    Note that this uses the pyLDAvis package at (presumably) version 2.1.2 and will not work with later versions; you can install the older version with:
    pip install pyLDAvis==2.1.2
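As an alternative to pinning the old version: to my knowledge, newer pyLDAvis releases (3.x) renamed the Gensim helper module from `pyLDAvis.gensim` to `pyLDAvis.gensim_models`. The sketch below tries both names; the helper function name is hypothetical, and the module names should be checked against the installed version.

```python
import importlib


def load_pyldavis_gensim():
    """Return pyLDAvis's Gensim helper module, or None if it cannot be imported.

    Assumption: newer releases expose it as pyLDAvis.gensim_models, older ones
    (such as 2.1.2) as pyLDAvis.gensim.
    """
    for module_name in ("pyLDAvis.gensim_models", "pyLDAvis.gensim"):
        try:
            return importlib.import_module(module_name)
        except ImportError:
            continue
    return None


helper = load_pyldavis_gensim()
if helper is None:
    print("pyLDAvis is not installed")
else:
    # With a trained model: vis = helper.prepare(lda, corpus, dictionary)
    print("using", helper.__name__)
```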

  • @sangitamodi7452
    @sangitamodi7452 3 years ago

    Thank you. How do I generate data to process, and how do I extract and store data in a CSV file and access it for processing?

  • @shashankgehlot4087
    @shashankgehlot4087 4 years ago

    Can we use LDA to cluster food categories based on their ingredients?
    I have multiple food columns and categories (spicy, sweet, etc.) and the respective list of ingredients.
    Ingredients can be common to multiple foods.

  • @lifeofmateumalatji
    @lifeofmateumalatji 3 years ago

    Which metrics do we use to evaluate the performance of the LDA model?
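Two common choices are perplexity and topic coherence. Gensim exposes both, through `LdaModel.log_perplexity(corpus)` (which returns a per-word likelihood bound) and `gensim.models.CoherenceModel` (e.g. with `coherence="u_mass"` or `"c_v"`). The stdlib sketch below only illustrates what UMass coherence measures, following the formula of Mimno et al. (2011): for a topic's top words ordered by probability, it sums log((D(w_m, w_l) + 1) / D(w_l)) over word pairs, where D counts documents containing the word(s). It is not guaranteed to reproduce Gensim's exact numbers, and the documents are hypothetical.

```python
import math


def umass_coherence(top_words, documents):
    """UMass coherence of one topic; higher is better.

    top_words must be ordered from most to least probable, and every word
    must occur in at least one document (otherwise D(w_l) would be zero).
    """
    doc_sets = [set(doc) for doc in documents]

    def doc_freq(*words):
        # Number of documents containing all of the given words.
        return sum(1 for d in doc_sets if all(w in d for w in words))

    score = 0.0
    for m in range(1, len(top_words)):
        for l in range(m):
            co_freq = doc_freq(top_words[m], top_words[l])
            score += math.log((co_freq + 1) / doc_freq(top_words[l]))
    return score


documents = [["virus", "vaccine"], ["virus", "vaccine", "cases"], ["iphone", "camera"]]
coherent = umass_coherence(["virus", "vaccine"], documents)
mixed = umass_coherence(["virus", "camera"], documents)
print(round(coherent, 3), round(mixed, 3))
```

Words that tend to appear together give the higher (better) score, which is why coherence is often used to compare models with different numbers of topics.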

  • @PankajSavaliyaGoogle
    @PankajSavaliyaGoogle 3 years ago

    Awesome :)

  • @karthickperumal9733
    @karthickperumal9733 2 years ago

    Hi, thanks for the detailed explanation of LDA. Any idea what archetypal LDA is?

  • @boussalemmohamed9615
    @boussalemmohamed9615 4 years ago

    First of all, thank you a lot for this video. Second, I have a question: how can I use a list of words to indicate that this set of words constitutes the target topic whose sentences I would like to extract?
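A simple baseline, sketched under assumptions: split the text into sentences and keep those that contain at least `min_hits` of the seed words, ranked by how many they contain. This is keyword matching, not LDA. Gensim's `LdaModel` also accepts an `eta` prior (which can be a `num_topics × num_words` matrix) that can nudge a topic toward seed words; that is a different technique and is not shown here. The function name, seed words and text are hypothetical.

```python
import re


def sentences_for_topic(text, seed_words, min_hits=1):
    """Return sentences containing at least min_hits seed words, most hits first."""
    seeds = {w.lower() for w in seed_words}
    # Naive sentence split on ., ! or ? followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    results = []
    for sentence in sentences:
        tokens = set(re.findall(r"[a-z']+", sentence.lower()))
        hits = len(tokens & seeds)
        if hits >= min_hits:
            results.append((hits, sentence))
    # sorted() is stable, so equal-hit sentences keep their original order.
    return [s for _, s in sorted(results, key=lambda pair: -pair[0])]


text = ("The new price is lower. Shipping takes two days. "
        "A discount applies to the new price.")
result = sentences_for_topic(text, ["price", "discount", "pricing"])
print(result)
```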

  • @AkshayA86
    @AkshayA86 3 years ago +2

    Thank you for sharing this. One question: in the training dataset we already have the topic labels corona and iphone assigned. After training the model, when it is run on the actual data, how do we get these topics (corona or iphone) assigned to the actual articles whose topic we want to know? Looking forward to your response.

    • @josephtran1500
      @josephtran1500 3 years ago +2

      Hi Akshay, I was wondering the same. On closer look, the documentation states: 'Gensim focuses on unsupervised models so that no human intervention, such as costly annotations or tagging documents by hand, is required.' So, no. These are unsupervised models, so no labels are needed. There are tutorials and trainings on the Gensim website. I am going through them now; they look really helpful.
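Because LDA is unsupervised, its topics come out as numbered word distributions with no names; a human usually names them after inspecting their top words. A sketch of semi-automating that step, assuming hand-written seed lists per label: give each topic the label whose seeds overlap most with its top words. The seed lists and topic words are hypothetical; with Gensim the top words could come from `[w for w, _ in lda.show_topic(topic_id, topn=4)]`. Each article can then take the label of its dominant topic.

```python
def label_topics(topic_top_words, label_seeds):
    """Map each topic id to the label whose seed words overlap most with its top words."""
    labels = {}
    for topic_id, words in topic_top_words.items():
        word_set = set(words)
        best_label, best_overlap = "unlabelled", 0
        for label, seeds in label_seeds.items():
            overlap = len(word_set & set(seeds))
            if overlap > best_overlap:
                best_label, best_overlap = label, overlap
        labels[topic_id] = best_label
    return labels


# Hypothetical top words per topic and hand-written seed lists per label.
topic_top_words = {
    0: ["virus", "vaccine", "cases", "lockdown"],
    1: ["iphone", "camera", "battery", "apple"],
}
label_seeds = {
    "corona": ["virus", "covid", "vaccine", "lockdown"],
    "iphone": ["iphone", "apple", "ios"],
}
labels = label_topics(topic_top_words, label_seeds)
print(labels)  # {0: 'corona', 1: 'iphone'}
```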

  • @cgrdna7921
    @cgrdna7921 3 years ago

    Hey, nice explanation. I had a query: if I wanted to extract the top topics for each article, maybe the top 3 tags per article, how can I do that? I need to extract them along with the category scores, without any threshold. I am having problems doing the first part.
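A minimal sketch of taking the top 3 topics per article together with their scores. Gensim's `LdaModel` drops topics below `minimum_probability` (default 0.01), so passing `minimum_probability=0.0` to `get_document_topics` keeps every topic and removes that threshold. The per-article distributions below are hypothetical, hard-coded values for a 5-topic model.

```python
def top_k_topics(doc_topics, k=3):
    """Return the k most probable (topic_id, probability) pairs, highest first."""
    return sorted(doc_topics, key=lambda pair: pair[1], reverse=True)[:k]


# Hypothetical output of lda.get_document_topics(bow, minimum_probability=0.0).
article_topics = {
    "article_1": [(0, 0.05), (1, 0.40), (2, 0.30), (3, 0.20), (4, 0.05)],
    "article_2": [(0, 0.70), (1, 0.10), (2, 0.05), (3, 0.05), (4, 0.10)],
}
for name, doc_topics in article_topics.items():
    print(name, top_k_topics(doc_topics))
```

Ties keep their original order because Python's sort is stable even with `reverse=True`.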

  • @tazrinkhan1297
    @tazrinkhan1297 3 years ago

    Thank you for this tutorial. I was trying this with a CSV file. Since I did not use a pkl file like you, it gave me a KeyError for "text". Can you please suggest how to solve this issue?
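A `KeyError` for `"text"` usually means the CSV has no column with exactly that name: it may be called something else, differ in case, or carry a stray space or a UTF-8 byte-order mark in the header. A stdlib sketch, assuming the texts live in a single column, that matches the header leniently and otherwise reports the real header names (the function name is hypothetical; with pandas, printing `df.columns` shows the same information).

```python
import csv
import io


def load_texts(csv_file, column="text"):
    """Return the values of `column` from an open CSV file.

    Header names are matched ignoring case, surrounding spaces and a UTF-8 BOM;
    a KeyError listing the real headers is raised if nothing matches.
    """
    reader = csv.DictReader(csv_file)
    fieldnames = reader.fieldnames or []
    lookup = {name.lstrip("\ufeff").strip().lower(): name for name in fieldnames}
    key = lookup.get(column.lower())
    if key is None:
        raise KeyError(f"column {column!r} not found; CSV headers are {fieldnames}")
    return [row[key] for row in reader]


# In-memory stand-in for a real file; note the capitalised header with a trailing space.
sample = io.StringIO("id,Text \n1,first tweet\n2,second tweet\n")
print(load_texts(sample))
```

For a real file, something like `with open("tweets.csv", newline="", encoding="utf-8-sig") as f: texts = load_texts(f)` would work, where the filename is a placeholder.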

  • @eilis401
    @eilis401 3 years ago

    Sir, I want to ask: can you make a video on building a corpus?
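A stdlib sketch of what "building a corpus" means for LDA: tokenize each document, drop stopwords, give every token an integer id, and represent each document as a list of `(token_id, count)` pairs. Gensim's `Dictionary` and `doc2bow` produce the same pair shape, though the ids they assign may differ; the stopword list and documents here are hypothetical.

```python
import re
from collections import Counter

# Small illustrative stopword list.
STOPWORDS = {"the", "a", "an", "and", "is", "are", "of", "to", "in", "on", "for"}


def tokenize(text):
    """Lower-case, split into alphabetic words and drop stopwords."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]


def build_corpus(raw_docs):
    """Return (token2id, corpus) where corpus holds (token_id, count) pairs per document."""
    token_lists = [tokenize(doc) for doc in raw_docs]
    token2id = {}
    for tokens in token_lists:
        for token in tokens:
            # Assign ids in order of first appearance.
            token2id.setdefault(token, len(token2id))
    corpus = [sorted(Counter(token2id[t] for t in tokens).items())
              for tokens in token_lists]
    return token2id, corpus


token2id, corpus = build_corpus(["The vaccine is safe.", "Vaccine trials and vaccine news."])
print(token2id)
print(corpus)
```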

  • @gedefayeachamu3353
    @gedefayeachamu3353 3 years ago

    Really interesting tutorial. I am doing a scientometric analysis and topic modeling of the titles and abstracts of publications I retrieved and stored in a reference manager, using both LDA and BERTopic, but I am having difficulty running the dynamic analysis (topic evolution). How could I do that?
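Dedicated dynamic topic models exist (Gensim provides `LdaSeqModel`, and BERTopic provides `topics_over_time`). A simpler alternative, sketched here under assumptions: fit one LDA model on all documents, then average each period's document-topic distributions to see how topic shares change over time. The years and distributions below are hypothetical; in practice they would come from the publication metadata and `get_document_topics`.

```python
from collections import defaultdict


def topic_share_by_period(periods, doc_topic_lists, num_topics):
    """Average topic proportions of all documents in each period (e.g. publication year)."""
    totals = defaultdict(lambda: [0.0] * num_topics)
    counts = defaultdict(int)
    for period, doc_topics in zip(periods, doc_topic_lists):
        row = totals[period]  # create the row even if doc_topics is empty
        for topic_id, prob in doc_topics:
            row[topic_id] += prob
        counts[period] += 1
    return {p: [value / counts[p] for value in totals[p]] for p in sorted(totals)}


# Hypothetical publication years and 2-topic distributions.
years = [2019, 2019, 2020]
doc_topic_lists = [[(0, 0.8), (1, 0.2)], [(0, 0.6), (1, 0.4)], [(0, 0.1), (1, 0.9)]]
shares = topic_share_by_period(years, doc_topic_lists, num_topics=2)
for year, share in shares.items():
    print(year, [round(s, 2) for s in share])
```

Unlike a true dynamic topic model, this keeps each topic's word distribution fixed across periods and only tracks how much of the corpus each topic covers.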

  • @luismontero3416
    @luismontero3416 2 years ago

    👏👏👏👏👏👏

  • @MrShriniketpatil
    @MrShriniketpatil 3 years ago

    How do I deal with PDF data, and is there a walkthrough document for it?

    • @datasciencewithraghav
      @datasciencewithraghav 3 years ago

      You can first convert the PDF to a text file and then apply these techniques. There are multiple Python libraries for reading PDFs.

  • @alwaysWannaFlai
    @alwaysWannaFlai 4 years ago

    What is the best way to find out how good our clustering is?

    • @datasciencewithraghav
      @datasciencewithraghav 4 years ago

      I know that for K-means clustering you can check the elbow chart to find the optimal number of clusters. But evaluating the overall clustering might be difficult to do automatically.
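One automatic internal measure, named plainly as a swapped-in technique rather than anything from the video, is the silhouette coefficient: for each point it compares the mean distance to its own cluster (a) with the mean distance to the nearest other cluster (b), giving (b - a) / max(a, b), averaged over all points. Scores near 1 indicate well-separated clusters. scikit-learn ships `sklearn.metrics.silhouette_score`; the stdlib sketch below uses Euclidean distance and hypothetical 2-D points. For LDA output, the points could be document-topic vectors and the labels their dominant topics, although topic coherence is the more usual check for LDA.

```python
import math


def silhouette_score(points, labels):
    """Mean silhouette coefficient with Euclidean distance; ranges from -1 to 1.

    Assumes at least two clusters and that not every point is identical.
    Singleton clusters get a score of 0 by convention.
    """
    labels = list(labels)
    cluster_ids = sorted(set(labels))
    scores = []
    for i, (p, own) in enumerate(zip(points, labels)):
        # a: mean distance to the other members of the point's own cluster.
        same = [math.dist(p, q) for j, (q, lab) in enumerate(zip(points, labels))
                if lab == own and j != i]
        if not same:
            scores.append(0.0)
            continue
        a = sum(same) / len(same)
        # b: smallest mean distance to the members of any other cluster.
        b = min(
            sum(math.dist(p, q) for q, lab in zip(points, labels) if lab == other)
            / labels.count(other)
            for other in cluster_ids if other != own
        )
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)


points = [(0, 0), (0, 1), (10, 10), (10, 11)]
good = silhouette_score(points, [0, 0, 1, 1])
bad = silhouette_score(points, [0, 1, 0, 1])
print(round(good, 2), round(bad, 2))
```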

  • @elyelena1002
    @elyelena1002 3 years ago

    The code is cool, but there are a lot of errors in it. I am currently stuck on this error: NameError: name 'chain' is not defined. But thank you for the code.
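`NameError: name 'chain' is not defined` means `chain` was used without being imported. Assuming the notebook means `itertools.chain` (the usual way to flatten a list of token lists, though the notebook itself isn't shown here), adding the import fixes it:

```python
from itertools import chain

# Hypothetical token lists, one per document.
token_lists = [["virus", "vaccine"], ["iphone", "camera"], ["vaccine"]]

# Flatten into a single list of tokens, e.g. for word-frequency counts.
all_tokens = list(chain.from_iterable(token_lists))
print(all_tokens)  # ['virus', 'vaccine', 'iphone', 'camera', 'vaccine']
```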