BERTopic for Topic Modeling - Maarten Grootendorst - Talking Language AI Ep#1

  • Published: Sep 6, 2024

Comments • 27

  • @adventurerwannabe
    @adventurerwannabe 1 year ago +15

    This guy just casually switched from psychology to this and already has a 100,000 times more in-depth understanding of DS concepts and coding than I do, and I did a whole master's in DS... Conclusion: I'm not very smart, he is very smart.

  • @TowhidIslam
    @TowhidIslam 4 months ago

    Thanks to both of you, Jay and Maarten, for such a generous tutorial. Special gratitude to Maarten for your contribution to the computing community. WOW!

  • @oostopitre
    @oostopitre 1 year ago +4

    Such a thoughtful speaker..

  • @rvian4
    @rvian4 11 months ago

    I just used BERTopic in my conclusion project. Incredible framework, very versatile, and the default algorithms worked very well.

  • @WouterSuren
    @WouterSuren 1 year ago +4

    Amazing package, have used it on email topic clustering

  • @connor-shorten
    @connor-shorten 1 year ago +4

    Really enjoyed this!

  • @marcelosilvadasilva4547
    @marcelosilvadasilva4547 1 year ago +2

    Awesome, I used it on NPS data. I believe it could be used in the future on medical records in any area.

  • @deepakwalia9878
    @deepakwalia9878 1 year ago +3

    Great Session ✋

  • @tariqnahmad
    @tariqnahmad 1 year ago +3

    Fascinating 👍

  • @tariqnahmad
    @tariqnahmad 1 year ago +3

    Suggestion for next time: classification

  • @Stopinvadingmyhardware
    @Stopinvadingmyhardware 1 year ago

    I was just working on doing something like this in Julia. I wasn’t aware that BERT was already there.

  • @MadMads-hp8ug
    @MadMads-hp8ug 1 year ago

    Great talk! Thank you for sharing your knowledge and work with us!

  • @datacamaraderie3527
    @datacamaraderie3527 1 year ago

    Dear Maarten, Amazing package!

  • @fernigasos3320
    @fernigasos3320 1 year ago +3

    Are there techniques to automatically label topics?

  • @ibragimsadikov3194
    @ibragimsadikov3194 1 year ago +3

    Awesome presentation, can you please share the notebook as well?

  • @raziehfadaei4801
    @raziehfadaei4801 6 months ago

    Does BERTopic need preprocessing like lemmatization, tokenization and removing stopwords?

  • @BernardoGarciadelRio
    @BernardoGarciadelRio 1 year ago +1

    Amazing presentation!

  • @luka7626
    @luka7626 5 months ago

    Great video!

  • @PolymetricMonogon
    @PolymetricMonogon 5 months ago

    Where can I learn all of this BERTopic as a mathematical procedure, not a computational one?

  • @guimaraesalysson
    @guimaraesalysson 1 year ago +1

    Amazing presentation. Was the notebook shared?

  • @datacamaraderie3527
    @datacamaraderie3527 1 year ago

    Dear Maarten, how are the topic embeddings calculated (I supposed they came from the document embeddings in Step 1?) for the Topic Similarity measure in the [visualize_heatmap] function?

  • @ankitrohilla11
    @ankitrohilla11 1 year ago +1

    Thanks for this awesome explanation. I am a beginner in the data science field. What's the use of CountVectorizer here?

    • @amnahebrahim3325
      @amnahebrahim3325 1 year ago +1

      I haven’t watched the video fully but I’m assuming that it’s used to convert words into numbers for the model to be able to train on.

    • @ankitrohilla11
      @ankitrohilla11 1 year ago +1

      @amnahebrahim3325 I thought TF-IDF was doing that

    • @fireworker8205
      @fireworker8205 1 year ago

      The use is to 'tokenize' the learned clusters, so that you get the bag-of-words representation you can see on the slide at 36:00. That is what you need in order to apply c-TF-IDF, which extracts the words that represent each topic as a whole. So it has nothing to do with preparing the data for training; it is about making nice topic representations of the clusters found by the clustering algorithm of choice.

    • @Mrroy08657
      @Mrroy08657 6 months ago

      @fireworker8205
      Hi bro, I'm going to cluster and analyze a few YouTube videos, grouping them into various topics. If there is unstructured data such as emojis or pictures, how should I handle it? Do I need to remove that unstructured data first and then proceed with embedding, or go ahead with all the data?
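
The step discussed in this thread (build one bag-of-words per cluster, then score words with c-TF-IDF to get topic words) can be sketched roughly as below. This is a minimal illustration and not BERTopic's own code: a regex tokenizer and a tiny hand-picked stop-word list stand in for scikit-learn's CountVectorizer, the documents and cluster labels are made up, and the weighting tf * log(1 + A / f) follows the c-TF-IDF formula as given in the BERTopic paper, with A the average number of words per cluster and f a term's frequency across all clusters. The function names are hypothetical.

```python
import math
import re
from collections import Counter

# Illustrative stop-word list (an assumption; CountVectorizer has its own options).
STOP_WORDS = {"the", "on", "as"}


def class_bag_of_words(docs, labels):
    """Count words per cluster, treating each cluster as one big document."""
    bags = {}
    for doc, label in zip(docs, labels):
        tokens = [t for t in re.findall(r"[a-z]+", doc.lower()) if t not in STOP_WORDS]
        bags.setdefault(label, Counter()).update(tokens)
    return bags


def c_tf_idf(bags):
    """Score every term in every cluster: tf(t, c) * log(1 + A / f(t))."""
    total_freq = Counter()
    for counts in bags.values():
        total_freq.update(counts)
    avg_words = sum(total_freq.values()) / len(bags)  # A: average words per cluster
    return {
        label: {t: tf * math.log(1 + avg_words / total_freq[t]) for t, tf in counts.items()}
        for label, counts in bags.items()
    }


def top_words(scores, n=2):
    """Return the n highest-scoring words per cluster (ties broken alphabetically)."""
    return {
        label: [t for t, _ in sorted(s.items(), key=lambda kv: (-kv[1], kv[0]))[:n]]
        for label, s in scores.items()
    }


# Made-up documents; the labels stand in for the output of a clustering step.
docs = [
    "the cat chased the mouse today",
    "the cat slept on the sofa",
    "the stock market fell today",
    "the market rallied as stock prices rose",
]
labels = [0, 0, 1, 1]

bags = class_bag_of_words(docs, labels)
scores = c_tf_idf(bags)
topics = top_words(scores, n=2)
print(topics)
```

Note that the counting happens only after the clusters exist, which matches the point made above: it shapes the topic representation, not the clustering itself. A word that appears in both clusters ("today" here) receives a lower score than a word unique to one cluster.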

  • @Stopinvadingmyhardware
    @Stopinvadingmyhardware 1 year ago

    No. I am not CO