This guy just casually switched from psychology to this and already has a 100,000 times more in-depth understanding of DS concepts and coding than I do, and I did a whole master's in DS... Conclusion: I'm not very smart, he is very smart.
Thanks to both of you, Jay and Maarten, for such a generous tutorial. Special gratitude to Maarten for your contribution to the computing community. WOW!
Such a thoughtful speaker..
I just used bertopic in my conclusion project. Incredible framework, very versatile and the default algorithms worked very well.
Amazing package, have used it on email topic clustering
Really enjoyed this!
Awesome. I used it on NPS data, and I believe it could be used on medical records in any area in the future.
Great Session ✋
Fascinating 👍
Suggestion for next time: classification
I was just working on doing something like this in Julia. I wasn’t aware that BERT was already there.
Great talk! Thank you for sharing your knowledge and work with us!
Dear Maarten, Amazing package!
are there techniques to automatically label topics?
Awesome presentation, could you please share the notebook as well?
Does BERTopic need preprocessing like lemmatization, tokenization, and stopword removal?
Amazing presentation!
Great video!
Where can I learn BERTopic as a mathematical procedure rather than a computational one?
Amazing presentation. Was the notebook shared?
Dear Maarten, how are the topic embeddings calculated (I supposed they came from the document embeddings in Step 1?) for the Topic Similarity measure in the [visualize_heatmap] function?
Thanks for this awesome explanation. I am a beginner in Data science field. What's the use of Count Vectorizer here?
I haven’t watched the video fully but I’m assuming that it’s used to convert words into numbers for the model to be able to train on.
@amnahebrahim3325 I thought TF-IDF was doing that
It is used to 'tokenize' the learned clusters, i.e. to get the bag-of-words representation you can see on the slide at 36:00. That is what you need in order to apply c-TF-IDF and extract the topic words that represent each topic as a whole. So it has nothing to do with preparing the data for training; it is about making nice topic representations of the clusters found by the clustering algorithm of choice.
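To make that concrete, here is a rough sketch of the idea in plain Python, not BERTopic's actual code. It assumes the clustering step has already produced labels, uses a crude regex tokenizer as a stand-in for CountVectorizer, and uses the weighting tf * log(1 + A / f) as I understand c-TF-IDF, where A is the average number of words per cluster and f is a word's frequency over all clusters. The names `tokenize` and `class_tfidf` are made up for this example.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase and keep runs of letters (a crude stand-in for CountVectorizer)."""
    return re.findall(r"[a-z]+", text.lower())


def class_tfidf(docs, labels, top_n=2):
    """Return the top_n highest-weighted words for each cluster label."""
    # Step 1: merge all documents of a cluster into one bag-of-words count.
    counts = defaultdict(Counter)
    for doc, label in zip(docs, labels):
        counts[label].update(tokenize(doc))

    # Step 2: word frequency over all clusters, and average words per cluster.
    total = Counter()
    for bag in counts.values():
        total.update(bag)
    avg_words = sum(total.values()) / len(counts)

    # Step 3: weight = frequency in cluster * log(1 + avg_words / overall frequency).
    topics = {}
    for label, bag in counts.items():
        weights = {w: tf * math.log(1 + avg_words / total[w]) for w, tf in bag.items()}
        # Sort by descending weight; break ties alphabetically for stable output.
        ranked = sorted(weights, key=lambda w: (-weights[w], w))
        topics[label] = ranked[:top_n]
    return topics


docs = [
    "the cat chased the mouse",
    "the cat sat with another cat",
    "the stock market fell",
    "the market and stock prices rose",
]
labels = [0, 0, 1, 1]  # assumed output of some clustering step
topics = class_tfidf(docs, labels, top_n=2)
print(topics)
```

Note that the documents are only counted after the clusters exist, which is why this step shapes the topic words and not the training data.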
@fireworker8205 Hi, I'm going to cluster and analyze a few YouTube videos to group them into various topics. If there is unstructured data such as emojis or pictures, how should I handle it? Do I first need to remove that unstructured data and then proceed with embedding, or should I go ahead with all the data?
No. I am not CO