This guy just casually switched from Psychology to this and already has a 100,000 times more in-depth understanding of DS concepts and coding than I do, and I did a whole master's in DS... Conclusion: I'm not very smart... he is very smart.
very productive, best 50 minutes.
I just used BERTopic in my final project. Incredible framework, very versatile, and the default algorithms worked very well.
Such a thoughtful speaker.
Really enjoyed this!
Does BERTopic need preprocessing like lemmatization, tokenization and stop-word removal?
Are there techniques to automatically label topics?
Great Session ✋
Awesome, I used it on NPS data. I believe it could be used on medical records, or in any area, in the future.
Amazing package, I have used it for email topic clustering.
Dear Maarten, Amazing package!
Fascinating 👍
Great talk! Thank you for sharing your knowledge and work with us!
Dear Maarten, how are the topic embeddings calculated for the Topic Similarity measure in the [visualize_heatmap] function? (I suppose they come from the document embeddings in Step 1?)
Where can I learn BERTopic as a mathematical procedure rather than a computational one?
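The commenter's guess (topic embeddings derived from the Step 1 document embeddings) can be sketched as below. This is a minimal illustration under two assumptions, not confirmed here as what visualize_heatmap does internally: each topic embedding is taken to be the mean of the embeddings of the documents assigned to that topic, and topic similarity is taken to be cosine similarity. The helpers topic_embeddings, cosine and similarity_matrix are hypothetical names.

```python
import math
from collections import defaultdict


def topic_embeddings(doc_embeddings, topics):
    """Hypothetical helper: assume a topic embedding is the mean of its documents' embeddings."""
    sums = {}
    counts = defaultdict(int)
    for vec, topic in zip(doc_embeddings, topics):
        if topic not in sums:
            sums[topic] = [0.0] * len(vec)
        # Accumulate the embedding component-wise for this topic
        sums[topic] = [s + v for s, v in zip(sums[topic], vec)]
        counts[topic] += 1
    return {t: [s / counts[t] for s in sums[t]] for t in sums}


def cosine(a, b):
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def similarity_matrix(topic_vecs):
    """Pairwise cosine similarities, rows/columns ordered by topic id (what a heatmap would plot)."""
    ids = sorted(topic_vecs)
    return ids, [[cosine(topic_vecs[i], topic_vecs[j]) for j in ids] for i in ids]


# Toy 2-D "document embeddings": two documents per topic
docs = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
ids, sim = similarity_matrix(topic_embeddings(docs, [0, 0, 1, 1]))
print(ids, [[round(x, 3) for x in row] for row in sim])  # [0, 1] [[1.0, 0.105], [0.105, 1.0]]
```

Averaging keeps each topic in the same vector space as the documents, so the similarity between two topics is directly comparable to document-level similarities.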
Great video!
Amazing presentation!
Awesome presentation! Could you please share the notebook as well?
Suggestion for next time: classification
I was just working on doing something like this in Julia. I wasn’t aware that BERT was already there.
Amazing presentation. Was the notebook shared?
Thanks for this awesome explanation. I am a beginner in the data science field. What's the use of CountVectorizer here?
I haven’t watched the video fully but I’m assuming that it’s used to convert words into numbers for the model to be able to train on.
@amnahebrahim3325 I thought TF-IDF was doing that.
The use is to "tokenize" the learned clusters, i.e. to get the bag-of-words representation you can see on the slide at 36:00. That is what you need in order to apply c-TF-IDF and extract words that represent each topic as a whole. So it has nothing to do with preparing the data for training; it is about building good topic representations of the clusters found by the clustering algorithm of choice.
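The reply above can be sketched in plain Python: documents are joined per cluster, turned into a bag of words (the CountVectorizer step), and then weighted with the class-based TF-IDF formula W(t,c) = tf(t,c) * log(1 + A / tf(t)), where tf(t,c) is the count of term t in cluster c, tf(t) its count across all clusters, and A the average number of words per cluster. This is a minimal sketch under assumptions: the regex tokenizer stands in for scikit-learn's CountVectorizer, the function names bag_of_words and c_tf_idf are hypothetical, and BERTopic's own implementation may differ in details such as normalization.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Hypothetical stand-in for CountVectorizer: lowercase word counts."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def c_tf_idf(docs, topics, top_n=3):
    # 1. Join all documents of a cluster into one "class document"
    joined = {}
    for doc, topic in zip(docs, topics):
        joined[topic] = joined.get(topic, "") + " " + doc
    # 2. Bag-of-words per cluster: tf(t, c)
    tf = {t: bag_of_words(text) for t, text in joined.items()}
    # 3. Term counts across all clusters, tf(t), and average words per cluster, A
    total = Counter()
    for counts in tf.values():
        total.update(counts)
    avg_words = sum(total.values()) / len(tf)
    # 4. W(t, c) = tf(t, c) * log(1 + A / tf(t)); keep the top_n words per cluster
    result = {}
    for topic, counts in tf.items():
        scores = {w: c * math.log(1 + avg_words / total[w]) for w, c in counts.items()}
        result[topic] = sorted(scores, key=lambda w: (-scores[w], w))[:top_n]
    return result


docs = ["the cats purr", "the cats meow", "the stocks fall", "the stocks rise"]
print(c_tf_idf(docs, [0, 0, 1, 1]))  # {0: ['cats', 'meow', 'purr'], 1: ['stocks', 'fall', 'rise']}
```

Note that "the" occurs as often as "cats" inside cluster 0, yet it drops out of the top words because it is spread over every cluster; that is the point of weighting per cluster rather than per document.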
@fireworker8205 Hi, I'm going to cluster and analyze a few YouTube videos to group them into various topics. If there is unstructured data such as emojis or pictures, how should I handle it? Do I need to remove that data first and then proceed with embedding, or go ahead with all the data?
No. I am not CO
k-means is a poor man's analysis: it has little to no statistical reasoning behind its clustering and works off heuristics 😓