You're doing a great job, Maarten; thank you for this video!!
Super awesome... really love the content. Keep making content like this. Thanks!
It works like a charm on my data. The topic labels are now really meaningful and therefore far more useful. You made my day, Maarten. Now I need to include those in my embeddings for semantic search and I think I am good to go. Thanks a lot. Eager to check out your book that is coming (cf. the link in your description).
Pretty incredible. Great coverage!
I am a big fan of BERTopic, and what you are proposing with Llama 2 looks like it solves part of my challenges. I will dedicate my next night to testing it! Thanks for all the great work so far and for sharing this with the community. You are the man.
As promised, I am on the train. First things first: no surprises here. The combo of this video, the Colab, and the dedicated tutorial page is just perfect and educational as usual, Maarten. I like the integration of Llama 2 as a new representation model and the possibility to leverage quantization. I was afraid of not being able to run your experiment on my desktop; you made my day by allowing 4-bit! The results are really promising. This is exactly the type of challenge I was facing with previous topic representations like KeyBERT: they are interesting but prone to interpretation and question loops with end users. This time, with Llama 2, I have the feeling we have the flexibility and versatility we need to guide the topic generation as needed. Really elegant implementation. Thank you, sir! Next step for me is to test it on my use case. Exciting!
Been following BERTopic from the beginning and used it many times along with KeyBERT for work projects and personal projects. Always struggled with the interpretation of topics at the end of the process, but this looks like a great solution. Looking forward to getting your book now. Thanks so much for the tutorial!
I have seen all three videos. Loved them all. Absolutely gold.
Please produce more content! Love BERTopic 💯
THANK YOU MAARTEN, THIS TUTORIAL HAS MADE MY LIFE A LOT EASIER AND HELPED ME FINISH MY PROJECT SUCCESSFULLY!
Thank you for sharing this! Detailed, super informative and very helpful.
Thank you very much for uploading this video. It is very useful for our research work. Really appreciate your work and dedication :)
Very well explained, Maarten. A very inspiring video. And it's great that I can get hands-on with your example myself in Google Colab. I have already tried it on some emails (only using BERTopic, without Llama) and the results are promising. Keep up the good work! I'm already a BERTopic ambassador at work.
Thank you for this informative tutorial! It is really easy to understand and I am ready to implement it.
This is great! Thank you for providing this to the community.
Thank you, Maarten! Looking forward to your next videos.
Something on federated learning would be great too.
That's a good one! I work a lot with federated LLMs nowadays, so I'll keep it in mind 😀
This is exceptionally useful. Thanks a lot!
Thank you, Maarten! Your video and explanation are perfect
Perfect, thanks for this video. I tried so hard to get access to your Medium article but was not able to read it because the content was for premium users with paid subscriptions. Thanks, I was looking for something like this for my solution. I will surely try this one.
Great video. You saved my work.
Great video. Thank you so much sir!
Fascinating! Can’t wait to try this
Really appreciate your work!! Thank you!!
Hi Maarten! I've been following your work for some time and am so happy to see you start a YouTube channel. I am curious how you suggest I apply this (or something similar) to the task of identifying topic timestamps for YouTube videos?
You could use Whisper to convert the audio into text and feed it to BERTopic: towardsdatascience.com/using-whisper-and-bertopic-to-model-kurzgesagts-videos-7d8a63139bdf
Hi Maarten, great content as always. Would it be possible to make a video on topic distributions? If I've understood well, BERTopic assigns a document to a cluster of documents, hence a single topic per document. What if we want to assign multiple topics? For example, an abstract can talk about sentiment analysis in medical reviews using LLMs, so we want to extract at least three main topics: sentiment analysis, medical reviews, and LLMs. How do we do that? Your answer would be super appreciated!
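For anyone curious how that pipeline could look in code, here is a minimal sketch of the Whisper-to-BERTopic idea (not from the video; the file name and model size are illustrative, and the linked article covers the details):
```python
# A minimal sketch: transcribe audio with Whisper, then cluster the
# transcript segments with BERTopic. "audio.mp3" and "base" are assumptions.
import whisper
from bertopic import BERTopic

# Transcribe the audio; Whisper returns timestamped segments
model = whisper.load_model("base")
result = model.transcribe("audio.mp3")
segments = [segment["text"] for segment in result["segments"]]
timestamps = [segment["start"] for segment in result["segments"]]

# Cluster the transcript segments into topics
topic_model = BERTopic()
topics, probs = topic_model.fit_transform(segments)

# Each segment now has a topic and a start time, which together
# approximate topic timestamps for the video.
```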
This video is fantastic.
Great. Thanks for sharing
Great work on this ~topic~. I’d be curious, have you tried using fuzzy clustering algorithms for separating topics? It’s likely that documents sometimes contain multiple topics
Great video.
Maarten, great video on how to use the next iteration of BERTopic with the Llama 2 model. Your examples are all focused on the English language. I have tried BERTopic with Dutch documents, but it fails to generate good-quality topics. Could you make a video on using Dutch or any other language?
That's a great idea! To give you a few quick tips already... using a multilingual embedding model is quite important for properly representing another language, especially if you use KeyBERTInspired. Another trick is to remove Dutch stopwords using the CountVectorizer.
If you combine those tips together with the Best Practices, then that should already give you a head-start: maartengr.github.io/BERTopic/getting_started/best_practices/best_practices.html
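Putting those two tips together might look something like the sketch below (not from the video; the stopword list is a small illustrative sample and `docs` stands in for your own Dutch documents):
```python
# A minimal sketch of the multilingual tips above: a multilingual embedding
# model plus Dutch stopword removal via the CountVectorizer.
from bertopic import BERTopic
from bertopic.representation import KeyBERTInspired
from sentence_transformers import SentenceTransformer
from sklearn.feature_extraction.text import CountVectorizer

docs = [...]  # assumption: your own Dutch documents go here

# Illustrative sample only; use a full Dutch stopword list in practice
dutch_stopwords = ["de", "het", "een", "en", "van", "dat", "die", "niet"]

# Multilingual embeddings represent Dutch text far better than English-only models
embedding_model = SentenceTransformer("paraphrase-multilingual-MiniLM-L12-v2")

# Strip Dutch stopwords when building the topic representations
vectorizer_model = CountVectorizer(stop_words=dutch_stopwords)

topic_model = BERTopic(
    embedding_model=embedding_model,
    vectorizer_model=vectorizer_model,
    representation_model=KeyBERTInspired(),
)
topics, probs = topic_model.fit_transform(docs)
```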
Could we integrate this with AWS Bedrock? The possibilities are endless! Thank you for your contribution to this field 😊
This is incredible. Fantastic explanation. Thank you so much for the great content! A quick question: if our data consists only of object labels or information about objects detected in images/videos (e.g., "dog," "car," "tree," etc.), can we still use this object label information as input for BERTopic?
With enough documents, I think this should be no problem. Definitely worth trying out!
Considering your innovative approach was a great source of inspiration for me, I'm curious about using my own data. Is it sufficient to focus on the 'abstract' column, or would it be beneficial to include a 'title' column as well? I noticed you extracted 'titles' in your example but didn't use them in the training process (I may have overlooked it). Additionally, the model returned over 100 topics; how can I effectively control the number of topics in the analysis? Thank you again for your contribution.
Thanks for the update - really insightful! Is it possible to use the GPT-3.5 API instead of a local Llama 2?
Can you please do a video on Llama 3.1 for topic modeling and data summarization (like agent-customer chats, reviews, etc.)?
Hi! Unrelated to this video directly, but is there a way to render the visualisation of the clusters as HTML rather than in a Jupyter notebook?
Hi Maarten, does Llama also do a good job labeling Dutch keywords?
Hi Maarten! Thank you so much for the great content! One quick question - would you be able to have Llama 2 label the merged topics when doing hierarchical topic modeling?
Thanks for this great video. Do you think this can be done with game reviews to detect the most important components of a game? I planned to do that with LDA. However, I came across your video, and I thought it would be great to do that with an LLM.
Definitely
The query I have regarding this topic modelling: can we use it anywhere in a Retrieval-Augmented Generation use case, for better fetching of relevant documents and also for better generation of answers?
You could use the constructed topics to categorize the documents that you have. By supplying these documents with additional categories, you create additional constraints/filters for a RAG-based pipeline. Instead of having to search through all documents, it will first determine the category of the question, after which it selects a relevant subset of documents based on that category. There are many more ways you can use BERTopic in RAG, but this can work well if you do not have additional metadata.
@MaartenGrootendorst Thank you so much. I will look into these implementation methods and possibilities.
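As a rough illustration of the category-filtering idea from the reply above, a sketch could look like this (the query string is made up, and `docs` is assumed to be a corpus large enough to cluster):
```python
# A minimal sketch: use BERTopic topics as categories to pre-filter
# the documents a RAG pipeline searches through.
from bertopic import BERTopic

docs = [...]  # assumption: your document collection goes here

topic_model = BERTopic()
topics, _ = topic_model.fit_transform(docs)

# At query time, predict which topic the question belongs to...
question = "How do transformers handle long documents?"
predicted_topics, _ = topic_model.transform([question])

# ...and restrict retrieval to documents from that topic only
subset = [doc for doc, topic in zip(docs, topics) if topic == predicted_topics[0]]
```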
Can we use Llama 2 for german topics?
Can we do that with gpt-3.5-turbo?
When using Agglomerative Clustering in this workflow, I get a HUGE topic 0 with 99% of the documents and keywords like "if", "the", and so on... as if it regrouped most of the documents around stopwords. That only happens with Agglomerative Clustering; Mini-Batch KMeans is OK.
Good that you are experimenting with clustering models. As you have noticed, they matter greatly in the construction of the topics, and one can greatly outperform another. I generally hear good stories about HDBSCAN, the default clustering algorithm. Even if you do not want the outliers it generates, there are options for reducing or even removing them: maartengr.github.io/BERTopic/getting_started/outlier_reduction/outlier_reduction.html
thx ❤
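For reference, the outlier-reduction options Maarten links above can be used roughly like this (a minimal sketch, assuming `docs` is your corpus and the default HDBSCAN clustering):
```python
# A minimal sketch: fit with the default HDBSCAN, then re-assign
# outlier documents (topic -1) to their closest topic.
from bertopic import BERTopic

docs = [...]  # assumption: your corpus goes here

topic_model = BERTopic()  # HDBSCAN is the default clustering algorithm
topics, probs = topic_model.fit_transform(docs)

# Re-assign outlier documents to the closest non-outlier topic
new_topics = topic_model.reduce_outliers(docs, topics)

# Update the topic representations with the new assignments
topic_model.update_topics(docs, topics=new_topics)
```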
What if I want the important topics from a single custom document - will it detect them?
Sure, use approximate_distribution: maartengr.github.io/BERTopic/getting_started/distribution/distribution.html
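The linked page shows the details; a minimal sketch of approximate_distribution on a single document could look like this (assuming `topic_model` was already fitted on a corpus; the document string is illustrative):
```python
# A minimal sketch: compute a per-document topic distribution,
# which works even for a single unseen document.
document = "A single custom document whose topics we want to inspect."

# Returns one topic distribution per input document
topic_distr, _ = topic_model.approximate_distribution([document])

# Visualize how strongly each topic is represented in the document
topic_model.visualize_distribution(topic_distr[0])
```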
Hi Maarten! I have been using BERTopic since last year; it's such a useful tool! When I tried this new LLM technique I ran into a problem where KeyBERT and MMR are working fine, but the LLM-generated topics are just giving me repeated nonsense words. Would you have any idea why? It looks like this:
[INST]
I have a topic that contains the following documents:
- How does bekanområområområområområområområområområområområområområområområområområområområområ
My bad, I was being an idiot; it was a problem with the prompting template.