A Gentle Introduction to Topic Modelling, Latent Semantic Analysis and Latent Dirichlet Allocation

  • Published: 20 Oct 2024

Comments • 11

  • @joeybasile1572
    @joeybasile1572 5 months ago +1

    Great video. The music is groovy and actually not distracting once I got into the video.

  • @bilmezonlar
    @bilmezonlar 2 years ago

    One of my favorite topics, thanks.

    • @joshgun_sirajzade
      @joshgun_sirajzade  2 years ago

      😂😂 It is great to hear! Tell me, what is your background? I mean, what do/did you study? Are you more interested from a humanities point of view or more from a computer science one?

  • @ozycozy9706
    @ozycozy9706 2 years ago +2

    Just wondering if publishing companies are using this technique for taxonomy.

    • @joshgun_sirajzade
      @joshgun_sirajzade  2 years ago

      Well, I guess it depends on the particular publishing company. However, I would definitely say yes in the broader sense, and not only for publishing companies but also for many others like internet companies, law firms, courts and so on. The thing is that the phenomenon which gives us the topics can be used for many other tasks, like search, clustering and classification. For example, as I say in my video about the document-term matrix, it can be used both for identifying similar documents and for identifying similar words. Believe me when I say that modern algorithms like BERT, which is used in the Google search engine, leverage the same or a similar phenomenon, although with different techniques (here deep learning). So the question comes down to which specific algorithm and which purpose. Presenting topics as words (or terms) in a cloud is just the tip of the iceberg and is more or less popular; however, some may also see it critically and prefer a more standard taxonomy like "field names", "disciplines" and so on. You can also find some information in my newest publication: www.springerprofessional.de/deep-mining-covid-19-literature/23606068
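
A minimal sketch of the idea described in the reply above: a document-term matrix can be compared row-wise to find similar documents, or column-wise to find similar words. The library choice (scikit-learn) and the toy corpus are illustrative assumptions, not material from the video:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Toy corpus; rows of the resulting matrix are documents, columns are terms.
docs = [
    "topic models describe documents as mixtures of topics",
    "latent semantic analysis factorises the document term matrix",
    "publishers can group books into a taxonomy of subjects",
]

dtm = CountVectorizer().fit_transform(docs)

# Comparing rows gives document-document similarity ...
print(cosine_similarity(dtm))

# ... and comparing columns (the transpose) gives word-word similarity.
print(cosine_similarity(dtm.T))
```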

    • @ozycozy9706
      @ozycozy9706 2 years ago +1

      @@joshgun_sirajzade Thanks for sharing the link. I really like Springer Professional books. I am a big fan of SVD; as I remember, it is used in NLP too. I need to revisit those :)
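
A rough sketch of how SVD shows up in NLP as Latent Semantic Analysis: factorise a (tf-idf weighted) document-term matrix and keep only a few latent dimensions. The corpus and the choice of two components are illustrative assumptions:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD

docs = [
    "singular value decomposition factorises the term matrix",
    "latent semantic analysis uses a truncated svd",
    "topic models place documents in a low dimensional space",
]

# Weighted document-term matrix, then a rank-2 truncated SVD (the "latent" space of LSA).
tfidf = TfidfVectorizer().fit_transform(docs)
lsa = TruncatedSVD(n_components=2, random_state=0)
doc_coords = lsa.fit_transform(tfidf)

# Each document is now a point in 2 latent dimensions; nearby points share topics.
print(doc_coords)
print(lsa.explained_variance_ratio_)
```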

  • @joeybasile1572
    @joeybasile1572 5 months ago +1

    I'd say that the order of the words in a document does matter for topic generation... that is, for precision. Sure, we can generalize and do so accurately. We can get the bins that are categories, but precision involves the subsets, yeah? And the order of the words can orchestrate particular meaning. This is how my mind works, at least.
    Yet, when we are attempting to automate this process, it may seem unnecessary. But I think that the machine's inference/categorization capability is of course increased when word order is considered, as, for example, it could point out the incoherence of a document even if it has keywords that would place it in a particular topic/multiple topics (depending on what the hell you're doing, I suppose).
    Please do let me know your thoughts. I'm wanting to learn more about this space.

    • @joshgun_sirajzade
      @joshgun_sirajzade  5 months ago +1

      Thank you for the nice comment! Your thoughts are absolutely correct. I think even today there is no consensus on a definition of „topics", neither in Computer Science nor in the Humanities (especially in linguistics, for example, where there are much more precise definitions for „words", „sentences" or „text"). The most common idea is that the words in a text express its topics. The order of words has always been debated. While it can give a topic a more precise shape (for example, which words you start and end with might be relevant), ignoring it generalizes more and yields fewer topics, which might be handy if someone has to look at the topics, or for inference, as you correctly pointed out. Moreover, the latest algorithms like Transformers (I might create a video about that, too), which are used in chatbots, make great use of word order through what are called positional embeddings. That is why, when chatting, they can find the exact topic you are talking about. So I guess it would be more beneficial to consider word order. Another point, as you might have already guessed, is that topic modeling is intimately related to other techniques of text mining and language modeling. Considering them all together might make answers to such questions easier. For that, my video about the Document-Term-Matrix can be helpful. Thank you again for a great question, and don't hesitate to ask new ones.
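
A small sketch of the word-order point discussed above: standard LDA works on bags of words, so a document with the same words in scrambled order gets an identical representation and therefore an identical topic mixture. The toy corpus, topic count and hyperparameters here are my own assumptions:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

docs = [
    "dogs and cats are popular household pets",
    "neural networks learn representations from text data",
    "pets household are cats popular and dogs",  # same words as the first document, scrambled
]

dtm = CountVectorizer().fit_transform(docs)

lda = LatentDirichletAllocation(n_components=2, random_state=0)
doc_topics = lda.fit_transform(dtm)

# Rows 0 and 2 are identical: word order is invisible to the bag-of-words model.
print(doc_topics)
```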

  • @carneirouece
    @carneirouece 2 years ago +1

    :)))

  • @ah1548
    @ah1548 2 years ago

    Ok, it's very, very gentle - fine. But please, for the love of God, don't use background music!

    • @joshgun_sirajzade
      @joshgun_sirajzade  2 years ago +2

      Thank you so much for the feedback! I did not know or consider that the music could be distracting. 🤣 I will try to make the next videos without it...