How to Easily Find Keywords in a Document with KeyBERT in Python

  • Published: Sep 8, 2024
  • 📚 What You'll Learn:
    Introduction to KeyBERT: Understand what KeyBERT is and why it's a valuable tool for digital humanities.
    Installation and Setup: Learn how to install KeyBERT and set it up with the transformer model of your choice.
    Preparing Texts: We'll walk you through preparing three different texts for analysis.
    Extracting Keywords with KeyBERT: Dive into the code to extract keywords from each text using the KeyBERT model.
    Conclusion & Further Exploration: Wrap up the lesson and explore how you can customize KeyBERT for your research.
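The install-and-extract steps listed above can be sketched roughly as follows. This is a minimal sketch, not the code from the video or the notebook: it assumes KeyBERT was installed with `pip install keybert`, and the model name `all-MiniLM-L6-v2` as well as the helper names `load_model`, `extract_keywords`, and `format_keywords` are assumptions chosen for illustration.

```python
from typing import List, Tuple


def load_model(model_name: str = "all-MiniLM-L6-v2"):
    """Create a KeyBERT model once so it can be reused for every text."""
    # Imported lazily so the formatting helper works without KeyBERT installed.
    from keybert import KeyBERT

    return KeyBERT(model=model_name)  # the model name is an assumption


def extract_keywords(kw_model, text: str, top_n: int = 5) -> List[Tuple[str, float]]:
    """Return (keyword, score) pairs for a single text."""
    return kw_model.extract_keywords(
        text,
        keyphrase_ngram_range=(1, 2),  # single words and two-word phrases
        stop_words="english",
        top_n=top_n,
    )


def format_keywords(pairs: List[Tuple[str, float]]) -> str:
    """Render (keyword, score) pairs as lines, highest score first."""
    ordered = sorted(pairs, key=lambda pair: pair[1], reverse=True)
    return "\n".join(f"{score:.2f}  {keyword}" for keyword, score in ordered)
```

With KeyBERT installed, `kw_model = load_model()` followed by `print(format_keywords(extract_keywords(kw_model, some_text)))` prints one keyword per line for a text held in `some_text`. Loading the model once and reusing it for each of the three texts avoids reloading the transformer on every call.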
    📖 Ideal for Digital Humanists:
    This tutorial is tailored for digital humanists who want to explore key themes, characters, or ideas in textual data. Whether you're working with literary classics or contemporary texts, KeyBERT can help you uncover the essence of the text.
    🔗 Resources & Links:
    KeyBERT: github.com/Maa...
    Textbook: intermediate-py...
    The Notebook: github.com/wjb...
    Join this channel to get access to perks:
    / @python-programming
    If you enjoy this video, please subscribe.
    ✅Be my Patron: / wjbmattingly
    ✅PayPal: www.paypal.com...
    If there's a specific video you would like to see or a tutorial series, let me know in the comments and I will try and make it.
    If you liked this video, check out www.PythonHumanities.com, where I have Coding Exercises, Lessons, on-site Python shells where you can experiment with code, and a text version of the material discussed here.
    You can follow me at:
    / wjb_mattingly

Comments • 24

  • @sebastianpranz1307 1 year ago +2

    Thank you for sharing! Awesome tool, I have trillions of use cases in mind … think I'll start with auto-tagging my pdf library …

  • @user-me3wq7oo3d 1 month ago

    Hello, sorry to bother you! I have a question about using KeyBERT. I created an Excel file that contains thousands of texts. How can I extract keywords from this Excel file? Thanks a lot!
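For the Excel question above, one possible approach is sketched below. It is only a sketch under assumptions: the column name `text`, the helper names `clean_texts` and `keywords_from_excel`, and the use of pandas (with `openpyxl` for `.xlsx` files) are not from the video. Each text is processed on its own so the result always has one keyword list per text.

```python
import math
from typing import Any, Iterable, List


def clean_texts(values: Iterable[Any]) -> List[str]:
    """Drop empty cells (None, NaN, blank strings) and return stripped strings."""
    cleaned = []
    for value in values:
        if value is None:
            continue
        if isinstance(value, float) and math.isnan(value):
            continue  # pandas represents empty cells as NaN
        text = str(value).strip()
        if text:
            cleaned.append(text)
    return cleaned


def keywords_from_excel(path: str, text_column: str = "text", top_n: int = 5):
    """Return (text, keywords) pairs for every non-empty cell in text_column."""
    # Imported lazily; both packages must be installed to call this function.
    import pandas as pd
    from keybert import KeyBERT

    frame = pd.read_excel(path)  # column name below is an assumption
    texts = clean_texts(frame[text_column].tolist())
    kw_model = KeyBERT()
    # One call per text keeps the output shape predictable.
    return [(text, kw_model.extract_keywords(text, top_n=top_n)) for text in texts]
```

Calling `keywords_from_excel("my_texts.xlsx", text_column="text")` would return a list of pairs that can be written back to a spreadsheet or inspected directly; the file name here is hypothetical.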

  • @jesusmtz29 1 year ago +2

    Do you have a video on fine-tuning with sentence transformers? I have text that is very narrow in domain, with a lot of typos and weird jargon, but I have already identified the key features to extract. Thanks.

    • @python-programming 1 year ago +2

      I don't! I could put one together, though! I am training a RoBERTa model from scratch right now for DNA classification. I was planning to build a lesson around that for Holocaust-specific texts, but I can do a fine-tuning video instead.

    • @python-programming 1 year ago

      What domain are you working with and do you have a large corpus? If so, mind DMing me on Twitter or GitHub?

    • @jesusmtz29 1 year ago

      @python-programming That's great

  • @sureshkumargondi4631 11 months ago

    Super easy to understand. Thanks for making the video.
    Quick question: can we augment these generated keywords to generate questions too? If so, how would we do it?
    Thank you in advance.

  • @daryladhityahenry 5 months ago +1

    Hi! I'm kind of curious: is there any context size limit for tools like KeyBERT and BERTopic? If not, aren't they kind of superior to current GPT models? Even though the use case is different, in some ways they have the upper hand. Take this one, getting keywords, or BERTopic for topic modeling: GPT can't even do that because its context size is limited and it can forget the middle part of the article.
    And if there's no context size limit, could it work like our own library? We could collect a lot of data and, when we need something, search it with this, or explore it with topic modeling to find what we want. Am I right?
    Thanks.

    • @jorgesantanabengoechea8395 4 months ago

      GPT is a generative model, meaning in theory the tasks it's suited to complete are different. KeyBERT and BERTopic can identify the relationships between documents, but if you want to obtain meaningful understanding and generate an explanation or a new keyword that wasn't in the original documents, you'd need a generative model.

    • @daryladhityahenry 4 months ago

      @jorgesantanabengoechea8395 Exactly, I want to know how many documents relate to each other so I can get the big picture of the topic.
      I asked about the context window because that kind of limitation means GPT won't be able to do this kind of thing. But it seems BERTopic and KeyBERT can do it without worrying about the context window?

    • @jorgesantanabengoechea8395 4 months ago +1

      @daryladhityahenry Yeah, BERT models in general would be better suited for this kind of task. What I'd recommend doing is getting BERTopic to cluster the documents and then feeding the representative documents within those clusters into GPT-4 to get a more specific view of what the whole cluster is about.

    • @daryladhityahenry 4 months ago

      @jorgesantanabengoechea8395 Okay, okay. Thanks a lot :D
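The workflow suggested in this thread (cluster with BERTopic, then hand each cluster's representative documents to a generative model) could look roughly like the sketch below. It is an illustration under assumptions, not the commenter's code: the helper names, the prompt wording, and the `max_chars` cutoff are invented here, and sending the prompt to GPT-4 is left out entirely.

```python
from typing import Dict, List


def representative_docs_by_topic(docs: List[str]) -> Dict[int, List[str]]:
    """Cluster documents with BERTopic and return representative docs per topic."""
    # Imported lazily; BERTopic must be installed to call this function.
    from bertopic import BERTopic

    topic_model = BERTopic()
    topic_model.fit_transform(docs)  # needs a reasonably large collection of documents
    return topic_model.get_representative_docs()


def build_cluster_prompt(topic_id: int, docs: List[str], max_chars: int = 500) -> str:
    """Build a prompt asking a generative model to describe one cluster."""
    # Each document is cut to max_chars so the prompt stays short.
    excerpts = "\n".join(f"- {doc[:max_chars]}" for doc in docs)
    return (
        f"These documents were grouped into topic {topic_id}.\n"
        "Describe in one or two sentences what they have in common.\n\n"
        f"{excerpts}"
    )
```

Looping over the dictionary returned by `representative_docs_by_topic` and calling `build_cluster_prompt` for each entry yields one prompt per cluster. BERTopic labels outliers as topic -1, so that entry can be skipped if it appears.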

  • @jakobkristensen2390 9 months ago +2

    Great explanation! But what are some actual use cases for this? Thanks

    • @python-programming 9 months ago +1

      Thanks! Great question!! There are a lot, but one of the most common works with the presumption that keywords can be an initial step to summarizing documents. Maybe I can make a video on this

    • @jakobkristensen2390 9 months ago +1

      @python-programming Would love that! Maybe just as more practical inspiration, thank you

    • @python-programming 9 months ago

      @jakobkristensen2390 No problem!

  • @tarik1895 3 months ago

    Hello, thanks for the video. Dumb question: if it is BERT-based, does it have the same limitation in terms of text size?
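On the text-size question: KeyBERT embeds the document with a transformer model, and such models only read input up to a maximum sequence length, so the tail of a very long text may not influence the result. The exact limit depends on the model. One common workaround, sketched here as an assumption rather than anything shown in the video, is to split a long text into overlapping word chunks, extract keywords per chunk, and keep each keyword's best score. The helper names and the 200-word chunk size are illustrative, and word counts only approximate token counts.

```python
from typing import Dict, Iterable, List, Tuple


def chunk_words(text: str, chunk_size: int = 200, overlap: int = 20) -> List[str]:
    """Split text into chunks of chunk_size words that overlap by overlap words."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # the last chunk already reaches the end of the text
    return chunks


def merge_keywords(per_chunk: Iterable[List[Tuple[str, float]]],
                   top_n: int = 10) -> List[Tuple[str, float]]:
    """Keep the best score seen for each keyword and return the top_n overall."""
    best: Dict[str, float] = {}
    for pairs in per_chunk:
        for keyword, score in pairs:
            if score > best.get(keyword, float("-inf")):
                best[keyword] = score
    return sorted(best.items(), key=lambda item: item[1], reverse=True)[:top_n]


def keywords_for_long_text(kw_model, text: str, top_n: int = 10):
    """Run an already-loaded KeyBERT model over every chunk and merge the results."""
    per_chunk = (kw_model.extract_keywords(chunk, top_n=top_n)
                 for chunk in chunk_words(text))
    return merge_keywords(per_chunk, top_n=top_n)
```

Taking the maximum score per keyword is one simple merging choice; averaging scores or counting how many chunks mention a keyword are reasonable alternatives.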

  • @viorelteodorescu 1 year ago +1

    Nice one

  • @sriram151092 1 year ago +1

    I am unable to run even sample code in my work setup!
    I am facing some SSL errors for which I couldn't find a reliable solution.

    • @python-programming 1 year ago

      In the top right of the Jupyter Book, you can click the Colab icon and open it in Google Colab. Mind trying that?

    • @sriram151092 1 year ago

      @python-programming Thanks for reading my comment and offering help. Our work setup is firewalled, so I won't be able to let you join.
      Could it be that same firewall that is causing the issue? 🤔

    • @fotisj321 1 year ago

      @sriram151092 I think he is telling you to run your code on Colab (Google's platform for Jupyter notebooks). That way you don't have to run it on your local PC, which avoids the typical setup issues.