Sentence Transformers: Sentence Embedding, Sentence Similarity, Semantic Search and Clustering | Code

  • Published: 11 Sep 2024
  • Learn How to use Sentence Transformers to perform Sentence Embedding, Sentence Similarity, Semantic search, and Clustering.
    Code: github.com/Pra...
    Previously Uploaded Transformers Videos:
    1. Learn How to Fine Tune BERT on Custom Dataset.
    2. Learn How to Deploy Fine-tuned BERT Model. (Hugging Face Hub and Streamlit Cloud)
    3. Learn How to Deploy Fine-tuned Transformers Model on AWS Fargate (Docker Image)
    #nlp #bert #transformers #machinelearning #artificialintelligence #datascience
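
    For reference, a minimal sketch of the kind of usage covered (the model name "all-MiniLM-L6-v2" is an assumption, not necessarily the one used in the video):

        # Sketch: sentence embeddings and pairwise similarity.
        # Assumes: pip install sentence-transformers
        from sentence_transformers import SentenceTransformer, util

        model = SentenceTransformer("all-MiniLM-L6-v2")
        sentences = ["The cat sits outside", "A man is playing guitar"]
        embeddings = model.encode(sentences)  # one vector per sentence

        # Cosine similarity between the two sentence embeddings
        score = util.cos_sim(embeddings[0], embeddings[1])
        print(float(score))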

Comments • 75

  • @FutureSmartAI
    @FutureSmartAI  A year ago +2

    📌 Hey everyone! Enjoying these NLP tutorials? Check out my other project, AI Demos, for quick 1-2 min AI tool demos! 🤖🚀
    🔗 YouTube: www.youtube.com/@aidemos.futuresmart
    We aim to educate and inform you about AI's incredible possibilities. Don't miss our AI Demos YouTube channel and website for amazing demos!
    🌐 AI Demos Website: www.aidemos.com/
    Subscribe to AI Demos and explore the future of AI with us!

  • @kyoungd
    @kyoungd A year ago +2

    This is an amazing video. I love how you walk me through, step by step. I love how this video gets into the meat of the problem and solution rather than talking endlessly about this and that. Straight to the point, and tons of useful and practical information that I can apply right away.

    • @FutureSmartAI
      @FutureSmartAI  A year ago

      Thank you very much 🙏. Hope you find the other videos useful too.

  • @Munk-tt6tz
    @Munk-tt6tz 4 months ago

    That's exactly what I needed. Huge thanks Pradip!

  • @phongd5929
    @phongd5929 A year ago +2

    I'm a beginner in the wide field of ML and very impressed by your presentation. If you have a chance, can you make a video about predicting the full name from an abbreviation? For example, searching FBI would return Federal Bureau of Investigation, etc.

  • @martinmolina-sx4be
    @martinmolina-sx4be A year ago +1

    All the concepts were clearly explained, thanks for the video! 🙌

  • @HuggingFace
    @HuggingFace 2 years ago +3

    Cool video! 🤗

  • @user-zl1pf2sy5s
    @user-zl1pf2sy5s A month ago +1

    Easy Interpretation!! Kudos

  • @prajithkumar432
    @prajithkumar432 2 years ago +1

    Very helpful video on embeddings Pradip. Keep it going👏👏👏

  • @balag3611
    @balag3611 A year ago +1

    Thanks for explaining this concept. This video is really helpful for my project.

  • @bhusanchettri8594
    @bhusanchettri8594 A year ago +1

    Nicely compiled. Great work!

  • @ShaikhMushfikurRahman
    @ShaikhMushfikurRahman A year ago +1

    Just amazing! Salute man!

  • @ragibshahriar7959
    @ragibshahriar7959 2 months ago +1

    Great!!!! Super!!!!

  • @AndresVeraF
    @AndresVeraF A year ago +1

    Thank you! Very good explanation.

  • @Ashesoftheliving
    @Ashesoftheliving 2 years ago +1

    Wonderful lesson!

  • @IbrahimKhan-lf9cq
    @IbrahimKhan-lf9cq 11 months ago +1

    Amazing, great work! Can you please make a video on a semantic similarity detection model using the BERT transformer? Pleaseeee 🙏🙏🙏🙏🙏

  • @wilfredomartel7781
    @wilfredomartel7781 A year ago +1

    Amazing! But how about a video on how to fine-tune a sentence transformer for non-English text?

  • @kittu999c
    @kittu999c 4 months ago

    Great content!!

  • @LearningWorldChatGPT
    @LearningWorldChatGPT A year ago +1

    Fantastic video!
    Thanks a lot for the explanation

  • @birolyildiz
    @birolyildiz A year ago +1

    Thank you very much ❤

  • @Raaj_ML
    @Raaj_ML 4 months ago

    Pradip, can you please explain how you narrow down to a particular model from all the others? Like how or why did you pick this particular mfaq model for semantic search of the query?

  • @veronicanatividade
    @veronicanatividade 11 months ago

    OMG, man! Thank you!!

  • @ilducedimas
    @ilducedimas A year ago +1

    You rock !

  • @MyAscetic
    @MyAscetic A year ago +1

    Hi Pradip. Great demo. Can we further label the cluster numbers with text? For example, is there a model that will generate the word “baby” for cluster 0, “drums or monkey” for cluster 2, “animal” for cluster 3, and “food” for cluster 4?

  • @panditamey1
    @panditamey1 A year ago +1

    Fantastic video Pradip!! Can you please suggest any reading material for sentence embeddings?

    • @FutureSmartAI
      @FutureSmartAI  A year ago +1

      Thanks, Amey. I think you should check the official website. They have details of what pre-trained models are available and how to fine-tune them.
      www.sbert.net/docs/training/overview.html
      There is also a new thing called SetFit:
      huggingface.co/blog/setfit

    • @panditamey1
      @panditamey1 A year ago

      @@FutureSmartAI Thanks a lot!!

  • @flreview212
    @flreview212 A year ago +1

    Hello sir, thanks for sharing, this is so insightful. I want to build a text summarizer, but I find this embedding method interesting. I want to ask: how do we train our dataset on this model? Do you have any tutorials? Thank you in advance!

    • @FutureSmartAI
      @FutureSmartAI  A year ago

      You can fine-tune sentence transformers, but I don't have any tutorials on it. You can read more about it here: www.sbert.net/docs/training/overview.html

    • @flreview212
      @flreview212 A year ago

      @@FutureSmartAI Sorry to bother you again sir, I'm still new to this. So I just pass the sentences (all the news text in each document, without labels) to the InputExample function and then train with SentenceTransformer?
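
      For context, a minimal sketch of the classic sbert training loop the docs above describe. Note that InputExample normally carries a label (here a similarity score for CosineSimilarityLoss), so purely unlabeled text needs an unsupervised objective instead. Model name and data are illustrative assumptions:

          # Hypothetical fine-tuning sketch with the classic sbert API
          from torch.utils.data import DataLoader
          from sentence_transformers import SentenceTransformer, InputExample, losses

          model = SentenceTransformer("all-MiniLM-L6-v2")  # assumed base model

          train_examples = [  # labels are illustrative similarity scores
              InputExample(texts=["First sentence", "Its paraphrase"], label=0.9),
              InputExample(texts=["Unrelated sentence", "Another topic"], label=0.1),
          ]
          train_dataloader = DataLoader(train_examples, shuffle=True, batch_size=2)
          train_loss = losses.CosineSimilarityLoss(model)

          model.fit(train_objectives=[(train_dataloader, train_loss)], epochs=1)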

  • @Amazingarjun
    @Amazingarjun 5 months ago

    Thank YOU.

  • @rajsethi26
    @rajsethi26 A year ago

    Excellent, man!! Short and crisp. Would you mind creating a semantic search model on a custom dataset using a pre-trained Hugging Face model?

    • @FutureSmartAI
      @FutureSmartAI  A year ago

      You mean you want to fine-tune sentence transformers?

  • @saritbahuguna9603
    @saritbahuguna9603 9 months ago

    I am getting an error with pip install -U sentence-transformers:
    To fix this you could try to:
    1. loosen the range of package versions you've specified
    2. remove package versions to allow pip attempt to solve the dependency conflict

  • @sumankumari-gl3ze
    @sumankumari-gl3ze A year ago

    amazing

  • @kaka_rbp1998
    @kaka_rbp1998 A year ago

    Thank you

  • @pouriaforouzesh5349
    @pouriaforouzesh5349 A year ago

    🙏

  • @venkatesanr9455
    @venkatesanr9455 2 years ago +1

    Thanks for the valuable input and the clear explanation. Can you do NER-related videos / fine-tuning using Hugging Face? Another request: I am currently also doing semantic-search-related tasks and have already followed all the links in the notebooks, excluding clustering. I would like to do semantic search between a text input and image outputs (which is possible only by vectorizing both the query and the image description). Can you share any links related to Hugging Face or others that would be helpful?

    • @FutureSmartAI
      @FutureSmartAI  2 years ago +1

      Hi Venkatesan, I have already done videos related to custom NER and fine-tuning Hugging Face transformers.
      ruclips.net/video/9he4XKqqzvE/видео.html
      ruclips.net/video/YLQvVpCXpbU/видео.html
      For semantic search between text input and images output:
      Check CLIP (Contrastive Language-Image Pre-training)
      openai.com/blog/clip/
      It is a neural network model which efficiently learns visual concepts from natural language supervision.
      CLIP is trained on a dataset composed of pairs of images and their textual descriptions, abundantly available across the internet.

    • @FutureSmartAI
      @FutureSmartAI  2 years ago +1

      SentenceTransformers provides models that allow embedding images and text into the same vector space. This makes it possible to find similar images as well as to implement image search.
      www.sbert.net/examples/applications/image-search/README.html
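
      For illustration, a minimal sketch of text-to-image search with the CLIP model that sentence-transformers ships ("clip-ViT-B-32"); the image file names are placeholders:

          # Embed images and a text query into the same vector space with CLIP
          from PIL import Image
          from sentence_transformers import SentenceTransformer, util

          model = SentenceTransformer("clip-ViT-B-32")

          img_embeddings = model.encode([Image.open("dog.jpg"), Image.open("car.jpg")])
          query_embedding = model.encode("a photo of a dog")

          # Rank images by cosine similarity to the text query
          scores = util.cos_sim(query_embedding, img_embeddings)
          print(scores)  # higher score = better match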

    • @venkatesanr9455
      @venkatesanr9455 2 years ago +1

      @@FutureSmartAI Thanks for your valuable links and I will check/try.

  • @duetplay4551
    @duetplay4551 A year ago +1

    Can embedding/similarity only be applied between sentences? What about paragraph to paragraph? Essay to essay? Thx

    • @FutureSmartAI
      @FutureSmartAI  A year ago +1

      Embeddings can be calculated for paragraphs and also for big documents.
      Sometimes models have an input token limit, in which case you need to break the document into smaller paragraphs and then calculate an embedding for each; a sketch follows below.
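
      A minimal sketch of that chunking idea (splitting on blank lines and mean-pooling the chunk vectors are assumptions; other strategies work too):

          # Split a long document into paragraphs and embed each chunk
          from sentence_transformers import SentenceTransformer

          model = SentenceTransformer("all-MiniLM-L6-v2")  # assumed model

          document = "First paragraph...\n\nSecond paragraph...\n\nThird paragraph..."
          paragraphs = [p.strip() for p in document.split("\n\n") if p.strip()]

          paragraph_embeddings = model.encode(paragraphs)  # one vector per chunk

          # One simple way to get a single document vector: mean-pool the chunks
          doc_embedding = paragraph_embeddings.mean(axis=0)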

    • @duetplay4551
      @duetplay4551 A year ago

      @@FutureSmartAI Thank you, sir! I will try it out. Is there any particular model you suggest to start with?
      Thanks again.

  • @duetplay4551
    @duetplay4551 A year ago +1

    A question about the clustering case you gave last. Is there a default similarity-score criterion for grouping the sentences? Which factor(s) sort the sentences together behind the scenes? I mean, some groups have only 2 sentences and some have 4 or 5. Thx

    • @FutureSmartAI
      @FutureSmartAI  A year ago +1

      K-means assigns sentences to clusters based on their proximity to each cluster's centroid. The distance measure used in K-means is typically the Euclidean distance, which is the straight-line distance between two points in n-dimensional space.
      Depending on these distances, the sentences are grouped; here we are calculating distances between embeddings, as in the sketch below.
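
      A minimal sketch of K-means over sentence embeddings (the sentences, model name, and number of clusters are assumptions, not the video's exact setup):

          # Cluster sentence embeddings with K-means
          from sentence_transformers import SentenceTransformer
          from sklearn.cluster import KMeans

          sentences = ["A baby is crying", "A man plays the drums",
                       "A monkey eats a banana", "Someone cooks pasta"]
          model = SentenceTransformer("all-MiniLM-L6-v2")
          embeddings = model.encode(sentences)

          kmeans = KMeans(n_clusters=2, random_state=42, n_init=10)
          labels = kmeans.fit_predict(embeddings)  # cluster id per sentence
          print(list(zip(sentences, labels)))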

    • @duetplay4551
      @duetplay4551 A year ago

      @@FutureSmartAI Thx for your quick reply. Let me ask this way: is there a specific distance value behind this clustering? This might need a read through the documentation by myself. Thanks again!

    • @FutureSmartAI
      @FutureSmartAI  A year ago +1

      @@duetplay4551 Yes, in K-means you should be able to get the distance between the cluster centroids and the points.
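
      Continuing the K-means sketch above (reusing kmeans, embeddings, labels, and sentences from that snippet), scikit-learn exposes these distances via KMeans.transform():

          import numpy as np

          # (n_sentences, n_clusters) matrix of Euclidean distances to each centroid
          distances = kmeans.transform(embeddings)
          # Distance of each sentence to its own cluster's centroid
          own_cluster_distance = distances[np.arange(len(sentences)), labels]
          print(own_cluster_distance)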

  • @samarthsarin
    @samarthsarin 2 years ago +1

    How can I train custom sentence embeddings for my domain-specific task so that I can find the similarity between my custom domain words?

    • @FutureSmartAI
      @FutureSmartAI  2 years ago

      You can train your own; here are the steps:
      www.sbert.net/docs/training/overview.html

    • @samarthsarin
      @samarthsarin 2 years ago

      @@FutureSmartAI Thank you for replying, but this is valid for a supervised problem. I have a huge amount of data, which is pure text documents. I want to train in an unsupervised way where the model can learn similar words/sentences.

    • @FutureSmartAI
      @FutureSmartAI  2 years ago

      @@samarthsarin One way to train unsupervised is by using proxy tasks like next-word prediction or next-sentence prediction.
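
      For sentence-transformers specifically, the sbert docs describe an unsupervised objective (TSDAE) that trains on raw sentences by reconstructing them from noisy input; a minimal sketch, with the base model name as an assumption:

          # Unsupervised TSDAE training on unlabeled sentences
          from torch.utils.data import DataLoader
          from sentence_transformers import SentenceTransformer, models, losses
          from sentence_transformers.datasets import DenoisingAutoEncoderDataset

          train_sentences = ["raw sentence one ...", "raw sentence two ..."]  # your corpus

          word_emb = models.Transformer("bert-base-uncased")  # assumed base model
          pooling = models.Pooling(word_emb.get_word_embedding_dimension(), "cls")
          model = SentenceTransformer(modules=[word_emb, pooling])

          dataset = DenoisingAutoEncoderDataset(train_sentences)
          loader = DataLoader(dataset, batch_size=8, shuffle=True)
          loss = losses.DenoisingAutoEncoderLoss(
              model, decoder_name_or_path="bert-base-uncased", tie_encoder_decoder=True
          )
          model.fit(train_objectives=[(loader, loss)], epochs=1)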

  • @nitinchavan3395
    @nitinchavan3395 A year ago

    Hi Pradip, thanks for the video.
    Can you please help me with this:
    The embeddings (numerical values) change every time I use a new kernel.
    How can I ensure that the embeddings are exactly the same?
    I have tried the following, but it does not seem to work:
    1. use model.eval() to put the model into evaluation mode and deactivate dropout.
    2. set "requires_grad" for each layer in the model to False so that the weights do not change.
    3. set the same seeds.
    Could you please guide me on this; any suggestion is appreciated.
    Thanks,
    Nitin

    • @tintumarygeorge9309
      @tintumarygeorge9309 11 months ago

      Hi, did you find a solution to this problem? I am facing the same problem.

    • @nitinchavan3395
      @nitinchavan3395 11 months ago

      @@tintumarygeorge9309 Yes, the weights remain the same (provided you use exactly the same text each time). The bug in my case was that the order of the texts I fed to the transformer was not the same every time.
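
      A small sketch illustrating that fix: encoding is deterministic at inference time, but the texts must be fed in a consistent order (model name is an assumption):

          import numpy as np
          from sentence_transformers import SentenceTransformer

          model = SentenceTransformer("all-MiniLM-L6-v2")
          texts = ["b sentence", "a sentence"]

          emb1 = model.encode(sorted(texts))  # fix the order explicitly
          emb2 = model.encode(sorted(texts))
          print(np.allclose(emb1, emb2))  # True: same texts, same order, same vectors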

  • @shobhitrajgautam
    @shobhitrajgautam A year ago

    Great video. My use case is slightly different:
    I have a corpus of articles and a corpus of summaries.
    I want to find, for a particular summary, how many articles are semantically related or similar.
    Which model should I use? Should I embed and cluster, or not?
    Can you help?

    • @FutureSmartAI
      @FutureSmartAI  A year ago

      You can use embeddings and a semantic similarity score:
      1. Calculate an embedding for each article.
      2. Calculate an embedding for the particular summary.
      Now iterate through each article embedding and calculate the cosine similarity between the article embedding and the summary embedding.
      Sort the results to get the most semantically similar articles to that summary; see the sketch below.
      Check this, it has utility functions: www.sbert.net/examples/applications/semantic-search/README.html
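
      A minimal sketch of those steps using sbert's utility functions (article and summary texts are placeholders, the model name an assumption):

          # Rank articles by cosine similarity to a summary
          from sentence_transformers import SentenceTransformer, util

          model = SentenceTransformer("all-MiniLM-L6-v2")

          articles = ["article one text ...", "article two text ..."]
          summary = "particular summary text ..."

          article_emb = model.encode(articles, convert_to_tensor=True)
          summary_emb = model.encode(summary, convert_to_tensor=True)

          hits = util.semantic_search(summary_emb, article_emb, top_k=len(articles))[0]
          for hit in hits:
              print(articles[hit["corpus_id"]], hit["score"])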

  • @SMCGPRA
    @SMCGPRA 5 months ago

    How do we know the number of clusters needed?

    • @Mr_ScrufMan
      @Mr_ScrufMan 4 months ago

      It's beneficial if you can somehow infer it based on domain knowledge, but have a look at the "elbow method" or "silhouette method".
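
      A minimal sketch of the silhouette method mentioned above: try several values of k and keep the one with the highest silhouette score (synthetic data stands in for real embeddings):

          from sklearn.cluster import KMeans
          from sklearn.datasets import make_blobs
          from sklearn.metrics import silhouette_score

          X, _ = make_blobs(n_samples=200, centers=4, random_state=42)  # stand-in for embeddings

          for k in range(2, 8):
              labels = KMeans(n_clusters=k, random_state=42, n_init=10).fit_predict(X)
              print(k, silhouette_score(X, labels))
          # Pick the k with the highest silhouette score (should peak near 4 here)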

  • @AzertAzert-nw4ze
    @AzertAzert-nw4ze 6 months ago

    😙🤯🤯🤯🤯😃😲😁😅🤣😲😁🤣

  • @shubhamguptachannel3853
    @shubhamguptachannel3853 A year ago

    Thank you so much sir😊❤❤❤

  • @highstakestrading
    @highstakestrading A year ago

    Where can we find the dataset?
    It's throwing an error that it can't find the dataset: No such file or directory: '/content/drive/MyDrive/Content Creation/YouTube Tutorials/datasets/toxic_commnets_500.csv'

    • @FutureSmartAI
      @FutureSmartAI  A year ago

      It is shared in the previous video of the playlist.