Sentence Transformers: Sentence Embedding, Sentence Similarity, Semantic Search and Clustering | Code
- Published: 11 Sep 2024
- Learn How to use Sentence Transformers to perform Sentence Embedding, Sentence Similarity, Semantic search, and Clustering.
Code: github.com/Pra...
Previously Uploaded Transformers Videos:
1. Learn How to Fine Tune BERT on Custom Dataset.
2. Learn How to Deploy Fine-tuned BERT Model. (Hugging Face Hub and Streamlit Cloud)
3. Learn How to Deploy Fine-tuned Transformers Model on AWS Fargate (Docker Image)
#nlp #bert #transformers #machinelearning #artificialintelligence #datascience
📌 Hey everyone! Enjoying these NLP tutorials? Check out my other project, AI Demos, for quick 1-2 min AI tool demos! 🤖🚀
🔗 YouTube: www.youtube.com/@aidemos.futuresmart
We aim to educate and inform you about AI's incredible possibilities. Don't miss our AI Demos YouTube channel and website for amazing demos!
🌐 AI Demos Website: www.aidemos.com/
Subscribe to AI Demos and explore the future of AI with us!
This is an amazing video. I love how you walk me through, step by step. I love how this video gets into the meat of the problem and solution rather than talking endlessly about this and that. Straight to the point, and tons of useful and practical information that I can apply right away.
Thank you very much 🙏. Hope you find the other videos useful too.
That's exactly what I needed. Huge thanks Pradip!
I'm a beginner in the wide field of ML and very impressed by your presentation. If you have a chance, can you make a video about predicting a name from an abbreviation? For example, searching FBI would return Federal Bureau of Investigation, etc.
All the concepts were clearly explained, thanks for the video! 🙌
Glad it was helpful!
Cool video! 🤗
Thanks for the visit 🤗
Easy Interpretation!! Kudos
Very helpful video on embeddings Pradip. Keep it going👏👏👏
Thank you, I will
Thanks for explaining this concept. This video is really helpful for my project.
Glad it was helpful!
Nicely compiled. Great work!
Just amazing! Salute man!
Great!!!! Super!!!!
Thank you! Very good explanation.
You are welcome!
Wonderful lesson!
Glad you liked it!
Amazing, great work! Can you please make a video on a semantic similarity detection model using a BERT transformer? 🙏
Amazing! But how about a video on how to fine-tune a sentence transformer for non-English text?
Great content!!
Fantastic video!
Thanks a lot for the explanation
You're welcome! 🤗
Thank you very much ❤
You're welcome 😊
Pradip, can you please explain how you narrow down to a particular model from all the others? Like how or why did you pick this particular mfaq model for semantic search?
OMG, man! Thank you!!
You're welcome!
You rock !
Hi Pradip. Great demo. Can we further classify the cluster number into text? For example is there a model that will generate the word “baby” for cluster 0, “drums or monkey” for cluster 2, “animal” for cluster 3 and “food” for cluster 4?
You can create your own mapping from cluster IDs to labels.
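A minimal sketch of such a mapping, assuming the cluster IDs come from the K-means example in the video; the IDs and label strings below are hypothetical, not output from the notebook:

```python
# Map K-means cluster IDs to human-readable labels.
# These IDs and labels are made-up examples for illustration.
cluster_labels = {
    0: "baby",
    2: "drums or monkey",
    3: "animal",
    4: "food",
}

def label_for(cluster_id, mapping, default="unknown"):
    """Return the human label assigned to a cluster ID, if any."""
    return mapping.get(cluster_id, default)

print(label_for(3, cluster_labels))  # animal
print(label_for(7, cluster_labels))  # unknown
```

The mapping has to be written by hand after inspecting each cluster's sentences; K-means itself only produces the numeric IDs.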
Fantastic video Pradip!! Can you please suggest any reading material for sentence embeddings?
Thanks, Amey. I think you should check the official website. They have details of what pre-trained models are available and how to fine-tune them.
www.sbert.net/docs/training/overview.html
There is also a new technique called SetFit:
huggingface.co/blog/setfit
@@FutureSmartAI Thanks a lot!!
Hello sir, thanks for sharing; this is so insightful. I want to build a text summarizer, but I find this embedding method interesting. I want to ask: how do we train our dataset with this model? Are there any tutorials? Thank you in advance!
You can fine-tune sentence transformers, but I don't have a tutorial on it. You can read more about it here: www.sbert.net/docs/training/overview.html
@@FutureSmartAI Sorry to bother you again, sir. I'm still new to this, so do I just pass the sentences (all the news text in each document, without labels) to the InputExample function and then train with SentenceTransformer?
Thank YOU.
Excellent, man!! Short and crisp. Would you mind creating a semantic search model on a custom dataset using a pre-trained Hugging Face model?
You mean you want to fine-tune sentence transformers?
I'm getting an error with pip install -U sentence-transformers.
To fix this you could try to:
1. loosen the range of package versions you've specified
2. remove package versions to allow pip attempt to solve the dependency conflict
amazing
Thanks
Thank you
🙏
Thanks for the valuable inputs and the clear explanation. Can you do NER-related videos / fine-tuning using Hugging Face? Another request: I'm currently also working on semantic search tasks and have already followed all the links in the notebooks except clustering. I'd like to do semantic search between a text input and image outputs (which is possible only by vectorizing both the query and the image descriptions). Can you share any related Hugging Face or other links that would be helpful?
Hi Venkatesan, I have already done videos related to custom NER and fine-tuning Hugging Face transformers.
ruclips.net/video/9he4XKqqzvE/видео.html
ruclips.net/video/YLQvVpCXpbU/видео.html
For semantic search between text input and images output:
Check CLIP (Contrastive Language-Image Pre-training)
openai.com/blog/clip/
It is a neural network model which efficiently learns visual concepts from natural language supervision.
CLIP is trained on a dataset composed of pairs of images and their textual descriptions, abundantly available across the internet.
SentenceTransformers provides models that embed images and text into the same vector space. This makes it possible to find similar images and to implement image search.
www.sbert.net/examples/applications/image-search/README.html
@@FutureSmartAI Thanks for your valuable links and I will check/try.
Can embeddings/similarity only be applied between sentences? What about paragraph to paragraph? Essay to essay? Thx
Embeddings can be calculated for paragraphs and also for big documents.
Sometimes a model has an input token limit; in that case you need to break the document into smaller paragraphs and then calculate an embedding for each.
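A rough sketch of that chunking step. It uses a word count as a stand-in for the model's real token limit (actual limits are measured in tokens, via the model's tokenizer, so the budget here is only an approximation):

```python
def chunk_paragraphs(document, max_words=200):
    """Split a document into chunks of whole paragraphs, keeping each
    chunk under a rough word budget (a proxy for the token limit)."""
    chunks, current, count = [], [], 0
    for para in document.split("\n\n"):
        words = len(para.split())
        # Flush the current chunk if adding this paragraph would exceed the budget.
        if current and count + words > max_words:
            chunks.append("\n\n".join(current))
            current, count = [], 0
        current.append(para)
        count += words
    if current:
        chunks.append("\n\n".join(current))
    return chunks

# Five ~122-word paragraphs; each one alone fits the budget, pairs do not.
doc = "\n\n".join(f"Paragraph {i} " + "word " * 120 for i in range(5))
chunks = chunk_paragraphs(doc, max_words=200)
print(len(chunks))  # 5
```

Each chunk can then be passed to the model's encode call separately, and the per-chunk embeddings averaged or searched individually.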
@@FutureSmartAI Thank you, sir! I will try it out. Is there any particular model you suggest to start with?
Thanks again.
A question about the clustering case you gave at the end: is there a default similarity-score criterion for grouping the sentences? Which factor(s) sort the sentences together behind the scenes? I mean, some groups have only 2 sentences and some have 4 or 5. Thx
K-means groups points based on their proximity to the centroid of each cluster. The distance measure used in K-means is typically the Euclidean distance, which is the straight-line distance between two points in n-dimensional space.
Points are grouped according to these distances; here we calculate the distances between embeddings.
@@FutureSmartAI Thx for your quick reply. Let me ask this way: is there a specific distance value behind this clustering? This might need a read through the documentation by myself. Thanks again!
@@duetplay4551 Yes, in K-means you should be able to get the distance between each cluster centroid and its points.
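A small sketch of how to read those distances out of scikit-learn's KMeans, using toy 2-D vectors as stand-ins for real sentence embeddings (which would be 384- or 768-dimensional vectors from the model's encode call):

```python
import numpy as np
from sklearn.cluster import KMeans

# Two well-separated toy "embedding" clouds of 5 points each.
rng = np.random.default_rng(42)
embeddings = np.vstack([
    rng.normal(loc=0.0, scale=0.3, size=(5, 2)),
    rng.normal(loc=5.0, scale=0.3, size=(5, 2)),
])

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(embeddings)

# transform() gives the Euclidean distance from each point to every
# centroid; the row-wise minimum is the distance to the point's own centroid.
distances = kmeans.transform(embeddings)
own_centroid_dist = distances.min(axis=1)
print(kmeans.labels_)
print(own_centroid_dist.round(3))
```

So the "specific value" behind the grouping is exactly this Euclidean distance matrix; each point is assigned to the centroid it is closest to.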
How can I train custom sentence embeddings for my domain-specific task so that I can find the similarity between my domain-specific words?
You can train your own; here are the steps:
www.sbert.net/docs/training/overview.html
@@FutureSmartAI Thank you for replying, but this is valid for a supervised problem. I have a huge amount of data in the form of pure text documents. I want to train in an unsupervised way so the model can learn similar words/sentences.
@@samarthsarin One way to train unsupervised is to use dummy tasks like next-word prediction or next-sentence prediction.
HI Pradip, thanks for the video.
Can you please help me with this:
The embeddings (numerical values) change every time I use a new kernel.
How can I ensure that the embeddings are exactly the same?
I have tried the following but it does not seem to work:
1. Use model.eval() to put the model into evaluation mode and deactivate dropout.
2. Set requires_grad to False for each layer in the model so that the weights do not change.
3. set the same seeds.
Could you please guide me on this, any suggestion is appreciated.
Thanks,
Nitin
Hi, did you get a solution for this problem? I am facing the same problem.
@@tintumarygeorge9309 Yes, they remain the same (provided you feed exactly the same text each time). The bug in my case was that the order of the texts I fed to the transformer was not the same every time.
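A sketch of that fix: key the embeddings by their text so input order cannot matter. Here `fake_encode` is a hypothetical, deterministic stand-in for the model's encode call (a sentence transformer in eval mode is likewise deterministic per input text):

```python
import hashlib
import random

def fake_encode(text):
    """Deterministic stand-in for model.encode(): derives a small
    vector from a hash of the text."""
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    return [b / 255.0 for b in digest[:4]]

def embed_by_text(texts):
    """Key each embedding by its text, so the result does not depend
    on the order in which the texts are fed to the encoder."""
    return {t: fake_encode(t) for t in texts}

texts = ["a new baby", "drums in the park", "fresh food"]
shuffled = texts[:]
random.shuffle(shuffled)

# Same text -> same embedding, regardless of input order.
assert embed_by_text(texts) == embed_by_text(shuffled)
print("embeddings match across orderings")
```

Comparing embeddings positionally (row i of run 1 vs row i of run 2) silently breaks the moment the input order changes, which is exactly the bug described above.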
Great video. My use case is slightly different:
I have a corpus of articles and a corpus of summaries.
I want to find, for a particular summary, how many articles are semantically related or similar.
Which model should I use? Should I use embeddings and cluster them, or not?
Can you help?
You can use embeddings and a semantic similarity score:
1. Calculate an embedding for each article.
2. Calculate an embedding for the particular summary.
3. Iterate through the article embeddings and calculate the cosine similarity between each article embedding and the summary embedding.
4. Sort the results to get the articles most semantically similar to that summary.
Check this; it has utility functions: www.sbert.net/examples/applications/semantic-search/README.html
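The steps above can be sketched with plain NumPy, using toy 3-D vectors as stand-ins for the embeddings a sentence transformer's encode call would return:

```python
import numpy as np

# Toy vectors standing in for real article/summary embeddings.
article_embeddings = np.array([
    [0.9, 0.1, 0.0],  # article 0
    [0.0, 1.0, 0.0],  # article 1
    [0.8, 0.2, 0.1],  # article 2
])
summary_embedding = np.array([1.0, 0.0, 0.0])

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Step 3: score every article against the summary.
scores = [cosine_similarity(vec, summary_embedding)
          for vec in article_embeddings]

# Step 4: sort article indices, most similar first.
ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
print(ranked)  # [0, 2, 1]
```

The sbert.net semantic-search utilities linked above do the same scoring and sorting in batch, so for large corpora it is better to use those than a Python loop.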
How do we know the number of clusters needed?
It's beneficial if you can somehow infer it based on domain knowledge, but have a look at the "elbow method" or "silhouette method"
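A sketch of the silhouette method with scikit-learn, using well-separated toy blobs (with assumed centers) as stand-ins for sentence embeddings:

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# Toy data with 3 clearly separated blobs standing in for embeddings.
X, _ = make_blobs(n_samples=60, centers=[[0, 0], [8, 8], [-8, 6]],
                  cluster_std=0.5, random_state=42)

# Score each candidate cluster count by its silhouette score
# (higher means tighter, better-separated clusters).
scores = {}
for k in range(2, 7):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    scores[k] = silhouette_score(X, labels)

best_k = max(scores, key=scores.get)
print(best_k)  # the k with the highest silhouette score
```

On real sentence embeddings the peak is usually less sharp than on toy blobs, so it is worth eyeballing the clusters for a couple of candidate values of k rather than trusting the score alone.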
Thank you so much, sir 😊❤
where can we find the dataset?
It's throwing an error; it can't find the dataset: No such file or directory: '/content/drive/MyDrive/Content Creation/RUclips Tutorials/datasets/toxic_commnets_500.csv'
It is shared in the previous video of the playlist.