Gosh, this is so useful. I had seen those unsqueeze, last_hidden_state, clamp, etc. commands before, but did not know how to link them with transformer embeddings. This hands-on, detailed explanation helps so much! I did not even know what my question was; now I know both the question and the answer. Thank you!
This channel is a gem.
Thanks a lot, keep doing it James!
thanks!
James, your lessons easing us into advanced NLP topics are amazing...
That's awesome to hear, thank you!
Thank you James, good work as usual. A small comment to help you with the RUclips algorithm as you deserve.
Thanks as always Sajid!
These are all gold. Thank you for making this!
Thanks I appreciate it!
works like a charm. Thanks a lot man
Thank you so much for your awesome video!!! Seriously helped a lot.
Hello friend, thanks a lot for the explanation. It was very useful to me!
Thanks for the tutorial!!
Here is a question: how about using the "pooler_output" tensor from the model instead of mean pooling on last_hidden_state?
Is it that pooler_output is more suitable for downstream tasks rather than sentence representation?
I appreciate your answer 🙏
Is the mean-pooled embedding computed this way the same as "bert-as-service"?
Hi, thanks for the great video! Just one question: when I was running the part "outputs = model(**tokens)", I got an error message saying "ValueError: You need to specify either `text` or `text_target`." Any idea why it happened?
I would love a lesson on fine-tuning the sentence-transformer model for sentence similarity on a custom dataset...
That would be pretty interesting, I'll likely look into it sometime soon :)
@@jamesbriggs Thanks James, please use RoBERTa as suggested above to deal with out-of-vocabulary tokens...
Could you give a deep dive on fine-tuning/retraining a transformer vs. stitching adapters for a downstream task?
sure I can look into it :)
Hi James, thank you for the amazing lecture. Could we apply this method to a list of multi-word terms such as "machine learning", "Deep convolutional neural network", 'Random forests', 'Semisupervised learning', "Internet of things", 'Supervised learning', "deep learning", "Fuzzy logic", "real-time", "Big data", etc., and find semantic similarities between these terms?
Hey, I had a question: how could I do this in multiple languages?
Hi James, in your GitHub code you created the final results pandas dataset; could you share the logic behind it and the new similarity scores?
Thank you very much, this is really cool
Thanks for sharing. Can we use this approach to compare two addresses using sentence embeddings? For example, given two columns of addresses, can we find how similar (or not) one address is to the others?
You could maybe, but I'm not sure the results would be great. For two addresses I think I'd use Levenshtein distance instead; I have a playlist covering these things in depth: ruclips.net/p/PLIUOU7oqGTLhlWpTz4NnuT3FekouIVlqc
The first video covers Levenshtein, then if that's not enough we have the other videos - there's an article included with each video in the description - I'd definitely recommend those! Let me know how it goes or if you have questions :)
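For reference, a rough sketch of what that address comparison could look like with edit distance - this assumes the python-Levenshtein package is installed, and the addresses are made-up examples, not anything from the video:

# A minimal sketch: comparing addresses with Levenshtein similarity.
# Assumes `pip install python-Levenshtein`; addresses are placeholders.
import Levenshtein

addresses_a = ["221B Baker Street, London", "10 Downing St, London"]
addresses_b = ["221b baker st, london", "12 Grimmauld Place, London"]

for a in addresses_a:
    for b in addresses_b:
        # ratio() gives a normalized similarity in [0, 1];
        # distance() would give the raw number of edits instead.
        score = Levenshtein.ratio(a.lower(), b.lower())
        print(f"{a!r} vs {b!r} -> {score:.2f}")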
Hi James, I'm following this tutorial but using sentence-transformers/distiluse-base-multilingual-cased-v1 with 45,000 sentences (the longest is 118 tokens), and I'm running out of memory: it says it needs to allocate almost 17 GB, and this happens when running outputs = model(**tokens). Any idea why this is happening? I'm using a SageMaker instance with 32 GB RAM... @James Briggs
Hey Diego, this isn't the best approach when dealing with larger amounts of data; I'd recommend using a vector search library like FAISS. I'm working on a tutorial for exactly this - it should be out in ~2 weeks - so if you want to do everything before then, I'd recommend looking into FAISS and seeing if you can use it to store your vectors and perform the similarity search. There's some good documentation here:
github.com/facebookresearch/faiss/wiki
Good luck!
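For anyone curious, the FAISS flow can look roughly like this - the 768-dimension size and the random vectors are placeholders I'm assuming, not values from the video:

# A minimal sketch of storing sentence embeddings in FAISS and searching them.
# Assumes faiss-cpu and numpy are installed; 768 dims matches BERT-base-style
# embeddings and is an assumption, the random vectors stand in for real ones.
import faiss
import numpy as np

dim = 768
embeddings = np.random.rand(45000, dim).astype("float32")

# Normalize so inner product equals cosine similarity.
faiss.normalize_L2(embeddings)
index = faiss.IndexFlatIP(dim)
index.add(embeddings)

query = np.random.rand(1, dim).astype("float32")
faiss.normalize_L2(query)
scores, ids = index.search(query, 5)  # top-5 most similar stored vectors
print(ids, scores)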
Thanks. Can you do a deep dive into explaining Transformers?
I could give it a go for sure - the transformer models, or the huggingface transformers library? And are there any parts in particular that you think would be useful?
@@jamesbriggs the huggingface transformers library would be useful
@@williamwang2676 I have a lot of videos on these in the "NLP" and "NLP for Semantic Search" playlists on my channel, is there anything in particular you're interested in that I'm missing?
How can the cosine similarity vector be used to point out which words in the original sentences contribute the most to dissimilarity?
I'm not sure how we could do this - KeyBERT is supposedly good at identifying similar words within a sentence - so maybe that could be the right direction. I've never used it though, so I'm not 100% certain.
Which language models [BERT, ELMo, Flair, GPT-2, etc.] are suitable for finding similar sentences or tweets?
BERT works well. There is also another model called the Universal Sentence Encoder (USE) which is popular, but unfortunately it's only available via TensorFlow as far as I'm aware.
Check out the sentence-transformers library too, it's very good!
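As a quick illustration, sentence-transformers handles the whole pipeline for you - the model name below is just a common general-purpose choice I'm assuming, not one named in the video:

# A rough sketch with the sentence-transformers library; the model name is an
# assumption (a commonly used general-purpose model), the sentences are examples.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")

sentences = [
    "Three years later, the coffin was still full of Jello.",
    "The person box was packed with jelly many dozens of months later.",
    "He found a leprechaun in his walnut shell.",
]

embeddings = model.encode(sentences, convert_to_tensor=True)

# Cosine similarity between the first sentence and the rest.
scores = util.cos_sim(embeddings[0], embeddings[1:])
print(scores)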
Hi James, such a great video! I am new to the NLP world, but I am working on text similarity on a very domain-specific corpus which is unlabeled. In most cases for text similarity with BERT I find a dataset with pairs of sentences, but in my case that will be very hard to obtain. I just have one question: when we use this kind of method we are unable to judge the performance (accuracy, recall, F1-score), so how can I evaluate my model in this case? Thanks in advance!
amazing video again
Haven't seen this level of in-depth explanation elsewhere. Thank you so much.
Also, I would be glad if you could provide your insight on appending some neural network layers after the sentence embedding for a downstream application. Do you think it will enrich the embeddings further or make any difference?
It depends on what you're using them for; generally though, I've found that you can't really enrich embeddings any further than what the big transformer models manage. What sort of downstream applications are you thinking of?
Thanks for the tutorial, it's amazing. I have a question though: what is the difference between using tokenizers here and the other approach you showed in another video using the BERT encoder? Is it that there you just used the library, and here you are actually implementing it from scratch? Thanks in advance.
Yes that's right, both are exactly the same - and if we compare the output logits we'll even see (almost) the exact same values (with some rounding differences)
Hi James, awesome video! I have a question: I need to do this in Spanish, so what model should I use to get good results? I'm looking to compare, for example, "Beer" and "Heineken Original Lager" and get a high similarity. Is it possible?
Yes it should be able to do that, I’m looking and I can’t see any Spanish-specific models (looking at www.sbert.net/docs/pretrained_models.html), but there are a few multilingual models you could try, a little more info on those here:
www.sbert.net/examples/training/multilingual/README.html
I don’t know how to train sentence transformer models *yet*, but clearly there’s a gap in non-English sentence transformers, so I’m going to look into it - although it will be some time so don’t wait for me! Good luck :)
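If it helps, a quick sketch of trying one of those multilingual models - I'm reusing the distiluse model mentioned elsewhere in this thread, and the Spanish phrase is my own example, not from the video:

# A rough sketch using a multilingual sentence-transformers model; the model
# choice follows another comment in this thread, the phrases are my examples.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("sentence-transformers/distiluse-base-multilingual-cased-v1")

a = model.encode("cerveza artesanal", convert_to_tensor=True)
b = model.encode("Heineken Original Lager", convert_to_tensor=True)

print(util.cos_sim(a, b))  # higher value == more semantically similar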
@@jamesbriggs Thanks for the resources, I'll check them out. Is it possible to build my own subject-specific sentence transformer in Spanish? And if it is, do you know where I can find info on how to do this? Because I have a fairly big list of 45k sentences, but the problem I'm trying to solve is very specific.
I've seen your OpenAI embeddings video - is this what they are doing, or are the techniques different?
the concept is the same, they're using transformer models with some sort of pooling to create the embedding models - if you want to dive into it I have a course on all of these techniques here ruclips.net/p/PLIUOU7oqGTLgz-BI8bNMVGwQxIMuQddJO
@@jamesbriggs Awesome! Will look at it
Hi James, thank you for this amazing tutorial! Just a quick question: attention_mask is used twice: 1) model(input_ids, attention_mask) and then 2) mask_embeddings = embeddings * unsqueezed_expanded_mask. I wonder why the second step is necessary. Why does the model output unnecessary information that we need to cancel out in 2)?
The models by default aren't built to produce sentence embeddings, so they output an embedding for every token position, including [PAD] positions. When we average activations across those token embeddings (to create sentence embeddings), we need to exclude the activations that align to [PAD] tokens - which is what the second multiplication by the mask does.
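For anyone following along, the mean pooling step being discussed looks roughly like this - the variable names are mine and the model choice is an assumption, not necessarily the exact one in the video:

# A rough sketch of masked mean pooling over BERT token embeddings.
# Model name and example sentences are assumptions for illustration.
import torch
from transformers import AutoTokenizer, AutoModel

name = "sentence-transformers/bert-base-nli-mean-tokens"
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModel.from_pretrained(name)

sentences = ["Three years later, the coffin was still full of Jello.",
             "He found a leprechaun in his walnut shell."]
tokens = tokenizer(sentences, padding=True, truncation=True, return_tensors="pt")

with torch.no_grad():
    outputs = model(**tokens)

embeddings = outputs.last_hidden_state              # (batch, seq_len, hidden)
mask = tokens["attention_mask"].unsqueeze(-1)       # (batch, seq_len, 1)
mask = mask.expand(embeddings.shape).float()        # (batch, seq_len, hidden)

masked = embeddings * mask                          # zero out [PAD] positions
summed = masked.sum(dim=1)                          # sum over real tokens only
counts = mask.sum(dim=1).clamp(min=1e-9)            # number of real tokens
mean_pooled = summed / counts                       # sentence embeddings
print(mean_pooled.shape)                            # (batch, hidden)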
Thank you Sir for your contribution, I'm definitely purchasing your course on Udemy.
that's awesome to hear, looking forward to seeing you there!
Can I compute similarity between a sentence/paragraph and a word/expression (related to a topic) to identify whether the sentence/paragraph is about that topic?
I would assume - although I can't say this for sure - that a random irrelevant word *should* have a lower similarity score than a relevant word. But you likely won't get similarity scores as strong as those between two semantically similar sentences.
Can this model deal with out-of-vocabulary tokens?
No, this one can't; we want byte-level encoding for that, which I believe RoBERTa uses, but BERT doesn't.
It will take too much time to match similarity this way.
What would be a faster way?
I'd recommend something like FAISS
Just discovered your channel. I can't thank the YT algorithm enough.
I'm thanking the algorithm too - super happy you're enjoying it!
@@jamesbriggs oh my god u actually replied
haha I always do :)
You are awesome!
Great channel! we both share the same interests!
Subscribed
That's awesome to hear, thanks!
I guess after this I qualify to be called BERTISIAN EXPERT woow