GPT-3 Embeddings: Perform Text Similarity, Semantic Search, Classification, and Clustering | Code
- Published: 16 Sep 2024
- Hands-on GPT-3 tutorial: learn how to use GPT-3 embeddings to perform text similarity, semantic search, classification, and clustering.
OpenAI claims its embeddings outperform top models on three standard benchmarks, including a 20% relative improvement in code search.
Code: github.com/Pra...
In the last video, we learned how to use Sentence Transformers to perform sentence embedding, sentence similarity, semantic search, and clustering.
• Sentence Transformers:...
GPT-3 Playlist: • Open AI ChatGPT, GPT-4...
NLP Beginner to Advanced Playlist:
• NLP Beginner to Advanced
I am a Freelance Data Scientist working on Natural Language Processing (NLP) and building end-to-end NLP applications.
I have over 7 years of experience in the industry, including as a Lead Data Scientist at Oracle, where I worked on NLP and MLOps.
I share practical, hands-on tutorials on NLP, along with bite-sized information and knowledge related to Artificial Intelligence.
LinkedIn: / pradipnichite
#gpt3 #openai #nlp #sentencetransformers #embedding #artificialintelligence #machinelearning
📌 Hey everyone! Enjoying these NLP tutorials? Check out my other project, AI Demos, for quick 1-2 min AI tool demos! 🤖🚀
🔗 YouTube: www.youtube.com/@aidemos.futuresmart
We aim to educate and inform you about AI's incredible possibilities. Don't miss our AI Demos YouTube channel and website for amazing demos!
🌐 AI Demos Website: www.aidemos.com/
Subscribe to AI Demos and explore the future of AI with us!
You have explained everything very well and very patiently. 👍Thanks for these amazing tutorials Pradip!
Hi Pradip, this is a very useful video for me, because this is exactly what I was searching for for my real-time project.
Glad it was helpful!
Great work! Very useful video Pradip. Helped me a lot while doing POC at work. :)
Great to hear!
Hi Pradip, thank you for the video. It would be great if you could also talk about the challenges faced during real-world implementation.
Thanks for the idea!
@@FutureSmartAI Thank you in advance
Thanks Pradip, super simple and informative 👌
Glad you liked it!
Do check out the other videos too.
Very helpful. Thanks!
You're welcome!
Really appreciate your work as always. Just wondering which is better, the OpenAI embeddings API or Transformers, considering they offer similar models for the same functionality?
You should try Transformers first; they are open source. For most of my applications, Transformers work pretty well.
@@FutureSmartAI Thank you for your reply. I have genuinely checked all your videos already; insanely helpful.
This video was excellent. I'm going to have an interview on NLP / OpenAI / ChatGPT. What should I prepare for? Your suggestions would be helpful.
Best of luck!
Prepare:
- How to write prompts
- The meaning of the different parameters in the OpenAI API / Playground
- What you can build with GPT-3, ChatGPT, etc.
- An amazing app you came across that is built using GPT-3 or ChatGPT
@@FutureSmartAI thank you
Thanks for your videos. Can NER be used for search engines, using the tags for information retrieval? Any example link would be helpful. We are trying to do semantic search: mapping OCR output text against the input query text, where the final output is an image chosen based on similarity. How can OpenAI be fine-tuned for semantic search?
I have done experiments with Sentence Transformers for semantic search; are the OpenAI models more heavyweight?
Hi Venkatesan,
NER is very useful for building a knowledge graph, which in turn is useful for semantic search.
In the video, which DB are you using to store the embeddings (18:17) for semantic search?
I am storing them in a pandas DataFrame; you can store them in Pinecone.
Here is a video that I have made:
ruclips.net/video/bWOvO_cxLHw/видео.html
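A minimal sketch of the pandas approach described in that reply, with tiny made-up vectors standing in for real OpenAI embeddings (text-embedding-ada-002 actually returns 1536-dimensional vectors); `semantic_search` is a hypothetical helper name, not something from the OpenAI library:

```python
import numpy as np
import pandas as pd

# Toy 4-dimensional vectors in place of real OpenAI embeddings.
df = pd.DataFrame({
    "text": ["good food", "tasty meal", "fast car"],
    "embedding": [
        np.array([0.9, 0.1, 0.0, 0.1]),
        np.array([0.8, 0.2, 0.1, 0.1]),
        np.array([0.0, 0.1, 0.9, 0.2]),
    ],
})

def cosine_similarity(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def semantic_search(query_embedding, frame, top_n=2):
    # Score every stored row against the query and return the best matches.
    scores = frame["embedding"].apply(lambda e: cosine_similarity(query_embedding, e))
    return frame.assign(similarity=scores).sort_values("similarity", ascending=False).head(top_n)

query = np.array([0.85, 0.15, 0.05, 0.1])  # pretend embedding of a food-related query
results = semantic_search(query, df)
print(results[["text", "similarity"]])
```

A vector DB like Pinecone replaces the `apply`-over-every-row scan with an index, which matters once you have more than a few thousand rows.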
As per your reply I checked the video, sir, but before initializing I wanted to test with pandas, so I created embeddings and tried to store them in pandas. Meanwhile I got an exception. 1) When using text-embedding-ada-002 it gives this error: RateLimitError Traceback (most recent call last)
/usr/local/lib/python3.8/dist-packages/tenacity/__init__.py in __call__(self, fn, *args, **kwargs)
399 try:
--> 400 result = fn(*args, **kwargs)
401 except BaseException: # noqa: B902
14 frames
RateLimitError: Rate limit reached for default-global-with-image-limits in organization org-y8bZbm1L2kH97ykcSqxofMML on requests per min. Limit: 60.000000 / min. Current: 110.000000 / min. Contact support@openai.com if you continue to have issues. Please add a payment method to your account to increase your rate limit. Visit platform.openai.com/account/billing to add a payment method.
The above exception was the direct cause of the following exception:
RetryError Traceback (most recent call last)
/usr/local/lib/python3.8/dist-packages/tenacity/__init__.py in iter(self, retry_state)
352 if self.reraise:
353 raise retry_exc.reraise()
--> 354 raise retry_exc from fut.exception()
355
356 if self.wait:
RetryError: RetryError[]
2) When I use text-embedding-babbage-002 it shows an error on: df['babbage_search'] = df.combined.apply(lambda x: get_embedding(x, engine='text-embedding-babbage-001'))
How do I resolve this error?
Everyone is facing the rate limit error; it's an OpenAI issue.
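Besides adding a payment method (as the error message suggests), a common mitigation is to retry with exponential backoff, which is roughly what the `tenacity` wrapper in the traceback above is doing. A generic sketch, with a simulated flaky call standing in for the API; real code would catch `openai.error.RateLimitError` specifically instead of every exception:

```python
import random
import time

def with_retries(fn, max_attempts=5, base_delay=1.0):
    """Call fn(), retrying with exponential backoff plus jitter on failure."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # give up after the last attempt
            # Delays grow as base_delay * 1, 2, 4, ... with a little random jitter.
            time.sleep(base_delay * (2 ** attempt) + random.random() * 0.1)

# Demo: a function that fails twice before succeeding, like a rate-limited call.
calls = {"n": 0}
def flaky_embedding_call():
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("simulated rate limit")
    return "embedding-ok"

result = with_retries(flaky_embedding_call, base_delay=0.01)
print(result)  # embedding-ok
```

Also note the limit in the message is 60 requests/min while the run hit 110/min, so simply batching inputs (the API accepts a list) or pacing requests below the limit avoids the error entirely.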
How do I create df['babbage_search'] and df['babbage_similarity']? In the example a DataFrame already exists; if we have to create one ourselves, how should we do it?
What method would correspond to these problems? Can I use GPT-3 for these tasks?
"Fire" + "Mountain" --> "Volcano"
"Fire" + "Metal" + "Building" --> "Forge"
"Volcano" --> "Fire", "Mountain", "Environment", "Lava", "heat", "danger"
Help would be greatly appreciated! Thank you for the content, I liked it!
Yes. If you give 3-5 such examples in the GPT-3 prompt, it will work.
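A sketch of what such a few-shot prompt could look like, assembled from the example pairs in the question above; the '"Water" + "Cold"' pair and the final '"Earth" + "Shaking"' query are invented here for illustration:

```python
# Example pairs: the first two come from the question, the third is made up
# to reach the suggested 3-5 examples.
examples = [
    ('"Fire" + "Mountain"', '"Volcano"'),
    ('"Fire" + "Metal" + "Building"', '"Forge"'),
    ('"Water" + "Cold"', '"Ice"'),
]

prompt = "Combine the concepts into a single concept:\n\n"
for left, right in examples:
    prompt += f"{left} --> {right}\n"
# End with the unanswered query so the model completes it.
prompt += '"Earth" + "Shaking" --> '

print(prompt)
```

This string would then be sent to a GPT-3 completions endpoint; the reverse direction ("Volcano" to its attributes) works the same way with reversed example pairs.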
Thank you for a wonderful explanation. I have two questions. 1. The embedding model works for English only, in my view, so how can we use it for other languages? 2. Is it possible to train the model with our own data, and what kind of data is needed? Finally, how can we measure the accuracy of the similarity, semantic search, and classification? Thank you.
Hi, you can use Cohere's multilingual embeddings for languages other than English.
@@FutureSmartAI Please, could you also respond to the second question?
@@mesaygemeda2867 To my knowledge, cosine similarity can indicate the accuracy of the similarity; Pradip mentioned it in the video.
I'm still very unclear on classification: what is being classified to what? It looks like we're just comparing numbers with other numbers. What are the classifications?
For classification, we use the embeddings as feature vectors, and then we can train any machine learning model on them.
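A minimal sketch of that idea: tiny made-up vectors stand in for real embeddings, the labels are invented, and scikit-learn's logistic regression plays the role of "any machine learning model":

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Toy "embeddings" (real OpenAI embeddings would be 1536-dimensional);
# each vector is the feature representation of one labeled text.
X = np.array([
    [0.9, 0.1, 0.0],   # e.g. "great product"    -> positive
    [0.8, 0.2, 0.1],   # e.g. "loved it"         -> positive
    [0.1, 0.9, 0.8],   # e.g. "terrible quality" -> negative
    [0.0, 0.8, 0.9],   # e.g. "would not buy"    -> negative
])
y = ["positive", "positive", "negative", "negative"]

clf = LogisticRegression().fit(X, y)

# Classify a new text by embedding it and feeding the vector to the model.
pred = clf.predict([[0.85, 0.15, 0.05]])[0]
print(pred)  # positive
```

So the "numbers" being compared are feature vectors, and the classes are whatever labels your training data carries (sentiment here, but any labeling works).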
Sir, your Transformers playlist link shows as invalid.
Thank you. I have changed the URL; that playlist is now called "NLP Beginner to Advanced".
@@FutureSmartAI OK sir, thanks. I have already started your NLP Beginner to Advanced playlist. 🤗
Quick question: what if the documents are 5000 words long? How can we apply this approach, or is there an alternative way to do it? Thanks in advance!
Hi, if the documents are longer, we should break them into paragraphs. You can use spaCy to split big text into paragraphs, and then calculate embeddings for those paragraphs.
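A simple stand-in for the spaCy-based splitting mentioned in that reply (splitting on blank lines), just to show the chunk-then-embed shape; the `get_embedding` call is left as a comment since it needs an API key:

```python
def split_into_paragraphs(text):
    """Naive paragraph splitter: break on blank lines, drop empty chunks."""
    return [p.strip() for p in text.split("\n\n") if p.strip()]

doc = """GPT-3 embeddings map text to vectors.

They can be used for semantic search.

Long documents should be chunked first."""

paragraphs = split_into_paragraphs(doc)
print(len(paragraphs))  # 3

# Next step (not run here): embed each chunk separately, e.g.
#   [get_embedding(p, engine="text-embedding-ada-002") for p in paragraphs]
# and store one row per chunk, so search returns the relevant paragraph
# rather than the whole 5000-word document.
```

For chunks that are still too long, the same idea applies at the sentence level, which is where spaCy's sentence segmentation helps.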
@@FutureSmartAI Thanks for clearing my brain fog. I will do the experiment and get back to you... you are the best online teacher of 2022, period 😁
Hey Pradip, I am building a Discord bot that connects people based on the thoughts they send to the bot and their messages on the server. Since I'm new to the space, I wanted to get in touch with you to learn more about how to start building this. I followed you on Twitter; can you open your DMs?
For starters, you mentioned GPT is more accurate than the Hugging Face models? So should I follow this tutorial to build a bot that reads the messages, analyses the sentiment and topics of each message, and then groups them together?
I think you can start with Sentence Transformers embeddings and the semantic similarity score.
@@FutureSmartAI I was also told a vector database would be relevant here. Again, I'm very new to the whole space. Are vector databases related to embeddings and similarity scores?
@@sampriti6026 Yes, we store embeddings generated with Sentence Transformers in a vector DB like Pinecone, which also supports retrieving similar documents for your query.
There are some open-source vector DBs available as well.
@@FutureSmartAI Alright, thanks a lot for the reply. Is there any way to get in touch with you more directly?
Can I use nested token?
Can you explain more?
@@FutureSmartAI [[love, kiss, hug, like, dinner,….],[winter, ride, hot, swim….], [….]]
I think this video would be much better if, instead of Python, you showed the same example using curl. That way it would be much easier for people to adapt the example to any tech stack... There is a lot going on that only makes sense to those who know Python, and a lot of "magic" behind the libs...
Hi, thanks for your feedback.
The video is intentionally made for Python. I think when people search for something, they search for specific things.
I am confused; I thought GPT-3 is not open source.
GPT-3 is not open source. We can access it using the OpenAI API.
Hmm, the difference in scores is not what I would call spectacular. Where do you set the threshold? You cannot simply say that if the similarity is above 80% it's the same, and if it's less than 50% it's definitely not.
Same question here. For the time being, I would just send it directly to my boss and let him proofread 😛
The threshold can't be absolute; we should experiment and see what works. For GPT-3 embeddings the similarity threshold could be 0.75, whereas for Sentence Transformers embeddings it could be 0.85.
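To make the per-model threshold point concrete, a small sketch with made-up vectors whose similarity sits between the two illustrative thresholds (0.75 and 0.85) mentioned above, so the "match" decision flips depending on which threshold you pick:

```python
import numpy as np

def cosine_similarity(a, b):
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy unit vectors chosen so their cosine similarity is 0.8.
sim = cosine_similarity([1.0, 0.0], [0.8, 0.6])
print(round(sim, 6))   # 0.8
print(sim >= 0.75)     # True  -> a match under a 0.75 threshold
print(sim >= 0.85)     # False -> not a match under a 0.85 threshold
```

This is why the threshold has to be tuned per embedding model: different models compress their score ranges differently, so the cut-off that separates "same" from "different" shifts with the model.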