GPT-3 Embeddings: Perform Text Similarity, Semantic Search, Classification, and Clustering | Code

  • Published: 16 Sep 2024
  • Hands-on GPT-3 tutorial: learn how to use GPT-3 embeddings to perform text similarity, semantic search, classification, and clustering.
    OpenAI claims its embeddings outperform top models on three standard benchmarks, including a 20% relative improvement in code search.
    Code: github.com/Pra...
    In the last video, we learned how to use Sentence Transformers to perform sentence embedding, sentence similarity, semantic search, and clustering.
    • Sentence Transformers:...
    GPT-3 Playlist: • Open AI ChatGPT, GPT-4...
    NLP Beginner to Advanced Playlist:
    • NLP Beginner to Advanced
    I am a freelance Data Scientist working on Natural Language Processing (NLP) and building end-to-end NLP applications.
    I have over 7 years of industry experience, including as a Lead Data Scientist at Oracle, where I worked on NLP and MLOps.
    I share practical hands-on tutorials on NLP and bite-sized information and knowledge related to Artificial Intelligence.
    LinkedIn: / pradipnichite
    #gpt3 #openai #nlp #sentencetransformers #embedding #artificalintelligence #machinelearning
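The core operation behind all four tasks in the video is comparing embedding vectors with cosine similarity. A minimal self-contained sketch, using toy low-dimensional vectors in place of real GPT-3 embeddings (text-embedding-ada-002 returns 1536-dimensional vectors):

```python
import math

def cosine_similarity(a, b):
    # cos(theta) = (a . b) / (|a| * |b|)
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy 4-dimensional vectors standing in for real GPT-3 embeddings.
cat = [0.9, 0.1, 0.0, 0.2]
kitten = [0.85, 0.15, 0.05, 0.25]
car = [0.1, 0.9, 0.8, 0.0]

print(cosine_similarity(cat, kitten))  # close to 1.0
print(cosine_similarity(cat, car))     # much lower
```

Similarity, search, classification, and clustering all reduce to computing and comparing scores like these over real embeddings.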

Comments • 59

  • @FutureSmartAI
    @FutureSmartAI  1 year ago +1

    📌 Hey everyone! Enjoying these NLP tutorials? Check out my other project, AI Demos, for quick 1-2 min AI tool demos! 🤖🚀
    🔗 YouTube: www.youtube.com/@aidemos.futuresmart
    We aim to educate and inform you about AI's incredible possibilities. Don't miss our AI Demos YouTube channel and website for amazing demos!
    🌐 AI Demos Website: www.aidemos.com/
    Subscribe to AI Demos and explore the future of AI with us!

  • @arjunob
    @arjunob 1 year ago +1

    You have explained everything very well and very patiently. 👍Thanks for these amazing tutorials Pradip!

  • @sathyag2608
    @sathyag2608 1 year ago +3

    Hi Pradip, this is a very useful video for me, because it is exactly what I was searching for for my real-time project.

  • @mansibisht557
    @mansibisht557 1 year ago +2

    Great work! Very useful video Pradip. Helped me a lot while doing POC at work. :)

  • @dhirajkumarsahu999
    @dhirajkumarsahu999 2 years ago +3

    Hi Pradip, thank you for the video. It would be great if you could also talk about the challenges faced during real-time implementation.

  • @HazemAzim
    @HazemAzim 1 year ago +1

    Thanks Pradip, super simple and informative 👌

  • @kevon217
    @kevon217 1 year ago +1

    Very helpful. Thanks!

  • @youwang9156
    @youwang9156 1 year ago +1

    Really appreciate your work as always. Just wondering which is better: the OpenAI embeddings API or Sentence Transformers, considering they offer models for the same functionality?

    • @FutureSmartAI
      @FutureSmartAI  1 year ago +1

      You should try Transformers first; they are open source. For most of my applications, Transformers works pretty well.

    • @youwang9156
      @youwang9156 1 year ago +1

      @@FutureSmartAI Thank you for your reply. I have genuinely checked all your videos already; insanely helpful.

  • @younginnovatorscenterofint8986
    @younginnovatorscenterofint8986 1 year ago +1

    This video was excellent. I'm going to have an interview on NLP and OpenAI ChatGPT. What should I prepare for? Your suggestions will be helpful.

    • @FutureSmartAI
      @FutureSmartAI  1 year ago +1

      Best of luck!
      Prepare:
      How to write prompts
      The meaning of the different parameters in the OpenAI API or Playground
      What you can build with GPT-3, ChatGPT, etc.
      An amazing app you came across that is built using GPT-3 or ChatGPT

    • @younginnovatorscenterofint8986
      @younginnovatorscenterofint8986 1 year ago

      @@FutureSmartAI thank you

  • @venkatesanr9455
    @venkatesanr9455 2 years ago +3

    Thanks for your videos. Can NER be used for search engines, using the tags and information retrieval? Any example link would be helpful. We are trying to do semantic search: mapping OCR output text against the input query text, with the final output being an image chosen by similarity. How can OpenAI be fine-tuned for semantic search?
    I have done experiments with sentence transformers for semantic search; are the OpenAI models too heavyweight?

    • @FutureSmartAI
      @FutureSmartAI  2 years ago +1

      Hi Venkatesan,
      NER is very useful for building a knowledge graph, which in turn is useful for semantic search.

  • @sarathipriya
    @sarathipriya 1 year ago +1

    In the video, which DB are you using to store the embeddings (18:17) for semantic search?

    • @FutureSmartAI
      @FutureSmartAI  1 year ago

      I am storing them in a Pandas DataFrame; you can also store them in Pinecone.
      Here is a video I have made about it:
      ruclips.net/video/bWOvO_cxLHw/видео.html

    • @sarathipriya
      @sarathipriya 1 year ago

      As per your reply I checked the video, sir, but before initializing Pinecone I wanted to try Pandas first, so I created the embeddings and tried to store them in Pandas; meanwhile I got an exception.
      1) While trying to use text-embedding-ada-002 it fails with a RateLimitError:
      RateLimitError Traceback (most recent call last)
      /usr/local/lib/python3.8/dist-packages/tenacity/__init__.py in __call__(self, fn, *args, **kwargs)
          399 try:
      --> 400     result = fn(*args, **kwargs)
          401 except BaseException:  # noqa: B902
      ... 14 frames ...
      RateLimitError: Rate limit reached for default-global-with-image-limits in organization org-y8bZbm1L2kH97ykcSqxofMML on requests per min. Limit: 60.000000 / min. Current: 110.000000 / min. Contact support@openai.com if you continue to have issues. Please add a payment method to your account to increase your rate limit. Visit platform.openai.com/account/billing to add a payment method.
      The above exception was the direct cause of the following exception:
      RetryError Traceback (most recent call last)
      /usr/local/lib/python3.8/dist-packages/tenacity/__init__.py in iter(self, retry_state)
          352 if self.reraise:
          353     raise retry_exc.reraise()
      --> 354 raise retry_exc from fut.exception()
          355
          356 if self.wait:
      RetryError: RetryError[]
      2) When I use text-embedding-babbage-001 it errors on:
      df['babbage_search'] = df.combined.apply(lambda x: get_embedding(x, engine='text-embedding-babbage-001'))
      How do I resolve these errors?

    • @FutureSmartAI
      @FutureSmartAI  1 year ago

      Everyone is facing the rate-limit error; it's an OpenAI issue.
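A common workaround for the rate-limit error in this thread is retrying with exponential backoff (the tenacity frames in the traceback above are exactly such a retry loop). A minimal dependency-free sketch, with RuntimeError standing in for the real openai RateLimitError so it stays self-contained:

```python
import random
import time

def retry_with_backoff(fn, max_retries=5, base_delay=1.0):
    """Call fn(); on failure wait base_delay * 2**attempt (plus jitter), then retry."""
    for attempt in range(max_retries):
        try:
            return fn()
        except RuntimeError:  # stand-in for the openai RateLimitError
            if attempt == max_retries - 1:
                raise  # out of retries: re-raise the last error
            delay = base_delay * (2 ** attempt) + random.uniform(0, 0.1)
            time.sleep(delay)

# Simulated flaky call: fails twice with a "rate limit", then succeeds.
calls = {"n": 0}
def fake_embedding_call():
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("rate limit reached")
    return [0.1, 0.2, 0.3]

embedding = retry_with_backoff(fake_embedding_call, base_delay=0.01)
print(embedding)  # [0.1, 0.2, 0.3] after two retries
```

Slowing requests below the per-minute limit (or adding a payment method, as the error message suggests) also resolves it.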

  • @sarathipriya
    @sarathipriya 1 year ago

    How do I create df['babbage_search'] and df['babbage_similarity']? In the example a DataFrame already exists; if I have to create one myself, how should I do it?
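Building such a DataFrame from scratch can be sketched as below. The get_embedding here is a hypothetical stub so the sketch runs offline (the real one, in the OpenAI cookbook examples, calls the API), and the column name ada_search is just illustrative:

```python
import pandas as pd

# Hypothetical stub standing in for the cookbook's get_embedding, which
# really calls the API, e.g. get_embedding(text, engine="text-embedding-ada-002").
def get_embedding(text, engine="text-embedding-ada-002"):
    return [float(len(word)) for word in text.split()][:3] or [0.0]

# Build the DataFrame yourself: 'combined' is simply the text column
# that the cookbook examples embed.
df = pd.DataFrame({"combined": ["good product", "bad delivery", "great value"]})
df["ada_search"] = df["combined"].apply(lambda x: get_embedding(x, engine="text-embedding-ada-002"))
print(df)
```

Each row now carries its own embedding list, ready for the similarity computations shown in the video.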

  • @seventfour9247
    @seventfour9247 1 year ago +1

    What method would correspond to these problems? Can I use GPT-3 for these tasks?
    "Fire" + "Mountain" --> "Volcano"
    "Fire" + "Metal" + "Building" --> "Forge"
    "Volcano" --> "Fire", "Mountain", "Environment", "Lava", "heat", "danger"
    Help would be greatly appreciated! Thank you for the content, I liked it!

    • @FutureSmartAI
      @FutureSmartAI  1 year ago

      Yes. If you give 3-5 such examples in the GPT-3 prompt, it will work.

  • @mesaygemeda2867
    @mesaygemeda2867 1 year ago

    Thank you for a wonderful explanation. I have two questions. 1. In my view the embedding model works for English only, so how can we use it for other languages? 2. Is it possible to train the model with our own data, and what kind of data is needed? Finally, how can we measure the accuracy of the similarity, semantic search, and classification? Thank you.

    • @FutureSmartAI
      @FutureSmartAI  1 year ago

      Hi, you can use Cohere's multilingual embeddings for languages other than English.

    • @mesaygemeda2867
      @mesaygemeda2867 1 year ago

      @@FutureSmartAI Please, could you also answer the second question?

    • @duetplay4551
      @duetplay4551 1 year ago

      @@mesaygemeda2867 To my knowledge, cosine similarity can tell you the accuracy of the similarity; Pradip mentioned it in his video.

  • @otonomimusic
    @otonomimusic 10 months ago

    I'm still very unclear on classification: what is being classified to what? It looks like we're just comparing numbers with other numbers. What are the classifications?

    • @FutureSmartAI
      @FutureSmartAI  10 months ago

      For classification, we use the embeddings as a feature vector and can then train any machine-learning model on them.
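A minimal illustration of "embeddings as features": a nearest-centroid classifier on toy 2-D vectors. This is an assumption-laden sketch, not the video's exact method; in practice you would embed labeled texts with the API and could train any classifier (e.g. scikit-learn's logistic regression) on the vectors instead:

```python
import math

def centroid(vectors):
    # Average the vectors of one class, component by component.
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Toy labeled "embeddings" (in practice: GPT-3 embeddings of labeled texts).
train = {
    "positive": [[0.9, 0.1], [0.8, 0.2], [0.95, 0.05]],
    "negative": [[0.1, 0.9], [0.2, 0.8], [0.05, 0.95]],
}
centroids = {label: centroid(vs) for label, vs in train.items()}

def classify(embedding):
    # Predict the label whose class centroid is nearest to the embedding.
    return min(centroids, key=lambda label: euclidean(embedding, centroids[label]))

print(classify([0.85, 0.15]))  # "positive"
print(classify([0.1, 0.8]))    # "negative"
```

So the "numbers" are feature vectors, and the classes are whatever labels your training texts carry.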

  • @Subhajit_311
    @Subhajit_311 1 year ago +1

    Sir, your Transformers playlist link is showing as invalid.

    • @FutureSmartAI
      @FutureSmartAI  1 year ago

      Thank you. I have changed the URL. That playlist is now called the NLP Beginner to Advanced Playlist.

    • @Subhajit_311
      @Subhajit_311 1 year ago

      @@FutureSmartAI OK sir, thanks. I have already started your NLP Beginner to Advanced Playlist.🤗

  • @duetplay4551
    @duetplay4551 1 year ago

    Quick question: what if the documents are 5,000 words long? How can we apply this approach, or is there an alternative way to do it? Thanks in advance!

    • @FutureSmartAI
      @FutureSmartAI  1 year ago +1

      Hi, if the documents are longer, we should break them into paragraphs. You can use spaCy to split big text into paragraphs and then calculate embeddings for those paragraphs.
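A minimal chunking sketch using only the standard library (the reply suggests spaCy; splitting on blank lines is often a good-enough first pass), with a word cap so no single chunk grows too large for the embedding model:

```python
import re

def split_into_paragraphs(text, max_words=200):
    """Split on blank lines; further split any paragraph over max_words words."""
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]
    chunks = []
    for p in paragraphs:
        words = p.split()
        for i in range(0, len(words), max_words):
            chunks.append(" ".join(words[i:i + max_words]))
    return chunks

doc = "First paragraph about refunds.\n\nSecond paragraph about shipping.\n\nThird paragraph."
chunks = split_into_paragraphs(doc)
print(len(chunks))  # 3
# Each chunk is then embedded separately and searched individually.
```

At query time you embed the question once and score it against every chunk, returning the best-matching chunk (or its parent document).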

    • @duetplay4551
      @duetplay4551 1 year ago

      @@FutureSmartAI Thanks for clearing my brain fog. I will do the experiment and get back to you... you are the best online teacher of 2022, period 😁

  • @sampriti6026
    @sampriti6026 1 year ago

    Hey Pradip, I am building a Discord bot that connects people based on the thoughts they send to the bot and their messages on the server. Since I'm new to the space, I wanted to get in touch with you to learn more about how to start building this. I followed you on Twitter; can you open your DMs?
    For starters, you mentioned GPT being more accurate than the models on Hugging Face? So should I follow this tutorial to build a bot that reads the messages, analyses the sentiments and topics of the messages, and then groups them together?

    • @FutureSmartAI
      @FutureSmartAI  1 year ago

      I think you can start with sentence-transformers embeddings and a semantic-similarity score.

    • @sampriti6026
      @sampriti6026 1 year ago

      @@FutureSmartAI I was also told that a vector database is relevant here. Again, I'm very new to the whole space. Is a vector database related to embeddings and similarity scores?

    • @FutureSmartAI
      @FutureSmartAI  1 year ago +1

      @@sampriti6026 Yes, we store the embeddings generated using sentence transformers in a vector DB like Pinecone, which also supports retrieving similar docs for your query.
      There are also some open-source vector DBs available.
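What a vector DB provides can be illustrated with a toy in-memory store: upsert vectors with metadata, then query by cosine similarity. This is only a conceptual stand-in; real vector DBs such as Pinecone add persistence, scale, and approximate nearest-neighbour indexes:

```python
import math

class TinyVectorStore:
    """Toy in-memory stand-in for a vector DB:
    upsert (id, vector, metadata) and query by cosine similarity."""

    def __init__(self):
        self.items = {}

    def upsert(self, item_id, vector, metadata=None):
        self.items[item_id] = (vector, metadata or {})

    def query(self, vector, top_k=3):
        def cosine(a, b):
            dot = sum(x * y for x, y in zip(a, b))
            na = math.sqrt(sum(x * x for x in a))
            nb = math.sqrt(sum(x * x for x in b))
            return dot / (na * nb)
        # Score every stored vector and return the top_k matches.
        scored = [(item_id, cosine(vector, v), meta)
                  for item_id, (v, meta) in self.items.items()]
        return sorted(scored, key=lambda t: t[1], reverse=True)[:top_k]

store = TinyVectorStore()
store.upsert("doc1", [0.9, 0.1], {"text": "refund policy"})
store.upsert("doc2", [0.1, 0.9], {"text": "shipping info"})
print(store.query([0.8, 0.2], top_k=1))  # doc1 scores highest
```

So yes: a vector database is exactly a store for embeddings, indexed so that similarity-score queries stay fast.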

    • @sampriti6026
      @sampriti6026 1 year ago

      @@FutureSmartAI Alright, thanks a lot for the reply. Is there any way to get in touch with you more directly?

  • @shk5253
    @shk5253 1 year ago

    Can I use nested tokens?

    • @FutureSmartAI
      @FutureSmartAI  1 year ago

      Can you explain more?

    • @shk5253
      @shk5253 1 year ago

      @@FutureSmartAI [[love, kiss, hug, like, dinner,….],[winter, ride, hot, swim….], [….]]

  • @joao-pedro-alves
    @joao-pedro-alves 1 year ago

    I think this video would be much better if, instead of using Python, you showed the same example using curl. That way it would be easier for people to adapt the example to any tech stack... There is a lot going on that only makes sense to those who know Python, and a lot of "magic" behind the libs...

    • @FutureSmartAI
      @FutureSmartAI  1 year ago

      Hi, thanks for your feedback.
      The video is intentionally made for Python. I think when people search for something, they search for specific things.

  • @ethiopianphenomenon6574
    @ethiopianphenomenon6574 1 year ago

    I am confused; I thought GPT-3 was not open source.

    • @FutureSmartAI
      @FutureSmartAI  1 year ago

      GPT-3 is not open-sourced. We can access it using the OpenAI API.

  • @TauvicRitter
    @TauvicRitter 1 year ago

    Hmm, the difference in score is not what I'd call spectacular. Where do you set the threshold? You cannot simply say that if the similarity is above 80% it's the same, and if it's less than 50% it's definitely not.

    • @duetplay4551
      @duetplay4551 1 year ago

      Same question here. For the time being, I would just send it directly to my boss and let him proofread it 😛

    • @FutureSmartAI
      @FutureSmartAI  1 year ago +1

      The threshold can't be absolute; we should experiment and see what works. For GPT-3 embeddings the similarity threshold might be 0.75, whereas for sentence-transformers embeddings it might be 0.85.
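The "experiment and see" advice can be made concrete: sweep candidate thresholds over a small labeled set of similarity scores and keep the one with the best accuracy. The scores below are made-up toy data; in practice they would come from a held-out set of labeled sentence pairs:

```python
# Toy labeled pairs: (cosine similarity score, is_same_meaning).
pairs = [(0.92, True), (0.88, True), (0.81, True), (0.74, False),
         (0.70, False), (0.55, False), (0.86, True), (0.78, False)]

def accuracy(threshold):
    # A pair is predicted "same" when its score reaches the threshold.
    correct = sum((score >= threshold) == label for score, label in pairs)
    return correct / len(pairs)

# Sweep candidate thresholds 0.50, 0.55, ..., 0.95 and keep the best one.
candidates = [round(0.50 + 0.05 * i, 2) for i in range(10)]
best = max(candidates, key=accuracy)
print(best, accuracy(best))
```

The best threshold depends on the embedding model and the data, which is why a fixed 80%/50% rule cannot work across models.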