Don’t Embed Wrong!

  • Published: Dec 25, 2024
  • As a founding member of the Ollama team, I discovered I've been doing embeddings wrong all along - and you probably are too. In this eye-opening video, I reveal how a simple technique called "prefixing" can dramatically improve your RAG application's accuracy by up to 2x.
    Learn about:
    • What prefixes are and how they work
    • The 3 embedding models that support prefixes
    • Detailed performance comparisons across different models
    • Real-world testing results and implications
    • Why traditional LLMs shouldn't be used for embeddings
    I've conducted extensive testing comparing 5 different embedding models, with and without prefixes, across multiple scenarios. The results will surprise you - they certainly surprised me!
    🔗 Full source code and test results available in the GitHub repo:
    github.com/tec...
    Whether you're building RAG applications or working with vector databases, this video will transform how you approach embeddings. Don't miss this crucial technique that even experienced AI developers often overlook!
    #AI #MachineLearning #Ollama #RAG #Embeddings #Programming #TechTutorial
    The shirt: VATPAVE Mens Casual Hawaiian... geni.us/mhawaii1
    $27 on Amazon
    My Links 🔗
    👉🏻 Subscribe (free): / technovangelist
    👉🏻 Join and Support: / @technovangelist
    👉🏻 Newsletter: technovangelis...
    👉🏻 Twitter: / technovangelist
    👉🏻 Discord: / discord
    👉🏻 Patreon: / technovangelist
    👉🏻 Instagram: / technovangelist
    👉🏻 Threads: www.threads.ne...
    👉🏻 LinkedIn: / technovangelist
    👉🏻 All Source Code: github.com/tec...
    Want to sponsor this channel? Let me know what your plans are here: www.technovang...

Comments • 90

  • @madytyoo
    1 month ago +25

    This is the best channel about Ollama!

  • @AlekseyRubtsov
    1 month ago +1

    Thanks!

  • @solyarisoftware
    18 days ago

    Hi Matt, the new embedding model "Snowflake Arctic Embed 2" deserves one of your deep dives :)

  • @aurielklasovsky1435
    1 month ago

    Wow. I am very surprised that this actually works. It is so bizarre that this technology (LLMs) actually performs better when you tell it what it should be doing. Thank you for the tip!
    And thank you for telling me not to use Llama for embeddings. I had assumed it worked better because it is bigger, without ever testing anything else. Cheers!

  • @rundeks
    1 month ago +2

    I never heard of this before. Thank you so much for sharing it!

  • @conneyk
    1 month ago +3

    Thanks for the video!
    I've been working on my own RAG pipelines for some time now. Maybe prefixing would help.
    What I've learned so far is that RAG is very specific to each use case: code docs, long texts, and multi-line PDFs all behave differently. Also, if your docs aren't in English, open-source embedding models like Nomic are really weak, so you first have to translate the docs before embedding them. And we haven't even talked about reranking, corrective RAG (enhancing your query with web search results or other docs), hybrid search based on metadata and document content, and so on. The vector store you use also makes a difference.
    All this makes it very complex to implement, benchmark, and test all the combinations.
    I would really love to find some KISS principles and best practices for RAG.

    • @aurielklasovsky1435
      1 month ago +1

      I am running into the same problem working with non-English documents. I think each language needs a ton of investment; there is probably no real way around it.

  • @ahasani2008
    1 month ago

    Can't help but notice your batik shirt, nice one. And the content is excellent as always, Matt, thanks.

  • @Leo_ai75
    1 month ago +1

    I've just added this as an .md file to the project folder of an embedding project I'm working on, for my coding assistant to use as context:
    Embedding Best Practices from the Ollama Founding Team:
    When doing embeddings with Ollama models, you were doing it wrong until now. Adding prefixes to content can make results up to twice as accurate.
    Three of the five embedding models tested from the official Ollama library support prefixes:
    1. Nomic (nomic-embed-text):
    - Documents: "search_document:"
    - Queries: "search_query:"
    - Classification: "classification:"
    - Clustering: "clustering:"
    2. Snowflake Arctic Embed (snowflake-arctic-embed):
    - Queries: "Represent this sentence for searching relevant passages:"
    - Documents: No prefix needed
    3. Mixedbread (mxbai-embed-large):
    - Same format as Snowflake Arctic
    Implementation:
    - For vector stores: add the document prefix before each chunk of text
    - For similarity search: add the query prefix before the query
    - For the hosted Nomic API: use the API option instead
    - For Ollama: simply prepend the prefix text
    Testing shows prefixes deliver:
    - More complete answers
    - Better document matching
    - 2x accuracy improvement in many cases
    This comes directly from Matt Williams, founding Ollama team member.
    I hope it helps!

    • @technovangelist
      1 month ago

      I can’t see why you would want to do that

    • @Leo_ai75
      1 month ago

      @technovangelist It's for people who use coding assistants, Matt, like Copilot, Cursor, Cody, etc.

    • @technovangelist
      1 month ago

      But that’s not something that goes into a prompt. It doesn’t make any sense

    • @Leo_ai75
      1 month ago

      @technovangelist What I meant was that I used it as an .md file in my project folder for the AI assistant to use as context. My apologies; I realise I said "system prompt" previously, which was in error.

    • @technovangelist
      1 month ago

      I still don’t get that. That’s how you need to interact with an embedding model. It’s not something a model would benefit from knowing.
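For readers who want to try the prefixes listed in @Leo_ai75's comment, here is a minimal TypeScript sketch (TypeScript only because Matt mentions running his test code with Deno; this is not the code from his repo). It assumes the Ollama library tags nomic-embed-text, snowflake-arctic-embed and mxbai-embed-large, takes the prefix strings from those models' published model cards, and assumes Ollama's POST /api/embed endpoint, which accepts a model name and an input array. Nomic's classification: and clustering: prefixes are left out for brevity, and the helper names are hypothetical.

```typescript
// Sketch: apply model-specific task prefixes before sending text to an embedding model.
// Assumption: model names are Ollama library tags; prefix strings come from the model cards.

type Role = "document" | "query";

interface PrefixSpec {
  document: string; // prepended to every chunk stored in the vector DB
  query: string; // prepended to every search query
}

const PREFIXES: Record<string, PrefixSpec> = {
  "nomic-embed-text": {
    document: "search_document: ",
    query: "search_query: ",
  },
  // Snowflake Arctic Embed and mxbai-embed-large only prefix the query side.
  "snowflake-arctic-embed": {
    document: "",
    query: "Represent this sentence for searching relevant passages: ",
  },
  "mxbai-embed-large": {
    document: "",
    query: "Represent this sentence for searching relevant passages: ",
  },
};

function withPrefix(model: string, role: Role, text: string): string {
  // Strip an Ollama tag suffix such as ":latest" before the lookup.
  const base = model.split(":")[0];
  const spec = PREFIXES[base];
  // Models without a known prefix scheme get the text unchanged.
  if (spec === undefined) {
    return text;
  }
  return spec[role] + text;
}

// Hypothetical helper: builds the JSON body for Ollama's POST /api/embed (batch input).
function embedRequestBody(model: string, role: Role, texts: string[]) {
  return { model, input: texts.map((t) => withPrefix(model, role, t)) };
}

const body = embedRequestBody("nomic-embed-text", "query", ["why do embedding prefixes matter?"]);
console.log(JSON.stringify(body));
// {"model":"nomic-embed-text","input":["search_query: why do embedding prefixes matter?"]}
```

The returned object would be sent as the JSON body of a POST to the Ollama server (by default http://localhost:11434/api/embed), with the vectors read from the response's embeddings field. The design point is that the same text gets a different prefix depending on whether it is being stored or searched for, which fits Matt's later reply that prefixes are set by the model, not by the content.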

  • @raymond_luxury_yacht
    1 month ago

    When I created embeddings, before sending each chunk to the embedder I had an LLM analyse the text and write 10 questions about it, added those to the chunk, and then embedded it. Search accuracy was very good.
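A minimal TypeScript sketch of the question-augmentation idea in the comment above, assuming the questions have already been generated by a separate chat-model call (not shown). The function name augmentChunk, the "Questions this passage answers:" header, and the optional documentPrefix parameter are illustrative choices, not part of any library.

```typescript
// Sketch of "append LLM-generated questions to a chunk before embedding it".
// Assumption: `questions` were produced earlier by a chat model (not shown here).
// The header text and the documentPrefix parameter are illustrative choices.

function augmentChunk(chunk: string, questions: string[], documentPrefix = ""): string {
  // Drop blank or whitespace-only questions the LLM may have returned.
  const cleaned = questions.map((q) => q.trim()).filter((q) => q.length > 0);
  if (cleaned.length === 0) {
    return documentPrefix + chunk;
  }
  // Keep the original passage first so the embedding still reflects it;
  // the questions add phrasings a user is likely to search with.
  const questionList = cleaned.map((q) => `- ${q}`).join("\n");
  return `${documentPrefix}${chunk}\n\nQuestions this passage answers:\n${questionList}`;
}

// Example: combine the technique with nomic-embed-text's document prefix.
const textToEmbed = augmentChunk(
  "Ollama serves models over a local HTTP API.",
  ["How do I call Ollama from code?", "What port does Ollama listen on?"],
  "search_document: ",
);
console.log(textToEmbed);
```

One possible variation, not something the commenter described, is to embed the augmented text but store only the plain chunk as the retrieved payload, so the questions help matching without cluttering the context handed to the answering model.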

  • @proterotype
    1 month ago

    So awesome man. I really appreciate this kind of information

  • @alexlee1711
    26 days ago

    Ollama gets slower the longer it runs. The generation speed itself does not decrease, but the delay between questions keeps increasing.
    It seems the question-and-answer delay grows when the model is used with an attached knowledge base. Without the knowledge base attached, I don't see this problem in actual tests.

    • @technovangelist
      26 days ago

      You already said this on another video. Not sure what your particular situation is. Ollama won’t slow down over time unless an app built around it doesn’t use it properly. Larger models will take longer than smaller ones, but there have been no slowdowns, and some speedups.

  • @deucebigs9860
    1 month ago

    Liked and subscribed to tell you you're definitely on the right path for what I want to learn!

  • @muchainganga9563
    1 month ago +1

    Love this!

  • @blackswann9555
    1 month ago

    Thank you for sharing ❤

  • @Ad434443
    1 month ago

    Interesting! I didn't know how big a difference the embedding model makes. I have used Nomic for RAG. And maybe I have been a bit sloppy: I had only skimmed the material on prefixes and never implemented them in my RAGs. If you have time, Matt, a video on how to use n8n for prefixed ingestion of external docs would be great. I used tagging for filtering and so on, but prefixes do seem very powerful.

  • @karlfranz2pl
    1 month ago +2

    I haven't used any embedding models, but a while ago I tried giving a PDF to Llama 3.1 8B and the results ranged from nothing to horrible. Then I tried the same document with Llama 3.1 70B and the results were actually pretty good. I couldn't really test it in depth because my PC runs the 70B model at almost negative speed :) (please keep in mind I don't actually know what I'm doing with these LLMs :) )

  • @sebastianpodesta
    1 month ago +2

    Thanks a lot!!! Great stuff! Quick question: what would you recommend for multilingual data? If the RAG data and the user prompts are in Spanish, should I write all the system prompts and instructions in Spanish, or tell it to translate just the answer?

    • @sebastianpodesta
      1 month ago

      I’m trying to do RAG in n8n using Ollama with the Llama 3.1 chat model and the Nomic embedding model, with mixed results: sometimes I get answers in English, sometimes in Spanish, and sometimes the model tells me that it didn’t understand the question.

  • @basterman13
    1 month ago +2

    Thank you for the video, I learned a lot! Could you please advise on the best RAG implementation and document splitter for Python? I’ve tried several methods, but I often get mixed results, around 50/50 accuracy. The main issue is chunking: sometimes chunks are split in a way that separates the beginning of a class or method from its continuation. Is there a way to ensure that chunks belonging to the same file are grouped or kept together more effectively?
    Thank you in advance.

    • @technovangelist
      1 month ago +1

      That’s what the metadata in most vector databases is for. Describe the source. Then use that in your code to keep similar things together.

    • @basterman13
      1 month ago

      @technovangelist Thank you. Yesterday, after I left my comment, I came to the same conclusion. Sometimes I just need to step away; the answer was right on the surface all along. I thought there might be some more specific approaches, but in this case the simplest way is the best.
      Have you heard anything about LightRAG (from HKUDS)? I'd be interested to hear your thoughts on it.
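Matt's suggestion to describe the source in metadata can be sketched roughly as follows, assuming each chunk was stored with hypothetical source, symbol and index fields (for Python code, symbol could be the enclosing class or function name). After the vector search returns its hits, every chunk sharing a hit's file and symbol is pulled back in, in file order, so a method split across chunks is reassembled. Many vector databases let you do the equivalent with a metadata filter query; this in-memory version only illustrates the logic.

```typescript
// Sketch: keep chunks of the same class or function together using per-chunk metadata.
// The field names (source, symbol, index) are hypothetical; use whatever metadata
// your vector database lets you attach to each stored chunk.

interface Chunk {
  id: string;
  text: string;
  source: string; // file the chunk came from
  symbol: string; // enclosing class or function, e.g. "Parser"
  index: number; // position of the chunk within the file
}

function expandHits(hits: Chunk[], allChunks: Chunk[]): Chunk[] {
  // Group every stored chunk by file + symbol.
  const groups = new Map<string, Chunk[]>();
  for (const c of allChunks) {
    const key = `${c.source}::${c.symbol}`;
    const group = groups.get(key) ?? [];
    group.push(c);
    groups.set(key, group);
  }

  const seen = new Set<string>();
  const result: Chunk[] = [];
  for (const hit of hits) {
    const group = groups.get(`${hit.source}::${hit.symbol}`) ?? [hit];
    // Emit the whole group in file order so a split method is reassembled,
    // skipping chunks already added by an earlier hit.
    for (const c of [...group].sort((a, b) => a.index - b.index)) {
      if (!seen.has(c.id)) {
        seen.add(c.id);
        result.push(c);
      }
    }
  }
  return result;
}
```

Keeping the hits' original ranking order at the group level, as above, means the most relevant class or function still comes first in the context passed to the chat model.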

  • @wnicora
    1 month ago

    This video opens new perspectives on RAG, thanks.
    Could you share links to articles explaining the design and use of prefixes?

  • @serikazero128
    1 month ago

    @10:49 could've been the perfect time for "Stop, Get some help!" meme :)

  • @marcelgeers3263
    1 month ago +1

    How would you prefix code snippets or unit functions?

    • @themax2go
      28 days ago

      That is a very good question; I'd like to know too. Did you find an answer yet?

    • @technovangelist
      28 days ago

      Prefixes are set by the model not the content

  • @NLPprompter
    1 month ago

    Hi Matt, do you happen to know about contextual RAG by Anthropic? Is this somehow similar to it?

  • @fabriai
    1 month ago +1

    Excellent stuff, Matt. Thanks for this! Why do you prefer TypeScript over Python for coding the tests? Do you run it in Node? Have you tried dejó for these tasks?

    • @technovangelist
      1 month ago +1

      It doesn’t have all the installation baggage that comes with Python. Python is so brittle, and it's easy to screw up your setup. I usually use Deno to run it. I don’t know what dejo is.

    • @fabriai
      1 month ago

      @technovangelist Thanks for the answer. It makes sense. Sorry about "dejó": I meant to write "deno", but the Spanish autocorrect on my phone changed it and I didn't notice until your reply.
      I'm more of a Node developer by trade than a Python one, so Deno comes more naturally to me. It's good to know an expert like you uses TypeScript on Deno. Will follow your lead.
      Thank you so much!

  • @paulomtts
    1 month ago

    Right on time, I'm just implementing a RAG pipeline!

  • @agsvk-com
    1 month ago

    Thank you for sharing. I'm wondering how we would use the prefixed Nomic or Snowflake Arctic models with one of the vector databases. Is this possible, or do we need to do it in TypeScript or Python? None of the videos I've seen seem to use prefixes with their embedding models. I'm still learning. It would be really great to have more step-by-step tutorials on this. 😊 God bless

  • @PanayotPanayotov-x6p
    1 month ago

    Can you provide articles or links to the documentation?

    • @technovangelist
      1 month ago

      The docs are in the GitHub repo. You can get to them from ollama.com.

  • @YuryGurevich
    1 month ago

    Thanks, Matt!

  • @davidtapang6917
    1 month ago

    Hey bro! Subscribed!

    • @technovangelist
      1 month ago +1

      It’s been 20 years. Wonderful to have you

  • @kasomoru6
    1 month ago +1

    Don't know why, but I feel like I just watched a really convincing A.I. Great info, though. 👍

  • @hitmusicworldwide
    1 month ago

    I see the thangka on your wall, on the viewer's left-hand side.

    • @technovangelist
      1 month ago +1

      Good eye. From one of my two visits to Nepal. My sister used to run a health care clinic in a town called Jiri for about 20 years.

  • @WaldoRochow
    28 days ago

    I really struggle with this. Why would this make a difference? I'm not questioning that it does, but why?

    • @technovangelist
      28 days ago

      It’s how the model was trained to respond.

  • @tomwawer5714
    1 month ago

    Prefix yay

  • @ToddWBucy-lf8yz
    1 month ago

    Great, I'm refactoring for prefixes now. I'm sure I now need to update my training data for prefixes as well. Are any pretrained models already capable of using prefixes?

    • @technovangelist
      1 month ago +1

      Perhaps you should watch the video. It shows 3 models that use the prefixes.

    • @ToddWBucy-lf8yz
      1 month ago

      @technovangelist Nomic isn't useful when you're trying to integrate Cypher queries and vector store queries in the same model. I'm trying to avoid multiple models for my particular RAG setup.

    • @technovangelist
      1 month ago

      Avoiding multiple models is asking for lower quality results

    • @ToddWBucy-lf8yz
      1 month ago

      @technovangelist Yeah, you're probably right... at least Nomic is small and fast. Someone really needs to create an MoE just for RAG and databases.

    • @technovangelist
      1 month ago +1

      Embedding models aren't something you ask questions. They just produce the embeddings you store in the vector DB to find similar results. You still have to use a regular model to get insights into your data.
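To make the division of labour in Matt's last reply concrete, here is a rough TypeScript sketch of the similarity step a vector database performs: it ranks stored document vectors by cosine similarity to the query vector. Cosine similarity is a common choice but an assumption here, since some databases default to other distance metrics. The vectors are assumed to come from an embedding model, and the top hits would then be handed to a regular chat model as context; the names StoredVector and topK are hypothetical.

```typescript
// Sketch of the "find similar results" step: rank stored vectors by cosine similarity.
// Assumption: vectors come from an embedding model; a chat model answers afterwards.

interface StoredVector {
  id: string;
  vector: number[];
}

function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error("vectors must have the same length");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  // A zero vector has no direction; treat it as unrelated instead of dividing by zero.
  if (normA === 0 || normB === 0) {
    return 0;
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

function topK(query: number[], store: StoredVector[], k: number): { id: string; score: number }[] {
  return store
    .map((s) => ({ id: s.id, score: cosineSimilarity(query, s.vector) }))
    .sort((x, y) => y.score - x.score) // highest similarity first
    .slice(0, k);
}

// Toy 3-dimensional vectors; real embeddings have hundreds of dimensions.
const store: StoredVector[] = [
  { id: "doc-a", vector: [0.9, 0.1, 0.0] },
  { id: "doc-b", vector: [0.0, 1.0, 0.0] },
  { id: "doc-c", vector: [0.7, 0.0, 0.7] },
];
console.log(topK([1, 0, 0], store, 2));
```

The embedding model never answers anything here; it only supplies the vectors that make this ranking meaningful, which is why a query embedded without the model's expected prefix can shift the ranking.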

  • @k1chko
    1 month ago

    Seems similar to contextual embedding.

    • @technovangelist
      1 month ago

      Different topics. This was about how to get the embedding model to function correctly.

  • @remmask
    1 month ago

    Hi Matt. Thank you for these videos. Can we get the source in python?

  • @smhanov
    1 month ago

    I have 200,000 images of things described by LLaVA. But if the user searches for a single word, like "pants", the search is too broad: it comes up with people wearing pants, shoes, etc. I'm hoping this prefix method helps a little.

  • @ISK_VAGR
    1 month ago

    OK, Matt. I already knew everything you just said. However, the million-dollar question is: why do bigger models perform badly at embedding?

    • @technovangelist
      1 month ago +1

      They aren’t embedding models. Embedding models do embeddings. Regular LLMs don’t do it.

    • @jparkerweb
      1 month ago

      @technovangelist In other words, just because something "can" do it doesn't mean it "should" 🤣

    • @technovangelist
      1 month ago +2

      But I don’t think that language is strong enough. An embedding model might take 30 seconds where an LLM can take 45 minutes and be 10% as effective. It’s bad enough when folks insist on using a 70B model for an answer that is maybe 10% better than an 8B model's and wait 3 minutes instead of 30 seconds. That’s not worth it in most cases, but there is a debatable benefit. Embedding with an LLM makes zero sense.

    • @jparkerweb
      1 month ago

      @technovangelist Oh, I 100% agree! Choose the right tool for the job.

    • @ISK_VAGR
      1 month ago

      @technovangelist Matt, you may have misunderstood my question. I was interested in why, mathematically, a good LLM is not a good embedder. When I started using RAG, I believed that embedding models were perhaps LLMs delivering the output of their hidden layers as embeddings. I still wonder why LLMs, if they can find patterns, are not good at providing embeddings for RAG. Cheers.

  • @wgabrys88
    17 days ago

    This video is confusing; it's the first one where I've had to look for an explanation elsewhere. You compare the results but don't show examples of how to write these prefixes. (The course itself is great up to this point.) FYI

    • @technovangelist
      17 days ago +1

      Huh? I did show the prefixes. I even showed a code sample. What else could have been added?

    • @wgabrys88
      17 days ago

      @technovangelist Ah, you mean the code sample from GitHub (a wall of text). As a viewer of a YouTube video, I only noticed the two prefixes during your speech, and here's the problem (partly my own impatience, but still): a YouTube video tutorial is usually watched briefly, maybe while doing something else? (At least that's how I watch these tutorials: on a phone, where it's really hard to stop and go to the links you provide on GitHub. That requires scheduled free time; it's not a "just right now" tutorial.)
      I mean, every previous video in this playlist was complete: I could pause, rewind 10 or 20 seconds, and still get the point. But here, without starting my PC and digging into the code, it's confusing: just two prefixes, and no code examples shown as an overlay on the video or anything like that.
      I hope you get me 😀
      The point is: to understand the video, I have to switch from YouTube to GitHub, which is not what I expect from a YouTube video tutorial. It's incomplete; that's my point 👊

    • @technovangelist
      17 days ago

      I showed the code in the video. No need to go elsewhere. I also showed the prefixes. I’m not sure what else I could have added.

  • @FrankenLab
    1 month ago

    The wave of the future isn't MORE work to get models to digest our content; it's models that perform better on their own, without coaxing them into a marginal improvement in the results. Also, having only 2 models with prefixing doesn't give many options. Great content though; I appreciate the effort it takes to research, edit, and produce videos!

    • @technovangelist
      1 month ago +2

      Eventually maybe, but not for a long while. It’s still early days for this tech. There are more than 2: 3 were in this video, and there are others that can be imported. And 2x in some cases is hardly marginal.

    • @rv7591
      1 month ago

      Well yeah but the future is discovered through experiments.

    • @technovangelist
      1 month ago

      Yeah but wishing for things doesn’t make them happen

  • @ShaunyTravels.
    1 month ago

    Wish there were more videos about using Ollama from a mobile app. I made a chat app with Flutter/Dart on my phone that uses Ollama running on a server, but we need more videos on doing that 😂

  • @grahaml6072
    1 month ago +2

    I had to stop watching, unfortunately, because of that fuzzy text flashing across the screen. Maybe I will just try to read the transcript.

    • @technovangelist
      1 month ago

      I don’t have any fuzzy text in this one. If it’s fuzzy, don’t watch at such a low resolution.

    • @grahaml6072
      1 month ago

      @technovangelist I am not watching at low resolution. I watched on a 65” OLED, a 12.9” iPad, a 49” Samsung widescreen, and a 4K UST projector on a 120” screen, just to check it wasn’t me. It starts at 6:10 when you scroll through your outputs.

    • @technovangelist
      1 month ago +1

      Oh, you were making a joke... got it. You aren't supposed to read that, which is why I said I was speeding forward.

    • @NLPprompter
      1 month ago

      ironically we all do AI with fuzzy input output too... better get used to fuzzy mate 😁