What is Retrieval Augmented Generation (RAG) and JinaAI?
- Published: 31 Dec 2023
- Retrieval Augmented Generation (RAG) is one of the big AI patterns you must know for 2024. In this tutorial I break down the RAG pattern, what the Jina AI embeddings model is, and why Jina AI is a game changer for LLMs such as GPT, Llama, and Mistral.
In the video, Chris breaks down the issues with LLMs such as GPT, Mixtral 7B, and Llama-2, and how the RAG pattern helps solve the problems of hallucinations and extending a model's knowledge with new data.
Chris also shows you in detail how the RAG pattern works under the hood, so you can truly understand what's going on.
He also talks about how Jina AI is different, how it works, how it compares to OpenAI's ada embeddings model, and how Jina AI will kick off the next model trend for 2024.
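For readers who want to see the shape of the pattern the video describes, here is a minimal sketch of RAG in Python. The retrieval step is deliberately simplified to keyword overlap; a real system would use an embeddings model (such as Jina AI's) and a vector store, and the chunk texts and function names here are purely illustrative.

```python
# Minimal RAG sketch: retrieve relevant chunks, then augment the prompt.
# Keyword overlap stands in for real embedding-based retrieval.

def retrieve(question: str, chunks: list[str], k: int = 1) -> list[str]:
    """Rank knowledge chunks by word overlap with the question."""
    q_words = set(question.lower().split())
    scored = sorted(
        chunks,
        key=lambda c: len(q_words & set(c.lower().split())),
        reverse=True,
    )
    return scored[:k]

def build_prompt(question: str, context: list[str]) -> str:
    """Augment the question with retrieved context before calling the LLM."""
    ctx = "\n".join(context)
    return f"Answer using only this context:\n{ctx}\n\nQuestion: {question}"

chunks = [
    "Jina AI released an open embeddings model in 2023.",
    "Llama-2 is an open-weight LLM from Meta.",
]
question = "What did Jina AI release?"
prompt = build_prompt(question, retrieve(question, chunks))
```

Because the answer is grounded in retrieved text rather than the model's training data, this is how RAG reduces hallucinations and lets you extend an LLM with data it was never trained on.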
Another clear and informative video, thank you! I agree, I think RAG will be huge in 2024. One thing I would like to know, is it possible to have the LLM list or identify the chunk or chunks used to produce a response? Perhaps metadata or indexes can be added to the chunks which the LLM can use when generating a response.
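The commenter's suggestion is a common approach in practice: attach metadata (ids, sources) to each chunk at indexing time, then instruct the model to cite the ids it used. A rough sketch of that idea, with hypothetical ids and field names, not any particular library's API:

```python
# Sketch of chunk attribution: each chunk carries metadata, and the
# prompt asks the model to cite the chunk ids it relied on.

chunks = [
    {"id": "doc1#p3", "source": "handbook.pdf", "text": "Refunds take 5 days."},
    {"id": "doc2#p1", "source": "faq.md", "text": "Support is open 9-5."},
]

def build_cited_prompt(question: str, retrieved: list[dict]) -> str:
    """Label each chunk with its id so the LLM can reference it."""
    ctx = "\n".join(f"[{c['id']}] {c['text']}" for c in retrieved)
    return (
        f"Context:\n{ctx}\n\n"
        f"Question: {question}\n"
        "Answer, and list the [id] of each chunk you used."
    )

prompt = build_cited_prompt("How long do refunds take?", chunks[:1])
```

Whether the model reliably cites the ids depends on the LLM, but the metadata makes the retrieved chunks traceable regardless, since the application already knows which chunks it injected.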
Thank you! Very simple, precise, yet very informative!
Glad it was helpful!
Great video Chris. Even I could understand your explanation!
thank you, it was a really difficult one to find the right angle for, glad it was useful.
Great video Chris!
Thaaaank you!!
Chris, great video as always. I learn so much from your channel, thanks. One thing I at least didn't quite "get" from this - where you talk about vectorization and embeddings - what actually *is* that process? The general concept I understand - turn the chunk into a numerical vector and compare them for similarity - but the vectorization itself - what is JinaAI doing at that point and how does it overcome e.g. the challenges of mismatching vocab between the question and the knowledge chunk without being externally trained itself on a bunch of stuff? Or maybe the embeddings are based on some other training from elsewhere? Was just a bit hazy on that point... maybe a thought for a future video if you're inclined :)
You're right, that's probably a really good video to do, as it's quite complex and quite interesting. My latest video on tiktoken explains vectorization for decoder models, but that's a little simplistic compared to embedding models. Will do a video on embeddings.
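The "compare them for similarity" step the commenter mentions is usually cosine similarity between embedding vectors. Here is a toy sketch with hand-made 3-dimensional vectors standing in for real embeddings (which typically have hundreds of dimensions, produced by a pre-trained model):

```python
# Cosine similarity: how close two embedding vectors point in the
# same direction, independent of their magnitude.
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

# Toy vectors: the question and chunk_a point roughly the same way,
# so they score as similar even if their wording differed.
q = [0.9, 0.1, 0.0]
chunk_a = [0.8, 0.2, 0.1]
chunk_b = [0.0, 0.1, 0.9]
```

This also hints at the answer to the vocabulary-mismatch question: the embeddings model *is* externally trained, on large text corpora, so a question and a chunk that use different words can still land close together in vector space.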
Fantastic, thanks! @chrishayuk
Thank you for a fantastic breakdown of RAG. I can now see why my Copilot trial at work is so bad at information retrieval. I'm guessing that as the queries get more complex and the spread of the data becomes wider, this method will become less effective. Does that push us toward a rolling fine-tune approach to a base model?
I think the hybrid model, where you have a cycle of RAG and fine-tuning, is likely the way forward.
quality explanations!
Thank you, glad it was useful
Hi Chris, I appreciate the high quality content! Could you do a video or just give a simple reply on where are you taking your expertise from? Maybe some communities, projects or anything of the sort. Personally (and I am sure that other people as well), I would like to become proficient in basically the same things that you are an expert at when it comes to engineering solutions for AI related problems :)
That is a pretty difficult one to answer; in all honesty, I just let myself play. I've been using the RAG pattern a lot at work. However, I felt the explanations of RAG were either 1 level too high or 2 levels too deep. That's usually when I'll do a video like this one, just to go under the covers and show what's really happening.
Thanks, I guess the takeaway is to try and play more with such things myself.
@juliussakalys9600 it's tough to know where to look and start, I'll try and put some sort of guide together
If you want facts you need to pump the determinism by lowering the temperature.
That only works if the data is in the training set, and it doesn't solve the traceability issue. Finally, models shouldn't be making up answers to Q&A-type questions; this is where models will get better, by routing questions to the correct expert with MoE.
@@chrishayuk Well, it just makes it stick to the highest confidence answer. You were saying it kept putting out a different answer. That's usually temperature and its effect on topP and topK. And the higher the temperature, the more it hallucinates. It sounds like you're just saying it should know when to lower the temperature on its own for the type of question.
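The temperature effect the two commenters are debating can be shown concretely. Sampling temperature divides the model's logits before the softmax: below 1 it sharpens the distribution toward the highest-confidence token (more deterministic), above 1 it flattens it (more varied, more hallucination-prone). A small sketch with made-up logits:

```python
# How temperature reshapes a token distribution before sampling.
import math

def softmax_with_temperature(logits: list[float], temperature: float = 1.0) -> list[float]:
    """Scale logits by 1/temperature, then apply a numerically stable softmax."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract max to avoid overflow in exp()
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.5]  # toy logits for three candidate tokens
cold = softmax_with_temperature(logits, 0.2)  # sharp: near-deterministic
hot = softmax_with_temperature(logits, 2.0)   # flat: more random sampling
```

With `temperature=0.2` the top token takes almost all the probability mass, while at `2.0` the three tokens end up much closer together, which is why a high temperature produces different answers on repeated runs.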