Very logical! Most documents contain the answers, and not the questions, so you shouldn't use your question as the query for the documents, but a hypothetical answer instead.
It's so validating when you see a technique you already use show up in a research paper and get presented by one of your fave YTers.
In my case, I take the user query and concatenate it with some static text to give context in case it's missing from the query (for example, prepending "CONTEXT: McDonald's" to "What are their most popular items?"). Then I ask gpt-3.5-turbo to generate an answer. That response gets embedded and used for the similarity search (a rough sketch of this flow follows after this comment).
Great video as always! Thanks, Sam!
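A minimal sketch of the flow described above, assuming the OpenAI Python client and an existing Chroma collection built with the same embedding model; the model names and the `collection` setup are illustrative, not the commenter's exact code.

```python
# Sketch: prepend static context -> generate a hypothetical answer -> embed it -> similarity search.
from openai import OpenAI
import chromadb

client = OpenAI()
# Placeholder: your real document collection, indexed with the same embedding model used below.
collection = chromadb.Client().get_or_create_collection("docs")

def hyde_style_search(user_query: str, static_context: str = "CONTEXT: McDonald's", k: int = 4):
    # 1. Prepend the static context in case the query itself is missing it.
    contextualized = f"{static_context}\n{user_query}"

    # 2. Ask a cheap chat model to write a hypothetical answer.
    hypothetical = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": f"Answer briefly:\n{contextualized}"}],
    ).choices[0].message.content

    # 3. Embed the hypothetical answer, not the raw question.
    emb = client.embeddings.create(
        model="text-embedding-3-small",
        input=hypothetical,
    ).data[0].embedding

    # 4. Similarity search over the real documents with that embedding.
    return collection.query(query_embeddings=[emb], n_results=k)
```

Embedding the generated answer rather than the raw question is essentially the HyDE trick; the static context string just keeps the model anchored to the right entity.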
This comment made my day. Thanks!! One of my biggest goals here is to try and get a conversation going between people who are actually doing things. I am really interested in any ideas you have for making that more of a two-way thing. I am not a huge fan of Discord, so I'm looking into other ideas for ways to build more of a community. Would love to hear ideas if you have any.
@@samwitteveenai I've just been to a machine learning conference in Austin, Texas, and was shocked to see how much need there was from people for just normal RAG systems and how to use them. We have some really cool, lesser-known RAG systems and engineers with a big focus on making things that have been inaccessible more easily usable and free. If you wanted to work together on this front, @thenonlerot and @samwitteveenai, I'd love to build a purpose-built local model (maybe an ensemble with the retrieval model, now that jinaai has done us such a huge favor) using our combined researchers/resources/known tricks, and then megaphone it out as loudly as possible to put it in as many hands as possible for free. I'd be very excited to pursue this and can provide a lot on this front; it's just that my personal free time to direct the project and keep the perspective of "people who don't know too much about AI" as the design principle is fairly limited.
For some context, my organization was founded by the developers of Open-Orca, openchat, axolotl, hermes, and others, so I'm hoping the noise we make can go a long way toward getting people caught up on RAG, since in my opinion it's necessary infrastructure no matter what.
@@samwitteveenai why not telegram?
I think this is where TinyLlama could really shine. If you had a specific use case, you could fine-tune it to do just this augmentation step based on your library of knowledge and speed up document retrieval queries in the long term.
What's the latency like with Tiny Llama?
@@i_accept_all_cookies Depends on the machine running it. But it's 1.1B, and with 4-bit quantization it only takes up 560-ish MB of RAM. So, pretty fast. If you add attention sinks and limit the window to something like 256, very fast (see the loading sketch after this thread).
@@ChaseFreedomMusician Interesting, thanks, I'll check it out. Looking for small, fast models to perform very specific, narrow tasks.
@@i_accept_all_cookies If they are well-understood tasks, TinyLlama + fine-tuning.
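For anyone curious how that looks in practice, here is a rough sketch of loading TinyLlama in 4-bit with transformers + bitsandbytes (a GPU is assumed); the prompt and generation settings are just placeholders, and actual memory use depends on your setup.

```python
# Rough sketch: TinyLlama 1.1B loaded in 4-bit for a fast query-augmentation step.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"
bnb = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_compute_dtype=torch.float16)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, quantization_config=bnb, device_map="auto")

# Example augmentation prompt (illustrative only).
prompt = "Expand this question into a short hypothetical answer: What are McDonald's most popular items?"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```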
Always learn new things from you, Sam. Thank you!
Yes, that's good if you ask global questions that are "known" to the language model you're querying, but if it's a question about corporate data, it will only break things, because the model will probably say something like "I don't have access to the data you are asking about", etc. I guess there should be some additional filter that decides whether to send the hypothetical answer to the vector DB search or to prefer just the initial raw user question instead.
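A minimal sketch of such a filter, assuming the OpenAI Python client; the routing prompt and labels are illustrative, and a real router would likely need few-shot examples or a tuned classifier.

```python
# Minimal routing filter: only generate a hypothetical answer for questions the base LLM
# plausibly knows about; for private/corporate data, search with the raw query instead.
from openai import OpenAI

client = OpenAI()

def choose_search_text(user_query: str) -> str:
    verdict = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{
            "role": "user",
            "content": ("Is the following question about general public knowledge (GENERAL) "
                        "or about private/internal company data (PRIVATE)? Reply with one word.\n"
                        f"Question: {user_query}"),
        }],
    ).choices[0].message.content.strip().upper()

    if verdict.startswith("PRIVATE"):
        # The model can't fake an answer about data it has never seen; use the raw query.
        return user_query

    # Otherwise generate a hypothetical answer and search with that instead.
    return client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": f"Write a short plausible answer to: {user_query}"}],
    ).choices[0].message.content
```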
Best explanation of HyDE I've seen. Thanks!
The internet search augments the keywords and triggers a more accurate association with the query _if_ the internet search gets relevant results. But it won't be able to answer "What are their best items?", which requires more context and use of the previous conversation/summary. Providing more context in both the HyDE step and the retrieval would be necessary.
"Only once the question ripens... shall the apple fall from the tree."
Thanks! This is a great video for taking RAG to the next level.
In my experience so far, most of the time the LLM (GPT-4 in my case) answers questions from my data well if the quality of the chunks is good!
The challenges I face so far in chunking:
1) My PDFs contain a lot of content with complex tabular structures (merged rows and columns) for product specifications. Chunking breaks the relationship between rows and columns.
2) The same kind of content is replicated across different PDFs for different products. Unfortunately the PDFs are not named by product, so the vector search returns content from the wrong product, one the user query didn't ask about.
3) Sometimes within the same PDF (containing multiple products), the content repeats with different specifications per product. If I ask for the input voltage of product A, it might return product B, since the context is lost while chunking.
Looking for smarter ways to chunk that retain the context across chunks (one metadata-based idea is sketched below).
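One possible approach to problems 2) and 3), sketched with LangChain and Chroma (import paths vary by version); the `extract_product` helper and the sample chunks are hypothetical placeholders, not a known fix for these specific PDFs.

```python
# Sketch: tag each chunk with the product it belongs to, then filter retrieval by that metadata
# so specs from product B can't answer a question about product A.
from langchain_community.vectorstores import Chroma
from langchain_openai import OpenAIEmbeddings
from langchain.schema import Document

# Hypothetical input: (pdf_name, chunk_text) pairs produced by your existing chunking step.
all_chunks = [
    ("spec_sheet_1.pdf", "Product A. Input voltage: 24 V DC. ..."),
    ("spec_sheet_1.pdf", "Product B. Input voltage: 12 V DC. ..."),
]

def extract_product(chunk_text: str) -> str:
    # Placeholder: in practice use a regex, a lookup table, or a small LLM call
    # to pull the product name out of the chunk or its surrounding section header.
    return " ".join(chunk_text.split()[:2]).rstrip(".")

docs = [
    Document(page_content=chunk, metadata={"product": extract_product(chunk), "source": pdf_name})
    for pdf_name, chunk in all_chunks
]

vectordb = Chroma.from_documents(docs, OpenAIEmbeddings())

# At query time, restrict the search to the product the user actually asked about.
retriever = vectordb.as_retriever(search_kwargs={"k": 4, "filter": {"product": "Product A"}})
results = retriever.get_relevant_documents("What is the input voltage?")
```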
Hi @sivi3883, have you got any good techniques for breaking down the PDFs into good-quality chunks?
Great, very useful. I am still testing which RAG setups best improve accuracy; in business the quality must be very high, and it is not simple.
The difficult part is finding a good way of composing your existing SQL infrastructure into a retrieval system
I think that comes down to seqlen ultimately no matter how you cut it
I'm curious whether a fine-tuning pass using the target documents solves the problem of the model being unaware here. Or would combining the two, fine-tuning just to improve the generation, be self-defeating in that the answer is in the weights regardless?
Thanks a lot Sam. Sadly I cannot find any code for this. Anyway, I've got a problem I thought you could maybe offer a few suggestions on. I have a list of huge financial documents, about 200 pages long each, and I have built a RAG-based bot using LlamaIndex to answer questions from them. But it's not able to answer some questions, for example: who are the board members? What is the price of the PPA and its length? Any suggestions on improving it?
I was thinking about HyDE or trying out different embedding models. Or: use GPT-4 to generate a list of possible questions for a given chunk, then save the chunk along with the questions. So instead of doing semantic search against chunks, you do semantic search against the questions, and when you find the most similar question, you use the chunk it was generated from (rough sketch of that idea below).
But your suggestions would be appreciated.
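A rough sketch of the question-per-chunk idea described above, assuming the OpenAI Python client and Chroma; the prompt, collection name, and in-memory chunk store are illustrative choices, not a tested recipe.

```python
# Sketch: embed LLM-generated questions, return the parent chunk they came from.
from openai import OpenAI
import chromadb

client = OpenAI()
questions_col = chromadb.Client().get_or_create_collection("chunk_questions")
chunk_store = {}  # chunk_id -> full chunk text (could equally be a doc store on disk)

def index_chunk(chunk_id: str, chunk_text: str):
    chunk_store[chunk_id] = chunk_text
    # Ask GPT-4 for questions this chunk can answer.
    raw = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user",
                   "content": f"List 5 questions this text can answer, one per line:\n{chunk_text}"}],
    ).choices[0].message.content
    questions = [q.strip() for q in raw.splitlines() if q.strip()]
    questions_col.add(
        ids=[f"{chunk_id}-{i}" for i in range(len(questions))],
        documents=questions,                                  # the questions are what gets embedded
        metadatas=[{"chunk_id": chunk_id}] * len(questions),  # each question points back to its chunk
    )

def retrieve(user_question: str) -> str:
    hit = questions_col.query(query_texts=[user_question], n_results=1)
    return chunk_store[hit["metadatas"][0][0]["chunk_id"]]
```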
LangChain is quite busy extending/fine-tuning RAG. If you use OpenAI (not open), it will cost you more money, which OpenAI loves for pampering their shareholders. What is the advantage of asking an LLM instead of using a web search first? Thanks for the video.
Great video! I've watched all the videos in your Advanced RAG series. One question: is every advanced method an alternative, or can you build one RAG application with multiple advanced methods in it (e.g. an ensemble retriever combined with a ParentDocumentRetriever)?
Good question. You will often mix them, but not use all of them.
Which answer should I consider the final answer and which the hypothetical answer for document QA? In the code you mentioned, I can't tell which one is the final answer and which is the hypothetical answer.
It would have been good to compare the results from the query alone with the results from HyDE. Otherwise, how do you know you're getting a better answer? Or did I miss something?
This logic can also be used to retrieve images.
Things get screwed up when the retriever pulls the correct embeddings and presents them to the LLM, but the LLM mixes up the text while answering. This mixed-up text comes from two different answers. If we take this mixed-up text and pass it on to another LLM to extract the correct answer, the final answer is always wrong.
Shouldn't there be a better method to expand the query rather than trying to answer it? What if, instead, the GPT model is tasked with rewriting the query in a more specific context?
So the idea here is not to expand the query; it is to write a fake answer that you can then do similarity matching on.
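For anyone looking for code, here is a rough sketch of that fake-answer flow using LangChain's HypotheticalDocumentEmbedder; this is not necessarily the exact notebook from the video, and the model and vector-store choices are just examples.

```python
# Rough sketch of HyDE with LangChain's HypotheticalDocumentEmbedder.
# Import paths vary by LangChain version.
from langchain.chains import HypotheticalDocumentEmbedder
from langchain_openai import ChatOpenAI, OpenAIEmbeddings
from langchain_community.vectorstores import Chroma

base_embeddings = OpenAIEmbeddings()
llm = ChatOpenAI(model="gpt-3.5-turbo")

# embed_query() first asks the LLM for a fake answer, then embeds that answer;
# embed_documents() still embeds the real documents directly with the base embeddings.
hyde_embeddings = HypotheticalDocumentEmbedder.from_llm(llm, base_embeddings, "web_search")

texts = ["...your document chunks..."]  # placeholder corpus
vectordb = Chroma.from_texts(texts, hyde_embeddings)
docs = vectordb.similarity_search("What are McDonald's most popular items?", k=4)
```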
Is it possible to retrieve an entire document using RAG?
I already use this approach; it is really promising.
Thanks man
Thank you
thank you soooooooooo much!!!!!
This direction sounds like a way to replace vector embeddings with LLMs and keyword search. Generating the hypothetical answer takes advantage of the LLM's reasoning capability, something embeddings lack. On the other hand, having all the keywords in the generated answer makes a case for using keyword search, as there's no vocabulary mismatch problem.
But the advantage of embeddings is that you don't have to manage synonyms or acronyms; the model has simply learned a representation of the word and its context. It also helps distinguish terms that look the same but have different meanings, where the embeddings are indeed different.
Embeddings retrieve semantically, which is often better than (or at least as good as) keyword search. Your vector retrievals can be used by an LLM to "take advantage of the LLM's reasoning capability", as you said. The solution in this video creates a richer query embedding, which theoretically should retrieve better semantic results (but it's use-case dependent, YMMV).
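If you want both sides of this trade-off rather than picking one, here is a minimal sketch of a hybrid setup with LangChain's EnsembleRetriever, combining BM25 keyword retrieval with embedding retrieval; it assumes the rank_bm25 package, and the 50/50 weights are arbitrary.

```python
# Minimal sketch: combine keyword (BM25) and embedding retrieval instead of choosing one.
from langchain.retrievers import EnsembleRetriever
from langchain_community.retrievers import BM25Retriever
from langchain_community.vectorstores import Chroma
from langchain_openai import OpenAIEmbeddings

texts = ["...your document chunks..."]  # placeholder corpus

# Keyword side: BM25 over the raw text (needs the rank_bm25 package).
keyword_retriever = BM25Retriever.from_texts(texts)
keyword_retriever.k = 4

# Semantic side: a plain embedding retriever.
vector_retriever = Chroma.from_texts(texts, OpenAIEmbeddings()).as_retriever(search_kwargs={"k": 4})

# Blend both result lists; the weights here are arbitrary and worth tuning per use case.
hybrid = EnsembleRetriever(retrievers=[keyword_retriever, vector_retriever], weights=[0.5, 0.5])
results = hybrid.get_relevant_documents("What are their most popular items?")
```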