Thank you for the detailed explanation. Do you have any content on how to perform load testing for the Streamlit chat application?
Thanks! Not really on load testing yet, it's quite a niche and advanced topic which mostly depends on the machine where the app is running. Even so, in the video after that one I showed how to deploy that app on Azure, how to choose the CPU and RAM size of the instance, and how to check the current load on CPU and memory, but I didn't perform load testing.
Amazing Enric as always
Great work
Thank you very much for explaining so clearly. The app worked locally for me, and I am using Python 3.13. To deploy to Streamlit Cloud I am facing a few version issues. Do we need a Python version lower than 3.11 to deploy to Streamlit?
Thanks! 3.10 to 3.12 will work for sure, but I'm not sure about 3.13 as it was released only 1-2 months ago and some libraries are not yet updated to work with it. I'm still using 3.11 for most of my work and apps.
@enricd Thanks for the reply. Looking forward to more videos from you.
We are waiting for continuous updates in videos and the blog ❤.
thanks @sitheekmohamedarsath! it's in the oven :P
Great explanation ❤
Awesome
Could you please make a video on how to upload a vector store database to the cloud for reuse in the future?
Thanks for the idea! There are a few different options out there for this. The easiest ones are SaaS services like Pinecone that host the cloud vector store for you; you can also store it in any cloud storage option, or have a microservice or an API serving it with the docs there. The thing is that there isn't a single solution for this, it depends on your use case: how often the vector store data needs to be updated, how many users you will have, how many docs they need to query, what your budget is, etc.
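For example, the Pinecone option with LangChain could look roughly like this (a minimal sketch, assuming the langchain-pinecone and langchain-openai packages, API keys in the environment, and an index already created in Pinecone; the index name and chunks are placeholders):

from langchain_core.documents import Document
from langchain_openai import OpenAIEmbeddings
from langchain_pinecone import PineconeVectorStore

doc_chunks = [Document(page_content="example chunk")]  # placeholder for your split docs

# embed the chunks and store them in the hosted index
embeddings = OpenAIEmbeddings(model="text-embedding-3-small")
vector_db = PineconeVectorStore.from_documents(
    documents=doc_chunks,
    embedding=embeddings,
    index_name="my-docs-index",  # hypothetical index created in Pinecone beforehand
)

# later, from any machine, reconnect to the same hosted index
vector_db = PineconeVectorStore.from_existing_index(
    index_name="my-docs-index",
    embedding=embeddings,
)

That way the app only needs the Pinecone and OpenAI keys to query the already-built store.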
@enricd So you should make a video about it for the rest of the world, Sir! :)
What are your system specifications?
Do you mean my computer parts? For these demos you don't need much: the LLMs run in the cloud through the OpenAI API, and the RAG with a few files doesn't take much RAM or CPU. I have a pretty powerful desktop PC with Windows 11, but this can run on almost any device with Python 3.11 or similar. If you are on Mac or Linux, a few commands or Python libraries may be slightly different, but 99% of it would be the same.
Getting an error creating vector_db that says "The onnxruntime python package is not installed.", but it's already installed. I'm using Python 3.11.
Hi! What operating system are you using? Try manually downgrading the onnxruntime Python library one or a few versions and check if that fixes it. For example, if you currently have version 1.19.2 (you can check your current version with "pip show onnxruntime" or "pip freeze"), then try "pip install --upgrade onnxruntime==1.19.1", and if it still doesn't work, try 1.18.0 for example.
Looks great. When loading a PDF I got "Error loading document sample.pdf: cryptography>=3.1 is required for AES algorithm".
Thanks! Try "pip install cryptography" in the terminal, with the venv active if you have created one, and also add it to the requirements.txt file if you want to upload the web app to Streamlit Cloud or somewhere else. Let me know if that fixes it :)
@enricd It does, thank you! I'm exploring it and I'm really impressed so far. I will give it a try with 250 PDF files of 1.5 MB each eventually. Is it possible to configure it so that if the requested info is not in the PDFs, it simply mentions that without using info from outside?
Oops, I get "ValueError: Batch size 664 exceeds maximum batch size 166" when trying to upload 8 PDFs of 1.5 MB. Do you know what the limitation is?
About the error, it seems the vector store only accepts a limited number of chunks per insert call (the 166 in the error message), so you would need either to upload the PDFs in smaller groups, or to change the loading function so it doesn't try to add all the chunks to the vector store at once, but in smaller batches. Not completely sure about it, but that's my guess.
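If you want to keep it as a single call on your side, something like this could work (a minimal sketch, assuming a Chroma-style vector_db like in the video; the batch size and helper name are my own choices):

# add the chunks to the vector store in small batches instead of all at once
def add_docs_in_batches(vector_db, doc_chunks, batch_size=100):
    for i in range(0, len(doc_chunks), batch_size):
        vector_db.add_documents(doc_chunks[i:i + batch_size])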
I'm not really sure I get the question: do you want it to answer only using info from the RAG PDFs (not its own learnt knowledge), or do you mean for it to go to the internet or ask you whenever the info it needs to answer your question is not in the RAG PDFs?
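In case you mean the first one, a common approach is to constrain it in the system prompt of the RAG chain (just a hedged example, the exact wording is up to you):

# hypothetical system prompt; {context} gets filled with the retrieved chunks
rag_system_prompt = (
    "Answer ONLY using the provided context. If the answer is not in the "
    "context, say that you couldn't find it in the uploaded PDFs and do not "
    "use outside knowledge.\n\nContext:\n{context}"
)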
Is there a reason why we're not using Google's Gemini?
It's possible to use it, but Gemini also has its own custom ways of doing this, uploading docs to Google Cloud, and OpenAI and Anthropic/Claude are preferred by many people. You can check my Google Gemini video on the channel, where I showed how to upload docs into it and chat with them :)
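For reference, swapping Gemini into the same LangChain code would look roughly like this (a sketch, assuming the langchain-google-genai package and a GOOGLE_API_KEY in the environment; the model names are examples and may change):

from langchain_google_genai import ChatGoogleGenerativeAI, GoogleGenerativeAIEmbeddings

# drop-in replacements for the OpenAI chat model and embeddings
llm = ChatGoogleGenerativeAI(model="gemini-1.5-flash")
embeddings = GoogleGenerativeAIEmbeddings(model="models/embedding-001")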
You don't want to upload your stuff to the cloud, man... not if you value your privacy. I'm going to use this video but keep it local.
@AaronBlox-h2t It depends... I don't mind uploading some cooking recipe PDFs to OpenAI, or some public papers to chat with them. If I want a bit more security, then I would do what I did in the next video and build this on Azure. And finally, if I really want privacy for my sensitive/private/secret data, I would do it locally using open LLMs, but the result will be slightly worse.
Is there any way to do it without giving my credit card to OpenAI? On the first try I got the "You exceeded your current quota" message.
Hi! You can do it with Anthropic otherwise, although you will need to find some other embeddings model. You can also check my older video about building a chat with Google Gemini, which also allows you to upload docs. I will try to do a video on how to do this locally using open source (free) LLMs as well at some point.
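A rough sketch of that combination (assuming the langchain-anthropic and langchain-huggingface packages; the model names are just examples to check against the current docs):

from langchain_anthropic import ChatAnthropic
from langchain_huggingface import HuggingFaceEmbeddings

# Claude for the chat part, a free local model for the embeddings
llm = ChatAnthropic(model="claude-3-5-sonnet-20240620")
embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")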
And do you know why, if we use Ollama, we still need to use OpenAI? I thought it was enough with just Ollama.
Can you make it with Google Gemini models or other models? These are paid models.
Hey! Yes, I will try to do a video on local RAG with open source models, for example, whenever I have time for it. Even so, you can check the video on my channel about building a chatbot with Gemini, where I already showed how to upload docs into it, which is not exactly the same as the RAG here, but very similar :)
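In the meantime, a fully local setup could look roughly like this (a sketch, assuming Ollama is installed with the llama3 and nomic-embed-text models pulled, and the langchain-ollama package; no OpenAI key needed):

from langchain_ollama import ChatOllama, OllamaEmbeddings

# both the chat model and the embeddings run locally through Ollama
llm = ChatOllama(model="llama3")
embeddings = OllamaEmbeddings(model="nomic-embed-text")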