Llama3 Full Rag - API with Ollama, LangChain and ChromaDB with Flask API and PDF upload

  • Published: Jan 24, 2025

Comments • 137

  • @fastandsimpledevelopment
    @fastandsimpledevelopment 9 months ago +20

    Finally a complete Fast and Simple end to end API using Ollama, Llama3, LangChain, ChromaDB, Flask and PDF processing for a complete RAG system. If you like this one check out my video on setting up an AWS Server with GPU Support - ruclips.net/video/dJX9x7bETe8/видео.html

    • @Phoenix-Revived
      @Phoenix-Revived 6 months ago

      Amazing tutorial! I know you work with Java, but I would absolutely love an implementation in Golang.

  • @lesptitsoiseaux
    @lesptitsoiseaux 7 months ago +8

    We're about the same age. This is by far the best tutorial on the subject I've seen in a while. Thank you very much for your conscientiousness and dedication to quality! Cheers from Vancouver :)

  • @prokons
    @prokons 3 months ago +1

    Definitely the best tutorial I've found on RUclips. I especially appreciated that you included the problems you ran into while implementing the code, like packages not yet installed. When someone watches a tutorial, they usually see everything working fine, but that is not what really happens when you do it for the first time. Great job.

  • @Dani-cg5yc
    @Dani-cg5yc 9 months ago +7

    Literally just what I was looking for. Thx so much for the video. It's reaaaaally difficult to find good information about doing projects with llama models, every use case is for openai. Again, thx a lot!!!!

    • @fastandsimpledevelopment
      @fastandsimpledevelopment 9 months ago

      Glad you enjoyed it. I had the same experience and so many people asked me to do this video. I would normally only do Java but the tooling is not really ready, so I wanted to get this out. Remember that you need a GPU-based system to really run fast, on Linux or Windows. Apple appears to have given up on the GPU support that Ollama needs, so there is no option there that I have found. I have a server running with a couple of NVIDIA cards and 128 GB of RAM. It is super fast and production ready for my needs.

  • @ruggerovecchio
    @ruggerovecchio 7 months ago +1

    Congratulations, this is really an exhaustive explanation of how to set up the necessary architecture for exposing AI/LLM-based services from a private cloud. Thank you for sharing!

  • @ExpertKNowledgeGroup
    @ExpertKNowledgeGroup 9 months ago +10

    Amazing step by step with complete explanation of what and why. Thanks!

  • @bornclasher1294
    @bornclasher1294 12 days ago

    Great Tutorial sir!
    It is really helpful. Thank you so much.

  • @wesleymogaka
    @wesleymogaka 6 months ago +1

    Very excellent. I am doing chatbot creation for a bank and privacy/ security is of utmost importance. I'll surely use knowledge gained here.

    • @fastandsimpledevelopment
      @fastandsimpledevelopment 6 months ago

      Glad you found this useful, feel free to ask me questions. I've already set up banks and insurance companies using these techniques.

  • @juliandarley
    @juliandarley 7 months ago +2

    thank you very much. i would add that adjusting the hyperparameters `k`, `score_threshold` and the `PromptTemplate` custom instruction can make a huge difference to the answer - by changing these i got the system to stop producing useless, inaccurate verbiage and give short, accurate, useful answers. it was like comparing a 1B early LLM with GPT4o.

    • @fastandsimpledevelopment
      @fastandsimpledevelopment 7 months ago

      Thanks for the info, hope it helps others as well

    • @juliandarley
      @juliandarley 7 months ago

      ​@@fastandsimpledevelopment to be specific this is what i have found so far to work reasonably well:
      PromptTemplate
      "Based on the following context, provide a precise, concise, and accurate answer to the query. Do not give a load of waffle and empty verbiage, just the actual answer. Do not use your own knowledge, only that in the text supplied. The answer is almost certainly in the given text, but it may require some intelligence on your part to piece together information in order to answer the question. Try hard; don't give up easily."
      "k": 5,
      "score_threshold": 0.5,
      if anyone is interested, i can give some example answers; my impression is that the LLM is the weak link - it's often not quite smart enough to piece together the right pieces of information, where a human would easily work it out. i plan to try out other LLMs and embedding algorithms, but at least the results are looking promising.
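A minimal sketch of what the two retrieval settings in the comment above do, written in plain Python so it runs without LangChain or ChromaDB. It assumes relevance scores in the range 0 to 1 where higher is better; `ScoredChunk`, `select_chunks` and `build_prompt` are hypothetical names for illustration, and the prompt is a shortened version of the one quoted above. In LangChain itself these values are usually passed as `search_kwargs` to `as_retriever(search_type="similarity_score_threshold", ...)`.

```python
from dataclasses import dataclass


@dataclass
class ScoredChunk:
    text: str
    score: float  # assumed relevance score in [0, 1], higher is more relevant


def select_chunks(chunks, k=5, score_threshold=0.5):
    """Drop chunks below the threshold, then keep the k best, best first."""
    kept = [c for c in chunks if c.score >= score_threshold]
    kept.sort(key=lambda c: c.score, reverse=True)
    return kept[:k]


PROMPT = (
    "Based on the following context, provide a precise, concise, and "
    "accurate answer to the query. Do not use your own knowledge, only "
    "that in the text supplied.\n\nContext:\n{context}\n\nQuery: {query}"
)


def build_prompt(query, chunks, k=5, score_threshold=0.5):
    """Assemble the final prompt from the chunks that survive filtering."""
    selected = select_chunks(chunks, k, score_threshold)
    context = "\n---\n".join(c.text for c in selected)
    return PROMPT.format(context=context, query=query)
```

Raising `score_threshold` trades recall for precision: fewer, more relevant chunks reach the LLM, which is one plausible reason the answers became shorter and more accurate.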

    • @manihss
      @manihss 6 months ago +1

      would you mind sharing those? the params and the Prompt.

    • @juliandarley
      @juliandarley 6 months ago

      @manihss i am away from my main desktop, but i will look as soon as i can. the prompt was something like: "keep the answer very short and don't give me a load of empty blather"! i think the score threshold was 5, but that may be erroneous.

    • @TechniqueIsKey
      @TechniqueIsKey 4 months ago +1

      @juliandarley would still be interested in what params you ended up finding optimal if you still have it

  • @raymondcswong8602
    @raymondcswong8602 7 months ago +1

    Thank you very much! I learnt a lot following your step-by-step guidance! Especially how you solve the 'errors', cheers!

  • @salamina_
    @salamina_ 8 months ago +1

    Very helpful content. Great tutorial on putting it all together in a RAG application. Thank you for taking the time to put this together and sharing it!

  • @photorealm
    @photorealm 2 months ago

    Excellent video. Helped me immensely, Thank you for sharing.

  • @ChigosGames
    @ChigosGames 1 month ago

    Thanks! Was looking for this!

  • @RobynLeSueur
    @RobynLeSueur 9 months ago +4

    Very useful, criminally underviewed too.

  • @singar1976
    @singar1976 7 months ago +1

    Thank you for making this video. It was so helpful :)

  • @davidtindell950
    @davidtindell950 6 months ago +1

    Thank You Again. Perhaps a follow-on Vid where you make this a full Flask App with a responsive Web-GUI ?!?!?

    • @fastandsimpledevelopment
      @fastandsimpledevelopment 6 months ago +1

      I have a video that I'm editing that has a full React UI. I built it so anyone can build a product with it if they want. I now have the Python code as a microservice, which makes it much cleaner to deploy in production, along with full logging.

  • @dean-p6z
    @dean-p6z 7 months ago +2

    nice starter tutorial that does not involve openai!

    • @ExpertKNowledgeGroup
      @ExpertKNowledgeGroup 7 months ago

      There was so much OpenAI content and almost none that really covers running Ollama locally. I used this at my last company, which has very private data, from HR information to product development, with Jira and Confluence integration. No way could we use OpenAI and have them "learn" all our IP content :)

  • @deathdefier45
    @deathdefier45 9 months ago

    I was looking for just this, thank you sir you're a wonderful human being! Thank you so much for this content!

  •  4 months ago

    Amazing video! Your explanation is super insightful and well-presented. I'm curious-do you have any thoughts or experience with using Ollama in a production environment? I'm not sure if Ollama can handle multiple requests at scale.
    If I were to implement something like this in production, would you recommend Ollama, or would alternatives like llama.cpp or vllm be better suited? Would love to hear your perspective on its scalability and performance. Thanks again for sharing such awesome content!

  • @raghavareddy7134
    @raghavareddy7134 8 months ago

    Really loved it......😍... can you add a delete endpoint that removes a PDF from both the pdf folder and ChromaDB?

  • @yusuf50
    @yusuf50 6 months ago

    That's the way, man. Great tutorial. Thank you

  • @kiranwork5466
    @kiranwork5466 9 months ago +1

    Very informative and timely !

  • @TokyoNeko8
    @TokyoNeko8 9 months ago

    Awesome with all of the latest LLM and APIs!

  • @candyman3537
    @candyman3537 3 months ago +2

    Thanks for your video. I have a question. If I have a very big PDF, will the embedding data take more tokens? And what is the max length of the PDF?

    • @fastandsimpledevelopment
      @fastandsimpledevelopment 3 months ago +1

      I normally use Ollama for RAG applications with ChromaDB. Ollama runs locally, or at least on your own servers, so there is never a cost for tokens. If you were to use Gemini or OpenAI then yes, you have token costs. Depending on what database you're using for your vector store, there may be small costs to store the data. In general, token costs arise when you retrieve data and use it in the LLM processing. I have used PDF files of up to 100 MB with ChromaDB. The big thing to watch is the chunk size: you will typically send 3 chunks to the LLM, so if each one is 1 MB then 3 chunks are 3 MB, which could be 600,000 tokens (3,000,000 bytes / 5 bytes per token). That would be expensive. Again, using a local LLM and a local vector store resolves these costs real fast, plus it gives you the security of your data not being outside your company.
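The back-of-the-envelope estimate in the reply above can be written as a tiny helper. This is only a sketch of that arithmetic: the 5 bytes per token figure is the rule of thumb from the reply, not a property of any particular tokenizer, and `estimate_prompt_tokens` is a hypothetical name.

```python
def estimate_prompt_tokens(chunk_bytes, chunks_sent=3, bytes_per_token=5):
    """Rough token count for the retrieved chunks sent to the LLM."""
    return (chunk_bytes * chunks_sent) // bytes_per_token


# Three 1 MB chunks, as in the reply: 3,000,000 / 5
print(estimate_prompt_tokens(1_000_000))  # 600000
# Three 1 KB chunks are far cheaper
print(estimate_prompt_tokens(1_000))  # 600
```

The size of the PDF itself does not appear in the formula; only the chunk size and the number of chunks retrieved per question do.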

    • @candyman3537
      @candyman3537 3 months ago +1

      @fastandsimpledevelopment So the size of the PDF files only affects the cost of the vector db?

    • @fastandsimpledevelopment
      @fastandsimpledevelopment 3 months ago

      @candyman3537 Correct, if you have a local vector db then there is no cost. The chunk size has more effect on tokens. I normally have 3 chunks returned from the similarity results, which are then sent to the LLM.

  • @chidi21
    @chidi21 9 months ago

    This is awesome! Thank you very much for posting this!👏

  • @EVandPassions
    @EVandPassions 5 months ago +1

    The best tutorial

  • @hebertgodoy5039
    @hebertgodoy5039 7 months ago

    Excellent. Thanks very much for sharing

  • @Pekarnick
    @Pekarnick 9 months ago +1

    Very good video, it helped me a lot with something I was looking for to integrate with a chatbot, but I had to adjust the search_kwargs a little to 3 because it got dizzy with the result. I would be happy to see how to delete added PDFs in Chroma, greetings and thank you very much for this content

  • @AbdulHalim-mp2rs
    @AbdulHalim-mp2rs 6 months ago +1

    Thank you, thank you, thank you.

  • @Dan-mm9yd
    @Dan-mm9yd 8 months ago

    very useful video, thanks for sharing!

  • @yongxiang4635
    @yongxiang4635 6 months ago +1

    thank you so much for this video, this is very helpful for me now.
    But I have a question: how can I deploy this Flask app to a server so there is no need to use 'localhost'?

    • @fastandsimpledevelopment
      @fastandsimpledevelopment 6 months ago

      You can just create a venv on your Linux server, activate it, and then run pip install -r requirements.txt to install all the required dependencies. Then just start it as normal with python3 app.py and it will be running. You may need to open a port on the server's firewall, and then you can connect externally, so maybe 10.10.10.25:8081/api for a connection. I do this all the time. I break things into smaller services (microservices) and have them run in Docker containers. You can load Ollama on a GPU-based server (see my video on this).

  • @oscarcorreia2804
    @oscarcorreia2804 9 months ago +1

    Thanks for the great tutorial. I would like to get your thoughts on 2 aspects: What is the minimum RAM needed to run this? Can the model be stored on an external hard disk?

    • @fastandsimpledevelopment
      @fastandsimpledevelopment 9 months ago +1

      You will need at least 8 GB of RAM. The model files are stored on a hard disk, but they are also loaded into memory when used. After about 5 minutes of no activity the cache is cleared and the memory is freed, so the next time you use it the model is loaded into RAM again. If you have a GPU (Nvidia card) then it gets loaded into the GPU. I like to use an external Linux Ubuntu server with an Nvidia card and 20 GB of RAM; it runs really fast.

    • @oscarcorreia2804
      @oscarcorreia2804 8 months ago

      @fastandsimpledevelopment Thanks

  • @out-of-sight
    @out-of-sight 9 months ago +2

    Perfection 👏🏻👏🏻👏🏻

  • @nizark.5265
    @nizark.5265 6 months ago +1

    Thanks for the tutorial.
    I had an error when saving the file on Windows; here is how I solved it:
    from pathlib import Path
    current_folder = Path(__file__).parent.resolve()
    save_file = str(current_folder) + "\\pdf\\" + file_name
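A portable variant of the fix above, as a sketch: joining with `pathlib`'s `/` operator avoids hard-coding `\\`, so the same code works on Windows, Linux and macOS. `pdf_save_path` is a hypothetical helper; inside the app you would likely pass `Path(__file__).parent` as `base_dir`, which is an assumption about how the project is laid out.

```python
from pathlib import Path


def pdf_save_path(file_name, base_dir=None):
    """Return <base_dir>/pdf/<file_name>, creating the pdf folder if needed."""
    base = Path(base_dir) if base_dir is not None else Path.cwd()
    pdf_dir = base / "pdf"
    # Create the folder on first use instead of failing on a missing directory
    pdf_dir.mkdir(parents=True, exist_ok=True)
    # Path(...).name drops any leading directory components from the upload name
    return pdf_dir / Path(file_name).name
```

The returned `Path` can be passed straight to `file.save(...)` or converted with `str(...)` if a library insists on strings.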

    • @fastandsimpledevelopment
      @fastandsimpledevelopment 6 months ago

      Thanks for the input, hope it helps someone else as well.

    • @4cadia
      @4cadia 4 months ago

      Thanks so much for your comment -- where did you put this in the code?

  • @mukeshkarthik1480
    @mukeshkarthik1480 8 months ago +1

    I have my Ollama running on an external machine. How do I configure the IP in cached_llm = Ollama(model="llama3")?

    • @fastandsimpledevelopment
      @fastandsimpledevelopment 8 months ago +2

      Give this a try
      cached_llm = Ollama( model="llama3", base_url="http://OLLAMA_HOST:PORT" )
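A small sketch of what that base URL looks like in practice, using only the standard library. It assumes the value is a full URL with the `http://` scheme and that Ollama listens on its default port 11434; `192.0.2.10` is a placeholder documentation address, and `ollama_base_url` and `build_generate_request` are hypothetical helper names. The request targets Ollama's `/api/generate` REST endpoint and is built but not sent.

```python
import json
import urllib.request


def ollama_base_url(host, port=11434):
    """Build the base URL for a remote Ollama server."""
    return f"http://{host}:{port}"


def build_generate_request(base_url, model, prompt):
    """Prepare (but do not send) a non-streaming generate request."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        f"{base_url}/api/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_generate_request(ollama_base_url("192.0.2.10"), "llama3", "Hello")
# To actually send it (needs a reachable Ollama server):
# with urllib.request.urlopen(req) as resp:
#     print(json.loads(resp.read())["response"])
```

Sending one such request by hand is a quick way to confirm the remote machine is reachable before pointing LangChain at it.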

  • @RamonLopesFaustino
    @RamonLopesFaustino 8 months ago

    Thx from Brazil.

  • @federicocalo4776
    @federicocalo4776 5 months ago +1

    Hello i have a problem, the variable context is not declared

    • @fastandsimpledevelopment
      @fastandsimpledevelopment 5 months ago

      Can you give me more information? I do not see a variable "context" in the code. "context" is returned by the lookup of data from ChromaDB, so if there is no context then I suspect there are no matching results from the search. Maybe tell me the line number or share the code with me.

    • @federicocalo4776
      @federicocalo4776 5 months ago

      @fastandsimpledevelopment line 27 of your github repo. "NameError: name 'context' is not defined"

    • @federicocalo4776
      @federicocalo4776 5 months ago

      my mistake. the editor put an f""" automatically

    • @fastandsimpledevelopment
      @fastandsimpledevelopment 5 months ago

      @federicocalo4776 That is part of the PromptTemplate, so it is not a real variable. It is populated by the retriever, so line #70 should create this value in the retriever for you.

    • @fastandsimpledevelopment
      @fastandsimpledevelopment 5 months ago

      Make sure line 27 is wrapped in triple double quotes and that you are using Python 3.9 or greater
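The `NameError` in this thread comes from how Python treats the `f` prefix, which the following sketch reproduces without LangChain. A plain triple-quoted string keeps `{context}` as a placeholder to be filled later, while an f-string tries to evaluate `context` immediately. The template wording and the `{input}` placeholder name are illustrative assumptions, not the text of line 27 in the repo.

```python
# Plain triple-quoted string: {context} and {input} stay as placeholders
TEMPLATE = """Answer using only the context below.
Context: {context}
Question: {input}"""


def render(context, question):
    """Fill the placeholders later, the way a prompt template does."""
    return TEMPLATE.format(context=context, input=question)


def broken_template():
    # The f prefix makes Python look up a variable named `context` right now;
    # none exists here, so calling this function raises NameError
    return f"""Context: {context}"""
```

If an editor auto-inserts the `f`, deleting that single character is the whole fix.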

  • @darthcryod1562
    @darthcryod1562 8 months ago

    nice video tutorial, some questions though. Q: if you already have your loader and call loader.load_and_split(), doesn't that function already use a text splitter by default, using the RecursiveText... class? Is it really needed to call text_splitter.split_documents again?
    2nd question: if you call that upload endpoint multiple times with different PDFs, will it overwrite the previous vector store? I see you are re-creating Chroma.from_documents each time. Should an instance of Chroma be created once and then just call chroma_instance.add_documents()?

  • @Yuilix
    @Yuilix 12 days ago

    does it store the chat history? like does it remember the chat if it's asked?

    • @fastandsimpledevelopment
      @fastandsimpledevelopment 12 days ago

      I normally set a session id for the user for chat and then store the chat history in MongoDB, this has worked well for me. There is no auto storing so you have to write that code.
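A minimal sketch of the session-id approach described above. To stay runnable without a database it keeps history in an in-memory dict; that dict is a stand-in for the MongoDB collection mentioned in the reply, and `ChatHistory` with its methods is a hypothetical design, not the author's code.

```python
from collections import defaultdict


class ChatHistory:
    """Per-session chat history (in-memory stand-in for a database)."""

    def __init__(self, max_turns=10):
        self._turns = defaultdict(list)
        self.max_turns = max_turns

    def add(self, session_id, role, text):
        """Append one message and keep only the most recent max_turns."""
        turns = self._turns[session_id]
        turns.append({"role": role, "text": text})
        self._turns[session_id] = turns[-self.max_turns:]

    def as_prompt(self, session_id):
        """Render the stored history as text to prepend to the next prompt."""
        turns = self._turns.get(session_id, [])
        return "\n".join(f"{t['role']}: {t['text']}" for t in turns)
```

Capping the history with `max_turns` keeps the prompt from growing without bound as a conversation gets longer.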

  • @lbognini
    @lbognini 9 months ago

    Great and to the point. 👏

  • @ChigosGames
    @ChigosGames 1 month ago +1

    Is this solution scalable? With many concurrent users?

    • @fastandsimpledevelopment
      @fastandsimpledevelopment 1 month ago +1

      No, it is not. Each request is processed in a single Python thread. You can scale this by running multiple instances of the Flask app behind a load balancer like NGINX, but that still does not make it fast. The bottleneck is Ollama: it can handle multiple requests, but it may have to reload the LLM each time if the requests do not use the same LLM. If you use Llama3.2 every time you get some support for multiple requests, but it does not scale as you would expect: if 1 request takes 10 seconds, 2 requests take 20 seconds, so it does not solve anything. You can always add another GPU, which Ollama supports, but then the first request takes 12 seconds, the second takes 18 seconds, etc. So again, not a scalable solution.
      But if you have isolated machines that are fed from a load balancer like NGINX, and each machine has your Flask API and Ollama running with its own GPU (I use the NVIDIA 4090), then yes: the first request takes 10 seconds and the 2nd request takes 10 seconds, so the time is pretty consistent across multiple requests. You will quickly find that you may need 4 machines to create a production-grade system. This is what I have done for a large LLM project and it does work well. I set up the load balancer for round robin and process each request as it comes in. If I need to support more requests then I add more servers. I did this on AWS and it cost me about $700 per server per month, but it did work. I now have my own servers, which cost about $2,500 each to build, as the LLM engine cluster. I connect this to the cloud using ngrok and it is very fast.
      As far as I know there is no way to scale up vertically, as far as getting more LLM RAM or processor power, other than replacing your GPU board. Adding boards in parallel gives more memory but does not affect the processing speed (it is actually a bit slower from the overhead): Ollama puts different sections of the LLM onto different cards, so the memory is scaled but not the processors. Each processor runs the segment of the LLM loaded into its own memory, so there is no performance increase.
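The timing argument in the reply above can be checked with a toy model. This sketch assumes every request arrives at the same moment, each server handles one request at a time, and each request takes the 10 seconds quoted in the reply; `round_robin_finish_times` is a hypothetical name and the model ignores network and model-loading overhead.

```python
from itertools import cycle


def round_robin_finish_times(num_requests, num_servers, seconds_per_request=10):
    """Finish time of each request when dispatched round robin."""
    busy_until = [0] * num_servers
    finish = []
    servers = cycle(range(num_servers))
    for _ in range(num_requests):
        s = next(servers)
        # Each server works through its own queue one request at a time
        busy_until[s] += seconds_per_request
        finish.append(busy_until[s])
    return finish


print(round_robin_finish_times(2, 1))  # [10, 20]
print(round_robin_finish_times(4, 4))  # [10, 10, 10, 10]
```

With one server the second caller waits twice as long; with one machine per concurrent request every caller sees the same latency, which matches the four-machine setup described above.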

    • @ChigosGames
      @ChigosGames 1 month ago +1

      @fastandsimpledevelopment Clear! Thanks for the reply! So from a performance perspective parallel is the way to go, makes sense. Follow up question, how do you keep the source of truth (the RAG and your docs) in sync?

    • @fastandsimpledevelopment
      @fastandsimpledevelopment 1 month ago +1

      @ChigosGames I do two things for RAG. Initially I used PDF input and stored the vectors in ChromaDB. This can be a server, so all the instances use the same database, but the data is only as current as the last PDF upload. I have moved to a better solution where I do not source anything from PDF. I query a database, in my case MongoDB, then take that content (which should be the current truth) and feed it into the process as if it came from ChromaDB, so it is what Ollama uses to answer a question/prompt. This works very well and a lot of the PDF/vector DB issues went away. I have a large set of data (think of airline tickets), so I have routes, times and destinations as well as passengers who purchase, and I need to answer questions like "What is a cheaper flight" or "If I change a day how much will it cost", so sometimes it is not as simple as a PDF document with content.

    • @ChigosGames
      @ChigosGames 1 month ago +1

      @fastandsimpledevelopment OK, I love MongoDB, since it is so JSON friendly (a bit too much sometimes). So with that you already structure the data and let the LLM enrich it.
      How do you 'steer' the LLM so that it is not too creative in amending your (flight) data?
    • @fastandsimpledevelopment
      @fastandsimpledevelopment 1 month ago +1

      @ChigosGames I create very specific data, for example "SFO 01/10/2024 10:00AM - JFK 01/10/2024 4:45 PM American Flight 1410 $445". This is then used in the LLM. I also have a filter and use JSON format output, which I then transform as needed.
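A sketch of the two ideas in the reply above: turning a database record into one tightly formatted context line, and rejecting LLM output that is not the expected JSON. The field names (`origin`, `depart`, and so on) and the required keys are hypothetical, chosen only to reproduce the example line quoted in the reply.

```python
import json


def flight_to_context(f):
    """Render one flight record as a single unambiguous context line."""
    return (
        f"{f['origin']} {f['depart']} - {f['dest']} {f['arrive']} "
        f"{f['airline']} Flight {f['number']} ${f['price']}"
    )


def parse_llm_json(raw, required=("flight", "price")):
    """Return the parsed object, or None if it is not valid JSON
    or is missing any of the required keys."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return None
    if not isinstance(data, dict) or any(k not in data for k in required):
        return None
    return data
```

Treating anything that fails the check as "no answer" is one simple filter against the model inventing or reshaping data.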

  • @oguzhanylmaz4586
    @oguzhanylmaz4586 8 months ago +1

    hi, can this project work without internet?

  • @DeanHorak
    @DeanHorak 9 months ago

    Very nice. Thanks.

  • @gokulans-z9q
    @gokulans-z9q 4 months ago

    hi bro, I tried this program and it shows an error: fastembed will not import, even though I have already installed the package. The same error shows again and again.

  • @aiinsets
    @aiinsets 6 months ago +1

    Can I use this without any limits or restrictions? Meaning, is it free, with no tokens or anything else needed? Please reply

    • @fastandsimpledevelopment
      @fastandsimpledevelopment 6 months ago

      FREE FREE FREE - All you have to do is host the service yourself. This is all Open Source, no fees for Ollama, Llama3, LangChain, ChromaDB, Flask or any of the Python Libraries. Keep watching for my Python Microservices that make all this even simpler to setup and use with a React Front End

    • @aiinsets
      @aiinsets 6 months ago

      @fastandsimpledevelopment thanks for the reply

  • @lololoololdudusoejdhdjswkk347
    @lololoololdudusoejdhdjswkk347 8 months ago

    For AWS, are you using Amazon SageMaker, or how would you go about deploying the local LLM model to host it? I'm new, so I apologize if this is an incoherent question.

    • @fastandsimpledevelopment
      @fastandsimpledevelopment 8 months ago

      I deploy a full Ollama server on EC2 with a GPU based server, no Sagemaker needed. This works on AWS or even your own Linux machines

  • @superman-h4i
    @superman-h4i 6 months ago

    why is this giving answers from outside the PDFs when I ask questions unrelated to the content, instead of an error response?

    • @fastandsimpledevelopment
      @fastandsimpledevelopment 6 months ago

      This is the typical hallucination problem. In the prompt you need to say "Only use the content provided", and if the results from the retriever are zero length then I normally put up a message that says I could not find the content in the PDF.
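The two safeguards from the reply above, as a minimal sketch. `llm` is any callable that takes a prompt string and returns text, which is an assumption made so the example runs without Ollama; `answer_from_pdf` and the exact prompt wording are illustrative.

```python
NOT_FOUND = "I could not find the content in the PDF."


def answer_from_pdf(question, retrieved_chunks, llm):
    """Skip the LLM entirely when retrieval returns nothing."""
    if not retrieved_chunks:
        return NOT_FOUND
    context = "\n".join(retrieved_chunks)
    prompt = (
        "Only use the content provided. If the answer is not in it, say so.\n"
        f"Content:\n{context}\n\nQuestion: {question}"
    )
    return llm(prompt)
```

The empty-result check only helps if the retriever can actually return zero chunks, which usually means setting a score threshold rather than always taking the top k.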

  • @oguzhanylmaz4586
    @oguzhanylmaz4586 8 months ago +1

    Hi, first of all, thank you very much for this project. I will move forward on your project. I would like some support from you, can you integrate data streaming into your existing API project? In other words, the response should not come as the entire text. Like chatgpt, the response comes live word by word.

    • @fastandsimpledevelopment
      @fastandsimpledevelopment 8 months ago

      No, sorry, I do not have time to work on a streaming interface. I'm sure you can get it working if you dive in yourself. Streaming is supported and is the default API for Ollama.
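For anyone attempting the streaming version, here is a sketch of the parsing side only. It assumes Ollama's `/api/generate` endpoint streams newline-delimited JSON objects, each carrying a `response` text fragment and a `done` flag; the sample lines below are made-up illustrations of that shape, and `iter_stream_text` is a hypothetical helper.

```python
import json


def iter_stream_text(lines):
    """Yield text fragments from a newline-delimited JSON stream."""
    for line in lines:
        if not line.strip():
            continue  # skip keep-alive blank lines
        chunk = json.loads(line)
        if chunk.get("response"):
            yield chunk["response"]
        if chunk.get("done"):
            break


# Illustrative sample of what such a stream might contain
sample = [
    '{"response": "Hel", "done": false}',
    '{"response": "lo", "done": false}',
    '{"response": "", "done": true}',
]
text = "".join(iter_stream_text(sample))
print(text)  # Hello
```

In a Flask route, a generator like this can be wrapped in a streaming `Response` so the client receives fragments as they arrive rather than the whole answer at once.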

  • @taariqnoor8716
    @taariqnoor8716 7 months ago

    About the search_kwargs in the as_retriever method, i can't find all other options that can be used and what are they for, can anybody help ?

  • @monishj101
    @monishj101 15 days ago +1

    How to integrate it in flutter

    • @fastandsimpledevelopment
      @fastandsimpledevelopment 14 days ago

      It would be simple to integrate this into Flutter: just set up Ollama and Flask on a server and then call the API from the Flutter mobile app. Maybe put an API key around the call for security. It should only take a few minutes to get it running.
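One way the API-key idea above could look on the server side, as a hedged sketch. The `X-API-Key` header name and the `is_authorized` helper are assumptions, not part of the project; `headers` is any dict-like object such as the request headers in Flask.

```python
import hmac


def is_authorized(headers, expected_key, header_name="X-API-Key"):
    """Check the caller's API key using a constant-time comparison."""
    if not expected_key:
        return False  # refuse everything if no key is configured
    supplied = headers.get(header_name, "")
    return hmac.compare_digest(supplied.encode(), expected_key.encode())
```

The mobile app would send the same header with each REST call, and the route would return a 401 response whenever this check fails.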

    • @monishj101
      @monishj101 14 days ago

      @fastandsimpledevelopment Streamlit or Flask, which is easier to integrate with Flutter?

    • @fastandsimpledevelopment
      @fastandsimpledevelopment 14 days ago

      @monishj101 I've never used Streamlit so I have no experience to share. Flask is as simple as adding an HTTP REST call.

    • @monishj101
      @monishj101 13 days ago +1

      @fastandsimpledevelopment My chunk size is 10000 😅. It is responding slowly. Is there any way to increase the speed of my chatbot?

    • @fastandsimpledevelopment
      @fastandsimpledevelopment 13 days ago

      @monishj101 Size should be OK. There are two things that make it faster: you need to run on a GPU and it needs to have memory. If you're running locally then a nice card like the Nvidia 4090 works well. If you're using a cloud solution then think about upgrading to a faster GPU. If you're not using a GPU, expect about a 40X speedup from moving to one. Other performance is based on model selection. I use Llama3.2 and it's pretty fast.

  • @Larimuss
    @Larimuss 6 months ago

    Can you do one for Windows or Linux 😂 sorry im a bit lost too. Guess I need more python knowledge

  • @thebluefortproject
    @thebluefortproject 9 months ago +1

    Can this be run on a MacBook? Super informative, I just don't want to fry my mac trying it out :/

    • @fastandsimpledevelopment
      @fastandsimpledevelopment 9 months ago

      Yes, this video was recorded on a 2019 Intel MacBook Pro. I've also run this on an M2 MacBook. If you try large queries then it is very slow; you really need a GPU (NVIDIA) for performance.

    • @thebluefortproject
      @thebluefortproject 9 months ago

      @fastandsimpledevelopment awesome thank you very much!!

    • @brianclark4639
      @brianclark4639 9 months ago +1

      Can it be run on PC?

    • @brianclark4639
      @brianclark4639 9 months ago +1

      *Windows

    • @fastandsimpledevelopment
      @fastandsimpledevelopment 9 months ago

      @brianclark4639 Yes, Ollama now supports Windows, and the code should be the same

  • @RichardEnglish1
    @RichardEnglish1 9 months ago +1

    Sweet!

  • @hi-yuren
    @hi-yuren 8 months ago

    I am getting this error: ImportError: Could not import 'fastembed' Python package. Please install it with `pip install fastembed`, even though I installed fastembed. It would be appreciated if anyone can help.

  • @learnfromIITguy
    @learnfromIITguy 6 months ago

    Awesome

  • @nazgod11
    @nazgod11 7 months ago

    Hi! your project worked for me yesterday, I tried to change the pdf loader to another one, then it broke and only gave empty chunks. Today I've tried recreating it in a new venv, but now I keep getting the following error: werkzeug.exceptions.BadRequestKeyError: 400 Bad Request: The browser (or proxy) sent a request that this server could not understand.
    KeyError: 'file' tried searching on google but I couldn't find an answer, also I've never had this error before. Thanks a lot in advance if you are able to help:)

    • @ChigosGames
      @ChigosGames 1 month ago

      Could it perhaps be that you need to relaunch your virtual environment? (venv)

  • @kyougetsubarano1534
    @kyougetsubarano1534 6 months ago

    Postman gives this error, can anyone help?
    415 Unsupported Media Type
    Unsupported Media Type
    Did not attempt to load JSON data because the request Content-Type was not 'application/json'.
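The error text above points at the request's `Content-Type` header: the server only parses the body as JSON when that header is `application/json` (in Postman this usually means choosing a raw body of type JSON). A sketch of an equivalent request built with the standard library follows; the URL and the `query` field are hypothetical placeholders, not confirmed endpoints of the project, and the request is built but not sent.

```python
import json
import urllib.request


def json_post(url, payload):
    """Build a POST request whose body and Content-Type are both JSON."""
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical endpoint and field name, for illustration only
req = json_post("http://localhost:8080/ask_pdf", {"query": "What is RAG?"})
# with urllib.request.urlopen(req) as resp:   # needs the Flask app running
#     print(resp.read().decode("utf-8"))
```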

  • @brajeshsahu7981
    @brajeshsahu7981 8 months ago

    How do i fix this?
    C:\Users\zzz>ollama pull llama3
    pulling manifest
    Error: 403:
    C:\Users\zzz>ollama serve
    Error: listen tcp 127.0.0.1:11434: bind: Only one usage of each socket address (protocol/network address/port) is normally permitted.

    • @fastandsimpledevelopment
      @fastandsimpledevelopment 8 months ago

      The first issue is that you already have Ollama running. You need to stop the instance; on Windows I think you can do that with the Task Manager. The other issue may then go away. If not, make sure Ollama is running: try "ollama list", which will show you that it is running and give you a list of the models already loaded (if any). There should not be a 403 error unless you are behind a firewall or corporate security does not allow the download. Sorry, that's all I've got.
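The bind error in this thread means something is already listening on port 11434, most likely an existing Ollama instance. A small sketch for checking that from Python before running `ollama serve` again; `port_in_use` is a hypothetical helper and it only tests whether a TCP connection succeeds, not which program owns the port.

```python
import socket


def port_in_use(port, host="127.0.0.1", timeout=0.5):
    """Return True if something accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.settimeout(timeout)
        # connect_ex returns 0 on success instead of raising an exception
        return s.connect_ex((host, port)) == 0


if port_in_use(11434):
    print("Port 11434 is taken, so an Ollama server is probably already running.")
```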

  • @m.waqas27
    @m.waqas27 3 months ago

    Where is the calling API?

  • @ChigosGames
    @ChigosGames 1 month ago +1

    If you're on Windows remove uvloop from the requirements.txt. It will break your pip.

  • @superfreiheit1
    @superfreiheit1 3 months ago

    the code is blurred and too small, hard to read

  • @Mutar
    @Mutar 7 months ago +1

    gj

  • @DaleIsWigging
    @DaleIsWigging 5 months ago +2

    WINDOWS USERS INSTALL PROBLEM:
    "uvloop" says it doesn't work on windows
    remove it from requirements.txt
    it should install and seems to work anyways
    (I haven't tested the RAG part, only that the API call works)
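A sketch of the workaround above as a small script, for anyone who prefers not to edit requirements.txt by hand. `drop_packages` and `requirements_for_this_platform` are hypothetical helpers and the version pins in the usage example are illustrative. An alternative that needs no script is an environment marker in the file itself, `uvloop; sys_platform != "win32"`, which pip understands.

```python
import re
import sys


def drop_packages(requirements_text, names):
    """Return the requirements text without lines for the named packages."""
    blocked = {n.lower() for n in names}
    kept = []
    for line in requirements_text.splitlines():
        # The package name is the leading run of name characters on the line
        m = re.match(r"\s*([A-Za-z0-9_.-]+)", line)
        if m and m.group(1).lower() in blocked:
            continue
        kept.append(line)
    return "\n".join(kept)


def requirements_for_this_platform(text):
    """Drop uvloop only when running on Windows."""
    return drop_packages(text, ["uvloop"]) if sys.platform == "win32" else text
```

For example, `drop_packages("flask==3.0.0\nuvloop==0.19.0\nchromadb\n", ["uvloop"])` keeps the Flask and ChromaDB lines and omits the uvloop one.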

  • @curiousguy9884
    @curiousguy9884 8 months ago +1

    I keep getting an "langchain_community.llms.ollama.OllamaEndpointNotFoundError: Ollama call failed with status code 404. Maybe your model is not found and you should pull the model with `ollama pull llama3`." error. Any help resolving this would be greatly appreciated. (I am running llama3 on a Mac OS)

    • @ashwinkrishnan4435
      @ashwinkrishnan4435 8 months ago

      Go to the command line and pull the model using ollama pull llama3
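To confirm the pull worked, the list of installed models can be checked programmatically. This sketch only parses a response body; it assumes Ollama's `/api/tags` endpoint returns JSON with a `models` list whose entries have a `name` such as `llama3:latest`. The sample payload is a made-up illustration of that shape, and `has_model` is a hypothetical helper.

```python
import json


def has_model(tags_json, model):
    """Return True if `model` appears in a /api/tags style response body."""
    models = json.loads(tags_json).get("models", [])
    names = [m.get("name", "") for m in models]
    # Accept both an exact match and a match on the part before the tag
    return any(n == model or n.split(":")[0] == model for n in names)


# Illustrative sample of the assumed response shape
sample = '{"models": [{"name": "llama3:latest"}]}'
print(has_model(sample, "llama3"))   # True
print(has_model(sample, "mistral"))  # False
```

If the model name passed to `Ollama(model=...)` is not in that list, the 404 error quoted above is the expected result.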