Fully local RAG agents with Llama 3.1

  • Published: 12 Sep 2024
  • With the release of Llama3.1, it's increasingly possible to build agents that run reliably and locally (e.g., on your laptop). Here, we show how to build reliable local agents using LangGraph and Llama3.1-8b from scratch. We build a simple corrective RAG agent with Llama3.1-8b and compare its performance to larger models, Llama3-70b and GPT-4o. We test our Llama3.1-8b agent on a corrective RAG challenge and show performance and latency versus a few competing models. On our small / toy challenge, Llama3.1-8b performs on par with much larger models with only slightly increased latency. Overall, the Llama3.1-8b model is a strong option for local execution and pairs well with LangGraph to implement agentic workflows.
    Blog post:
    ai.meta.com/bl...
    Ollama:
    ollama.com/lib...
    Code:
    github.com/lan...
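
    Below is a minimal sketch of the kind of corrective RAG graph described above, assuming Ollama is serving llama3.1 locally. The retriever, web_search_tool, prompts, and node logic are illustrative stand-ins, not the repo's exact code.

    from typing import List, TypedDict
    from langchain_ollama import ChatOllama
    from langgraph.graph import StateGraph, START, END

    llm = ChatOllama(model="llama3.1", temperature=0)

    class GraphState(TypedDict):
        question: str
        documents: List[str]
        generation: str

    def retrieve(state):
        docs = retriever.invoke(state["question"])  # assumed pre-built retriever
        return {"documents": [d.page_content for d in docs]}

    def grade(state):
        # Ask the model whether the retrieved context is relevant to the question.
        verdict = llm.invoke(
            f"Context:\n{state['documents']}\n\nQuestion: {state['question']}\n"
            "Is the context relevant? Answer yes or no."
        ).content.lower()
        return "generate" if "yes" in verdict else "web_search"

    def web_search(state):
        # Fall back to web search when retrieval looks irrelevant.
        results = web_search_tool.invoke(state["question"])  # assumed search tool
        return {"documents": state["documents"] + [str(results)]}

    def generate(state):
        answer = llm.invoke(
            f"Answer using this context:\n{state['documents']}\n\nQuestion: {state['question']}"
        ).content
        return {"generation": answer}

    workflow = StateGraph(GraphState)
    workflow.add_node("retrieve", retrieve)
    workflow.add_node("web_search", web_search)
    workflow.add_node("generate", generate)
    workflow.add_edge(START, "retrieve")
    workflow.add_conditional_edges("retrieve", grade, {"generate": "generate", "web_search": "web_search"})
    workflow.add_edge("web_search", "generate")
    workflow.add_edge("generate", END)
    graph = workflow.compile()

    Calling graph.invoke({"question": "..."}) then runs retrieve, routes through the grader, and returns the final state with the generated answer.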

Comments • 35

  • @nachoeigu
    @nachoeigu A month ago +3

    I really like the LangSmith test section in the package. Great job!

  • @automatalearninglab
    @automatalearninglab A month ago +3

    Oooh nice! I had some issues with llama3-groq-tool-use; even after pulling it with Ollama and trying, it kept returning an empty list instead of the actual tool calls. Just tested this code, though, and it works great! Love it! Thanks!!! Love the videos from this channel!

  • @chukwuinnocent2560
    @chukwuinnocent2560 A month ago +1

    Thanks so much for this open information ❤

  • @aaagaming2023
    @aaagaming2023 A month ago +1

    I really like the way you explain it; it makes it easy to learn the concepts.

  • @jwickerszh
    @jwickerszh A month ago

    Nice. I need to test this as well on a more complicated agent setup. I had a case where some models would not complete: they ran into loops and hit too many errors trying to call tools ... I'll have to give 3.1 a go at it.

    • @rahulsh5237
      @rahulsh5237 A month ago

      True, it was going in loops when I tested it.

  • @dbreardon
    @dbreardon A month ago +3

    Llama will likely make its way into online AI products (it already does). But until someone builds a one-click Llama download and install, the general public will likely never run a local AI. And they will certainly never jump deep into coding just to build simple agents; it is just way over the heads of most general computer users. If a one-click install is not done soon, people will gravitate towards the online subscription proprietary AI offerings (OpenAI, Claude, Gemini, etc.) and never look back.
    I think that is what really killed Linux in competing in the OS space. Mac and Windows, even in the 90's, were basically a one-click installation process, whereas Linux used the command line, bin this, bash that, and installation was cumbersome and not easy... I know because I did it in the mid-90's. People will go for what is easy (Mac, Windows, an online AI model), and once they are hooked into a particular AI model, it will be darn hard to get them to change.
    It's a real shame, because Llama is a pretty terrific LLM, but local installation is just a nightmare for the majority of the general computing public.

    • @Mpanagiotopoulos
      @Mpanagiotopoulos 12 days ago +1

      My guess is that Llama will be running on so many different devices without the consumer noticing, the same way Linux is powering consumers' fridges & dishwashers without them knowing about it.

  • @cnmoro55
    @cnmoro55 A month ago +4

    I think there is a missing step in the RAG flow. If the user knows they are "talking" to some documents, they might prompt "What is the summary?". In this case, the grader will always answer "No", and the web search will be useless. There has to be an additional step to evaluate the question - if it is similar to the one I just mentioned, then you would simply fetch a block of text from those documents and send it to the LLM to summarize.
    I have a BERT classifier built specially for this on HF: cnmoro/bert-tiny-question-classifier
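
    A minimal sketch of that extra routing step, assuming a local llama3.1 model via Ollama; the prompt and the node names it routes to are illustrative, not from the video's code.

    from langchain_ollama import ChatOllama

    llm = ChatOllama(model="llama3.1", temperature=0)

    def route_question(state):
        # Decide whether the question asks for a summary of the loaded documents
        # or for a specific fact that should go through the relevance grader.
        verdict = llm.invoke(
            "Does this question ask for a summary of the loaded documents rather "
            f"than a specific fact? Answer yes or no.\n\nQuestion: {state['question']}"
        ).content.lower()
        return "summarize" if "yes" in verdict else "grade_documents"

    The function could then be wired into the graph with workflow.add_conditional_edges("retrieve", route_question, {"summarize": "summarize", "grade_documents": "grade_documents"}).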

  • @IdPreferNot1
    @IdPreferNot1 A month ago

    Great video including the evaluation and closed source comparisons

  • @MohammedAlshayeb-r9m
    @MohammedAlshayeb-r9m A month ago +6

    A few things have to change:
    1. Use:
    from langchain_ollama import OllamaEmbeddings  # instead of: from langchain_nomic.embeddings import NomicEmbeddings
    ....
    embedding=OllamaEmbeddings(model='nomic-embed-text'),
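
    In context, the swap might look like the sketch below, assuming `ollama pull nomic-embed-text` has been run and `doc_splits` holds the chunked documents; the SKLearnVectorStore choice and the k value are illustrative assumptions.

    from langchain_ollama import OllamaEmbeddings
    from langchain_community.vectorstores import SKLearnVectorStore

    # Build the vector store with a fully local embedding model served by Ollama.
    vectorstore = SKLearnVectorStore.from_documents(
        documents=doc_splits,
        embedding=OllamaEmbeddings(model="nomic-embed-text"),
    )
    retriever = vectorstore.as_retriever(search_kwargs={"k": 3})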

  • @alitomix
    @alitomix A month ago

    I would like to see a sample of how to use this to elaborate large texts that follow a structure or script... without losing coherence, by re-evaluating the progress.

  • @uwegenosdude
    @uwegenosdude A month ago

    Thanks for your great video. Would you also recommend llama3.1 for RAG based on documents in German? Every time we tried this, the results were much worse compared to using LLMs like GPT-4o or GPT-4o-mini. And could you explain why you are using the OpenAI embeddings? If I want to use this demo as a RAG app for asking questions about a local document, do I only have to replace the WebBaseLoader with a DocumentLoader?
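
    If the goal is to point the same pipeline at a local document, one common approach is to swap the loader; a minimal sketch, with PyPDFLoader and RecursiveCharacterTextSplitter as stand-ins (the file path and chunk sizes are illustrative assumptions):

    from langchain_community.document_loaders import PyPDFLoader
    from langchain_text_splitters import RecursiveCharacterTextSplitter

    # Load a local document instead of fetching web pages with WebBaseLoader.
    docs = PyPDFLoader("my_local_document.pdf").load()
    doc_splits = RecursiveCharacterTextSplitter(
        chunk_size=1000, chunk_overlap=200
    ).split_documents(docs)
    # doc_splits then feeds the same vector store / retriever as in the video.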

  • @antonpictures
    @antonpictures A month ago +1

    Image embeddings not working; text is fine. Bro, LLaVA RAG multimodal please.

  • @aaagaming2023
    @aaagaming2023 A month ago

    Any chance you might compare hosted-API Llama 3.1 405b and Mistral Large 2 123b on the same evals?

  • @shivamandfriendsreact4089
    @shivamandfriendsreact4089 A month ago

    super cool

  • @malikrumi1206
    @malikrumi1206 A month ago +1

    I find the fascination with parameter numbers boring. What I would like is a way to measure how much data a model can *hold*. Is there anything like that out there?

    • @mpnikhil
      @mpnikhil A month ago +2

      The parameter count is kind of a proxy for how much data the model is holding: the more parameters, the more the model "remembers". Andrej Karpathy plainly states that it is kind of a compression of all the data it has seen in training.

    • @Hollowed2wiz
      @Hollowed2wiz 10 days ago +1

      You should check out the latest 3b1b video about LLMs.
      He kind of explains how LLMs store information, and with his explanation it makes sense why the number of parameters is such a big deal for LLM performance and the ability to memorize.

  • @DhirajPatra
    @DhirajPatra A month ago +1

    Why always OpenAI embeddings? Why not use FAISS and an open-source one instead 😊

    • @rickmarciniak2985
      @rickmarciniak2985 A month ago +3

      I think he said that he was just trying to keep it consistent for evaluation against the other Llama and GPT model runs that he was comparing benchmarks with towards the end of the video.
      But yes, I agree on using locally installed embeddings. I have utilized FAISS; however, the most recent embeddings, which I just figured out how to install locally, are the BGE embeddings on HuggingFace. Good luck!
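
      A hedged sketch of that fully local combination, with FAISS plus a BGE model from Hugging Face; the specific model name and the reuse of doc_splits from the pipeline are assumptions:

      from langchain_huggingface import HuggingFaceEmbeddings
      from langchain_community.vectorstores import FAISS

      # BGE embeddings run locally via sentence-transformers; no API key needed.
      embeddings = HuggingFaceEmbeddings(model_name="BAAI/bge-small-en-v1.5")
      vectorstore = FAISS.from_documents(doc_splits, embeddings)
      retriever = vectorstore.as_retriever()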

  • @ElaheKhatibi-q4j
    @ElaheKhatibi-q4j A month ago

    Thank you for sharing this informative video and its hands-on code. I've hit this connection error: "Error running target function: [WinError 10061] No connection could be made because the target machine actively refused it". Can anyone please guide me on how to fix it?

  • @MansA-n5l
    @MansA-n5l 14 hours ago

    The link to code on GitHub is broken. Can you please fix it?

  • @qactus4031
    @qactus4031 21 hours ago

    Is it just me or is the code not accessible?

  • @farnsworth3000
    @farnsworth3000 A month ago +10

    Can I run 405B LLMs with 8gb of ram? 🤣

    • @migf27
      @migf27 A month ago +2

      No😂

    • @Praveenkumar-rk1up
      @Praveenkumar-rk1up A month ago

      Yes, but forget about your system 😂

    • @rickmarciniak2985
      @rickmarciniak2985 A month ago +4

      Unfortunately, no. I tried this months ago with the Llama-2 70b model on a workstation that has an NVIDIA RTX 3080 and 16 GB of system memory. I tried to use the locally installed LLM with a conversational RAG chain. The first few questions ran relatively quickly, but it would start to hallucinate after that. It was probably also the fact that I was attempting to load really big PDF files, which I am sure ate up all the resources. And we are working with a narrow but deep sub-domain of medical information.
      Go to Groq and get an API key for the Llama 3 models if you are really interested. I am using a Llama 3 API from Groq for single-shot question evaluation on a CSV file (reasonable size: 250 rows with 4 columns of metadata, single-word labels, and one "description" column that contains longer explanations with semantic meaning) and it works well, meaning that the chain returns results that I would expect, or reasonably so, at this point. It's a constant work in progress. Good luck! Lance does a great job here, but I also like Sam's channel at @samwitteveenai
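
      A minimal sketch of that hosted route, assuming the langchain-groq package is installed and a GROQ_API_KEY environment variable is set; the model id is one of Groq's hosted Llama 3 variants:

      from langchain_groq import ChatGroq

      # Hosted Llama 3 via Groq instead of a local Ollama model.
      llm = ChatGroq(model="llama3-70b-8192", temperature=0)
      print(llm.invoke("Summarize corrective RAG in one sentence.").content)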

    • @Roni.eD0
      @Roni.eD0 A month ago

      Use Groq

    • @iamramiz
      @iamramiz A month ago

      lol… try with 200 GB.
      You can get away with running the 7B model.

  • @kaneyxx
    @kaneyxx 6 days ago

    I got this error while running the example code: "ValueError: Node `retrieve` is not reachable".
    Can anyone help me figure out what happened?

    • @kaneyxx
      @kaneyxx 6 days ago

      Never mind, the code I pulled from GitHub has one typo in the "# Build graph" section:
      workflow.add_edge(START, retrieve)  -->  must change to  workflow.add_edge(START, "retrieve")

  • @Ronaldograxa
    @Ronaldograxa 29 days ago

    thanks for this! Now get off the toilet and go put some clothes on