Reliable, fully local RAG agents with LLaMA3

  • Published: Jan 12, 2025

Comments • 107

  • @ronnitroyburman4165
    @ronnitroyburman4165 8 months ago +11

    this looks so crisp! brilliant knowledge transfer! thank you.

  • @asetkn
    @asetkn 8 months ago +7

    Lance, thank you for the great value you provide for this community!

  • @jellz77
    @jellz77 8 months ago +10

    Really enjoying your videos, Lance! It'd be great if we could spin this up in Docker with a front-end :) I think the issue a lot of us have is maintaining package dependencies, depending on out-of-the-box solutions like open-webui/anythingLLM, or deciding between Langchain, Haystack, Llamaindex. In the LLM universe, it just feels like Docker has become the standard for "stability". Again, love your work!

  • @王莽-o3y
    @王莽-o3y 2 months ago

    This is insanely good! As an AI newbie, I had an idea similar to this, but never one so well implemented.

  • @wshobson
    @wshobson 8 months ago +5

    Brilliant! Straight to the point, like reading the K&R. Thanks Lance.

  • @MattHudsonS
    @MattHudsonS 6 months ago +1

    Great video. Advanced concepts but simple to understand.

  • @BedfordGibsons
    @BedfordGibsons 8 months ago +2

    Great focused, to the point and well demonstrated delivery. Thank you

  • @ElvinHoney707
    @ElvinHoney707 8 months ago +1

    Wow, a most excellent video! I didn't know that Ollama had already adapted Llama3 into the mix. Now, I want to replicate what you did using Clojure/Java (Langchain4j).

  • @rone3243
    @rone3243 8 months ago +2

    That’s fast! Thanks Lance, your videos are always helpful to us ❤

  • @Trashpanda_404
    @Trashpanda_404 8 months ago

    Thanks for the video and all you do, brother! You'll def go down in history as a driving force!

  • @postcristiano
    @postcristiano 8 months ago

    Awesome video and easy to understand, really appreciate!

  • @spencerfunk6697
    @spencerfunk6697 8 months ago

    Thank you for this. You answered all the questions I've had about this project I'm wanting to make in one swoop.

  • @葉宗融-j1t
    @葉宗融-j1t 8 months ago

    I so appreciate your demonstration. It’s really helpful.

  • @collinvelarde7473
    @collinvelarde7473 7 months ago

    Incredible. Great stuff brotha. Thank you.

  • @chriskingston1981
    @chriskingston1981 8 months ago

    Wow this is awesome. I am very new to this, but already had in my mind, I want it to be prompted with data or websearch, and have some control to the flow. But this is so cool, thank you for explaining this! ❤️❤️❤️

  • @havenqi3261
    @havenqi3261 8 months ago +6

    my mac M1 pro ran into this error at the beginning,
    "RuntimeError: Unable to instantiate model: CPU does not support AVX" at this step "embedding=GPT4AllEmbeddings()". all libs are upgraded. switched to ollama embedding lib but it almost killed the mac with the fan roaring

  • @JaroslavInsights
    @JaroslavInsights 8 months ago

    super helpful. thanks for sharing. I take it the Models can be swapped and varied for every stage, obv given the local system spec is able to handle such load

  • @karost
    @karost 8 months ago

    Thanks! Well-documented materials, a live demo, and a step-by-step walkthrough of the process really help a beginner like me :D

  • @ea4all-genai-exploration
    @ea4all-genai-exploration 5 months ago

    That’s really awesome and very useful! I literally implemented a similar flow today, using another langgraph use case, but the fallback workflow at the end makes much more sense for increasing answer quality. Thanks, and brilliantly communicated.

  • @JuanRamirez-di9bl
    @JuanRamirez-di9bl 8 months ago +1

    Wow this was great! Thank you!

  • @aaronsteers
    @aaronsteers 8 months ago

    Great video, Lance!

  • @madhudson1
    @madhudson1 8 months ago

    a great challenge would be to accurately ascertain whether the model is capable of answering the question/topic itself or whether external tooling such as web browsing is required. I haven't been able to do this yet with llama3. I guess I haven't managed to find the correct routing prompt (a stage after the initial routing)

  • @cclementson1986
    @cclementson1986 8 months ago +2

    How would you deploy this in AWS? I have watched many many tutorials, and all focus on building some type of agent locally, but I'm struggling to find something on deploying these agents for production. Like, do you install ollama and llama 3 onto an EC2 instance, build a Flask web API to interact? I'm a bit lost at the deployment to production part.

  • @Arvolve
    @Arvolve 8 months ago

    Really awesome showcase!

  • @LuisCamiloJimenezAlvarez
    @LuisCamiloJimenezAlvarez 8 months ago

    Hi, interesting video. I'm trying to understand the relation between the Adaptive RAG article and routing: the article talks about different levels of query complexity, while routing chooses between two information sources, vector store and web, based on the content of the query.

  • @thiagoamaralf
    @thiagoamaralf 7 months ago

    Amazing video. Now imagine if you could implement self-supervised learning so it checks itself and makes sure the output is delivered without errors and without misuse of resources...

  • @hammoudaelbez9797
    @hammoudaelbez9797 8 months ago +4

    One of the main issues i had using RAG and Llama is the fact that when i try to make it talk only in one language it starts mixing it with English.

    • @somerset006
      @somerset006 8 months ago +4

      It says "English only" in the release

    • @desrucca
      @desrucca 8 months ago +3

      It was trained with multiple languages, but the english data was significantly higher than the rest.
      It certainly understands non-english language, but lacks the *stability* to generate non-english output

  • @marcfruchtman9473
    @marcfruchtman9473 8 months ago

    Thank you for this helpful video.

  • @Lionolomundooo
    @Lionolomundooo 8 months ago +1

    Could use dataclasses for state objects. Looks a bit nicer than typed dicts
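To make the comparison concrete, here is a minimal sketch of the same state written both ways. The field names follow the ones mentioned elsewhere in this thread (question, generation, web_search, documents); the class names, the default values and the "Yes"/"No" string convention are assumptions for illustration, not code from the video.

```python
from dataclasses import dataclass, field
from typing import List, TypedDict


class GraphStateDict(TypedDict):
    """Dict-style state: a plain dict at runtime; keys are checked only by type checkers."""
    question: str
    generation: str
    web_search: str
    documents: List[str]


@dataclass
class GraphStateData:
    """Dataclass-style state: attribute access plus default values."""
    question: str
    generation: str = ""
    web_search: str = "No"  # assumed convention: "Yes" / "No"
    documents: List[str] = field(default_factory=list)


# With a TypedDict every key has to be spelled out when the state is created.
dict_state: GraphStateDict = {
    "question": "What is agent memory?",
    "generation": "",
    "web_search": "No",
    "documents": [],
}

# With a dataclass only the required field is needed; the rest use defaults.
data_state = GraphStateData(question="What is agent memory?")
```

Which of the two a given graph library accepts as its state schema depends on the library version, so treat this only as a comparison of the two Python idioms.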

  • @moslehmahamud
    @moslehmahamud 8 months ago +20

    That was fast

  • @gauravpiyush7681
    @gauravpiyush7681 8 months ago

    Great video, Lance. It took me 10 minutes to run the complete flow locally. What strategies should we follow to use it in real time? How do you host agentic RAG in the cloud? I would be eager to understand it.

  • @buggingbee1
    @buggingbee1 8 months ago +1

    I wonder if it could get the context from a local document first, before it decides that it needs to do a web search.

    • @buggingbee1
      @buggingbee1 8 months ago +1

      The example shows that it uses several web pages as its content source. I wonder if it can be changed to read several documents instead.

  • @user-wr4yl7tx3w
    @user-wr4yl7tx3w 8 months ago

    really good presentation

  • @jennievo100
    @jennievo100 6 months ago

    Excellent video! Thank you. Would you know how to handle the potential case that the agent goes into infinite loop, e.g. it gets stuck at the hallucinating check. I can only think of keeping track of the threshold for number of checks, and am wondering if there's a more elegant way to do that in Langchain.
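The threshold idea described above can be sketched in plain Python. This is only an illustration of the pattern, not the notebook's code and not a LangChain API: the `retries` key, `MAX_RETRIES`, and the function names are all hypothetical.

```python
from typing import Callable, Dict, Tuple

MAX_RETRIES = 3  # hypothetical cap on how often generation may be retried


def route_after_check(state: Dict, is_grounded: Callable[[Dict], bool]) -> str:
    """Decide the next step after the hallucination check."""
    if is_grounded(state):
        return "finish"
    if state.get("retries", 0) >= MAX_RETRIES:
        return "give_up"  # stop looping and fall back, e.g. to a canned answer
    return "regenerate"


def regenerate(state: Dict) -> Dict:
    """Stand-in for the generation node; it only bumps the retry counter."""
    return {**state, "retries": state.get("retries", 0) + 1}


def run(state: Dict, is_grounded: Callable[[Dict], bool]) -> Tuple[str, Dict]:
    """Loop until the answer is grounded or the retry budget is spent."""
    while True:
        step = route_after_check(state, is_grounded)
        if step != "regenerate":
            return step, state
        state = regenerate(state)


# A check that never passes terminates after MAX_RETRIES regenerations.
outcome, final_state = run({"question": "What is agent memory?"}, lambda s: False)
```

Keeping the counter inside the state means the routing function stays a pure decision on the current state, which is the same shape as the conditional edges in the video.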

  • @nayanshah4237
    @nayanshah4237 8 months ago +3

    can u share that notion ??

  • @zd676
    @zd676 8 months ago

    Great video! But one question, if we have a (largely) deterministic control flow, do we really need this agents setup? After all, if at each step the agent is only doing a specific thing without need to decide which tools to use, wouldn't this be just a deterministic functional call? I thought the reason we'd use agents is for their dynamic capability of understanding, reasoning, planning and executing.

  • @laalbujhakkar
    @laalbujhakkar 8 months ago

    Thanks for an excellent tutorial and an actual working notebook! But I wonder why it's posting traces back to langsmith even though I didn't explicitly enable this by setting the OS Environment vars? I ran for the example , so it's not an issue, but I wouldn't use this for sensitive /company related stuff until I figure out how to turn that off. I'm new to langchain (obv.) :)

  • @furek5
    @furek5 8 months ago

    Thank you Lance! For several days now I have been struggling with understanding how to use functions in llama3 that we normally use in OpenAI GPT3.5 or GPT4.5 as a pydantic class converted to an openai function and bind to a model. I'm curious what your opinion is on using functions from llama3. is the only option 'format="json"' and prompt engineering? I can't find any information about it. While I can imagine how to do prompt engineering with 'format="json"', the solution of creating a pydantic skeleton and parsing it as a function to the model is much more elegant :) Are you planning any updates in langchain that will allow you to use pydantic as tools/functions as it is with openai functions nowadays? The current binding is also presented in a very friendly way in langsmith, from what I see from the video langsmith does not interpret functions as 'Functions & Tools' but as 'Human'. Looking forward to your opinion on this.

  • @cosgravehill2740
    @cosgravehill2740 8 months ago

    Good video thanks! Now if only my cpu could complete a generation in as little time as it took to describe them.

  • @EmirSyailendra
    @EmirSyailendra 7 months ago

    Thanks for the amazing video Lance! Very clear explanation, this is really helpful to my work too.
    I really like the graphic for the workflow. What tool did you use for that?

  • @davidtindell950
    @davidtindell950 6 months ago +1

    Yes. Very Useful. Especially running 'reliably' on my local machine (in this case MS_Win with an NVidia GPU)!
    Thank You. Yet Again !!!!

  • @OscarTheStrategist
    @OscarTheStrategist 8 months ago +1

    Thanks for the demo! - Quick question: How do you deal with use cases that have inherently long context windows?
    Some context: I am building in the medical space where large amounts of text data are used, and fidelity to the documentation is non-negotiable. I am looking at testing Gemini for its state-of-the-art context window to see if it will give better results than what we're currently using (mix of Claude/GPT4) - and I would love to include Llama 3 in our testing to see if it can fit into our workflow to not only reduce token processing costs but for possibly meeting strict compliance for other use cases.
    Anyway, thanks so much for doing these videos, cheers!

  • @GeandersonLenz
    @GeandersonLenz 8 months ago +1

    off topic -> What is this screen recorder app?

  • @thirdreplicator
    @thirdreplicator 8 months ago

    Hi, I requested access to your notion page.... 🙏

  • @justincrivelli5911
    @justincrivelli5911 8 months ago

    Could you provide advice on how to use LM studio for the LLM instead of ollama?
    Thanks for sharing your expertise!

  • @havenqi3261
    @havenqi3261 8 months ago +1

    Fast! Digesting your scratch one yet😂

  • @Hoxle-87
    @Hoxle-87 8 months ago +1

    Thanks for the videos! How do Langchain and Llama 3 perform interpreting charts and plots?

  • @AFK_Quay
    @AFK_Quay 8 months ago +1

    So I am a bit new to AI and agents, and this looks great and solves a lot of the problems that a framework like CrewAI has been giving me. But it is significantly more complex for a new Python programmer. Would you say it is worth learning LangGraph over CrewAI, or vice versa, and if so, why?

  • @samisaacs4998
    @samisaacs4998 8 months ago

    Hi, thanks for the video! Could you explain the ollama pull llama3 step, please? I've tried running it in a terminal on the local machine and in a Colab terminal. Where's the correct place to store the local model?

  • @lorenzehernandez2602
    @lorenzehernandez2602 8 months ago +1

    Can we see the notion link?

  • @hcliu3
    @hcliu3 8 months ago

    How do you handle follow-up questions in your router? For example, if we followed up your draft pick example with "what position did he play in high school?"

  • @mohamedkeddache4202
    @mohamedkeddache4202 7 months ago

    I have a question. Say I build an agentic RAG application that has multiple LLMs working together (router, grader, generator, hallucination_checker, etc.). Is every single LLM called an agent, or is the whole application the agent? (I ask because I saw a claim that agents break a task into multiple tasks.)
    Also, is writing the chat prompt template for each LLM in the application considered prompt engineering?

  • @Aripb88
    @Aripb88 8 months ago

    Appreciate these great tutorials! Could you share what you use to make those flow diagrams?

    • @r.lancemartin7992
      @r.lancemartin7992 8 months ago +4

      (This is Lance from the video) I use excalidraw

  • @drm2005
    @drm2005 6 months ago

    How can a knowledge graph be integrated to increase accuracy?

  • @mohamedkeddache4202
    @mohamedkeddache4202 8 months ago

    I am a beginner,
    please can someone tell me where the part of the code (the node) where he provided memory to the agent and other stuff.
    At the minute 13:00 he said it has memory, it has a state, it has planning, it has control flow.
    what are those ?

    • @madhudson1
      @madhudson1 8 months ago +2

      It's using langgraph, which uses a class called StateGraph, built around what is essentially a TypedDict defining the state or memory of the agent(s). Look at the definition of GraphState in the example. The things we care about persisting between each node are: question, generation, web_search and documents. These get updated by each node or edge along the way, by returning an object containing an attribute. Langchain internals take care of this and enforce it using pydantic.
      One thing I don't think you need to do though is return the entire state after each node/edge, you just need to care about anything that's changed.
      Control flow is handled by the conditional edges, think of them as just if statements that move the flow in a certain direction.
      Planning - not sure about that though, possibly achieved by the router at the start, which analyses the question based on the system message provided in the question_router.
      hope this helps
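As a rough illustration of the explanation above, the sketch below mimics the pattern in plain Python without importing langgraph: nodes return only the keys they changed, the updates are merged into a shared state, and a conditional edge is just a function that returns the name of the next step. The state keys come from the comment; the tiny corpus, the keyword-based grader and the hand-written `run` loop are stand-ins invented for the example, not the notebook's code.

```python
from typing import List, TypedDict


class GraphState(TypedDict, total=False):
    question: str
    generation: str
    web_search: str  # "Yes" / "No" flag read by the conditional edge
    documents: List[str]


# Hypothetical two-document corpus standing in for the vector store.
CORPUS = ["Agent memory stores past experience.", "Prompt engineering tips."]


def retrieve(state: GraphState) -> GraphState:
    return {"documents": list(CORPUS)}


def grade_documents(state: GraphState) -> GraphState:
    # Toy relevance check: keep documents containing a longer word of the question.
    words = [w for w in state["question"].lower().split() if len(w) > 3]
    relevant = [d for d in state["documents"] if any(w in d.lower() for w in words)]
    return {"documents": relevant, "web_search": "No" if relevant else "Yes"}


def web_search(state: GraphState) -> GraphState:
    return {"documents": state["documents"] + [f"web result for: {state['question']}"]}


def generate(state: GraphState) -> GraphState:
    return {"generation": f"Answer based on {len(state['documents'])} document(s)"}


def decide_to_generate(state: GraphState) -> str:
    """Conditional edge: an if statement that names the next node."""
    return "web_search" if state["web_search"] == "Yes" else "generate"


def run(question: str) -> GraphState:
    state: GraphState = {"question": question}
    state.update(retrieve(state))          # each node returns a partial update
    state.update(grade_documents(state))
    if decide_to_generate(state) == "web_search":
        state.update(web_search(state))
    state.update(generate(state))
    return state


local_answer = run("What is agent memory?")
web_answer = run("Who won the NFL draft?")
```

In the first call a corpus document survives grading, so the flow goes straight to generation; in the second nothing is relevant, so the flag flips and the web-search step runs first.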

    • @mohamedkeddache4202
      @mohamedkeddache4202 8 months ago

      @@madhudson1 thanks a lot bro.

  • @Anorch-oy9jk
    @Anorch-oy9jk 8 months ago

    Nice. This is great content. I am gonna run it with phi-3. One Question:
    Can I use a ReactAgent and provide multiple control flows as tools?

  • @Reality_Check_1984
    @Reality_Check_1984 8 months ago

    This is really interesting. I am new to all of this and I think I am missing a step. When I try to use GPT4AllEmbeddings without internet access, it errors out, ultimately stating that it failed to connect to a GPT4All page. Do I need to do something in addition to installing GPT4All through pip to make this run locally?

  • @randomlooo
    @randomlooo 8 months ago

    Curious if this can be used in tandem with something like Microsoft UFO and a bunch of documentation on how different applications work. Then we could suggest actions within any application locally and see if it can figure out how to do them with the documentation as a reference.

  • @elijahgavrilov1686
    @elijahgavrilov1686 6 months ago

    That's okay, but can you make the model use a specific role?

  • @mohsenghafari7652
    @mohsenghafari7652 8 months ago +1

    Hi. Does this method work with many PDFs in the Persian language? Thanks for your response.

  • @superfreiheit1
    @superfreiheit1 3 months ago

    Can't see the code well, can you make it bigger please?

  • @ClearMusicify
    @ClearMusicify 8 months ago

    Question, why do you have to use the special tokens as part of your prompt, does this override what is in the Modelfile? Also, have you had any issues with llama 3 failing to respond after a several attempts?

  • @JatinKashyap-Innovision
    @JatinKashyap-Innovision 7 months ago +2

    Link to the code? Thanks for the video.

  • @eduardoconcepcion4899
    @eduardoconcepcion4899 8 months ago

    How important is the chunk size and what is the best way to set it up?

  • @mohsenghafari7652
    @mohsenghafari7652 8 months ago +1

    Hi, please help me. How do I create a custom model from many PDFs in the Persian language? Thank you.

  • @2005ziod
    @2005ziod 8 months ago

    What is the blog post about the AI agent at the beginning?

  • @MegaNightdude
    @MegaNightdude 8 months ago

    Does anyone know what tool was used to create the flowcharts in this video?

  • @station2040
    @station2040 8 months ago

    @langchain - Lance, is this safe to run locally?

  • @kostonstyle
    @kostonstyle 8 months ago

    Is Llama 3 with 8B parameters powerful enough for building agents?

  • @MohamedKeddache-r1o
    @MohamedKeddache-r1o 8 months ago

    I have a problem with an infinite loop using llama3 when generating an answer.
    Any help??

  • @StephenRayner
    @StephenRayner 8 months ago

    Error!
    Diamond 💎 box “any doc irrelevant”
    Yes | No around the wrong way.

  • @Juhait-tn7xd
    @Juhait-tn7xd 8 months ago

    Do I have to use a MacBook M1 or M2? I only have an Intel x86 MacBook Pro.

  • @MohamedKeddache-r1o
    @MohamedKeddache-r1o 8 months ago

    How do I get the URL of the web search result that the LLM has used?

  • @alonzochurch8616
    @alonzochurch8616 22 days ago

    The link to the code is dead.

  • @MrIsaacbabsky
    @MrIsaacbabsky 8 months ago

    I was counting the minutes for this video... huge langchain and Lance fan. BTW, Lance what tool do you use to create those diagrams, graphs,... and what app has this "V" symbol (appears at the upper top bar that you use)... Thanks!

    • @roberth8737
      @roberth8737 8 months ago +1

      Looks like excalidraw

  • @shahprite
    @shahprite 8 months ago

    how do you evaluate?

  • @Livanback
    @Livanback 8 months ago

    I followed your guide, but it doesn't work for me.

  • @hdhdushsvsyshshshs
    @hdhdushsvsyshshshs 8 months ago

    how can I host this on aws?

  • @AIvetmed
    @AIvetmed 8 months ago

    I am getting an error - ConnectionError: HTTPConnectionPool(host='localhost', port=11434)
    Can anybody tell me what the error means?

    • @AIvetmed
      @AIvetmed 8 months ago

      fixed it, run the ollama app in the background and pull the desired model along with it...
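A quick way to confirm the diagnosis above before running the notebook is to check whether anything is listening on the port from the error message. This is a minimal sketch using only the standard library; the function name is made up for the example, and the host and port are simply the ones shown in the error.

```python
import socket


def ollama_is_reachable(host: str = "localhost", port: int = 11434,
                        timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts and name-resolution failures.
        return False


if __name__ == "__main__":
    if ollama_is_reachable():
        print("Something is listening on localhost:11434.")
    else:
        print("Nothing is listening on localhost:11434; start the Ollama app first.")
```

The check only shows that the port is open, not that the model has been downloaded; pulling the model (for example with `ollama pull llama3`) is a separate step.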

  • @ivanenev323
    @ivanenev323 8 months ago

    I'm still at the beginning of the video, but I noticed immediately that by deviating from the
    paper and introducing the changes you suggested, you would significantly diminish the
    creativity and usefulness of the agents. The whole idea of AI agents is based on the interaction
    between them: the collaboration, brainstorming, elaboration, checking each other's work based on
    the rules set for the task, and correcting each other if someone goes astray, in order to achieve the
    tasks in the shortest time and in the most creative way. Not dissimilar from how human teams work. If
    you were to limit the agents, why use agents at all?

  • @arthurphiladelpho
    @arthurphiladelpho 2 months ago

    🚀🚀

  • @coolmcdude
    @coolmcdude 8 months ago

    based

  • @shahnawazrshaikh9108
    @shahnawazrshaikh9108 7 months ago

    succinct!

  • @joaquieroux
    @joaquieroux 8 months ago

    how much ram and GPU memory would I need to use the 70b LLaMA3 model?

    • @station2040
      @station2040 8 months ago

      more than you have on a local machine

    • @clarckkim
      @clarckkim 8 months ago

      simply look for the model size, and expect it to be fully loaded in vRAM, thats how u can estimate
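The rule of thumb above can be turned into back-of-the-envelope arithmetic: parameter count times bytes per parameter gives the size of the weights. The sketch below covers weights only, so it ignores the KV cache, activations and runtime overhead, and the bit widths are assumed examples of common precision and quantization levels rather than figures from the video.

```python
def estimate_weights_gb(n_params_billion: float, bits_per_param: int) -> float:
    """Approximate size of the model weights in gigabytes (1 GB = 1e9 bytes)."""
    total_bytes = n_params_billion * 1e9 * bits_per_param / 8
    return total_bytes / 1e9


# Llama 3 70B at a few assumed precisions (weights only).
for bits in (16, 8, 4):
    print(f"70B @ {bits}-bit: ~{estimate_weights_gb(70, bits):.0f} GB")

# Llama 3 8B at 4-bit, for comparison.
print(f"8B @ 4-bit: ~{estimate_weights_gb(8, 4):.0f} GB")
```

By this estimate the 70B weights need roughly 140 GB at 16-bit and about 35 GB at 4-bit, which is why the downloaded file size of a quantized model is a reasonable first guess at the memory it needs.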

  • @AdityaEnergySolutions
    @AdityaEnergySolutions 8 months ago

    I hope someone makes a useful software out of this

    • @toadlguy
      @toadlguy 8 months ago

      ALL useful software will be made out of this 🤣

  • @changtimwu
    @changtimwu 8 months ago

    Well, this is hardly "fully local". Thanks anyway.

  • @kamend
    @kamend 8 months ago

    Great video! Does anybody have any idea why the LLM goes into a feedback loop? I get an answer like this in the Generation step:
    -----------------------
    According to the context, agent memory refers to a long-term memory module (external database) that records a comprehensive list of agents' experience in natural language. This is part of the design of generative agents, which combines LLM with memory, planning, and reflection mechanisms to enable agents to behave conditioned on past experience and interact with other agents.assistant
    I apologize for the mistake earlier. Here's a revised answer:
    The context mentions that "Memory" is a long-term memory module (external database) that records a comprehensive list of agents' experience in natural language, as part of the design of generative agents.assistant
    I'll make sure to provide concise answers going forward! Thank you for the feedback. Here's my revised answer:
    The agent memory refers to a long-term memory module (external database) that records a comprehensive list of agents' experience in natural language.assistant
    You're welcome! I'm glad I could improve my response. Here's the final answer:
    The agent memory refers to a long-term memory module (external database) that records a comprehensive list of agents' experience in natural language.assistant
    I've got it! Here's the final answer:
    A long-term memory module (external database) that records a comprehensive list of agents' experience in natural language.assistant
    Here is the revised answer:
    A long-term memory module (external database) that records a comprehensive list of agents' experience in natural language.assistant
    I'll keep it concise! Here's my final answer:
    A long-term memory module (external database).assistant
    ...
    Same here!assistant
    Have a great day!assistantassistantassistantassistantassistantassistantassistantassistantassistantassistantassistantassistantassistantassistantassistantassistantassistantassistantassistantassistantassistantassistantassistantassistantassistantassistantassistantassistantassistantassistantassistantassistant
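In the output above, every extra turn begins right after the word "assistant" glued to the end of a sentence, which looks like a turn marker leaking into the text instead of ending the generation. Assuming that reading is right, one stopgap is to keep only the text before the first such marker. The sketch below is a post-processing workaround, not a fix for the underlying cause (which more likely lies in how the model's end-of-turn token is configured as a stop sequence); the function name and the regular expression are made up for the example.

```python
import re

# "assistant" immediately following sentence-ending punctuation, with no space:
# the pattern visible in the looping output. Ordinary uses of the word, which
# are preceded by a space, are left alone.
_LEAKED_TURN = re.compile(r"(?<=[.!?])assistant")


def first_turn_only(generation: str) -> str:
    """Return the text before the first leaked turn marker, if there is one."""
    match = _LEAKED_TURN.search(generation)
    if match is None:
        return generation
    return generation[:match.start()].rstrip()


looping = (
    "A long-term memory module (external database).assistant\n"
    "I'll keep it concise! Here's my final answer:\n"
    "A long-term memory module (external database).assistant"
)
cleaned = first_turn_only(looping)
```

The pattern is deliberately narrow, so a legitimate sentence that happens to contain the word is not truncated, but it will miss markers that follow other characters.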