Local Agentic RAG with LLaMa 3.1 - Use LangGraph to perform private RAG

  • Published: 9 Jan 2025

Comments • 34

  • @SpenceDuke 5 months ago +4

    I cannot tell you how thankful I am for your videos

  • @sylap 5 months ago +3

    Thanks for the video. Very educational! Would love to see you cover this topic in larger-scale applications with an unstructured KB. By the way, your Udemy course was awesome too!

    • @codingcrashcourses8533 5 months ago +1

      It's tough to cover large projects on YouTube. Much work for relatively few views. What is an unstructured KB, btw?

    • @sylap 5 months ago

      @codingcrashcourses8533 I meant unstructured databases or knowledge bases. Specifically, I’d love to learn more about handling unstructured data, and the best chunking and embedding strategies.
      I’m currently building an internal chatbot for my company using a classic RAG framework. Now, I want to incorporate a knowledge graph. I’m considering options like GraphRAG from Microsoft or Neo4j's Knowledge Graph, but both require at least GPT-4 and involve sharing data with them. What are the alternatives? Is there a reliable Graph-like option that is open source and does not collect data?
      I think covering topics on Local LLMs could attract more views!

    • @codingcrashcourses8533 5 months ago +1

      @@sylap My first question is: are you sure you want GraphRAG? And why are you considering it? Just because something is popular right now does not mean it's the best solution for you.

    • @sylap 5 months ago

      @@codingcrashcourses8533 You’re right, popularity doesn't always mean it's the best fit, haha.
      I’m still new to this field and learning as I go. From what I understand, GraphRAG seems complex and expensive to integrate with RAG. I was just exploring options because my current RAG setup, with basic chunking, embedding, and semantic retrieval, struggles with complex data where answers need to be pulled from multiple documents.
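
The classic chunk-embed-retrieve pipeline mentioned in this thread is easy to sketch. Below is a minimal example using LangChain's RecursiveCharacterTextSplitter, a local Ollama embedding model, and a Chroma vector store; the chunk size, overlap, embedding model, and source text are illustrative assumptions, not details from the video or the commenter's project.

```python
# Classic chunk -> embed -> retrieve setup; all values below are illustrative.
from langchain_core.documents import Document
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_community.vectorstores import Chroma
from langchain_ollama import OllamaEmbeddings

# Placeholder content; replace with your own document loader.
docs = [Document(page_content="Replace this with the text of your internal knowledge base.")]

splitter = RecursiveCharacterTextSplitter(
    chunk_size=800,     # characters per chunk
    chunk_overlap=100,  # overlap so answers spanning chunk borders survive
)
chunks = splitter.split_documents(docs)

# Embed locally and store in a vector store; no data leaves the machine.
embeddings = OllamaEmbeddings(model="nomic-embed-text")
vectorstore = Chroma.from_documents(chunks, embedding=embeddings)
retriever = vectorstore.as_retriever(search_kwargs={"k": 4})
```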

  • @ruchirtidke4131 5 months ago

    Love this, very informative.

  • @AbhishekSingh-pj1oo 5 months ago

    Markus, you are a legend!!! This was very helpful, just like your other videos!!! Allow me to suggest topics for more videos. LangGraph is obviously great, and so are LangChain and LangSmith. But LangSmith isn't free. Alternatives like Langfuse and Arize are emerging. They use callbacks and, I guess, the tracers module in langchain_core. The problem is that this is rarely documented. If you could make a video on callbacks or tracers, just like you did for the Runnable interface, it would be very helpful. If you want to go further and out of syllabus, you could integrate it with the ELK stack, Datadog, or similar and replicate the logging of all the chain internals. Anyway, I love your videos! Waiting for the next one!!!!!

    • @codingcrashcourses8533 5 months ago +1

      Hm, I already covered LangFuse in my "LangChain in Production" video :). I have not worked with the tracers from LangChain, and I also found nothing in the docs about them.

    • @codingcrashcourses8533 5 months ago +1

      Regarding "you can integrate it with the ELK stack or Datadog or similar and replicate the logging behavior of all the chain internals": I have not yet done anything with these topics ^^

    • @AbhishekSingh-pj1oo 5 months ago

      @@codingcrashcourses8533 I realised after commenting that you had already covered LangFuse, lol. This channel is the most comprehensive guide and set of references for production TO DATE! Bigggg Thanks!!!!!

    • @codingcrashcourses8533 5 months ago +1

      @@AbhishekSingh-pj1oo thank you^^

    • @AbhishekSingh-pj1oo 5 months ago

      @@codingcrashcourses8533 I knowww, there is literally nothing on tracers in the docs. The API reference only mentions them briefly; they inherit from the BaseCallbackHandler class. Anyway, callbacks and tracers are the only things left that aren't on YouTube (and also beta features like caches, and production best practices for reducer functions on state keys).
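
Since this thread centers on callbacks and tracers in langchain_core, here is a minimal sketch of a custom callback handler that logs chain internals. Only the public BaseCallbackHandler hooks are used; the chain it would be attached to and the log destinations (ELK, Datadog, etc.) are left as assumptions.

```python
# Minimal custom callback handler sketch for logging chain internals.
# The hooks below are part of langchain_core; where the logs go is up to you.
import logging
from langchain_core.callbacks import BaseCallbackHandler

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("chain-tracing")


class LoggingHandler(BaseCallbackHandler):
    def on_chain_start(self, serialized, inputs, **kwargs):
        logger.info("Chain started with inputs: %s", inputs)

    def on_llm_start(self, serialized, prompts, **kwargs):
        logger.info("LLM called with %d prompt(s)", len(prompts))

    def on_llm_end(self, response, **kwargs):
        # response is an LLMResult; log the first generation, truncated.
        logger.info("LLM finished: %s", response.generations[0][0].text[:80])

    def on_chain_end(self, outputs, **kwargs):
        logger.info("Chain finished with outputs: %s", outputs)


# Attach the handler per invocation via the config dict (works for any Runnable), e.g.:
# result = chain.invoke({"question": "..."}, config={"callbacks": [LoggingHandler()]})
```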

  • @CK.23. 5 months ago

    Great as always. Thanx.

  • @FredericusRex-f2p 5 months ago

    Hi there! Markus, will you cover Meta's new llama-agentic-system too? 🙏 plzZz, you teach like no other 🙃😊
    Llama 3.1 8b q8 function calls seem to work absolutely flawlessly. Thx for all your hard work!

    • @codingcrashcourses8533 5 months ago +1

      Could you provide a link? I currently don't know how it differs from this video.

    • @reinerzufall3123 5 months ago

      @@codingcrashcourses8533 Btw, have you seen "frdel/agent-zero"? This lightweight framework is unbelievable. I don't understand it completely, but it seems that there is just one agent, and this agent creates all other agents at runtime, fully autonomously?! No other framework has impressed me as much as agent-zero so far. It's unbelievable what this "thing" can do.

    • @reinerzufall3123 5 months ago

      @@codingcrashcourses8533 My first message was deleted, by YT or someone else 😂🤷‍♀
      Maybe the link was the problem, dunno... I was talking about "meta-llama/llama-agentic-system" on GitHub ^^

  • @awakenwithoutcoffee 5 months ago

    As usual, a great video ;) Question: wouldn't it make more sense to check the initial query for rewriting, instead of doing it post-retrieval? This is my preference for two reasons: it allows users not to be very specific with their initial query, and by rewriting the query up front the first retrieval will already be good. Using a fast, smaller model like Command-R on Groq, the added latency is negligible.

    • @codingcrashcourses8533 5 months ago +1

      The rephrasing process is done to get documents if you don't find any. You probably mean rephrasing follow-up questions to even be able to perform retrieval. In this codebase I did not take history into consideration.
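
A minimal LangGraph sketch of the pattern described in this reply: retrieve first, and route to a rewrite node only when no documents come back. The state fields, retrieval, rewriting, and generation logic are placeholders, not the exact code from the video; swap in your own retriever and LLM.

```python
# Sketch of "rewrite the query only if retrieval comes back empty".
from typing import List, TypedDict
from langgraph.graph import StateGraph, END


class RAGState(TypedDict):
    question: str
    documents: List[str]
    generation: str


def retrieve(state: RAGState) -> dict:
    # Placeholder retrieval; a real node would call a vector-store retriever here.
    docs = ["chunk about local agentic RAG"] if "rag" in state["question"].lower() else []
    return {"documents": docs}


def rewrite_question(state: RAGState) -> dict:
    # Placeholder rewrite; a real node would ask the LLM to rephrase the query.
    return {"question": state["question"] + " (local RAG)", "documents": []}


def generate(state: RAGState) -> dict:
    # Placeholder generation; a real node would prompt the LLM with the documents.
    return {"generation": f"Answer grounded in {len(state['documents'])} document(s)"}


def docs_found(state: RAGState) -> str:
    # Route to generation if retrieval found something, otherwise rewrite and retry.
    return "generate" if state["documents"] else "rewrite_question"


graph = StateGraph(RAGState)
graph.add_node("retrieve", retrieve)
graph.add_node("rewrite_question", rewrite_question)
graph.add_node("generate", generate)
graph.set_entry_point("retrieve")
graph.add_conditional_edges("retrieve", docs_found)
graph.add_edge("rewrite_question", "retrieve")
graph.add_edge("generate", END)
app = graph.compile()

print(app.invoke({"question": "What did the video cover?", "documents": [], "generation": ""}))
```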

    • @awakenwithoutcoffee 5 months ago

      @@codingcrashcourses8533 thanks for clarifying this. I admit I didn't find time to dive into the code and made assumptions based on the diagram.

    • @codingcrashcourses8533 5 months ago +1

      @@awakenwithoutcoffee haha, maybe I should make videos in the fireship style in the future >_

  • @artur50 5 months ago

    Would you provide a simple UI for this code?

    • @codingcrashcourses8533 5 months ago +2

      I will soon release a video on FastHTML, a simple web framework for Python. You can use that to build a UI for it :)

    • @artur50 5 months ago

      @@codingcrashcourses8533 superb!
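
For reference, a tiny FastHTML sketch of what such a UI could look like: a single page with a question form that calls the compiled graph. The rag_app stub, its state keys, and the route names are assumptions, not code from the video or the upcoming FastHTML video.

```python
# Tiny FastHTML UI sketch around a compiled LangGraph app.
from fasthtml.common import *  # documented FastHTML import style

# Stand-in for the compiled graph; replace with your own `graph.compile()` result.
class _StubGraph:
    def invoke(self, state):
        return {"generation": f"(placeholder answer to: {state['question']})"}

rag_app = _StubGraph()

app, rt = fast_app()


@rt("/")
def get():
    # Landing page with a simple question form posting to /ask.
    return Titled(
        "Local RAG",
        Form(
            Input(name="question", placeholder="Ask something..."),
            Button("Ask"),
            method="post",
            action="/ask",
        ),
    )


@rt("/ask")
def post(question: str):
    # Run the graph with the submitted question and render the answer.
    result = rag_app.invoke({"question": question})
    return Div(P(result.get("generation", "No answer")))


serve()
```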

  • @ki-werkstatt 5 months ago

    Great video Markus!
    RAG and Llama 3.1 8b worked totally fine for me! Just the 128k token window is a joke! Using more than 2,000 characters (!) leads to unusable results. No difference from Llama 3.0.
    One question: why do you always return the entire state from graph nodes? I always do the exact opposite and return only the changed values. What is the benefit? Thx

    • @SpenceDuke 5 months ago

      I had very satisfactory results using 3.1 8b 8bit for 100k token RAG querying

    • @codingcrashcourses8533 5 months ago

      128k is pretty good for such a model. Only one year ago we had 8k and 16k.
      I always return the complete state, because the return value of a node updates the state and I don't want to lose information. It is just less error-prone and I don't have to think about where data gets lost. I don't store a lot in the state, so this should be no problem.

    • @ki-werkstatt 5 months ago

      @@SpenceDuke Hm, maybe it's the language. I tried it with German text and RAG. Typical prompt: "You are a great... Answer the question: ... Context: ..."
      Every time the retrieved context is, let's say, > 800 tokens, Llama 8b doesn't even remember the question :-(

    • @SpenceDuke 5 months ago +1

      @@ki-werkstatt I can't pretend to know the performance expectations for languages other than English, but it sounds like there's a problem with your setup separate from the choice of underlying LLM. Perhaps you could try the same tests with the text translated to English to determine whether the language is causing the issues.
      My thought is that something is wrong with the config or environment of the running model. It should not have any issues reasoning over context of at least 8,192 tokens. It is possible that the LLM is not actually receiving the full contents of your prompt and instead gets a truncated portion missing important context, or that the way you've configured and run this model is off.

    • @ki-werkstatt 5 months ago

      @@SpenceDuke Good idea! I installed Ollama in a Docker container on my laptop. After a few noisy minutes ;-) I thought this was the right direction, but then I got this answer again (translated):
      "I'm sorry, but the amount of text provided is too much to process in one answer to one question ... However, if you have specific questions, I can try to help you with the information provided. Please ask your question as precisely as possible so that I can better understand what you want to know."
      Seems to be the language.
      Thank you very much SpenceDuke
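
One possible cause of the truncation symptoms described in this thread (an assumption, not something confirmed above) is that Ollama runs models with a fairly small default context length unless it is raised explicitly. A minimal sketch of setting it through langchain_ollama; the model name and num_ctx value are illustrative.

```python
# Explicitly raising the context window for a local Llama 3.1 model via Ollama.
from langchain_ollama import ChatOllama

llm = ChatOllama(
    model="llama3.1",
    temperature=0,
    num_ctx=8192,  # the default context length is much smaller; long RAG prompts get cut otherwise
)

answer = llm.invoke("Answer the question using only the context below: ...")
print(answer.content)
```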

  • @Mostafa_Sharaf_4_9 5 months ago

    Please make a video about Docker