Reliable, fully local RAG agents with LLaMA3.2-3b

  • Published: 22 Dec 2024

Comments • 60

  • @christopherhartline1863 9 days ago +1

    Dude. You're the man. I've gone through most of your LangChain course and lots of the YT content. You just have a knack for teaching.

  • @homeandr1 2 months ago +17

    Great explanation. It would be great to do one more tutorial on multimodal local RAG, handling different chunk types like tables, text, and images, using unstructured, Chroma, and MultiVectorRetriever completely locally.

  • @ytaccount9859 2 months ago +5

    Awesome stuff. Langgraph is a nice framework. Stoked to build with it, working through the course now!

  • @ravivarman7291 2 months ago +2

    Amazing session, and the content was explained very nicely in just 30 minutes. Thanks so much!

  • @joxxen 2 months ago +1

    You are amazing, like always. Thank you for sharing

  • @becavas 2 months ago +8

    Why did you use llama3.2:3b-instruct-fp16 instead of llama3.2:3b?

  • @sunderrajan6172 2 months ago +1

    Beautifully done; thanks

  • @leonvanzyl 2 months ago +27

    The tutorial was "fully local" up until the moment you introduced Tavily 😜😉.
    Excellent tutorial Lance 👍

    • @sergeisotnik 2 months ago

      Any internet search, by definition, is no longer local. Beyond that, the embeddings here come from a third-party service (where only the first 1M tokens are free).

    • @starbuck1002 2 months ago +4

      @sergeisotnik He's using the nomic-embed-text embedding model locally, so there is no token cap at all.

    • @sergeisotnik 2 months ago +8

      @starbuck1002 It looks like you're right. I saw that `from langchain_nomic.embeddings import NomicEmbeddings` is used, which usually means an API call. But in this case, the initialization is done with the parameter `inference_mode="local"`. I didn't check the documentation, but it seems that in this case the model is downloaded from HuggingFace and used for local inference. So you're right, and I was wrong.
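
      A minimal sketch of the local initialization being discussed, assuming the langchain-nomic package and its local inference backend are installed (model name taken from the langchain-nomic docs):

      ```python
      from langchain_nomic.embeddings import NomicEmbeddings

      # inference_mode="local" downloads the model and embeds on-device,
      # so there is no API call and no token cap.
      embeddings = NomicEmbeddings(
          model="nomic-embed-text-v1.5",
          inference_mode="local",
      )
      vector = embeddings.embed_query("a test sentence")
      ```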

  • @LandryYvesJoelSebeogo 1 month ago +1

    may GOD bless you Bro

  • @marcogarciavanbijsterveld6178 1 month ago +2

    I'm a med student interested in experimenting with the following: I'd like to have several PDFs (entire medical books) from which I can ask a question and receive a factually accurate, contextually appropriate answer, thereby avoiding online searches. I understand this could potentially work using your method (omitting web searches), but am I correct in thinking this would require a resource-intensive, repeated search process?
    For example, if I ask a question about heart failure, the model would need to sift through each book and chapter until it finds the relevant content. This would likely be time-consuming initially. However, if I then ask a different question, say on treating systemic infections, the model would go through the entire set of books and chapters again, rather than narrowing down based on previous findings.
    Is there a way for the system to 'learn' where to locate information after several searches? Ideally, after numerous queries, it would be able to access the most relevant information efficiently without needing to reprocess the entire dataset each time, while maintaining factual accuracy and avoiding hallucinations.

    • @JesterOnCrack 1 month ago

      I'll take a minute to try and answer your question to the best of my ability.
      Basically, what you are describing are ideas that seem sound for your specific application, but are not useful everywhere. Whenever you restrict search results, there is a chance you're not finding the one correct answer you needed. Speaking from experience, even a tiny chance of not finding what you need is enough to deter many customers.
      Of course, your system would gain efficiency from the tradeoff, completing queries quicker.
      Bottom line: there are ways to achieve this with clever data and AI engineering, but I don't think there is a single straightforward fix to your problem.

    • @ChigosGames 7 days ago

      Maybe you could filter on metadata first to make the search easier (see the sketch after this thread).

    • @rose.sinclair 6 days ago

      If that's the case, why not just fine-tune and train on it rather than using RAG?
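
      Worth noting for this thread: with a vector store the books are embedded once up front, and each query is then a fast similarity search rather than a re-read of the whole corpus. A hedged sketch of the metadata-filter idea above, with hypothetical store and field names:

      ```python
      from langchain_chroma import Chroma
      from langchain_nomic.embeddings import NomicEmbeddings

      # Reopen a persisted index that was built once from the PDFs.
      vectorstore = Chroma(
          persist_directory="./medical_books_db",
          embedding_function=NomicEmbeddings(
              model="nomic-embed-text-v1.5", inference_mode="local"
          ),
      )

      # Restrict the similarity search to one book via chunk metadata.
      docs = vectorstore.similarity_search(
          "first-line treatment for heart failure",
          k=4,
          filter={"book": "cardiology_textbook"},
      )
      ```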

  • @VictorDykstra 2 months ago

    Very well explained.😊

  • @thepeoplesailab 2 months ago

    Very informative ❤❤

  • @SavvasMohito 2 months ago

    That's a great tutorial that shows the power of LangGraph. It's impressive you can now do this locally with decent results. Thank you!

  • @davesabra4320 1 month ago

    Thanks, it is indeed very cool. Last time you used 32 GB; do you think this will run with 16 GB of memory?

  • @AlexEllis 2 months ago

    Thanks for the video and for the sample putting all these parts together. What did you use to draw the diagram at the beginning of the video? Was it generated by a DSL/config?

  • @Togowalla 2 months ago

    Great video. What tool did you use to illustrate the nodes and edges in your notebook?
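
    (For both diagram questions above: most likely this is LangGraph's built-in Mermaid rendering of the compiled graph. A minimal sketch, assuming a compiled graph named `graph` in a notebook:)

    ```python
    from IPython.display import Image, display

    # Render the compiled graph's nodes and edges as a Mermaid PNG.
    display(Image(graph.get_graph().draw_mermaid_png()))
    ```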

  • @developer-h6e 1 month ago

    Is it possible to make an agent that, when provided with a few hundred links, extracts the info from all of the links and stores it?
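
    A hedged sketch of one way to do that with stock LangChain components (the URLs and paths are placeholders):

    ```python
    from langchain_chroma import Chroma
    from langchain_community.document_loaders import WebBaseLoader
    from langchain_nomic.embeddings import NomicEmbeddings
    from langchain_text_splitters import RecursiveCharacterTextSplitter

    urls = ["https://example.com/a", "https://example.com/b"]  # your few hundred links

    # Fetch every page, split it into chunks, and index the chunks locally.
    docs = [doc for url in urls for doc in WebBaseLoader(url).load()]
    chunks = RecursiveCharacterTextSplitter(
        chunk_size=1000, chunk_overlap=200
    ).split_documents(docs)
    vectorstore = Chroma.from_documents(
        chunks,
        embedding=NomicEmbeddings(model="nomic-embed-text-v1.5", inference_mode="local"),
        persist_directory="./link_index",
    )
    ```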

  • @beowes 2 months ago

    Question: you have operator.add on the loop step, but then increment the loop step in the state too… Am I wrong that it would then be incorrect?
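
    For anyone puzzling over the same thing: with an operator.add reducer, a node should return the increment (1), not the already-incremented value, or the step would indeed be double-counted. A minimal sketch with hypothetical names, not the video's exact code:

    ```python
    import operator
    from typing import Annotated, TypedDict

    from langgraph.graph import StateGraph, START, END

    class State(TypedDict):
        # The reducer ADDS whatever a node returns to the current value.
        loop_step: Annotated[int, operator.add]

    def step(state: State) -> dict:
        return {"loop_step": 1}  # merged by the reducer as existing + 1

    builder = StateGraph(State)
    builder.add_node("step", step)
    builder.add_edge(START, "step")
    builder.add_edge("step", END)
    graph = builder.compile()

    print(graph.invoke({"loop_step": 0}))  # {'loop_step': 1}
    ```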

  • @sidnath7336 2 months ago

    If different tools require different keyword arguments, how can these be passed in for the agent to access?
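
    One common pattern, sketched under the assumption of a tool-calling model: each tool declares its own keyword arguments in its schema, and the model fills them in per call (the tool bodies here are toy placeholders):

    ```python
    from langchain_core.tools import tool
    from langchain_ollama import ChatOllama

    @tool
    def search_web(query: str, max_results: int = 3) -> str:
        """Search the web for a query."""
        return f"results for {query}"

    @tool
    def calculate(expression: str) -> str:
        """Evaluate a math expression."""
        return str(eval(expression))  # demo only; never eval untrusted input

    llm = ChatOllama(model="llama3.2:3b-instruct-fp16", temperature=0)
    llm_with_tools = llm.bind_tools([search_web, calculate])

    msg = llm_with_tools.invoke("What is 12 * 7?")
    print(msg.tool_calls)  # each call names a tool and its keyword arguments
    ```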

  • @hari8568 2 months ago

    Is there an elegant way to handle recursion errors?
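
    One approach, as a hedged sketch: LangGraph raises GraphRecursionError when the recursion limit is hit, and the limit is configurable per invocation (the compiled graph name is assumed):

    ```python
    from langgraph.errors import GraphRecursionError

    try:
        result = graph.invoke(
            {"question": "..."},
            config={"recursion_limit": 25},  # the default; raise or lower as needed
        )
    except GraphRecursionError:
        result = {"answer": "Gave up after too many retries."}  # graceful fallback
    ```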

  • @fernandobarros9834 2 months ago

    Great tutorial! Is it necessary to add a prompt format?

    • @skaternationable 2 months ago +1

      Using PromptTemplate/ChatPromptTemplate works as well. The .format here seems equivalent to the `input_variables` param of those two classes (see the sketch after this thread).

    • @fernandobarros9834 2 months ago

      @skaternationable Thanks!
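
      A small sketch of the equivalence discussed in this thread (the template text is illustrative):

      ```python
      from langchain_core.prompts import PromptTemplate

      template = "Answer using this context:\n{context}\n\nQuestion: {question}"

      # Plain string formatting, as in the video:
      prompt_str = template.format(context="...", question="...")

      # The PromptTemplate equivalent; input variables are inferred from the braces:
      prompt = PromptTemplate.from_template(template)
      prompt_value = prompt.invoke({"context": "...", "question": "..."})
      ```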

  • @arekkusub6877 1 month ago

    Interesting: you basically use an old-school workflow to orchestrate the steps of LLM-based atomic tasks. But what about letting the LLM execute the workflow and also perform all the required atomic tasks? That would be more like an agentic approach...

    • @tellallimedium 13 days ago

      An LLM by itself is not capable of executing workflows, though it can follow predefined workflows in text-based interactions. LLMs by themselves cannot interact with external systems or execute tasks like sending emails, making API calls, or controlling devices unless integrated with tools. So defined workflows and native integrations are required.

    • @arekkusub6877 13 days ago

      @tellallimedium Well, an LLM by itself is only capable of giving you another token when you present it with a set of tokens :-)

  • @andresmauriciogomezr3 2 months ago

    thank you

  • @henson2k 2 months ago +1

    You make the LLM do all the hard work of filtering candidates.

  • @adriangpuiu 2 months ago +4

    @lance, please add the LangGraph documentation to the chat; the community will appreciate it. Let me know what you think.

  • @johnrogers3315 2 months ago

    Great tutorial, thank you

  • @jamie_SF 2 months ago

    Awesome

  • @ephimp3189 2 months ago

    Is it possible to add a "fact checker" method? What if the answer is obtained from a document that gives false information? It would technically answer the question, it just wouldn't be true.

    • @ChigosGames 7 days ago

      He's doing that with the graders.
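
      A hedged sketch of that grader pattern: ask a JSON-mode model whether the answer is grounded in the retrieved documents (the prompt wording is illustrative, not the video's exact prompt). Note this checks grounding in the sources, not whether the sources themselves are true:

      ```python
      import json

      from langchain_ollama import ChatOllama

      llm_json_mode = ChatOllama(
          model="llama3.2:3b-instruct-fp16", temperature=0, format="json"
      )

      docs_txt = "..."  # the concatenated retrieved documents
      answer = "..."    # the generated answer to check

      grader_prompt = (
          "You are a grader checking whether an answer is grounded in a set of facts.\n"
          f"FACTS:\n{docs_txt}\n\nANSWER:\n{answer}\n\n"
          'Return JSON with a single key "binary_score", value "yes" or "no".'
      )
      result = json.loads(llm_json_mode.invoke(grader_prompt).content)
      if result["binary_score"] == "no":
          pass  # route back to retrieval or web search instead of answering
      ```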

  • @serychristianrenaud 2 months ago

    thanks

  • @user-wr4yl7tx3w 2 months ago

    Could you consider doing an example of the contextual retrieval approach that Anthropic recently introduced?

  • @ericlees5534 1 month ago

    Why does he make it look so easy...

  • @aiamfree 2 months ago

    it's sooooo fast!

  • @ghostwhowalks2324 2 months ago +1

    Amazing stuff that can be done with a few lines of code. Disruption coming everywhere.

  • @ameraid05 19 days ago

    So you're telling me it was trained on all of the internet just to search the internet again

  • @HarmonySolo 2 months ago +13

    LangGraph is too complicated; you have to implement State, Node, etc. I would prefer to implement the agent workflow myself, which is much easier; at least I do not need to learn how to use LangGraph.

    • @generatiacloud 1 month ago +1

      Any repo to share?

    • @RazorCXTechnologies 1 month ago

      Excellent tutorial! Another, easier option is n8n, since it has LangChain integration with AI agents built in, and almost no code is required to achieve the same functionality. n8n also has an automatic chatbot interface and webhooks.

    • @kgro353 1 month ago

      Langflow is the best solution.

    • @stanTrX 11 days ago

      You are right, I like LangChain too.

    • @ChigosGames 7 days ago +1

      It's just a framework to help you architect the flow. Nothing is stopping you from creating it from scratch.

  • @rorycawley 29 days ago +1

    I had to change the code (remove .content) and swap in OllamaLLM:
    ```python
    # Swap the chat model class for the plain-text LLM class.
    # from langchain_ollama import ChatOllama
    from langchain_ollama.llms import OllamaLLM

    local_llm = "llama3.2:3b-instruct-fp16"

    # llm = ChatOllama(model=local_llm, temperature=0)
    # llm_json_mode = ChatOllama(model=local_llm, temperature=0, format="json")

    # OllamaLLM returns a plain string rather than a message object,
    # which is why the downstream .content accesses had to be removed.
    llm = OllamaLLM(model=local_llm, temperature=0)
    llm_json_mode = OllamaLLM(model=local_llm, temperature=0, format="json")
    ```

  • @_Learn_With_Me_EraofAI 2 months ago +1

    Unable to access ChatOllama.

  • @HELLODIMENSION 2 months ago +1

    You have no idea how much you saved me 😂 Salute 🫡 Thank you.