Building Corrective RAG from scratch with open-source, local LLMs

  • Published: Jan 12, 2025

Comments • 69

  • @NS-tr9ej
    @NS-tr9ej 10 months ago +15

    I haven't found a simpler or more practical explanation of running a local model and using LangChain. Congrats!

    • @ahmed_hefnawy1811
      @ahmed_hefnawy1811 9 months ago

      Does that solution solve the multi-hop question problem that happens in basic RAG?

  • @joshuacunningham7912
    @joshuacunningham7912 10 months ago +2

    I love Lance’s teaching style. He makes this topic so accessible! 👏

  • @jim02377
    @jim02377 11 months ago +9

    That is one of the best explanations of using local models, LangChain, and LangGraph that I have seen. Awesome job!

    • @priyanshugarg6175
      @priyanshugarg6175 10 months ago +2

      Hi. Would we need a high-end PC to run models locally? My computer has only 4 GB of RAM, which I suspect would not be enough.

    • @jim02377
      @jim02377 10 months ago

      @@priyanshugarg6175 It won't be. I ran it on an i5 with 16 GB of RAM and it can process one token per second. It is better on the same machine running 32 GB, but still a bit slow. Trying it on an M2 Mac next, and an old server with 80 GB of RAM.

    • @JJN631
      @JJN631 10 months ago

      @@priyanshugarg6175 You need at least 8 GB of RAM and 8 GB of VRAM.

    • @roscatres
      @roscatres 10 months ago

      @@priyanshugarg6175 Yes, you do need a powerful computer. As a rule of thumb, you can run models that are smaller than your VRAM (not RAM, the memory of your GPU). There are exceptions and other ways, but yes, you need powerful computers for this.
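
      A quick way to sanity-check that rule of thumb is to compare the size of the model file on disk against the GPU's total memory. A minimal sketch, assuming PyTorch with CUDA available and a hypothetical local GGUF file:

      =====================
      import os
      import torch

      # Hypothetical local model file; point this at whatever you pulled.
      model_path = "mistral-7b-instruct.Q4_K_M.gguf"

      model_gb = os.path.getsize(model_path) / 1e9
      vram_gb = torch.cuda.get_device_properties(0).total_memory / 1e9

      # Rule of thumb from above: the model should be smaller than your VRAM.
      print(f"model ~{model_gb:.1f} GB, VRAM ~{vram_gb:.1f} GB, fits: {model_gb < vram_gb}")
      =====================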

    • @aladinmovies
      @aladinmovies 9 months ago

      @@priyanshugarg6175 With Linux it's enough for small models of no more than 1 GB.

  • @michaelwallace4757
    @michaelwallace4757 11 months ago +1

    Great job explaining the process and keeping it understandable to a non-programmer!

  • @SolidBuildersInc
    @SolidBuildersInc 8 months ago

    OK, this was a jewel of a find, and I couldn't have said it better than you did in your closing.
    Up until now I had been looking for a local model solution to run the agents, and could only find
    OpenAI for LangGraph.
    Your description of how each node performs a specific discipline, with a Boolean deciding whether to proceed to the next step, really simplifies the experience.
    I have been asking myself how this is really just writing code with some functions, so why is it so exciting?
    It's basically integrating the AI into the everyday coding sandbox we are so familiar with and giving it that Python syntactic-sugar experience.
    Anyway, I really, really appreciate you sharing this approach. It is so straightforward and clean to just build whatever you want, and maybe reuse the code for other models instead of agents.
    It's endless.

  • @Egorfreeman
    @Egorfreeman 10 months ago +3

    This is a very helpful and practical video. I would be interested in seeing an implementation of a chat application using LangGraph and FastAPI.

  • @donb5521
    @donb5521 11 months ago +2

    Thanks for another great video and notebook driven tutorial. Appreciate the heads up on JSON mode in Ollama - a lot of great functionality built into their API.
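
    For anyone curious, the JSON mode mentioned here is exposed through the format parameter of LangChain's Ollama chat wrapper. A minimal sketch, assuming Ollama is running locally with the mistral model pulled:

    =====================
    from langchain_community.chat_models import ChatOllama

    # format="json" asks Ollama to constrain the output to valid JSON,
    # which the video relies on for the grading steps.
    llm = ChatOllama(model="mistral", format="json", temperature=0)
    result = llm.invoke("Return a JSON object with one key, 'score', set to 'yes' or 'no'.")
    print(result.content)
    =====================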

  • @JoshuaMcQueen
    @JoshuaMcQueen 11 months ago +3

    Your videos continue to be pure gold. Keep 'em coming!

  • @jofus521
    @jofus521 8 months ago

    Thank you. Wow. This example is even in the TS/JS repo. That is so awesome!

  • @jim02377
    @jim02377 10 months ago +2

    Same question on the different configurations (the Windows 11 PC is actually an i7): "Why is the sky blue?" It took 3 minutes on the i7, 45 seconds on the old server, and 17 seconds on the Mac.

    • @roscatres
      @roscatres 10 months ago

      I assume only the Mac has a GPU?

    • @jim02377
      @jim02377 10 months ago

      @@roscatres That is correct. I can't get a card for the old server, and the battery life with an NVIDIA card in my laptop would have been terrible. I have heard the Mac has a GPU and I have heard it has a neural processor. I am not sure what the M2 has, but it works well, especially for only having 8 GB of RAM.

  • @MikewasG
    @MikewasG 11 months ago +1

    Thank you very much for your efforts, your videos are very helpful to me!

  • @aladinmovies
    @aladinmovies 9 months ago +1

    Good explanation; stays strictly on topic.

  • @TournamentPoker
    @TournamentPoker 11 months ago +1

    Excellent video! Thanks for sharing!

  • @nguyenanhnguyen7658
    @nguyenanhnguyen7658 7 months ago

    A rerank model trained on a relevant dataset will help, and that is the most important thing.

  • @shakilkhan4306
    @shakilkhan4306 10 months ago +1

    Fantastic job, bro. Thank you.

  • @preben01
    @preben01 11 months ago +1

    This was sooooo useful, thank you so much for this.

  • @kevinkawchak
    @kevinkawchak 8 months ago

    Thank you for the discussion.

  • @kenchang3456
    @kenchang3456 9 months ago

    Thanks for this. I see the value and now I have a way to toy around with it and learn.

  • @nityasingh3
    @nityasingh3 11 months ago +1

    Awesome explanation

  • @howechan4818
    @howechan4818 10 months ago +1

    I really appreciate it.

  • @harshgupta3641
    @harshgupta3641 9 months ago

    Great explanation. Why did we use a graph at the end to make the process work? Can we do this in a procedural or functional way?
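
    For what it's worth, the same control flow can be written procedurally; the graph mainly adds explicit state, tracing, and easy rewiring. A minimal sketch, where retriever, grade, web_search, format_docs, and rag_chain stand in for the components built in the video:

    =====================
    def corrective_rag(question: str) -> str:
        # Retrieve, then keep only documents the grader marks relevant.
        documents = retriever.get_relevant_documents(question)
        relevant = [d for d in documents if grade(question, d) == "yes"]

        # If any document failed the grade, supplement with web search.
        if len(relevant) < len(documents):
            relevant += web_search(question)

        # Generate the answer from the filtered context.
        return rag_chain.invoke({"context": format_docs(relevant), "question": question})
    =====================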

  • @pich1337
    @pich1337 10 months ago +1

    Awesome! Thanks!

  • @jim02377
    @jim02377 10 months ago +5

    I ran it on an i5 PC running Windows 11 with 16 GB of RAM and it can process 1 token per second. It is better with 32 GB of RAM. I am trying an M2 Mac with 8 GB next, and an old i5 server with 80 GB of RAM.

    • @r.lancemartin7992
      @r.lancemartin7992 10 months ago

      Got it. I run on an M2 with 32 GB of RAM.

    • @jofus521
      @jofus521 8 months ago

      How about an i7 with 128 GB of RAM and 48 GB of VRAM?

  • @wuhaipeng
    @wuhaipeng 11 months ago +1

    Great sharing!

  • @klncgty
    @klncgty 5 months ago

    For example, I want to set up a RAG system. Using this RAG and LLM, it will answer my query from the data in the PDF. But I get different results every time. How can I make this consistent?
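
    A common first step toward run-to-run consistency is to turn off sampling randomness by setting the temperature to 0, so the model decodes greedily. A minimal sketch, assuming LangChain's Ollama wrapper (Ollama also accepts a seed option for further control):

    =====================
    from langchain_community.chat_models import ChatOllama

    # temperature=0 makes decoding (near-)deterministic for the same
    # prompt and retrieved context.
    llm = ChatOllama(model="mistral", temperature=0)
    =====================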

  • @eointolster
    @eointolster 10 months ago +1

    Just starting the video, but can we do RAG on documents with checkboxes? Like, can we retrieve an answer as to whether a checkbox is ticked? Thank you.

    • @r.lancemartin7992
      @r.lancemartin7992 9 months ago

      (This is Lance from the video.) I've never tested checkbox state in documents. If it's in an image, multi-modal RAG would work.

  • @TheBodybuildingG
    @TheBodybuildingG 2 months ago

    I did not get one thing… is the web search triggered even if only one of the documents is considered "not relevant"?

  • @mohamedkeddache4202
    @mohamedkeddache4202 8 months ago

    How can I see the source of the web search?
    In LangSmith I see the documents that are retrieved, but I didn't find the source website they were retrieved from.

  • @lucianst.2444
    @lucianst.2444 11 months ago +1

    Awesome explanation. Thank you! But I have one question. I'm pretty new to this field. Why did you choose to use mistral instruct instead of mistral?

    • @shashank1062
      @shashank1062 10 months ago +1

      The instruct version of the model is fine-tuned for conversation and question answering.
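
      In Ollama the two variants are separate tags, so switching is just a model-name change. A minimal sketch, assuming the tag names in the Ollama library:

      =====================
      from langchain_community.chat_models import ChatOllama

      # "mistral:instruct" is the chat/QA fine-tune; the base model is a
      # raw text-completion model and follows instructions less reliably.
      llm = ChatOllama(model="mistral:instruct", temperature=0)
      =====================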

  • @jma7889
    @jma7889 11 months ago +1

    In a production environment, what kind of GPU machine do you recommend for a low-usage case? Can it run on a CPU-only machine? Thank you!

    • @aladinmovies
      @aladinmovies 9 months ago +1

      You can run it with only a CPU; 8 GB of RAM is fine, but 16 GB or more is better.

  • @Jeganbaskaran
    @Jeganbaskaran 10 months ago

    Any idea why LangChain does not integrate with LLMLingua? Is there any roadmap to implement this?

  • @hari8568
    @hari8568 6 months ago

    Is web search called when even one context is irrelevant? Or does it have to be that all are irrelevant?
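
    In the notebook's grading node, the search flag flips as soon as any single document fails the relevance grade, so one irrelevant document is enough to trigger web search. A minimal sketch of that logic, with grade standing in for the grader chain:

    =====================
    def grade_documents(state):
        question, documents = state["question"], state["documents"]
        filtered, search = [], "No"
        for d in documents:
            if grade(question, d) == "yes":
                filtered.append(d)   # keep the relevant docs
            else:
                search = "Yes"       # one failed grade triggers web search
        return {"documents": filtered, "question": question, "search": search}
    =====================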

  • @drm2005
    @drm2005 10 months ago

    How can I use an API with LangChain for my RAG? I found only the OpenAI API.

  • @MidSeasonGroup
    @MidSeasonGroup 11 months ago +1

    How close can a RAG-enabled agent come to automating actual software development? PDF review and summarization are great, but actual application development is what I want.

    • @r.lancemartin7992
      @r.lancemartin7992 10 months ago +1

      Do you mean that you want a coding assistant? We have a project coming out soon focused on this; it can ingest a codebase and perform question-answering with executable code.

    • @MidSeasonGroup
      @MidSeasonGroup 10 months ago

      @@r.lancemartin7992 Hi Lance, thanks for getting back to me. That's correct: an effective coding assistant for new and existing codebases. So many of our applications need to be refactored or rewritten altogether. I'm glad this is on your roadmap, and I'm looking forward to the beta.

    • @MidSeasonGroup
      @MidSeasonGroup 10 months ago

      @@r.lancemartin7992 Yes, and when will the project be in beta?

  • @binstitus3909
    @binstitus3909 10 months ago +1

    Can we build this application for production using the same approach as you do, downloading the Mistral model? Can I use this in production, or should I use APIs to access the model?

    • @choiswimmer
      @choiswimmer 9 months ago +1

      If you are here asking LangChain this question, it means you fundamentally don't understand what it takes to deploy these models.

    • @r.lancemartin7992
      @r.lancemartin7992 9 months ago

      (This is Lance from the video.) In production, it is typical to use an API unless you are deploying the model yourself.

  • @chitrakshakaushik3266
    @chitrakshakaushik3266 10 months ago

    It doesn't work on Windows: when we import WebBaseLoader, it throws the error "No module named 'pwd'".

  • @laurenstaples7778
    @laurenstaples7778 8 months ago

    Is anyone else trying this out on a Windows machine? I'm getting the error "RuntimeError: Failed to generate embeddings: locale::facet::_S_create_c_locale name not valid" in the Index section code. It seems to be a Windows issue.

    • @laurenstaples7778
      @laurenstaples7778 8 months ago

      I ended up installing Windows Subsystem for Linux, ran everything in the Linux environment, and got it to run there!

  • @ganeshkamath89
    @ganeshkamath89 9 months ago

    I think the function should have returned a Yes/No mapped to each specific question instead of an overall Yes/No.
    Then the calling function should have done a web search only for the questions that got a No.

  • @roi365
    @roi365 9 months ago

    How is it local if we need API keys?

    • @r.lancemartin7992
      @r.lancemartin7992 9 months ago

      (This is Lance from the video.) API keys are only needed if you are using the Mistral API, not for local models.

  • @akshatsingh6036
    @akshatsingh6036 10 months ago

    You are not using the format_docs function in the generate function.
    =====================
    your code:
    generation = rag_chain.invoke({"context": documents, "question": question})
    =====================
    Should be this:
    generation = rag_chain.invoke({"context": format_docs(documents), "question": question})
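
    For context, format_docs in this kind of notebook is usually the small helper that joins the retrieved documents into a single prompt string, along these lines:

    =====================
    def format_docs(docs):
        # Concatenate the page contents of the retrieved documents,
        # separated by blank lines, for insertion into the prompt.
        return "\n\n".join(doc.page_content for doc in docs)
    =====================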

    • @r.lancemartin7992
      @r.lancemartin7992 9 months ago

      (This is Lance from the video.) Thanks, I'm updating the notebook and will fix this.

  • @jawadmansoor6064
    @jawadmansoor6064 10 months ago

    What's with the weird logo? I liked the parrot; now it's a pirate ninja.

  • @GeoffLadwig
    @GeoffLadwig 10 months ago

    nice

  • @EddyLeeKhane
    @EddyLeeKhane 10 months ago

    Does anyone else have the issue that it runs indefinitely in the Jupyter notebook when asking bigger questions or letting it compare?
    My prompt:
    "Find us two more related papers and give a short summary for each"

    • @r.lancemartin7992
      @r.lancemartin7992 9 months ago

      (This is Lance from the video.) Interesting, I haven't seen that. I'm going to update the notebook with a few things. If it's easy to share your code, I can have a look.