Locally-hosted, offline LLM w/LlamaIndex + OPT (open source, instruction-tuning LLM)

  • Published: 11 Dec 2024

Comments • 86

  • @gcsk1982
    @gcsk1982 1 year ago +2

    Love the video. Very helpful. I like your style of adding in specific details, but it could certainly be a little slower.

    • @SamuelChan
      @SamuelChan  1 year ago

      Thank you! I'm a total amateur at this video thing and there's plenty of room for improvement, so I really appreciate the feedback.

  • @deathstar4794
    @deathstar4794 1 year ago +3

    I have tried the local LLM tools privateGPT and GPT4All; their response time and accuracy were both terrible. Response time was 40-90 seconds with a local index, and the model size is 8 GB.

    • @SamuelChan
      @SamuelChan  1 year ago +2

      You're running it locally (downloaded the model onto your machine) on a GPU?
      PrivateGPT isn't an LLM, it's a library, so when you say it's terrible, that's down to the underlying model, not the fault of privateGPT. What model did you choose, and could you go with a bigger model? :)

  • @c0mpuipf
    @c0mpuipf 8 months ago

    Crazy how none of this is relevant anymore; I get all sorts of errors after installing llama_index (newest version as of today) and see many deprecations - PromptHelper and others. This has been a nice lesson though, I'm subscribed now.

  • @remoteree
    @remoteree 1 year ago +1

    Excellent video, it was really easy to follow along!

  • @hamtsammich
    @hamtsammich 1 year ago +1

    So, how do I import PDFs?

  • @AstronautAJC
    @AstronautAJC 1 year ago +2

    When I try this I am met with "AttributeError: 'GPTListIndex' object has no attribute 'query'" or an AttributeError for "save_to_disk".
    Any help??

    • @SamuelChan
      @SamuelChan  1 year ago

      Hey, have you compared your code to the one on GitHub? Without seeing your code I can't tell, but the gist of it:
      index = GPTListIndex.from_documents(docs, service_context=service_context)
      index.query()
      Note that .query() is a method, not an attribute, so make sure you did not leave out the parentheses!
      Here's the full code for this video (and for the rest of the LLM tutorial series on my channel):
      github.com/onlyphantom/llm-python/blob/main/7_custom.py

    • @larawehbee
      @larawehbee 1 year ago +2

      For me, I did the following in execute_query():
      query_engine = index.as_query_engine()
      response = query_engine.query("...")
      and it worked.

    • @SamuelChan
      @SamuelChan  1 year ago +1

      @@larawehbee yes, this works too. This is also the more common way I use the query engine. The error message indicates that OP is calling query as an attribute, so I'm guessing it's the missing () parentheses indicating a method call.

    • @AstronautAJC
      @AstronautAJC 1 year ago

      @@SamuelChan This is my code right now for that chunk, so to speak:
      @timeit()
      def execute_query():
          response = index.query("What did the president say?"
              # exclude_keywords=[""],
              # required_keywords=[""],
          )
          # response_mode="no_text"
          return response

    • @SamuelChan
      @SamuelChan  1 year ago

      From the error message it looks like you left out the parentheses when calling query().
      Can you confirm? :) Otherwise I'll need you to upload the code to GitHub so I can hop in and edit it for you / give you more targeted troubleshooting.

  • @sebastiangoslin6736
    @sebastiangoslin6736 1 year ago +2

    Hey man, great video! You mentioned, starting at 26:25, chaining another LLM to handle the natural language generation (not using ChatGPT tokens). How would you do this? I followed your process from here and GitHub with my own LLM, and when I get to the part that outputs the response, I'm wondering how to pass my own local LLM instead of defaulting to OpenAI, because as far as I understand it isn't "completely" local and offline. Is it just a matter of passing a re-instantiated model, or something more complex? Looking forward to more content, and hopefully you can include this in a new video!

    • @SamuelChan
      @SamuelChan  1 year ago +3

      Hey Sebastian, thank you! There are a number of open source LLMs on Hugging Face that you can pull down to do the text generation. I believe video 3 in my LLM playlist covers this scenario:
      ruclips.net/p/PLXsFtK46HZxUQERRbOmuGoqbMD-KWLkOS
      I trimmed this video down to fit within a specific timeframe and kind of regretted it afterwards. In the untruncated version, it is demonstrably clear that it runs off your local machine. If you inspect my GitHub code, the top of the script shows where the local path of this model can be stored (the cache location).
      My recording process usually takes me down these long tangents, and then I realize 90% of viewers probably don't care and find it boring, so I remove those parts. Another way to confirm this is to enable logging (import logging) and then log your token usage - this verifies the same idea.
      I am on video 8 of the LLM playlist now, focusing on automating away the aspects of my life that are the most important / time consuming. Video 8 is about using LlamaIndex and LangChain to build myself the perfect language learning app, and video 9 is about making my personal diary "chattable / queryable". For video 10 I'll try to revisit this and maybe do a more elaborate example on this subject as a kind of follow-up. Will have to see if I can come up with good use-cases in my daily life though! :)
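      A minimal sketch of what that swap could look like, assuming the older llama_index API used in this series (the model name and the "data" folder are illustrative, not from the video):
      from transformers import pipeline
      from langchain.llms import HuggingFacePipeline
      from llama_index import LLMPredictor, ServiceContext, SimpleDirectoryReader, GPTListIndex
      # a local text-generation model pulled from Hugging Face; it is cached on disk after the first download
      generator = pipeline("text-generation", model="facebook/opt-1.3b", max_new_tokens=256)
      local_llm = LLMPredictor(llm=HuggingFacePipeline(pipeline=generator))
      # pass the local LLM in via the service context so text generation does not call OpenAI
      service_context = ServiceContext.from_defaults(llm_predictor=local_llm)
      docs = SimpleDirectoryReader("data").load_data()
      index = GPTListIndex.from_documents(docs, service_context=service_context)
      print(index.as_query_engine().query("What did the president say?"))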

    • @EmilTheLongshoreman
      @EmilTheLongshoreman 1 year ago +2

      I am also interested in this solution!

    • @SamuelChan
      @SamuelChan  1 year ago +1

      I’ll see what I can do Nicholas!

    • @EmilTheLongshoreman
      @EmilTheLongshoreman 1 year ago +2

      I found a solution that's working for me - put this in the if __name__ == "__main__" section:
      if not os.path.exists(filename):
          print("No local cache. Downloading embeddings from huggingface...")
          index = create_index()
          index.save_to_disk(filename)
      else:
          print("Loading cached embeddings...")
          # instantiating the llm here avoids having to use an OpenAI key
          llm = LLMPredictor(llm=LocalLLM())
          service_context = ServiceContext.from_defaults(llm_predictor=llm, prompt_helper=prompt_helper)
          index = GPTListIndex.load_from_disk(filename, service_context=service_context)

    • @SamuelChan
      @SamuelChan  1 year ago +1

      Thank you for sharing Nicholas! If you’re open to submitting a PR to the repo with the suggestion above I’d accept it! 😊

  • @3nityC
    @3nityC 1 year ago

    What is the biggest LLM currently available that I can use this way and embed into my WordPress site? Any tutorial you know of? I have to finish my project ASAP.

  • @haidara77
    @haidara77 1 year ago +1

    Hey man! Just wondering: since this LLM runs locally without needing to look things up online, does it work for schoolwork when you don't have internet access?

    • @SamuelChan
      @SamuelChan  1 year ago

      Yeah, we can download the whole LLM and its corresponding weights to a local drive, so there's no data transfer between your machine and a remote machine. No connection required.
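      A rough sketch of that workflow with the transformers library (the model name here is just an example):
      from transformers import AutoTokenizer, AutoModelForCausalLM
      model_name = "facebook/opt-1.3b"
      # first run (online): weights are downloaded and cached under ~/.cache/huggingface
      tokenizer = AutoTokenizer.from_pretrained(model_name)
      model = AutoModelForCausalLM.from_pretrained(model_name)
      # later runs (offline): read strictly from the local cache, no network access needed
      tokenizer = AutoTokenizer.from_pretrained(model_name, local_files_only=True)
      model = AutoModelForCausalLM.from_pretrained(model_name, local_files_only=True)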

  • @ScorpoxOfficial
    @ScorpoxOfficial 1 year ago

    I seem to get issues with index = GPTListIndex.load_from_disk(...):
    AttributeError: type object 'ListIndex' has no attribute 'load_from_disk', and similarly for save_to_disk. There have been some API changes; I tried to modify the code accordingly but something is still off. Any solution? Thanks!

    • @sinedeiras
      @sinedeiras 10 months ago

      check another thread from @AstronautAJC below
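      For anyone on a newer llama_index release: save_to_disk / load_from_disk were replaced by a storage context. A hedged sketch of the newer equivalent (the "./storage" path is arbitrary):
      from llama_index import StorageContext, load_index_from_storage
      # persist the index (replaces index.save_to_disk(filename))
      index.storage_context.persist(persist_dir="./storage")
      # load it back later (replaces GPTListIndex.load_from_disk(filename))
      storage_context = StorageContext.from_defaults(persist_dir="./storage")
      index = load_index_from_storage(storage_context)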

  • @johannamenges3095
    @johannamenges3095 1 year ago +1

    Great video! Is it also possible to add private data and train the LLM on my own data?

    • @SamuelChan
      @SamuelChan  1 year ago

      Yeah absolutely.
      The entire LLM series is about building with your own data! Check out some of the videos in this playlist:
      LangChain & LLM tutorials (ft. gpt3, chatgpt, llamaindex, chroma)
      ruclips.net/p/PLXsFtK46HZxUQERRbOmuGoqbMD-KWLkOS
      Hope that’s helpful! 😊

  • @MohitSingh-ij5vq
    @MohitSingh-ij5vq 1 year ago

    Thanks for uploading the video

  • @eduardmart1237
    @eduardmart1237 1 year ago +1

    How to teach it on custom data?

    • @SamuelChan
      @SamuelChan  1 year ago

      Hey Eduard, all the videos in this playlist use LLMs on custom data! :) We instruct the code to create embeddings out of our own private data and then ask questions about it.
      LangChain & LLM tutorials (ft. gpt3, chatgpt, llamaindex, chroma)
      ruclips.net/p/PLXsFtK46HZxUQERRbOmuGoqbMD-KWLkOS

  • @JohnKwan-e9h
    @JohnKwan-e9h 1 year ago +1

    Hey, nice video. Can you do a tutorial on how to implement an offline, ChatGPT-like LLM on a mobile device (iOS)? Using something like MobileBERT/MediaPipe to run an LLM on a mobile device, with summarization/Q&A on data/files within the device, without internet access. Thanks!

    • @SamuelChan
      @SamuelChan  1 year ago

      Mobile development is a little out of my reach; I dabbled in Swift for iOS programming a little back when Swift first came out but haven’t done much since 😬

    • @AstronautAJC
      @AstronautAJC 1 year ago +2

      I MIGHT be able to help you with this, but I would use Flutter, as it can work with Python.

    • @SamuelChan
      @SamuelChan  1 year ago +1

      @astronaut I might take you up on that offer too! Exciting times!

    • @JohnKwan-e9h
      @JohnKwan-e9h 1 year ago +1

      @@AstronautAJC A tutorial on this would be much appreciated! Ideally using Flutter would work better for cross-platform...

  • @jamiesmith8927
    @jamiesmith8927 1 year ago +1

    Great video, and you are a very good communicator. One question I have is: how is the OPT model stored locally here? I can't see where the model is kept in local storage. I see that you provide the model name when you create the pipeline, but I can't see where the model is retrieved from locally.

    • @SamuelChan
      @SamuelChan  1 year ago +2

      Hey Jamie, thank you!
      The OPT model is stored locally in ~/.cache/huggingface/hub by default (I'm using Linux, so this may be OS-dependent). You will find all downloaded models stored in this cache folder, and they are loaded from here whenever they are needed again.
      For the OPT model, the 60 GB size may make that infeasible and you might have to change the cache location. You do so with:
      # os.environ["TRANSFORMERS_CACHE"] = "/media/samuel/external_drive/transformers_cache"
      Every time your code runs, it looks in that location instead.
      Hope this helps! The sample code is also in my GitHub repo (the first 7 lines of code cover this as well!)
      github.com/onlyphantom/llm-python/blob/main/7_custom.py

    • @jamiesmith8927
      @jamiesmith8927 1 year ago

      @@SamuelChan Thank you very much, I understand now

    • @SamuelChan
      @SamuelChan  1 year ago +1

      Jamie, thank you! Brings joy to know the work we do is useful to others - so thank you! 😊

    • @ratralf4738
      @ratralf4738 1 year ago +1

      @@SamuelChan Where does it store the local model on Windows? I'm using Windows 11. And will this code download the model, or can I download it myself and store it in a local folder to have better control?

    • @SamuelChan
      @SamuelChan  1 year ago

      @@ratralf4738 yeah you can change where you want to download / store / source the model:
      export TRANSFORMERS_CACHE=/whatever/path/you/want
      My comment above also points to line 7 of the code provided in the repo so you can just change that line! :)
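      On Windows the default cache is typically under C:\Users\<you>\.cache\huggingface\hub, and the same override works there; a small sketch (the D:\ path is just an example):
      import os
      # must be set before transformers is imported, otherwise the default cache location is used
      os.environ["TRANSFORMERS_CACHE"] = r"D:\transformers_cache"
      from transformers import AutoModelForCausalLM
      model = AutoModelForCausalLM.from_pretrained("facebook/opt-1.3b")  # downloads into D:\transformers_cache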

  • @drsamhuygens
    @drsamhuygens 1 year ago

    Awesome video. Can I use this model in Haystack to generate embeddings for a large number of documents?

    • @SamuelChan
      @SamuelChan  1 year ago

      Yes, but that's perhaps a task better done through a Haystack Agent than through LlamaIndex or LangChain; haystack.deepset.ai/blog/introducing-haystack-agents
      I poked around Haystack and this is the code (line 59):
      github.com/deepset-ai/haystack/blob/7c5f9313ff5eedf2b40e6211e3d41f2f9f134ba3/haystack/nodes/prompt/providers.py#L59 (The implementation can be a simple invocation on the underlying model running in a local runtime, or could even be remote)
      Your generated embeddings could still be stored in something like Pinecone, Milvus, Chroma etc, and the underlying LLM could be a local LLM you downloaded off Hugging Face (like shown in this video).
      Hope that helps!

    • @drsamhuygens
      @drsamhuygens 1 year ago

      @@SamuelChan Thanks. Can't wait to watch the rest of your videos :)

    • @larawehbee
      @larawehbee 1 year ago

      @@SamuelChan Is there a video on integrating LLaMA/Alpaca with Haystack to do semantic search over custom docs?

    • @SamuelChan
      @SamuelChan  1 year ago

      Hey Lara, the whole LLM series (especially the How Embeddings Work video) goes through lots of examples - not directly with Haystack, but it is quite transferable. When I have time I'll work on one with Haystack as the search engine! I'm traveling quite a bit so I can't commit to anything yet :)

    • @larawehbee
      @larawehbee 1 year ago

      @@SamuelChan Amazing! Thank you so much for the valuable information and content. looking forward to future videos! Best of luck

  • @larawehbee
    @larawehbee 1 year ago

    Great video! It is super informative, thanks. However, how can I use the indexer without an OpenAI API key? I want a fully offline, on-premises solution with no need for an OpenAI API key.

    • @SamuelChan
      @SamuelChan  1 year ago +1

      Hey, take a look at Nicholas' suggestion in the comment section. He also made a pull request to the repo, so if you're using the code from my GitHub repo you should be able to run this without the OpenAI requirement. The main change is to move the LLM instantiation call into the if __name__ == "__main__" block.

    • @larawehbee
      @larawehbee 1 year ago +1

      @@SamuelChan Thank you! Is there a way to integrate alpaca-lora / llama-7b-hf in Haystack and use it on-premises, offline, without any API key or internet connection?

    • @SamuelChan
      @SamuelChan  1 year ago

      Yes, possible. You will need a local model that does the embeddings (i.e. a Sentence Transformer, either through the Chroma interface or directly through the transformers module from Hugging Face). Through Chroma you won't need a key. Through the Hugging Face inference route you would still need a Hugging Face inference token.
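      A minimal sketch of fully local embeddings through Chroma (the model name and sample text are illustrative; no OpenAI or Hugging Face token involved):
      import chromadb
      from chromadb.utils import embedding_functions
      # sentence-transformers runs on your own machine, so no API key is required
      local_ef = embedding_functions.SentenceTransformerEmbeddingFunction(model_name="all-MiniLM-L6-v2")
      client = chromadb.Client()
      collection = client.create_collection("docs", embedding_function=local_ef)
      collection.add(documents=["Some text from your own private documents."], ids=["doc-1"])
      print(collection.query(query_texts=["What do my documents say?"], n_results=1))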

    • @larawehbee
      @larawehbee 1 year ago +1

      @@SamuelChan Great. Do you think it's good to go with the Chroma interface for a production instance? That is, it might need to be highly scalable.

    • @larawehbee
      @larawehbee 1 year ago +1

      And what do you recommend as an open source vector DB as an alternative to Pinecone?

  • @jobautomation
    @jobautomation 1 year ago

    Thaaanks!!

  • @hyperbolictimeacademy
    @hyperbolictimeacademy 1 year ago

    Nice mechanical keyboard, good video.

    • @SamuelChan
      @SamuelChan  1 year ago

      Thank you! It’s a Keychron Q1 (brown switches) ⌨️

  • @RedShipsofSpainAgain
    @RedShipsofSpainAgain 1 year ago +1

    Great vid, Samuel. The OpenAI API charges based on the number of tokens and queries made to the LLM, right?
    How much does it cost to use and query the OpenAI API? This OPT model is free to use, so is there any disadvantage to using this OPT model vs the OpenAI GPT models?

    • @SamuelChan
      @SamuelChan  1 year ago +3

      OpenAI charges based on the number of tokens and the type of model you use
      (but not on the number of queries, meaning you might make one big query that costs more than 10 smaller queries).
      The two things that cost you money are:
      1. Embeddings: to generate numerical representations of your data, you can use OpenAI's embedding service. This costs money (based on the number of tokens) and has a max cutoff.
      2. Text generation: asking the LLM to generate new sentences.
      See openai.com/pricing - the more powerful the model, the more expensive it is. If you watch the "How Embeddings Work" video in this LLM series, I explain it in more detail.
      The OPT model is an open source counterpart to OpenAI's proprietary models. You have to run it on your local machine or rent some cloud computing resources to run it, which one might argue is a disadvantage. Depending on your view on privacy, it might actually be an advantage not having to send data over to OpenAI's servers.
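      To get a feel for what a call will cost before sending anything, you can count tokens locally; a small sketch with OpenAI's tiktoken tokenizer (the prompt string is just an example):
      import tiktoken
      enc = tiktoken.encoding_for_model("gpt-3.5-turbo")
      prompt = "Summarise what the president said about the economy."
      print(len(enc.encode(prompt)), "tokens")  # billing is per token; see openai.com/pricing for current rates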

    • @joaoalmeida4380
      @joaoalmeida4380 1 year ago

      Is there any open source model for chatbot/QA, apart from OpenAI's, that we can use without needing a ChatGPT API key? Thank you

    • @SamuelChan
      @SamuelChan  1 year ago +2

      If you use ChatGPT you'll always need an API key since it's an OpenAI product.
      You could use the open source LLMs from Hugging Face, and if you have the computational resources to run them locally (ideally with CUDA and enough VRAM) then that's worth a try too!

    • @joaoalmeida4380
      @joaoalmeida4380 1 year ago +1

      @@SamuelChan Thank you, I'll try buying ChatGPT API access to use it directly.

  • @abcd2574
    @abcd2574 1 year ago

    Do we require a GPU for this?
    Will this work on Mac M1/M2?

    • @SamuelChan
      @SamuelChan  1 year ago +1

      It doesn't require a CUDA driver / dedicated GPU.
      This will work on a Mac. If you have a CUDA-compatible graphics card, turn on the device="cuda" setting. Otherwise, turn it off and it works normally (but it will be slower since you're running on the CPU with integrated graphics).
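      A small sketch of that switch with a transformers pipeline (the model name is just an example; for pipelines, device=0 means the first CUDA GPU and device=-1 means CPU):
      import torch
      from transformers import pipeline
      device = 0 if torch.cuda.is_available() else -1
      generator = pipeline("text-generation", model="facebook/opt-1.3b", device=device)
      print(generator("Large language models are", max_new_tokens=20)[0]["generated_text"])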

    • @abcd2574
      @abcd2574 1 year ago

      @@SamuelChan thanks a lot

    • @abcd2574
      @abcd2574 1 year ago +1

      @@SamuelChan
      Thanks mate.
      One more doubt:
      actually, PyTorch code that would normally require CUDA can be triggered using Metal on the M2, right?

    • @SamuelChan
      @SamuelChan  1 year ago +1

      if torch.backends.mps.is_available():
          mps_device = torch.device("mps")
      This requires you to build PyTorch from source. MPS stands for Metal Performance Shaders.
      I have a full playlist on PyTorch with 6-8 videos if you’d like to learn more about PyTorch! :)

    • @abcd2574
      @abcd2574 1 year ago

      @@SamuelChan Superb
      thanks a lot Sam

  • @amigos786
    @amigos786 1 year ago

    Good video... had to watch it at 0.75x

    • @SamuelChan
      @SamuelChan  1 year ago

      Thank you! Yeah it’s something I have to work on :/

  • @rezukidesu778
    @rezukidesu778 1 year ago

    Hai mas Samuel (hi Samuel), I'm wondering, do you speak Indonesian? 😁
    Thanks for the video, it helped me complete my project.
    But I have a question. I've already tried the 'vicuna-7b-1.1' pretrained model with the same code you provide in this video, embedding it with new data.
    It ran successfully, but it seems the pretrained model inherits existing knowledge from the dataset that people used to fine-tune it.
    So my question: is it possible to remove this existing knowledge so the model only knows the new knowledge we give it when embedding documents?

    • @SamuelChan
      @SamuelChan  1 year ago +1

      Iya, bisa dong. Cuman gak sering pakai. (Yes, I can, I just don't use it often.)
      There are 2 approaches I can think of:
      1. Prompt engineering: specify in the prompt to use only the knowledge from xyz (your own data), and to say "I don't know" otherwise.
      2. More drastically, train your own base LLM with your own data (rather than using a pretrained model with fine-tuning on it). This is easier to do on platforms such as Cohere (I'm creating a series on this, but recording takes a lot of time and I have 3 full time jobs haha).

    • @rezukidesu778
      @rezukidesu778 1 year ago

      @@SamuelChan Beneran bisa ternyata 😁 (So you really can!)
      I see, prompt engineering can be one of the solutions; maybe I can start with that first.
      It's okay, just take your time. I'll be waiting for the content patiently :)

  • @AJ-jf5pm
    @AJ-jf5pm 1 year ago

    Nice playlist. Thanks Samuel!! I'm running into an issue while installing xformers:
    Collecting xformers
    Using cached xformers-0.0.20.tar.gz (7.6 MB)
    Installing build dependencies ... done
    Getting requirements to build wheel ... error
    error: subprocess-exited-with-error
    × Getting requirements to build wheel did not run successfully.
    │ exit code: 1
    ╰─> [21 lines of output]
    Traceback (most recent call last):
    File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 353, in
    main()
    File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 335, in main
    json_out['return_val'] = hook(**hook_input['kwargs'])
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 118, in get_requires_for_build_wheel
    return hook(config_settings)
    ^^^^^^^^^^^^^^^^^^^^^
    File "/private/var/folders/s5/3rhs38ts3jdgc1vbq5tl69jr0000gn/T/pip-build-env-rnu1pshr/overlay/lib/python3.11/site-packages/setuptools/build_meta.py", line 341, in get_requires_for_build_wheel
    return self._get_build_requires(config_settings, requirements=['wheel'])
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    File "/private/var/folders/s5/3rhs38ts3jdgc1vbq5tl69jr0000gn/T/pip-build-env-rnu1pshr/overlay/lib/python3.11/site-packages/setuptools/build_meta.py", line 323, in _get_build_requires
    self.run_setup()
    File "/private/var/folders/s5/3rhs38ts3jdgc1vbq5tl69jr0000gn/T/pip-build-env-rnu1pshr/overlay/lib/python3.11/site-packages/setuptools/build_meta.py", line 488, in run_setup
    self).run_setup(setup_script=setup_script)
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    File "/private/var/folders/s5/3rhs38ts3jdgc1vbq5tl69jr0000gn/T/pip-build-env-rnu1pshr/overlay/lib/python3.11/site-packages/setuptools/build_meta.py", line 338, in run_setup
    exec(code, locals())
    File "", line 23, in
    ModuleNotFoundError: No module named 'torch'
    [end of output]
    note: This error originates from a subprocess, and is likely not a problem with pip.
    error: subprocess-exited-with-error
    × Getting requirements to build wheel did not run successfully.
    │ exit code: 1
    ╰─> See above for output.
    note: This error originates from a subprocess, and is likely not a problem with pip....
    Though torch is installed. Any help will be appreciated!

    • @SamuelChan
      @SamuelChan  1 year ago

      Might be a conflicting version of packages. Are you doing this in a clean environment?
      If you're using venv or virtualenv, try creating a brand new environment, activating it, and then installing from the dependencies file (requirements.txt) with pip install -r requirements.txt. Let me know if that works!

    • @AJ-jf5pm
      @AJ-jf5pm 1 year ago +1

      @@SamuelChan Thank you for your help. A new virtual environment worked, but the query results still took upwards of 20 minutes to return. I am using a MacBook Pro M1 with 16GB of RAM and 200+GB of available storage. Here is the output from the console:
      Output
      Loading local cache of model
      INFO:llama_index.token_counter.token_counter:> [query] Total LLM token usage: 3651 tokens
      INFO:llama_index.token_counter.token_counter:> [query] Total embedding token usage: 0 tokens
      [1348.11438322 seconds]: f([]) -> Indonesia exports its coal to China in 2023.
      Indonesia exports its coal to China in 2023 ...
      Is it normal for the query results to take this long to return? If so, what sort of server would you recommend? If not, what could be the reason for the long wait time? Any pointers would be appreciated.

    • @SamuelChan
      @SamuelChan  1 year ago

      20 minutes sounds unfathomably long :/ a reasonable time is