Locally-hosted, offline LLM w/LlamaIndex + OPT (open source, instruction-tuning LLM)

  • Published: 11 Dec 2024

Comments • 86

  • @gcsk1982
    @gcsk1982 1 year ago +2

    Love the video. Very helpful. I like your style of adding in specific details, but it could certainly be a little slower.

    • @SamuelChan
      @SamuelChan  1 year ago

      Thank you! I'm a total amateur at this video thing and there's plenty of room for improvement, so I really appreciate the feedback.

  • @deathstar4794
    @deathstar4794 1 year ago +3

    I have tried the local LLM tools privateGPT and GPT4All; their response time and accuracy were both terrible. Response time was 40-90 seconds with a local index, and the model size is 8 GB.

    • @SamuelChan
      @SamuelChan  1 year ago +2

      You're running it locally (downloaded the model onto your machine) on a GPU?
      PrivateGPT isn't an LLM, it's a library, so when you say it's terrible, that's down to the underlying model, not the fault of privateGPT. What model did you choose, and could you go with a bigger model? :)

  • @c0mpuipf
    @c0mpuipf 8 months ago

    Crazy how none of this is relevant anymore; I get all sorts of errors after installing llama_index (newest version as of today) and see many deprecations - PromptHelper and others. This has been a nice lesson though, I'm subscribed now.

  • @remoteree
    @remoteree 1 year ago +1

    Excellent video, it was really easy to follow along!

  • @hamtsammich
    @hamtsammich 1 year ago +1

    So, how do I import PDFs?

  • @AstronautAJC
    @AstronautAJC 1 year ago +2

    When I try this I am met with "AttributeError: 'GPTListIndex' object has no attribute 'query'" or an AttributeError for "save_to_disk".
    Any help??

    • @SamuelChan
      @SamuelChan  1 year ago

      Hey, have you compared your code to the one on GitHub? Without seeing your code I can't tell, but the gist of it:
      index = GPTListIndex.from_documents(docs, service_context=service_context)
      index.query()
      Note that .query() is a method, not an attribute, so make sure you did not leave out the parentheses!
      Here's the full code for this video (and for the rest of the LLM tutorial series on my channel):
      github.com/onlyphantom/llm-python/blob/main/7_custom.py

    • @larawehbee
      @larawehbee 1 year ago +2

      For me, I did the following in execute_query():
      query_engine = index.as_query_engine()
      response = query_engine.query("...")
      and it worked.

    • @SamuelChan
      @SamuelChan  1 year ago +1

      @@larawehbee yes, this works too. This is also the more common way I use the query engine. The error message indicates that OP is calling query as an attribute, so I'm guessing it's the missing () parentheses indicating a method call.

    • @AstronautAJC
      @AstronautAJC 1 year ago

      @@SamuelChan This is my code right now for that chunk, so to speak:
      @timeit()
      def execute_query():
          response = index.query("What did the president say?"
              # exclude_keywords=[""],
              # required_keywords=[""],
          )
          # response_mode="no_text"
          return response

    • @SamuelChan
      @SamuelChan  1 year ago

      From the error message it looks like you left out the parentheses when calling query().
      Can you confirm? :) Otherwise I'll need you to upload the code to GitHub so I can hop in and edit it for you / give you more targeted troubleshooting.

  • @sebastiangoslin6736
    @sebastiangoslin6736 1 year ago +2

    Hey man, great video! You mentioned, starting at 26:25, chaining another LLM to handle the natural language generation (not using ChatGPT tokens). How would you do this? I followed your process from here and GitHub with my own LLM, and when I get to the part that outputs the response, I'm wondering how to pass my own local LLM instead of defaulting to OpenAI, because as far as I understand it isn't "completely" local and offline. Is it just a matter of passing a re-instantiated model, or something more complex? Looking forward to more content, and hopefully you can include this in a new video!

    • @SamuelChan
      @SamuelChan  1 year ago +3

      Hey Sebastian, thank you! There are a number of open source LLMs on Hugging Face that you can pull down to do the text generation. I believe video 3 in my LLM playlist covers this scenario:
      ruclips.net/p/PLXsFtK46HZxUQERRbOmuGoqbMD-KWLkOS
      I trimmed this video down to fit within a specific timeframe and kind of regretted it afterwards. In the untruncated version, it is demonstrably clear that it runs off your local machine. If you inspect my GitHub code, the top of the script shows where the local path of this model can be stored (the cache location).
      My recording process usually takes me down these long tangents, and then I realize 90% of viewers probably don't care and find it boring, so I remove those parts. Another way to confirm this is to enable logging (import logging) and then log your token usage - this verifies the same idea.
      I am on video 8 of the LLM playlist now, focusing on automating away the aspects of my life that are the most important / time consuming. Video 8 is about using LlamaIndex and LangChain to build myself the perfect language learning app, and video 9 is about making my personal diary "chattable / queryable". For video 10 I'll try to revisit this and maybe do a more elaborate example on this subject as a kind of follow-up. Will have to see if I can come up with good use-cases in my daily life though! :)
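      A minimal sketch of what that swap could look like, assuming the older llama_index API used in this series (the model name and the "data" folder are illustrative, not from the video):
      from transformers import pipeline
      from langchain.llms import HuggingFacePipeline
      from llama_index import LLMPredictor, ServiceContext, SimpleDirectoryReader, GPTListIndex
      # a local text-generation model pulled from Hugging Face; it is cached on disk after the first download
      generator = pipeline("text-generation", model="facebook/opt-1.3b", max_new_tokens=256)
      local_llm = LLMPredictor(llm=HuggingFacePipeline(pipeline=generator))
      # pass the local LLM in via the service context so text generation does not call OpenAI
      service_context = ServiceContext.from_defaults(llm_predictor=local_llm)
      docs = SimpleDirectoryReader("data").load_data()
      index = GPTListIndex.from_documents(docs, service_context=service_context)
      print(index.as_query_engine().query("What did the president say?"))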

    • @EmilTheLongshoreman
      @EmilTheLongshoreman 1 year ago +2

      I am also interested in this solution!

    • @SamuelChan
      @SamuelChan  1 year ago +1

      I’ll see what I can do Nicholas!

    • @EmilTheLongshoreman
      @EmilTheLongshoreman 1 year ago +2

      I found a solution that's working for me - put this in the if __name__ == "__main__" section:
      if not os.path.exists(filename):
          print("No local cache. Downloading embeddings from huggingface...")
          index = create_index()
          index.save_to_disk(filename)
      else:
          print("Loading cached embeddings...")
          # instantiating the llm here avoids having to use an OpenAI key
          llm = LLMPredictor(llm=LocalLLM())
          service_context = ServiceContext.from_defaults(llm_predictor=llm, prompt_helper=prompt_helper)
          index = GPTListIndex.load_from_disk(filename, service_context=service_context)

    • @SamuelChan
      @SamuelChan  1 year ago +1

      Thank you for sharing Nicholas! If you’re open to submitting a PR to the repo with the suggestion above I’d accept it! 😊

  • @3nityC
    @3nityC 1 year ago

    What is the biggest LLM currently available that I can use this way and embed into my WordPress site? Any tutorial you know of? I have to finish my project ASAP.

  • @haidara77
    @haidara77 1 year ago +1

    Hey man! Just wondering: since this LLM runs locally without needing to look things up online, does it work for schoolwork when you don't have internet access?

    • @SamuelChan
      @SamuelChan  1 year ago

      Yeah, we can download the whole LLM and its corresponding weights to a local drive, so there's no data transfer between your machine and a remote machine. No connection required.
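      A rough sketch of that workflow with the transformers library (the model name here is just an example):
      from transformers import AutoTokenizer, AutoModelForCausalLM
      model_name = "facebook/opt-1.3b"
      # first run (online): weights are downloaded and cached under ~/.cache/huggingface
      tokenizer = AutoTokenizer.from_pretrained(model_name)
      model = AutoModelForCausalLM.from_pretrained(model_name)
      # later runs (offline): read strictly from the local cache, no network access needed
      tokenizer = AutoTokenizer.from_pretrained(model_name, local_files_only=True)
      model = AutoModelForCausalLM.from_pretrained(model_name, local_files_only=True)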

  • @ScorpoxOfficial
    @ScorpoxOfficial 1 year ago

    I seem to get issues with index = GPTListIndex.load_from_disk(...):
    AttributeError: type object 'ListIndex' has no attribute 'load_from_disk', and similarly for save_to_disk. There have been some API changes; I tried to modify the code accordingly but something is still off. Any solution? Thanks!

    • @sinedeiras
      @sinedeiras 10 months ago

      check another thread from @AstronautAJC below
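      For anyone on a newer llama_index release: save_to_disk / load_from_disk were replaced by a storage context. A hedged sketch of the newer equivalent (the "./storage" path is arbitrary):
      from llama_index import StorageContext, load_index_from_storage
      # persist the index (replaces index.save_to_disk(filename))
      index.storage_context.persist(persist_dir="./storage")
      # load it back later (replaces GPTListIndex.load_from_disk(filename))
      storage_context = StorageContext.from_defaults(persist_dir="./storage")
      index = load_index_from_storage(storage_context)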

  • @johannamenges3095
    @johannamenges3095 1 year ago +1

    Great video! Is it also possible to add private data and train the LLM on my own data?

    • @SamuelChan
      @SamuelChan  1 year ago

      Yeah absolutely.
      The entire LLM series is about building with your own data! Check out some of the videos in this playlist:
      LangChain & LLM tutorials (ft. gpt3, chatgpt, llamaindex, chroma)
      ruclips.net/p/PLXsFtK46HZxUQERRbOmuGoqbMD-KWLkOS
      Hope that’s helpful! 😊

  • @MohitSingh-ij5vq
    @MohitSingh-ij5vq 1 year ago

    Thanks for uploading the video

  • @eduardmart1237
    @eduardmart1237 1 year ago +1

    How to teach it on custom data?

    • @SamuelChan
      @SamuelChan  1 year ago

      Hey Eduard, all the videos in this playlist use LLMs on custom data! :) We instruct the code to create embeddings out of our own private data and then ask questions about it.
      LangChain & LLM tutorials (ft. gpt3, chatgpt, llamaindex, chroma)
      ruclips.net/p/PLXsFtK46HZxUQERRbOmuGoqbMD-KWLkOS

  • @JohnKwan-e9h
    @JohnKwan-e9h 1 year ago +1

    Hey, nice video. Can you do a tutorial on how to implement an offline, ChatGPT-like LLM on a mobile device (iOS)? Using something like MobileBERT/MediaPipe to run an LLM on a mobile device, with summarization/Q&A on data/files within the device, without internet access. Thanks!

    • @SamuelChan
      @SamuelChan  1 year ago

      Mobile development is a little out of my reach; I dabbled in Swift for iOS programming a little back when Swift first came out but haven’t done much since 😬

    • @AstronautAJC
      @AstronautAJC 1 year ago +2

      I MIGHT be able to help you with this, but I would use Flutter, as it can work with Python.

    • @SamuelChan
      @SamuelChan  1 year ago +1

      @astronaut I might take you up on that offer too! Exciting times!

    • @JohnKwan-e9h
      @JohnKwan-e9h 1 year ago +1

      @@AstronautAJC A tutorial on this would be much appreciated! Ideally using Flutter would work better for cross-platform...

  • @jamiesmith8927
    @jamiesmith8927 1 year ago +1

    Great video, and you are a very good communicator. One question I have is: how is the OPT model stored locally here? I can't see where the model is kept in local storage. I see that you provide the model name when you create the pipeline, but I can't see where the model is retrieved from locally.

    • @SamuelChan
      @SamuelChan  1 year ago +2

      Hey Jamie, thank you!
      The OPT model is stored locally in ~/.cache/huggingface/hub by default (I'm using Linux, so this may be OS-dependent). You will find all downloaded models stored in this cache folder, and they are loaded from here whenever they are needed again.
      For the OPT model, the 60 GB size may make that infeasible and you might have to change the cache location. You do so with:
      # os.environ["TRANSFORMERS_CACHE"] = "/media/samuel/external_drive/transformers_cache"
      Every time your code runs, it looks in that location instead.
      Hope this helps! The sample code is also in my GitHub repo (the first 7 lines of code cover this as well!)
      github.com/onlyphantom/llm-python/blob/main/7_custom.py

    • @jamiesmith8927
      @jamiesmith8927 1 year ago

      @@SamuelChan Thank you very much, I understand now

    • @SamuelChan
      @SamuelChan  1 year ago +1

      Jamie, thank you! Brings joy to know the work we do is useful to others - so thank you! 😊

    • @ratralf4738
      @ratralf4738 1 year ago +1

      @@SamuelChan Where does it store the local model on Windows? I'm using Windows 11. And will this code download the model, or can I download it myself and store it in a local folder to have better control?

    • @SamuelChan
      @SamuelChan  1 year ago

      @@ratralf4738 yeah you can change where you want to download / store / source the model:
      export TRANSFORMERS_CACHE=/whatever/path/you/want
      My comment above also points to line 7 of the code provided in the repo so you can just change that line! :)
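      On Windows the default cache is typically under C:\Users\<you>\.cache\huggingface\hub, and the same override works there; a small sketch (the D:\ path is just an example):
      import os
      # must be set before transformers is imported, otherwise the default cache location is used
      os.environ["TRANSFORMERS_CACHE"] = r"D:\transformers_cache"
      from transformers import AutoModelForCausalLM
      model = AutoModelForCausalLM.from_pretrained("facebook/opt-1.3b")  # downloads into D:\transformers_cache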

  • @drsamhuygens
    @drsamhuygens 1 year ago

    Awesome video. Can I use this model in Haystack to generate embeddings for a large number of documents?

    • @SamuelChan
      @SamuelChan  1 year ago

      Yes, but that's perhaps a task better done through a Haystack Agent than through LlamaIndex or LangChain; haystack.deepset.ai/blog/introducing-haystack-agents
      I poked around Haystack and this is the code (line 59):
      github.com/deepset-ai/haystack/blob/7c5f9313ff5eedf2b40e6211e3d41f2f9f134ba3/haystack/nodes/prompt/providers.py#L59 (The implementation can be a simple invocation on the underlying model running in a local runtime, or could even be remote)
      Your generated embeddings could still be stored in something like Pinecone, Milvus, Chroma etc, and the underlying LLM could be a local LLM you downloaded off Hugging Face (like shown in this video).
      Hope that helps!

    • @drsamhuygens
      @drsamhuygens 1 year ago

      @@SamuelChan Thanks. Can't wait to watch the rest of your videos :)

    • @larawehbee
      @larawehbee 1 year ago

      @@SamuelChan Is there a video on integrating LLaMA/Alpaca with Haystack to do semantic search over custom docs?

    • @SamuelChan
      @SamuelChan  1 year ago

      Hey Lara, the whole LLM series (especially the How Embeddings Work video) goes through lots of examples - not directly with Haystack, but it is quite transferable. When I have time I'll work on one with Haystack as the search engine! I'm traveling quite a bit so I can't commit to anything yet :)

    • @larawehbee
      @larawehbee 1 year ago

      @@SamuelChan Amazing! Thank you so much for the valuable information and content. looking forward to future videos! Best of luck

  • @larawehbee
    @larawehbee 1 year ago

    Great video! It is super informative, thanks. However, how can I use the indexer without an OpenAI API key? I want a fully offline, on-premises solution with no need for an OpenAI API key.

    • @SamuelChan
      @SamuelChan  1 year ago +1

      Hey, take a look at Nicholas' suggestion in the comment section. He also made a pull request to the repo, so if you're using the code from my GitHub repo you should be able to run this without the OpenAI requirement. The main change is to move the LLM instantiation call into the if __name__ == "__main__" block.

    • @larawehbee
      @larawehbee 1 year ago +1

      @@SamuelChan Thank you! Is there a way to integrate alpaca-lora / llama-7b-hf in Haystack and use it on-premises, offline, without any API key or internet connection?

    • @SamuelChan
      @SamuelChan  1 year ago

      Yes, possible. You will need a local model that does the embeddings (i.e. a Sentence Transformer, either through the Chroma interface or directly through the transformers module from Hugging Face). Through Chroma you won't need a key. Through the Hugging Face inference route you would still need a Hugging Face inference token.
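      A minimal sketch of fully local embeddings through Chroma (the model name and sample text are illustrative; no OpenAI or Hugging Face token involved):
      import chromadb
      from chromadb.utils import embedding_functions
      # sentence-transformers runs on your own machine, so no API key is required
      local_ef = embedding_functions.SentenceTransformerEmbeddingFunction(model_name="all-MiniLM-L6-v2")
      client = chromadb.Client()
      collection = client.create_collection("docs", embedding_function=local_ef)
      collection.add(documents=["Some text from your own private documents."], ids=["doc-1"])
      print(collection.query(query_texts=["What do my documents say?"], n_results=1))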

    • @larawehbee
      @larawehbee 1 year ago +1

      @@SamuelChan Great. Do you think it's good to go with the Chroma interface for a production instance? That is, it might need to be highly scalable.

    • @larawehbee
      @larawehbee 1 year ago +1

      And what do you recommend as an open source vector DB as an alternative to Pinecone?

  • @jobautomation
    @jobautomation 1 year ago

    Thaaanks!!

  • @hyperbolictimeacademy
    @hyperbolictimeacademy 1 year ago

    Nice mechanical keyboard, good video.

    • @SamuelChan
      @SamuelChan  1 year ago

      Thank you! It’s a Keychron Q1 (brown switches) ⌨️

  • @RedShipsofSpainAgain
    @RedShipsofSpainAgain 1 year ago +1

    Great vid, Samuel. The OpenAI API charges based on the number of tokens and queries made to the LLM, right?
    How much does it cost to use and query the OpenAI API? This OPT model is free to use, so is there any disadvantage to using this OPT model vs the OpenAI GPT models?

    • @SamuelChan
      @SamuelChan  1 year ago +3

      OpenAI charges based on the number of tokens and the type of model you use
      (but not on the number of queries, meaning you might make one big query that costs more than 10 smaller queries).
      The two things that cost you money are:
      1. Embeddings: to generate numerical representations of your data, you can use OpenAI's embedding service. This costs money (based on the number of tokens) and has a max cutoff.
      2. Text generation: asking the LLM to generate new sentences.
      See openai.com/pricing - the more powerful the model, the more expensive it is. If you watch the "How Embeddings Work" video in this LLM series, I explain it in more detail.
      The OPT model is an open source counterpart to OpenAI's proprietary models. You have to run it on your local machine or rent some cloud computing resources to run it, which one might argue is a disadvantage. Depending on your view on privacy, it might actually be an advantage not having to send data over to OpenAI's servers.
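      To get a feel for what a call will cost before sending anything, you can count tokens locally; a small sketch with OpenAI's tiktoken tokenizer (the prompt string is just an example):
      import tiktoken
      enc = tiktoken.encoding_for_model("gpt-3.5-turbo")
      prompt = "Summarise what the president said about the economy."
      print(len(enc.encode(prompt)), "tokens")  # billing is per token; see openai.com/pricing for current rates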

    • @joaoalmeida4380
      @joaoalmeida4380 1 year ago

      Is there any open source model for chatbot/QA, apart from OpenAI's, that we can use without needing a ChatGPT API key? Thank you

    • @SamuelChan
      @SamuelChan  1 year ago +2

      If you use ChatGPT you'll always need an API key since it's an OpenAI product.
      You could use the open source LLMs from Hugging Face, and if you have the computational resources to run them locally (ideally with CUDA and enough VRAM) then that's worth a try too!

    • @joaoalmeida4380
      @joaoalmeida4380 1 year ago +1

      @@SamuelChan Thank you, I'll try buying ChatGPT API access to use it directly.

  • @abcd2574
    @abcd2574 1 year ago

    Do we require a GPU for this?
    Will this work on Mac M1/M2?

    • @SamuelChan
      @SamuelChan  1 year ago +1

      It doesn't require a CUDA driver / dedicated GPU.
      This will work on a Mac. If you have a CUDA-compatible graphics card, turn on the device="cuda" setting. Otherwise, turn it off and it works normally (but it will be slower since you're running on the CPU with integrated graphics).
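      A small sketch of that switch with a transformers pipeline (the model name is just an example; for pipelines, device=0 means the first CUDA GPU and device=-1 means CPU):
      import torch
      from transformers import pipeline
      device = 0 if torch.cuda.is_available() else -1
      generator = pipeline("text-generation", model="facebook/opt-1.3b", device=device)
      print(generator("Large language models are", max_new_tokens=20)[0]["generated_text"])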

    • @abcd2574
      @abcd2574 1 year ago

      @@SamuelChan thanks a lot

    • @abcd2574
      @abcd2574 1 year ago +1

      @@SamuelChan
      Thanks mate.
      One more doubt:
      actually, PyTorch code that would normally require CUDA can be triggered using Metal on the M2, right?

    • @SamuelChan
      @SamuelChan  1 year ago +1

      if torch.backends.mps.is_available():
          mps_device = torch.device("mps")
      This requires you to build PyTorch from source. MPS stands for Metal Performance Shaders.
      I have a full playlist on PyTorch with 6-8 videos if you’d like to learn more about PyTorch! :)

    • @abcd2574
      @abcd2574 1 year ago

      @@SamuelChan Superb
      thanks a lot Sam

  • @amigos786
    @amigos786 1 year ago

    Good video... had to watch it at 0.75x

    • @SamuelChan
      @SamuelChan  1 year ago

      Thank you! Yeah it’s something I have to work on :/

  • @rezukidesu778
    @rezukidesu778 1 year ago

    Hai mas Samuel (hi Samuel), I'm wondering, do you speak Indonesian? 😁
    Thanks for the video, it helped me complete my project.
    But I have a question. I've already tried the 'vicuna-7b-1.1' pretrained model with the same code you provide in this video, embedding it with new data.
    It ran successfully, but it seems the pretrained model inherits existing knowledge from the dataset that people used to fine-tune it.
    So my question: is it possible to remove this existing knowledge so the model only knows the new knowledge we give it when embedding documents?

    • @SamuelChan
      @SamuelChan  1 year ago +1

      Iya, bisa dong. Cuman gak sering pakai. (Yes, I can, I just don't use it often.)
      There are 2 approaches I can think of:
      1. Prompt engineering: specify in the prompt to use only the knowledge from xyz (your own data), and to say "I don't know" otherwise.
      2. More drastically, train your own base LLM with your own data (rather than using a pretrained model with fine-tuning on it). This is easier to do on platforms such as Cohere (I'm creating a series on this, but recording takes a lot of time and I have 3 full time jobs haha).

    • @rezukidesu778
      @rezukidesu778 1 year ago

      @@SamuelChan Beneran bisa ternyata 😁 (So you really can!)
      I see, prompt engineering can be one of the solutions; maybe I can start with that first.
      It's okay, just take your time. I'll be waiting for the content patiently :)

  • @AJ-jf5pm
    @AJ-jf5pm 1 year ago

    Nice playlist. Thanks Samuel!! I'm running into an issue while installing xformers:
    Collecting xformers
    Using cached xformers-0.0.20.tar.gz (7.6 MB)
    Installing build dependencies ... done
    Getting requirements to build wheel ... error
    error: subprocess-exited-with-error
    × Getting requirements to build wheel did not run successfully.
    │ exit code: 1
    ╰─> [21 lines of output]
    Traceback (most recent call last):
    File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 353, in
    main()
    File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 335, in main
    json_out['return_val'] = hook(**hook_input['kwargs'])
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 118, in get_requires_for_build_wheel
    return hook(config_settings)
    ^^^^^^^^^^^^^^^^^^^^^
    File "/private/var/folders/s5/3rhs38ts3jdgc1vbq5tl69jr0000gn/T/pip-build-env-rnu1pshr/overlay/lib/python3.11/site-packages/setuptools/build_meta.py", line 341, in get_requires_for_build_wheel
    return self._get_build_requires(config_settings, requirements=['wheel'])
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    File "/private/var/folders/s5/3rhs38ts3jdgc1vbq5tl69jr0000gn/T/pip-build-env-rnu1pshr/overlay/lib/python3.11/site-packages/setuptools/build_meta.py", line 323, in _get_build_requires
    self.run_setup()
    File "/private/var/folders/s5/3rhs38ts3jdgc1vbq5tl69jr0000gn/T/pip-build-env-rnu1pshr/overlay/lib/python3.11/site-packages/setuptools/build_meta.py", line 488, in run_setup
    self).run_setup(setup_script=setup_script)
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    File "/private/var/folders/s5/3rhs38ts3jdgc1vbq5tl69jr0000gn/T/pip-build-env-rnu1pshr/overlay/lib/python3.11/site-packages/setuptools/build_meta.py", line 338, in run_setup
    exec(code, locals())
    File "", line 23, in
    ModuleNotFoundError: No module named 'torch'
    [end of output]
    note: This error originates from a subprocess, and is likely not a problem with pip.
    error: subprocess-exited-with-error
    × Getting requirements to build wheel did not run successfully.
    │ exit code: 1
    ╰─> See above for output.
    note: This error originates from a subprocess, and is likely not a problem with pip....
    Though torch is installed. Any help will be appreciated!

    • @SamuelChan
      @SamuelChan  1 year ago

      Might be a conflicting version of packages. Are you doing this in a clean environment?
      If you're using venv or virtualenv, try creating a brand new environment, activating it, and then installing from the dependencies file (requirements.txt) with pip install -r requirements.txt. Let me know if that works!

    • @AJ-jf5pm
      @AJ-jf5pm 1 year ago +1

      @@SamuelChan Thank you for your help. A new virtual environment worked, but the query results still took upwards of 20 minutes to return. I am using a MacBook Pro M1 with 16GB of RAM and 200+GB of available storage. Here is the output from the console:
      Output
      Loading local cache of model
      INFO:llama_index.token_counter.token_counter:> [query] Total LLM token usage: 3651 tokens
      INFO:llama_index.token_counter.token_counter:> [query] Total embedding token usage: 0 tokens
      [1348.11438322 seconds]: f([]) -> Indonesia exports its coal to China in 2023.
      Indonesia exports its coal to China in 2023 ...
      Is it normal for the query results to take this long to return? If so, what sort of server would you recommend? If not, what could be the reason for the long wait time? Any pointers would be appreciated.

    • @SamuelChan
      @SamuelChan  1 year ago

      20 minutes sounds unfathomably long :/ a reasonable time is