Love the video. Very helpful. Like your style of adding in specific details, though it could certainly go a little slower.
Thank you! I’m a total amateur at this video thing and plenty of room for improvement so I really appreciate the feedback
I have tried the local LLMs privateGPT and GPT4All; their response time and accuracy were both terrible. Response time is 40-90 seconds with a local index, and the model size is 8 GB
You’re running it locally (downloaded the model to your machine) on a GPU?
PrivateGPT isn’t an LLM, it’s a library, so when you say it’s terrible it’s the underlying model that’s at fault - not privateGPT. What model did you choose, and could you go with a bigger model? :)
crazy how none of these are relevant anymore; i get all sorts of errors after installing llama_index (newest version as of today) - i see many deprecations - PromptHelper and others. it's been a nice lesson though, i'm subscribed now
Excellent video, it was really easy to follow along!
Thank you! :)
So, how do I import pdfs?
When I try this I am met with "AttributeError: 'GPTListIndex' object has no attribute 'query'" or an AttributeError for "save_to_disk".
Any help??
hey, have you compared your code to the one on github? without seeing your code i can't tell, but here's the gist of it:
index = GPTListIndex.from_documents(docs, service_context=service_context)
index.query()
Note that .query() is a method, not an attribute, so make sure you did not miss out the parentheses!
Here's the full code for this video (and for the rest of the LLM tutorial series on my channel)
github.com/onlyphantom/llm-python/blob/main/7_custom.py
For me, I did the following in execute_query():
query_engine = index.as_query_engine()
response = query_engine.query("What did the president say?")
And it worked
@@larawehbee yes this works too. This is also the more common way I use the query engine. The error msg indicates that OP is calling query as an attribute, so I'm guessing it's the missing () parentheses indicating a method call.
@@SamuelChan This is my code right now for that chunk, so to speak:
@timeit()
def execute_query():
    response = index.query(
        "What did the president say?",
        # exclude_keywords=[""],
        # required_keywords=[""],
        # response_mode="no_text",
    )
    return response
From the error message it looks like you missed out the parentheses when calling query()
Can you confirm? :) otherwise will need you to upload the code to GitHub so I can hop in and edit it for you / give you more targeted troubleshooting
Hey man, great video! Starting at 26:25, you mentioned chaining another LLM to handle the natural language generation (without using ChatGPT tokens). How would you do this? I followed your process from here and GitHub with my own LLM, and when I get to the part that outputs the response, I'm wondering how to pass my own local LLM for the response instead of defaulting to OpenAI, because as far as I understand it isn't "completely" local and offline. Is it just passing a re-instantiated model, or is it more complex? Looking forward to more content, and hopefully you can include this in a new video!
Hey Sebastian, thank you! There are a number of open source LLMs on huggingface that you can pull to do the text generation. I believe video 3 in my LLM playlist covers this scenario:
ruclips.net/p/PLXsFtK46HZxUQERRbOmuGoqbMD-KWLkOS
I trimmed this video down to fit within a specific timeframe and kinda regretted it afterwards. In the untruncated version, it is demonstrably clear it runs off your local machine. If you inspect my GitHub code, at the top of the script you'll see where the local path of this model can be stored (the cache location).
My recording process usually takes me down these long tangents, and then I realize 90% of viewers probably don't care and find it boring, so I remove those parts. Another way to confirm this is to enable logging (import logging) and then log your token usage - that verifies the same idea.
I am on video 8 now of the LLM playlist and focusing on automating away aspects of my life that are the most important / time consuming. Video 8 is about using LlamaIndex and LangChain to build myself the perfect language learning app, and video 9 is about making my personal diary "chattable / queryable". For video 10 I'll try to revisit this and maybe do a more elaborate example on this subject as a kind of follow-up. Will have to see if I can come up with good use-cases in my daily life though! :)
I am also interested in this solution!
I’ll see what I can do Nicholas!
I found a solution that's working for me - put this in the if __name__ == "__main__" section:
if not os.path.exists(filename):
    print("No local cache. Downloading embeddings from huggingface...")
    index = create_index()
    index.save_to_disk(filename)
else:
    print("Loading cached embeddings...")
    # instantiating the llm here avoids having to use an OpenAI key
    llm = LLMPredictor(llm=LocalLLM())
    service_context = ServiceContext.from_defaults(llm_predictor=llm, prompt_helper=prompt_helper)
    index = GPTListIndex.load_from_disk(filename, service_context=service_context)
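For context, LocalLLM above is just my own wrapper name - roughly a custom LLM around a local huggingface text-generation pipeline, similar in shape to the one used in the video. A minimal sketch (assuming langchain's LLM base class; the class name and model below are placeholders, and the API may differ across langchain versions):
from typing import List, Optional
from langchain.llms.base import LLM
from transformers import pipeline

class LocalLLM(LLM):
    # placeholder model; any locally cached text-generation model works
    model_name: str = "facebook/opt-iml-1.3b"

    @property
    def _llm_type(self) -> str:
        return "custom_local"

    def _call(self, prompt: str, stop: Optional[List[str]] = None) -> str:
        # in practice you'd cache the pipeline instead of rebuilding it on every call
        generator = pipeline("text-generation", model=self.model_name)
        text = generator(prompt, max_new_tokens=256)[0]["generated_text"]
        return text[len(prompt):]  # return only the newly generated continuation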
Thank you for sharing Nicholas! If you’re open to submitting a PR to the repo with the suggestion above I’d accept it! 😊
What is the biggest LLM currently available that I can use like this and embed into my Wordpress site? Any tutorial you know of? I have to finish my project ASAP
Hey man! Just wondering, does this LLM run locally without needing to search things up online? Would it work for school work when you don't have internet access?
Yeah, we can download the whole LLM and the corresponding weights to a local drive, so there's no data transfer between your machine and a remote machine. No connection required.
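A minimal sketch of that, assuming the transformers library (the model name is just an example): the first run downloads the weights into the local huggingface cache, and after that no connection is needed.
from transformers import pipeline

# weights are cached locally after the first download, so subsequent runs work offline
generator = pipeline("text-generation", model="facebook/opt-1.3b")
print(generator("Local LLMs can run without an internet connection because", max_new_tokens=30)[0]["generated_text"])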
I seem to get issues with "index = GPTListIndex.load_from_disk(":
"AttributeError: type object 'ListIndex' has no attribute 'load_from_disk'", and similarly for save_to_disk. There have been some API changes; I tried to modify accordingly but something is still bugging out. Any solution? Thanks!
check another thread from @AstronautAJC below
Great video! Is it also possible to add private data and train the LLM on my own data?
Yeah absolutely.
The entire LLM series is about building with your own data! Check out some of the videos in this playlist:
LangChain & LLM tutorials (ft. gpt3, chatgpt, llamaindex, chroma)
ruclips.net/p/PLXsFtK46HZxUQERRbOmuGoqbMD-KWLkOS
Hope that’s helpful! 😊
Thanks for uploading the video
You’re welcome!
How do I teach it on custom data?
Hey Eduard, all the videos in this playlist use LLMs on custom data! :) We instruct the code to create embeddings out of our own private data and then ask questions about it.
LangChain & LLM tutorials (ft. gpt3, chatgpt, llamaindex, chroma)
ruclips.net/p/PLXsFtK46HZxUQERRbOmuGoqbMD-KWLkOS
Hey, nice video. Can you do a tutorial on how to implement an offline LLM / ChatGPT-like assistant on a mobile device (iOS)? Using something like mobileBERT/mediapipe to run an LLM on a mobile device with summarization/Q&A on data/files within the device, without internet access. thx!
Mobile development is a little out of my reach; I dabbled in Swift for iOS programming a little back when Swift first came out but haven’t done much since 😬
I MIGHT be able to help you with this, but I would use Flutter as it can work with Python
@astronaut I might take you up on that offer too! Exciting times!
@@AstronautAJC A tutorial on this would be much appreciated! Ideally using Flutter would work better for cross-platform...
Great video, and you are a very good communicator. One question I have is, how is the OPT model stored locally here? I can't see where the model is stored in local storage. I see that you provide the model name when you generate the pipeline, but I can't see where the model is retrieved from locally.
Hey Jamie, thank you!
The OPT model is stored locally in Home/.cache/huggingface/hub by default (I'm using Linux, so this may be OS-dependent). You will find all downloaded models stored in this cache folder and loaded from there whenever they're required again.
For the OPT model, the 60 GB size may make that infeasible and you might have to change the cache location. You do so with:
# os.environ["TRANSFORMERS_CACHE"] = "/media/samuel/external_drive/transformers_cache"
Every time your code runs, it looks for that location instead.
Hope this helps! The sample code is also on my github repo (the first 7 lines of code communicate this as well!)
github.com/onlyphantom/llm-python/blob/main/7_custom.py
@@SamuelChan Thank you very much, I understand now
Jamie, thank you! Brings joy to know the work we do is useful to others - so thank you! 😊
@@SamuelChan Where does it store the local model on Windows, as I am using Windows 11? And will this code download the model, or can I download and store it in a local folder to have better control?
@@ratralf4738 yeah you can change where you want to download / store / source the model:
export TRANSFORMERS_CACHE=/whatever/path/you/want
My comment above also points to line 7 of the code provided in the repo so you can just change that line! :)
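On Windows the default cache is usually C:\Users\<you>\.cache\huggingface\hub. A small sketch of taking control of the location from Python (the path below is just a placeholder):
import os
os.environ["TRANSFORMERS_CACHE"] = r"D:\models\transformers_cache"  # set before transformers is imported

from transformers import AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained("facebook/opt-1.3b")  # downloads into the folder above if not already cached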
Awesome video. Can I use this model in haystack to generate embeddings for a large number of documents?
Yes, but it's perhaps a task better done through a Haystack Agent than through LlamaIndex or LangChain; haystack.deepset.ai/blog/introducing-haystack-agents
I poked around Haystack and this is the code (line 59)
github.com/deepset-ai/haystack/blob/7c5f9313ff5eedf2b40e6211e3d41f2f9f134ba3/haystack/nodes/prompt/providers.py#L59 (The implementation can be a simple invocation on the underlying model running in a local runtime, or could be even remote)
Your generated embeddings could still live in something like Pinecone, Milvus, Chroma etc, and the underlying LLM could be a local LLM you downloaded off huggingface (like shown in this video).
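If it helps, here's a rough sketch of generating embeddings locally in Haystack (assuming the Haystack 1.x API and a sentence-transformers model as an example - double check against the current docs):
from haystack.document_stores import InMemoryDocumentStore
from haystack.nodes import EmbeddingRetriever

document_store = InMemoryDocumentStore(embedding_dim=384)  # 384 matches the MiniLM model below
document_store.write_documents([{"content": "Indonesia exports its coal to China."}])

retriever = EmbeddingRetriever(
    document_store=document_store,
    embedding_model="sentence-transformers/all-MiniLM-L6-v2",  # runs locally, no API key needed
)
document_store.update_embeddings(retriever)
print(retriever.retrieve("Where does Indonesia export coal?", top_k=1))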
Hope that helps!
@@SamuelChan Thanks. Can't wait to watch the rest of your videos :)
@@SamuelChan Is there a video on integrating llama/alpaca with haystack for semantic search over custom docs?
Hey Lara, the whole LLM series (esp the How Embeddings Work) goes through lots of examples - not directly with Haystack but it is quite transferable. When I have time I’ll work on one with haystack as the search engine! Traveling quite a bit so can’t commit yet to anything :)
@@SamuelChan Amazing! Thank you so much for the valuable information and content. looking forward to future videos! Best of luck
Great video! It is super informative, thanks. However, how can I use the indexer without an OpenAI API key? I want a fully offline, on-premises solution without the need for an OpenAI API key.
Hey, take a look at Nicholas’ suggestion in the comment section. He also made a pull request to the repo, so if you’re using the code from my GitHub repo you can run this without the OpenAI requirement. The main change is to move the LLM instantiation call into the if __name__ == "__main__" block.
@@SamuelChan Thank you! Is there a way to integrate alpaca-lora / llama-7b-hf in haystack and use it on-premises offline, without any API key or internet connection?
Yes, possible. You will need a local model that does the embeddings (i.e. a Sentence Transformer, either through the Chroma interface or directly through the transformers module from huggingface). Through Chroma you won’t need a key. Through the huggingface hosted Inference API you would still need a huggingface inference token.
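A minimal sketch of the Chroma route (the collection name and documents are just examples); Chroma falls back to a local sentence-transformers embedding model by default, so no OpenAI key is involved:
import chromadb

client = chromadb.Client()
collection = client.create_collection("my_docs")
collection.add(
    documents=["Indonesia exports its coal to China.", "The president gave a speech."],
    ids=["doc1", "doc2"],
)
# query embeddings are computed locally with the default embedding function
print(collection.query(query_texts=["Where does Indonesia export coal?"], n_results=1))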
@@SamuelChan Great. Do you think it's good to go with the Chroma interface for a production instance? That is, it might need to be highly scalable.
And what do you recommend as an open source vector DB alternative to Pinecone?
Thaaanks!!
did your code run successfully??
Nice mechanical keyboard, good video.
Thank you! It’s a Keychron Q1 (brown switches) ⌨️
Great vid, Samuel. The OpenAI API key charges based on the number of tokens and queries made to the LLM, right?
How much does it cost to use and query the OpenAI API? This OPT model is free to use. So is there any disadvantage of using this OPT model vs the OpenAI GPT model?
OpenAI charges based on the number of tokens and the type of model you use
(but not the number of queries, meaning you might make one big query that costs more than 10 smaller queries)
The two ways it costs you money are:
1. The embeddings: to generate numerical representations of your data, you can use OpenAI’s embedding service. This costs money (based on the number of tokens), and has a max cutoff
2. Text generation: asking the LLM to generate new sentences.
openai.com/pricing - the more powerful the model, the more expensive it is. If you watch the “How Embeddings Work” video in this LLM series, I explain it in more detail.
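A back-of-the-envelope sketch of the math (the per-1K-token prices and token counts below are placeholders - always check openai.com/pricing for current numbers):
embed_price_per_1k = 0.0001      # hypothetical $ per 1K tokens for the embedding model
completion_price_per_1k = 0.002  # hypothetical $ per 1K tokens for the completion model

doc_tokens = 500_000  # tokens in the documents you embed (a one-time cost)
query_tokens = 3_651  # e.g. the LLM token usage logged for one query elsewhere in this thread

print(f"one-time embedding cost ~ ${doc_tokens / 1000 * embed_price_per_1k:.2f}")
print(f"per-query generation cost ~ ${query_tokens / 1000 * completion_price_per_1k:.4f}")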
The OPT model is an open source attempt at matching OpenAI’s proprietary models. You have to run it on your local machine or rent some cloud computing resources to run it, which one might argue is a disadvantage. Depending on your view on privacy, it might actually be an advantage not having to send your data over to OpenAI’s servers though.
Is there any open source model for chatbot/QA, apart from OpenAI's, that we can use without needing an API key for ChatGPT? Thank you
If you use ChatGPT you’ll always need the API, since it’s an OpenAI product.
You could use the open source LLM models from huggingface, and if you have the computational resources to run them locally (ideally with CUDA and enough VRAM) then that’s worth a try too!
@@SamuelChan thank you, I’ll try buying the chatgpt api to use directly.
Do we require a GPU for this?
Will this work on Mac M1/M2?
Doesn’t require a CUDA driver / dedicated GPU.
This will work on Mac. If you have a CUDA compatible graphics card, turn on the device="cuda" setting. Otherwise, leave it off and it works normally (but it will be slower since you’re running on the CPU / integrated graphics).
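A hedged sketch of that toggle with a transformers pipeline (the model name is just a placeholder):
import torch
from transformers import pipeline

device = 0 if torch.cuda.is_available() else -1  # 0 = first CUDA GPU, -1 = plain CPU
generator = pipeline("text-generation", model="facebook/opt-1.3b", device=device)
print(generator("Running on GPU or CPU,", max_new_tokens=20)[0]["generated_text"])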
@@SamuelChan thanks a lot
@@SamuelChan
Thanks mate.
One more doubt.
Actually, PyTorch code which would require CUDA can be triggered using Metal on the M2, right?
if torch.backends.mps.is_available():
mps_device = torch.device("mps")
On older PyTorch versions this required a nightly or source build; recent releases ship with MPS support. MPS stands for Metal Performance Shaders.
I have a full playlist on PyTorch with 6-8 videos if you’d like to learn more about PyTorch! :)
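A minimal sketch of putting a model on MPS (the model name is a placeholder; it falls back to CPU if MPS isn't available):
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

device = torch.device("mps") if torch.backends.mps.is_available() else torch.device("cpu")
tokenizer = AutoTokenizer.from_pretrained("facebook/opt-1.3b")
model = AutoModelForCausalLM.from_pretrained("facebook/opt-1.3b").to(device)

inputs = tokenizer("Hello from Apple Silicon:", return_tensors="pt").to(device)
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=20)[0], skip_special_tokens=True))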
@@SamuelChan Superb
thanks a lot Sam
Good video..had to watch it on 0.75x
Thank you! Yeah it’s something I have to work on :/
Hai mas Samuel, I'm wondering, do you speak Indonesian? 😁
Thanks for the video, it helped me in completing my project.
But I have a question. I've already tried using the 'vicuna-7b-1.1' pretrained model with the same code you provide in this video, and embedded new data with it.
It ran successfully, but it seems the pretrained model inherits existing knowledge from the dataset that people used to fine-tune it.
So my question: is it possible to remove this existing knowledge so the model only knows the new knowledge that we give it when embedding documents?
Iya, bisa dong. Cuman gak sering pakai. (Yes, of course - I just don't use it often.)
There are 2 approaches i can think of:
1. Prompt engineering: specify in the prompt that it should use only the knowledge from xyz (your own data) and say “I don’t know” otherwise - see the sketch right after this list
2. More drastically, train your own base LLM on your own data (rather than using a pretrained model with fine tuning on it). Easier to do on platforms such as Cohere (I’m creating a series on this, but recording takes a lot of time and I have 3 full time jobs haha)
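A sketch of approach 1 - the template wording is illustrative only, not a fixed API; the idea is that the filled-in prompt is what gets sent to your local model:
RESTRICTED_QA_PROMPT = (
    "Answer the question using ONLY the context below. "
    "If the answer is not in the context, reply exactly: I don't know.\n\n"
    "Context:\n{context_str}\n\n"
    "Question: {query_str}\nAnswer:"
)

# filled in with whatever your retrieval step returns before calling the model
prompt = RESTRICTED_QA_PROMPT.format(
    context_str="<chunks retrieved from your embedded documents>",
    query_str="<the user's question>",
)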
@@SamuelChan Beneran bisa ternyata 😁 (So you really do speak it 😁)
I see, prompt engineering can be one of the solutions; maybe I can start with that first.
It's okay, just take your time. I'll be waiting for the content patiently :)
Nice playlist. Thanks Samuel!! I'm running into an issue while installing xformers: Collecting xformers
Using cached xformers-0.0.20.tar.gz (7.6 MB)
Installing build dependencies ... done
Getting requirements to build wheel ... error
error: subprocess-exited-with-error
× Getting requirements to build wheel did not run successfully.
│ exit code: 1
╰─> [21 lines of output]
Traceback (most recent call last):
File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 353, in
main()
File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 335, in main
json_out['return_val'] = hook(**hook_input['kwargs'])
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 118, in get_requires_for_build_wheel
return hook(config_settings)
^^^^^^^^^^^^^^^^^^^^^
File "/private/var/folders/s5/3rhs38ts3jdgc1vbq5tl69jr0000gn/T/pip-build-env-rnu1pshr/overlay/lib/python3.11/site-packages/setuptools/build_meta.py", line 341, in get_requires_for_build_wheel
return self._get_build_requires(config_settings, requirements=['wheel'])
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/private/var/folders/s5/3rhs38ts3jdgc1vbq5tl69jr0000gn/T/pip-build-env-rnu1pshr/overlay/lib/python3.11/site-packages/setuptools/build_meta.py", line 323, in _get_build_requires
self.run_setup()
File "/private/var/folders/s5/3rhs38ts3jdgc1vbq5tl69jr0000gn/T/pip-build-env-rnu1pshr/overlay/lib/python3.11/site-packages/setuptools/build_meta.py", line 488, in run_setup
self).run_setup(setup_script=setup_script)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/private/var/folders/s5/3rhs38ts3jdgc1vbq5tl69jr0000gn/T/pip-build-env-rnu1pshr/overlay/lib/python3.11/site-packages/setuptools/build_meta.py", line 338, in run_setup
exec(code, locals())
File "", line 23, in
ModuleNotFoundError: No module named 'torch'
[end of output]
note: This error originates from a subprocess, and is likely not a problem with pip.
error: subprocess-exited-with-error
× Getting requirements to build wheel did not run successfully.
│ exit code: 1
╰─> See above for output.
note: This error originates from a subprocess, and is likely not a problem with pip....
Though torch is installed. Any help will be appreciated!
Might be conflicting versions of packages. Are you doing this in a clean environment?
If you’re using venv or virtualenv, try creating a brand new environment, activating it, and then installing from the dependencies file (requirements.txt) with pip install -r requirements.txt. Let me know if that works!
@@SamuelChan Thank you for your help. A new virtual environment worked, but the query results still took upwards of 20 minutes to return. I am using a MacBook Pro M1 with 16GB of RAM and 200+GB of available storage. Here is the output from the console:
Output
Loading local cache of model
INFO:llama_index.token_counter.token_counter:> [query] Total LLM token usage: 3651 tokens
INFO:llama_index.token_counter.token_counter:> [query] Total embedding token usage: 0 tokens
[1348.11438322 seconds]: f([]) -> Indonesia exports its coal to China in 2023.
Indonesia exports its coal to China in 2023 ...
Is it normal for the query results to take this long to return? If so, what sort of server would you recommend? If not, what could be the reason for the long wait time? Any pointers would be appreciated.
20 minutes sounds unfathomably long :/ a reasonable time is