LangChain - Using Hugging Face Models locally (code walkthrough)

  • Published: 22 Aug 2024
  • Colab Code Notebook: drp.li/m1mbM
    Load HuggingFace models locally so that you can use models you can't use via the API endpoints. This video shows you how to use the endpoints, how to load the models locally (and access models that don't work in the endpoints), and how to load the embedding models locally (see the sketch below the links).
    My Links:
    Twitter - / sam_witteveen
    Linkedin - / samwitteveen
    Github:
    github.com/sam...
    github.com/sam...
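
    A minimal sketch of the local-loading pattern the video walks through (model id and prompt are illustrative; the exact code is in the Colab notebook linked above):

      from transformers import AutoTokenizer, AutoModelForSeq2SeqLM, pipeline
      from langchain import PromptTemplate, LLMChain
      from langchain.llms import HuggingFacePipeline

      model_id = "google/flan-t5-small"   # small enough to try on CPU
      tokenizer = AutoTokenizer.from_pretrained(model_id)
      model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

      # wrap a plain transformers pipeline so LangChain can drive it
      pipe = pipeline("text2text-generation", model=model, tokenizer=tokenizer, max_length=100)
      local_llm = HuggingFacePipeline(pipeline=pipe)

      prompt = PromptTemplate(template="Question: {question}\nAnswer:", input_variables=["question"])
      chain = LLMChain(prompt=prompt, llm=local_llm)
      print(chain.run("What is the capital of France?"))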

Comments • 83

  • @insightbuilder
    @insightbuilder 1 year ago +13

    Keep up the great work. And thanks for curating the important HF models that we can use as alternatives to paid LLMs. When learning new tech, using the free LLMs can give the learner a lot of benefits.

  • @sakshikumar7679
    @sakshikumar7679 1 month ago

    saved me from hours of debugging and research! thanks a ton

  • @bandui4021
    @bandui4021 1 year ago +3

    Thank you! I am a newbie in this area and your vids are helping me a lot to get a better picture of the current landscape.

  • @morespinach9832
    @morespinach9832 6 months ago

    This is helpful because in some industries like banking or telcos, it's impossible to use open source things. So we need to host.

  • @prestigious5s23
    @prestigious5s23 11 months ago

    Great tutorial. I need to train a model on some private company documents that aren't publicly released yet and this looks like it could be a big help to me. Subbed!!

  • @tushaar9027
    @tushaar9027 8 months ago

    Great video Sam, I don't know how I missed this

  • @steev3d
    @steev3d 1 year ago

    Nice video. I'm trying to connect an LLM and use Unity 3D as my interface for STT and TTS with 3D characters. I just found a tool that enables connecting to an LLM on Hugging Face, which is how I discovered that you need a paid endpoint with GPU support to even run most of them. I kinda wish I had found this video when you posted it. Very useful info.

  • @luis96xd
    @luis96xd 1 year ago

    Amazing video, everything was well explained, I needed it, thank you so much!

  • @alexandremarhic5526
    @alexandremarhic5526 1 year ago +1

    Thanks for the work. Just letting you know the Loire Valley is in the north of France ;)

    • @samwitteveenai
      @samwitteveenai  1 year ago

      Good for wine ? :D

    • @alexandremarhic5526
      @alexandremarhic5526 1 year ago +1

      @@samwitteveenai Depends on your taste. If you love sweet wine, the south is better, especially for white wine like "Jurançon".

  • @AdrienSales
    @AdrienSales 1 year ago

    Excellent tutorial, and so well explained. Thanks a lot.

  • @stonez56
    @stonez56 4 months ago

    Please make a video on how to convert Safetensors to GGUF format, or a format that can be used with Ollama? Thanks for these great AI videos!

  • @intelligenceservices
    @intelligenceservices 28 days ago

    is there a way to compile a huggingface repo to a single safetensors file? (compiled from a repo that has the separate directories: scheduler, text_encoder, text_encoder_2, tokenizer, etc...)

  • @hnikenna
    @hnikenna 1 year ago

    Thanks for this video. You just earned a subscriber

  • @venkatesanr9455
    @venkatesanr9455 1 year ago

    Thanks for the valuable and highly informative series. Can you provide some discussion of in-context learning (providing context/query), reasoning & chain of thought?

    • @samwitteveenai
      @samwitteveenai  1 year ago +2

      Hi, glad it is helpful. I am thinking about doing some vids on Chain of Thought prompting, Self Consistency, and PAL, going through the basics of the paper and then looking at how they work in practice with an LLM. I will cover the basics of in-context learning as well. Let me know if there are any others you think I should cover.

  • @Chris-se3nc
    @Chris-se3nc 1 year ago +2

    Thanks for the video. Is there any way to get an example using the LangChain JavaScript library? I am new to this area, and I think many developers would have a Node rather than a Python background.

  • @anubhavsarkar1238
    @anubhavsarkar1238 2 months ago

    Hello. Can you please make a video on how to use the SeamlessM4T HuggingFace model with LangChain? Particularly for text-to-text translation. I am trying to do some prompt engineering with the model using LangChain's LLMChain module, but it does not seem to work...

  • @atharvaparanjape9585
    @atharvaparanjape9585 3 months ago

    How can I load the model again some time later, once I have downloaded it to the local drive?

  • @azzeddine1
    @azzeddine1 1 year ago

    How can the ready-made projects on the platform be linked to Blogger blogs? I have spent long days searching, to no avail.

  • @MohamedNihal-rq6cz
    @MohamedNihal-rq6cz 1 year ago +1

    Hi Sam, how do you feed in your personal documents, query them, and return responses in a generative question-answering format rather than as extractive question answering? I am a bit new to this library and I don't want to use OpenAI API keys. Please provide some guidance on using open-source LLM models. Thanks in advance!

    • @samwitteveenai
      @samwitteveenai  1 year ago

      That would require fine-tuning the model, if you want to put the facts in there. That is probably not the best way to go, though.

  • @KittenisKitten
    @KittenisKitten 1 year ago +2

    Would be useful if you explained what program you're using, or what page you're looking at; seems like a waste of time if you don't know anything about the programs or what you're doing. 1/5

    • @samwitteveenai
      @samwitteveenai  1 year ago +1

      The Colab is linked in the description, it's all there to use.

  • @halgurgenci4834
    @halgurgenci4834 1 year ago

    These are great videos Sam. I am using a Mac M1. Therefore, it is impossible to run any model locally. I understand this is because PyTorch has not caught up with M1 yet.

    • @samwitteveenai
      @samwitteveenai  1 year ago +2

      Actually I think that's wrong. I use an M1 and an M2 as well, but I run models in the cloud. I might try to get them to run on my M2 and make a video if it works.

  • @surajnarayanakaimal
    @surajnarayanakaimal 1 year ago +1

    Thank you for the awesome content. It would be very helpful if you made a tutorial on how to use a custom model with LangChain and embed documents with it. I want to train on some documentation; currently we can use OpenAI or other service APIs,
    but consuming their APIs is very costly, so can you teach how to do that locally? Please consider training on a site's custom documentation, so it can answer from the documentation, be more context-aware and also remember history.
    Currently we depend on the OpenAI APIs for that. So if it's achievable using a local model it would be very helpful.

  • @megz_2387
    @megz_2387 1 year ago

    How do you fine-tune this model so that it can follow instructions on the data provided?

  • @markomilenkovic2714
    @markomilenkovic2714 10 months ago

    If we cannot afford an A100, what cheaper option would you recommend to run these? I understand the models also differ in size. Thanks Sam.

  • @botondvasvari5758
    @botondvasvari5758 3 months ago

    And how can I use big models from Hugging Face? I can't load them into memory because many of them are bigger than 15 GB, some of them 130 GB+. Any thoughts?

  • @SD-rg5mj
    @SD-rg5mj 1 year ago

    Hello and thank you very much for this video.
    On the other hand, the problem is that I am not sure I have understood everything; I speak English badly, I am French.

    • @samwitteveenai
      @samwitteveenai  1 year ago

      Did you try the French subtitles? I upload English subtitles, so I hope YouTube does a decent job translating them. Also feel free to ask any questions if you are not sure.

  • @jzam5426
    @jzam5426 10 months ago

    Thanks for the content!! Is there a way to run a HuggingfacePipeline loaded model using M1/M2 processors on Mac? How would one set that up?
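
    One possible setup (a minimal sketch, assuming a recent PyTorch build with MPS support; the model id is illustrative):

      from transformers import pipeline
      from langchain.llms import HuggingFacePipeline

      # device="mps" asks PyTorch to run the model on the Apple-silicon GPU
      pipe = pipeline("text2text-generation", model="google/flan-t5-small", device="mps")
      local_llm = HuggingFacePipeline(pipeline=pipe)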

  • @binitapriya4976
    @binitapriya4976 9 months ago

    Hi Sam, is there any way to generate question-answer pairs from a given text in a .txt file and save those questions and answers in another .txt file with the help of a free Hugging Face model?

  • @DarrenTarmey
    @DarrenTarmey 1 year ago

    It would be nice to have someone do a review for noobies, as there is so much to learn and it's hard to know where to start from.

    • @samwitteveenai
      @samwitteveenai  1 year ago

      what exactly would you like me to cover? Any questions I am happy to make more vids etc.

  • @magnanimist
    @magnanimist 1 year ago

    Just curious, do you need to redownload the model every time you run scripts like these? Is there a way to save the model and use it after it's been downloaded?

    • @samwitteveenai
      @samwitteveenai  1 year ago +1

      If you are doing this on a local machine the model will already be there; Hugging Face should cache it locally. You can also do model.save_pretrained('model_name').
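
      A minimal sketch of that save/reload pattern (paths are illustrative, not from the video):

        # save the downloaded weights and tokenizer to a folder of your choice
        model.save_pretrained("./my_local_model")
        tokenizer.save_pretrained("./my_local_model")

        # later, load from disk instead of downloading again
        from transformers import AutoModelForSeq2SeqLM, AutoTokenizer
        model = AutoModelForSeq2SeqLM.from_pretrained("./my_local_model")
        tokenizer = AutoTokenizer.from_pretrained("./my_local_model")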

  • @luis96xd
    @luis96xd 1 year ago

    I have a problem: when I use low_cpu_mem_usage or load_in_8bit,
    I get an error saying I need to install xformers;
    when I install xformers, I get an error saying I need to install accelerate;
    when I install accelerate, I get an error saying I need to install bitsandbytes;
    and so on: einops, accelerate, sentence_transformers, bitsandbytes.
    But finally, I got the error *NameError: name 'init_empty_weights' is not defined*.
    I don't know how I can solve this error or why it happens. Could you help me please?
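
    A sketch of one common workaround (an assumption, not from the thread): install the usual dependencies up front and then restart the Colab runtime before loading with load_in_8bit, since the init_empty_weights NameError usually means accelerate was not available when transformers was first imported:

      !pip install -U transformers accelerate bitsandbytes xformers einops sentence_transformers
      # then Runtime -> Restart runtime, and re-run the model-loading cell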

  • @yves1893
    @yves1893 1 year ago

    I am using the Hugging Face model chavinlo/alpaca-native.
    However, when I use those embeddings with this model:

    pipe = pipeline(
        "text-generation",
        model=model,
        tokenizer=tokenizer,
        max_length=248,
        temperature=0.4,
        top_p=0.95,
        repetition_penalty=1.2,
    )
    local_llm = HuggingFacePipeline(pipeline=pipe)

    my output is always only 1 word long. Can anyone explain this?

  • @induu954
    @induu954 1 year ago

    Hi, I would like to know: can we chain 2 models, like a classification model and a pretrained model, using LangChain?

    • @samwitteveenai
      @samwitteveenai  1 year ago

      You could do it through a tool. Not sure there is anything built into LangChain for classification models, if you mean something like a BERT etc.

  • @ELECOEST
    @ELECOEST 7 months ago

    Hello, thanks for your video. For now it's:

    llm_chain = LLMChain(
        prompt=prompt,
        llm=HuggingFaceHub(
            repo_id="google/flan-t5-xxl",
            model_kwargs={"temperature": 0.9, "max_length": 64},
        ),
    )

    temperature must be >0 and the model is flan-t5-xxl.

  • @XiOh
    @XiOh 11 months ago +2

    u are not doing it locally in this video.....

    • @samwitteveenai
      @samwitteveenai  11 months ago

      The LLMs are running locally on the machine where the code is running. The first bit shows pinging the API as a comparison.
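
      For reference, a minimal sketch of that API-endpoint path (repo id illustrative; it needs a HUGGINGFACEHUB_API_TOKEN environment variable), as opposed to the local HuggingFacePipeline route:

        from langchain.llms import HuggingFaceHub

        # runs on Hugging Face's servers, not on your machine
        remote_llm = HuggingFaceHub(
            repo_id="google/flan-t5-xl",
            model_kwargs={"temperature": 0.1, "max_length": 64},
        )
        print(remote_llm("Translate to German: How are you?"))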

  • @DanielWeikert
    @DanielWeikert 1 year ago

    I tried to store what the YouTube downloader loads in FAISS using HuggingFace Embeddings, but the LLM was not able to do the similarity search. Colab finally ran into a timeout.
    Can you share how to do this instead of using OpenAI? With OpenAI I had no issues, but I'd like to do it with HF models instead, e.g. Flan.
    br
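
    A minimal sketch of that combination (assuming `docs` already holds the loaded and split documents; the embedding model name is illustrative and needs sentence-transformers installed):

      from langchain.embeddings import HuggingFaceEmbeddings
      from langchain.vectorstores import FAISS

      embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")
      db = FAISS.from_documents(docs, embeddings)          # docs: list of Document chunks
      results = db.similarity_search("What is the video about?", k=4)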

  • @younginnovatorscenterofint8986

    Hello Sam, how do you solve:
    "Token indices sequence length is longer than the specified maximum sequence length for this model (2842 > 512). Running this sequence through the model will result in indexing errors."
    Thank you in advance.

    • @samwitteveenai
      @samwitteveenai  1 year ago +1

      This is a limitation of the model, not LangChain. There are some models on HF that support 2048.

  • @srimantht8302
    @srimantht8302 1 year ago

    Awesome video! Was wondering how I could use LangChain with a custom model running on SageMaker? Is that possible?

    • @samwitteveenai
      @samwitteveenai  1 year ago

      yeah that should be possible in a similar way.

  • @SomuNayakVlogs
    @SomuNayakVlogs 1 year ago +1

    Can you create one for CSV as input?

    • @samwitteveenai
      @samwitteveenai  1 year ago

      I made another video on using CSVs with LangChain, check that out.

    • @SomuNayakVlogs
      @SomuNayakVlogs 1 year ago

      @@samwitteveenai Thanks Sam, I already watched that video, but that one is with OpenAI; I wanted LangChain with CSV and Hugging Face.

    • @SomuNayakVlogs
      @SomuNayakVlogs 1 year ago

      can you please help me on that

    • @Marvaniamehul
      @Marvaniamehul 1 year ago

      I am also curious whether we can use a Hugging Face pipeline (run locally) and LangChain to load a CSV file.

  • @brianrowe1152
    @brianrowe1152 1 year ago

    Stupid question, so I'll take a link to another video/docs/anything. Which Python version, CUDA version, and PyTorch version are best to use for this work? I see many using Python 3.9 or 3.10.6 specifically. The PyTorch site recommends 3.6/3.7/3.8 on the install page. Then the CUDA version, 11.7 or 11.8 - it looks like 11.8 is experimental? Then when I look at my nvcc output it says 11.5, but my nvidia-smi says CUDA Version 12.0... head explodes... I'm on Ubuntu 22.04. I will google some more, but if someone knows the ideal setup, or at least the "it works" setup, I'd appreciate it!!! Thank you

  • @computadorhumano949
    @computadorhumano949 1 year ago

    Hey, why does it take so long to respond? Does this need my CPU to be fast?

    • @samwitteveenai
      @samwitteveenai  1 year ago

      yeah for the local stuff you really need a GPU rather than a CPU

  • @human_agi
    @human_agi 1 year ago

    What kind of Colab do you need? Because I am using the $10 version with high RAM and the GPU on, and still cannot run it: ValueError: A device map needs to be passed to run convert models into mixed-int8 format. Please run `.from_pretrained` with `device_map='auto'`

    • @samwitteveenai
      @samwitteveenai  1 year ago

      If you don't have access to the bigger GPU then go with a smaller T5 model etc.

    • @rudy.d
      @rudy.d 1 year ago

      I think you just need to add the argument device_map='auto' to the same list of arguments of your model's "*LM.from_pretrained(xxxx)" call, where you have "load_in_8bit=True".
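
      A minimal sketch of that suggestion (model id illustrative; bitsandbytes and accelerate need to be installed):

        from transformers import AutoModelForSeq2SeqLM

        model = AutoModelForSeq2SeqLM.from_pretrained(
            "google/flan-t5-xl",
            load_in_8bit=True,
            device_map="auto",   # lets accelerate place the layers on the available GPU
        )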

  • @fintech1378
    @fintech1378 9 months ago

    How can you make a Telegram chatbot with this?

  • @hiramcoriarodriguez1252
    @hiramcoriarodriguez1252 1 year ago

    I'm a transformers user and I still don't get the point of learning this new library. Is it just for very specific use cases?

    • @samwitteveenai
      @samwitteveenai  1 year ago +1

      Think of it as an abstraction layer for prompting and for managing the user interactions with your LLM. It is not an LLM in itself.

    • @hiramcoriarodriguez1252
      @hiramcoriarodriguez1252 1 year ago

      @@samwitteveenai I know it's not an LLM; the biggest problem that I see is learning a new library that wraps the OpenAI and HuggingFace libraries just to save 3 or 5 lines of code. I will follow your work, maybe that will change my mind.

    • @insightbuilder
      @insightbuilder 1 year ago +3

      Consider Transformers the first layer of abstraction over the neural nets that make up the LLMs. In order to interface with LLMs, we can use many libraries, including HF. HF Hub / LangChain will be the 2nd layer. The USP of LangChain is the ecosystem that is built around it, especially the Agents and Utility Chains.
      This ecosystem lets the LLMs be connected with the outside world... The devs at LC have done a great job.
      Do learn it, and share these absolutely brilliant vids with your friends/team members etc.

    • @samwitteveenai
      @samwitteveenai  1 year ago

      great way of describing it @Kamalraj M M

    • @neilzedd8777
      @neilzedd8777 1 year ago

      ​@@insightbuilder beyond impressed with how healthy their documentation is. Working on a flan-ul2 + lc app right now, very fun times.

  • @mrsupremegascon
    @mrsupremegascon 10 months ago

    Ok, great tutorial, but as a Frenchman from Bordeaux, I am deeply disappointed by Google's answer about the best area to grow wine.
    Loire Valley? Seriously???? Name one great wine coming from the Loire, Google, I dare you.
    They are in the B league at best.
    The answer is obviously Bordeaux; I would maybe have accepted Agen (wrong) or even Bourg*gne (very very wrong).
    But the Loire? It's outrageous, and this answer made me certain that I will never use this cursed model.

    • @samwitteveenai
      @samwitteveenai  10 months ago +1

      lol well at least you live in a very nice area of the world.

  • @daryladhityahenry
    @daryladhityahenry 1 year ago

    Hi! Did you find a way to load a Vicuna GPTQ version using this? I tried your video with GPT-Neo 125M and it's working, but not Vicuna GPTQ. Thank youu

  • @evanshlom1
    @evanshlom1 11 months ago

    U a legend

  • @nemtii_
    @nemtii_ 1 year ago

    What always happens with this setup (LangChain + HuggingFaceHub) is that it only adds about 80 characters on each call. Anyone else having this problem? I tried max_length: 400 and still have the same issue.

    • @nemtii_
      @nemtii_ 1 year ago

      It's not specific to LangChain; I used the client directly and am still getting the same issue.

    • @samwitteveenai
      @samwitteveenai  1 year ago +1

      I think this could be an issue with their API. Perhaps on the Pro/paid version they allow more? I am not sure; to be honest I don't use their API, I tend to load the models etc.
      You could also try the max_new_tokens setting rather than max_length; that could help.
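
      A minimal sketch of that max_new_tokens suggestion (repo id illustrative; 400 is just the value mentioned above):

        from langchain.llms import HuggingFaceHub

        llm = HuggingFaceHub(
            repo_id="google/flan-t5-xl",
            model_kwargs={"max_new_tokens": 400},   # instead of max_length
        )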

    • @nemtii_
      @nemtii_ 1 year ago

      @@samwitteveenai wow! thank youuu!! worked with max_new_tokens

    • @nemtii_
      @nemtii_ 1 year ago +1

      @@samwitteveenai I wish someone would do a list mapping which model sizes run on Google Colab free versus the paid Colab, to see if it's worth paying and what you can experiment with within each tier. I'm kinda lost in that sense, at a stage where I just want to evaluate models myself and look at a production env later.

    • @samwitteveenai
      @samwitteveenai  1 year ago

      This would be good I agree

  • @litttlemooncream5049
    @litttlemooncream5049 5 months ago

    thx! helped a lot! but stuck at loading model...it says google/flan-t5-xl is too large to be loaded automatically (11GB > 10GB)....qaq

    • @samwitteveenai
      @samwitteveenai  5 months ago

      Try a smaller model if your GPU isn't big enough: google/flan-t5-small or something like that.