I used LLaMA 2 70B to rebuild GPT Banker...and its AMAZING (LLM RAG)

  • Published: 25 Aug 2024
  • 👨‍💻 Sign up for the Full Stack course and use RUclips50 to get 50% off:
    www.coursesfro...
    🐍 Get the free Python course
    go.coursesfrom...
    Hopefully you enjoyed this video.
    💼 Find AWESOME ML Jobs: www.jobsfromni...
    🤖 Get the Code: github.com/nic...
    Learn how to use Llama 2 70B Chat for Retrieval Augmented Generation...for FINANCE! Albeit in a hella haphazard way. Oh, and we'll also build a Streamlit app while we're at it.
    Oh, and don't forget to connect with me!
    LinkedIn: bit.ly/324Epgo
    Facebook: bit.ly/3mB1sZD
    GitHub: bit.ly/3mDJllD
    Patreon: bit.ly/2OCn3UW
    Join the Discussion on Discord: bit.ly/3dQiZsV
    Happy coding!
    Nick

Comments • 207

  • @moondevonyt
    @moondevonyt 1 year ago +42

    first off, respect for the hustle and the in-depth breakdown of integrating llama with other tools
    really shows how much work goes behind the scenes
    that said, not sure why everyone's so hyped about all these new models when sometimes simpler and older architectures can do the trick
    but hey, if it's all about pushing boundaries and experimenting, you're killing it bro!

    • @NicholasRenotte
      @NicholasRenotte 1 year ago +6

      Thanks a mill moondev!! Yeah, at this point I'm just pushing to see where it's going. I started fine-tuning this for some custom use cases and it looks hyper promising!

    • @xt3708
      @xt3708 1 year ago +2

      Thanks so much for the detailed videos @@NicholasRenotte. Can you make a video on fine-tuning?

    • @vyrsh0
      @vyrsh0 11 months ago +3

      Can you name some of the old models, so I can look them up and learn about them?

    • @ZombiemanOhhellnaw
      @ZombiemanOhhellnaw 11 months ago

      @@vyrsh0 @moondevonyt yes, I would like to learn what older models do the trick as well!

  • @yudhiesh1997
    @yudhiesh1997 1 year ago +118

    You can't load Llama-2-70B on a single A100 GPU at full or half precision. Full precision (float32) would require 70 billion × 4 bytes = 280 GB of GPU memory. Loading it in float16 halves that to 140 GB. It finally worked because you loaded it in int8, which only requires 70 GB, while the A100 has 80 GB of GPU memory. To load it in full/half precision you would need multiple GPUs and would also need to leverage tensor parallelism, whereby you slice the tensors across multiple GPUs.

    • @NicholasRenotte
      @NicholasRenotte 1 year ago +12

      I'm not sure that was it, I successfully loaded in half precision over 2xA100-80GB (didn't show the loading in the vid). But when I went to generate this is what I came up against: github.com/huggingface/transformers/issues/24056. Solid calcs though!

    • @sluggy6074
      @sluggy6074 1 year ago +9

      That's nice. I'll just have to settle for my quantized 70b LLMs that run hot and fast on my 4090.
      I think I can live with this.

    • @agusavior_channel
      @agusavior_channel 1 year ago +1

      Use Petals.

    • @seanhuver4813
      @seanhuver4813 11 months ago

      It runs nicely at 4-bit precision on an A6000.

    • @bubbleboy821
      @bubbleboy821 10 months ago

      What you meant to say was that you can load LLama2-70b on a single A100 GPU, you just have to run it in int-8.
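The memory arithmetic in the thread above can be sketched directly. A minimal back-of-envelope calculation (weights only; real deployments also need headroom for activations, the KV cache, and framework overhead):

```python
# Back-of-envelope GPU memory needed just to hold Llama-2-70B's weights
# at different precisions (weights only; runtime overhead is extra).
PARAMS = 70e9  # 70 billion parameters
BYTES_PER_PARAM = {"float32": 4.0, "float16": 2.0, "int8": 1.0, "int4": 0.5}

def weight_memory_gb(n_params: float, dtype: str) -> float:
    """Approximate weight memory in GB (1 GB = 1e9 bytes)."""
    return n_params * BYTES_PER_PARAM[dtype] / 1e9

for dtype in BYTES_PER_PARAM:
    print(f"{dtype:>8}: {weight_memory_gb(PARAMS, dtype):6.0f} GB")
# float32 needs 280 GB and float16 needs 140 GB, so only the int8 (70 GB)
# and int4 (35 GB) variants fit on a single 80 GB A100.
```

This is why the thread lands where it does: the model only fits on one A100 once quantized to int8 or below.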

  • @splitpierre
    @splitpierre 6 months ago +2

    Yeah, nice work!
    I've been playing around with RAG as well, and I can relate to all the roadblocks and pain points.
    I'm trying to squeeze out as much as possible so I can have a decent RAG without any fancy GPU: consumer-grade hardware running everything locally. It's been fun/painful.

  • @MikeAirforce111
    @MikeAirforce111 1 year ago +5

    This video was great. You have created a format that is very entertaining to watch! 🙌 Subbed!

  • @juanpablopereira1479
    @juanpablopereira1479 1 year ago +6

    I think "amazing" falls short: the amount of knowledge, the fact that you're using a cutting-edge open-source model, and all of that in a really funny, light tone. Keep up the good work! I have a question: do you think it is much harder to deploy the app to Google Cloud Run compared with RunPod?

    • @NicholasRenotte
      @NicholasRenotte 1 year ago

      Thanks so much Juan! I can't imagine it would be; you'd be running on a VM instance with GPUs attached. You could also separate out the LLM bit and run that solely on a GPU, then run the app on a basic Linux instance!

  • @princechijioke247
    @princechijioke247 1 year ago +9

    Always looking forward to your videos...
    I have an MSc in AI, but I still learn from you 👏🏼

    • @MikeAirforce111
      @MikeAirforce111 1 year ago +12

      I have a PhD and I am here as well 🤷‍♂

    • @siestoelemento4027
      @siestoelemento4027 11 months ago

      I guess I'm on the right path then
      @@MikeAirforce111

  • @ShahJahan_NNN
    @ShahJahan_NNN 1 year ago +2

    Please make a video on OCR for past question papers: one that can extract questions and keywords, analyse ten years of papers, and predict upcoming questions.

  • @malice112
    @malice112 1 year ago +7

    Nicholas, I love your videos and your way of making learning about ML/AI fun! In your next video can you please show us how to fine-tune an LLM! Thanks for all the hard work you put into making these videos!

  • @FunCodingwithRahul
    @FunCodingwithRahul 1 year ago +3

    Incredible stuff done... thank you Nich.

  • @wayallen831
    @wayallen831 1 year ago +2

    Great tutorial! Can you also help do a tutorial on setting up runpod to host the application on it? Found that part to be a bit confusing and would love a more thorough walk thru. Thanks for all you do!

    • @NicholasRenotte
      @NicholasRenotte 1 year ago

      Ya, might do something soon and add it to the free course on Courses From Nick. I'm saving infra style/setup videos for the Tech Fundamentals course.

  • @Nick_With_A_Stick
    @Nick_With_A_Stick 1 year ago +2

    My computer is currently training a LoRA on Stable 7B for natural language to Python (30k) and SQL (30k). I also included 30k Orca questions so it doesn't lose its abilities as a language model, plus 20k sentiment analysis on news headlines. I would love to try this with that model as soon as it's done training.

  • @projecttitanium-slowishdriver
    @projecttitanium-slowishdriver 1 year ago +1

    Huge thanks for your videos. Nowadays I code, demonstrate, and lead AI, ML, DL, and RL development in a 1,300+ employee engineering and consulting company.
    I am combining technical analysis tools (FEM, CFD, MBS…) with AI to generate new digital business cases.

  • @hebjies
    @hebjies 10 months ago +4

    It is possible that when you tried to load the PDF with SimpleDirectoryReader, it was skipping pages because of the chunk size / embedding model you selected: the model you chose (all-MiniLM-L6-v2) is limited to 384 while the chunk size you specified was 1024. Maybe, and just maybe, that is why it was skipping pages: it was unable to fit the whole chunk into the embedding model.
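The mismatch described above is easy to guard against explicitly. A minimal sketch (the helper name is made up; the 256-token figure is the published max sequence length of all-MiniLM-L6-v2, which is even lower than its 384-dimension embedding size mentioned above):

```python
# Hypothetical guard: flag chunk sizes that exceed what the embedding model
# can actually encode. Overflowing tokens are silently truncated by the
# embedder, which can look like the index "skipping" parts of pages.
EMBED_MODEL_MAX_TOKENS = {"all-MiniLM-L6-v2": 256}  # published max seq length

def chunk_fits(model_name: str, chunk_size_tokens: int) -> bool:
    """True if a chunk of this many tokens is fully embedded, not truncated."""
    return chunk_size_tokens <= EMBED_MODEL_MAX_TOKENS[model_name]

assert not chunk_fits("all-MiniLM-L6-v2", 1024)  # the video's chunk size overflows
assert chunk_fits("all-MiniLM-L6-v2", 256)
```

The practical fix is either to shrink the chunk size to the embedder's limit or to pick an embedding model with a longer sequence length.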

  • @ShaneZarechian
    @ShaneZarechian 6 months ago

    Taking the viewers along the development and debugging ride is a cool style

  • @dacoda85
    @dacoda85 1 year ago +3

    Love this style of video. Fantastic content as always mate. You've given me some ideas to try out. Thanks :)

  • @tejaskumarreddyj3133
    @tejaskumarreddyj3133 1 month ago

    Can you please make a video explaining which LLM to use when developing a RAG system? It would be a great help if you could make one, and please also tell us how to run this locally on Linux! 😁

  • @richardbeare11
    @richardbeare11 10 months ago

    Love your videos Nicholas. Watching this with my morning coffee, a few chuckles, and a bunch of "ooohhh riiiiiight!"s. Your vid bridged a bunch of gaps in my knowledge.
    Gonna be implementing my own RAG now 😎👍

  • @Bliss_99988
    @Bliss_99988 1 year ago +1

    'How to start a farm with no experience' - Hahaha, man, I just want to say that I love your sense of humour. Also, your videos are really useful for me, I'm an English teacher and I'm trying to build useful tools for my students. Thanks for your content.

    • @NicholasRenotte
      @NicholasRenotte 1 year ago

      😂 it's my secret dream job! Hahah thanks so much for checking it out man!!

  • @kevynkrancenblum5350
    @kevynkrancenblum5350 1 year ago +1

    2:40 😂
    Thanks Nic, the video is awesome! 🤘🏽🤘🏽🤘🏽

  • @shipo234
    @shipo234 1 year ago +3

    Nick this is insanely good, thank you for the effort

  • @billlynch3400
    @billlynch3400 1 day ago

    Where can we find the code that you use in the video? Can you please share it?

  • @kallamamran
    @kallamamran 1 year ago

    I so wish I could do this. Maybe not specifically THIS, but things like this. I wish I understood the underlying principles for making something like this work, Great video!!!

    • @kallamamran
      @kallamamran 11 months ago

      @@jimmc448 Ha ha ha...

  • @Ryan-yj4sd
    @Ryan-yj4sd 1 year ago +1

    Nice video. You seem to have taken the tough route. I didn't have as much trouble :)

  • @andreyseas
    @andreyseas 1 year ago +1

    Sick production value and great content!

  • @knutjagersberg381
    @knutjagersberg381 1 year ago +1

    Do you really have to get access from Meta to use the weights? My current interpretation is that you enter the license agreement as soon as you use the weights, wherever you got them (since you're also allowed to redistribute them).
    I'm not 100% sure about this, but I think you don't need to register. I think that's more for them to keep track of early adopters.

  • @zamirkhurshid261
    @zamirkhurshid261 1 year ago +1

    Nice sharing, sir. Your way of teaching is very helpful for beginners. Please make a video on how to build a deep learning model on an earthquake dataset, like the project you made on image classification.

  • @hongyiyilim6830
    @hongyiyilim6830 10 months ago

    Great content! Helped me a lot with building my own open-source-model RAG.

  • @synthclub
    @synthclub 6 months ago

    Really cool Llama application. Really impressive.

  • @i2c_jason
    @i2c_jason 1 year ago +1

    ...Sunday morning after a bender hahhaaha bro I love you.

  • @ba70816
    @ba70816 1 year ago +1

    Really great content, you might have the most effective style I’ve ever seen. Well done. I can’t remember which video I saw where you spoke about your hardware setup. It’s cloud based isn’t it?

    • @NicholasRenotte
      @NicholasRenotte 1 year ago

      Thanks a mil! This particular instance is cloud based, yup! It's all runpod, I used a remote SSH client to use the env with VsCode. Old HW vid might have been this: ruclips.net/video/GH1RuKguO54/видео.html

    • @ba70816
      @ba70816 1 year ago

      Would you consider a video showing the setup process you use?

  • @autonomousreviews2521
    @autonomousreviews2521 1 year ago +1

    Great share! Thank you for your persistence and giving away your efforts :)

  • @sinasec
    @sinasec 9 months ago

    Well done. One of the best and most compact tutorials I've ever had. Thanks for providing the source code.

  • @user-cy4ld4cx1c
    @user-cy4ld4cx1c 5 months ago

    I love you Nicholas... you are awesome. My only regret is that I didn't find you earlier. All my dream projects in one channel... thank you!

  • @PritishMishra
    @PritishMishra 1 year ago +1

    Amazing editing and content, learnt a lot.

  • @chrisweeks8789
    @chrisweeks8789 1 year ago +3

    All facets of your work are incredible! Are the context limits of llama2 similar to that of OpenAI?

    • @NicholasRenotte
      @NicholasRenotte 1 year ago

      Thanks a mil! Would depend on which models you're comparing!

  • @nimeshkumar8508
    @nimeshkumar8508 1 year ago +1

    Thank you so much for this. God bless you.

  • @BudgetMow
    @BudgetMow 9 months ago

    Thank you for this tutorial, although I am facing a slight issue parsing tables from PDFs. I managed to get the parser to take in multiple documents, and it answers quickly; the only issue is that if a question relates to data within a table, or sometimes data spanning multiple lines, it fails to retrieve that data.

  • @ricowallaby
    @ricowallaby 11 months ago

    Hi, just found your channel and I'm enjoying it. I can't wait till we have truly open-source LLMs. Anyway, keep up the good work; cheers from Sydney.

  • @deadcrypt
    @deadcrypt 10 months ago

    8:57 nice auth key you got there

  • @horane
    @horane 1 year ago

    minute 4:45 comment is confirmation clutch! Never give up!

  • @zakaria20062
    @zakaria20062 10 months ago

    Waiting for an open-source equivalent of ChatGPT's function calling; that would be amazing.

  • @himanshuahujaofficial7813
    @himanshuahujaofficial7813 5 months ago

    Nick, thank you so much for the great content. I’m new to AI and want to build an LLM for my startup, but I’m not sure where to start. Can you recommend something?

  • @daniamaya
    @daniamaya 11 months ago

    wow! This is top-tier content. Thank you!

  • @mrrfrooty
    @mrrfrooty 2 months ago

    Hi, could you provide the RunPod source code for this? I can't find any outside documentation on how you made this possible.

  • @jennilthiyam1261
    @jennilthiyam1261 9 months ago

    How can we set up Llama 2 on a local system with memory? Not just one-off questions, but an interactive conversation like ChatGPT online.
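For multi-turn "memory", one common approach is simply to keep the running history and re-pack it into the prompt on every turn. A rough sketch using the Llama-2-chat prompt format (the template details are an assumption from memory and should be checked against Meta's documentation; the actual `generate` call is omitted):

```python
# Sketch: conversational memory for a local chat loop is just the accumulated
# history, re-assembled into a Llama-2-chat style prompt each turn.
def build_prompt(history, user_msg, system="You are a helpful assistant."):
    """Assemble a Llama-2-chat prompt from prior (user, assistant) turns."""
    prompt = f"<s>[INST] <<SYS>>\n{system}\n<</SYS>>\n\n"
    for past_user, past_assistant in history:
        prompt += f"{past_user} [/INST] {past_assistant} </s><s>[INST] "
    return prompt + f"{user_msg} [/INST]"

history = [("Hi", "Hello! How can I help?")]
prompt = build_prompt(history, "Summarize my last question.")
# feed `prompt` to any local Llama 2 runtime (llama.cpp, transformers, ...),
# then append (user_msg, reply) to `history` for the next turn
```

The history grows each turn, so a real loop would also trim old turns to stay within the model's context window.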

  • @youssefghaouipearls
    @youssefghaouipearls 1 month ago

    Hello, This seems like a less expensive approach than using Google Cloud. How much did it cost?

  • @moshekaufman7103
    @moshekaufman7103 5 months ago

    Hey Nicholas,
    It's a little disappointing that you haven't actually released the final model yet, even though you mentioned it in the video. While showing the source code is a good start, it's not the same as actually providing the finished product.
    Unfortunately, without the final model itself, it's difficult to take your word for it. To build trust and transparency, it would be much better to provide a download link for the model so people can try it out for themselves. This would be a much more impactful way to share your work and allow others to engage with it.
    I hope you'll reconsider and release the final model soon!

  • @angelazhang9082
    @angelazhang9082 4 months ago

    Hi Nick... really late, but I would be super grateful for a response. I'm trying to figure out how you used RunPod for this. It looks like you created a folder to store the weights instead of using one of their custom LLM options. Did you pay for extra storage? I can't imagine you loaded all the weights each time you needed to use this on the cloud. I'm new to working with these models and cloud GPUs, so any help is greatly appreciated!

  • @warthog123
    @warthog123 3 months ago

    Excellent video

  • @muradahmad9357
    @muradahmad9357 4 months ago

    Can you please tell us which CUDA version and NVIDIA driver version you used? I am having problems downloading it.

  • @nfic5856
    @nfic5856 1 year ago +2

    How can it be scalable, since this deployment costs around $2 per hour? Thanks.

    • @NicholasRenotte
      @NicholasRenotte 1 year ago +1

      Didn't show it here but if I were scaling this out, the whole thing wouldn't be running on a GPU. The app would be on a lightweight machine and the LLM running on serverless GPU endpoints.

    • @Ryan-yj4sd
      @Ryan-yj4sd 1 year ago +1

      @@NicholasRenotte But you would still need to pay to rent an A100 GPU, which is around $1 to $4 per hour.

    • @NicholasRenotte
      @NicholasRenotte 1 year ago

      Yeah, no real way around that, gotta host somewhere! Especially so if you want to be able to use your own fine-tuned model eventually (coming up soon)!

    • @nfic5856
      @nfic5856 1 year ago

      Does gpt-3.5-turbo (4k or 16k context) remain cheaper at a small production scale?
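The trade-off debated in this thread can be put in rough numbers. All figures below are illustrative assumptions, not measurements from the video: a rented A100 at $2/hour serving about 20 output tokens per second, versus 2023-era gpt-3.5-turbo pricing of roughly $0.002 per 1K output tokens.

```python
# Rough cost comparison between a rented GPU and a hosted API
# (throughput and prices are assumed for illustration only).
gpu_cost_per_hour = 2.00   # assumed A100 rental price
tokens_per_second = 20     # assumed sustained generation throughput

gpu_cost_per_1k = gpu_cost_per_hour / (tokens_per_second * 3600) * 1000
print(f"${gpu_cost_per_1k:.4f} per 1K tokens at full utilization")
# The rented GPU is only competitive if it stays busy most of the hour,
# or if you need a custom / fine-tuned model a hosted API can't serve.
```

Under these assumptions the GPU works out to a few cents per 1K tokens at full utilization, but an idle GPU still bills by the hour, which is why the serverless-endpoint approach mentioned above helps.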

  • @Shishir_Rahman_vg
    @Shishir_Rahman_vg 6 months ago

    I have learned machine learning to an intermediate level; can I now start deep learning alongside machine learning? Please tell me, sir.

  • @Tripp111
    @Tripp111 7 months ago

    Thank you. ❤️🍕

  • @user-ht9st4up8q
    @user-ht9st4up8q 8 months ago

    Gosh, when I use GPT-4 it gives me a response saying it cannot further summarize a personal report, and it just stops there.
    I think I will just need to switch to a different model.

  • @ullibowyer
    @ullibowyer 6 months ago

    Most people pronounce cache the same way as cash 💲

  • @frazuppi4897
    @frazuppi4897 9 months ago

    TL;DR basic RAG with Llama 70B, nothing more, nothing less - (thanks a lot for the video, really well done)

  • @user-wg3rr9jh9h
    @user-wg3rr9jh9h 7 months ago

    You are marvelous! I bow down after witnessing your next level hacking skills 🧐.

  • @krishnakompalli2606
    @krishnakompalli2606 10 months ago

    Since you used the RAG method, I'd like to know how it can answer extrapolated questions.

  • @ciberola285
    @ciberola285 11 months ago

    Hi Nicholas, are you planning to make a video on training the OWL-ViT model?

  • @Precision_Clips
    @Precision_Clips 1 year ago +1

    A deep learning in PyTorch video, pleaseee!

  • @ahmadshabaz2724
    @ahmadshabaz2724 1 year ago +1

    How do I get a free GPU on a web server? I don't have a GPU.

  • @yashsrivastava4878
    @yashsrivastava4878 11 months ago

    Hey, can it be done in Chainlit, with LMQL and Langflow added, so the output shows the PDF files as references along with scores based on whether it retrieves factual data or makes up its own answer?

  • @wasgeht2409
    @wasgeht2409 1 year ago +2

    Nice video! I think it is impossible to use LLaMA 2 70B on a MacPro M1 with 8 GB RAM :( Or is there any chance to use it locally without cloud services?

    • @NicholasRenotte
      @NicholasRenotte 1 year ago +1

      Could give it a crack with the GGML models, haven't tried it yet though tbh!!

  • @micbab-vg2mu
    @micbab-vg2mu 1 year ago +1

    Great video - thank you

  • @user-gx2wq5qw8n
    @user-gx2wq5qw8n 11 months ago

    Hello Nicholas, I still don't understand the ./model part.

  • @thepirate_kinz1509
    @thepirate_kinz1509 11 months ago

    Can we have a tutorial on conditional GANs please? And multi-feature conditional GANs as well 😊

  • @eel789
    @eel789 1 year ago +1

    How do I use this with a React frontend?

    • @NicholasRenotte
      @NicholasRenotte 1 year ago +2

      Could wrap the inference side of the app up into an api with FastAPI then just call out to it using axios!

  • @DGFilmsNYC
    @DGFilmsNYC 1 year ago +1

    Thank you brother

  • @vanshpundirv_p_r9796
    @vanshpundirv_p_r9796 1 year ago

    Hey, can you tell me the minimum VRAM, RAM, and disk space required to load and run inference from the model?

  • @Kingupon
    @Kingupon 11 months ago

    What I'm asking is: do I need to know the math at a deep level to get ahead in machine learning, or just how things work in the specific library I'm using? Please answer my question.

  • @ytsks
    @ytsks 6 months ago

    When someone tells you they made something "as good as" or "better than" ChatGPT, remember that even Meta doesn't compare Llama 2 70B to the current GPT-4 Turbo, but to the previous release.

  • @user-vv3jd2qp8w
    @user-vv3jd2qp8w 1 year ago +1

    What are the differences between the Meta-released Llama 2 models, the HF models, and the quantized (GGML) files found on Hugging Face? Why can't we use the meta/llama-2-70b model?

    • @NicholasRenotte
      @NicholasRenotte 1 year ago +1

      You could! llama-2-70b is the base model; chat is the variant fine-tuned for dialogue. The GGML model is a quantized model (optimized for running on less powerful machines). The hf suffix indicates that it's been converted to run with the transformers library.

    • @poornipoornisha5616
      @poornipoornisha5616 1 year ago

      @@NicholasRenotte The 70B chat model downloaded from Meta has consolidated.pth files in it. How can I use these files to fine-tune the model on custom datasets?

  • @strangnet
    @strangnet 10 months ago

    Really interesting, but what was your total cost in the end?

  • @dkhundley
    @dkhundley 1 year ago +1

    Well done! Have you considered a video around a formal fine tuning of one of the lesser variants (e.g. 7B) version of Llama 2? I’d love to see you do one. 😁

    • @NicholasRenotte
      @NicholasRenotte 1 year ago +4

      On the cards for this week DK, had a client ask for it. Actually got a super interesting use case in mind!

    • @dkhundley
      @dkhundley 1 year ago

      @@NicholasRenotte Awesome! Looking forward to it.

  • @nimaheydarzadeh343
    @nimaheydarzadeh343 1 year ago +1

    It's great; I've been trying to find something like this.

  • @mohamedkeddache4202
    @mohamedkeddache4202 11 months ago

    Please help me 😓
    (in your licence plate TensorFlow video)
    I get this error when I copy the train command into cmd:
    ValueError: mutable default for field sgd is not allowed: use default_factory

  • @emanuelsanchez5245
    @emanuelsanchez5245 11 months ago

    Hi!
    What was the performance of the method?
    How many tokens per second with that deployment?

  • @jyothishkumar.j3619
    @jyothishkumar.j3619 11 months ago

    What are the limitations on monetizing the Llama Banker app? Could you please explain?

  • @lashlarue59
    @lashlarue59 11 months ago

    Nick, you said that you were able to build your lip-reading model in 96 epochs. How long is an epoch in real time?

  • @randomthoughts7838
    @randomthoughts7838 11 months ago

    Hey, is there some structured way (steps) to learn to work with LLMs? As an analogy, DSA is one structured way to learn to solve coding problems. I am new to the LLM realm and any advice is much appreciated.

  • @evanfreethy8375
    @evanfreethy8375 9 months ago

    Where's the code for the front-end website?

  • @fur1ousBlob
    @fur1ousBlob 1 year ago +1

    I wanted to use Llama in a chatbot. Do you know if that will be possible? I'd like your opinion. I am using the Rasa framework to build the chatbot, but I am not sure how to integrate it.

    • @NicholasRenotte
      @NicholasRenotte 1 year ago

      Sure can! Seen this? forum.rasa.com/t/how-to-import-huggingface-models-to-rasa/50238

  • @Precision_Clips
    @Precision_Clips 1 year ago +1

    You make me love machine learning more

  • @tenlancer
    @tenlancer 11 months ago

    What is the response time for each query? And which GPU did you use for this app?

  • @americanswan
    @americanswan 7 months ago

    Can someone explain to me the cost of running an AI application on my local machine?

  • @scottcurry3767
    @scottcurry3767 1 year ago +1

    RunPod A100 instances are looking scarce, any tips on how to adapt for multiple GPU instances?

    • @NicholasRenotte
      @NicholasRenotte 1 year ago

      Going to give it a crack this week; I've got a fine-tuning project coming up. Will let you know. The other option is to use the GGML/4-bit quantized models, which reduces the need for such a beefy instance. Also, check out RunPod Secure Cloud: a little pricier, but it seems to have more availability (I ended up using SC when I was recording results for this vid because the community instances were all unavailable). Not sponsored, just in case I'm giving off salesy vibes.

  • @pantherg4236
    @pantherg4236 1 year ago

    What is the best way to learn deep learning fundamentals via implementation (say, a trivial problem like building a movie recommendation system) using PyTorch, as of Aug 26, 2023? Thanks in advance.

  • @sunkarashreeshreya451
    @sunkarashreeshreya451 1 year ago

    You are brilliant. I've been trying to find a tutorial for slidebot... could you work on it?

  • @AraShiNoMiwaKo
    @AraShiNoMiwaKo 6 months ago

    Any updates?

  • @accelerated_photon2265
    @accelerated_photon2265 11 months ago

    Love your videos! I would love to deploy a model, but the 70B compute is way too much. Do you have any ideas, or do you know any website where I can check the compute requirements for the 7B model? Just got my Meta access last week; thanks again for the video.

  • @ml-techn
    @ml-techn 1 year ago

    Hi, thanks for the video. Which GPU are you using? I want to buy and build a DL machine to play with LLMs.

  • @vitalis
    @vitalis 1 year ago +1

    Can you do a video about analysing trends from websites such as WGSN?

  • @sergeyfedatsenka7201
    @sergeyfedatsenka7201 11 months ago

    Does anyone know if renting a GPU is cheaper than using the OpenAI API? By how much? Thanks, Nicholas, for your great content!

  • @farseen1573
    @farseen1573 4 months ago

    What platform are you using for a $1.69/hr GPU? Can't find any good GPU cloud providers 🥺

  • @user-pp4ts5ob1u
    @user-pp4ts5ob1u 1 year ago +1

    Excellent video, you are amazing. Please update the video "AI Face Body and Hand Pose Detection with Python and Mediapipe"; I can't solve the errors, and it would be very useful for my university projects. Thank you very much.

  • @leonardoariewibowo7867
    @leonardoariewibowo7867 11 months ago

    Do you use Linux? I can't run this on my Windows machine; bitsandbytes didn't support Windows for CUDA >= 11.0.

  • @mfundomonchwe1313
    @mfundomonchwe1313 1 year ago +1

    This is awesome!

    • @NicholasRenotte
      @NicholasRenotte 1 year ago +1

      Thanks a mil!!

    • @mfundomonchwe1313
      @mfundomonchwe1313 1 year ago +1

      Please attempt the DAG context {model} next; would love to see that, sort of like a causal inference model @@NicholasRenotte

  • @user-gd2uy7tz2u
    @user-gd2uy7tz2u 8 months ago

    Make a video using it in HTML and JavaScript.

  • @vikassalaria24
    @vikassalaria24 1 year ago

    I am getting the error: ValidationError: 1 validation error for HuggingFaceLLM query_wrapper_prompt str type expected (type=type_error.str). I am using the 7B chat Llama 2 model.

    • @divyanshumishra6739
      @divyanshumishra6739 11 months ago

      Did you resolve that error? I am getting the same error and I am unable to solve it.

  • @malice112
    @malice112 1 year ago +1

    I am confused is Llama 2 an LLM or did you use the Huggingface LLM ?

    • @NicholasRenotte
      @NicholasRenotte 1 year ago +1

      LLaMA 2 70b is the LLM, we loaded it here using the Hugging Face library.