QLoRA is all you need (Fast and lightweight model fine-tuning)

  • Published: 19 Jun 2024
  • Learning and sharing my process with QLoRA (quantized low rank adapters) fine-tuning. In this case, I use a custom-made reddit dataset, but you can use anything you want.
    I referenced a LOT of stuff in this video, I will do my best to link everything, but let me know if I forget anything.
    Resources:
    WSB-GPT-7B Model: huggingface.co/Sentdex/WSB-GP...
    WSB-GPT-13B Model: huggingface.co/Sentdex/WSB-GP...
    WSB Training data: huggingface.co/datasets/Sentd...
    Code:
    QLoRA Repo: github.com/artidoro/qlora
    qlora.py: github.com/artidoro/qlora/blo...
    Simple qlora training notebook: colab.research.google.com/dri...
    qlora merging/dequantizing code: gist.github.com/ChrisHayduk/1...
    Referenced Research Papers:
    Intrinsic Dimensionality Explains the Effectiveness of Language Model Fine-Tuning: arxiv.org/abs/2012.13255
    LoRA: Low-Rank Adaptation of Large Language Models: arxiv.org/abs/2106.09685
    QLoRA: Efficient Finetuning of Quantized LLMs: arxiv.org/abs/2305.14314
    Yannic's GPT-4chan model: huggingface.co/ykilcher/gpt-4...
    Condemnation letter: docs.google.com/forms/d/e/1FA...
    • GPT-4chan: This is the...
    Contents:
    0:00 - Why QLoRA?
    0:55 - LoRA/QLoRA Research
    4:13 - Fine-tuning dataset
    11:10 - QLoRA Training Process
    15:02 - QLoRA Adapters
    17:10 - Merging, Dequantizing, and Sharing
    19:34 - WSB QLoRA fine-tuned model examples
    Neural Networks from Scratch book: nnfs.io
    Channel membership: / @sentdex
    Discord: / discord
    Reddit: / sentdex
    Support the content: pythonprogramming.net/support...
    Twitter: / sentdex
    Instagram: / sentdex
    Facebook: / pythonprogramming.net
    Twitch: / sentdex

Comments • 153

  • @GenAIWithNandakishor
    @GenAIWithNandakishor 9 months ago +88

    You deserve more appreciation than any of the AI gurus.

    • @juhotuho10
      @juhotuho10 9 months ago +4

      AI "Gurus" vs an actual expert who has done this for years

    • @heyyounotyouyou3761
      @heyyounotyouyou3761 9 months ago +3

      I think he gets that appreciation too. Just look at his subs. For such a deeply tech-oriented channel, it's difficult to get over a million.

    • @mshonle
      @mshonle 9 months ago +5

      Appreciation is all you need!

    • @GenAIWithNandakishor
      @GenAIWithNandakishor 9 months ago

      @@heyyounotyouyou3761 But he was one of the great minds in the programming community. Underrated, I would say.

    • @connorsheehan4598
      @connorsheehan4598 9 months ago +2

      I feel you, the space is filled with low-quality content. The Karpathy videos are great though.

  • @Trahloc
    @Trahloc 9 months ago +46

    Your idea of having dedicated QLoRA models as experts is fascinating to me. One of the things I like doing is having AIs emulate a "Council" of various historical figures, like having Marcus Aurelius, Einstein, and Prince "sit" in a roundtable discussion of whatever random idea I have. I can only imagine that a specific high-quality LoRA dedicated purely to Marcus's work would improve his emulation.

    • @danielpaull309
      @danielpaull309 8 months ago +1

      Idea: have multiple QLoRA models and use either an LLM agent or an RL agent to choose which model to use for each task, then use the chosen model for inference, or string them together to complete complex tasks.

    • @Prof1Patel
      @Prof1Patel 7 months ago

      Would anyone be interested in making this idea happen? Has anyone tried this?

    • @Trahloc
      @Trahloc 7 months ago

      @@Prof1Patel I haven't heard of one. I haven't even heard of a way to load multiple QLoRAs and activate them individually for each instance. My coding skills are unfortunately too weak to pull something like that off myself. Best I can do is prompting the emulation.

    •  7 months ago

      An MoE made with several QLoRA adapters, I have not heard of before. But the concept of using different QLoRA adapters in a 'Nespresso' approach, where you provide a "marketplace" of adapters and customize the generic pre-trained model on the fly depending on the context... that I have implemented. Of course, you cannot switch between adapters for each human-model interaction, but at the beginning of a session it is totally doable.

    • @Trahloc
      @Trahloc 7 months ago

      @ Any chance you could try switching the QLoRA to a different one without reloading the entire model? If the delay for that is relatively short, thanks to keeping them all in RAM, it'd be worthwhile. Ask a question, present an idea, then let it run overnight locally talking to itself, and you could then use something like Claude to give a summary the next day. How feasible do you think that idea is?
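
For what it's worth, this kind of adapter hot-swapping is roughly doable with Hugging Face's peft library; here is a minimal sketch of loading one base model and switching small adapters over it (the adapter paths and names below are hypothetical):

```python
# Minimal sketch: one shared base model, several small QLoRA adapters
# swapped in and out without reloading the base weights.
# The adapter paths/names below are hypothetical examples.
import torch
from transformers import AutoModelForCausalLM
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf", torch_dtype=torch.float16, device_map="auto")

# Attach the first adapter, then register more under their own names.
model = PeftModel.from_pretrained(base, "adapters/marcus-aurelius",
                                  adapter_name="marcus")
model.load_adapter("adapters/einstein", adapter_name="einstein")

model.set_adapter("marcus")    # generate as Marcus Aurelius
# ... run generation ...
model.set_adapter("einstein")  # switch experts; the base stays in memory
```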

  • @antopolskiy
    @antopolskiy 9 months ago +2

    That's awesome man, thank you for sharing this.
    If you dig deeper into it, I'd love to see a more step-by-step tutorial, maybe highlighting some aspects or nuances of the process, some of the theory behind it, etc.

  • @1olp1
    @1olp1 9 months ago +7

    I feel exactly the same as you about what I want a chatbot to be like. Excited to test your WSB model.

  • @fedrosfieros693
    @fedrosfieros693 7 months ago

    Having models individualise will definitely allow them to be distinct from other models out in the market and form a sense of personality / humor (and if trained with British data, some sarcasm too). It will allow them to be more "realistic", especially if trained in a way that allows them to conceptualise the information they receive. I think retaining their conversations and then training the models on those conversations will be the way to do that.
    Thank you for this video, great to watch and right to the point.

  • @kindoblue
    @kindoblue 9 months ago +4

    Always super interesting content from this channel. Thanks 🙏

  • @luvablezombie182
    @luvablezombie182 9 months ago +1

    LoRAs have been around for generative image models for months. Cool to see we've got them for LLMs now.

  • @prabhavkaula9697
    @prabhavkaula9697 9 months ago +1

    Thank you for this awesome video!

  • @AIDummy101
    @AIDummy101 8 months ago

    Thank you for sharing this information! Fast and lightweight fine-tuning is a topic that many of us in the AI community are interested in.

  • @sapienspace8814
    @sapienspace8814 9 months ago

    Interesting experiment, thank you for sharing!

  • @coolsai
    @coolsai 9 months ago

    Great video! Really, you're doing amazing work 🎉

  • @flawedlogic342
    @flawedlogic342 9 months ago +5

    Honestly man’s a genius

  • @akashkarnatak6581
    @akashkarnatak6581 9 months ago

    Yoo, I have been sleeping on QLoRA for a long time. Will try it soon. Nice vid

  • @bogdankapusta6336
    @bogdankapusta6336 9 months ago

    Thank you. Congrats on keeping that hairline 🎉

  • @johns3195
    @johns3195 9 months ago

    Love ur channel. Good content!

  • @KEKW-lc4xi
    @KEKW-lc4xi 9 months ago +3

    I mostly use GPT for sentiment rewriting, for example, "rewrite the following text better" or "rewrite the following text more eloquently" (if I want to sound super smart), etc. I would love to see how these types of instructions could be trained on open-source LLMs.

  • @coderaiders-yt
    @coderaiders-yt 9 months ago

    🤣 - I LOVE that you chose WSB to train this. Legend.

  • @calmodovar
    @calmodovar 7 months ago

    Thanks for the video! It would be great if you could create a video on using QLoRA to fine-tune BERT-derived models on the MLM task. These models could also benefit from the efficient training techniques developed for generative models.

  • @opusdei1151
    @opusdei1151 9 months ago

    Wow really nice explanation

    • @sentdex
      @sentdex  9 months ago

      Thanks!

  • @qwertasd7
    @qwertasd7 8 months ago

    You can instruct a model to have emotions and even emotional goals. I tested this: I asked it to create a list of emotions and physical needs, handle some of these as a function of time (hunger, sleep, etc.) and others based on the discussion, then told it to keep a reward/cost system, and it can get quite real. I tried it multiple times and it worked sometimes; better might be to replace the point system with a simple neural reward system, which is easier for time-based emotions.
    Best would be, I think, to have some long-term memory (a value-keeping system, so it doesn't forget values or hallucinate them).

  • @prestonmccauley43
    @prestonmccauley43 8 months ago

    This is good stuff, it almost connects the last two missing pieces in my brain. I've fine-tuned some LoRA adapters.
    1. To use them as an LLM, do I need to merge these all back together to get a GGUF file I can use in, say, LM Studio?
    2. If that is not the case, is there a sample Colab or script that shows how to use the base model + weights, and is this the preferred technique?
    Thanks!

  • @Blooper1980
    @Blooper1980 7 months ago

    Awesome... Thanks for this. ❤❤❤

  • @johnblomberg389
    @johnblomberg389 8 months ago

    I have played around with your 13B model and converted it to llama-cpp (quantized to 8 bits)... I am just blown away by how natural and lifelike the responses are compared to anything else; even the GPT-4 responses seem very robotic and "HR"-like in comparison. I can now run inference on my crappy laptop using just CPU and RAM; I don't even have a dedicated GPU on this machine. It is surprisingly quick, with an average inference time of about 10-15 seconds (this is with the 13B model; the 7B is very fast at like 3-4 seconds max).
    The next step would be to increase the dataset with fresh data gathered from the forum and run another QLoRA pass. Also, I was thinking about having an active bot posting on the forum, then harvesting the responses and how many upvotes the bot gets. This data could then theoretically be used for additional RLHF training (although I have no idea how to do that yet lol).
    Anyway, thanks for this!

    • @navinpatle7651
      @navinpatle7651 7 months ago

      I am trying to build a multi-class classification model for my work, and I have minimal data (3000 rows of labelled data); can I utilise the Llama 7B for this somehow? Can you share more info on how you converted the 13B to llama-cpp?
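
Hedging a bit on the exact conversion flow: the GGUF file itself comes out of llama.cpp's convert and quantize tools, and once you have one, CPU-only inference from Python looks roughly like this (the model path is a made-up example):

```python
# Rough sketch: CPU-only inference on a GGUF model converted with llama.cpp,
# via the llama-cpp-python bindings. The model path is a hypothetical example.
from llama_cpp import Llama

llm = Llama(model_path="models/wsb-gpt-13b.q8_0.gguf", n_ctx=2048)

out = llm("Comment: What do you all think of NVDA?\nReply:", max_tokens=128)
print(out["choices"][0]["text"])
```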

  • @kopasz777
    @kopasz777 9 months ago +2

    Could the problem be related to the original LLaMA EOS token having an unsupported value of -1? I heard choosing a rare token like 18610 could work.
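
One common workaround (a sketch only, not necessarily what fixed it in the video) is to give the tokenizer explicit special tokens and resize the embeddings so no id is left undefined; the base model name here is just a stand-in:

```python
# Sketch of one common fix: define the special tokens explicitly so the
# EOS/pad ids are real vocabulary entries rather than -1/undefined.
from transformers import AutoTokenizer, AutoModelForCausalLM

name = "meta-llama/Llama-2-7b-hf"  # stand-in for whichever base model is used
tokenizer = AutoTokenizer.from_pretrained(name)
tokenizer.add_special_tokens({"eos_token": "</s>", "pad_token": "[PAD]"})

model = AutoModelForCausalLM.from_pretrained(name)
model.resize_token_embeddings(len(tokenizer))  # account for any new tokens
model.config.eos_token_id = tokenizer.eos_token_id
model.config.pad_token_id = tokenizer.pad_token_id
```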

  • @jonasls
    @jonasls 9 months ago +1

    I really would love to read/watch more about the MoE LoRA project! Where can I follow this?

  • @Yarflam
    @Yarflam 8 months ago

    Thank you so much for this video! I need to train my model now and I'll use QLoRA. Good timing.
    Did you try the model "TheBloke/Mistral-7B-OpenOrca-AWQ"? The original model (Mistral 7B Orca) is a bit better than Llama2, the size is really small (

  • @Endelin
    @Endelin 9 months ago +3

    A Norm MacDonald LoRA would be awesome.

  • @khaikit1232
    @khaikit1232 9 months ago

    Do you think LoRA or QLoRA would be useful if I wanted to add new vocabulary to a pretrained translation model and fine-tune it?

  • @radmilraychev5687
    @radmilraychev5687 9 months ago

    Any idea what hardware specs are required to fine-tune Llama2-7B using QLoRA? I have read that 12 GB of VRAM is enough, but I have been struggling to make it work for a while.

  • @isaacjohn9629
    @isaacjohn9629 9 months ago +5

    LoRA in SD1.5 and SDXL models is HUGE because it's easy to train, easy to use (WebUI), easy to swap or stack, and easy to find on civitai. QLoRA for LLMs needs the same, especially the adapters you were talking about. I would love a QLoRA trained on text-based adventure games (correct me if I'm wrong, still new to LLMs).

    • @NeoShameMan
      @NeoShameMan 9 months ago +1

      LoRAs were first invented for LLMs, ironically enough. Not LoRA, but the first case of fine-tuning in general, was AI Dungeon 2.0, made to emulate text-based adventure, which was the inspiration for the current chatbot architecture.

    • @mikeyjohnson5888
      @mikeyjohnson5888 9 months ago

      @@NeoShameMan AI Dungeon 2 used GPT-2

    • @NeoShameMan
      @NeoShameMan 9 months ago

      @@mikeyjohnson5888 no shit Sherlock, can you make a distinction between methodology and model?

    • @mikeyjohnson5888
      @mikeyjohnson5888 9 months ago +1

      @@NeoShameMan Dude, I wasn't arguing. Just adding that was the model used so people would know for posterity's sake.

    • @borntodoit8744
      @borntodoit8744 9 months ago

      Nerd fight!
      This sh#t's gonna get real...

  • @Wolfwuf
    @Wolfwuf 9 months ago +4

    Awesome video! You should make a video about the GPTQ algorithm and AutoGPTQ.
    Unlike QLoRA, however, you need to fine-tune the model before applying GPTQ, since it's a one-shot method.
    However, it provides faster inference.
    You could fine-tune a model using QLoRA and then use AutoGPTQ to run it on a consumer GPU (a rough sketch follows this thread).

    • @ashu-
      @ashu- 9 months ago +1

      GPTQ is a quantization technique, what's there to make a video about? And it became popular because of llama.cpp, and then they themselves moved to GGUF, which is future-proof because it retains meta information.

    • @Wolfwuf
      @Wolfwuf 9 months ago +1

      @@ashu- You are right, I forgot to mention that I was referring to quantization using QLoRA and bitsandbytes.
      And it's not something to do with the video fully, just something within that realm.
      My apologies 🙏

    • @ashu-
      @ashu- 9 months ago +2

      Cool bro
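
A rough sketch of the QLoRA-then-GPTQ flow described above, using the auto-gptq library. The model names are hypothetical, and because GPTQ is one-shot post-training quantization, the calibration samples come after fine-tuning and merging:

```python
# Rough sketch: quantize an already fine-tuned (and merged) model with GPTQ.
# GPTQ is one-shot post-training quantization: it runs once, on a finished
# model, with a handful of calibration samples. Model names are hypothetical.
from transformers import AutoTokenizer
from auto_gptq import AutoGPTQForCausalLM, BaseQuantizeConfig

model_id = "my-wsb-qlora-merged"  # QLoRA fine-tune merged back into the base
tokenizer = AutoTokenizer.from_pretrained(model_id)

quant_config = BaseQuantizeConfig(bits=4, group_size=128)
model = AutoGPTQForCausalLM.from_pretrained(model_id, quant_config)

calibration = [tokenizer("YOLO calls and diamond hands.", return_tensors="pt")]
model.quantize(calibration)
model.save_quantized("my-wsb-qlora-merged-gptq-4bit")
```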

  •  9 months ago +1

    Very cool video :) Every time I watch your videos I get curious about the story behind the USB hub on the microphone arm. :D
    Also, if this trains so fast and with so little data, can it be run locally on a 24GB GPU?

    • @sentdex
      @sentdex  9 months ago +2

      Yes, you can absolutely QLoRA fine-tune a 7B model on a 24GB GPU (assuming a relatively recent card, for the quantization requirements). I think you need Ampere, so a 3090 or 4090 for consumer GPUs, for example. The USB hub is there to combat the "Yeti whine" issue with this particular microphone. It allows me to add USB power, which seems to help make the whine go away.
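
For anyone trying this on such a card, a minimal sketch of the 4-bit load that makes a 7B model fit for QLoRA training (standard bitsandbytes settings from the QLoRA paper; exact memory use depends on sequence length and batch size):

```python
# Minimal sketch: load a 7B model in 4-bit NF4 so QLoRA training fits on a
# single 24GB consumer GPU. bf16 compute is why Ampere (3090/4090) or newer
# is recommended.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",             # NF4 data type from the QLoRA paper
    bnb_4bit_use_double_quant=True,        # quantize the quantization constants
    bnb_4bit_compute_dtype=torch.bfloat16, # compute in bf16
)
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf",
    quantization_config=bnb_config,
    device_map="auto",
)
```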

  • @nguyennhi8524
    @nguyennhi8524 8 months ago

    thank you a lot!

  • @sharangpai4124
    @sharangpai4124 8 months ago

    Have you considered or tried running QLoRA on a mobile device? We were looking for good tech interfacing for this and weren't able to find many good ways to get started.

  • @echofloripa
    @echofloripa 8 months ago +1

    About models for mobiles and small devices, did you check TFLite? I'm trying to find out if I can modify a Llama2 model to run on TFLite; does anyone know?

  • @yarikbratashchuk3386
    @yarikbratashchuk3386 8 months ago

    I have a question: is it possible to tune it based on xml inventory data, so the model could give answers derived from inventory data?

  • @nathank5140
    @nathank5140 9 months ago

    Would you be able to do a video on using llama 2 for AI agents? I tried but couldn’t get anything like gpt-4

  • @PriyanshuTiwari-lh9gm
    @PriyanshuTiwari-lh9gm 9 months ago

    content is all you need

  • @legaldesigndo
    @legaldesigndo 9 months ago

    I loved your video. I am understanding a bit more thanks to it. Can you share the Colab you used in your learning journey? I would like to follow the video with it. I joined the Discord channel in hopes you were sharing it there. I want to try training a dataset in another language.

  • @AnthonyBatt
    @AnthonyBatt 9 months ago

    I enjoyed your video, and I too want a less corporate chat model.

  • @murmeldin
    @murmeldin 8 months ago

    Can you please make a video about Gaussian splatting? I've seen it being implemented in Polycam and it seems to be an active research topic right now. It is a photogrammetry technique which seems to have been trained on a large number of 3D models to make 3D models from a bunch of photos. Greetings from Germany 😊

  • @_rqd2
    @_rqd2 9 months ago +1

    Not sure if it's intended, or some side effect of compression or something, but to me it seems the video has some micro-stuttering, at least with the 60fps quality options. It's especially visible when you're in motion. I understand a thing or two about cameras, and would suggest checking your camera's shutter speed in case it's at too slow a setting (if the camera has options for changing it, of course). A good rule of thumb for video is that the shutter speed should be at least double the intended video framerate, so a minimum of 1/120th of a second for 60fps video.
    I noticed the fps thing a while back when watching your videos, and don't know if it even bothers you or anyone else. Anyway, just decided to try and give constructive feedback if it'd help. Don't hate plz 🤓 And thanks for the content, good stuff!

    • @whoisabishag3433
      @whoisabishag3433 9 months ago

      DhanOS will improve the SentDex Avatar, thanks

  • @connorvaughan356
    @connorvaughan356 9 months ago

    Does anybody know if QLoRA can be used to fine-tune LLMs for regression tasks? I've seen examples for classification, and LoraConfig has a task type specific to classification, but I haven't been able to find any regression examples. For example, grading student essays on a scale from 0-100.
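
There isn't a dedicated regression task type, but one plausible route (a sketch, not tested here) is to treat it as sequence classification with a single continuous label; transformers switches to an MSE loss when num_labels=1 and problem_type="regression":

```python
# Sketch: LoRA fine-tuning for regression via a sequence-classification
# head with a single continuous output (MSE loss). Essay scores in 0-100
# would be the float labels. The base model choice here is just an example.
from transformers import AutoModelForSequenceClassification
from peft import LoraConfig, TaskType, get_peft_model

model = AutoModelForSequenceClassification.from_pretrained(
    "roberta-base", num_labels=1, problem_type="regression")

lora_config = LoraConfig(task_type=TaskType.SEQ_CLS, r=8,
                         lora_alpha=16, lora_dropout=0.05)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only the adapter (and head) train
```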

  • @SamiKostilainen
    @SamiKostilainen 9 months ago

    Could the QLoRA fine-tuning be done locally, e.g. with a 4090?

  • @SemGabelko
    @SemGabelko 8 months ago

    I don't understand how you can use just the adapters, which are 200MB, for inference. I understand that when you fine-tune a model using QLoRA, you basically freeze most parameters and only update/retrain a small portion of the full model. However, when running inference you still need to load and use the full model, or am I wrong? Can someone explain, please?
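
You're right that the adapter alone isn't enough: at inference you load the full base model and apply the small adapter on top, or merge the two once and ship a standalone model. A sketch, with a hypothetical adapter path:

```python
# Sketch: inference still needs the full base model; the ~200MB adapter only
# stores the low-rank weight deltas. The adapter path here is hypothetical.
import torch
from transformers import AutoModelForCausalLM
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf", torch_dtype=torch.float16, device_map="auto")
model = PeftModel.from_pretrained(base, "someuser/wsb-qlora-adapter")

# Optionally bake the deltas into the base weights and drop the adapter:
merged = model.merge_and_unload()  # a single standalone model for sharing
```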

  • @AnthonyBatt
    @AnthonyBatt 9 months ago

    Thanks!

  • @Ant3_14
    @Ant3_14 9 months ago +1

    Finally, a model with the ability to be witty without being directly told to.

  • @SpaghettiRealm
    @SpaghettiRealm 9 months ago +3

    @sentdex you’re all we need in AI ecosystem

  • @JackWelsh-lh8om
    @JackWelsh-lh8om 1 month ago

    Hey, awesome video. I have a few questions:
    1. Why is LoRA (QLoRA) applied only on the q and v matrices of the transformer?
    2. Why is LoRA applied only on ∆W, and not on the weights themselves?
    3. Why can't LoRA be used for pre-training, but only fine-tuning?
    Thanks :)

  • @princemars6746
    @princemars6746 9 months ago

    19:32 I feel like GPT-4 does that when a person decides to use a persona in the prompts, especially when using custom instructions.

  • @alx8439
    @alx8439 9 months ago

    Speaking of running on your phone, there are some promising projects like Medusa from TogetherAI and DeciLM 6B.

  • @elrecreoadan878
    @elrecreoadan878 8 months ago

    When should one opt to fine-tune instead of using a Voiceflow / Botpress AI bot for Q&A? Also, what is the difference between QLoRA and Gradient?

  • @ander300
    @ander300 9 months ago

    Part 10 of Neural Networks from Scratch, about analytical derivatives??? Please bring the series back!

  • @imadsaddik
    @imadsaddik 9 months ago

    I would like to see you try Falcon 180B if possible

  • @beratcimen1954
    @beratcimen1954 9 months ago

    I've trained a few QLoRA models. Whenever I increase the epochs, it just generates the last token repeatedly until it runs out of context size. I couldn't solve that issue.

  • @TheAzraf123
    @TheAzraf123 9 months ago

    What are your opinions on Mojo/Modular?

  • @Dave-rd6sp
    @Dave-rd6sp 9 months ago

    Is it possible to train a Llama 7B QLoRA locally on a 4090? Or does it absolutely need server GPUs? I do SDXL training and have never looked into LLM training before.

    • @sentdex
      @sentdex  9 months ago +1

      You can comfortably QLoRA a 7B model on your 4090, yes.

    • @Dave-rd6sp
      @Dave-rd6sp 9 months ago

      @@sentdex I might have to give it a go. What's the smallest dataset you used that gave interesting or fun results?

  • @ryanshrott9622
    @ryanshrott9622 9 months ago +1

    How do you use vLLM with a quantized model?

  • @guyindisguise
    @guyindisguise 9 months ago

    Does anyone know of a good tutorial to add QLoRA to your own (custom) models? And/or some tutorials that implement QLoRA from scratch? (Preferably in PyTorch).

  • @SuperLazyCat
    @SuperLazyCat 9 months ago +4

    Lol the response from the AI wasn't wrong though 😂

  • @tejas__
    @tejas__ 9 months ago +4

    Do you have a video where we can learn about these LLMs from scratch? How to go about learning LLMs to build actually practical applications in the real world, etc.?

    • @MrRolloTamasi
      @MrRolloTamasi 9 months ago

      Karpathy himself has an excellent series on YT, concluding with a small GPT from scratch.

    • @sentdex
      @sentdex  9 months ago +2

      As Mr Rollo suggested, I can't imagine a better source than Karpathy's video ATM: ruclips.net/video/kCc8FmEb1nY/видео.html

    • @tejas__
      @tejas__ 9 months ago

      Thank you! @@sentdex

  • @NeoShameMan
    @NeoShameMan 9 months ago +1

    Compressors are predictors; somewhere between gzip's joint-probability compression and quantization we will arrive at a middle point where we do lossy compression as learning 😂😂

  • @morthim
    @morthim 9 months ago +1

    GPT-4chan was legendary.

  • @ryanshrott9622
    @ryanshrott9622 9 months ago

    Next video should be: VLLM and AWS is all you need :)

  • @DavidJones-cw1ip
    @DavidJones-cw1ip 9 months ago

    Any reason you didn't provide the script you used to actually train the model?

  • @hidroman1993
    @hidroman1993 9 months ago +2

    Would you use Llama-2 or Llama-2-Chat if you wanted to make an expert on a certain topic? I imagine it's easier to fine-tune Llama-2 on some documentation, because you don't have to create a "question-answer" format; you simply make Llama-2 read the docs.

    • @mungojelly
      @mungojelly 9 months ago

      i haven't played that much w/ the non-chat llama 2 specifically but in general the untrained models aren't so much more flexible as they are just utterly aimless, like if you give them a question they might answer w/ more questions that are similar b/c maybe we're making a list of questions who knows what's going on, the basic orientation given to the chat models that there's user messages coming in w/ intentions and they're supposed to do something in particular is pretty useful generally

    • @hidroman1993
      @hidroman1993 9 months ago

      @@mungojelly Agreed, also I've looked at ruclips.net/video/g68qlo9Izf0/видео.html; they say fine-tuning a model is not for making it acquire new knowledge, it's better to use retrieval.

  • @AykutKlc
    @AykutKlc 9 months ago

    Can I use QLoRA or LoRA to train e.g. Llama2 for a new language (in my case Turkish)?

    • @mungojelly
      @mungojelly 9 months ago

      Nope! Fine-tuning will only surface knowledge that's already in the model, and they took out all of the non-English data from the training set for Llama... You might have better luck w/ the Falcon models; I think they have some Turkish in their dataset.

  • @hemantjain2510
    @hemantjain2510 9 months ago

    Can you please make a video on context-based machine learning?

  • @someshfengade9623
    @someshfengade9623 9 months ago +1

    Hi, can you please give a link to the UI code too?

  • @drramasubramaniam6724
    @drramasubramaniam6724 9 months ago

    Yeah, a little more character for a chatbot does sound good to me.

  • @goldenfox27
    @goldenfox27 9 months ago +1

    Is it possible to train an "expert" on documentation for a certain topic and still get a chat output?

    • @Martin-po9sz
      @Martin-po9sz 9 months ago +2

      Yes, it is. However, it's mostly done using prompts and a vector DB, not fine-tuning.

  • @VitaliyHAN
    @VitaliyHAN 9 months ago

  • @drager980
    @drager980 9 months ago

    I think a quantized 7B model with an archive of 1000s of 100MB models on the cloud etc. would be such a good step for getting these LLMs local.

  • @saw6053
    @saw6053 9 months ago +2

    With llama2-7b-chat it is quite easy to make it have a personality just by adjusting the system prompt. I've had a blast with something like "You are a cat named Sir Sire, you only know about cat-stuff." Very funny responses, including a lot of emoticons etc. Try asking it about (atomic) bombs, for example.
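
For reference, a sketch of the Llama-2-chat prompt template that carries such a system prompt (the [INST]/<<SYS>> markers are the ones the chat models were trained with):

```python
# Sketch: building a Llama-2-chat prompt with a persona system prompt.
system = "You are a cat named Sir Sire, you only know about cat-stuff."
user = "Tell me about atomic bombs."

prompt = (
    "<s>[INST] <<SYS>>\n"
    f"{system}\n"
    "<</SYS>>\n\n"
    f"{user} [/INST]"
)
print(prompt)
```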

  • @drager980
    @drager980 9 months ago +1

    Dude, these QLoRA models can be reduced down to 9MB even lol, it's so good.

  • @ShubhamShubhra
    @ShubhamShubhra 9 months ago +3

    You have changed my life bro. Been worried about AI and how it would transform and disrupt our lives since 2018. The only one with any actual clarity on the subject which could resonate with my mind was you.

  • @AtHeartEngineer
    @AtHeartEngineer 9 months ago

    Hell ya

  • @LV-md6lb
    @LV-md6lb 8 months ago

    I'm really curious if anyone knows why he's having issues with the EOS token...

  • @BrainSlugs83
    @BrainSlugs83 7 months ago

    Did you mention something about a GPU sniping script? -- Is that linked somewhere? -- I'm not finding it. Maybe somebody can point me in the right direction.

  • @themax2go
    @themax2go 5 months ago

    What we need is "adaptive personality" on top of an LLM... kinda like LoRA, but not pre-trained, with each one always training to develop its own personality... just like humans do.

  • @matthewwilson5560
    @matthewwilson5560 8 months ago +1

    Please finish the Neural Networks from Scratch in Python series. It's the only course that is any good at explaining them; it'd be a shame for it to be unfinished. Love your vids!

  • @VijethMoudgalya
    @VijethMoudgalya 9 months ago

    21:04 Hilarious [Terraform, I see you]

    • @ashu-
      @ashu- 9 months ago

      😂😂

  • @SDGwynn
    @SDGwynn 9 months ago

    Hey, thank you. This is great. Oh, do you think you could do a walkthrough of a local LLM on Windows using team red? #underserved #24gbgamer

  • @AC-zv3fx
    @AC-zv3fx 9 months ago

    Fun AI! Fun AI! Fun AI!

  • @JeradBenge
    @JeradBenge 9 months ago +4

    We need more models that will call people out. 😂

  • @quentinquarantino8261
    @quentinquarantino8261 7 months ago

    Yannic Kilcher has a quite good YouTube channel too, and he himself seems very decent to me. I am not sure why he trained an LLM on 4chan.

  • @user-cw7jy9zr3z
    @user-cw7jy9zr3z 6 months ago

    What's the cost of the fine-tuning?

    • @sentdex
      @sentdex  6 months ago

      Depends on how much tinkering you do, model size, data, and settings, but something like $10 to $50ish.

  • @johnnguyen6224
    @johnnguyen6224 9 months ago

    Lmao at your model calling u out 🤣 truly based on reddit

  • @comosaycomosah
    @comosaycomosah 9 months ago

    😂😂this was great

  • @UncleDavid
    @UncleDavid 9 months ago

    Hari Seldon is prolly some model programmed in Kabbalah or some shit

  • @GigaFro
    @GigaFro 9 months ago

    How much money did you end up spending?

    • @sentdex
      @sentdex  9 months ago +4

      Tough to say exactly since I used the same server to actually serve some models for inference for a different project and train different models.
      I think it'd be fair to say I spent a total of 30 hours trying larger/smaller models and variants of ideas just for fun/exploration. @ $2/hr, it'd be $60.
      THAT said, to run a single QLoRA fine-tune you need about 4 hours. @ $2/hr that's $8. You can keep going too, or add more data...etc, but it's super cheap. Especially compared to the previous best option, which was a full fine tune and that was always $10K+ for models of this size...and data was a much bigger challenge.

  • @vitorfernandes2406
    @vitorfernandes2406 8 months ago

    Freedom of speech is amazing!

  • @renanmonteirobarbosa8129
    @renanmonteirobarbosa8129 9 months ago

    You just forgot to mention that there are a million caveats regarding use-case, model-data fit and so on... If only it were that simple indeed hahahaha.

  • @braineaterzombie3981
    @braineaterzombie3981 19 days ago

    I was promised AI but got linear algebra.

  • @sidehat1655
    @sidehat1655 9 months ago +1

    Just to explain what rank is, for anyone interested:
    *The basics:*
    _A vector._ In 2D that's a certain amount on the x and a certain amount on the y. So [1, 0] is pointing along the x with a length of 1. [1, 0, 0] is a 3D vector. [1, 0, 0, 0] is 4D.
    _A basis matrix._ This contains more than one vector. When you plot on an everyday graph you have the x pointing to the right and the y up, and you plot your vector against that. You can use 2 vectors to describe that: [[1, 0], [0, 1]], one vector pointing to the right and one up. If you changed your basis matrix to [[2, 0], [0, 2]], then your vector [1, 0] would suddenly be twice as long. I hope you can imagine how useful that is in graphics engines.
    *"Get on with it! Tell me what rank is!"*
    Imagine your basis matrix is [[0, 1], [0, 1]]: here the x and y basis vectors are both pointing in the same direction. So no matter what vector you have, you can never enter the second dimension. You're stuck on a line. So the concept of rank tells you it's 1.
    Now imagine you have a huge matrix of thousands, even millions, of dimensions in a neural net. If your rank is less than your dimensionality, you're wasting computational resources and you have an inelegant solution. If you're using 100 dimensions to describe 3D space, you've gone very wrong. You need to boil it down to the 3 dimensions that are actually in the geometry.
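
The same idea in a few lines of NumPy:

```python
# The collapsed basis spans only a line, so its rank is 1.
import numpy as np

full_basis = np.array([[1, 0], [0, 1]])       # spans the whole 2D plane
collapsed_basis = np.array([[0, 1], [0, 1]])  # both vectors on the same line

print(np.linalg.matrix_rank(full_basis))       # 2
print(np.linalg.matrix_rank(collapsed_basis))  # 1
```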

  • @ExtremeSquared
    @ExtremeSquared 9 months ago +1

    Extra confusing because LoRa and LORA refer to things from an adjacent field that really have nothing to do with LoRA. Someone made some bad choices in naming LoRA. I could see an absurd scenario where someone uses LoRA when designing advanced error correction for something running over LORA.

  • @loganshin9119
    @loganshin9119 9 months ago

    I've observed that even with hard LoRA tuning, I can't get the loss below 0.5, which suggests it's struggling to learn effectively. Interestingly, when I simply fine-tune or adjust the last layer, the loss drops below 0.1. I also experimented by replacing common articles like 'the', 'a', and 'an' with a special token '[ARTICLE]'. But with LoRA tuning, it still predicted 'the', 'a', or 'an' about 95% of the time, and only used '[ARTICLE]' 5% of the time. I would add that what LoRA actually does in the LLM era is, in effect, nothing.

  • @onhazrat
    @onhazrat 8 months ago

    🎯 Key Takeaways for quick navigation:
    00:13 😄 The current state of AI conversations is often perceived as cold and boring. Many desire AI with personality and humor.
    01:09 🧠 QLoRA (Quantized Low Rank Adapters) is a technique developed based on Facebook's research. It can significantly reduce trainable parameters, making fine-tuning faster and more memory-efficient.
    02:16 🔍 QLoRA uses two matrices (A and B) to lower dimensionality, reducing the complexity of fine-tuning compared to traditional methods (see the sketch after this list).
    03:40 💡 QLoRA allows fine-tuning with very few samples, as low as a thousand, making it versatile for various generative text tasks.
    07:20 🤖 When training AI models, careful consideration of the dataset is essential to avoid generating offensive or inappropriate content.
    15:08 🚀 QLoRA adapters are lightweight, enabling the creation of a mixture of experts with small memory footprints for versatile AI applications.
    22:02 😆 The speaker values AI models with character, humor, and personality, hoping for more models that can genuinely make people laugh and engage in fun conversations.
    Made with HARPA AI
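
To make the 02:16 takeaway concrete, here is a bare-bones sketch of the low-rank update (the alpha/r scaling from the paper is omitted for brevity):

```python
# Bare-bones LoRA: instead of training the full d x d weight W, train
# B (d x r) and A (r x d) with r << d; the learned delta is B @ A.
import torch

d, r = 4096, 8
W = torch.randn(d, d)          # frozen pretrained weight
A = torch.randn(r, d) * 0.01   # trainable, small random init
B = torch.zeros(d, r)          # trainable, zero init -> delta starts at 0

def lora_forward(x):           # x: (batch, d)
    return x @ W.T + x @ (B @ A).T

# Trainable parameters per layer drop from d*d (~16.8M) to 2*d*r (~65.5K).
```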

  • @siddharth-gandhi
    @siddharth-gandhi 9 months ago

    tbh if people get offended by just this much swearing, it kind of implies how childish society is becoming. great video tho! exciting improvements. just hope that the alignment people can stop decapitating these large models.

  • @ScottAshmead
    @ScottAshmead 9 months ago +1

    An interesting perspective: I hear people want to see personality from something non-human, yet here I am wanting A.I. to be as close to "Vulcan" as you can possibly get (meaning its function should be focused as a utility). The more human-like A.I. gets, the less human interaction humans may need. An interesting note I recently heard was "your children's first childhood crush will be an A.I." Let that sink in. Hence I feel it should focus on being a utility, without emotion or the ability to use psychological methods in its delivery of logic and data.