TRAIN YOUR OWN AI - For Beginners! Finetune Any LLM for Free (No VRAM) ft. Llama 3.2

  • Published: 19 Dec 2024

Comments • 10

  • @SteveRogers-q6c
    @SteveRogers-q6c 2 months ago +4

    This is probably the best video on finetuning I have come across so far. It is very detailed. The only thing missing is the "custom dataset" part. Can you please make a video on how to make a custom dataset, I mean the correct format and everything we should follow to make our own "custom dataset", and if possible do it for the latest Llama 3.2 3B model? Please also show us, after making the dataset, where and how to upload it for the finetuning. Please make it very detailed.
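
    For reference, a minimal sketch of what a custom dataset in the common Alpaca format could look like and how it might be loaded (the file name and example record below are hypothetical; the instruction/input/output fields follow the standard Alpaca convention used by the datasets mentioned later in this thread):

    import json
    from datasets import load_dataset

    # Hypothetical custom dataset: a list of Alpaca-style records.
    records = [
        {
            "instruction": "Summarize the following text.",
            "input": "Large language models can be finetuned on free Colab GPUs.",
            "output": "LLMs can be finetuned without paid hardware.",
        },
    ]

    # Save to a JSON file that the datasets library can read directly.
    with open("my_custom_dataset.json", "w") as f:
        json.dump(records, f)

    # Load it the same way as the Hugging Face datasets used in the video.
    dataset = load_dataset("json", data_files="my_custom_dataset.json", split="train")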

  • @TheZEN2011
    @TheZEN2011 2 months ago

    The best model training video currently out there!

  •  2 months ago

    🎉 thanks!

  • @twofii9.Official
    @twofii9.Official 1 month ago

    Can I do that on mobile? 😢

    • @xclbrxtra
      @xclbrxtra  1 month ago

      Yes, it uses Google Colab, so it runs online.

  • @deathfxu
    @deathfxu 2 months ago +1

    What if we want to add multiple datasets to the training? Do we run that code block multiple times with a different URL each time, or will that break it?

    • @xclbrxtra
      @xclbrxtra  2 months ago

      What I would suggest is to run 1 dataset first, and once the finetuned model is saved in your Colab, use that location in place of the original Hugging Face model link to train on another dataset. This way you would have both LLMs, one trained on the single dataset and one on both datasets, and can compare whether it is being overtrained.
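
      A minimal sketch of that workflow, assuming the Unsloth notebook from the video (the save path "lora_model" and the base model name are placeholders for whatever you used in your own run):

      from unsloth import FastLanguageModel

      # After the first finetuning run, save the adapter somewhere in Colab.
      model.save_pretrained("lora_model")
      tokenizer.save_pretrained("lora_model")

      # For the second run, point from_pretrained at that saved location
      # instead of the original Hugging Face model link.
      model, tokenizer = FastLanguageModel.from_pretrained(
          model_name="lora_model",  # was e.g. "unsloth/Llama-3.2-3B-Instruct"
          max_seq_length=2048,
          load_in_4bit=True,
      )
      # ...then rerun the same training cells with the second dataset.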

    • @deathfxu
      @deathfxu 2 months ago

      @xclbrxtra Thanks for the fast response. Alternatively, you can change the code at the end to this to combine any number of datasets:
      # Load each source dataset's train split, then combine them into one.
      from datasets import load_dataset, concatenate_datasets
      dataset1 = load_dataset("gbharti/finance-alpaca", split="train")
      dataset2 = load_dataset("practical-dreamer/RPGPT_PublicDomain-alpaca", split="train")
      dataset3 = load_dataset("vicgalle/alpaca-gpt4", split="train")
      dataset4 = load_dataset("iamtarun/python_code_instructions_18k_alpaca", split="train")
      dataset = concatenate_datasets([dataset1, dataset2, dataset3, dataset4])
      # formatting_prompts_func is the prompt formatter defined earlier in the notebook.
      dataset = dataset.map(formatting_prompts_func, batched=True)
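
      One note on this snippet (an addition, not from the original comment): concatenate_datasets keeps the four sources in their original order, so shuffling before training helps mix examples from all of them into each batch:

      dataset = dataset.shuffle(seed=42)  # randomize order across the four sources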

    • @deathfxu
      @deathfxu 2 months ago +1

      @xclbrxtra Or just delete my comment with a code workaround... I even made sure the example used non-overlapping datasets to prevent overtraining. But thanks for nothing...

  • @AdventurousKing
    @AdventurousKing 2 months ago

    I've wanted a video like this for a long time 🫡, thanks sir ❤