Fine-tune Stable Diffusion with LoRA for as low as $1

  • Published: Oct 8, 2023
  • Fine-tuning large models doesn't have to be complicated and expensive. In this tutorial, I provide a step-by-step demonstration of the fine-tuning process for a Stable Diffusion model geared towards Pokemon image generation. Utilizing a pre-existing script sourced from the Hugging Face diffusers library, the configuration is set to leverage the LoRA algorithm from the Hugging Face PEFT library. The training procedure is executed on a modest AWS GPU instance (g4dn.xlarge), optimizing cost-effectiveness through the utilization of EC2 Spot Instances, resulting in a total cost as low as $1. A minimal sketch of the LoRA setup is included after the links below.
    ⭐️⭐️⭐️ Don't forget to subscribe to be notified of future videos ⭐️⭐️⭐️
    - Blog: huggingface.co...
    - Model: huggingface.co...
    - Dataset: huggingface.co...
    - Amazon EC2 G4 instances: aws.amazon.com...
    Follow me on Medium at / julsimon or Substack at julsimon.subst....
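
    As a rough illustration of what the script does under the hood, here is a minimal Python sketch of the LoRA setup: load the base model and attach a small LoRA adapter to the UNet's attention projections with the PEFT library, so that only the adapter weights get trained. The model name, rank and target modules are assumptions based on the public diffusers example, not an exact copy of the script.

        # Minimal sketch (not the full training script): attach a LoRA adapter to the
        # UNet with PEFT. Model name, rank and target modules are illustrative.
        import torch
        from diffusers import StableDiffusionPipeline
        from peft import LoraConfig

        pipe = StableDiffusionPipeline.from_pretrained(
            "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float32
        )
        unet = pipe.unet
        unet.requires_grad_(False)  # freeze the base UNet

        lora_config = LoraConfig(
            r=4,                    # low-rank dimension
            lora_alpha=4,
            init_lora_weights="gaussian",
            target_modules=["to_k", "to_q", "to_v", "to_out.0"],  # attention projections
        )
        unet.add_adapter(lora_config)  # only the adapter weights are trainable now

        trainable = sum(p.numel() for p in unet.parameters() if p.requires_grad)
        total = sum(p.numel() for p in unet.parameters())
        print(f"Trainable parameters: {trainable:,} / {total:,}")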

Comments • 26

  • @hendrikvonfalkenhayn8966
    @hendrikvonfalkenhayn8966 8 months ago +2

    Cool, thanks!

  • @Willtry-l6g
    @Willtry-l6g 1 month ago

    Thank you for your videos 😊
    If we have 5 pictures for each of two different subjects (say person a1 and person a2), can a single Stable Diffusion model generate these three situations?
    1. a1 in different backgrounds
    2. a2 in different backgrounds
    3. a1 and a2 in the same image with different backgrounds
    Do I need to create 3 models for that task? How would I approach it?

  • @annabarnov6412
    @annabarnov6412 7 months ago +1

    Are there any Stable Diffusion models in SageMaker JumpStart that support LoRA fine-tuning?

    • @juliensimonfr
      @juliensimonfr 6 months ago +1

      Not that I know of. I think JumpStart fine-tuning is vanilla.

  • @binodsharma1337
    @binodsharma1337 5 months ago +1

    How do I do this in SageMaker? What changes are required?

    • @juliensimonfr
      @juliensimonfr 5 months ago

      huggingface.co/docs/sagemaker/train
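
      For illustration, a minimal sketch of what a SageMaker training job for this script could look like with the Hugging Face estimator; the entry point, container versions and hyperparameters are assumptions, so check the docs above for the exact setup.

          # Hypothetical sketch: run the diffusers LoRA script as a SageMaker training job.
          # Entry point, container versions and hyperparameters are assumptions.
          import sagemaker
          from sagemaker.huggingface import HuggingFace

          role = sagemaker.get_execution_role()

          estimator = HuggingFace(
              entry_point="train_text_to_image_lora.py",  # diffusers example script
              source_dir="./scripts",                     # local folder with the script and its requirements
              instance_type="ml.g4dn.xlarge",
              instance_count=1,
              role=role,
              transformers_version="4.28",
              pytorch_version="2.0",
              py_version="py310",
              hyperparameters={
                  "pretrained_model_name_or_path": "runwayml/stable-diffusion-v1-5",
                  "dataset_name": "lambdalabs/pokemon-blip-captions",
                  "resolution": 512,
                  "train_batch_size": 1,
                  "max_train_steps": 15000,
                  "output_dir": "/opt/ml/model",
              },
          )
          estimator.fit()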

  • @user-hj7ji9mf4b
    @user-hj7ji9mf4b 4 months ago

    Nice demonstration, I appreciate your work, thanks for sharing with us :). I was wondering whether you could provide some answers: can we fine-tune with LoRA on more than 4k images? And what should the dataset format look like? A folder containing the images and a CSV file with two columns (imageNameColumn + descriptionColumn), or a specific format?

    • @juliensimonfr
      @juliensimonfr 4 months ago +2

      Hi, you can use as many images as you want, but I'd recommend starting with a small number, evaluating results, and adding more until you get to the right quality level. Regarding building a dataset, this should help: github.com/huggingface/diffusers/issues/5774
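
      For illustration, here is a minimal sketch of the "imagefolder" layout that the datasets library understands: a folder of images plus a metadata.jsonl file with one caption per image. File names and captions are placeholders.

          # Hypothetical dataset layout (datasets "imagefolder" format):
          #
          #   my_dataset/
          #     metadata.jsonl
          #     0001.png
          #     0002.png
          #     ...
          #
          # Each line of metadata.jsonl pairs an image with its caption, e.g.:
          #   {"file_name": "0001.png", "text": "a watercolor painting of a dragon"}
          #   {"file_name": "0002.png", "text": "a pixel-art spaceship on a black background"}
          from datasets import load_dataset

          dataset = load_dataset("imagefolder", data_dir="my_dataset", split="train")
          print(dataset[0]["image"], dataset[0]["text"])

      The training script should then accept the folder via its --train_data_dir argument, with the caption column named "text".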

    • @user-hj7ji9mf4b
      @user-hj7ji9mf4b 4 months ago

      @@juliensimonfr Thanks so much, that was helpful, I'll take that into consideration! Can you answer one last question if you don't mind? What's the smallest GPU VM you'd recommend for fine-tuning Stable Diffusion with LoRA on 4k images? Would 4x T4 (64 GB of GPU VRAM) be enough?

    • @juliensimonfr
      @juliensimonfr 4 months ago

      It should be enough, I think. You can also use QLoRA if you need to shrink the model even more.

  • @robp8468
    @robp8468 7 months ago +1

    Which VM image are you using? I'm getting the error 'Attempting to unscale FP16 gradients', and I guess this may be linked to driver or package versions.

    • @juliensimonfr
      @juliensimonfr 7 months ago +1

      Deep Learning AMI from AWS

    • @user-dk9td7kl8c
      @user-dk9td7kl8c 4 months ago

      @robp8468 Seems there was an update; just add --mixed_precision="fp16" when launching train_text_to_image_lora.py and it will run.

  • @niu1909
    @niu1909 7 months ago +1

    You've lost me at the text part (around 7:55). How do I get the text for a whole batch of art images?

    • @niu1909
      @niu1909 7 months ago +1

      Do all images have to be square? Can they be different?

    • @juliensimonfr
      @juliensimonfr 6 months ago +2

      Not sure what you mean. You have to either write the descriptions yourself, or use an image-to-text model to generate captions.
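
      For illustration, a minimal sketch of auto-captioning a folder of images with an off-the-shelf image-to-text model (BLIP here); the model choice, paths and caption column are assumptions.

          # Hypothetical captioning loop: generate a caption for each image with BLIP
          # and collect them into a metadata.jsonl file for training.
          import json
          from pathlib import Path
          from PIL import Image
          from transformers import BlipProcessor, BlipForConditionalGeneration

          model_id = "Salesforce/blip-image-captioning-base"
          processor = BlipProcessor.from_pretrained(model_id)
          model = BlipForConditionalGeneration.from_pretrained(model_id)

          with open("my_dataset/metadata.jsonl", "w") as f:
              for path in sorted(Path("my_dataset").glob("*.png")):
                  image = Image.open(path).convert("RGB")
                  inputs = processor(images=image, return_tensors="pt")
                  out = model.generate(**inputs, max_new_tokens=30)
                  caption = processor.decode(out[0], skip_special_tokens=True)
                  f.write(json.dumps({"file_name": path.name, "text": caption}) + "\n")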

    • @juliensimonfr
      @juliensimonfr 6 months ago +1

      No, but training images should be as similar to generated images as possible. We want to generate 512x512, so that's what we use in the training set.
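
      If you do want to prepare the files yourself, here is a tiny sketch (paths are placeholders; the training script can also resize on the fly via its --resolution argument).

          # Hypothetical preprocessing: center-crop to a square and resize to 512x512
          # so training images match the target generation size.
          from pathlib import Path
          from PIL import Image, ImageOps

          src, dst = Path("raw_images"), Path("train_images")
          dst.mkdir(exist_ok=True)

          for path in src.glob("*.png"):
              img = Image.open(path).convert("RGB")
              img = ImageOps.fit(img, (512, 512), Image.LANCZOS)  # crop + resize
              img.save(dst / path.name)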

  • @guyguy12385
    @guyguy12385 2 months ago

    Do you need a description to fine-tune it? Is it possible to just have an image dataset without descriptions, and have it learn to produce those images at random?

    • @juliensimonfr
      @juliensimonfr 2 months ago

      An SD model needs an input to start generating :) which is why labels are required.

    • @guyguy12385
      @guyguy12385 2 days ago

      @@juliensimonfr I mean, couldn't the input just be noise, like a GAN? Why not?

  • @maxhenry8886
    @maxhenry8886 2 months ago

    Is it possible to fine-tune a model such that I feed it images of OpenMaps locations and then it randomly generates its own locations? In this case, would I even need to label the data? Is it possible to train the model on unlabelled data? I just want it to take a set of X map locations, and then start inventing its own. Is this possible? Thanks!

    • @juliensimonfr
      @juliensimonfr 2 months ago +1

      Probably, and yes, you would need to label the images. Another idea would be to use an off-the-shelf image-to-image model, e.g. huggingface.co/spaces/tonyassi/image-to-image-SDXL. Maybe it can generate variants of existing images.
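
      For illustration, a rough sketch of trying an image-to-image pipeline from diffusers for this kind of experiment; the model, prompt and strength value are assumptions.

          # Hypothetical image-to-image sketch: start from an existing map image and let
          # the model produce a prompt-guided variant.
          import torch
          from PIL import Image
          from diffusers import StableDiffusionImg2ImgPipeline

          pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
              "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
          ).to("cuda")

          init_image = Image.open("map_tile.png").convert("RGB").resize((512, 512))
          result = pipe(
              prompt="a fictional city map in the same style",
              image=init_image,
              strength=0.6,       # how far to stray from the input image
              guidance_scale=7.5,
          ).images[0]
          result.save("generated_map.png")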

    • @maxhenry8886
      @maxhenry8886 2 months ago

      @@juliensimonfr Thanks! That might work. My plan was to show it half the image (the data) and get it to generate something close to the other half (the target).

    • @juliensimonfr
      @juliensimonfr 2 months ago

      I see. Another option would be variations on an input image: huggingface.co/spaces/lambdalabs/stable-diffusion-image-variations

    • @maxhenry8886
      @maxhenry8886 2 months ago

      @@juliensimonfr Thanks! I tried that one and it didn't work. It might be harder than it sounds! Maybe I need to find a way to fine-tune a model first and then get it to create its own variations somehow? But most of the algorithms work by using the label as the target, right? Is there a model I can fine-tune that generates purely from one half of the image as input, with the other half as the target?