Lesson 7: Practical Deep Learning for Coders 2022

  • Published: 1 Aug 2024
  • 00:00 - Tweaking first and last layers
    02:47 - What are the benefits of using larger models
    05:58 - Understanding GPU memory usage
    08:04 - What is GradientAccumulation?
    20:52 - How to run all the models with specifications
    22:55 - Ensembling
    37:51 - Multi-target models
    41:24 - What does `F.cross_entropy` do
    45:43 - When do you use softmax and when not to?
    46:15 - Cross-entropy loss
    49:53 - How to calculate binary cross-entropy
    52:19 - Two versions of cross-entropy in PyTorch
    54:24 - How to create a learner for predicting two targets
    1:02:00 - Collaborative filtering deep dive
    1:08:55 - What are latent factors?
    1:11:28 - Dot product model
    1:18:37 - What is embedding
    1:22:18 - How do you choose the number of latent factors
    1:27:13 - How to build a collaborative filtering model from scratch
    1:29:57 - How to understand the `forward` function
    1:32:47 - Adding a bias term
    1:34:29 - Model interpretation
    1:39:06 - What is weight decay and how does it help
    1:43:47 - What is regularization
    Transcript thanks to nikem, fmussari, wyquek, bencoman, and gagan from forums.fast.ai
    Timestamps based on notes by Daniel from forums.fast.ai

Comments • 14

  • @yoverale
    @yoverale 4 months ago +2

    This course is truly priceless, much deeper and more didactic than a lot of paid courses out there 🤩 thanks Jeremy

  • @sunderrajan6172
    @sunderrajan6172 2 years ago +22

    You are amazing as always! We are all so gifted and blessed to have you teaching these classes. I am truly amazed by your level of commitment to society.

  • @tumadrep00
    @tumadrep00 1 year ago +6

    Jeremy my man, you are truly one hell of a human being. I wish you the best

  • @maraoz
    @maraoz 1 year ago +9

    I love how Jeremy explains techniques like gradient accumulation. He makes them seem so obvious and powerful that it's hard to forget them. Never again will I think big models are out of scope for my experiments! :D

  • @merelogics
    @merelogics 1 year ago +8

    "At this point if you've heard about embeddings before you might be thinking: that can't be it. And yeah, it's just as complex as the rectified linear unit which turned out to be: replace negatives with zeros. Embedding actually means: “look something up in an array”. So there's a lot of things that we use, as deep learning practitioners, to try to make you as intimidated as possible so that you don't wander into our territory and start winning our Kaggle competitions." 🤣

  • @JohnSmith-he5xg
    @JohnSmith-he5xg 1 year ago +1

    Tremendous content!

  • @pranavdeshpande4942
    @pranavdeshpande4942 1 year ago +3

    I loved the collaborative filtering stuff and your explanation of embeddings!

  • @mukhtarbimurat5106
    @mukhtarbimurat5106 1 year ago

    Great, thanks!

  • @vinodjoshi9127
    @vinodjoshi9127 1 year ago

    Jeremy - In the deep learning implementation of collaborative filtering, the input is the concatenated embeddings of users and items. However, my understanding is that the model is not learning the embedding matrix here; instead it's learning the weights (176 * 100) in the first layer and (100 * 1) in the second layer. Am I missing something? Appreciate your input.
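    On the question of whether the embeddings are learned: in a PyTorch model like this, the embedding tables are ordinary `nn.Embedding` layers, so their weights are parameters and backprop updates them along with the two linear layers. A rough sketch of the idea (the class name `CollabNN` and the sizes 74, 102, and 100 are illustrative guesses, not the lesson's exact code):

    ```python
    import torch
    import torch.nn as nn

    class CollabNN(nn.Module):
        """Sketch of a neural-net collaborative filtering model; sizes are illustrative."""
        def __init__(self, n_users, n_items, user_dim=74, item_dim=102, n_hidden=100):
            super().__init__()
            self.user_emb = nn.Embedding(n_users, user_dim)   # learnable lookup table
            self.item_emb = nn.Embedding(n_items, item_dim)   # learnable lookup table
            self.layers = nn.Sequential(
                nn.Linear(user_dim + item_dim, n_hidden),     # e.g. 176 -> 100
                nn.ReLU(),
                nn.Linear(n_hidden, 1),                       # e.g. 100 -> 1
            )

        def forward(self, user_ids, item_ids):
            x = torch.cat([self.user_emb(user_ids), self.item_emb(item_ids)], dim=1)
            return self.layers(x)

    model = CollabNN(n_users=944, n_items=1665)
    # The embedding weights are ordinary parameters, so backprop trains them
    # together with the two linear layers:
    for name, p in model.named_parameters():
        print(name, tuple(p.shape))
    ```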

  • @toromanow
    @toromanow 10 months ago +1

    Hello, where can I find the notebook for this? I found Road to the Top Part 1 and Part 2 but can't find Part 3 anywhere.

  • @tljstewart
    @tljstewart 10 months ago

    Accumulated gradients are a nice trick; however, for sufficiently large datasets and run times, your memory bandwidth latency will increase by the same multiple you accumulate.

  • @matthewrice7590
    @matthewrice7590 1 year ago

    I understand the advantage of gradient accumulation in terms of being able to run your training on smaller GPUs by "imitating" a larger batch size when calculating the gradients, but wouldn't a major drawback of gradient accumulation be an increase in training time and ultimately in energy use? I.e., isn't your training going to run half as fast when accum is set to 2? And the more you increase the accum number, the slower the training gets, because your actual batch sizes are getting smaller and smaller?

    • @ChristopheMeyerPro
      @ChristopheMeyerPro 1 year ago

      No, the total amount of work to be done is basically the same. There might be some extra overhead from more frequent data transfers and fewer opportunities to optimize parallelism, but it's not like you're multiplying the work by the accum amount. You just do the same total work in smaller batches at a time.
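      To make that concrete, here is a bare-bones sketch of a gradient-accumulation training loop (the toy model, data, and the `accum` variable are made up for illustration, not fastai's actual implementation). Every mini-batch still gets exactly one forward and one backward pass; only the optimizer step happens less often, so the total compute per epoch is roughly unchanged.

      ```python
      import torch
      import torch.nn as nn

      # Toy setup: a tiny model and some random data, just to show the loop structure.
      model = nn.Linear(10, 1)
      opt = torch.optim.SGD(model.parameters(), lr=0.1)
      loss_fn = nn.MSELoss()
      data = [(torch.randn(16, 10), torch.randn(16, 1)) for _ in range(8)]

      accum = 2  # accumulate gradients over this many mini-batches

      opt.zero_grad()
      for i, (xb, yb) in enumerate(data):
          loss = loss_fn(model(xb), yb) / accum  # scale so the summed gradient matches
          loss.backward()                        # one batch of size 16 * accum
          if (i + 1) % accum == 0:               # each batch still does one forward/backward;
              opt.step()                         # only the optimizer step is less frequent
              opt.zero_grad()
      ```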