Training learned optimizers: VeLO paper EXPLAINED

  • Published: 3 Oct 2024

Comments • 24

  • @MachineLearningStreetTalk
    @MachineLearningStreetTalk 1 year ago +4

    Happy New Year, Ms. Coffee Bean!!! ☕

  • @zilliard1352
    @zilliard1352 1 year ago +6

    I signed up with your sponsor. They should appreciate you more.

  • @deadbeat_genius_daydreamer
    @deadbeat_genius_daydreamer 1 year ago +7

    Great to see you again

  • @DerPylz
    @DerPylz 1 year ago +8

    Happy New Year, Ms. Coffee Bean! :D

  • @sriramgangadhar2408
    @sriramgangadhar2408 1 year ago +5

    Happy New Year, Ms. Coffee Bean ✨

  • @DerPylz
    @DerPylz 1 year ago +4

    I often feel like I need a cup of coffee to get productive work done.

  • @KeinNiemand
    @KeinNiemand 1 year ago +5

    But can you train a better version of VeLO using VeLO? Have they tried that yet? What happens when you train VeLO again, on the same training data, using VeLO itself?

  • @abhishekshakya6072
    @abhishekshakya6072 1 year ago +3

    Thanks for the video – it really helped me understand this with clarity in a short amount of time.
    Question:
    Do you think it will perform well on large genomic datasets or Large Language Models (LLMs)?

    • @AICoffeeBreak
      @AICoffeeBreak  1 year ago +3

      It's hard to say without someone having tested it. The paper is quite clear that VeLO performs well on the kinds of architectures (transformers: check ✅), objectives (language modelling ✅), and tasks (genomics ❌) it has seen in training.

  • @danji9485
    @danji9485 1 year ago +3

    Since the VeLO neural net must first be pre-trained, shouldn't that be counted in the total training time when comparing benchmarks?

    • @AICoffeeBreak
      @AICoffeeBreak  1 year ago +9

      Yes and no.
      No, because it has to be trained just once.
      Yes, but then hyperparameter tuning time should also go into the time measurements for standard optimizers.

  • @joshuadunford3171
    @joshuadunford3171 1 year ago

    I have a question about AI. Let's say I put "Asian police officer with pigtails" in the prompt. I have been told the image should closely resemble the police uniform, but not any real police officer, unless that officer's image comes up by random chance. Is this true? Will the image I'm given be of a police officer who doesn't really exist?

  • @tamimyousefi
    @tamimyousefi 1 year ago +2

    This is one reasonable step towards not having to train NNs from scratch every time.

  • @josephsantarcangelo9310
    @josephsantarcangelo9310 1 year ago +2

    I have been playing with hypergradients
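    For readers unfamiliar with the term: hypergradients are gradients of the loss with respect to hyperparameters such as the learning rate. A minimal sketch of hypergradient descent on a toy quadratic, written in JAX since that is what the VeLO code uses (all constants and the objective here are illustrative, not from the paper):

```python
# Hypergradient descent (illustrative): treat the learning rate itself as a
# parameter and update it with the gradient of the loss w.r.t. the learning
# rate, which for plain SGD reduces to the dot product of successive gradients.
import jax
import jax.numpy as jnp

def loss(w):
    # Toy quadratic objective with its minimum at w = 3.
    return jnp.sum((w - 3.0) ** 2)

grad_fn = jax.grad(loss)

w = jnp.array([0.0])
lr = 0.01                      # initial learning rate, now itself adapted
beta = 0.001                   # hypergradient step size (illustrative value)
prev_grad = jnp.zeros_like(w)

for _ in range(200):
    g = grad_fn(w)
    # dLoss/d(lr) ≈ -g · g_prev: grow lr while successive gradients agree,
    # shrink it when they start pointing in opposite directions.
    lr = lr + beta * jnp.dot(g, prev_grad)
    w = w - lr * g
    prev_grad = g
```

    On this toy problem the learning rate adapts upward from its small initial value and w converges toward the minimum at 3.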

  • @adi331
    @adi331 1 year ago +2

    Puhhh, seems way too complex to be useful, imo.

    • @adi331
      @adi331 1 year ago +2

      Also, the performance doesn't seem that much better, which makes it hard to justify all this complexity.

    • @AICoffeeBreak
      @AICoffeeBreak  1 year ago +5

      How do you mean complex? Complex in usage? Or conceptually? Because the conceptual complexity is taken off you by the authors' work. You just need to use their neural net to do the weight updates instead of an optimizer. They have a JAX implementation, so yeah, not easy to use in Keras. Yet. 😅
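To make the "just use their neural net for the weight updates" idea concrete, here is a toy JAX sketch of the interface a learned optimizer exposes. The tiny MLP and its hand-set weights below are illustrative stand-ins for VeLO's meta-trained network; this is not the real VeLO architecture or the `learned_optimization` API:

```python
# Illustrative "learned optimizer" interface: a tiny MLP maps per-parameter
# features (gradient, momentum) to an update step. The MLP weights below are
# hand-set so the net mimics plain SGD; in VeLO they would come from
# meta-training. NOT the real VeLO architecture or API.
import jax
import jax.numpy as jnp

def optimizer_net(features, opt_params):
    # 2-layer MLP applied element-wise to per-parameter feature vectors.
    h = jnp.tanh(features @ opt_params["w1"])
    return (h @ opt_params["w2"]).squeeze(-1)

def learned_update(params, grads, momentum, opt_params, decay=0.9):
    momentum = jax.tree_util.tree_map(lambda m, g: decay * m + g,
                                      momentum, grads)
    def update_leaf(p, g, m):
        feats = jnp.stack([g.ravel(), m.ravel()], axis=-1)   # shape (n, 2)
        step = optimizer_net(feats, opt_params).reshape(p.shape)
        return p - step
    new_params = jax.tree_util.tree_map(update_leaf, params, grads, momentum)
    return new_params, momentum

# Hand-set stand-in for meta-trained weights: the net computes
# tanh(0.1 * gradient), i.e. roughly SGD with learning rate 0.1,
# ignoring the momentum feature entirely.
opt_params = {
    "w1": jnp.array([[0.1, 0.0, 0.0, 0.0],
                     [0.0, 0.0, 0.0, 0.0]]),
    "w2": jnp.array([[1.0], [0.0], [0.0], [0.0]]),
}

# Toy training loop on a quadratic objective: note that the loop itself
# chooses no learning rate and no update rule — the network does.
loss = lambda p: jnp.sum((p["w"] - 1.0) ** 2)
params = {"w": jnp.zeros(3)}
momentum = jax.tree_util.tree_map(jnp.zeros_like, params)
for _ in range(100):
    grads = jax.grad(loss)(params)
    params, momentum = learned_update(params, grads, momentum, opt_params)
```

The actual VeLO update network is far richer than this; the sketch only shows the call pattern that replaces a hand-designed optimizer.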

    • @adi331
      @adi331 1 year ago +6

      @@AICoffeeBreak Hmmm. Good question. Since the training process of VeLO is so compute-intensive, there is no way to fine-tune it for new architectures. This wouldn't be an issue if it actually generalized perfectly (which I doubt).
      Then if you train something new and it doesn't work, you have an additional source of error. You might think to yourself: oh, maybe it's because VeLO isn't working with my architecture.
      Maybe it has a certain affinity for certain activation functions, etc.
      Also, for big networks the forward pass requires a lot of GPU memory, which means less memory I can spend on my actual dataset.
      While I agree that in theory it's simple, just one forward pass, I feel like in practice it will be too much of an additional headache.
      It's more my opinion than hard facts.
      All of the above, combined with the marginal improvement, means I don't see myself using this anytime soon.
      Thanks for the video, the explanation was very good :)