Lecture 3: Interaction in Imitation Learning

  • Published: 11 Dec 2024

Comments • 4

  • @seanl2061 · 11 months ago

    Hi Sanjiban, thank you greatly for the lecture! I have a question at 15:28. Regarding the first inequality: as long as all possible policies don't incur the same loss value, the equality wouldn't hold, correct? Also, isn't the last inequality simply showing that, for any policy, the regret is lower-bounded by 0? How can one conclude that at least one policy must be pretty good, as written in the lecture notes? Thanks.
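
    A hedged sketch of the step this comment asks about, in notation assumed here (it may not match the slide at 15:28). Write \ell_i(\pi) = \mathbb{E}_{s \sim d_{\pi_i}}[\ell(s, \pi)] for the loss of policy \pi under round i's state distribution; the standard DAgger-style chain is

        \min_{1 \le i \le N} \ell_i(\pi_i)
          \;\le\; \frac{1}{N} \sum_{i=1}^{N} \ell_i(\pi_i)
          \;=\; \min_{\pi \in \Pi} \frac{1}{N} \sum_{i=1}^{N} \ell_i(\pi)
          \;+\; \underbrace{\frac{1}{N} \Big( \sum_{i=1}^{N} \ell_i(\pi_i) - \min_{\pi \in \Pi} \sum_{i=1}^{N} \ell_i(\pi) \Big)}_{\text{average regret}}

    The conclusion that at least one policy must be pretty good comes from the min-versus-average step on the left, not from regret being lower-bounded by 0: for a no-regret learner the underbraced term vanishes as N grows, so the average loss of the sequence approaches the best achievable in-class loss, and the best single \pi_i can only do better than that average. The first inequality is strict unless every \ell_i(\pi_i) in the sequence happens to be equal.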

  • @quonxinquonyi8570 · 2 years ago

    Such a brilliant lecture

  • @Messiah-000 · 1 year ago

    Would a neural-network-based policy that replaces the dataset at each batch of training, rather than aggregating it, trained with standard gradient descent, still be considered a no-regret learner?

    • @sanjibanc · 1 year ago · +2

      Great question! So online gradient descent over a convex loss function is no-regret. Neural networks are, unfortunately, not convex, so the theory doesn't hold for them. But the theory does hold for kernels (like an RKHS), and there is work showing that deep networks are approximately equivalent to kernel machines (such as arxiv.org/pdf/2012.00152.pdf).
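
A minimal sketch of the no-regret property the reply refers to: online gradient descent on a convex per-round loss, with a step size on the order of 1/sqrt(t), drives the average regret (average loss suffered minus the average loss of the best fixed predictor in hindsight) toward zero. The linear model, squared loss, and synthetic data below are illustrative assumptions, not taken from the lecture.

import numpy as np

rng = np.random.default_rng(0)
d = 5                                      # feature dimension (illustrative)
w = np.zeros(d)                            # the online learner's current weights
per_round_data, suffered = [], []

def round_loss(w, X, y):
    # Average squared loss on one round's data; convex in w.
    return 0.5 * np.mean((X @ w - y) ** 2)

for t in range(1, 201):
    # Synthetic stand-in for the expert-labeled states collected on round t.
    X = rng.normal(size=(32, d))
    y = X @ np.ones(d) + 0.1 * rng.normal(size=32)
    per_round_data.append((X, y))

    suffered.append(round_loss(w, X, y))   # loss incurred before updating
    grad = X.T @ (X @ w - y) / len(y)      # gradient of the convex round loss
    w = w - grad / np.sqrt(t)              # OGD step with eta_t ~ 1/sqrt(t)

# Best fixed comparator in hindsight: least squares over all rounds' data.
X_all = np.vstack([X for X, _ in per_round_data])
y_all = np.concatenate([y for _, y in per_round_data])
w_star = np.linalg.lstsq(X_all, y_all, rcond=None)[0]
best_fixed = np.mean([round_loss(w_star, X, y) for X, y in per_round_data])

print(f"average regret after 200 rounds: {np.mean(suffered) - best_fixed:.4f}")

With a non-convex model such as a neural network, the same guarantee no longer follows from this argument, which is the distinction the reply draws.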