Hi Sanjiban, thank you for the great lecture! I have a question about 15:28. Regarding the first inequality: as long as not all policies incur the same loss, the equality wouldn't hold, correct? Also, doesn't the last inequality simply show that the regret of any policy is lower-bounded by 0? How can one conclude from that that at least one policy must be pretty good, as stated in the lecture notes? Thanks.
If a neural-network policy were trained with standard gradient descent, but each batch of training replaced the dataset rather than aggregating it, would it still be considered a no-regret learner?
Great question! So online gradient descent over a convex loss function is no-regret. Neural-network losses are, unfortunately, non-convex, so the theory doesn't hold for them. But the theory does hold for kernel methods (like an RKHS), and there is work showing that deep networks are approximately equivalent to kernel machines (e.g. arxiv.org/pdf/2012.00152.pdf)
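To make the convexity point concrete, here is a minimal sketch (not from the lecture) of online gradient descent on a convex per-round loss, which is the setting where the no-regret guarantee applies. The squared loss, the 1/sqrt(t) step size, and the synthetic data stream are all illustrative assumptions on my part:

import numpy as np

# Minimal sketch of online gradient descent on a convex loss.
# Each round t, the learner predicts with weights w, then observes the
# convex loss 0.5 * (x_t @ w - y_t)^2 and takes a gradient step.
# With step size ~ 1/sqrt(t), the average regret against the best fixed
# weight vector in hindsight shrinks toward 0 as T grows -- this is the
# guarantee that breaks for non-convex neural-network losses.

rng = np.random.default_rng(0)
d, T = 5, 2000
w_star = rng.normal(size=d)   # hypothetical "expert" weights generating labels
w = np.zeros(d)               # learner's weights
cum_learner_loss = 0.0
data = []                     # keep the stream to compute the hindsight comparator

for t in range(1, T + 1):
    x = rng.normal(size=d)
    y = x @ w_star + 0.1 * rng.normal()
    data.append((x, y))

    pred = x @ w
    cum_learner_loss += 0.5 * (pred - y) ** 2

    grad = (pred - y) * x         # gradient of the squared loss at w
    eta = 1.0 / np.sqrt(t)        # standard no-regret step-size schedule
    w = w - eta * grad

# Best fixed comparator in hindsight: least squares over the whole stream.
X = np.array([x for x, _ in data])
Y = np.array([y for _, y in data])
w_best, *_ = np.linalg.lstsq(X, Y, rcond=None)
cum_best_loss = 0.5 * np.sum((X @ w_best - Y) ** 2)

avg_regret = (cum_learner_loss - cum_best_loss) / T
print(f"average regret after {T} rounds: {avg_regret:.4f}")  # goes to 0 as T grows

Running this for larger T shows the average regret decaying, which is exactly what the convex theory promises and what has no comparable guarantee once the loss is a non-convex neural-network objective.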
Such a brilliant lecture