The Contextual Bandits Problem: A New, Fast, and Simple Algorithm

  • Published: Sep 9, 2024
  • We study the general problem of how to learn through experience to make intelligent decisions. In this setting, called the contextual bandits problem, the learner must repeatedly decide which action to take in response to an observed context, and is then permitted to observe the received reward, but only for the chosen action. The goal is to learn through experience to behave nearly as well as the best policy (or decision rule) in some possibly very large and rich space of candidate policies. Previous approaches to this problem were all highly inefficient and often extremely complicated. In this work, we present a new, fast, and simple algorithm that learns to behave as well as the best policy at a rate that is (almost) statistically optimal. Our approach assumes access to a kind of oracle for classification learning problems which can be used to select policies; in practice, most off-the-shelf classification algorithms could be used for this purpose. Our algorithm makes very modest use of the oracle, which it calls far less than once per round, on average, a huge improvement over previous methods. These properties suggest this may be the most practical contextual bandits algorithm among all existing approaches that are provably effective for general policy classes. This is joint work with Alekh Agarwal, Daniel Hsu, Satyen Kale, John Langford and Lihong Li.
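
The interaction protocol described above (observe a context, choose an action, see the reward only for that action) can be illustrated with a small simulation. This is a minimal sketch and not the algorithm from the talk: it uses plain epsilon-greedy exploration with inverse propensity scoring (IPS), and a brute-force maximum over four hypothetical policies stands in for the classification oracle. The function name `run_epsilon_greedy`, the two-context environment, and its reward probabilities are all assumptions made up for illustration.

```python
import random


def run_epsilon_greedy(num_rounds=2000, epsilon=0.1, seed=0):
    """Simulate a toy contextual bandit; return (leading policy index, average reward)."""
    rng = random.Random(seed)
    num_actions = 2

    # Hypothetical policy class: each policy maps a context in {0, 1} to an action.
    policies = [
        lambda x: 0,      # always play action 0
        lambda x: 1,      # always play action 1
        lambda x: x,      # play the action matching the context
        lambda x: 1 - x,  # play the action opposite to the context
    ]

    # Running sums of IPS reward estimates, one per policy.
    ips_totals = [0.0] * len(policies)
    total_reward = 0.0

    for _ in range(num_rounds):
        context = rng.randint(0, 1)

        # Brute-force stand-in for the oracle: pick the policy that looks best so far.
        leader = max(range(len(policies)), key=lambda i: ips_totals[i])
        greedy_action = policies[leader](context)

        # Epsilon-greedy action distribution: every action keeps probability >= epsilon / K.
        probs = [epsilon / num_actions] * num_actions
        probs[greedy_action] += 1.0 - epsilon
        action = rng.choices(range(num_actions), weights=probs)[0]

        # Assumed environment: reward 1 with prob. 0.9 if the action matches the
        # context, else with prob. 0.1. Only the chosen action's reward is revealed.
        p_reward = 0.9 if action == context else 0.1
        reward = 1.0 if rng.random() < p_reward else 0.0
        total_reward += reward

        # IPS update: credit every policy that would have chosen the same action,
        # reweighting by the probability with which that action was played.
        for i, policy in enumerate(policies):
            if policy(context) == action:
                ips_totals[i] += reward / probs[action]

    leader = max(range(len(policies)), key=lambda i: ips_totals[i])
    return leader, total_reward / num_rounds
```

Calling `run_epsilon_greedy()` returns the index of the policy with the highest estimated reward together with the learner's average reward. Because the played action's probability is known, the IPS estimate of each policy's reward is unbiased even though rewards for unchosen actions are never observed.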

Comments • 11

  • @halflearned2190 6 years ago +25

    Please focus only on the slides. The general viewership of this kind of talk is not interested in the presenter's body language.

  • @rusergeev 1 year ago

    Super nice to learn about the contextual bandits!

  • @Amapramaadhy 7 years ago

    Very enlightening. This is bleeding-edge research in the field

  • @pankajsinghrawat1056 1 year ago

    When there are multiple ads, how do we choose 'k' ads as actions, since actions are just classes, right?

  • @fuhualin 6 years ago +1

    hope to see the slides

  • @omarrayyann 1 month ago

    Cool

  • @geoffreyanderson4719 6 years ago +1

    tldr; is there a 15 minute version of this?

  • @Redactification 6 years ago

    So the innovation here is that the speaker has created a faster, simpler algorithm to solve a problem based on the number of times that the algorithm references information that, by definition, one will never have access to? In the heterogeneous treatment effects framework, this in effect assumes that one has access to all potential outcomes for each observational unit, i.e., you give a cancer patient one of 20 treatments, observe the outcome from that treatment, then somehow know the outcome from every other treatment given to THAT patient? So what practical use is it?

  • @castonnyabadza6521 3 years ago +1

    The camera person sucks. Next time, please spend more time on the presentation slides while the presenter is talking us through them.

  • @lenkapenka6976 3 years ago +1

    FFS show us the slides!!!!!!!!!!!!!!!!