Eugene Yan on Using LLMs as Judges: Insights, Challenges, and Best Practices

  • Published: Nov 15, 2024

Comments • 2

  • @MatijaGrcic • 2 months ago

    Great discussion, thanks.

  • @calmcode-io • 2 months ago

    Interesting. From my experience with annotators, I found it was less about "firing people" who performed badly and more about "rewriting the guidelines". Sometimes an annotator takes the guidelines literally (actually not a bad thing) and as a result generates annotations that the guideline designer did not have in mind. This is also partially why it makes a tonne of sense for the folks who write the guidelines to also annotate on the task themselves.
    It can also help to have an annotation interface where folks are able to flag a task/example as confusing, so that it's easy to reflect on later.
    I have not tried it with LLMs, but my gut says that allowing the LLM to flag an example/task combo as confusing can also really help in designing a few solid prompts.
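
    A minimal sketch of that last idea, assuming the OpenAI Python client; the prompt wording, the "confusing" field, and the model name are illustrative choices, not something from the talk or the comment. The judge returns a verdict plus a flag it can raise when the guidelines don't cover the case, mirroring the "flag as confusing" button suggested for human annotators.

    ```python
    import json
    from openai import OpenAI

    client = OpenAI()

    JUDGE_PROMPT = """You are grading a model response against the guidelines below.
    Guidelines:
    {guidelines}

    Task: {task}
    Response: {response}

    Return JSON with keys:
    - "verdict": "pass" or "fail"
    - "confusing": true if the guidelines are ambiguous or do not cover this case
    - "note": one sentence explaining the verdict or the confusion
    """

    def judge(guidelines: str, task: str, response: str) -> dict:
        # Ask the judge model for a structured verdict; model name is an assumption.
        completion = client.chat.completions.create(
            model="gpt-4o-mini",
            response_format={"type": "json_object"},
            messages=[{
                "role": "user",
                "content": JUDGE_PROMPT.format(
                    guidelines=guidelines, task=task, response=response
                ),
            }],
        )
        return json.loads(completion.choices[0].message.content)

    # Examples where the judge sets "confusing": true point at gaps in the
    # guidelines, and are good candidates to review when revising the prompt.
    ```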