Corrections + Few Shot Examples (Part 2) | LangSmith Evaluations

  • Published: 1 Jul 2024
  • Evaluation is the process of continuously improving your LLM application. This requires a way to judge your application’s outputs, which are often natural language. Using an LLM to grade natural language outputs (e.g., for correctness relative to a reference answer, tone, or conciseness) is a popular approach, but it requires prompt engineering and careful auditing of the LLM judge.
    Our new release of LangSmith addresses this growing problem by allowing a user to (1) correct LLM-as-a-Judge outputs and then (2) pass those corrections back to the judge as few-shot examples for future iterations. This creates LLM-as-a-Judge evaluators grounded in human feedback that better encode your preferences, without the need for challenging prompt engineering (see the sketch after this list).
    Here we show how to apply Corrections + Few Shot to online evaluators that are pinned to a dataset.
  • Science
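To make the workflow concrete, below is a minimal sketch of the few-shot idea: an LLM-as-a-Judge correctness grader whose prompt is seeded with examples built from human corrections. The correction format, prompt wording, model choice, and helper names are illustrative assumptions, not LangSmith's internal implementation; it assumes the OpenAI Python SDK is installed and an API key is configured.

```python
# Sketch: an LLM-as-a-Judge grader seeded with few-shot examples drawn from
# human corrections. The `corrections` store and prompt are hypothetical.
from openai import OpenAI

openai_client = OpenAI()

# Hypothetical store of human corrections: each entry pairs a judged output
# with the grade a human reviewer assigned when correcting the judge.
corrections = [
    {"output": "The capital of France is Lyon.", "reference": "Paris", "grade": "INCORRECT"},
    {"output": "Paris is France's capital.", "reference": "Paris", "grade": "CORRECT"},
]

def correctness_judge(run_output: str, reference: str) -> str:
    """Grade an output against a reference, using past corrections as few-shot examples."""
    few_shot = "\n\n".join(
        f"Output: {c['output']}\nReference: {c['reference']}\nGrade: {c['grade']}"
        for c in corrections
    )
    prompt = (
        "You grade whether an output matches the reference answer.\n"
        "Respond with a single word: CORRECT or INCORRECT.\n\n"
        f"{few_shot}\n\n"
        f"Output: {run_output}\nReference: {reference}\nGrade:"
    )
    response = openai_client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}],
        temperature=0,
    )
    return response.choices[0].message.content.strip()

# Example usage: the judge now grades with the corrected examples in context.
print(correctness_judge("France's capital city is Paris.", "Paris"))
```

Each time a reviewer corrects the judge in LangSmith, the correction can be appended to the few-shot set, so the evaluator's behavior drifts toward the reviewer's preferences rather than relying on hand-tuned prompt wording.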

Comments • 3

  • @darkmatter9583 · 8 days ago

    Thank you for everything. I'm bad now, but I'm still following your channel and supporting all your effort. Best of luck and best wishes ❤

  • @lukem121 · 7 days ago

    Great content as usual! I'm really excited for the new advanced customer support agent that will be using TypeScript. Do you have any updates on when the video will be published?

  • @93simongh · 8 days ago

    In my experience, and from some articles I've read, asking an LLM to provide a numeric score for evaluation appears to be very susceptible to nondeterministic, variable results every time the evaluation prompt is run. Are the numeric scores shown in LangSmith to be trusted?