Great discussion, thanks.
Interesting. From my experience with annotators, I found it was less about "firing people" who performed badly and more about "rewriting the guidelines". Sometimes an annotator takes the guidelines literally (actually not a bad thing) and as a result generates annotations that the guideline designer did not have in mind. This is also partially why it makes a tonne of sense for the folks who write the guidelines to also annotate on the task themselves.
It can also help to have an annotation interface where folks are able to flag a task/example as confusing, so that it's easy to reflect on those cases later.
I have not tried it with LLMs, but my gut says that allowing the LLM to flag an example/task combo as confusing can also really help in designing a few solid prompts.
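To make the idea concrete, here's a minimal sketch of what that could look like. Everything here is hypothetical: the prompt wording, the `parse_response`/`triage` helpers, and the canned responses (which stand in for real model calls) are all illustrative, not a real API.

```python
# Hypothetical sketch: instead of forcing a label, the prompt lets the
# model answer "CONFUSING: <reason>", and we collect those flags to
# figure out where the guidelines/prompt need rewriting.

PROMPT_TEMPLATE = """\
Label the text as POSITIVE or NEGATIVE.
If the guidelines do not clearly cover this example, answer
CONFUSING: <one-line reason> instead of guessing.

Text: {text}
Label:"""


def parse_response(raw: str):
    """Split a raw model response into (label, reason)."""
    raw = raw.strip()
    if raw.upper().startswith("CONFUSING"):
        _, _, reason = raw.partition(":")
        return "CONFUSING", reason.strip()
    return raw.upper(), None


def triage(examples, responses):
    """Separate confidently labeled examples from flagged ones."""
    labeled, flagged = [], []
    for example, raw in zip(examples, responses):
        label, reason = parse_response(raw)
        if label == "CONFUSING":
            flagged.append((example, reason))  # review these, then revise the prompt
        else:
            labeled.append((example, label))
    return labeled, flagged


# Canned responses simulate what a model might return.
examples = ["great product", "meh", "arrived broken but refunded fast"]
responses = [
    "POSITIVE",
    "CONFUSING: no clear sentiment",
    "CONFUSING: mixed sentiment, guidelines silent on refunds",
]
labeled, flagged = triage(examples, responses)
```

The flagged pile is the interesting output: reading the reasons ("guidelines silent on refunds") tells you exactly which cases the next prompt revision needs to cover, same as reviewing confusion flags from human annotators.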