Expert persona prompting—assigning roles such as expert in math to language models—is widely used for task improvement. However, prior work shows mixed results on its effectiveness, and does not consider when and why personas should improve …
The open-ended nature of language generation makes the evaluation of autoregressive large language models (LLMs) challenging. One common evaluation approach uses multiple-choice questions to limit the response space. The model is then evaluated by …