Abstract
Just for fun, I tested the rhetorical skills of several LLMs by asking each one to write a speech, then had two LLMs judge the results. As expected, the closed commercial models scored highest, but the open-weight gpt-oss 20B model scored surprisingly well. The final results are shown in Figure 1. The models shown in green in the chart are open-weight models that I ran locally on consumer hardware. Each bar shows the low, high, and average scores for that model.