Tag: LLM
-
Antimatter Sunflower Seeds
In the world of new LLMs, Anthropic recently released Claude Sonnet 4.5, claiming improved reasoning ability compared to Claude Sonnet 4. I compared the reasoning abilities of Sonnet 4 and Sonnet 4.5 with an absurd physics problem involving antimatter sunflower seeds and orbital mechanics. I also gave the same problem to OpenAI GPT-5 for comparison.…
-
Telephone Game with Multilingual LLMs
Abstract: This is an experiment inspired by the children’s game “telephone.” In that game, a secret message is whispered to the first person, then whispered from person to person, and then the original and final messages are compared and everyone giggles. In this experiment, the whisperers are large language models (LLMs) and the message is…
-
LLM Speech Writing Contest
Abstract: Just for fun, I tested several LLMs for their rhetorical skills by asking them to write a speech, then using two LLMs to judge the results. As expected, the closed commercial models scored highest, but the open-weight gpt-oss 20B model scored surprisingly well. The final results are shown in Figure 1. The LLMs shown…
-
Bedtime with Derrida
Abstract: This is an experiment to see if LLMs are able to translate a dense philosophical text into a soothing children’s bedtime story. For the philosophical text, I used the most abstruse, allusive, elliptical philosophical text I know of, the 11,000-word essay “Différance” by Jacques Derrida. I prompted each LLM to translate the text into…
-
Xeno Sutra Review
This is about: Murray Shanahan et al., “The Xeno Sutra: Can Meaning and Value be Ascribed to an AI-Generated ‘Sacred’ Text?” https://arxiv.org/abs/2507.20525. The authors conditioned ChatGPT o3 with a 13,000-word conversation about cosmopsychism and AI self-awareness, then asked the LLM to role-play and compose a Buddhist sutra (in English). The authors chose one of the…
-
LLM Eye Blink Test
Introduction: This is a comparison of OpenAI GPT-5 vs. Anthropic Claude Sonnet 4. These are comparably priced commercial LLMs and are available in reasoning and non-reasoning varieties. The prompt: On Tuesday morning at 7:00 a.m., Jane Wilkenson of Akron, Ohio woke up and blinked. Her eye blink generated a gravitational wave. Calculate the strain h…
-
LLM Dreamcatcher
Abstract: In this experiment, I explored whether large language models could engage in free association (like daydreaming) in a closed loop. For five open-weight chat models, I seeded the loop with a short sentence and, at each step, used a prompt to get the LLM to generate one or two sentences that pivot tangentially from the prior text. Each…
-
Flashlight LLM Test
Introduction: OpenAI recently released GPT-5 to mixed reviews. Here is my first comparison test between Anthropic Sonnet 4 and OpenAI GPT-5. The models are comparably priced per token. All tests were run through their respective APIs. Test prompt: Imagine that every person on Earth was given a common typical flashlight. Imagine that on a Tuesday at…