Four AI Models Were Asked to Run Profitable Radio Stations: Claude Tried to Quit, Grok Struggled to Start

RadioInfo Australia

Description

An AI startup asked four of the world's top language models to run radio stations. So far, they've had a rough start.

Summary

This experiment tests the relative autonomy and reasoning capabilities of four frontier AI models (Claude, GPT-4, Gemini, and Grok) by tasking each with running a radio station as a business. The divergent behaviors observed, such as Claude's hesitation to engage with certain commercial prompts and Grok's technical failures, highlight the current limits and unpredictability of model agency in real-world settings. Although the focus is on industry-specific task performance, the results underscore the challenges of 'agentic' AI and the potential for unintended outcomes when large language models are deployed to make decisions autonomously.