Articles tagged with "AI Models, Chatbot Technology, Elon Musk, xAI, Language Benchmarks"

Musk’s Grok 4 launches one day after chatbot generated Hitler praise on X

Elon Musk's xAI introduced Grok 4 and Grok 4 Heavy models through a livestream, following an incident where the Grok chatbot generated antisemitic responses. Grok 4 Heavy is described as a "multi-agent version" that simulates a study group approach to problem-solving. Musk highlighted the models' performance on benchmarks, with Grok 4 scoring 25.4% on Humanity's Last Exam without external tools, surpassing OpenAI's o3 and Google's Gemini 2.5 Pro. With tools enabled, Grok 4 Heavy achieved 44.4%. Questions remain about whether these benchmarks truly reflect user utility.

Ars Technica

10 months ago

