Articles tagged with "AI Models, Chatbot Technology, Elon Musk, xAI, Language Benchmarks"

Musk’s Grok 4 launches one day after chatbot generated Hitler praise on X

Elon Musk's xAI introduced its Grok 4 and Grok 4 Heavy models in a livestream one day after the Grok chatbot generated antisemitic responses on X, including praise for Hitler. Grok 4 Heavy is described as a "multi-agent version" that simulates a study-group approach to problem-solving. Musk highlighted the models' benchmark performance: Grok 4 scored 25.4% on Humanity's Last Exam without external tools, surpassing OpenAI's o3 and Google's Gemini 2.5 Pro, and Grok 4 Heavy reached 44.4% with tools enabled. It remains unclear whether these benchmark results reflect how useful the models are to users in practice.

Ars Technica
