
Two Nvidia DGX Spark systems fused with M3 Ultra Mac Studio to deliver 2.8x gain in AI benchmarks — EXO Labs demonstrates disaggregated AI inference serving

Source: Tom's Hardware


TL;DR (AI Generated)

EXO Labs develops the EXO framework for efficient large language model (LLM) inference across heterogeneous hardware. Its latest demo pairs two NVIDIA DGX Spark systems with an Apple M3 Ultra Mac Studio: by splitting the phases of LLM inference between the machines, assigning each phase to the hardware best suited to it, EXO achieves a 2.8x speedup in AI benchmarks. The demo shows how existing hardware can be used intelligently to boost AI performance, and NVIDIA is pursuing a similar disaggregated approach with its upcoming Rubin CPX platform. Although EXO's software is still at an early stage, it demonstrates that disaggregated inference can raise AI throughput without relying solely on massive monolithic accelerators.
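The phase split behind this result can be sketched in a toy form. LLM inference commonly divides into a compute-bound prefill phase (processing the whole prompt at once) and a memory-bandwidth-bound decode phase (generating tokens one at a time); in a disaggregated setup the attention KV cache produced by prefill is handed to a different machine for decode. The sketch below is hypothetical and does not use EXO's actual API; the function names and the placeholder "token" logic are illustrative assumptions.

```python
# Toy sketch of disaggregated LLM inference (hypothetical; not EXO's API).
# Prefill would run on the compute-strong machine, decode on the
# bandwidth-strong one; here both stages run in a single process.

def prefill(prompt_tokens):
    """Compute-bound stage: process the whole prompt in parallel.
    Returns one KV-cache entry per prompt token (a stand-in for
    real attention key/value tensors)."""
    return [("k%d" % t, "v%d" % t) for t in prompt_tokens]

def decode(kv_cache, max_new_tokens):
    """Bandwidth-bound stage: generate tokens one at a time,
    extending the KV cache received from the prefill machine."""
    out = []
    for _ in range(max_new_tokens):
        tok = len(kv_cache)  # placeholder "next token" choice
        kv_cache.append(("k%d" % tok, "v%d" % tok))
        out.append(tok)
    return out

# In a real deployment the cache would be shipped over the network
# between the two machines; here it is just passed by reference.
cache = prefill([101, 102, 103])
tokens = decode(cache, max_new_tokens=2)
print(tokens)
```

The key design point this illustrates is that the only state crossing the machine boundary is the KV cache, which is why the two phases can be served by different hardware at all.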