Nvidia details new software that enables location tracking for AI GPUs — opt-in remote data center GPU fleet management includes power usage and thermal monitoring

Source

Tom's Hardware

TL;DR

AI Generated

Nvidia has unveiled GPU fleet management software that lets data center operators monitor their AI GPU fleets, including each GPU's physical location, power usage, and thermals. The software is opt-in and collects extensive telemetry, giving operators insight into GPU behavior, but it cannot act as a backdoor or kill switch. It aggregates the data into a central dashboard on Nvidia's NGC platform, where operators can visualize GPU status across the fleet and generate reports on inventory and system health. The software focuses on maximizing utilization and performance per watt; it monitors thermals, airflow conditions, and software stack consistency to help prevent performance drops and premature component aging. Nvidia also offers other tools, such as DCGM for local GPU health monitoring and Base Command for managing AI development workflows.