Articles tagged with "GPU, Monitoring, Management"

Nvidia details new software that enables location tracking for AI GPUs — opt-in remote data center GPU fleet management includes power usage and thermal monitoring

Nvidia has unveiled GPU fleet management software that lets data center operators monitor their AI GPU fleet, including each GPU's physical location, power usage, and thermals. The software is opt-in and collects extensive telemetry that gives operators insight into GPU behavior, but it cannot act as a backdoor or kill switch. It aggregates the data into a central dashboard on Nvidia's NGC platform, where operators can visualize GPU status across the fleet and generate reports on inventory and system health. The focus is on maximizing utilization and performance per watt: the software monitors thermals, airflow conditions, and software stack consistency to prevent performance drops and premature component aging. Nvidia also offers other tools, such as DCGM for local GPU health monitoring and Base Command for managing AI development workflows.

Tom's Hardware
