Efficient Synchronous Dataflow Execution For GPUs (NVIDIA, UW-Madison)
TL;DR
Researchers from NVIDIA and the University of Wisconsin-Madison have published a technical paper titled "Kitsune: Enabling Dataflow Execution on GPUs with Spatial Pipelines." The paper examines why the GPU's bulk-synchronous execution model is a poor fit for deep learning workloads, and introduces Kitsune: a set of primitives plus an end-to-end compiler built on PyTorch Dynamo that enables dataflow execution on GPUs. Across a range of applications, Kitsune delivers significant performance improvements and reduces off-chip memory traffic for both inference and training.