Algorithm–HW Co-Design Framework for Accelerating Attention in Large-Context Scenarios (Cornell)

Researchers at Cornell University have published a technical paper introducing LongSight, a framework for accelerating attention in large-context scenarios. LongSight offloads Key-Value (KV) cache storage and retrieval to a compute-enabled CXL memory device, enabling context lengths of up to 1 million tokens for Llama models. By moving the KV cache into expanded CXL-attached memory, the framework elevates commodity LPDDR DRAM to a role comparable to high-end HBM, improving performance efficiency. The paper details how LongSight addresses the memory challenges posed by large input context windows in transformer-based models, where longer contexts improve output accuracy and personalization but strain on-device memory.
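The article does not reproduce the paper's mechanisms, but the general idea of tiering a KV cache across a small fast tier and a larger CXL-attached tier can be sketched in Python. The sketch below is a hypothetical illustration, not LongSight's design: the class name TieredKVCache, the FIFO eviction policy, and all parameters are assumptions. New KV blocks land in a fast "HBM-like" tier, older blocks are offloaded to a larger "CXL-like" tier, and attention gathers from both on demand.

    import numpy as np

    class TieredKVCache:
        """Illustrative two-tier KV cache: a small fast 'HBM-like' tier
        plus a large 'CXL-like' tier. Hypothetical, not LongSight's API."""

        def __init__(self, hot_capacity_blocks):
            self.hot_capacity = hot_capacity_blocks
            self.hot = {}   # block_id -> (K, V) kept in fast device memory
            self.cold = {}  # block_id -> (K, V) offloaded to expanded memory

        def append_block(self, block_id, k, v):
            # New blocks land in the hot tier; once it is full, the oldest
            # block is offloaded to the cold tier (simple FIFO policy).
            if len(self.hot) >= self.hot_capacity:
                oldest = next(iter(self.hot))
                self.cold[oldest] = self.hot.pop(oldest)
            self.hot[block_id] = (k, v)

        def gather(self, block_ids):
            # Retrieval path: hot blocks are read directly; cold blocks
            # model a fetch from the compute-enabled memory device.
            ks, vs = [], []
            for bid in block_ids:
                k, v = self.hot.get(bid) or self.cold[bid]
                ks.append(k)
                vs.append(v)
            return np.concatenate(ks), np.concatenate(vs)

    def attention(q, cache, block_ids):
        # Standard scaled dot-product attention over the gathered KV blocks.
        k, v = cache.gather(block_ids)
        scores = q @ k.T / np.sqrt(q.shape[-1])
        weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
        weights /= weights.sum(axis=-1, keepdims=True)
        return weights @ v

    # Demo: 16 blocks of 64 tokens (1024 tokens total), only 4 blocks hot.
    cache = TieredKVCache(hot_capacity_blocks=4)
    for i in range(16):
        cache.append_block(i, np.random.randn(64, 64), np.random.randn(64, 64))
    out = attention(np.random.randn(1, 64), cache, block_ids=range(16))
    print(out.shape)  # (1, 64)

The point of the tiering is that the hot tier stays bounded regardless of context length, while the cold tier grows with the context; performing retrieval near the memory device, as the paper proposes, avoids shipping the full cold-tier KV cache back to the accelerator on every decode step.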

SemiEngineering
