Algorithm–HW Co-Design Framework for Accelerating Attention in Large-Context Scenarios (Cornell)

Researchers at Cornell University have published a technical paper introducing LongSight, a framework for accelerating attention in large-context scenarios. LongSight offloads Key-Value (KV) cache storage and retrieval to a compute-enabled CXL memory device, enabling context lengths of up to 1 million tokens for Llama models. By moving the KV cache into expanded CXL-attached memory, the framework elevates commodity LPDDR DRAM to a role comparable to high-end HBM, improving performance efficiency. The paper details how LongSight addresses the memory challenges posed by large input context windows in transformer-based models, where longer contexts improve output accuracy and personalization but strain on-device memory.
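The article does not reproduce the paper's mechanisms, but the general idea of tiering a KV cache across a small fast tier and a larger CXL-attached tier can be sketched in Python. The sketch below is a hypothetical illustration, not LongSight's design: the class name TieredKVCache, the FIFO eviction policy, and all parameters are assumptions. New KV blocks land in a fast "HBM-like" tier, older blocks are offloaded to a larger "CXL-like" tier, and attention gathers from both on demand.

    import numpy as np

    class TieredKVCache:
        """Illustrative two-tier KV cache: a small fast 'HBM-like' tier
        plus a large 'CXL-like' tier. Hypothetical, not LongSight's API."""

        def __init__(self, hot_capacity_blocks):
            self.hot_capacity = hot_capacity_blocks
            self.hot = {}   # block_id -> (K, V) kept in fast device memory
            self.cold = {}  # block_id -> (K, V) offloaded to expanded memory

        def append_block(self, block_id, k, v):
            # New blocks land in the hot tier; once it is full, the oldest
            # block is offloaded to the cold tier (simple FIFO policy).
            if len(self.hot) >= self.hot_capacity:
                oldest = next(iter(self.hot))
                self.cold[oldest] = self.hot.pop(oldest)
            self.hot[block_id] = (k, v)

        def gather(self, block_ids):
            # Retrieval path: hot blocks are read directly; cold blocks
            # model a fetch from the compute-enabled memory device.
            ks, vs = [], []
            for bid in block_ids:
                k, v = self.hot.get(bid) or self.cold[bid]
                ks.append(k)
                vs.append(v)
            return np.concatenate(ks), np.concatenate(vs)

    def attention(q, cache, block_ids):
        # Standard scaled dot-product attention over the gathered KV blocks.
        k, v = cache.gather(block_ids)
        scores = q @ k.T / np.sqrt(q.shape[-1])
        weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
        weights /= weights.sum(axis=-1, keepdims=True)
        return weights @ v

    # Demo: 16 blocks of 64 tokens (1024 tokens total), only 4 blocks hot.
    cache = TieredKVCache(hot_capacity_blocks=4)
    for i in range(16):
        cache.append_block(i, np.random.randn(64, 64), np.random.randn(64, 64))
    out = attention(np.random.randn(1, 64), cache, block_ids=range(16))
    print(out.shape)  # (1, 64)

The point of the tiering is that the hot tier stays bounded regardless of context length, while the cold tier grows with the context; performing retrieval near the memory device, as the paper proposes, avoids shipping the full cold-tier KV cache back to the accelerator on every decode step.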

SemiEngineering
