
Algorithm–HW Co-Design Framework for Accelerating Attention in Large-Context Scenarios (Cornell)

Source: SemiEngineering

TL;DR (AI Generated)

Researchers at Cornell University have published a technical paper introducing LongSight, a framework for accelerating attention in large-context scenarios. LongSight offloads Key-Value (KV) cache storage and retrieval to a compute-enabled CXL memory device, enabling context lengths of up to 1 million tokens for Llama models. By doing so, the framework elevates the value of commodity LPDDR DRAM to that of high-end HBM for this workload. The paper details how LongSight addresses the challenges that large input context windows pose for transformer-based models, where longer contexts improve output accuracy and personalization.
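The core idea of offloading the KV cache to a larger, slower memory tier can be illustrated with a minimal sketch. This is a hypothetical toy model, not the LongSight implementation: a small fast tier stands in for GPU HBM and a large slow tier stands in for CXL-attached LPDDR DRAM, with the oldest tokens evicted to the slow tier as the context grows. All class and method names here are assumptions for illustration.

```python
import numpy as np

class TieredKVCache:
    """Illustrative two-tier KV cache: a small fast tier (standing in for
    GPU HBM) and a large slow tier (standing in for CXL-attached DRAM).
    Hypothetical sketch; not the actual LongSight design."""

    def __init__(self, fast_capacity: int, head_dim: int):
        self.fast_capacity = fast_capacity  # tokens kept in the fast tier
        self.head_dim = head_dim
        self.fast = {}   # position -> (key, value), fast tier
        self.slow = {}   # position -> (key, value), slow/offloaded tier
        self.next_pos = 0

    def append(self, k: np.ndarray, v: np.ndarray) -> None:
        """Store a new token's K/V in the fast tier, evicting the oldest
        entry to the slow tier once capacity is exceeded."""
        self.fast[self.next_pos] = (k, v)
        self.next_pos += 1
        if len(self.fast) > self.fast_capacity:
            oldest = min(self.fast)
            self.slow[oldest] = self.fast.pop(oldest)

    def attend(self, q: np.ndarray) -> np.ndarray:
        """Softmax attention over both tiers. In a real compute-enabled
        CXL device, the slow-tier portion would be computed near memory
        and only a partial result shipped back over the link."""
        keys, vals = [], []
        for store in (self.slow, self.fast):
            for pos in sorted(store):
                k, v = store[pos]
                keys.append(k)
                vals.append(v)
        K = np.stack(keys)
        V = np.stack(vals)
        scores = K @ q / np.sqrt(self.head_dim)
        w = np.exp(scores - scores.max())
        w /= w.sum()
        return w @ V
```

The sketch captures only the data placement aspect; the paper's contribution is the algorithm-hardware co-design that makes the near-memory portion of this computation efficient enough to match HBM-class performance.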
