HW-SW Co-Designed System With 3 Core Optimization Pathways For Long-Context Agentic LLM Inference (Cambridge, ICL)
TL;DR
Researchers from the University of Cambridge, Imperial College London, and the University of Edinburgh have published a technical paper on optimizing long-context agentic LLM inference. They introduce PLENA, a hardware-software co-designed system with three core optimization pathways that target the memory wall in long-context scenarios: an efficient hardware implementation, a novel flattened systolic array architecture, and native support for FlashAttention. In simulation, PLENA achieves significantly higher utilization and throughput than existing accelerators such as the A100 GPU and TPU v6e. The full PLENA system will be open-sourced.
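The summary above mentions FlashAttention support as one of PLENA's optimization pathways. As background (not the authors' implementation), the core FlashAttention idea is to compute attention over key/value tiles with an online softmax, so the full seq_len × seq_len score matrix is never materialized. A minimal NumPy sketch of that tiling trick:

```python
import numpy as np

def tiled_attention(Q, K, V, block=64):
    """Single-head attention computed block-by-block over K/V using the
    online-softmax recurrence FlashAttention is built on. Only (n, block)
    partial score tiles exist at any time, never the full (n, n) matrix."""
    n, d = Q.shape
    scale = 1.0 / np.sqrt(d)
    out = np.zeros((n, d))
    m = np.full(n, -np.inf)   # running per-row max of scores
    l = np.zeros(n)           # running per-row sum of exp(score - m)
    for start in range(0, K.shape[0], block):
        Kb = K[start:start + block]
        Vb = V[start:start + block]
        S = (Q @ Kb.T) * scale              # partial scores for this tile
        m_new = np.maximum(m, S.max(axis=1))
        alpha = np.exp(m - m_new)           # rescale old accumulators
        P = np.exp(S - m_new[:, None])
        l = l * alpha + P.sum(axis=1)
        out = out * alpha[:, None] + P @ Vb
        m = m_new
    return out / l[:, None]

def naive_attention(Q, K, V):
    """Reference implementation that materializes the full score matrix."""
    S = (Q @ K.T) / np.sqrt(Q.shape[1])
    P = np.exp(S - S.max(axis=1, keepdims=True))
    P /= P.sum(axis=1, keepdims=True)
    return P @ V
```

Both functions produce the same output up to floating-point error; the tiled version trades a small amount of recomputation (the rescaling by `alpha`) for a much smaller memory footprint, which is what makes it attractive for long-context inference.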