LLMs on Analog In-Memory Computing Based Hardware (IBM Research, ETH Zurich)
TL;DR (AI-generated)
A technical paper by IBM Research and ETH Zurich introduces a method to adapt large language models (LLMs) for execution on noisy, low-precision analog in-memory computing hardware, improving the speed and power efficiency of neural network inference. The method enables high-capacity LLMs to achieve performance comparable to conventionally trained counterparts despite analog noise and quantization constraints. The paper also demonstrates benefits for test-time compute scaling and shows that analog foundation models can be adapted for inference on low-precision digital hardware. This work bridges the gap between LLMs and efficient analog hardware, yielding energy-efficient foundation models.
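The summary does not spell out the adaptation procedure; as an illustrative sketch only, the snippet below shows the general hardware-aware training idea it alludes to: simulating additive weight noise and low-bit input/output quantization inside a matrix multiply so the model learns weights that tolerate analog non-idealities. The function name, noise level, and bit widths are assumptions for illustration, not values from the paper.

```python
import torch

def analog_matmul(x, weight, weight_noise_std=0.02, input_bits=8, output_bits=8):
    """Simulate a matrix multiply on analog in-memory hardware (illustrative).

    Assumptions (not the paper's exact model): weights are perturbed by
    additive Gaussian noise, and inputs/outputs pass through uniform low-bit
    quantizers, mimicking DAC/ADC precision limits.
    """
    def quantize(t, bits):
        # Uniform symmetric per-tensor quantization to `bits` bits.
        scale = t.abs().max().clamp(min=1e-8) / (2 ** (bits - 1) - 1)
        return torch.round(t / scale) * scale

    x_q = quantize(x, input_bits)                       # low-precision inputs (DAC)
    noisy_w = weight + torch.randn_like(weight) * weight_noise_std * weight.abs().max()
    y = x_q @ noisy_w.t()                               # analog MVM with noisy conductances
    return quantize(y, output_bits)                     # low-precision outputs (ADC)


# During hardware-aware fine-tuning, a noisy forward pass like this replaces
# the exact matmul, so the learned weights remain accurate under analog noise.
x = torch.randn(4, 512)
w = torch.randn(1024, 512)
print(analog_matmul(x, w).shape)  # torch.Size([4, 1024])
```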