970.

The Inference Shift

stratechery.com/2026/the-inference-shift

Cerebras Systems, an AI chipmaker, is raising the price and size of its IPO on strong demand. While GPUs, particularly Nvidia’s, have dominated the AI compute landscape, Cerebras takes a different approach with its wafer-scale design, treating an entire silicon wafer as a single chip to deliver enormous compute and very fast on-chip memory access. This makes Cerebras particularly well-suited for inference workloads, though its high cost and limited memory capacity for larger models remain challenges.
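
A rough back-of-envelope sketch of that capacity constraint, using assumed numbers that are not from the article: even the weights of a large model can exceed the tens of gigabytes of on-chip memory a wafer-scale part offers, before accounting for activations or KV caches.

# Illustrative arithmetic with hypothetical figures (model size and memory
# budget are assumptions, not sourced from the article).
params = 70e9            # hypothetical 70B-parameter model
bytes_per_param = 2      # fp16/bf16 weights
weights_gb = params * bytes_per_param / 1e9
print(f"weights alone: {weights_gb:.0f} GB")  # ~140 GB, well above tens of GB of on-chip SRAM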

The future of AI chips will be shaped by the distinction between “answer inference” and “agentic inference.” Answer inference, like coding, where a user is actively waiting on a response, benefits from high-speed chips; agentic inference, in which models complete tasks autonomously over long horizons, will prioritize memory capacity and cost over raw speed. This shift will lead to a more sophisticated memory hierarchy, potentially reducing the dominance of GPUs and favouring slower, cheaper memory tiers and CPUs, as the sketch below suggests.
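
A minimal sketch of why agentic inference becomes capacity-bound, using assumed model parameters and context lengths (none of these figures are from the article): long-running agent sessions accumulate large KV caches, and many concurrent sessions multiply that footprint, so total memory resident grows far faster than any need for raw speed.

def kv_cache_bytes(context_tokens: int,
                   n_layers: int = 64,        # hypothetical model depth
                   n_kv_heads: int = 8,       # hypothetical grouped-query KV heads
                   head_dim: int = 128,
                   bytes_per_value: int = 2): # fp16/bf16
    # KV cache for one session: 2 (K and V) * layers * KV heads * head dim * tokens
    return 2 * n_layers * n_kv_heads * head_dim * bytes_per_value * context_tokens

answer_ctx = 4_000        # interactive "answer" query: short context, latency-bound
agent_ctx = 1_000_000     # autonomous agent running for hours: long context, capacity-bound

print(f"answer session KV cache: {kv_cache_bytes(answer_ctx) / 1e9:.2f} GB")
print(f"agent session KV cache:  {kv_cache_bytes(agent_ctx) / 1e9:.2f} GB")

# A fleet of concurrent agent sessions quickly exceeds the HBM on any single
# accelerator, which is what pushes agentic workloads toward cheaper,
# higher-capacity memory tiers (and even CPU memory).
sessions = 100
print(f"{sessions} agent sessions: {sessions * kv_cache_bytes(agent_ctx) / 1e9:.0f} GB of KV cache")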

Nvidia CEO Jensen Huang believes future computing speed-ups will come from systems-level innovation rather than from Moore’s Law. The implication is that raw computing power is no longer the constraint; the bigger gains now come from using it more effectively.