
Memory stocks such as SK Hynix, Micron Technology and SanDisk—among the best performers in the AI space over the past 12 months—came under pressure yesterday following reports that Google is preparing to roll out its TurboQuant data compression algorithm for AI inference.
TurboQuant enables more efficient storage of data in the key-value (KV) cache, the memory in which an AI accelerator keeps the attention keys and values of tokens it has already processed so that the model can generate each new token without recomputing everything. By compressing this cache, Google claims it can significantly improve inference efficiency, with reports suggesting up to 8x faster performance and roughly 6x lower memory usage.
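To give a sense of scale, the size of a KV cache can be estimated from a model's shape. The sketch below is a back-of-the-envelope calculation, not a description of TurboQuant itself: the model dimensions are hypothetical, and the 6x factor is simply the reported figure applied as a flat divisor.

```python
def kv_cache_bytes(layers, kv_heads, head_dim, seq_len, batch, bits_per_value):
    """Bytes needed to hold the keys and values for every layer and token."""
    # Factor 2 accounts for storing both a key and a value per position.
    values = 2 * layers * kv_heads * head_dim * seq_len * batch
    return values * bits_per_value / 8

GIB = 2**30

# Hypothetical model shape and context length (assumptions, not from the reports).
baseline_bytes = kv_cache_bytes(
    layers=32, kv_heads=8, head_dim=128, seq_len=32_768, batch=1, bits_per_value=16
)
baseline_gib = baseline_bytes / GIB

# Apply the reported ~6x memory reduction as a simple divisor.
compressed_gib = baseline_gib / 6

print(f"baseline KV cache:   {baseline_gib:.2f} GiB")
print(f"compressed KV cache: {compressed_gib:.2f} GiB")
```

Because the cache grows linearly with context length and batch size, the absolute saving is largest for long-context, high-concurrency serving.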
At first glance, this appears negative for memory demand. However, this interpretation is overly simplistic.
First, TurboQuant primarily targets the KV cache during inference. It does not materially impact training workloads or model weights, which remain the dominant drivers of memory consumption. As such, the headline memory reduction does not translate into a proportional decline in total hardware requirements.
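The gap between the headline figure and the total footprint can be illustrated with simple arithmetic. This is a minimal sketch under assumed numbers: the 16 GB of weights and 4 GB of KV cache are hypothetical, and only the cache is compressed.

```python
def total_memory_reduction(weights_gb, kv_cache_gb, kv_compression):
    """Overall memory reduction factor when only the KV cache shrinks."""
    before = weights_gb + kv_cache_gb
    after = weights_gb + kv_cache_gb / kv_compression
    return before / after

# Hypothetical split: 16 GB of model weights, 4 GB of KV cache.
reduction = total_memory_reduction(weights_gb=16.0, kv_cache_gb=4.0, kv_compression=6.0)
print(f"overall memory reduction: {reduction:.2f}x")
```

With this split, a 6x reduction in the cache works out to about a 1.2x reduction overall. The result depends entirely on the assumed ratio of weights to cache, which varies with model size, context length and batch size.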
Second, KV cache optimization is not new. Compression techniques have been an active area of research for years, and Google had already presented its technology as early as April 2025. TurboQuant should therefore be seen as an incremental improvement rather than a disruptive step-change.
More importantly, by reducing memory requirements without losing performance, TurboQuant effectively decreases the cost per query and overall inference costs. This is likely to drive a meaningful increase in usage of intelligent and compute-intensive applications and devices.
In other words, TurboQuant is likely to expand the overall AI market, enabling deployment across a much broader range of devices (smartphones, PCs, industrial devices, autonomous systems) and use cases. Notably, models that today need cloud clusters could run on local devices, effectively lowering the barrier to deploying AI at scale.
While each individual model may require less memory and compute, the growth in deployment and usage intensity is likely to more than offset this effect. As AI becomes cheaper to run, it becomes ubiquitous, expanding the addressable market for semiconductors beyond the data center and driving demand for mobile DRAM and edge AI processors.
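The offset argument comes down to a break-even condition: total memory demand is the number of workloads multiplied by the memory each one needs, so demand rises only if usage grows faster than the per-workload footprint shrinks. The sketch below uses hypothetical growth figures purely to illustrate that threshold.

```python
def demand_change(per_workload_reduction, usage_growth):
    """Ratio of new to old total demand (values above 1.0 mean demand grew)."""
    return usage_growth / per_workload_reduction

# Hypothetical scenarios, each assuming a 6x smaller footprint per workload.
scenarios = {
    "usage doubles": demand_change(6.0, 2.0),
    "usage grows 6x (break-even)": demand_change(6.0, 6.0),
    "usage grows 10x": demand_change(6.0, 10.0),
}

for label, ratio in scenarios.items():
    print(f"{label}: total demand x{ratio:.2f}")
```

Under these assumptions, demand expands only once usage growth exceeds the reduction factor. If the effective reduction in total footprint is much smaller than the headline cache figure, the break-even level of usage growth is correspondingly lower.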
In conclusion, the broader lesson of TurboQuant is a familiar one in technology: efficiency drives demand. The clearest example is Nvidia's GPUs, whose performance has risen sharply in recent years while demand for them has continued to grow. By lowering the cost and complexity of AI deployment, quantization unlocks new applications, increases usage intensity, and broadens adoption across industries. This dynamic ultimately leads to greater demand for semiconductors, even if the requirements per workload decline.






