
In recent weeks, Nvidia has announced a series of investments with a clear common thread: AI inference, the stage at which trained models are actually run to serve AI applications. The company signed a $20 billion technology licensing deal with Groq, a specialist in inference chips; took a stake in Baseten, a fast-growing startup that provides inference infrastructure for AI applications; and, more surprisingly, acquired a 2.9% stake in late 2025 in Nokia, the legacy networking-equipment maker.
These moves come as the industry enters the age of AI inference, which is expected to overtake AI training as the dominant driver of computing spending in the coming years as AI applications scale to billions of users. GPUs have proven extraordinarily powerful for large-scale AI training, but they were not originally designed for inference workloads, which have very different architectural requirements. Training ingests massive datasets to fit models with billions of parameters and is highly compute- and energy-intensive. Inference, by contrast, uses those trained models to generate outputs (text, images, recommendations, actions) and prioritizes cost, efficiency, and latency in order to deliver real-time services across devices ranging from smartphones and cars to humanoids.
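To make the contrast concrete, here is a minimal, purely illustrative sketch in Python (a toy two-layer NumPy network; every size and hyperparameter is an assumption chosen for illustration, not drawn from any real system). It times one batched training step, which is throughput-bound, against one single-request inference pass, which is latency-bound:

```python
import time
import numpy as np

rng = np.random.default_rng(0)

# Toy two-layer network; sizes are arbitrary assumptions for illustration.
D_IN, D_HIDDEN, D_OUT = 1024, 4096, 1024
W1 = rng.standard_normal((D_IN, D_HIDDEN)).astype(np.float32) * 0.01
W2 = rng.standard_normal((D_HIDDEN, D_OUT)).astype(np.float32) * 0.01

def forward(x):
    h = np.maximum(x @ W1, 0.0)  # ReLU hidden layer
    return h, h @ W2

def training_step(batch, targets, lr=1e-3):
    """One forward + backward + update over a large batch (throughput-bound)."""
    global W1, W2
    h, out = forward(batch)
    grad_out = (out - targets) / len(batch)   # gradient of mean-squared error
    grad_W2 = h.T @ grad_out
    grad_h = (grad_out @ W2.T) * (h > 0)      # backprop through ReLU
    grad_W1 = batch.T @ grad_h
    W1 -= lr * grad_W1
    W2 -= lr * grad_W2

def inference(request):
    """One forward pass for a single request (latency-bound)."""
    _, out = forward(request)
    return out

batch = rng.standard_normal((512, D_IN)).astype(np.float32)
targets = rng.standard_normal((512, D_OUT)).astype(np.float32)
request = rng.standard_normal((1, D_IN)).astype(np.float32)

t0 = time.perf_counter(); training_step(batch, targets); t1 = time.perf_counter()
t2 = time.perf_counter(); inference(request); t3 = time.perf_counter()
print(f"one training step (batch of 512): {1e3 * (t1 - t0):.1f} ms")
print(f"one inference request (batch of 1): {1e3 * (t3 - t2):.1f} ms")
```

The exact numbers will differ from machine to machine, but the structural point holds: training amortizes its cost over large batches, while an inference service is judged by how quickly it can answer a single request.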
Nvidia’s interest in Groq reflects this shift. Groq’s Language Processing Units (LPUs) and the software stack around them are purpose-built for inference, and the company claims significantly faster performance and up to 10x higher efficiency than GPUs on inference-heavy workloads. Unlike GPUs, which often pay latency penalties for frequent trips to off-chip high-bandwidth memory (HBM), LPUs keep model data largely in on-chip memory, dramatically improving inference speed.
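The memory argument can be made concrete with a back-of-envelope model: when token-by-token generation is bound by memory bandwidth, the time per token is roughly the number of weight bytes that must be read divided by the available bandwidth. The short Python sketch below uses purely illustrative numbers (the model size, quantization, and bandwidth figures are assumptions, not vendor specifications) to show how strongly that single factor drives latency:

```python
# Back-of-envelope: per-token latency when decoding is memory-bandwidth-bound.
# All numbers below are illustrative assumptions, not measured or vendor figures.
# The model ignores compute time, KV-cache traffic, and the sharding needed to
# fit a large model into on-chip memory across many chips.

PARAMS = 70e9            # assumed model size: 70B parameters
BYTES_PER_PARAM = 1      # assumed 8-bit quantized weights

bandwidth_gb_s = {
    "off-chip HBM (assumed ~3 TB/s)": 3_000,
    "on-chip SRAM (assumed ~80 TB/s aggregate)": 80_000,
}

weight_bytes = PARAMS * BYTES_PER_PARAM

for name, gb_s in bandwidth_gb_s.items():
    seconds_per_token = weight_bytes / (gb_s * 1e9)
    tokens_per_second = 1.0 / seconds_per_token
    print(f"{name}: ~{1e3 * seconds_per_token:.1f} ms/token, ~{tokens_per_second:.0f} tokens/s")
```

The estimate is deliberately crude, but the direction is the point: weights served from faster, closer memory mean proportionally lower per-token latency, which is the core of the LPU pitch.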
Another way to reduce latency is to bring compute closer to users—at the “edge”. While Nvidia’s partnership with Nokia was officially framed around accelerating AI-native mobile networks and AI networking infrastructure, we believe the strategic rationale runs deeper. Nokia’s vast footprint of radio sites, substations, and 5G/6G antennas effectively forms a distributed grid that could evolve into a network of micro data centers close to end-users, enabling low-latency AI inference for physical AI devices such as humanoids and autonomous vehicles.
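A similarly rough sketch illustrates why proximity matters for these physical-AI use cases: before any computation happens, signal propagation over fiber already sets a floor on round-trip time. The distances and the fiber propagation factor below are illustrative assumptions:

```python
# Back-of-envelope: network round-trip floor from propagation delay alone.
# Distances and the propagation factor are illustrative assumptions.

C = 299_792            # speed of light in vacuum, km/s
FIBER_FACTOR = 0.67    # light in optical fiber travels at roughly 2/3 of c

sites = {
    "edge micro data center at a nearby radio site (assumed 5 km)": 5,
    "regional cloud data center (assumed 800 km)": 800,
}

for name, km in sites.items():
    round_trip_ms = 1e3 * (2 * km) / (C * FIBER_FACTOR)
    print(f"{name}: ~{round_trip_ms:.2f} ms round trip (propagation only)")
```

Real-world latency adds routing hops, queuing, and the inference computation itself, but the propagation floor alone is material for control loops in vehicles or robots that expect responses within tens of milliseconds.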
Inference is expected to account for 70–80% of total AI compute demand, versus 20–30% for training, as billions of inference queries are processed every day. This next phase of AI will therefore require new, specialized chip and data center architectures. As the AI value chain increasingly splits between training and inference, each with fundamentally different requirements, every major AI player will need to adapt its technology roadmap. Nvidia is already positioning itself ahead of that shift, and showing the way.