During its recent earnings call, security and edge-computing infrastructure provider Cloudflare highlighted the AI inference opportunity, stating that it has a pipeline of customers (large and small) “interested in putting hundreds of billions of inference tasks on its infrastructure each month” and that it will have rolled out inference-optimized GPUs in 100+ of its edge data centers worldwide by the end of the year.
As a reminder, inference is the stage in which a trained AI model is put to work on new data. Inference can be performed on end-users’ devices such as smartphones (e.g. for image/facial recognition) and, more broadly, Internet-of-Things devices (autonomous vehicles, surveillance cameras…), at the edge (small data centers located close to end users to reduce latency), or in the massive data centers operated by the world’s largest hyperscalers.
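For readers less familiar with the distinction, the minimal sketch below illustrates what an inference call looks like once training is done: the model’s weights are frozen and each incoming request is simply a forward pass, whose latency can be measured per request. PyTorch is used purely as an example framework; the tiny model and input sizes are arbitrary assumptions for illustration only.

```python
# Illustrative sketch of inference: a forward pass through an already-trained model.
# The framework (PyTorch), model size and input shape are arbitrary choices.
import time
import torch
import torch.nn as nn

# Stand-in for a model whose weights have already been trained.
model = nn.Sequential(nn.Linear(128, 256), nn.ReLU(), nn.Linear(256, 10))
model.eval()  # switch to inference mode (disables dropout, batch-norm updates, etc.)

batch = torch.randn(1, 128)  # one incoming request, e.g. one image embedding

with torch.no_grad():  # no gradients needed: inference only reads the weights
    start = time.perf_counter()
    prediction = model(batch)
    latency_ms = (time.perf_counter() - start) * 1000

print(f"predicted class: {prediction.argmax().item()}, latency: {latency_ms:.2f} ms")
```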
While the training of large models requires huge computing power, inference requires very fast response times, high energy efficiency and low operating costs so that it can be deployed at scale without losing money.
As AI inference is about to become a mainstream technology used by billions of people in their daily lives (chatbots, autonomous driving, recommendation systems…), we believe that the revenue opportunity will be massive for the whole Tech industry.
On the software side, the monetization drivers of AI inference for software infrastructure vendors (such as Cloudflare, MongoDB or Snowflake) are numerous: storage of AI models and their training sets, securing AI systems and, for those that offer it (Cloudflare), the supply of GPU computing capacity in competition with the hyperscalers. As most of these services are consumption-based, the revenue upside for software vendors could appear as early as 2024.
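To make the consumption-based model concrete, the sketch below shows how a client typically calls a serverless, edge-hosted inference API over HTTP and is billed per request. The endpoint URL, model identifier and JSON fields are hypothetical placeholders, not any specific vendor’s actual API.

```python
# Hypothetical sketch of calling a consumption-based, edge-hosted inference API.
# The endpoint URL, model name and JSON fields are placeholders invented for
# illustration; they do not correspond to any particular vendor's API.
import os
import requests

ENDPOINT = "https://edge-inference.example.com/v1/run"  # hypothetical endpoint
API_TOKEN = os.environ.get("INFERENCE_API_TOKEN", "")   # credentials tied to a billed account

payload = {
    "model": "example-text-model",  # hypothetical model name
    "input": "Summarize the latest earnings call in one sentence.",
}

response = requests.post(
    ENDPOINT,
    headers={"Authorization": f"Bearer {API_TOKEN}"},
    json=payload,
    timeout=10,
)
response.raise_for_status()
print(response.json())  # each call adds to the monthly consumption-based bill
```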
On the hardware side, some early estimates put the inference chip market at twice the size of the AI training chip market in the coming years. This promising market is attracting dozens of startups and hefty investments from VCs.
While the training side of AI has settled almost exclusively on GPUs, the chip type that will dominate the inference side is not yet clearly defined: it will be either a dedicated AI chip (an ASIC) sitting next to the CPU, a programmable chip (an FPGA), or AI capabilities integrated directly into the CPU.
As of now, it is almost impossible to tell which technological option will take the lead, but we believe that the direct integration of an AI accelerator into the CPU has the highest likelihood of success because 1) it lowers the overall chip cost (no additional packaging), 2) it has a smaller footprint (important in small devices) and 3) it makes perfect sense in a chiplet world.
This view is backed by the fact that the two major computing architectures, x86 (Intel and AMD) and ARM, added vector instructions (used in AI) to their cores years ago, and that they are now “all-in” on integrating full AI accelerators into their chips to support Microsoft’s AI-enabled Windows 11 at scale.
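As a small, Linux-only illustration of how long such extensions have been shipping in mainstream CPUs, the sketch below inspects /proc/cpuinfo for some of the vector and matrix instruction flags that x86 and ARM cores expose; the flag names listed are a sample of well-known extensions, not an exhaustive or authoritative inventory.

```python
# Linux-only sketch: look for AI-relevant vector/matrix instruction flags that
# x86 and ARM CPUs expose in /proc/cpuinfo. The flag list is a sample of
# well-known extensions, not an exhaustive inventory.
AI_RELATED_FLAGS = {
    # x86 (Intel / AMD)
    "avx2", "avx512f", "avx512_vnni", "amx_tile", "amx_int8", "amx_bf16",
    # ARM
    "asimd", "sve", "sve2", "i8mm", "bf16",
}

def detect_ai_flags(path: str = "/proc/cpuinfo") -> set:
    found = set()
    with open(path) as f:
        for line in f:
            # x86 lists capabilities under "flags", ARM under "Features"
            if line.lower().startswith(("flags", "features")):
                found |= AI_RELATED_FLAGS & set(line.split(":", 1)[1].split())
    return found

if __name__ == "__main__":
    print("AI-related CPU extensions found:", sorted(detect_ai_flags()) or "none")
```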
Overall, GenAI is still in the early innings of mass adoption, with inference expected to sustain growth after the initial training phase.