Nvidia’s earnings report last night was impressive, to say the least. The company notably guided to 64% top-line growth in Q2 (53% above consensus!), including c.100% year-on-year growth in its data center segment. This strength should not come as a total surprise given that most companies are currently racing to deploy AI applications and that recent reports suggested the leading cloud providers had to limit the availability of AI computing/chips for customers due to massive demand. More surprising was Nvidia’s ability to massively step up its chip production, suggesting that supply chain issues are a thing of the past.
Now, one of investors’ main questions is likely to be the sustainability of the growth trajectory beyond the current extraordinary year. In our view, the demand surge for AI training is likely to last well beyond 2023: at all of the recent tech conferences, hardly a single discussion or presentation took place without AI framing future product/service roadmaps and, outside of the tech world, many companies are working on highly specialized training for specific applications (healthcare, finance…).
While the AI frenzy is currently mainly about the first step in machine learning (training), the second step (inference) is slowly but surely becoming the key to AI’s mass adoption and is likely to become another massive growth driver for the AI chip industry over the next couple of years.
Before going further, let’s briefly explain these two steps. To make a very basic analogy, training is the phase when a student takes a class to learn a specific subject. In the AI world, this process is mainly done by “showing” a model what output/answer is expected for a given input data set (an approach called supervised learning).
This learning process stops when the accuracy of the model reaches a plateau, at a level considered adequate for the model to be reliable. Going back to our analogy, it is equivalent to our student taking many practice tests to gradually improve his answering success rate until he believes that he has mastered the topic.
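For readers curious about the mechanics, here is a minimal, purely illustrative sketch in Python (NumPy only, synthetic data; every name and threshold is an assumption, not any particular company’s setup) of supervised training with the plateau-based stopping described above:

```python
import numpy as np

# Synthetic supervised-learning setup: inputs X with known "expected answers" y.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 20))
true_w = rng.normal(size=20)
y = (X @ true_w + 0.1 * rng.normal(size=1000) > 0).astype(float)

# A tiny logistic-regression "model" whose weights are nudged toward the expected outputs.
w = np.zeros(20)
lr = 0.1
best_acc, patience = 0.0, 0

for epoch in range(1000):
    p = 1.0 / (1.0 + np.exp(-(X @ w)))     # the model's current answers
    acc = ((p > 0.5) == y).mean()          # akin to the student's practice-test score
    w -= lr * X.T @ (p - y) / len(y)       # adjust weights toward the expected answers

    # Stop when accuracy plateaus: no meaningful improvement for 10 straight epochs.
    patience = patience + 1 if acc - best_acc < 1e-4 else 0
    best_acc = max(best_acc, acc)
    if patience >= 10:
        print(f"training stopped at epoch {epoch}, accuracy ~ {acc:.3f}")
        break
```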
This data- and compute-intensive training process is performed on supercomputers run by large IT companies (Google, Meta…) or government-sponsored research groups, with specialized hardware for the computing part (e.g., Nvidia’s GPUs), data storage (High Bandwidth Memory modules from SK Hynix, for example) and networking gear (e.g., Arista Networks).
Inference, the second step, is a bit like testing the student’s knowledge/understanding in an exam. Passing the test relies on his ability to answer the exam’s questions correctly and rapidly, thereby proving his understanding of the topic even on unseen data. Basically, inference is the deployment of the AI model in real-world conditions (e.g. computer vision in a self-driving car, recommendation engines on social media apps…).
Inference can be performed either at the edge, on end-users’ devices like smartphones (e.g. for facial recognition) and, more generally, Internet-of-Things devices (autonomous vehicles, surveillance cameras…), or in data centers (for large deep learning models with billions of parameters, such as ChatGPT, that would almost completely fill a consumer device’s memory).
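To put a rough (and purely assumed) number on that memory constraint, the weights of even a mid-sized model can rival a phone’s entire RAM:

```python
# Back-of-the-envelope check of why billion-parameter models live in data centers.
# Parameter count and precision below are illustrative assumptions, not ChatGPT's actual figures.
params = 7e9                 # a hypothetical 7-billion-parameter model
bytes_per_param = 2          # 16-bit (FP16) weights
weights_gb = params * bytes_per_param / 1e9
print(f"weights alone ~ {weights_gb:.0f} GB")   # ~14 GB, vs. roughly 6-8 GB of RAM on a typical smartphone
```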
Here, in contrast to the training phase, where computing power is the main criterion, inference solutions must deliver very fast response times while being energy-efficient and cheap.
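The sketch below (again Python/NumPy, with made-up weights standing in for a trained model) illustrates the shift in emphasis: inference is a single forward pass on unseen input, and the figure that matters is per-request latency rather than training throughput.

```python
import time
import numpy as np

rng = np.random.default_rng(1)
w = rng.normal(size=20)            # stand-in for weights produced by the training phase

def infer(x: np.ndarray) -> float:
    """Forward pass only: no weight updates, just an answer for a new, unseen input."""
    return float(1.0 / (1.0 + np.exp(-(x @ w))))

query = rng.normal(size=20)        # one "live" request, e.g. features from a user action
start = time.perf_counter()
score = infer(query)
latency_ms = (time.perf_counter() - start) * 1000
print(f"prediction = {score:.3f}, latency ~ {latency_ms:.3f} ms")
```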
As inference scales with the number of users, the opportunity is massive for Nvidia and rival chip makers with some early estimates putting the data center inference chip market at twice the size of AI training in coming years. Accordingly, the AI inference chip (also called AI Edge) market is attracting dozens of startups and hefty investments from VCs. And, unsurprisingly, Nvidia is eyeing the opportunity with the recent launch of four inference platforms that combine its chips and software.
At the edge, several solutions are being pursued: a dedicated AI chip (an ASIC) sitting next to the CPU, an FPGA (a programmable chip), or the integration of AI capabilities directly into the CPU. As of now, it is almost impossible to tell which technological option will take the lead, but we believe that the direct integration of an AI accelerator into the CPU has the highest likelihood of success because it lowers the overall cost (no separate packaging), has a smaller footprint (important on small devices) and makes perfect sense in a chiplet world.
This view is backed by the fact that the major computing architectures, x86 (Intel and AMD) and ARM, have had vector instructions in their cores for years and that they are now “all-in” on integrating full AI accelerators into their chips to support, at scale, Microsoft’s AI-enabled Windows 11.
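As a rough illustration of why those vector units matter, the sketch below compares a plain Python dot product with NumPy’s, which dispatches to compiled, vectorized routines that exploit the CPU’s SIMD units (AVX on x86, NEON on ARM). Admittedly, much of the gap in pure Python comes from interpreter overhead, but the vectorized path is the one that exercises exactly the hardware these architectures keep extending; sizes and timings here are arbitrary.

```python
import time
import numpy as np

rng = np.random.default_rng(2)
a = rng.normal(size=1_000_000)
b = rng.normal(size=1_000_000)

# Naive element-by-element loop in Python.
start = time.perf_counter()
acc = 0.0
for x, y in zip(a, b):
    acc += x * y
scalar_s = time.perf_counter() - start

# NumPy's dot product runs compiled, vectorized code that uses the CPU's SIMD units.
start = time.perf_counter()
acc_vec = float(a @ b)
vector_s = time.perf_counter() - start

print(f"python loop: {scalar_s:.3f} s, numpy dot: {vector_s:.4f} s, "
      f"same result: {np.isclose(acc, acc_vec)}")
```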
Overall, AI is just getting started, with inference expected to sustain growth beyond the initial training phase. Looking even further out, a virtuous circle could be in sight, as the Nvidia CEO commented last night: “every time you deploy, you’re collecting new data. When you collect new data, you train with the new data”.
According to various reports, the global AI chip market could grow at a CAGR above 30% over a 10-year horizon and account for more than 80% of the semiconductor industry’s sales growth.