The phenomenal success of Generative AI (GAI) services like ChatGPT or DALL-E has triggered an arms race among the world’s AI leaders, as the accuracy and creative skills of GAI models are disrupting the Web search industry (roughly $120bn in annual revenue, still heavily dominated by Google) as well as the nascent AI-based assistant segment (Apple’s Siri, Amazon’s Alexa…).
Training GAI models is extremely time-consuming and costly: it requires aggregating and storing petabytes of (curated) data, a lengthy trial-and-error process to fine-tune the models and, finally, massive computing power to optimize the models’ billions of parameters. As an example, GPT-3, the model originally behind ChatGPT, took about 30 days to train on more than 10,000 Nvidia GPUs, for a cost of roughly $5 million (hardware costs excluded…).
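To put those figures in perspective, here is a quick back-of-the-envelope check in Python, using only the numbers quoted above (the per-GPU-hour rate is derived, not reported):

```python
# Sanity check of the training figures quoted above (illustrative only)
gpus = 10_000          # Nvidia GPUs used for the training run
days = 30              # duration of the run
cost_usd = 5_000_000   # quoted compute cost, hardware excluded

gpu_hours = gpus * days * 24                # 7.2 million GPU-hours
cost_per_gpu_hour = cost_usd / gpu_hours    # roughly $0.70 per GPU-hour
print(f"{gpu_hours / 1e6:.1f}M GPU-hours at ~${cost_per_gpu_hour:.2f} per GPU-hour")
```

That works out to roughly $0.70 per GPU-hour, i.e. bulk, at-cost pricing well below public cloud list prices.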
Training GAI models is thus reserved only for an elite that can afford the underlying costs and, most importantly, possesses the necessary skills and IT infrastructure to successfully undertake such projects. But training is only one part of the story.
In fact, the real challenge is to scale these models in order to offer them to millions of users on the Web. OpenAI, ChatGPT’s creator, quickly had to move its model to Microsoft’s Azure cloud platform as the service became saturated almost immediately (and still is…).
Delivering GAI-based services to the masses is thus only possible with the help of best-in-class cloud computing platforms, namely those operated around the world by the so-called hyperscalers: Google, Microsoft, Amazon, Meta, Apple, Alibaba, Huawei, Baidu and Tencent, to name the largest. But having a powerful cloud computing platform is no longer enough to stay on top of this arms race: the fight has now turned to the underlying components of the cloud infrastructure and, more specifically, to the data processing chips.
As the complexity of machine learning models rose exponentially, the “do-it-all” processing chips from Intel and AMD (CPUs) were rapidly replaced by a more suitable solution: GPUs. A GPU (Graphics Processing Unit) is a specific type of chip designed to render 3D scenes. Initially, GPUs were therefore mainly used in the video gaming industry but, as 3D rendering and machine learning algorithms share the same mathematical roots (vector/matrix calculation), GPUs naturally became the de facto chips for training AI models.
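To make that shared mathematical root concrete, here is a minimal, purely illustrative Python/NumPy sketch (all names and shapes are our own): the same matrix multiplication that transforms 3D vertices in a game engine also propagates activations through a neural-network layer.

```python
import numpy as np

# 3D graphics: rotate a batch of vertices with a 3x3 transformation matrix
vertices = np.random.rand(1_000, 3)        # 1,000 points in 3D space
rotation = np.eye(3)                       # identity rotation, for simplicity
transformed = vertices @ rotation.T        # one matrix multiplication

# Machine learning: forward pass through one dense layer
activations = np.random.rand(1_000, 512)   # batch of 1,000 inputs, 512 features each
weights = np.random.rand(512, 256)         # the layer's learned parameters
outputs = np.maximum(activations @ weights, 0.0)  # same matrix multiplication + ReLU
```

In both cases the heavy lifting is a large matrix product, which is exactly the operation GPUs parallelize across thousands of cores.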
This is one of the main reasons why Nvidia became, over the years, the unavoidable and dominant player in the AI field. However, even though GPUs will remain the primary tools to further improve and/or develop new AI models, they are not well suited to delivering mass (G)AI services.
Indeed, Nvidia’s latest-generation server-class GPU (the H100) costs more than $30,000 (when available, otherwise more than $40,000 on eBay) and draws about 700 watts. Multiplying these numbers by the 10,000 units used to train ChatGPT gives an idea of the budget needed to build the training platform, as well as of the associated operating costs. It is estimated that offering mainstream access to ChatGPT costs about $700,000 per day, or 36 cents per query vs. 1.61 cents for a traditional Google search query.
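A rough, illustrative calculation based on the figures above (the electricity price is an assumption, and cooling and networking overheads are ignored) shows the scale involved:

```python
# Back-of-the-envelope capex and power estimate for a 10,000-GPU cluster
# (figures from the article; electricity price is an assumed average rate)
gpus = 10_000
price_per_gpu_usd = 30_000       # H100 list price when available
power_per_gpu_w = 700            # power draw per GPU under load
electricity_usd_per_kwh = 0.10   # assumed industrial electricity rate

hardware_cost_usd = gpus * price_per_gpu_usd        # ~$300 million of GPUs alone
cluster_power_kw = gpus * power_per_gpu_w / 1_000   # ~7,000 kW (7 MW) for the GPUs
monthly_energy_usd = cluster_power_kw * 24 * 30 * electricity_usd_per_kwh  # ~$500,000

print(f"GPU capex: ${hardware_cost_usd / 1e6:.0f}M, "
      f"power draw: {cluster_power_kw / 1_000:.1f} MW, "
      f"energy bill: ~${monthly_energy_usd / 1e3:.0f}k per month")
```

Roughly $300 million of GPUs drawing about 7 MW, before counting servers, networking and cooling.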
As these already elevated operating costs will keep rising with the widespread adoption of AI and the growing complexity of Deep Learning models, alternatives to pricey, power-hungry GPUs are needed.
One such alternative is the Tensor Processing Unit, or TPU (loosely defined, a tensor is a multi-dimensional matrix). In the machine learning world, tensors hold the training-acquired knowledge of the model: the billions of parameters (called weights) linking the artificial neurons together. Since TPUs are specifically designed and optimized to calculate and manipulate these billions of weights, they can be both faster and less power-hungry than generalist chips like GPUs for this workload.
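As a loose illustration of what “manipulating billions of weights” means in practice, here is a small Python/NumPy sketch (the hidden size is GPT-3-scale, but the example itself is hypothetical):

```python
import numpy as np

# A tensor is just a multi-dimensional array; a model's "knowledge" is stored
# in tensors of weights. One attention projection matrix in a GPT-3-scale
# model has shape (hidden_size, hidden_size).
hidden_size = 12_288                                   # GPT-3-scale hidden width
w_query = np.zeros((hidden_size, hidden_size), dtype=np.float16)

print(f"{w_query.size / 1e6:.0f} million weights "     # ~151 million weights
      f"({w_query.nbytes / 1e6:.0f} MB at 16-bit precision) in a single matrix")
```

A full model stacks hundreds of such tensors, which is how the parameter count climbs into the billions; a TPU’s matrix units are built to stream exactly these multiply-accumulate operations.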
Back in October 2022, we wrote an article about hyperscalers’ in-house designed ARM-based chips used to boost specific cloud workloads. ChatGPT’s success is now redirecting and reinforcing their proprietary semiconductor design efforts towards specialty AI chips.
Google was the TPU pioneer when it announced, in 2016, the first generation of its homegrown solution (4th gen was released last year). Amazon followed with its Trainium chip while Microsoft confirmed several days ago that it is working on the design of its own TPU-like training chip called Athena.
As the TPU market segment is expected to grow 50% faster than the global semiconductor industry over the coming years, it is no surprise that a flurry of private companies like Cerebras, Graphcore or SambaNova have also joined the party to grab a share of this lucrative market. We believe that the likelihood of success of these startups will be quite limited without the backing of a hyperscaler (i.e. access to a state-of-the-art cloud computing platform).
As almost all hyperscalers will soon have their own TPUs, Nvidia’s AI hegemony and 95% share of the training chip market will inevitably shrink. That said, Nvidia will likely maintain a commanding position in the AI research/testing field, as TPUs are designed for one specific AI model architecture and leave little room for evolution. And it is widely expected that Microsoft and its peers will not completely shift from Nvidia GPUs to their own TPUs but will rather seek to reduce their overall dependence and gain leverage in negotiations with Nvidia.
Hence, despite all the TPU announcements and initiatives (which started as early as 2016, as mentioned above), Nvidia’s technology remains a must-have, as illustrated by recent reports that the ChatGPT upgrade may require an additional 30,000 Nvidia GPUs this year and that Elon Musk purchased 10,000 GPUs (likely from Nvidia) for his own generative AI project.
And last but not least, AI training is a booming market with room for several products and technologies. Recent reports notably suggest that the leading cloud providers have had to limit the availability of AI computing/chips for customers due to massive demand.