As data center construction continues at an unabated pace around the world, sustaining investments in semiconductors, software, networking, and storage solutions, the rise of Generative AI services (e.g., ChatGPT) is triggering a quiet revolution inside them.
The workloads (training and inference) of these Deep Learning models require an IT architecture that diverges from the “traditional” topologies of High-Performance Computing (HPC) platforms, which largely rely on microprocessors (CPUs).
In fact, these huge models are trained by executing billions of operations simultaneously (parallel computing), a task performed by specialized chips like GPUs or TPUs. Furthermore, these massively parallel computing platforms must be fed with high-speed data flows, a task increasingly delegated to so-called accelerators, which free CPUs from tedious networking and memory management duties.
While HPC experts are still debating the optimal GPU-to-CPU ratio (it will clearly be above 2), data centers’ internal shift towards more GPUs and accelerators has only just begun. One consequence of this shift is the rising energy consumption of data centers (currently estimated at between 2% and 4% of the world’s total power demand), despite relentless improvements in semiconductor efficiency.
Indeed, Nvidia’s latest server-class GPU (the H100) can draw more than 600W (for the chip alone), or roughly twice the power consumption of a server CPU from AMD or Intel. Hence, with a GPU-to-CPU ratio of only 2x, a node’s energy consumption is multiplied by a factor of five: one CPU plus two GPUs each drawing twice as much means 1 + 2 × 2 = 5 times the power of a CPU-only server. Even though this back-of-the-envelope calculation is not entirely accurate, the energy consumption of data centers around the world will increase significantly. This is why hyperscalers like Microsoft, Google, and Amazon are investing heavily in green energy solutions to power their cloud infrastructures.
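To make that arithmetic explicit, here is a minimal sketch of the back-of-the-envelope calculation. The 300W CPU and 600W GPU figures are rough illustrative assumptions (the text only states that the GPU draws about twice a server CPU), not measured TDPs.

```python
# Back-of-the-envelope node power, using the article's rough figures:
# a server CPU drawing ~300W and an H100-class GPU drawing ~600W
# (chip alone, about twice the CPU). Both values are assumptions
# for illustration, not measured TDPs.
CPU_WATTS = 300
GPU_WATTS = 600

def node_power(gpus_per_cpu: int) -> int:
    """Total draw of one CPU plus its attached GPUs, in watts."""
    return CPU_WATTS + gpus_per_cpu * GPU_WATTS

baseline = node_power(0)       # CPU-only server: 300W
with_gpus = node_power(2)      # 2 GPUs per CPU: 300 + 2*600 = 1500W
print(with_gpus / baseline)    # -> 5.0, the "factor of five"
```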
Another problem is the heat dissipated by these thousands of power-hungry chips. Currently, thermal management represents 40% of a data center’s total electricity consumption; for a hypothetical 10 MW facility, that is 4 MW spent on cooling alone. The Generative AI-triggered explosion of server-class GPU rollouts will only push this figure higher.
Commonly used air-cooling solutions (attached fans, cold-air flows…) are no longer efficient enough to keep these new “monster chips” at normal operating temperatures (<80°C). This is why data center operators (and equipment manufacturers) are now turning to liquid cooling solutions to keep these chips “cold”.
Liquid cooling is clearly not a new technique: it has been used for decades, but only in a few very specific HPC configurations. With a penetration of only 5% of total thermal management revenues, liquid cooling is far from mainstream among cloud-based data center operators and thus offers massive growth potential, with Dell’Oro expecting this data center cooling segment to grow at an annual rate of more than 45% over the next five years.
There are two main liquid cooling techniques: 1) direct liquid cooling (DLC), where liquid flows through pipes to a cold plate attached directly to the chip, and 2) immersion cooling (ImC), where entire electronic boards are fully immersed in a tank filled with a dielectric (non-conductive) liquid.
ImC, the newer technique, is obviously the more efficient of the two (it captures 100% of the generated heat vs. 80% for DLC) but also the more complex, as it requires specific data connection types (light from optical fibers can be distorted by the liquid), sealed servers, or even a full redesign of the whole data center. ImC pricing is, for now, prohibitive, as the technology is still at the prototyping stage.
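As a quick illustration of what those capture rates mean in practice, here is a minimal sketch using the figures above; the 1,500W node is the assumed 1-CPU/2-GPU server from the earlier calculation, not a measured configuration.

```python
# Residual heat that air cooling must still remove, using the capture
# rates cited above: ImC ~100%, DLC ~80%. The 1500W node is the
# assumed 1-CPU/2-GPU server from the earlier sketch, not a measured
# configuration.
NODE_WATTS = 1500

def residual_air_load(capture_ratio: float) -> float:
    """Heat (in watts) left for air cooling after liquid capture."""
    return NODE_WATTS * (1.0 - capture_ratio)

print(residual_air_load(0.80))   # DLC: 300.0W still handled by air
print(residual_air_load(1.00))   # ImC: 0.0W, no air cooling needed
```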
The DLC technique, by contrast, is well advanced and manufactured at scale, as its main components are pipes, pumps, heat exchangers, and the like. Despite this apparent simplicity, DLC systems remain high-tech products, where know-how and partnerships with system builders like Intel, Nvidia, Hewlett Packard or Super Micro make a huge difference.
Almost all of the listed DLC pure-play companies, like TaiSol Electronics, Asia Vital Components, or Vertiv, should therefore benefit from accelerating top-line growth (which topped 35% at Vertiv in Q1). This, combined with positive Free Cash Flow margins and low valuation levels (EV/Sales below 2x), should help sustain their recent outstanding stock performance and make them an interesting alternative to the usual, mainstream AI-related listed names.
Our investment strategy is heavily tilted towards these lesser-known infrastructure names (liquid cooling systems, high-performance memory interfaces, networking chips, server racks…), all of which are benefiting significantly from this multi-year Generative AI capex surge.