Much has been said in the last couple of days about the impressive performance of Nvidia’s latest family of chips, Blackwell. One thing that has gone largely unnoticed, however, is Nvidia’s shift to liquid cooling, a first for the company, in the high-end version of its next-generation DGX server system, SuperPOD, which is built on Blackwell.
Air cooling is the standard way to dissipate heat in data centers, and hyperscalers have so far largely avoided alternatives such as liquid cooling, which appear more complex and require changes to data center architecture. But with the power consumption of GPUs rising sharply (Blackwell draws more than 1,000 watts, compared to 700 watts for Nvidia’s H100) and air cooling failing to keep these new “monster chips” at normal working temperatures, data centers have no choice but to pivot towards liquid solutions, in line with our previous reports.
Liquid cooling encompasses several different techniques, including 1/ evaporative systems, which can be used to cool the air, 2/ direct liquid cooling, where a liquid flows through pipes on a cold plate directly attached to the chip, and 3/ immersion cooling (pictured above), which involves submerging all or part of a server in a liquid solution in a tank.
Nvidia’s SuperPOD will use a direct liquid cooling solution that pumps in fluid at 25 degrees Celsius at a rate of two liters per second, with that fluid absorbing heat and exiting at 45 degrees Celsius. Despite their apparent simplicity (pipes, pumps, heat exchangers…), direct liquid cooling systems remain a high-tech product where know-how and partnerships with system builders like Hewlett Packard, Dell or Super Micro make a huge difference.
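As a rough sanity check on those figures, the heat carried away by such a loop can be estimated as mass flow times specific heat times temperature rise. The minimal sketch below assumes the coolant behaves like plain water (about 1 kg per liter and about 4,180 J per kg per degree); the fluid actually used, and how much equipment one two-liter-per-second loop serves, are not stated here, so the result is only an order-of-magnitude illustration. The function name and constants are hypothetical.

```python
# Back-of-the-envelope estimate of the heat removed by a liquid cooling loop.
# Assumption (not from the article): the coolant has the properties of water.
WATER_DENSITY_KG_PER_L = 1.0              # approximate, near room temperature
WATER_SPECIFIC_HEAT_J_PER_KG_K = 4180.0   # approximate


def heat_removed_watts(flow_l_per_s, inlet_c, outlet_c,
                       density=WATER_DENSITY_KG_PER_L,
                       specific_heat=WATER_SPECIFIC_HEAT_J_PER_KG_K):
    """Return heat absorbed by the coolant in watts: m_dot * c_p * delta_T."""
    mass_flow_kg_per_s = flow_l_per_s * density
    return mass_flow_kg_per_s * specific_heat * (outlet_c - inlet_c)


# Figures quoted above: 2 liters per second, 25 C in, 45 C out.
heat_w = heat_removed_watts(2.0, 25.0, 45.0)
print(f"Estimated heat removed: {heat_w / 1000:.0f} kW")
```

Under these assumptions the loop carries away roughly 167 kW, which is the same order of magnitude as the combined draw of well over a hundred 1,000-watt chips.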
The next step will probably be immersion cooling, which is the most efficient option (it captures 100% of the generated heat vs. 80% for direct liquid cooling) but also the most complex, as it requires specific data connection types (the light from optical fibers can be distorted by the liquid), sealed servers or even a full redesign of the data center.
With a penetration below 5% of data center thermal management revenues, liquid cooling is clearly at an early stage, with massive growth potential ahead as an increasing number of chips and servers transition to this kind of solution over the years. Research firm Dell’Oro expects the data center cooling segment to grow at an annual rate of more than 40% through 2028, with risk to the upside if Nvidia follows up on its first liquid cooling initiative. Recent comments from several thermal management pure plays such as Asia Vital and Auras tend to confirm this view, as they expect liquid cooling revenues to grow 2x or 3x in 2024, while Vertiv plans to scale its production of liquid cooling by more than 40 times by the end of this year!
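To give a sense of what a 40% annual growth rate implies, the short sketch below compounds it over several years. The four-year horizon (2024 to 2028) and the use of a simple growth multiple are assumptions for illustration only; the forecast's base year and base revenue are not given here, and the function name is hypothetical.

```python
def compound_multiple(annual_rate, years):
    """Return the cumulative growth multiple after compounding for `years`."""
    return (1.0 + annual_rate) ** years


# Assumed horizon: four years of growth at 40% per year.
multiple = compound_multiple(0.40, 4)
print(f"Growth multiple after 4 years at 40%: {multiple:.2f}x")
```

Under these assumptions the segment would end the period at almost four times its starting size.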
Overall, thermal management companies’ accelerating top-line growth, combined with positive and expanding margins, should help sustain their valuations and position them as an interesting alternative to the mainstream AI-related names.