Heat Sink Solutions: Development of Data Centers

Elon Musk’s startup xAI is building a massive AI supercomputer named Colossus. The data center deployed more than 100,000 GPUs, along with supporting storage and an ultra-high-speed network, in just 19 days (122 days of total engineering time from design to the first LLM training run). The facility uses a raised-floor design, with liquid-cooling pipes below the floor and power distribution above. Each computing hall houses approximately 25,000 GPUs, together with the corresponding storage and high-speed fiber network equipment.

 

The basic building block of Colossus is the Supermicro liquid-cooled rack: eight 4U servers, each carrying 8 NVIDIA H100 GPUs, for a total of 64 GPUs per rack. These eight GPU servers, together with a Supermicro Coolant Distribution Unit (CDU) and related hardware, form one GPU rack module. The cluster is still under construction and will expand further, potentially scaling to at least 1 million GPUs, at an estimated cost of approximately $40 billion (calculated at $40,000 per GPU).
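The rack and cost arithmetic above can be checked with a quick sketch (all figures come from the text; the $40,000-per-GPU price is the article’s own estimate):

```python
# Colossus rack and cost arithmetic, using the figures quoted above.
servers_per_rack = 8      # Supermicro 4U liquid-cooled servers per rack
gpus_per_server = 8       # NVIDIA H100 GPUs per server
gpus_per_rack = servers_per_rack * gpus_per_server
print(gpus_per_rack)      # 64 GPUs per rack, matching the text

target_gpus = 1_000_000   # potential future scale of the cluster
cost_per_gpu = 40_000     # article's per-GPU estimate, USD
total_cost = target_gpus * cost_per_gpu
print(f"${total_cost / 1e9:.0f}B")  # $40B
```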

 

Meanwhile, Meta is also stepping up, planning to purchase 350,000 H100 GPUs to feed more computing power into its Llama 4 AI model. According to estimates by LessWrong, by 2025 the five major players—Microsoft, Google, Meta, Amazon, and the emerging xAI—will collectively hold more than 12.4 million H100-equivalent GPUs/TPUs.
Company   2024 YE (H100 equivalent)   2025 (GB200)           2025 YE (H100 equivalent)
MSFT      750,000 – 900,000           800,000 – 1,000,000    2,500,000 – 3,100,000
GOOG      1,000,000 – 1,500,000       400,000                3,500,000 – 4,200,000
META      550,000 – 650,000           650,000 – 800,000      1,900,000 – 2,500,000
AMZN      250,000 – 400,000           360,000                1,300,000 – 1,600,000
xAI       – 100,000                   200,000 – 400,000      550,000 – 1,000,000
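Summing the 2025 year-end column reproduces the headline figure: taking the upper bound of each range gives exactly 12.4 million H100-equivalents (the lower bounds sum to about 9.75 million):

```python
# Lower and upper bounds of the 2025 YE (H100-equivalent) column above.
ye_2025 = {
    "MSFT": (2_500_000, 3_100_000),
    "GOOG": (3_500_000, 4_200_000),
    "META": (1_900_000, 2_500_000),
    "AMZN": (1_300_000, 1_600_000),
    "xAI":  (550_000, 1_000_000),
}
low = sum(lo for lo, hi in ye_2025.values())
high = sum(hi for lo, hi in ye_2025.values())
print(low, high)  # 9750000 12400000
```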
Beyond these giants, countries and regions around the world are also joining the wave of AI technology, committing large budgets to data center construction.

According to data from Fortune Business Insights, there were 3.43 million data centers worldwide in 2023, a number expected to grow to approximately 3.6 million by 2027, a compound annual growth rate of about 1.2% over 2023 – 2027. In terms of construction, the global data center construction market was valued at $259.97 billion in 2023 and is projected to reach $348.23 billion by 2028, a compound annual growth rate of 7.6% over 2023 – 2028.
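Both quoted growth rates follow the standard CAGR formula, (end/start)^(1/n) − 1. A quick check (note that the quoted 7.6% corresponds to four compounding periods; computed over five, the construction-market figure would be closer to 6.0%):

```python
def cagr(start, end, years):
    """Compound annual growth rate between two values."""
    return (end / start) ** (1 / years) - 1

# Data center count: 3.43M (2023) -> 3.6M (2027), four compounding periods
print(f"{cagr(3.43, 3.6, 4):.1%}")       # ~1.2%, as quoted

# Construction market: $259.97B (2023) -> $348.23B (2028)
print(f"{cagr(259.97, 348.23, 4):.1%}")  # ~7.6% with four periods, as quoted
```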

The global data center construction scale was $259.97 billion in 2023.

The rise of large-scale AI models has accelerated the adoption of high-speed data communication optical modules, especially in the telecommunications and data communication markets. As leading cloud service providers increase their investment in AI clusters, demand for high-end optical communication has surged, leading to a shortage of components for 400G and 800G optical modules. LightCounting predicts that Ethernet optical module sales will increase by nearly 30% year-on-year in 2024, with growth gradually resuming across market segments. After the global optical module market declined by 6% year-on-year in 2023, the compound annual growth rate (CAGR) from 2024 – 2028 is expected to reach 16%. Coherent, a leading optical module company, stated that the global market for AI-driven 800G, 1.6T, and 3.2T data communication optical modules may grow at a CAGR of over 40% in the five-year period from 2024 – 2028, expanding from $600 million in 2023 to $4.2 billion in 2028.
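Coherent’s projection can be sanity-checked the same way: growing from $0.6 billion (2023) to $4.2 billion (2028) implies a CAGR comfortably above the 40% threshold:

```python
# Implied CAGR of AI datacom optical modules: $600M (2023) -> $4.2B (2028)
start, end, years = 0.6, 4.2, 5   # billions USD, five compounding periods
cagr = (end / start) ** (1 / years) - 1
print(f"{cagr:.1%}")  # ~47.6%, consistent with "over 40%"
```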

Global optical module sales (US$ millions), 2018 – 2028E
The operation of data centers relies heavily on a large amount of electrical energy.
 
According to the International Energy Agency (IEA), in 2022 global data centers, the cryptocurrency industry, and the artificial intelligence (AI) sector together consumed approximately 460 TWh of electricity, about 2% of total global electricity demand.
 
Data centers, as key infrastructure for digitalization, are interdependent with power-supply infrastructure. As data volumes continue to grow, expanding data center capacity for data processing and storage is imperative.
 
The future of the data center industry is full of uncertainty, with rapid technological innovation and evolving digital services. Based on deployment speed, the extent of efficiency improvements, and trends in AI and cryptocurrency, the IEA projects that by 2026 the combined electricity consumption of data centers, cryptocurrency, and AI will range from 620 to 1,050 TWh.
 
Under a neutral scenario, electricity demand would exceed 800 TWh, nearly double the 460 TWh consumed in 2022.

Beyond drawing enormous amounts of electricity, data centers must also dissipate the heat that electricity becomes. Data center energy consumption breaks down into IT equipment, cooling, power supply and distribution, and lighting and other loads. IT equipment and cooling are the main components, with cooling alone accounting for about 43%.

Heat dissipation energy consumption ratio
Meanwhile, with rising environmental awareness, regulators are imposing increasingly strict data center PUE (Power Usage Effectiveness) requirements. At the same time, as the TDP (Thermal Design Power) of AI GPU chips continues to climb, data center cooling design has become critical.
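PUE is the ratio of total facility power to IT equipment power, so the cooling share quoted above feeds directly into it. A minimal sketch, using illustrative (assumed) load shares in which cooling is roughly 43% of the total:

```python
def pue(it_kw, cooling_kw, power_dist_kw, other_kw):
    """Power Usage Effectiveness = total facility power / IT equipment power."""
    return (it_kw + cooling_kw + power_dist_kw + other_kw) / it_kw

# Illustrative facility (assumed numbers): 1,000 kW of IT load, 900 kW of
# cooling (~43% of the 2,100 kW total), plus distribution losses and lighting.
print(round(pue(1000, 900, 150, 50), 2))  # 2.1
```

Regulations that cap PUE effectively cap the cooling and overhead terms in the numerator, which is why more efficient cooling technology matters.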

 

Take NVIDIA’s GB200 NVL servers, mass-produced in Q4 2024, as an example: the TDP of a single B200 chip is 1,200 W, and the TDP of a GB200 superchip (1 Grace CPU + 2 B200 GPUs) reaches 2,700 W. The current limit of 3D VC air cooling is approximately 1,000 W per chip, so liquid cooling performs better above that threshold, and a comprehensive upgrade to liquid cooling, combined with other cooling technologies, is required. In 2025, NVIDIA’s B300 chip will raise TDP further to 1,400 W, and next-generation AMD GPU server chips are also expected to exceed kilowatt-level power consumption. Correspondingly, per-rack power consumption of NVIDIA’s next-generation systems may exceed 1 MW, with the Rubin Ultra AI GPU planned for around 2028 expected to push peak rack density past 1,000 kW. Cooling has become a key bottleneck in the development of AI chips.
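The rack-level figures follow from simple addition. A sketch for a GB200 NVL72-style rack (the 2,700 W superchip TDP is from the text; the 36-superchip count is the public NVL72 configuration, and the ~15% overhead for switches, pumps, and power-conversion losses is an assumption for illustration):

```python
# GB200 superchip: 1 Grace CPU + 2 B200 GPUs, 2,700 W TDP (from the text).
superchip_w = 2_700
superchips_per_rack = 36          # NVL72: 72 GPUs = 36 superchips per rack
compute_w = superchip_w * superchips_per_rack
print(compute_w / 1000)           # 97.2 kW of compute power alone

overhead = 1.15                   # assumed ~15% for networking, CDU pumps, PSU losses
print(round(compute_w * overhead / 1000, 1))  # ~111.8 kW per rack
```

Even this current-generation rack is two orders of magnitude above a traditional air-cooled rack, which is why the projected 1 MW racks make cooling the binding constraint.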

Contact Ecotherm
Email: support@ecothermgroup.com