The AI Data Center Arms Race: xAI vs Meta's Gigawatt Superclusters

My Privacy Blog

14 Jul 2025

The artificial intelligence revolution has sparked an unprecedented race to build the world's most powerful computing infrastructure. Two tech titans, Elon Musk's xAI and Mark Zuckerberg's Meta, are leading the charge with ambitious plans for gigawatt-scale data centers that dwarf traditional computing facilities. This comparison examines their competing visions and the implications for the future of AI development.

xAI's Colossus: The Current Champion

xAI's Colossus supercomputer in Memphis, Tennessee, currently holds the title of the world's largest AI training system, housed in a 750,000 square foot facility with 100,000 Nvidia H100 GPUs. The achievement is particularly remarkable given that the entire system was brought online in just 122 days, showcasing an unprecedented speed of deployment in the data center industry.

The facility's current specifications are impressive:

Physical footprint: 750,000 square feet
GPU count: 100,000 Nvidia H100 GPUs initially
Grid power: 150 megawatts initially approved by the Tennessee Valley Authority
Deployment time: 122 days from start to finish

Expansion Plans: Racing to Gigawatt Scale

xAI plans to expand Colossus from its current 250 megawatts to 1.2 gigawatts, a nearly fivefold increase (1,200 MW / 250 MW = 4.8). The system is planned to double in size to 200,000 GPUs (including 50,000 H200s) in the coming months.

The expansion timeline shows TVA initially approved 150 MW, with plans for 1.2 GW by 2026–2027. Beyond 1,200 MW, additional substations could potentially add 600–900 MW, pushing total capacity to 1,800–2,100 MW. Even more ambitiously, there are plans for xAI to eventually reach a 3 million GPU supercluster.
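The expansion arithmetic above can be checked with a few lines of Python. This is only a minimal sketch using the megawatt figures quoted in this article; the constant and function names are made up for illustration and do not come from xAI or TVA.

```python
# Figures quoted in the article, all in megawatts (MW).
CURRENT_MW = 250                  # capacity the article describes as current
TARGET_MW = 1_200                 # 1.2 GW target for 2026-2027
EXTRA_SUBSTATION_MW = (600, 900)  # possible additional substations (low, high)


def growth_factor(current_mw: float, target_mw: float) -> float:
    """Return how many times larger the target is than the current capacity."""
    return target_mw / current_mw


def capacity_range(base_mw: int, extra_range_mw: tuple[int, int]) -> tuple[int, int]:
    """Add a (low, high) range of extra capacity to a base capacity."""
    low, high = extra_range_mw
    return base_mw + low, base_mw + high


factor = growth_factor(CURRENT_MW, TARGET_MW)
low_mw, high_mw = capacity_range(TARGET_MW, EXTRA_SUBSTATION_MW)

print(f"Growth factor: {factor:.1f}x")
print(f"Potential total: {low_mw}-{high_mw} MW")
```

Under these inputs the growth factor comes out at 4.8x (the "nearly fivefold" figure) and the potential total at 1,800 to 2,100 MW.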

Meta's Multi-Gigawatt Strategy: Building an Empire

Meta's approach differs significantly from xAI's single-massive-facility strategy. Meta CEO Mark Zuckerberg announced that the company is building several massive data centers to power its artificial intelligence efforts, with the first expected to come online in 2026.

The Prometheus and Hyperion Projects

Zuckerberg said the first of the superclusters, called Prometheus, will come online sometime in 2026, with "multiple more titan clusters" to follow. According to Zuckerberg, "Just one of these covers a significant part of the footprint of Manhattan." According to a report from SemiAnalysis, Prometheus is being built in Ohio. Another cluster, reportedly named Hyperion, is being built in Louisiana and is expected to go online in 2027.

Meta's planned infrastructure includes:

Prometheus: Ohio location, online by 2026
Hyperion: Louisiana location, online by 2027, over 2 GW capacity
Additional clusters: Multiple more "titan clusters" planned
2025 capacity: Meta plans to bring online ~1 GW of compute in 2025 and end the year with more than 1.3 million GPUs

Financial Commitment

Meta plans to invest $60-65 billion in capital expenditures in 2025 while also growing its AI teams significantly. The company has announced plans to invest "hundreds of billions" in new multi-gigawatt AI data centers, a financial commitment that surpasses that of most competitors in the space.

Size and Scale Comparison

Power Consumption

xAI Colossus: Currently about 250 MW (150 MW initially approved by TVA), expanding to 1.2 GW by 2026-2027, with potential for 1.8-2.1 GW
Meta's clusters: Multiple gigawatt-scale facilities, with Hyperion alone exceeding 2 GW

Physical Scale

Zuckerberg claims that "just one of these covers a significant part of the footprint of Manhattan," suggesting these facilities will be substantially larger than xAI's current 750,000 square foot Colossus facility.

GPU Capacity

xAI: Currently 100,000 GPUs, expanding to 200,000, with long-term plans for 3 million GPUs
Meta: Over 1.3 million GPUs planned by end of 2025
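To put these GPU counts side by side, the ratios can be worked out as below. This is a rough sketch using only the numbers quoted in this article, with illustrative variable names. It assumes the counts are directly comparable, which may not hold: Meta's figure is a company-wide lower bound ("more than 1.3 million"), while xAI's figures refer to the Colossus cluster.

```python
# GPU counts quoted in the article.
XAI_GPUS_CURRENT = 100_000      # Colossus today
XAI_GPUS_EXPANDED = 200_000     # planned near-term expansion
XAI_GPUS_LONG_TERM = 3_000_000  # long-term supercluster plan
META_GPUS_END_2025 = 1_300_000  # lower bound: "more than 1.3 million"


def ratio(numerator: int, denominator: int) -> float:
    """Return numerator / denominator as a plain multiple."""
    return numerator / denominator


meta_vs_current = ratio(META_GPUS_END_2025, XAI_GPUS_CURRENT)
meta_vs_expanded = ratio(META_GPUS_END_2025, XAI_GPUS_EXPANDED)
long_term_vs_meta = ratio(XAI_GPUS_LONG_TERM, META_GPUS_END_2025)

print(f"Meta end-2025 vs xAI current:  {meta_vs_current:.1f}x")
print(f"Meta end-2025 vs xAI expanded: {meta_vs_expanded:.1f}x")
print(f"xAI long-term vs Meta end-2025: {long_term_vs_meta:.1f}x")
```

On these figures, Meta's planned end-of-2025 count is at least 13 times xAI's current count and 6.5 times its expanded count, while xAI's long-term target would be roughly 2.3 times Meta's end-of-2025 figure.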

Timeline Competition

Zuckerberg claims that Meta is on pace to be the first to bring a gigawatt-capacity supercluster online. Elon Musk had earlier claimed that xAI was working on its next-generation data center and that it would be the "first gigawatt AI training supercluster." The two companies are therefore competing directly for the "first gigawatt" milestone.

Environmental and Infrastructure Challenges

Both companies face significant environmental concerns with their massive power requirements:

xAI's Environmental Issues

As Musk has ramped up construction in Memphis, xAI reportedly brought in 35 portable methane gas turbines without air permits to power the project. Those turbines, capable of powering a neighborhood of 50,000 homes, could also emit up to 130 tons of harmful nitrogen oxides per year.

Meta's Power Strategy

According to SemiAnalysis, Meta is building two separate 200MW on-site natural gas plants to help meet the energy demands of its data center. While natural gas plants are cleaner than alternatives like coal, they still produce a considerable amount of pollutants, including nitrogen oxides linked to heightened cancer risks for exposed communities.

Industry Context and Competition

The race between xAI and Meta is part of a broader industry trend. OpenAI is also developing a 300 MW AI data center that could be the world's largest, with plans to reach a record 1 gigawatt scale by next year. Google and Microsoft/OpenAI both have plans in the works for training clusters larger than gigawatt class.

Technical Innovation and Speed

xAI has demonstrated remarkable deployment speed: Super Micro CEO Charles Liang has said that he teamed up with Elon Musk's xAI to build the gargantuan Colossus data center in just 122 days. This speed-to-deployment advantage could be crucial in the rapidly evolving AI landscape.

Meta's approach appears more methodical and distributed, with multiple facilities planned across different locations, potentially providing better redundancy and risk distribution.

Conclusion: Different Strategies, Similar Goals

The comparison reveals two distinct approaches to achieving AI supremacy through computing power:

xAI's Strategy: Focus on rapid deployment and continuous expansion of a single massive facility, with exceptional speed-to-market capabilities.

Meta's Strategy: Build multiple distributed gigawatt-scale facilities with substantially larger individual footprints and a longer-term, more capital-intensive approach.

Both companies are pushing the boundaries of what's possible in data center construction and AI infrastructure. The winner of this race may ultimately be determined not just by who reaches gigawatt capacity first, but by who can most effectively translate that computing power into AI breakthroughs while managing the environmental and infrastructure challenges that come with such massive facilities.

The implications extend beyond corporate competition: these developments are reshaping the global AI landscape and raising important questions about energy consumption, environmental impact, and the concentration of AI capabilities in the hands of a few tech giants. As both companies race toward their gigawatt goals, the industry watches to see who will claim the title of operating the world's most powerful AI training infrastructure.
