Unveiled at CES 2026 by CEO Jensen Huang, NVIDIA's Rubin GPUs and Vera CPU form the core of the company's next-generation AI computing platform. They are designed to function not as separate chips but as the heart of a tightly integrated, six-chip system built to meet the soaring demands of modern AI.
NVIDIA describes this coupling as "extreme co-design," meaning the hardware and software are engineered together from the start so the CPU, GPU, networking, and memory function like a single AI supercomputer. This approach allows the Rubin platform to deliver dramatically higher performance, lower inference costs, and faster training, especially for large language models and mixture-of-experts architectures.
The Vera CPU is an Arm-based, high-performance processor that succeeds NVIDIA's Grace CPU. It features 88 custom cores, 176 threads, and support for up to 1.5 TB of LPDDR5x memory, giving it the bandwidth and efficiency needed for massive AI workloads. Vera handles tasks like data preprocessing, scheduling, orchestration, and feeding the GPUs with data at extremely high speeds. It's designed to eliminate bottlenecks that traditionally slow down large-scale AI systems.
The Rubin GPU is the successor to NVIDIA's Blackwell architecture and is built on TSMC's 3nm process. It contains 336 billion transistors and delivers up to 5x the inference performance and 3.5x the training performance of Blackwell. Rubin GPUs are optimized for long-context reasoning, agentic AI, and massive parallel workloads. They also support next-generation high-bandwidth memory (HBM4), enabling extremely fast access to model parameters and context windows.
The flagship configuration is the Vera Rubin Superchip, which combines one Vera CPU with two Rubin GPUs into a single processor module. This design minimizes latency between CPU and GPU, increases memory coherence, and allows the system to behave like one unified accelerator. At larger scales, NVIDIA offers the NVL72 rack, which integrates 72 Rubin GPUs and 36 Vera CPUs with 260 TB/s of scale-up bandwidth, enabling data centers to build "AI factories" capable of continuous, industrial-scale model training and inference.
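A quick back-of-the-envelope check on those rack figures, using only the numbers quoted above, shows how the NVL72's ratios line up with the Superchip design:

```python
# Rack-level figures quoted for the NVL72: 72 Rubin GPUs, 36 Vera CPUs,
# and 260 TB/s of total scale-up bandwidth.
GPUS_PER_RACK = 72
CPUS_PER_RACK = 36
SCALE_UP_TBPS = 260  # total scale-up bandwidth, TB/s

# 2 GPUs per CPU, matching the Superchip's 1 Vera : 2 Rubin pairing.
gpus_per_cpu = GPUS_PER_RACK // CPUS_PER_RACK

# Roughly 3.6 TB/s of scale-up bandwidth available per GPU.
bandwidth_per_gpu = SCALE_UP_TBPS / GPUS_PER_RACK

print(f"{gpus_per_cpu} GPUs per CPU, ~{bandwidth_per_gpu:.1f} TB/s per GPU")
```

In other words, the rack is simply 36 Vera Rubin Superchips wired into one NVLink domain, with each GPU seeing a few terabytes per second of fabric bandwidth.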
Thus, the Rubin GPUs provide the raw parallel compute for training and inference, while the Vera CPU supplies orchestration, memory bandwidth, and system-level intelligence.
Together, they form a single, tightly integrated platform - a unified AI supercomputer - built to power the next generation of AI systems from long-context models to real-time agentic AI. They represent the next generation following on the heels of Blackwell and Hopper.

Once upon a time in the silicon kingdom, there lived three legendary GPUs: Hopper (named for Admiral Grace Hopper), Blackwell, and the brand-new hotshot, Rubin.
Launched in 2022, Hopper was the wise old king. He was the first to hit exascale performance, the one that trained the first big LLMs, the one every data center bowed to. He wore his H100 badge like a crown and loved telling stories about "back in my day, we only had 80 billion transistors and we liked it!"
One day, Blackwell arrived. He was twice as fast, had 208 billion transistors, and came with a swagger that made Hopper's fans spin nervously.
Blackwell strutted into the server rack and said: "Move over, old man. I'm the new sheriff in town. I can train models so fast they'll be obsolete before they finish downloading. And I sip power like it's fine wine."
Hopper (grumbling): "Kid, I invented the whole 'train giant models' thing. You're just my bigger, shinier cousin who thinks he's hot stuff because he got more transistors for Christmas."
Blackwell smirked: "Christmas? I got a whole new architecture. You're still rocking that HBM3 memory while I'm on HBM3e. Catch up, grandpa."
The two were still bickering when the door to the data center swung open and in walked Rubin, the 2026 legend, the one everyone whispered about like he was the second coming of Moore's Law.
Rubin was massive. 336 billion transistors. A new "Rubin" architecture that made Blackwell look like a calculator. He didn't even have to say anything. He just sat there glowing with quiet confidence, sipping 1,000 watts like it was a light snack.
Hopper (eyes wide): "Whoa... who's this guy?"
Blackwell (suddenly much quieter): "Uh... I think that's my replacement."
Rubin (in a calm, deep voice): "Gentlemen. I don't mean to interrupt your little family reunion, but I'm here to train the next generation of foundation models while you two are still arguing about who has the bigger memory bandwidth."
Hopper (trying to save face): "I've got history, kid. I trained GPT-4. I'm a legend."
Blackwell (desperately): "And I trained GPT-5! I'm the current champ!"
Rubin (smiling): "That's cute. I'm going to train GPT-6 while you two are still trying to figure out why your fans are spinning so loud."
The room fell silent. Then Hopper and Blackwell looked at each other, shrugged, and said in unison:
Hopper & Blackwell: "We're gonna need a bigger rack."
Rubin (already running a trillion-parameter model in the background): "Don't worry. I brought my own."
And that, folks, is how the GPU family tree keeps getting taller, faster, and more expensive.
Moral of the story: In the world of AI chips, you're never the king for long. You're just the current flavor of the month. Until, that is, the next flavor shows up with more transistors and a cooler name.
The End. (Or as Rubin would say: "See you in 2027, losers." Or maybe sooner.)
As interpreted by Grok
Rubin's software is what transforms the six-chip architecture into a single, unified AI supercomputer capable of powering the next generation of agentic, long-context, and industrial-scale AI.
The Vera Rubin software stack extends all the way into the data center. Major cloud providers - including Microsoft Azure - are already integrating Rubin into their next-generation AI data centers, with support for cluster scheduling, power and thermal management, multi-tenant security, and rack-scale orchestration. Rubin is designed to plug directly into hyperscale cloud software stacks.
NVIDIA's Vera Rubin platform is designed to transform how modern AI data centers operate by turning entire server racks into unified AI supercomputers. It represents a shift from traditional server-based design to AI factories, data centers built from the ground up to generate intelligence at industrial scale.

The NVIDIA NVL72 rack is a fully integrated system built from six coordinated chips: the Rubin GPU, Vera CPU, NVLink 6 switch, ConnectX-9 SuperNIC, BlueField-4 DPU, and Spectrum-6 Ethernet switch. Instead of treating CPUs, GPUs, and networking as separate components, Rubin merges them into a single architecture optimized for large-scale AI training, long-context reasoning, and real-time inference. This rack-scale design allows data centers to run AI workloads with far less overhead, higher bandwidth, and dramatically improved efficiency.
One of Rubin's most important contributions to data centers is its new context memory storage platform, which enables long-context LLMs and agentic AI systems to operate at scale. This storage layer, accelerated by BlueField-4 DPUs, allows racks to maintain and retrieve massive key-value caches for inference, making it possible to run models with far larger context windows than previous generations. Data centers adopting Rubin can support more advanced reasoning models, mixture-of-experts architectures, and continuous AI services that require fast, persistent memory across the entire rack.
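To see why a rack-level context memory tier matters, it helps to estimate how large a KV cache gets at long context lengths. The model dimensions below are hypothetical, chosen only to illustrate the scaling; the formula is the standard KV-cache sizing for a transformer:

```python
# Rough KV-cache sizing for a long-context transformer.
# All model dimensions here are hypothetical illustrations, not
# specs of any actual model running on Rubin hardware.
def kv_cache_bytes(layers, kv_heads, head_dim, context_len, bytes_per_param=2):
    # Factor of 2 covers the separate key and value tensors;
    # bytes_per_param=2 assumes FP16/BF16 storage.
    return 2 * layers * kv_heads * head_dim * context_len * bytes_per_param

# A hypothetical 80-layer model with 8 KV heads of dimension 128,
# serving a single 1-million-token session:
size = kv_cache_bytes(layers=80, kv_heads=8, head_dim=128, context_len=1_000_000)
print(f"~{size / 1e9:.0f} GB of KV cache for one 1M-token session")
```

Even this modest hypothetical lands in the hundreds of gigabytes per session, which quickly exceeds any single GPU's memory. That is the gap a DPU-accelerated, rack-wide key-value storage layer is meant to fill.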
Rubin also introduces rack-scale confidential computing and zero-downtime maintenance, two features that make data centers more secure and reliable for enterprise and government workloads. Confidential computing protects data while it is being processed, not just stored or transmitted, which is essential for regulated industries such as healthcare and finance. Zero-downtime maintenance allows operators to service hardware without interrupting running AI workloads. This is an important capability as AI factories move toward 24/7 continuous operation.
Another major impact is Rubin's revolutionary cooling efficiency. NVIDIA CEO Jensen Huang announced that Rubin chips can be cooled with 45°C (113°F) water, eliminating the need for traditional water chillers in many data centers. By reducing cooling complexity and energy consumption, Rubin lowers operational costs, enables faster deployment of new AI capacity, and helps protect the environment.
Cloud providers are already preparing for Rubin at scale. Microsoft revealed that its Azure data centers were engineered years in advance to support Rubin's power, thermal, and networking requirements, enabling seamless large-scale deployments starting in 2026. This means Rubin will quickly become part of the backbone of U.S. cloud AI infrastructure, powering everything from enterprise AI services to national-scale AI research.
Rubin isn't just a new chip; it's a new model for how America builds and operates AI infrastructure.
Vera Rubin was one of the most influential American astronomers of the 20th century, and her work reshaped our understanding of the universe. Although she wasn't an astronaut, her discoveries reached far beyond Earth.

Rubin's careful observations of how galaxies rotate provided the first strong, empirical evidence for dark matter, the invisible mass that makes up most of the universe. Before her research, dark matter was a theoretical idea; after Rubin, it became a central pillar of modern cosmology.
Rubin's insight came from a simple but profound observation: stars on the outer edges of galaxies were moving far faster than expected. According to the known laws of physics, those galaxies should have flown apart. The only explanation was that enormous amounts of unseen mass were holding them together. Her work forced scientists to rethink the structure of the cosmos and opened an entirely new frontier in astrophysics, one that continues to influence physics, astronomy, and national research priorities today.
Her impact wasn't limited to science. Rubin was a powerful advocate for women in STEM at a time when the field was overwhelmingly male. She pushed institutions to open doors, mentored young scientists, and used her platform to challenge the barriers she herself had faced. Her legacy is honored through the Vera C. Rubin Observatory, a major U.S. scientific facility dedicated to mapping the universe in unprecedented detail. And her legacy grows as the namesake of NVIDIA's new technology.
Rubin's story matters because it highlights a recurring theme in American innovation: transformative breakthroughs often come from people who question assumptions and look at familiar problems with new eyes. Just as Rubin revealed hidden structure in the universe, today's AI systems are uncovering patterns in data that were once invisible. Her work reminds us that discovery is not just about new tools. It's also about the courage to rethink what we think we know.