The rumors are true: Microsoft has built its own custom AI chip that can be used to train large language models and potentially avoid a costly reliance on Nvidia. Microsoft has also built its own Arm-based CPU for cloud workloads. Both custom silicon chips are designed to power its Azure data centers and ready the company and its enterprise customers for a future full of AI.
Microsoft’s Azure Maia AI chip and Arm-powered Azure Cobalt CPU are arriving in 2024, on the back of a surge in demand this year for Nvidia’s H100 GPUs that are widely used to train and operate generative image tools and large language models. There’s such high demand for these GPUs that some have even fetched more than $40,000 on eBay.
“Microsoft actually has a long history in silicon development,” explains Rani Borkar, head of Azure hardware systems and infrastructure at Microsoft, in an interview with The Verge. Microsoft collaborated on silicon for the Xbox more than 20 years ago and has even co-engineered chips for its Surface devices. “These efforts are built on that experience,” says Borkar. “In 2017, we began architecting the cloud hardware stack and we began on that journey putting us on track to build our new custom chips.”
The new Azure Maia AI chip and Azure Cobalt CPU are both built in-house at Microsoft, combined with a deep overhaul of its entire cloud server stack to optimize performance, power, and cost. “We are rethinking the cloud infrastructure for the era of AI, and literally optimizing every layer of that infrastructure,” says Borkar.
The Azure Cobalt CPU, named after the blue pigment, is a 128-core chip that’s built on an Arm Neoverse CSS design and customized for Microsoft. It’s designed to power general cloud services on Azure. “We’ve put a lot of thought into not just getting it to be highly performant, but also making sure we’re mindful of power management,” explains Borkar. “We made some very intentional design choices, including the ability to control performance and power consumption per core and on every single virtual machine.”
Microsoft is currently testing its Cobalt CPU on workloads like Microsoft Teams and SQL Server, with plans to make virtual machines available to customers next year for a variety of workloads. While Borkar wouldn’t be drawn into direct comparisons with Amazon’s Graviton3 servers that are available on AWS, there should be some noticeable performance gains over the Arm-based servers Microsoft is currently using for Azure. “Our initial testing shows that our performance is up to 40 percent better than what’s currently in our data centers that use commercial Arm servers,” says Borkar. Microsoft isn’t sharing full system specifications or benchmarks yet.
Microsoft’s Maia 100 AI accelerator, named after a bright blue star, is designed for running cloud AI workloads, like large language model training and inference. It will be used to power some of the company’s largest AI workloads on Azure, including parts of the multibillion-dollar partnership with OpenAI where Microsoft powers all of OpenAI’s workloads. The software giant has been collaborating with OpenAI on the design and testing phases of Maia.
“We were excited when Microsoft first shared their designs for the Maia chip, and we’ve worked together to refine and test it with our models,” says Sam Altman, CEO of OpenAI. “Azure’s end-to-end AI architecture, now optimized down to the silicon with Maia, paves the way for training more capable models and making those models cheaper for our customers.”
Manufactured on a 5-nanometer TSMC process, Maia has 105 billion transistors, around 30 percent fewer than the 153 billion found in AMD’s MI300X AI GPU, its own Nvidia competitor. “Maia supports our first implementation of the sub-8-bit data types, MX data types, in order to co-design hardware and software,” says Borkar. “This helps us support faster model training and inference times.”
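The core idea behind MX-style data types is that values are quantized in small blocks, with each block sharing a single power-of-two scale, so individual elements can be stored in very few bits. The sketch below is an illustrative simplification, not the actual OCP MX encoding (the real formats define specific block sizes, scale encodings, and FP8/FP6/FP4 element types); it just shows how block-shared scaling keeps quantization error low.

```python
import numpy as np

def mx_quantize(x, block_size=32, elem_bits=8):
    """Illustrative block-wise 'microscaling' quantization.

    Each block of `block_size` values shares one power-of-two
    scale; elements are stored as small signed integers. This is
    a sketch of the idea behind MX data types, not the real spec.
    """
    qmax = 2 ** (elem_bits - 1) - 1  # e.g. 127 for 8-bit elements
    x = np.asarray(x, dtype=np.float64)
    pad = (-len(x)) % block_size  # pad so length divides evenly
    blocks = np.concatenate([x, np.zeros(pad)]).reshape(-1, block_size)

    scales, quants = [], []
    for block in blocks:
        amax = np.max(np.abs(block))
        # Shared power-of-two scale chosen so the block fits in range.
        scale = 2.0 ** np.ceil(np.log2(amax / qmax)) if amax > 0 else 1.0
        q = np.clip(np.round(block / scale), -qmax, qmax).astype(np.int8)
        scales.append(scale)
        quants.append(q)
    return scales, quants, pad

def mx_dequantize(scales, quants, pad):
    """Reconstruct approximate floats from blocks and shared scales."""
    out = np.concatenate(
        [s * q.astype(np.float64) for s, q in zip(scales, quants)]
    )
    return out[: len(out) - pad] if pad else out
```

Because the scale adapts per block rather than per tensor, outliers in one block don’t destroy precision everywhere else, which is what makes sub-8-bit formats workable for training as well as inference.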
Microsoft is part of a group that includes AMD, Arm, Intel, Meta, Nvidia, and Qualcomm and that is standardizing the next generation of data formats for AI models. Microsoft is building on the collaborative and open work of the Open Compute Project (OCP) to adapt entire systems to the needs of AI.
“Maia is the first complete liquid-cooled server processor built by Microsoft,” reveals Borkar. “The goal here was to enable higher density of servers at higher efficiencies. Because we’re reimagining the entire stack, we purposely think through every layer, so these systems are actually going to fit in our current data center footprint.”
That’s key for Microsoft to spin these AI servers up more quickly without having to make room for them in data centers around the world. Microsoft built a unique rack to house Maia server boards in, complete with a “sidekick” liquid chiller that works like a radiator you’d find in your car or a fancy gaming PC to cool the surface of the Maia chips.
Along with sharing the MX data types, Microsoft is also sharing its rack designs with its partners so they can use them on systems with other silicon inside. But the Maia chip designs won’t be shared more broadly; Microsoft is keeping those in-house.
Maia 100 is currently being tested on GPT-3.5 Turbo, the same model that powers ChatGPT, Bing AI workloads, and GitHub Copilot. Microsoft is in the early phases of deployment, and much as with Cobalt, it isn’t willing to release exact Maia specifications or performance benchmarks just yet.
That makes it difficult to decipher exactly how Maia will compare to Nvidia’s popular H100 GPU, the recently announced H200, or even AMD’s latest MI300X. Borkar didn’t want to discuss comparisons, instead reiterating that partnerships with Nvidia and AMD remain key to the future of Azure’s AI cloud. “At the scale at which the cloud operates, it’s really important to optimize and integrate every layer of the stack, to maximize performance, to diversify the supply chain, and frankly to give our customers infrastructure choices,” says Borkar.
That diversification of supply chains is important to Microsoft, particularly when Nvidia is the key supplier of AI server chips right now and companies have been racing to buy up these chips. Estimates have suggested OpenAI needed more than 30,000 of Nvidia’s older A100 GPUs for the commercialization of ChatGPT, so Microsoft’s own chips could help lower the cost of AI for its customers. Microsoft has also developed these chips for its own Azure cloud workloads, not to sell to others like Nvidia, AMD, Intel, and Qualcomm all do.
“I look at this more as complementary, not competing with them,” insists Borkar. “We have both Intel and AMD in our cloud compute today, and similarly on AI we are announcing AMD where we already have Nvidia today. These partners are very important to our infrastructure, and we really want to give our customers the choices.”
You may have noticed the Maia 100 and Cobalt 100 naming, which suggests that Microsoft is already designing second-generation versions of these chips. “This is a series, it’s not just 100 and done... but we’re not going to share our roadmaps,” says Borkar. It’s not clear how often Microsoft will deliver new versions of Maia and Cobalt just yet, but given the speed of AI, I wouldn’t be surprised to see a Maia 100 successor arrive at a similar pace to Nvidia’s H200 announcement (around 20 months).
The key now will be just how fast Microsoft gets Maia into action to speed up the rollout of its broad AI ambitions, and how these chips will impact pricing for the use of AI cloud services. Microsoft isn’t ready to talk about this new server pricing just yet, but we’ve already seen the company quietly launch its Copilot for Microsoft 365 for a $30-per-month premium per user.
Copilot for Microsoft 365 is limited to only Microsoft’s biggest customers right now, with enterprise users having to commit to at least 300 users to get on the list for its new AI-powered Office assistant. As Microsoft pushes ahead with even more Copilot features this week and a Bing Chat rebranding, Maia could soon help balance the demand for the AI chips that power these new experiences.