Microsoft just launched its first AI models built completely in-house, with no OpenAI involvement this time. It is a major milestone for the company as it steps out of its long-time role as OpenAI’s infrastructure partner and becomes a model developer in its own right.
Two New Models
Microsoft introduced two models:
- MAI-Voice-1: An expressive speech generation model that produces natural-sounding audio. You can try it out in Copilot.
- MAI-1-preview: Microsoft’s first proprietary text foundation model, trained end-to-end in-house. Right now it’s only accessible through LM Arena, where it can be compared head-to-head with other models.
The text model is already ranked 13th on LM Arena, above GPT-4.1 Flash but just below gemini-2.5-flash.
Training Scale and Rankings
MAI-1-preview was trained on roughly 15,000 NVIDIA H100 GPUs. That’s relatively small compared to the cluster sizes reported for competitors:
- xAI’s Grok – ~200,000 GPUs
- OpenAI’s rumored GPT-5 cluster – ~200,000 GPUs
Despite the smaller training run, Microsoft’s model is competitive. Time for Microsoft to ride the scaling wave.
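To put the gap in numbers, here is a quick back-of-the-envelope sketch using only the GPU counts quoted above. The competitor figures are approximate (and rumored, in the GPT-5 case), and the `scale_ratio` helper is purely illustrative. Raw GPU count also ignores training duration and hardware generation, so treat the ratio as a rough indicator rather than a measure of total compute.

```python
# GPU counts as quoted in this post; competitor numbers are approximate
# and, for GPT-5, only rumored.
gpu_counts = {
    "MAI-1-preview": 15_000,
    "xAI Grok": 200_000,
    "OpenAI GPT-5 (rumored)": 200_000,
}


def scale_ratio(counts, baseline):
    """Return each entry's GPU count as a multiple of the baseline's count."""
    base = counts[baseline]
    return {name: n / base for name, n in counts.items()}


ratios = scale_ratio(gpu_counts, "MAI-1-preview")
for name, ratio in ratios.items():
    print(f"{name}: {ratio:.1f}x the MAI-1-preview GPU count")
```

On these figures, the competing clusters are a little over 13 times larger than the one used for MAI-1-preview.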
Benchmarks and Access
One unusual aspect: Microsoft hasn’t published any benchmarks. Right now, the only way to evaluate performance is through LM Arena matchups. Access is random, so most users won’t get a chance to test it directly (for now). It is possible to apply for API access, but it seems quite limited.
Voice Comparisons
In side-by-side tests against OpenAI’s real-time voice model, Microsoft’s voice was impressive: smooth, well-paced, and natural. Microsoft’s model felt more human-like, though that could simply be because I’m more accustomed to OpenAI’s voice model at this point.
What This Means
- Vertical integration: Microsoft can now power Copilot with its own models, reducing reliance on OpenAI.
- Competitive tension: Expect more divergence as both companies scale.
- Room to grow: With resources, talent (many from DeepMind and Inflection), and infrastructure, Microsoft has the ability to climb the LM Arena rankings fast.
TL;DR
- Microsoft launched its first in-house voice and text models.
- The text model ranks 13th on LM Arena, a strong debut.
- Training scale: 15k H100 GPUs, small compared to competitors.
- No benchmarks yet, limited access.
- The voice model sounded more natural than OpenAI’s in side-by-side tests, though that impression is subjective.
- Microsoft is moving from partner → competitor with OpenAI.