Microsoft just launched its first AI models built completely in-house, with no OpenAI involvement this time. This marks a major milestone for the company as it steps out of its long-time role as OpenAI’s infrastructure partner and into the arena as a model developer itself.

Two New Models

Microsoft introduced two models:

  • MAI-Voice-1: An expressive speech generation model that produces natural-sounding audio. You can try it out in Copilot.
  • MAI-1-preview: Microsoft’s first proprietary text foundation model, trained end-to-end in-house. Right now it’s only accessible through LM Arena, where it can be compared head-to-head with other models.

The text model is already ranked 13th on LM Arena, above GPT-4.1 Flash but just below gemini-2.5-flash.

Training Scale and Rankings

The text model was trained on 15,000 H100 GPUs. That’s relatively small compared to:

  • xAI’s Grok – ~200,000 GPUs
  • OpenAI’s rumored GPT-5 cluster – ~200,000 GPUs

Despite the smaller training run, Microsoft’s model is competitive. Time for Microsoft to ride the scaling wave.
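
To put those numbers side by side, here is a minimal Python sketch that works out the ratio. The GPU counts are the ones quoted above (the competitor figures are approximate or rumored), and the `relative_scale` helper is a hypothetical name used only for illustration. GPU count is also just a rough proxy for training compute, since it says nothing about GPU generation or how long a run lasted.

```python
# GPU counts as quoted in this post; competitor figures are approximate or rumored.
clusters = {
    "MAI-1-preview": 15_000,
    "xAI Grok": 200_000,
    "OpenAI GPT-5 (rumored)": 200_000,
}


def relative_scale(baseline: str, counts: dict[str, int]) -> dict[str, float]:
    """Return how many times larger each cluster is than the baseline cluster."""
    base = counts[baseline]
    return {name: n / base for name, n in counts.items() if name != baseline}


ratios = relative_scale("MAI-1-preview", clusters)
for name, ratio in ratios.items():
    print(f"{name}: about {ratio:.1f}x the GPUs used for MAI-1-preview")
```

On these figures, the competitor clusters are roughly 13 times larger than the one used for MAI-1-preview.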

Benchmarks and Access

One unusual aspect: Microsoft hasn’t published any benchmarks. Right now, the only way to evaluate performance is through LM Arena matchups. Access is random, so most users won’t get a chance to test it directly (for now). It is possible to apply for API access, but it seems quite limited.

Voice Comparisons

In side-by-side tests against OpenAI’s real-time voice model, Microsoft’s voice was impressive: smooth, well-paced, and natural. Microsoft’s model felt more human-like, though that could simply be because I’m more accustomed to OpenAI’s voice model at this point.

What This Means

  • Vertical integration: Microsoft can now power Copilot with its own models, reducing reliance on OpenAI.
  • Competitive tension: Expect more divergence as both companies scale.
  • Room to grow: With resources, talent (many from DeepMind and Inflection), and infrastructure, Microsoft has the ability to climb the LM Arena rankings fast.

TL;DR

  • Microsoft launched its first in-house voice and text models.
  • The text model ranks 13th on LM Arena—a strong debut.
  • Training scale: 15k H100 GPUs, small compared to competitors.
  • No benchmarks yet, limited access.
  • The voice model sounded more natural than OpenAI’s in side-by-side tests.
  • Microsoft is moving from partner → competitor with OpenAI.
Dan Cleary
Founder