Overview of Microsoft's MAI-Voice-1 and MAI-1-preview

Microsoft just launched its first AI models built completely in-house, with no OpenAI involvement this time. This marks a major milestone for the company as it steps beyond its long-time role as OpenAI’s infrastructure partner and becomes a model developer in its own right.

Two New Models

Microsoft introduced two models:

MAI-Voice-1: An expressive speech generation model that produces natural-sounding audio. You can try it out in Copilot.
MAI-1-preview: Microsoft’s first text foundation model trained end-to-end in-house. Right now it’s only publicly accessible through LM Arena, where it can be compared head-to-head with other models.

The text model is already ranked 13th on LM Arena, above GPT-4.1 Flash but just below gemini-2.5-flash.

Training Scale and Rankings

The text model was trained on 15,000 H100 GPUs. That’s relatively small compared to:

xAI’s Grok – ~200,000 GPUs
OpenAI’s rumored GPT-5 cluster – ~200,000 GPUs

Despite the smaller training run, Microsoft’s model is competitive, and there is plenty of room for Microsoft to scale up from here.
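To put the gap in perspective, here is a quick back-of-the-envelope calculation using only the GPU counts quoted above. The numbers are approximate (and the GPT-5 figure is a rumor), and the names `relative_scale` and `COMPARISON_CLUSTERS` are just illustrative:

```python
# GPU counts as quoted in this post (approximate; the GPT-5 figure is rumored).
MAI_1_PREVIEW_GPUS = 15_000
COMPARISON_CLUSTERS = {
    "xAI Grok": 200_000,
    "OpenAI GPT-5 (rumored)": 200_000,
}


def relative_scale(small: int, large: int) -> float:
    """Return `small` as a fraction of `large`."""
    return small / large


for name, gpus in COMPARISON_CLUSTERS.items():
    share = relative_scale(MAI_1_PREVIEW_GPUS, gpus)
    print(f"MAI-1-preview used about {share:.1%} of the GPUs cited for {name}")
```

By this rough measure the cluster is about 7.5% the size of the larger ones. GPU count alone says nothing about GPU generation or how long training ran, so treat it as a loose comparison rather than a measure of total compute.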

Benchmarks and Access

One unusual aspect: Microsoft hasn’t published any benchmarks. Right now, the only way to evaluate performance is through LM Arena matchups. Because the arena serves models at random, most users won’t get a chance to test it directly (for now). You can apply for API access, but it seems quite limited.
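If you are wondering how a pile of head-to-head votes turns into a leaderboard position, here is a minimal sketch of a generic Elo-style rating update. This is an assumption for illustration only: nothing in the launch material describes the arena's scoring, and LM Arena's actual methodology may differ. The function names and the `k` value are hypothetical.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Probability that model A beats model B under the standard Elo formula."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400))


def elo_update(rating_a: float, rating_b: float, score_a: float, k: float = 32.0):
    """Return updated (rating_a, rating_b) after one matchup.

    score_a is 1.0 if A won the vote, 0.0 if B won, 0.5 for a tie.
    k controls how far a single vote moves the ratings (illustrative value).
    """
    exp_a = expected_score(rating_a, rating_b)
    delta = k * (score_a - exp_a)
    # Whatever A gains, B loses, so the total rating is conserved.
    return rating_a + delta, rating_b - delta


# Two models start level; model A wins a single user vote.
new_a, new_b = elo_update(1000.0, 1000.0, score_a=1.0)
print(new_a, new_b)
```

The takeaway is that a ranking built this way depends on who a model is matched against and how many votes it has collected, which is part of why a single leaderboard position is a fairly coarse signal without published benchmarks.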

Voice Comparisons

In side-by-side tests against OpenAI’s real-time voice model, Microsoft’s voice was impressive: smooth, well-paced, and natural. Microsoft’s model felt more human-like, though that could simply be because I’m more accustomed to OpenAI’s voice model at this point.

What This Means

Vertical integration: Microsoft can now power Copilot with its own models, reducing reliance on OpenAI.
Competitive tension: Expect more divergence as both companies scale.
Room to grow: With resources, talent (many from DeepMind and Inflection), and infrastructure, Microsoft has the ability to climb the LM Arena rankings fast.

TL;DR

Microsoft launched its first in-house voice and text models.
The text model ranks 13th on LM Arena, a strong debut.
Training scale: 15k H100 GPUs, small compared to competitors.
No benchmarks yet, limited access.
Voice model sounds better than OpenAI’s.
Microsoft is moving from partner → competitor with OpenAI.

Dan Cleary

Founder

