Models & Research

Thinking Machines Lab ships its first model and argues interactivity is what OpenAI gets wrong about voice

May 12, 2026

What happened

Thinking Machines Lab, headed by Mira Murati, has launched its first AI model, one aimed at rethinking voice interaction. Unlike existing voice AI systems, which respond in a turn-by-turn question-and-answer pattern, the new model processes audio, video, and text in parallel, in 200-millisecond chunks. The aim is faster, more natural conversation that moves beyond scripted exchanges. The startup is targeting better interactivity than OpenAI's GPT Realtime 2 and Google's Gemini Live.
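The 200-millisecond chunking is worth making concrete. The sketch below is purely illustrative Python (asyncio) and does not describe Thinking Machines Lab's actual system: it assumes hypothetical per-modality producers and shows how chunks from parallel audio, video, and text streams can be merged onto one shared timeline, so a model sees all modalities for the same time window together rather than waiting for a full utterance to end.

```python
import asyncio
from dataclasses import dataclass

CHUNK_MS = 200  # the 200-millisecond window described in the article


@dataclass
class Chunk:
    modality: str    # "audio", "video", or "text"
    t_start_ms: int  # start of this chunk's time window
    payload: bytes   # raw data for the window (placeholder here)


async def stream(modality: str, n_chunks: int, out: asyncio.Queue) -> None:
    # Hypothetical producer: emits one chunk per 200 ms window.
    for i in range(n_chunks):
        await out.put(Chunk(modality, i * CHUNK_MS, b""))


async def interleave(n_chunks: int) -> list[Chunk]:
    # All three modalities feed a single queue, so a consumer can
    # align chunks by time window instead of by conversational turn.
    q: asyncio.Queue = asyncio.Queue()
    await asyncio.gather(*(stream(m, n_chunks, q)
                           for m in ("audio", "video", "text")))
    chunks = []
    while not q.empty():
        chunks.append(q.get_nowait())
    # Stable sort by window start: each 200 ms slice groups its
    # audio, video, and text chunks together on the timeline.
    return sorted(chunks, key=lambda c: c.t_start_ms)


if __name__ == "__main__":
    for c in asyncio.run(interleave(3)):
        print(c.t_start_ms, c.modality)
```

The point of the toy is the data layout, not the concurrency: once inputs are windowed and merged this way, a model can react mid-utterance to whatever arrived in the latest slice.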

Why it matters

Current voice AI systems often feel sluggish or stilted because they wait for a complete input before responding, sticking to a back-and-forth Q&A dynamic. Thinking Machines Lab challenges that by treating all input types (speech, video cues, text) in parallel, which can enable more fluid, lifelike conversation. The approach puts pressure on incumbents to address their latency and interactivity limits, and it raises the bar for real-time multimodal understanding in applications like virtual assistants, customer service bots, and interactive devices.

This model’s ability to process several data types simultaneously can speed up responsiveness and make AI experiences feel less robotic and more intuitive. For enterprises investing in voice AI, this innovation could reduce friction and improve user retention by making dialogues feel genuinely interactive.

What to watch next

Attention will focus on how Thinking Machines Lab’s model performs in real-world scenarios against OpenAI and Google’s offerings. Key metrics will be latency, accuracy, and user satisfaction in live interactive settings. Adoption by developers and product teams will reveal if this model can reshape expectations around voice AI beyond the standard Q&A framework.

Additionally, watch for new product integrations or partnerships that bring this technology into commercial or consumer use. If successful, other voice AI vendors may have to revise their architectures to prioritize parallel processing of multimodal data and cut down response delays.

AI Quick Briefs Editorial Desk
