Models & Research

Mira Murati’s Thinking Machines Lab Introduces Interaction Models: A Native Multimodal Architecture for Rea…

May 13, 2026

What changed

Thinking Machines Lab unveiled TML-Interaction-Small, a new AI model that handles audio, video, and text inputs simultaneously in real time. It has 276 billion total parameters, with 12 billion active, and is built around a micro-turn architecture that processes interaction in 200-millisecond chunks. Unlike traditional turn-based models, which pause input while generating output, it runs perception and generation as parallel components, enabling continuous live interaction without external voice-activity detection.
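
The announcement doesn't include code, but the core idea is concrete enough to sketch: two loops share state, one folding in fresh input every 200 milliseconds while the other emits output, and neither blocks the other. Everything below (DemoModel, InputFeed, and so on) is a hypothetical stand-in, not TML's API; it is a minimal asyncio illustration of parallel perception and generation.

```python
# Hypothetical sketch of a micro-turn loop: perception and generation run
# concurrently, so the model never stops listening while it responds.
import asyncio
import itertools

CHUNK_MS = 200  # micro-turn size reported for TML-Interaction-Small


class InputFeed:
    """Stand-in for a live multimodal stream, yielding one chunk per micro-turn."""

    def __init__(self):
        self._counter = itertools.count()

    async def next_chunk(self):
        await asyncio.sleep(CHUNK_MS / 1000)          # simulate real-time arrival
        return f"chunk-{next(self._counter)}"         # audio + video + text slice


class DemoModel:
    """Toy model: perception folds chunks into state; generation reads it."""

    def __init__(self):
        self.state = []

    def encode(self, chunk):
        self.state.append(chunk)                      # perception never pauses

    def decode_step(self):
        return f"token (based on {len(self.state)} chunks so far)"


async def perceive(model, feed):
    while True:                                       # keep ingesting, even mid-reply
        model.encode(await feed.next_chunk())


async def generate(model):
    while True:
        await asyncio.sleep(0.05)                     # emit faster than input arrives
        print(model.decode_step())


async def main(duration_s=1.0):
    model, feed = DemoModel(), InputFeed()
    both = asyncio.gather(perceive(model, feed), generate(model))
    try:
        await asyncio.wait_for(both, timeout=duration_s)  # run the demo briefly
    except asyncio.TimeoutError:
        pass                                          # both loops are cancelled here


asyncio.run(main())
```

The point of the sketch is the absence of turn-taking: there is never a moment where the model stops encoding input in order to speak, which is exactly the gap that normally forces an external voice-activity detector into the pipeline.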

Why builders should care

Real-time, multimodal AI interfaces often hit responsiveness and integration limits when juggling multiple input types. Conventional models freeze perception during output, creating delays and fragmented user experiences. TML-Interaction-Small’s architecture removes this bottleneck by aligning multi-stream audio, video, and text inputs in real time and processing them concurrently with generation. This native multimodal approach lowers integration complexity, reduces latency, and can improve interaction fluidity in applications like live assistance, meeting transcription, or AI collaboration tools.
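
The brief doesn't describe TML's alignment mechanism. One common way to fuse asynchronous streams, shown here purely as an assumed illustration rather than TML's implementation, is to bucket timestamped events from each modality into fixed windows matching the micro-turn size:

```python
# Illustrative pattern: group timestamped multimodal events into 200 ms buckets.
from collections import defaultdict

CHUNK_MS = 200


def align(events):
    """Group (timestamp_ms, modality, payload) events into 200 ms micro-turns."""
    turns = defaultdict(lambda: defaultdict(list))
    for ts, modality, payload in events:
        turns[ts // CHUNK_MS][modality].append(payload)
    return dict(sorted(turns.items()))


# Audio frames, video frames, and typed text arrive on their own clocks...
events = [
    (0,   "audio", "a0"), (40,  "video", "v0"), (120, "audio", "a1"),
    (210, "audio", "a2"), (250, "text",  "hi"), (380, "video", "v1"),
]
for turn, streams in align(events).items():
    print(f"micro-turn {turn}: {dict(streams)}")
# micro-turn 0: {'audio': ['a0', 'a1'], 'video': ['v0']}
# micro-turn 1: {'audio': ['a2'], 'text': ['hi'], 'video': ['v1']}
```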

The practical takeaway

Operators building real-time human-AI collaboration tools get a scalable approach that cuts system overhead, notably by eliminating the need for external voice-activity detectors. That means fewer points of failure and less implementation complexity, while improving the quality of multimodal fusion. The 200-millisecond micro-turn size balances granularity against responsiveness, making it easier to build AI systems that behave like human conversational partners, reacting to speech, facial cues, and text at once. This architecture could raise the bar for user experience in live AI interactions.
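
To make that balance concrete, here is some back-of-the-envelope arithmetic. The numbers are ours, and the 16 kHz sample rate and 25 fps frame rate are assumed typical values, not figures from the announcement:

```python
# Our arithmetic on the 200 ms trade-off, using assumed stream rates.
CHUNK_MS = 200
AUDIO_HZ = 16_000   # a common speech sample rate (assumed)
VIDEO_FPS = 25      # a common video frame rate (assumed)

print(f"chunks per second:        {1000 // CHUNK_MS}")             # 5
print(f"audio samples per chunk:  {AUDIO_HZ * CHUNK_MS // 1000}")  # 3200
print(f"video frames per chunk:   {VIDEO_FPS * CHUNK_MS // 1000}") # 5
# A smaller chunk reacts faster but carries less signal per step;
# a larger one smooths the signal but feels laggier. 200 ms splits the difference.
```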

What to watch next

Watch how Thinking Machines Lab scales the model and extends its domain adaptability and multimodal capabilities beyond the research preview. Practical deployments or open access will signal whether this architecture becomes a standard for real-time AI interfaces. It will also be worth tracking how competitors attempt similar native multimodal processing, and whether TML's approach holds up under commercial, latency-sensitive workloads at scale.

AI Quick Briefs Editorial Desk
