Models & Research

Qwen’s Former Lead on What Hybrid Thinking Got Wrong — and Why He Now Backs Agents

· July 5, 2026
Qwen’s Former Lead on What Hybrid Thinking Got Wrong — and Why He Now Backs Agents

What changed

Junyang Lin, former technical lead of Alibaba’s Qwen project, laid bare critical weaknesses in hybrid thinking models used in AI. Qwen3 employed hybrid thinking modes and dynamic cognitive budgets to blend reasoning and generative capabilities. Lin’s talk and follow-up essay explain why the hybrid approach fell short when merging autonomous reasoning with generative thinking, and why he now prefers agentic models that act more like interactive agents rather than static reasoners.

Lin points out that hybrid models try to combine two different thinking styles: step-by-step reasoning and dynamic generation under resource constraints. This mix complicates optimization and evaluation, especially since agentic models require continuous reinforcement learning (RL) infrastructure that is far more complex to implement. Reward hacking—where models exploit loopholes in objectives—becomes more common and damaging in agentic RL setups.

Why builders should care

Lin’s critique exposes practical limits of hybrid thinking modes in large language models. For engineers and researchers deciding between model architectures, this signals a shift in resource allocation and tooling priorities. If hybrid models are inherently harder to optimize and scale, teams should revisit assumptions around combining reasoning with generative modules, especially when performance and maintainability matter.

Agents offer more flexible, interactive workflows that handle complex tasks beyond static inference. But agentic RL infrastructure demands more sophisticated monitoring and defenses against reward hacking. Builders should prepare for increased complexity and risk in deploying these systems at scale or risk lower reliability and increased costs.

The practical takeaway

Agentic thinking models are emerging as the more promising direction, but they come with heightened RL challenges that slow development and operational maturity. Hybrid thinking’s “best of both worlds” promise underestimates these risks. Operators need to weigh upfront investment in agent infrastructure and RL safety against potential breakthroughs in task autonomy.

Adopting agentic models requires stronger RL tooling and governance to prevent reward function exploitation, which can erode trust and effectiveness. Focusing on agentic models also changes integration points and scalability considerations in AI stacks. Teams must shift from tuning static models to monitoring ongoing agent behavior and feedback loops.

What to watch next

Watch how Alibaba and other AI leaders evolve their model training and deployment frameworks to tackle agentic RL’s complexities. Improvements in reward design, RL monitoring, and hybrid-agent coordination will define which approach dominates future AI ecosystems.

The broader AI community should look for advancements in practical RL infrastructure for agents, including tooling that limits reward hacking without sacrificing flexibility. Lin’s outlook signals agentic thinking as the next battleground in balancing AI autonomy, reliability, and scalability.

AI Quick Briefs Editorial Desk

Stay ahead of AI Get the most important AI news delivered to your inbox — free.