
Baidu Releases Unlimited OCR, a 3B Model That Keeps the KV Cache Flat for Long-Document Parsing

AI Quick Briefs Editorial Desk · June 25, 2026

What it does

Baidu has open-sourced Unlimited OCR, a 3-billion-parameter mixture-of-experts (MoE) model designed to parse long documents in a single pass. It introduces Reference Sliding Window Attention (R-SWA), an attention mechanism that keeps the key-value (KV) cache at a fixed size regardless of document length. As a result, cache memory and per-token latency stay flat as the model works through dozens of pages instead of growing with every page, which matters for real-world document workflows.
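To make the fixed-cache idea concrete, here is a minimal Python sketch of a bounded cache. It is not Baidu's implementation: the brief does not describe R-SWA's internals, so the split into a pinned "reference" block plus a sliding window of recent entries is an assumption drawn from the mechanism's name, and `BoundedKVCache`, `simulate`, `reference_len`, and `window` are hypothetical names used only for illustration.

```python
from collections import deque


class BoundedKVCache:
    """Toy KV cache: pinned reference entries plus a sliding window of recent ones."""

    def __init__(self, reference_entries, window):
        # Assumption (not from the article): some reference entries stay
        # visible for the whole run and are never evicted.
        self.reference = list(reference_entries)
        # deque(maxlen=...) drops the oldest entry automatically once full.
        self.recent = deque(maxlen=window)

    def append(self, entry):
        """Store the entry for a newly generated token."""
        self.recent.append(entry)

    def visible(self):
        """Entries a new token could attend to at this step."""
        return self.reference + list(self.recent)

    def __len__(self):
        return len(self.reference) + len(self.recent)


def simulate(steps, reference_len=4, window=8):
    """Return the cache size after each generated token."""
    cache = BoundedKVCache([f"ref{i}" for i in range(reference_len)], window)
    sizes = []
    for t in range(steps):
        cache.append(f"tok{t}")
        sizes.append(len(cache))
    return sizes


if __name__ == "__main__":
    print(simulate(steps=20))
```

In this toy version the cache size climbs until the window fills and then stays at `reference_len + window`, however many tokens follow. A cache that never evicts would instead grow by one entry per token.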

Why it matters

Conventional transformer-based OCR models slow down and use more memory on longer documents because their attention layers cache keys and values for every token produced so far, so the cache grows with output length. Unlimited OCR’s R-SWA removes that dependence, making it practical to process entire long documents or batches without adding hardware or accepting longer waits. That lowers operational costs and latency for businesses working with legal, financial, or scientific documents at scale.
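A rough back-of-the-envelope calculation shows why this matters. The sketch below uses the standard estimate of KV cache size (keys plus values, per layer, per KV head, per token) and compares an unbounded cache with one capped at a window. The layer count, head count, head dimension, and window size are hypothetical placeholders, not Unlimited OCR's published configuration.

```python
def kv_cache_bytes(tokens, layers, kv_heads, head_dim, bytes_per_value=2):
    """Estimate KV cache size; the leading 2 counts keys and values."""
    return 2 * layers * kv_heads * head_dim * tokens * bytes_per_value


def windowed_kv_cache_bytes(tokens, window, **dims):
    """Same estimate when at most `window` tokens are ever kept."""
    return kv_cache_bytes(min(tokens, window), **dims)


# Hypothetical dimensions for illustration only (fp16 values, 2 bytes each).
HYPOTHETICAL_DIMS = dict(layers=24, kv_heads=8, head_dim=128)

if __name__ == "__main__":
    for tokens in (1_000, 10_000, 100_000):
        full = kv_cache_bytes(tokens, **HYPOTHETICAL_DIMS)
        capped = windowed_kv_cache_bytes(tokens, window=1_024, **HYPOTHETICAL_DIMS)
        print(
            f"{tokens:>7} tokens: "
            f"full {full / 2**20:8.1f} MiB, windowed {capped / 2**20:6.1f} MiB"
        )
```

Under these assumed dimensions the unbounded cache grows linearly with token count, while the windowed figure stops growing once the token count passes the window size.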

In testing, Unlimited OCR scored 93.23 on OmniDocBench v1.5, outperforming the DeepSeek OCR baseline by over 6 points. The result suggests the fixed-size cache does not come at the expense of accuracy, which is crucial for enterprises that rely on precise extraction from multi-page documents. Releasing the model under an MIT license also encourages adoption and custom integration into existing OCR pipelines.

Who it is for

Software developers, product teams, and AI operators who handle large-scale document parsing will find Unlimited OCR valuable. Its flat memory and latency profile lowers the barrier to scaling OCR applications in legal tech, compliance, insurance, and research. Founders and CTOs can build more responsive document AI products without overhauling existing infrastructure or tolerating slowdowns on long inputs.

The catch

At 3 billion parameters, Unlimited OCR is far heavier than lightweight OCR tools, so inference costs remain non-trivial. Integration may require tuning the sliding window attention to handle unusual document layouts or fonts. And while the model’s source code and weights are open, deploying it in latency-critical environments still demands sufficient GPU resources, which raises operational expenses.

What to watch next

Watch how Baidu and the open-source community build on Unlimited OCR. Expect refinements to R-SWA and to MoE efficiency, possibly unlocking even longer inputs or higher throughput at lower cost. Also watch how competitors respond with their own approaches to long-document parsing, and whether this design persuades enterprises to move from traditional OCR systems to large AI models that keep resource use steady at scale.
