
Datalab Releases lift: A 9B Open-Weights Vision Model That Extracts Structured JSON From PDFs Using Schemas

June 23, 2026

What it does

Datalab has launched lift, a 9-billion-parameter open-weights vision model that converts PDFs and images directly into structured JSON. Unlike typical extraction tools, lift uses schema-constrained decoding, so its output strictly matches the intended JSON schema. It also features trained abstention: when a field is missing from the document, the model returns null instead of fabricating a value. On a 225-document benchmark, lift reaches 90.2% field-level extraction accuracy.
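The brief does not describe lift's decoder internals. The toy below only shows the general idea behind schema-constrained (grammar-constrained) decoding: at each step, any token that would break the required format is masked out before the highest-scoring token is picked. The single-character vocabulary, the ISO-date rule, and the `toy_scores` "model" are all hypothetical stand-ins.

```python
import math

# Hypothetical toy vocabulary: single characters the decoder may emit.
VOCAB = list("0123456789-/ ")
DATE_LEN = 10  # target format "YYYY-MM-DD"

def allowed(prefix: str, token: str) -> bool:
    """True if appending `token` keeps `prefix` a valid ISO-date prefix."""
    pos = len(prefix)
    if pos in (4, 7):
        return token == "-"  # separators are fixed by the format
    return token.isdigit()   # every other position must be a digit

def toy_scores(prefix: str, token: str) -> float:
    """Hypothetical stand-in for model logits: this 'model' wants 2026/06/23."""
    preferred = "2026/06/23"
    return 1.0 if token == preferred[len(prefix)] else 0.0

def greedy_decode(constrained: bool) -> str:
    """Greedy decoding, optionally masking tokens that violate the format."""
    out = ""
    while len(out) < DATE_LEN:
        scores = {t: toy_scores(out, t) for t in VOCAB}
        if constrained:
            # Invalid tokens get -inf, so they can never be selected.
            scores = {t: (s if allowed(out, t) else -math.inf) for t, s in scores.items()}
        out += max(scores, key=scores.get)
    return out

print(greedy_decode(constrained=False))  # 2026/06/23
print(greedy_decode(constrained=True))   # 2026-06-23
```

Without the mask, the toy model writes slashes. With the mask, the only legal token at positions 4 and 7 is "-", so the output is guaranteed to fit the format even when the model would have preferred something else.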

Why it matters

This model tackles a major pain point in document automation: reliably turning visually complex PDFs into structured, machine-readable data. Most existing tools either depend on brittle templates or hallucinate incorrect entries, which introduces errors or forces cumbersome manual cleanup. Because lift's decoding is constrained by the schema, its output always has a valid format, which reduces downstream parsing errors and operational risk. Trained abstention lowers the cost of false positives: instead of guessing on uncertain fields, the model leaves them null. Together, these properties make lift practical for businesses that need automated, trustworthy data pipelines across diverse document types.
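The brief does not document lift's output format or API. The sketch below assumes lift's JSON has already been parsed into a Python dict that follows a hypothetical invoice schema with nullable fields. It shows how a downstream pipeline can rely on a fixed shape and route null (abstained) fields to human review instead of trusting a guessed value. `INVOICE_SCHEMA` and `fields_needing_review` are illustrative names, not part of lift.

```python
from typing import Any

# Hypothetical invoice schema: every field may be null so the model can abstain.
INVOICE_SCHEMA: dict[str, type] = {
    "invoice_number": str,
    "issue_date": str,
    "total_amount": float,
    "vendor_name": str,
}

def fields_needing_review(record: dict[str, Any]) -> list[str]:
    """Check `record` against INVOICE_SCHEMA and return fields the model left null."""
    missing = INVOICE_SCHEMA.keys() - record.keys()
    extra = record.keys() - INVOICE_SCHEMA.keys()
    if missing or extra:
        raise ValueError(f"schema mismatch: missing={sorted(missing)}, extra={sorted(extra)}")
    review = []
    for field, expected in INVOICE_SCHEMA.items():
        value = record[field]
        if value is None:
            review.append(field)  # abstention: send to a human rather than guess
        elif not isinstance(value, expected):
            raise TypeError(f"{field}: expected {expected.__name__}, got {type(value).__name__}")
    return review

record = {"invoice_number": "INV-0042", "issue_date": "2026-06-23",
          "total_amount": None, "vendor_name": "Acme Corp"}
print(fields_needing_review(record))  # ['total_amount']
```

With schema-constrained output, the shape checks should never fail. They are kept here as a cheap guard, so the only branch the pipeline normally exercises is the null-to-review path.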

Who it is for

Lift targets builders and operators who automate document workflows such as invoicing, contracts, and compliance forms. Companies processing large volumes of invoices or reports can integrate lift to speed up data ingestion with fewer errors. Because the weights are open, developers can adapt the model to their internal schemas and improve its performance on domain-specific documents. Its support for both PDFs and images also opens use cases in mobile data capture and scanned-record processing.

The catch

Deploying an open-weights model with schema enforcement still takes computational resources and domain expertise. Users must write and maintain accurate JSON schemas and tune the model's abstention threshold to balance recall against precision. At 90.2% accuracy, roughly one field in ten is still not extracted correctly, so critical applications will need some manual review. Trained abstention also adds complexity, which may make lift less plug-and-play than simpler extraction tools.
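The brief does not say how lift exposes its abstention threshold. The sketch below assumes each extracted field comes with a confidence score in [0, 1] and is replaced by null when it falls below a chosen threshold. It shows the trade-off the threshold controls. Precision is measured over the fields that were answered. "Recall" here means the share of all fields that end up with a correct, non-null value. The `FieldPrediction` class and the six labeled predictions are hypothetical, not lift benchmark data.

```python
from dataclasses import dataclass

@dataclass
class FieldPrediction:
    confidence: float  # hypothetical per-field confidence score in [0, 1]
    correct: bool      # whether the extracted value matched ground truth

def precision_recall(preds: list[FieldPrediction], threshold: float) -> tuple[float, float]:
    """Abstain (emit null) below `threshold`, then score what was answered."""
    answered = [p for p in preds if p.confidence >= threshold]
    hits = sum(p.correct for p in answered)
    precision = hits / len(answered) if answered else 1.0  # nothing answered, nothing wrong
    recall = hits / len(preds) if preds else 0.0           # share of all fields answered correctly
    return precision, recall

# Hypothetical labeled predictions, e.g. from a small validation set.
preds = [FieldPrediction(c, ok) for c, ok in [
    (0.95, True), (0.90, True), (0.80, True),
    (0.70, False), (0.60, True), (0.40, False),
]]

for t in (0.0, 0.5, 0.75):
    p, r = precision_recall(preds, t)
    print(f"threshold={t:.2f} precision={p:.2f} recall={r:.2f}")
# threshold=0.00 precision=0.67 recall=0.67
# threshold=0.50 precision=0.80 recall=0.67
# threshold=0.75 precision=1.00 recall=0.50
```

Raising the threshold trades coverage for correctness. A team would pick the point on this curve that matches how expensive a wrong value is compared with a field sent to manual review.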

What to watch next

Watch how lift performs in large-scale, real-world deployments and across different document types. Its open-weights release invites community experimentation. Adoption will depend on how easily lift integrates with existing workflows and how well it handles the edge cases typical of business documents. Future updates may improve abstention calibration and schema flexibility, both of which will be key to wider enterprise adoption.

AI Quick Briefs Editorial Desk
