Models & Research

GPT and Claude failed Bridgewater’s finance tests because the right answers were never public

AI Quick Briefs Editorial Desk · July 3, 2026

What happened

Bridgewater Associates and Thinking Machines Lab tested leading AI models, including GPT and Claude, on finance document analysis. In their evaluation, a fine-tuned open-weight model outperformed these larger, general-purpose systems. Bridgewater identified the key problem: the correct answers to these tests were never public, so the large proprietary models had no way to learn or verify them.

Why it matters

Financial analysis tasks demand accuracy grounded in proprietary knowledge that does not appear in public datasets. That exposes a major limitation of popular AI models, which rely on internet-scale pretraining and have no access to private or domain-specific ground truth. Firms that depend heavily on pretrained black-box models may face higher costs, lower trust, and incomplete automation when tackling specialized finance tasks. In this evaluation, a cheaper, open approach with targeted tuning and transparent data outperformed bigger, closed models, which could change how AI is deployed in finance.

What to watch next

Expect more hedge funds and financial institutions to push for open, customizable AI models they can train on proprietary data, rather than relying on commercial large language models. For critical, high-value financial decisions, the economics may favor in-house tuning of smaller, transparent models over expensive black-box offerings. The test also raises the bar for evaluating finance-focused AI tools: benchmarks need real ground truth, not public datasets that fail to reflect actual deal-making or portfolio analytics environments.

