Everyone knows bigger models reason better. What’s less obvious is that they often behave worse, especially when tools are involved.
In this talk, a practical application of research on task fidelity, we'll show how a 4B model was fine-tuned to beat a 235B model on real financial analysis tasks – not by adding more reasoning, but by enforcing tool discipline. Using reinforcement learning with the open-source rLLM framework, the model learned to explore schemas, validate outputs, and retry failures instead of hallucinating confident nonsense.
The key surprise: training on simple tool interactions transferred cleanly to much harder, multi-step problems. If you're building LLM systems that touch databases, APIs, or internal tools, this talk focuses on the behaviors that actually matter, and how to teach them without frontier-scale compute.
Bio Coming Soon!