Benchmark data last updated: October 1, 2025
This paper introduces Driven, an AI agent for financial analysis, and evaluates it against conventional tools and general-purpose language models on real-world investing tasks. In contrast to headline-driven or superficial approaches, Driven is built and measured against the Investible Insights Benchmark, which rewards actionable investment conclusions. To achieve this, it combines expert-authored questions, SEC filings, professional validation, and integrated tools for search and sentiment analysis. In our comparative evaluations, Driven surpasses Gemini 2.5, Perplexity, Google Finance, and GPT-5 in retrieval, reasoning, and analytical precision. Its strengths lie in accurate parsing of regulatory filings, faster processing, and real-time integration of financial signals, making it a high-utility agent for research and portfolio decision-making.
A superior financial AI agent must deliver not only rigorous data accuracy but also actionable, meaningful insights. The Finance Agent Benchmark covers the first requirement; to cover the second, an evaluation also has to measure how well the agent understands investing and whether its insights are useful in practice.
We’ve combined the Finance Agent Benchmark with our custom-built Investible Insights framework to test what really matters for investors. When it comes to stocks, you don’t need AI that can ace trivia contests; you need AI that can think like a junior analyst and cut through the noise. That’s exactly what our benchmark framework is built to test.
Finance has always been one of the ripest fields for automation, but until recently, there’s been no way to judge whether AI agents can actually handle analyst-level work. The Finance Agent Benchmark, developed alongside Stanford researchers, a global systemically important bank, and industry veterans, fills that gap. It asks a simple but high-stakes question: can AI replicate the day-to-day tasks of a junior equity analyst? Think parsing earnings reports, summarizing management commentary, and building the backbone of real investment models.
But Wall Street isn’t just about data collection. Investors care about whether the numbers translate into an investable thesis. That’s why we created Investible Insights, a custom benchmark that mirrors how investors actually make calls. Instead of rewarding surface-level summaries, it pushes AI to connect financials, sentiment, and competitive dynamics into differentiated views. Can it see through management spin? Spot sustainable growth vs. temporary margin boosts? We built this benchmark around the pain points that drive real alpha.
These are the evaluation dimensions of the Driven benchmark:
Check whether the model explicitly states the time range, fiscal quarter, or data source behind its analysis, which helps build user trust.
Check whether the model accurately understands the user's true intent, provides quantitative or qualitative support to substantiate its points, and presents its response in a clear structure.
Check whether the response goes beyond just listing metrics, and instead provides interpretation with broader context.
Check whether the response incorporates supplementary insights from social search, presenting valuable original quotes along with the model's interpretation.
Check whether the response concludes with a highly relevant follow-up question that encourages the user to dig deeper and helps the model better understand and align with the user's intent.
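As a rough illustration of how the five checks above could be rolled up into a single score, here is a minimal sketch. The dimension labels, the equal weighting, and the pass/fail grading are all assumptions made for this example; the benchmark's actual scoring scheme is not specified here. The grades themselves are expected to come from a human or model reviewer, not from this code.

```python
from dataclasses import dataclass

# Hypothetical short labels for the five checks listed above.
# The source does not name the dimensions; these are illustrative only.
DIMENSIONS = (
    "states_time_range_or_source",
    "understands_intent_with_support",
    "interprets_beyond_metrics",
    "uses_social_search_quotes",
    "ends_with_follow_up_question",
)


@dataclass(frozen=True)
class RubricResult:
    passed: tuple
    failed: tuple
    score: float  # fraction of checks passed (equal weights are an assumption)


def score_response(checks):
    """Aggregate reviewer-supplied pass/fail grades into one rubric result."""
    missing = [d for d in DIMENSIONS if d not in checks]
    if missing:
        raise ValueError(f"missing grades for: {missing}")
    passed = tuple(d for d in DIMENSIONS if checks[d])
    failed = tuple(d for d in DIMENSIONS if not checks[d])
    return RubricResult(passed, failed, len(passed) / len(DIMENSIONS))


# Example: a response that cites its sources and interprets the numbers,
# but has no social-search quotes and no closing follow-up question.
example = score_response({
    "states_time_range_or_source": True,
    "understands_intent_with_support": True,
    "interprets_beyond_metrics": True,
    "uses_social_search_quotes": False,
    "ends_with_follow_up_question": False,
})
print(example.score, example.failed)
```

Keeping the failed dimensions alongside the score makes it easy to see which behavior a response is missing, which matters more for iteration than the aggregate number.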
When it comes to equity analysis, precision matters. One wrong number in an earnings model can flip a “Buy” into a “Sell.” While large language models like GPT-5 have made impressive strides in summarization and general Q&A, the question remains: can they replace purpose-built financial analysis tools?
We benchmarked Driven against other LLMs on real-world stock research tasks. Here's what we found:
Take a look at how Driven and Perplexity compare in Quantitative Retrieval, Numerical Reasoning, and Financial Modeling.
Here is how Driven and Gemini 2.5 (Web) compare in Quantitative Retrieval, Financial Modeling, and Complex Retrieval.
Driven even outperforms GPT-5 (Web) in cost, speed, and accuracy.
When it comes to real investing work, speed and surface-level answers aren't enough. You need tools that can cut through the noise, get the source right, and give you insights in a way that actually feels like talking to a seasoned analyst. That's where Driven pulls ahead:
Driven doesn't just "search the web." It goes straight to the source, pulling directly from SEC 10-Ks and other filings, while most AI models rely on secondary search results (and often pull the wrong document).
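To make "going straight to the source" concrete, the sketch below shows one way to locate primary filing documents on SEC EDGAR without relying on a search engine. It is not Driven's implementation. It assumes the layout of EDGAR's public submissions JSON (parallel lists under `filings.recent`) and the usual Archives URL pattern; the helper names, the CIK, and the sample data are made up for illustration, and no network request is made.

```python
def submissions_url(cik):
    """URL of EDGAR's per-company submissions JSON (CIK zero-padded to 10 digits)."""
    return f"https://data.sec.gov/submissions/CIK{int(cik):010d}.json"


def filing_urls(submissions, cik, form_type="10-K"):
    """Return primary-document URLs for filings of one form type.

    `submissions` is a dict shaped like EDGAR's submissions JSON, where
    filings.recent holds parallel lists (an assumption of this sketch).
    """
    recent = submissions["filings"]["recent"]
    rows = zip(
        recent["form"],
        recent["accessionNumber"],
        recent["primaryDocument"],
        recent["filingDate"],
    )
    results = []
    for form, accession, document, filed in rows:
        if form != form_type:
            continue
        folder = accession.replace("-", "")  # archive folders drop the dashes
        results.append({
            "filed": filed,
            "url": f"https://www.sec.gov/Archives/edgar/data/{int(cik)}/{folder}/{document}",
        })
    return results


# Made-up sample data for a hypothetical company (CIK 1234567).
sample = {
    "filings": {
        "recent": {
            "form": ["10-Q", "10-K", "8-K"],
            "accessionNumber": [
                "0001234567-25-000030",
                "0001234567-25-000010",
                "0001234567-25-000005",
            ],
            "primaryDocument": ["q2.htm", "annual.htm", "event.htm"],
            "filingDate": ["2025-08-01", "2025-02-15", "2025-01-20"],
        }
    }
}

annual_reports = filing_urls(sample, cik=1234567)
print(submissions_url(1234567))
print(annual_reports)
```

Filtering on the form type before fetching anything is the point of the example: the filing is selected by its regulatory metadata, so a 10-Q or a press release cannot be mistaken for the annual report.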




Beyond news wires, Driven taps into X (Twitter) finance streams, surfacing timely chatter and sentiment that actually moves markets.


Instead of clunky PDF crawling, Driven is optimized for parsing SEC filings and wired into financial data APIs, so you get numbers and disclosures fast.
Driven combines accuracy, speed, and actionable insights into a reliable investment assistant that helps investors make informed decisions with confidence.