How Driven Leads in a New Benchmark for AI-Powered Investment Decision-Making

Benchmark data last updated: October 1, 2025

Abstract

This paper introduces Driven, an AI agent for real-world investing, and evaluates it on a new financial analysis benchmark on which it outperforms conventional tools and general-purpose language models. Rather than rewarding headline-driven or superficial answers, the Investible Insights Benchmark tests whether an agent can deliver actionable investment decisions. The evaluation draws on expert-authored questions, SEC filings, and professional validation, while Driven itself integrates tools for search and sentiment analysis. Comparative evaluations show that Driven surpasses Gemini 2.5, Perplexity, Google Finance, and GPT-5 in retrieval, reasoning, and analytical precision. Its strengths lie in accurate parsing of regulatory filings, accelerated processing, and real-time integration of financial signals, making it a high-utility tool for research and portfolio decision-making.

Benchmark for Real-World Investing

A superior financial AI agent must deliver not only rigorous data accuracy but also actionable, in-depth insights. Building on the Finance Agent Benchmark, an evaluation should therefore also measure how well an agent understands investments and generates investment insights, since that is where much of its practical value lies.

We’ve combined the Finance Agent Benchmark with our custom-built Investible Insights framework to test what really matters for investors. When it comes to stocks, you don’t need AI that can ace trivia contests — you need AI that can think like a junior analyst and cut through the noise. That’s exactly what our benchmark framework is built to test.

Finance Agent Benchmark

Finance has always been one of the ripest fields for automation, but until recently, there’s been no way to judge whether AI agents can actually handle analyst-level work. The Finance Agent Benchmark, developed alongside Stanford researchers, a global systemically important bank, and industry veterans, fills that gap. It asks a simple but high-stakes question: can AI replicate the day-to-day tasks of a junior equity analyst? Think parsing earnings reports, summarizing management commentary, and building the backbone of real investment models.

Investible Insights Benchmark

But Wall Street isn’t just about data collection. Investors care about whether the numbers translate into an investable thesis. That’s why we created Investible Insights, a custom benchmark that mirrors how investors actually make calls. Instead of rewarding surface-level summaries, it pushes AI to connect financials, sentiment, and competitive dynamics into differentiated views. Can it see through management spin? Spot sustainable growth vs. temporary margin boosts? We built this benchmark around the pain points that drive real alpha.

The benchmark evaluates each response along five dimensions:

Accuracy

Check whether the model explicitly states the time range, fiscal quarter, or data source behind its analysis, which strengthens user trust.

Logical Coherence

Check whether the model accurately understands the user's true intent, provides quantitative or qualitative support to substantiate its points, and presents its response in a clear structure.

Thoroughness

Check whether the response goes beyond listing metrics and instead interprets them in a broader context.

Social Media Supplement

Check whether the response incorporates supplementary insights from social search, presenting valuable original quotes along with the model's interpretation.

Intent-Guided Discourse

Check whether the response concludes with a highly relevant follow-up question that encourages the user to dig deeper and helps the model better understand and align with the user's intent.
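The five dimensions above read naturally as a grading rubric. The sketch below is a hypothetical illustration, not the benchmark's published scoring method: it assumes a grader assigns each dimension a score from 0 to 1, weights the dimensions equally, and reports a 0-100 total. It also includes a crude keyword check for the Accuracy criterion (does a response name a fiscal period or filing type?). The dimension identifiers, the regex, the weights, and the scale are all assumptions.

```python
import re
from typing import Dict, Optional

# Hypothetical identifiers for the five dimensions described above.
DIMENSIONS = (
    "accuracy",
    "logical_coherence",
    "thoroughness",
    "social_media_supplement",
    "intent_guided_discourse",
)

# Crude heuristic for the Accuracy criterion (assumption): does the text name
# a fiscal period (Q1-Q4, FY2025, "fiscal year") or an SEC filing type?
PERIOD_OR_SOURCE = re.compile(
    r"\b(?:Q[1-4]|FY\s?\d{2,4}|fiscal (?:year|quarter)|10-K|10-Q|8-K)\b",
    re.IGNORECASE,
)


def cites_period_or_source(response: str) -> bool:
    """True if the response mentions a fiscal period or a filing type."""
    return PERIOD_OR_SOURCE.search(response) is not None


def rubric_score(grades: Dict[str, float],
                 weights: Optional[Dict[str, float]] = None) -> float:
    """Weighted mean of per-dimension grades in [0, 1], scaled to 0-100."""
    if weights is None:
        weights = {d: 1.0 for d in DIMENSIONS}  # equal weights (assumption)
    for d in DIMENSIONS:
        if d not in grades:
            raise ValueError(f"missing grade for {d!r}")
        if not 0.0 <= grades[d] <= 1.0:
            raise ValueError(f"grade for {d!r} must be in [0, 1]")
    total_weight = sum(weights[d] for d in DIMENSIONS)
    weighted = sum(weights[d] * grades[d] for d in DIMENSIONS)
    return 100.0 * weighted / total_weight
```

Under these assumptions, a response that earns full marks on every dimension except the closing follow-up question would score 80.0. Equal weighting is only a neutral default; a benchmark that values, say, Accuracy more heavily would pass its own `weights`.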

Driven Surpasses General LLMs

When it comes to equity analysis, precision matters. One wrong number in an earnings model can flip a “Buy” into a “Sell.” While large language models like GPT-5 have made impressive strides in summarization and general Q&A, the question remains: can they replace purpose-built financial analysis tools?

We benchmarked Driven against other LLMs on real-world stock research tasks. Here's what we found:

Perplexity vs Driven


Take a look at how Driven and Perplexity compare in Quantitative Retrieval, Numerical Reasoning, and Financial Modeling.

Driven: 62.64
Perplexity: 42.34

Gemini 2.5 (Web) vs Driven


Here is how Driven and Gemini 2.5 (Web) compare in Quantitative Retrieval, Financial Modeling, and Complex Retrieval.

Driven: 62.64
Gemini 2.5: 44.57

GPT-5 (Web) vs Driven


Driven even outperforms GPT-5 (Web) in cost, speed, and accuracy.

Driven: 62.64
GPT-5: 60.65
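Because all three comparisons share Driven's score of 62.64, the margins can be read off directly. The short script below computes each competitor's absolute point gap and Driven's relative lead from the reported figures; it assumes the scores sit on a common scale, which the comparisons imply but do not state. By this arithmetic, the 1.99-point lead over GPT-5 (Web) is about 3.3%, while the leads over Gemini 2.5 (Web) and Perplexity are about 40.5% and 47.9%.

```python
from typing import Dict, Tuple

# Scores as reported in the comparisons above.
SCORES = {
    "Driven": 62.64,
    "GPT-5 (Web)": 60.65,
    "Gemini 2.5 (Web)": 44.57,
    "Perplexity": 42.34,
}


def gaps(scores: Dict[str, float],
         reference: str = "Driven") -> Dict[str, Tuple[float, float]]:
    """Return (absolute point gap, relative lead in %) of `reference` over each rival."""
    ref = scores[reference]
    result = {}
    for name, score in scores.items():
        if name == reference:
            continue
        absolute = round(ref - score, 2)
        relative = round(100 * (ref - score) / score, 1)  # lead as % of rival's score
        result[name] = (absolute, relative)
    return result


if __name__ == "__main__":
    for name, (points, pct) in gaps(SCORES).items():
        print(f"Driven vs {name}: +{points:.2f} points ({pct:.1f}% higher)")
```

Relative lead is measured against the rival's score, so the same absolute gap looks larger against a weaker baseline.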

What enabled Driven to achieve high performance?

When it comes to real investing work, speed and surface-level answers aren't enough. You need tools that can cut through the noise, get the source right, and give you insights in a way that actually feels like talking to a seasoned analyst. That's where Driven pulls ahead:

Reliable Document Retrieval for Accuracy

Driven doesn't just "search the web." It goes straight to the source, pulling directly from SEC 10-K, 10-Q, and other filings, while most AI models rely on secondary search results (and often retrieve the wrong document).

Figure: Driven pulling a 39-page 10-Q and a 142-page 10-K directly from the SEC.
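Retrieving a filing from the source is straightforward with SEC EDGAR's public JSON endpoints. The sketch below illustrates the idea and is not Driven's actual retrieval pipeline: it builds the EDGAR submissions URL for a company's CIK and then, given the parsed submissions JSON, selects the most recent filing of a chosen form type and constructs the URL of its primary document. The helper names are hypothetical. Network access is left out so the parsing logic can run offline; live requests to SEC endpoints should carry a descriptive User-Agent header, as SEC's fair-access guidance asks.

```python
from typing import Optional

# EDGAR endpoint templates; CIKs are zero-padded to 10 digits in the JSON API.
SUBMISSIONS_URL = "https://data.sec.gov/submissions/CIK{cik:010d}.json"
ARCHIVE_URL = "https://www.sec.gov/Archives/edgar/data/{cik}/{accession}/{document}"


def submissions_url(cik: int) -> str:
    """URL of the EDGAR submissions JSON for a company (hypothetical helper)."""
    return SUBMISSIONS_URL.format(cik=cik)


def latest_filing_url(submissions: dict, cik: int,
                      form: str = "10-K") -> Optional[str]:
    """Return the primary-document URL of the most recent filing of `form`.

    `submissions` is the parsed JSON from submissions_url(cik). Its
    filings.recent section stores parallel arrays ordered newest first,
    so the first match is the latest filing of that form.
    """
    recent = submissions["filings"]["recent"]
    for form_type, accession, document in zip(
        recent["form"], recent["accessionNumber"], recent["primaryDocument"]
    ):
        if form_type == form:
            return ARCHIVE_URL.format(
                cik=cik,
                accession=accession.replace("-", ""),  # archive paths drop dashes
                document=document,
            )
    return None  # no filing of that form in the recent window
```

In practice one would fetch `submissions_url(cik)` with `urllib.request` (setting the User-Agent header), pass `json.load(response)` to `latest_filing_url`, and then download and parse the returned document.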

Social Signals for Rapid Market Trends

Beyond news wires, Driven taps into X (Twitter) finance streams, surfacing timely chatter and sentiment that actually moves markets.

Figure: Driven's social media search tool call and the resulting social media quotation in its response.
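As a toy illustration of turning social posts into a sentiment signal, the sketch below scores each post against a tiny hand-written word list and averages the scores per $TICKER cashtag. Driven's actual sentiment method is not described here; the lexicon, the scoring rule, and the function names are assumptions, and a real system would typically use a trained sentiment model rather than a word list.

```python
import re
from collections import defaultdict
from typing import Dict, List

# Hypothetical mini-lexicon; a real system would use a trained model.
POSITIVE = {"beat", "beats", "upgrade", "bullish", "strong", "raised"}
NEGATIVE = {"miss", "missed", "downgrade", "bearish", "weak", "cut"}

TICKER = re.compile(r"\$([A-Z]{1,5})\b")  # cashtags such as $AAPL
WORD = re.compile(r"[a-z]+")


def post_score(text: str) -> int:
    """+1 per positive word, -1 per negative word (case-insensitive)."""
    words = WORD.findall(text.lower())
    return sum((w in POSITIVE) - (w in NEGATIVE) for w in words)


def sentiment_by_ticker(posts: List[str]) -> Dict[str, float]:
    """Average post score for each cashtag; a post counts once per ticker."""
    scores = defaultdict(list)
    for post in posts:
        score = post_score(post)
        for ticker in set(TICKER.findall(post)):
            scores[ticker].append(score)
    return {t: sum(s) / len(s) for t, s in scores.items()}
```

For example, two $AAPL posts, one praising a strong beat and one complaining about a weak guidance cut, average out to a neutral 0.0, which is why a real pipeline would also surface the underlying quotes rather than only the number.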

Accelerated Access through Financial Data Integration

Instead of clunky PDF crawling, Driven is optimized for parsing SEC filings and is wired into financial data APIs, so you get numbers and disclosures fast.
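Structured figures can often be read from SEC's XBRL data instead of being extracted from PDFs. The sketch below is illustrative rather than a description of Driven's internals: it takes the parsed JSON of EDGAR's company-facts endpoint, pulls one us-gaap concept, keeps only full-year durations, and, when a period is reported more than once (for example, as a prior-year comparative in a later 10-K), keeps the most recently filed value. The function name, the 350-380-day window for "full year", and the deduplication rule are assumptions; which concept a company reports (for example, `Revenues` versus `RevenueFromContractWithCustomerExcludingAssessedTax`) varies by filer.

```python
from datetime import date
from typing import Dict


def annual_values(companyfacts: dict, concept: str,
                  unit: str = "USD") -> Dict[str, float]:
    """Map fiscal-period end date -> value for full-year facts of `concept`.

    `companyfacts` is parsed JSON from
    https://data.sec.gov/api/xbrl/companyfacts/CIK##########.json
    (CIK zero-padded to 10 digits).
    """
    entries = companyfacts["facts"]["us-gaap"][concept]["units"][unit]
    best: Dict[str, dict] = {}
    for e in entries:
        if "start" not in e:
            continue  # instant facts (balance-sheet items) have no start date
        days = (date.fromisoformat(e["end"]) - date.fromisoformat(e["start"])).days
        if not 350 <= days <= 380:
            continue  # skip quarterly and other non-annual durations (assumed window)
        prev = best.get(e["end"])
        # ISO date strings compare correctly as plain strings.
        if prev is None or e["filed"] > prev["filed"]:
            best[e["end"]] = e
    return {end: e["val"] for end, e in sorted(best.items())}
```

Preferring the latest filing means restated figures replace the originally reported ones; an analyst who wants "as first reported" numbers would invert that comparison.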

Driven combines accuracy, speed, and actionable insights to deliver a reliable investment assistant, empowering investors to make informed decisions with confidence.

References

[1] Finance Agent Benchmark: Benchmarking LLMs on Real-world Financial Research Tasks
[2] A Survey of Large Language Models in Finance (FinLLMs)
[3] FinanceBench: A New Benchmark for Financial Question Answering
[4] FinLLMs: A Framework for Financial Reasoning Dataset Generation with Large Language Models
[5] FinBen: A Holistic Financial Benchmark for Large Language Models
[6] Fine-Tuning Retrieval-Augmented Generation with an Auto-Regressive Language Model for Sentiment Analysis in Financial Reviews
[7] Financial Sentiment Analysis: Techniques and Applications
[8] Incorporating Multiple Knowledge Sources for Targeted Aspect-based Financial Sentiment Analysis
[9] Can Large Language Models Beat Wall Street? Unveiling the Potential of AI in Stock Selection
[10] A Multimodal Foundation Agent for Financial Trading: Tool-Augmented, Diversified, and Generalist
[11] When AI Meets Finance (StockAgent): Large Language Model-based Stock Trading in Simulated Real-world Environments
[12] Beyond Classification: Financial Reasoning in State-of-the-Art Language Models
[13] FinTextQA: A Dataset for Long-form Financial Question Answering
[14] Trillion Dollar Words: A New Financial Dataset, Task & Market Analysis
[15] A Survey of Large Language Models for Financial Applications: Progress, Prospects and Challenges