What Is Deep Research?
Deep Research ≠ Deep Search.
You may have come across “Deep Search” features in tools like ChatGPT or Claude, designed to improve retrieval and concise answers. Whereas Deep Search focuses on retrieval efficiency and short-form answers, Deep Research is about understanding, reasoning, and synthesis: it combines adaptive planning, retrieval, analysis, and context engineering to deliver long-form, well-cited research outputs.
Think of it as moving from “find me something” → “explain and reason through this topic for me.”
Unlike simple search tasks, Deep Research requires persistence, iteration, and strategic depth, much like how a human analyst or consultant would work.
Traditional AI search systems are built to answer quick factual questions:
“What’s Salesforce’s revenue in 2024?”
But Deep Research asks:
“How is Salesforce’s revenue growth correlated with generative AI adoption in the enterprise sector, and what can we learn from competitors’ go-to-market shifts?”
This shift changes everything: the time expectation, breadth of exploration, and depth of reasoning are far greater.
Deep Research blends planning, reasoning, and writing, not just retrieval.
Key building blocks include:
- Adaptive Planning: dynamically decomposing complex research goals
- Retrieval: gathering from diverse, multimodal sources
- Analysis & Reasoning: connecting dots across evidence
- Context Engineering: curating context for LLMs to stay consistent (see this great piece)
- Long-Form Synthesis: writing coherent, well-cited reports
What Is Enterprise Deep Research, and Why Does It Matter?
In an enterprise setting, research doesn’t live in isolation. Information is scattered across:
- Internal systems such as Salesforce, Slack, Google Docs, Calendars, internal knowledge bases, etc.
- External sources such as LinkedIn, GitHub, public news, reports, web data, etc.
Enterprise Deep Research bridges both worlds, combining internal knowledge and external insights to serve strategic business goals.
Example enterprise applications include:
- Sales: Account research and competitive analysis
- Service: Emerging issue triage across support data and forums
- Marketing: Market trend synthesis from CRM and external media
- Leadership: Strategic decision briefs and forecasting
- Engineering: Benchmarking competitors’ tech stacks and repos
The outcome isn’t just answers; it’s insights that drive action.
Enterprise Deep Research reports are powerful tools that serve multiple purposes. They act like intelligent consultants, guiding every employee, from individual contributors to senior executives, in making better decisions.
These reports distill complex information into accessible insights that can fuel real-time services, accelerate answer discovery, and uncover hidden patterns or root causes behind business challenges.
Unique Challenges in Enterprise Deep Research
- Planner Intelligence
  - Does the system know where to search for what?
  - Can it balance internal vs. external data sources?
  - How does it handle time-sensitive, contradictory, or incomplete information?
  - How does it coordinate across tools like Salesforce, Slack, Google Workspace, and LinkedIn?
- Tool and Data Access
  - Are the right APIs and connectors in place?
  - Can the system parse and retrieve data accurately from structured and unstructured sources?
- Privacy and Access Control
  - Internal data isn’t open to everyone. Who’s allowed to see what?
  - How can the system respect permission hierarchies and data residency rules?
- Citation and Evaluation
  - How do we ensure every insight is traceable to its source?
  - How do we evaluate the quality of research when human expertise is uneven or fragmented?
  - How do we detect duplicated or conflicting information across systems?
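To make the access-control challenge concrete, here is a minimal sketch of permission-aware filtering applied to retrieved documents before any of them reach an LLM. Everything in it is an illustrative assumption rather than a description of a real connector: the `Document` class, the group names, and the rule that an empty `allowed_groups` means "public" are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Document:
    doc_id: str
    source: str  # illustrative labels such as "web", "crm", "slack"
    text: str
    # Hypothetical convention: an empty set means the document is public.
    allowed_groups: frozenset = field(default_factory=frozenset)


def visible_to(user_groups: set, doc: Document) -> bool:
    """A document is visible if it is public or shares a group with the user."""
    return not doc.allowed_groups or bool(doc.allowed_groups & user_groups)


def filter_by_permission(user_groups: set, docs: list) -> list:
    """Drop documents the requesting user may not see, preserving order."""
    return [d for d in docs if visible_to(user_groups, d)]


docs = [
    Document("d1", "web", "Public press release"),
    Document("d2", "crm", "Account notes", frozenset({"sales"})),
    Document("d3", "slack", "Exec channel thread", frozenset({"leadership"})),
]
visible = filter_by_permission({"sales"}, docs)
print([d.doc_id for d in visible])  # prints ['d1', 'd2']
```

Filtering at retrieval time, rather than after generation, keeps restricted text out of the model context entirely. A production system would also need to handle permission hierarchies and data residency, which this sketch does not attempt.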
An Example of Enterprise Deep Research
In one of our internal use cases for sales, we designed a modular, multi-graph architecture that mirrors how human researchers operate, dividing and conquering through specialized sub-systems that collaborate intelligently.

1. Planner Sub-Graph
The Planner is the brain of the system, decomposing high-level research goals into actionable subtasks.
- Task Input: Accepts natural language research requests (optionally paired with a predefined or LLM-generated template).
- Background Investigation: The Background Investigator Agent scans multiple data layers:
  - Public web (search, crawlers)
  - Computational tools (code execution, analysis modules)
  - Internal systems (Salesforce MCPs, CRM data connectors)
- Task Decomposition: Using findings from the background stage, the planner breaks down the problem into well-defined subtasks mapped to suitable tools.
- Subtask Execution: Each subtask is dispatched to the Orchestrator Sub-Graph, either in parallel (for independent tasks) or sequentially.
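The parallel-versus-sequential dispatch described above can be sketched as dependency-based "waves": subtasks with no unmet dependencies run together, and dependent subtasks wait for the next wave. This is a minimal illustration, not the system's actual planner. The `Subtask` class, the tool names, and `run_subtask` are hypothetical stand-ins.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass
class Subtask:
    name: str
    tool: str               # hypothetical tool label
    depends_on: tuple = ()  # names of subtasks that must finish first


def plan_waves(subtasks):
    """Group subtasks into waves; each wave only depends on earlier waves."""
    done, waves = set(), []
    pending = list(subtasks)
    while pending:
        ready = [t for t in pending if set(t.depends_on) <= done]
        if not ready:
            raise ValueError("cyclic or unsatisfiable dependencies")
        waves.append(ready)
        done |= {t.name for t in ready}
        pending = [t for t in pending if t.name not in done]
    return waves


def run_subtask(task):
    # Placeholder for handing the subtask to an orchestrator; returns a stub.
    return f"{task.name} via {task.tool}"


def execute(subtasks):
    """Run each wave in parallel and the waves themselves in sequence."""
    results = []
    with ThreadPoolExecutor(max_workers=4) as pool:
        for wave in plan_waves(subtasks):
            results.extend(pool.map(run_subtask, wave))
    return results


subtasks = [
    Subtask("account_profile", "crm_search"),
    Subtask("market_news", "web_search"),
    Subtask("competitive_summary", "enterprise_search",
            depends_on=("account_profile", "market_news")),
]
waves = plan_waves(subtasks)
print([[t.name for t in w] for w in waves])
# prints [['account_profile', 'market_news'], ['competitive_summary']]
```

Here the two independent lookups share the first wave, while the summary that needs both runs afterwards.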

2. Orchestrator Sub-Graph
The Orchestrator is the project manager of Deep Research, overseeing each subtask and synthesizing partial findings into a unified report.
- Step 1: Creates a rough outline of the final report and assigns a set of N research steps to specialized Task Researcher/Executor Sub-graphs (e.g., Public Web Researcher, Internal Salesforce Researcher, Coder, etc.).
- Step 2+: Iteratively refines the outline by analyzing the outputs from executors, identifying missing pieces, and planning the next research wave.
- Reporting: Once coverage is sufficient, the Reporter Agent consolidates findings, generates a long-form report, and attaches citations.
- Human-in-the-loop: Optional human feedback can be integrated at any stage to refine direction or validate conclusions.
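The outline-refine-report loop above can be illustrated with a small sketch. It assumes a simplified setting in which an outline is a list of section names and each research call returns either `None` or a `(text, source)` pair. The function names and the stub researcher are hypothetical, and a real orchestrator would also revise the outline itself between waves.

```python
def orchestrate(outline, research_fn, max_waves=3):
    """Run research waves until every outline section has a finding."""
    findings = {}
    for wave in range(1, max_waves + 1):
        missing = [s for s in outline if s not in findings]
        if not missing:
            break  # coverage is sufficient, stop researching
        for section in missing:
            result = research_fn(section, wave)
            if result is not None:
                findings[section] = result
    return findings


def write_report(outline, findings):
    """Assemble section text with numbered citations and a reference list."""
    lines, citations = [], []
    for section in outline:
        if section not in findings:
            lines.append(f"{section}: [no evidence found]")
            continue
        text, source = findings[section]
        if source not in citations:
            citations.append(source)
        lines.append(f"{section}: {text} [{citations.index(source) + 1}]")
    refs = [f"[{i}] {src}" for i, src in enumerate(citations, start=1)]
    return "\n".join(lines + refs)


def stub_researcher(section, wave):
    # Hypothetical executor: evidence for "Risks" only turns up in wave 2.
    if section == "Risks" and wave < 2:
        return None
    return (f"summary of {section.lower()}", f"source-for-{section.lower()}")


outline = ["Overview", "Risks"]
findings = orchestrate(outline, stub_researcher)
report = write_report(outline, findings)
print(report)
```

The second wave exists only because the first left a gap, which mirrors the "identify missing pieces, then plan the next wave" step.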

3. Task Researcher / Executor Sub-Graph
Each executor agent specializes in a single task, whether that’s querying Salesforce data, summarizing a GitHub repo, or running a code experiment.
They act as the hands of the system, executing with precision and feeding results back to the orchestrator for synthesis.
4. Tools
Our framework employs specialized tools designed to navigate distinct data landscapes. These tools act as the hands of the system, executing targeted queries to build a comprehensive view. The primary tools include:
- Web Search: This tool queries the public internet to gather external information, such as public news, market data, and competitor reports. This allows the system to incorporate a broad, external perspective into its analysis.
- Enterprise Knowledge Search: To tap into proprietary company intelligence, this tool searches internal knowledge bases like Trailhead, Highspot, and Confluence. It retrieves essential information from onboarding documents, product details, and competitive battle cards, directly addressing the need to leverage internal data.
- CRM Search: This tool connects directly to internal systems like Salesforce to obtain account details and other customer relationship data such as opportunities and conversational data.
- Conversation Search: This tool searches internal Slack conversations across channels and messages.
The Planner Sub-Graph intelligently coordinates these tools, determining whether to deploy them in parallel or sequentially based on the research task.
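One way to expose such tools behind a common interface is a small registry that tags every returned snippet with the tool that produced it, so that later citation steps can trace provenance. The sketch below is an assumption-laden illustration: the registry, the `Evidence` class, and the two stub tools are hypothetical and do not call any real search or CRM API.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Evidence:
    tool: str     # which tool produced the snippet (provenance)
    query: str
    snippet: str


ToolFn = Callable[[str], List[str]]
REGISTRY: Dict[str, ToolFn] = {}


def register(name: str):
    """Decorator that adds a tool function to the registry under `name`."""
    def decorator(fn: ToolFn) -> ToolFn:
        REGISTRY[name] = fn
        return fn
    return decorator


@register("web_search")
def web_search(query: str) -> List[str]:
    # Stub: a real tool would query the public web here.
    return [f"public result for '{query}'"]


@register("crm_search")
def crm_search(query: str) -> List[str]:
    # Stub: a real tool would query internal CRM records here.
    return [f"CRM record matching '{query}'"]


def run_tools(query: str, tool_names: List[str]) -> List[Evidence]:
    """Call each named tool in order and wrap its snippets with provenance."""
    evidence = []
    for name in tool_names:
        if name not in REGISTRY:
            raise KeyError(f"unknown tool: {name}")
        for snippet in REGISTRY[name](query):
            evidence.append(Evidence(tool=name, query=query, snippet=snippet))
    return evidence


results = run_tools("Acme Corp", ["web_search", "crm_search"])
print([e.tool for e in results])  # prints ['web_search', 'crm_search']
```

Keeping the tool name on each piece of evidence is a design choice that makes it straightforward to separate internal from external sources later.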
Enterprise Deep Research Evaluation
Having established the primary capabilities of these systems, it becomes necessary to assess Deep Research models using evaluation methods distinct from those applied to search or Q&A models.
In Deep Research, accuracy is necessary but not sufficient. A system can be 100% correct on individual data points yet fail to produce a useful strategic analysis. Therefore, our evaluation must shift.
The ultimate goal is to measure how effectively an agent can understand a complex goal, reason across disparate sources, and synthesize information into a coherent and actionable report.
Why Does Benchmarking Matter?
Without structured evaluation, Deep Research remains anecdotal to users: “it feels smarter” or “this report looks good.”
This isn’t enough for the enterprise. Enterprise business contexts demand repeatable, explainable, and quantitative benchmarks. This rigor is especially important when multiple agents (OpenAI, Gemini, Slackbot, etc.) are producing business-critical analyses.
Benchmarking Deep Research systems is not just about leaderboard scores. It’s about building trust. For any enterprise user to act on an AI-generated insight, they need clear answers to three fundamental questions:
- Traceability: Where an insight came from
- Transparency: How it was derived
- Consistency: Whether it aligns with corporate truth
Benchmark Practices in Pipeline
Our recent research, SFR-DeepResearch [1], DeepTRACE [2], HERB [3], and LiveResearchBench [4], provides complementary perspectives on how to measure these capabilities:
- SFR-DeepResearch [1] (https://arxiv.org/abs/2509.06283) focuses on training autonomous single-agent researchers with an RL recipe that improves planning and tool use; it demonstrates capability gains on external reasoning benchmarks rather than proposing a new evaluation pipeline.
- DeepTRACE [2] (https://arxiv.org/abs/2509.04499) contributes an audit framework that measures traceability and factual support at the statement-citation level, turning known failure modes into eight measurable dimensions and revealing large fractions of unsupported claims across systems.
- HERB [3] (https://arxiv.org/pdf/2506.23139) benchmarks Deep Search over heterogeneous enterprise data (Slack, GitHub, meetings, docs), quantifying retrieval difficulty at scale (39k+ artifacts; answerable & unanswerable queries) and showing retrieval as a primary bottleneck for downstream reasoning.
- LiveResearchBench [4] (https://arxiv.org/pdf/2510.14240) is a benchmark of 100 expert-curated tasks spanning daily life, enterprise, and academia, each requiring extensive, dynamic, real-time web search and synthesis. It is paired with an evaluation suite covering both content- and report-level quality, including coverage, presentation, citation accuracy and association, consistency, and depth of analysis.
Together, they motivate an enterprise evaluation that spans coverage/recall, citation accuracy & auditability, reasoning coherence, and readability, while keeping a clear boundary between Deep Search (finding the right evidence) and Deep Research (planning, reasoning, and long-form synthesis).
In our internal benchmarks, inspired by these frameworks, we propose five core dimensions for enterprise evaluation:
| Dimension | What It Measures | Why It Matters |
| --- | --- | --- |
| Coverage | How broadly and deeply the agent explores relevant information across sources | Ensures completeness; critical for strategic and competitive analyses |
| Citation Accuracy & Thoroughness | Whether insights are verifiably grounded in credible internal or external evidence | Builds trust and accountability in enterprise decisions |
| Reasoning Coherence | Logical consistency and clarity of multi-step reasoning | Reflects analytical depth; shows the model “thinks” like an analyst |
| Readability & Structure | Clarity, organization, and fluency of the final report | Makes outputs usable by business stakeholders and leadership |
| Internal Knowledge Richness | How effectively the agent leverages proprietary internal enterprise data | Measures the system’s ability to combine internal truth with external insight |
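As a rough illustration of how two of these dimensions could be turned into numbers, the sketch below scores a report from a list of judged claims. The scoring rules are assumptions made for the example, not the benchmark's actual definitions: citation accuracy is taken as the share of cited claims judged supported by their source, and internal richness as the share of claims grounded in internal sources. The `CitedClaim` class and the sample data are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class CitedClaim:
    claim: str
    source_type: str  # "internal" or "external" (illustrative labels)
    supported: bool   # verdict from a human or model judge


def citation_accuracy(claims):
    """Assumed definition: fraction of cited claims judged supported."""
    if not claims:
        return 0.0
    return sum(c.supported for c in claims) / len(claims)


def internal_share(claims):
    """Assumed definition: fraction of claims grounded in internal sources."""
    if not claims:
        return 0.0
    return sum(c.source_type == "internal" for c in claims) / len(claims)


claims = [
    CitedClaim("Account renewed in Q2", "internal", True),
    CitedClaim("Competitor launched a new tier", "external", True),
    CitedClaim("Champion left the company", "internal", False),
    CitedClaim("Sector spending is rising", "external", True),
]
print(f"{citation_accuracy(claims):.0%}")  # prints 75%
print(f"{internal_share(claims):.0%}")     # prints 50%
```

Rubric-style dimensions such as coverage or readability, which are scored on a scale rather than counted, would need a separate judge and are not covered here.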
Implementation and Results in Enterprise
To validate our evaluation framework, we benchmarked several systems on a diverse set of Enterprise Deep Research reports, which are long-form reports that combine Salesforce’s internal knowledge with external data.
Based on our enterprise use case of sales report generation and motivated by the enterprise benchmark dimensions above, we select four dimensions to evaluate in our pipeline: Report Readability & Structure, Internal Richness / Alignment, Citation Accuracy, and Coverage.
Our results below show that while readability is largely a solved challenge for most modern LLM agents, enterprise grounding and traceability clearly distinguish our Deep Research system. It not only generates fluent, well-structured reports but also anchors every insight to verifiable sources and Salesforce’s internal knowledge graph: a foundation for decision-grade trust in enterprise AI.
| Evaluation Dimension | Gemini | OpenAI | SlackBot (Salesforce) | Salesforce AIR |
| --- | --- | --- | --- | --- |
| Citation Accuracy | 45.8% | 39.5% | * | 79.2% |
| Coverage | 3.28 / 5 | 2.98 / 5 | 3.28 / 5 | 3.02 / 5 |
| Internal Knowledge Richness | 42.3% | 30.2% | 61.4% | 73.7% |
| Readability & Structure | 3.6 / 5 | 3.6 / 5 | 3.6 / 5 | 3.6 / 5 |
* Inaccessible org citations.
Note: In this table, the Salesforce AIR numbers use OpenAI GPT-4.1-mini.
With trust at the core of Salesforce’s values, we took an extra step to ensure our benchmark results were not only quantitative but also human-verified. To validate consistency, we conducted a parallel evaluation with expert human annotators and achieved a Fleiss’ κ (kappa) score of 0.6, indicating strong alignment between model judgments and human evaluations.
Cases like these highlight the next frontier of Enterprise Deep Research, where secure organizational data can be accessed, reasoned over, and synthesized to power diverse business use cases.
They also showcase the need for enterprise-oriented benchmarks and frameworks that measure not just accuracy or fluency, but the real-world impact and reliability of AI-driven research.
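For readers unfamiliar with the Fleiss’ κ agreement statistic reported above, the following sketch computes it from a rating-count matrix using the standard formula: observed agreement minus chance agreement, divided by one minus chance agreement. The toy ratings are invented for illustration and are unrelated to the benchmark data. On the commonly cited Landis and Koch scale, values around 0.6 sit at the boundary between "moderate" and "substantial" agreement.

```python
def fleiss_kappa(counts):
    """Fleiss' kappa for a matrix of rating counts.

    counts[i][j] is the number of raters who assigned item i to category j.
    Every item must be rated by the same number of raters (at least two).
    """
    n_items = len(counts)
    n_raters = sum(counts[0])
    if n_raters < 2 or any(sum(row) != n_raters for row in counts):
        raise ValueError("each item needs the same number (>= 2) of ratings")

    # Per-item observed agreement, then its mean across items.
    per_item = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    p_observed = sum(per_item) / n_items

    # Chance agreement from the overall category proportions.
    n_categories = len(counts[0])
    proportions = [
        sum(row[j] for row in counts) / (n_items * n_raters)
        for j in range(n_categories)
    ]
    p_chance = sum(p * p for p in proportions)

    if p_chance == 1.0:
        return 1.0  # every rating fell in one category; agreement is trivial
    return (p_observed - p_chance) / (1.0 - p_chance)


# Toy data: 4 items, 3 raters, 2 categories (e.g. "supported" / "unsupported").
ratings = [[3, 0], [0, 3], [2, 1], [3, 0]]
print(round(fleiss_kappa(ratings), 3))  # prints 0.625
```

In this toy case the raters disagree on only one of four items, yet κ is 0.625 rather than close to 1, because the statistic discounts the agreement expected by chance.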
Conclusion
Enterprise Deep Research represents a significant leap beyond traditional search, transforming how businesses harness information. By integrating adaptive planning, diverse retrieval, sophisticated analysis, and long-form synthesis, it moves from merely answering “what” to deeply explaining “why” and “how.”
The unique challenges of enterprise settings, spanning internal and external data, access control, and robust citation, necessitate a specialized approach. Our multi-graph architecture, designed to mimic human research, addresses these complexities by orchestrating specialized agents and tools. Ultimately, the evaluation of such systems must go beyond mere accuracy to focus on trust, traceability, transparency, and consistency.
Our internal benchmarks, validated by human experts, demonstrate that our Deep Research system excels at providing accurate, well-cited, and contextually rich insights from both proprietary and public data, setting a new standard for decision-grade AI in the enterprise.
Citation
Please cite this work as:
```
Chien-Sheng Wu, Prafulla Kumar Choubey, Kung-Hsiang Huang, Jiaxin Zhang, Pranav Narayanan Venkit, “Towards Trustworthy Enterprise Deep Research”, Salesforce AI Research, Oct 2025.
```
Or use the BibTeX citation:
```
@article{wu2025sfrdeepresearch,
author = {Chien-Sheng Wu and Prafulla Kumar Choubey and Kung-Hsiang Huang and Jiaxin Zhang and Pranav Narayanan Venkit},
title = {Towards Trustworthy Enterprise Deep Research},
journal = {Salesforce AI Research: Blog},
year = {2025},
}
```

