Business Circle
Meet SCUBA: The Next Frontier in Enterprise-Agent Evaluation

By Business Circle Team | October 30, 2025 | 4 min read


In the world of AI agents that click, scroll, execute and automate, we’re moving fast from “just understand text” to “actually use software for you.” The new benchmark SCUBA tackles exactly that: how well can agents carry out real enterprise workflows inside the Salesforce platform?

What makes SCUBA stand out:

  • It’s built around the actual workflows inside the Salesforce platform.
  • It covers 300 task instances derived from real user interviews (platform admins, sales reps, and service agents).
  • The tasks test not just “does the model answer the question” but “can the model use the UI, manipulate data, trigger workflows, troubleshoot issues.”
  • It addresses a gap: existing benchmarks often focus on web navigation and software manipulation, but enterprise-software “computer use” is hard to measure. SCUBA aims to fill that.
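
To make the idea of testing actions rather than answers concrete, here is a minimal sketch of how a task instance could be represented and scored by the state an agent leaves behind. Everything in it is an illustrative assumption: the `TaskInstance` fields, the `CrmState` dictionary, and the record ID `OPP-1` are hypothetical and are not SCUBA's actual schema or evaluation harness.

```python
from dataclasses import dataclass
from typing import Callable, Dict

# Hypothetical stand-in for CRM state: record id -> field values.
CrmState = Dict[str, Dict[str, str]]


@dataclass
class TaskInstance:
    """Illustrative task record; field names are assumptions, not SCUBA's schema."""
    task_id: str
    persona: str                        # e.g. "admin", "sales", or "service"
    instruction: str                    # natural-language goal given to the agent
    check: Callable[[CrmState], bool]   # inspects the final state, not the agent's text


def evaluate(task: TaskInstance, final_state: CrmState) -> bool:
    """Score a run by the state the agent left behind."""
    return task.check(final_state)


# Hypothetical example task and a hypothetical post-run state.
task = TaskInstance(
    task_id="demo-001",
    persona="sales",
    instruction="Set the stage of opportunity OPP-1 to 'Closed Won'.",
    check=lambda s: s.get("OPP-1", {}).get("stage") == "Closed Won",
)

state_after_run: CrmState = {"OPP-1": {"stage": "Closed Won"}}
print(evaluate(task, state_after_run))
```

Scoring on final state rather than on the agent's reply is what separates “can the model use the UI” from “does the model answer the question”: a fluent answer that changes nothing in the CRM would fail this check.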

Key Takeaway: If you want agents that don’t just chat, but act in enterprise software, this is a big step.

The Business Impact

Imagine an AI assistant that can navigate your CRM, update records, launch workflows, interpret dashboard failures, and help your service team get unstuck. That’s the vision this paper leans into.

Here’s why it’s compelling:

  • Enterprise alignment: Many benchmarks are academic or consumer-web oriented. SCUBA puts the spotlight on business-critical environments (admin, sales, and service).
  • Realistic tasks: By deriving tasks from user interviews and real personas, it bridges the gap between “toy benchmark” and “live user scenario.”
  • Measurable agent performance in context: It enables evaluation of how well an agent operates within software systems, not just via text.
  • Roadmap for future AI assistants: As more organizations adopt AI to automate software use (not just analysis), benchmarks like this set expectations, highlight challenges, and direct progress.

For companies like Salesforce (and their customers) the implications are clear: better agent tooling, fewer manual clicks, faster issue resolution, and more efficient sales and service teams. For the AI community, it opens a new frontier of “task execution in UI” rather than “just text reasoning.”

Key Insights:

1. Real-world domain shift is hard

The performance drop when moving from the more generic OSWorld benchmark (which covers desktop applications) to SCUBA (CRM, enterprise workflows) is significant. The experiments chart the drop in success rates when agents move from one benchmark to the other.

Figure: Performance drop when moving from OSWorld (50 steps) to SCUBA.

2. Demonstrations help

Knowledge articles and tutorials on how to use Salesforce platforms are easily accessible. One natural question is whether AI agents can leverage this information as effectively as humans do. The experimental results reveal that:

  • Human demonstrations (showing the agent how to do a similar task) improved performance across most agents: higher success rates, lower time, lower token usage (please see the technical report for more details). However, some agents didn’t benefit as much.
  • Also, some ended up using more steps in demonstration-augmented mode (for example, due to finding “shortcuts” that the human demo didn’t show). So the design of demonstrations still matters.

Figure: Demonstrations help improve task success rates.
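
One way to picture demonstration augmentation is as a prompt-construction step: pick the stored demonstration closest to the current task and show it to the agent first. The sketch below is only an illustration under that assumption. The word-overlap similarity, the prompt wording, and the `task`/`steps` keys are hypothetical choices, not the method used in the SCUBA paper.

```python
from typing import Dict, List


def token_overlap(a: str, b: str) -> float:
    """Jaccard similarity over lowercase word sets (a deliberately simple proxy)."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    if not sa or not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)


def augment_with_demo(instruction: str, demos: List[Dict[str, str]]) -> str:
    """Prepend the most similar demonstration to the task instruction."""
    if not demos:
        return instruction
    best = max(demos, key=lambda d: token_overlap(instruction, d["task"]))
    return (
        "Demonstration of a similar task:\n"
        f"Task: {best['task']}\n"
        f"Steps: {best['steps']}\n\n"
        f"Now complete this task: {instruction}"
    )


# Hypothetical demonstrations, written only for this example.
demos = [
    {"task": "Create a new lead record",
     "steps": "Open Leads; click New; fill the form; click Save"},
    {"task": "Update the stage of an opportunity",
     "steps": "Open the opportunity; edit Stage; click Save"},
]

prompt = augment_with_demo("Update the stage of opportunity OPP-1", demos)
print(prompt)
```

Because the demonstration covers a similar task rather than the same one, the agent still has to adapt the steps, which is one reason the design of demonstrations can change how many steps an agent ends up taking.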

3. Cost, latency, and practical deployment matter

  • Success rate is not the only metric; latency (time to complete tasks) and cost (API/token costs, number of steps) are also reported. For instance, browser-use agents had high success rates but higher latency (due to API service response time and multi-agent framework design).
  • Demonstration augmentation not only improves success but can reduce time and costs (the paper reports ~13% lower time and ~16% lower cost in the demonstration-augmented setting).
  • For enterprise adoption, this matters: an agent that succeeds but is too slow or too costly may be less useful in practice.
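
The metrics above can be tracked side by side with a few lines of code. The following is a minimal sketch, assuming each agent run is logged as a hypothetical `Episode` record. The numbers are placeholders chosen so the arithmetic is easy to follow; they are not results from the paper.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Dict, List


@dataclass
class Episode:
    """One agent run on one task (hypothetical log format)."""
    success: bool
    seconds: float    # wall-clock time to finish the task
    cost_usd: float   # API/token spend for the run
    steps: int        # number of UI actions taken


def summarize(episodes: List[Episode]) -> Dict[str, float]:
    """Aggregate success rate, latency, cost, and step count over runs."""
    if not episodes:
        raise ValueError("need at least one episode")
    return {
        "success_rate": mean(1.0 if e.success else 0.0 for e in episodes),
        "mean_seconds": mean(e.seconds for e in episodes),
        "mean_cost_usd": mean(e.cost_usd for e in episodes),
        "mean_steps": mean(e.steps for e in episodes),
    }


def relative_reduction(baseline: float, augmented: float) -> float:
    """Fraction by which `augmented` is lower than `baseline`."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    return (baseline - augmented) / baseline


# Placeholder data for illustration only.
baseline = summarize([Episode(True, 100.0, 0.50, 20), Episode(False, 140.0, 0.70, 30)])
augmented = summarize([Episode(True, 90.0, 0.40, 18), Episode(True, 102.0, 0.44, 22)])

time_saved = relative_reduction(baseline["mean_seconds"], augmented["mean_seconds"])
cost_saved = relative_reduction(baseline["mean_cost_usd"], augmented["mean_cost_usd"])
print(f"time reduction: {time_saved:.0%}, cost reduction: {cost_saved:.0%}")
```

A figure such as “~13% lower time” is a relative reduction of this kind: the difference between the two settings divided by the baseline value. Reporting it next to success rate makes it harder to miss an agent that succeeds but is too slow or too costly.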

Implications for the Future of CRM Automation:



Ran Xu
Director, AI Research

Ran Xu received his Ph.D. in computer science from University at Buffalo in 2015. Currently, he leads a group of exceptional computer vision and multimodal AI researchers at Salesforce to push the boundary of research and productive AI for CRM.

Zeyuan Chen
Senior Manager, Research

Zeyuan Chen is a Senior Manager of Research at Salesforce AI Research, where he has been contributing since 2019. His work focuses on advancing computer vision, machine learning, multimodal AI, AI agents, and workflow automation through code generation and data visualization. He holds a Bachelor’s degree from Huazhong University of Science and Technology, a Master’s from Cornell University, and a Ph.D. from North Carolina State University, experiences that have shaped his journey in AI research.


