Business Circle
Marketing & Sales

Meet SCUBA: The Next Frontier in Enterprise-Agent Evaluation

By Business Circle Team | October 30, 2025 | 4 min read


In the world of AI agents that click, scroll, execute, and automate, we’re moving fast from “just understand text” to “actually use software for you.” The new benchmark SCUBA tackles exactly that: how well can agents perform real enterprise workflows inside the Salesforce platform?

What makes SCUBA stand out:

  • It’s built around the actual workflows inside the Salesforce platform.
  • It covers 300 task instances derived from real user interviews (platform admins, sales reps, and service agents).
  • The tasks test not just “does the model answer the question” but “can the model use the UI, manipulate data, trigger workflows, and troubleshoot issues.”
  • It addresses a gap: existing benchmarks often focus on web navigation and software manipulation, but enterprise-software “computer use” is hard to measure. SCUBA aims to fill that gap.
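To make “check what the agent did, not what it said” concrete, here is a minimal sketch of how a UI-agent benchmark task could be represented and scored. Everything in it is hypothetical: the `Task` class, the persona labels, and the plain dictionary standing in for CRM state are illustrative assumptions, not SCUBA’s actual schema or evaluation harness. The point it shows is that success is judged by verifying the resulting system state.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical stand-in for the CRM state an agent acts on (field -> value).
State = Dict[str, str]


@dataclass
class Task:
    """Hypothetical task record; not SCUBA's real schema."""
    task_id: str
    persona: str                    # e.g. "admin", "sales", "service"
    instruction: str                # natural-language goal given to the agent
    check: Callable[[State], bool]  # verifies the final system state


def success_rate(tasks: List[Task], final_states: Dict[str, State]) -> float:
    """Fraction of tasks whose final state passes the task's check.

    A task with no recorded final state counts as a failure.
    """
    if not tasks:
        return 0.0
    passed = sum(
        1 for t in tasks
        if t.task_id in final_states and t.check(final_states[t.task_id])
    )
    return passed / len(tasks)


def success_by_persona(tasks: List[Task], final_states: Dict[str, State]) -> Dict[str, float]:
    """Success rate broken down by the persona each task was derived from."""
    groups: Dict[str, List[Task]] = {}
    for t in tasks:
        groups.setdefault(t.persona, []).append(t)
    return {p: success_rate(ts, final_states) for p, ts in groups.items()}


if __name__ == "__main__":
    tasks = [
        Task("t1", "sales", "Set the opportunity stage to Closed Won",
             lambda s: s.get("opportunity_stage") == "Closed Won"),
        Task("t2", "service", "Move the case to the Tier 2 queue",
             lambda s: s.get("case_queue") == "Tier 2"),
    ]
    # Final states as if produced by an agent run (made up for illustration).
    states = {"t1": {"opportunity_stage": "Closed Won"}, "t2": {"case_queue": "Tier 1"}}
    print(success_rate(tasks, states))        # 0.5
    print(success_by_persona(tasks, states))  # {'sales': 1.0, 'service': 0.0}
```

Grouping by persona mirrors the benchmark’s framing around admins, sales reps, and service agents, which makes it easier to see where an agent struggles.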

Key Takeaway: If you want agents that don’t just chat but act in enterprise software, this is a big step.

The Enterprise Impact

Imagine an AI assistant that can navigate your CRM, update records, launch workflows, interpret dashboard failures, and help your service team get unstuck. That’s the vision this paper leans into.

Here’s why it’s compelling:

  • Enterprise alignment: Many benchmarks are academic or consumer-web oriented. SCUBA puts the spotlight on business-critical environments (admin, sales, and service).
  • Realistic tasks: By deriving tasks from user interviews and real personas, it bridges the gap between “toy benchmark” and “live user scenario.”
  • Measurable agent performance in context: It enables evaluation of how well an agent operates inside software systems, not just through text.
  • Roadmap for future AI assistants: As more organizations adopt AI to automate software use (not just analysis), benchmarks like this set expectations, highlight challenges, and direct progress.

For businesses like Salesforce (and their customers), the implications are clear: better agent tooling, fewer manual clicks, faster issue resolution, and more efficient sales and service teams. For the AI community, it opens a new frontier of “task execution in the UI” rather than “just text reasoning.”

Key Insights:

1. Real-world domain shift is hard

The performance drop when moving from the more generic OSWorld benchmark (which covers desktop applications) to SCUBA (CRM and enterprise workflows) is significant. The experiment charts how success rates fall when agents are moved from one benchmark to the other.

Performance drop when moving from OSWorld (50 steps) to SCUBA.
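A drop like this can be reported either as an absolute difference in success rate (percentage points) or as a relative drop. The small helper below sketches both calculations; the function name is our own, and the example rates are invented placeholders, not figures from the SCUBA paper.

```python
from typing import Tuple


def success_drop(source_rate: float, target_rate: float) -> Tuple[float, float]:
    """Return (absolute drop in percentage points, relative drop as a fraction).

    Both rates are success rates in [0, 1], e.g. on OSWorld (source)
    and on SCUBA (target).
    """
    for rate in (source_rate, target_rate):
        if not 0.0 <= rate <= 1.0:
            raise ValueError("rates must be in [0, 1]")
    if source_rate == 0.0:
        raise ValueError("source rate must be positive for a relative drop")
    diff = source_rate - target_rate
    return diff * 100, diff / source_rate


# Placeholder rates for illustration only; not results from the paper.
abs_pp, rel = success_drop(0.40, 0.10)
print(f"{abs_pp:.1f} percentage points, {rel:.0%} relative")  # 30.0 percentage points, 75% relative
```

The relative figure is often the more telling one: the same 10-point drop is minor for a strong agent but can erase most of a weak agent’s successes.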

2. Demonstrations Help

Knowledge articles and tutorials on how to use the Salesforce platform are easily accessible. One natural question is whether AI agents can leverage this information as effectively as humans do. The experiment results reveal the following:

  • Human demonstrations (showing the agent how to do a similar task) improved performance across most agents: higher success rates, less time, and lower token usage (see the technical report for details). However, some agents didn’t benefit as much.
  • Also, some agents ended up using more steps in demonstration-augmented mode (for example, due to finding “shortcuts” that the human demo didn’t show). So the design of demonstrations still matters.
Demonstrations help improve task success rates.
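One simple way to give an agent a human demonstration is to prepend a recorded step trace for a similar task to its instruction. The sketch below assumes a plain-text prompt and a hypothetical `DemoStep` format (action, target UI element, optional value); SCUBA’s actual demonstration format and agent interfaces may differ, so treat this only as an illustration of the idea.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class DemoStep:
    """One step of a hypothetical recorded human demonstration."""
    action: str      # e.g. "click", "type"
    target: str      # description of the UI element acted on
    value: str = ""  # text typed or option chosen, if any


def format_demo(title: str, steps: List[DemoStep]) -> str:
    """Render a demonstration as numbered plain-text steps."""
    lines = [f"Demonstration for a similar task: {title}"]
    for i, step in enumerate(steps, start=1):
        suffix = f' with "{step.value}"' if step.value else ""
        lines.append(f"{i}. {step.action} {step.target}{suffix}")
    return "\n".join(lines)


def build_prompt(instruction: str, demo: Optional[str] = None) -> str:
    """Prepend an optional demonstration to the task instruction."""
    parts: List[str] = []
    if demo:
        parts.append(demo)
        # Remind the agent that the live UI may not match the demo exactly.
        parts.append("Use the demonstration as guidance; the current UI may differ.")
    parts.append(f"Task: {instruction}")
    return "\n\n".join(parts)


if __name__ == "__main__":
    demo = format_demo(
        "Update a contact's phone number",
        [
            DemoStep("click", "the Contacts tab"),
            DemoStep("click", "the contact's name"),
            DemoStep("type", "the Phone field", "555-0100"),
            DemoStep("click", "Save"),
        ],
    )
    print(build_prompt("Update the billing city on the Acme account", demo))
```

Because the demo covers a similar task rather than the exact one, the agent still has to adapt; how closely the demo matches the target task is one of the design choices the results suggest matters.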

3. Cost, latency, and practical deployment matter

  • Success rate is not the only metric; latency (time to complete tasks) and cost (API/token costs, number of steps) are also reported. For instance, browser-use agents had high success rates but higher latency (due to API service response time and multi-agent framework design).
  • Demonstration augmentation not only improves success but can also reduce time and cost (the paper reports ~13% lower time and ~16% lower cost in the demonstration-augmented setting).
  • For enterprise adoption, this matters: an agent that succeeds but is too slow or too expensive may be less useful in practice.
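Reporting cost and latency alongside success rate is straightforward once each attempt is logged. The sketch below aggregates hypothetical per-run records into success rate, mean time, mean cost, and mean steps, and computes the percentage reduction between a baseline and a demonstration-augmented setting (a ~13% time reduction, for example, means augmented time is about 0.87 × baseline time). The `Run` fields are assumptions, not the paper’s logging format.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Dict, List


@dataclass
class Run:
    """Hypothetical log record for one agent attempt at one task."""
    success: bool
    seconds: float   # wall-clock time until the agent finished or gave up
    cost_usd: float  # API/token cost of the attempt
    steps: int       # number of UI actions taken


def summarize(runs: List[Run]) -> Dict[str, float]:
    """Success rate plus mean latency, cost, and step count over runs."""
    if not runs:
        raise ValueError("no runs to summarize")
    return {
        "success_rate": sum(r.success for r in runs) / len(runs),
        "mean_seconds": mean(r.seconds for r in runs),
        "mean_cost_usd": mean(r.cost_usd for r in runs),
        "mean_steps": mean(r.steps for r in runs),
    }


def pct_reduction(baseline: float, new: float) -> float:
    """Percentage reduction from baseline to new (positive means lower)."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    return (baseline - new) / baseline * 100


if __name__ == "__main__":
    # Invented runs for illustration only; not data from the paper.
    baseline = summarize([Run(True, 100.0, 0.40, 20), Run(False, 200.0, 0.60, 35)])
    augmented = summarize([Run(True, 90.0, 0.30, 18), Run(True, 180.0, 0.50, 30)])
    for key in ("mean_seconds", "mean_cost_usd", "mean_steps"):
        print(key, f"{pct_reduction(baseline[key], augmented[key]):.1f}% lower")
```

Keeping the raw per-run records, rather than only the averages, makes it possible to report the same metrics per persona or per agent later.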

Implications for the Future of CRM Automation:



Ran Xu
Director, AI Research

Ran Xu received his Ph.D. in computer science from the University at Buffalo in 2015. He currently leads a group of exceptional computer vision and multimodal AI researchers at Salesforce, pushing the boundaries of research and productive AI for CRM.



Zeyuan Chen
Senior Manager, Research

Zeyuan Chen is a Senior Manager of Research at Salesforce AI Research, where he has been contributing since 2019. His work focuses on advancing computer vision, machine learning, multimodal AI, AI agents, and workflow automation through code generation and data visualization. He holds a Bachelor’s degree from Huazhong University of Science and Technology, a Master’s from Cornell University, and a Ph.D. from North Carolina State University, experiences that have shaped his journey in AI research.

