

From Chatbots to Large Action Models: Architect’s Guide to Agentic AI That Does Work

Only 8% of customers used a chatbot during their most recent service interaction, and most of them would not use it again, according to a Gartner survey. Yet Gartner predicts agentic AI will autonomously resolve 80% of common customer service issues by 2029.

Those two numbers aren’t contradictory. They’re describing completely different technologies.

The chatbot your customers hate is a pattern-matching FAQ engine with no memory, no tools, and no ability to actually do anything. The AI agent Gartner is predicting runs on a fundamentally different architecture: Large Action Models (LAMs) that predict and execute actions, not just generate text.

Salesforce’s xLAM-1B, a 1-billion-parameter Large Action Model, outperforms GPT-3.5 Turbo on function-calling benchmarks at a small fraction of the size. (The often-quoted 1/175th figure assumes GPT-3.5 Turbo matches GPT-3’s 175 billion parameters, a count OpenAI has not published.) It doesn’t write better emails. It calls APIs, executes workflows, queries databases, and takes real business actions.

This guide walks you through the architectural spectrum from chatbot to autonomous agent, explains what makes LAMs different from LLMs, breaks down Agentforce’s Atlas Reasoning Engine, and gives you the bounded autonomy design patterns that determine whether your AI agent becomes a productivity multiplier or a liability.

LLM vs LAM: The Architecture Shift From Words to Actions

A Large Language Model predicts the next word. A Large Action Model predicts the next action. That distinction captures the most important architectural shift in enterprise AI since transformers replaced RNNs.

LLMs are trained on text corpora and optimized for content generation. LAMs are trained on API call datasets, application interaction flows, and tool-use patterns — optimized for deciding which function to call, with what parameters, in what sequence. The architecture combines neural network pattern recognition with symbolic logical reasoning (neuro-symbolic AI) in a perception → planning → action → learning loop.

| Dimension | LLM | LAM |
| --- | --- | --- |
| Output | Text and content generation | Actions, function calls, API requests |
| Core technique | Next-token prediction | Action prediction + execution |
| Architecture | Transformer on text corpora | Hybrid neural + symbolic (neuro-symbolic) |
| Training data | Text datasets (web, books) | API call datasets, app flows, RL |
| Interaction | Reactive, prompt-driven | Proactive, multi-step sequences |
| Enterprise value | Draft generation, summarization | Workflow execution, system automation |
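
To make the Output row concrete, here is a minimal Python sketch. It is not any vendor's API: the `get_order_status` tool and the hard-coded model outputs are assumptions made for illustration. The point is only that a predicted action is a named function plus arguments that a runtime can execute, while a predicted text is a string for a human to read.

```python
import json
from typing import Any, Callable

# Hypothetical backend tool; a real deployment would call an order system.
def get_order_status(order_id: str) -> dict[str, Any]:
    return {"order_id": order_id, "status": "shipped"}

# Registry of tools the runtime is allowed to execute.
TOOLS: dict[str, Callable[..., Any]] = {"get_order_status": get_order_status}

# What a text-generating model returns: prose for a human to act on.
llm_output = "You can check your order status on the Orders page."

# What an action model is assumed to return: a structured call (hard-coded here).
lam_output = json.dumps(
    {"name": "get_order_status", "arguments": {"order_id": "A-1001"}}
)

def execute_action(raw: str) -> Any:
    """Parse a predicted action and run it only if the tool is registered."""
    call = json.loads(raw)
    tool = TOOLS.get(call["name"])
    if tool is None:
        raise ValueError(f"unknown tool: {call['name']}")
    return tool(**call["arguments"])

result = execute_action(lam_output)
```

Keeping an explicit registry means the model can only trigger functions the runtime has chosen to expose, which is the simplest form of guardrail.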

The term gained mainstream traction at CES 2024 with Rabbit’s R1 device, but the underlying technology has been in development since tool-use capabilities emerged in GPT-4 and Claude. What’s changed is that specialized, smaller models now outperform generalist giants on action-execution tasks — which matters enormously for enterprise deployment where latency, cost, and reliability are non-negotiable.

Salesforce’s xLAM: The Models Powering Agentforce

Salesforce AI Research released two generations of xLAM (eXtended Large Action Model), both open-source, that demonstrate how purpose-built LAMs outperform general-purpose LLMs on enterprise action tasks.

xLAM v1 (March 2024)

The breakthrough result: xLAM-1B outperformed GPT-3.5 Turbo and Claude-3 Haiku on the Berkeley Function-Calling Leaderboard (BFCL) despite its far smaller parameter count, and its 7B sibling reached #3 overall. The larger xLAM-8x22B (Mixture of Experts architecture) secured the #1 position on BFCL v2 by a wide margin.

The secret is the training data. Salesforce’s APIGen pipeline — an automated system for generating verified function-calling datasets — collects 3,673 executable APIs across 21 categories and produces 60,000 verified training entries. Each entry passes three-stage verification: format checking, actual function execution, and semantic verification. Human evaluation of 600 samples confirmed over 95% correctness.
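
The three-stage verification described above can be sketched as a simple filter. This is an illustration of the idea, not Salesforce's APIGen code: the `convert_currency` API, the function names, and the `judge` callback (standing in for the semantic checker) are all assumptions.

```python
import json
from typing import Any, Callable, Optional

# Hypothetical executable API standing in for one of the collected APIs.
def convert_currency(amount: float, rate: float) -> float:
    return round(amount * rate, 2)

APIS: dict[str, Callable[..., Any]] = {"convert_currency": convert_currency}

def check_format(raw: str) -> Optional[dict[str, Any]]:
    """Stage 1: the sample must be valid JSON naming a known API with dict arguments."""
    try:
        call = json.loads(raw)
    except json.JSONDecodeError:
        return None
    if not isinstance(call, dict):
        return None
    name, args = call.get("name"), call.get("arguments")
    if isinstance(name, str) and name in APIS and isinstance(args, dict):
        return call
    return None

def check_execution(call: dict[str, Any]) -> tuple[bool, Any]:
    """Stage 2: actually run the call; any exception rejects the sample."""
    try:
        return True, APIS[call["name"]](**call["arguments"])
    except Exception:
        return False, None

def verify(query: str, raw: str, judge: Callable[[str, Any], bool]) -> bool:
    """Run all three stages; stage 3 delegates to a caller-supplied semantic judge."""
    call = check_format(raw)
    if call is None:
        return False
    ok, result = check_execution(call)
    if not ok:
        return False
    return judge(query, result)
```

A sample only enters the training set if `verify` returns True, so malformed calls, calls that crash, and calls that run but do not answer the query are each rejected at a different stage.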

xLAM-2 Series (April 2025)

The second generation introduced multi-turn conversation support, enabling back-and-forth dialogue with clarifying questions and tool calls across multiple turns. The xLAM-2-70B achieved a 56.2% success rate on the τ-bench benchmark, outperforming GPT-4o (52.9%) and approaching Claude 3.5 Sonnet (60.1%). The xLAM-2 series also took the #1 position on the Berkeley Function-Calling Leaderboard as of April 2025.

Why this matters for enterprises: A 1B-parameter model that outperforms GPT-3.5 on function-calling can run on-device, at the edge, or in a private cloud — with lower latency, lower cost, and no data leaving your infrastructure. The 8B and 70B variants scale up for complex multi-step workflows while maintaining the action-execution specialization that general-purpose models lack.


The Chatbot-to-Agent Spectrum: Four Architectural Tiers

Level 1 — FAQ Bot (The Chatbot Your Customers Hate)

Responds turn-by-turn in narrow conversation. Scripted, intent-based, or basic LLM-powered. No persistent memory between interactions. No access to backend systems. User does all the thinking; AI does the typing. Architecture: rule-based decision tree or single LLM call, no tool access, no state management.

Level 2 — Copilot (AI Assistant)

Assists with tasks in context. Remembers conversation history, makes suggestions, generates drafts, references CRM data. But the human must approve every action. Architecture: LLM + RAG + basic tool integration, human-in-the-loop required for all actions. Examples: Salesforce Einstein Copilot (pre-Agentforce), GitHub Copilot.

Level 3 — Autonomous Agent (This Is Agentforce)

Completes multi-step tasks autonomously. Uses tools, makes decisions, queries databases, calls APIs, and takes action — all without human intervention for routine tasks. Asks for help when it hits confidence or capability boundaries. Human sets goals and reviews outcomes. Architecture: LAM + planning + tool use + reflection + memory + guardrails.

Level 4 — Co-Worker (Emerging, 2027+)

Operates as a trusted team member. Owns outcomes, not just tasks. Proactively identifies opportunities and risks. Initiates workflows without being asked. Human provides strategy and oversight only. Architecture: multi-agent orchestration + long-term memory + goal inference + organizational context. This is where Agentforce is headed — but we’re not there yet, and any vendor claiming otherwise is overselling.

The key framing: Chatbots extend your channels by answering and routing. Copilots extend your people by reducing time-to-task. Agents extend your systems by taking action. The architectural requirements at each level are fundamentally different — you can’t bolt autonomy onto a chatbot framework.

Inside Agentforce: How the Atlas Reasoning Engine Works

Agentforce’s Atlas Reasoning Engine is the central executive system that transforms Salesforce’s platform from a copilot into an autonomous agent. Understanding its architecture is essential for anyone building on it.

The critical detail: For any given query, Atlas uses between 8 and 12 different specialized language models — including multiple LLMs, Large Action Models for function calling, an Atlas RAG module for retrieval, and APIGen for function-calling optimization. This is not a single model answering questions. It’s an orchestration engine routing subtasks to specialized models.

Atlas uses “System 2” inference-time reasoning — deliberative, slow-thinking — rather than fast “System 1” approaches. Instead of Chain-of-Thought (which is linear and fragile), Atlas employs a ReAct (Reasoning and Acting) evaluation loop that evaluates the problem-solving search space at each step, allowing branching and self-correction.

The 8-step loop: (1) Understand user intent and scope → (2) Decide what data and actions are needed → (3) Retrieve structured + unstructured data via Data Cloud → (4) Plan execution using LAMs + APIGen → (5) Evaluate response quality (self-reflection) → (6) Refine by pulling additional data if needed → (7) Execute actions via Salesforce Flows → (8) Respond via customer’s channel. The evaluate-refine loop is what separates agents from chatbots: the system checks its own work before acting.
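
The control flow of that loop, and in particular the evaluate-refine cycle before any action runs, can be sketched in a few lines of Python. This is a toy under stated assumptions, not Atlas internals: the `RECORDS` store, the keyword-based intent check, and the `is_sufficient` test are all hypothetical stand-ins for model calls.

```python
from dataclasses import dataclass, field

@dataclass
class Turn:
    query: str
    context: list[str] = field(default_factory=list)
    plan: list[str] = field(default_factory=list)
    log: list[str] = field(default_factory=list)
    response: str = ""

# Hypothetical data store; deeper retrieval returns more records.
RECORDS = {"order status": ["Order A-1001 exists", "Order A-1001 shipped Monday"]}

def retrieve(intent: str, depth: int) -> list[str]:
    return RECORDS.get(intent, [])[:depth]

def is_sufficient(context: list[str]) -> bool:
    # Toy self-check: the answer must mention a shipping state.
    return any("shipped" in item for item in context)

def run_turn(query: str, max_refinements: int = 2) -> Turn:
    turn = Turn(query)
    intent = "order status" if "order" in query.lower() else "unknown"  # (1) understand
    actions = ["lookup_order"] if intent == "order status" else []      # (2) decide
    depth = 1
    turn.context = retrieve(intent, depth)                              # (3) retrieve
    turn.plan = list(actions)                                           # (4) plan
    for _ in range(max_refinements):
        if is_sufficient(turn.context):                                 # (5) evaluate
            break
        depth += 1                                                      # (6) refine
        turn.context = retrieve(intent, depth)
        turn.log.append("refined")
    if is_sufficient(turn.context):
        turn.log.extend(f"executed:{a}" for a in turn.plan)             # (7) execute
        turn.response = turn.context[-1]                                # (8) respond
    else:
        turn.response = "escalate to human"
    return turn
```

Note that execution is gated on the final quality check: if refinement never produces a sufficient answer, no action runs and the turn is handed off instead.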

Five attributes define every Agentforce agent: Role (job description and persona), Data (what information it can access), Actions (what it can do — Flows, Apex, APIs), Guardrails (what it cannot do), and Channel (where it operates). Agent creation is declarative via YAML configuration, not imperative code.
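
One way to picture the five attributes is as a single declarative record. The Python sketch below is illustrative only: the `AgentSpec` class and its field values are assumptions, and it does not reproduce Agentforce's actual configuration schema.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class AgentSpec:
    """Hypothetical declarative description of an agent's five attributes."""
    role: str                   # job description and persona
    data: frozenset[str]        # information sources the agent may read
    actions: frozenset[str]     # operations the agent may perform
    guardrails: frozenset[str]  # operations the agent must never perform
    channel: str                # where the agent operates

    def may_run(self, action: str) -> bool:
        # Guardrails win over actions if a name appears in both sets.
        return action in self.actions and action not in self.guardrails

service_agent = AgentSpec(
    role="Service agent for order questions",
    data=frozenset({"orders", "knowledge_articles"}),
    actions=frozenset({"lookup_order", "schedule_appointment"}),
    guardrails=frozenset({"issue_refund"}),
    channel="web_chat",
)
```

Because the record is frozen, an agent's permissions cannot be changed at runtime; changing what the agent can do means publishing a new spec.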

Einstein Trust Layer wraps the entire architecture with enterprise-grade security: secure data retrieval respecting user permissions and field-level security, automatic PII masking, prompt defense guardrails blocking injection attacks, zero data retention by LLM partners, toxicity detection, and full audit trails in Data Cloud.

Andrew Ng’s Four Agentic Design Patterns (And How Agentforce Implements Each)

1. Reflection — The AI critiques its own output and iteratively improves. Agentforce implementation: Atlas’s evaluate-refine loop checks response quality at each step and re-retrieves data if the initial answer is insufficient. This is why Agentforce achieved a 2x increase in response relevance versus competitors’ DIY solutions.

2. Tool Use — LLMs are given functions they can call: APIs, database queries, email, calendar. Agentforce implementation: Actions — pre-built and custom Flows, Apex invocable methods, REST API callouts, MuleSoft integrations. xLAM models are specifically optimized for deciding which tool to call with what parameters.

3. Planning — Decomposing complex tasks into sub-tasks with dependencies. Agentforce implementation: Atlas’s System 2 reasoning breaks multi-step requests into an execution roadmap. Topics define scope boundaries; Instructions provide step-by-step guidance within each scope.

4. Multi-Agent Collaboration — Multiple specialized agents working together. Agentforce implementation: Agentforce 2.0 multi-agent orchestration with sequential pipelines (Research → Qualify → SDR), parallel fan-out, and event-driven headless agents triggered by Platform Events.
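
The sequential and parallel patterns in item 4 can be sketched with Python's `asyncio`. The three agent functions and the lead fields below are hypothetical placeholders, and event-driven triggering is not shown.

```python
import asyncio

# Hypothetical agents: each takes a lead record and returns an enriched copy.
async def research(lead: dict) -> dict:
    return {**lead, "company_size": 250}

async def qualify(lead: dict) -> dict:
    return {**lead, "qualified": lead["company_size"] >= 100}

async def sdr(lead: dict) -> dict:
    step = "book_meeting" if lead["qualified"] else "nurture"
    return {**lead, "next_step": step}

async def pipeline(lead: dict) -> dict:
    """Sequential pattern: each agent consumes the previous agent's output."""
    for agent in (research, qualify, sdr):
        lead = await agent(lead)
    return lead

async def fan_out(leads: list[dict]) -> list[dict]:
    """Parallel pattern: run the same pipeline for many leads concurrently."""
    return await asyncio.gather(*(pipeline(lead) for lead in leads))

results = asyncio.run(fan_out([{"name": "Acme"}, {"name": "Globex"}]))
```

`asyncio.gather` returns results in the order the inputs were given, so the fan-out stays deterministic even though the pipelines run concurrently.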

Bounded Autonomy: The Design Pattern That Prevents Disasters

Gartner predicts that more than 40% of agentic AI projects will be cancelled by the end of 2027, citing escalating costs, unclear business value, and inadequate risk controls. The last of these is an architecture problem: agents get too much autonomy too fast, without guardrails. Bounded autonomy is the design pattern that addresses it.

Tier 1 — Fully Autonomous (No Human Needed)

Agent handles end-to-end for routine, low-risk tasks: password resets, order status, FAQ responses, appointment scheduling. Every action is logged for audit but no human approval is required. Configure via standard Agentforce Topics with no ‘required’ flag on Actions.

Tier 2 — Human Approval Required

Agent proposes an action but requires human confirmation before execution. For refunds above a threshold, billing disputes, sensitive data changes, contract modifications. Configured via a ‘required’ flag on specific agent actions in Salesforce — the agent pauses, presents its recommendation, and waits for approval.

Tier 3 — Human Only (Escalation)

Agent transfers conversation to a human with full context preserved. Triggered by explicit customer request, low AI confidence, negative sentiment, legal/sensitive topics, or threshold amounts. Uses the pre-built ‘Escalation’ Topic with Omni-Channel routing to skill-based queues.
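
The three tiers amount to a dispatch rule on each action. Here is a minimal Python sketch under stated assumptions: the action names, the `ACTION_TIERS` mapping, and the `approve` callback are hypothetical and are not Agentforce configuration.

```python
from enum import Enum
from typing import Callable

class Tier(Enum):
    AUTONOMOUS = 1
    APPROVAL_REQUIRED = 2
    HUMAN_ONLY = 3

# Hypothetical mapping of action names to autonomy tiers.
ACTION_TIERS = {
    "reset_password": Tier.AUTONOMOUS,
    "issue_refund": Tier.APPROVAL_REQUIRED,
    "legal_inquiry": Tier.HUMAN_ONLY,
}

AUDIT_LOG: list[tuple[str, str]] = []

def handle(action: str, approve: Callable[[str], bool]) -> str:
    """Route an action by tier and record the outcome for audit."""
    # Unknown actions default to the most restrictive tier.
    tier = ACTION_TIERS.get(action, Tier.HUMAN_ONLY)
    if tier is Tier.AUTONOMOUS:
        outcome = "executed"
    elif tier is Tier.APPROVAL_REQUIRED:
        # The agent pauses here until a human decides.
        outcome = "executed" if approve(action) else "rejected"
    else:
        outcome = "escalated"
    AUDIT_LOG.append((action, outcome))
    return outcome
```

Defaulting unmapped actions to the human-only tier means a forgotten configuration entry fails safe rather than granting autonomy by accident.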

Escalation mechanisms: Sentiment-based (negative sentiment triggers escalation), confidence-based (low AI confidence triggers deferral), topic-based (legal requests auto-escalate), keyword detection (“talk to a person”), threshold-based (refund amounts above $X), and a Propensity to Escalate ML model that flags cases heading for trouble.
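
The rule-based triggers in that list can be sketched as independent checks that each contribute a reason. Every threshold, phrase, and field name below is an illustrative assumption, and the Propensity to Escalate model is not represented.

```python
from dataclasses import dataclass

@dataclass
class Signal:
    text: str
    sentiment: float      # assumed scale: -1.0 (negative) to 1.0 (positive)
    confidence: float     # assumed scale: 0.0 to 1.0
    topic: str
    refund_amount: float = 0.0

# Illustrative values only; real thresholds come from your own data.
SENTIMENT_FLOOR = -0.5
CONFIDENCE_FLOOR = 0.6
REFUND_LIMIT = 100.0
ESCALATION_TOPICS = {"legal"}
ESCALATION_PHRASES = ("talk to a person", "speak to a human")

def escalation_reasons(s: Signal) -> list[str]:
    """Return every trigger that fired; an empty list means no escalation."""
    reasons = []
    if s.sentiment < SENTIMENT_FLOOR:
        reasons.append("sentiment")
    if s.confidence < CONFIDENCE_FLOOR:
        reasons.append("confidence")
    if s.topic in ESCALATION_TOPICS:
        reasons.append("topic")
    if any(phrase in s.text.lower() for phrase in ESCALATION_PHRASES):
        reasons.append("keyword")
    if s.refund_amount > REFUND_LIMIT:
        reasons.append("threshold")
    return reasons
```

Returning all reasons rather than the first match gives the human who receives the handoff a fuller picture of why the agent stepped back.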

The golden rule: Start with Tier 2 for everything except the most routine tasks. Earn your way to Tier 1 autonomy through measured performance data. This is exactly how Salesforce’s own Customer Zero deployment evolved — from 83% resolution to 85% resolution with a handoff rate improving from 26% to under 5%.

DealerVogue: Bounded Autonomy in Production

DealerVogue, our Automotive Cloud accelerator, demonstrates bounded autonomy across all three tiers in a single agent deployment:

Tier 1 (Autonomous): Vehicle lookup, service history retrieval, appointment scheduling, parts availability check. The agent queries Automotive Cloud vehicle records and OEM inventory via Zero-Copy federation. No human needed — these are read-only operations with no financial risk.

Tier 2 (Approval Required): Warranty claim initiation, goodwill adjustments, loaner vehicle authorization. The agent assembles the claim, checks warranty coverage, calculates the recommended action — then presents it to a service advisor for approval before executing.

Tier 3 (Escalation): Lemon law inquiries, safety recall disputes, customer complaints about dealer behavior. Auto-escalated to a senior advisor with full conversation context, vehicle history, and agent’s preliminary analysis.

The result: Service advisors handle 60% fewer routine inquiries while maintaining full control over financial and legal decisions. The agent does the research; the human makes the judgment calls.

How Xillentech Builds Agentic AI That Actually Works

Every Xillentech Agentforce deployment follows the same architecture principles:

LAM-first design: We architect for action execution, not text generation. Every agent Topic maps to specific Actions (Flows, Apex, APIs) with zero overlap between Topics. If the agent can’t do something, it shouldn’t discuss it.

Data Cloud grounding: Every agent response is grounded in live enterprise data via Zero-Copy federation and DMOs. No hallucinated answers — the agent either retrieves the data or escalates.

Bounded autonomy by default: Every new agent starts at Tier 2. We measure resolution rate, handoff rate, CSAT, and error rate for 30 days before promoting any action to Tier 1 autonomy.

Trust Layer non-negotiable: PII masking, zero data retention, full audit trails, and toxicity detection are configured before the first agent goes live. These aren’t optional add-ons.
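
The promotion rule behind "bounded autonomy by default" can be written down as an explicit gate. In the Python sketch below, the 30-day window comes from the text above, while the class, function, and every numeric threshold are hypothetical examples rather than recommended values.

```python
from dataclasses import dataclass

@dataclass
class ActionMetrics:
    days_observed: int
    resolution_rate: float  # fraction of cases resolved, 0.0 to 1.0
    handoff_rate: float     # fraction handed to a human, 0.0 to 1.0
    csat: float             # assumed 1-to-5 satisfaction scale
    error_rate: float       # fraction of incorrect actions, 0.0 to 1.0

def ready_for_tier1(
    m: ActionMetrics,
    min_days: int = 30,            # observation window from the text
    min_resolution: float = 0.85,  # illustrative threshold
    max_handoff: float = 0.05,     # illustrative threshold
    min_csat: float = 4.5,         # illustrative threshold
    max_error: float = 0.01,       # illustrative threshold
) -> bool:
    """An action is promoted only if every metric clears its bar."""
    return (
        m.days_observed >= min_days
        and m.resolution_rate >= min_resolution
        and m.handoff_rate <= max_handoff
        and m.csat >= min_csat
        and m.error_rate <= max_error
    )
```

Requiring all conditions at once prevents a strong resolution rate from masking a high error rate, which is the failure mode that matters most for autonomous actions.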

What is a Large Action Model?

A Large Action Model (LAM) is an AI system designed to predict and execute actions rather than generate text. While LLMs predict the next word in a sequence, LAMs predict the next action: making API calls, navigating interfaces, and executing multi-step workflows. They combine neural network pattern recognition with symbolic reasoning in a perception-planning-action-learning loop. Salesforce’s xLAM family is a leading open-source LAM, with the 1B-parameter version outperforming GPT-3.5 Turbo on function-calling at a small fraction of the size.

What is the difference between a chatbot and an AI agent?

A chatbot responds to queries with text — it answers questions but cannot take action. An AI agent uses Large Action Models to actually execute tasks: querying databases, calling APIs, processing transactions, and completing multi-step workflows autonomously. Chatbots extend your channels (answering and routing). Copilots extend your people (reducing task time). Agents extend your systems (taking action). The architecture is fundamentally different: agents require tool use, planning, memory, reflection, and guardrails that chatbots don’t have.

How does Agentforce’s Atlas Reasoning Engine work?

Atlas is a multi-model orchestration engine that uses 8-12 specialized language models per query, including LLMs, Large Action Models, and RAG modules. It employs a ReAct (Reasoning and Acting) loop rather than linear Chain-of-Thought: understand intent, decide on data needs, retrieve data, plan execution, evaluate quality, refine if needed, execute actions, and respond. The evaluate-refine loop is what separates it from chatbots — the system checks its own work before acting. Atlas achieved a 2x increase in response relevance and 33% increase in accuracy versus competitors.

What is bounded autonomy in agentic AI?

Bounded autonomy is a design pattern where AI agents operate at different autonomy levels depending on task risk. Tier 1 (Fully Autonomous): handles routine, low-risk tasks end-to-end. Tier 2 (Human Approval): agent proposes an action but waits for human confirmation. Tier 3 (Escalation): agent transfers to a human with full context. In Agentforce, this is configured via flags on agent Actions and escalation Topics. The golden rule: start at Tier 2, earn Tier 1 through measured performance data.

What are Andrew Ng’s four agentic design patterns?

Andrew Ng identified four patterns driving agentic AI progress: (1) Reflection — the AI critiques and improves its own output. (2) Tool Use — the AI calls external functions (APIs, databases, email). (3) Planning — decomposing complex tasks into sub-tasks with dependencies. (4) Multi-Agent Collaboration — multiple specialized agents working together. Agentforce implements all four: Atlas’s evaluate-refine loop (reflection), Actions via Flows and APIs (tool use), System 2 reasoning for execution planning, and multi-agent orchestration with sequential and parallel patterns.

What is Salesforce’s xLAM and why does it matter?

xLAM (eXtended Large Action Model) is Salesforce AI Research’s open-source family of models optimized for function calling and action execution. The xLAM-1B outperforms GPT-3.5 Turbo on the Berkeley Function-Calling Leaderboard despite being far smaller. The xLAM-2 series (2025) added multi-turn conversation support and achieved #1 on BFCL. It matters because smaller, specialized models can run on-device with lower latency, lower cost, and without data leaving your infrastructure, all of which are critical for enterprise deployment.

How do I know if I need a chatbot, copilot, or AI agent?

If your use case is answering FAQs with no system access, a chatbot suffices. If your team needs AI assistance with drafting, research, and recommendations (but humans execute), a copilot fits. If you need AI to autonomously complete multi-step tasks across systems — processing returns, scheduling service, routing leads, resolving cases — you need an AI agent. Most enterprises benefit from agents for customer-facing operations and copilots for internal productivity, deployed in parallel.

Varun Patel
