Prompt Engineering for Agentforce: Building Reliable Agent Instructions for Enterprise Workflows
The difference between an Agentforce agent that resolves 40% of cases and one that resolves 84% is not the technology. It’s the prompts.
Salesforce’s Atlas Reasoning Engine is the same engine powering every Agentforce deployment. The same 8–12 specialized models. The same ReAct evaluation loop. The same Data Cloud grounding. What changes between deployments is the quality of Topics, Instructions, and Actions — the three prompt-driven components that control how the engine behaves.
Endress+Hauser discovered this firsthand: they treated Agentforce like a new employee, carefully writing and testing instructions, checking every answer, and fixing weak responses iteratively. They also found that many of their knowledge articles were outdated or too hard to understand. Garbage in, garbage out applies to AI just as much as it does to databases.
This article is a practical engineering guide to writing Agentforce prompts that work reliably at enterprise scale. Not prompt “tips.” Architecture-grade prompt engineering.

Why Prompt Quality Determines Agent Quality
In Agentforce, prompts aren’t casual instructions you type and hope for the best. They’re the control plane for an autonomous system that executes real actions in your CRM, ERP, and backend systems.
Topic classification descriptions determine which domain the agent routes a request to. A vague description causes misclassification — a billing question routed to the warranty topic wastes an entire conversation’s worth of Flex Credits.
Instructions determine how the agent handles the request within that topic. Generic instructions produce generic responses. Specific instructions produce reliable, accurate resolutions.
Action descriptions determine when the agent invokes a Flow, Apex class, or API call. Poorly described actions either never fire (the agent doesn’t know when to use them) or fire incorrectly (the agent uses the wrong tool for the job).
Every prompt component has a direct, measurable impact on resolution rate, Flex Credit consumption, and customer satisfaction. Prompt engineering for Agentforce isn’t a creative exercise. It’s a precision engineering discipline.
Topic Design: The Classification Layer
Rule 1: One domain per topic, zero overlap
The Atlas Reasoning Engine compares every incoming message against all topic names and classification descriptions to determine which topic handles it. If two topics share keywords, the engine cycles between them — burning credits and producing inconsistent results.
Bad: Topic “Order Issues” and Topic “Returns & Exchanges”. A return is itself an issue with an order, so both topics plausibly claim the same request: a customer saying “I want to return my order” could route to either.
Good: Topic “Order Status & Shipping” (handles tracking, delays, delivery confirmation) and Topic “Returns & Refunds” (handles return requests, refund processing, exchange requests). No overlapping keywords.
Rule 2: Write classification descriptions with negative constraints, not just positive ones
Don’t just describe what a topic handles. Describe what it does NOT handle. The engine uses both for classification.
Example: “This topic handles warranty claim processing including eligibility verification, claim creation, and parts ordering. This topic does NOT handle general product inquiries, pricing questions, or service scheduling.”
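One way to keep negative constraints from being forgotten is to generate every classification description from a structured definition that refuses to build without them. The sketch below is a hypothetical authoring helper, not part of Agentforce: the function names and the sentence template are assumptions, and the output is plain text you would still paste into the topic's classification description yourself.

```python
def oxford_join(items: list[str], conjunction: str) -> str:
    """Join items as 'a', 'a and b', or 'a, b, and c'."""
    if not items:
        raise ValueError("at least one item is required")
    if len(items) == 1:
        return items[0]
    if len(items) == 2:
        return f"{items[0]} {conjunction} {items[1]}"
    return f"{', '.join(items[:-1])}, {conjunction} {items[-1]}"


def build_description(scope: str, includes: list[str], excludes: list[str]) -> str:
    """Build a classification description with positive and negative scope."""
    if not excludes:
        # Enforce Rule 2: a description without exclusions is rejected.
        raise ValueError("add at least one negative constraint")
    return (
        f"This topic handles {scope} including {oxford_join(includes, 'and')}. "
        f"This topic does NOT handle {oxford_join(excludes, 'or')}."
    )


# Rebuilds the warranty example from the text above.
description = build_description(
    scope="warranty claim processing",
    includes=["eligibility verification", "claim creation", "parts ordering"],
    excludes=["general product inquiries", "pricing questions", "service scheduling"],
)
print(description)
```

Keeping the definitions as data also makes the descriptions easy to diff and review alongside other prompt changes.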
Rule 3: Use unique trigger keywords per topic
Assign 5–10 specific trigger keywords to each topic that don’t appear in any other topic. OpenTable’s implementation shows this done well: their “booking changes” topic uses keywords like “reschedule,” “modify reservation,” and “change time” that don’t overlap with their “diner issues” topic.
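Keyword uniqueness is easy to check mechanically before anything reaches Agent Builder. The following is a minimal sketch that assumes you keep each topic's trigger keywords in a plain dictionary. The “Booking Changes” keywords come from the example above; the “Diner Issues” keywords are invented for illustration and deliberately include one collision.

```python
from collections import defaultdict


def find_shared_keywords(topics: dict[str, list[str]]) -> dict[str, list[str]]:
    """Map each keyword claimed by more than one topic to the topics claiming it."""
    owners: dict[str, list[str]] = defaultdict(list)
    for topic, keywords in topics.items():
        # Normalize case and whitespace, and ignore duplicates within one topic.
        for keyword in {k.strip().lower() for k in keywords}:
            owners[keyword].append(topic)
    return {kw: sorted(names) for kw, names in owners.items() if len(names) > 1}


# Hypothetical keyword lists; only the first three keywords come from the article.
topics = {
    "Booking Changes": ["reschedule", "modify reservation", "change time"],
    "Diner Issues": ["no-show", "complaint", "Reschedule"],
}

shared = find_shared_keywords(topics)
for keyword, claimed_by in shared.items():
    print(f"'{keyword}' is claimed by: {', '.join(claimed_by)}")
```

An empty result means no two topics claim the same keyword. A check like this only catches exact matches, so near-synonyms still need a human review.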
Instructions: The Reasoning Layer
Rule 4: Write instructions as if training a brilliant new employee
The Atlas Reasoning Engine is powerful but knows nothing about your business. Every instruction should be as specific as what you’d tell a new hire on their first day.
Bad: “Check the order status and help the customer.”
Good: “Check the Shipment_Status__c field on the Order object. Compare the Expected_Delivery_Date__c against today’s date. If the shipment is more than 3 business days overdue, apologize for the delay, create a Case with Priority = High, and offer to connect the customer with a shipping specialist. If the shipment is on track, provide the tracking number and expected delivery date.”
The specific version tells the agent exactly which fields to check, what thresholds matter, what actions to take in each scenario, and what to say. The generic version leaves all of that to the model’s interpretation — which will vary across conversations.
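Writing the instruction out as a decision function is a useful way to test whether it is actually unambiguous. The sketch below is illustrative Python, not Agentforce or Apex code, and the function and class names are hypothetical. It also surfaces a gap: the example instruction does not say what to do when a shipment is 1 to 3 business days overdue. The sketch assumes that case falls through to the tracking branch, and it ignores public holidays.

```python
from dataclasses import dataclass
from datetime import date, timedelta

OVERDUE_THRESHOLD_BUSINESS_DAYS = 3  # threshold taken from the example instruction


def business_days_overdue(expected: date, today: date) -> int:
    """Count weekdays (Mon-Fri) after `expected`, up to and including `today`."""
    days = 0
    current = expected
    while current < today:
        current += timedelta(days=1)
        if current.weekday() < 5:  # 0-4 are Monday through Friday
            days += 1
    return days


@dataclass
class Decision:
    create_high_priority_case: bool
    offer_specialist: bool
    share_tracking: bool


def decide(expected_delivery: date, today: date) -> Decision:
    """Apply the example instruction's two branches."""
    overdue = business_days_overdue(expected_delivery, today)
    if overdue > OVERDUE_THRESHOLD_BUSINESS_DAYS:
        # Apologize, open a high-priority Case, offer a shipping specialist.
        return Decision(True, True, False)
    # Assumption: anything not past the threshold is treated as on track.
    return Decision(False, False, True)


# Expected Monday 3 June 2024, checked on Friday 7 June 2024.
print(decide(date(2024, 6, 3), date(2024, 6, 7)))
```

If you cannot write the function without inventing a branch, the instruction needs another sentence.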
Rule 5: Include guardrails in every instruction set
Explicitly state what the agent should NOT do. This prevents hallucinated actions.
Example: “Do NOT provide medical advice. Do NOT guarantee delivery dates. Do NOT offer discounts above 10% without supervisor approval. If the customer asks about topics outside this scope, politely explain that you’ll connect them with a specialist.”
Rule 6: Define escalation criteria precisely
OpenTable’s deflection score system is the gold standard. Every chat starts with a baseline score. Asking for help = 5 points. Requesting a rep = 10 points. All caps + extreme frustration = 20 points. The agent uses this live score to decide whether to continue, create a case, or escalate to a human. Configurable thresholds let the team adjust based on staffing levels.
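The scoring idea can be sketched in a few lines. This is not OpenTable's implementation: the point values are the ones quoted above, but the signal names, the baseline of 0, and both thresholds are assumptions chosen only to make the example run.

```python
# Point values quoted in the text; the signal names are hypothetical.
SIGNAL_POINTS = {
    "asked_for_help": 5,
    "requested_rep": 10,
    "all_caps_frustration": 20,
}


def next_step(
    signals: list[str],
    baseline: int = 0,             # assumed starting score
    case_threshold: int = 15,      # assumed threshold
    escalate_threshold: int = 30,  # assumed threshold
) -> str:
    """Return 'continue', 'create_case', or 'escalate' for the observed signals."""
    score = baseline + sum(SIGNAL_POINTS[s] for s in signals)
    if score >= escalate_threshold:
        return "escalate"
    if score >= case_threshold:
        return "create_case"
    return "continue"


print(next_step(["asked_for_help"]))
print(next_step(["asked_for_help", "requested_rep"]))
print(next_step(["requested_rep", "all_caps_frustration"]))
```

Passing the thresholds as parameters mirrors the configurable part of the design: when more human agents are available, a team could lower `escalate_threshold` so conversations hand off sooner.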
Actions: The Execution Layer
Rule 7: Every action needs input/output instructions
When you create an Agent Action (Flow, Apex, or API), you must configure instructions for every input variable and every output variable. Mark required inputs with “Require Input” and “Collect data from user.” Mark outputs with “Show in conversation.”
Without these instructions, the agent either fails to collect necessary information (and the action errors) or collects information the customer didn’t intend to provide (and trust erodes).
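A pre-deployment check can flag variables that were left without instructions. The sketch below assumes you maintain your own inventory of actions and their variables. The `ActionConfig` and `Variable` classes are hypothetical and do not correspond to Salesforce metadata types, and the action and variable names are invented.

```python
from dataclasses import dataclass, field


@dataclass
class Variable:
    name: str
    instruction: str = ""


@dataclass
class ActionConfig:
    name: str
    inputs: list[Variable] = field(default_factory=list)
    outputs: list[Variable] = field(default_factory=list)


def missing_instructions(action: ActionConfig) -> list[str]:
    """List every input or output variable whose instruction is blank."""
    missing = []
    for kind, variables in (("input", action.inputs), ("output", action.outputs)):
        for variable in variables:
            if not variable.instruction.strip():
                missing.append(f"{kind}:{variable.name}")
    return missing


# Hypothetical action with one undocumented input and one undocumented output.
create_claim = ActionConfig(
    name="Create Warranty Claim",
    inputs=[
        Variable("serial_number", "Ask the customer for the product serial number."),
        Variable("purchase_date"),
    ],
    outputs=[Variable("claim_id", "   ")],
)

print(missing_instructions(create_claim))
```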
Rule 8: Match action granularity to topic scope
If a topic handles warranty claims, the actions should map to warranty claim steps: verify eligibility, create claim, check parts, schedule service. Don’t attach a generic “update record” action to a specific topic, because the agent won’t know when or how to use it correctly.
Rule 9: Set max turn limits to prevent token burn
The “Token Burn” problem occurs when an agent keeps trying different approaches to the same question, consuming Flex Credits with every turn without resolving anything. Set max turn limits in Agent Builder (10 turns is a good default). If unresolved in 10 turns, escalate to a human. Add this escalation instruction explicitly.
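The control logic behind a turn limit is simple, and it helps to state it precisely when you write the escalation instruction. The sketch below is a stand-alone illustration, not how Agent Builder enforces the limit. The function name and return labels are hypothetical; the default of 10 turns is the one suggested above.

```python
MAX_TURNS = 10  # default suggested in the text


def conversation_outcome(
    turn_results: list[bool], max_turns: int = MAX_TURNS
) -> tuple[str, int]:
    """Classify a conversation from per-turn results (True = resolved on that turn).

    Returns the outcome label and the number of turns counted toward it.
    """
    # Only turns within the limit may resolve the conversation.
    for turn, resolved in enumerate(turn_results[:max_turns], start=1):
        if resolved:
            return "resolved", turn
    if len(turn_results) >= max_turns:
        # Limit reached without resolution: hand off instead of burning credits.
        return "escalate_to_human", max_turns
    return "in_progress", len(turn_results)


print(conversation_outcome([False, False, False, True]))
print(conversation_outcome([False] * 10))
```

Counting how often conversations end in `escalate_to_human` per topic is also a quick signal for which instructions need tightening.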
The Iteration Cycle: How to Improve Prompts Systematically
Prompt engineering is not a one-time activity. It’s a continuous improvement cycle:
Step 1 — Write: Draft Topics, Instructions, and Actions following the 9 rules above.
Step 2 — Test: Use the Agentforce Testing Center to simulate real-world conversations. Test edge cases: angry customers, ambiguous requests, out-of-scope questions, incomplete information.
Step 3 — Measure: Track resolution rate, Flex Credit consumption per topic, escalation triggers, and sentiment scores per conversation.
Step 4 — Refine: Identify low-performing topics. Tighten instructions. Add negative constraints. Improve action descriptions. Endress+Hauser used a shared checklist and nothing went live until fully reviewed.
Step 5 — Deploy: Stage changes in sandbox, validate via CI/CD, promote to production. Version-control all prompt changes.
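The Measure step above can be sketched as a small aggregation over conversation logs. This assumes you can export one record per conversation with its topic, whether it was resolved, and the Flex Credits it consumed. The `Conversation` record, its field names, and the sample numbers are all made up for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Conversation:
    topic: str
    resolved: bool
    flex_credits: float


def per_topic_metrics(conversations: list[Conversation]) -> dict[str, dict[str, float]]:
    """Compute resolution rate and average credits per conversation for each topic."""
    totals = defaultdict(lambda: {"count": 0, "resolved": 0, "credits": 0.0})
    for conversation in conversations:
        bucket = totals[conversation.topic]
        bucket["count"] += 1
        bucket["resolved"] += int(conversation.resolved)
        bucket["credits"] += conversation.flex_credits
    return {
        topic: {
            "resolution_rate": bucket["resolved"] / bucket["count"],
            "avg_credits": bucket["credits"] / bucket["count"],
        }
        for topic, bucket in totals.items()
    }


# Made-up sample data.
log = [
    Conversation("Returns & Refunds", True, 20.0),
    Conversation("Returns & Refunds", False, 40.0),
    Conversation("Order Status & Shipping", True, 20.0),
]

for topic, metrics in per_topic_metrics(log).items():
    print(topic, metrics)
```

Topics with a low resolution rate and a high average credit cost are the natural first candidates for the Refine step.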
At Xillentech, prompt engineering is not a soft skill. It’s an engineering discipline with the same rigor as Apex code review. Every Topic, Instruction, and Action goes through the same version control, peer review, and testing pipeline as our production code. Because in Agentforce, prompts are production code.
Frequently Asked Questions
Why does prompt quality matter so much in Agentforce?
In Agentforce, prompts control the autonomous behavior of an agent that executes real actions in your CRM and backend systems. Topic classification descriptions determine request routing, instructions determine reasoning and response quality, and action descriptions determine when tools fire. Vague prompts cause misclassification (wasted Flex Credits), generic responses (low resolution rates), and incorrect action execution (wrong records updated). The difference between 40% and 84% resolution is entirely attributable to prompt specificity.
What are Topics, Instructions, and Actions in Agentforce?
Topics are categories of tasks the agent handles — like job descriptions with classification keywords. Instructions are natural language directives telling the agent how to behave within each topic — what to check, what thresholds to apply, what to say, what not to do. Actions are the tools the agent uses to execute — Flows, Apex classes, prompt templates, or external API calls. Together, these three components control every aspect of agent behavior through the Atlas Reasoning Engine.
How do I prevent topic overlap in Agentforce?
Assign unique trigger keywords to each topic that don’t appear in any other topic’s classification description. Write negative constraints (“this topic does NOT handle…”) to help the engine distinguish between similar domains. Test with ambiguous customer messages to verify correct routing. If two topics share keywords, the Atlas Reasoning Engine cycles between them, burning Flex Credits without resolving. One domain per topic, zero overlap — this is the foundational rule.
How specific should Agentforce instructions be?
As specific as employee training documentation. Reference exact Salesforce field names (Shipment_Status__c, Expected_Delivery_Date__c), define numerical thresholds (more than 3 business days overdue), specify actions for each scenario (create Case with Priority = High), include what NOT to do (do not guarantee delivery dates), and define escalation criteria with scoring systems. Endress+Hauser treated their agent like a new hire: checking every answer, fixing weak responses, and maintaining a shared improvement checklist.
What is the Token Burn problem?
Token Burn occurs when an Agentforce agent keeps trying different approaches to the same question, consuming Flex Credits on every turn without resolving the issue. This typically happens when instructions are too vague, when topic overlap causes classification confusion, or when actions don’t match the topic scope. Fix with three controls: set max turn limits (10 turns default), eliminate topic overlap, and write specific instructions that guide the agent to resolution quickly.
How does Xillentech approach prompt engineering for Agentforce?
Xillentech treats prompt engineering as a precision engineering discipline, not a creative exercise. Every Topic, Instruction, and Action goes through version control, peer review, and testing — the same pipeline as Apex code. We follow the 9-rule framework: one domain per topic, negative constraints in classification, unique trigger keywords, employee-grade instruction specificity, explicit guardrails, precise escalation criteria, input/output instructions for every action, action-topic scope matching, and max turn limits. Continuous improvement uses the Testing Center, resolution rate tracking, and weekly refinement cycles.
Ready to Engineer Prompts That Actually Resolve Cases?
The Atlas Reasoning Engine is the same for every Agentforce deployment. What separates 40% resolution from 84% is the prompt engineering. At Xillentech, we write Agentforce prompts with the same rigor as production code — version-controlled, peer-reviewed, tested across edge cases, and refined weekly.
