
Zero-Copy Architecture in Salesforce Data Cloud: The Definitive Guide

In Q3 FY2026, Salesforce Data Cloud ingested 32 trillion records in a single quarter. Of those, 15 trillion flowed through Zero-Copy connectors — a 341% year-over-year surge.

That ratio tells you everything: nearly half of all enterprise data entering Data Cloud never actually enters Data Cloud. It stays exactly where it is — in Snowflake, Databricks, BigQuery, or Redshift — and gets queried in place.

This is Zero-Copy architecture. And it’s fundamentally changing how enterprises think about data integration.

The traditional approach (extract data from your warehouse, transform it, load it into Salesforce, maintain the pipeline, reconcile the copies) costs 2,000 Data Cloud credits per million records. Zero-Copy federation costs 70 credits per million. That’s roughly a 28.6x reduction in per-row credit costs.

But the cost savings aren’t even the biggest win. The biggest win is that your data is always current, always governed at the source, and never duplicated. No stale copies. No sync conflicts. No ETL pipelines to maintain at 3 AM.

This guide covers how Zero-Copy works under the hood, the three federation patterns you need to understand, which platforms are supported, when to use it versus traditional integration, how it powers Agentforce agents, and a practical implementation checklist.


What Zero-Copy Actually Means (The Architecture Under the Hood)

Zero-Copy is a bidirectional data federation technology built into Salesforce Data Cloud. Instead of moving data between systems, it uses advanced metadata management and query pushdown to access external tables without creating, persisting, or hosting copies.

The foundation enabling this is Apache Iceberg — the open table format that underpins Data Cloud’s internal lakehouse. Salesforce Engineering reports that Data Cloud currently manages 4 million Iceberg tables spanning 50 petabytes of data, powered by Spark, Hyper, and Trino processing engines.

Because Iceberg is an open standard, Data Cloud’s query engines can read external Iceberg tables natively. This is what makes storage-level file federation possible without proprietary lock-in. Your Snowflake data stays in Snowflake’s format. Your Databricks data stays in Delta Lake. Data Cloud reads both through the Iceberg abstraction.

The practical impact: a customer action at 8 PM — an abandoned cart, a warranty claim, a service request — is immediately available to Agentforce agents, Marketing Cloud journeys, and Service Cloud workflows. No waiting for the next-day batch sync. No reconciling stale copies. The data is live because it was never copied in the first place.

Three Zero-Copy Patterns: Query Federation, File Federation & Data Sharing

Pattern 1: Query Federation (Data-In via External Compute)

Data Cloud sends SQL queries via JDBC to the external system’s compute layer. Snowflake’s engine, Databricks SQL, BigQuery, or Redshift executes the query and returns only the result set. This is the simplest pattern to set up and supports all four major platforms.
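To make the pushdown idea concrete, here is a minimal sketch that uses an in-memory SQLite database as a stand-in for the external warehouse. The table, the column names, and the `federated_query` helper are assumptions made for illustration; they are not Data Cloud or Snowflake APIs. The point is only that the filter and aggregation run where the data lives, and just the result set comes back.

```python
import sqlite3


def make_external_warehouse() -> sqlite3.Connection:
    """Stand-in for the external warehouse; purely illustrative data."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (order_id INTEGER, region TEXT, amount REAL)")
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?, ?)",
        [(1, "EMEA", 120.0), (2, "APAC", 75.5), (3, "EMEA", 310.0), (4, "AMER", 42.0)],
    )
    return conn


def federated_query(warehouse: sqlite3.Connection, sql: str, params=()):
    """Hypothetical helper: push the SQL down and return only the result set."""
    return warehouse.execute(sql, params).fetchall()


warehouse = make_external_warehouse()

# The WHERE clause and the SUM are evaluated by the external engine.
# The caller never sees the underlying rows, only the aggregate.
rows = federated_query(
    warehouse,
    "SELECT region, SUM(amount) FROM orders WHERE region = ? GROUP BY region",
    ("EMEA",),
)
print(rows)
```

Four rows sit in the simulated warehouse, but only a single aggregated row crosses the boundary, which is the behavior Query Federation relies on.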

Two sub-modes: Live Query (pure zero-copy, nothing persists, 70 credits per million rows) and Cached Acceleration (temporarily caches results in Data Cloud with configurable refresh from 15 minutes to 7 days, 2,000 credits per million rows). Use Live for real-time dashboards. Use Cached for frequently-read, semi-static data.
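One way to picture Cached Acceleration is as a cache with a bounded refresh interval. The sketch below assumes that reading; the `CachedResult` class and its method names are hypothetical, and only the 15-minute and 7-day bounds come from the description above.

```python
from datetime import datetime, timedelta

MIN_REFRESH = timedelta(minutes=15)  # lower bound stated above
MAX_REFRESH = timedelta(days=7)      # upper bound stated above


class CachedResult:
    """Hypothetical cache wrapper; not a Data Cloud API."""

    def __init__(self, fetch, refresh_every: timedelta):
        if not (MIN_REFRESH <= refresh_every <= MAX_REFRESH):
            raise ValueError("refresh interval must be between 15 minutes and 7 days")
        self._fetch = fetch
        self._refresh_every = refresh_every
        self._value = None
        self._fetched_at = None

    def get(self, now: datetime):
        """Return the cached value, refetching from the source once it is stale."""
        stale = self._fetched_at is None or now - self._fetched_at >= self._refresh_every
        if stale:
            self._value = self._fetch()
            self._fetched_at = now
        return self._value


source_calls = []


def fetch_from_source():
    """Simulated round trip to the external system."""
    source_calls.append(1)
    return f"result v{len(source_calls)}"


cache = CachedResult(fetch_from_source, refresh_every=timedelta(hours=1))
t0 = datetime(2025, 1, 1, 8, 0)
first = cache.get(t0)                           # first read goes to the source
second = cache.get(t0 + timedelta(minutes=30))  # within the interval: served from cache
third = cache.get(t0 + timedelta(hours=2))      # past the interval: refetched
```

A Live Query, by contrast, would correspond to calling `fetch_from_source` on every read, which keeps results current at the cost of one external round trip per access.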

Pattern 2: File Federation (Data-In via External Storage)

The newer and more powerful approach. Instead of querying the external compute layer, Data Cloud’s Hyper engine reads Parquet data files directly from external storage — bypassing the external system’s compute entirely. It uses the Iceberg REST Catalog to understand data layout, then accesses files in S3 or Azure Blob.

File Federation eliminates dual compute billing, supports change data feeds natively (enabling Data Actions without caching), and delivers near-native latencies when data and compute are co-located. Salesforce now recommends File Federation over Query Federation wherever the external platform supports it. Currently GA for Databricks and generic Iceberg catalogs.
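The metadata-first read path can be sketched with a toy catalog. This is a simplified model that assumes Iceberg-style per-file column bounds; the `CATALOG` dictionary, the `DataFile` class, the bucket paths, and `plan_scan` are all invented for illustration and do not reflect the actual Iceberg REST Catalog API or Hyper internals.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataFile:
    path: str
    min_order_date: str  # ISO dates compare correctly as plain strings
    max_order_date: str


# Hypothetical catalog response: table name -> data files with column bounds.
CATALOG = {
    "sales.orders": [
        DataFile("s3://example-bucket/orders/2024-q4.parquet", "2024-10-01", "2024-12-31"),
        DataFile("s3://example-bucket/orders/2025-q1.parquet", "2025-01-01", "2025-03-31"),
        DataFile("s3://example-bucket/orders/2025-q2.parquet", "2025-04-01", "2025-06-30"),
    ]
}


def plan_scan(table: str, start: str, end: str) -> list[str]:
    """Return only the files whose date range overlaps [start, end]."""
    return [
        f.path
        for f in CATALOG[table]
        if f.max_order_date >= start and f.min_order_date <= end
    ]


# Only files that can contain matching rows are opened; the rest are skipped
# using metadata alone, with no call to the external system's compute layer.
files_to_read = plan_scan("sales.orders", "2025-02-15", "2025-04-10")
print(files_to_read)
```

In this toy example the query window touches two of the three files, so the oldest file is never read.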

Pattern 3: Data Sharing (Data-Out to External Systems)

The reverse direction. External systems query Data Cloud’s enriched, unified data — profiles, segments, calculated insights — without outbound ETL. Snowflake uses Secure Data Sharing. Databricks uses Delta Sharing and Unity Catalog. BigQuery uses Analytics Hub. At 800 credits per million rows, it’s costlier than inbound federation but eliminates outbound pipeline maintenance entirely.

| Feature | Query Federation | File Federation | Data Sharing |
| --- | --- | --- | --- |
| Direction | Data-In | Data-In | Data-Out |
| Compute used | External engine | Data Cloud Hyper | External engine |
| Credits per 1M rows | 70 (Live) / 2,000 (Cached) | 70 | 800 |
| Change feeds | Not supported (Live) | Supported natively | N/A |
| Best for | Simple queries, all platforms | Large datasets, same region | Sharing enriched data out |
| Platforms | Snowflake, Databricks, BigQuery, Redshift | Databricks, Iceberg catalogs | Snowflake, Databricks, BigQuery |

The Zero-Copy Partner Network: Every Major Cloud Platform

Salesforce launched the Zero-Copy Partner Network on April 25, 2024, at Salesforce World Tour NYC. The network now includes Snowflake (first partner, announced Dreamforce 2022), Google BigQuery (GA early 2024), Databricks (query + file federation), Amazon Redshift (query federation via Glue Data Catalog), Microsoft Fabric (under development), and IBM.

ISV Data Kit partners — companies distributing enrichment data through Zero-Copy — include Dun & Bradstreet, Moody’s, ZoomInfo, Workday, and The Weather Company. These partners deliver data directly to Data Cloud customers without any data movement. SI partners include Accenture, Deloitte, PwC, Capgemini, and Wipro.

Each platform uses native authentication: Snowflake via RSA key pairs, Databricks through Unity Catalog with credential vending, Redshift through Glue and Lake Formation, BigQuery through Analytics Hub. Private Connect is available for secure access to sources locked in an AWS VPC.


How Zero-Copy Powers Agentforce Agents

Data Cloud is the heart of Agentforce’s intelligence. Here’s the chain: external data connects to Data Cloud via Zero-Copy, creating External Data Lake Objects (DLOs). These map to Data Model Objects (DMOs) — the canonical business model with 89+ standard objects. Agentforce’s Atlas Reasoning Engine queries DMOs through auto-launched Flows, grounding every AI response in live enterprise data.
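The DLO-to-DMO step is essentially a field mapping from source-specific columns to one canonical model. The sketch below illustrates that idea only: the object names, field names, and the `harmonize` helper are hypothetical and do not correspond to real Data Cloud metadata or APIs.

```python
# Hypothetical mappings; every object and field name here is invented for illustration.
DLO_TO_DMO = {
    "snowflake_warranty": {
        "dmo": "WarrantyClaim",
        "fields": {"CUST_ID": "customer_id", "CLAIM_TS": "claimed_at", "STATUS_CD": "status"},
    },
    "databricks_loyalty": {
        "dmo": "LoyaltyMember",
        "fields": {"member_key": "customer_id", "tier_name": "tier"},
    },
}


def harmonize(dlo_name: str, row: dict) -> dict:
    """Rename a source row's columns to the canonical field names of its target DMO."""
    mapping = DLO_TO_DMO[dlo_name]
    record = {canonical: row[source] for source, canonical in mapping["fields"].items()}
    record["_dmo"] = mapping["dmo"]
    return record


claim = harmonize(
    "snowflake_warranty",
    {"CUST_ID": "C-42", "CLAIM_TS": "2025-03-01T20:00:00Z", "STATUS_CD": "OPEN"},
)
member = harmonize("databricks_loyalty", {"member_key": "C-42", "tier_name": "Gold"})

# Both records now share the same key name, so a consumer can join them
# without knowing which external system each one came from.
same_customer = claim["customer_id"] == member["customer_id"]
```

Once both sources expose `customer_id` under the same name, anything that reads the canonical model sees one customer rather than two unrelated source rows.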

The critical insight: an Agentforce agent can access real-time warranty data from Snowflake, loyalty data from Databricks, and CRM data from Sales Cloud through a single unified model — without any of these sources needing to move data. This is exactly how DealerVogue works: the agent queries OEM parts inventory via Zero-Copy while simultaneously accessing Automotive Cloud vehicle records natively.

Heathrow Airport demonstrates this at scale: Zero-Copy unifies millions of passenger records across Databricks, Azure, and Smart IVR databases, providing Agentforce’s “Hallie” agent with a complete passenger view — achieving 90% chat resolution without human transfer.

When NOT to Use Zero-Copy (5 Scenarios Where ETL Still Wins)

1. Sub-second, high-frequency read patterns. Live Query performance depends on the external system: if Snowflake is slow, your queries are slow. CRM Analytics (CRMA) dashboards that fire 10–12 concurrent queries can saturate the external system’s concurrency limits. Use physical ingestion or Cached Acceleration for these workloads.

2. Complex pre-consumption transformations. External data must map to Salesforce’s Customer 360 Data Model. If transformation logic is heavy, design it outside Data Cloud first. Not all Data Cloud features work with federated data.

3. Availability-dependent workloads. If the external source goes down, data access is disrupted — there’s no local fallback. Organizations requiring continuity during external outages need a local copy.

4. Trigger-dependent workflows (without File Federation). In Live Query mode, zero-copy calls are initiated from Salesforce to the external system — external data changes cannot trigger Salesforce actions. File Federation with change data feeds solves this, but only for supported platforms.

5. High-volume, frequently-queried datasets. Federation credits are charged per access — every query consumes credits, not just initial ingestion. Combined with dual billing from the external provider, frequently queried large datasets can become more expensive than one-time batch ingestion.

The optimal architecture combines both: federate the long tail of external data queried infrequently, ingest the high-velocity streams that power sub-second experiences, and use File Federation wherever Iceberg tables are available to minimize cost and latency.
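Scenario 5 can be turned into a rough break-even estimate. The sketch below uses only the credit rates quoted in this guide and makes simplifying assumptions that are worth stating: every query scans the full table, ingestion is paid once with no re-ingestion of changes, and external compute and storage charges are ignored. The function names are illustrative.

```python
FEDERATION_CREDITS_PER_M = 70    # Live Query / File Federation rate quoted in this guide
INGESTION_CREDITS_PER_M = 2_000  # batch ingestion rate quoted in this guide


def federation_credits(rows_millions: int, full_scans: int) -> int:
    """Credits for scanning the whole table `full_scans` times via federation."""
    return rows_millions * FEDERATION_CREDITS_PER_M * full_scans


def ingestion_credits(rows_millions: int) -> int:
    """Credits for ingesting the table once (assumes no re-ingestion)."""
    return rows_millions * INGESTION_CREDITS_PER_M


def break_even_scans() -> int:
    """Smallest number of full scans at which federation costs more than ingestion."""
    return INGESTION_CREDITS_PER_M // FEDERATION_CREDITS_PER_M + 1


rows_millions = 500  # hypothetical 500-million-row table
threshold = break_even_scans()
print(threshold)
print(federation_credits(rows_millions, threshold - 1), ingestion_credits(rows_millions))
print(federation_credits(rows_millions, threshold), ingestion_credits(rows_millions))
```

Under these assumptions the crossover does not depend on table size, only on the ratio of the two rates. Real workloads rarely scan every row on every query, so treat the threshold as an upper-bound intuition rather than a sizing rule.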

Implementation Checklist: Getting Zero-Copy Right

1. Data Cloud license: Free provisioning available for Enterprise Edition and above (includes 250,000 credits and 1 TB storage). Verify your edition supports Data Cloud.

2. Region co-location: Deploy Data Cloud and your external platform in the same cloud region. Data locality is the single biggest performance lever.

3. Credential setup: RSA key pair for Snowflake, Unity Catalog for Databricks, Glue Data Catalog for Redshift, Analytics Hub for BigQuery.

4. Create the connection: Data Cloud Setup → External Integrations → New Connection → Select platform → Enter credentials → Verify “Active.”

5. Configure Data Streams: Select objects, assign category (Profile/Engagement/Other), set primary key, choose Live vs Cached mode.

6. Map DLOs to DMOs: Map External Data Lake Objects to standard or custom Data Model Objects through the harmonization layer.

7. Keep related tables on the same method: Mixing accelerated and live tables breaks query pushdown optimization. This is the #1 implementation mistake.

8. Monitor credit consumption: Use the Digital Wallet to track federation credit usage by connection and stream. Set alerts for unusual consumption.
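Checklist item 7 lends itself to a simple pre-deployment check. The sketch below assumes you keep your own inventory of data streams as plain tuples. The `STREAMS` list, the "join group" notion, and `mixed_mode_groups` are hypothetical conventions for illustration, not something Data Cloud exposes.

```python
from collections import defaultdict

# Hypothetical inventory: (table, join_group, mode). Tables in the same
# join group are expected to be queried together.
STREAMS = [
    ("orders", "commerce", "live"),
    ("order_items", "commerce", "cached"),
    ("customers", "profile", "live"),
    ("addresses", "profile", "live"),
]


def mixed_mode_groups(streams):
    """Return the join groups whose tables do not all use the same access mode."""
    modes = defaultdict(set)
    for _table, group, mode in streams:
        modes[group].add(mode)
    return sorted(group for group, seen in modes.items() if len(seen) > 1)


problems = mixed_mode_groups(STREAMS)
print(problems)
```

Here the `commerce` group is flagged because `orders` is live while `order_items` is cached, which is the mix the checklist warns against.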

Real-World Results: FedEx, Wyndham & Heathrow

FedEx: Implemented Zero-Copy with Databricks in under two weeks and reported a 2,000% ROI. Tasks that previously required hours of data synchronization now take seconds. The rollout enabled quote abandonment recovery, dormant account reactivation, and international expansion targeting.

Wyndham Hotels: Unified 165 million guest records across Amazon Redshift, Sabre, and Salesforce Clouds. Achieved a 90-second reduction in average handle time, a 55% improvement in franchisee-resolved cases, and a 33% reduction in days to close a case.

Heathrow Airport: Zero-Copy unifies passenger records across Databricks, Azure, and IVR databases. Agentforce’s “Hallie” agent achieves 90% chat resolution via WhatsApp for 83 million annual passengers.

How Xillentech Architects Zero-Copy for Enterprise Clients

At Xillentech, Zero-Copy is the default data architecture for every Salesforce engagement:

DealerVogue (Automotive): Zero-Copy federation queries OEM warranty systems and parts inventory in real-time. The Agentforce agent checks warranty coverage, parts availability, and schedules service — all via federated data. No ETL. No stale copies.

MedVogue (Healthcare): Patient records from EHR systems accessed via Zero-Copy, with identity resolution unifying records across providers. HIPAA compliance maintained because data never leaves the governed source system.

ConnectVogue (AppExchange): BYOK architecture leverages Zero-Copy to connect customer data sources without requiring data to enter our infrastructure — a key differentiator for enterprise security reviews.

Zero-Copy isn’t just a cost optimization. It’s an architecture decision that determines whether your Agentforce agents operate on stale snapshots or live enterprise data. We’ve made our choice. The 341% YoY adoption growth suggests the market is making the same one.

What is Zero-Copy in Salesforce Data Cloud?

Zero-Copy is a bidirectional data federation technology that allows Salesforce Data Cloud to query external data warehouses (Snowflake, Databricks, BigQuery, Amazon Redshift) without copying, moving, or duplicating the data. Instead of traditional ETL pipelines that extract and load data, Zero-Copy sends queries to external systems and receives only the results. This keeps data fresh, governed at the source, and eliminates redundant storage. Zero-Copy federation costs 70 credits per million records versus 2,000 for batch pipelines, roughly a 28.6x cost reduction.

How does Zero-Copy architecture work?

Zero-Copy uses advanced metadata management and query pushdown built on Apache Iceberg, the open table format underpinning Data Cloud’s lakehouse. Three patterns exist: Query Federation sends SQL via JDBC to the external system’s compute engine. File Federation reads Parquet files directly from external storage using Data Cloud’s Hyper engine, bypassing external compute entirely. Data Sharing exposes enriched Data Cloud data to external platforms without outbound ETL. All three patterns operate without physical data movement; only metadata references and query results traverse the network.

What is the difference between Zero-Copy and ETL?

ETL (Extract, Transform, Load) physically copies data from source systems into Salesforce, creating duplicate datasets that require storage, synchronization pipelines, and ongoing maintenance. Data is stale by hours or days. Zero-Copy leaves data in place and queries it live. ETL costs 2,000 Data Cloud credits per million rows; Zero-Copy federation costs 70. ETL requires pipeline maintenance and produces sync conflicts. Zero-Copy eliminates both. However, ETL is still preferred for high-frequency sub-second reads, complex transformations, and workloads requiring offline access.

Which platforms support Zero-Copy with Data Cloud?

The Zero-Copy Partner Network (launched April 2024) includes Snowflake (query federation + data sharing), Databricks (query + file federation + data sharing), Google BigQuery (query federation + data sharing), Amazon Redshift (query federation via Glue Data Catalog), and Microsoft Fabric (under development). ISV Data Kit partners include Dun & Bradstreet, Moody’s, ZoomInfo, Workday, and The Weather Company. Each platform uses native authentication and governance mechanisms.

What is the difference between query federation and file federation?

Query Federation sends SQL queries to the external platform’s compute engine (Snowflake, Databricks SQL) which executes the query and returns results. You pay both Salesforce credits and external compute costs. File Federation has Data Cloud’s Hyper engine read data files directly from external object storage (S3, Azure Blob), bypassing external compute entirely. File Federation eliminates dual compute billing, supports change data feeds natively, and delivers better latency when co-located. Salesforce recommends File Federation wherever supported.

How does Zero-Copy power Agentforce?

External data connects to Data Cloud via Zero-Copy, creating External Data Lake Objects (DLOs) mapped to Data Model Objects (DMOs) — the canonical business model with 89+ standard objects. Agentforce’s Atlas Reasoning Engine queries DMOs through auto-launched Flows, grounding AI responses in live enterprise data from any federated source. An agent can simultaneously access warranty data from Snowflake, CRM data from Sales Cloud, and inventory data from Databricks through a unified model — all in real-time, without ETL.

How much does Zero-Copy cost?

Zero-Copy Query Federation (Live mode) costs 70 Data Cloud credits per million rows queried. Cached Acceleration costs 2,000 credits per million rows (same as batch ingestion). File Federation costs 70 credits per million rows. Data Sharing (outbound) costs 800 credits per million rows. At $0.005 per credit ($500 per 100,000 credits), federating a billion records costs approximately $350 versus $10,000 for batch ingestion. However, organizations also pay external platform compute costs for query federation — File Federation in the same cloud region eliminates this dual billing.
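The arithmetic behind those figures can be checked with a few lines of code. This sketch uses only the rates and the per-credit price quoted above; the dictionary keys and the `cost_usd` function are illustrative names, and external platform charges are deliberately left out.

```python
from decimal import Decimal

USD_PER_CREDIT = Decimal("0.005")  # $500 per 100,000 credits, as quoted above

# Credits per million rows, as quoted above.
CREDITS_PER_MILLION = {
    "live_query": 70,
    "file_federation": 70,
    "data_sharing": 800,
    "cached_acceleration": 2_000,
    "batch_ingestion": 2_000,
}


def cost_usd(rows: int, mode: str) -> Decimal:
    """Salesforce credit cost in USD for processing `rows` rows in the given mode."""
    credits = Decimal(rows) / Decimal(1_000_000) * CREDITS_PER_MILLION[mode]
    return credits * USD_PER_CREDIT


billion = 1_000_000_000
for mode in CREDITS_PER_MILLION:
    print(f"{mode:20s} ${cost_usd(billion, mode):,.2f}")
```

`Decimal` is used instead of floats so the dollar amounts come out exact. For a billion rows, the computed values match the $350 and $10,000 figures above.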

Varun Patel
