Private Cloud Compute: A new frontier for AI privacy in the cloud

Written by Apple Security Engineering and Architecture (SEAR), User Privacy, Core Operating Systems (Core OS), Services Engineering (ASE), and Machine Learning and AI (AIML)

Apple Intelligence is the personal intelligence system that brings powerful generative models to iPhone, iPad, and Mac. For advanced features that need to reason over complex data with larger foundation models, we created Private Cloud Compute (PCC), a groundbreaking cloud intelligence system designed specifically for private AI processing. For the first time ever, Private Cloud Compute extends the industry-leading security and privacy of Apple devices into the cloud, making sure that personal user data sent to PCC isn’t accessible to anyone other than the user — not even to Apple. Built with custom Apple silicon and a hardened operating system designed for privacy, we believe PCC is the most advanced security architecture ever deployed for cloud AI compute at scale.

Apple has long championed on-device processing as the cornerstone for the security and privacy of user data. Data that exists only on user devices is by definition disaggregated and not subject to any centralized point of attack. When Apple is responsible for user data in the cloud, we protect it with state-of-the-art security in our services — and for the most sensitive data, we believe end-to-end encryption is our most powerful defense. For cloud services where end-to-end encryption is not appropriate, we strive to process user data ephemerally or under uncorrelated randomized identifiers that obscure the user’s identity.

Secure and private AI processing in the cloud poses a formidable new challenge. Powerful AI hardware in the data center can fulfill a user’s request with large, complex machine learning models — but it requires unencrypted access to the user's request and accompanying personal data. That precludes the use of end-to-end encryption, so cloud AI applications have to date employed traditional approaches to cloud security. Such approaches present a few key challenges:

  • Cloud AI security and privacy guarantees are difficult to verify and enforce. If a cloud AI service states that it does not log certain user data, there is generally no way for security researchers to verify this promise — and often no way for the service provider to durably enforce it. For example, a new version of the AI service may introduce additional routine logging that inadvertently logs sensitive user data without any way for a researcher to detect this. Similarly, a perimeter load balancer that terminates TLS may end up logging thousands of user requests wholesale during a troubleshooting session.
  • It’s difficult to provide runtime transparency for AI in the cloud. Cloud AI services are opaque: providers do not typically specify details of the software stack they are using to run their services, and those details are often considered proprietary. Even if a cloud AI service relied only on open source software, which is inspectable by security researchers, there is no widely deployed way for a user device (or browser) to confirm that the service it’s connecting to is running an unmodified version of the software that it purports to run, or to detect that the software running on the service has changed.
  • It’s challenging for cloud AI environments to enforce strong limits to privileged access. Cloud AI services are complex and expensive to run at scale, and their runtime performance and other operational metrics are constantly monitored and investigated by site reliability engineers and other administrative staff at the cloud service provider. During outages and other severe incidents, these administrators can generally make use of highly privileged access to the service, such as via SSH and equivalent remote shell interfaces. Though access controls for these privileged, break-glass interfaces may be well-designed, it’s exceptionally difficult to place enforceable limits on them while they’re in active use. For example, a service administrator who is trying to back up data from a live server during an outage could inadvertently copy sensitive user data in the process. More perniciously, criminals such as ransomware operators routinely strive to compromise service administrator credentials precisely to take advantage of privileged access interfaces and make away with user data.

When on-device computation with Apple devices such as iPhone and Mac is possible, the security and privacy advantages are clear: users control their own devices, researchers can inspect both hardware and software, runtime transparency is cryptographically assured through Secure Boot, and Apple retains no privileged access (as a concrete example, the Data Protection file encryption system cryptographically prevents Apple from disabling or guessing the passcode of a given iPhone).

However, to process more sophisticated requests, Apple Intelligence needs to be able to enlist help from larger, more complex models in the cloud. For these cloud requests to live up to the security and privacy guarantees that our users expect from our devices, the traditional cloud service security model isn't a viable starting point. Instead, we need to bring our industry-leading device security model, for the first time ever, to the cloud.

The rest of this post is an initial technical overview of Private Cloud Compute, to be followed by a deep dive after PCC becomes available in beta. We know researchers will have many detailed questions, and we look forward to answering more of them in our follow-up post.

Designing Private Cloud Compute

We set out to build Private Cloud Compute with a set of core requirements:

  1. Stateless computation on personal user data. Private Cloud Compute must use the personal user data that it receives exclusively for the purpose of fulfilling the user’s request. This data must never be available to anyone other than the user, not even to Apple staff, not even during active processing. And this data must not be retained, including via logging or for debugging, after the response is returned to the user. In other words, we want a strong form of stateless data processing where personal data leaves no trace in the PCC system.
  2. Enforceable guarantees. Security and privacy guarantees are strongest when they are entirely technically enforceable, which means it must be possible to constrain and analyze all the components that critically contribute to the guarantees of the overall Private Cloud Compute system. To use our example from earlier, it’s very difficult to reason about what a TLS-terminating load balancer may do with user data during a debugging session. Therefore, PCC must not depend on such external components for its core security and privacy guarantees. Similarly, operational requirements such as collecting server metrics and error logs must be supported with mechanisms that do not undermine privacy protections.
  3. No privileged runtime access. Private Cloud Compute must not contain privileged interfaces that would enable Apple’s site reliability staff to bypass PCC privacy guarantees, even when working to resolve an outage or other severe incident. This also means that PCC must not support a mechanism by which the privileged access envelope could be enlarged at runtime, such as by loading additional software.
  4. Non-targetability. An attacker should not be able to attempt to compromise personal data that belongs to specific, targeted Private Cloud Compute users without attempting a broad compromise of the entire PCC system. This must hold true even for exceptionally sophisticated attackers who can attempt physical attacks on PCC nodes in the supply chain or attempt to obtain malicious access to PCC data centers. In other words, a limited PCC compromise must not allow the attacker to steer requests from specific users to compromised nodes; targeting users should require a wide attack that’s likely to be detected. To understand this more intuitively, contrast it with a traditional cloud service design where every application server is provisioned with database credentials for the entire application database, so a compromise of a single application server is sufficient to access any user’s data, even if that user doesn’t have any active sessions with the compromised application server.
  5. Verifiable transparency. Security researchers need to be able to verify, with a high degree of confidence, that our privacy and security guarantees for Private Cloud Compute match our public promises. We already have an earlier requirement for our guarantees to be enforceable. Hypothetically, then, if security researchers had sufficient access to the system, they would be able to verify the guarantees. But this last requirement, verifiable transparency, goes one step further and does away with the hypothetical: security researchers must be able to verify the security and privacy guarantees of Private Cloud Compute, and they must be able to verify that the software that’s running in the PCC production environment is the same as the software they inspected when verifying the guarantees.

This is an extraordinary set of requirements, and one that we believe represents a generational leap over any traditional cloud service security model.

Introducing Private Cloud Compute nodes

The root of trust for Private Cloud Compute is our compute node: custom-built server hardware that brings the power and security of Apple silicon to the data center, with the same hardware security technologies used in iPhone, including the Secure Enclave and Secure Boot. We paired this hardware with a new operating system: a hardened subset of the foundations of iOS and macOS tailored to support Large Language Model (LLM) inference workloads while presenting an extremely narrow attack surface. This allows us to take advantage of iOS security technologies such as Code Signing and sandboxing.

On top of this foundation, we built a custom set of cloud extensions with privacy in mind. We excluded components that are traditionally critical to data center administration, such as remote shells and system introspection and observability tools. We replaced those general-purpose software components with components that are purpose-built to deterministically provide only a small, restricted set of operational metrics to SRE staff. And finally, we used Swift on Server to build a new Machine Learning stack specifically for hosting our cloud-based foundation model.

Let’s take another look at our core Private Cloud Compute requirements and the features we built to achieve them.

Stateless computation and enforceable guarantees

With services that are end-to-end encrypted, such as iMessage, the service operator cannot access the data that transits through the system. One of the key reasons such designs can assure privacy is specifically because they prevent the service from performing computations on user data. Since Private Cloud Compute needs to be able to access the data in the user’s request to allow a large foundation model to fulfill it, complete end-to-end encryption is not an option. Instead, the PCC compute node must have technical enforcement for the privacy of user data during processing, and must be incapable of retaining user data after its duty cycle is complete.

We designed Private Cloud Compute to make several guarantees about the way it handles user data:

  • A user’s device sends data to PCC for the sole, exclusive purpose of fulfilling the user’s inference request. PCC uses that data only to perform the operations requested by the user.
  • User data stays on the PCC nodes that are processing the request only until the response is returned. PCC deletes the user’s data after fulfilling the request, and no user data is retained in any form after the response is returned.
  • User data is never available to Apple — even to staff with administrative access to the production service or hardware.

When Apple Intelligence needs to draw on Private Cloud Compute, it constructs a request — consisting of the prompt, plus the desired model and inferencing parameters — that will serve as input to the cloud model. The PCC client on the user’s device then encrypts this request directly to the public keys of the PCC nodes that it has first confirmed are valid and cryptographically certified. This provides end-to-end encryption from the user’s device to the validated PCC nodes, ensuring the request cannot be accessed in transit by anything outside those highly protected PCC nodes. Supporting data center services, such as load balancers and privacy gateways, run outside of this trust boundary and do not have the keys required to decrypt the user’s request, thus contributing to our enforceable guarantees.
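
To illustrate the shape of this flow, here is a minimal sketch in Swift using CryptoKit. The type names, the choice of Curve25519 key agreement with HKDF and AES-GCM, and the way the attested measurement is bound into the key derivation are all illustrative assumptions rather than the actual PCC wire protocol or ciphersuite.

```swift
import CryptoKit
import Foundation

// Hypothetical description of a candidate PCC node, as seen by the client.
// Field names and the attestation representation are illustrative only.
struct PCCNodeKey {
    let nodePublicKey: Curve25519.KeyAgreement.PublicKey
    let attestedMeasurement: Data   // measurement the node attested to
}

// Minimal sketch of wrapping an inference request to a single validated node:
// derive a shared secret against the node's public key, expand it with HKDF,
// and seal the request with AES-GCM.
func wrapRequest(_ request: Data, to node: PCCNodeKey) throws -> (ciphertext: Data, ephemeralPublicKey: Data) {
    // Fresh ephemeral key per request, so no long-lived client secret exists.
    let ephemeral = Curve25519.KeyAgreement.PrivateKey()
    let sharedSecret = try ephemeral.sharedSecretFromKeyAgreement(with: node.nodePublicKey)

    // Bind the derived key to the node's attested measurement so the
    // ciphertext is only useful to the software release the client verified.
    let symmetricKey = sharedSecret.hkdfDerivedSymmetricKey(
        using: SHA256.self,
        salt: Data(),
        sharedInfo: node.attestedMeasurement,
        outputByteCount: 32
    )

    let sealed = try AES.GCM.seal(request, using: symmetricKey)
    // `combined` is always present when the default 12-byte nonce is used.
    return (sealed.combined!, ephemeral.publicKey.rawRepresentation)
}
```

Only the validated node holding the matching private key can derive the same symmetric key, which is what keeps load balancers and other intermediaries outside the trust boundary.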

Next, we must protect the integrity of the PCC node and prevent any tampering with the keys used by PCC to decrypt user requests. The system uses Secure Boot and Code Signing for an enforceable guarantee that only authorized and cryptographically measured code is executable on the node. All code that can run on the node must be part of a trust cache that has been signed by Apple, approved for that specific PCC node, and loaded by the Secure Enclave such that it cannot be changed or amended at runtime. This also ensures that JIT mappings cannot be created, preventing compilation or injection of new code at runtime. Additionally, all code and model assets use the same integrity protection that powers the Signed System Volume. Finally, the Secure Enclave provides an enforceable guarantee that the keys that are used to decrypt requests cannot be duplicated or extracted.

The Private Cloud Compute software stack is designed to ensure that user data is not leaked outside the trust boundary or retained once a request is complete, even in the presence of implementation errors. The Secure Enclave randomizes the data volume’s encryption keys on every reboot and does not persist these random keys, ensuring that data written to the data volume cannot be retained across reboot. In other words, there is an enforceable guarantee that the data volume is cryptographically erased every time the PCC node’s Secure Enclave Processor reboots. The inference process on the PCC node deletes data associated with a request upon completion, and the address spaces that are used to handle user data are periodically recycled to limit the impact of any data that may have been unexpectedly retained in memory.
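
One way to see why rebooting the Secure Enclave Processor cryptographically erases the data volume is the following toy sketch in Swift: if the volume key exists only in memory and is regenerated on every boot, anything sealed under the previous key becomes permanently unreadable. The types here are illustrative and unrelated to the actual Secure Enclave implementation.

```swift
import CryptoKit
import Foundation

// Toy sketch of cryptographic erasure: the volume key lives only in memory
// and is regenerated on every "boot", so data sealed under a prior key is
// permanently unreadable. The real mechanism lives in the Secure Enclave and
// protects the node's data volume; names here are illustrative.
final class EphemeralVolume {
    // Never persisted: a new random key on every instantiation ("reboot").
    private let volumeKey = SymmetricKey(size: .bits256)

    func write(_ plaintext: Data) throws -> Data {
        try AES.GCM.seal(plaintext, using: volumeKey).combined!
    }

    func read(_ stored: Data) throws -> Data {
        try AES.GCM.open(AES.GCM.SealedBox(combined: stored), using: volumeKey)
    }
}

// After a reboot, a fresh EphemeralVolume holds a different key, so any
// previously written ciphertext fails to decrypt: the data is effectively erased.
```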

Finally, for our enforceable guarantees to be meaningful, we also need to protect against exploitation that could bypass these guarantees. Technologies such as Pointer Authentication Codes and sandboxing act to resist such exploitation and limit an attacker’s horizontal movement within the PCC node. The inference control and dispatch layers are written in Swift, ensuring memory safety, and use separate address spaces to isolate initial processing of requests. This combination of memory safety and the principle of least privilege removes entire classes of attacks on the inference stack itself and limits the level of control and capability that a successful attack can obtain.

No privileged runtime access

We designed Private Cloud Compute to ensure that privileged access doesn’t allow anyone to bypass our stateless computation guarantees.

First, we intentionally did not include remote shell or interactive debugging mechanisms on the PCC node. Although our Code Signing machinery prevents such mechanisms from loading additional code, this sort of open-ended access would still provide a broad attack surface to subvert the system’s security or privacy. Beyond simply not including a shell, remote or otherwise, PCC nodes cannot enable Developer Mode and do not include the tools needed by debugging workflows.

Next, we built the system’s observability and management tooling with privacy safeguards that are designed to prevent user data from being exposed. For example, the system doesn’t even include a general-purpose logging mechanism. Instead, only pre-specified, structured, and audited logs and metrics can leave the node, and multiple independent layers of review help prevent user data from accidentally being exposed through these mechanisms. With traditional cloud AI services, such mechanisms might allow someone with privileged access to observe or collect user data.
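
One way to picture "only pre-specified, structured, and audited logs and metrics can leave the node" is an emitter whose payload is a closed enumeration of approved metrics, leaving no code path that accepts free-form strings which could carry user data. The following Swift sketch is hypothetical; the type names and fields are not Apple's actual observability tooling.

```swift
import Foundation

// Hypothetical sketch: the only values that can leave the node are members of
// a closed, audited enumeration. There is deliberately no API that accepts a
// free-form string, so request contents cannot ride along.
enum NodeMetric: Codable {
    case requestCompleted(durationMilliseconds: Int)
    case requestFailed(errorCode: Int)
    case modelLoaded(versionTag: String)   // reviewed: carries a release tag, never user data
}

struct MetricsEmitter {
    // In a real system this would ship metrics to SRE tooling; here we simply
    // encode them to show that only structured, pre-approved shapes exist.
    func emit(_ metric: NodeMetric) throws -> Data {
        try JSONEncoder().encode(metric)
    }
}
```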

Together, these techniques provide enforceable guarantees that only specifically designated code has access to user data and that user data cannot leak outside the PCC node during system administration.

Non-targetability

Our threat model for Private Cloud Compute includes an attacker with physical access to a compute node and a high level of sophistication — that is, an attacker who has the resources and expertise to subvert some of the hardware security properties of the system and potentially extract data that is being actively processed by a compute node.

We defend against this type of attack in two ways:

  1. We supplement the built-in protections of Apple silicon with a hardened supply chain for PCC hardware, so that performing a hardware attack at scale would be both prohibitively expensive and likely to be discovered.
  2. We limit the impact of small-scale attacks by ensuring that they cannot be used to target the data of a specific user.

Private Cloud Compute hardware security starts at manufacturing, where we inventory and perform high-resolution imaging of the components of the PCC node before each server is sealed and its tamper switch is activated. When the servers arrive in the data center, we perform extensive revalidation before they are allowed to be provisioned for PCC. The process involves multiple Apple teams that cross-check data from independent sources, and the process is further monitored by a third-party observer not affiliated with Apple. At the end, a certificate is issued for keys rooted in the Secure Enclave UID for each PCC node. The user’s device will not send data to any PCC nodes if it cannot validate their certificates.

These processes broadly protect hardware from compromise. To guard against smaller, more sophisticated attacks that might otherwise avoid detection, Private Cloud Compute uses an approach we call target diffusion to ensure requests cannot be routed to specific nodes based on the user or their content.

Target diffusion starts with the request metadata, which leaves out any personally identifiable information about the source device or user, and includes only limited contextual data about the request that’s required to enable routing to the appropriate model. This metadata is the only part of the user’s request that is available to load balancers and other data center components running outside of the PCC trust boundary. The metadata also includes a single-use credential, based on RSA Blind Signatures, to authorize valid requests without tying them to a specific user. Additionally, PCC requests go through an OHTTP relay — operated by a third party — which hides the device’s source IP address before the request ever reaches the PCC infrastructure. This prevents an attacker from using an IP address to identify requests or associate them with an individual. It also means that an attacker would have to compromise both the third-party relay and our load balancer to steer traffic based on the source IP address.
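
To make the shape of this concrete, here is a hedged Swift sketch of what target-diffused request metadata might contain: only routing context and a single-use, unlinkable token, with no device or account identifiers. The field names and token representation are assumptions, and the RSA Blind Signatures issuance protocol itself is out of scope of the sketch.

```swift
import Foundation

// Hedged sketch of target-diffused request metadata: everything a load
// balancer outside the trust boundary is allowed to see. Field names are
// illustrative; the real wire format is not public.
struct RoutableRequestMetadata: Codable {
    // Just enough context to route to the right model; no user or device identity.
    let modelIdentifier: String
    let approximatePayloadSize: Int

    // A single-use token authorizing the request, issued via RSA Blind
    // Signatures so the issuer cannot link it back to a specific account.
    let unlinkableAuthorizationToken: Data
}

// Everything else, including the prompt and any personal context, travels only
// in the ciphertext wrapped to validated PCC node keys, never in this metadata.
```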

User devices encrypt requests only for a subset of PCC nodes, rather than the PCC service as a whole. When asked by a user device, the load balancer returns a subset of PCC nodes that are most likely to be ready to process the user’s inference request — however, as the load balancer has no identifying information about the user or device for which it’s choosing nodes, it cannot bias the set for targeted users. By limiting the PCC nodes that can decrypt each request in this way, we ensure that if a single node were ever to be compromised, it would not be able to decrypt more than a small portion of incoming requests. Finally, the selection of PCC nodes by the load balancer is statistically auditable to protect against a highly sophisticated attack where the attacker compromises a PCC node and also obtains complete control of the PCC load balancer.

Verifiable transparency

We consider allowing security researchers to verify the end-to-end security and privacy guarantees of Private Cloud Compute to be a critical requirement for ongoing public trust in the system. Traditional cloud services do not make their full production software images available to researchers — and even if they did, there’s no general mechanism to allow researchers to verify that those software images match what’s actually running in the production environment. (Some specialized mechanisms exist, such as Intel SGX and AWS Nitro attestation.)

When we launch Private Cloud Compute, we’ll take the extraordinary step of making software images of every production build of PCC publicly available for security research. This promise, too, is an enforceable guarantee: user devices will be willing to send data only to PCC nodes that can cryptographically attest to running publicly listed software. We want to ensure that security and privacy researchers can inspect Private Cloud Compute software, verify its functionality, and help identify issues — just like they can with Apple devices.

Our commitment to verifiable transparency includes:

  1. Publishing the measurements of all code running on PCC in an append-only and cryptographically tamper-proof transparency log.
  2. Making the log and associated binary software images publicly available for inspection and validation by privacy and security experts.
  3. Publishing and maintaining an official set of tools for researchers analyzing PCC node software.
  4. Rewarding important research findings through the Apple Security Bounty program.

Every production Private Cloud Compute software image will be published for independent binary inspection — including the OS, applications, and all relevant executables, which researchers can verify against the measurements in the transparency log. Software will be published within 90 days of inclusion in the log, or after relevant software updates are available, whichever is sooner. Once a release has been signed into the log, it cannot be removed without detection, much like the log-backed map data structure used by the Key Transparency mechanism for iMessage Contact Key Verification.
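
As a flavor of what "cannot be removed without detection" means in practice, the following Swift sketch verifies a Merkle-style inclusion proof against a published log root using SHA-256 from CryptoKit. This is the generic technique behind log-backed transparency structures, not Apple's actual log format.

```swift
import CryptoKit
import Foundation

// Generic Merkle inclusion-proof check: recompute the hash path from a leaf up
// to the root and compare against the published root hash. Illustrative of how
// a transparency log is audited; not Apple's actual log encoding.
struct InclusionProofStep {
    let siblingHash: Data
    let siblingIsOnLeft: Bool
}

func verifyInclusion(leaf: Data, proof: [InclusionProofStep], expectedRoot: Data) -> Bool {
    var current = Data(SHA256.hash(data: leaf))
    for step in proof {
        var combined = Data()
        if step.siblingIsOnLeft {
            combined.append(step.siblingHash)
            combined.append(current)
        } else {
            combined.append(current)
            combined.append(step.siblingHash)
        }
        current = Data(SHA256.hash(data: combined))
    }
    return current == expectedRoot
}
```

Because every release's measurement hashes into the root, removing or altering a logged release changes the root and breaks previously issued proofs, which is what makes tampering detectable.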

As we mentioned, user devices will ensure that they’re communicating only with PCC nodes running authorized and verifiable software images. Specifically, the user’s device will wrap its request payload key only to the public keys of those PCC nodes whose attested measurements match a software release in the public transparency log. And the same strict Code Signing technologies that prevent loading unauthorized software also ensure that all code on the PCC node is included in the attestation.
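
Putting these checks together, a hedged sketch of the client-side gate might look like the following: a candidate node is eligible to receive the wrapped payload key only if its attested measurement matches a software release the device has verified in the public transparency log. The types and the simple set-membership test stand in for the full attestation verification, which also involves the certificate rooted in the node's Secure Enclave.

```swift
import Foundation

// Hypothetical client-side gate: a candidate node offered by the load balancer
// is usable only if its attested software measurement matches a release the
// device has verified in the public transparency log.
struct CandidateNode {
    let publicKeyData: Data
    let attestedMeasurement: Data
}

func selectEligibleNodes(candidates: [CandidateNode],
                         verifiedLogMeasurements: Set<Data>) -> [CandidateNode] {
    // Real attestation verification also checks the certificate chain rooted in
    // the node's Secure Enclave; the set lookup here is a stand-in for that.
    candidates.filter { verifiedLogMeasurements.contains($0.attestedMeasurement) }
}

// The request payload key is then wrapped only to the keys of the returned
// nodes; a node whose measurement isn't in the log simply never receives it.
```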

Making Private Cloud Compute software logged and inspectable in this way is a strong demonstration of our commitment to enable independent research on the platform. But we want to ensure researchers can rapidly get up to speed, verify our PCC privacy claims, and look for issues, so we’re going further with three specific steps:

  • We’ll release a PCC Virtual Research Environment: a set of tools and images that simulate a PCC node on a Mac with Apple silicon, and that can boot a version of PCC software minimally modified for successful virtualization.
  • While we’re publishing the binary images of every production PCC build, to further aid research we will periodically also publish a subset of the security-critical PCC source code.
  • In a first for any Apple platform, PCC images will include the sepOS firmware and the iBoot bootloader in plaintext, making it easier than ever for researchers to study these critical components.

The Apple Security Bounty will reward research findings in the entire Private Cloud Compute software stack — with especially significant payouts for any issues that undermine our privacy claims.

More to come

Private Cloud Compute continues Apple’s profound commitment to user privacy. With sophisticated technologies to satisfy our requirements of stateless computation, enforceable guarantees, no privileged access, non-targetability, and verifiable transparency, we believe Private Cloud Compute is nothing short of the world-leading security architecture for cloud AI compute at scale.

We look forward to sharing many more technical details about PCC, including the implementation and behavior behind each of our core requirements. And we’re especially excited to soon invite security researchers for a first look at the Private Cloud Compute software and our PCC Virtual Research Environment.