
Confidential inference

Short version

Confidential inference lets you use AI models on sensitive data without exposing that data to the AI provider. Your prompts and responses stay encrypted even during processing.

The problem

To use an AI service, you send your data to someone else's servers. The AI provider can see everything: your prompts, the responses, your usage patterns, any documents you upload.

A healthcare provider can't send patient records to ChatGPT—that's a HIPAA violation. A law firm can't have Claude review confidential contracts—that exposes client information. A bank can't use external AI for fraud analysis—the transaction data is too sensitive.

This isn't about whether the provider is trustworthy. It's about having data visible to systems and people who shouldn't need access to it.

How confidential inference works

Traditional AI:
Your Data → [AI Provider sees everything] → Result

Confidential Inference:
Your Data → [Hardware-encrypted enclave] → Result
↑ Provider can't see in

The AI runs inside a Trusted Execution Environment—a hardware-isolated area where even the cloud operator has no visibility. Your data is encrypted in memory. The provider's admins can't see your prompts. There's no logging of your queries. You can verify this cryptographically before sending anything sensitive.

The AI still works normally. You just get actual privacy instead of a policy promise.

"Private mode" isn't the same thing

Many AI providers offer modes where they claim not to train on your data or share it with third parties. This is different from confidential inference.

With "private mode," your data is still visible during processing. Administrators can still access it. It may be logged for debugging or monitoring. An insider threat can exfiltrate it. A subpoena can force disclosure.

Confidential inference encrypts data in use. The provider physically cannot see your data because the hardware prevents it.

Who's using confidential AI inference

Healthcare AI

BeeKeeperAI, developed at UCSF's Center for Digital Health Innovation, runs AI algorithms on protected health information without exposing patient data. Drug researchers previously couldn't access patient data due to PHI regulations—confidential computing removes this bottleneck by protecting data during AI processing, not just at rest.

Dana-Farber Cancer Institute implemented GPT-4 for clinical applications using Microsoft's Azure OpenAI within a HIPAA-compliant environment. They published their implementation under an open-source license for other healthcare organizations.

Healthcare systems are running models like GPT-4o and DeepSeek R1 within HIPAA-compliant Azure environments, with no PHI transmitted to OpenAI for training or human review. Smaller open-source models run on HIPAA-compliant on-premise servers with NVIDIA H100 GPUs.

Financial services AI

Royal Bank of Canada (RBC) integrated Azure confidential computing into Arxis, its data clean room platform, for privacy-preserving machine learning. As Justin Simonelis, RBC's Director of Service Engineering, put it: "We fully recognize the importance of privacy preserving machine learning inference and training to protect sensitive customer data within GPUs."

Insurance fraud detection uses Opaque's confidential AI platform to enable secure data sharing among insurers. Multiple companies can run AI models on combined datasets to detect fraud patterns—without any party seeing the others' customer data.

Enterprise AI platforms

Azure AI Confidential Inferencing launched with the Whisper speech-to-text model as the first Azure AI model with confidential computing protection. Audio prompts and transcribed responses are decrypted only within the TEE on confidential GPU VMs.

Privatemode.ai runs Llama 3.3-70B (with DeepSeek R1 coming) in a confidential computing environment where prompts stay encrypted in memory. The inference code runs in a sandbox—the infrastructure can't access your data, and the inference code can't leak it. Available as both a chat application and API.

Opaque Systems partnered with Bloomfilter to enable AI features that analyze customer data from Jira and GitHub—data containing IP and known vulnerabilities. Security-conscious clients in regulated sectors had blocked adoption. After implementing Opaque's confidential AI, Bloomfilter saw a 57% increase in enterprise sales conversion and security review time dropped by 32%.

Accenture invested in Opaque to help enterprises run AI workloads on encrypted data. They're using it for telecom companies sharing data across departments, and retailers analyzing customer behavior without exposing sensitive information to third parties.

GPU-accelerated confidential AI

Google Cloud A3 VMs with NVIDIA H100 GPUs protect AI chatbot queries. Users include private data in their prompts to NLP models, and data privacy regulations require those queries to be protected. The H100's memory encryption keeps both the model and user data confidential.

Azure Confidential VMs with H100 are designed for inferencing, fine-tuning, or training models like Whisper, Stable Diffusion, Llama2, Falcon, and GPT-2. The VMs combine AMD EPYC confidential computing with NVIDIA GPU memory encryption.

The technology

Hardware-based encryption

Security starts in the processor, not software:

AMD SEV-SNP (Secure Encrypted Virtualization - Secure Nested Paging): Encrypts entire VMs with memory integrity protection. EPYC 9005 processors support 256-bit AES-XTS encryption with over 1000 keys for simultaneous encrypted VMs.

Intel TDX (Trust Domain Extensions): VM-level isolation with hardware-enforced boundaries across Xeon Scalable processors.

AWS Nitro Enclaves: Isolated compute environments with no persistent storage, no SSH, no external networking—only a secure channel to the parent instance.

NVIDIA H100: GPU memory encryption and Multi-Instance GPU isolation for AI workloads. The first GPU with built-in confidential computing support.

Hardware enforcement matters because software can be compromised. The processor physically blocks unauthorized memory access.

Remote attestation

Before sending sensitive data, you verify the enclave:

  1. You request attestation
  2. The secure processor measures the running code
  3. That measurement gets cryptographically signed by the hardware
  4. You verify the signature against trusted certificates
  5. Only then does your data enter the enclave

This proves the code is what you expect, running on genuine hardware, with proper isolation active. Trust is mathematical.
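The five-step flow above can be sketched in Python. This is an illustrative toy, not a vendor API: real attestation uses signed hardware reports (AMD SEV-SNP, Intel TDX) verified against the manufacturer's certificate chain, which the shared HMAC key below merely stands in for.

```python
import hashlib
import hmac

# Stand-in for the CPU's private signing key; in real hardware this key
# never leaves the secure processor.
HARDWARE_KEY = b"simulated-hardware-root-key"

def measure(code: bytes) -> str:
    """Step 2: the secure processor hashes the enclave's running code."""
    return hashlib.sha256(code).hexdigest()

def sign_report(measurement: str) -> str:
    """Step 3: the hardware cryptographically signs the measurement."""
    return hmac.new(HARDWARE_KEY, measurement.encode(), hashlib.sha256).hexdigest()

def verify(measurement: str, signature: str, expected_measurement: str) -> bool:
    """Steps 4-5: the client checks the signature AND the code identity
    before any sensitive data enters the enclave."""
    good_sig = hmac.compare_digest(sign_report(measurement), signature)
    return good_sig and measurement == expected_measurement

# The client pins the measurement of the code it expects to be running.
enclave_code = b"model-server-v1.2"
expected = measure(enclave_code)

report = measure(enclave_code)
sig = sign_report(report)
print(verify(report, sig, expected))       # True: safe to send data
print(verify(report, "forged", expected))  # False: do not send data
```

The key design point is that a forged signature or modified code changes the measurement, so the client refuses before transmitting anything.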

Memory encryption

Data in RAM is encrypted with AES-256. Keys live in the secure processor, inaccessible to software or admins. Physical memory extraction yields only ciphertext.

AMD SEV-SNP adds:

  • A Reverse Map Table (RMP) to prevent unauthorized remapping of guest memory
  • Encrypted CPU registers during context switches
  • Trusted I/O (TIO) extending protection to PCIe devices like GPUs

Cost and performance

Major cloud providers include confidential computing at no extra charge:

  • AWS Nitro Enclaves: no additional fees beyond EC2 costs
  • Azure Confidential VMs: standard VM pricing
  • Google Confidential VMs: standard compute pricing

Performance overhead is typically 3-5% for memory encryption. Attestation adds about a second at session start. For AI inference, the difference is usually unnoticeable.
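A back-of-envelope calculation with the figures above shows why the one-time attestation cost disappears over a session (the 4% encryption overhead below is an assumed midpoint of the 3-5% range):

```python
# Amortize confidential-computing overhead over a session of inference calls.
def session_overhead(requests, base_latency_s, enc_overhead=0.04, attest_s=1.0):
    """Added latency as a fraction of the unprotected session time."""
    base = requests * base_latency_s          # total time without protection
    added = attest_s + base * enc_overhead    # one attestation + per-call encryption cost
    return added / base

# 100 inference calls at 2 s each: total overhead stays under 5%.
print(round(session_overhead(100, 2.0), 3))  # 0.045
```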

Implementation

Deployment is simpler than you might expect.

Code changes: Most platforms offer API compatibility. With Enclava, for example, you change the endpoint URL and add attestation verification. No model retraining needed.
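As a sketch of how small that code change is, here is an OpenAI-style chat request pointed at a confidential endpoint. The URL, model name, and API key are placeholders, not a documented Enclava API, and a real client would verify attestation before sending anything.

```python
import json
import urllib.request

def build_chat_request(base_url, api_key, model, messages):
    """Assemble a standard /chat/completions request; only the host changes."""
    url = f"{base_url}/v1/chat/completions"
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
    body = json.dumps({"model": model, "messages": messages}).encode()
    return urllib.request.Request(url, data=body, headers=headers)

req = build_chat_request(
    "https://confidential.example.com",  # hypothetical TEE-backed endpoint
    "sk-...",                            # placeholder key
    "llama-3.3-70b",
    [{"role": "user", "content": "Summarize this contract."}],
)
print(req.full_url)  # https://confidential.example.com/v1/chat/completions
```

Everything else, including the request schema and response handling, stays identical to a standard OpenAI-compatible integration.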

Timeline: A proof of concept typically takes 1-2 weeks. Production pilot: 4-6 weeks. Full deployment: 2-3 months.

Skills needed: Security teams need to understand attestation (about a week of training). DevOps needs enclave deployment knowledge (1-2 weeks). Development teams need API integration basics (a few days). Standard cloud DevOps skills apply—Docker, Kubernetes, Terraform all work normally.

Vendor options

Cloud-native confidential AI:

  • Azure Confidential VMs and Azure AI Confidential Inferencing (H100-backed)
  • Google Cloud Confidential VMs (A3 with NVIDIA H100)
  • AWS Nitro Enclaves
Specialized confidential AI platforms:

  • Privatemode.ai (Llama 3.3-70B, DeepSeek R1, zero-retention)
  • Opaque Systems (multi-party confidential AI, encrypted data analysis)
  • Enclava (OpenAI-compatible API with simplified deployment)

Enterprise solutions:

  • Anjuna (confidential AI on Azure H100)
  • Fortanix (multi-cloud confidential AI)
