
Extract - Confidential Document Processing

Process documents securely using vision language models inside hardware-protected enclaves.

What You'll Learn

  • Overview - What Extract does and when to use it
  • Templates - Create and manage extraction templates
  • Processing - Process documents and retrieve results
  • Examples - Integration examples in multiple languages

Overview

Extract uses vision language models to pull structured data from documents such as invoices, receipts, and expense reports, while keeping all processing inside a Trusted Execution Environment (TEE). Your documents never leave the enclave unencrypted.

How It Works

  1. Upload - Send your document (PDF, JPG, PNG)
  2. Template - Select or create an extraction template
  3. Process - A vision model extracts structured data inside the TEE
  4. Validate - Results are validated against your schema
  5. Return - Get JSON with extracted data and confidence scores
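The Validate step runs server-side, but a client may still want to sanity-check the JSON it receives before passing it downstream. The sketch below assumes the response layout used in the Quick Example (`job_id`, `status`, `result.parsed_data`). The `"completed"` status value, the helper name `check_extraction`, and any field names you pass in are hypothetical placeholders, not documented values.

```python
from typing import Any


def check_extraction(
    result: dict[str, Any], required: list[str], done_status: str = "completed"
) -> list[str]:
    """Return the required fields that are missing or empty in parsed_data.

    Assumes the response layout from the Quick Example; "completed" is an
    assumed status value, and `required` depends on your template.
    """
    if result.get("status") != done_status:
        # The job has not reached the (assumed) finished state yet
        raise RuntimeError(
            f"job {result.get('job_id')} not finished: {result.get('status')}"
        )
    parsed = (result.get("result") or {}).get("parsed_data") or {}
    # Treat absent keys, None, and empty strings as missing
    return [name for name in required if parsed.get(name) in (None, "")]
```

An empty list means every field you care about came back populated. Anything else is a candidate for manual review.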

Extract vs. Manual Processing

| Feature | Extract | Manual Processing |
| --- | --- | --- |
| Speed | Seconds | Minutes to hours |
| Accuracy | Consistent | Human error |
| Scalability | Unlimited | Staff-dependent |
| Data privacy | TEE-protected | Exposure risk |
| Audit trail | Automatic | Manual logging |

Use Cases

| Use Case | Description |
| --- | --- |
| Invoice Processing | Extract vendor, line items, totals, tax info |
| Receipt Digitization | Parse retail receipts for expense tracking |
| Expense Reports | Classify and extract expense data |
| Contract Analysis | Pull key terms and dates from agreements |
| Compliance Documents | Extract required fields for regulatory reporting |

Quick Example

Process an invoice and get structured JSON:

```python
import requests

# Process a document with a template
with open("invoice.pdf", "rb") as f:
    response = requests.post(
        "http://localhost/api/v1/extract/process",
        headers={"Authorization": "Bearer YOUR_API_KEY"},
        files={"file": f},  # multipart upload of the document
        data={
            "template_id": "detailed_invoice",
            # Extra context, sent as a JSON-encoded string
            "context": '{"company_name": "Acme Corp"}',
        },
    )

# Raise an exception for 4xx/5xx responses before parsing the body
response.raise_for_status()
result = response.json()
print(f"Job ID: {result['job_id']}")
print(f"Status: {result['status']}")
print(f"Extracted data: {result['result']['parsed_data']}")
```
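The `context` field in the Quick Example is a JSON string typed by hand, and a stray quote in a company name will break it. One option is to build the form fields with `json.dumps`. This sketch assumes the endpoint accepts the same two form fields shown above. The helper name `build_form_data` is hypothetical, and leaving out `context` when it is not supplied is an assumption, because the Quick Example always sends it.

```python
import json
from typing import Any, Optional


def build_form_data(
    template_id: str, context: Optional[dict[str, Any]] = None
) -> dict[str, str]:
    """Build the form fields for the /api/v1/extract/process request.

    `context` is serialized with json.dumps so quoting and escaping are
    handled correctly. It is omitted when not given (an assumption that
    the field is optional).
    """
    data = {"template_id": template_id}
    if context is not None:
        data["context"] = json.dumps(context)
    return data
```

You would then pass `data=build_form_data("detailed_invoice", {"company_name": "Acme Corp"})` to `requests.post` in place of the literal dictionary.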

Supported File Formats

| Format | Extension | Max Size | Notes |
| --- | --- | --- | --- |
| PDF | .pdf | 10 MB | Up to 20 pages |
| JPEG | .jpg, .jpeg | 10 MB | Single image |
| PNG | .png | 10 MB | Single image |
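Checking these limits on the client avoids spending a request on a file the service will reject. The following is a minimal sketch based on the table above. It reads "10 MB" as the stricter decimal 10,000,000 bytes, because the exact byte limit is not stated here. It does not check the 20-page PDF limit, which would require a PDF parser. The name `check_upload` is hypothetical.

```python
import os

# Limits taken from the Supported File Formats table.
# Assumption: "10 MB" is read conservatively as 10,000,000 bytes.
MAX_BYTES = 10 * 1000 * 1000
ALLOWED_EXTENSIONS = {".pdf", ".jpg", ".jpeg", ".png"}


def check_upload(path: str) -> None:
    """Raise ValueError if the file's extension or size is outside the limits.

    The 20-page limit for PDFs is not checked here.
    """
    ext = os.path.splitext(path)[1].lower()  # ".PNG" and ".png" are treated alike
    if ext not in ALLOWED_EXTENSIONS:
        raise ValueError(f"unsupported file type: {ext or '(none)'}")
    size = os.path.getsize(path)
    if size > MAX_BYTES:
        raise ValueError(f"file is {size} bytes; limit is {MAX_BYTES}")
```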

Default Templates

Extract ships with three templates you can use immediately:

| Template | Purpose | Output |
| --- | --- | --- |
| detailed_invoice | Full invoice extraction | Vendor, buyer, line items, tax, totals |
| simple_receipt | Retail receipt parsing | Store, date, items, total |
| expense_report | Expense classification | Category, amount, date, description |
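Each request selects its template by ID through the `template_id` form field, so a client handling several document types needs some way to route each file to a template. In the minimal sketch below, only the three template IDs come from the table above. The document-type keys (`"invoice"`, `"receipt"`, `"expense"`), the `custom` override mapping, and the name `template_for` are all hypothetical.

```python
from typing import Optional

# Template IDs that ship with Extract (from the Default Templates table).
# The document-type keys on the left are hypothetical labels for this sketch.
DEFAULT_TEMPLATES = {
    "invoice": "detailed_invoice",
    "receipt": "simple_receipt",
    "expense": "expense_report",
}


def template_for(doc_type: str, custom: Optional[dict[str, str]] = None) -> str:
    """Return the template_id to send for a document type.

    Entries in `custom` (lowercase keys mapped to your own template IDs)
    take precedence over the defaults.
    """
    lookup = {**DEFAULT_TEMPLATES, **(custom or {})}
    key = doc_type.strip().lower()
    if key not in lookup:
        raise ValueError(f"no template configured for document type {doc_type!r}")
    return lookup[key]
```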

You can customize these or create your own templates for specific document types.

Security Model

Extract runs inside the same TEE infrastructure as all Enclava inference:

  • End-to-end encryption - Documents encrypted in transit and at rest
  • No persistence - Documents deleted after processing
  • Audit logging - All operations tracked with user/API key attribution
  • Budget controls - API key spending limits enforced before processing

Next Steps

  • Templates - Create custom extraction templates
  • Processing - Learn the full processing workflow
  • Examples - See Python integration code