Extract - Confidential Document Processing

Process documents securely using vision language models inside hardware-protected enclaves.

What You'll Learn

Overview - What Extract does and when to use it
Templates - Create and manage extraction templates
Processing - Process documents and retrieve results
Examples - Integration examples in multiple languages

Overview

Extract uses vision language models to pull structured data from documents such as invoices, receipts, and expense reports, while keeping all processing inside a Trusted Execution Environment (TEE). Your documents never leave the enclave unencrypted.

How It Works

Upload - Send your document (PDF, JPG, PNG)
Template - Select or create an extraction template
Process - Vision model extracts structured data inside the TEE
Validate - Results are validated against your schema
Return - Get JSON with extracted data and confidence scores

Extract vs. Manual Processing

Feature	Extract	Manual Processing
Speed	Seconds	Minutes to hours
Accuracy	Consistent	Prone to human error
Scalability	Unlimited	Staff-dependent
Data privacy	TEE-protected	Exposure risk
Audit trail	Automatic	Manual logging

Use Cases

Use Case	Description
Invoice Processing	Extract vendor, line items, totals, tax info
Receipt Digitization	Parse retail receipts for expense tracking
Expense Reports	Classify and extract expense data
Contract Analysis	Pull key terms and dates from agreements
Compliance Documents	Extract required fields for regulatory reporting

Quick Example

Process an invoice and get structured JSON:

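import json

import requests

# Optional context sent alongside the document, serialized as a JSON string
context = {"company_name": "Acme Corp"}

# Process a document with a template
with open("invoice.pdf", "rb") as f:
    response = requests.post(
        "http://localhost/api/v1/extract/process",
        headers={"Authorization": "Bearer YOUR_API_KEY"},
        files={"file": f},
        data={
            "template_id": "detailed_invoice",
            "context": json.dumps(context),
        },
        timeout=60,  # arbitrary client-side limit; adjust for large documents
    )

# Fail fast on HTTP errors instead of parsing an error body as a result
response.raise_for_status()

result = response.json()
print(f"Job ID: {result['job_id']}")
print(f"Status: {result['status']}")
print(f"Extracted data: {result['result']['parsed_data']}")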
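
The response fields printed above (`job_id`, `status`, and `result.parsed_data`) can also be read defensively, so that a response without extracted data does not raise a `KeyError`. The following is a minimal sketch: the helper name `summarize_result`, the sample payloads, and the status strings in them are assumptions for illustration, not documented API behavior.

```python
from typing import Any, Optional


def summarize_result(result: dict[str, Any]) -> tuple[Optional[str], Optional[str], Optional[dict]]:
    """Return (job_id, status, parsed_data) without raising on missing keys."""
    job_id = result.get("job_id")
    status = result.get("status")
    # Assumption: "result" may be missing or null when no data has been extracted.
    parsed = (result.get("result") or {}).get("parsed_data")
    return job_id, status, parsed


# Hypothetical payloads shaped like the fields used in the example above.
sample = {
    "job_id": "job-123",
    "status": "done",
    "result": {"parsed_data": {"vendor": "Acme Corp"}},
}
print(summarize_result(sample))
print(summarize_result({"job_id": "job-456", "status": "pending"}))
```

A caller can then check whether the third element is `None` before using the extracted data.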

Supported File Formats

Format	Extension	Max Size	Notes
PDF	`.pdf`	10 MB	Up to 20 pages
JPEG	`.jpg`, `.jpeg`	10 MB	Single image
PNG	`.png`	10 MB	Single image
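
The limits in this table can be checked on the client before uploading, which avoids sending a file that would be rejected. The sketch below is one possible pre-check, not part of Extract: the function name `check_upload` is hypothetical, it assumes 1 MB means 1,048,576 bytes and that extensions are matched case-insensitively, and it does not check the 20-page PDF limit because that would require a PDF library.

```python
from pathlib import Path

# Limits copied from the table above.
ALLOWED_EXTENSIONS = {".pdf", ".jpg", ".jpeg", ".png"}
MAX_SIZE_BYTES = 10 * 1024 * 1024  # assumes 1 MB = 1,048,576 bytes


def check_upload(filename: str, size_bytes: int) -> list[str]:
    """Return a list of problems; an empty list means the file looks acceptable."""
    problems = []
    ext = Path(filename).suffix.lower()
    if ext not in ALLOWED_EXTENSIONS:
        problems.append(f"unsupported extension: {ext or '(none)'}")
    if size_bytes > MAX_SIZE_BYTES:
        problems.append(f"file is {size_bytes} bytes; limit is {MAX_SIZE_BYTES}")
    return problems


print(check_upload("invoice.PDF", 2_000_000))       # prints []
print(check_upload("scan.tiff", 12 * 1024 * 1024))  # two problems reported
```

For a file on disk, `Path(path).stat().st_size` supplies the `size_bytes` argument.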

Default Templates

Extract ships with three templates you can use immediately:

Template	Purpose	Output
`detailed_invoice`	Full invoice extraction	Vendor, buyer, line items, tax, totals
`simple_receipt`	Retail receipt parsing	Store, date, items, total
`expense_report`	Expense classification	Category, amount, date, description

You can customize these or create your own templates for specific document types.
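
When an application handles several document types, the choice of default template can be kept in one place. This is a small sketch under stated assumptions: the template IDs come from the table above, while the document-type keys (`invoice`, `receipt`, `expense`) and the helper `template_for` are hypothetical names chosen for illustration.

```python
# Template IDs from the table above; the dictionary keys are hypothetical labels.
DEFAULT_TEMPLATES = {
    "invoice": "detailed_invoice",
    "receipt": "simple_receipt",
    "expense": "expense_report",
}


def template_for(doc_type: str) -> str:
    """Map an application-level document type to a default template ID."""
    key = doc_type.strip().lower()
    try:
        return DEFAULT_TEMPLATES[key]
    except KeyError:
        raise ValueError(f"no default template for document type {doc_type!r}") from None


print(template_for("Invoice"))  # prints detailed_invoice
```

The returned value is what would be sent as `template_id` in the processing request.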

Security Model

Extract runs inside the same TEE infrastructure as all Enclava inference:

End-to-end encryption - Documents are encrypted in transit and at rest
No persistence - Documents are deleted after processing
Audit logging - All operations tracked with user/API key attribution
Budget controls - API key spending limits enforced before processing

Next Steps

Templates - Create custom extraction templates
Processing - Learn the full processing workflow
Examples - See Python integration code
