Extraction Templates

Templates define how Extract interprets your documents. Each template has prompts that guide the vision model, plus optional schemas that declare the placeholder variables and validate the output.

Template Structure

A template consists of:

| Field | Required | Description |
|---|---|---|
| `id` | Yes | Unique identifier (e.g., `my_invoice_template`) |
| `system_prompt` | Yes | Instructions for model behavior |
| `user_prompt` | Yes | What to extract, with `{placeholders}` |
| `context_schema` | No | Defines available placeholder variables |
| `output_schema` | No | JSON schema for validating extracted data |
| `vision_model` | No | Override the default vision model |
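As a rough illustration of the table above, the sketch below checks a draft template for the three required fields before it is sent anywhere. The helper name `missing_required_fields` is hypothetical and is not part of the Extract API; the API may apply further checks of its own.

```python
# Required fields, taken from the table above.
REQUIRED_FIELDS = ("id", "system_prompt", "user_prompt")


def missing_required_fields(template):
    """Return the required fields that are absent or empty in a template dict."""
    return [name for name in REQUIRED_FIELDS if not template.get(name)]


draft = {
    "id": "my_invoice_template",
    "user_prompt": "Extract the invoice number and total for {company_name}.",
}
print(missing_required_fields(draft))  # ['system_prompt']
```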

Using Default Templates

Three templates are available out of the box:

detailed_invoice

Extracts comprehensive invoice data including line items:

import requests

# The context is sent as a JSON-encoded string alongside the uploaded file.
with open("invoice.pdf", "rb") as f:
    response = requests.post(
        "http://localhost/api/v1/extract/process",
        headers={"Authorization": "Bearer YOUR_API_KEY"},
        files={"file": f},
        data={
            "template_id": "detailed_invoice",
            "context": '{"company_name": "Your Company"}'
        }
    )

Output includes:

Service provider details (name, address, tax ID)
Buyer information
Invoice metadata (number, date, due date)
Line items with descriptions, quantities, prices
Tax breakdown and totals
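The `context` field in the `detailed_invoice` request is a JSON string written by hand. When the values come from variables, building that string with `json.dumps` avoids quoting mistakes. The sketch below assumes only what the request shows (a `template_id` field and an optional `context` field); the helper name `build_form_data` is hypothetical.

```python
import json


def build_form_data(template_id, context=None):
    """Build the form fields for a process request, JSON-encoding the context."""
    data = {"template_id": template_id}
    if context is not None:
        # json.dumps handles quotes and other special characters in values.
        data["context"] = json.dumps(context)
    return data


form = build_form_data("detailed_invoice", {"company_name": 'Smith & "Sons" Ltd'})
print(form["context"])
```

The returned dict can be passed as the `data=` argument of `requests.post`.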

simple_receipt

Parses basic retail receipts:

# Open the file in a with-block so the handle is closed after the upload.
with open("receipt.jpg", "rb") as f:
    response = requests.post(
        "http://localhost/api/v1/extract/process",
        headers={"Authorization": "Bearer YOUR_API_KEY"},
        files={"file": f},
        data={"template_id": "simple_receipt"}
    )

Output includes:

Store name and location
Transaction date and time
List of items with prices
Subtotal, tax, and total

expense_report

Classifies expenses for reporting:

# Open the file in a with-block so the handle is closed after the upload.
with open("expense.png", "rb") as f:
    response = requests.post(
        "http://localhost/api/v1/extract/process",
        headers={"Authorization": "Bearer YOUR_API_KEY"},
        files={"file": f},
        data={"template_id": "expense_report"}
    )

Output includes:

Expense category
Amount and currency
Date
Vendor name
Description

Creating Custom Templates

Create templates for your specific document types:

import requests

template = {
    "id": "purchase_order",
    "system_prompt": """You are a document extraction specialist.
Extract data accurately from purchase orders.
Return valid JSON matching the requested structure.
If a field is not visible, use null.""",

    "user_prompt": """Extract the following from this purchase order for {company_name}:
- PO number
- Vendor name and address
- Order date
- Delivery date
- Line items (part number, description, quantity, unit price)
- Shipping terms
- Total amount""",

    "context_schema": {
        "company_name": {
            "type": "string",
            "description": "Name of the ordering company"
        }
    },

    "output_schema": {
        "type": "object",
        "properties": {
            "po_number": {"type": "string"},
            "vendor": {
                "type": "object",
                "properties": {
                    "name": {"type": "string"},
                    "address": {"type": "string"}
                }
            },
            "order_date": {"type": "string", "format": "date"},
            "delivery_date": {"type": "string", "format": "date"},
            "line_items": {
                "type": "array",
                "items": {
                    "type": "object",
                    "properties": {
                        "part_number": {"type": "string"},
                        "description": {"type": "string"},
                        "quantity": {"type": "number"},
                        "unit_price": {"type": "number"}
                    }
                }
            },
            "shipping_terms": {"type": "string"},
            "total_amount": {"type": "number"}
        }
    }
}

response = requests.post(
    "http://localhost/api/v1/extract/templates",
    headers={"Authorization": "Bearer YOUR_API_KEY"},
    json=template
)
response.raise_for_status()  # Stop here if the API rejected the template

print(f"Created template: {response.json()['id']}")
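Because `context_schema` declares the variables that `user_prompt` may reference, it can be useful to confirm locally that every placeholder is declared before creating the template. The sketch below assumes placeholders follow Python's `str.format` brace syntax, which the source does not confirm, and the helper name `undeclared_placeholders` is hypothetical. A prompt containing literal single braces (for example an inline JSON sample) would need them doubled for this parser.

```python
from string import Formatter


def undeclared_placeholders(user_prompt, context_schema):
    """Return placeholder names used in user_prompt but absent from context_schema."""
    # Formatter().parse yields (literal_text, field_name, format_spec, conversion).
    used = {name for _, name, _, _ in Formatter().parse(user_prompt) if name}
    return sorted(used - set(context_schema))


prompt = "Extract the PO number and vendor for {company_name}, ordered by {buyer}."
schema = {"company_name": {"type": "string"}}
print(undeclared_placeholders(prompt, schema))  # ['buyer']
```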

Template Placeholders

Use `{placeholder}` syntax in your user prompt. Each placeholder is filled from the `context` JSON you pass at processing time:

# Template with placeholders (the required system_prompt is not shown in this excerpt)
template = {
    "id": "contract_review",
    "user_prompt": """Review this contract between {party_a} and {party_b}.
Extract:
- Effective date
- Term length
- Key obligations for {party_a}
- Payment terms
- Termination conditions""",

    "context_schema": {
        "party_a": {"type": "string", "description": "First party name"},
        "party_b": {"type": "string", "description": "Second party name"}
    }
}

# Processing with context
# Open the file in a with-block so the handle is closed after the upload.
with open("contract.pdf", "rb") as f:
    response = requests.post(
        "http://localhost/api/v1/extract/process",
        headers={"Authorization": "Bearer YOUR_API_KEY"},
        files={"file": f},
        data={
            "template_id": "contract_review",
            "context": '{"party_a": "Acme Corp", "party_b": "Widget Inc"}'
        }
    )
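To see what the model would be asked once the context is applied, the prompt can be previewed locally. How Extract performs the substitution internally is not documented here; the sketch below simply assumes it behaves like Python's `str.format`, and the helper name `preview_prompt` is hypothetical.

```python
import json


def preview_prompt(user_prompt, context_json):
    """Fill {placeholders} from a JSON context string; raise if one is missing."""
    context = json.loads(context_json)
    try:
        return user_prompt.format_map(context)
    except KeyError as exc:
        raise ValueError(f"context is missing placeholder {exc.args[0]!r}") from exc


prompt = "Review this contract between {party_a} and {party_b}."
print(preview_prompt(prompt, '{"party_a": "Acme Corp", "party_b": "Widget Inc"}'))
# Review this contract between Acme Corp and Widget Inc.
```

A context that omits `party_b` raises `ValueError` here, which catches the mistake before a document is uploaded.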

Managing Templates

List All Templates

response = requests.get(
    "http://localhost/api/v1/extract/templates",
    headers={"Authorization": "Bearer YOUR_API_KEY"}
)

for template in response.json()["templates"]:
    print(f"{template['id']}: {template['system_prompt'][:50]}...")

Get Template Details

response = requests.get(
    "http://localhost/api/v1/extract/templates/detailed_invoice",
    headers={"Authorization": "Bearer YOUR_API_KEY"}
)

template = response.json()
print(f"System prompt: {template['system_prompt']}")
print(f"User prompt: {template['user_prompt']}")

Update Template

response = requests.put(
    "http://localhost/api/v1/extract/templates/my_template",
    headers={"Authorization": "Bearer YOUR_API_KEY"},
    json={
        "user_prompt": "Updated extraction instructions...",
        "output_schema": {"type": "object", "properties": {...}}
    }
)

Delete Template

response = requests.delete(
    "http://localhost/api/v1/extract/templates/my_template",
    headers={"Authorization": "Bearer YOUR_API_KEY"}
)

Reset Default Templates

Restore the built-in templates to their original state:

response = requests.post(
    "http://localhost/api/v1/extract/templates/reset-defaults",
    headers={"Authorization": "Bearer YOUR_API_KEY"}
)

Template Wizard

Generate a template from a sample document:

with open("sample_invoice.pdf", "rb") as f:
    response = requests.post(
        "http://localhost/api/v1/extract/templates/wizard",
        headers={"Authorization": "Bearer YOUR_API_KEY"},
        files={"file": f},
        data={"description": "Monthly vendor invoice with line items"}
    )

suggested_template = response.json()
print(f"Suggested template: {suggested_template}")

The wizard analyzes your document and suggests prompts and schemas based on its structure.

Best Practices

Writing System Prompts

Good:

You are a document extraction specialist. Extract data accurately and completely.
Return valid JSON. Use null for missing fields. Do not hallucinate data.

Bad:

Extract stuff from the document.

Writing User Prompts

Good:

Extract the following fields from this invoice:
- Invoice number (top right corner, format: INV-XXXXX)
- Vendor name and full address
- Each line item with: description, quantity, unit price, line total
- Tax amount and rate
- Total amount due

Bad:

Get the invoice data.

Output Schemas

Define schemas to catch extraction errors early:

"output_schema": {
    "type": "object",
    "required": ["invoice_number", "total_amount"],
    "properties": {
        "invoice_number": {"type": "string", "pattern": "^INV-\\d{5}$"},
        "total_amount": {"type": "number", "minimum": 0},
        "line_items": {
            "type": "array",
            "minItems": 1,
            "items": {
                "type": "object",
                "required": ["description", "amount"],
                "properties": {
                    "description": {"type": "string"},
                    "quantity": {"type": "integer", "minimum": 1},
                    "unit_price": {"type": "number"},
                    "amount": {"type": "number"}
                }
            }
        }
    }
}
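To illustrate what the schema above would reject, the sketch below hand-checks a few of its rules (`required`, `pattern`, `minimum`, `minItems`) against an extracted result. It is not a full JSON Schema validator and does not show how Extract itself validates output; a complete validator such as the third-party `jsonschema` package would cover every keyword. The function name `check_invoice` is hypothetical.

```python
import re

INVOICE_PATTERN = r"^INV-\d{5}$"  # same pattern as in the schema above


def is_number(value):
    """JSON Schema 'number': int or float, but not bool."""
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def check_invoice(result):
    """Return a list of problems found in an extracted invoice dict."""
    problems = []
    for key in ("invoice_number", "total_amount"):
        if key not in result:
            problems.append(f"missing required field: {key}")
    if "invoice_number" in result:
        number = result["invoice_number"]
        if not (isinstance(number, str) and re.search(INVOICE_PATTERN, number)):
            problems.append(f"invoice_number does not match INV-XXXXX: {number!r}")
    if "total_amount" in result:
        total = result["total_amount"]
        if not (is_number(total) and total >= 0):
            problems.append(f"total_amount must be a non-negative number: {total!r}")
    if "line_items" in result:
        items = result["line_items"]
        if not isinstance(items, list) or len(items) < 1:
            problems.append("line_items must be a list with at least one item")
    return problems


extracted = {"invoice_number": "INV-123", "total_amount": -5, "line_items": []}
for problem in check_invoice(extracted):
    print(problem)
```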

Next Steps

Processing - Learn the full document processing workflow
Examples - See complete integration examples
