
Extraction Templates

Templates define how Extract interprets your documents. Each template combines prompts that guide the vision model with two optional schemas: one that declares the placeholder variables the prompts accept, and one that validates the extracted output.

Template Structure

A template consists of:

Field | Required | Description
--- | --- | ---
id | Yes | Unique identifier (e.g., my_invoice_template)
system_prompt | Yes | Instructions for model behavior
user_prompt | Yes | What to extract, with {placeholders}
context_schema | No | Defines the available placeholder variables
output_schema | No | JSON Schema for validating extracted data
vision_model | No | Overrides the default vision model
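Catching a missing required field or a misspelled optional one before calling the API saves a round trip. The helper below is a hypothetical client-side sketch, not part of the Extract API: it checks a template dict against the field table above, and the server's own validation rules may differ.

```python
# Hypothetical client-side check, not part of the Extract API. The field names
# come from the template structure table; the server may validate differently.
REQUIRED_FIELDS = {"id", "system_prompt", "user_prompt"}
OPTIONAL_FIELDS = {"context_schema", "output_schema", "vision_model"}


def check_template(template: dict) -> list[str]:
    """Return a list of problems; an empty list means the template looks well formed."""
    keys = set(template)
    problems = [f"missing required field: {f}" for f in sorted(REQUIRED_FIELDS - keys)]
    problems += [f"unknown field: {f}" for f in sorted(keys - REQUIRED_FIELDS - OPTIONAL_FIELDS)]
    for field in sorted(REQUIRED_FIELDS & keys):
        value = template[field]
        # The id and both prompts should be non-empty strings.
        if not isinstance(value, str) or not value.strip():
            problems.append(f"{field} must be a non-empty string")
    return problems
```

Running check_template on the purchase_order template from Creating Custom Templates returns an empty list.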

Using Default Templates

Three templates are available out of the box:

detailed_invoice

Extracts comprehensive invoice data including line items:

import requests

with open("invoice.pdf", "rb") as f:
    response = requests.post(
        "http://localhost/api/v1/extract/process",
        headers={"Authorization": "Bearer YOUR_API_KEY"},
        files={"file": f},
        data={
            "template_id": "detailed_invoice",
            # context is sent as a JSON-encoded string
            "context": '{"company_name": "Your Company"}'
        }
    )

Output includes:

  • Service provider details (name, address, tax ID)
  • Buyer information
  • Invoice metadata (number, date, due date)
  • Line items with descriptions, quantities, prices
  • Tax breakdown and totals

simple_receipt

Parses basic retail receipts:

with open("receipt.jpg", "rb") as f:
    response = requests.post(
        "http://localhost/api/v1/extract/process",
        headers={"Authorization": "Bearer YOUR_API_KEY"},
        files={"file": f},
        data={"template_id": "simple_receipt"}
    )

Output includes:

  • Store name and location
  • Transaction date and time
  • List of items with prices
  • Subtotal, tax, and total
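Vision models can occasionally misread an amount, so a cheap sanity check is to confirm that the extracted numbers add up. This page does not show the simple_receipt output schema, so the keys used below (items with a price each, subtotal, tax, total) are hypothetical; rename them to match what the template actually returns. Decimal avoids binary floating-point rounding surprises when comparing money.

```python
from decimal import Decimal

TOLERANCE = Decimal("0.01")  # allow one cent of rounding difference


def to_decimal(value):
    """Convert a JSON number or numeric string to Decimal; None stays None."""
    return None if value is None else Decimal(str(value))


def receipt_problems(receipt: dict) -> list[str]:
    """Cross-check the amounts on an extracted receipt.

    The keys "items", "price", "subtotal", "tax" and "total" are hypothetical
    names; adjust them to your template's actual output.
    """
    subtotal = to_decimal(receipt.get("subtotal"))
    tax = to_decimal(receipt.get("tax"))
    total = to_decimal(receipt.get("total"))
    if None in (subtotal, tax, total):
        return ["subtotal, tax or total was not extracted"]

    problems = []
    items = receipt.get("items") or []
    if items:
        # Line prices should add up to the subtotal.
        items_sum = sum((to_decimal(item["price"]) for item in items), Decimal("0"))
        if abs(items_sum - subtotal) > TOLERANCE:
            problems.append(f"items sum to {items_sum}, subtotal is {subtotal}")
    # Subtotal plus tax should equal the total.
    if abs(subtotal + tax - total) > TOLERANCE:
        problems.append(f"subtotal + tax = {subtotal + tax}, total is {total}")
    return problems
```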

expense_report

Classifies expenses for reporting:

with open("expense.png", "rb") as f:
    response = requests.post(
        "http://localhost/api/v1/extract/process",
        headers={"Authorization": "Bearer YOUR_API_KEY"},
        files={"file": f},
        data={"template_id": "expense_report"}
    )

Output includes:

  • Expense category
  • Amount and currency
  • Date
  • Vendor name
  • Description
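Each processing request in the examples above uploads a single file, so an expense report usually means aggregating many expense_report results. A minimal sketch of that aggregation, grouping by category and currency; the keys category, amount and currency are hypothetical names for the fields listed above.

```python
from collections import defaultdict
from decimal import Decimal


def totals_by_category(expenses: list[dict]) -> dict:
    """Sum expense amounts per (category, currency) pair.

    Currencies are kept separate because amounts in different currencies
    cannot be added without a conversion rate. The keys "category", "amount"
    and "currency" are hypothetical names for the expense_report output.
    """
    totals = defaultdict(Decimal)  # Decimal() starts each group at 0
    for expense in expenses:
        key = (expense["category"], expense["currency"])
        totals[key] += Decimal(str(expense["amount"]))
    return dict(totals)
```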

Creating Custom Templates

Create a template for each document type the defaults don't cover. This example registers a purchase order template:

import requests

template = {
    "id": "purchase_order",
    "system_prompt": """You are a document extraction specialist.
Extract data accurately from purchase orders.
Return valid JSON matching the requested structure.
If a field is not visible, use null.""",

    "user_prompt": """Extract the following from this purchase order for {company_name}:
- PO number
- Vendor name and address
- Order date
- Delivery date
- Line items (part number, description, quantity, unit price)
- Shipping terms
- Total amount""",

    "context_schema": {
        "company_name": {
            "type": "string",
            "description": "Name of the ordering company"
        }
    },

    "output_schema": {
        "type": "object",
        "properties": {
            "po_number": {"type": "string"},
            "vendor": {
                "type": "object",
                "properties": {
                    "name": {"type": "string"},
                    "address": {"type": "string"}
                }
            },
            "order_date": {"type": "string", "format": "date"},
            "delivery_date": {"type": "string", "format": "date"},
            "line_items": {
                "type": "array",
                "items": {
                    "type": "object",
                    "properties": {
                        "part_number": {"type": "string"},
                        "description": {"type": "string"},
                        "quantity": {"type": "number"},
                        "unit_price": {"type": "number"}
                    }
                }
            },
            "shipping_terms": {"type": "string"},
            "total_amount": {"type": "number"}
        }
    }
}

response = requests.post(
    "http://localhost/api/v1/extract/templates",
    headers={"Authorization": "Bearer YOUR_API_KEY"},
    json=template
)
response.raise_for_status()  # raise an HTTPError if the template was rejected

print(f"Created template: {response.json()['id']}")

Template Placeholders

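Use {placeholder} syntax in your user prompt to inject values at processing time. Declare each placeholder in context_schema, then pass the values as a JSON string in the context field of the processing request: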

# Template with placeholders
# (system_prompt is also required when creating a template; omitted here for brevity)
template = {
    "id": "contract_review",
    "user_prompt": """Review this contract between {party_a} and {party_b}.
Extract:
- Effective date
- Term length
- Key obligations for {party_a}
- Payment terms
- Termination conditions""",

    "context_schema": {
        "party_a": {"type": "string", "description": "First party name"},
        "party_b": {"type": "string", "description": "Second party name"}
    }
}

# Processing with context
with open("contract.pdf", "rb") as f:
    response = requests.post(
        "http://localhost/api/v1/extract/process",
        headers={"Authorization": "Bearer YOUR_API_KEY"},
        files={"file": f},
        data={
            "template_id": "contract_review",
            "context": '{"party_a": "Acme Corp", "party_b": "Widget Inc"}'
        }
    )
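Before processing real documents, it can help to preview the prompt the model will receive. This page does not document the service's exact substitution rules, so the sketch below assumes Python str.format semantics; placeholder_names and preview_prompt are hypothetical helpers. The preview also checks that every placeholder is declared in context_schema and supplied in the context.

```python
import string


def placeholder_names(user_prompt: str) -> set[str]:
    """Collect the {name} placeholders in a prompt (assumes str.format syntax)."""
    return {
        field
        for _, field, _, _ in string.Formatter().parse(user_prompt)
        if field  # None for plain text and escaped braces, "" for a bare "{}"
    }


def preview_prompt(template: dict, context: dict) -> str:
    """Render the user prompt locally, as a str.format-style service might."""
    used = placeholder_names(template["user_prompt"])
    undeclared = used - set(template.get("context_schema", {}))
    if undeclared:
        raise ValueError(f"placeholders not in context_schema: {sorted(undeclared)}")
    missing = used - set(context)
    if missing:
        raise ValueError(f"context is missing values for: {sorted(missing)}")
    return template["user_prompt"].format_map(context)
```

With the contract_review template above, preview_prompt(template, {"party_a": "Acme Corp", "party_b": "Widget Inc"}) fills in both parties, and json.dumps applied to the same dict produces exactly the string passed in the context field. In this sketch, literal braces in a prompt must be doubled ({{ and }}).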

Managing Templates

List All Templates

response = requests.get(
    "http://localhost/api/v1/extract/templates",
    headers={"Authorization": "Bearer YOUR_API_KEY"}
)

for template in response.json()["templates"]:
    print(f"{template['id']}: {template['system_prompt'][:50]}...")
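To see which templates you created yourself, for example before cleaning up with the delete endpoint, you can filter the listing against the three built-in IDs documented on this page. A minimal sketch, assuming the {"templates": [...]} response shape used in the loop above; custom_template_ids is a hypothetical helper.

```python
# The three built-in template IDs documented on this page.
DEFAULT_TEMPLATE_IDS = {"detailed_invoice", "simple_receipt", "expense_report"}


def custom_template_ids(listing: dict) -> list[str]:
    """Return the IDs of non-built-in templates from the list endpoint's JSON.

    Assumes the shape {"templates": [{"id": ...}, ...]} shown above.
    """
    return sorted(
        t["id"] for t in listing["templates"] if t["id"] not in DEFAULT_TEMPLATE_IDS
    )
```

Usage: custom_template_ids(response.json()).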

Get Template Details

response = requests.get(
    "http://localhost/api/v1/extract/templates/detailed_invoice",
    headers={"Authorization": "Bearer YOUR_API_KEY"}
)

template = response.json()
print(f"System prompt: {template['system_prompt']}")
print(f"User prompt: {template['user_prompt']}")

Update Template

response = requests.put(
    "http://localhost/api/v1/extract/templates/my_template",
    headers={"Authorization": "Bearer YOUR_API_KEY"},
    json={
        "user_prompt": "Updated extraction instructions...",
        # Fill "properties" with your field definitions
        "output_schema": {"type": "object", "properties": {}}
    }
)
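The update example above sends only user_prompt and output_schema, which suggests the endpoint accepts partial bodies; that is an inference, so confirm it against your deployment. Under that assumption, the hypothetical helper below computes the smallest update body by comparing the template returned by Get Template Details with the version you want.

```python
# Fields assumed to be updatable (taken from the template structure table;
# id is excluded because it is part of the URL).
UPDATABLE_FIELDS = ("system_prompt", "user_prompt", "context_schema", "output_schema", "vision_model")


def update_payload(current: dict, desired: dict) -> dict:
    """Return only the fields whose desired value differs from the current one."""
    return {
        field: desired[field]
        for field in UPDATABLE_FIELDS
        if field in desired and desired[field] != current.get(field)
    }
```

If the result is empty there is nothing to send; otherwise pass it as json= to requests.put.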

Delete Template

response = requests.delete(
    "http://localhost/api/v1/extract/templates/my_template",
    headers={"Authorization": "Bearer YOUR_API_KEY"}
)

Reset Default Templates

Restore the built-in templates to their original state:

response = requests.post(
    "http://localhost/api/v1/extract/templates/reset-defaults",
    headers={"Authorization": "Bearer YOUR_API_KEY"}
)

Template Wizard

Generate a template from a sample document:

with open("sample_invoice.pdf", "rb") as f:
    response = requests.post(
        "http://localhost/api/v1/extract/templates/wizard",
        headers={"Authorization": "Bearer YOUR_API_KEY"},
        files={"file": f},
        data={"description": "Monthly vendor invoice with line items"}
    )

suggested_template = response.json()
print(f"Suggested template: {suggested_template}")

The wizard analyzes your document and suggests prompts and schemas based on its structure.
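The wizard's response format is not specified on this page. Assuming the suggestion uses the same field names as a template, the hypothetical helper below keeps only the recognized fields and adds the id you choose, producing a body you can review and then POST to /api/v1/extract/templates as in Creating Custom Templates.

```python
# Template fields from the structure table (id is supplied by the caller).
TEMPLATE_FIELDS = ("system_prompt", "user_prompt", "context_schema", "output_schema", "vision_model")


def template_from_suggestion(suggestion: dict, template_id: str) -> dict:
    """Build a create-template body from a wizard suggestion.

    Assumes the suggestion reuses template field names; any other keys are dropped.
    """
    template = {"id": template_id}
    template.update({k: suggestion[k] for k in TEMPLATE_FIELDS if suggestion.get(k) is not None})
    missing = [k for k in ("system_prompt", "user_prompt") if k not in template]
    if missing:
        raise ValueError(f"suggestion lacks required fields: {missing}")
    return template
```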

Best Practices

Writing System Prompts

Good:

You are a document extraction specialist. Extract data accurately and completely.
Return valid JSON. Use null for missing fields. Do not hallucinate data.

Bad:

Extract stuff from the document.

Writing User Prompts

Good:

Extract the following fields from this invoice:
- Invoice number (top right corner, format: INV-XXXXX)
- Vendor name and full address
- Each line item with: description, quantity, unit price, line total
- Tax amount and rate
- Total amount due

Bad:

Get the invoice data.
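Keeping a prompt's field list and the output schema in sync by hand is error-prone. One option is to generate both from a single list of field definitions. The sketch below is a hypothetical helper, not part of the Extract API, and the (name, type, hint) tuple format is an assumption; the hint plays the role of the location and format notes in the good example above.

```python
def build_prompt_and_schema(document_type: str, fields: list[tuple[str, str, str]]) -> tuple[str, dict]:
    """Generate a user prompt and a matching output schema from one field list.

    Each field is a hypothetical (name, json_type, hint) tuple: name becomes the
    JSON key, json_type its schema type, and hint says where or how the value
    appears (use "" for no hint).
    """
    lines = [f"Extract the following fields from this {document_type}:"]
    properties = {}
    for name, json_type, hint in fields:
        label = name.replace("_", " ").capitalize()
        lines.append(f"- {label} ({hint})" if hint else f"- {label}")
        properties[name] = {"type": json_type}
    # Listing the exact keys ties the prompt to the generated schema.
    lines.append("Return a JSON object with these keys: " + ", ".join(name for name, _, _ in fields))
    return "\n".join(lines), {"type": "object", "properties": properties}
```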

Output Schemas

Define schemas to catch extraction errors early:

"output_schema": {
    "type": "object",
    "required": ["invoice_number", "total_amount"],
    "properties": {
        "invoice_number": {"type": "string", "pattern": "^INV-\\d{5}$"},
        "total_amount": {"type": "number", "minimum": 0},
        "line_items": {
            "type": "array",
            "minItems": 1,
            "items": {
                "type": "object",
                "required": ["description", "amount"],
                "properties": {
                    "description": {"type": "string"},
                    "quantity": {"type": "integer", "minimum": 1},
                    "unit_price": {"type": "number"},
                    "amount": {"type": "number"}
                }
            }
        }
    }
}
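To see what a schema like this catches, you can test it against sample outputs before uploading the template. The sketch below is an illustrative validator for only the keywords used above (type, required, properties, items, minItems, pattern, minimum); it is not how Extract validates internally, and a complete validator such as the jsonschema Python package covers the full specification.

```python
import re

# Checks for the JSON Schema "type" keyword. Booleans are not numbers, and
# whole-valued floats such as 2.0 count as integers, as in JSON Schema.
TYPE_CHECKS = {
    "object": lambda v: isinstance(v, dict),
    "array": lambda v: isinstance(v, list),
    "string": lambda v: isinstance(v, str),
    "boolean": lambda v: isinstance(v, bool),
    "null": lambda v: v is None,
    "number": lambda v: isinstance(v, (int, float)) and not isinstance(v, bool),
    "integer": lambda v: (isinstance(v, int) and not isinstance(v, bool))
    or (isinstance(v, float) and v.is_integer()),
}


def validate(instance, schema: dict, path: str = "$") -> list[str]:
    """Check instance against a small JSON Schema subset and return error messages."""
    expected = schema.get("type")
    allowed = expected if isinstance(expected, list) else [expected] if expected else []
    if allowed and not any(TYPE_CHECKS[t](instance) for t in allowed):
        return [f"{path}: expected {' or '.join(allowed)}, got {type(instance).__name__}"]

    errors = []
    if isinstance(instance, dict):
        for key in schema.get("required", []):
            if key not in instance:
                errors.append(f"{path}: missing required property {key!r}")
        for key, subschema in schema.get("properties", {}).items():
            if key in instance:
                errors.extend(validate(instance[key], subschema, f"{path}.{key}"))
    elif isinstance(instance, list):
        if len(instance) < schema.get("minItems", 0):
            errors.append(f"{path}: expected at least {schema['minItems']} items")
        for i, item in enumerate(instance):
            errors.extend(validate(item, schema.get("items", {}), f"{path}[{i}]"))
    elif isinstance(instance, str):
        # As in JSON Schema, "pattern" is unanchored; anchors belong in the regex.
        if "pattern" in schema and not re.search(schema["pattern"], instance):
            errors.append(f"{path}: {instance!r} does not match {schema['pattern']!r}")
    elif TYPE_CHECKS["number"](instance):
        if "minimum" in schema and instance < schema["minimum"]:
            errors.append(f"{path}: {instance} is below minimum {schema['minimum']}")
    return errors
```

Because the system prompts on this page tell the model to use null for missing fields, a property that may legitimately be absent can declare a type list such as ["string", "null"], which standard JSON Schema and this sketch both accept.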

Next Steps

  • Processing - Learn the full document processing workflow
  • Examples - See complete integration examples