Uploading Documents

Upload documents to your RAG collections. Documents are processed, chunked, and stored as vectors.

Upload a Document

Using cURL

curl -X POST http://localhost/api/v1/rag/upload \
-H "Authorization: Bearer YOUR_API_KEY" \
-F "file=@user_guide.pdf" \
-F "collection_name=documentation" \
-F "description=User guide for our product"

Using Python

import requests

with open("user_guide.pdf", "rb") as f:
    response = requests.post(
        "http://localhost/api/v1/rag/upload",
        headers={"Authorization": "Bearer YOUR_API_KEY"},
        files={"file": f},
        data={
            "collection_name": "documentation",
            "description": "User guide for our product"
        }
    )

result = response.json()
print(f"Document ID: {result['document_id']}")
print(f"Status: {result['status']}")

Using JavaScript

const formData = new FormData();
formData.append('file', fileInput.files[0]);
formData.append('collection_name', 'documentation');
formData.append('description', 'User guide for our product');

const response = await fetch('http://localhost/api/v1/rag/upload', {
  method: 'POST',
  headers: {
    'Authorization': 'Bearer YOUR_API_KEY'
  },
  body: formData
});

const result = await response.json();
console.log(`Document ID: ${result.document_id}`);

Upload Parameters

Parameter        Type    Required  Description
---------------  ------  --------  -------------------------------
file             file    Yes       Document file to upload
collection_name  string  Yes       Target collection name
description      string  No        Document description
metadata         object  No        Custom metadata key-value pairs

Upload with Metadata

import json
import requests

with open("policy_doc.pdf", "rb") as f:
    response = requests.post(
        "http://localhost/api/v1/rag/upload",
        headers={"Authorization": "Bearer YOUR_API_KEY"},
        files={"file": f},
        data={
            "collection_name": "policies",
            "description": "Privacy policy document",
            "metadata": json.dumps({
                "category": "legal",
                "version": "2.0",
                "effective_date": "2024-01-01",
                "department": "compliance"
            })
        }
    )

print(f"Uploaded: {response.json()['document_id']}")

Supported File Formats

Format    Extensions  Max Size
--------  ----------  --------
Text      .txt        10 MB
Markdown  .md         10 MB
PDF       .pdf        50 MB
Word      .docx       25 MB
JSON      .json       10 MB
HTML      .html       10 MB
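A quick client-side check against the limits in the table can fail fast before a round trip to the server. This is a sketch, not part of the API; the limits below mirror the table, but the server remains the authority:

```python
import os

# Size limits in bytes, taken from the supported-formats table above.
MAX_SIZES = {
    ".txt": 10 * 1024**2,
    ".md": 10 * 1024**2,
    ".pdf": 50 * 1024**2,
    ".docx": 25 * 1024**2,
    ".json": 10 * 1024**2,
    ".html": 10 * 1024**2,
}

def validate_upload(path):
    """Return (ok, reason) before attempting an upload."""
    ext = os.path.splitext(path)[1].lower()
    if ext not in MAX_SIZES:
        return False, f"Unsupported format: {ext or '(none)'}"
    if os.path.getsize(path) > MAX_SIZES[ext]:
        limit_mb = MAX_SIZES[ext] // 1024**2
        return False, f"File exceeds {limit_mb} MB limit for {ext}"
    return True, "ok"
```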

Document Processing

When you upload a document:

  1. Parse - Extract text from file
  2. Chunk - Split into 500-1000 token pieces
  3. Embed - Convert chunks to vectors
  4. Index - Store vectors for fast search

Processing time depends on file size:

  • Small files (< 1 MB): 5-10 seconds
  • Medium files (1-10 MB): 10-30 seconds
  • Large files (10-50 MB): 30-120 seconds
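Chunking happens server-side, but a rough sketch shows why chunk counts vary with document length. This approximation uses whitespace-separated words in place of real tokens, and the overlap parameter is an illustrative assumption, not the server's actual setting:

```python
def chunk_text(text, max_tokens=800, overlap=100):
    """Split text into overlapping chunks of roughly max_tokens "tokens".

    Words stand in for tokens here; the server's tokenizer will count
    differently, so treat the numbers as an approximation.
    """
    words = text.split()
    chunks = []
    step = max_tokens - overlap  # advance leaves `overlap` words shared
    for start in range(0, len(words), step):
        chunk = words[start:start + max_tokens]
        if chunk:
            chunks.append(" ".join(chunk))
        if start + max_tokens >= len(words):
            break
    return chunks
```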

Upload Multiple Documents

import requests

documents = ["doc1.pdf", "doc2.txt", "doc3.md"]

for doc in documents:
    with open(doc, "rb") as f:
        response = requests.post(
            "http://localhost/api/v1/rag/upload",
            headers={"Authorization": "Bearer YOUR_API_KEY"},
            files={"file": f},
            data={
                "collection_name": "knowledge_base",
                "description": f"Document: {doc}"
            }
        )
    print(f"{doc}: {response.json()['status']}")

Upload from URL

import requests

# Download the file from an external URL
url = "https://example.com/document.pdf"
response = requests.get(url)
response.raise_for_status()  # stop early if the download failed

# Upload the downloaded content
files = {"file": ("document.pdf", response.content)}
upload_response = requests.post(
    "http://localhost/api/v1/rag/upload",
    headers={"Authorization": "Bearer YOUR_API_KEY"},
    files=files,
    data={
        "collection_name": "documentation",
        "description": "Downloaded from external URL"
    }
)

print(f"Uploaded: {upload_response.json()['document_id']}")

Check Upload Status

import requests

def upload_with_status(file_path, collection_name):
    with open(file_path, "rb") as f:
        response = requests.post(
            "http://localhost/api/v1/rag/upload",
            headers={"Authorization": "Bearer YOUR_API_KEY"},
            files={"file": f},
            data={"collection_name": collection_name}
        )
    return response.json()

result = upload_with_status("large_doc.pdf", "docs")

if result["status"] == "processing":
    print(f"Document {result['document_id']} is being processed...")
    print("You can search once processing completes.")
elif result["status"] == "completed":
    print(f"Document {result['document_id']} is ready for search.")
else:
    print(f"Error: {result.get('error', 'Unknown error')}")
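When the upload returns "processing", you can poll until the document is ready. A sketch, assuming a document-status endpoint exists; the route below is a placeholder for illustration only, so check your server's API reference for the actual path:

```python
import time
import requests

def wait_until_ready(document_id, timeout=120, interval=5):
    """Poll a document's status until processing finishes.

    NOTE: the endpoint path below is an assumption, not a documented
    route -- substitute the real status endpoint from your API reference.
    """
    deadline = time.time() + timeout
    while time.time() < deadline:
        response = requests.get(
            f"http://localhost/api/v1/rag/documents/{document_id}",
            headers={"Authorization": "Bearer YOUR_API_KEY"},
        )
        status = response.json().get("status")
        if status in ("completed", "failed"):
            return status
        time.sleep(interval)
    return "timeout"
```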

Upload Response

Successful upload returns:

{
  "document_id": "doc_abc123",
  "collection_name": "documentation",
  "filename": "user_guide.pdf",
  "chunk_count": 45,
  "status": "processing",
  "created_at": "2024-01-15T10:30:00Z"
}

Best Practices

File Preparation

  • Use clean, well-formatted documents
  • Remove unnecessary images and formatting
  • Ensure text is readable and accessible
  • Check for sensitive data before uploading

Metadata Usage

Add relevant metadata for better filtering:

metadata = {
    "category": "product",
    "version": "1.2",
    "language": "en",
    "audience": "developers",
    "last_updated": "2024-01-15"
}

Batch Uploads

For large document sets:

  1. Upload in small batches (5-10 documents)
  2. Monitor server performance
  3. Check processing status between batches
  4. Handle errors and retry failed uploads
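The steps above can be combined into a small helper. This is a sketch, not part of the API: it uploads files one at a time, retries each failure with exponential backoff, and collects per-file results:

```python
import time
import requests

def upload_batch(paths, collection_name, retries=3):
    """Upload files sequentially, retrying failures with backoff."""
    results = {}
    for path in paths:
        for attempt in range(retries):
            try:
                with open(path, "rb") as f:
                    response = requests.post(
                        "http://localhost/api/v1/rag/upload",
                        headers={"Authorization": "Bearer YOUR_API_KEY"},
                        files={"file": f},
                        data={"collection_name": collection_name},
                        timeout=60,
                    )
                response.raise_for_status()
                results[path] = response.json()
                break
            except (requests.RequestException, OSError) as exc:
                if attempt == retries - 1:
                    results[path] = {"status": "failed", "error": str(exc)}
                else:
                    time.sleep(2 ** attempt)  # back off before retrying
    return results
```

Between batches you can inspect `results` and re-queue only the entries marked "failed".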

Document Size

  • Keep documents under 10 MB when possible
  • Split large documents into smaller files
  • Use PDFs for preserving formatting
  • Use plain text for fastest processing
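For oversized text documents, splitting can be done on line boundaries so no part breaks mid-line. A minimal sketch, assuming UTF-8 text files (binary formats like PDF need format-aware tools instead):

```python
import os

def split_text_file(path, max_bytes=9 * 1024**2):
    """Split a large UTF-8 text file into parts under the 10 MB limit.

    Writes doc.part1.txt, doc.part2.txt, ... next to the original and
    returns the list of part paths.
    """
    base, ext = os.path.splitext(path)
    parts, current, size, index = [], [], 0, 1

    def flush():
        nonlocal current, size, index
        out = f"{base}.part{index}{ext}"
        with open(out, "w", encoding="utf-8") as o:
            o.writelines(current)
        parts.append(out)
        current, size = [], 0
        index += 1

    with open(path, "r", encoding="utf-8") as f:
        for line in f:
            encoded = len(line.encode("utf-8"))
            if current and size + encoded > max_bytes:
                flush()
            current.append(line)
            size += encoded
    if current:
        flush()
    return parts
```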

Troubleshooting

File Too Large

Problem: {"error": "File size exceeds limit"}

Solution:

  • Split document into smaller files
  • Compress or optimize the file
  • Use a format with smaller file size

Unsupported Format

Problem: {"error": "Unsupported file format"}

Solution:

  • Convert to supported format (TXT, PDF, MD, DOCX, JSON, HTML)
  • Use PDF for complex documents
  • Use TXT for simple text documents

Processing Timeout

Problem: Upload succeeds but search returns no results

Solution:

  • Wait for processing to complete
  • Check document status endpoint
  • Verify collection name is correct
  • Review server logs for errors

Duplicate Document

Problem: Same document uploaded multiple times

Solution:

  • Check existing documents before upload
  • Use unique filenames
  • Delete duplicates from collection
  • Use metadata to identify original
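One way to spot duplicates before uploading is to hash file contents and store the hash as metadata. A sketch, assuming your server lets you filter or list documents by metadata (the `sha256` key name is an arbitrary choice):

```python
import hashlib

def file_sha256(path, chunk_size=1 << 20):
    """Content hash for spotting duplicate files before upload."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):
            digest.update(chunk)
    return digest.hexdigest()

# Store the hash with the upload so duplicates can be found later,
# e.g. data={"metadata": json.dumps({"sha256": file_sha256(path)})}
```

Two files with the same hash have identical content regardless of filename, so this catches renamed copies that a filename check would miss.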

Next Steps