
Platform Architecture

Understanding Enclava's architecture helps you make informed decisions about deployment, scaling, and integration.

High-Level Architecture

+---------------------------------------------------------------+
|                       ENCLAVA PLATFORM                        |
+---------------------------------------------------------------+
|                                                               |
|   +----------------+               +------------------+       |
|   |    Frontend    |<------------->|     Backend      |       |
|   |   (Next.js)    |               |    (FastAPI)     |       |
|   +----------------+               +--------+---------+       |
|                                             |                 |
|                                             v                 |
|              +------------------------------------+           |
|              |           CORE SERVICES            |           |
|              +------------------------------------+           |
|              | - LLM Service                      |           |
|              | - RAG Service                      |           |
|              | - Agent Service                    |           |
|              | - Tool Calling Service             |           |
|              | - Plugin System                    |           |
|              +------------------+-----------------+           |
|                                 |                             |
|                 +---------------+---------------+             |
|                 |                               |             |
|                 v                               v             |
|        +-----------------+             +-----------------+    |
|        |   PostgreSQL    |             |      Redis      |    |
|        |  (Primary DB)   |             |     (Cache)     |    |
|        +--------+--------+             +--------+--------+    |
|                 |                               |             |
|                 v                               v             |
|        +-----------------+             +-----------------+    |
|        |     Qdrant      |             | PrivateMode.ai  |    |
|        |   (Vector DB)   |             |   (LLM Proxy)   |    |
|        +--------+--------+             +--------+--------+    |
|                 |                               |             |
+-----------------|-------------------------------|-------------+
                  |                               |
          Document Storage            Confidential Inference

Component Overview

Frontend (Next.js)

Purpose: User interface for managing AI resources

Features:

  • Create and manage chatbots, agents, and RAG collections
  • View analytics and usage metrics
  • Configure budgets and API keys
  • Upload and manage documents

Technology: React 18, TypeScript, Tailwind CSS, Radix UI

Backend (FastAPI)

Purpose: API server and core business logic

Features:

  • RESTful API with OpenAI-compatible endpoints
  • JWT authentication for frontend, API key auth for clients
  • Modular architecture with pluggable services
  • Real-time capabilities with WebSocket support

Technology: Python 3.11, SQLAlchemy async, Pydantic

Core Services

LLM Service

Purpose: Abstracted LLM inference layer

Capabilities:

  • Multiple provider support (PrivateMode.ai, others)
  • Circuit breaker and retry patterns
  • Streaming responses
  • Token counting and cost calculation

Key Feature: Routes through PrivateMode.ai for confidential computing
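The circuit breaker pattern mentioned above can be sketched as follows. This is a minimal illustration, not Enclava's actual implementation; the class name, thresholds, and error message are hypothetical:

```python
import time

class CircuitBreaker:
    """Minimal circuit breaker: trip open after N consecutive failures,
    then reject calls until a cooldown period has elapsed."""

    def __init__(self, failure_threshold=3, cooldown_seconds=30.0):
        self.failure_threshold = failure_threshold
        self.cooldown_seconds = cooldown_seconds
        self.failures = 0
        self.opened_at = None  # monotonic timestamp when the breaker tripped

    def call(self, fn, *args, **kwargs):
        if self.opened_at is not None:
            if time.monotonic() - self.opened_at < self.cooldown_seconds:
                raise RuntimeError("circuit open: provider unavailable")
            self.opened_at = None  # cooldown elapsed: half-open, try again
        try:
            result = fn(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.failures >= self.failure_threshold:
                self.opened_at = time.monotonic()
            raise
        self.failures = 0  # any success resets the failure count
        return result
```

Combined with retries, this lets the LLM Service fail fast when a provider is down instead of stacking up slow, doomed requests.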

RAG Service

Purpose: Document processing and semantic search

Pipeline:

  1. Document Parsing - Extract text from PDF, DOCX, TXT, MD, JSON
  2. Chunking - Split documents into manageable pieces
  3. Embedding Generation - Convert text to vector embeddings
  4. Vector Storage - Store in Qdrant for fast retrieval
  5. Semantic Search - Find relevant documents using similarity search

Technology: Qdrant vector database, embedding models
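Step 2 of the pipeline (chunking) is often a fixed-size splitter with overlap, so that context spanning a chunk boundary is not lost. A minimal sketch (chunk size, overlap, and character-based splitting are illustrative choices; Enclava's actual chunker may work differently):

```python
def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Split text into overlapping fixed-size chunks; the overlap keeps
    sentences that straddle a boundary retrievable from either chunk."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        start += chunk_size - overlap
    return chunks
```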

Agent Service

Purpose: AI agents with tool calling capabilities

Workflow:

  1. Planning - LLM decides which tools to use
  2. Tool Selection - Choose appropriate tools from available set
  3. Execution - Run tools with provided parameters
  4. Iteration - Repeat until task is complete
  5. Synthesis - Combine results into final answer

Built-in Tools:

  • RAG Search - Query your document knowledge base
  • Web Search - Real-time information via Brave Search
  • Code Execution - Run Python code for data analysis
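The plan-execute-iterate workflow above can be sketched as a loop around a planner (normally the LLM) and a tool registry. Everything here is a simplified stand-in: the decision format, function names, and iteration cap are hypothetical, not Enclava's internal API:

```python
def run_agent(plan_step, tools, task, max_iterations=5):
    """Minimal agent loop: ask the planner which tool to run next,
    execute it, feed the result back, and stop once the planner
    returns a final answer instead of a tool call."""
    history = [("task", task)]
    for _ in range(max_iterations):
        # decision is {"tool": name, "args": {...}} or {"answer": text}
        decision = plan_step(history)
        if "answer" in decision:
            return decision["answer"]
        output = tools[decision["tool"]](**decision.get("args", {}))
        history.append((decision["tool"], output))
    raise RuntimeError("agent did not converge within max_iterations")
```

The iteration cap matters in practice: without it, a confused planner can loop on the same tool indefinitely.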

Tool Calling Service

Purpose: Execute and manage tool calls

Capabilities:

  • Direct tool execution (without agents)
  • Tool result caching
  • Error handling and retries
  • Execution timeout management
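Caching and retries can be combined in a single execution wrapper, sketched below. This is an illustrative stand-in (the cache key scheme and backoff constants are assumptions, and timeout handling is omitted for brevity):

```python
import time

def execute_tool(fn, args, cache, max_retries=3, backoff=0.1):
    """Run a tool with result caching and retry-on-failure.
    Identical calls are served from the cache; transient errors
    are retried with exponential backoff."""
    key = (fn.__name__, tuple(sorted(args.items())))
    if key in cache:
        return cache[key]
    last_error = None
    for attempt in range(max_retries):
        try:
            result = fn(**args)
            cache[key] = result
            return result
        except Exception as e:
            last_error = e
            time.sleep(backoff * (2 ** attempt))  # exponential backoff
    raise last_error
```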

Plugin System

Purpose: Extensible architecture for custom tools

Features:

  • Auto-discovery of plugins
  • Sandboxed execution for security
  • Dynamic route registration
  • Plugin lifecycle management
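A common way to implement auto-discovery is a registration decorator: importing a plugin module is enough to make its tools visible to the platform. The sketch below is a generic pattern, not Enclava's plugin API (the registry name, decorator, and `WeatherPlugin` example are all hypothetical):

```python
PLUGIN_REGISTRY = {}

def register_plugin(name):
    """Decorator-based registration; a real auto-discovery step would
    scan a plugins directory with importlib/pkgutil and import each
    module, triggering these registrations as a side effect."""
    def wrap(cls):
        PLUGIN_REGISTRY[name] = cls
        return cls
    return wrap

@register_plugin("weather")
class WeatherPlugin:
    def run(self, city: str) -> str:
        return f"(stub) forecast for {city}"
```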

Databases

PostgreSQL

Purpose: Primary persistent storage

Stores:

  • Users and authentication data
  • API keys and permissions
  • Chatbot and agent configurations
  • RAG collections and document metadata
  • Conversations and messages
  • Budgets and usage tracking
  • Audit logs

Redis

Purpose: Caching and session management

Caches:

  • API key authentication results
  • Frequently accessed configurations
  • Rate limiting counters
  • Session data
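Rate-limiting counters in Redis typically follow the INCR + EXPIRE fixed-window pattern. The sketch below mimics that pattern with an in-memory dict standing in for Redis (the class and parameters are illustrative, not Enclava's implementation):

```python
import time

class FixedWindowRateLimiter:
    """Fixed-window counter: one counter per (key, window) pair, so
    counters from past windows simply stop being read (Redis would
    let them expire)."""

    def __init__(self, limit, window_seconds):
        self.limit = limit
        self.window = window_seconds
        self.counters = {}  # (key, window_index) -> request count

    def allow(self, key, now=None):
        now = time.time() if now is None else now
        bucket = (key, int(now // self.window))
        count = self.counters.get(bucket, 0) + 1
        self.counters[bucket] = count
        return count <= self.limit
```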

Qdrant

Purpose: Vector database for semantic search

Stores:

  • Document embeddings
  • Vector representations for RAG search
  • Metadata for retrieval

Features:

  • Fast similarity search
  • Filtering by metadata
  • Collection management
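Conceptually, similarity search with metadata filtering reduces to: filter the stored points, score each remaining vector against the query (cosine similarity is typical), and return the top-k. A brute-force sketch of what the vector database does for you at scale (the point/payload shape here is illustrative, not Qdrant's API):

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def search(points, query_vector, top_k=3, where=None):
    """Score every stored vector against the query, optionally filter
    on payload metadata first, and return the top-k matches."""
    candidates = [
        p for p in points
        if where is None
        or all(p["payload"].get(k) == v for k, v in where.items())
    ]
    candidates.sort(key=lambda p: cosine(p["vector"], query_vector), reverse=True)
    return candidates[:top_k]
```

Qdrant replaces the linear scan with an approximate nearest-neighbor index, which is what keeps retrieval fast over large collections.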

External Services

PrivateMode.ai

Purpose: Confidential LLM inference

How It Works:

  • Uses Trusted Execution Environments (TEE)
  • Data encrypted in memory during inference
  • No data retention after processing
  • Privacy guarantees through hardware-level security

Benefits:

  • Your prompts and data never leave secure boundaries
  • LLM provider cannot access your data
  • Compliant with strict privacy requirements

Data Flow

Chat Completion Request

Client Request
      |
      v
[API Key Authentication]
      |
      v
[Rate Limiting Check]
      |
      v
[Budget Verification]
      |
      v
[LLM Service]
      |
      v
[PrivateMode.ai - TEE]
      |
      v
Response (Confidential)
      |
      v
[Cost Calculation]
      |
      v
[Usage Tracking]
      |
      v
Client
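The flow above is essentially a chain of gates in front of the LLM call, followed by accounting. A minimal sketch under that assumption (function names and the dict-based request shape are hypothetical):

```python
def handle_chat_request(request, checks, llm_call, track_usage):
    """Run each gate in order (auth, rate limit, budget); each gate
    raises on failure. Only if all pass is the LLM called, after
    which cost/usage is recorded."""
    for check in checks:
        check(request)
    response = llm_call(request)  # routed through PrivateMode.ai in practice
    track_usage(request, response)
    return response
```

Ordering the cheap checks first (API key lookup, rate limit) means most bad requests are rejected before any expensive work happens.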

RAG-Enhanced Chat

Client Request
      |
      v
[Query RAG Collection]
      |
      v
[Qdrant Vector Search]
      |
      v
[Retrieve Top-K Results]
      |
      v
[Inject Context into Prompt]
      |
      v
[Send to LLM with Context]
      |
      v
[Generate Response with Citations]
      |
      v
Client
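The "Inject Context into Prompt" step can be sketched as a prompt builder that numbers the retrieved chunks so the model can cite them. The template wording below is illustrative, not Enclava's actual prompt:

```python
def build_rag_prompt(question, retrieved_chunks):
    """Inject retrieved chunks into the prompt, numbered so the model
    can cite them as [1], [2], ... in its answer."""
    context = "\n".join(
        f"[{i}] {chunk}" for i, chunk in enumerate(retrieved_chunks, 1)
    )
    return (
        "Answer the question using only the context below. "
        "Cite sources by their bracketed number.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )
```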

Agent Execution with Tools

Client Request
      |
      v
[LLM Planning]
      |
      v
[Select Tool] ----------> [Tool Execution Service]
      |                             |
      v                             v
[Execute Tool]           [Run Tool (RAG/Web/Code)]
      |                             |
      v                             v
[Get Results]                [Tool Output]
      |                             |
      v                             |
[Send Result to LLM] <--------------+
      |
      v
[More Tools Needed?] ---Yes---> [Repeat]
      |
      No
      |
      v
[Final Response]
      |
      v
Client

API Architecture

Enclava provides two API layers:

Internal API (/api-internal/v1)

Purpose: Frontend and admin operations

Authentication: JWT tokens (session-based)

Features:

  • User management
  • Chatbot and agent configuration
  • RAG document management
  • Budget and analytics
  • Admin settings

Public API (/api/v1)

Purpose: External client applications

Authentication: API keys

Features:

  • OpenAI-compatible endpoints
  • Chatbot chat interface
  • Agent interaction
  • RAG search
  • Tool execution

OpenAI-Compatible Endpoints

  • POST /api/v1/chat/completions - Drop-in replacement for OpenAI chat completions
  • GET /api/v1/models - List available models
  • POST /api/v1/embeddings - Generate embeddings
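Because the endpoints are OpenAI-compatible, calling them only requires pointing an HTTP client (or any OpenAI SDK, via its `base_url` setting) at your deployment. A stdlib-only sketch of the request shape; the host, API key, and model ID are placeholders you must substitute:

```python
import json
import urllib.request

# Placeholders: substitute your deployment's host and an Enclava API key.
BASE_URL = "https://enclava.example.com/api/v1"
API_KEY = "YOUR_API_KEY"

payload = {
    "model": "MODEL_ID",  # pick one from GET /api/v1/models
    "messages": [{"role": "user", "content": "Hello!"}],
}

req = urllib.request.Request(
    f"{BASE_URL}/chat/completions",
    data=json.dumps(payload).encode(),
    headers={
        "Authorization": f"Bearer {API_KEY}",
        "Content-Type": "application/json",
    },
    method="POST",
)
# response = urllib.request.urlopen(req)  # uncomment against a live deployment
```

The same payload works with the official OpenAI client libraries once their base URL is set to your `/api/v1` endpoint.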

Security Layers

  1. Network Security - CORS, SSL/TLS, rate limiting
  2. Authentication - JWT for frontend, API keys for clients
  3. Authorization - Role-based access control, permissions
  4. Data Privacy - PrivateMode.ai TEE for all LLM inference
  5. Audit Logging - All actions logged for compliance
  6. Input Validation - Request validation and sanitization

Scalability Considerations

Horizontal Scaling

  • Backend: Scale with Docker Swarm or Kubernetes
  • Database: PostgreSQL read replicas, connection pooling
  • Cache: Redis clustering for distributed caching
  • Vector DB: Qdrant clustering for large document sets

Performance Optimization

  • Caching: Redis for frequently accessed data
  • Connection Pooling: Database connection reuse
  • Async Processing: Non-blocking I/O throughout
  • Batch Operations: Bulk embedding generation for RAG

Integration Points

Enclava can integrate with:

  • OpenAI Clients - Just change base_url
  • LangChain - Use ChatOpenAI with custom base URL
  • LlamaIndex - Configure with custom endpoint
  • Custom Tools - Via MCP (Model Context Protocol)
  • Webhooks - For notifications and automation

Next Steps

Understanding architecture helps with: