Platform Architecture
Understanding Enclava's architecture helps you make informed decisions about deployment, scaling, and integration.
High-Level Architecture
+---------------------------------------------------------------+
| ENCLAVA PLATFORM |
+---------------------------------------------------------------+
| |
| +----------------+ +------------------+ |
| | Frontend |<------------>| Backend | |
| | (Next.js) | | (FastAPI) | |
| +----------------+ +--------+---------+ |
| | |
| v |
| +------------------------------------+ |
| | CORE SERVICES | |
| +------------------------------------+ |
| | - LLM Service | |
| | - RAG Service | |
| | - Agent Service | |
| | - Tool Calling Service | |
| | - Plugin System | |
| +------------------+-----------------+ |
| | |
| +---------------+---------------+ |
| | | |
| v v |
| +-----------------+ +-----------------+ |
| | PostgreSQL | | Redis | |
| | (Primary DB) | | (Cache) | |
| +--------+--------+ +--------+--------+ |
| | | |
| v v |
| +-----------------+ +-----------------+ |
| | Qdrant | | PrivateMode.ai | |
| | (Vector DB) | | (LLM Proxy) | |
| +--------+--------+ +--------+--------+ |
| | | |
+----------------|------------------------------+----------------+
| |
Document Storage Confidential Inference
Component Overview
Frontend (Next.js)
Purpose: User interface for managing AI resources
Features:
- Create and manage chatbots, agents, and RAG collections
- View analytics and usage metrics
- Configure budgets and API keys
- Upload and manage documents
Technology: React 18, TypeScript, Tailwind CSS, Radix UI
Backend (FastAPI)
Purpose: API server and core business logic
Features:
- RESTful API with OpenAI-compatible endpoints
- JWT authentication for frontend, API key auth for clients
- Modular architecture with pluggable services
- Real-time capabilities with WebSocket support
Technology: Python 3.11, SQLAlchemy async, Pydantic
Core Services
LLM Service
Purpose: Abstracted LLM inference layer
Capabilities:
- Multiple provider support (PrivateMode.ai and others)
- Circuit breaker and retry patterns
- Streaming responses
- Token counting and cost calculation
Key Feature: Routes through PrivateMode.ai for confidential computing
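The circuit breaker and retry behavior can be sketched roughly as follows. This is an illustrative model, not Enclava's actual implementation; the class name and thresholds are hypothetical:

```python
import time

class CircuitBreaker:
    """Illustrative circuit breaker: opens after repeated provider failures."""

    def __init__(self, failure_threshold=3, reset_timeout=30.0):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.failures = 0
        self.opened_at = None  # None means the circuit is closed

    def call(self, fn, *args, retries=2, **kwargs):
        # If the circuit is open, only allow a trial call after the cooldown.
        if self.opened_at is not None:
            if time.monotonic() - self.opened_at < self.reset_timeout:
                raise RuntimeError("circuit open: provider temporarily disabled")
            self.opened_at = None  # half-open: permit one trial call

        last_error = None
        for attempt in range(retries + 1):
            try:
                result = fn(*args, **kwargs)
                self.failures = 0  # success resets the failure count
                return result
            except Exception as exc:
                last_error = exc
                self.failures += 1
                if self.failures >= self.failure_threshold:
                    self.opened_at = time.monotonic()  # trip the breaker
                    break
        raise RuntimeError("provider call failed") from last_error
```

The breaker stops hammering a failing provider: after enough consecutive failures, calls fail fast until the reset timeout elapses.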
RAG Service
Purpose: Document processing and semantic search
Pipeline:
- Document Parsing - Extract text from PDF, DOCX, TXT, MD, JSON
- Chunking - Split documents into manageable pieces
- Embedding Generation - Convert text to vector embeddings
- Vector Storage - Store in Qdrant for fast retrieval
- Semantic Search - Find relevant documents using similarity search
Technology: Qdrant vector database, embedding models
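The chunking step of the pipeline can be sketched as a fixed-size splitter with overlap, so that sentences straddling a chunk boundary appear in both neighbors. The sizes below are illustrative defaults, not Enclava's configuration:

```python
def chunk_text(text, chunk_size=500, overlap=50):
    """Split text into overlapping fixed-size chunks (illustrative sketch)."""
    if chunk_size <= overlap:
        raise ValueError("chunk_size must exceed overlap")
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        start += chunk_size - overlap  # step forward, keeping some overlap
    return chunks
```

Each chunk is then embedded and stored in Qdrant, where the overlap helps the retriever surface context that would otherwise be cut in half.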
Agent Service
Purpose: AI agents with tool calling capabilities
Workflow:
- Planning - LLM decides which tools to use
- Tool Selection - Choose appropriate tools from available set
- Execution - Run tools with provided parameters
- Iteration - Repeat until task is complete
- Synthesis - Combine results into final answer
Built-in Tools:
- RAG Search - Query your document knowledge base
- Web Search - Real-time information via Brave Search
- Code Execution - Run Python code for data analysis
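The plan/execute/iterate workflow above can be sketched as a loop in which the LLM picks an action each turn and tool results are fed back into the conversation. The planner protocol here (an `action` key with a `final` sentinel) is a hypothetical simplification:

```python
def run_agent(llm_plan, tools, task, max_steps=5):
    """Illustrative agent loop: plan -> tool call -> feed result back -> repeat."""
    history = [task]
    for _ in range(max_steps):
        step = llm_plan(history)          # Planning: LLM decides the next action
        if step["action"] == "final":
            return step["answer"]         # Synthesis: task is complete
        tool = tools[step["action"]]      # Tool Selection from the available set
        result = tool(**step["args"])     # Execution with provided parameters
        history.append(result)            # Iteration: result goes back to the LLM
    raise RuntimeError("agent did not finish within max_steps")
```

The `max_steps` cap is the usual guard against an agent that keeps requesting tools without converging.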
Tool Calling Service
Purpose: Execute and manage tool calls
Capabilities:
- Direct tool execution (without agents)
- Tool result caching
- Error handling and retries
- Execution timeout management
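Result caching and timeout management might look roughly like this sketch, which keys the cache on the tool name plus its parameters and bounds each call with a timeout (class and defaults are illustrative):

```python
import concurrent.futures
import json

class ToolRunner:
    """Illustrative tool runner with result caching and a per-call timeout."""

    def __init__(self, timeout=10.0):
        self.timeout = timeout
        self.cache = {}

    def run(self, tool, **params):
        # Cache key: tool name plus parameters, serialized deterministically.
        key = (tool.__name__, json.dumps(params, sort_keys=True))
        if key in self.cache:
            return self.cache[key]
        with concurrent.futures.ThreadPoolExecutor(max_workers=1) as pool:
            future = pool.submit(tool, **params)
            result = future.result(timeout=self.timeout)  # raises on timeout
        self.cache[key] = result
        return result
```

Caching matters most for idempotent tools like RAG search, where an agent may ask the same question twice within one run.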
Plugin System
Purpose: Extensible architecture for custom tools
Features:
- Auto-discovery of plugins
- Sandboxed execution for security
- Dynamic route registration
- Plugin lifecycle management
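A minimal registry pattern conveys the idea behind discovery and lifecycle management; in a real system the registry would import plugin modules from a directory before starting them. All names here are hypothetical:

```python
class PluginRegistry:
    """Illustrative plugin registry with a start lifecycle hook."""

    def __init__(self):
        self.plugins = {}

    def register(self, name):
        # Decorator used by plugin modules; auto-discovery would simply
        # import every module in a plugins/ directory so these run.
        def decorator(cls):
            self.plugins[name] = cls()
            return cls
        return decorator

    def start_all(self):
        for plugin in self.plugins.values():
            plugin.on_start()  # lifecycle: start hook

registry = PluginRegistry()

@registry.register("hello")
class HelloPlugin:
    started = False
    def on_start(self):
        self.started = True
```

Sandboxing and dynamic route registration sit on top of this: each registered plugin would execute in an isolated context and contribute its own API routes.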
Databases
PostgreSQL
Purpose: Primary persistent storage
Stores:
- Users and authentication data
- API keys and permissions
- Chatbot and agent configurations
- RAG collections and document metadata
- Conversations and messages
- Budgets and usage tracking
- Audit logs
Redis
Purpose: Caching and session management
Caches:
- API key authentication results
- Frequently accessed configurations
- Rate limiting counters
- Session data
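The rate-limiting counters mirror the classic Redis INCR-plus-EXPIRE pattern. A pure-Python sketch of the same fixed-window idea (the class is illustrative, not Enclava's code):

```python
import time

class FixedWindowLimiter:
    """Illustrative fixed-window rate limiter, mirroring Redis INCR + EXPIRE."""

    def __init__(self, limit, window_seconds=60):
        self.limit = limit
        self.window = window_seconds
        self.counters = {}  # (key, window_index) -> request count

    def allow(self, key, now=None):
        now = time.time() if now is None else now
        window_index = int(now // self.window)  # bucket requests per window
        bucket = (key, window_index)
        count = self.counters.get(bucket, 0) + 1
        self.counters[bucket] = count
        return count <= self.limit
```

In Redis the bucket key would expire automatically; here a new window simply starts a fresh counter.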
Qdrant
Purpose: Vector database for semantic search
Stores:
- Document embeddings
- Vector representations for RAG search
- Metadata for retrieval
Features:
- Fast similarity search
- Filtering by metadata
- Collection management
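Under the hood, similarity search ranks stored embeddings by cosine similarity to the query vector. Qdrant does this at scale with indexing and filtering; the brute-force sketch below shows only the core idea:

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def top_k(query_vec, documents, k=3):
    """Rank (embedding, metadata) pairs by similarity to the query vector."""
    scored = [(cosine(query_vec, emb), meta) for emb, meta in documents]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]
```

Metadata filtering corresponds to restricting `documents` to entries matching a filter before ranking.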
External Services
PrivateMode.ai
Purpose: Confidential LLM inference
How It Works:
- Uses Trusted Execution Environments (TEE)
- Data encrypted in memory during inference
- No data retention after processing
- Privacy guarantees through hardware-level security
Benefits:
- Your prompts and data never leave secure boundaries
- LLM provider cannot access your data
- Compliant with strict privacy requirements
Data Flow
Chat Completion Request
Client Request
|
v
[API Key Authentication]
|
v
[Rate Limiting Check]
|
v
[Budget Verification]
|
v
[LLM Service]
|
v
[PrivateMode.ai - TEE]
|
v
Response (Confidential)
|
v
[Cost Calculation]
|
v
[Usage Tracking]
|
v
Client
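The flow above can be condensed into a single pipeline function. This sketch uses plain dicts for API keys, budgets, and usage, and omits rate limiting for brevity; none of it is Enclava's actual code:

```python
def handle_chat_request(request, api_keys, budgets, usage, llm):
    """Illustrative request pipeline: auth -> budget -> inference -> tracking."""
    key = request["api_key"]
    if key not in api_keys:                       # API key authentication
        raise PermissionError("invalid API key")
    user = api_keys[key]
    if usage.get(user, 0.0) >= budgets[user]:     # budget verification
        raise RuntimeError("budget exceeded")
    # Confidential inference: llm() stands in for the PrivateMode.ai call
    # and returns both the reply and its computed cost.
    reply, cost = llm(request["messages"])
    usage[user] = usage.get(user, 0.0) + cost     # usage tracking
    return reply
```

Each stage fails fast, so an unauthorized or over-budget request never reaches the LLM.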
RAG-Enhanced Chat
Client Request
|
v
[Query RAG Collection]
|
v
[Qdrant Vector Search]
|
v
[Retrieve Top-K Results]
|
v
[Inject Context into Prompt]
|
v
[Send to LLM with Context]
|
v
[Generate Response with Citations]
|
v
Client
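The context-injection step can be sketched as building the message list sent to the LLM, with retrieved chunks numbered so the model can cite them. The prompt wording is illustrative:

```python
def build_rag_prompt(question, retrieved_chunks):
    """Illustrative context injection: prepend retrieved chunks to the prompt."""
    context = "\n\n".join(
        f"[{i + 1}] {chunk}" for i, chunk in enumerate(retrieved_chunks)
    )
    system = (
        "Answer using only the context below. "
        "Cite sources by their bracketed numbers.\n\n" + context
    )
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": question},
    ]
```

The bracketed indices are what lets the response stage map citations back to source documents.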
Agent Execution with Tools
Client Request
|
v
[LLM Planning]
|
v
[Select Tool] ---------> [Tool Execution Service]
| |
v v
[Execute Tool] [Run Tool (RAG/Web/Code)]
| |
v v
[Get Results] [Tool Output]
| |
v |
[Send Result to LLM] <-------------+
|
v
[More Tools Needed?] ---Yes---> [Repeat]
|
No
|
v
[Final Response]
|
v
Client
API Architecture
Enclava provides two API layers:
Internal API (/api-internal/v1)
Purpose: Frontend and admin operations
Authentication: JWT tokens (session-based)
Features:
- User management
- Chatbot and agent configuration
- RAG document management
- Budget and analytics
- Admin settings
Public API (/api/v1)
Purpose: External client applications
Authentication: API keys
Features:
- OpenAI-compatible endpoints
- Chatbot chat interface
- Agent interaction
- RAG search
- Tool execution
OpenAI-Compatible Endpoints
- POST /api/v1/chat/completions - Drop-in replacement for OpenAI chat completions
- GET /api/v1/models - List available models
- POST /api/v1/embeddings - Generate embeddings
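Because the endpoints are OpenAI-compatible, a client only needs to point its base URL at Enclava. The stdlib sketch below builds (but does not send) such a request; the host name and model name are placeholders:

```python
import json
import urllib.request

def build_chat_request(base_url, api_key, model, messages):
    """Construct a request to the OpenAI-compatible chat completions endpoint."""
    payload = {"model": model, "messages": messages}
    return urllib.request.Request(
        url=f"{base_url}/chat/completions",
        data=json.dumps(payload).encode(),
        headers={
            "Authorization": f"Bearer {api_key}",  # Enclava API key
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

An OpenAI SDK client would do the same thing via its base_url setting, with no other code changes.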
Security Layers
- Network Security - CORS, SSL/TLS, rate limiting
- Authentication - JWT for frontend, API keys for clients
- Authorization - Role-based access control, permissions
- Data Privacy - PrivateMode.ai TEE for all LLM inference
- Audit Logging - All actions logged for compliance
- Input Validation - Request validation and sanitization
Scalability Considerations
Horizontal Scaling
- Backend: Scale with Docker Swarm or Kubernetes
- Database: PostgreSQL read replicas, connection pooling
- Cache: Redis clustering for distributed caching
- Vector DB: Qdrant clustering for large document sets
Performance Optimization
- Caching: Redis for frequently accessed data
- Connection Pooling: Database connection reuse
- Async Processing: Non-blocking I/O throughout
- Batch Operations: Bulk embedding generation for RAG
Integration Points
Enclava can integrate with:
- OpenAI Clients - Just change the base_url
- LangChain - Use ChatOpenAI with a custom base URL
- LlamaIndex - Configure with a custom endpoint
- Custom Tools - Via MCP (Model Context Protocol)
- Webhooks - For notifications and automation
Next Steps
Understanding architecture helps with:
- Deployment - Production setup
- Confidential Computing - Privacy features
- API Reference - Complete endpoint documentation