Platform Architecture
Understanding Enclava's architecture helps you make informed decisions about deployment, scaling, and integration.
High-Level Architecture
+---------------------------------------------------------------+
| ENCLAVA PLATFORM |
+---------------------------------------------------------------+
| |
| +----------------+ +------------------+ |
| | Frontend |<------------>| Backend | |
| | (Next.js) | | (FastAPI) | |
| +----------------+ +--------+---------+ |
| | |
| v |
| +------------------------------------+ |
| | CORE SERVICES | |
| +------------------------------------+ |
| | - LLM Service | |
| | - RAG Service | |
| | - Agent Service | |
| | - Tool Calling Service | |
| | - Plugin System | |
| +------------------+-----------------+ |
| | |
| +---------------+---------------+ |
| | | |
| v v |
| +-----------------+ +-----------------+ |
| | PostgreSQL | | Redis | |
| | (Primary DB) | | (Cache) | |
| +--------+--------+ +--------+--------+ |
| | | |
| v v |
| +-----------------+ +-----------------+ |
| | Qdrant | | PrivateMode.ai | |
| | (Vector DB) | | (LLM Proxy) | |
| +--------+--------+ +--------+--------+ |
| | | |
+----------------|------------------------------+----------------+
| |
Document Storage Confidential Inference
Component Overview
Frontend (Next.js)
Purpose: User interface for managing AI resources
Features:
- Create and manage chatbots, agents, and RAG collections
- View analytics and usage metrics
- Configure budgets and API keys
- Upload and manage documents
Technology: React 18, TypeScript, Tailwind CSS, Radix UI
Backend (FastAPI)
Purpose: API server and core business logic
Features:
- RESTful API with OpenAI-compatible endpoints
- JWT authentication for frontend, API key auth for clients
- Modular architecture with pluggable services
- Real-time capabilities with WebSocket support
Technology: Python 3.11, SQLAlchemy async, Pydantic
Core Services
LLM Service
Purpose: Abstracted LLM inference layer
Capabilities:
- Multiple provider support (PrivateMode.ai and others)
- Circuit breaker and retry patterns
- Streaming responses
- Token counting and cost calculation
Key Feature: Routes through PrivateMode.ai for confidential computing
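The circuit breaker and retry behavior can be sketched roughly as follows. This is an illustrative model, not Enclava's actual implementation; the class name and thresholds are hypothetical:

```python
import time

class CircuitBreaker:
    """Illustrative circuit breaker: opens after repeated provider failures."""

    def __init__(self, failure_threshold=3, reset_timeout=30.0):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.failures = 0
        self.opened_at = None  # None means the circuit is closed

    def call(self, fn, *args, retries=2, **kwargs):
        # If the circuit is open, only allow a trial call after the cooldown.
        if self.opened_at is not None:
            if time.monotonic() - self.opened_at < self.reset_timeout:
                raise RuntimeError("circuit open: provider temporarily disabled")
            self.opened_at = None  # half-open: permit one trial call

        last_error = None
        for attempt in range(retries + 1):
            try:
                result = fn(*args, **kwargs)
                self.failures = 0  # success resets the failure count
                return result
            except Exception as exc:
                last_error = exc
                self.failures += 1
                if self.failures >= self.failure_threshold:
                    self.opened_at = time.monotonic()  # trip the breaker
                    break
        raise RuntimeError("provider call failed") from last_error
```

The breaker stops hammering a failing provider: after enough consecutive failures, calls fail fast until the reset timeout elapses.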
RAG Service
Purpose: Document processing and semantic search
Pipeline:
- Document Parsing - Extract text from PDF, DOCX, TXT, MD, JSON
- Chunking - Split documents into manageable pieces
- Embedding Generation - Convert text to vector embeddings
- Vector Storage - Store in Qdrant for fast retrieval
- Semantic Search - Find relevant documents using similarity search
Technology: Qdrant vector database, embedding models
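The chunking step of the pipeline can be sketched as a fixed-size splitter with overlap, so that sentences straddling a chunk boundary appear in both neighbors. The sizes below are illustrative defaults, not Enclava's configuration:

```python
def chunk_text(text, chunk_size=500, overlap=50):
    """Split text into overlapping fixed-size chunks (illustrative sketch)."""
    if chunk_size <= overlap:
        raise ValueError("chunk_size must exceed overlap")
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        start += chunk_size - overlap  # step forward, keeping some overlap
    return chunks
```

Each chunk is then embedded and stored in Qdrant, where the overlap helps the retriever surface context that would otherwise be cut in half.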
Agent Service
Purpose: AI agents with tool calling capabilities
Workflow:
- Planning - LLM decides which tools to use
- Tool Selection - Choose appropriate tools from available set
- Execution - Run tools with provided parameters
- Iteration - Repeat until task is complete
- Synthesis - Combine results into final answer
Built-in Tools:
- RAG Search - Query your document knowledge base
- Web Search - Real-time information via Brave Search
- Code Execution - Run Python code for data analysis
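The plan/execute/iterate workflow above can be sketched as a loop in which the LLM picks an action each turn and tool results are fed back into the conversation. The planner protocol here (an `action` key with a `final` sentinel) is a hypothetical simplification:

```python
def run_agent(llm_plan, tools, task, max_steps=5):
    """Illustrative agent loop: plan -> tool call -> feed result back -> repeat."""
    history = [task]
    for _ in range(max_steps):
        step = llm_plan(history)          # Planning: LLM decides the next action
        if step["action"] == "final":
            return step["answer"]         # Synthesis: task is complete
        tool = tools[step["action"]]      # Tool Selection from the available set
        result = tool(**step["args"])     # Execution with provided parameters
        history.append(result)            # Iteration: result goes back to the LLM
    raise RuntimeError("agent did not finish within max_steps")
```

The `max_steps` cap is the usual guard against an agent that keeps requesting tools without converging.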
Tool Calling Service
Purpose: Execute and manage tool calls
Capabilities:
- Direct tool execution (without agents)
- Tool result caching
- Error handling and retries
- Execution timeout management
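Result caching and timeout management might look roughly like this sketch, which keys the cache on the tool name plus its parameters and bounds each call with a timeout (class and defaults are illustrative):

```python
import concurrent.futures
import json

class ToolRunner:
    """Illustrative tool runner with result caching and a per-call timeout."""

    def __init__(self, timeout=10.0):
        self.timeout = timeout
        self.cache = {}

    def run(self, tool, **params):
        # Cache key: tool name plus parameters, serialized deterministically.
        key = (tool.__name__, json.dumps(params, sort_keys=True))
        if key in self.cache:
            return self.cache[key]
        with concurrent.futures.ThreadPoolExecutor(max_workers=1) as pool:
            future = pool.submit(tool, **params)
            result = future.result(timeout=self.timeout)  # raises on timeout
        self.cache[key] = result
        return result
```

Caching matters most for idempotent tools like RAG search, where an agent may ask the same question twice within one run.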
Plugin System
Purpose: Extensible architecture for custom tools
Features:
- Auto-discovery of plugins
- Sandboxed execution for security
- Dynamic route registration
- Plugin lifecycle management
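A minimal registry pattern conveys the idea behind discovery and lifecycle management; in a real system the registry would import plugin modules from a directory before starting them. All names here are hypothetical:

```python
class PluginRegistry:
    """Illustrative plugin registry with a start lifecycle hook."""

    def __init__(self):
        self.plugins = {}

    def register(self, name):
        # Decorator used by plugin modules; auto-discovery would simply
        # import every module in a plugins/ directory so these run.
        def decorator(cls):
            self.plugins[name] = cls()
            return cls
        return decorator

    def start_all(self):
        for plugin in self.plugins.values():
            plugin.on_start()  # lifecycle: start hook

registry = PluginRegistry()

@registry.register("hello")
class HelloPlugin:
    started = False
    def on_start(self):
        self.started = True
```

Sandboxing and dynamic route registration sit on top of this: each registered plugin would execute in an isolated context and contribute its own API routes.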
Databases
PostgreSQL
Purpose: Primary persistent storage
Stores:
- Users and authentication data
- API keys and permissions
- Chatbot and agent configurations
- RAG collections and document metadata
- Conversations and messages
- Budgets and usage tracking
- Audit logs
Redis
Purpose: Caching and session management
Caches:
- API key authentication results
- Frequently accessed configurations
- Rate limiting counters
- Session data
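The rate-limiting counters mirror the classic Redis INCR-plus-EXPIRE pattern. A pure-Python sketch of the same fixed-window idea (the class is illustrative, not Enclava's code):

```python
import time

class FixedWindowLimiter:
    """Illustrative fixed-window rate limiter, mirroring Redis INCR + EXPIRE."""

    def __init__(self, limit, window_seconds=60):
        self.limit = limit
        self.window = window_seconds
        self.counters = {}  # (key, window_index) -> request count

    def allow(self, key, now=None):
        now = time.time() if now is None else now
        window_index = int(now // self.window)  # bucket requests per window
        bucket = (key, window_index)
        count = self.counters.get(bucket, 0) + 1
        self.counters[bucket] = count
        return count <= self.limit
```

In Redis the bucket key would expire automatically; here a new window simply starts a fresh counter.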
Qdrant
Purpose: Vector database for semantic search
Stores:
- Document embeddings
- Vector representations for RAG search
- Metadata for retrieval
Features:
- Fast similarity search
- Filtering by metadata
- Collection management
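Under the hood, similarity search ranks stored embeddings by cosine similarity to the query vector. Qdrant does this at scale with indexing and filtering; the brute-force sketch below shows only the core idea:

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def top_k(query_vec, documents, k=3):
    """Rank (embedding, metadata) pairs by similarity to the query vector."""
    scored = [(cosine(query_vec, emb), meta) for emb, meta in documents]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]
```

Metadata filtering corresponds to restricting `documents` to entries matching a filter before ranking.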
External Services
PrivateMode.ai
Purpose: Confidential LLM inference
How It Works:
- Uses Trusted Execution Environments (TEE)
- Data encrypted in memory during inference
- No data retention after processing
- Privacy guarantees through hardware-level security
Benefits:
- Your prompts and data never leave secure boundaries
- LLM provider cannot access your data
- Compliant with strict privacy requirements
Data Flow
Chat Completion Request
Client Request
|
v
[API Key Authentication]
|
v
[Rate Limiting Check]
|
v
[Budget Verification]
|
v
[LLM Service]
|
v
[PrivateMode.ai - TEE]
|
v
Response (Confidential)
|
v
[Cost Calculation]
|
v
[Usage Tracking]
|
v
Client
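The flow above can be condensed into a single pipeline function. This sketch uses plain dicts for API keys, budgets, and usage, and omits rate limiting for brevity; none of it is Enclava's actual code:

```python
def handle_chat_request(request, api_keys, budgets, usage, llm):
    """Illustrative request pipeline: auth -> budget -> inference -> tracking."""
    key = request["api_key"]
    if key not in api_keys:                       # API key authentication
        raise PermissionError("invalid API key")
    user = api_keys[key]
    if usage.get(user, 0.0) >= budgets[user]:     # budget verification
        raise RuntimeError("budget exceeded")
    # Confidential inference: llm() stands in for the PrivateMode.ai call
    # and returns both the reply and its computed cost.
    reply, cost = llm(request["messages"])
    usage[user] = usage.get(user, 0.0) + cost     # usage tracking
    return reply
```

Each stage fails fast, so an unauthorized or over-budget request never reaches the LLM.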
RAG-Enhanced Chat
Client Request
|
v
[Query RAG Collection]
|
v
[Qdrant Vector Search]
|
v
[Retrieve Top-K Results]
|
v
[Inject Context into Prompt]
|
v
[Send to LLM with Context]
|
v
[Generate Response with Citations]
|
v
Client
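The context-injection step can be sketched as building the message list sent to the LLM, with retrieved chunks numbered so the model can cite them. The prompt wording is illustrative:

```python
def build_rag_prompt(question, retrieved_chunks):
    """Illustrative context injection: prepend retrieved chunks to the prompt."""
    context = "\n\n".join(
        f"[{i + 1}] {chunk}" for i, chunk in enumerate(retrieved_chunks)
    )
    system = (
        "Answer using only the context below. "
        "Cite sources by their bracketed numbers.\n\n" + context
    )
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": question},
    ]
```

The bracketed indices are what lets the response stage map citations back to source documents.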
Agent Execution with Tools
Client Request
|
v
[LLM Planning]
|
v
[Select Tool] ---------> [Tool Execution Service]
| |
v v
[Execute Tool] [Run Tool (RAG/Web/Code)]
| |
v v
[Get Results] [Tool Output]
| |
v |
[Send Result to LLM] <-------------+
|
v
[More Tools Needed?] ---Yes---> [Repeat]
|
No
|
v
[Final Response]
|
v
Client
API Architecture
Enclava provides two API layers:
Internal API (/api-internal/v1)
Purpose: Frontend and admin operations
Authentication: JWT tokens (session-based)
Features:
- User management
- Chatbot and agent configuration
- RAG document management
- Budget and analytics
- Admin settings
Public API (/api/v1)
Purpose: External client applications
Authentication: API keys
Features:
- OpenAI-compatible endpoints
- Chatbot chat interface
- Agent interaction
- RAG search
- Tool execution
OpenAI-Compatible Endpoints
- POST /api/v1/chat/completions - Drop-in replacement for OpenAI chat completions
- GET /api/v1/models - List available models
- POST /api/v1/embeddings - Generate embeddings
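Because the endpoints are OpenAI-compatible, a client only needs to point its base URL at Enclava. The stdlib sketch below builds (but does not send) such a request; the host name and model name are placeholders:

```python
import json
import urllib.request

def build_chat_request(base_url, api_key, model, messages):
    """Construct a request to the OpenAI-compatible chat completions endpoint."""
    payload = {"model": model, "messages": messages}
    return urllib.request.Request(
        url=f"{base_url}/chat/completions",
        data=json.dumps(payload).encode(),
        headers={
            "Authorization": f"Bearer {api_key}",  # Enclava API key
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

An OpenAI SDK client would do the same thing via its base_url setting, with no other code changes.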
Security Layers
- Network Security - CORS, SSL/TLS, rate limiting
- Authentication - JWT for frontend, API keys for clients
- Authorization - Role-based access control, permissions
- Data Privacy - PrivateMode.ai TEE for all LLM inference
- Audit Logging - All actions logged for compliance
- Input Validation - Request validation and sanitization
Scalability Considerations
Horizontal Scaling
- Backend: Scale with Docker Swarm or Kubernetes
- Database: PostgreSQL read replicas, connection pooling
- Cache: Redis clustering for distributed caching
- Vector DB: Qdrant clustering for large document sets
Performance Optimization
- Caching: Redis for frequently accessed data
- Connection Pooling: Database connection reuse
- Async Processing: Non-blocking I/O throughout
- Batch Operations: Bulk embedding generation for RAG
Integration Points
Enclava can integrate with:
- OpenAI Clients - Just change the base_url
- LangChain - Use ChatOpenAI with a custom base URL
- LlamaIndex - Configure with a custom endpoint
- Custom Tools - Via MCP (Model Context Protocol)
- Webhooks - For notifications and automation
Next Steps
Understanding architecture helps with:
- Deployment - Production setup
- Confidential Computing - Privacy features
- API Reference - Complete endpoint documentation