# Cortex: Local-First Intelligence Engine
Cortex is a privacy-sovereign desktop application engineered to convert chaotic local file systems into a structured, semantic knowledge base. It functions as an on-device "Second Brain," capable of ingesting, understanding, and retrieving information from documents and images without a single byte leaving the machine.
By leveraging Edge AI, Cortex eliminates cloud dependence, ensuring absolute data privacy while delivering low-latency multimodal retrieval.
## System Architecture
The system orchestrates a hybrid architecture, bridging a modern web-tech frontend with a high-performance Python inference engine.
- Architecture: Hybrid Electron (Node.js) + Python (FastAPI)
- Inference Strategy: Local LLM/VLM via Ollama
- Indexing: Vector Embeddings (ChromaDB)
- Privacy Level: Air-gapped / Local-only
## Core Capabilities
Cortex transcends traditional keyword search by implementing a full Multimodal RAG (Retrieval-Augmented Generation) pipeline:
| Feature | Description | Technology |
|---|---|---|
| Visual Semantics | "Looks" at images to generate searchable captions (e.g., invoices, charts) | LLaVA (Vision-Language Model) |
| Vector Search | Maps queries to file content based on meaning, not just filenames | Hugging Face Embeddings |
| Cross-Lingual | Seamlessly maps Thai natural language queries to English content | Internal Translation Layer |
| Edge RAG | Synthesizes answers from retrieved context purely on-device | Mistral / Llama via Ollama |
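The vector-search idea in the table above can be sketched with a toy cosine-similarity index. This is illustrative only: Cortex uses ChromaDB with Hugging Face embeddings, whereas the three-dimensional vectors and file names here are hand-made stand-ins.

```python
from math import sqrt

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = sqrt(sum(x * x for x in a))
    nb = sqrt(sum(x * x for x in b))
    return dot / (na * nb)

# Hand-made stand-in embeddings; a real embedding model maps text to
# high-dimensional vectors that encode meaning rather than spelling.
index = {
    "invoice_march.pdf": [0.9, 0.1, 0.0],
    "holiday_photo.jpg": [0.1, 0.8, 0.3],
    "meeting_notes.txt": [0.2, 0.2, 0.9],
}

def search(query_vec: list[float]) -> str:
    """Return the indexed file whose embedding is closest to the query."""
    return max(index, key=lambda name: cosine(query_vec, index[name]))
```

A query vector close to the "invoice" region, e.g. `search([0.85, 0.15, 0.05])`, resolves to `invoice_march.pdf` even though the query never mentions the filename, which is the behavior the Vector Search row describes.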
## Technology Stack
| Component | Technology | Role |
|---|---|---|
| Frontend | Electron + Next.js | "Organic Bento Glass" UI for a modern desktop experience. |
| Backend | Python (FastAPI) | Handles file ingestion, PDF extraction, and API orchestration. |
| Vector DB | ChromaDB | High-performance local vector store for embeddings. |
| Inference | Ollama | Manages local model quantization and execution. |
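A minimal sketch of how the FastAPI backend might assemble a grounded RAG prompt and call the local model through Ollama's HTTP API. The endpoint and payload shape follow Ollama's documented `/api/generate` route; the helper names and prompt wording are hypothetical, not Cortex's actual implementation.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local port

def build_rag_prompt(context_chunks: list[str], question: str) -> str:
    """Fold retrieved file snippets into a context-grounded prompt."""
    context = "\n---\n".join(context_chunks)
    return (
        "Answer using ONLY the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )

def ask_local_llm(context_chunks: list[str], question: str) -> str:
    """Send the prompt to the local Ollama server (requires `ollama serve`)."""
    payload = {
        "model": "mistral",
        "prompt": build_rag_prompt(context_chunks, question),
        "stream": False,  # return one JSON object instead of a token stream
    }
    req = urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]
```

Because the request goes to `localhost:11434`, the retrieved context and the generated answer never leave the machine, which is the property the Privacy Level row promises.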
## Implementation
Deploying Cortex requires the Ollama runtime for model orchestration.
### 1. Model Provisioning
Initialize the required local models (Vision and Text):
```shell
ollama pull mistral   # The Reasoning Brain
ollama pull llava     # The Visual Cortex
```

### 2. Ignition
Start the inference backend and the client interface:
```shell
# Backend (The Brain)
cd backend
pip install -r requirements.txt
uvicorn main:app --reload
```
```shell
# Frontend (The Interface)
cd frontend
npm install
npm run dev
```

## Operational Logic
The Ingestion Pipeline:
- Scan: The system traverses target directories for `.pdf`, `.txt`, `.png`, and `.jpg` files.
- Vision Decoding: Images are passed through LLaVA to generate dense descriptive captions (e.g., "A screenshot of a K-Bank transaction slip").
- Vectorization: Text and captions are converted into 768-dimensional vectors using paraphrase-multilingual embedding models.
- Retrieval: User queries (e.g., "หาสลิปเงินที่โอนเมื่อวาน", "find the slip for yesterday's transfer") are translated, vectorized, and matched against the local ChromaDB index to retrieve the matching file context.
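The scan step above can be sketched as a recursive directory walk that partitions files into text-like and image-like inputs. The extension set comes from the pipeline description; the function name and return shape are illustrative.

```python
from pathlib import Path

TEXT_EXTS = {".pdf", ".txt"}
IMAGE_EXTS = {".png", ".jpg"}

def scan(root: str) -> tuple[list[Path], list[Path]]:
    """Walk `root` and split supported files into (text_docs, images).

    Text files go straight to the embedder; images are routed through
    the vision model for captioning before being embedded.
    """
    texts, images = [], []
    for path in sorted(Path(root).rglob("*")):
        ext = path.suffix.lower()
        if ext in TEXT_EXTS:
            texts.append(path)
        elif ext in IMAGE_EXTS:
            images.append(path)
    return texts, images
```

Unsupported formats (audio, video, binaries) are simply skipped, so the index only ever contains files the downstream stages know how to process.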
## Design Philosophy
Privacy Sovereignty: In 2026, data ownership is paramount. Cortex is built on the principle that your personal data—financial records, journals, project files—should never be processed on a third-party server.
Solving the Semantic Gap: Most local search tools fail at images. Cortex bridges this gap by automatically converting visual data into semantic text, making your screenshots just as searchable as your documents.
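The image-to-text bridge described above can be sketched as a captioning call to LLaVA through the same local Ollama API. The `images` field (a list of base64-encoded files) follows Ollama's documented generate payload; the helper names and prompt are hypothetical.

```python
import base64
import json
import urllib.request

def build_caption_payload(image_bytes: bytes) -> dict:
    """Build the request body for a LLaVA captioning call via Ollama."""
    return {
        "model": "llava",
        "prompt": "Describe this image in one dense, searchable sentence.",
        "images": [base64.b64encode(image_bytes).decode()],  # Ollama takes base64 images
        "stream": False,
    }

def caption_image(path: str) -> str:
    """Send one image to the local Ollama server (requires `ollama serve`)."""
    with open(path, "rb") as f:
        payload = build_caption_payload(f.read())
    req = urllib.request.Request(
        "http://localhost:11434/api/generate",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]
```

The returned caption is what gets embedded and stored in ChromaDB, so a screenshot is retrieved by the same vector search that finds a text document.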