The Sovereign Modular AI Stack

Kompile projects crawl your data and immediately compile knowledge.
Three pillars. One platform. Models, knowledge, and applications — compiled.

Real-time crawl in the Kompile RAG Console — live pipeline stages, graph extraction, embedding, and activity log

THE STARTING POINT

Kompile Projects

A project is the self-contained unit at the center of everything Kompile does. It crawls your data, compiles it into knowledge, and gives your AI a structured world to reason over.

1. Init

kompile init scaffolds a project with config, directory structure, default pipelines, and model assignments — ready to crawl in seconds.

2. Crawl & Compile

Point the project at your sources (Confluence, Jira, Slack, local files, databases, web) and Kompile automatically crawls, chunks, embeds, and indexes everything into structured ontologies and knowledge graphs.

3. Act

Chat, query, or build agents against the compiled knowledge. Every project carries its own scoped config, model assignments, vector indexes, and knowledge graph — run multiple projects on the same machine.

kompile init, kompile crawl, kompile chat: from zero to answering questions against your compiled knowledge in three commands.

COMPILE MODELS

Optimize. Train. Deploy. On Your Hardware.

Download models from anywhere, compile them into optimized execution graphs, fine-tune on your proprietary data, and serve them locally — cutting inference costs while keeping full control.

GPU

Multi-GPU Automatic Scheduling

Kompile automatically routes workloads across your GPUs. Per-service device routing lets you pin embeddings, LLM inference, and vision models to specific devices, while the resource-aware scheduler handles memory reservation, priority preemption, and admission control.

Device Routing

Route embedding, LLM, VLM encoder, VLM decoder, ingest, and vector population workloads to specific CUDA devices. Auto-route vision models to the largest available GPU.

Dynamic Batching

Continuous batching with per-model priority queues, configurable batch sizes, and max queue delay — maximize throughput without sacrificing latency.
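A minimal sketch of the queueing policy described above, assuming a single model queue. The `Request` and `Batcher` names and the `max_batch_size` / `max_queue_delay` parameters are hypothetical and are not Kompile's API; the sketch covers only the flush-when-full-or-overdue rule, not in-flight continuous batching.

```python
import heapq
from dataclasses import dataclass, field


@dataclass(order=True)
class Request:
    priority: int                        # lower value is served first
    arrival: float                       # seconds; breaks ties in FIFO order
    payload: str = field(compare=False)  # not used for ordering


class Batcher:
    """Illustrative batcher: flush when full or when the oldest request is overdue."""

    def __init__(self, max_batch_size: int, max_queue_delay: float):
        self.max_batch_size = max_batch_size
        self.max_queue_delay = max_queue_delay
        self._heap: list[Request] = []

    def submit(self, req: Request) -> None:
        heapq.heappush(self._heap, req)

    def next_batch(self, now: float) -> list[Request]:
        if not self._heap:
            return []
        oldest = min(r.arrival for r in self._heap)
        full = len(self._heap) >= self.max_batch_size
        overdue = now - oldest >= self.max_queue_delay
        if not (full or overdue):
            return []  # keep waiting so the batch can grow
        n = min(self.max_batch_size, len(self._heap))
        return [heapq.heappop(self._heap) for _ in range(n)]
```

The delay bound caps how long a lone request can wait for company, which is the throughput-versus-latency trade the two settings control.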

Memory Management

Reservation-based GPU memory pools with admission control, concurrent load limits, and KV cache management with prefix indexing and priority eviction.
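Reservation-based admission control can be sketched in a few lines. The `GpuMemoryPool` class below is a hypothetical illustration, not Kompile's implementation: a model is admitted only if its full reservation fits in the remaining budget.

```python
class GpuMemoryPool:
    """Illustrative pool: models reserve memory up front or are refused."""

    def __init__(self, capacity_mb: int):
        self.capacity_mb = capacity_mb
        self.reserved: dict[str, int] = {}  # model name -> reserved MB

    @property
    def free_mb(self) -> int:
        return self.capacity_mb - sum(self.reserved.values())

    def try_reserve(self, model: str, size_mb: int) -> bool:
        # Admission control: reject duplicates and anything that would overcommit.
        if model in self.reserved or size_mb > self.free_mb:
            return False
        self.reserved[model] = size_mb
        return True

    def release(self, model: str) -> None:
        self.reserved.pop(model, None)
```

Refusing a load up front is cheaper than discovering an out-of-memory condition halfway through loading weights.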

RUNS ON YOUR HARDWARE

NVIDIA, AMD, Intel

optimization

Graph Optimizations

Raw models are compiled through a multi-pass optimization pipeline that eliminates waste, fuses operations, and targets your specific hardware.

Cleanup & Simplification

Dead code elimination, constant folding, identity removal, and algebraic simplifications (add-zero, multiply-one, subtract-self, divide-one) strip unnecessary computation.
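To make the simplification rules concrete, here is a toy pass over expressions written as nested tuples. The representation and the `simplify` function are assumptions for illustration only; a real graph compiler works on a typed IR and must also respect floating-point edge cases such as NaN when rewriting subtract-self.

```python
import operator

OPS = {"add": operator.add, "sub": operator.sub,
       "mul": operator.mul, "div": operator.truediv}


def is_num(v):
    return isinstance(v, (int, float))


def simplify(node):
    """Constant-fold and apply identities to ('op', lhs, rhs) tuples, bottom-up."""
    if not isinstance(node, tuple):
        return node  # a variable name or a literal
    op, a, b = node
    a, b = simplify(a), simplify(b)
    # Constant folding: both operands known at compile time.
    if is_num(a) and is_num(b) and not (op == "div" and b == 0):
        return OPS[op](a, b)
    if op == "add":
        if is_num(a) and a == 0:
            return b
        if is_num(b) and b == 0:
            return a
    if op == "mul":
        if is_num(a) and a == 1:
            return b
        if is_num(b) and b == 1:
            return a
    if op == "sub" and a == b:
        return 0  # subtract-self
    if op == "div" and is_num(b) and b == 1:
        return a  # divide-one
    return (op, a, b)
```

For example, `simplify(("add", ("mul", "x", 1), ("sub", "y", "y")))` reduces to `"x"`.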

Attention & Activation Fusion

Fuse Q*K*V attention patterns with causal masking, merge matmul+add into single ops, and collapse Sigmoid*Mul and RMSNorm patterns into fused SwiGLU and RMSNorm ops.
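The matmul+add case is the simplest fusion to illustrate. The sketch below assumes a graph given as a topologically ordered list of `(output, op_type, inputs)` tuples and two-input `add` ops; that format and the `matmul_add` op name are hypothetical. The matmul is folded into the add only when nothing else reads its result.

```python
from collections import Counter


def fuse_matmul_add(ops):
    """Fuse matmul -> add pairs when the matmul output has exactly one consumer."""
    uses = Counter(name for _, _, inputs in ops for name in inputs)
    producers = {out: (op, inputs) for out, op, inputs in ops}
    fused, dropped = [], set()
    for out, op, inputs in ops:
        if op == "add":
            for pos, name in enumerate(inputs):
                prod = producers.get(name)
                if prod and prod[0] == "matmul" and uses[name] == 1:
                    bias = inputs[1 - pos]  # the other operand of the add
                    fused.append((out, "matmul_add", (*prod[1], bias)))
                    dropped.add(name)       # the standalone matmul is now dead
                    break
            else:
                fused.append((out, op, inputs))
        else:
            fused.append((out, op, inputs))
    return [o for o in fused if o[0] not in dropped]
```

The single-consumer check matters: if another op still reads the matmul result, removing it would break the graph.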

Hardware Targeting

cuDNN kernel selection, Triton GPU compilation with warp and stage tuning, and automatic quantization to INT8, FP16, and BFloat16.

Performance Profiles

Choose from presets — Debug, Balanced, Max Performance, LLM Optimal — or compose your own pass pipeline for full control over the compilation.

training

Training & Fine-Tuning

Customize models on your own data without fragmented external tooling. Every method runs natively inside Kompile.

PEFT / Adapters

LoRA, QLoRA, AdaLoRA, DyLoRA, DoRA, IA3, Prompt Tuning, and Prefix Tuning — with native weight merging when you're ready to ship.
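Weight merging for LoRA follows the standard formula W' = W + (alpha / r) * B A. The pure-Python sketch below uses nested lists so it runs without dependencies; the function names are illustrative and say nothing about how Kompile implements merging.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def merge_lora(w, a, b, alpha, r):
    """Return W + (alpha / r) * B @ A, with B of shape d_out x r and A of shape r x d_in."""
    scale = alpha / r
    delta = matmul(b, a)
    return [[w_ij + scale * d_ij for w_ij, d_ij in zip(w_row, d_row)]
            for w_row, d_row in zip(w, delta)]


# Rank-1 adapter on a 2x2 identity weight.
w = [[1.0, 0.0], [0.0, 1.0]]
b = [[1.0], [2.0]]
a = [[3.0, 4.0]]
merged = merge_lora(w, a, b, alpha=2, r=1)
```

After merging, inference uses the single matrix `merged`, so the adapter adds no extra latency at serving time.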

Alignment

DPO, KTO, ORPO, PPO, and GRPO alignment methods with reward model support and streaming training logs.
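As one example of these objectives, the DPO loss for a single preference pair can be written directly from its published definition. The function below is a sketch with assumed argument names; inputs are summed log-probabilities of the chosen and rejected responses under the policy and a frozen reference model.

```python
import math


def dpo_loss(pi_chosen, pi_rejected, ref_chosen, ref_rejected, beta=0.1):
    """-log sigmoid(beta * ((pi_c - ref_c) - (pi_r - ref_r))) for one pair."""
    margin = (pi_chosen - ref_chosen) - (pi_rejected - ref_rejected)
    z = beta * margin
    # -log(sigmoid(z)) equals softplus(-z); this form avoids overflow for large |z|.
    return max(-z, 0.0) + math.log1p(math.exp(-abs(z)))
```

The loss falls below log 2 once the policy prefers the chosen response more strongly than the reference does, and rises above it in the opposite case.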

Distillation

Teacher-student distillation with logit, feature, attention, and combined modes — compress large models into production-sized versions.
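Logit-mode distillation is commonly implemented as a temperature-softened KL divergence between teacher and student distributions. The sketch below follows that common formulation; whether Kompile uses exactly this loss is an assumption, and the function names are illustrative.

```python
import math


def softmax(logits, temperature=1.0):
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def logit_distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """T^2 * KL(teacher || student) on temperature-softened distributions."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return temperature ** 2 * kl
```

The T^2 factor keeps gradient magnitudes comparable across temperatures, which is why it appears in most formulations.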

registry

Registry & Air-Gapping

A proper model registry with full lifecycle management. Import from HuggingFace, package into .karch archives, and deploy to fully air-gapped environments.

.karch Archives

Self-contained model archives with manifests and SHA-256 checksums. Export, import, publish, and download via CLI or API. Move models across air-gapped boundaries with a single file.
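The checksum idea can be sketched with the standard library. The `manifest.json` layout below is a hypothetical stand-in, not the actual .karch format: it maps each file's relative path to its SHA-256 digest, and verification reports any file that is missing or altered.

```python
import hashlib
import json
from pathlib import Path


def sha256_of(path: Path) -> str:
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):  # stream in 1 MiB chunks
            h.update(chunk)
    return h.hexdigest()


def write_manifest(root: Path) -> Path:
    files = sorted(p for p in root.rglob("*")
                   if p.is_file() and p.name != "manifest.json")
    manifest = {p.relative_to(root).as_posix(): sha256_of(p) for p in files}
    out = root / "manifest.json"
    out.write_text(json.dumps(manifest, indent=2, sort_keys=True))
    return out


def verify_manifest(root: Path) -> list[str]:
    """Return the names of files that are missing or whose digest changed."""
    manifest = json.loads((root / "manifest.json").read_text())
    return [name for name, digest in manifest.items()
            if not (root / name).is_file() or sha256_of(root / name) != digest]
```

An empty list from `verify_manifest` means every listed file matches its recorded digest, which is the check worth running after an archive crosses an air gap.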

Model Lifecycle

Full promote, replace, convert, and delete workflows. Import from ONNX, TensorFlow, Keras, and GGUF/GGML formats. Support for LLaMA, Mistral, Mixtral, Phi, Qwen, Gemma, Falcon, and more.

IMPORT FROM FRAMEWORKS YOU TRUST

TensorFlow, PyTorch, ONNX, Keras, JAX

COMPILE KNOWLEDGE

Crawl Everything. Build Graphs. Reason Instantly.

Kompile projects crawl your data sources and compile them into structured ontologies and knowledge graphs your AI can reason over — no manual curation required.

RAG

RAG Pipeline

A full retrieval-augmented generation pipeline with pluggable stages. Embed, retrieve, rerank, and generate — each step swappable independently.

Query Transformers

HyDE (hypothetical document embeddings), multi-query generation, query expansion, compression, and step-back prompting — automatically reformulate queries for better retrieval.

Contextual Enrichment

LLM-based chunk enrichment adds surrounding context to each retrieved passage before generation, reducing hallucination and improving answer quality.

Guardrails

Built-in input guards (PII detection, prompt injection, toxicity, topic filtering) and output guards (hallucination detection, relevancy scoring, format enforcement).
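A pattern-based input guard for PII might look like the sketch below. The two regular expressions and the `input_guard` function are illustrative assumptions; production PII detection needs far broader coverage than an email and a US SSN pattern.

```python
import re

# Deliberately narrow example patterns; real detectors cover many more types.
PII_PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def input_guard(text):
    """Return (is_clean, redacted_text, findings) for a user prompt."""
    findings = {}
    redacted = text
    for kind, pattern in PII_PATTERNS.items():
        matches = pattern.findall(text)
        if matches:
            findings[kind] = matches
        redacted = pattern.sub(f"[{kind.upper()}]", redacted)
    return (not findings), redacted, findings
```

A caller can either block the request when `is_clean` is false or forward the redacted text to the model.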

Evaluation Harness

Measure RAG quality with built-in evaluators, experiment tracking, eval suites, and dataset management — know when your pipeline is actually improving.

GraphRAG

Go beyond flat vector search. GraphRAG extracts entities and relationships from your documents, builds a graph, detects communities, and uses graph structure to answer questions that require reasoning across multiple sources.

Entity & Relation Extraction

Multi-agent extraction with LLM-based and pattern-based agents working in parallel to identify entities and relationships from unstructured text.

Community Detection

Louvain community detection, PageRank, betweenness centrality, and LLM-generated community summaries for hierarchical graph reasoning.
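Of the algorithms listed, PageRank is compact enough to show in full. This is the textbook power-iteration form over an adjacency-list dictionary, with rank from dangling nodes spread evenly; it is a generic sketch rather than Kompile's implementation, and Louvain is omitted for length.

```python
def pagerank(graph, damping=0.85, iterations=100):
    """graph maps each node to a list of successors; every node must be a key."""
    nodes = list(graph)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        # Rank held by nodes with no out-links is redistributed uniformly.
        dangling = sum(rank[v] for v in nodes if not graph[v])
        new = {v: (1 - damping) / n + damping * dangling / n for v in nodes}
        for v in nodes:
            for w in graph[v]:
                new[w] += damping * rank[v] / len(graph[v])
        rank = new
    return rank


ranks = pagerank({"a": ["c"], "b": ["c"], "c": ["a"], "d": []})
```

In this small graph `c` ends up with the highest rank because two nodes link to it, and the ranks sum to 1.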

Neo4j & Native Storage

Run against Neo4j for production graph queries with Cypher, or use the built-in adjacency matrix graph for embedded deployments.

Knowledge Graphs

Compile your entire data estate into typed, versioned knowledge graphs with entity resolution, schema enforcement, and graph embeddings.

Automated Construction

LLM-driven or manual graph building with concept extraction, entity resolution, and graph compaction. Named graphs, fact sheets, and schema enforcement modes keep your ontology clean.

Graph Embeddings

Native TransE and RotatE knowledge graph embedding models for link prediction and entity similarity — turn your graph into a queryable vector space.
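Both models score a (head, relation, tail) triple by how well the relation maps the head onto the tail. The sketch below gives the standard scoring functions with the margin term omitted: TransE treats the relation as a translation, RotatE as a rotation in the complex plane. Function names and the plain-list vectors are assumptions for illustration.

```python
import cmath
import math


def transe_score(h, r, t):
    """Negative Euclidean distance between h + r and t (higher is more plausible)."""
    return -math.sqrt(sum((hi + ri - ti) ** 2 for hi, ri, ti in zip(h, r, t)))


def rotate_score(h, phases, t):
    """Negative sum of per-dimension moduli of h * e^{i*phase} - t.

    h and t are complex vectors; phases are rotation angles in radians.
    """
    return -sum(abs(hi * cmath.exp(1j * p) - ti) for hi, p, ti in zip(h, phases, t))
```

Link prediction then amounts to ranking candidate tails by score for a given head and relation.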

Export & Interop

Export to CSV, JSON, JSON-LD, GraphML, Cypher, HTML, SVG, Wiki, and Obsidian vault. Merge and sync graphs across environments.

Data Crawlers

Crawl Confluence, Jira, Notion, Slack, Discord, Google Workspace, OneDrive, Reddit, email inboxes, and the web — all compiled into your graph automatically.

COMPILE APPLICATIONS

One Interface. Every Provider.

Build one application and one CLI harness against Kompile's unified interface. Swap LLM providers, vector stores, embedding models, and data sources without rewriting a single line of business logic.

LLM Providers

Every provider speaks the same interface. Switch from OpenAI to a local Ollama instance or a self-hosted vLLM server — your application code stays identical. Any OpenAI-compatible endpoint works as a drop-in backend.
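The idea of one interface with interchangeable backends can be sketched with a structural type. Everything named below (`ChatProvider`, `CannedProvider`, `UppercaseProvider`, `answer`) is hypothetical and uses local stubs instead of real provider SDKs; it shows the shape of the pattern, not Kompile's actual interface.

```python
from typing import Protocol


class ChatProvider(Protocol):
    """Hypothetical interface: any object with a matching complete() qualifies."""

    def complete(self, prompt: str) -> str: ...


class CannedProvider:
    """Stub standing in for a hosted API backend."""

    def __init__(self, reply: str):
        self.reply = reply

    def complete(self, prompt: str) -> str:
        return self.reply


class UppercaseProvider:
    """Stub standing in for a local backend."""

    def complete(self, prompt: str) -> str:
        return prompt.upper()


def answer(provider: ChatProvider, question: str) -> str:
    # Application logic depends only on the interface, never on a concrete backend.
    return provider.complete(f"Question: {question}").strip()
```

Swapping backends means passing a different object to `answer`; the function body does not change.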

API PROVIDERS

OpenAI
Anthropic
Google Gemini
Meta Llama
Ollama
Azure OpenAI

CLI AGENT BACKENDS

Claude CLI
Codex CLI
Gemini CLI
Qwen CLI

Embeddings & Vector Stores

The same retrieval code works across all embedding and storage backends. Run fully local with Anserini and SameDiff, or connect to managed services — one API for all.

EMBEDDING MODELS

OpenAI Embeddings
BGE / Arctic (ONNX)
SameDiff Native
Sentence Transformers
PostgresML

VECTOR STORES

Anserini / Lucene
PostgreSQL pgvector
Vespa
ChromaDB

Data Sources & Crawlers

Crawl your entire data estate through a unified ingest pipeline. Every source feeds into the same chunking, embedding, and indexing stages.

Confluence
Jira
Notion
Slack
Discord
Google Workspace
OneDrive
Reddit
Gmail

Orchestration & Compute Engines

Go beyond simple chains. Plug in visual workflow engines, business rule systems, or graph databases — all through the same Kompile interface.

Apache Camel
n8n Workflows
Neo4j

Agent-to-Agent Protocol (A2A)

Kompile agents can communicate directly with each other via the A2A protocol, enabling multi-agent architectures where specialized agents coordinate without centralized orchestration.

JOIN THE WAITLIST

Kompile is currently available in early access only.
Join the waitlist & unlock the full potential of the Modular AI Stack on your own infrastructure.
