10/10 task success | 91.3% precision | 78% less stale facts vs RAG

A memory stronghold
for your AI coding agent.

The first thing a new engineer needs is context. The same is true for your AI coding agent. Borg compiles it automatically from every prior session — one Postgres, no SDKs, no re-explaining.

Run one install command, type borg init in your project, and every AI coding session builds a knowledge graph that makes the next session smarter.

Open source. Apache 2.0. One local install.

Current release: single-user, no auth, localhost only. Episodes are embedded and extracted via the OpenAI API (or your Azure OpenAI endpoint) — everything else stays on your machine. Not intended for shared deployments until auth lands.
Quick start
curl -fsSL https://raw.githubusercontent.com/villanub/borgmemory/main/install.sh | sh
borg init
Borg MCP across Codex, Claude Code, and Kiro
> "What patterns do I follow when debugging auth issues?"

borg_think → classify: debug (0.92) + architecture (0.31)
           → retrieve: graph_neighborhood + episode_recall + fact_lookup
           → rank: 14 candidates → 6 selected (1,840 tokens)
           → compile: structured XML

<borg model="claude" ns="product-engineering" task="debug">
  <knowledge>
    <fact status="observed" salience="0.94">Webhook gateway verifies HMAC signatures before enqueue</fact>
    <fact status="observed" salience="0.88">Background jobs retry with exponential backoff</fact>
  </knowledge>
  <episodes>
    <episode source="claude-code" date="2026-03-01">Fixed duplicate webhook delivery during replay</episode>
    <episode source="claude-code" date="2026-02-14">Resolved OAuth scope mismatch in staging</episode>
  </episodes>
  <patterns>
    <procedure confidence="0.92">Debug auth: verify scopes, then inspect token audience and issuer</procedure>
  </patterns>
</borg>

What Borg does differently

Most memory tools store text and search by similarity. Borg extracts structured knowledge, builds a temporal graph, and compiles task-specific context packages.

Knowledge Graph Extraction

An LLM pipeline extracts entities, facts, and procedures from every conversation. Three-pass entity resolution prevents collisions. 24 canonical predicates ensure consistent relationships.

Entities are resolved by exact match → alias match → semantic similarity (0.92 threshold). Fragmentation is preferred over collision — two separate entities can be merged later.
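The three-pass cascade can be sketched as a single function. This is an illustrative reading of the docs, not Borg's actual code: the real pipeline compares embeddings, so here the similarity function is injected as a parameter so the cascade itself is clear.

```python
SEMANTIC_THRESHOLD = 0.92  # from the docs

def resolve_entity(name, entities, semantic_sim):
    """Return the id of a matching entity, or None to create a new one."""
    # Pass 1: exact canonical-name match
    for e in entities:
        if e["name"] == name:
            return e["id"]
    # Pass 2: alias match
    for e in entities:
        if name in e.get("aliases", []):
            return e["id"]
    # Pass 3: semantic similarity, only above the 0.92 threshold
    best_id, best_score = None, 0.0
    for e in entities:
        score = semantic_sim(name, e["name"])
        if score > best_score:
            best_id, best_score = e["id"], score
    if best_score >= SEMANTIC_THRESHOLD:
        return best_id
    return None  # fragmentation over collision: caller creates a new entity
```

Returning None below the threshold is the "fragmentation over collision" rule: a near-miss creates a second entity that can be merged later, rather than silently polluting an existing one.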

Temporal Facts with Supersession

Facts carry valid_from and valid_until timestamps. When a new fact contradicts an old one, the old fact is marked superseded — not deleted. The full history is always available for compliance queries.

Seven evidence statuses: user_asserted, observed, extracted, inferred, promoted, deprecated, superseded.
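Supersession-on-write can be sketched in a few lines. This is a hypothetical in-memory model (Borg stores facts in Postgres); the invariant it shows is the documented one: contradicted facts are closed out, never deleted.

```python
from datetime import datetime, timezone

def ingest_fact(facts, subject, predicate, obj, now=None):
    """Append a new fact; supersede any active fact with the same
    subject + predicate but a different object. Nothing is deleted."""
    now = now or datetime.now(timezone.utc)
    for f in facts:
        if (f["subject"] == subject and f["predicate"] == predicate
                and f["object"] != obj and f["valid_until"] is None):
            f["status"] = "superseded"
            f["valid_until"] = now  # history preserved for compliance queries
    facts.append({"subject": subject, "predicate": predicate, "object": obj,
                  "status": "observed", "valid_from": now, "valid_until": None})
    return facts
```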

🎯

Task-Specific Compilation

Dual-profile intent classification determines what kind of memory a query needs. Debug tasks get episodic + procedural memory. Architecture tasks get semantic facts. Compliance tasks exclude procedural entirely.

Memory-type weight modifiers bias ranking without hard exclusion (except procedural in compliance where weight = 0.0).
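A minimal sketch of weight-modifier ranking, with made-up weight values except the documented hard zero for procedural memory under compliance:

```python
# Illustrative weight tables -- only the compliance/procedural 0.0 is
# documented; the other numbers are assumptions for the sketch.
TYPE_WEIGHTS = {
    "debug":        {"episodic": 1.2, "procedural": 1.2, "semantic": 0.8},
    "architecture": {"episodic": 0.7, "procedural": 0.8, "semantic": 1.3},
    "compliance":   {"episodic": 1.0, "procedural": 0.0, "semantic": 1.2},
}

def weighted_relevance(task, memory_type, relevance):
    """Bias ranking by memory type without hard exclusion --
    except procedural under compliance, where the weight is 0.0."""
    return relevance * TYPE_WEIGHTS[task][memory_type]
```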

📊

Inspectable Ranking

Every candidate is scored on four dimensions: relevance, recency, stability, and provenance. Every compilation logs which items were selected, which were rejected, and why.

The audit log is the primary tool for improving retrieval quality. No opaque composite scores.

🔒

Namespace Isolation

Hard isolation by default. Every entity, fact, and episode belongs to exactly one namespace. No cross-namespace queries. Configurable token budgets per namespace.

If 'APIM' appears in two projects, it exists as two separate entity records. Restrictive by design — cross-namespace is a future feature, not an accident.
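The isolation rule amounts to scoping entity identity by (namespace, name). A toy sketch, not Borg's schema:

```python
class EntityStore:
    """Entity identity is scoped to a namespace: the same name in two
    namespaces yields two independent records."""
    def __init__(self):
        self._by_key = {}   # (namespace, name) -> entity record
        self._next_id = 1

    def get_or_create(self, namespace, name):
        key = (namespace, name)
        if key not in self._by_key:
            self._by_key[key] = {"id": self._next_id,
                                 "namespace": namespace, "name": name}
            self._next_id += 1
        return self._by_key[key]
```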

🐘

PostgreSQL Maximalism

One database, no exceptions. Graph traversal via recursive CTEs. Embeddings via pgvector. Audit via pgAudit. No external graph database, no separate vector store. Nothing gets out of sync.

15 tables + 1 function. Runs on Azure PostgreSQL Flexible Server, Supabase, Neon, or any Postgres 14+.

How it works

Two pipelines that share a database but never share runtime. Online never waits for offline.

ONLINE PIPELINE

Serves borg_think queries. Latency-sensitive. Compiles context in real time.

1. Classify intent

Dual-profile — primary + secondary task class with confidence scores. Both profiles run retrieval.

2. Retrieve candidates

Up to 3 of 4 strategies run in parallel: fact lookup, episode recall, graph neighborhood, procedure assist. Vector search when embeddings exist, recency fallback otherwise.
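Fan-out retrieval is a natural fit for asyncio.gather. The strategies below are stubs standing in for Borg's real ones; the point is concurrent execution and a flattened candidate list:

```python
import asyncio

# Stub strategies -- placeholders for Borg's real retrieval functions.
async def fact_lookup(query):
    return [("fact", query)]

async def episode_recall(query):
    return [("episode", query)]

async def graph_neighborhood(query):
    return [("graph", query)]

async def retrieve(query, strategies):
    """Run the selected retrieval strategies concurrently and
    flatten their candidate lists into one pool for ranking."""
    results = await asyncio.gather(*(s(query) for s in strategies))
    return [c for batch in results for c in batch]
```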

3. Rank and trim

4-dimension scoring (relevance × type weight, recency, stability + salience, provenance). Dedup by content. Trim to namespace token budget.
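One plausible shape for rank-and-trim, combining the documented pieces (relevance × type weight, the other dimensions, content dedup, token budget) with an assumed additive composition:

```python
def rank_and_trim(candidates, token_budget, type_weight):
    """Score candidates, dedup by content, trim to the namespace budget.
    The additive combination here is an assumption; Borg logs the real
    per-dimension breakdown in its audit trail."""
    seen, ranked = set(), []
    for c in candidates:
        if c["content"] in seen:   # dedup by content
            continue
        seen.add(c["content"])
        score = (c["relevance"] * type_weight(c["type"])
                 + c["recency"] + c["stability"] + c["salience"]
                 + c["provenance"])
        ranked.append((score, c))
    ranked.sort(key=lambda sc: sc[0], reverse=True)
    selected, used = [], 0
    for score, c in ranked:        # greedy fill up to the token budget
        if used + c["tokens"] <= token_budget:
            selected.append(c)
            used += c["tokens"]
    return selected
```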

4. Compile package

Structured XML for Claude/Copilot, compact JSON for GPT/Codex. Model assignment via parameter.

5. Update access tracking

Batch-update entity_state and fact_state for selected candidates. Feeds hot-tier promotion.

6. Audit log

Full trace: classification, profiles executed, score breakdowns, rejection reasons, latency per stage.

OFFLINE PIPELINE

Processes episodes via borg_learn. Runs asynchronously. Never blocks queries.

1. Ingest + dedup

SHA-256 content hash + source_event_id unique constraint. Duplicate episodes return existing ID.
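The dedup key can be sketched directly. A simplified in-memory stand-in for the database's unique constraint:

```python
import hashlib

def episode_key(content, source_event_id=None):
    """Dedup key: SHA-256 of the content plus the optional source_event_id
    (the column that carries a unique constraint in the schema)."""
    digest = hashlib.sha256(content.encode("utf-8")).hexdigest()
    return (digest, source_event_id)

def ingest_episode(store, content, source_event_id=None):
    """Return (episode_id, created). Duplicates return the existing id."""
    key = episode_key(content, source_event_id)
    if key in store:
        return store[key], False
    episode_id = len(store) + 1
    store[key] = episode_id
    return episode_id, True
```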

2. Generate embedding

OpenAI or Azure OpenAI text-embedding-3-small (1536-dim). Gracefully skips if not configured.

3. Extract entities

LLM extracts up to 10 entities per episode with typed taxonomy and aliases.

4. Resolve entities

Three-pass: exact match → alias match → semantic (0.92 threshold). Ambiguous matches flagged as conflicts.

5. Extract facts + validate predicates

LLM extracts up to 8 fact triples. Predicates validated against 24-predicate canonical registry. Custom predicates tracked.

6. Supersession check

Same subject + predicate + different object → old fact marked superseded with valid_until.

7. Extract procedures

LLM extracts up to 3 repeatable patterns. Existing patterns merged (observation count bumped, confidence averaged).
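"Observation count bumped, confidence averaged" reads naturally as a running average. A sketch under that assumption:

```python
def merge_procedure(existing, new_confidence):
    """Merge a newly extracted pattern into an existing procedure:
    bump the observation count and fold the new confidence into a
    running average (one plausible reading of 'confidence averaged')."""
    n = existing["observations"]
    existing["confidence"] = (existing["confidence"] * n + new_confidence) / (n + 1)
    existing["observations"] = n + 1
    return existing
```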

8. Snapshot

Every 24h, hot-tier state captured for all namespaces. Enables cold-start and drift detection.

Three MCP tools

Not five. Three tools that cover the entire interaction surface.

borg_think

Compile context for a query. Runs the full online pipeline — classify, retrieve, rank, compile. Returns structured or compact context package.

borg_think(
  query: "debug webhook delivery timeout",
  namespace: "product-engineering",
  model: "claude",
  task_hint: "debug"
)
borg_learn

Record a decision, discovery, or conversation. Stored immediately; extraction happens in the background. Returns in milliseconds.

borg_learn(
  content: "Decided to version event payloads through a schema registry...",
  source: "claude-code",
  namespace: "product-engineering"
)
borg_recall

Search memory directly without compilation. Returns raw episodes, facts, and procedures. For when you want to browse, not compile.

borg_recall(
  query: "release checklist",
  namespace: "product-engineering",
  memory_type: "semantic"
)

Try it

# In Codex, Claude Code, or Kiro with Borg connected:
> "Remember that preview environments expire after 7 days unless renewed"

  borg_learn → episode accepted, queued for extraction
  → worker: embedded, 2 entities extracted, 1 fact created

# Later, in the same client or a different one:
> "What's our preview environment policy?"

  borg_think → classify: architecture (0.87)
            → fact_lookup: "Preview environments expire after 7 days unless renewed"
            → compiled into context package (340 tokens)

# It works across clients because they all hit the same PostgreSQL database.

The stack

API Runtime

FastAPI + FastMCP 3

Streamable HTTP MCP and REST on :8080

Auth

Passthrough

No authentication — single-user, local deployment

Database

PostgreSQL 14+

Knowledge graph, pgvector embeddings, and audit trail

Extraction

OpenAI / Azure OpenAI

Supports both standard OpenAI and Azure OpenAI endpoints