agent-memory · ollama · local-first

Open-Source Agent Memory: Local-First Pattern (2026)

Multi-agent RAG + hybrid wiki search + transparency scores, all local with Ollama. The pattern works for entity-centric facts. Periodic re-verification via search keeps memory current.

5 min read

A post announced an open-source "super memory" for AI agents: multi-agent RAG, hybrid wiki search, and transparency scores, all running locally with Ollama. It addresses a real gap: AI agents have no persistent memory across sessions, and most existing memory solutions assume a cloud vector store.

The agent memory problem

Today's AI agents forget everything between sessions. Each conversation starts from zero. RAG partially solves this by retrieving stored documents, but RAG requires embeddings, a vector store, and chunking infrastructure. For local-first setups, that is a lot of moving parts.

The wiki-search hybrid pattern

Instead of a full RAG pipeline, store agent knowledge as structured wiki entries (entity, fact, last_verified timestamp). Search over entries with keyword matching for exact lookups and embeddings for semantic similarity. This gives RAG-like retrieval without the chunking complexity.
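The entry shape and lookup above can be sketched in a few lines. This is a hedged illustration: the `WikiEntry` fields mirror the post's schema, but the scoring functions are assumptions, and a bag-of-words cosine stands in for real embeddings (a local setup would swap in an Ollama embedding model):

```python
import math
from collections import Counter
from dataclasses import dataclass

@dataclass
class WikiEntry:
    entity: str
    fact: str
    last_verified: str  # ISO date, e.g. "2026-04-01"

def keyword_score(query: str, entry: WikiEntry) -> float:
    # Exact-lookup signal: fraction of query terms found in the entry text.
    terms = query.lower().split()
    text = f"{entry.entity} {entry.fact}".lower()
    return sum(t in text for t in terms) / len(terms)

def semantic_score(query: str, entry: WikiEntry) -> float:
    # Stand-in for embedding similarity: bag-of-words cosine.
    # A real setup would embed both sides with a local Ollama model.
    a = Counter(query.lower().split())
    b = Counter(f"{entry.entity} {entry.fact}".lower().split())
    dot = sum(a[t] * b[t] for t in a)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def hybrid_search(query: str, entries: list[WikiEntry],
                  alpha: float = 0.5, top_k: int = 3) -> list[WikiEntry]:
    # Blend both signals; drop entries with no match at all.
    scored = [(alpha * keyword_score(query, e)
               + (1 - alpha) * semantic_score(query, e), e) for e in entries]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [e for score, e in scored[:top_k] if score > 0]
```

Blending the two signals with a weight `alpha` lets exact entity lookups win while paraphrased queries still surface.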

Keeping memory current with live search

Python
import os, requests

key = os.environ["SCAVIO_API_KEY"]

# Agent memory entry that might be stale
memory_entry = {"entity": "Firecrawl", "fact": "Starts at $16/mo",
                "last_verified": "2026-04-01"}

# Re-verify via live search
resp = requests.post(
    "https://api.scavio.dev/api/v1/search",
    headers={"x-api-key": key},
    json={"query": f"{memory_entry['entity']} pricing 2026",
          "platform": "google", "limit": 3},
    timeout=30)
resp.raise_for_status()
snippets = [r["snippet"] for r in resp.json().get("results", [])]

# LLM diff: ask a local model whether the fact changed
prompt = (f"Stored fact: {memory_entry['fact']}. Current SERP snippets: "
          f"{snippets}. Has the fact changed? Reply as JSON with keys "
          f"'changed' and 'new_fact'.")
# result = llm.complete(prompt)  # e.g. a local Ollama model
# if result["changed"]: memory_entry["fact"] = result["new_fact"]

The transparency score

The project introduces a "transparency score" per memory entry: how recently the fact was verified, how many sources confirmed it, and how confident the agent is in the knowledge. This is a good pattern: agents that know what they do not know are more useful than agents that confidently hallucinate.
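The post does not publish the scoring formula, so here is one plausible way to combine the three signals. The weights, the 30-day half-life, and the three-source saturation point are all assumptions:

```python
from datetime import date

def transparency_score(last_verified: str, source_count: int,
                       confidence: float, today: str,
                       half_life_days: int = 30) -> float:
    # Recency decays exponentially: a fact verified half_life_days ago
    # contributes half the recency weight of one verified today.
    age_days = (date.fromisoformat(today)
                - date.fromisoformat(last_verified)).days
    recency = 0.5 ** (max(age_days, 0) / half_life_days)
    # Corroboration saturates: three independent sources is "enough".
    corroboration = min(source_count, 3) / 3
    # Hypothetical weights; the project's actual formula may differ.
    return 0.4 * recency + 0.3 * corroboration + 0.3 * confidence
```

A just-verified, well-corroborated fact scores near 1.0, while a stale single-source fact decays toward the model-confidence floor.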

Limitations of local-only memory

  • Local embeddings (Ollama) are less accurate than cloud embeddings (OpenAI, Cohere)
  • No multi-device sync without additional infrastructure
  • Memory search quality depends on the local model size
  • Facts go stale without periodic re-verification against live search
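The last point is easy to mechanize: a periodic sweep can select entries whose `last_verified` date has aged past a window and feed them to the live-search re-verification step above. A minimal sketch, where the dict shape and the 30-day default are assumptions:

```python
from datetime import date, timedelta

def stale_entries(entries: list[dict], today: str,
                  max_age_days: int = 30) -> list[dict]:
    # Return entries whose last_verified date is older than max_age_days,
    # i.e. the candidates for the next re-verification sweep.
    cutoff = date.fromisoformat(today) - timedelta(days=max_age_days)
    return [e for e in entries
            if date.fromisoformat(e["last_verified"]) < cutoff]
```

Running this on a schedule (a cron job is enough locally) keeps the re-verification cost bounded to only the entries that have actually aged out.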