agents · memory · grounding

Non-LLM Memory Servers: Stop Agent Reinterpretation

LLM-based memory systems reinterpret stored context on every recall, drifting from original meaning. Non-LLM memory with search grounding fixes this.

6 min read

LLM-based memory servers (like mem0) reinterpret your stored context through the model before returning it, which introduces subtle drift. Non-LLM memory servers (like Fidelis) store facts verbatim and return them as-is. When combined with a search MCP for current data, non-LLM memory gives you grounded recall without reinterpretation — the agent works with what you actually said, not what the LLM thinks you meant.

The reinterpretation problem

Discussions on r/ClaudeCode and r/ClaudeWorkflows about memory servers surfaced a pattern: developers store a specific fact ("our API rate limit is 100 requests/minute"), and when the agent retrieves it later, the memory system returns a paraphrased version ("the API has moderate rate limiting, approximately 100 requests per minute"). That "approximately" was not in the original. The LLM in the memory pipeline decided to be helpful and softened the precision.

This matters because agents act on retrieved memory. If the memory says "approximately 100" instead of "exactly 100", the agent might set a retry threshold at 110 instead of 99. Small reinterpretations compound into real behavioral drift over time.
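
To make the compounding concrete, here is a deliberately contrived sketch of how a softened qualifier changes a downstream decision. The parsing logic is invented for illustration; no real agent works exactly this way.

Python
# Contrived illustration: the extracted number (100) survives the paraphrase,
# but the qualifier around it flips which side of the limit the agent lands on.

def retry_threshold(memory_text: str) -> int:
    limit = 100  # the figure recalled from memory either way
    if "approximately" in memory_text:
        return int(limit * 1.1)  # hedged fact: agent assumes headroom -> 110
    return limit - 1             # exact fact: agent stays safely below -> 99

print(retry_threshold("approximately 100 requests per minute"))      # 110
print(retry_threshold("our API rate limit is 100 requests/minute"))  # 99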

How LLM-based memory works

Memory servers like mem0 use an LLM to process memories at storage time and/or retrieval time. At storage: the LLM summarizes, deduplicates, and categorizes the memory. At retrieval: the LLM selects relevant memories and may rephrase them to fit the current context. Both operations introduce reinterpretation.

The benefit of LLM-based memory is better semantic search and deduplication. The cost is that you never get back exactly what you stored. For creative and conversational applications, this trade-off is fine. For technical and factual applications, it is not.
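
For a sense of where the reinterpretation enters, here is a minimal sketch of the two-stage pipeline described above. The llm() stub stands in for whatever model the server calls; the function names are illustrative, not mem0's actual API.

Python
# Hypothetical LLM-based memory pipeline. Every llm() call is a point where
# paraphrasing can creep in.

def llm(prompt: str) -> str:
    # Stand-in for a real model call; a real server returns generated text,
    # which is where the drift originates. Identity stub for this sketch.
    return prompt

store: list[str] = []

def remember(fact: str) -> None:
    # Storage-time processing: summarize, deduplicate, categorize.
    store.append(llm(f"Summarize as a memory: {fact}"))

def recall(query: str) -> str:
    # Retrieval-time processing: select relevant memories and rephrase them.
    candidates = "\n".join(store)
    return llm(f"Answer '{query}' using these memories:\n{candidates}")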

How non-LLM memory works

Non-LLM memory servers like Fidelis store text as-is. No summarization, no paraphrasing, no LLM processing. Retrieval uses keyword matching, embedding similarity (computed without an LLM), or exact key lookup. What you store is what you get back, byte for byte.

The trade-off: semantic search is less sophisticated. If you store "API rate limit is 100/min" and later search for "request throttling", a non-LLM system might not find the match. LLM-based memory would make that connection. The mitigation is consistent key naming and explicit tagging at storage time.
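
A minimal sketch of that mitigation, assuming a plain dict-backed store with explicit keys and tags. The API here is invented for illustration, not Fidelis's actual interface:

Python
# Store facts verbatim under explicit keys and tags, so keyword lookup still
# finds them even when the query uses different vocabulary.

memory: dict[str, dict] = {}

def store_fact(key: str, text: str, tags: list[str]) -> None:
    memory[key] = {"text": text, "tags": set(tags)}

def find(term: str) -> list[str]:
    term = term.lower()
    return [
        entry["text"]
        for key, entry in memory.items()
        if term in key.lower() or term in {t.lower() for t in entry["tags"]}
    ]

store_fact(
    "api_rate_limit",
    "API rate limit is 100/min",
    tags=["rate-limit", "throttling", "api"],
)

# The "throttling" tag bridges the vocabulary gap that semantic search
# would otherwise handle; the stored text comes back untouched.
print(find("throttling"))  # ['API rate limit is 100/min']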

Combining non-LLM memory with search MCP

The architecture that solves both problems — accurate recall and current data — is non-LLM memory for stored facts plus a search MCP for live information. The memory handles what you know (project config, decisions, team preferences). The search API handles what is happening now (current pricing, documentation updates, competitor changes).

JSON
// .mcp.json - Memory + Search configuration
{
  "mcpServers": {
    "memory": {
      "command": "npx",
      "args": ["-y", "fidelis-memory-server"],
      "env": {
        "STORAGE_PATH": "./project-memory"
      }
    },
    "scavio": {
      "type": "url",
      "url": "https://mcp.scavio.dev/mcp",
      "headers": {
        "x-api-key": "YOUR_KEY"
      }
    }
  }
}

What goes in memory vs what gets searched

  • Memory: API keys and endpoints, project architecture decisions, team coding conventions, deployment procedures, ICP definitions, pricing structures you have decided on
  • Search: Current competitor pricing, latest library versions, documentation for dependencies, market data, SERP rankings, anything that changes

The rule: if the information is stable and internal, store it in memory. If it is external and could change, search for it live. Mixing these two categories is where agents go wrong — storing competitor pricing in memory means working with stale data, and searching for your own internal decisions means getting hallucinated answers.
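
If you want to enforce that rule mechanically, a routing shim can sit in front of both servers. The category sets below are made up for illustration; the real boundary is whatever your project defines as stable and internal.

Python
# Hypothetical router applying the stable-vs-volatile rule above.

STABLE_INTERNAL = {"architecture", "conventions", "deployment", "icp", "our_pricing"}
VOLATILE_EXTERNAL = {"competitor_pricing", "library_versions", "docs", "serp"}

def route(topic: str) -> str:
    if topic in STABLE_INTERNAL:
        return "memory"  # stable and internal: exact recall from the store
    if topic in VOLATILE_EXTERNAL:
        return "search"  # external and changing: always fetch live
    return "search"      # when in doubt, prefer live data over a stale copy

print(route("our_pricing"))         # memory
print(route("competitor_pricing"))  # search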

Example: grounded research workflow

Python
import requests

# Memory: stored facts (non-LLM, exact retrieval)
project_memory = {
    "our_pricing": "500 free credits/mo, $30/mo for 7K credits",
    "target_icp": "B2B SaaS, 10-200 employees, US-based",
    "competitor_list": ["serper", "tavily", "dataforseo"],
    "last_pricing_check": "2026-04-15"
}

# Search: live data (via Scavio API)
def get_current_competitor_pricing(competitor: str) -> dict:
    resp = requests.post(
        "https://api.scavio.dev/api/v1/search",
        headers={"x-api-key": "YOUR_KEY"},
        json={
            "platform": "google",
            "query": f"{competitor} API pricing 2026",
            "country": "us",
            "num_results": 5
        },
        timeout=30,
    )
    resp.raise_for_status()
    results = resp.json().get("organic_results", [])
    return {
        "competitor": competitor,
        "search_results": [
            {"title": r["title"], "snippet": r.get("snippet", "")}
            for r in results
        ]
    }

# Agent workflow:
# 1. Read ICP from memory (exact, no reinterpretation)
# 2. Read competitor list from memory (exact)
# 3. Search for current pricing of each competitor (live data)
# 4. Compare against our pricing from memory (exact)

for comp in project_memory["competitor_list"]:
    pricing_data = get_current_competitor_pricing(comp)
    print(f"\n{comp}:")
    for r in pricing_data["search_results"][:2]:
        print(f"  {r['title']}")
        print(f"  {r['snippet'][:100]}")

The drift experiment

To see reinterpretation drift in action, store the same 10 facts in an LLM-based memory and a non-LLM memory. Retrieve them after a week of the agent using both systems. Compare the retrieved text to the original. In our testing, LLM-based memory averaged 15% word-level difference from the original after multiple read/write cycles. Non-LLM memory returned the exact original every time.

A 15% word-level difference sounds small. But when the changed words are numbers, version identifiers, or boolean-like qualifiers ("always" becoming "usually"), the behavioral impact on the agent is disproportionate.
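
To run the comparison yourself, word-level difference can be measured with the standard library. This sketch shows one reasonable metric, not the exact script from our testing.

Python
# One way to quantify word-level drift between a stored fact and its
# retrieved version, using difflib from the standard library.

from difflib import SequenceMatcher

def word_drift(original: str, retrieved: str) -> float:
    a, b = original.split(), retrieved.split()
    similarity = SequenceMatcher(None, a, b).ratio()
    return 1.0 - similarity  # 0.0 = identical, 1.0 = nothing in common

stored = "our API rate limit is exactly 100 requests/minute"
recalled = "the API has moderate rate limiting, approximately 100 requests per minute"

print(f"{word_drift(stored, recalled):.0%}")  # well above 0% for a paraphrase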

When LLM-based memory is actually better

Non-LLM memory is not always the right choice. LLM-based memory excels when:

  • You store unstructured conversation history and need semantic search over it
  • Deduplication matters — the same fact gets stored in different phrasings across sessions
  • The application is conversational and approximate recall is acceptable

For coding workflows, infrastructure config, and business data, use non-LLM memory. For chatbot-style applications with free-form user input, LLM-based memory handles the messiness better.

Bottom line

Non-LLM memory stores what you said. LLM memory stores what the model thinks you said. For technical and factual agent workflows, the difference matters. Pair non-LLM memory (Fidelis or similar) with a search MCP (Scavio) to ground both stored context and live data. The agent gets exact recall of decisions and real-time access to current information, with no reinterpretation layer in between.