Local LLM Personal KB: What Actually Works
Obsidian plus Ollama plus a search API builds a private, grounded assistant. This is the pattern that works for daily knowledge base use.
A local LLM as a personal knowledge base works well when you combine Obsidian (or similar) for personal documents, Ollama for local inference, and a search API for live external data. Pure RAG on personal docs without external search produces a system that answers historical questions about your notes but cannot connect them to current context.
What people want vs what RAG delivers
The dream: ask your local AI "What should I do about the competitor pricing change I noted last week?" and get an answer that combines your notes with current market data.
What pure local RAG delivers: "Based on your note from May 8, you mentioned competitor X changed their pricing. Your note says to review their pricing page." The system retrieves your note but cannot check if the pricing has changed further since you wrote it.
The working architecture
The setup that actually works has three layers:
- Layer 1: Personal documents (Obsidian vault, markdown files, PDFs) indexed in a local vector store
- Layer 2: Local LLM (Ollama with Llama 3.3 or Mistral) for inference without sending data to the cloud
- Layer 3: Search API for current external data when the question requires it
```python
import os
from pathlib import Path

import ollama
import requests

# Layer 1: Simple local document search
def search_local_docs(query, vault_path="~/obsidian-vault"):
    """Search local markdown files by keyword matching."""
    vault = Path(vault_path).expanduser()
    results = []
    query_terms = query.lower().split()
    for md_file in vault.rglob("*.md"):
        content = md_file.read_text(errors="ignore")
        content_lower = content.lower()
        score = sum(1 for term in query_terms if term in content_lower)
        if score > 0:
            # Extract the first relevant paragraph
            for para in content.split("\n\n"):
                if any(term in para.lower() for term in query_terms):
                    results.append({
                        "file": str(md_file.name),
                        "content": para[:500],
                        "score": score,
                    })
                    break
    results.sort(key=lambda x: x["score"], reverse=True)
    return results[:5]

# Layer 2: Local LLM via Ollama
def local_llm(prompt, model="llama3.3"):
    response = ollama.chat(
        model=model,
        messages=[{"role": "user", "content": prompt}],
    )
    return response["message"]["content"]

# Layer 3: External search for current data
def web_search(query, num_results=3):
    resp = requests.post(
        "https://api.scavio.dev/api/v1/search",
        headers={"x-api-key": os.environ["SCAVIO_API_KEY"]},
        json={"query": query, "num_results": num_results},
    )
    return resp.json().get("organic_results", [])
```

Combining all three layers
```python
def personal_kb_query(question):
    """Query personal KB with optional web grounding."""
    # Check local docs first
    local_results = search_local_docs(question)
    local_context = "\n".join(
        f"[{r['file']}]: {r['content']}" for r in local_results
    )

    # Determine if web search is needed (simple keyword heuristic)
    needs_web = any(
        kw in question.lower()
        for kw in ["current", "latest", "now", "today", "2026", "price", "news"]
    )
    web_context = ""
    if needs_web:
        web_results = web_search(question)
        web_context = "\n".join(
            f"[Web: {r['title']}]: {r['snippet']}" for r in web_results
        )

    # Build prompt for local LLM
    prompt = f"""Based on the following context, answer the question.
Prefer personal notes for personal/historical context.
Prefer web results for current/external information.
If neither source has the answer, say so.
Personal Notes:
{local_context if local_context else "No relevant notes found."}
{"Web Results:" if web_context else ""}
{web_context}
Question: {question}"""
    return local_llm(prompt)

# Examples
print(personal_kb_query("What did I note about competitor X's pricing?"))
print(personal_kb_query("What is competitor X's current pricing?"))
```

What works
- Personal notes + local LLM: great for "what did I write about X?" queries
- Local LLM + search API: great for "what is happening with X now?" queries
- All three combined: great for "how does what I noted compare to what is current?" queries
- Ollama with 8B-70B models: fast enough for interactive use on recent hardware (see the interactive loop sketched after this list)
- Simple keyword search on local docs: surprisingly effective, no embedding pipeline needed
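For daily use, a thin interactive loop around `personal_kb_query` is all you need. The sketch below is a convenience wrapper, not part of the core pattern; it assumes the functions defined earlier in this article are in the same file.

```python
# Minimal interactive loop around personal_kb_query (defined earlier).
# Assumes search_local_docs, local_llm, and web_search are in scope.
def kb_repl():
    print("Personal KB ready. Empty line or Ctrl+C to exit.")
    while True:
        try:
            question = input("> ").strip()
        except (EOFError, KeyboardInterrupt):
            break
        if not question:
            break
        print(personal_kb_query(question))

if __name__ == "__main__":
    kb_repl()
```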
What fails
- Pure RAG on personal docs: answers are limited to what you wrote, cannot fill gaps
- Vector search on small vaults (under 1K documents): overkill, keyword search works fine
- Local LLM without any retrieval: hallucinates about your personal context
- Overly complex embedding pipelines: take hours to set up, marginal improvement over keyword search for personal notes
- Trying to replace all cloud AI: local models are good but not frontier-quality for complex reasoning
Hardware requirements
- Llama 3.1 8B: 8GB RAM, runs on any modern laptop. Good enough for most personal KB queries (a model-pull helper is sketched after this list)
- Llama 3.3 70B: 48GB RAM or GPU with 24GB+ VRAM. Significantly better reasoning
- Mistral 7B: lighter weight, faster responses, slightly less capable
- Storage: a vector index (if you use one) adds minimal disk usage on top of your existing Obsidian vault; the model weights Ollama downloads are the main storage cost
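If you want the model choice to be explicit in code, a small helper can map a hardware tier to an Ollama tag and pull it. The tier names and tags below (llama3.1:8b, llama3.3:70b, mistral:7b) are assumptions based on the list above; verify them against the Ollama model library for your install.

```python
import ollama

# Suggested tags per hardware tier; the tags are assumptions based on
# the list above, so check the Ollama model library for current names.
MODEL_BY_TIER = {
    "laptop": "llama3.1:8b",        # ~8GB RAM
    "workstation": "llama3.3:70b",  # ~48GB RAM or 24GB+ VRAM GPU
    "light": "mistral:7b",          # lowest latency, smallest footprint
}

def ensure_model(tier="laptop"):
    """Pull the suggested model for a hardware tier if not already present."""
    tag = MODEL_BY_TIER[tier]
    ollama.pull(tag)  # cheap if the model is already downloaded
    return tag

# Usage with the local_llm() helper above:
# model = ensure_model("laptop")
# print(local_llm("Summarize my week", model=model))
```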
Privacy advantage
The primary reason to use a local LLM for personal knowledge is privacy. Your notes, journal entries, financial records, and personal documents never leave your machine. The only data that goes to an external service is the web search query, which contains your question but not your personal documents. If the question itself is sensitive, you can skip the web search and rely on local context only.
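One way to make that explicit is a local-only query path that never touches the search API. This is a sketch, not part of the setup above; local_only_query is a hypothetical name, and it reuses search_local_docs and local_llm from the earlier examples.

```python
def local_only_query(question):
    """For sensitive questions: retrieval and inference stay on-device,
    and nothing is ever sent to the search API."""
    local_results = search_local_docs(question)
    context = "\n".join(
        f"[{r['file']}]: {r['content']}" for r in local_results
    )
    prompt = (
        "Answer only from these personal notes. If they do not contain "
        "the answer, say so.\n\n"
        f"Personal Notes:\n{context or 'No relevant notes found.'}\n\n"
        f"Question: {question}"
    )
    return local_llm(prompt)

# print(local_only_query("What did I write about my salary negotiation?"))
```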
Starting point
Install Ollama and pull a model. Point it at your notes folder. Add a search API key for web grounding. The entire setup takes under 30 minutes. Start with keyword search on local docs -- you can add vector embeddings later if keyword search proves insufficient for your vault size.
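Once the pieces are installed, a quick sanity check like the one below confirms each layer is reachable. The vault path, model name, and environment variable are the ones used in the examples above; adjust them to your setup.

```python
import os
from pathlib import Path

import ollama

def check_setup(vault_path="~/obsidian-vault", model="llama3.3"):
    """Verify the three layers: local docs, local LLM, and search API key."""
    vault = Path(vault_path).expanduser()
    md_count = sum(1 for _ in vault.rglob("*.md")) if vault.exists() else 0
    print(f"Vault found: {vault.exists()} ({md_count} markdown files)")
    print(f"Search API key set: {'SCAVIO_API_KEY' in os.environ}")
    try:
        ollama.chat(model=model, messages=[{"role": "user", "content": "ping"}])
        print(f"Ollama model '{model}' responding: True")
    except Exception as exc:
        print(f"Ollama model '{model}' responding: False ({exc})")

check_setup()
```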