Local LLM Personal KB: What Actually Works
Obsidian plus Ollama plus a search API builds a private, grounded assistant. This is the pattern that works for daily knowledge base use.
A local LLM as a personal knowledge base works well when you combine Obsidian (or similar) for personal documents, Ollama for local inference, and a search API for live external data. Pure RAG on personal docs without external search produces a system that answers historical questions about your notes but cannot connect them to current context.
What people want vs what RAG delivers
The dream: ask your local AI "What should I do about the competitor pricing change I noted last week?" and get an answer that combines your notes with current market data.
What pure local RAG delivers: "Based on your note from May 8, you mentioned competitor X changed their pricing. Your note says to review their pricing page." The system retrieves your note but cannot check if the pricing has changed further since you wrote it.
The working architecture
The setup that actually works has three layers:
- Layer 1: Personal documents (Obsidian vault, markdown files, PDFs) indexed in a local vector store
- Layer 2: Local LLM (Ollama with Llama 3.3 or Mistral) for inference without sending data to the cloud
- Layer 3: Search API for current external data when the question requires it
```python
import os
from pathlib import Path

import ollama
import requests

# Layer 1: Simple local document search
def search_local_docs(query, vault_path="~/obsidian-vault"):
    """Search local markdown files by keyword matching."""
    vault = Path(vault_path).expanduser()
    results = []
    query_terms = query.lower().split()
    for md_file in vault.rglob("*.md"):
        content = md_file.read_text(errors="ignore")
        content_lower = content.lower()
        score = sum(1 for term in query_terms if term in content_lower)
        if score > 0:
            # Extract the first relevant paragraph
            for para in content.split("\n\n"):
                if any(term in para.lower() for term in query_terms):
                    results.append({
                        "file": str(md_file.name),
                        "content": para[:500],
                        "score": score,
                    })
                    break
    results.sort(key=lambda x: x["score"], reverse=True)
    return results[:5]

# Layer 2: Local LLM via Ollama
def local_llm(prompt, model="llama3.3"):
    response = ollama.chat(
        model=model,
        messages=[{"role": "user", "content": prompt}],
    )
    return response["message"]["content"]

# Layer 3: External search for current data
def web_search(query, num_results=3):
    resp = requests.post(
        "https://api.scavio.dev/api/v1/search",
        headers={"x-api-key": os.environ["SCAVIO_API_KEY"]},
        json={"query": query, "num_results": num_results},
    )
    return resp.json().get("organic_results", [])
```

Combining all three layers
```python
def personal_kb_query(question):
    """Query personal KB with optional web grounding."""
    # Check local docs first
    local_results = search_local_docs(question)
    local_context = "\n".join(
        f"[{r['file']}]: {r['content']}" for r in local_results
    )

    # Determine if web search is needed (simple keyword heuristic)
    needs_web = any(
        kw in question.lower()
        for kw in ["current", "latest", "now", "today", "2026", "price", "news"]
    )
    web_context = ""
    if needs_web:
        web_results = web_search(question)
        web_context = "\n".join(
            f"[Web: {r['title']}]: {r['snippet']}" for r in web_results
        )

    # Build prompt for local LLM
    prompt = f"""Based on the following context, answer the question.
Prefer personal notes for personal/historical context.
Prefer web results for current/external information.
If neither source has the answer, say so.
Personal Notes:
{local_context if local_context else "No relevant notes found."}
{"Web Results:" if web_context else ""}
{web_context}
Question: {question}"""
    return local_llm(prompt)

# Examples
print(personal_kb_query("What did I note about competitor X's pricing?"))
print(personal_kb_query("What is competitor X's current pricing?"))
```

What works
- Personal notes + local LLM: great for "what did I write about X?" queries
- Local LLM + search API: great for "what is happening with X now?" queries
- All three combined: great for "how does what I noted compare to what is current?" queries
- Ollama with 8B-70B models: fast enough for interactive use on recent hardware (see the interactive loop sketched after this list)
- Simple keyword search on local docs: surprisingly effective, no embedding pipeline needed
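For daily use, a thin interactive loop around `personal_kb_query` is all you need. The sketch below is a convenience wrapper, not part of the core pattern; it assumes the functions defined earlier in this article are in the same file.

```python
# Minimal interactive loop around personal_kb_query (defined earlier).
# Assumes search_local_docs, local_llm, and web_search are in scope.
def kb_repl():
    print("Personal KB ready. Empty line or Ctrl+C to exit.")
    while True:
        try:
            question = input("> ").strip()
        except (EOFError, KeyboardInterrupt):
            break
        if not question:
            break
        print(personal_kb_query(question))

if __name__ == "__main__":
    kb_repl()
```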
What fails
- Pure RAG on personal docs: answers are limited to what you wrote, cannot fill gaps
- Vector search on small vaults (under 1K documents): overkill, keyword search works fine
- Local LLM without any retrieval: hallucinates about your personal context
- Overly complex embedding pipelines: take hours to set up, marginal improvement over keyword search for personal notes
- Trying to replace all cloud AI: local models are good but not frontier-quality for complex reasoning
Hardware requirements
- Llama 3.1 8B: 8GB RAM, runs on any modern laptop. Good enough for most personal KB queries (a model-pull helper is sketched after this list)
- Llama 3.3 70B: 48GB RAM or GPU with 24GB+ VRAM. Significantly better reasoning
- Mistral 7B: lighter weight, faster responses, slightly less capable
- Storage: a vector index (if you use one) adds minimal disk usage on top of your existing Obsidian vault; the model weights Ollama downloads are the main storage cost
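If you want the model choice to be explicit in code, a small helper can map a hardware tier to an Ollama tag and pull it. The tier names and tags below (llama3.1:8b, llama3.3:70b, mistral:7b) are assumptions based on the list above; verify them against the Ollama model library for your install.

```python
import ollama

# Suggested tags per hardware tier; the tags are assumptions based on
# the list above, so check the Ollama model library for current names.
MODEL_BY_TIER = {
    "laptop": "llama3.1:8b",        # ~8GB RAM
    "workstation": "llama3.3:70b",  # ~48GB RAM or 24GB+ VRAM GPU
    "light": "mistral:7b",          # lowest latency, smallest footprint
}

def ensure_model(tier="laptop"):
    """Pull the suggested model for a hardware tier if not already present."""
    tag = MODEL_BY_TIER[tier]
    ollama.pull(tag)  # cheap if the model is already downloaded
    return tag

# Usage with the local_llm() helper above:
# model = ensure_model("laptop")
# print(local_llm("Summarize my week", model=model))
```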
Privacy advantage
The primary reason to use a local LLM for personal knowledge is privacy. Your notes, journal entries, financial records, and personal documents never leave your machine. The only data that goes to an external service is the web search query, which contains your question but not your personal documents. If the question itself is sensitive, you can skip the web search and rely on local context only.
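One way to make that explicit is a local-only query path that never touches the search API. This is a sketch, not part of the setup above; local_only_query is a hypothetical name, and it reuses search_local_docs and local_llm from the earlier examples.

```python
def local_only_query(question):
    """For sensitive questions: retrieval and inference stay on-device,
    and nothing is ever sent to the search API."""
    local_results = search_local_docs(question)
    context = "\n".join(
        f"[{r['file']}]: {r['content']}" for r in local_results
    )
    prompt = (
        "Answer only from these personal notes. If they do not contain "
        "the answer, say so.\n\n"
        f"Personal Notes:\n{context or 'No relevant notes found.'}\n\n"
        f"Question: {question}"
    )
    return local_llm(prompt)

# print(local_only_query("What did I write about my salary negotiation?"))
```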
Starting point
Install Ollama and pull a model. Point it at your notes folder. Add a search API key for web grounding. The entire setup takes under 30 minutes. Start with keyword search on local docs -- you can add vector embeddings later if keyword search proves insufficient for your vault size.
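Once the pieces are installed, a quick sanity check like the one below confirms each layer is reachable. The vault path, model name, and environment variable are the ones used in the examples above; adjust them to your setup.

```python
import os
from pathlib import Path

import ollama

def check_setup(vault_path="~/obsidian-vault", model="llama3.3"):
    """Verify the three layers: local docs, local LLM, and search API key."""
    vault = Path(vault_path).expanduser()
    md_count = sum(1 for _ in vault.rglob("*.md")) if vault.exists() else 0
    print(f"Vault found: {vault.exists()} ({md_count} markdown files)")
    print(f"Search API key set: {'SCAVIO_API_KEY' in os.environ}")
    try:
        ollama.chat(model=model, messages=[{"role": "user", "content": "ping"}])
        print(f"Ollama model '{model}' responding: True")
    except Exception as exc:
        print(f"Ollama model '{model}' responding: False ({exc})")

check_setup()
```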