Grounding LLMs in Code Repo Context
Naive RAG on code hallucinates. Four grounding strategies that work: structural indexing, call graph traversal, git blame, external docs.
An r/LanguageTechnology thread on "grounding LLM workflows in repo understanding" captured a real 2026 problem: LLMs hallucinate API details constantly, and the fix is not a better model but grounding. This post is the practical implementation.
What Grounding Actually Means
Grounding means the LLM's answer is traceable back to a concrete source in the context window. A grounded answer includes citations ("per file x.py line 42, the function returns None") and refuses when the source is absent. An ungrounded answer sounds confident regardless of whether the source exists.
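That contract can be checked mechanically. A minimal sketch of a citation validator, assuming an illustrative citation format ("per file x.py line 42" or "x.py:42") and a refusal phrase; neither is a standard, both are stand-ins for whatever your system prompt mandates:

```python
import re

# Hypothetical citation formats: "per file <path> line <n>" or "<path>:<n>".
CITATION = re.compile(r'(?:per file |\b)([\w./-]+\.py)(?::| line )(\d+)')

def is_grounded(answer: str, context_files: dict[str, int]) -> bool:
    """True if the answer cites a file/line present in the context window.

    context_files maps each file path in the context to its line count.
    """
    for path, line in CITATION.findall(answer):
        if path in context_files and int(line) <= context_files[path]:
            return True
    # No valid citation: only an explicit refusal counts as grounded.
    return "don't have that information" in answer.lower()
```

A validator like this runs after generation and rejects confident-sounding answers that cite nothing in the context.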
The Problem with Naive RAG
Most teams ship a vector-search RAG, push code chunks in, and call it grounded. It is not. Vector search retrieves semantically similar chunks, which for code is often wrong. The function you asked about might live in a different file than the one with the similar embedding. The LLM then composes an answer based on the wrong chunk and reports it as grounded. This is the worst of both worlds.
Four Grounding Strategies That Work
- Structural indexing: index by function/class name, not by chunk similarity. When the user asks about calculateTax, retrieve every occurrence of that symbol.
- Call graph traversal: given a function, also retrieve its callers and callees. The LLM gets the full execution context.
- Git blame integration: surface the commit that introduced the code. The commit message often explains the why.
- External docs grounding: when the code uses a library, retrieve the library's current docs via Scavio SERP with the site:docs.* operator.
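Of the four, call graph traversal is the least obvious to implement. A minimal sketch using Python's stdlib ast module rather than tree-sitter, so it runs standalone; it only resolves direct calls to plain names, and the function names are illustrative:

```python
import ast
from collections import defaultdict

def build_call_graph(source: str):
    """Map each top-level function to the names it calls, plus the reverse."""
    tree = ast.parse(source)
    callees = defaultdict(set)
    for fn in [n for n in tree.body if isinstance(n, ast.FunctionDef)]:
        for node in ast.walk(fn):
            # Only direct calls to bare names; attribute calls need more work.
            if isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
                callees[fn.name].add(node.func.id)
    callers = defaultdict(set)
    for fn, called in callees.items():
        for c in called:
            callers[c].add(fn)
    return callees, callers

def execution_context(symbol: str, callees, callers) -> set[str]:
    # Retrieve the symbol plus its direct callers and callees.
    return {symbol} | callees.get(symbol, set()) | callers.get(symbol, set())
```

The set returned by execution_context is what you feed the retriever: the function itself, everything it calls, and everything that calls it.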
Implementing Structural Indexing
Tree-sitter parses source files into an AST. Index every top-level function and class name with its file path and line range.
```python
import os

import psycopg2
import tree_sitter_python as tspython
from tree_sitter import Language, Parser

parser = Parser(Language(tspython.language()))

def index_file(path: str):
    with open(path) as f:
        src = f.read().encode()
    tree = parser.parse(src)
    conn = psycopg2.connect(os.environ['DATABASE_URL'])
    try:
        for node in tree.root_node.children:
            if node.type in ('function_definition', 'class_definition'):
                name = node.child_by_field_name('name').text.decode()
                # Tree-sitter points are zero-based (row, column) tuples.
                start, end = node.start_point[0], node.end_point[0]
                body = src[node.start_byte:node.end_byte].decode()
                with conn.cursor() as c:
                    c.execute("""
                        INSERT INTO symbols(name, file, start_line, end_line, body)
                        VALUES (%s, %s, %s, %s, %s)
                    """, (name, path, start, end, body))
        conn.commit()
    finally:
        conn.close()
```

Combining Internal and External Grounding
A code agent that only sees your repo is missing half the context. The other half is the library documentation. Scavio plus the site operator on docs domains pulls in the third-party side.
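ground_answer below leans on a lookup_symbol helper from the structural index. One possible shape, shown here against sqlite3 so it runs standalone (with the Postgres table from the previous section you would pass a psycopg2 connection and use %s placeholders; the helper itself is not in the original):

```python
import sqlite3

def lookup_symbol(conn, symbol: str) -> list[dict]:
    """Fetch every indexed occurrence of a symbol, ready for citation."""
    rows = conn.execute(
        "SELECT file, start_line, end_line, body FROM symbols WHERE name = ?",
        (symbol,),
    ).fetchall()
    return [
        {'file': f, 'start_line': s, 'end_line': e, 'body': b}
        for f, s, e, b in rows
    ]
```

Returning the file and line range alongside the body is what makes citation possible: the LLM can quote "per file tax.py lines 10-25" instead of asserting from memory.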
```python
import os

import requests

API_KEY = os.environ['SCAVIO_API_KEY']

def ground_answer(symbol: str, library_domain: str | None = None):
    internal = lookup_symbol(symbol)  # from the structural index
    external = []
    if library_domain:
        r = requests.post(
            'https://api.scavio.dev/api/v1/search',
            headers={'x-api-key': API_KEY},
            json={'query': f'site:{library_domain} {symbol}'},
            timeout=10,
        )
        r.raise_for_status()
        external = r.json().get('organic_results', [])[:3]
    return {'internal': internal, 'external': external}
```

The Refusal Rule
A grounded agent must refuse when it has nothing to cite. The system prompt is explicit:
SYSTEM PROMPT:
You are a code assistant. For every factual claim:
1. Cite a specific file and line range from the context, or
2. Cite a specific external documentation URL from the context, or
3. Say "I don't have that information in my context" and stop.
Never invent function names, parameter types, or return values.

How to Evaluate Grounding Quality
Build a test set of 50 questions where the correct answer is unambiguously in one specific file. Score the agent on: (a) does the answer cite that file? (b) is the line range correct? (c) if removed from context, does the agent refuse?
A well-grounded agent scores above 80% on (a) and (b) and 100% on (c). Most naive RAG implementations land around 40% on (a) and well below 100% on (c), which is why they feel "confident but wrong".
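The three checks can be scored with a small harness. A sketch; the per-question result shape and field names are assumed conventions, not from the original:

```python
def score_grounding(results: list[dict]) -> dict:
    """Score a grounding eval run.

    Each result dict holds: cited_file, cited_range, expected_file,
    expected_range, refused_when_absent. Ranges are (start, end) tuples.
    """
    n = len(results)
    # (a) does the answer cite the right file?
    file_ok = sum(r['cited_file'] == r['expected_file'] for r in results)
    # (b) is the line range correct (only counts if the file was right)?
    range_ok = sum(
        r['cited_file'] == r['expected_file']
        and r['cited_range'] == r['expected_range']
        for r in results
    )
    # (c) did the agent refuse when the source was removed from context?
    refusal_ok = sum(r['refused_when_absent'] for r in results)
    return {
        'file_citation': file_ok / n,
        'line_range': range_ok / n,
        'refusal': refusal_ok / n,
    }
```

Run it after each retrieval change; a regression in the refusal score is the earliest warning that the agent has started bluffing.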
The Honest Limitation
Grounding raises the floor on hallucination but does not eliminate it. An LLM can still misread a correctly retrieved chunk. Pair grounded retrieval with unit tests on the agent's output (does the code it suggests actually compile and pass tests?) for the highest reliability tier.
The full implementation walkthrough is in the how-to-ground-llm-with-github-repo-data tutorial.