Tutorial

How to Ground an LLM with GitHub Repo Data

Ground LLM answers in actual repo content by combining GitHub search via SERP site operators with Scavio's fetch endpoint.

Grounding LLM answers in source code beats hallucinated explanations. This tutorial uses Scavio's SERP with site:github.com plus its fetch endpoint to bring repo content into the agent loop without a heavy GitHub API integration.

Prerequisites

  • Python 3.10+
  • A Scavio API key
  • An LLM API key

Walkthrough

Step 1: Search inside a repo via SERP

A search scoped with site:github.com/ORG/REPO finds the right file fast.

Python
import requests, os
API_KEY = os.environ['SCAVIO_API_KEY']

def repo_search(repo, query):
    # Scope the SERP query to a single repo with a site: operator.
    r = requests.post('https://api.scavio.dev/api/v1/search',
        headers={'x-api-key': API_KEY},
        json={'query': f'site:github.com/{repo} {query}', 'num_results': 10},
        timeout=30)
    r.raise_for_status()
    return r.json().get('organic_results', [])
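SERP results for a site:-scoped query can include issues, pull requests, and directory pages as well as files. A small filter helps before fetching; this is a sketch that assumes each result dict carries a 'link' key, as repo_search returns above.

```python
# Sketch: keep only results that point at file pages ('/blob/' paths),
# since site: queries also surface issues, PRs, and directory listings.
# Assumes each SERP result is a dict with a 'link' key.
def file_hits(results):
    return [h for h in results if '/blob/' in h.get('link', '')]
```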

Step 2: Fetch the selected file

GitHub raw URLs work with Scavio's fetch endpoint.

Python
def fetch_raw(url):
    # Rewrite the GitHub web URL to its raw-content equivalent.
    raw = url.replace('github.com', 'raw.githubusercontent.com').replace('/blob/', '/')
    r = requests.post('https://api.scavio.dev/api/v1/extract',
        headers={'x-api-key': API_KEY},
        json={'url': raw},
        timeout=30)
    r.raise_for_status()
    return r.json().get('content', '')
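The URL rewrite inside fetch_raw can be factored into a helper. The sketch below performs the same '/blob/' substitution with a guard for non-file URLs; the mapping only holds for GitHub file pages, so anything else (issues, tree views) is rejected.

```python
# Sketch: convert a GitHub file URL to its raw-content equivalent.
# https://github.com/ORG/REPO/blob/BRANCH/path
#   -> https://raw.githubusercontent.com/ORG/REPO/BRANCH/path
# Only '/blob/' URLs map cleanly; other GitHub pages have no raw form.
def to_raw_url(url):
    if '/blob/' not in url:
        raise ValueError(f'not a GitHub file URL: {url}')
    return url.replace('github.com', 'raw.githubusercontent.com').replace('/blob/', '/', 1)
```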

Step 3: Ground the answer

Pass the fetched content into the LLM prompt along with its source URL so the model can cite it.

Python
import anthropic
client = anthropic.Anthropic()

def grounded_answer(repo, question):
    hits = repo_search(repo, question)
    content = fetch_raw(hits[0]['link']) if hits else ''
    msg = client.messages.create(
        model='claude-sonnet-4-6',
        max_tokens=1024,
        messages=[{'role': 'user', 'content': f'{question}\n\nCONTEXT:\n{content[:4000]}'}])
    return msg.content[0].text
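One way to make citation more reliable is to put the source URL into the prompt itself, rather than only the file content. The helper below is a sketch; MAX_CHARS and the instruction wording are assumptions to adapt to your model and context budget.

```python
# Sketch: build a grounded prompt that embeds the source URL alongside the
# content, so the model can cite the exact file it was grounded on.
# MAX_CHARS is an assumed budget; tune it to your model's context window.
MAX_CHARS = 4000

def build_prompt(question, content, source_url):
    return (f'{question}\n\n'
            f'Answer using only the CONTEXT below and cite SOURCE in your answer.\n'
            f'SOURCE: {source_url}\n'
            f'CONTEXT:\n{content[:MAX_CHARS]}')
```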

Step 4: Add multi-file composition

Pull the top three results (SERP order serves as a first-pass relevance ranking) and concatenate truncated excerpts into one context block.

Python
def multi_file_context(repo, question):
    hits = repo_search(repo, question)[:3]
    return '\n\n'.join([fetch_raw(h['link'])[:2000] for h in hits])
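SERP order is a reasonable first pass, but you can rerank hits before fetching. The sketch below scores results by keyword overlap between the question and the result's title and snippet (both assumed keys of the SERP result dict); an embedding-based reranker would be stronger.

```python
# Sketch: rerank SERP hits by keyword overlap with the question.
# Assumes results may carry 'title' and 'snippet' keys; missing keys score 0.
def rerank(hits, question):
    words = set(question.lower().split())
    def score(h):
        text = f"{h.get('title', '')} {h.get('snippet', '')}".lower()
        return sum(1 for w in words if w in text)
    # Stable sort: ties keep their original SERP order.
    return sorted(hits, key=score, reverse=True)
```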

Step 5: Validate citations

Ensure the LLM response mentions at least one source URL.

Python
def has_citations(answer, urls):
    return any(u in answer for u in urls)
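A stricter check also catches invented citations: extract every github.com URL that appears in the answer and verify each one against the sources actually fetched. This is a sketch using a simple regex; the trailing-punctuation strip is a heuristic for URLs at sentence end.

```python
import re

# Sketch: pull github.com URLs out of the answer and require that at least
# one citation exists and that every cited URL was actually fetched.
def cited_urls(answer):
    return re.findall(r'https://github\.com/\S+', answer)

def citations_valid(answer, allowed):
    found = cited_urls(answer)
    # rstrip is a heuristic for sentence-final punctuation stuck to the URL.
    return bool(found) and all(u.rstrip('.,)') in allowed for u in found)
```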

Python Example

Python
import os, requests
API_KEY = os.environ['SCAVIO_API_KEY']

def repo_grounded(repo, question):
    r = requests.post('https://api.scavio.dev/api/v1/search',
        headers={'x-api-key': API_KEY},
        json={'query': f'site:github.com/{repo} {question}'})
    return r.json().get('organic_results', [])[:3]

print(repo_grounded('prisma/prisma', 'migrate.ts'))

JavaScript Example

JavaScript
const API_KEY = process.env.SCAVIO_API_KEY;
export async function repoGrounded(repo, question) {
  const r = await fetch('https://api.scavio.dev/api/v1/search', {
    method: 'POST',
    headers: { 'x-api-key': API_KEY, 'Content-Type': 'application/json' },
    body: JSON.stringify({ query: `site:github.com/${repo} ${question}` })
  });
  return ((await r.json()).organic_results || []).slice(0, 3);
}

Expected Output

LLM answers cite exact files and code paths in the target repo. Hallucination rates drop materially versus ungrounded answers.

Frequently Asked Questions

How long does this tutorial take?

Most developers complete it in 15 to 30 minutes. You will need a Scavio API key (the free tier works) and a working Python or JavaScript environment.

What are the prerequisites?

Python 3.10+, a Scavio API key, and an LLM API key. A Scavio API key comes with 500 free credits per month.

Can I complete this on the free tier?

Yes. The free tier includes 500 credits per month, which is more than enough to complete this tutorial and prototype a working solution.

Does Scavio integrate with LLM frameworks?

Scavio has a native LangChain package (langchain-scavio), an MCP server, and a plain REST API that works with any HTTP client. This tutorial uses the raw REST API, but you can adapt it to your framework of choice.
