An r/AI_Agents post asked specifically about tools for a Karpathy-style LLM Wiki: search, scraping, MCPs, ingestion. This walkthrough covers the minimum stack, with costs verified online.
Prerequisites
- Python 3.10+
- Scavio API key
- Qdrant Cloud free tier or self-hosted Qdrant
- An LLM API (Claude/OpenAI/DeepSeek)
Walkthrough
Step 1: Discover sources via Scavio search
For a given topic, fetch the top web results (SERP), the top Reddit threads, and the top YouTube videos.
import os
import requests
H = {'x-api-key': os.environ['SCAVIO_API_KEY']}
def discover(topic):
    return {
        'web': requests.post('https://api.scavio.dev/api/v1/search', headers=H, json={'query': topic}).json(),
        'reddit': requests.post('https://api.scavio.dev/api/v1/reddit/search', headers=H, json={'query': topic}).json(),
        'youtube': requests.post('https://api.scavio.dev/api/v1/youtube/search', headers=H, json={'query': topic}).json(),
    }

Step 2: Extract clean markdown for top sources
Per source, /extract returns markdown ready for embedding.
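Before calling /extract you need a list of URLs from the Step 1 results. The response schema is not shown in this walkthrough, so the sketch below avoids assuming one: it walks whatever JSON comes back and collects http(s) strings stored under url-like keys. The helper names (`iter_urls`, `top_urls`), the key names `'url'` and `'link'`, and the `per_source` limit are illustrative assumptions, not part of the Scavio API.

```python
from typing import Any, Iterator

def iter_urls(node: Any, keys=('url', 'link')) -> Iterator[str]:
    """Yield http(s) strings stored under url-like keys anywhere in a JSON tree."""
    if isinstance(node, dict):
        for key, value in node.items():
            if key in keys and isinstance(value, str) and value.startswith(('http://', 'https://')):
                yield value
            else:
                yield from iter_urls(value, keys)
    elif isinstance(node, list):
        for item in node:
            yield from iter_urls(item, keys)

def top_urls(results: dict, per_source: int = 3) -> list[str]:
    """Take up to per_source new URLs from each discovery source, keeping first-seen order."""
    seen, picked = set(), []
    for payload in results.values():
        count = 0
        for url in iter_urls(payload):
            if count >= per_source:
                break
            if url in seen:
                continue  # the same page can surface in more than one source
            seen.add(url)
            picked.append(url)
            count += 1
    return picked
```

If the real responses expose a documented results field, reading that field directly is simpler and safer than a generic walk, which can also pick up thumbnail or profile links.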
def extract(url):
    return requests.post('https://api.scavio.dev/api/v1/extract',
                         headers=H, json={'url': url, 'format': 'markdown'}).json()

Step 3: Embed and store in Qdrant
Chunk markdown, embed, upsert with source URL as payload.
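The chunking step is not spelled out, so here is one minimal way to do it, assuming paragraph boundaries (blank lines) are good split points and that a character budget is an acceptable stand-in for a token budget. The function name `chunk_markdown` and the default `max_chars=1200` are illustrative choices, not values from the original post.

```python
def chunk_markdown(text: str, max_chars: int = 1200) -> list[str]:
    """Split markdown on blank lines, then pack paragraphs into chunks of at most max_chars."""
    chunks, current = [], ''
    for para in (p.strip() for p in text.split('\n\n')):
        if not para:
            continue
        # Hard-split any single paragraph that is longer than the budget.
        pieces = [para[i:i + max_chars] for i in range(0, len(para), max_chars)]
        for piece in pieces:
            candidate = f'{current}\n\n{piece}' if current else piece
            if len(candidate) <= max_chars:
                current = candidate
            else:
                chunks.append(current)
                current = piece
    if current:
        chunks.append(current)
    return chunks
```

The resulting list is what the upsert loop refers to as `chunks`. A heading-aware or token-aware splitter would likely retrieve better, at the cost of more code.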
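import uuid
from qdrant_client import QdrantClient
from qdrant_client.models import PointStruct
client = QdrantClient(url='https://your-qdrant.cloud')
# embed_fn = your embedding function (OpenAI/Cohere/Jina)
# chunks = markdown chunks from one source; source_url = that source's URL
# IDs are derived from URL + index so chunks from different sources do not overwrite each other.
points = [PointStruct(
    id=str(uuid.uuid5(uuid.NAMESPACE_URL, f'{source_url}#{i}')),
    vector=embed_fn(chunk), payload={'text': chunk, 'url': source_url})
    for i, chunk in enumerate(chunks)]
client.upsert(collection_name='wiki', points=points)

Step 4: Query with citation prompt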
LLM emits [N] markers tied to chunk source URLs.
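Nothing forces the model to cite only sources that exist, so it can be worth checking the markers before rendering. The sketch below is one way to do that, assuming citations appear exactly as `[N]` with N counted from 1; the function name `check_citations` and the shape of its return value are illustrative.

```python
import re

def check_citations(answer: str, n_sources: int) -> dict:
    """Report which [N] markers in an answer point at real sources and which do not."""
    cited = sorted({int(m) for m in re.findall(r'\[(\d+)\]', answer)})
    valid = [n for n in cited if 1 <= n <= n_sources]
    return {
        'valid': valid,                                     # markers that map to a source
        'invalid': [n for n in cited if n not in valid],    # markers with no matching source
        'unused': [n for n in range(1, n_sources + 1) if n not in valid],  # sources never cited
    }
```

A non-empty `invalid` list is a reasonable trigger for a retry or for stripping the stray marker.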
# llm = your LLM client wrapper exposing complete(prompt) (Claude/OpenAI/DeepSeek)
def answer(question, k=5):
    hits = client.search(collection_name='wiki', query_vector=embed_fn(question), limit=k)
    # Number sources from 1 so they line up with the [N] markers in the answer.
    sources = [{'i': i+1, 'text': h.payload['text'], 'url': h.payload['url']} for i, h in enumerate(hits)]
    prompt = f'Question: {question}\nSources:\n' + '\n'.join(f'[{s["i"]}] {s["url"]}: {s["text"][:300]}' for s in sources)
    prompt += '\nAnswer with [N] citations referencing sources.'
    return llm.complete(prompt), sources

Step 5: Render with clickable citations
[1] becomes a link to the source URL.
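import re
def render(answer, sources):
    urls = {str(s['i']): s['url'] for s in sources}
    # Single pass: text inserted for one citation is never rewritten by a later one.
    def link(m):
        n = m.group(1)
        return f'[[{n}]]({urls[n]})' if n in urls else m.group(0)
    return re.sub(r'\[(\d+)\]', link, answer)

Python Example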
# Cost per question: ~5 search credits + ~3 extract credits + 1 LLM call = ~$0.04-0.10

JavaScript Example
// Same flow in TS using qdrant-js + Scavio fetch calls.

Expected Output
An LLM Wiki agent that pulls from Google, Reddit, and YouTube under one Scavio key, embeds into Qdrant, and answers with clickable citations. Stack cost: Scavio $30 + Qdrant Cloud ~$25 + LLM tokens.