Build a Perplexity-Style Answer Engine in One File
How to build an open-source Perplexity clone backend in a single file using a search API and an LLM. Full Python code included.
Perplexity-style answer engines look complex from the outside, but the core loop is simple: take a question, search the web, feed the results to an LLM, return a cited answer. You can build a working backend for this in a single file using Scavio for search and any LLM for synthesis. No vector database. No crawling infrastructure. One file, under 100 lines.
How Answer Engines Work
Every answer engine follows the same pattern: retrieve, then generate. The user asks a question. The system converts it into a search query, fetches real-time results, and passes those results as context to an LLM that generates a grounded answer with citations. The quality of the answer depends almost entirely on the quality of the search results.
This is where Scavio fits. Instead of building a web crawler or dealing with Google's bot detection, you make one API call and get structured results back -- titles, snippets, URLs, knowledge graph data, and People Also Ask questions.
The Full Backend
Here's a complete answer engine backend in a single Python file using FastAPI, Scavio, and the Anthropic SDK:
```python
from fastapi import FastAPI
from pydantic import BaseModel
import requests
from anthropic import Anthropic

app = FastAPI()
llm = Anthropic()  # reads ANTHROPIC_API_KEY from the environment

SCAVIO_KEY = "your-scavio-api-key"


class Query(BaseModel):
    question: str


def search_web(query: str) -> list[dict]:
    """Fetch real-time Google results from Scavio for a query."""
    resp = requests.post(
        "https://api.scavio.dev/api/v1/search",
        headers={"x-api-key": SCAVIO_KEY},
        json={"platform": "google", "query": query, "type": "search", "mode": "full"},
        timeout=30,
    )
    resp.raise_for_status()
    data = resp.json()
    return data.get("organic_results", [])[:8]


def build_context(results: list[dict]) -> str:
    """Format search results as numbered sources the LLM can cite."""
    parts = []
    for i, r in enumerate(results, 1):
        parts.append(f"[{i}] {r.get('title', '')}\n{r.get('snippet', '')}\nURL: {r.get('link', '')}")
    return "\n\n".join(parts)


@app.post("/answer")
def answer(query: Query):
    results = search_web(query.question)
    context = build_context(results)
    msg = llm.messages.create(
        model="claude-sonnet-4-20250514",
        max_tokens=1024,
        messages=[{
            "role": "user",
            "content": f"Answer this question using the sources below. Cite sources as [1], [2], etc.\n\nQuestion: {query.question}\n\nSources:\n{context}"
        }]
    )
    return {"answer": msg.content[0].text, "sources": results}
```

What You Get From Scavio's Full Mode
Using `mode: "full"` returns more than just organic links. You also get:
- Knowledge graph entries with structured entity data
- People Also Ask questions and their answers
- News results when the query is time-sensitive
- Featured snippets that Google has already extracted
Each of these can be passed to the LLM as additional context. Knowledge graph data is especially useful for factual questions about entities -- companies, people, places.
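These extra blocks can be folded into the same context string you pass to the LLM. Here is a minimal sketch; the `knowledge_graph` and `people_also_ask` field names are assumptions about the full-mode response shape, so check them against an actual Scavio response:

```python
def build_extra_context(data: dict) -> str:
    """Fold knowledge graph and People Also Ask data into extra LLM context.

    NOTE: the 'knowledge_graph' and 'people_also_ask' field names are
    assumed response keys, not confirmed API fields.
    """
    parts = []
    kg = data.get("knowledge_graph")
    if kg:
        # Flatten string-valued entity facts into one line
        facts = "; ".join(f"{k}: {v}" for k, v in kg.items() if isinstance(v, str))
        parts.append(f"Knowledge graph: {facts}")
    for paa in data.get("people_also_ask", [])[:4]:
        parts.append(f"Related Q: {paa.get('question', '')}\nA: {paa.get('answer', '')}")
    return "\n\n".join(parts)
```

Append the result after the numbered sources in the prompt so the citation indices stay stable.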
Improving Answer Quality
The naive approach works, but there are three things that make a real difference in answer quality:
- Query rewriting -- use the LLM to convert a conversational question into an effective search query before calling Scavio
- Multi-query -- for complex questions, split into 2-3 sub-queries and merge the results before synthesis
- Snippet ranking -- sort the returned snippets by relevance to the original question before passing them to the LLM
Each of these adds a few lines of code but significantly improves the output. The multi-query approach is the highest-leverage change -- it catches information that a single query misses.
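Two of these steps are plain post-processing and need no model call. A sketch of multi-query merging (deduplicating by URL) and a naive keyword-overlap snippet ranker; both helper names are illustrative, not part of any library:

```python
def merge_results(result_sets: list[list[dict]]) -> list[dict]:
    """Merge results from several sub-queries, deduplicating by URL."""
    seen, merged = set(), []
    for results in result_sets:
        for r in results:
            url = r.get("link", "")
            if url and url not in seen:
                seen.add(url)
                merged.append(r)
    return merged


def rank_snippets(results: list[dict], question: str) -> list[dict]:
    """Sort results by keyword overlap between the question and each snippet."""
    q_words = set(question.lower().split())

    def score(r: dict) -> int:
        text = f"{r.get('title', '')} {r.get('snippet', '')}".lower()
        return sum(1 for w in q_words if w in text)

    return sorted(results, key=score, reverse=True)
```

For query rewriting, one extra `llm.messages.create` call with a prompt like "Rewrite this question as a concise web search query" before `search_web` is enough; a simple word-overlap ranker is crude, but it reliably pushes off-topic results below relevant ones.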
Running It
Save the file as `main.py`, install dependencies with `pip install fastapi uvicorn requests anthropic`, and run with `uvicorn main:app`. Hit the endpoint:
```shell
curl -X POST http://localhost:8000/answer \
  -H "Content-Type: application/json" \
  -d '{"question": "What are the best alternatives to Heroku in 2026?"}'
```

You get back a cited answer with source URLs. That's a working answer engine backend in one file. From here, add a frontend, streaming responses, and conversation history -- but the core is done.