agents · python · answer-engine

Build a Perplexity-Style Answer Engine in One File

How to build an open-source Perplexity clone backend in a single file using a search API and an LLM. Full Python code included.

10 min read

Perplexity-style answer engines look complex from the outside, but the core loop is simple: take a question, search the web, feed the results to an LLM, return a cited answer. You can build a working backend for this in a single file using Scavio for search and any LLM for synthesis. No vector database. No crawling infrastructure. One file, under 100 lines.

How Answer Engines Work

Every answer engine follows the same pattern: retrieve, then generate. The user asks a question. The system converts it into a search query, fetches real-time results, and passes those results as context to an LLM that generates a grounded answer with citations. The quality of the answer depends almost entirely on the quality of the search results.

This is where Scavio fits. Instead of building a web crawler or dealing with Google's bot detection, you make one API call and get structured results back -- titles, snippets, URLs, knowledge graph data, and People Also Ask questions.

The Full Backend

Here's a complete answer engine backend in a single Python file using FastAPI, Scavio, and the Anthropic SDK:

Python
from fastapi import FastAPI
from pydantic import BaseModel
import requests
from anthropic import Anthropic

app = FastAPI()
llm = Anthropic()  # reads ANTHROPIC_API_KEY from the environment
SCAVIO_KEY = "your-scavio-api-key"  # replace with your Scavio API key

class Query(BaseModel):
    question: str

def search_web(query: str) -> list[dict]:
    """Fetch Google results for the query via Scavio's full mode."""
    resp = requests.post(
        "https://api.scavio.dev/api/v1/search",
        headers={"x-api-key": SCAVIO_KEY},
        json={"platform": "google", "query": query, "type": "search", "mode": "full"},
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json().get("organic_results", [])[:8]

def build_context(results: list[dict]) -> str:
    """Format results as numbered sources the LLM can cite as [1], [2], etc."""
    parts = []
    for i, r in enumerate(results, 1):
        parts.append(f"[{i}] {r.get('title', '')}\n{r.get('snippet', '')}\nURL: {r.get('link', '')}")
    return "\n\n".join(parts)

@app.post("/answer")
def answer(query: Query):
    results = search_web(query.question)
    context = build_context(results)
    msg = llm.messages.create(
        model="claude-sonnet-4-20250514",
        max_tokens=1024,
        messages=[{
            "role": "user",
            "content": f"Answer this question using the sources below. Cite sources as [1], [2], etc.\n\nQuestion: {query.question}\n\nSources:\n{context}"
        }]
    )
    return {"answer": msg.content[0].text, "sources": results}

What You Get From Scavio's Full Mode

Using mode: "full" returns more than just organic links. You also get:

  • Knowledge graph entries with structured entity data
  • People Also Ask questions and their answers
  • News results when the query is time-sensitive
  • Featured snippets that Google has already extracted

Each of these can be passed to the LLM as additional context. Knowledge graph data is especially useful for factual questions about entities -- companies, people, places.
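Folding those extras into the prompt is a small pure function. This is a sketch, not Scavio's documented schema: the field names `knowledge_graph` and `people_also_ask` are assumptions based on the response types listed above, so check them against your actual payload before relying on them.

Python
```python
def build_extra_context(data: dict) -> str:
    """Format full-mode extras (knowledge graph, PAA) as additional LLM context.

    Field names here are assumed, not confirmed from Scavio's docs.
    """
    parts = []
    kg = data.get("knowledge_graph")
    if kg:
        # Flatten string-valued entity facts into one line
        facts = ", ".join(f"{k}: {v}" for k, v in kg.items() if isinstance(v, str))
        parts.append(f"Knowledge graph: {facts}")
    for paa in data.get("people_also_ask", [])[:4]:
        q, a = paa.get("question", ""), paa.get("answer", "")
        if q and a:
            parts.append(f"Q: {q}\nA: {a}")
    return "\n\n".join(parts)
```

Append the returned string to the output of build_context before building the prompt; the LLM treats it as uncited background rather than a numbered source.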

Improving Answer Quality

The naive approach works, but there are three things that make a real difference in answer quality:

  • Query rewriting -- use the LLM to convert a conversational question into an effective search query before calling Scavio
  • Multi-query -- for complex questions, split into 2-3 sub-queries and merge the results before synthesis
  • Snippet ranking -- sort the returned snippets by relevance to the original question before passing them to the LLM

Each of these adds a few lines of code but significantly improves the output. The multi-query approach is the highest-leverage change -- it catches information that a single query misses.
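The merge step of the multi-query approach can be sketched as a small helper. Here `search_web` refers to the function in the backend above, the sub-queries are hard-coded for illustration (in practice you would ask the LLM to generate them), and de-duplication keys on the same `link` field that build_context uses:

Python
```python
def merge_results(result_lists: list[list[dict]], limit: int = 8) -> list[dict]:
    """Merge results from several sub-queries, dropping duplicate URLs.

    The first occurrence of a URL wins, so order sub-queries by importance.
    """
    seen, merged = set(), []
    for results in result_lists:
        for r in results:
            url = r.get("link", "")
            if url and url not in seen:
                seen.add(url)
                merged.append(r)
    return merged[:limit]

# Illustrative usage -- sub-queries would normally come from the LLM:
# sub_queries = ["Heroku alternatives 2026", "Heroku vs Render pricing"]
# results = merge_results([search_web(q) for q in sub_queries])
```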

Running It

Save the file as main.py, install dependencies with pip install fastapi uvicorn requests anthropic, and run with uvicorn main:app. Hit the endpoint:

Bash
curl -X POST http://localhost:8000/answer \
  -H "Content-Type: application/json" \
  -d '{"question": "What are the best alternatives to Heroku in 2026?"}'

You get back a cited answer with source URLs. That's a working answer engine backend in one file. From here, add a frontend, streaming responses, and conversation history -- but the core is done.