Build a Perplexity-Style Answer Engine in One File
How to build an open-source Perplexity clone backend in a single file using a search API and an LLM. Full Python code included.
Perplexity-style answer engines look complex from the outside, but the core loop is simple: take a question, search the web, feed the results to an LLM, return a cited answer. You can build a working backend for this in a single file using Scavio for search and any LLM for synthesis. No vector database. No crawling infrastructure. One file, under 100 lines.
How Answer Engines Work
Every answer engine follows the same pattern: retrieve, then generate. The user asks a question. The system converts it into a search query, fetches real-time results, and passes those results as context to an LLM that generates a grounded answer with citations. The quality of the answer depends almost entirely on the quality of the search results.
This is where Scavio fits. Instead of building a web crawler or dealing with Google's bot detection, you make one API call and get structured results back -- titles, snippets, URLs, knowledge graph data, and People Also Ask questions.
The Full Backend
Here's a complete answer engine backend in a single Python file using FastAPI, Scavio, and the Anthropic SDK:
```python
from fastapi import FastAPI
from pydantic import BaseModel
import requests
from anthropic import Anthropic

app = FastAPI()
llm = Anthropic()  # reads ANTHROPIC_API_KEY from the environment

SCAVIO_KEY = "your-scavio-api-key"


class Query(BaseModel):
    question: str


def search_web(query: str) -> list[dict]:
    """Fetch real-time Google results from Scavio for a query."""
    resp = requests.post(
        "https://api.scavio.dev/api/v1/search",
        headers={"x-api-key": SCAVIO_KEY},
        json={"platform": "google", "query": query, "type": "search", "mode": "full"},
        timeout=30,
    )
    resp.raise_for_status()
    data = resp.json()
    return data.get("organic_results", [])[:8]


def build_context(results: list[dict]) -> str:
    """Format search results as numbered sources the LLM can cite."""
    parts = []
    for i, r in enumerate(results, 1):
        parts.append(f"[{i}] {r.get('title', '')}\n{r.get('snippet', '')}\nURL: {r.get('link', '')}")
    return "\n\n".join(parts)


@app.post("/answer")
def answer(query: Query):
    results = search_web(query.question)
    context = build_context(results)
    msg = llm.messages.create(
        model="claude-sonnet-4-20250514",
        max_tokens=1024,
        messages=[{
            "role": "user",
            "content": f"Answer this question using the sources below. Cite sources as [1], [2], etc.\n\nQuestion: {query.question}\n\nSources:\n{context}"
        }]
    )
    return {"answer": msg.content[0].text, "sources": results}
```

What You Get From Scavio's Full Mode
Using `mode: "full"` returns more than just organic links. You also get:
- Knowledge graph entries with structured entity data
- People Also Ask questions and their answers
- News results when the query is time-sensitive
- Featured snippets that Google has already extracted
Each of these can be passed to the LLM as additional context. Knowledge graph data is especially useful for factual questions about entities -- companies, people, places.
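These extra blocks can be folded into the same context string you pass to the LLM. Here is a minimal sketch; the `knowledge_graph` and `people_also_ask` field names are assumptions about the full-mode response shape, so check them against an actual Scavio response:

```python
def build_extra_context(data: dict) -> str:
    """Fold knowledge graph and People Also Ask data into extra LLM context.

    NOTE: the 'knowledge_graph' and 'people_also_ask' field names are
    assumed response keys, not confirmed API fields.
    """
    parts = []
    kg = data.get("knowledge_graph")
    if kg:
        # Flatten string-valued entity facts into one line
        facts = "; ".join(f"{k}: {v}" for k, v in kg.items() if isinstance(v, str))
        parts.append(f"Knowledge graph: {facts}")
    for paa in data.get("people_also_ask", [])[:4]:
        parts.append(f"Related Q: {paa.get('question', '')}\nA: {paa.get('answer', '')}")
    return "\n\n".join(parts)
```

Append the result after the numbered sources in the prompt so the citation indices stay stable.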
Improving Answer Quality
The naive approach works, but there are three things that make a real difference in answer quality:
- Query rewriting -- use the LLM to convert a conversational question into an effective search query before calling Scavio
- Multi-query -- for complex questions, split into 2-3 sub-queries and merge the results before synthesis
- Snippet ranking -- sort the returned snippets by relevance to the original question before passing them to the LLM
Each of these adds a few lines of code but significantly improves the output. The multi-query approach is the highest-leverage change -- it catches information that a single query misses.
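Two of these steps are plain post-processing and need no model call. A sketch of multi-query merging (deduplicating by URL) and a naive keyword-overlap snippet ranker; both helper names are illustrative, not part of any library:

```python
def merge_results(result_sets: list[list[dict]]) -> list[dict]:
    """Merge results from several sub-queries, deduplicating by URL."""
    seen, merged = set(), []
    for results in result_sets:
        for r in results:
            url = r.get("link", "")
            if url and url not in seen:
                seen.add(url)
                merged.append(r)
    return merged


def rank_snippets(results: list[dict], question: str) -> list[dict]:
    """Sort results by keyword overlap between the question and each snippet."""
    q_words = set(question.lower().split())

    def score(r: dict) -> int:
        text = f"{r.get('title', '')} {r.get('snippet', '')}".lower()
        return sum(1 for w in q_words if w in text)

    return sorted(results, key=score, reverse=True)
```

For query rewriting, one extra `llm.messages.create` call with a prompt like "Rewrite this question as a concise web search query" before `search_web` is enough; a simple word-overlap ranker is crude, but it reliably pushes off-topic results below relevant ones.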
Running It
Save the file as `main.py`, install dependencies with `pip install fastapi uvicorn requests anthropic`, and run with `uvicorn main:app`. Hit the endpoint:
```shell
curl -X POST http://localhost:8000/answer \
  -H "Content-Type: application/json" \
  -d '{"question": "What are the best alternatives to Heroku in 2026?"}'
```

You get back a cited answer with source URLs. That's a working answer engine backend in one file. From here, add a frontend, streaming responses, and conversation history -- but the core is done.