Build a Perplexity Clone with Genkit + Search API
Firebase Genkit flow that searches the web, extracts sources, and generates cited answers. Perplexity-style experience with your own search backend.
You can build a Perplexity-style search-and-synthesize experience using Google's Genkit framework with a search API as the grounding layer. The architecture has four steps: the user's query goes into a Genkit flow, the flow calls a search tool to get web results, the flow passes those results plus the original question to an LLM, and the LLM generates an answer with inline citations. Each query costs roughly $0.005 for search plus $0.01-0.03 for the LLM call, or about $0.015-0.035 in total.
Why Genkit
Genkit is Google's open-source framework for building AI-powered features in TypeScript/JavaScript. It has first-class support for tool calling, flow orchestration, streaming, and observability. If you are already in the Firebase/Google Cloud ecosystem, Genkit integrates naturally. The flow abstraction makes it easy to chain search, retrieval, and generation steps with built-in tracing.
Define the Search Tool
import { genkit, z } from "genkit";
import { googleAI, gemini20Flash } from "@genkit-ai/googleai";

// Genkit 1.x exposes defineTool, defineFlow, generate, and generateStream
// on the instance returned by genkit(), not as top-level exports.
const ai = genkit({ plugins: [googleAI()] });

const webSearch = ai.defineTool(
  {
    name: "webSearch",
    description: "Search the web for current information",
    inputSchema: z.object({
      query: z.string().describe("Search query"),
      numResults: z.number().default(5),
    }),
    outputSchema: z.object({
      results: z.array(z.object({
        title: z.string(),
        url: z.string(),
        snippet: z.string(),
      })),
    }),
  },
  async (input) => {
    const resp = await fetch("https://api.scavio.dev/api/v1/search", {
      method: "POST",
      headers: {
        "x-api-key": process.env.SCAVIO_API_KEY!,
        "Content-Type": "application/json",
      },
      body: JSON.stringify({
        query: input.query,
        num_results: input.numResults,
      }),
    });
    // Fail loudly instead of silently answering from an empty result set
    if (!resp.ok) {
      throw new Error(`Search request failed with status ${resp.status}`);
    }
    const data = await resp.json();
    // Normalize missing fields so the output always matches the schema
    return {
      results: (data.results || []).map((r: any) => ({
        title: r.title || "",
        url: r.url || "",
        snippet: r.snippet || "",
      })),
    };
  }
);

Define the Answer Flow

const searchAndAnswer = ai.defineFlow(
  {
    name: "searchAndAnswer",
    inputSchema: z.string(),
    outputSchema: z.object({
      answer: z.string(),
      sources: z.array(z.object({ title: z.string(), url: z.string() })),
    }),
  },
  async (question) => {
    // Step 1: Search for relevant results
    const searchResults = await webSearch({ query: question, numResults: 6 });

    // Step 2: Build context with numbered sources
    const context = searchResults.results
      .map((r, i) => `[${i + 1}] ${r.title}\n${r.snippet}\nSource: ${r.url}`)
      .join("\n\n");

    // Step 3: Generate answer with citations
    const { text } = await ai.generate({
      model: gemini20Flash,
      prompt: `Answer this question using the search results below.
Cite sources using [1], [2], etc. inline.
If the results do not contain the answer, say so.

Question: ${question}

Search Results:
${context}`,
    });

    // Return sources in the same order as the [n] markers in the prompt
    return {
      answer: text,
      sources: searchResults.results.map((r) => ({
        title: r.title,
        url: r.url,
      })),
    };
  }
);

Add Streaming for the Perplexity Feel

const searchAndStream = ai.defineFlow(
  {
    name: "searchAndStream",
    inputSchema: z.string(),
    outputSchema: z.string(),
    streamSchema: z.string(),
  },
  // The second argument carries sendChunk, which pushes partial output to the caller
  async (question, { sendChunk }) => {
    const searchResults = await webSearch({ query: question, numResults: 6 });
    const context = searchResults.results
      .map((r, i) => `[${i + 1}] ${r.title}\n${r.snippet}`)
      .join("\n\n");

    const { stream, response } = ai.generateStream({
      model: gemini20Flash,
      prompt: `Answer using these sources. Cite with [1], [2], etc.

Question: ${question}

Sources:
${context}`,
    });

    // Forward each text chunk as soon as the model emits it
    for await (const chunk of stream) {
      sendChunk(chunk.text);
    }

    // The flow's final return value is the complete answer
    return (await response).text;
  }
);

Cost Breakdown vs Perplexity
- Perplexity Pro: $20/mo, 300 Pro searches/day, limited API
- Perplexity API: $0.005/search + LLM cost on top
- DIY with Scavio + Gemini Flash: $0.005/search + ~$0.01/generation = $0.015/query
- 1,000 queries/mo: $15 total. You own the UX, the prompts, and the data.
- Swap Gemini for Claude or GPT without changing the search layer
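The DIY arithmetic in the list above can be sketched as a small helper, which makes it easy to re-run the numbers when prices change. The `Pricing` shape and function names are illustrative, and the prices are the article's estimates, not quoted rates.

```typescript
// Per-unit prices are the estimates from the list above (assumptions, not quotes).
interface Pricing {
  searchCostUsd: number;     // cost of one search API call
  generationCostUsd: number; // cost of one LLM generation
}

const DIY_PRICING: Pricing = { searchCostUsd: 0.005, generationCostUsd: 0.01 };

// Cost of answering one question: N searches plus one generation.
function costPerQuery(p: Pricing, searchesPerQuery: number = 1): number {
  return p.searchCostUsd * searchesPerQuery + p.generationCostUsd;
}

// Monthly bill, rounded to whole cents to hide floating-point noise.
function monthlyCost(
  p: Pricing,
  queriesPerMonth: number,
  searchesPerQuery: number = 1
): number {
  const raw = costPerQuery(p, searchesPerQuery) * queriesPerMonth;
  return Math.round(raw * 100) / 100;
}

console.log(monthlyCost(DIY_PRICING, 1000)); // 15
```

Passing a higher `generationCostUsd` (for example 0.03, the upper end of the LLM estimate) shows how sensitive the monthly total is to model choice.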
What You Lose vs Perplexity
Perplexity has its own index, so it can answer questions even when web search returns poor results. Their re-ranking is heavily optimized. They also handle follow-up questions with context carryover out of the box. Your DIY version will need explicit conversation history management and will sometimes produce worse citations because you are relying on search snippet quality rather than full-page content extraction.
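Explicit conversation history management can be as simple as keeping the last few turns and prepending them to the generation prompt. The sketch below is one possible approach, not something Genkit provides: the `Turn` type, `buildPromptWithHistory`, and the `maxTurns` cutoff are all hypothetical names chosen for illustration.

```typescript
// One completed exchange in the conversation (hypothetical shape).
interface Turn {
  question: string;
  answer: string;
}

// Build a generation prompt that includes the most recent turns so the model
// can resolve follow-ups like "and how much does it cost?".
function buildPromptWithHistory(
  pastTurns: Turn[],
  question: string,
  context: string,
  maxTurns: number = 3
): string {
  // Keep only the last maxTurns exchanges to bound prompt size
  const recent = maxTurns > 0 ? pastTurns.slice(-maxTurns) : [];
  const transcript = recent
    .map((t) => `User: ${t.question}\nAssistant: ${t.answer}`)
    .join("\n\n");

  const sections = [
    "Answer the latest question using the search results below.",
    "Cite sources using [1], [2], etc. inline.",
    transcript ? `Conversation so far:\n${transcript}` : "",
    `Question: ${question}`,
    `Search Results:\n${context}`,
  ];
  // Drop the empty history section on the first turn
  return sections.filter((s) => s.length > 0).join("\n\n");
}
```

The returned string would replace the prompt passed to the model in the answer flow. Note that this only carries context into generation; a follow-up such as "what about pricing?" is also a poor search query on its own, so the search step may need the same history to rewrite the query first.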
Improving Citation Quality
The biggest gap between a Perplexity clone and the real thing is citation accuracy. Search snippets are often truncated or misleading. Two ways to improve this: fetch full page content for the top 3 results (adds latency and cost), or use a two-pass approach where the first pass identifies the most relevant 2-3 results and the second pass generates the final answer from only those sources.
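The two-pass approach might look like the following sketch. To keep it independent of any particular model SDK, the LLM call is injected as a `generateText` function, which is a hypothetical stand-in (in practice it could wrap the same Genkit generate call used in the answer flow). The helper names and the selection prompt wording are also assumptions.

```typescript
interface SearchResult {
  title: string;
  url: string;
  snippet: string;
}

// Hypothetical stand-in for an LLM call: prompt in, text out.
type GenerateText = (prompt: string) => Promise<string>;

// Number the results so the model can refer to them as [1], [2], ...
function buildContext(results: SearchResult[]): string {
  return results
    .map((r, i) => `[${i + 1}] ${r.title}\n${r.snippet}\nSource: ${r.url}`)
    .join("\n\n");
}

// Parse a reply such as "2, 5, 1" into zero-based indices,
// ignoring out-of-range numbers and duplicates.
function parseSelection(reply: string, total: number, maxPicks: number): number[] {
  const picks: number[] = [];
  const matches = reply.match(/\d+/g) || [];
  for (const m of matches) {
    if (picks.length >= maxPicks) break;
    const idx = parseInt(m, 10) - 1;
    if (idx >= 0 && idx < total && picks.indexOf(idx) === -1) {
      picks.push(idx);
    }
  }
  return picks;
}

async function twoPassAnswer(
  question: string,
  results: SearchResult[],
  generateText: GenerateText,
  maxPicks: number = 3
): Promise<{ answer: string; sources: SearchResult[] }> {
  // Pass 1: ask the model which results are most relevant
  const selectionReply = await generateText(
    `List the numbers of up to ${maxPicks} search results most relevant to ` +
      `the question, comma-separated, and nothing else.\n\n` +
      `Question: ${question}\n\nSearch Results:\n${buildContext(results)}`
  );
  const picks = parseSelection(selectionReply, results.length, maxPicks);

  // Fall back to the top results in search order if the reply was unusable
  const selected =
    picks.length > 0 ? picks.map((i) => results[i]) : results.slice(0, maxPicks);

  // Pass 2: answer from the selected sources only, renumbered from [1]
  const answer = await generateText(
    `Answer this question using only the sources below.\n` +
      `Cite sources using [1], [2], etc. inline.\n\n` +
      `Question: ${question}\n\nSources:\n${buildContext(selected)}`
  );
  return { answer, sources: selected };
}
```

Because the second pass renumbers the sources, the returned `sources` array is in the same order as the `[n]` markers in the answer. The trade-off is one extra model call per query, which roughly doubles the generation share of the cost.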