
LangGraph Research Agent: Why It Needs Live Search

Building a LangGraph research pipeline with live web search for scope, plan, search, and synthesis stages.


A LangGraph research agent with live search follows a four-stage pipeline: scope clarification, query planning, parallel web search, and synthesis. Integrating Scavio as the search tool gives the agent structured multi-platform results instead of raw text snippets, producing higher-quality research output with fewer hallucinations.

The four-stage pipeline

Stage 1: Scope clarification. The agent rewrites the user's vague question into a specific research brief with explicit criteria.

Stage 2: Query planning. The agent decomposes the brief into 3-7 targeted search queries, each aimed at a different facet of the research question.

Stage 3: Parallel web search. All queries execute concurrently through the search tool.

Stage 4: Synthesis. The agent combines search results with its training knowledge to produce a structured report with citations.
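Stage 1 never appears in the code below, so as a rough sketch, a scope-clarification step can be as simple as a prompt builder whose output is sent to the LLM. The template wording and the function name here are illustrative, not part of any API:

```python
# Hypothetical helper for Stage 1: turn a vague question into a research
# brief the planner can decompose. The prompt wording is illustrative.
SCOPE_TEMPLATE = (
    "Rewrite this question as a specific research brief.\n"
    "State the comparison criteria, the time frame, and what a complete\n"
    "answer must include.\n\n"
    "Question: {question}"
)

def build_scope_prompt(question: str) -> str:
    """Fill the scope-clarification template; the LLM call happens elsewhere."""
    return SCOPE_TEMPLATE.format(question=question)
```

The brief that comes back from the LLM then feeds the query-planning stage instead of the raw user question.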

Define the Scavio search tool

LangGraph tools are Python functions decorated with @tool. The search tool wraps the Scavio API and returns structured results that the agent can reason over.

Python
import os

import requests
from langchain_core.tools import tool

SCAVIO_H = {'x-api-key': os.environ['SCAVIO_API_KEY']}

@tool
def web_search(query: str, platform: str = 'google') -> str:
    """Search the web for current information.

    Args:
        query: Search query string.
        platform: One of google, reddit, youtube, amazon, walmart.
    """
    resp = requests.post('https://api.scavio.dev/api/v1/search',
        headers=SCAVIO_H,
        json={'query': query, 'platform': platform},
        timeout=15)
    resp.raise_for_status()  # fail fast on HTTP errors instead of parsing an error body
    data = resp.json()
    results = data.get('organic_results', [])[:5]
    lines = []
    for r in results:
        lines.append(f"Title: {r.get('title', '')}")
        lines.append(f"URL: {r.get('link', '')}")
        lines.append(f"Snippet: {r.get('snippet', '')}")
        lines.append("---")
    return "\n".join(lines) if lines else "No results found."

Build the LangGraph agent

Python
from langgraph.prebuilt import create_react_agent
from langchain_anthropic import ChatAnthropic

llm = ChatAnthropic(model="claude-sonnet-4-20250514", temperature=0)
tools = [web_search]
agent = create_react_agent(llm, tools)

# Run a research query
result = agent.invoke({
    "messages": [{
        "role": "user",
        "content": "Compare the top 3 SERP APIs by price and coverage in 2026"
    }]
})

Adding query planning

The basic ReAct agent works but generates queries one at a time. For real research, add a planning node that generates all queries upfront, then a search node that executes them in parallel.

Python
from langgraph.graph import StateGraph
from typing import TypedDict

class ResearchState(TypedDict):
    messages: list
    queries: list[str]
    search_results: list[dict]
    report: str

def plan_queries(state: ResearchState) -> ResearchState:
    """LLM generates targeted sub-queries from the research brief."""
    # The last message may be a dict or a message object depending on how
    # the graph was invoked; handle both.
    last = state["messages"][-1]
    brief = last["content"] if isinstance(last, dict) else last.content
    prompt = f"Generate 5 specific search queries to research: {brief}"
    response = llm.invoke(prompt)
    queries = response.content.strip().split("\n")
    return {"queries": [q.strip("- ") for q in queries if q.strip()]}

def execute_searches(state: ResearchState) -> ResearchState:
    """Run all planned queries in parallel via Scavio."""
    import concurrent.futures
    results = []
    with concurrent.futures.ThreadPoolExecutor(max_workers=5) as pool:
        futures = {pool.submit(web_search.invoke, q): q
                   for q in state['queries']}
        for future in concurrent.futures.as_completed(futures):
            results.append({
                'query': futures[future],
                'results': future.result()
            })
    return {"search_results": results}

def synthesize(state: ResearchState) -> ResearchState:
    """Combine search results into a structured report."""
    context = "\n\n".join(
        f"Query: {r['query']}\n{r['results']}"
        for r in state['search_results']
    )
    prompt = f"Based on this research data, write a structured report:\n{context}"
    report = llm.invoke(prompt)
    return {"report": report.content}

graph = StateGraph(ResearchState)
graph.add_node("plan", plan_queries)
graph.add_node("search", execute_searches)
graph.add_node("synthesize", synthesize)
graph.add_edge("plan", "search")
graph.add_edge("search", "synthesize")
graph.set_entry_point("plan")
graph.set_finish_point("synthesize")  # end the run after the report node
agent = graph.compile()
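One practical wrinkle: the LLM in plan_queries often returns numbered lines ("1. ...") or near-duplicate queries, which wastes parallel search calls. A small normalization helper, sketched here (the function is ours, not part of LangGraph), can clean the list before the search node runs:

```python
import re

def dedupe_queries(queries: list[str], limit: int = 5) -> list[str]:
    """Strip list markers and drop case-insensitive duplicate queries.

    Illustrative helper; call it on the planner's output before searching.
    """
    seen, cleaned = set(), []
    for q in queries:
        # Drop leading "1.", "2)", "-", "*" markers the LLM may emit.
        q = re.sub(r"^\s*(?:\d+[.)]|[-*])\s*", "", q).strip()
        key = q.lower()
        if q and key not in seen:
            seen.add(key)
            cleaned.append(q)
    return cleaned[:limit]
```

Dropping duplicates before the ThreadPoolExecutor fans out keeps the per-task search cost predictable.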

Why structured results matter

When the search tool returns typed fields (title, URL, snippet, price, rating), synthesis can cite specific sources and build comparison tables. Raw text summaries lose attribution.
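With typed fields, part of the synthesis step can even be mechanical: flatten result dicts into table rows and hand those to the LLM instead of prose snippets. A minimal sketch, assuming field names like title, link, price, and rating (an assumption about the structured payload, not a documented schema):

```python
def to_table_rows(results: list[dict]) -> list[str]:
    """Render result dicts as pipe-delimited rows for the synthesis prompt.

    Field names (title, link, price, rating) are assumed, not a documented
    schema; missing fields render as "-".
    """
    rows = ["title | link | price | rating"]
    for r in results:
        rows.append(" | ".join(str(r.get(k, "-"))
                               for k in ("title", "link", "price", "rating")))
    return rows
```

Feeding the LLM pre-aligned rows makes it far more likely to produce a correct comparison table with working source links.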

Cost and production notes

A typical research task generates 5 sub-queries. At $0.005 per Scavio credit, that is $0.025 per task. Multi-platform queries (Google + Reddit + YouTube) cost $0.075 for 15 calls, versus roughly $0.12 on Tavily or $0.225 on SerpAPI for the same 15 calls.

In production, add retry logic with exponential backoff to the search tool and cache results for duplicate queries within the same session. The search API cost is minor compared to LLM inference, so optimize LLM calls first.
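The retry and caching advice above can be sketched as a generic backoff wrapper plus an lru_cache layer. Everything here is illustrative: the helper names are ours, and cached_search assumes the web_search tool defined earlier in this article:

```python
import time
from functools import lru_cache

def with_retries(fn, attempts: int = 3, base_delay: float = 1.0,
                 sleep=time.sleep):
    """Call fn(); on failure, retry with exponential backoff (1s, 2s, 4s...)."""
    for attempt in range(attempts):
        try:
            return fn()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of retries: surface the last error
            sleep(base_delay * 2 ** attempt)

@lru_cache(maxsize=256)
def cached_search(query: str, platform: str = "google") -> str:
    """Memoize duplicate queries for the session.

    Assumes the web_search tool from earlier; duplicate (query, platform)
    pairs cost zero extra credits.
    """
    return with_retries(lambda: web_search.invoke(
        {"query": query, "platform": platform}))
```

The injectable sleep parameter keeps the backoff testable; in the graph, point execute_searches at cached_search instead of calling the tool directly.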