
Content Research Agents: Live Data vs Summaries

Perplexity API returns pre-summarized results. Raw search data gives agents actual sources to analyze. CrewAI + search API for trending topic detection with code.


Content research agents that use Perplexity API get pre-summarized results. The model already decided what matters before your agent sees the data. Using raw search API results gives your agent the actual sources to analyze, compare, and draw its own conclusions. The difference shows up in content originality: agents working from summaries produce generic articles, while agents working from raw data produce content with unique angles.

The pre-summarized data problem

Perplexity API returns a synthesized answer with citation links. This is useful for question-answering but counterproductive for content research. When your agent receives a summary, it loses access to:

  • The full range of perspectives across sources. The summary picks a consensus view. Your content strategy might benefit from the outlier perspective that got filtered out.
  • Source authority signals. You cannot tell which sources are authoritative industry publications versus blog reposts. The summary treats them equally.
  • Gaps and contradictions between sources. When three sources disagree on a fact, that is a content opportunity. Summaries smooth over disagreements.
  • Raw data points and quotes. The specific numbers, expert quotes, and case studies that make content authoritative get compressed into generic statements.
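The structural difference is easy to see side by side. Here is a sketch of the two response shapes; field names are illustrative, not an exact schema from either provider:

```python
# A summarizing API collapses everything into one answer plus links
# (illustrative shape, not Perplexity's exact schema):
summary_response = {
    "answer": "Most experts agree that ...",  # consensus view; outliers filtered out
    "citations": ["https://a.example", "https://b.example"],
}

# A raw search API keeps each source as its own record, so the agent
# can weigh authority, spot disagreements, and pull exact quotes:
raw_response = {
    "organic_results": [
        {"title": "...", "snippet": "...", "url": "https://a.example",
         "domain": "a.example", "position": 1},
        {"title": "...", "snippet": "...", "url": "https://b.example",
         "domain": "b.example", "position": 2},
    ]
}
```

With the first shape, the agent can only restate the answer. With the second, every perspective is still individually addressable.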

Raw search data gives agents better source material

A search API returns individual results with titles, snippets, URLs, and metadata. Your agent can analyze each source independently, compare claims across sources, identify the most authoritative voices, and find the specific data points that make content unique.

Python
import requests, os

def research_topic(topic, num_sources=15):
    """Get raw search results for agent analysis."""
    resp = requests.post(
        "https://api.scavio.dev/api/v1/search",
        headers={"x-api-key": os.environ["SCAVIO_API_KEY"]},
        json={
            "query": topic,
            "num_results": num_sources
        },
        timeout=30
    )
    resp.raise_for_status()
    results = resp.json().get("organic_results", [])

    # Extract the fields the agent needs from each result
    sources = []
    for r in results:
        sources.append({
            "title": r.get("title"),
            "snippet": r.get("snippet"),
            "url": r.get("url"),
            "domain": r.get("domain"),
            "position": r.get("position")
        })

    return sources

# Agent gets 15 individual sources to analyze
# instead of one pre-digested summary
sources = research_topic("MCP server security best practices 2026")
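Because each result carries its domain, you can check source diversity before the agent starts writing. A minimal sketch using stubbed results in place of a live `research_topic()` call:

```python
from collections import Counter

def source_diversity(sources):
    """Count how many distinct domains the research draws on."""
    return Counter(s["domain"] for s in sources)

# Stubbed results standing in for research_topic() output
sources = [
    {"domain": "owasp.org"}, {"domain": "owasp.org"},
    {"domain": "github.com"}, {"domain": "medium.com"},
]
print(source_diversity(sources))
# Counter({'owasp.org': 2, 'github.com': 1, 'medium.com': 1})
```

If one domain dominates, re-query with a refined search before handing the sources to the agent.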

CrewAI + search API for trending topic detection

CrewAI lets you build multi-agent workflows where each agent has a specific role. For content research, the pattern that works: one agent discovers trending topics, another researches each topic in depth, and a third identifies unique angles.

Python
from crewai import Agent, Task, Crew
from crewai.tools import tool  # plain functions cannot be passed as tools
import requests, os, json

@tool("Web search")
def search_web(query: str) -> str:
    """Search tool for CrewAI agents."""
    resp = requests.post(
        "https://api.scavio.dev/api/v1/search",
        headers={"x-api-key": os.environ["SCAVIO_API_KEY"]},
        json={"query": query, "num_results": 10},
        timeout=30
    )
    results = resp.json().get("organic_results", [])
    return json.dumps([{
        "title": r.get("title"),
        "snippet": r.get("snippet"),
        "url": r.get("url")
    } for r in results])

@tool("Reddit search")
def search_reddit(query: str) -> str:
    """Reddit search for trending discussions."""
    resp = requests.post(
        "https://api.scavio.dev/api/v1/search",
        headers={"x-api-key": os.environ["SCAVIO_API_KEY"]},
        json={
            "query": query,
            "platform": "reddit",
            "num_results": 10
        },
        timeout=30
    )
    return json.dumps(resp.json().get("organic_results", []))

# Agent 1: Trend detection
trend_scout = Agent(
    role="Trend Scout",
    goal="Find trending topics in the target niche",
    backstory="You monitor Reddit and web search for emerging topics.",
    tools=[search_reddit, search_web]
)

# Agent 2: Deep researcher
researcher = Agent(
    role="Research Analyst",
    goal="Research each trending topic in depth from raw sources",
    backstory="You analyze multiple sources and find unique data points.",
    tools=[search_web]
)

# Agent 3: Angle finder
angle_finder = Agent(
    role="Content Strategist",
    goal="Identify unique content angles from research data",
    backstory="You find gaps in existing coverage that become articles."
)

# Define tasks
find_trends = Task(
    description="Search Reddit for trending discussions about MCP servers. Return the top 5 topics with the most engagement.",
    expected_output="List of 5 trending topics with context",
    agent=trend_scout
)

research_topics = Task(
    description="For each trending topic, search the web and collect 10+ sources. Note specific data points, expert quotes, and contradictions between sources.",
    expected_output="Detailed research notes with source citations",
    agent=researcher
)

find_angles = Task(
    description="Based on the research, identify 3 unique content angles that existing articles do not cover well.",
    expected_output="3 content briefs with unique angles",
    agent=angle_finder
)

crew = Crew(
    agents=[trend_scout, researcher, angle_finder],
    tasks=[find_trends, research_topics, find_angles],
    verbose=True
)

result = crew.kickoff()
print(result)

Why raw data produces better content

When your agent analyzes 15 individual sources instead of one summary, the output is fundamentally different:

  • It can cite specific sources and data points rather than making general claims
  • It identifies when sources disagree and can present both sides
  • It finds the expert voices and quotes that make content authoritative
  • It spots gaps in existing coverage that become unique content angles
  • It produces content that passes originality checks because it synthesized from multiple sources rather than paraphrasing one summary
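One concrete way to surface disagreements is to pull the numeric claims out of each snippet and compare them across sources. A minimal regex-based sketch (a production agent would do this comparison with the LLM; the URLs and snippets here are made up for illustration):

```python
import re

def numeric_claims(sources):
    """Map each source URL to the numbers mentioned in its snippet."""
    return {s["url"]: re.findall(r"\d+(?:\.\d+)?%?", s["snippet"])
            for s in sources}

snippets = [
    {"url": "https://a.example", "snippet": "Adoption grew 40% last year."},
    {"url": "https://b.example", "snippet": "Adoption grew 25% last year."},
]
claims = numeric_claims(snippets)
print(claims)
# {'https://a.example': ['40%'], 'https://b.example': ['25%']}
```

Two sources citing different figures for the same metric is exactly the kind of contradiction a summary would smooth over, and exactly the kind an article can dig into.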

Cost comparison

Perplexity API charges $0.005/query for their search endpoint. A search API like Scavio also charges $0.005/query. The cost per query is identical. But with raw search results, each query returns 10-20 individual sources your agent can analyze. With Perplexity, each query returns one synthesized answer. Per-source, the raw approach is 10-20x more cost-effective for content research.
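The per-source arithmetic above, spelled out using the article's $0.005/query figure:

```python
price_per_query = 0.005

# Summarized endpoint: one synthesized answer per query
cost_per_summary_source = price_per_query / 1   # $0.005 per source

# Raw endpoint returning 15 results per query
cost_per_raw_source = price_per_query / 15      # ~$0.00033 per source

print(round(cost_per_summary_source / cost_per_raw_source))  # 15
```

At 10 results per query the multiplier is 10x; at 20 it is 20x, which is where the 10-20x range comes from.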

When summaries are better

Summarized results are better for quick factual lookups (What is the current price of X?), chatbot responses where speed matters more than depth, and use cases where the agent does not need to analyze multiple perspectives. For content research, trend detection, and competitive analysis, raw data wins.