
Coding Agent Multi-Backend Search Architecture

A Pi Coding Agent user built a search extension with 4 backends, parallel queries, and automatic failover. The architecture pattern works for any coding agent. Code included.

9 min

A Pi Coding Agent user shared their extension that maintains a pool of 4 search backends, querying 2 at a time in parallel with automatic failover. The architecture pattern is applicable to any coding agent that needs web search: primary plus secondary backends, parallel queries for speed, deduplication for quality, and markdown extraction for token efficiency.

Why One Search Backend Is Not Enough

Coding agents make search queries that are harder than typical web searches. Queries like "python asyncio TaskGroup cancel remaining on first exception" or "Next.js 15 middleware matcher regex syntax" are long-tail and technical. Different search backends have different strengths: one might index Stack Overflow well but miss GitHub Discussions, another might surface official docs but bury community solutions.

Single-backend setups have a single point of failure. When your search provider has an outage or rate limits you, the agent loses grounding entirely. Multi-backend architecture provides resilience and broader coverage simultaneously.

The Architecture Pattern

The design has four components: backend pool, parallel executor, deduplicator, and content extractor.

Python
import asyncio
import hashlib
from typing import List, Dict, Optional
import aiohttp

class SearchBackend:
    """Base class for search backends."""
    def __init__(self, name: str, priority: int):
        self.name = name
        self.priority = priority
        self.consecutive_failures = 0
        self.max_failures = 3

    async def search(self, query: str, num_results: int = 5) -> List[Dict]:
        raise NotImplementedError

    @property
    def is_healthy(self) -> bool:
        return self.consecutive_failures < self.max_failures


class ScavioBackend(SearchBackend):
    def __init__(self, api_key: str):
        super().__init__("scavio", priority=1)
        self.api_key = api_key

    async def search(self, query: str, num_results: int = 5) -> List[Dict]:
        async with aiohttp.ClientSession() as session:
            async with session.post(
                "https://api.scavio.dev/api/v1/search",
                headers={
                    "x-api-key": self.api_key,
                    "Content-Type": "application/json",
                },
                json={"query": query, "num_results": num_results},
            ) as resp:
                # Raise on HTTP errors (429, 5xx) so the executor
                # counts the failure instead of resetting the counter
                resp.raise_for_status()
                data = await resp.json()
                self.consecutive_failures = 0
                return data.get("results", [])


class BraveBackend(SearchBackend):
    def __init__(self, api_key: str):
        super().__init__("brave", priority=2)
        self.api_key = api_key

    async def search(self, query: str, num_results: int = 5) -> List[Dict]:
        async with aiohttp.ClientSession() as session:
            async with session.get(
                "https://api.search.brave.com/res/v1/web/search",
                headers={"X-Subscription-Token": self.api_key},
                params={"q": query, "count": num_results},
            ) as resp:
                resp.raise_for_status()  # surface HTTP errors to the executor
                data = await resp.json()
                self.consecutive_failures = 0
                return [
                    {
                        "title": r.get("title"),
                        "url": r.get("url"),
                        "description": r.get("description"),
                    }
                    for r in data.get("web", {}).get("results", [])
                ]

Parallel Execution with Failover

The executor picks the top 2 healthy backends by priority, queries them in parallel, and merges and deduplicates whatever comes back. A backend that errors, times out, or returns nothing accrues a consecutive-failure count; after 3 failures it is marked unhealthy and drops out of selection, which promotes the next backend in the pool. If every backend is unhealthy, the counters reset and the top 2 get another chance.

Python
class MultiBackendSearch:
    def __init__(self, backends: List[SearchBackend]):
        self.backends = sorted(backends, key=lambda b: b.priority)

    def _get_healthy_backends(self, count: int = 2) -> List[SearchBackend]:
        """Get top N healthy backends by priority."""
        healthy = [b for b in self.backends if b.is_healthy]
        return healthy[:count]

    async def _search_one(
        self, backend: SearchBackend, query: str, num_results: int
    ) -> Optional[List[Dict]]:
        """Search with one backend, handle failures."""
        try:
            results = await asyncio.wait_for(
                backend.search(query, num_results),
                timeout=8.0,
            )
            if results:
                return results
            backend.consecutive_failures += 1
            return None
        except Exception:
            backend.consecutive_failures += 1
            return None

    async def search(self, query: str, num_results: int = 10) -> List[Dict]:
        """Search with parallel backends and deduplication."""
        candidates = self._get_healthy_backends(2)

        if not candidates:
            # Reset all backends and try again
            for b in self.backends:
                b.consecutive_failures = 0
            candidates = self.backends[:2]

        # Query two backends in parallel and merge the results
        tasks = [
            self._search_one(b, query, num_results)
            for b in candidates
        ]
        results_list = await asyncio.gather(*tasks)

        # Merge and deduplicate
        all_results = []
        for results in results_list:
            if results:
                all_results.extend(results)

        return self._deduplicate(all_results, num_results)

    def _deduplicate(
        self, results: List[Dict], limit: int
    ) -> List[Dict]:
        """Remove duplicate URLs, keep first occurrence."""
        seen = set()
        unique = []
        for r in results:
            url_hash = hashlib.md5(
                r.get("url", "").encode()
            ).hexdigest()
            if url_hash not in seen:
                seen.add(url_hash)
                unique.append(r)
        return unique[:limit]
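
Wiring it together is one line per backend. A minimal usage sketch, assuming the API keys live in environment variables (the variable names here are illustrative):

Python
import asyncio
import os

async def main():
    searcher = MultiBackendSearch([
        ScavioBackend(api_key=os.environ["SCAVIO_API_KEY"]),
        BraveBackend(api_key=os.environ["BRAVE_API_KEY"]),
    ])
    results = await searcher.search(
        "python asyncio TaskGroup cancel remaining on first exception"
    )
    for r in results:
        print(f'{r["title"]} - {r["url"]}')

asyncio.run(main())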

Markdown Extraction with Defuddle

Raw search results include titles and snippets, but coding agents often need the full page content for documentation pages and code examples. Defuddle extracts the main content from HTML pages and converts it to clean markdown, stripping navigation, ads, and boilerplate. This cuts token usage by 60-80% compared to feeding raw HTML into the LLM context.

JavaScript
// Defuddle integration for content extraction.
// Defuddle (npm package: defuddle) expects a DOM Document,
// so in Node we parse the fetched HTML with jsdom first.
import Defuddle from "defuddle";
import { JSDOM } from "jsdom";

async function extractContent(url) {
  const response = await fetch(url);
  const html = await response.text();

  const dom = new JSDOM(html, { url });
  const result = new Defuddle(dom.window.document, {
    url,
    markdown: true, // return the main content as markdown, not HTML
  }).parse();

  return {
    title: result.title,
    content: result.content,  // Clean markdown
    wordCount: result.wordCount,
  };
}

// Usage in the search pipeline
async function searchAndExtract(query) {
  const searchResults = await multiSearch.search(query, 5);

  // Extract content from top 3 results
  const enriched = await Promise.all(
    searchResults.slice(0, 3).map(async (result) => {
      try {
        const content = await extractContent(result.url);
        return { ...result, fullContent: content.content };
      } catch {
        return { ...result, fullContent: result.description };
      }
    })
  );

  return enriched;
}

Integrating with Coding Agents

The multi-backend search exposes a single tool interface to the coding agent. The agent does not know or care which backends are being used: it calls web_search with a query and gets back structured results. The routing, failover, and deduplication happen transparently.

Python
# MCP tool definition for the multi-backend search
TOOL_SCHEMA = {
    "name": "web_search",
    "description": (
        "Search the web for current information. Use for: "
        "documentation lookups, API references, error messages, "
        "package versions, and code examples. Returns structured "
        "results with title, URL, and content snippet."
    ),
    "inputSchema": {
        "type": "object",
        "properties": {
            "query": {
                "type": "string",
                "description": "Search query, be specific and technical",
            },
            "num_results": {
                "type": "integer",
                "default": 5,
                "description": "Number of results to return",
            },
        },
        "required": ["query"],
    },
}
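
On the server side, the handler just forwards the arguments to the pool. A framework-agnostic sketch (the dispatch function is hypothetical; an MCP server would register it under the schema above):

Python
multi_search = MultiBackendSearch([
    ScavioBackend(api_key="..."),
    BraveBackend(api_key="..."),
])

async def handle_tool_call(name: str, arguments: Dict) -> List[Dict]:
    """Route a tool invocation to the multi-backend search pool."""
    if name != "web_search":
        raise ValueError(f"Unknown tool: {name}")
    return await multi_search.search(
        arguments["query"],
        num_results=arguments.get("num_results", 5),
    )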

Performance Numbers

From the Pi Coding Agent user's production logs over 30 days:

  • Total queries: 4,200
  • Primary backend success: 96.8%
  • Failover triggered: 3.2% of queries
  • Failover success: 94% of failover attempts
  • Total query failure rate: 0.19% (8 out of 4,200)
  • Median latency: 680ms (parallel reduces perceived latency)
  • P95 latency: 2.1s (failover cases)

Compare this to single-backend performance where the failure rate was 3.2% on the primary alone. The multi-backend approach reduced failures by 94%, from 134 failed queries to 8.

Cost Optimization

Running 2 backends in parallel means paying for 2 queries per search. At $0.005/query, that doubles the search cost to $0.01/search. For 4,200 queries/month, that is $42 versus $21.

A smarter approach: run the primary backend first with a tight timeout (2 seconds), and only start the secondary if the primary fails or is slow. This uses the secondary for only the 3-5% of queries that need failover, reducing the total to about $22/month while maintaining the same reliability improvement.
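
A sketch of that primary-first strategy as a drop-in method on MultiBackendSearch (the 2-second budget mirrors the latency numbers above; tune it to your own targets):

Python
async def search_primary_first(
    self, query: str, num_results: int = 10
) -> List[Dict]:
    """Try the primary alone first; pay for the secondary only on failover."""
    healthy = self._get_healthy_backends(2)
    if not healthy:
        return []
    try:
        # Tight timeout: a slow primary triggers failover instead of stalling
        results = await asyncio.wait_for(
            healthy[0].search(query, num_results), timeout=2.0
        )
        if results:
            return self._deduplicate(results, num_results)
    except Exception:
        healthy[0].consecutive_failures += 1
    # Primary failed, returned nothing, or was slow: fall back to the secondary
    if len(healthy) > 1:
        results = await self._search_one(healthy[1], query, num_results)
        if results:
            return self._deduplicate(results, num_results)
    return []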

When This Pattern Is Overkill

If you make under 100 queries per day and a single paid API meets your reliability needs, this architecture adds complexity without proportional benefit. The pattern pays off at scale (1,000+ queries/day) or in environments where search reliability directly impacts user experience, like production coding assistants serving a development team.