local-llmcodingsearch

Multi-Agent Local Coding: The Web Search Gap

Fully local Pi + llama-swap + Qwen3 stack is great for privacy. Missing piece: web search. Add a $0.005/query search tool for live documentation context.

8 min

The Pi Coding Agent guide demonstrates a fully local multi-agent setup running on an RTX 3090 with llama-swap and Qwen3 models. It handles code generation, debugging, and refactoring locally. The missing piece is web search: local LLMs have zero internet access, so they hallucinate library versions, invent deprecated API signatures, and guess at documentation. A lightweight search API call costing $0.005/query bridges this gap completely.

The local setup

Pi Coding Agent runs multiple specialized models via llama-swap on a single GPU. Qwen3 32B handles complex reasoning, smaller models handle routine code completion. Everything stays local: no API costs for inference, no data leaving your machine, sub-second latency. The setup handles 90% of coding tasks. The remaining 10% requires current information that no local model has.

When local models need the web

  • Checking if a package version exists: "Does pandas 2.3 support this method?"
  • Finding correct API signatures: "What are the current OpenAI SDK parameters?"
  • Debugging error messages: "What does this specific Rust borrow checker error mean?"
  • Verifying deprecations: "Is this React lifecycle method still valid?"
  • Checking library compatibility: "Does this work with Python 3.13?"

Adding search as a tool

Python
# tools/web_search.py - search tool for Pi Coding Agent
import requests, os, json

def web_search(query: str) -> str:
    """Search the web for current technical information.
    Use when you need: package versions, API docs, error explanations,
    deprecation notices, or any info newer than your training data.
    Cost: $0.005 per query.
    """
    resp = requests.post("https://api.scavio.dev/api/v1/search",
        headers={"x-api-key": os.environ["SCAVIO_API_KEY"]},
        json={"platform": "google", "query": query},
        timeout=10)

    if resp.status_code != 200:
        return f"Search failed: {resp.status_code}"

    results = resp.json().get("organic", [])[:5]
    formatted = []
    for r in results:
        formatted.append(f"Title: {r.get('title', '')}")
        formatted.append(f"Snippet: {r.get('snippet', '')}")
        formatted.append(f"URL: {r.get('link', '')}")
        formatted.append("---")
    return "\n".join(formatted)

Integration with llama-swap

Python
# agent_config.py - adding web search to tool list
TOOLS = [
    {
        "type": "function",
        "function": {
            "name": "web_search",
            "description": "Search the web for current technical docs, package info, or error solutions. Use before guessing about APIs or library versions.",
            "parameters": {
                "type": "object",
                "properties": {
                    "query": {
                        "type": "string",
                        "description": "Technical search query"
                    }
                },
                "required": ["query"]
            }
        }
    },
    {
        "type": "function",
        "function": {
            "name": "read_file",
            "description": "Read a file from the local filesystem",
            "parameters": {
                "type": "object",
                "properties": {
                    "path": {"type": "string"}
                },
                "required": ["path"]
            }
        }
    }
]

Cost analysis: local inference + remote search

Local inference: $0/query (hardware already paid for). Search: $0.005 per query. A heavy coding session triggers maybe 10-15 searches (checking docs, verifying versions, debugging errors). That is $0.05-$0.075 per session. Monthly cost for a developer using this daily: roughly $1.50/month for search. Compare to running everything through a cloud LLM API where a single complex coding session can cost $2-5 in tokens.

When to search vs when to trust the model

Python
# Simple heuristic for the agent to decide when to search
SEARCH_TRIGGERS = [
    "latest version",
    "current API",
    "deprecated",
    "error:",
    "does * support",
    "how to install",
    "breaking change",
    "migration guide",
    "release notes",
    "compatibility"
]

def should_search(user_query: str) -> bool:
    query_lower = user_query.lower()
    return any(trigger in query_lower for trigger in SEARCH_TRIGGERS)

The hybrid architecture

Local models handle: code generation, refactoring, test writing, architecture discussions, and anything that relies on pattern matching rather than current facts. The search API handles: version checks, documentation lookups, error debugging, and compatibility verification. This split gives you the privacy and speed of local inference with the accuracy of live web data. The 250 free credits on Scavio signup cover roughly 50 coding sessions before you need a paid plan.