
Uncensored AI Search: Local LLMs with Web Access

Setting up uncensored web search for local LLMs using Scavio as the search backend via MCP or function calling.


Local LLMs like Llama, Mistral, and Qwen run on your hardware with no content filters. But they are stuck with their training data -- they cannot search the web. Connecting a local model to real-time search transforms it from a static knowledge base into something genuinely useful for research, analysis, and daily work.

This post shows two approaches: using Scavio as an MCP server for MCP-compatible local clients, and using Scavio as a tool in a custom tool-calling loop for any local model that supports function calling.

Why Local LLMs Need Web Search

A local LLM can reason and generate text, but its knowledge stops at its training cutoff. Ask it about today's news, current prices, or recent events and it either hallucinates or admits it does not know. Web search fills this gap by letting the model retrieve current information before generating a response.

Unlike cloud-hosted AI assistants, a local model with search gives you full control. No usage policies deciding what you can and cannot research. No conversation logging. No content filters between you and the search results. The model gets raw data and you decide what to do with it.

Option 1: MCP Integration

If your local AI client supports MCP (such as LM Studio, Open WebUI with MCP plugin, or a custom client), connect Scavio directly:

JSON
{
  "mcpServers": {
    "scavio": {
      "type": "http",
      "url": "https://mcp.scavio.dev/mcp",
      "headers": {
        "x-api-key": "YOUR_SCAVIO_API_KEY"
      }
    }
  }
}

The model now has access to search_google, search_amazon, search_youtube, search_walmart, and other tools. It can decide when to search based on the conversation context.
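Under the hood, an MCP client invokes these tools with a JSON-RPC `tools/call` request. As a rough sketch, a `search_google` invocation would look something like this on the wire (the exact argument schema is Scavio's; a single `query` string is assumed here):

```json
{
  "jsonrpc": "2.0",
  "id": 1,
  "method": "tools/call",
  "params": {
    "name": "search_google",
    "arguments": { "query": "latest llama.cpp release" }
  }
}
```

Your MCP client handles this plumbing for you; it is shown only so you can recognize tool traffic when debugging.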

Option 2: Tool Calling Loop

For models served via Ollama, llama.cpp, or vLLM that support OpenAI-compatible function calling, build a simple tool-calling loop:

Python
import requests
import json

SCAVIO_API_KEY = "YOUR_SCAVIO_API_KEY"  # load from an env var in real use

def search_web(query: str) -> dict:
    res = requests.post(
        "https://api.scavio.dev/api/v1/search",
        headers={
            "Content-Type": "application/json",
            "x-api-key": SCAVIO_API_KEY
        },
        json={"platform": "google", "query": query},
        timeout=30
    )
    res.raise_for_status()  # surface auth/quota errors instead of failing silently
    return res.json()
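The response includes an organic results list. A small helper, sketched here with assumed field names (`title`, `link`, based on the shape of a Google search response), keeps only what the model needs so search output does not swamp the context window:

```python
def compact_results(result: dict, limit: int = 5) -> list:
    # Keep just title and link from each organic result; the field
    # names are assumptions about Scavio's Google response shape.
    return [
        {"title": r.get("title"), "link": r.get("link")}
        for r in result.get("organic", [])[:limit]
    ]
```

Using `.get()` with defaults means a response missing the `organic` key yields an empty list instead of a crash mid-conversation.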

tools = [{
    "type": "function",
    "function": {
        "name": "search_web",
        "description": "Search Google for current information",
        "parameters": {
            "type": "object",
            "properties": {
                "query": {"type": "string"}
            },
            "required": ["query"]
        }
    }
}]
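Scavio exposes more than Google search (the MCP tool list above includes Amazon, YouTube, and Walmart). A hedged sketch of generating one tool definition per platform -- `make_tool` and `multi_tools` are names invented here, and you would also need to dispatch on `call.function.name` in the agent loop:

```python
PLATFORMS = ["google", "amazon", "youtube", "walmart"]

def make_tool(platform: str) -> dict:
    # One OpenAI-style function definition per Scavio platform.
    return {
        "type": "function",
        "function": {
            "name": f"search_{platform}",
            "description": f"Search {platform} for current information",
            "parameters": {
                "type": "object",
                "properties": {"query": {"type": "string"}},
                "required": ["query"]
            }
        }
    }

multi_tools = [make_tool(p) for p in PLATFORMS]
```
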

The Agent Loop

Wire the tool into a conversation loop that handles function calls:

Python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")

def chat_with_search(user_message: str, max_turns: int = 5):
    messages = [
        {"role": "system", "content": "You have web search access. "
         "Use it when you need current information."},
        {"role": "user", "content": user_message}
    ]
    for _ in range(max_turns):  # cap tool-calling rounds to avoid runaway loops
        resp = client.chat.completions.create(
            model="qwen2.5:32b",
            messages=messages,
            tools=tools
        )
        msg = resp.choices[0].message
        if not msg.tool_calls:
            return msg.content
        messages.append(msg)
        for call in msg.tool_calls:
            args = json.loads(call.function.arguments)
            result = search_web(args["query"])
            messages.append({
                "role": "tool",
                "tool_call_id": call.id,
                # keep only the top organic results to save context
                "content": json.dumps(result.get("organic", [])[:5])
            })
    return msg.content  # fall back to the last response if the cap is hit

Which Models Work Best

Not all local models handle tool calling reliably. Models that handle search tool calls well as of early 2026:

  • Qwen 2.5 (32B and 72B) -- strong tool calling support
  • Llama 3.3 (70B) -- reliable function calling with good reasoning
  • Mistral Large -- solid tool use and instruction following
  • DeepSeek-V3 -- excellent reasoning with search context

Smaller models (7B-13B) can call tools but struggle to synthesize search results. For search-augmented generation, 32B parameters is the practical minimum for consistent quality.

Practical Considerations

  • Rate limit your searches -- local models can be eager tool callers
  • Truncate results to top 5 organic to save context window
  • Cache frequent queries locally to reduce API costs
  • Use Scavio's light mode for simple searches and full mode when you need knowledge graph data
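The caching and rate-limiting points combine naturally into one small wrapper. This is a sketch, not Scavio-specific code: `search_web` is stubbed so the example is self-contained, and the one-second interval is an arbitrary choice:

```python
import json
import time
from functools import lru_cache

def search_web(query: str) -> dict:
    # Stub standing in for the real Scavio call defined earlier.
    return {"organic": [{"title": f"result for {query}"}]}

_MIN_INTERVAL = 1.0  # minimum seconds between live (uncached) searches
_last_call = 0.0

@lru_cache(maxsize=256)
def cached_search(query: str) -> str:
    """Rate-limited search; repeated queries are served from the cache."""
    global _last_call
    wait = _MIN_INTERVAL - (time.monotonic() - _last_call)
    if wait > 0:
        time.sleep(wait)  # simple spacing between live requests
    _last_call = time.monotonic()
    return json.dumps(search_web(query)["organic"][:5])
```

Because `lru_cache` short-circuits before the function body runs, repeat queries skip both the delay and the API call entirely.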

The result is a private AI assistant with real-time web access running entirely under your control -- no cloud dependency for inference and no conversation history leaving your machine.