An r/LocalLLaMA post reported Qwen 3.6-27B with agentic search achieving 95.7% on SimpleQA on a single RTX 3090. The key ingredient is search grounding via an external API. This tutorial walks through the setup.
Prerequisites
- NVIDIA RTX 3090 (24GB VRAM)
- Ollama installed
- Scavio API key
Walkthrough
Step 1: Pull Qwen 3.6-27B via Ollama
Download the Q4_K_M quantized model.
ollama pull qwen3.6:27b
# Uses Q4_K_M quantization by default
# Fits in 24GB VRAM on RTX 3090
Step 2: Set up Scavio MCP for search grounding
Configure the MCP server so the model can search.
# If using opencode or Claude Code as the agent runtime:
claude mcp add scavio https://mcp.scavio.dev/mcp --header 'x-api-key: YOUR_SCAVIO_KEY'
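For the mcp.json route, an entry along these lines is typical. The exact schema depends on the agent runtime; the `type`, `url`, and `headers` keys below are an assumption based on common remote MCP server configs, so check your runtime's docs:

```json
{
  "mcpServers": {
    "scavio": {
      "type": "http",
      "url": "https://mcp.scavio.dev/mcp",
      "headers": {
        "x-api-key": "YOUR_SCAVIO_KEY"
      }
    }
  }
}
```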
# Or configure in mcp.json for direct Ollama tool calling
Step 3: Configure agentic search routing
The system prompt tells the model when to search versus when to answer from its own knowledge.
SYSTEM_PROMPT = '''You are a research assistant with web search access.
Rules:
- For factual questions (dates, prices, current events): ALWAYS search first
- For reasoning/math/code: answer from knowledge
- Cite search results when used
- If search returns no useful results, say so'''
Step 4: Build the agent loop
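The loop in this step hinges on spotting a search tool call in the chat response. As a sketch of just that parsing step, assuming an Ollama-style response shape (the sample response below is illustrative, not live model output):

```python
# Illustrative shape of a chat response in which the model requests
# the search tool (assumed Ollama-style structure, not live output).
sample_response = {
    'message': {
        'role': 'assistant',
        'content': '',
        'tool_calls': [{
            'function': {
                'name': 'search',
                'arguments': {'query': 'current price of gold per ounce'},
            },
        }],
    },
}

def extract_search_query(response):
    """Return the query of the first 'search' tool call, or None."""
    calls = response['message'].get('tool_calls') or []
    if calls and calls[0]['function']['name'] == 'search':
        return calls[0]['function']['arguments']['query']
    return None

print(extract_search_query(sample_response))
# → current price of gold per ounce
```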
Simple agent loop: model decides to search or answer.
import ollama, requests, os
H = {'x-api-key': os.environ['SCAVIO_API_KEY']}
# Declare the search tool so the model knows it can request it
SEARCH_TOOL = {'type': 'function', 'function': {
    'name': 'search', 'description': 'Search the web for current facts',
    'parameters': {'type': 'object',
                   'properties': {'query': {'type': 'string'}},
                   'required': ['query']}}}
def agent_loop(question):
    # SYSTEM_PROMPT is defined in Step 3
    messages = [{'role': 'system', 'content': SYSTEM_PROMPT},
                {'role': 'user', 'content': question}]
    response = ollama.chat(model='qwen3.6:27b', messages=messages, tools=[SEARCH_TOOL])
    tool_calls = response['message'].get('tool_calls') or []
    # If the model requested a search, run it and hand the results back
    if tool_calls and tool_calls[0]['function']['name'] == 'search':
        query = tool_calls[0]['function']['arguments']['query']
        results = requests.post('https://api.scavio.dev/api/v1/search', headers=H,
                                json={'platform': 'google', 'query': query}).json()
        messages.append(response['message'])  # keep the assistant tool-call turn
        messages.append({'role': 'tool', 'content': str(results)})
        return ollama.chat(model='qwen3.6:27b', messages=messages)
    return response
Step 5: Benchmark with SimpleQA
Run the SimpleQA benchmark to verify accuracy.
# SimpleQA: factual QA benchmark
# Expected result with search grounding: ~95% accuracy
# Without search: ~60-70% for 27B model
# The delta is the value of search grounding
Python Example
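As a concrete sketch of the scoring side, here is a minimal exact-match grader over (question, gold answer) pairs. Note this is a simplification: the official SimpleQA benchmark grades with an LLM judge that tolerates phrasing differences, and the normalization below is an assumption for illustration only:

```python
import re

def normalize(text):
    """Lowercase, strip punctuation, collapse whitespace."""
    return re.sub(r'\s+', ' ', re.sub(r'[^\w\s]', '', text.lower())).strip()

def grade(gold, predicted):
    """1 if the gold answer appears in the prediction, else 0."""
    return int(normalize(gold) in normalize(predicted))

def accuracy(examples, answer_fn):
    """examples: list of (question, gold) pairs; answer_fn: question -> answer."""
    scores = [grade(gold, answer_fn(q)) for q, gold in examples]
    return sum(scores) / len(scores)

# Toy check with a stub answer function (not the real agent):
toy = [('Capital of France?', 'Paris'), ('2 + 2?', '4')]
print(accuracy(toy, lambda q: 'Paris' if 'France' in q else '5'))  # → 0.5
```

Swapping the stub for the Step 4 `agent_loop` (extracting the text of its final message) turns this into a crude SimpleQA-style harness.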
# The 95.7% SimpleQA result comes from search grounding.
# Without search: Qwen 27B scores ~65% (hallucinations on factual queries).
# With search: factual queries get live data, accuracy jumps to 95%+.
# The search API IS the accuracy improvement.
JavaScript Example
// Ollama + Scavio in Node.js:
const { Ollama } = require('ollama');
const ollama = new Ollama();
// Same agent loop pattern as the Python version: chat with tools,
// run the search on a tool call, append a 'tool' message, chat again.
Expected Output
Qwen 3.6-27B running locally on an RTX 3090 with Scavio MCP for search grounding, reaching 95%+ accuracy on factual questions via agentic search.