Tutorial

How to Run Qwen 3.6-27B Agentic Search on a Single 3090

Set up Qwen 3.6-27B with Scavio MCP for agentic search on a single RTX 3090, achieving 95.7% on SimpleQA with search grounding.

An r/LocalLLaMA post reported Qwen 3.6-27B + agentic search achieving 95.7% on SimpleQA on a single 3090. The key ingredient: search grounding via an external API. This tutorial walks through the setup.

Prerequisites

  • NVIDIA RTX 3090 (24GB VRAM)
  • Ollama installed
  • Scavio API key

Walkthrough

Step 1: Pull Qwen 3.6-27B via Ollama

Download the Q4_K_M quantized model.

Bash
ollama pull qwen3.6:27b
# Uses Q4_K_M quantization by default
# Fits in 24GB VRAM on RTX 3090
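A quick back-of-the-envelope check that the quantized weights fit in 24GB. Q4_K_M averages roughly 4.5 bits per weight; the figure below is an approximation, not a measured footprint:

```python
# Estimate the weight memory of a 27B model at Q4_K_M quantization
params = 27e9                # 27B parameters
bits_per_weight = 4.5        # approximate Q4_K_M average
weights_gb = params * bits_per_weight / 8 / 1e9
print(round(weights_gb, 1))  # ~15.2 GB of weights, leaving headroom for KV cache in 24 GB
```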

Step 2: Set up Scavio MCP for search grounding

Configure the MCP server so the model can search.

Bash
# If using opencode or Claude Code as the agent runtime:
claude mcp add scavio https://mcp.scavio.dev/mcp --header 'x-api-key: YOUR_SCAVIO_KEY'

# Or configure in mcp.json for direct Ollama tool calling
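If you wire the server in via `mcp.json` instead, the entry typically looks like the following. Key names vary by client, so treat this as a sketch rather than a documented schema:

```json
{
  "mcpServers": {
    "scavio": {
      "type": "http",
      "url": "https://mcp.scavio.dev/mcp",
      "headers": { "x-api-key": "YOUR_SCAVIO_KEY" }
    }
  }
}
```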

Step 3: Configure agentic search routing

The system prompt tells the model when to search and when to answer from its own knowledge.

Python
SYSTEM_PROMPT = '''You are a research assistant with web search access.
Rules:
- For factual questions (dates, prices, current events): ALWAYS search first
- For reasoning/math/code: answer from knowledge
- Cite search results when used
- If search returns no useful results, say so'''
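The routing rule the prompt encodes can be sketched as a plain-Python heuristic, useful for sanity-checking or logging which path the model should take. The cue list is illustrative, not exhaustive:

```python
# Rough stand-in for the model's search-vs-knowledge routing decision
FACTUAL_CUES = ('current', 'price', 'when', 'date', 'latest', 'today', 'who is')

def should_search(question: str) -> bool:
    """Return True if the question looks factual and should trigger a search."""
    q = question.lower()
    return any(cue in q for cue in FACTUAL_CUES)

print(should_search('What is the current price of Bitcoin?'))  # True
print(should_search('Implement quicksort in Python'))          # False
```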

Step 4: Build the agent loop

A simple agent loop: the model decides whether to search or to answer directly.

Python
import ollama, requests, os
H = {'x-api-key': os.environ['SCAVIO_API_KEY']}

# Declare the search tool so the model knows it can call it
TOOLS = [{'type': 'function', 'function': {
    'name': 'search', 'description': 'Search the web for current information',
    'parameters': {'type': 'object',
                   'properties': {'query': {'type': 'string'}},
                   'required': ['query']}}}]

def agent_loop(question):
    messages = [{'role': 'system', 'content': SYSTEM_PROMPT},
                {'role': 'user', 'content': question}]
    response = ollama.chat(model='qwen3.6:27b', messages=messages, tools=TOOLS)
    calls = response.message.tool_calls or []
    # If the model requested the search tool, run it and feed the results back
    if calls and calls[0].function.name == 'search':
        query = calls[0].function.arguments['query']
        results = requests.post('https://api.scavio.dev/api/v1/search', headers=H,
                                json={'platform': 'google', 'query': query}).json()
        messages.append(response.message)  # keep the assistant turn in history
        messages.append({'role': 'tool', 'content': str(results)})
        return ollama.chat(model='qwen3.6:27b', messages=messages)
    return response
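Dumping raw JSON into the tool message works, but a compact text rendering usually grounds the model better. Here is one way to flatten results; the `results`/`title`/`snippet` shape is an assumption about the response, not Scavio's documented schema:

```python
def format_results(results: dict) -> str:
    """Flatten a search response into numbered lines for the tool message.
    The 'results'/'title'/'snippet' keys are assumed, not a documented schema."""
    items = results.get('results', [])
    return '\n'.join(f"[{i + 1}] {r.get('title', '')}: {r.get('snippet', '')}"
                     for i, r in enumerate(items))

demo = {'results': [{'title': 'BTC Price', 'snippet': '$97,000 as of today'}]}
print(format_results(demo))  # [1] BTC Price: $97,000 as of today
```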

Step 5: Benchmark with SimpleQA

Run the SimpleQA benchmark to verify accuracy.

Text
# SimpleQA: factual QA benchmark
# Expected result with search grounding: ~95% accuracy
# Without search: ~60-70% for 27B model
# The delta is the value of search grounding
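The accuracy delta can be made concrete with a toy exact-match scorer. The questions and predictions below are illustrative, not an actual benchmark run:

```python
def accuracy(predictions, answers):
    """Fraction of exact (case-insensitive) matches, SimpleQA-style."""
    correct = sum(p.strip().lower() == a.strip().lower()
                  for p, a in zip(predictions, answers))
    return correct / len(answers)

answers     = ['paris', '1969', 'tim berners-lee', '8848 m']
no_search   = ['paris', '1972', 'vint cerf', '8848 m']        # hallucinated facts
with_search = ['paris', '1969', 'tim berners-lee', '8848 m']  # grounded answers

print(accuracy(no_search, answers))    # 0.5
print(accuracy(with_search, answers))  # 1.0
```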

Python Example

Python
# The 95.7% SimpleQA result comes from search grounding:
# without search, Qwen 27B scores ~65% (hallucinating on factual queries);
# with search, factual queries get live data and accuracy jumps to 95%+.
# The search API is the accuracy improvement. To see it end to end
# (requires a running Ollama server and SCAVIO_API_KEY set):
answer = agent_loop('Who is the current president of France?')
print(answer.message.content)

JavaScript Example

JavaScript
// Ollama + Scavio in Node.js (same agent-loop pattern as the Python version)
const { Ollama } = require('ollama');
const ollama = new Ollama();
// Inside an async function:
//   const res = await ollama.chat({ model: 'qwen3.6:27b', messages, tools });

Expected Output

Text
Qwen 3.6-27B running locally on RTX 3090 with Scavio MCP for search grounding. 95%+ accuracy on factual questions via agentic search.

Frequently Asked Questions

How long does this tutorial take?
Most developers complete it in 15 to 30 minutes. You will need a Scavio API key (free tier works) and a working Python or JavaScript environment.

What are the prerequisites?
An NVIDIA RTX 3090 (24GB VRAM), Ollama installed, and a Scavio API key. A Scavio API key gives you 500 free credits per month.

Can I complete this on the free tier?
Yes. The free tier includes 500 credits per month, which is more than enough to complete this tutorial and prototype a working solution.

Does Scavio work with my framework?
Scavio has a native LangChain package (langchain-scavio), an MCP server, and a plain REST API that works with any HTTP client. This tutorial uses the raw REST API, but you can adapt it to your framework of choice.
