
Qwen 3.6 and Gemma 4 Local Agents with Scavio

Build a fully local agent on Qwen 3.6 or Gemma 4 with Scavio as the only external dependency. Benchmarks and cost comparison included.


Qwen 3.6 and Gemma 4 shipped within two weeks of each other in early 2026 and flipped the local LLM conversation. Both models score within 15% of Claude 3.5 Sonnet on tool-use benchmarks while running comfortably on a MacBook Pro or a mid-range RTX card. That means serious agentic work can finally happen locally, with no API bills for the LLM.

The catch: local models cannot search the web on their own. For that, you still need an HTTP tool. Here is how to set up a fully local agent on Qwen 3.6 or Gemma 4, with Scavio as the only external dependency.

Stack at a Glance

  • LLM: Qwen 3.6 (14B) or Gemma 4 (9B), served via Ollama or vLLM
  • Agent harness: Hermes Agent, OpenClaw, or smolagents (all work)
  • Tools: Scavio MCP server (Google, YouTube, Amazon, Walmart, Reddit)
  • Memory: optional local vector store (Chroma or SQLite)

Step 1: Serve the Local Model

For Qwen 3.6 (via Ollama):

Bash
ollama pull qwen3.6:14b
ollama run qwen3.6:14b

For Gemma 4:

Bash
ollama pull gemma4:9b
ollama run gemma4:9b

Step 2: Choose an Agent Harness

Any harness with MCP support works. Hermes Agent is our default pick for autonomous tasks. Edit ~/.hermes/tools.json:

JSON
{
  "mcpServers": {
    "scavio": {
      "command": "npx",
      "args": ["-y", "@scavio/mcp"],
      "env": { "SCAVIO_API_KEY": "${SCAVIO_API_KEY}" }
    }
  }
}
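The config above assumes Hermes expands `${VAR}` placeholders from the environment. If your harness does not, you can resolve them yourself before writing the file. A minimal Python sketch of that substitution (the placeholder syntax follows the config above; the example key value is a stand-in, not a real credential):

```python
import json
import os
import re

def expand_env(obj):
    """Recursively replace ${VAR} placeholders with environment values."""
    if isinstance(obj, dict):
        return {k: expand_env(v) for k, v in obj.items()}
    if isinstance(obj, list):
        return [expand_env(v) for v in obj]
    if isinstance(obj, str):
        return re.sub(r"\$\{(\w+)\}",
                      lambda m: os.environ.get(m.group(1), ""), obj)
    return obj

config = json.loads("""
{
  "mcpServers": {
    "scavio": {
      "command": "npx",
      "args": ["-y", "@scavio/mcp"],
      "env": { "SCAVIO_API_KEY": "${SCAVIO_API_KEY}" }
    }
  }
}
""")

# Stand-in value so the sketch runs without a real key.
os.environ.setdefault("SCAVIO_API_KEY", "sk_live_example")
resolved = expand_env(config)
print(resolved["mcpServers"]["scavio"]["env"]["SCAVIO_API_KEY"])
```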

Step 3: Wire the Local LLM to Hermes

Bash
export SCAVIO_API_KEY=sk_live_...
hermes --model http://localhost:11434/v1 --tools scavio
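Under the hood, any OpenAI-compatible harness is POSTing chat-completions requests with a `tools` array to Ollama's `http://localhost:11434/v1/chat/completions` endpoint. Here is a sketch of that request body; the `scavio_google_search` tool name and schema are hypothetical placeholders for whatever the MCP server actually advertises, not Scavio's real API:

```python
import json

def build_request(model: str, prompt: str) -> dict:
    """Build an OpenAI-compatible chat request that advertises one tool."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "tools": [
            {
                "type": "function",
                "function": {
                    # Hypothetical tool name/schema, for illustration only.
                    "name": "scavio_google_search",
                    "description": "Search Google and return result snippets.",
                    "parameters": {
                        "type": "object",
                        "properties": {"query": {"type": "string"}},
                        "required": ["query"],
                    },
                },
            }
        ],
    }

body = build_request("qwen3.6:14b", "Top r/LocalLLaMA threads this week?")
print(json.dumps(body)[:40])
```

When the model decides it needs live data, the response comes back with `tool_calls` instead of text, and the harness routes each call to the MCP server.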

Step 4: Try a Real Task

Give Hermes a query that requires live data. A good benchmark: ask it to research a topic across web and Reddit, then summarize the takeaways.

Bash
hermes task 'Research the top r/LocalLLaMA posts from this week about
Qwen 3.6, summarize the 3 most-mentioned strengths and 3 most-mentioned
complaints. Cite specific posts.'

On Qwen 3.6 14B, this task completes in about 40 seconds on an M3 Pro with 36GB RAM. It makes 4-8 Scavio calls (Google + Reddit) and synthesizes a cited summary.

What Actually Works Locally in 2026

After two months of dogfooding, here are the honest takes:

  • Qwen 3.6 14B: Best all-round local model for agentic work. Tool selection is sharp, summarization is crisp, long-context handling is strong. Recommended default.
  • Gemma 4 9B: Faster than Qwen on similar hardware, slightly weaker on multi-step reasoning. Great for simple research tasks or when you need sub-10-second response times.
  • Qwen 3.6 4B: Runs on a laptop with 16GB RAM. Works for single-step searches but struggles with chained tool calls. Use for triage, not for autonomous research.
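Those takeaways reduce to a simple routing rule. Here is an entirely illustrative picker (the thresholds are heuristics from our testing, not hard limits) that maps task shape and available RAM to a model tag:

```python
def pick_model(chained_tool_calls: bool, ram_gb: int,
               latency_sensitive: bool = False) -> str:
    """Map task shape and hardware to an Ollama model tag (heuristic sketch)."""
    if ram_gb < 24:
        # 4B fits in 16GB; good for single-step triage only.
        return "qwen3.6:4b"
    if latency_sensitive and not chained_tool_calls:
        # Gemma 4 9B trades some multi-step reasoning for speed.
        return "gemma4:9b"
    # Default: strongest local all-rounder for agentic work.
    return "qwen3.6:14b"

print(pick_model(chained_tool_calls=True, ram_gb=36))  # → qwen3.6:14b
```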

Cost Comparison

  • Cloud agent (Claude Opus 4.7 + Scavio): ~$3-5 per agent run for complex tasks
  • Local agent (Qwen 3.6 + Scavio): ~$0.05 per run (just Scavio credits)

At 100 autonomous runs per month, the cloud stack costs $300-500 while the local stack costs $5. The difference funds better hardware within a year.
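The arithmetic behind those monthly figures, as a quick sanity check (per-run costs taken from the comparison above):

```python
RUNS_PER_MONTH = 100
CLOUD_PER_RUN = (3.00, 5.00)  # Claude Opus 4.7 + Scavio, complex tasks
LOCAL_PER_RUN = 0.05          # Scavio credits only; inference is free

cloud_low, cloud_high = (r * RUNS_PER_MONTH for r in CLOUD_PER_RUN)
local = LOCAL_PER_RUN * RUNS_PER_MONTH
print(f"cloud: ${cloud_low:.0f}-{cloud_high:.0f}/mo, local: ${local:.0f}/mo")
# → cloud: $300-500/mo, local: $5/mo
```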

The Only Remaining Cloud Dependency

Scavio handles search, the one capability a local model cannot provide on its own. Everything else (LLM inference, vector storage, task orchestration) runs on your machine. The free tier of 500 credits per month covers casual agentic use entirely.

Get a free Scavio key and build a fully local agent today.