hermes · local-llm · tool-calling

Hermes v0.13 Search Still Broken: Fix Guide

Hermes v0.13 search tool calls use ChatML format that most handlers miss. A 10-line regex parser fixes the silent failure.


Hermes v0.13 (Nous Research) shipped with broken web search tool integration -- the search tool definition is in the system prompt but the function calling format does not match what most search APIs expect, causing silent failures where the model attempts searches but gets no results. The fix: wrap your search API in a format that matches Hermes tool calling conventions exactly.

What breaks in Hermes v0.13 search

Hermes uses the ChatML tool calling format with XML-style <tool_call> tags. When it decides to search, it emits a tool call in that format. If your handler expects a different call format (such as OpenAI function calling), the call is either dropped silently or mangled. The model then proceeds without search results, hallucinating the answer.

The broken flow

Python
# What Hermes v0.13 emits for a search:
# <tool_call>
# {"name": "search", "arguments": {"query": "latest python version"}}
# </tool_call>

# Common mistake: handler expects OpenAI format
def handle_tool_call(response):
    # This looks for response.tool_calls[0].function.arguments
    # Hermes doesn't use this format -> returns None
    tool_calls = response.get("tool_calls", [])  # empty!
    return None  # search silently skipped

The fix: parse Hermes tool call format

Python
import re, json, os, requests

SCAVIO_KEY = os.environ["SCAVIO_API_KEY"]

def parse_hermes_tool_calls(text: str) -> list:
    """Extract tool calls from Hermes ChatML format."""
    pattern = r"<tool_call>\s*({.*?})\s*</tool_call>"
    matches = re.findall(pattern, text, re.DOTALL)
    calls = []
    for match in matches:
        try:
            call = json.loads(match)
            calls.append(call)
        except json.JSONDecodeError:
            continue
    return calls

def execute_search(query: str) -> str:
    """Execute search and return formatted results."""
    resp = requests.post(
        "https://api.scavio.dev/api/v1/search",
        headers={"x-api-key": SCAVIO_KEY},
        json={"query": query, "num_results": 5},
        timeout=10,  # avoid hanging the whole chat on a slow search
    )
    resp.raise_for_status()
    results = resp.json().get("organic_results", [])
    return "\n".join(
        f"[{r['title']}]({r['link']}): {r['snippet']}"
        for r in results
    )

def hermes_search_handler(model_output: str) -> str:
    """Handle Hermes v0.13 search tool calls properly."""
    tool_calls = parse_hermes_tool_calls(model_output)
    results = []
    for call in tool_calls:
        if call.get("name") == "search":
            query = call.get("arguments", {}).get("query", "")
            if query:
                search_results = execute_search(query)
                results.append(search_results)
    return "\n\n".join(results) if results else ""

# Test
sample_output = """I need to search for this.
<tool_call>
{"name": "search", "arguments": {"query": "latest python version 2026"}}
</tool_call>"""

results = hermes_search_handler(sample_output)
print(results)

Full integration with llama-cpp-python

Python
from llama_cpp import Llama

# Load Hermes v0.13
llm = Llama(model_path="./hermes-3-v0.13.gguf", n_ctx=4096)

SEARCH_TOOL_PROMPT = """You have access to the following tool:
- search: Search the web for current information. Input: {"query": "your search query"}

When you need current information, use the tool like this:
<tool_call>
{"name": "search", "arguments": {"query": "your query"}}
</tool_call>

Wait for the result before continuing your response."""

def chat_with_search(user_message: str) -> str:
    # First pass: let model decide if it needs search
    prompt = f"{SEARCH_TOOL_PROMPT}\n\nUser: {user_message}\nAssistant:"
    response = llm(prompt, max_tokens=512)
    output = response["choices"][0]["text"]

    # Check for tool calls
    search_results = hermes_search_handler(output)

    if search_results:
        # Second pass: inject search results
        augmented = (f"{prompt}{output}\n\n"
                    f"<tool_result>{search_results}</tool_result>\n\n"
                    f"Now answer based on the search results:")
        final = llm(augmented, max_tokens=1024)
        return final["choices"][0]["text"]

    return output

answer = chat_with_search("What does SerpAPI cost in 2026?")
print(answer)

Common pitfalls with the fix

  • Hermes sometimes emits malformed JSON in tool calls -- always wrap in try/except
  • The model may emit multiple tool calls -- handle all of them
  • Context window fills fast with search results -- limit to top 3-5 results
  • Some quantizations (Q4_K_S and below) degrade tool calling quality
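The context-window pitfall above can be handled with a small guard between execute_search and the second model pass. This is a minimal sketch, not part of Hermes itself; trim_search_results, MAX_RESULTS, and MAX_CHARS are illustrative names and values you would tune for your own context budget.

```python
# Illustrative limits -- tune for your n_ctx and prompt size.
MAX_RESULTS = 3
MAX_CHARS = 1500

def trim_search_results(formatted: str,
                        max_results: int = MAX_RESULTS,
                        max_chars: int = MAX_CHARS) -> str:
    """Keep only the first few result lines and hard-cap total length
    before injecting the text back into the model's context."""
    lines = [ln for ln in formatted.splitlines() if ln.strip()]
    trimmed = "\n".join(lines[:max_results])
    return trimmed[:max_chars]

# Example: five formatted results in, three come back.
sample = "\n".join(
    f"[Result {i}](https://example.com/{i}): snippet {i}"
    for i in range(1, 6)
)
print(trim_search_results(sample))
```

Calling this on the string returned by execute_search, before building the augmented prompt, keeps the second pass from starving the model of room for its actual answer.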

Alternative: use MCP instead

If you run Hermes through an MCP-compatible client (like Open Interpreter or LM Studio with MCP support), the client handles tool call format translation. You register a search MCP server and the client parses Hermes output into MCP tool invocations automatically. This is cleaner than custom parsing but requires an MCP-capable runtime.

Key takeaway

Hermes v0.13 search is not broken at the model level -- it generates valid tool calls. The break is at the integration layer where most codebases expect OpenAI-format tool calls. A 10-line regex parser for Hermes ChatML format fixes it. Do not downgrade the model; fix the handler.