Tutorial

How to Reduce LLM Costs with Search Grounding

Use search grounding to cut LLM token waste from hallucination retries. One search call saves multiple LLM retries.

An r/ClaudeCode user reportedly ran $42K worth of Claude API usage through a $500 plan, an 84x cost leverage. One overlooked cost reducer in that mix: search grounding, which prevents hallucination retries. A single $0.005 search call can save a $0.10+ LLM retry cycle.

Prerequisites

  • Scavio API key
  • LLM API access
  • Python 3.8+

Walkthrough

Step 1: Identify retry-prone queries

Factual questions cause the most retries due to hallucination.

Python
# High-retry categories:
# - Current pricing/versions (changes frequently)
# - Company/product facts (LLM training data is stale)
# - Recent events (not in training data)
# These benefit most from search grounding
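To estimate how many of your own queries fall into these categories, a keyword scan over a query log gives a rough first pass. This is a heuristic sketch: the keyword lists below are illustrative assumptions, not an official taxonomy, so tune them against your actual traffic.

```python
# Rough heuristic: estimate what fraction of logged queries are retry-prone.
# The keyword lists are illustrative assumptions, not an official taxonomy.
RETRY_PRONE_PATTERNS = {
    'pricing/versions': ['price', 'pricing', 'cost', 'version', 'latest'],
    'entity facts': ['who is', 'founded', 'headquarters', 'ceo'],
    'recent events': ['today', 'this week', 'recently'],
}

def classify_query(question):
    """Return the retry-prone category a query matches, or None."""
    q = question.lower()
    for category, keywords in RETRY_PRONE_PATTERNS.items():
        if any(k in q for k in keywords):
            return category
    return None

def estimate_grounding_candidates(log):
    """Fraction of queries in a log that would benefit from grounding."""
    hits = sum(1 for q in log if classify_query(q) is not None)
    return hits / len(log) if log else 0.0
```

Running this over a day of production queries tells you whether grounding is worth wiring in at all before you touch the request path.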

Step 2: Add search grounding before LLM call

Fetch current facts, inject into prompt.

Python
import requests, os

HEADERS = {'x-api-key': os.environ['SCAVIO_API_KEY']}

def grounded_query(question):
    # Fetch current facts from search before building the LLM prompt
    resp = requests.post(
        'https://api.scavio.dev/api/v1/search',
        headers=HEADERS,
        json={'platform': 'google', 'query': question},
        timeout=10,
    )
    resp.raise_for_status()
    context = resp.json()
    # Inject search results into the LLM prompt
    return f'Answer based on these current search results:\n{context}\n\nQuestion: {question}'
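Dumping the raw JSON response into the prompt works, but it wastes tokens on metadata the LLM doesn't need. A small formatter that keeps only titles and snippets trims the context. The response shape assumed below (`{'results': [{'title': ..., 'snippet': ...}]}`) is an assumption for illustration; check your actual Scavio response and adjust the keys.

```python
def format_context(search_response, max_results=5):
    """Condense a search response to title/snippet lines for the prompt.

    Assumes a {'results': [{'title': ..., 'snippet': ...}]} shape,
    which is illustrative; adjust to the real response fields.
    """
    lines = []
    for r in search_response.get('results', [])[:max_results]:
        lines.append(f"- {r.get('title', '')}: {r.get('snippet', '')}")
    return '\n'.join(lines)
```

You would then build the prompt from `format_context(context)` instead of the raw `context` dict, which typically cuts the injected context by an order of magnitude.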

Step 3: Measure the savings

Compare token usage with and without grounding.

Text
# Without grounding:
# Query → LLM hallucinates → user catches → retry → correct answer
# Cost: 2-3x the tokens (original + retry + correction)
#
# With grounding:
# Query → search ($0.005) → LLM answers correctly first time
# Cost: 1x tokens + $0.005 search
# Net savings: 50-66% on factual queries
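The comparison above can be expressed as a small cost model. The default figures (a 2-3x retry multiplier, $0.03 per LLM call, $0.005 per search) are this tutorial's example numbers, not measured constants, so substitute your own.

```python
def cost_without_grounding(queries, retry_multiplier=2.5, llm_cost=0.03):
    """Each factual query costs ~2-3x tokens due to hallucination retries."""
    return queries * retry_multiplier * llm_cost

def cost_with_grounding(queries, search_cost=0.005, llm_cost=0.03):
    """One search call plus one correct-first-time LLM call per query."""
    return queries * (search_cost + llm_cost)

def savings_pct(queries=100):
    without = cost_without_grounding(queries)
    with_g = cost_with_grounding(queries)
    return (without - with_g) / without * 100
```

With the example figures, savings land at about 53%, inside the 50-66% band quoted above; the exact point depends on your real retry multiplier.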

Step 4: Route selectively

Only ground factual queries, not reasoning tasks.

Python
def should_ground(question):
    # Keyword heuristic: factual and current-events queries benefit most
    factual_signals = ['current', 'price', 'latest', 'how much', 'when did', 'who is']
    return any(s in question.lower() for s in factual_signals)

def smart_query(question):
    if should_ground(question):
        return grounded_query(question)   # defined in Step 2
    return direct_llm_query(question)     # your existing ungrounded LLM call
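It's worth sanity-checking the routing heuristic against a few sample queries before deploying it. The function below repeats the Step 4 heuristic so the snippet runs standalone; the sample queries are illustrative.

```python
def should_ground(question):
    # Same keyword heuristic as Step 4, repeated so this runs standalone
    factual_signals = ['current', 'price', 'latest', 'how much', 'when did', 'who is']
    return any(s in question.lower() for s in factual_signals)

# Factual queries route through search grounding...
assert should_ground('What is the current price of GPT-4 API access?')
assert should_ground('When did Python 3.12 come out?')
# ...while reasoning tasks go straight to the LLM
assert not should_ground('Refactor this function to be more readable')
```

A substring check like this will misroute some queries (for example, "explain pricing models in general" matches "price"); treat it as a starting point and refine with your own logs.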

Python Example

Python
# ROI math: 100 factual queries/day (example figures)
queries, retries, llm_call, search_call = 100, 2.5, 0.03, 0.005
without = queries * retries * llm_call                # $7.50/day
with_grounding = queries * (search_call + llm_call)   # $3.50/day
daily = without - with_grounding                      # $4.00/day
print(f'Savings: ${daily:.2f}/day = ${daily * 30:.0f}/mo')

JavaScript Example

JavaScript
// Same routing pattern in JS/TS (sketch; directLlmQuery stands in
// for your existing ungrounded LLM call).
const FACTUAL_SIGNALS = ['current', 'price', 'latest', 'how much', 'when did', 'who is'];

const shouldGround = (q) => FACTUAL_SIGNALS.some((s) => q.toLowerCase().includes(s));

async function smartQuery(question) {
  if (!shouldGround(question)) return directLlmQuery(question);
  const res = await fetch('https://api.scavio.dev/api/v1/search', {
    method: 'POST',
    headers: { 'x-api-key': process.env.SCAVIO_API_KEY, 'Content-Type': 'application/json' },
    body: JSON.stringify({ platform: 'google', query: question }),
  });
  const context = await res.json();
  return `Answer based on these current search results:\n${JSON.stringify(context)}\n\nQuestion: ${question}`;
}

Expected Output

Text
Selective search grounding that reduces LLM hallucination retries, with 50-66% token savings on factual queries.

Frequently Asked Questions

How long does this tutorial take?

Most developers complete this tutorial in 15 to 30 minutes. You will need a Scavio API key (free tier works) and a working Python or JavaScript environment.

What do I need before starting?

A Scavio API key, LLM API access, and Python 3.8+. A Scavio API key gives you 500 free credits per month.

Can I complete this on the free tier?

Yes. The free tier includes 500 credits per month, which is more than enough to complete this tutorial and prototype a working solution.

Does Scavio integrate with LLM frameworks?

Scavio has a native LangChain package (langchain-scavio), an MCP server, and a plain REST API that works with any HTTP client. This tutorial uses the raw REST API, but you can adapt it to your framework of choice.

Start Building

Grab a Scavio API key, route your most retry-prone factual queries through search grounding, and let one $0.005 search call replace an entire hallucination retry cycle.