
RAG on Large Consulting Datasets: Accuracy Guide

RAG accuracy drops from 85% to 60% at 10K+ documents. Metadata filtering plus web augmentation pushes it back above 85%.

RAG accuracy on large consulting datasets (10K+ documents) drops from 85% to 60% as corpus size grows because retriever recall degrades -- relevant chunks get buried under semantically similar but wrong documents. The fix: combine vector retrieval with metadata filtering and web augmentation for questions that exceed corpus coverage.

Why large RAG pipelines lose accuracy

At 500 documents, top-5 retrieval finds the right chunk 80%+ of the time. At 10,000 documents, the same query returns chunks from similar-topic documents that are not the right answer. A consulting firm with 15 years of client reports has thousands of documents about "market sizing" -- the retriever cannot distinguish which market sizing report is relevant without additional signals.
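
You can quantify the drop-off before fixing it: run a labeled query set against the index at increasing corpus sizes and track recall@k. A minimal sketch, assuming a LangChain-style vectorstore and an illustrative doc_id metadata field (both names are placeholders for your own schema):

Python
def recall_at_k(vectorstore, labeled_queries: list, k: int = 5) -> float:
    """Fraction of queries whose known-relevant document appears in top-k.

    labeled_queries: list of {"query": str, "relevant_doc_id": str} dicts.
    """
    hits = 0
    for item in labeled_queries:
        results = vectorstore.similarity_search(item["query"], k=k)
        retrieved = {doc.metadata.get("doc_id") for doc in results}
        if item["relevant_doc_id"] in retrieved:
            hits += 1
    return hits / max(len(labeled_queries), 1)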

Hybrid retrieval: vector + metadata + search

Python
import os, requests
from typing import Optional

SCAVIO_KEY = os.environ["SCAVIO_API_KEY"]
HEADERS = {"x-api-key": SCAVIO_KEY}

class ConsultingRAG:
    def __init__(self, vectorstore, confidence_threshold: float = 0.72):
        self.vectorstore = vectorstore
        self.threshold = confidence_threshold

    def retrieve(self, query: str, metadata_filter: Optional[dict] = None,
                 k: int = 10) -> dict:
        """Three-stage retrieval: vector -> filter -> augment."""
        # Stage 1: Vector retrieval with metadata filter
        search_kwargs = {"k": k}
        if metadata_filter:
            search_kwargs["filter"] = metadata_filter

        results = self.vectorstore.similarity_search_with_score(
            query, **search_kwargs
        )

        # Stage 2: Confidence check. This assumes the vectorstore returns
        # similarity scores where higher means closer; some stores return
        # distances (lower is better), which would invert the threshold test.
        top_score = results[0][1] if results else 0.0

        docs = [{"text": doc.page_content, "metadata": doc.metadata,
                 "score": score} for doc, score in results]

        # Stage 3: Web augmentation if confidence is low
        web_context = []
        if top_score < self.threshold or len(results) < 3:
            web_resp = requests.post(
                "https://api.scavio.dev/api/v1/search",
                headers=HEADERS,
                json={"query": query, "num_results": 5,
                      "include_ai_overview": True},
                timeout=15,
            )
            web_resp.raise_for_status()
            web_data = web_resp.json()
            # Fixed heuristic scores rank web snippets just below the AI overview
            web_context = [
                {"text": r["snippet"], "source": r["link"], "score": 0.8}
                for r in web_data.get("organic_results", [])[:3]
            ]
            ai_overview = web_data.get("ai_overview", {})
            if ai_overview and ai_overview.get("text"):
                web_context.insert(0, {
                    "text": ai_overview["text"],
                    "source": "ai_overview",
                    "score": 0.85,
                })

        return {
            "internal_docs": docs[:5],
            "web_context": web_context,
            "augmented": len(web_context) > 0,
            "confidence": top_score,
        }
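
Wiring this up depends on your vector store. A minimal sketch assuming LangChain's Pinecone wrapper (the cost section below prices Pinecone) and OpenAI embeddings -- chunked_docs and the index name are placeholders for your own pipeline:

Python
# Sketch only: assumes a Pinecone index using cosine similarity, so that
# similarity_search_with_score returns higher-is-better scores as the
# retrieve() threshold check expects.
from langchain_openai import OpenAIEmbeddings
from langchain_pinecone import PineconeVectorStore

vectorstore = PineconeVectorStore.from_documents(
    chunked_docs, OpenAIEmbeddings(), index_name="consulting-reports"
)
rag = ConsultingRAG(vectorstore, confidence_threshold=0.72)
result = rag.retrieve("2025 APAC healthcare pricing benchmarks")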

Metadata filtering for consulting documents

Python
# Consulting documents should have rich metadata
DOCUMENT_METADATA = {
    "client": "Acme Corp",
    "project_type": "market_sizing",
    "industry": "healthcare",
    "year": 2025,
    "author": "senior_analyst",
    "status": "final",  # draft, review, final
}

# Query with metadata filter narrows retrieval dramatically
result = rag.retrieve(
    query="healthcare market size APAC region",
    metadata_filter={
        "project_type": "market_sizing",
        "industry": "healthcare",
        "status": "final",
    },
    k=5,
)
# Instead of searching 10K docs, searches ~200 matching metadata
# Accuracy jumps from 60% to 85%+ with the filter
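
The filter only works if metadata is attached at ingestion time. A sketch using LangChain Document objects -- field values are illustrative; a real pipeline would pull them from the document management system or file paths:

Python
from langchain_core.documents import Document

chunk = Document(
    page_content="APAC healthcare market analysis ...",  # placeholder text
    metadata={
        "client": "Acme Corp",
        "project_type": "market_sizing",
        "industry": "healthcare",
        "year": 2025,
        "status": "final",
    },
)
vectorstore.add_documents([chunk])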

When web augmentation fires

Web augmentation should trigger for queries about the following (a routing sketch follows the list):

  • Current market data (corpus has last year, need this year)
  • Competitor intelligence (changes frequently)
  • Regulatory updates (compliance requirements evolve)
  • Technology landscape (new tools and platforms)
  • Pricing benchmarks (inflation, market shifts)
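
A cheap way to catch these up front is a keyword pre-check that routes obviously time-sensitive queries to web augmentation without waiting for a low retrieval score. An illustrative sketch -- the pattern lists are assumptions, tune them to your corpus:

Python
import re

# Illustrative patterns for the categories above; extend per corpus
TIME_SENSITIVE_PATTERNS = [
    r"\b(current|latest|recent|this year|this quarter)\b",
    r"\b(competitor|pricing|benchmark)\b",
    r"\b(regulat\w+|compliance)\b",
]

def needs_web_augmentation(query: str) -> bool:
    """Heuristic: route likely out-of-corpus queries straight to web search."""
    q = query.lower()
    return any(re.search(p, q) for p in TIME_SENSITIVE_PATTERNS)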

Accuracy measurement framework

Python
def evaluate_rag_accuracy(test_set: list, rag: ConsultingRAG) -> dict:
    """Measure RAG accuracy with and without web augmentation."""
    results = {"correct": 0, "total": 0, "augmented_correct": 0,
               "augmented_total": 0, "internal_only_correct": 0}

    for test in test_set:
        query = test["query"]
        expected = test["expected_answer"]

        retrieval = rag.retrieve(query)

        # Check whether the expected answer appears in retrieved context.
        # Exact substring match is a coarse proxy; production evaluation
        # would use semantic matching or an LLM judge.
        all_text = " ".join(
            d["text"] for d in retrieval["internal_docs"]
        ) + " " + " ".join(d["text"] for d in retrieval["web_context"])

        is_correct = expected.lower() in all_text.lower()
        results["total"] += 1
        if is_correct:
            results["correct"] += 1
        if retrieval["augmented"]:
            results["augmented_total"] += 1
            if is_correct:
                results["augmented_correct"] += 1
        else:
            if is_correct:
                results["internal_only_correct"] += 1

    results["accuracy"] = results["correct"] / max(results["total"], 1)
    return results
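
A hypothetical test set shows the expected shape (entries are illustrative, not real client data):

Python
test_set = [
    {"query": "APAC healthcare market size",
     "expected_answer": "$48B"},  # placeholder figure
    {"query": "top EHR vendors by market share",
     "expected_answer": "Epic"},  # placeholder answer
]
metrics = evaluate_rag_accuracy(test_set, rag)
print(f"Overall accuracy: {metrics['accuracy']:.0%}")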

Cost at consulting firm scale

  • Vector DB hosting (Pinecone): $70/mo for 1M vectors
  • Web augmentation: ~20% of queries need it
  • 500 analyst queries/day, 100 trigger web search = $0.50/day
  • Monthly search cost: ~$15
  • Total RAG infrastructure: ~$85/mo (modeled in the sketch below)
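
These figures imply $0.005 per web search (100 searches for $0.50/day), which makes other query volumes easy to model:

Python
def monthly_rag_cost(queries_per_day: int, augment_rate: float = 0.20,
                     price_per_search: float = 0.005,  # implied by figures above
                     vector_db_monthly: float = 70.0) -> float:
    """Back-of-envelope monthly cost from the numbers in this section."""
    web_searches_per_day = queries_per_day * augment_rate
    return vector_db_monthly + web_searches_per_day * price_per_search * 30

print(monthly_rag_cost(500))  # 85.0 -- matches the ~$85/mo total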

Key takeaway

Large-corpus RAG needs three layers: metadata filtering to narrow the search space, vector retrieval for semantic matching, and web augmentation for coverage gaps. The web layer adds $15/month and pushes accuracy from the 60% range back above 85% for time-sensitive and out-of-corpus queries.