RAG on Large Consulting Datasets: Accuracy Guide
RAG accuracy drops from 85% to 60% at 10K+ documents. Metadata filtering plus web augmentation pushes it back above 85%.
RAG accuracy on large consulting datasets (10K+ documents) drops from roughly 85% to 60% as the corpus grows because retriever recall degrades: relevant chunks get buried under semantically similar but wrong documents. The fix is to combine vector retrieval with metadata filtering to narrow the search space, plus web augmentation for questions that exceed corpus coverage.
Why large RAG pipelines lose accuracy
At 500 documents, top-5 retrieval finds the right chunk 80%+ of the time. At 10,000 documents, the same query returns chunks from similar-topic documents that are not the right answer. A consulting firm with 15 years of client reports has thousands of documents that mention "market sizing", and the retriever cannot distinguish which market sizing report is relevant without additional signals.
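You can measure this degradation directly by tracking recall@5 on a labeled query set as the index grows. A minimal sketch, assuming a hypothetical build_index helper and labeled (query, relevant_doc_id) pairs:

# Sketch: track recall@5 while the index grows. build_index and the
# labeled (query, relevant_doc_id) pairs are hypothetical stand-ins.
def recall_at_5(labeled_queries, corpus, sizes=(500, 2_000, 10_000)):
    for size in sizes:
        vectorstore = build_index(corpus[:size])  # hypothetical helper
        hits = 0
        for query, relevant_id in labeled_queries:
            top5 = vectorstore.similarity_search(query, k=5)
            hits += any(d.metadata.get("doc_id") == relevant_id for d in top5)
        print(f"{size:>6} docs: recall@5 = {hits / len(labeled_queries):.0%}")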
Hybrid retrieval: vector + metadata + search
import os
import requests
from typing import Optional

SCAVIO_KEY = os.environ["SCAVIO_API_KEY"]
HEADERS = {"x-api-key": SCAVIO_KEY}

class ConsultingRAG:
    def __init__(self, vectorstore, confidence_threshold: float = 0.72):
        self.vectorstore = vectorstore
        self.threshold = confidence_threshold

    def retrieve(self, query: str, metadata_filter: Optional[dict] = None,
                 k: int = 10) -> dict:
        """Three-stage retrieval: vector -> filter -> augment."""
        # Stage 1: vector retrieval, narrowed by a metadata filter if given
        search_kwargs = {"k": k}
        if metadata_filter:
            search_kwargs["filter"] = metadata_filter
        results = self.vectorstore.similarity_search_with_score(
            query, **search_kwargs
        )

        # Stage 2: confidence check. Assumes the store returns similarity
        # scores (higher = better); flip the comparison for distance scores.
        top_score = results[0][1] if results else 0.0
        docs = [{"text": doc.page_content, "metadata": doc.metadata,
                 "score": score} for doc, score in results]

        # Stage 3: web augmentation when internal confidence is low
        web_context = []
        if top_score < self.threshold or len(results) < 3:
            web_resp = requests.post(
                "https://api.scavio.dev/api/v1/search",
                headers=HEADERS,
                json={"query": query, "num_results": 5,
                      "include_ai_overview": True},
                timeout=15,
            )
            web_resp.raise_for_status()
            web_data = web_resp.json()
            web_context = [
                {"text": r["snippet"], "source": r["link"], "score": 0.8}
                for r in web_data.get("organic_results", [])[:3]
            ]
            # Prepend the AI overview, if present, as the strongest web signal
            ai_overview = web_data.get("ai_overview", {})
            if ai_overview and ai_overview.get("text"):
                web_context.insert(0, {
                    "text": ai_overview["text"],
                    "source": "ai_overview",
                    "score": 0.85,
                })

        return {
            "internal_docs": docs[:5],
            "web_context": web_context,
            "augmented": len(web_context) > 0,
            "confidence": top_score,
        }

Metadata filtering for consulting documents
# Consulting documents should carry rich metadata at ingestion time
DOCUMENT_METADATA = {
    "client": "Acme Corp",
    "project_type": "market_sizing",
    "industry": "healthcare",
    "year": 2025,
    "author": "senior_analyst",
    "status": "final",  # draft, review, final
}

# A query with a metadata filter narrows retrieval dramatically
# (assumes rag = ConsultingRAG(vectorstore) over the indexed corpus)
result = rag.retrieve(
    query="healthcare market size APAC region",
    metadata_filter={
        "project_type": "market_sizing",
        "industry": "healthcare",
        "status": "final",
    },
    k=5,
)
# Instead of searching 10K docs, this searches ~200 matching metadata
# Accuracy jumps from 60% to 85%+ with the filter
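For the filter to work, metadata has to be attached at ingestion time. A minimal sketch assuming a LangChain-style vector store; Document and add_documents are standard LangChain interfaces, while ingest_report and report_chunks are illustrative names:

# Sketch: attach metadata when chunks are indexed so filters work at
# query time. Assumes LangChain-style interfaces; ingest_report and
# report_chunks are illustrative names.
from langchain_core.documents import Document

def ingest_report(vectorstore, chunks: list[str], metadata: dict) -> None:
    docs = [Document(page_content=c, metadata=metadata) for c in chunks]
    vectorstore.add_documents(docs)

ingest_report(vectorstore, report_chunks, DOCUMENT_METADATA)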
When web augmentation fires
Web augmentation should trigger for queries about the following (a heuristic sketch follows the list):
- Current market data (corpus has last year, need this year)
- Competitor intelligence (changes frequently)
- Regulatory updates (compliance requirements evolve)
- Technology landscape (new tools and platforms)
- Pricing benchmarks (inflation, market shifts)
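These categories can be wired in as a lightweight pre-check that forces augmentation regardless of vector confidence. A sketch; the keyword list is an assumption to tune against your own query logs:

# Heuristic pre-check: force web augmentation for recency-sensitive
# queries. The keyword list is illustrative; tune it on real query logs.
RECENCY_SIGNALS = [
    "current", "latest", "this year", "recent", "competitor",
    "regulation", "compliance", "pricing", "benchmark",
]

def needs_web_augmentation(query: str) -> bool:
    q = query.lower()
    return any(signal in q for signal in RECENCY_SIGNALS)

print(needs_web_augmentation("latest APAC healthcare pricing benchmarks"))
# True: skip the confidence check and call the web search path directly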
Accuracy measurement framework
def evaluate_rag_accuracy(test_set: list, rag: ConsultingRAG) -> dict:
    """Measure RAG accuracy with and without web augmentation."""
    results = {"correct": 0, "total": 0, "augmented_correct": 0,
               "augmented_total": 0, "internal_only_correct": 0}
    for test in test_set:
        query = test["query"]
        expected = test["expected_answer"]
        retrieval = rag.retrieve(query)

        # Check whether the expected answer appears in the retrieved
        # context (note the separating space between the two joins)
        all_text = " ".join(
            d["text"] for d in retrieval["internal_docs"]
        ) + " " + " ".join(d["text"] for d in retrieval["web_context"])
        is_correct = expected.lower() in all_text.lower()

        results["total"] += 1
        if is_correct:
            results["correct"] += 1
        if retrieval["augmented"]:
            results["augmented_total"] += 1
            if is_correct:
                results["augmented_correct"] += 1
        elif is_correct:
            results["internal_only_correct"] += 1

    results["accuracy"] = results["correct"] / max(results["total"], 1)
    return results
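A minimal run against a hypothetical test set (the queries and expected answers below are placeholders, not real benchmark data):

# Hypothetical test set; expected_answer is a string the retrieved
# context must contain (placeholder values)
test_set = [
    {"query": "APAC healthcare market size 2025",
     "expected_answer": "$1.2 trillion"},
    {"query": "Acme Corp market entry recommendation",
     "expected_answer": "phased regional rollout"},
]
report = evaluate_rag_accuracy(test_set, rag)
print(f"accuracy: {report['accuracy']:.0%}, "
      f"web-augmented: {report['augmented_total']} of {report['total']}")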
Cost at consulting firm scale
- Vector DB hosting (Pinecone): $70/mo for 1M vectors
- Web augmentation: ~20% of queries need it
- 500 analyst queries/day, 100 trigger web search = $0.50/day
- Monthly search cost: ~$15
- Total RAG infrastructure: ~$85/mo (arithmetic sketched below)
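The arithmetic behind those numbers, as a sketch; the $0.005 per-search unit price is inferred from the $0.50/day figure above rather than a quoted rate:

# Back-of-envelope model for the figures above
QUERIES_PER_DAY = 500
AUGMENTATION_RATE = 0.20        # ~20% of queries hit the web
COST_PER_SEARCH = 0.005         # inferred: $0.50/day across 100 searches
VECTOR_DB_MONTHLY = 70.0        # Pinecone, 1M vectors

searches_per_day = QUERIES_PER_DAY * AUGMENTATION_RATE       # 100
search_monthly = searches_per_day * COST_PER_SEARCH * 30     # $15.00
total_monthly = VECTOR_DB_MONTHLY + search_monthly           # $85.00
print(f"search: ${search_monthly:.2f}/mo, total: ${total_monthly:.2f}/mo")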
Key takeaway
Large-corpus RAG needs three layers: metadata filtering to narrow the search space, vector retrieval for semantic matching, and web augmentation for coverage gaps. The web layer adds $15/month and pushes accuracy from the 60% range back above 85% for time-sensitive and out-of-corpus queries.