MongoDB Text Search vs Search API for Knowledge Bases
MongoDB text indexes search your own data. Search APIs search the web. For knowledge bases, you need both: MongoDB for your internal documents and a search API for external context that fills the gaps your KB does not cover.
MongoDB Text Search: What It Does
MongoDB's text index enables full-text search across string fields in your collections. It supports stemming, stop words, text scores, and phrase matching. MongoDB Atlas Search (built on Lucene) adds fuzzy matching, autocomplete, and faceted search. Both search only data you have already ingested into MongoDB.
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")
db = client["knowledge_base"]
articles = db["articles"]

# Create text index
articles.create_index([("title", "text"), ("content", "text")])

# Search internal KB
results = articles.find(
    {"$text": {"$search": "api rate limiting best practices"}},
    {"score": {"$meta": "textScore"}}
).sort([("score", {"$meta": "textScore"})]).limit(5)

for doc in results:
    print(f"{doc['title']} (score: {doc['score']:.2f})")

When MongoDB Text Search Falls Short
Your KB answers questions about what you have documented. It cannot answer questions about what you have not documented: competitor updates, new regulations, recent community discussions, or current pricing that changes weekly. For these, you need external search.
Hybrid KB Architecture
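Search MongoDB first. If the results are insufficient (no results, a low text score, or a query that mentions something external), fall back to a web search API. When both sources return useful results, merge them for the final answer. The hybrid_kb_search example in this section implements only the fallback step, with the top text score as the sufficiency check.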
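The merge step can be kept separate from the search calls. What follows is a minimal sketch of one way to combine the two result lists, assuming each result is a dict with title and snippet keys. The merge_results name, the added source tag, and the dedupe-by-title rule are illustrative choices, not something prescribed here.

```python
def merge_results(internal, web, limit=5):
    """Combine internal and web results, internal first, deduplicated by title.

    Hypothetical helper: assumes every item is a dict with "title" and
    "snippet" keys.
    """
    merged = []
    seen_titles = set()
    for source, items in (("internal", internal), ("web", web)):
        for item in items:
            key = item["title"].strip().lower()
            if key in seen_titles:
                continue  # keep the first occurrence, so internal wins ties
            seen_titles.add(key)
            merged.append({**item, "source": source})
    return merged[:limit]


# Made-up sample data for illustration only
internal_hits = [{"title": "API Rate Limits", "snippet": "Our limits are..."}]
web_hits = [
    {"title": "api rate limits", "snippet": "Same title, found on the web"},
    {"title": "Rate limiting patterns", "snippet": "Token bucket, leaky bucket"},
]
for hit in merge_results(internal_hits, web_hits):
    print(hit["source"], "-", hit["title"])
```

Listing internal results first reflects the idea that your own documentation is the more trusted source; a different ranking rule may suit other KBs better.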
import os

import requests

H = {"x-api-key": os.environ["SCAVIO_API_KEY"]}

def hybrid_kb_search(query, min_score=1.0):
    """Search internal KB first, fall back to web."""
    # Step 1: Internal KB
    internal = list(articles.find(
        {"$text": {"$search": query}},
        {"score": {"$meta": "textScore"}, "title": 1, "content": 1}
    ).sort([("score", {"$meta": "textScore"})]).limit(5))

    # Results are sorted by score, so internal[0] is the best match
    if internal and internal[0].get("score", 0) >= min_score:
        return {
            "source": "internal",
            "results": [
                {"title": d["title"], "snippet": d["content"][:200]}
                for d in internal
            ]
        }

    # Step 2: Web search fallback
    resp = requests.post("https://api.scavio.dev/api/v1/search",
        headers=H,
        json={"platform": "google", "query": query},
        timeout=10
    )
    resp.raise_for_status()  # fail on HTTP errors instead of parsing an error body
    web_r = resp.json()
    return {
        "source": "web",
        "results": [
            {"title": r["title"], "snippet": r.get("snippet", "")}
            for r in web_r.get("organic", [])[:5]
        ]
    }

result = hybrid_kb_search("scavio api rate limits")
print(f"Source: {result['source']}, Results: {len(result['results'])}")

Keeping Your KB Fresh
The gap between internal and external search is a freshness gap. Schedule periodic web searches for your KB's core topics and ingest the results into MongoDB. This keeps your KB current without manual curation.
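Periodic does not have to mean refreshing every topic on every run. A minimal sketch of picking which topics are due, assuming you track a last-refresh timestamp per topic. The topics_due name, the dict shape, and the seven-day default are illustrative assumptions.

```python
from datetime import datetime, timedelta, timezone


def topics_due(last_refreshed, now, max_age=timedelta(days=7)):
    """Return topics never refreshed, or last refreshed more than max_age ago.

    Hypothetical helper: last_refreshed maps topic -> datetime or None.
    """
    due = []
    for topic, refreshed_at in last_refreshed.items():
        if refreshed_at is None or now - refreshed_at > max_age:
            due.append(topic)
    return sorted(due)


# Made-up timestamps for illustration only
now = datetime(2026, 5, 8, tzinfo=timezone.utc)
last_refreshed = {
    "search api pricing": now - timedelta(days=10),  # stale
    "rate limiting": now - timedelta(days=2),        # still fresh
    "oauth scopes": None,                            # never refreshed
}
print(topics_due(last_refreshed, now))
```

A cron job or task scheduler could call this once a day and pass each returned topic to the refresh function.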
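from datetime import datetime, timezone

def refresh_kb_topic(topic):
    """Search web for topic, ingest into MongoDB."""
    now = datetime.now(timezone.utc)
    resp = requests.post("https://api.scavio.dev/api/v1/search",
        headers=H,
        # Append the current year to bias results toward recent pages
        json={"platform": "google", "query": f"{topic} {now.year}"},
        timeout=10
    )
    resp.raise_for_status()
    items = resp.json().get("organic", [])[:5]
    for item in items:
        # Upsert keyed on the URL so repeated refreshes update, not duplicate
        articles.update_one(
            {"source_url": item["link"]},
            {"$set": {
                "title": item["title"],
                "content": item.get("snippet", ""),
                "source_url": item["link"],
                "source": "web_refresh",
                "refreshed_at": now,  # actual refresh time, not a hardcoded date
            }},
            upsert=True
        )
    # Report what was ingested, not the size of the full response
    print(f"Refreshed {topic}: {len(items)} results ingested")

refresh_kb_topic("search api pricing")

Atlas Search vs Text Index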
MongoDB Atlas Search (Lucene-based) supports fuzzy matching, synonyms, and autocomplete. Standard text indexes are simpler but less flexible. For KBs with user-facing search, Atlas Search is worth the upgrade. For backend pipelines that just need keyword matching, the standard text index is sufficient.
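To make the difference concrete, here is a rough sketch of an Atlas Search query with typo tolerance, which a standard text index cannot do. It assumes an Atlas cluster with a search index named "default" on the articles collection covering title and content. The build_fuzzy_search_pipeline helper is a hypothetical name, and the pipeline is only built here, not executed.

```python
def build_fuzzy_search_pipeline(query, limit=5, index_name="default"):
    """Build an Atlas Search aggregation pipeline that tolerates one typo.

    Hypothetical helper: assumes a search index called index_name exists
    and indexes the "title" and "content" fields.
    """
    return [
        {"$search": {
            "index": index_name,
            "text": {
                "query": query,
                "path": ["title", "content"],
                "fuzzy": {"maxEdits": 1},  # allow one character edit per term
            },
        }},
        {"$limit": limit},
        {"$project": {
            "title": 1,
            "score": {"$meta": "searchScore"},
        }},
    ]


# "limitng" is a deliberate typo
pipeline = build_fuzzy_search_pipeline("rate limitng")
print(pipeline[0]["$search"]["text"])

# On a cluster with the index in place, you would run:
# for doc in articles.aggregate(pipeline): ...
```

Note that Atlas Search runs as a $search aggregation stage and reports relevance as searchScore, while the standard text index uses the $text query operator and textScore.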
Cost Comparison
MongoDB Atlas Search: included in Atlas pricing (M10+ clusters). Standard text index: free on any MongoDB deployment. Scavio web search: $0.005 per credit, with 250 free credits per month. The hybrid approach adds minimal cost, because you only hit the web API when your internal KB cannot answer the query.
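A quick back-of-the-envelope calculation shows how the web search cost scales with the fallback rate. This sketch uses the per-credit price and free allowance quoted above. It assumes one credit per search, and the query volume and 20% fallback rate are made-up inputs, so treat the result as illustrative.

```python
def monthly_web_search_cost(total_queries, fallback_rate,
                            price_per_credit=0.005, free_credits=250,
                            credits_per_search=1):
    """Estimate monthly web search spend for a hybrid KB.

    Hypothetical helper. credits_per_search=1 is an assumption; check
    the provider's pricing for the real credit cost per request.
    """
    web_searches = round(total_queries * fallback_rate)
    credits_used = web_searches * credits_per_search
    billable_credits = max(0, credits_used - free_credits)
    return billable_credits * price_per_credit


# 10,000 queries a month, 20% of which fall through to the web
cost = monthly_web_search_cost(total_queries=10_000, fallback_rate=0.2)
print(f"${cost:.2f}")
```

In this example, 2,000 queries reach the web API, 250 are covered by the free credits, and the remaining 1,750 cost 1,750 x $0.005 = $8.75.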