Regulated-industry chatbots (banking, insurance, healthcare) need RAG with PII masking, strict source control, and audit logging. This tutorial builds that architecture using Scavio for public-web grounding and a local PII scrubber.
Prerequisites
- Python 3.10+
- A Scavio API key
- Presidio or equivalent PII detector
- Postgres with pgvector for citation logs
Walkthrough
Step 1: Scrub inputs for PII
Never send PII to downstream APIs.
from presidio_analyzer import AnalyzerEngine
from presidio_anonymizer import AnonymizerEngine
analyzer = AnalyzerEngine()
anonymizer = AnonymizerEngine()
def scrub(text):
results = analyzer.analyze(text=text, language='en')
return anonymizer.anonymize(text=text, analyzer_results=results).textStep 2: Ground the answer with Scavio
Fetch authoritative public content to complement internal docs.
import os, requests
API_KEY = os.environ['SCAVIO_API_KEY']
def ground(question):
r = requests.post('https://api.scavio.dev/api/v1/search',
headers={'x-api-key': API_KEY},
json={'query': question, 'platform': 'google'})
return r.json().get('organic_results', [])[:5]Step 3: Log every citation
Regulators want provenance for every chatbot statement.
import psycopg2, json
def log_citation(session_id, question, sources):
conn = psycopg2.connect(os.environ['DATABASE_URL'])
with conn.cursor() as c:
c.execute('INSERT INTO citations(session_id, question, sources_json) VALUES (%s, %s, %s)',
(session_id, question, json.dumps(sources)))
conn.commit()Step 4: Compose the answer with guarded prompt
Instruct the LLM to cite every claim and refuse when unsure.
SYSTEM = '''You are a regulated-industry assistant.
Rules:
1. Cite every factual claim with [source N].
2. If unsure, say "I cannot answer with confidence."
3. Never repeat customer PII back in the answer.'''Step 5: Enforce a human-review path
Low-confidence answers route to a human. Every response has a risk score.
def risk_score(answer, sources):
if not sources: return 1.0
if 'cannot answer' in answer.lower(): return 0.2
return 0.5Python Example
import os, requests
API_KEY = os.environ['SCAVIO_API_KEY']
def rag_answer(question):
clean = scrub(question)
sources = ground(clean)
log_citation('sess-1', clean, sources)
return {'question': clean, 'sources': sources}
print(rag_answer('What are the KYC rules for my account?'))JavaScript Example
const API_KEY = process.env.SCAVIO_API_KEY;
export async function ragAnswer(question) {
const r = await fetch('https://api.scavio.dev/api/v1/search', {
method: 'POST',
headers: { 'x-api-key': API_KEY, 'Content-Type': 'application/json' },
body: JSON.stringify({ query: question, platform: 'google' })
});
const sources = ((await r.json()).organic_results || []).slice(0, 5);
return { question, sources };
}Expected Output
PII-scrubbed questions, typed public-web citations, durable audit log of every answer. Compliance team gets a per-session export on demand.