Tutorial

How to Build a RAG Chatbot for Regulated Industries

Build a compliant RAG chatbot for banking or healthcare with PII masking, audit logs, and Scavio for public-web grounding.

Regulated-industry chatbots (banking, insurance, healthcare) need RAG with PII masking, strict source control, and audit logging. This tutorial builds that architecture using Scavio for public-web grounding and a local PII scrubber.

Prerequisites

  • Python 3.10+
  • A Scavio API key
  • Presidio or equivalent PII detector
  • Postgres with pgvector for citation logs

Walkthrough

Step 1: Scrub inputs for PII

Never send PII to downstream APIs. Scrub each user message before it reaches the search API, the LLM, or the audit log.

Python
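from presidio_analyzer import AnalyzerEngine
from presidio_anonymizer import AnonymizerEngine

# Build the engines once at import time; they are expensive to construct.
analyzer = AnalyzerEngine()
anonymizer = AnonymizerEngine()

def scrub(text):
    # Detect PII entities (names, emails, phone numbers, ...) in English text.
    results = analyzer.analyze(text=text, language='en')
    # Replace each detected span with a placeholder and return the masked text.
    return anonymizer.anonymize(text=text, analyzer_results=results).text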
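
No detector catches everything, so a cheap fail-closed check after scrubbing can help. The sketch below is an assumption, not part of Presidio: `find_residual_pii`, `assert_no_obvious_pii`, and the two regexes are hypothetical and cover only email addresses and US-style Social Security numbers.

```python
import re

# Hypothetical patterns for illustration; extend them for your own data.
RESIDUAL_PATTERNS = {
    'email': re.compile(r'[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}'),
    'us_ssn': re.compile(r'\b\d{3}-\d{2}-\d{4}\b'),
}

def find_residual_pii(text):
    """Return the names of patterns that still match the scrubbed text."""
    return [name for name, pattern in RESIDUAL_PATTERNS.items() if pattern.search(text)]

def assert_no_obvious_pii(text):
    """Raise instead of letting text with obvious PII leave the process."""
    leaked = find_residual_pii(text)
    if leaked:
        raise ValueError(f'possible PII left after scrubbing: {leaked}')
    return text
```

Calling `assert_no_obvious_pii(scrub(message))` before any network call would stop the request if either pattern still matches.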

Step 2: Ground the answer with Scavio

Fetch authoritative public content to complement internal docs.

Python
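import os
import requests

API_KEY = os.environ['SCAVIO_API_KEY']

def ground(question):
    # Pass only scrubbed text here; the query leaves your network.
    r = requests.post('https://api.scavio.dev/api/v1/search',
        headers={'x-api-key': API_KEY},
        json={'query': question, 'platform': 'google'},
        timeout=10)
    r.raise_for_status()  # fail loudly instead of parsing an error body
    # Keep the top five organic results as grounding sources.
    return r.json().get('organic_results', [])[:5]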
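
Strict source control usually means dropping results from domains that compliance has not approved. Here is a minimal sketch of such a filter, assuming each result is a dict with a `link` field (an assumed name, so check it against the actual response). The allowlist entries and the helper names `is_allowed` and `filter_sources` are hypothetical.

```python
from urllib.parse import urlparse

# Hypothetical allowlist; replace with the domains your compliance team approves.
ALLOWED_DOMAINS = {'sec.gov', 'fdic.gov', 'hhs.gov'}

def is_allowed(url, allowed=ALLOWED_DOMAINS):
    """True if the URL's host is an allowed domain or a subdomain of one."""
    host = (urlparse(url).hostname or '').lower()
    return any(host == d or host.endswith('.' + d) for d in allowed)

def filter_sources(results, allowed=ALLOWED_DOMAINS):
    """Keep only results whose 'link' field (assumed name) is on the allowlist."""
    return [r for r in results if is_allowed(r.get('link', ''), allowed)]
```

Matching on the parsed hostname rather than on a substring keeps look-alike hosts such as `sec.gov.example.com` out of the source list.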

Step 3: Log every citation

Regulators want provenance for every chatbot statement.

Python
import json
import os
import psycopg2

def log_citation(session_id, question, sources):
    conn = psycopg2.connect(os.environ['DATABASE_URL'])
    try:
        # "with conn" commits on success and rolls back on error.
        with conn, conn.cursor() as c:
            c.execute('INSERT INTO citations(session_id, question, sources_json) VALUES (%s, %s, %s)',
                (session_id, question, json.dumps(sources)))
    finally:
        conn.close()  # the with block does not close the connection
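
The insert above needs a `citations` table, which the tutorial does not define. One possible schema is sketched below. Only the three inserted column names come from the tutorial; the `id` and `created_at` columns, the column types, and the helper `ensure_citations_table` are assumptions.

```python
# Assumed schema; only session_id, question, and sources_json come from the tutorial.
CITATIONS_DDL = '''
CREATE TABLE IF NOT EXISTS citations (
    id           BIGSERIAL PRIMARY KEY,
    session_id   TEXT        NOT NULL,
    question     TEXT        NOT NULL,
    sources_json JSONB       NOT NULL,
    created_at   TIMESTAMPTZ NOT NULL DEFAULT now()
)
'''

def ensure_citations_table(conn):
    """Create the audit table if it is missing, using an open psycopg2 connection."""
    with conn.cursor() as c:
        c.execute(CITATIONS_DDL)
    conn.commit()
```

A timestamp column lets an auditor order the statements within a session, which the three original columns alone cannot do.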

Step 4: Compose the answer with guarded prompt

Instruct the LLM to cite every claim and refuse when unsure.

Python
SYSTEM = '''You are a regulated-industry assistant.
Rules:
1. Cite every factual claim with [source N].
2. If unsure, say "I cannot answer with confidence."
3. Never repeat customer PII back in the answer.'''
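
For the `[source N]` rule to work, the model has to see the sources numbered the same way. The sketch below shows one way to assemble the messages, assuming each source dict has `title`, `link`, and `snippet` fields (assumed names) and that your LLM client accepts role/content message dicts. `format_sources` and `build_messages` are hypothetical helpers.

```python
SYSTEM = '''You are a regulated-industry assistant.
Rules:
1. Cite every factual claim with [source N].
2. If unsure, say "I cannot answer with confidence."
3. Never repeat customer PII back in the answer.'''

def format_sources(sources):
    """Number each source so the model can cite it as [source N]."""
    lines = []
    for n, s in enumerate(sources, start=1):
        # 'title', 'link', and 'snippet' are assumed field names.
        lines.append(f"[source {n}] {s.get('title', '')} ({s.get('link', '')}): {s.get('snippet', '')}")
    return '\n'.join(lines)

def build_messages(question, sources):
    """Return chat messages: the guarded system prompt, then sources and question."""
    context = format_sources(sources) or 'No sources available.'
    user = f'Sources:\n{context}\n\nQuestion: {question}'
    return [
        {'role': 'system', 'content': SYSTEM},
        {'role': 'user', 'content': user},
    ]
```

Pass the scrubbed question here, not the raw one, so the prompt never contains customer PII.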

Step 5: Enforce a human-review path

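Every response gets a risk score, and high-risk answers route to a human reviewer. The scores below are placeholders: an answer with no sources is treated as highest risk, and an explicit refusal as low risk.

Python
def risk_score(answer, sources):
    if not sources:
        return 1.0  # nothing backs the answer: highest risk
    if 'cannot answer' in answer.lower():
        return 0.2  # an explicit refusal is low risk
    return 0.5  # placeholder score for sourced answers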
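
The score only helps once something acts on it. Below is a minimal routing sketch, assuming a cut-off of 0.5 and an extra check that every `[source N]` citation points at a real source. `REVIEW_THRESHOLD`, `cited_indices`, and `route` are hypothetical, and `risk_score` is repeated so the block runs on its own.

```python
import re

REVIEW_THRESHOLD = 0.5  # assumed cut-off; tune it with your compliance team

def risk_score(answer, sources):
    # Same scoring as the step above, repeated so this sketch is self-contained.
    if not sources:
        return 1.0
    if 'cannot answer' in answer.lower():
        return 0.2
    return 0.5

def cited_indices(answer):
    """Return the set of N values found in [source N] markers."""
    return {int(n) for n in re.findall(r'\[source (\d+)\]', answer)}

def route(answer, sources):
    """Return 'human_review' or 'auto_send' for a drafted answer."""
    score = risk_score(answer, sources)
    cited = cited_indices(answer)
    # A citation that points outside the source list cannot be verified.
    bad_citation = any(n < 1 or n > len(sources) for n in cited)
    refused = 'cannot answer' in answer.lower()
    # A non-refusal with no citations breaks rule 1 of the system prompt.
    uncited = not cited and not refused
    if score > REVIEW_THRESHOLD or bad_citation or uncited:
        return 'human_review'
    return 'auto_send'
```

With the placeholder scores, only answers without sources exceed the threshold. The citation checks catch the remaining cases where the model ignored the prompt rules.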

Python Example

Python
# Assumes scrub(), ground(), and log_citation() from Steps 1-3 are defined above.
def rag_answer(question, session_id='sess-1'):
    clean = scrub(question)    # mask PII before anything else sees the text
    sources = ground(clean)    # only the scrubbed text leaves your network
    log_citation(session_id, clean, sources)
    return {'question': clean, 'sources': sources}

print(rag_answer('What are the KYC rules for my account?'))

JavaScript Example

JavaScript
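const API_KEY = process.env.SCAVIO_API_KEY;
// Scrub PII from `question` before calling this; the Presidio step above is Python-only.
export async function ragAnswer(question) {
  const r = await fetch('https://api.scavio.dev/api/v1/search', {
    method: 'POST',
    headers: { 'x-api-key': API_KEY, 'Content-Type': 'application/json' },
    body: JSON.stringify({ query: question, platform: 'google' })
  });
  // Fail loudly on HTTP errors instead of parsing an error body.
  if (!r.ok) throw new Error(`Scavio search failed: ${r.status}`);
  const sources = ((await r.json()).organic_results || []).slice(0, 5);
  return { question, sources };
}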

Expected Output

The pipeline produces PII-scrubbed questions, structured public-web citations, and a durable audit log of every answer. The compliance team can pull a per-session export on demand.
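
A per-session export can be a single query over the `citations` table. The sketch below assumes an open psycopg2 connection and the three columns written in Step 3. `export_session` is a hypothetical helper, and JSON Lines is an assumed format.

```python
import json

def export_session(conn, session_id):
    """Return one JSON line per logged citation row for the given session."""
    with conn.cursor() as c:
        c.execute(
            'SELECT session_id, question, sources_json FROM citations WHERE session_id = %s',
            (session_id,))
        rows = c.fetchall()
    lines = []
    for sid, question, sources in rows:
        # The column may come back as a JSON string or as parsed objects,
        # depending on its type (TEXT vs JSON/JSONB).
        if isinstance(sources, str):
            sources = json.loads(sources)
        lines.append(json.dumps({'session_id': sid, 'question': question, 'sources': sources}))
    return '\n'.join(lines)
```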

Frequently Asked Questions

Most developers complete this tutorial in 15 to 30 minutes. You will need a Scavio API key (free tier works) and a working Python or JavaScript environment.

You need Python 3.10+, a Scavio API key, Presidio or an equivalent PII detector, and Postgres with pgvector for citation logs. A Scavio API key gives you 500 free credits per month.

Yes. The free tier includes 500 credits per month, which is more than enough to complete this tutorial and prototype a working solution.

Scavio has a native LangChain package (langchain-scavio), an MCP server, and a plain REST API that works with any HTTP client. This tutorial uses the raw REST API, but you can adapt to your framework of choice.
