受监管行业聊天机器人(银行、保险、医疗保健)需要具有 PII 屏蔽、严格源控制和审计日志记录的 RAG。本教程使用 Scavio 进行公共网络基础和本地 PII 清理器来构建该架构。
前置条件
- Python 3.10+
- Scavio API 密钥
- Presidio 或同等 PII 检测器
- Postgres 与 pgvector 用于引用日志
操作指南
步骤 1: 清理 PII 输入
切勿将 PII 发送到下游 API。
Python
from presidio_analyzer import AnalyzerEngine
from presidio_anonymizer import AnonymizerEngine
analyzer = AnalyzerEngine()
anonymizer = AnonymizerEngine()
def scrub(text):
results = analyzer.analyze(text=text, language='en')
return anonymizer.anonymize(text=text, analyzer_results=results).text步骤 2: 通过 Scavio 找到答案
获取权威的公共内容来补充内部文档。
Python
import os, requests
API_KEY = os.environ['SCAVIO_API_KEY']
def ground(question):
r = requests.post('https://api.scavio.dev/api/v1/search',
headers={'x-api-key': API_KEY},
json={'query': question, 'platform': 'google'})
return r.json().get('organic_results', [])[:5]步骤 3: 记录每次引用
监管机构希望每条聊天机器人声明都有出处。
Python
import psycopg2, json
def log_citation(session_id, question, sources):
conn = psycopg2.connect(os.environ['DATABASE_URL'])
with conn.cursor() as c:
c.execute('INSERT INTO citations(session_id, question, sources_json) VALUES (%s, %s, %s)',
(session_id, question, json.dumps(sources)))
conn.commit()步骤 4: 使用受保护的提示撰写答案
指示法学硕士引用每项主张并在不确定时拒绝。
Text
SYSTEM = '''You are a regulated-industry assistant.
Rules:
1. Cite every factual claim with [source N].
2. If unsure, say "I cannot answer with confidence."
3. Never repeat customer PII back in the answer.'''步骤 5: 强制执行人工审核路径
低置信度的答案路线给人类。每个响应都有一个风险评分。
Python
def risk_score(answer, sources):
if not sources: return 1.0
if 'cannot answer' in answer.lower(): return 0.2
return 0.5Python 示例
Python
import os, requests
API_KEY = os.environ['SCAVIO_API_KEY']
def rag_answer(question):
clean = scrub(question)
sources = ground(clean)
log_citation('sess-1', clean, sources)
return {'question': clean, 'sources': sources}
print(rag_answer('What are the KYC rules for my account?'))JavaScript 示例
JavaScript
const API_KEY = process.env.SCAVIO_API_KEY;
export async function ragAnswer(question) {
const r = await fetch('https://api.scavio.dev/api/v1/search', {
method: 'POST',
headers: { 'x-api-key': API_KEY, 'Content-Type': 'application/json' },
body: JSON.stringify({ query: question, platform: 'google' })
});
const sources = ((await r.json()).organic_results || []).slice(0, 5);
return { question, sources };
}预期输出
JSON
PII-scrubbed questions, typed public-web citations, durable audit log of every answer. Compliance team gets a per-session export on demand.