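Local LLMs running on llama.cpp, Ollama, or vLLM are powerful but frozen in time. They hallucinate about current events, recent releases, and live data because their training data has a cutoff. Adding a search API gives them real-time grounding: before the model answers, your code searches the web and passes the latest results to it as context. This tutorial works with any OpenAI-compatible local LLM endpoint. Cost: $0.005 per grounded answer (one search call).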
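To see what the per-answer price means at scale, here is a back-of-the-envelope estimator. It is a sketch that assumes exactly one search call per grounded answer at the $0.005 quoted above and counts only search spend; the function name and the 40% grounded share in the usage lines are illustrative, not measured.

```python
SEARCH_COST_USD = 0.005  # per grounded answer, the price quoted in this tutorial


def estimate_search_cost(questions_per_day: int, grounded_share: float = 1.0,
                         days: int = 30) -> float:
    """Estimate search spend in USD, assuming one search call per grounded answer."""
    if not 0.0 <= grounded_share <= 1.0:
        raise ValueError("grounded_share must be between 0 and 1")
    grounded_answers = questions_per_day * grounded_share * days
    return grounded_answers * SEARCH_COST_USD


if __name__ == "__main__":
    # Illustrative volumes only: 1,000 questions a day for 30 days
    print(f"Ground every answer: ${estimate_search_cost(1000):.2f}")
    print(f"Ground 40% of them:  ${estimate_search_cost(1000, 0.4):.2f}")
```

With these illustrative numbers the script prints $150.00 and $60.00; lowering the grounded share is the saving that Step 4 targets.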
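Prerequisites
- A running local LLM (Ollama, llama.cpp server, or vLLM)
- Python 3.9+
- The requests library (pip install requests)
- A Scavio API key from scavio.dev, exported as the SCAVIO_API_KEY environment variable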
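A quick preflight check can catch missing prerequisites before the first request. This sketch (the `preflight` helper is hypothetical, not part of any library) checks the Python version, whether `requests` imports, and whether the `SCAVIO_API_KEY` environment variable that the later steps read is set. It does not check that the local LLM server is up; the connection test in Step 1 covers that.

```python
import os
import sys


def preflight(env=None, version_info=None) -> list:
    """Return a list of problems; an empty list means the prerequisites look OK."""
    env = os.environ if env is None else env
    version_info = sys.version_info if version_info is None else version_info
    problems = []
    if tuple(version_info[:2]) < (3, 9):
        problems.append(f"Python 3.9+ required, found {version_info[0]}.{version_info[1]}")
    try:
        import requests  # noqa: F401  (only checking that it is installed)
    except ImportError:
        problems.append("requests is not installed (pip install requests)")
    if not env.get("SCAVIO_API_KEY"):
        problems.append("SCAVIO_API_KEY environment variable is not set")
    return problems


if __name__ == "__main__":
    issues = preflight()
    for issue in issues:
        print(f"- {issue}")
    print("Prerequisites look OK." if not issues else "Fix the issues above first.")
```

The `env` and `version_info` parameters exist only so the check can be exercised without touching the real environment.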
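Walkthrough
Step 1: Connect to your local LLM
Set up the connection to your local LLM. The code works with any OpenAI-compatible endpoint (Ollama, llama.cpp server, vLLM); only the URL and model name change.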
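Because the three servers expose the same OpenAI-style route and differ only in host and port, the URL can be derived from the backend name. The sketch below assumes each server runs on its default port (11434 for Ollama, 8080 for llama.cpp server, 8000 for vLLM, as listed in the code comments of this step); `chat_url` and `extract_reply` are hypothetical helpers. `extract_reply` turns a response without `choices` (for example an error object) into a readable exception instead of a bare `KeyError`.

```python
DEFAULT_PORTS = {"ollama": 11434, "llama.cpp": 8080, "vllm": 8000}


def chat_url(backend: str, host: str = "localhost") -> str:
    """Build the OpenAI-compatible chat-completions URL for a local backend."""
    port = DEFAULT_PORTS.get(backend.lower())
    if port is None:
        raise ValueError(f"unknown backend {backend!r}; expected one of {sorted(DEFAULT_PORTS)}")
    return f"http://{host}:{port}/v1/chat/completions"


def extract_reply(payload: dict) -> str:
    """Return the assistant text from an OpenAI-style chat response."""
    choices = payload.get("choices")
    if not choices:
        # Assumes failures come back as a JSON body without "choices",
        # often with an "error" field; show it instead of raising KeyError.
        raise RuntimeError(f"LLM returned no choices: {payload.get('error', payload)}")
    return choices[0]["message"]["content"]


if __name__ == "__main__":
    print(chat_url("vLLM"))
    print(extract_reply({"choices": [{"message": {"content": "Hello"}}]}))
```

With these helpers, Step 1 could set `LLM_URL = chat_url('llama.cpp')` and call `extract_reply(resp.json())` instead of indexing the response directly.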
```python
import requests

# Common local LLM endpoints:
# Ollama:    http://localhost:11434/v1/chat/completions
# llama.cpp: http://localhost:8080/v1/chat/completions
# vLLM:      http://localhost:8000/v1/chat/completions
LLM_URL = 'http://localhost:11434/v1/chat/completions'  # Ollama default
LLM_MODEL = 'llama3'  # or 'mistral', 'codellama', etc.; for Ollama and vLLM it must match a model the server has

def ask_llm(messages: list, max_tokens: int = 512) -> str:
    """Send a chat request to the local OpenAI-compatible server and return the reply text."""
    resp = requests.post(LLM_URL, json={
        'model': LLM_MODEL,
        'messages': messages,
        'max_tokens': max_tokens,
        'temperature': 0.3,  # low temperature keeps answers close to the provided context
    }, timeout=120)  # local generation can be slow, especially on CPU
    resp.raise_for_status()  # surface HTTP errors instead of a confusing KeyError
    return resp.json()['choices'][0]['message']['content']

# Test connection
try:
    answer = ask_llm([{'role': 'user', 'content': 'Say hello in one word.'}], max_tokens=10)
    print(f'LLM connected: {answer}')
except Exception as e:
    print(f'LLM connection error: {e}')
    print('Make sure Ollama, llama.cpp, or vLLM is running.')
```

Step 2: Add search grounding
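Build a function that searches the web and formats the results as context for the LLM. The LLM sees only the search snippets, not the full pages, so its answer can only be as complete as those snippets.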
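Local models often run with a modest context window, so long snippets can crowd out the question. One option, sketched here under that assumption, is to cap each snippet and the total context length; `build_context`, `max_chars`, and `snippet_chars` are hypothetical names, and the defaults are arbitrary starting points rather than tuned values. It expects the same `title`, `snippet`, and `link` fields that `search_context` reads from `organic_results`.

```python
def build_context(results: list, max_chars: int = 2000, snippet_chars: int = 300) -> str:
    """Format search results as numbered context, capping snippet and total length."""
    if not results:
        return "No search results found."
    header = "Search results (use these to answer accurately):\n\n"
    parts = [header]
    used = len(header)
    for i, r in enumerate(results, 1):
        snippet = r.get("snippet", "")
        if len(snippet) > snippet_chars:
            snippet = snippet[:snippet_chars].rstrip() + "..."
        entry = f"[{i}] {r.get('title', '')}\n    {snippet}\n    Source: {r.get('link', '')}\n\n"
        # Always keep the first result; drop later ones once the budget is spent.
        # Stopping (rather than skipping) keeps the [n] numbering contiguous.
        if i > 1 and used + len(entry) > max_chars:
            break
        parts.append(entry)
        used += len(entry)
    return "".join(parts)


if __name__ == "__main__":
    demo = [{"title": "Example", "snippet": "x" * 500, "link": "https://example.com"}]
    print(build_context(demo, snippet_chars=40))
```

Stopping at the first result that does not fit, instead of skipping ahead to shorter ones, means the numbers the model cites still match the order of the search results.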
```python
import os
import requests

SCAVIO_KEY = os.environ['SCAVIO_API_KEY']

def search_context(query: str, count: int = 5) -> str:
    """Search the web and return formatted context for the LLM."""
    resp = requests.post(
        'https://api.scavio.dev/api/v1/search',
        headers={'x-api-key': SCAVIO_KEY, 'Content-Type': 'application/json'},
        json={'query': query, 'country_code': 'us', 'num_results': count},
        timeout=30,
    )
    resp.raise_for_status()
    results = resp.json().get('organic_results', [])
    if not results:
        return 'No search results found.'
    context = 'Search results (use these to answer accurately):\n\n'
    for i, r in enumerate(results, 1):
        context += f'[{i}] {r.get("title", "")}\n'
        context += f'    {r.get("snippet", "")}\n'
        context += f'    Source: {r.get("link", "")}\n\n'
    return context

# Test
ctx = search_context('Python 3.14 release date')
print(ctx[:300])
```

Step 3: Build the grounded-answer pipeline
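Combine search and the LLM into one function. The LLM receives the search context and is instructed to answer only from it and to cite its sources as [1], [2], and so on.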
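Smaller local models do not always follow the citation instruction, so it can help to check the answer afterwards. This hypothetical `check_citations` helper is a sketch that extracts `[n]` markers with a regular expression and flags any that point past the number of search results supplied; it cannot tell whether a cited source actually supports the claim.

```python
import re


def check_citations(answer: str, num_sources: int) -> dict:
    """Report which [n] markers an answer uses and which point past the sources."""
    cited = sorted({int(n) for n in re.findall(r"\[(\d+)\]", answer)})
    invalid = [n for n in cited if not 1 <= n <= num_sources]
    return {"cited": cited, "invalid": invalid, "has_citations": bool(cited)}


if __name__ == "__main__":
    report = check_citations("Python 3.14.0 is the latest release [1][4].", num_sources=3)
    print(report)
```

Here `num_sources` is assumed to be the number of results placed in the prompt; `search_context` as written does not return that count, so you would need to track it alongside the context.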
```python
def grounded_answer(question: str) -> dict:
    """Answer a question using a search-grounded local LLM."""
    # Step 1: Search for context
    context = search_context(question, count=5)
    # Step 2: Ask the LLM, restricting it to the search context
    messages = [
        {'role': 'system', 'content': (
            'You are a helpful assistant. Answer ONLY based on the search results provided. '
            'Cite sources as [1], [2], etc. If the search results do not contain the answer, '
            'say "I could not find this information in the search results."'
        )},
        {'role': 'user', 'content': f'{context}\nQuestion: {question}'},
    ]
    answer = ask_llm(messages, max_tokens=512)
    return {
        'question': question,
        'answer': answer,
        'grounded': True,
        'search_cost': 0.005,  # one search call per grounded answer
    }

# Test with a question that requires current data
result = grounded_answer('What is the latest version of Python?')
print(f'Q: {result["question"]}')
print(f'A: {result["answer"]}')
print(f'Grounded: {result["grounded"]}, Cost: ${result["search_cost"]}')
```

Step 4: Add smart grounding (search only when needed)
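Not every question needs a search. Add a check that decides whether to ground the answer with a search or let the LLM answer directly; skipping the search for questions that do not need fresh data saves its cost.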
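Another way to save cost, independent of the heuristic, is to reuse results for repeated questions. The sketch below wraps any one-argument search function in a time-limited cache keyed on the lower-cased, whitespace-normalized query. `CachedSearch` is a hypothetical helper and the one-hour default TTL is an assumption; time-sensitive queries such as prices may need a much shorter one.

```python
import time
from typing import Callable, Dict, Tuple


class CachedSearch:
    """Wrap a search function so repeated queries within ttl_seconds reuse the result."""

    def __init__(self, search_fn: Callable[[str], str], ttl_seconds: float = 3600,
                 clock: Callable[[], float] = time.monotonic):
        self.search_fn = search_fn
        self.ttl = ttl_seconds
        self.clock = clock  # injectable so expiry can be tested without sleeping
        self._cache: Dict[str, Tuple[float, str]] = {}
        self.paid_calls = 0  # number of times the underlying search was actually called

    def __call__(self, query: str) -> str:
        key = " ".join(query.lower().split())  # normalize case and whitespace
        now = self.clock()
        hit = self._cache.get(key)
        if hit is not None and now - hit[0] < self.ttl:
            return hit[1]  # fresh cache hit: no search call, no cost
        result = self.search_fn(query)
        self.paid_calls += 1
        self._cache[key] = (now, result)
        return result
```

In this tutorial you could wrap the Step 2 function as `cached_search = CachedSearch(search_context)` and call it inside `grounded_answer`; only cache misses would trigger a paid search. The cache lives in memory, so it is lost when the process exits.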
```python
import re

def needs_grounding(question: str) -> bool:
    """Heuristic: does this question need real-time data?"""
    grounding_triggers = [
        'latest', 'current', 'today', '2026', '2025', 'now',
        'price', 'cost', 'version', 'release', 'new', 'update',
        'best', 'top', 'compare', 'vs', 'alternative',
        'how much', 'where to', 'who is',
    ]
    q_lower = question.lower()
    # Whole-word match (plural 's' allowed), so 'now' no longer fires on 'know'
    # and 'top' no longer fires on 'laptop'
    return any(re.search(rf'\b{re.escape(t)}s?\b', q_lower) for t in grounding_triggers)

def smart_answer(question: str) -> dict:
    """Answer with search grounding only when needed."""
    if needs_grounding(question):
        return grounded_answer(question)
    # Direct LLM answer (no search cost)
    messages = [{'role': 'user', 'content': question}]
    answer = ask_llm(messages, max_tokens=512)
    return {
        'question': question,
        'answer': answer,
        'grounded': False,
        'search_cost': 0,
    }

# Test both paths
for q in ['What is a Python list comprehension?',
          'What is the latest Python version in 2026?']:
    result = smart_answer(q)
    print(f'[{"GROUNDED" if result["grounded"] else "DIRECT"}] '
          f'${result["search_cost"]} - {q}')
    print(f'  {result["answer"][:100]}...')
    print()
```

Python example
```python
import os
import requests

LLM_URL = 'http://localhost:11434/v1/chat/completions'
SCAVIO_KEY = os.environ['SCAVIO_API_KEY']

def search(query, count=5):
    resp = requests.post('https://api.scavio.dev/api/v1/search',
                         headers={'x-api-key': SCAVIO_KEY, 'Content-Type': 'application/json'},
                         json={'query': query, 'country_code': 'us', 'num_results': count},
                         timeout=30)
    resp.raise_for_status()
    return resp.json().get('organic_results', [])

def grounded_ask(question):
    results = search(question)
    ctx = '\n'.join(f'[{i+1}] {r.get("title", "")}: {r.get("snippet", "")}'
                    for i, r in enumerate(results))
    resp = requests.post(LLM_URL, json={'model': 'llama3', 'messages': [
        {'role': 'system', 'content': 'Answer from search results. Cite [1],[2].'},
        {'role': 'user', 'content': f'{ctx}\n\nQ: {question}'}], 'max_tokens': 512},
        timeout=120)
    resp.raise_for_status()
    return resp.json()['choices'][0]['message']['content']

print(grounded_ask('latest Python version 2026'))
```

JavaScript example
```javascript
// Requires Node.js 18+ (built-in fetch)
const LLM_URL = 'http://localhost:11434/v1/chat/completions';
const SCAVIO_KEY = process.env.SCAVIO_API_KEY;

async function groundedAsk(question) {
  const searchResp = await fetch('https://api.scavio.dev/api/v1/search', {
    method: 'POST',
    headers: { 'x-api-key': SCAVIO_KEY, 'Content-Type': 'application/json' },
    body: JSON.stringify({ query: question, country_code: 'us', num_results: 5 }),
  });
  if (!searchResp.ok) throw new Error(`Search failed: HTTP ${searchResp.status}`);
  const results = (await searchResp.json()).organic_results || [];
  const ctx = results.map((r, i) => `[${i + 1}] ${r.title}: ${r.snippet || ''}`).join('\n');

  const llmResp = await fetch(LLM_URL, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({
      model: 'llama3',
      messages: [
        { role: 'system', content: 'Answer from search results. Cite [1],[2].' },
        { role: 'user', content: `${ctx}\n\nQ: ${question}` },
      ],
      max_tokens: 512,
    }),
  });
  if (!llmResp.ok) throw new Error(`LLM request failed: HTTP ${llmResp.status}`);
  return (await llmResp.json()).choices[0].message.content;
}

groundedAsk('latest Python version 2026').then(console.log).catch(console.error);
```

Expected output
```text
LLM connected: Hello

Search results (use these to answer accurately):

[1] Python Release Python 3.14.0
    Python 3.14.0 was released on October 7, 2025...
    Source: https://www.python.org/downloads/release/python-3140/

Q: What is the latest version of Python?
A: According to the search results, the latest version of Python is 3.14.0,
released on October 7, 2025 [1].

[DIRECT] $0 - What is a Python list comprehension?
  A list comprehension is a concise way to create lists...

[GROUNDED] $0.005 - What is the latest Python version in 2026?
  The latest Python version is 3.14.0, released October 2025 [1]...
```