Frequently asked questions

How long does the "How to build multi-provider search for RAG reliability" tutorial take?

Most developers finish this tutorial in 15 to 30 minutes. You need a Scavio API key (the free tier is enough) and a working Python or JavaScript environment.

What do I need before I start?

Python 3.8+, the requests library, and a Scavio API key from scavio.dev. Exa and Brave API keys are optional and are only used for fallback. A Scavio API key comes with 50 free credits on signup.

Can I run this tutorial on the free tier?

Yes. The free tier includes 50 credits on signup, which is enough to complete this tutorial and build a working prototype.

Which frameworks does this work with?

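Scavio provides a native LangChain package (langchain-scavio), an MCP server, and a REST API that works with any HTTP client. This tutorial uses the raw REST API, but you can adapt it to the framework of your choice.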

Multi-provider search for RAG (2026)

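When a RAG pipeline depends on a single search provider, an outage at that provider breaks the pipeline. This tutorial builds a multi-provider search layer that queries Scavio first and falls back to Exa or Brave when Scavio fails. Search then stays available as long as at least one provider responds, and results come back in the same format no matter which provider served the query.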
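Why several providers help comes down to simple arithmetic, provided their outages are independent: a query fails only if every provider in the chain fails, so combined availability is 1 - prod(1 - a_i). The sketch below assumes that independence, and the 99% per-provider figures are illustrative placeholders, not published SLAs. Correlated failures (your own network egress, a shared upstream) make the real number lower.

```python
from math import prod

def combined_availability(availabilities):
    """Probability that at least one provider answers, ASSUMING independent failures."""
    # Each a is the probability that one provider answers a given request.
    for a in availabilities:
        if not 0.0 <= a <= 1.0:
            raise ValueError(f"availability must be in [0, 1], got {a}")
    # The chain fails only if every provider fails: multiply the failure probabilities.
    return 1.0 - prod(1.0 - a for a in availabilities)

# Illustrative numbers only: each provider assumed to be 99% available.
print(f"{combined_availability([0.99]):.6f}")              # 0.990000 (single provider)
print(f"{combined_availability([0.99, 0.99, 0.99]):.6f}")  # 0.999999 (primary + two fallbacks)
```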

Prerequisites

Python 3.8+
The requests library
A Scavio API key from scavio.dev
Optional: Exa and Brave API keys for fallback
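A quick way to confirm these prerequisites before running the steps is to check which keys are set. The environment variable names below are the ones the tutorial's `UnifiedSearch` client reads; `configured_providers` and `check_prerequisites` are hypothetical helpers, not part of Scavio's tooling.

```python
import os

# Environment variable names read by the tutorial's UnifiedSearch client.
PROVIDER_KEYS = {
    "scavio": "SCAVIO_API_KEY",  # required: primary provider
    "exa": "EXA_API_KEY",        # optional fallback
    "brave": "BRAVE_API_KEY",    # optional fallback
}

def configured_providers(env=None):
    """Return provider names, in fallback order, whose API key is set and non-blank."""
    env = os.environ if env is None else env
    return [name for name, var in PROVIDER_KEYS.items() if env.get(var, "").strip()]

def check_prerequisites(env=None):
    """Fail fast without a primary key; warn when no fallback is configured."""
    providers = configured_providers(env)
    if "scavio" not in providers:
        raise RuntimeError("SCAVIO_API_KEY is not set; the primary provider cannot be used")
    if len(providers) == 1:
        print("Warning: no fallback keys set; searches fail whenever Scavio is unavailable")
    return providers

if __name__ == "__main__":
    print("Configured providers:", configured_providers())
```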

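Step-by-step guide

Step 1: Build a unified search interface

Create a search client that normalizes results from multiple providers into one format.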

Python

import os, requests, json

class UnifiedSearch:
    def __init__(self):
        self.scavio_key = os.environ.get('SCAVIO_API_KEY', '')
        self.exa_key = os.environ.get('EXA_API_KEY', '')
        self.brave_key = os.environ.get('BRAVE_API_KEY', '')
        self.stats = {'scavio': 0, 'exa': 0, 'brave': 0, 'failures': 0}
    
    def _scavio(self, query, n=5):
        resp = requests.post('https://api.scavio.dev/api/v1/search',
            headers={'x-api-key': self.scavio_key, 'Content-Type': 'application/json'},
            json={'query': query, 'country_code': 'us'}, timeout=10)
        resp.raise_for_status()
        return [{'title': r.get('title', ''), 'url': r.get('link', ''),
                 'text': r.get('snippet', ''), 'source': 'scavio'}
                for r in resp.json().get('organic_results', [])[:n]]
    
    def _exa(self, query, n=5):
        # Fallback 1: Exa. Raising when no key is set makes search() skip it.
        if not self.exa_key:
            raise RuntimeError('No Exa key')
        resp = requests.post('https://api.exa.ai/search',
            headers={'x-api-key': self.exa_key, 'Content-Type': 'application/json'},
            json={'query': query, 'numResults': n, 'contents': {'text': True}}, timeout=10)
        resp.raise_for_status()
        # 'text' may be missing or null, so coerce it to '' before truncating.
        return [{'title': r.get('title') or '', 'url': r.get('url', ''),
                 'text': (r.get('text') or '')[:200], 'source': 'exa'}
                for r in resp.json().get('results', [])[:n]]
    
    def _brave(self, query, n=5):
        # Fallback 2: Brave Search. 'count' limits how many web results come back.
        if not self.brave_key:
            raise RuntimeError('No Brave key')
        resp = requests.get('https://api.search.brave.com/res/v1/web/search',
            headers={'X-Subscription-Token': self.brave_key, 'Accept': 'application/json'},
            params={'q': query, 'count': n}, timeout=10)
        resp.raise_for_status()
        return [{'title': r.get('title', ''), 'url': r.get('url', ''),
                 'text': r.get('description', ''), 'source': 'brave'}
                for r in resp.json().get('web', {}).get('results', [])[:n]]
    
    def search(self, query, n=5):
        # Try providers in priority order; the first one that does not raise wins.
        for name, fn in [('scavio', self._scavio), ('exa', self._exa), ('brave', self._brave)]:
            try:
                results = fn(query, n)
                self.stats[name] += 1
                return {'provider': name, 'results': results, 'count': len(results)}
            except Exception:
                # Network error, non-2xx status, or missing key: count it and try the next one.
                self.stats['failures'] += 1
        return {'provider': 'none', 'results': [], 'count': 0}

usearch = UnifiedSearch()
result = usearch.search('rag pipeline best practices 2026')
print(f'Provider: {result["provider"]} | Results: {result["count"]}')
for r in result['results'][:3]:
    print(f'  [{r["source"]:7}] {r["title"][:50]}')
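`UnifiedSearch.search` only moves to the next provider when one raises; a provider that returns zero results still "wins". The sketch below is a standalone variant that also treats an empty result list as a miss and records why each provider was skipped. `fallback_search` and the stub providers are hypothetical; the stubs stand in for `_scavio`, `_exa`, and `_brave` so failover can be tested without network access or API keys.

```python
def fallback_search(providers, query, n=5, stats=None):
    """Try providers in order and return the first non-empty result set.

    providers: list of (name, fn) pairs; fn(query, n) returns a list of
    normalized result dicts or raises on failure.
    """
    stats = {} if stats is None else stats
    errors = {}
    for name, fn in providers:
        try:
            results = fn(query, n)
        except Exception as exc:  # network errors, HTTP 4xx/5xx, missing keys
            errors[name] = repr(exc)
            stats["failures"] = stats.get("failures", 0) + 1
            continue
        if not results:
            # Empty list counts as a miss so the next provider gets a chance.
            errors[name] = "empty result set"
            continue
        stats[name] = stats.get(name, 0) + 1
        return {"provider": name, "results": results, "count": len(results), "errors": errors}
    return {"provider": "none", "results": [], "count": 0, "errors": errors}

# Stub providers (hypothetical) simulating an outage, an empty answer, and a healthy API.
def down(query, n):
    raise ConnectionError("simulated outage")

def empty(query, n):
    return []

def healthy(query, n):
    return [{"title": f"Result {i}", "url": f"https://example.com/{i}",
             "text": "snippet", "source": "stub"} for i in range(n)]

stats = {}
out = fallback_search([("scavio", down), ("exa", empty), ("brave", healthy)], "test", n=3, stats=stats)
print(out["provider"], out["count"], sorted(out["errors"]))  # brave 3 ['exa', 'scavio']
print(stats)  # {'failures': 1, 'brave': 1}
```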

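Step 2: Add RAG-oriented result formatting

Format the search results for injection into the model's context, keeping track of each source so answers can cite it.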

Python

def format_for_rag(search_result, max_context_chars=3000):
    """Format search results as RAG context with source tracking."""
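    if not search_result.get('results'):
        # Return the same keys as the normal path so callers can always read 'provider'.
        return {'context': '', 'sources': [], 'char_count': 0,
                'provider': search_result.get('provider', 'unknown')}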
    context_parts = []
    sources = []
    char_count = 0
    for r in search_result['results']:
        # Number only the entries that are kept, so refs stay contiguous: [1], [2], ...
        source_ref = f'[{len(sources) + 1}]'
        text = r.get('text', '').strip()
        if not text:
            continue
        entry = f'{source_ref} {text}'
        if char_count + len(entry) > max_context_chars:
            break
        context_parts.append(entry)
        sources.append({'ref': source_ref, 'title': r['title'][:60], 'url': r['url']})
        char_count += len(entry)
    context = '\n\n'.join(context_parts)
    return {
        'context': context,
        'sources': sources,
        'char_count': char_count,
        'provider': search_result.get('provider', 'unknown'),
    }

# Build RAG context
result = usearch.search('how to implement vector search python 2026')
rag = format_for_rag(result, max_context_chars=2000)
print(f'\n=== RAG Context ===')
print(f'  Provider: {rag["provider"]}')
print(f'  Context length: {rag["char_count"]} chars')
print(f'  Sources: {len(rag["sources"])}')
for s in rag['sources']:
    print(f'    {s["ref"]} {s["title"][:50]}')
    print(f'       {s["url"][:60]}')
print(f'\n  Context preview: {rag["context"][:150]}...')
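`format_for_rag` budgets only the entries themselves, so the joined context can exceed `max_context_chars` by two characters per `'\n\n'` separator. If the limit must be strict, a variant along the lines of the hypothetical `pack_entries` below also counts the separators. Character counts remain only a rough proxy for model tokens.

```python
def pack_entries(texts, max_chars, sep="\n\n"):
    """Greedily add numbered entries until the joined string would exceed max_chars.

    Returns (context, refs); len(context) <= max_chars always holds.
    """
    parts, refs = [], []
    used = 0
    for text in texts:
        text = text.strip()
        if not text:
            continue  # skip empty snippets without consuming a reference number
        ref = f"[{len(parts) + 1}]"
        entry = f"{ref} {text}"
        # Every entry after the first also pays for one separator.
        cost = len(entry) + (len(sep) if parts else 0)
        if used + cost > max_chars:
            break
        parts.append(entry)
        refs.append(ref)
        used += cost
    return sep.join(parts), refs

context, refs = pack_entries(["alpha", "", "beta", "gamma"], max_chars=20)
print(repr(context), refs)  # '[1] alpha\n\n[2] beta' ['[1]', '[2]']
```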

Step 3: Integrate with the RAG pipeline

Plug the multi-provider search into the retrieval step of your RAG pipeline.

Python

def rag_retrieve(question, max_sources=5):
    """RAG retrieval step using multi-provider search."""
    # Primary search
    result = usearch.search(question, n=max_sources)
    rag_context = format_for_rag(result)
    # If primary gives weak results, try refined query
    if len(rag_context['sources']) < 2:
        refined = usearch.search(f'{question} tutorial guide', n=max_sources)
        refined_context = format_for_rag(refined)
        if len(refined_context['sources']) > len(rag_context['sources']):
            rag_context = refined_context
    return rag_context

def rag_pipeline(question):
    """Full RAG pipeline: retrieve, format, generate."""
    print(f'\n  Question: {question}')
    # Step 1: Retrieve
    context = rag_retrieve(question)
    print(f'  Retrieved: {len(context["sources"])} sources via {context["provider"]}')
    print(f'  Context: {context["char_count"]} chars')
    # Step 2: Would pass to LLM here
    print(f'  Sources for citation:')
    for s in context['sources']:
        print(f'    {s["ref"]} {s["title"][:45]}')
    return context

print('=== Multi-Provider RAG Pipeline ===')
for q in ['best vector database 2026', 'how to optimize rag pipeline']:
    rag_pipeline(q)

print(f'\n  Provider stats: {json.dumps(usearch.stats)}')
print(f'  Primary: Scavio $0.005/query')
print(f'  Fallback: Exa $0.007/query, Brave ~$0.005/query')
print(f'  Uptime target: 99.9% with multi-provider')
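The `# Step 2: Would pass to LLM here` placeholder in `rag_pipeline` is where generation happens. Below is a minimal sketch of turning a dict shaped like the one `rag_retrieve` returns into a grounded prompt. `build_prompt` is hypothetical, the instruction wording is an assumption, and the model call is left out because it depends on your LLM client.

```python
def build_prompt(question, rag_context):
    """Build a citation-aware prompt from a dict shaped like format_for_rag's output."""
    if not rag_context.get("context"):
        # No evidence retrieved: tell the model so instead of letting it guess.
        return (f"Question: {question}\n\n"
                "No search results were retrieved. Say that the question cannot be "
                "answered from sources.")
    source_lines = "\n".join(
        f"{s['ref']} {s['title']} - {s['url']}" for s in rag_context.get("sources", [])
    )
    return (
        "Answer using only the numbered context below and cite sources "
        "inline with their [n] markers.\n\n"
        f"Context:\n{rag_context['context']}\n\n"
        f"Sources:\n{source_lines}\n\n"
        f"Question: {question}"
    )

# Placeholder data in the shape format_for_rag returns (not real search output).
example = {
    "context": "[1] Example snippet text.",
    "sources": [{"ref": "[1]", "title": "Example source", "url": "https://example.com/a"}],
    "char_count": 25,
    "provider": "scavio",
}
print(build_prompt("What does the source say?", example))
```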

Python example

Python

import os, requests
SH = {'x-api-key': os.environ['SCAVIO_API_KEY'], 'Content-Type': 'application/json'}

def rag_search(query):
    resp = requests.post('https://api.scavio.dev/api/v1/search',
        headers=SH, json={'query': query, 'country_code': 'us'}, timeout=10)
    resp.raise_for_status()  # surface 4xx/5xx errors instead of parsing an error body
    data = resp.json()
    context = '\n'.join(r.get('snippet', '') for r in data.get('organic_results', [])[:3])
    sources = [r.get('link', '') for r in data.get('organic_results', [])[:3]]
    return {'context': context, 'sources': sources}

result = rag_search('vector database comparison')
print(f'Context: {len(result["context"])} chars, Sources: {len(result["sources"])}')
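The minimal example makes a single request. Before handing off to a fallback provider, it can be worth retrying transient failures briefly. The sketch below is a generic retry helper with exponential backoff; `with_retries` and its parameters are hypothetical (not a Scavio feature), and which exceptions count as transient is an assumption to adjust for your client.

```python
import time

def with_retries(fn, attempts=3, base_delay=0.5,
                 retry_on=(ConnectionError, TimeoutError), sleep=time.sleep):
    """Call fn(); on a retryable exception wait base_delay * 2**i seconds and retry."""
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    for i in range(attempts):
        try:
            return fn()
        except retry_on:
            if i == attempts - 1:
                raise  # out of attempts: let the caller fall back to the next provider
            sleep(base_delay * 2 ** i)

# Stub that fails twice, then succeeds; sleep is replaced to record delays instead of waiting.
calls, delays = [], []

def flaky():
    calls.append(1)
    if len(calls) < 3:
        raise ConnectionError("transient")
    return {"organic_results": []}

print(with_retries(flaky, sleep=delays.append), delays)  # {'organic_results': []} [0.5, 1.0]
```

With `requests`, transient failures surface as `requests.exceptions.ConnectionError` and `requests.exceptions.Timeout`, which are not subclasses of the built-in `ConnectionError` and `TimeoutError`, so pass them explicitly, e.g. `with_retries(lambda: rag_search(q), retry_on=(requests.exceptions.ConnectionError, requests.exceptions.Timeout))`. Keep the attempt count and delays small so a real outage reaches the fallback provider quickly.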

JavaScript example

JavaScript

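// Requires Node 18+ (global fetch) and an ES module context for top-level await.
const SH = { 'x-api-key': process.env.SCAVIO_API_KEY, 'Content-Type': 'application/json' };
const res = await fetch('https://api.scavio.dev/api/v1/search', {
  method: 'POST', headers: SH,
  body: JSON.stringify({ query: 'vector database comparison', country_code: 'us' })
});
if (!res.ok) throw new Error(`Scavio search failed: HTTP ${res.status}`);
const data = await res.json();
// Some results may lack a snippet; treat missing ones as empty strings.
const context = (data.organic_results || []).slice(0, 3).map(r => r.snippet ?? '').join('\n');
console.log(`Context: ${context.length} chars`);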

Expected output (abridged; actual titles and counts depend on live search results)

Text

Provider: scavio | Results: 5
  [scavio ] Best Vector Databases for AI in 2026 - Compari
  [scavio ] Pinecone vs Weaviate vs Qdrant - Complete Guide

=== RAG Context ===
  Provider: scavio
  Context length: 1450 chars
  Sources: 5
    [1] Best Vector Databases for AI in 2026
       https://...

=== Multi-Provider RAG Pipeline ===

  Question: best vector database 2026
  Retrieved: 5 sources via scavio
  Context: 1450 chars

  Provider stats: {"scavio": 4, "exa": 0, "brave": 0, "failures": 0}
  Primary: Scavio $0.005/query
  Uptime target: 99.9% with multi-provider

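Related resources

Best search APIs for LangChain RAG pipelines (May 2026)

Best tools for testing RAG search quality (May 2026)

Building a large RAG corpus (10M tokens)

Improving RAG answer quality with search grounding

RAG corpus build workflow (10M tokens)

Crawling vs. search for building RAG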
