LLM Wiki: One API vs Five Tools (2026)
A post in r/AI_Agents asked for five different tools to build a Karpathy-inspired LLM wiki. A multi-platform search API covers web + YouTube + Reddit in one call. Here is when one API wins and when specialized tools do.
The original post in r/AI_Agents asked for an LLM wiki architecture inspired by Karpathy's vision: a system that can search the web, scrape pages, pull YouTube transcripts, parse PDFs, and extract tables. Five different capabilities. The natural instinct is five different tools. But sometimes one API covers most of the surface area.
The five-tool approach
The explicit stack from the thread: a web search API for general queries, a scraping tool for full-page content, a YouTube transcript API, a PDF parsing library, and a table extraction service. Each tool has its own auth, its own response format, its own rate limits, and its own failure modes. Five integrations means five things that can break independently.
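To make that maintenance cost concrete, here is a hypothetical sketch of the configuration surface alone; every service name, environment variable, and URL below is illustrative, not taken from the thread.

import os

# Five independent integrations: five keys, five endpoints, five response shapes to normalize
FIVE_TOOL_STACK = {
    'web_search':    {'key': 'SEARCH_API_KEY',  'endpoint': 'https://search.example.com/v1/query'},
    'scraper':       {'key': 'SCRAPER_API_KEY', 'endpoint': 'https://scrape.example.com/v2/render'},
    'yt_transcript': {'key': 'YT_API_KEY',      'endpoint': 'https://yt.example.com/v3/transcript'},
    'pdf_parser':    {'key': 'PDF_API_KEY',     'endpoint': 'https://pdf.example.com/v1/parse'},
    'table_extract': {'key': 'TABLE_API_KEY',   'endpoint': 'https://tables.example.com/v1/extract'},
}

# Any one of these can break on its own: rotated key, changed schema, new rate limit
missing_keys = [name for name, cfg in FIVE_TOOL_STACK.items() if not os.environ.get(cfg['key'])]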
What one multi-platform search API can cover
A search API that supports multiple platforms can handle web search, YouTube search, and Reddit search in a single integration. That covers three of the five needs immediately. You get structured results with titles, URLs, descriptions, and metadata across all three platforms with one API key and one response format.
import requests, os

API = 'https://api.scavio.dev/api/v1/search'
H = {'x-api-key': os.environ['SCAVIO_API_KEY']}

def multi_search(query, platforms=('web', 'youtube', 'reddit')):
    """Search across multiple platforms with one API."""
    results = {}
    for platform in platforms:
        r = requests.post(API, headers=H, timeout=30, json={
            'query': query,
            'search_type': platform,
            'num_results': 5
        })
        r.raise_for_status()
        results[platform] = r.json().get('results', [])
    return results

# One query, three platforms, one API key
data = multi_search('transformer architecture explained')
for platform, hits in data.items():
    print(f"\n{platform}: {len(hits)} results")
    for h in hits[:2]:
        print(f" - {h.get('title', 'N/A')}")
When one API is not enough
PDF parsing is genuinely different. You need to upload a binary file, handle OCR for scanned documents, and extract structured tables from unstructured layouts. No search API does this because it is a fundamentally different operation: processing a local file, not querying an external index.
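A minimal sketch of that separate PDF leg, assuming the pypdf library as one reasonable choice (the thread does not name a specific parser); scanned documents would additionally need an OCR step, which this sketch does not cover.

from pypdf import PdfReader  # pip install pypdf

def parse_pdf(file_path):
    """Extract plain text page by page from a local PDF; no OCR, no table structure."""
    reader = PdfReader(file_path)
    pages = [page.extract_text() or '' for page in reader.pages]
    return '\n\n'.join(pages)

# text = parse_pdf('paper.pdf')  # operates on a local file, not a search index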
Deep page scraping is also a separate concern. If you need the full HTML content of a specific URL, including JavaScript-rendered content, you need a scraping tool. Search APIs return snippets and metadata, not full page content.
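For the scraping leg, a minimal sketch using requests plus BeautifulSoup; this only handles server-rendered HTML, and JavaScript-rendered pages would need a headless browser (Playwright, Selenium) instead. The function name scrape matches the placeholder used in the architecture sketch below.

import requests
from bs4 import BeautifulSoup  # pip install beautifulsoup4

def scrape(url):
    """Fetch a single URL and return its visible text; no JS execution."""
    resp = requests.get(url, timeout=30, headers={'User-Agent': 'wiki-bot/0.1'})
    resp.raise_for_status()
    soup = BeautifulSoup(resp.text, 'html.parser')
    for tag in soup(['script', 'style']):  # drop non-content elements
        tag.decompose()
    return soup.get_text(separator='\n', strip=True)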
The practical split
Three-tool stack instead of five: one multi-platform search API for web, YouTube, and Reddit discovery. One scraping tool for full-page content extraction when you need more than snippets. One PDF parser for document processing. Three integrations instead of five. Three sets of auth, three response formats, three failure modes.
# The three-tool wiki architecture
def build_wiki_entry(topic):
    # 1. Multi-platform search (one API, three platforms)
    search_data = multi_search(topic)

    # 2. Deep scrape top web results for full content
    top_urls = [r['url'] for r in search_data.get('web', [])[:3]]
    # full_pages = [scrape(url) for url in top_urls]  # separate tool

    # 3. PDF parsing handled separately for uploaded documents
    # pdf_content = parse_pdf(file_path)  # separate tool

    # Combine and send to LLM
    context = {
        'web_results': search_data.get('web', []),
        'youtube_results': search_data.get('youtube', []),
        'reddit_discussions': search_data.get('reddit', []),
        # 'full_pages': full_pages,
    }
    return context

entry = build_wiki_entry('retrieval augmented generation')
print(f"Sources: {sum(len(v) for v in entry.values())} total")
When specialized tools win
Specialized tools win when you need depth over breadth. If your wiki is exclusively about YouTube content, a dedicated YouTube Data API integration gives you transcripts, comments, channel metadata, and playlist data. A search API gives you search results. The specialized tool goes deeper into one platform.
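As a sketch of what "deeper" means in practice, here is the kind of per-video data a dedicated YouTube Data API integration reaches that a cross-platform search response does not. The google-api-python-client calls are standard, but the specific fields pulled out are illustrative, and transcripts in particular usually come from a separate captions or transcript tool rather than these endpoints.

from googleapiclient.discovery import build  # pip install google-api-python-client
import os

youtube = build('youtube', 'v3', developerKey=os.environ['YOUTUBE_API_KEY'])

def video_depth(video_id):
    """Per-video metadata and comments that a generic search result does not include."""
    video = youtube.videos().list(part='snippet,statistics', id=video_id).execute()
    comments = youtube.commentThreads().list(part='snippet', videoId=video_id,
                                              maxResults=10).execute()
    snippet = video['items'][0]['snippet']
    return {
        'channel': snippet['channelTitle'],
        'stats': video['items'][0]['statistics'],  # views, likes, comment count
        'top_comments': [c['snippet']['topLevelComment']['snippet']['textDisplay']
                         for c in comments.get('items', [])],
    }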
The decision framework
If you need discovery across platforms: one multi-platform search API. If you need depth within one platform: specialized tool. If you need to process local files: specialized parser. Most LLM wiki projects need discovery first and depth second. Start with one API, add specialized tools only when you hit its limits.
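The framework reduces to a few branches. A toy sketch of that routing logic; the need labels and tool names are placeholders for whatever you actually integrate.

def pick_tool(need):
    """Map a requirement to the kind of tool that covers it."""
    if need in ('web', 'youtube', 'reddit'):              # discovery across platforms
        return 'multi-platform search API'
    if need in ('transcripts', 'comments', 'playlists'):  # depth within one platform
        return 'platform-specific API (e.g. YouTube Data API)'
    if need in ('pdf', 'tables'):                         # local file processing
        return 'specialized parser'
    if need == 'full_page':
        return 'scraping tool'
    return 'start with the search API, add tools when it falls short'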