Workflow

YaCy Search with LLM Grounding Pipeline

A workflow that combines YaCy P2P search with the Scavio API for reliable LLM grounding. It falls back from YaCy to Scavio when P2P results are thin.

Overview

YaCy provides free P2P search via yacy_expert with llama.cpp, but results are inconsistent at volume and miss recent content. This workflow uses YaCy for broad discovery, then validates and enriches results through Scavio's structured API. The LLM gets grounded with verified, fresh data regardless of which provider sourced it.

Trigger

Every LLM grounding request that needs web context.

Schedule

On-demand

Workflow Steps

1

Query YaCy P2P Index

Send the query to the local YaCy instance. Collect results with URLs, titles, and snippets.

2

Score YaCy Results

Check result count and freshness. If YaCy returns fewer than 3 results, or the results it returns are older than 30 days, flag the query for enrichment.

3

Enrich via Scavio

For flagged queries, call the Scavio search API to get fresh, structured results, including AI Overview and Knowledge Graph data.

4

Merge and Deduplicate

Combine YaCy and Scavio results, deduplicate by URL, rank by freshness and relevance.

5

Format for LLM Context

Format the merged results as a grounding context block for the LLM prompt.
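
The flagging rule in step 2 can be sketched as a small predicate. This is a minimal sketch under stated assumptions, not part of the implementation in this document: it assumes each result dict carries a `pubDate` string in RFC 2822 format (a field the implementation here does not collect), it reads "older than 30 days" as "no result is dated within the last 30 days", and it treats undated results as stale. The names `parse_pub_date` and `needs_enrichment` are hypothetical.

```python
from datetime import datetime, timedelta, timezone
from email.utils import parsedate_to_datetime

MIN_RESULTS = 3    # count threshold from step 2
MAX_AGE_DAYS = 30  # freshness threshold from step 2


def parse_pub_date(value):
    """Parse an RFC 2822 date string; return None if it is missing or malformed."""
    if not value:
        return None
    try:
        parsed = parsedate_to_datetime(value)
    except (TypeError, ValueError):
        return None
    # Assumption: dates without a timezone are treated as UTC.
    if parsed.tzinfo is None:
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed


def needs_enrichment(results, now=None):
    """Return True when the result set is too small or contains nothing recent."""
    now = now or datetime.now(timezone.utc)
    if len(results) < MIN_RESULTS:
        return True
    cutoff = now - timedelta(days=MAX_AGE_DAYS)
    dates = [parse_pub_date(r.get("pubDate")) for r in results]
    # Undated results count as stale (an assumption of this sketch).
    return not any(d is not None and d >= cutoff for d in dates)
```

Passing `now` explicitly keeps the check deterministic in tests; in production the default of the current UTC time applies.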

Python Implementation

Python
import os

import requests

API_KEY = os.environ["SCAVIO_API_KEY"]
HEADERS = {"x-api-key": API_KEY, "Content-Type": "application/json"}
YACY_URL = os.environ.get("YACY_URL", "http://localhost:8090")


def yacy_search(query: str) -> list:
    """Search the local YaCy P2P index; return [] if YaCy is unreachable or replies with bad data."""
    try:
        resp = requests.get(
            f"{YACY_URL}/yacysearch.json",
            params={"query": query, "maximumRecords": 10},
            timeout=5,
        )
        resp.raise_for_status()
        channels = resp.json().get("channels") or [{}]
    except (requests.RequestException, ValueError):
        # YaCy is the best-effort provider: any failure means "no results".
        return []
    return [{"title": r.get("title", ""), "url": r.get("link", ""), "snippet": r.get("description", "")}
            for r in channels[0].get("items", [])]


def scavio_search(query: str) -> list:
    """Search via the Scavio API; raises on network or HTTP errors."""
    resp = requests.post(
        "https://api.scavio.dev/api/v1/search",
        headers=HEADERS,
        json={"query": query, "country_code": "us"},
        timeout=10,
    )
    resp.raise_for_status()
    data = resp.json()
    return [{"title": r.get("title", ""), "url": r.get("link", ""), "snippet": r.get("snippet", "")}
            for r in data.get("organic_results", [])]


def grounding_pipeline(query: str) -> str:
    """Query YaCy, fall back to Scavio when results are thin, and format an LLM context block."""
    all_results = yacy_search(query)
    # Fewer than 3 YaCy results counts as thin: enrich with Scavio.
    if len(all_results) < 3:
        all_results = all_results + scavio_search(query)
    # Deduplicate by URL, keeping the first occurrence; skip results without a URL.
    seen = set()
    unique = []
    for r in all_results:
        if r["url"] and r["url"] not in seen:
            seen.add(r["url"])
            unique.append(r)
    # Format the top 8 results as LLM context
    return "\n\n".join(f"[{r['title']}]({r['url']}): {r['snippet']}" for r in unique[:8])


if __name__ == "__main__":
    context = grounding_pipeline("transformer architecture attention mechanism")
    print(f"Grounding context ({len(context)} chars):\n{context[:500]}")
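
The implementation above deduplicates on the exact URL string, so the same page can survive twice when YaCy and Scavio return it with a different host case, a trailing slash, or a fragment. The sketch below shows one way to tighten step 4. The normalization rules and the names `normalize_url` and `merge_results` are assumptions for illustration, and the ranking by freshness and relevance that step 4 mentions is left out because the source does not define a scoring rule; input order is preserved instead.

```python
from urllib.parse import urlsplit, urlunsplit


def normalize_url(url):
    """Build a comparison key: lowercase scheme and host, no trailing slash, no fragment."""
    parts = urlsplit(url.strip())
    path = parts.path.rstrip("/") or "/"
    return urlunsplit((parts.scheme.lower(), parts.netloc.lower(), path, parts.query, ""))


def merge_results(yacy_results, scavio_results):
    """Combine both result lists, keeping the first result seen for each normalized URL."""
    merged = {}
    for result in yacy_results + scavio_results:
        if not result.get("url"):
            continue  # a result without a URL cannot be cited or deduplicated
        merged.setdefault(normalize_url(result["url"]), result)
    # Dicts preserve insertion order, so YaCy results stay ahead of Scavio results.
    return list(merged.values())
```

The original `url` value is kept on each result; the normalized form is used only as the deduplication key.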

JavaScript Implementation

JavaScript
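// Run as an ES module on Node 18+: uses the global fetch and top-level await.
const HEADERS = {'x-api-key': process.env.SCAVIO_API_KEY, 'Content-Type': 'application/json'};
const YACY_URL = process.env.YACY_URL || 'http://localhost:8090';

// Search the local YaCy P2P index; return [] if YaCy is unreachable or replies with bad data.
async function yacySearch(query) {
  try {
    const url = YACY_URL + '/yacysearch.json?query=' + encodeURIComponent(query) + '&maximumRecords=10';
    const resp = await fetch(url, {signal: AbortSignal.timeout(5000)});
    if (!resp.ok) return [];
    const channels = (await resp.json()).channels || [];
    return (channels[0]?.items || []).map(r => ({title: r.title || '', url: r.link || '', snippet: r.description || ''}));
  } catch {
    return [];
  }
}

// Search via the Scavio API; throws on network or HTTP errors.
async function scavioSearch(query) {
  const resp = await fetch('https://api.scavio.dev/api/v1/search', {
    method: 'POST',
    headers: HEADERS,
    body: JSON.stringify({query, country_code: 'us'}),
    signal: AbortSignal.timeout(10000),
  });
  if (!resp.ok) throw new Error('Scavio request failed with status ' + resp.status);
  const data = await resp.json();
  return (data.organic_results || []).map(r => ({title: r.title || '', url: r.link || '', snippet: r.snippet || ''}));
}

// Query YaCy, fall back to Scavio when results are thin, and format an LLM context block.
async function groundingPipeline(query) {
  let results = await yacySearch(query);
  // Fewer than 3 YaCy results counts as thin: enrich with Scavio.
  if (results.length < 3) results = results.concat(await scavioSearch(query));
  // Deduplicate by URL, keeping the first occurrence; skip results without a URL.
  const seen = new Set();
  const unique = results.filter(r => {
    if (!r.url || seen.has(r.url)) return false;
    seen.add(r.url);
    return true;
  });
  // Format the top 8 results as LLM context
  return unique.slice(0, 8).map(r => '[' + r.title + '](' + r.url + '): ' + r.snippet).join('\n\n');
}

const ctx = await groundingPipeline('transformer architecture attention mechanism');
console.log('Grounding context (' + ctx.length + ' chars):\n' + ctx.slice(0, 500));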
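
Both implementations stop at a plain context string. Step 5 asks for a grounding block that fits into an LLM prompt, and one possible shape is sketched below in Python, matching the first implementation. The prompt wording, the numbered-source layout, the `max_chars` budget, and the name `build_grounded_prompt` are all assumptions of this sketch rather than anything Scavio or YaCy prescribes.

```python
def build_grounded_prompt(question, results, max_chars=4000):
    """Wrap search results in a numbered source block, stopping at a character budget."""
    blocks = []
    used = 0
    for i, r in enumerate(results, start=1):
        block = f"[{i}] {r['title']}\n{r['url']}\n{r['snippet']}"
        if used + len(block) > max_chars:
            break  # stop at the first source that would exceed the budget
        blocks.append(block)
        used += len(block) + 2  # account for the blank-line separator
    context = "\n\n".join(blocks) if blocks else "(no sources found)"
    return (
        "Answer the question using only the sources below. "
        "Cite sources by their bracketed number.\n\n"
        f"Sources:\n{context}\n\n"
        f"Question: {question}"
    )
```

Numbering the sources gives the model a compact way to cite them, and the character budget keeps a long result list from crowding out the question.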

Platforms Used

Google

Web search with knowledge graph, PAA, and AI overviews

Frequently Asked Questions

This workflow is triggered by every LLM grounding request that needs web context. It runs on demand.

This workflow uses the following Scavio platform: Google. Every platform is called through the same unified API endpoint.

Scavio's free tier includes 250 credits per month with no credit card required, which is enough to test and validate this workflow before scaling it.
