Tutorial

How to Build a Content Pipeline with Live Data

Feed real SERP data, Reddit opinions, and product prices into your AI content pipeline. Stop generating AI slop.

AI content generated without live data tends to be slop. Articles about 'best APIs' with fabricated pricing, or product comparisons with imaginary features, fail because the LLM invented the details. This tutorial builds a content pipeline that fetches real data before generating text: current prices from search results, user opinions from Reddit, and product details from Amazon. Because every fact in the brief comes from a live source, the generated article can be checked against that data.

Prerequisites

  • Python 3.10+
  • requests library installed
  • A Scavio API key from scavio.dev
  • An OpenAI API key for content generation

Walkthrough

Step 1: Build the multi-source data fetcher

Pull data from Google, Reddit, and Amazon via Scavio.

Python
import os
import requests

SK = os.environ['SCAVIO_API_KEY']
OK = os.environ['OPENAI_API_KEY']
SH = {'x-api-key': SK, 'Content-Type': 'application/json'}
SEARCH_URL = 'https://api.scavio.dev/api/v1/search'

def search(payload):
    # One POST per source; raise on HTTP errors instead of parsing an error body
    resp = requests.post(SEARCH_URL, headers=SH, json=payload, timeout=30)
    resp.raise_for_status()
    return resp.json()

def fetch_google(q):
    data = search({'query': q, 'country_code': 'us'})
    # .get() with defaults so one incomplete result does not abort the run
    return [{'title': r.get('title', ''), 'snippet': r.get('snippet', ''), 'url': r.get('link', '')}
            for r in data.get('organic_results', [])[:5]]

def fetch_reddit(q):
    data = search({'query': q, 'platform': 'reddit', 'country_code': 'us'})
    return [{'title': r.get('title', ''), 'snippet': r.get('snippet', '')}
            for r in data.get('organic_results', [])[:5]]

def fetch_products(q):
    data = search({'query': q, 'platform': 'amazon', 'marketplace': 'US'})
    return [{'title': p.get('title', ''), 'price': p.get('price', 'N/A'), 'rating': p.get('rating', '')}
            for p in data.get('products', [])[:5]]
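
The fetchers share one parsing pattern: read a list from the response, keep the first five items, and default any missing field. The sketch below isolates that step so it can be checked without a network call. The helper name `parse_organic` and the sample payload are invented for illustration; only the field names (`organic_results`, `title`, `snippet`, `link`) come from the tutorial code, and a real response may carry more fields.

```python
def parse_organic(data, limit=5):
    """Reduce a search response dict to title/snippet/url records."""
    return [
        {
            'title': r.get('title', ''),
            'snippet': r.get('snippet', ''),
            'url': r.get('link', ''),
        }
        for r in data.get('organic_results', [])[:limit]
    ]

# Hypothetical payload, shaped like the fields the tutorial code reads
sample = {'organic_results': [
    {'title': 'Example result', 'snippet': 'Some text', 'link': 'https://example.com/a'},
    {'title': 'No snippet here', 'link': 'https://example.com/b'},
]}

for row in parse_organic(sample):
    print(row)

# A response without an organic_results key yields an empty list
print(parse_organic({}))
```

Defaulting missing fields keeps a single incomplete result from stopping the whole research run, at the cost of occasionally passing an empty snippet to the LLM.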

Step 2: Assemble a research brief

Compile data from all sources into a structured brief for the LLM.

Python
def research(topic, product_query=None):
    # One Scavio request per source; this function counts one credit per request
    google = fetch_google(topic)
    reddit = fetch_reddit(topic)
    credits = 2
    brief = f'Topic: {topic}\n\n=== Google ===\n'
    brief += '\n'.join(f"- {r['title']}: {r['snippet']}" for r in google)
    brief += '\n\n=== Reddit ===\n'
    brief += '\n'.join(f"- {r['title']}: {r['snippet']}" for r in reddit)
    if product_query:
        # Product data is optional and adds a third request
        products = fetch_products(product_query)
        brief += '\n\n=== Amazon Products ===\n'
        brief += '\n'.join(f"- {p['title']}: {p['price']} ({p['rating']})" for p in products)
        credits += 1
    print(f'Research cost: ${credits * 0.005:.3f}')
    return brief
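
When a source returns nothing, its section in the brief is a header with no lines under it. Since the Step 3 prompt tells the model to say so when data is missing, one option is to mark empty sections explicitly. This is a minimal sketch, not part of the original pipeline: the helper name `format_section` and the marker text are assumptions, and the inputs are made up.

```python
def format_section(name, lines):
    """Render one brief section, marking it explicitly when no data came back."""
    if lines:
        body = '\n'.join(f'- {line}' for line in lines)
    else:
        body = '(no results returned)'
    return f'=== {name} ===\n{body}'

# Made-up inputs for illustration
print(format_section('Google', ['Example title: example snippet']))
print(format_section('Reddit', []))
```

An explicit marker gives the model something concrete to cite when it reports a gap, rather than leaving it to infer the gap from an empty header.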

Step 3: Generate grounded content

Pass the brief to the LLM with strict instructions to only use provided data.

Python
def generate(topic, brief):
    resp = requests.post('https://api.openai.com/v1/chat/completions',
        headers={'Authorization': f'Bearer {OK}', 'Content-Type': 'application/json'},
        json={'model': 'gpt-4o', 'temperature': 0.3, 'messages': [
            # The system prompt restricts the model to facts present in the brief
            {'role': 'system', 'content': 'Write based ONLY on the research brief. No fabricated stats. '
                'If data is missing, say so. Cite Reddit as "users report". Start with a direct answer.'},
            {'role': 'user', 'content': f'Write 600 words about: {topic}\n\n{brief}'}]},
        timeout=120)
    # Surface auth and rate-limit failures here rather than as a KeyError on 'choices'
    resp.raise_for_status()
    return resp.json()['choices'][0]['message']['content']

brief = research('best noise canceling headphones 2026', 'noise canceling headphones')
article = generate('best noise canceling headphones 2026', brief)
print(article[:300])
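
When a request fails, the chat completions endpoint returns a JSON body with an `error` object instead of `choices`. The sketch below reads the response defensively so the failure reason is visible. The helper name `extract_content` is hypothetical, and both payloads are hand-written for illustration rather than captured API output.

```python
def extract_content(payload):
    """Return the assistant text from a chat completions response dict.

    Raises RuntimeError with the API's own message when the body is an error.
    """
    if 'error' in payload:
        err = payload['error']
        message = err.get('message', err) if isinstance(err, dict) else err
        raise RuntimeError(f'OpenAI API error: {message}')
    choices = payload.get('choices') or []
    if not choices:
        raise RuntimeError('Response contained no choices')
    return choices[0]['message']['content']

# Hand-written payloads for illustration
ok = {'choices': [{'message': {'role': 'assistant', 'content': 'Draft text'}}]}
bad = {'error': {'message': 'Incorrect API key provided', 'type': 'invalid_request_error'}}

print(extract_content(ok))
try:
    extract_content(bad)
except RuntimeError as exc:
    print(exc)
```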

Step 4: Validate prices in generated content

Check that dollar amounts in the article appear in source data.

Python
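import re

# Match a whole price such as $298, $1,299 or $29.99, excluding any
# sentence punctuation that follows it ("$298." or "$298,")
PRICE_RE = re.compile(r'\$\d+(?:,\d{3})*(?:\.\d+)?')

def validate(content, brief):
    prices = PRICE_RE.findall(content)
    # A price counts as verified if the same text appears anywhere in the brief
    issues = [p for p in prices if p not in brief]
    if issues:
        print(f'WARNING: {len(issues)} unverified prices: {issues}')
    else:
        print('All prices verified against source data.')
    return issues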

validate(article, brief)
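
A substring check is lenient: '$29' would pass because it appears inside '$298'. A stricter variant, sketched here as an alternative to the check above, compares prices by value, so '$298' and '$298.00' match and '$29' does not. The function names and the sample brief line are made up for illustration, and the regex assumes US-style prices with a leading dollar sign.

```python
import re
from decimal import Decimal

PRICE_RE = re.compile(r'\$\d+(?:,\d{3})*(?:\.\d+)?')

def amounts(text):
    """Return every dollar amount in text as a Decimal ('$1,299.00' -> 1299.00)."""
    return [Decimal(m[1:].replace(',', '')) for m in PRICE_RE.findall(text)]

def unverified_amounts(content, brief):
    """Amounts in content that have no equal-valued amount in brief."""
    source = set(amounts(brief))
    return [a for a in amounts(content) if a not in source]

# Made-up line in the brief's "- title: price (rating)" format
sample_brief = '- Example Headphones: $298.00 (4.6)'

print(unverified_amounts('They cost $298.', sample_brief))  # []
print(unverified_amounts('They cost $29.', sample_brief))   # [Decimal('29')]
```

`Decimal` compares by numeric value, so trailing zeros do not cause false warnings.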

Python Example

Python
import os, requests

SK = os.environ['SCAVIO_API_KEY']
SH = {'x-api-key': SK, 'Content-Type': 'application/json'}

def research(topic):
    g = requests.post('https://api.scavio.dev/api/v1/search',
        headers=SH, json={'query': topic, 'country_code': 'us'}).json().get('organic_results', [])[:3]
    r = requests.post('https://api.scavio.dev/api/v1/search',
        headers=SH, json={'query': topic, 'platform': 'reddit', 'country_code': 'us'}).json().get('organic_results', [])[:3]
    print(f'{len(g)} Google + {len(r)} Reddit results. Cost: $0.010')
    return g, r

research('best serp api 2026')

JavaScript Example

JavaScript
const SK = process.env.SCAVIO_API_KEY;
const SH = { 'x-api-key': SK, 'Content-Type': 'application/json' };

async function research(topic) {
  const [g, r] = await Promise.all([
    fetch('https://api.scavio.dev/api/v1/search', {
      method: 'POST', headers: SH,
      body: JSON.stringify({ query: topic, country_code: 'us' })
    }).then(r => r.json()),
    fetch('https://api.scavio.dev/api/v1/search', {
      method: 'POST', headers: SH,
      body: JSON.stringify({ query: topic, platform: 'reddit', country_code: 'us' })
    }).then(r => r.json()),
  ]);
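  // Return the result lists so callers can use them, as the Python example does
  const google = g.organic_results || [];
  const reddit = r.organic_results || [];
  console.log(`${google.length}G + ${reddit.length}R. Cost: $0.010`);
  return { google, reddit };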
}
research('best serp api 2026').catch(console.error);

Expected Output

Text
Research cost: $0.015

The Sony WH-1000XM5 remains the top noise canceling headphone
in 2026, priced at $298 on Amazon with a 4.6 rating. Users on
Reddit report the XM5 noise cancellation outperforms Bose QC
Ultra in airplane environments...

All prices verified against source data.

Frequently Asked Questions

Most developers complete this tutorial in 15 to 30 minutes. You will need a Scavio API key (free tier works) and a working Python or JavaScript environment.

You need Python 3.10+, the requests library, a Scavio API key from scavio.dev, and an OpenAI API key for content generation. A Scavio API key gives you 250 free credits per month.

Yes. The free tier includes 250 credits per month, which is more than enough to complete this tutorial and prototype a working solution.
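
As a rough check on that claim, the sketch below works out how many pipeline runs fit in the free tier. It assumes one credit per Scavio request, which is how the `research` function in Step 2 counts them, and reuses the $0.005 per credit rate from its cost print. OpenAI usage is billed separately and is not included.

```python
FREE_CREDITS = 250        # monthly free tier, as stated in this tutorial
USD_PER_CREDIT = 0.005    # rate used in research()'s cost print

def runs_per_month(credits_per_run, budget=FREE_CREDITS):
    """Whole pipeline runs that fit in the credit budget."""
    return budget // credits_per_run

for label, credits in [('Google + Reddit', 2), ('Google + Reddit + Amazon', 3)]:
    cost = credits * USD_PER_CREDIT
    print(f'{label}: {runs_per_month(credits)} runs, ${cost:.3f} each')
```

Under those assumptions the free tier covers 125 runs without product data or 83 runs with it.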

Scavio has a native LangChain package (langchain-scavio), an MCP server, and a plain REST API that works with any HTTP client. This tutorial uses the raw REST API, but you can adapt to your framework of choice.
