How long does this tutorial on building a content pipeline with live data take?

Most developers complete this tutorial in 15 to 30 minutes. You will need a Scavio API key (the free tier works), an OpenAI API key, and a working Python environment. A JavaScript version of the research step is also included.

What do I need before starting?

You need Python 3.10+ with the requests library installed, a Scavio API key from scavio.dev (new accounts receive 50 free credits on signup), and an OpenAI API key for content generation.

Can I run this tutorial with the free tier?

Yes. The free tier includes 50 credits on signup, which is more than enough to complete this tutorial and prototype a working solution.
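Here is a quick budget check that shows how far the free credits go. It assumes each Scavio search call costs one credit, which is how the research() function in Step 2 counts them. runs_within_budget is a hypothetical helper and is not part of any Scavio SDK.

```python
# Assumption: 1 credit per Scavio search call, matching the counting in research()
# (Google + Reddit = 2 calls, plus 1 more when Amazon products are fetched).
FREE_CREDITS = 50
CALLS_PER_RUN = {'google': 1, 'reddit': 1, 'amazon': 1}


def runs_within_budget(credits=FREE_CREDITS, calls=CALLS_PER_RUN):
    """Return how many complete research runs fit in a credit budget."""
    per_run = sum(calls.values())
    return credits // per_run


print(runs_within_budget())                                  # runs with Amazon
print(runs_within_budget(calls={'google': 1, 'reddit': 1}))  # runs without Amazon
```

With these assumptions, 50 credits cover 16 full runs that include Amazon, or 25 runs without it.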

What frameworks does this work with?

Scavio has a native LangChain package (langchain-scavio), an MCP server, and a plain REST API that works with any HTTP client. This tutorial uses the raw REST API, but you can adapt it to your framework of choice.

Content Pipeline with Live Search Data (2026)

AI content without live data is slop. Articles about "best APIs" with fabricated pricing, and product comparisons with imaginary features, fail because the LLM invented the details. This tutorial builds a content pipeline that fetches real data before generating any text: current prices from search results, user opinions from Reddit, and product details from Amazon. Because the facts come from live sources rather than the model's memory, the output holds up to fact-checking.

Prerequisites

Python 3.10+
requests library installed
A Scavio API key from scavio.dev
An OpenAI API key for content generation
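You can optionally run a preflight check before starting, so that a missing key or package shows up immediately instead of halfway through the pipeline. This is a minimal sketch. It assumes the keys are read from the SCAVIO_API_KEY and OPENAI_API_KEY environment variables, as in the code below, and missing_prerequisites is a hypothetical helper name.

```python
import os
import sys

# Environment variables the tutorial code reads (see Step 1)
REQUIRED_VARS = ('SCAVIO_API_KEY', 'OPENAI_API_KEY')


def missing_prerequisites(env=os.environ):
    """Return a list of problems that would stop the tutorial code from running."""
    problems = []
    if sys.version_info < (3, 10):
        problems.append(f'Python 3.10+ required, found {sys.version.split()[0]}')
    try:
        import requests  # noqa: F401  (only checking that it is installed)
    except ImportError:
        problems.append('requests is not installed (pip install requests)')
    # Treat unset and empty variables the same way
    problems += [f'{name} is not set' for name in REQUIRED_VARS if not env.get(name)]
    return problems


if __name__ == '__main__':
    for problem in missing_prerequisites():
        print('MISSING:', problem)
```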

Walkthrough

Step 1: Build the multi-source data fetcher

Pull data from Google, Reddit, and Amazon via Scavio.

Python

import os
import requests

SK = os.environ['SCAVIO_API_KEY']   # Scavio key for all search calls
OK = os.environ['OPENAI_API_KEY']   # OpenAI key, used in Step 3
SH = {'x-api-key': SK, 'Content-Type': 'application/json'}
SEARCH_URL = 'https://api.scavio.dev/api/v1/search'

def _search(payload):
    # One POST per call; fail loudly on HTTP errors instead of parsing an error body
    resp = requests.post(SEARCH_URL, headers=SH, json=payload, timeout=30)
    resp.raise_for_status()
    return resp.json()

def fetch_google(q):
    data = _search({'query': q, 'country_code': 'us'})
    # .get() avoids a KeyError when a result lacks a field
    return [{'title': r.get('title', ''), 'snippet': r.get('snippet', ''), 'url': r.get('link', '')}
            for r in data.get('organic_results', [])[:5]]

def fetch_reddit(q):
    data = _search({'query': q, 'platform': 'reddit', 'country_code': 'us'})
    return [{'title': r.get('title', ''), 'snippet': r.get('snippet', '')}
            for r in data.get('organic_results', [])[:5]]

def fetch_products(q):
    data = _search({'query': q, 'platform': 'amazon', 'marketplace': 'US'})
    return [{'title': p.get('title', ''), 'price': p.get('price', 'N/A'), 'rating': p.get('rating', '')}
            for p in data.get('products', [])[:5]]
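The fetchers above run one after another. The JavaScript example later on runs its searches in parallel with Promise.all, and the same idea works in Python with a thread pool. This is a minimal sketch, and fetch_all is a hypothetical helper. The Amazon fetcher in Step 2 takes a separate product_query, so you could call it separately or wrap it in a lambda that ignores the topic.

```python
from concurrent.futures import ThreadPoolExecutor


def fetch_all(query, fetchers):
    """Run several one-argument fetch functions concurrently; return {name: results}.

    `fetchers` maps a section name to a callable, e.g.
    {'google': fetch_google, 'reddit': fetch_reddit}.
    """
    with ThreadPoolExecutor(max_workers=max(1, len(fetchers))) as pool:
        futures = {name: pool.submit(fn, query) for name, fn in fetchers.items()}
        # .result() waits for each call and re-raises any exception from its thread
        return {name: fut.result() for name, fut in futures.items()}


# Stand-in fetchers so the sketch runs without network access or credits
demo = fetch_all('noise canceling headphones', {
    'google': lambda q: [{'title': f'Google result for {q}'}],
    'reddit': lambda q: [{'title': f'Reddit thread about {q}'}],
})
print(demo['reddit'][0]['title'])
```

Threads fit this workload because each call spends most of its time waiting on the network.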

Step 2: Assemble a research brief

Compile data from all sources into a structured brief for the LLM.

Python

def research(topic, product_query=None):
    # Cost tracking: 1 credit per search call, $0.005 per credit
    brief = f'Topic: {topic}\n\n=== Google ===\n'
    google = fetch_google(topic)
    brief += '\n'.join(f"- {g['title']}: {g['snippet']}" for g in google)
    brief += '\n\n=== Reddit ===\n'
    reddit = fetch_reddit(topic)
    brief += '\n'.join(f"- {d['title']}: {d['snippet']}" for d in reddit)
    credits = 2  # Google + Reddit
    if product_query:
        products = fetch_products(product_query)
        brief += '\n\n=== Amazon Products ===\n'
        brief += '\n'.join(f"- {x['title']}: {x['price']} ({x['rating']})" for x in products)
        credits += 1  # Amazon
    print(f'Research cost: ${credits * 0.005:.3f}')
    return brief
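If a source returns nothing, research() leaves that section empty. The model then cannot tell "no data" from "no section". Below is a minimal sketch that keeps the same brief layout but writes an explicit placeholder for empty sections, which supports the "If data is missing, say so" instruction in Step 3. The helper names format_section and build_brief are hypothetical.

```python
def format_section(name, rows, fmt):
    """Render one brief section; mark empty sections so the LLM can report missing data."""
    lines = [fmt(row) for row in rows]
    body = '\n'.join(lines) if lines else '- (no results returned)'
    return f'=== {name} ===\n{body}'


def build_brief(topic, sections):
    """`sections` is a list of (name, rows, fmt) tuples, rendered in order."""
    parts = [f'Topic: {topic}'] + [format_section(*s) for s in sections]
    return '\n\n'.join(parts)


def bullet(row):
    # Same bullet format research() uses for Google and Reddit results
    return f"- {row['title']}: {row['snippet']}"


brief = build_brief('demo topic', [
    ('Google', [{'title': 'A', 'snippet': 'first'}], bullet),
    ('Reddit', [], bullet),  # simulate a source that returned nothing
])
print(brief)
```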

Step 3: Generate grounded content

Pass the brief to the LLM with strict instructions to only use provided data.

Python

def generate(topic, brief):
    resp = requests.post('https://api.openai.com/v1/chat/completions',
        headers={'Authorization': f'Bearer {OK}', 'Content-Type': 'application/json'},
        json={'model': 'gpt-4o', 'temperature': 0.3, 'messages': [
            # The system prompt restricts the model to facts in the brief
            {'role': 'system', 'content': 'Write based ONLY on the research brief. No fabricated stats. '
                'If data is missing, say so. Cite Reddit as "users report". Start with a direct answer.'},
            {'role': 'user', 'content': f'Write 600 words about: {topic}\n\n{brief}'}]},
        timeout=120)
    resp.raise_for_status()  # surface auth or quota errors instead of a KeyError on 'choices'
    return resp.json()['choices'][0]['message']['content']

brief = research('best noise canceling headphones 2026', 'noise canceling headphones')
article = generate('best noise canceling headphones 2026', brief)
print(article[:300])
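Building the request body in a separate function makes the prompt testable, and it lets you refuse to call the model when the brief has no data to ground on. This is a minimal sketch. build_payload is a hypothetical helper whose result you would pass as json= in the requests.post call above. Treating lines that start with "- " as data lines, and skipping parenthesized placeholder bullets, are assumptions about the brief format.

```python
SYSTEM_PROMPT = ('Write based ONLY on the research brief. No fabricated stats. '
                 'If data is missing, say so. Cite Reddit as "users report". '
                 'Start with a direct answer.')


def build_payload(topic, brief, words=600, model='gpt-4o', temperature=0.3):
    """Build a Chat Completions request body; raise if the brief contains no data."""
    # Count bullet lines, ignoring placeholder bullets such as '- (none)'
    data_lines = [ln for ln in brief.splitlines()
                  if ln.startswith('- ') and not ln.startswith('- (')]
    if not data_lines:
        raise ValueError('Research brief has no data; refusing to generate ungrounded content')
    return {
        'model': model,
        'temperature': temperature,
        'messages': [
            {'role': 'system', 'content': SYSTEM_PROMPT},
            {'role': 'user', 'content': f'Write {words} words about: {topic}\n\n{brief}'},
        ],
    }


payload = build_payload('demo', 'Topic: demo\n\n=== Google ===\n- A: first')
print(payload['messages'][1]['content'])
```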

Step 4: Validate prices in generated content

Check that dollar amounts in the article appear in source data.

Python

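import re

# Matches $298, $1,299 and $298.00 without swallowing a trailing period or comma
PRICE_RE = re.compile(r'\$\d+(?:,\d{3})*(?:\.\d+)?')

def validate(content, brief):
    # Flag every dollar amount in the article that does not appear verbatim in the brief
    prices = PRICE_RE.findall(content)
    issues = [p for p in prices if p not in brief]
    if issues:
        print(f'WARNING: {len(issues)} unverified prices: {issues}')
    else:
        print('All prices verified against source data.')
    return issues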

validate(article, brief)
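A verbatim substring check flags "$1299" when the source says "$1,299", and it lets "$29" pass whenever "$298" appears in the brief. Comparing numeric values avoids both problems. This is a minimal sketch with hypothetical helper names, and the sample strings are made up for illustration.

```python
import re
from decimal import Decimal

# Capture the number after '$': digits, optional thousands groups, optional cents
PRICE_RE = re.compile(r'\$(\d+(?:,\d{3})*(?:\.\d+)?)')


def price_values(text):
    """Return the dollar amounts in text as Decimals ('$1,299.00' -> Decimal('1299.00'))."""
    return {Decimal(m.replace(',', '')) for m in PRICE_RE.findall(text)}


def unverified_prices(content, brief):
    """Dollar amounts in content whose numeric value never appears in the brief."""
    known = price_values(brief)
    return sorted(v for v in price_values(content) if v not in known)


sample_brief = '- Product A: $298.00 (4.6)\n- Product B: $1,299 (4.4)'
sample_article = 'Product A costs $298. Product B is $1299, and Product C is $149.'
print(unverified_prices(sample_article, sample_brief))
```

Decimal('298') equals Decimal('298.00'), so formatting differences are ignored. The check still only confirms that a number exists somewhere in the sources, not that it is attached to the right product.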

Python Example

Python

import os, requests

SK = os.environ['SCAVIO_API_KEY']
SH = {'x-api-key': SK, 'Content-Type': 'application/json'}

def research(topic):
    g = requests.post('https://api.scavio.dev/api/v1/search',
        headers=SH, json={'query': topic, 'country_code': 'us'}).json().get('organic_results', [])[:3]
    r = requests.post('https://api.scavio.dev/api/v1/search',
        headers=SH, json={'query': topic, 'platform': 'reddit', 'country_code': 'us'}).json().get('organic_results', [])[:3]
    print(f'{len(g)} Google + {len(r)} Reddit results. Cost: $0.010')
    return g, r

research('best serp api 2026')

JavaScript Example

JavaScript

// Uses the global fetch available in Node.js 18+
const SK = process.env.SCAVIO_API_KEY;
const SH = { 'x-api-key': SK, 'Content-Type': 'application/json' };
const SEARCH_URL = 'https://api.scavio.dev/api/v1/search';

// POST one search request and fail loudly on HTTP errors
async function search(payload) {
  const res = await fetch(SEARCH_URL, { method: 'POST', headers: SH, body: JSON.stringify(payload) });
  if (!res.ok) throw new Error(`Scavio request failed: ${res.status}`);
  return res.json();
}

async function research(topic) {
  // Run the Google and Reddit searches in parallel
  const [g, r] = await Promise.all([
    search({ query: topic, country_code: 'us' }),
    search({ query: topic, platform: 'reddit', country_code: 'us' }),
  ]);
  console.log(`${(g.organic_results || []).length}G + ${(r.organic_results || []).length}R. Cost: $0.010`);
  return { google: g.organic_results || [], reddit: r.organic_results || [] };
}

research('best serp api 2026').catch(console.error);

Expected Output

Text

Research cost: $0.015

The Sony WH-1000XM5 remains the top noise canceling headphone
in 2026, priced at $298 on Amazon with a 4.6 rating. Users on
Reddit report the XM5 noise cancellation outperforms Bose QC
Ultra in airplane environments...

All prices verified against source data.


