How to Build a Multi-Source News Aggregator with APIs

Build a news aggregator that pulls articles from Google News, Reddit, and YouTube using the Scavio API. Deduplicate and rank by relevance across sources.

A news aggregator that combines Google News articles, Reddit discussions, and YouTube videos for any topic gives a more complete picture than any single source. Google News provides editorial coverage, Reddit surfaces community reactions, and YouTube captures video commentary. This tutorial builds a multi-source aggregator using the Scavio API that queries all three platforms, normalizes the results into a common format, deduplicates by URL, and ranks by a combined relevance score.

Prerequisites

  • Python 3.10 or higher
  • requests library installed
  • A Scavio API key
  • Topics or keywords to aggregate news for

Walkthrough

Step 1: Query all three sources

Fetch Google News results, Reddit posts, and YouTube videos for the same topic using the Scavio API.

Python
import os
import requests
from concurrent.futures import ThreadPoolExecutor

API_KEY = os.environ.get("SCAVIO_API_KEY", "your_scavio_api_key")
ENDPOINT = "https://api.scavio.dev/api/v1/search"

def fetch_google_news(topic: str) -> list[dict]:
    r = requests.post(ENDPOINT, headers={"x-api-key": API_KEY},
                      json={"query": f"{topic} news", "country_code": "us"})
    r.raise_for_status()
    data = r.json()
    return data.get("news_results") or data.get("organic_results", [])

def fetch_reddit(topic: str) -> list[dict]:
    r = requests.post(ENDPOINT, headers={"x-api-key": API_KEY},
                      json={"platform": "reddit", "query": topic})
    r.raise_for_status()
    return r.json().get("data", {}).get("posts", [])

def fetch_youtube(topic: str) -> list[dict]:
    r = requests.post(ENDPOINT, headers={"x-api-key": API_KEY},
                      json={"platform": "youtube", "query": topic})
    r.raise_for_status()
    return r.json().get("videos", [])

Step 2: Normalize results into a common format

Transform results from each platform into a uniform structure with title, url, source, and snippet.

Python
def normalize_google(item: dict) -> dict:
    return {"title": item.get("title"), "url": item.get("link"), "source": "google",
            "snippet": item.get("snippet", ""), "date": item.get("date")}

def normalize_reddit(post: dict) -> dict:
    return {"title": post.get("title"), "url": post.get("url"), "source": "reddit",
            "snippet": f"r/{post.get('subreddit', '')}", "date": post.get("timestamp")}

def normalize_youtube(video: dict) -> dict:
    # Guard against a null description before slicing.
    return {"title": video.get("title"), "url": video.get("url"), "source": "youtube",
            "snippet": (video.get("description") or "")[:100], "date": video.get("published_at")}

Step 3: Deduplicate by URL

Remove duplicate entries that appear across sources using URL as the deduplication key.

Python
def deduplicate(items: list[dict]) -> list[dict]:
    seen = {}
    for item in items:
        url = item.get("url", "")
        if url and url not in seen:
            seen[url] = item
    return list(seen.values())
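Exact URL matching only catches identical strings, but the same article often circulates with tracking parameters or a trailing slash added. A light canonicalization pass before using the URL as the dedup key makes matching more robust. The `canonicalize` helper below is a sketch introduced here, not part of the tutorial's pipeline:

```python
from urllib.parse import urlsplit, urlunsplit

def canonicalize(url: str) -> str:
    """Normalize a URL for dedup: lowercase the host, drop the query string,
    fragment, and any trailing slash."""
    parts = urlsplit(url)
    path = parts.path.rstrip("/")
    return urlunsplit((parts.scheme, parts.netloc.lower(), path, "", ""))

print(canonicalize("https://Example.com/story/?utm_source=reddit"))
# https://example.com/story
```

Calling `canonicalize(item.get("url", ""))` in place of the raw URL inside `deduplicate` would then fold these variants together.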

Step 4: Output the aggregated feed

Print the combined, deduplicated feed grouped by source for easy consumption.

Python
def aggregate(topic: str) -> list[dict]:
    with ThreadPoolExecutor(max_workers=3) as ex:
        g = ex.submit(fetch_google_news, topic)
        r = ex.submit(fetch_reddit, topic)
        y = ex.submit(fetch_youtube, topic)
    items = [normalize_google(i) for i in g.result()[:5]]
    items += [normalize_reddit(i) for i in r.result()[:5]]
    items += [normalize_youtube(i) for i in y.result()[:5]]
    return deduplicate(items)
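To rank the merged feed by a combined relevance score, one simple heuristic scores each item by how many query terms appear in its title, scaled by a per-source weight. The weights and the `score`/`rank` helpers below are illustrative assumptions, not Scavio API features:

```python
# Assumed per-source weights; tune these for your own use case.
SOURCE_WEIGHT = {"google": 1.0, "reddit": 0.9, "youtube": 0.8}

def score(item: dict, topic: str) -> float:
    """Fraction of topic terms found in the title, scaled by source weight."""
    title = (item.get("title") or "").lower()
    terms = topic.lower().split()
    hits = sum(1 for t in terms if t in title)
    return (hits / max(len(terms), 1)) * SOURCE_WEIGHT.get(item.get("source"), 0.5)

def rank(items: list[dict], topic: str) -> list[dict]:
    """Sort the normalized, deduplicated items by descending relevance."""
    return sorted(items, key=lambda i: score(i, topic), reverse=True)
```

Applied to the output of `aggregate`, this pushes items whose titles closely match the topic to the top of the feed.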

Python Example

Python
import os
import requests
from concurrent.futures import ThreadPoolExecutor

API_KEY = os.environ.get("SCAVIO_API_KEY", "your_scavio_api_key")
ENDPOINT = "https://api.scavio.dev/api/v1/search"

def fetch(body: dict) -> dict:
    r = requests.post(ENDPOINT, headers={"x-api-key": API_KEY}, json=body)
    r.raise_for_status()
    return r.json()

def aggregate(topic: str) -> list[dict]:
    with ThreadPoolExecutor(max_workers=3) as ex:
        g = ex.submit(fetch, {"query": f"{topic} news", "country_code": "us"})
        r = ex.submit(fetch, {"platform": "reddit", "query": topic})
        y = ex.submit(fetch, {"platform": "youtube", "query": topic})
    items = []
    for i in (g.result().get("news_results") or g.result().get("organic_results", []))[:5]:
        items.append({"src": "google", "title": i.get("title"), "url": i.get("link")})
    for p in r.result().get("data", {}).get("posts", [])[:5]:
        items.append({"src": "reddit", "title": p.get("title"), "url": p.get("url")})
    for v in y.result().get("videos", [])[:5]:
        items.append({"src": "youtube", "title": v.get("title"), "url": v.get("url")})
    return items

if __name__ == "__main__":
    for item in aggregate("AI agents 2026"):
        print(f"[{item['src']:>7}] {item['title'][:60]}")
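Step 4 describes printing the feed grouped by source; with the flat list `aggregate` returns, grouping is a short post-processing pass with the standard library. This sketch assumes the `src`/`title` keys used in the example above:

```python
from itertools import groupby

def print_grouped(items: list[dict]) -> None:
    """Print items under one header per source (groupby requires sorted input)."""
    for src, group in groupby(sorted(items, key=lambda i: i["src"]),
                              key=lambda i: i["src"]):
        print(f"== {src} ==")
        for item in group:
            print(f"  {item['title']}")

print_grouped([{"src": "google", "title": "Example A"},
               {"src": "reddit", "title": "Example B"}])
```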

JavaScript Example

JavaScript
const API_KEY = process.env.SCAVIO_API_KEY || "your_scavio_api_key";
const ENDPOINT = "https://api.scavio.dev/api/v1/search";

async function call(body) {
  const res = await fetch(ENDPOINT, {
    method: "POST",
    headers: { "x-api-key": API_KEY, "Content-Type": "application/json" },
    body: JSON.stringify(body)
  });
  if (!res.ok) throw new Error(`Scavio request failed: ${res.status}`);
  return res.json();
}

async function aggregate(topic) {
  const [g, r, y] = await Promise.all([
    call({ query: `${topic} news`, country_code: "us" }),
    call({ platform: "reddit", query: topic }),
    call({ platform: "youtube", query: topic })
  ]);
  const items = [];
  (g.news_results || g.organic_results || []).slice(0, 5).forEach(i => items.push({ src: "google", title: i.title }));
  (r.data?.posts || []).slice(0, 5).forEach(p => items.push({ src: "reddit", title: p.title }));
  (y.videos || []).slice(0, 5).forEach(v => items.push({ src: "youtube", title: v.title }));
  return items;
}

aggregate("AI agents 2026").then(items => items.forEach(i => console.log(`[${i.src}] ${i.title}`))).catch(console.error);

Expected Output

Text
[ google] OpenAI Launches Agent Building Platform for Enterprise
[ google] Anthropic Expands Claude Agent Capabilities
[ reddit] Has anyone deployed AI agents in production yet?
[ reddit] Best frameworks for building AI agents in 2026
[youtube] I Built an AI Agent That Runs My Business
[youtube] AI Agents Explained - Complete 2026 Guide

Frequently Asked Questions

How long does this tutorial take?

Most developers complete this tutorial in 15 to 30 minutes. You will need a Scavio API key (free tier works) and a working Python or JavaScript environment.

What do you need to follow along?

Python 3.10 or higher, the requests library, a Scavio API key, and a topic or keywords to aggregate news for. A Scavio API key gives you 500 free credits per month.

Can you complete this tutorial on the free tier?

Yes. The free tier includes 500 credits per month, which is more than enough to complete this tutorial and prototype a working solution.

Does Scavio work with frameworks like LangChain?

Scavio has a native LangChain package (langchain-scavio), an MCP server, and a plain REST API that works with any HTTP client. This tutorial uses the raw REST API, but you can adapt it to your framework of choice.
