Tutorial

How to Build a Multi-Source News Aggregation Agent

Pipeline that runs 6 query bursts/day across 9 sources, dedupes, and emits AI-edited articles. Pattern from r/IA_Italia's cybersecurity build.

An r/IA_Italia post documented a multi-source AI-native cybersecurity news pipeline: 6 cron bursts/day, 9 sources, similarity-filter dedup, Gemini editor. This tutorial reconstructs the pattern using Scavio.

Prerequisites

  • Python 3.10+
  • Scavio API key
  • An LLM API key (the original build used Gemini; the snippets below use Anthropic's Claude)

Walkthrough

Step 1: Cron triggers (6 bursts/day)

Spread across the day to capture fresh news.

Bash
# Edit your crontab with `crontab -e`, then add this line (six bursts across the day):
0 6,10,12,15,18,21 * * * /usr/bin/python pipeline.py
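Each burst runs the pipeline once, end to end. Here is a minimal pipeline.py orchestrator sketch, assuming the step functions from Steps 2-6 plus a hypothetical load_published_today() archive helper (sketched under Step 2); the 'results' key on the search response is also an assumption, not confirmed Scavio output.

Python
# pipeline.py: one burst, end to end.
def run_burst():
    published = load_published_today()            # today's archive (hypothetical helper)
    covered = [p['title'] for p in published]
    items = []
    for q in queries(covered):                    # Step 2: 6 fresh query phrases
        items += news(q).get('results', [])       # Step 3: SERP per query (key is assumed)
    for item in dedup(items, published):          # Step 4: drop near-duplicates
        publish(article(item, related=covered))   # Steps 5-6: edit, then publish

if __name__ == '__main__':
    run_burst()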

Step 2: LLM generates 6 query phrases

Avoid topics already covered today.

Python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

def queries(covered):
    # Ask for 6 fresh query phrases, one per line, steering away from today's topics.
    msg = client.messages.create(model='claude-sonnet-4-6', max_tokens=300,
        messages=[{'role': 'user', 'content':
            f'Generate 6 news search queries about cybersecurity, one per line. Avoid: {covered}'}])
    return [q.strip() for q in msg.content[0].text.split('\n') if q.strip()]
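The covered list has to come from somewhere. A sketch of the hypothetical load_published_today() archive helper, here backed by a per-day JSON file, and how it feeds queries():

Python
import json, datetime, pathlib

def load_published_today():
    # Hypothetical archive: one JSON file per day listing the items published so far.
    path = pathlib.Path(f'published-{datetime.date.today()}.json')
    return json.loads(path.read_text()) if path.exists() else []

covered = [p['title'] for p in load_published_today()]
phrases = queries(covered)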

Step 3: Parallel SERP across surfaces

Scavio search returns Google News + organic.

Python
import requests, os

API_KEY = os.environ['SCAVIO_API_KEY']

def news(q):
    # One SERP call per query phrase; search_type 'news' covers Google News + organic.
    resp = requests.post('https://api.scavio.dev/api/v1/search',
        headers={'x-api-key': API_KEY},
        json={'query': q, 'search_type': 'news'},
        timeout=30)
    resp.raise_for_status()
    return resp.json()
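requests is blocking, so to make this step actually parallel one option is a small thread pool over the 6 phrases (a sketch, not part of any Scavio client):

Python
from concurrent.futures import ThreadPoolExecutor

def news_parallel(phrases):
    # One thread per query phrase is plenty for 6 blocking HTTP calls.
    with ThreadPoolExecutor(max_workers=6) as pool:
        return list(pool.map(news, phrases))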

Step 4: Similarity-filter dedup

Embed titles, drop near-duplicates against today's published set.

Python
# Runs locally with sentence-transformers; a hosted embedding API works the same way.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer('all-MiniLM-L6-v2')

def dedup(new_items, published_today, threshold=0.85):
    # Drop any new item whose title is near-identical to something already published.
    if not published_today:
        return new_items
    new_emb = model.encode([i['title'] for i in new_items], convert_to_tensor=True)
    pub_emb = model.encode([p['title'] for p in published_today], convert_to_tensor=True)
    sims = util.cos_sim(new_emb, pub_emb)  # (len(new), len(published)) cosine matrix
    return [item for item, row in zip(new_items, sims) if row.max().item() < threshold]
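Usage is a single call, assuming items holds this burst's merged SERP results:

Python
fresh = dedup(items, published_today)  # default threshold 0.85
# Lower the threshold to drop near-duplicates more aggressively.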

Step 5: LLM edits article with editorial angle

Internal links + SEO metadata.

Python
def article(item, related):
    # Single editing pass: 300-word article with an editorial angle and internal links.
    msg = client.messages.create(model='claude-sonnet-4-6', max_tokens=600,
        messages=[{'role': 'user', 'content':
            f'Write a 300-word news article about {item["title"]} with an editorial angle. '
            f'Source: {item["link"]}. Link related coverage: {related}.'}])
    return msg.content[0].text
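The snippet above covers the editorial angle and internal links but not the SEO metadata. One variant, as a sketch, asks the model for a JSON envelope; the key names here are assumptions, not an Anthropic or Scavio contract:

Python
import json

def article_with_seo(item, related):
    # Sketch: same edit step, but the response carries SEO fields too.
    msg = client.messages.create(model='claude-sonnet-4-6', max_tokens=800,
        messages=[{'role': 'user', 'content':
            f'Write a 300-word news article about {item["title"]} with an editorial angle. '
            f'Source: {item["link"]}. Link related coverage: {related}. '
            'Return only JSON with keys "title", "meta_description", "body".'}])
    return json.loads(msg.content[0].text)  # models sometimes wrap JSON; add a retry if needed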

Step 6: Throttle publication cadence

Avoid flooding the site.

Python
import time

for article_text in articles:
    publish(article_text)   # your CMS hook (a stub is sketched below)
    time.sleep(60 * 5)      # 5 minutes between posts
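publish() is whatever your CMS exposes. A hypothetical stub against a placeholder endpoint:

Python
import requests

def publish(article_text):
    # Placeholder endpoint and payload: swap in your CMS's real API and auth.
    resp = requests.post('https://example.com/api/posts',
                         json={'body': article_text}, timeout=30)
    resp.raise_for_status()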

Python Example

Python
# See steps above. Daily run cost: 6 bursts × 6 queries = 36 credits = $0.15.

JavaScript Example

JavaScript
// Same pattern in JavaScript or TypeScript: fetch against the Scavio REST API,
// and the Vercel AI SDK or any LLM client for the edit step.

Expected Output

About 20-30 published articles per day, deduplicated, each with an editorial angle, plus a daily 11:30 PM recap covering the last 24 hours.

Frequently Asked Questions

How long does this tutorial take?

Most developers complete this tutorial in 15 to 30 minutes. You will need a Scavio API key (free tier works) and a working Python or JavaScript environment.

What are the prerequisites?

Python 3.10+, a Scavio API key, and an LLM API key (Gemini or any other provider). The Scavio key comes with 500 free credits per month.

Can I do this on the free tier?

Yes. The free tier includes 500 credits per month, which is more than enough to complete this tutorial and prototype a working solution.

Does Scavio integrate with agent frameworks?

Scavio has a native LangChain package (langchain-scavio), an MCP server, and a plain REST API that works with any HTTP client. This tutorial uses the raw REST API, but you can adapt it to your framework of choice.
