Tutorial

How to Build a Cybersecurity News Pipeline with AI

Multi-source pipeline: 9 cybersecurity sources, dedup, AI editor. Pattern from r/IA_Italia's AI-native news publication.

An r/IA_Italia post described an AI-native cybersecurity headline system with 9 sources, similarity-filter dedup, and a daily 11:30 PM recap. This tutorial walks through the cybersecurity-specific pipeline.

Prerequisites

  • Python 3.10+
  • Scavio API key
  • Gemini or Claude API key

Walkthrough

Step 1: Source list

A mix of SERP-driven site queries for the news outlets and direct subreddit pulls.

Python
SOURCES = [
    # SERP-driven: Google News queries scoped to each outlet
    ('google_news', 'site:thehackernews.com 2026'),
    ('google_news', 'site:bleepingcomputer.com 2026'),
    ('google_news', 'site:krebsonsecurity.com'),
    ('google_news', 'site:therecord.media 2026'),
    ('google_news', 'site:wired.com cybersecurity'),
    # Direct subreddit pulls
    ('reddit', 'cybersecurity'),
    ('reddit', 'netsec'),
]

Step 2: Per-source query via Scavio

SERP for news sites; Reddit endpoint for subs.

Python
import requests, os
API_KEY = os.environ['SCAVIO_API_KEY']
H = {'x-api-key': API_KEY}

def pull(kind, q):
    if kind == 'google_news':
        r = requests.post('https://api.scavio.dev/api/v1/search',
                          headers=H, json={'query': q, 'search_type': 'news'},
                          timeout=30)
    elif kind == 'reddit':
        r = requests.post('https://api.scavio.dev/api/v1/reddit/search',
                          headers=H, json={'query': q}, timeout=30)
    else:
        raise ValueError(f'unknown source kind: {kind}')
    r.raise_for_status()  # fail loudly instead of parsing an error body
    return r.json()
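To feed the dedup step, the per-source responses need to be flattened into one item list. A minimal sketch: the `results` field and the item keys here are assumptions about the response shape, not the documented Scavio schema, so adjust to what the API actually returns.

```python
def collect(sources, fetch):
    """Flatten per-source responses into one item list.

    `fetch(kind, q)` returns parsed JSON (e.g. the pull() helper).
    The 'results' field and item keys are assumed response shape.
    """
    items = []
    for kind, q in sources:
        resp = fetch(kind, q) or {}
        for r in resp.get('results', []):
            items.append({'title': r.get('title', ''),
                          'url': r.get('url', ''),
                          'source': kind})
    return items
```

Passing the fetcher in keeps the function testable without network access.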

Step 3: Similarity dedup

Embed titles, drop near-duplicates.

Python
from sentence_transformers import SentenceTransformer, util
model = SentenceTransformer('all-MiniLM-L6-v2')

def dedup(items, threshold=0.85):
    out, kept_embs = [], []
    for i in items:
        emb = model.encode(i['title'])  # encode each title once, not per comparison
        if not any(util.cos_sim(emb, k).item() > threshold for k in kept_embs):
            out.append(i)
            kept_embs.append(emb)
    return out
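If pulling in sentence-transformers is too heavy for a first run, a rough lexical fallback with the stdlib's difflib approximates the same filter. This is an alternative I'm sketching, not part of the original pipeline, and it is weaker than embeddings (it only catches near-identical wording):

```python
from difflib import SequenceMatcher

def dedup_lexical(items, threshold=0.85):
    """Dependency-free near-duplicate filter on title strings."""
    out = []
    for i in items:
        title = i['title'].lower()
        if not any(SequenceMatcher(None, title, o['title'].lower()).ratio() > threshold
                   for o in out):
            out.append(i)
    return out
```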

Step 4: LLM editor with editorial angle

Gemini or Claude composes an article for each item.

Python
def article(item):
    # Gemini call here, returning 300-word article with editorial angle
    pass
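One possible shape of the editor call, sketched against the google-generativeai package. The model name, the prompt wording, and the `GEMINI_API_KEY` env-var name are all assumptions, not the post's actual code:

```python
import os

def build_prompt(item):
    # Pure helper so the prompt shape is testable without a network call
    return ("Write a ~300-word cybersecurity news article with a clear "
            "editorial angle.\n"
            f"Headline: {item['title']}\n"
            f"Source URL: {item.get('url', '')}")

def article(item):
    # Assumes the google-generativeai package and GEMINI_API_KEY in the
    # environment; the model name is illustrative.
    import google.generativeai as genai
    genai.configure(api_key=os.environ['GEMINI_API_KEY'])
    model = genai.GenerativeModel('gemini-1.5-flash')
    return model.generate_content(build_prompt(item)).text
```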

Step 5: Daily 11:30 PM recap

Aggregate the day's published items.

Python
def recap(today_items):
    summary = '\n'.join(f'- {i["title"]}' for i in today_items)
    # LLM composes the daily wrap-up from this bullet list
    return summary

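One way to trigger the 11:30 PM run, sketched with the stdlib only (the helper name is mine; in production a cron entry like `30 23 * * *` does the same job):

```python
import datetime

def seconds_until(hour=23, minute=30):
    """Seconds from now until the next local occurrence of hour:minute."""
    now = datetime.datetime.now()
    target = now.replace(hour=hour, minute=minute, second=0, microsecond=0)
    if target <= now:
        target += datetime.timedelta(days=1)
    return (target - now).total_seconds()
```

A loop can then `time.sleep(seconds_until())` before calling `recap(...)`.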
Python Example

Python
# 6 cron bursts × 7 sources = 42 calls/day ≈ $0.18.
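The driver loop implied by the steps above can be sketched as one function. The fetch, dedup, and publish callables stand in for Steps 2-4 and are injected so the sketch runs without network access; the `results` field is again an assumed response shape:

```python
def run_burst(sources, fetch, dedup_fn, publish):
    """One cron burst: pull every source, dedup, publish each item.

    fetch(kind, q) -> parsed JSON; dedup_fn(items) -> filtered items;
    publish(item) -> published record. 'results' is an assumed field.
    """
    items = []
    for kind, q in sources:
        resp = fetch(kind, q) or {}
        items.extend(resp.get('results', []))
    return [publish(i) for i in dedup_fn(items)]
```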

JavaScript Example

JavaScript
// Same pipeline in TypeScript: fetch() against the same endpoints with the same JSON bodies.

Expected Output

20-30 published articles per day across cybersecurity sources, deduped and edited, plus a daily recap at 11:30 PM.

Frequently Asked Questions

How long does this tutorial take?

Most developers complete this tutorial in 15 to 30 minutes. You will need a Scavio API key (free tier works) and a working Python or JavaScript environment.

What do I need before starting?

Python 3.10+, a Scavio API key, and a Gemini or Claude API key. A Scavio API key gives you 500 free credits per month.

Can I complete this on the free tier?

Yes. The free tier includes 500 credits per month, which is more than enough to complete this tutorial and prototype a working solution.

Does Scavio integrate with agent frameworks?

Scavio has a native LangChain package (langchain-scavio), an MCP server, and a plain REST API that works with any HTTP client. This tutorial uses the raw REST API, but you can adapt it to your framework of choice.

Start Building
