An r/IA_Italia post described an AI-native cybersecurity headline system with 9 sources, similarity-filter dedup, and a daily 11:30 PM recap. This tutorial walks through the cybersecurity-specific pipeline, built here around 7 of those sources.
Prerequisites
- Python 3.10+
- Scavio API key
- Gemini or Claude API key
- requests and sentence-transformers packages (used in the code below)
Walkthrough
Step 1: Source list
A mix of SERP-driven, site-restricted news queries and direct subreddit queries.
SOURCES = [
    ('google_news', 'site:thehackernews.com 2026'),
    ('google_news', 'site:bleepingcomputer.com 2026'),
    ('google_news', 'site:krebsonsecurity.com'),
    ('google_news', 'site:therecord.media 2026'),
    ('google_news', 'site:wired.com cybersecurity'),
    ('reddit', 'cybersecurity'),
    ('reddit', 'netsec'),
]
Step 2: Per-source query via Scavio
News sites go through the SERP endpoint; subreddits go through the Reddit search endpoint.
import requests, os

API_KEY = os.environ['SCAVIO_API_KEY']
H = {'x-api-key': API_KEY}

def pull(kind, q):
    # News sites: SERP search restricted to news results
    if kind == 'google_news':
        return requests.post('https://api.scavio.dev/api/v1/search',
                             headers=H,
                             json={'query': q, 'search_type': 'news'}).json()
    # Subreddits: dedicated Reddit search endpoint
    elif kind == 'reddit':
        return requests.post('https://api.scavio.dev/api/v1/reddit/search',
                             headers=H, json={'query': q}).json()
    raise ValueError(f'unknown source kind: {kind}')
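A quick smoke test. The response shape (a top-level results list of objects with a title field) is an assumption about the Scavio payload, not documented behavior:

resp = pull('google_news', 'site:thehackernews.com 2026')
for hit in resp.get('results', []):  # 'results' key is assumed
    print(hit.get('title'))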
Step 3: Similarity dedup
Embed each title and drop near-duplicates above a cosine-similarity threshold.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer('all-MiniLM-L6-v2')

def dedup(items):
    out, kept_embs = [], []
    for i in items:
        emb = model.encode(i['title'])  # encode each title once, not once per comparison
        if not any(util.cos_sim(emb, k).item() > 0.85 for k in kept_embs):
            out.append(i)
            kept_embs.append(emb)
    return out
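Usage sketch; the 0.85 threshold comes from the code above and is the main tuning knob (higher keeps more near-duplicates, lower collapses more aggressively):

unique = dedup(items)  # items: dicts with at least a 'title' key
print(f'{len(items)} pulled -> {len(unique)} unique')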
Step 4: LLM editor with editorial angle
Gemini or Claude composes a short article per item.
def article(item):
    # Gemini call here, returning a 300-word article with an editorial angle
    pass
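A minimal sketch of the Gemini variant via the google-generativeai SDK. The model name, environment variable, and prompt wording are assumptions, not the original post's choices; a Claude call through the anthropic SDK would slot in the same way.

import os
import google.generativeai as genai

genai.configure(api_key=os.environ['GEMINI_API_KEY'])  # env var name assumed
llm = genai.GenerativeModel('gemini-1.5-flash')        # model choice assumed

def article(item):
    prompt = ('Write a ~300-word cybersecurity article with a clear editorial '
              f'angle based on this headline: {item["title"]}')
    return llm.generate_content(prompt).text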
Step 5: Daily 11:30 PM recap
Aggregate the day's published items into a single wrap-up.
def recap(today_items):
    summary = '\n'.join(f'- {i["title"]}' for i in today_items)
    # LLM composes the daily wrap-up from `summary`
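Scheduling the recap is one crontab line; recap.py is a hypothetical entry point that gathers the day's items and calls recap():

# m  h   dom mon dow  command
30 23 * * * /usr/bin/python3 /opt/pipeline/recap.py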
Python Example
# 6 cron bursts × 7 sources = 42 calls/day ≈ $0.18.
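Tying the steps together, a minimal sketch of one ingestion burst that cron would run six times a day. It assumes the helpers above and the same assumed 'results' response shape; publish() is a hypothetical hook to whatever CMS the articles land in:

def burst():
    items = []
    for kind, q in SOURCES:
        resp = pull(kind, q)
        items.extend(resp.get('results', []))  # 'results' key is assumed
    for item in dedup(items):
        publish(article(item))  # publish() is a hypothetical CMS hook

if __name__ == '__main__':
    burst()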
JavaScript Example
// Same in TS.
Expected Output
20-30 published articles per day across the cybersecurity sources, deduped and edited, plus the daily recap at 11:30 PM.