An r/IA_Italia post documented a multi-source AI-native cybersecurity news pipeline: 6 cron bursts/day, 9 sources, similarity-filter dedup, Gemini editor. This tutorial reconstructs the pattern using Scavio.
Prerequisites
- Python 3.10+
- Scavio API key
- Gemini or any LLM API key
Walkthrough
Step 1: Cron triggers (6 bursts/day)
Spread across the day to capture fresh news.
# crontab -e
0 6,10,12,15,18,21 * * * /usr/bin/python pipeline.py
Step 2: LLM generates 6 query phrases
Avoid topics already covered today.
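A minimal sketch of preparing the prompt and tidying the response. Both helpers (`build_prompt`, `clean_queries`) are hypothetical names that do not appear in the original pipeline. It assumes the model is asked for one query per line and may still return numbered or bulleted lines.

```python
import re

def build_prompt(covered_titles, n=6):
    """Build the query-generation prompt, listing today's covered titles."""
    covered = '; '.join(covered_titles) if covered_titles else 'nothing yet'
    return (f'Generate {n} news queries about cybersecurity, one per line. '
            f'Avoid: {covered}')

def clean_queries(raw_text, limit=6):
    """Turn raw LLM output into a list of plain, unique query strings."""
    cleaned = []
    for line in raw_text.splitlines():
        # Remove a leading bullet ("-", "*", "•") or numbering ("1.", "2)").
        line = re.sub(r'^\s*(?:[-*•]|\d+[.)])\s*', '', line).strip()
        # Remove wrapping quotes that a model may add around each phrase.
        line = line.strip('"\'')
        if line and line not in cleaned:
            cleaned.append(line)
    return cleaned[:limit]
```

Passing the cleaned list to the search step avoids spending credits on blank or repeated queries.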
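import anthropic
client = anthropic.Anthropic()
def queries(covered):
    # Ask the model for six fresh queries, steering it away from covered topics.
    msg = client.messages.create(model='claude-sonnet-4-6', max_tokens=300,
        messages=[{'role':'user','content':f'Generate 6 news queries about cybersecurity, one per line. Avoid: {covered}'}])
    # Keep non-empty lines only; the model may put blank lines between queries.
    return [q.strip() for q in msg.content[0].text.splitlines() if q.strip()]
Step 3: Parallel SERP across surfaces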
Scavio search returns Google News + organic.
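The step title mentions parallel calls, so here is one way to fan a burst's queries out over a thread pool. This is a sketch: `fetch_all` is a hypothetical helper, and `fetch` stands for any function that takes a query string and returns a result, such as the `news` helper defined in this step.

```python
from concurrent.futures import ThreadPoolExecutor

def fetch_all(queries, fetch, max_workers=6):
    """Run fetch(query) concurrently; return {query: result or None}."""
    def safe_fetch(q):
        try:
            return fetch(q)
        except Exception as exc:
            # One failed query should not sink the whole burst.
            print(f'query failed: {q!r}: {exc}')
            return None
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order, so results line up with queries.
        results = list(pool.map(safe_fetch, queries))
    return dict(zip(queries, results))
```

Threads suit this step because each call spends most of its time waiting on the network. A call such as `fetch_all(query_list, news)` would return one entry per query, with `None` marking the ones that failed.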
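import requests, os
API_KEY = os.environ['SCAVIO_API_KEY']
def news(q):
    # POST one query to the Scavio search endpoint and return the parsed JSON.
    resp = requests.post('https://api.scavio.dev/api/v1/search',
        headers={'x-api-key': API_KEY},
        json={'query': q, 'search_type': 'news'}, timeout=30)
    resp.raise_for_status()  # fail on HTTP errors instead of parsing an error body
    return resp.json()
Step 4: Similarity-filter dedup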
Embed titles, drop near-duplicates against today's published set.
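To make the threshold concrete before wiring in a real embedding model, the filter can be sketched in plain Python on toy vectors. The names `cosine` and `filter_new` are illustrative. Comparing each kept item against later items in the same batch is an added assumption, since several queries may surface the same story.

```python
import math

def cosine(a, b):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def filter_new(new_vecs, published_vecs, threshold=0.85):
    """Return indices of new vectors that are not near-duplicates."""
    seen = list(published_vecs)
    kept = []
    for i, vec in enumerate(new_vecs):
        # Keep the item only if it is below the threshold against everything seen.
        if all(cosine(vec, s) < threshold for s in seen):
            kept.append(i)
            seen.append(vec)  # later items in this batch are compared to it too
    return kept
```

With real embeddings the vectors have hundreds of dimensions, but the comparison is the same. A lower threshold drops more items, so 0.85 is a starting point to tune against your own titles.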
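# Uses sentence-transformers; a hosted embedding API would work the same way.
from sentence_transformers import SentenceTransformer, util
model = SentenceTransformer('all-MiniLM-L6-v2')
def dedup(new_items, published_today, threshold=0.85):
    # Nothing new, or nothing to compare against: keep everything.
    if not new_items or not published_today:
        return list(new_items)
    new_emb = model.encode([i['title'] for i in new_items])
    pub_emb = model.encode([p['title'] for p in published_today])
    # Rows are new items, columns are published items.
    sims = util.cos_sim(new_emb, pub_emb)
    # Drop a new item if any published title scores at or above the threshold.
    return [item for item, row in zip(new_items, sims) if row.max().item() < threshold]
Step 5: LLM edits article with editorial angle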
The edit step adds an editorial angle, plus internal links and SEO metadata.
def article(item, related):
    # Draft a short article that cites the source and mentions related posts.
    msg = client.messages.create(model='claude-sonnet-4-6', max_tokens=600,
        messages=[{'role':'user','content':f'Write a 300-word news article about {item["title"]} with an editorial angle. Source: {item["link"]}. Related: {related}.'}])
    return msg.content[0].text
Step 6: Throttle publication cadence
Avoid flooding the site.
import time
# `articles` and `publish` come from your own pipeline and CMS integration.
for article_text in articles:
    publish(article_text)
    time.sleep(60 * 5)  # 5 min between posts
Python Example
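One way to tie the six steps together is a single burst function that receives each stage as a callable. This is a sketch under assumptions: `run_burst` and its parameter names are hypothetical, `search` is expected to return a list of dicts with a `title` key (an adapter around the raw search response, whose shape is not shown in this tutorial), and `sleep` is injectable so the gap can be tested without waiting.

```python
import time

def run_burst(make_queries, search, dedup, write, publish,
              published_today, gap_seconds=300, sleep=time.sleep):
    """Run one burst: query, search, dedup, write, publish with a gap."""
    # Steps 2-3: ask for queries, then collect candidate items from each search.
    covered = [p['title'] for p in published_today]
    candidates = []
    for q in make_queries(covered):
        candidates.extend(search(q))
    # Step 4: drop items too similar to what was already published today.
    fresh = dedup(candidates, published_today)
    # Steps 5-6: write and publish, pausing between posts but not after the last.
    for n, item in enumerate(fresh):
        if n:
            sleep(gap_seconds)
        publish(write(item))
        published_today.append(item)
    return fresh
```

Skipping the pause after the final post lets the cron job exit sooner, and appending to `published_today` keeps the covered list current for the next burst.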
# See steps above. Daily run cost: 6 bursts × 6 queries = 36 credits = $0.15.
JavaScript Example
// Same pattern in TS. Use the Vercel AI SDK or any LLM client for the edit step.
Expected Output
Expect about 20-30 published articles per day, deduplicated and written with an editorial angle, plus a daily recap at 11:30 PM covering the last 24 hours.
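The recap is not shown in the steps, so here is a minimal sketch of selecting its input. It assumes each published item carries a `published_at` datetime, which is a hypothetical field; `recap_items` is likewise an illustrative name.

```python
from datetime import datetime, timedelta

def recap_items(published, now, window_hours=24):
    """Return items published within the last window_hours, newest first."""
    cutoff = now - timedelta(hours=window_hours)
    recent = [p for p in published if cutoff <= p['published_at'] <= now]
    return sorted(recent, key=lambda p: p['published_at'], reverse=True)
```

The selected titles could then be passed to the same LLM client for a summary, triggered by a separate cron entry at 23:30.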