Building an AI-Native News Publication in 2026
6 cron bursts/day, 9 sources, similarity-filter dedup, Gemini editor, daily 11:30 PM recap. Pattern from r/IA_Italia.
An r/IA_Italia post documented an AI-native cybersecurity news publication: 6 cron-triggered query bursts/day, 9 sources, similarity-filter dedup, Gemini editor, throttled publication, and a daily 11:30 PM recap. Six minutes from query to published article, no human touch. The pattern is portable.
Why six bursts per day
Cybersecurity news breaks throughout the day. A single daily run misses the morning U.S. stories, the afternoon European announcements, and the late-evening incident disclosures. Six bursts at 6 AM, 10 AM, 12 PM, 3 PM, 6 PM, and 9 PM cover those windows reasonably without overwhelming the publication cadence.
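One way to wire those six bursts, assuming a standard Unix cron environment (the `/opt/newsbot/burst.py` path is a hypothetical entry point):

```cron
# Six daily bursts: 6 AM, 10 AM, 12 PM, 3 PM, 6 PM, 9 PM (server local time)
0 6,10,12,15,18,21 * * * /usr/bin/python3 /opt/newsbot/burst.py
```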
Source design
Mix SERP-scoped queries (site:thehackernews.com 2026, site:bleepingcomputer.com 2026, site:krebsonsecurity.com, site:therecord.media 2026) with Reddit endpoints (r/cybersecurity, r/netsec). The SERP layer covers established publishers; Reddit catches stories that break in community threads first.
```python
import os

import requests

API_KEY = os.environ['SCAVIO_API_KEY']
H = {'x-api-key': API_KEY}

SOURCES = [
    'site:thehackernews.com 2026',
    'site:bleepingcomputer.com 2026',
    'site:krebsonsecurity.com',
    'site:therecord.media 2026',
    'site:wired.com cybersecurity 2026',
]

def burst():
    """Run one query burst: fan out across SOURCES, collect news results."""
    items = []
    for q in SOURCES:
        r = requests.post(
            'https://api.scavio.dev/api/v1/search',
            headers=H,
            json={'query': q, 'search_type': 'news'},
            timeout=30,  # don't let one slow source stall the whole burst
        ).json()
        items += r.get('news_results', [])
    return items
```

The query-generation step
Each burst starts with an LLM that generates 6 fresh queries based on what hasn't been covered in the previous bursts. The avoid-list passed to the LLM keeps coverage diverse rather than repeating the same five stories all day.
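A minimal sketch of that prompt construction — the wording and the query count are illustrative, not the post's exact prompt:

```python
def build_query_prompt(avoid_titles, n_queries=6):
    """Prompt asking the LLM for fresh search queries, steering it
    away from stories already covered earlier in the day."""
    avoid = '\n'.join(f'- {t}' for t in avoid_titles) or '(nothing yet)'
    return (
        f'Generate {n_queries} cybersecurity news search queries for today.\n'
        'Do NOT generate queries about these already-covered stories:\n'
        f'{avoid}\n'
        'Return one query per line.'
    )
```

The avoid-list grows across the day, so the 9 PM burst is pushed toward whatever the morning bursts missed.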
The dedup step
Embed each candidate title with a small embedding model (Sentence Transformers all-MiniLM-L6-v2 is sufficient). Drop any title with cosine similarity above 0.85 against today's already-published items. The threshold catches near-duplicates without being so strict that legitimate follow-ups get killed.
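A sketch of the filter with the embedding function left pluggable — in the post it would be all-MiniLM-L6-v2 via sentence-transformers; here the cosine math is shown with plain NumPy:

```python
import numpy as np

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def dedup(candidates, published, embed, threshold=0.85):
    """Drop candidate titles whose embedding sits above `threshold`
    cosine similarity against any already-published title."""
    pub_vecs = [embed(t) for t in published]
    kept = []
    for title in candidates:
        v = embed(title)
        if all(cosine(v, p) < threshold for p in pub_vecs):
            kept.append(title)
    return kept
```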
The editor step
Each surviving item goes to Gemini or Claude. Prompt: "Write a 300-word news article about this story with a specific editorial angle, not just a summary. Add internal links to related articles already on the site." The internal-links part requires a small lookup against your published index.
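The lookup-plus-prompt step might look like this — the related-articles pairs come from your published index, and the exact wording is an illustration of the prompt quoted above:

```python
def build_editor_prompt(item, related):
    """item: dict with 'title' and optional 'snippet'; related: list of
    (title, url) pairs from the published-article index."""
    links = '\n'.join(f'- {t}: {u}' for t, u in related) or '(none)'
    return (
        'Write a 300-word news article about this story with a specific '
        'editorial angle, not just a summary.\n'
        f"Story: {item['title']}\n"
        f"Context: {item.get('snippet', '')}\n"
        'Add internal links where relevant, chosen from:\n'
        f'{links}'
    )
```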
The publication-cadence throttle
Don't publish 30 articles in five minutes. Spread them out at 5-minute intervals so the site reads like a newsroom instead of a bot flood. Readers and crawlers respond better to a steady cadence, and RSS subscribers get a more useful feed.
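The throttle reduces to assigning each article a publish timestamp; a minimal sketch:

```python
from datetime import datetime, timedelta

def schedule_publications(articles, start, interval_minutes=5):
    """Assign each article a publish time at fixed intervals so a
    burst's output trickles out instead of landing all at once."""
    return [
        (article, start + timedelta(minutes=i * interval_minutes))
        for i, article in enumerate(articles)
    ]
```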
The 11:30 PM daily recap
Aggregate the day's published items into a single recap article. The recap is its own SEO surface and gives readers who missed the day a single entry point. The same pipeline pulls the day's items from the database and asks Gemini to compose the recap.
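A sketch of the database pull and recap prompt, assuming a SQLite store with an `articles` table (`title`, `published_date`) — the schema is hypothetical:

```python
import sqlite3

def build_recap_prompt(db_path, day):
    """Pull the day's published headlines and build the recap prompt
    to hand to the editor model."""
    conn = sqlite3.connect(db_path)
    rows = conn.execute(
        'SELECT title FROM articles WHERE published_date = ?', (day,)
    ).fetchall()
    conn.close()
    headlines = '\n'.join(f'- {t}' for (t,) in rows)
    return (
        f'Write a daily recap article covering these {len(rows)} stories:\n'
        f'{headlines}'
    )
```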
Cost math
6 bursts × 6 generated queries × 1 Scavio call each = 36 Scavio credits/day = $0.15 at the Project tier. Plus LLM token spend for query generation, editorial composition, and the daily recap — typically $5-15/mo at this scale on Gemini or Claude. Total API and LLM spend: under $25/mo.
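The back-of-envelope arithmetic, with the per-credit rate inferred from the post's $0.15/day figure (an assumption, not published pricing):

```python
bursts_per_day = 6
queries_per_burst = 6
credits_per_day = bursts_per_day * queries_per_burst  # 36
credits_per_month = credits_per_day * 30              # 1080
usd_per_day = 0.15                                    # figure from the post
usd_per_credit = usd_per_day / credits_per_day        # ~$0.004 (inferred)
```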
What this isn't
Not a replacement for original investigative reporting. Not a replacement for analysis pieces that require subject-matter expertise. The pattern works for news aggregation with editorial angle — the kind of coverage that fills 70% of a vertical news site.
Generalizing the pattern
Replace cybersecurity sources with sources for any vertical: AI research, climate tech, gaming, biotech, regulatory updates. The pipeline shape stays identical. Only the source list and the editorial-angle prompt change.