Building an AI-Native News Publication in 2026
6 cron bursts/day, 9 sources, similarity-filter dedup, Gemini editor, daily 11:30 PM recap. Pattern from r/IA_Italia.
An r/IA_Italia post documented an AI-native cybersecurity news publication: 6 cron-triggered query bursts/day, 9 sources, similarity-filter dedup, Gemini editor, throttled publication, and a daily 11:30 PM recap. Six minutes from query to published article, no human touch. The pattern is portable.
Why six bursts per day
Cybersecurity news breaks throughout the day. A single daily run misses the morning U.S. stories, the afternoon European announcements, and the late-evening incident disclosures. Six bursts at 6 AM, 10 AM, 12 PM, 3 PM, 6 PM, and 9 PM cover those windows reasonably without overwhelming the publication cadence.
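One way to wire those six bursts, assuming a standard Unix cron environment (the `/opt/newsbot/burst.py` path is a hypothetical entry point):

```cron
# Six daily bursts: 6 AM, 10 AM, 12 PM, 3 PM, 6 PM, 9 PM (server local time)
0 6,10,12,15,18,21 * * * /usr/bin/python3 /opt/newsbot/burst.py
```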
Source design
Mix SERP-scoped queries (site:thehackernews.com 2026, site:bleepingcomputer.com 2026, site:krebsonsecurity.com, site:therecord.media 2026) with Reddit endpoints (r/cybersecurity, r/netsec). The SERP layer covers established publishers; Reddit catches stories that break in community threads first.
```python
import os

import requests

API_KEY = os.environ['SCAVIO_API_KEY']
H = {'x-api-key': API_KEY}

SOURCES = [
    'site:thehackernews.com 2026',
    'site:bleepingcomputer.com 2026',
    'site:krebsonsecurity.com',
    'site:therecord.media 2026',
    'site:wired.com cybersecurity 2026',
]

def burst():
    """Run one query burst: fan out across SOURCES, collect news results."""
    items = []
    for q in SOURCES:
        r = requests.post(
            'https://api.scavio.dev/api/v1/search',
            headers=H,
            json={'query': q, 'search_type': 'news'},
            timeout=30,  # don't let one slow source stall the whole burst
        ).json()
        items += r.get('news_results', [])
    return items
```

The query-generation step
Each burst starts with an LLM that generates 6 fresh queries based on what hasn't been covered in the previous bursts. The avoid-list passed to the LLM keeps coverage diverse rather than repeating the same five stories all day.
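A minimal sketch of that prompt construction — the wording and the query count are illustrative, not the post's exact prompt:

```python
def build_query_prompt(avoid_titles, n_queries=6):
    """Prompt asking the LLM for fresh search queries, steering it
    away from stories already covered earlier in the day."""
    avoid = '\n'.join(f'- {t}' for t in avoid_titles) or '(nothing yet)'
    return (
        f'Generate {n_queries} cybersecurity news search queries for today.\n'
        'Do NOT generate queries about these already-covered stories:\n'
        f'{avoid}\n'
        'Return one query per line.'
    )
```

The avoid-list grows across the day, so the 9 PM burst is pushed toward whatever the morning bursts missed.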
The dedup step
Embed each candidate title with a small embedding model (Sentence Transformers all-MiniLM-L6-v2 is sufficient). Drop any title with cosine similarity above 0.85 against today's already-published items. The threshold catches near-duplicates without being so strict that legitimate follow-ups get killed.
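A sketch of the filter with the embedding function left pluggable — in the post it would be all-MiniLM-L6-v2 via sentence-transformers; here the cosine math is shown with plain NumPy:

```python
import numpy as np

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def dedup(candidates, published, embed, threshold=0.85):
    """Drop candidate titles whose embedding sits above `threshold`
    cosine similarity against any already-published title."""
    pub_vecs = [embed(t) for t in published]
    kept = []
    for title in candidates:
        v = embed(title)
        if all(cosine(v, p) < threshold for p in pub_vecs):
            kept.append(title)
    return kept
```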
The editor step
Each surviving item goes to Gemini or Claude. Prompt: "Write a 300-word news article about this story with a specific editorial angle, not just a summary. Add internal links to related articles already on the site." The internal-links part requires a small lookup against your published index.
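The lookup-plus-prompt step might look like this — the related-articles pairs come from your published index, and the exact wording is an illustration of the prompt quoted above:

```python
def build_editor_prompt(item, related):
    """item: dict with 'title' and optional 'snippet'; related: list of
    (title, url) pairs from the published-article index."""
    links = '\n'.join(f'- {t}: {u}' for t, u in related) or '(none)'
    return (
        'Write a 300-word news article about this story with a specific '
        'editorial angle, not just a summary.\n'
        f"Story: {item['title']}\n"
        f"Context: {item.get('snippet', '')}\n"
        'Add internal links where relevant, chosen from:\n'
        f'{links}'
    )
```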
The publication-cadence throttle
Don't publish 30 articles in five minutes. Spread them out at 5-minute intervals so the site reads like a newsroom instead of a bot flood. Readers and crawlers respond better to a steady cadence, and RSS subscribers get a more useful feed.
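The throttle reduces to assigning each article a publish timestamp; a minimal sketch:

```python
from datetime import datetime, timedelta

def schedule_publications(articles, start, interval_minutes=5):
    """Assign each article a publish time at fixed intervals so a
    burst's output trickles out instead of landing all at once."""
    return [
        (article, start + timedelta(minutes=i * interval_minutes))
        for i, article in enumerate(articles)
    ]
```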
The 11:30 PM daily recap
Aggregate the day's published items into a single recap article. The recap is its own SEO surface and gives readers who missed the day a single entry point. The same pipeline pulls the day's items from the database and asks Gemini to compose the recap.
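A sketch of the database pull and recap prompt, assuming a SQLite store with an `articles` table (`title`, `published_date`) — the schema is hypothetical:

```python
import sqlite3

def build_recap_prompt(db_path, day):
    """Pull the day's published headlines and build the recap prompt
    to hand to the editor model."""
    conn = sqlite3.connect(db_path)
    rows = conn.execute(
        'SELECT title FROM articles WHERE published_date = ?', (day,)
    ).fetchall()
    conn.close()
    headlines = '\n'.join(f'- {t}' for (t,) in rows)
    return (
        f'Write a daily recap article covering these {len(rows)} stories:\n'
        f'{headlines}'
    )
```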
Cost math
6 bursts × 6 generated queries × 1 Scavio call each = 36 Scavio credits/day = $0.15 at the Project tier. Plus LLM token spend for query generation, editorial composition, and the daily recap — typically $5-15/mo at this scale on Gemini or Claude. Total API and LLM spend: under $25/mo.
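The back-of-envelope arithmetic, with the per-credit rate inferred from the post's $0.15/day figure (an assumption, not published pricing):

```python
bursts_per_day = 6
queries_per_burst = 6
credits_per_day = bursts_per_day * queries_per_burst  # 36
credits_per_month = credits_per_day * 30              # 1080
usd_per_day = 0.15                                    # figure from the post
usd_per_credit = usd_per_day / credits_per_day        # ~$0.004 (inferred)
```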
What this isn't
Not a replacement for original investigative reporting. Not a replacement for analysis pieces that require subject-matter expertise. The pattern works for news aggregation with editorial angle — the kind of coverage that fills 70% of a vertical news site.
Generalizing the pattern
Replace cybersecurity sources with sources for any vertical: AI research, climate tech, gaming, biotech, regulatory updates. The pipeline shape stays identical. Only the source list and the editorial-angle prompt change.