The Problem
Building ML training sets for sentiment analysis requires large volumes of labeled news text. Manually collecting and labeling news articles is expensive and slow. Existing news APIs charge per article and lack the structured metadata (source, date, topic) needed for clean training sets. Teams end up with small, biased corpora that do not generalize well across topics and sources.
The Scavio Solution
Build an automated corpus builder using Scavio's Google News search. Query news-related keywords, extract titles and snippets from organic results, and use the structured metadata (source domain, publication date, position) as features. The pipeline produces clean training sets with thousands of labeled examples at search API prices instead of news API prices.
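The core of the pipeline is turning one organic result into one labeled training example. A minimal sketch of that mapping, using a hand-written sample result (field names follow the response shape used in the full examples below; the sample values are purely illustrative):

```python
from datetime import date
from urllib.parse import urlparse

# Illustrative sample of one organic search result. The field names
# (title, snippet, link, position) match the examples below; the values
# here are made up for demonstration.
sample_result = {
    "title": "EU passes new AI rules",
    "snippet": "Lawmakers approved sweeping regulation on Friday.",
    "link": "https://www.example-news.com/eu-ai-rules",
    "position": 1,
}

def to_example(item: dict, topic: str) -> dict:
    """Turn one search result into a labeled corpus entry."""
    return {
        "text": f"{item['title']}. {item.get('snippet', '')}",
        "source": urlparse(item.get("link", "")).netloc,  # source domain feature
        "topic": topic,                                   # query label
        "date_collected": str(date.today()),
        "position": item.get("position"),                 # rank feature
    }

entry = to_example(sample_result, "artificial intelligence regulation")
print(entry["source"])  # www.example-news.com
```

Title and snippet become the text to label, while source domain, topic, and position travel along as structured features.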
Before
A data science team used a dedicated news API at $0.05/article to build a sentiment corpus. Building a 10K article training set cost $500 and took 2 weeks of curation. The corpus was biased toward English-language sources from the news API's limited index.
After
The same team uses Scavio Google News queries to build corpora at $0.005/query. Each query returns 10+ results, so 1K queries ($5) produce a 10K+ snippet corpus. Monthly corpus refresh costs $5 instead of $500. Build time dropped from 2 weeks to 2 hours.
Who It Is For
Data scientists and ML engineers building sentiment analysis models who need affordable, structured news corpora. NLP researchers collecting training data at scale.
Key Benefits
- Build 10K+ snippet corpora for $5 instead of $500 via news APIs
- Structured metadata (source, date, position) included with every result
- Refresh corpora monthly at negligible cost for model retraining
- Google News index covers broader sources than dedicated news APIs
- Reduce corpus build time from weeks to hours with automated pipelines
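The cost comparison above follows from simple arithmetic; a quick sketch (per-article and per-query prices taken from the Before/After sections, results per query assumed to be 10):

```python
NEWS_API_COST_PER_ARTICLE = 0.05  # dedicated news API price (Before section)
SEARCH_COST_PER_QUERY = 0.005     # Scavio price per query (After section)
SNIPPETS_PER_QUERY = 10           # assumed organic results per query

target_corpus_size = 10_000

# Dedicated news API: pay per article.
news_api_cost = target_corpus_size * NEWS_API_COST_PER_ARTICLE

# Search-based pipeline: pay per query, each query yields multiple snippets.
queries_needed = target_corpus_size // SNIPPETS_PER_QUERY
search_cost = queries_needed * SEARCH_COST_PER_QUERY

print(f"News API: ${news_api_cost:.0f}")  # $500
print(f"Scavio:   ${search_cost:.0f}")    # $5
```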
Python Example
import requests
import json
from datetime import date
API_KEY = "your_scavio_api_key"
from urllib.parse import urlparse

def build_corpus(topics: list, results_per_topic: int = 10) -> list:
    corpus = []
    for topic in topics:
        r = requests.post(
            "https://api.scavio.dev/api/v1/search",
            headers={"x-api-key": API_KEY},
            json={"platform": "google", "query": f"{topic} news 2026"},
            timeout=10,
        )
        r.raise_for_status()
        data = r.json()
        for item in data.get("organic", [])[:results_per_topic]:
            corpus.append({
                "text": f"{item['title']}. {item.get('snippet', '')}",
                "source": urlparse(item.get("link", "")).netloc,  # source domain
                "topic": topic,
                "date_collected": str(date.today()),
                "position": item.get("position"),
            })
    return corpus
topics = ["artificial intelligence regulation", "climate tech funding", "semiconductor shortage"]
corpus = build_corpus(topics)
print(f"Corpus size: {len(corpus)} entries")
with open("news_corpus.json", "w") as f:
    json.dump(corpus, f, indent=2)
JavaScript Example
const API_KEY = "your_scavio_api_key";
async function buildCorpus(topics, resultsPerTopic = 10) {
  const corpus = [];
  for (const topic of topics) {
    const res = await fetch("https://api.scavio.dev/api/v1/search", {
      method: "POST",
      headers: { "x-api-key": API_KEY, "content-type": "application/json" },
      body: JSON.stringify({ platform: "google", query: `${topic} news 2026` }),
    });
    if (!res.ok) throw new Error(`Search request failed: ${res.status}`);
    const data = await res.json();
    for (const item of (data.organic || []).slice(0, resultsPerTopic)) {
      corpus.push({
        text: `${item.title}. ${item.snippet || ""}`,
        source: item.link ? new URL(item.link).hostname : "",
        topic,
        dateCollected: new Date().toISOString().split("T")[0],
        position: item.position,
      });
    }
  }
  return corpus;
}
const topics = ["artificial intelligence regulation", "climate tech funding", "semiconductor shortage"];
const corpus = await buildCorpus(topics);
console.log(`Corpus size: ${corpus.length} entries`);
Platforms Used
Web search with knowledge graph, PAA, and AI overviews