
Reverse-Engineering Google Finance for Traders

Skip the proxy bills. SERP plus extract plus Reddit plus news gets you a daily 50-ticker brief operation for under $80/mo.

6 min read

A r/webscraping post on reverse-engineering Google Finance got 103 upvotes and 20 comments. Most replies were variants of "don't scrape, use the official APIs." The honest middle path: SERP plus extract endpoints get the same data shape with no proxy bills and no broken selectors.

What Google Finance Actually Returns

Per-ticker pages with the current price, recent news, a fundamentals card, and a similar-companies section. The HTML is JS-rendered with rotating class names, which makes direct scraping a moving target. Most of the same data is also retrievable via SERP queries scoped to site:google.com/finance.

Why Direct Scraping Fails

Google rotates class names on a cadence shorter than most scraper maintenance cycles. Cloudflare and Google's own bot mitigation flag aggressive scrapers within hours. The maintenance overhead outweighs the data value for most use cases.

The Indirect Pattern

Use SERP queries scoped to Google Finance pages. The SERP API returns the page metadata (title, snippet, link), and the snippet often contains the current price. For deeper data, follow the link with an extract call that returns the rendered markdown.

Python
import os, requests

API_KEY = os.environ["SCAVIO_API_KEY"]
H = {"x-api-key": API_KEY}

def ticker_brief(symbol: str) -> dict:
    # SERP call scoped to Google Finance pages; the snippet often carries the price.
    serp = requests.post("https://api.scavio.dev/api/v1/google",
        headers=H,
        json={"query": f"site:google.com/finance {symbol}"}).json()
    pages = serp.get("organic_results", [])[:3]
    extracts = []
    for p in pages:
        # Follow each hit with an extract call for the rendered markdown.
        ext = requests.post("https://api.scavio.dev/api/v1/extract",
            headers=H, json={"url": p["link"], "format": "markdown"}).json()
        extracts.append(ext.get("markdown", ""))
    return {"pages": pages, "extracts": extracts}

print(ticker_brief("AAPL"))

The News Layer

Google Finance pages link to news. The faster path: search Google News for the ticker and pull the news_results directly. Skip the finance page entirely for news; Finance just curates the same news Google News indexes.

Python
def ticker_news(symbol: str) -> list:
    # search_type=news hits the news index; time_range=d1 keeps it to the last day.
    r = requests.post("https://api.scavio.dev/api/v1/google",
        headers=H,
        json={"query": f"{symbol} stock news",
              "search_type": "news",
              "time_range": "d1"}).json()
    return r.get("news_results", [])[:10]

The Filings Cross-Check

SEC.gov is the authoritative source for 10-Q, 10-K, and 8-K filings. SERP scoped to site:sec.gov returns the filings index. Extract the filing pages for full text. This sidesteps the SEC EDGAR API entirely while still getting the primary-source documents.
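As a sketch, the same SERP-then-extract pair covers filings. The query shape and field names here mirror the earlier examples and assume the same hypothetical Scavio response format; the function names are mine:

```python
import os, requests

H = {"x-api-key": os.environ.get("SCAVIO_API_KEY", "")}

def filings_query(symbol: str, form: str = "10-K") -> str:
    # Scope the SERP to SEC.gov; the form type narrows the filing kind.
    return f"site:sec.gov {symbol} {form}"

def recent_filings(symbol: str, form: str = "10-K") -> list[dict]:
    serp = requests.post("https://api.scavio.dev/api/v1/google",
        headers=H, json={"query": filings_query(symbol, form)}).json()
    filings = []
    for p in serp.get("organic_results", [])[:3]:
        # Extract returns the rendered filing page as markdown.
        ext = requests.post("https://api.scavio.dev/api/v1/extract",
            headers=H, json={"url": p["link"], "format": "markdown"}).json()
        filings.append({"title": p.get("title"), "link": p["link"],
                        "text": ext.get("markdown", "")})
    return filings
```

Swap the form argument for "10-Q" or "8-K" per ticker depending on what the brief should track.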

The Reddit Sentiment Layer

Ticker-specific subreddits carry sentiment signal that drives short-term price action for retail-traded names. Scavio's Reddit endpoint returns thread structure including upvotes and comment counts, which are crude sentiment proxies but useful in aggregate.

Python
def reddit_sentiment(symbol: str) -> list:
    r = requests.post("https://api.scavio.dev/api/v1/reddit/search",
        headers=H, json={"query": symbol}).json()
    threads = r.get("posts", [])
    # Highest-scored threads first: score is crude, but usable in aggregate.
    return sorted(threads, key=lambda t: t.get("score", 0), reverse=True)[:10]
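"Useful in aggregate" can be as simple as a volume number. This scoring scheme is my own illustration, not a Scavio feature, and it assumes thread dicts carry score and num_comments keys:

```python
def engagement_score(threads: list[dict]) -> float:
    """Crude sentiment proxy: total engagement across the top threads.

    Upvotes and comments are weighted equally here; tune to taste. This
    measures how loud the conversation is, not which way it leans.
    """
    return float(sum(t.get("score", 0) + t.get("num_comments", 0)
                     for t in threads))

threads = [{"score": 120, "num_comments": 45}, {"score": 30, "num_comments": 12}]
print(engagement_score(threads))  # 207.0
```

Tracking this number day over day per ticker surfaces attention spikes even before the narrative itself is classified.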

The Daily Brief

Combine all four layers: SERP-derived finance page, news, filings, Reddit sentiment. Pass to Claude to compose a 200-word daily brief per ticker. The brief is structured: price, recent news, recent filings, Reddit narrative. Drop into a markdown email at 7 AM.
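The assembly step can stay a pure function, with the actual Claude call bolted on afterward. A sketch under my own naming and prompt layout; the field names match the earlier examples:

```python
def brief_prompt(symbol: str, price_page: str, news: list[dict],
                 filings: list[dict], reddit: list[dict]) -> str:
    """Fold the four layers into one prompt for the LLM composition step."""
    news_lines = "\n".join(f"- {n.get('title', '')}" for n in news[:5])
    filing_lines = "\n".join(f"- {f.get('title', '')}" for f in filings[:3])
    reddit_lines = "\n".join(f"- {t.get('title', '')} ({t.get('score', 0)} upvotes)"
                             for t in reddit[:5])
    return (f"Write a 200-word daily brief for {symbol}.\n"
            f"Structure: price, recent news, recent filings, Reddit narrative.\n\n"
            f"## Finance page\n{price_page}\n\n"
            f"## News\n{news_lines}\n\n"
            f"## Filings\n{filing_lines}\n\n"
            f"## Reddit\n{reddit_lines}\n")
```

Keeping the prompt builder pure means the expensive layers can be cached and the LLM call retried independently.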

The Cost Profile

Per-ticker brief uses about 8 to 12 Scavio credits. A 50-ticker watchlist costs about 500 credits per day, or 15K per month. Fits the $30/mo plan with headroom. Plus $20 to $40 in LLM spend for the composition. Total under $80/mo for a 50-ticker daily brief operation.
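The arithmetic, spelled out; the credit counts and dollar figures are the article's own estimates, and the defaults below take the midpoints:

```python
def monthly_cost(tickers=50, credits_per_brief=10, days=30,
                 plan_usd=30, llm_usd=40):
    # 50 tickers x ~10 credits x 30 days = 15,000 credits/month.
    credits = tickers * credits_per_brief * days
    return {"credits_per_month": credits, "usd_per_month": plan_usd + llm_usd}

print(monthly_cost())  # {'credits_per_month': 15000, 'usd_per_month': 70}
```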

What Direct Google Finance Scraping Wins

The current price as a single tick. SERP returns the price as part of the snippet, but with a small lag. For trading systems that need sub-second pricing, use a real broker API or a market-data vendor. For research briefs and end-of-day analysis, the SERP indirect pattern is fine.

What This Pattern Does Not Replace

Order routing. Real-time level-2 book. Tick-by-tick price feeds. These need market data feeds, not SERP. The pattern is for research, sentiment, and narrative: the surfaces where SERP plus extract plus Reddit outperforms direct scraping for cost and reliability.