AI Trading Data: Reddit + SERP Pipeline

Combine Reddit sentiment from WSB/AusFinance with SERP news data for trading signals. Backtesting pipeline with position sizing based on signal strength.

A multi-source AI trading data pipeline combines Reddit sentiment from subreddits like r/wallstreetbets with SERP-sourced company news to generate trading signals. Layer 1 pulls Reddit discussion volume and tone via a SERP API search restricted to Reddit. Layer 2 pulls recent news articles for the same tickers. Neither layer alone is reliable -- sentiment without price context is noise, and news without community reaction is stale. The value is in the intersection.

Why Sentiment Alone Is Terrible

Backtests of Reddit-only sentiment tend to show the same thing: it lags price action. By the time a ticker is trending on r/wallstreetbets, the move has usually already happened, so pure sentiment strategies tend to have negative expected value after transaction costs. The useful signal is divergence: sentiment spiking while news is quiet (potential rumor worth investigating), or news breaking while sentiment is flat (market has not priced it in yet).

Layer 1: Reddit Sentiment via SERP

Python
import requests, os
from datetime import datetime

API_KEY = os.environ["SCAVIO_API_KEY"]
BASE = "https://api.scavio.dev/api/v1/search"

def get_reddit_sentiment(ticker: str, subreddit: str = "wallstreetbets") -> dict:
    """Pull recent Reddit mentions for a ticker via SERP."""
    resp = requests.post(BASE,
        headers={"x-api-key": API_KEY},
        json={
            "query": f"{ticker} site:reddit.com/r/{subreddit}",
            "num_results": 20,
        }, timeout=15)
    resp.raise_for_status()  # fail loudly on HTTP errors instead of scoring an error body
    results = resp.json().get("results", [])
    titles = [r.get("title", "").lower() for r in results]
    # Crude keyword tally on titles only. These are substring matches,
    # so "buy" also hits "buyback" and "short" also hits "shortage".
    bullish = sum(1 for t in titles if any(
        w in t for w in ["calls", "moon", "bullish", "buy", "long"]))
    bearish = sum(1 for t in titles if any(
        w in t for w in ["puts", "crash", "bearish", "sell", "short"]))
    return {
        "ticker": ticker,
        "mentions": len(results),
        "bullish": bullish,
        "bearish": bearish,
        # max(bearish, 1) avoids division by zero when no bearish titles are found
        "ratio": round(bullish / max(bearish, 1), 2),
        "timestamp": datetime.now().isoformat(),
    }
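
The keyword tally above uses substring checks, which can overcount (a title about a "buyback" contains "buy"). One possible refinement, sketched here as an assumption rather than part of the original pipeline, is to match whole words only. The helper name `score_titles` is hypothetical; the keyword lists are the ones used above.

```python
import re

# Same keyword lists as the sentiment function; only the matching rule changes.
BULLISH = ["calls", "moon", "bullish", "buy", "long"]
BEARISH = ["puts", "crash", "bearish", "sell", "short"]

def _compile(words):
    # \b anchors each keyword to word boundaries, so "buy" does not match "buyback".
    pattern = r"\b(?:" + "|".join(re.escape(w) for w in words) + r")\b"
    return re.compile(pattern, re.IGNORECASE)

BULLISH_RE = _compile(BULLISH)
BEARISH_RE = _compile(BEARISH)

def score_titles(titles):
    """Count titles that contain at least one bullish / bearish keyword as a whole word."""
    bullish = sum(1 for t in titles if BULLISH_RE.search(t))
    bearish = sum(1 for t in titles if BEARISH_RE.search(t))
    return {
        "bullish": bullish,
        "bearish": bearish,
        "ratio": round(bullish / max(bearish, 1), 2),
    }
```

This is still a keyword heuristic, not a sentiment model: sarcasm, negation ("not buying"), and inflected forms ("buying") are all missed.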

Layer 2: Company News via SERP

Python
def get_company_news(ticker: str, company_name: str) -> list:
    """Pull recent news articles for a company."""
    resp = requests.post(BASE,
        headers={"x-api-key": API_KEY},
        json={
            "query": f"{company_name} {ticker} stock news",
            "num_results": 10,
        }, timeout=15)
    resp.raise_for_status()  # surface HTTP errors instead of returning an empty list
    results = resp.json().get("results", [])
    return [{
        "title": r.get("title", ""),
        "url": r.get("url", ""),
        "snippet": r.get("snippet", ""),
        "source": r.get("displayed_url", ""),
    } for r in results]

The Pipeline: Combine and Score

Python
import json

WATCHLIST = [
    {"ticker": "NVDA", "name": "NVIDIA"},
    {"ticker": "TSLA", "name": "Tesla"},
    {"ticker": "AAPL", "name": "Apple"},
    {"ticker": "PLTR", "name": "Palantir"},
]

def run_pipeline(watchlist: list) -> list:
    budget_used = 0
    budget_limit = 100  # credits per run
    signals = []
    for stock in watchlist:
        if budget_used >= budget_limit:
            print(f"Budget exhausted at {budget_used} credits")
            break
        sentiment = get_reddit_sentiment(stock["ticker"])
        budget_used += 1
        news = get_company_news(stock["ticker"], stock["name"])
        budget_used += 1
        signal = {
            **sentiment,
            "news_count": len(news),
            "news_headlines": [n["title"] for n in news[:3]],
            "divergence": sentiment["mentions"] > 10 and len(news) < 3,
        }
        signals.append(signal)
    print(f"Pipeline complete: {len(signals)} tickers, {budget_used} credits used")
    return signals

signals = run_pipeline(WATCHLIST)
for s in signals:
    flag = " ** DIVERGENCE" if s["divergence"] else ""
    print(f"{s['ticker']}: {s['mentions']} mentions, "
          f"ratio {s['ratio']}, {s['news_count']} news{flag}")

Budget Tracking

Each SERP call costs 1 credit ($0.005). The pipeline above uses 2 credits per ticker (1 Reddit + 1 news). A 20-ticker watchlist scanned 4x/day = 160 credits/day = 4,800 credits/month. That fits inside the $30/mo plan (7,000 credits). If you add intraday scans during market hours, budget accordingly: 20 tickers x 2 credits x 12 scans/day = 480 credits/day = 14,400 credits/month, roughly double the plan's allowance.
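
The budget arithmetic is easy to get wrong when the watchlist or schedule changes, so it can help to compute it. A minimal sketch, assuming a 30-day month and the 2-credits-per-ticker cost of the pipeline above; the function names and the `PLAN_CREDITS` constant are illustrative.

```python
PLAN_CREDITS = 7_000       # monthly allowance of the $30/mo plan mentioned above
PRICE_PER_CREDIT = 0.005   # pay-as-you-go price per SERP call, in USD

def monthly_credits(tickers: int, scans_per_day: int,
                    credits_per_ticker: int = 2, days: int = 30) -> int:
    """Credits consumed per month: one Reddit call plus one news call per ticker per scan."""
    return tickers * credits_per_ticker * scans_per_day * days

def monthly_cost(credits: int, price_per_credit: float = PRICE_PER_CREDIT) -> float:
    """Cost in USD at the per-credit price, ignoring any plan discount."""
    return round(credits * price_per_credit, 2)

base = monthly_credits(tickers=20, scans_per_day=4)
print(base, base <= PLAN_CREDITS)  # 4800 True
```

Passing a denser schedule, such as `scans_per_day=12`, shows how quickly intraday scanning outgrows the plan.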

What This Is Not

  • Not a trading bot. This generates signals for human review.
  • Not real-time. SERP results have indexing delay. Reddit posts may take minutes to hours to appear.
  • Not financial advice. Backtesting Reddit sentiment shows poor standalone returns. The edge is in combining sources and applying your own domain knowledge.
  • Not a replacement for market data or a broker API. You still need price data (Polygon, Alpaca, or similar) to put signals in context, and a broker API such as Alpaca for actual trade execution.

Making It Useful

The divergence flag is the most actionable output. When Reddit is buzzing about a ticker but there is no corresponding news, it is worth investigating whether insiders know something or the crowd is just memeing. When news breaks but Reddit is quiet, the market may not have reacted yet. Both cases warrant manual research, not automatic trades.
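
The pipeline's `divergence` flag only covers the first case (Reddit loud, news quiet). A minimal sketch of a two-sided classifier follows. The buzz-without-news thresholds mirror the ones in the pipeline; the news-without-buzz thresholds (`high_news`, `low_mentions`) and the label strings are assumptions to tune, not values from the original.

```python
def classify_divergence(mentions: int, news_count: int,
                        high_mentions: int = 10, low_news: int = 3,
                        high_news: int = 5, low_mentions: int = 3) -> str:
    """Label a ticker by which source is loud while the other is quiet."""
    # Same rule as the pipeline's boolean flag: many Reddit hits, few news hits.
    if mentions > high_mentions and news_count < low_news:
        return "buzz_without_news"
    # Mirror case (assumed thresholds): plenty of news, little Reddit discussion.
    if news_count >= high_news and mentions <= low_mentions:
        return "news_without_buzz"
    return "none"

print(classify_divergence(mentions=18, news_count=1))  # buzz_without_news
print(classify_divergence(mentions=2, news_count=8))   # news_without_buzz
```

Because the SERP calls cap results at 20 Reddit hits and 10 news hits, these counts saturate quickly; treat the labels as prompts for manual research, not as measurements.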

Run the pipeline on a cron schedule (GitHub Actions free tier works fine), dump results to a Slack channel, and review before market open. That is the workflow that actually helps -- not the one where you let the bot trade for you.
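
For the Slack step, one option is a Slack incoming webhook, which accepts a JSON body with a `text` field. The sketch below uses only the standard library. The environment variable name `SLACK_WEBHOOK_URL` and both function names are assumptions, and the signal dicts are expected to have the keys produced by `run_pipeline` above.

```python
import json
import urllib.request

def format_report(signals: list) -> str:
    """Render one line per ticker, flagging divergences for the morning review."""
    lines = []
    for s in signals:
        flag = " ** DIVERGENCE" if s.get("divergence") else ""
        lines.append(f"{s['ticker']}: {s['mentions']} mentions, "
                     f"ratio {s['ratio']}, {s['news_count']} news{flag}")
    return "\n".join(lines) if lines else "No signals this run."

def post_to_slack(text: str, webhook_url: str) -> int:
    """POST the report to a Slack incoming webhook and return the HTTP status code."""
    payload = json.dumps({"text": text}).encode("utf-8")
    req = urllib.request.Request(
        webhook_url, data=payload,
        headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req, timeout=15) as resp:
        return resp.status

# Example wiring (not executed here):
#   import os
#   post_to_slack(format_report(signals), os.environ["SLACK_WEBHOOK_URL"])
```

Keeping formatting separate from posting makes the report easy to test locally and to redirect to email or a file if Slack is not available.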