AI Trading Data: Reddit + SERP Pipeline
Combine Reddit sentiment from WSB/AusFinance with SERP news data for trading signals. Backtesting pipeline with position sizing based on signal strength.
A multi-source AI trading data pipeline combines Reddit sentiment from subreddits like r/wallstreetbets with SERP-sourced company news to generate trading signals. Layer 1 pulls Reddit discussion volume and tone through a SERP API query restricted to reddit.com. Layer 2 pulls recent news articles for the same tickers. Neither layer alone is reliable: sentiment without price context is noise, and news without community reaction is stale. The value is in the intersection.
Why Sentiment Alone Is Terrible
Backtests of Reddit-only sentiment tend to show the same thing: it lags price action. By the time a ticker is trending on r/wallstreetbets, the move has usually already happened, so pure sentiment strategies tend to have negative expected value after transaction costs. The useful signal is divergence: sentiment spiking while news is quiet (a potential rumor worth investigating), or news breaking while sentiment is flat (the market may not have priced it in yet).
Layer 1: Reddit Sentiment via SERP
```python
import os
import re
from datetime import datetime, timezone

import requests

API_KEY = os.environ["SCAVIO_API_KEY"]
BASE = "https://api.scavio.dev/api/v1/search"

# Whole-word patterns, so "long" does not match "longer" and "buy" does not match "buyback"
BULLISH = re.compile(r"\b(calls|moon|bullish|buy|long)\b")
BEARISH = re.compile(r"\b(puts|crash|bearish|sell|short)\b")


def get_reddit_sentiment(ticker: str, subreddit: str = "wallstreetbets") -> dict:
    """Pull recent Reddit mentions for a ticker via SERP and score titles by keyword."""
    resp = requests.post(
        BASE,
        headers={"x-api-key": API_KEY},
        json={
            "query": f"{ticker} site:reddit.com/r/{subreddit}",
            "num_results": 20,
        },
        timeout=15,
    )
    # Fail loudly on auth or quota errors instead of silently scoring an empty list
    resp.raise_for_status()
    results = resp.json().get("results", [])
    titles = [r.get("title", "").lower() for r in results]
    bullish = sum(1 for t in titles if BULLISH.search(t))
    bearish = sum(1 for t in titles if BEARISH.search(t))
    return {
        "ticker": ticker,
        "mentions": len(results),
        "bullish": bullish,
        "bearish": bearish,
        # max(..., 1) avoids division by zero when no bearish titles are found
        "ratio": round(bullish / max(bearish, 1), 2),
        "timestamp": datetime.now(timezone.utc).isoformat(),
    }
```

Layer 2: Company News via SERP
```python
def get_company_news(ticker: str, company_name: str) -> list:
    """Pull recent news articles for a company via SERP."""
    resp = requests.post(
        BASE,
        headers={"x-api-key": API_KEY},
        json={
            "query": f"{company_name} {ticker} stock news",
            "num_results": 10,
        },
        timeout=15,
    )
    resp.raise_for_status()
    results = resp.json().get("results", [])
    return [
        {
            "title": r.get("title", ""),
            "url": r.get("url", ""),
            "snippet": r.get("snippet", ""),
            "source": r.get("displayed_url", ""),
        }
        for r in results
    ]
```

The Pipeline: Combine and Score
```python
CREDITS_PER_CALL = 1  # each SERP request costs one credit
CREDITS_PER_TICKER = 2 * CREDITS_PER_CALL  # one Reddit query + one news query

WATCHLIST = [
    {"ticker": "NVDA", "name": "NVIDIA"},
    {"ticker": "TSLA", "name": "Tesla"},
    {"ticker": "AAPL", "name": "Apple"},
    {"ticker": "PLTR", "name": "Palantir"},
]


def run_pipeline(watchlist: list, budget_limit: int = 100) -> list:
    """Combine Reddit and news data per ticker without exceeding budget_limit credits."""
    budget_used = 0
    signals = []
    for stock in watchlist:
        # Check the cost of the next ticker up front so the run never ends above budget_limit
        if budget_used + CREDITS_PER_TICKER > budget_limit:
            print(f"Budget exhausted at {budget_used} credits")
            break
        sentiment = get_reddit_sentiment(stock["ticker"])
        budget_used += CREDITS_PER_CALL
        news = get_company_news(stock["ticker"], stock["name"])
        budget_used += CREDITS_PER_CALL
        signals.append({
            **sentiment,
            "news_count": len(news),
            "news_headlines": [n["title"] for n in news[:3]],
            # One-sided: flags Reddit buzz with little news coverage
            "divergence": sentiment["mentions"] > 10 and len(news) < 3,
        })
    print(f"Pipeline complete: {len(signals)} tickers, {budget_used} credits used")
    return signals


if __name__ == "__main__":
    signals = run_pipeline(WATCHLIST)
    for s in signals:
        flag = " ** DIVERGENCE" if s["divergence"] else ""
        print(f"{s['ticker']}: {s['mentions']} mentions, "
              f"ratio {s['ratio']}, {s['news_count']} news{flag}")
```

Budget Tracking
Each SERP call costs 1 credit ($0.005). The pipeline above uses 2 credits per ticker (1 Reddit + 1 news). A 20-ticker watchlist scanned 4x/day uses 20 x 2 x 4 = 160 credits/day, or 4,800 credits/month over 30 days. That fits inside the $30/mo plan (7,000 credits). Intraday scans during market hours add up quickly, so budget accordingly: 20 tickers x 2 credits x 12 scans/day = 480 credits/day = 14,400/mo, roughly double the plan's allowance.
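The arithmetic above is easy to get wrong once the watchlist or schedule changes, so it can help to compute it. This is a minimal sketch that assumes a 30-day month and the 2-credits-per-ticker cost of the pipeline. `monthly_credits` and `max_scans_per_day` are hypothetical helper names.

```python
def monthly_credits(tickers: int, scans_per_day: int,
                    credits_per_ticker: int = 2, days: int = 30) -> int:
    """Credits used per month: tickers x credits per ticker x scans per day x days."""
    return tickers * credits_per_ticker * scans_per_day * days


def max_scans_per_day(tickers: int, plan_credits: int,
                      credits_per_ticker: int = 2, days: int = 30) -> int:
    """Largest whole number of daily scans that stays within a monthly credit allowance."""
    return plan_credits // (tickers * credits_per_ticker * days)


PLAN_CREDITS = 7_000  # allowance of the $30/mo plan quoted above

if __name__ == "__main__":
    for scans in (4, 12):
        used = monthly_credits(tickers=20, scans_per_day=scans)
        status = "fits" if used <= PLAN_CREDITS else "exceeds"
        print(f"{scans} scans/day: {used:,} credits/month ({status} {PLAN_CREDITS:,})")
    print("max scans/day for 20 tickers:", max_scans_per_day(20, PLAN_CREDITS))
```

For a 20-ticker list this works out to at most five scans a day on the 7,000-credit plan. Trimming the watchlist or passing a smaller `days` value (for example, skipping weekend runs) buys headroom.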
What This Is Not
- Not a trading bot. This generates signals for human review.
- Not real-time. SERP results have an indexing delay: Reddit posts may take minutes to hours to appear.
- Not financial advice. Backtesting Reddit sentiment shows poor standalone returns. The edge is in combining sources and applying your own domain knowledge.
- Not a replacement for a broker API. You still need price data from Alpaca, Polygon, or similar, and a broker API such as Alpaca for actual trade execution.
Making It Useful
The divergence flag is the most actionable output. When Reddit is buzzing about a ticker but there is no corresponding news, it is worth investigating whether insiders know something or the crowd is just memeing. When news breaks but Reddit is quiet, the market may not have reacted yet. Both cases warrant manual research, not automatic trades.
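As written, the pipeline's `divergence` field only catches the first case (more than 10 Reddit mentions, fewer than 3 news results). Below is a minimal sketch of a two-sided check. `classify_divergence` is a hypothetical helper, and its Reddit-only thresholds mirror the pipeline. The news-only thresholds (at least 7 news results, at most 3 mentions) are made-up starting points to tune, not tested values. Keep `breaking_news` at or below 10, since Layer 2 requests only 10 results.

```python
from typing import Optional


def classify_divergence(signal: dict,
                        buzz_mentions: int = 10, quiet_news: int = 3,
                        breaking_news: int = 7, quiet_mentions: int = 3) -> Optional[str]:
    """Return which way Reddit and news disagree for one signal, or None.

    Reads only the "mentions" and "news_count" keys that run_pipeline produces.
    """
    mentions, news = signal["mentions"], signal["news_count"]
    if mentions > buzz_mentions and news < quiet_news:
        return "reddit_only"  # crowd buzzing without coverage: rumor or meme?
    if news >= breaking_news and mentions <= quiet_mentions:
        return "news_only"  # coverage without crowd reaction: not reacted to yet?
    return None


if __name__ == "__main__":
    # Placeholder tickers and counts, for illustration only
    examples = [
        {"ticker": "AAA", "mentions": 14, "news_count": 1},
        {"ticker": "BBB", "mentions": 2, "news_count": 9},
        {"ticker": "CCC", "mentions": 12, "news_count": 8},
    ]
    for s in examples:
        print(s["ticker"], classify_divergence(s))
```

Returning a label rather than a boolean lets a review step treat the two cases differently, since they call for different kinds of follow-up research.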
Run the pipeline on a cron schedule (the GitHub Actions free tier works fine), dump the results to a Slack channel, and review them before market open. That is the workflow that actually helps, not the one where you let the bot trade for you.
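For the Slack step, here is a minimal sketch that uses only the standard library and a Slack incoming webhook, which accepts a JSON body with a `text` field. `format_digest`, `post_to_slack`, and the `SLACK_WEBHOOK_URL` environment variable are hypothetical names. In GitHub Actions the webhook URL would typically be stored as a repository secret. The input dicts have the shape that `run_pipeline` returns.

```python
import json
import os
import urllib.request


def format_digest(signals: list[dict]) -> str:
    """Build a plain-text digest, listing divergence tickers first."""
    # sorted() is stable: False (divergent) sorts before True, other order is kept
    ordered = sorted(signals, key=lambda s: not s["divergence"])
    lines = []
    for s in ordered:
        flag = " ** DIVERGENCE" if s["divergence"] else ""
        lines.append(f"{s['ticker']}: {s['mentions']} mentions, ratio {s['ratio']}, "
                     f"{s['news_count']} news{flag}")
    return "\n".join(lines)


def post_to_slack(webhook_url: str, text: str) -> None:
    """POST the digest to a Slack incoming webhook as {"text": ...}."""
    req = urllib.request.Request(
        webhook_url,
        data=json.dumps({"text": text}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        resp.read()


if __name__ == "__main__":
    # Placeholder signals, for illustration only
    example = [
        {"ticker": "AAA", "mentions": 20, "ratio": 2.5, "news_count": 10, "divergence": False},
        {"ticker": "BBB", "mentions": 14, "ratio": 3.0, "news_count": 2, "divergence": True},
    ]
    digest = format_digest(example)
    print(digest)
    webhook = os.environ.get("SLACK_WEBHOOK_URL")
    if webhook:  # only post when a webhook is configured
        post_to_slack(webhook, digest)
```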