Reddit Stock Sentiment AI Pipeline
Build an AI pipeline that scans Reddit for stock sentiment, using the Scavio Reddit API for thread data and an LLM for sentiment classification. Cost: under $1/day for 50 tickers.
Reddit's r/wallstreetbets, r/stocks, and r/investing carry timely retail sentiment that can move individual stocks. An AI pipeline that scans these subreddits costs under $1/day for 50 tickers. It uses a structured Reddit search API for thread data and an LLM for sentiment classification. There is no scraping and no proxy management, and you never deal with Reddit's own API rate limits.
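The daily spend is simple arithmetic. The sketch below takes the per-unit figures from the Step 3 cost comments as its inputs: $0.005 per search and about $0.001 per LLM classification. `estimate_daily_cost` is a hypothetical helper. `threads_per_query` is an assumption, and you should set it from the number of results the search API actually returns.

```python
def estimate_daily_cost(
    tickers: int = 50,
    queries_per_ticker: int = 3,
    cost_per_search: float = 0.005,          # per-query figure from the Step 3 comments
    threads_per_query: int = 1,              # assumption: threads classified per query
    cost_per_classification: float = 0.001,  # rough LLM figure from the Step 3 comments
) -> dict:
    """Estimate daily search + LLM spend for the pipeline (hypothetical helper)."""
    searches = tickers * queries_per_ticker
    classifications = searches * threads_per_query  # every returned thread is classified
    search_cost = searches * cost_per_search
    llm_cost = classifications * cost_per_classification
    return {
        "searches": searches,
        "classifications": classifications,
        "search_usd": round(search_cost, 2),
        "llm_usd": round(llm_cost, 2),
        "total_usd": round(search_cost + llm_cost, 2),
    }

print(estimate_daily_cost())
# {'searches': 150, 'classifications': 150, 'search_usd': 0.75, 'llm_usd': 0.15, 'total_usd': 0.9}
```

The LLM line grows linearly with the number of results per query. With `threads_per_query=10`, the same unit prices give $1.50/day for classification and $2.25/day in total. The $0.001 per-classification figure is itself only a rough estimate.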
Pipeline architecture
- Fetch Reddit threads mentioning target tickers via search API
- Extract thread titles, snippets, links, and dates (plus upvote and comment counts, where the search results include them)
- Run sentiment classification on each thread (bullish / bearish / neutral)
- Aggregate daily sentiment scores per ticker
- Alert on sentiment spikes (sudden shift from neutral to strongly bullish/bearish)
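You can wire these five stages together as plain functions. That keeps each stage replaceable and lets you test the flow offline. The sketch below assumes three hypothetical stage callables:

- `fetch`, which you would back with the Step 1 search call
- `classify`, which you would back with the Step 2 LLM call
- `is_spike`, which you would back with an alert rule

`run_pipeline` is illustrative only and is not part of any library.

```python
from collections import Counter
from typing import Callable, Iterable

Thread = dict  # e.g. {"title": ..., "snippet": ..., "url": ...}

def run_pipeline(
    tickers: Iterable[str],
    fetch: Callable[[str], list],            # stages 1-2: ticker -> extracted threads
    classify: Callable[[Thread], str],       # stage 3: thread -> "bullish"/"bearish"/"neutral"
    is_spike: Callable[[str, float], bool],  # stage 5: (ticker, bullish_pct) -> alert?
) -> tuple:
    """Fetch, classify, aggregate, and flag alerts for each ticker (hypothetical helper)."""
    scores, alerts = {}, []
    for ticker in tickers:
        labels = Counter(classify(t) for t in fetch(ticker))
        total = sum(labels.values())
        # Stage 4: collapse per-thread labels into one daily score per ticker.
        bullish_pct = round(labels["bullish"] / max(total, 1) * 100, 1)
        scores[ticker] = {
            "total_threads": total,
            "bullish": labels["bullish"],
            "bearish": labels["bearish"],
            "neutral": labels["neutral"],
            "bullish_pct": bullish_pct,
        }
        if is_spike(ticker, bullish_pct):
            alerts.append(ticker)
    return scores, alerts

# Offline usage with stand-in stages (no API calls are made):
fake_threads = {
    "NVDA": [{"title": "NVDA to the moon"}, {"title": "NVDA calls printing"}],
    "INTC": [{"title": "INTC bagholders unite"}],
}
scores, alerts = run_pipeline(
    ["NVDA", "INTC"],
    fetch=lambda t: fake_threads.get(t, []),
    classify=lambda th: "bearish" if "bag" in th["title"] else "bullish",
    is_spike=lambda t, pct: pct >= 80,
)
print(scores["NVDA"]["bullish_pct"], alerts)  # 100.0 ['NVDA']
```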
Step 1: collect Reddit data
```python
import os

import requests

H = {"x-api-key": os.environ["SCAVIO_API_KEY"]}

def get_reddit_mentions(ticker: str) -> list:
    """Fetch Reddit threads mentioning a stock ticker."""
    queries = [
        f"{ticker} stock reddit",
        f"{ticker} DD wallstreetbets",
        f"{ticker} analysis r/stocks",
    ]
    threads, seen_urls = [], set()
    for q in queries:
        resp = requests.post("https://api.scavio.dev/api/v1/search",
                             headers=H, json={"query": q, "platform": "reddit"},
                             timeout=30)
        resp.raise_for_status()  # fail loudly on auth or quota errors
        for r in resp.json().get("organic_results", []):
            url = r.get("link", "")
            # The three queries overlap; skip threads already collected so
            # they are not classified and counted twice.
            if url and url in seen_urls:
                continue
            seen_urls.add(url)
            threads.append({
                "title": r.get("title", ""),
                "snippet": r.get("snippet", ""),
                "url": url,
                "date": r.get("date", ""),
            })
    return threads

threads = get_reddit_mentions("NVDA")
print(f"Found {len(threads)} threads mentioning NVDA")
```
Step 2: sentiment classification
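Before classifying, it can help to drop search results that never name the ticker. A query like "NVDA stock reddit" can surface general market threads. The sketch below is a minimal relevance filter built on a simple regex heuristic. `mentions_ticker` and `filter_relevant` are hypothetical helpers, not part of the Scavio API.

```python
import re

def mentions_ticker(thread: dict, ticker: str) -> bool:
    """Heuristic: does the title or snippet name the ticker (e.g. NVDA or $NVDA)?"""
    text = f"{thread.get('title', '')} {thread.get('snippet', '')}"
    # The lookarounds reject matches inside longer words (AMD in AMDOCS);
    # the optional "$" accepts cashtags. Matching is case-sensitive on purpose,
    # since short tickers such as ON or NOW are also ordinary lowercase words.
    pattern = rf"(?<![\w$])\$?{re.escape(ticker)}(?!\w)"
    return re.search(pattern, text) is not None

def filter_relevant(threads: list, ticker: str) -> list:
    """Keep only threads whose visible text actually mentions the ticker."""
    return [t for t in threads if mentions_ticker(t, ticker)]

sample = [
    {"title": "$NVDA earnings play", "snippet": "Loading calls before the report"},
    {"title": "Best index funds?", "snippet": "VOO vs VTI for a Roth"},
    {"title": "Thoughts on NVDA's guidance", "snippet": ""},
]
print(len(filter_relevant(sample, "NVDA")))  # 2
```

To use it, call `filter_relevant(threads, ticker)` on the output of `get_reddit_mentions` before `classify_sentiment`. Search snippets are truncated, so the filter can drop threads that mention the ticker only further down the post. It trades some recall for fewer off-topic labels.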
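```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment
LABELS = {"bullish", "bearish", "neutral"}

def classify_sentiment(threads: list) -> list:
    """Classify each thread as bullish, bearish, or neutral."""
    results = []
    for t in threads:
        text = f"{t['title']}. {t['snippet']}"
        resp = client.chat.completions.create(
            model="gpt-4o-mini",
            messages=[{
                "role": "user",
                "content": (
                    "Classify this Reddit post about a stock as bullish, bearish, "
                    f"or neutral. Reply with one word only.\n\n{text}"
                ),
            }],
            max_tokens=5,
            temperature=0,  # keeps labels as consistent as possible across runs
        )
        # Normalize replies like "Bullish." and map anything unexpected to neutral,
        # so every thread lands in one of the three buckets counted in Step 3.
        sentiment = (resp.choices[0].message.content or "").strip().strip(".!").lower()
        if sentiment not in LABELS:
            sentiment = "neutral"
        results.append({**t, "sentiment": sentiment})
    return results
```
Step 3: aggregate and alert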
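The `daily_sentiment` function in this step only aggregates, so the alerting half needs a separate rule. The sketch below is a minimal spike check. It assumes you store each day's `bullish_pct` per ticker in a history list (storage not shown). The z-score rule, window, and thresholds are illustrative choices, not tuned values, and `is_sentiment_spike` is a hypothetical helper.

```python
from statistics import mean, pstdev

def is_sentiment_spike(
    history: list,              # hypothetical store: past daily bullish_pct values, oldest first
    today: float,               # today's bullish_pct from daily_sentiment
    window: int = 7,            # trailing days that define the baseline
    z_threshold: float = 2.0,   # how many standard deviations counts as unusual
    min_change: float = 15.0,   # percentage points; ignore small moves on quiet baselines
) -> bool:
    """Return True when today's bullish_pct jumps or drops sharply vs. the trailing baseline."""
    recent = history[-window:]
    if len(recent) < 3:
        return False  # too little history to say what "normal" looks like
    baseline = mean(recent)
    spread = pstdev(recent)
    change = today - baseline
    if abs(change) < min_change:
        return False
    # A perfectly flat baseline has zero spread; any large move off it counts as a spike.
    return spread == 0 or abs(change) / spread >= z_threshold

history = [40.0, 42.0, 38.0, 41.0, 39.0]
print(is_sentiment_spike(history, 75.0))  # True: ~35 points above a ~40% baseline
print(is_sentiment_spike(history, 48.0))  # False: 8-point move is below min_change
```

A sharp drop in `bullish_pct` means a shift toward bearish or neutral, and the check cannot tell which. If you need that distinction, compute a bearish share the same way and run the check on it too.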
```python
from collections import Counter

def daily_sentiment(ticker: str) -> dict:
    """Fetch, classify, and aggregate one day of sentiment for a ticker."""
    threads = get_reddit_mentions(ticker)
    classified = classify_sentiment(threads)
    counts = Counter(t["sentiment"] for t in classified)
    total = len(classified)
    return {
        "ticker": ticker,
        "total_threads": total,
        "bullish": counts["bullish"],
        "bearish": counts["bearish"],
        "neutral": counts["neutral"],
        # max(total, 1) avoids division by zero when no threads were found
        "bullish_pct": round(counts["bullish"] / max(total, 1) * 100, 1),
    }

# Search cost: 50 tickers x 3 queries each x $0.005 = $0.75/day
# LLM cost: ~150 classifications x ~$0.001 each = ~$0.15/day
#   (assumes ~1 classified thread per query; scales with results returned)
# Total: under $1/day for 50 tickers
```
Limitations
- Reddit sentiment is noisy: memes, sarcasm, and pump-and-dump campaigns are common
- Search results lag: new posts take time to be indexed by search engines, so the data is not truly real-time
- Not a trading signal on its own: use it as one input among many
- Small sample sizes: a score computed from 5 threads is not statistically meaningful
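One way to make the sample-size caveat concrete is to report a confidence interval around `bullish_pct` instead of a single number. The sketch below uses the Wilson score interval, a standard interval for proportions that is swapped in here because the article doesn't prescribe one. `wilson_interval` and `bullish_pct_range` are hypothetical helpers. The interval treats threads as independent samples, which Reddit threads often are not, so read the bounds as optimistic.

```python
from math import sqrt

def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple:
    """Wilson score interval for a proportion; z=1.96 gives roughly 95% coverage."""
    if n == 0:
        return (0.0, 1.0)  # no threads: the share could be anything
    p = successes / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half = z * sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return (max(0.0, center - half), min(1.0, center + half))

def bullish_pct_range(score: dict) -> tuple:
    """Interval in percent for a dict shaped like daily_sentiment's return value."""
    lo, hi = wilson_interval(score["bullish"], score["total_threads"])
    return (round(lo * 100, 1), round(hi * 100, 1))

small = {"bullish": 4, "total_threads": 5}    # 80% bullish from 5 threads
large = {"bullish": 40, "total_threads": 50}  # 80% bullish from 50 threads
print(bullish_pct_range(small))  # roughly (37.6, 96.4)
print(bullish_pct_range(large))  # roughly (67.0, 88.8)
```

Both tickers show 80% bullish. With 5 threads, the plausible share still runs from about 38% to 96%. With 50 threads, the range narrows to about 67% to 89%.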