Reddit Stock Sentiment AI Pipeline

Build an AI pipeline that scans Reddit for stock sentiment, using the Scavio Reddit API for thread data and an LLM for sentiment classification. Cost: under $1/day for 50 tickers.

Reddit's WallStreetBets, r/stocks, and r/investing carry real-time retail sentiment that can move markets. An AI pipeline that scans these subreddits costs under $1/day for 50 tickers (about $0.90 at the per-call prices listed in Step 3), using a structured Reddit search API for thread data and an LLM for sentiment classification. No scraping, no proxies, no Reddit API rate limits.
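The daily cost is simple arithmetic, and it is worth keeping as a function so the estimate can be rerun when the ticker list or prices change. The sketch below is illustrative only: `daily_cost` is a hypothetical helper, and the prices ($0.005 per search query, $0.001 per classification) and the count of 150 classifications are the figures quoted in the Step 3 cost notes, not values confirmed against current price lists.

```python
def daily_cost(tickers: int, queries_per_ticker: int, price_per_query: float,
               classifications: int, price_per_classification: float) -> dict:
    """Estimate the daily spend in dollars, split into search and LLM cost."""
    search = tickers * queries_per_ticker * price_per_query
    llm = classifications * price_per_classification
    return {
        "search": round(search, 2),
        "llm": round(llm, 2),
        "total": round(search + llm, 2),
    }

# Figures taken from the Step 3 cost notes (assumed, not verified prices).
estimate = daily_cost(
    tickers=50,
    queries_per_ticker=3,
    price_per_query=0.005,
    classifications=150,
    price_per_classification=0.001,
)
print(estimate)
```

The classification count is the number to watch: it tracks how many threads the searches return, not how many queries are sent, so the LLM share grows if each query yields many results.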

Pipeline architecture

  1. Fetch Reddit threads mentioning target tickers via search API
  2. Extract thread titles, snippet text, URLs, and dates (plus upvote and comment counts where the API returns them)
  3. Run sentiment classification on each thread (bullish / bearish / neutral)
  4. Aggregate daily sentiment scores per ticker
  5. Alert on sentiment spikes (sudden shift from neutral to strongly bullish/bearish)

Step 1: collect Reddit data

Python
import os

import requests

API_URL = "https://api.scavio.dev/api/v1/search"
HEADERS = {"x-api-key": os.environ["SCAVIO_API_KEY"]}

def get_reddit_mentions(ticker: str) -> list:
    """Fetch Reddit threads mentioning a stock ticker."""
    queries = [
        f"{ticker} stock reddit",
        f"{ticker} DD wallstreetbets",
        f"{ticker} analysis r/stocks",
    ]
    threads = {}  # keyed by URL so a thread returned by several queries is kept once
    for q in queries:
        resp = requests.post(API_URL, headers=HEADERS,
            json={"query": q, "platform": "reddit"}, timeout=30)
        resp.raise_for_status()  # fail loudly on auth or quota errors
        for r in resp.json().get("organic_results", []):
            key = r.get("link") or r.get("title", "")
            threads.setdefault(key, {
                "title": r.get("title", ""),
                "snippet": r.get("snippet", ""),
                "url": r.get("link", ""),
                "date": r.get("date", ""),
            })
    return list(threads.values())

threads = get_reddit_mentions("NVDA")
print(f"Found {len(threads)} threads mentioning NVDA")
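Keyword searches can return threads that never name the ticker, so a cheap text filter before the LLM step may save classification calls. The following is a minimal sketch, assuming threads are dicts with the `title` and `snippet` keys used above; `mentions_ticker` and `filter_threads` are hypothetical helpers, not part of any API.

```python
import re

def mentions_ticker(text: str, ticker: str) -> bool:
    """True if text contains the ticker as a cashtag ($NVDA) or a standalone uppercase word."""
    symbol = re.escape(ticker.upper())
    # Reject matches embedded in longer alphanumeric tokens (e.g. "NVDAX").
    pattern = rf"(?<![A-Za-z0-9])\$?{symbol}(?![A-Za-z0-9])"
    return re.search(pattern, text) is not None

def filter_threads(threads: list, ticker: str) -> list:
    """Keep only threads whose title or snippet names the ticker."""
    return [
        t for t in threads
        if mentions_ticker(f"{t.get('title', '')} {t.get('snippet', '')}", ticker)
    ]

sample = [
    {"title": "NVDA DD before earnings", "snippet": ""},
    {"title": "Chip stocks roundup", "snippet": "Loading up on $NVDA"},
    {"title": "AMD vs Intel", "snippet": "Which one is the better buy?"},
]
kept = filter_threads(sample, "NVDA")
print(len(kept))
```

The match is deliberately case-sensitive. That misses lowercase mentions such as "nvda", but it avoids treating ordinary words as tickers when the symbol is short (for example "it" versus IT).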

Step 2: sentiment classification

Python
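from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment
LABELS = {"bullish", "bearish", "neutral"}

def classify_sentiment(threads: list) -> list:
    """Classify each thread as bullish, bearish, or neutral."""
    results = []
    for t in threads:
        text = f"{t['title']}. {t['snippet']}"
        resp = client.chat.completions.create(
            model="gpt-4o-mini",
            messages=[{
                "role": "user",
                "content": f"Classify this Reddit post about a stock as bullish, bearish, or neutral. Reply with one word only.\n\n{text}"
            }],
            max_tokens=5,
            temperature=0)  # keep the labels as repeatable as possible
        # Normalize replies such as "Bullish." and count anything unexpected as neutral,
        # so every thread lands in one of the three buckets aggregated in Step 3.
        raw = (resp.choices[0].message.content or "").strip().lower().strip(".!\"' ")
        sentiment = raw if raw in LABELS else "neutral"
        results.append({**t, "sentiment": sentiment})
    return results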
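Because the same thread tends to reappear in search results on consecutive days, caching labels by URL can avoid paying to classify it twice. This is a rough sketch, not part of the original pipeline: `classify_with_cache` is a hypothetical wrapper, the JSON file is an assumed storage choice, and the demo uses a stand-in classifier and placeholder URL so it runs without an API key.

```python
import json
import os
import tempfile
from pathlib import Path
from typing import Callable

def classify_with_cache(threads: list, classify: Callable[[list], list],
                        cache_path: str = "sentiment_cache.json") -> list:
    """Classify only threads whose URL is not cached yet, then persist the cache."""
    path = Path(cache_path)
    cache = json.loads(path.read_text()) if path.exists() else {}
    new = [t for t in threads if t["url"] not in cache]
    if new:
        for t in classify(new):
            cache[t["url"]] = t["sentiment"]
    path.write_text(json.dumps(cache))
    return [{**t, "sentiment": cache[t["url"]]} for t in threads]

# Demo with a fake classifier that records how many threads it was asked to label.
calls = []

def fake_classify(batch: list) -> list:
    calls.append(len(batch))
    return [{**t, "sentiment": "bullish"} for t in batch]

demo_threads = [{"url": "https://example.com/thread/1", "title": "NVDA DD", "snippet": ""}]
with tempfile.TemporaryDirectory() as tmp:
    cache_file = os.path.join(tmp, "cache.json")
    first = classify_with_cache(demo_threads, fake_classify, cache_file)
    second = classify_with_cache(demo_threads, fake_classify, cache_file)
print(calls)
```

In the real pipeline, `classify_sentiment` would be passed in place of `fake_classify`. A cached label never changes, which is reasonable for a thread's title and snippet but would miss any shift in tone in later comments.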

Step 3: aggregate and alert

Python
from collections import Counter

def daily_sentiment(ticker: str):
    threads = get_reddit_mentions(ticker)
    classified = classify_sentiment(threads)
    counts = Counter(t["sentiment"] for t in classified)
    total = len(classified)
    score = {
        "ticker": ticker,
        "total_threads": total,
        "bullish": counts.get("bullish", 0),
        "bearish": counts.get("bearish", 0),
        "neutral": counts.get("neutral", 0),
        "bullish_pct": round(counts.get("bullish", 0) / max(total, 1) * 100, 1),
    }
    return score

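# Search cost: 50 tickers x 3 queries each x $0.005 = $0.75/day
# LLM cost: ~150 classifications x $0.001 each = $0.15/day
#   (assumes about one classified thread per query; scales with the number of results returned)
# Total: ~$0.90/day, i.e. under $1/day for 50 tickers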
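The code above covers aggregation but not the alerting step. One simple way to flag a spike is to compare today's bullish share with its trailing average. The sketch below assumes the score dict returned by `daily_sentiment`; `detect_spike`, the 25-point threshold, and the 10-thread minimum are illustrative choices, not tuned values.

```python
from statistics import mean
from typing import Optional

def detect_spike(history: list, today: dict, threshold: float = 25.0,
                 min_threads: int = 10) -> Optional[str]:
    """Return an alert message if today's bullish_pct moved at least `threshold`
    percentage points away from the mean of `history`, else None."""
    if not history or today["total_threads"] < min_threads:
        return None  # no baseline yet, or too few threads to trust
    baseline = mean(history)
    delta = today["bullish_pct"] - baseline
    if abs(delta) < threshold:
        return None
    direction = "up" if delta > 0 else "down"
    return (f"{today['ticker']}: bullish share {direction} "
            f"from {baseline:.1f}% to {today['bullish_pct']:.1f}%")

past_week = [40.0, 45.0, 42.0, 38.0, 41.0]  # earlier bullish_pct values (made-up demo data)
today = {"ticker": "NVDA", "total_threads": 24, "bullish_pct": 80.0}
alert = detect_spike(past_week, today)
print(alert)
```

A fall in bullish share is reported as "down" rather than "bearish", because the threads may have shifted to neutral. Tracking the bearish percentage the same way would separate the two cases.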

Limitations

  • Reddit sentiment is noisy: memes, sarcasm, and pump-and-dump campaigns are common
  • Search results lag: new posts take time to be indexed by search engines
  • Not a trading signal on its own: use as one input among many, not as a sole decision driver
  • Small sample sizes: a ticker with 5 threads is not statistically meaningful
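To make the small-sample point concrete, a confidence interval shows how little a handful of threads says about the true bullish share. This is a minimal sketch using the standard Wilson score interval at 95% confidence (z = 1.96); `wilson_interval` is a hypothetical helper, and it treats threads as independent samples, which Reddit threads only roughly are.

```python
from math import sqrt

def wilson_interval(bullish: int, total: int, z: float = 1.96) -> tuple:
    """Wilson score interval for the bullish proportion, as (low, high) in [0, 1]."""
    if total == 0:
        return (0.0, 1.0)  # no data: the proportion could be anything
    p = bullish / total
    denom = 1 + z**2 / total
    center = (p + z**2 / (2 * total)) / denom
    half = z * sqrt(p * (1 - p) / total + z**2 / (4 * total**2)) / denom
    return (max(0.0, center - half), min(1.0, center + half))

small = wilson_interval(3, 5)      # 60% bullish from 5 threads
large = wilson_interval(300, 500)  # 60% bullish from 500 threads
print(f"5 threads:   {small[0]:.0%} to {small[1]:.0%}")
print(f"500 threads: {large[0]:.0%} to {large[1]:.0%}")
```

With 3 bullish threads out of 5, the interval runs from roughly 23% to 88%, which is compatible with both a mostly bearish and a strongly bullish crowd. The same 60% share over 500 threads gives a far narrower range.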