Local Event Aggregator as a Side Project (2026)
An r/AnnArbor post in May 2026 hit 110 upvotes by aggregating events from many scattered sources into one site: a curated venue list + Scavio site-search + Reddit + a daily cron, for under $40/mo. The pattern works because most cities don't have a canonical events feed, and Eventbrite/Meetup don't cover venue-direct calendars or community spaces. Here is the recipe for shipping the same thing as a side project.
Why this works
The OP's observation: "way more going on than I expected, all scattered across random sites." That gap is real for almost every city. Venue websites publish their own calendars. Local subreddits announce community events. University event pages host meetups. Eventbrite has coverage but rarely catches the venue-direct stuff.
The architecture
Three layers: a curated source list per city; a data pull via Scavio site-search plus Reddit; a render as a clean public list.
```python
import requests, os

H = {'x-api-key': os.environ['SCAVIO_API_KEY']}

# Curated source list per city
DOMAINS = {
    'AnnArbor': ['theark.org', 'thelivery.com', 'umich.edu/events'],
    # add more cities...
}

def pull_events(city):
    out = []
    # Site-search per venue
    for d in DOMAINS[city]:
        r = requests.post('https://api.scavio.dev/api/v1/search',
                          headers=H,
                          json={'query': f'site:{d} events 2026'}).json()
        out += r.get('organic_results', [])[:10]
    # Reddit signal for community events
    r = requests.post('https://api.scavio.dev/api/v1/search',
                      headers=H,
                      json={'query': f'reddit r/{city} this weekend events 2026'}).json()
    out += r.get('organic_results', [])[:10]
    return out
```
Curation is the moat
The 5-15 source domains per city are the operator's value-add. Scavio handles the typed-JSON part; you handle the discipline of picking which venues actually drive a city's scene. That's genuinely defensible: competitors who blindly scrape everything get noise; you get signal.
Normalization
Each result needs to land as typed JSON: title, datetime, venue, url, category. For ambiguous cases (date inferable from snippet but not explicit, category unclear), an LLM with a constrained-output prompt cleans up the 5-15% messy fraction.
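A minimal sketch of the target shape. The field names and the date regex are illustrative assumptions, not Scavio's schema; the LLM cleanup pass is stubbed out as "datetime is None means send it to the messy-fraction queue":

```python
from dataclasses import dataclass, asdict
from typing import Optional
import re

@dataclass
class Event:
    title: str
    datetime: Optional[str]   # ISO 8601 when explicit in the snippet, else None
    venue: str
    url: str
    category: str = 'uncategorized'

DATE_RE = re.compile(r'\b(\d{4}-\d{2}-\d{2})\b')

def normalize(result, venue):
    """Map one raw search result into typed JSON. A None datetime
    marks the 5-15% messy fraction headed for the LLM cleanup pass."""
    snippet = result.get('snippet', '')
    m = DATE_RE.search(snippet)
    return asdict(Event(
        title=result.get('title', '').strip(),
        datetime=m.group(1) if m else None,   # explicit dates only
        venue=venue,
        url=result.get('link', ''),
    ))
```

The point of the dataclass is that everything downstream (dedup, the static JSON, the front-end) sees one fixed shape regardless of which source the result came from.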
Deduplication
The same event shows up on multiple feeds. Use rapidfuzz or a small LLM judge for fuzzy title overlap. Don't spend a week perfecting this; 80% accuracy is fine for a side project, and users understand duplicates.
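A stdlib sketch of the idea using `difflib` in place of rapidfuzz (rapidfuzz's `fuzz.ratio` is a drop-in, faster upgrade); the 0.85 threshold is an assumption to tune:

```python
from difflib import SequenceMatcher

def is_dup(a, b, threshold=0.85):
    """Fuzzy title overlap, case-insensitive."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio() >= threshold

def dedupe(events):
    """Keep the first occurrence of each fuzzy-matching title."""
    kept = []
    for e in events:
        if not any(is_dup(e['title'], k['title']) for k in kept):
            kept.append(e)
    return kept
```

This is O(n²) in the number of events, which is fine at a city's daily scale (hundreds, not millions).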
The cron
Daily at 6 AM local. Pulls fresh data, normalizes, writes a static JSON per city. Front-end (Next.js or Svelte or just static HTML) reads the JSON and renders.
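A sketch of the job's output step, with hypothetical paths; the crontab line is shown as a comment:

```python
import json, pathlib
from datetime import date

# crontab entry for 6 AM local (path is hypothetical):
# 0 6 * * * /usr/bin/python3 /opt/aggregator/daily.py

def write_city_feed(city, events, out_dir='public/data'):
    """Write the static JSON the front-end reads, one file per city."""
    p = pathlib.Path(out_dir)
    p.mkdir(parents=True, exist_ok=True)
    feed = {'city': city,
            'generated': date.today().isoformat(),
            'events': events}
    out_path = p / f'{city}.json'
    out_path.write_text(json.dumps(feed, indent=2))
    return out_path
```

Static JSON is the deliberate choice here: no database, no API server, and the front-end is just a fetch-and-render over a file your host serves for free.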
Per-month cost
For one city: the Scavio Project tier at $30/mo + $5 hosting = under $40/mo. On that tier's 7K credits, 15 domains × daily site-search + weekly Reddit comes to about 500 credits/mo, which leaves comfortable headroom for one or two cities. Add more cities and scale up the tier when needed.
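Back-of-envelope check of that credit budget, assuming one credit per search call (which is what the post's numbers imply):

```python
# Assumption: 1 credit per Scavio search call
DOMAINS_PER_CITY = 15
DAYS_PER_MONTH = 30
WEEKS_PER_MONTH = 4
TIER_CREDITS = 7000

site_search = DOMAINS_PER_CITY * DAYS_PER_MONTH   # daily site-search per venue
reddit = WEEKS_PER_MONTH                          # one weekly Reddit query
monthly = site_search + reddit                    # ~454, rounds to "about 500"

print(monthly, TIER_CREDITS // monthly)           # credits used, cities that fit
```

Even at ~500 credits per city, the 7K tier mathematically fits a dozen cities; the real limit is the day of curation each new city costs, not the credits.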
Why no login
Public events feed. The OP's a2eventsearch.com gets traffic precisely because there's no friction. Login walls kill side-project event aggregators. Stay free, stay public, stay clean.
Monetization (if you want)
Most event-aggregator side projects don't need to monetize. If yours gets traction: sponsored event slots (clearly labeled), Eventbrite affiliate links, or a paid "promote" tier for venues. Don't over-think this in year one.
Scaling to multiple cities
Curation is the work. Each new city needs its own source list research (find the 5-15 venues that drive the scene). Don't copy-paste a generic template; the value is the curation. Plan on a day per city for proper source curation.
Honest tradeoffs
Eventbrite/Meetup APIs are free but miss venue-direct events. Generic web scrapers fight Cloudflare and break. Scavio site-search is the cleanest single-source approach but you still need source curation. Reddit signal helps surface community events but adds noise.
Verified-online May 2026 against the source post and the Scavio API.