Knowledge workers watching YouTube lectures and tutorials need a pipeline from video to structured notes. The ideal workflow: find relevant videos, extract transcripts, and format them for personal knowledge management (PKM) tools like Obsidian. Manual transcription is slow; YouTube auto-captions are available but unformatted.
Scavio wins for the discovery step (finding relevant videos via YouTube search), while dedicated transcript tools handle extraction. The combination gives a complete video-to-notes pipeline.
Full Ranking
Scavio
Best for: Discovering relevant videos before transcript extraction
Pros:
- YouTube search returns video URLs, titles, channels
- Finds most relevant videos for a topic
- Structured data (no scraping)
- Pairs with transcript tools for full pipeline
- 500 free searches/month
Cons:
- Does not extract transcripts directly
- Discovery only (needs companion tool for transcripts)
YouTube Transcript API (Python)
Best for: Extracting transcripts from known video URLs
Pros:
- Free and open-source
- Returns timestamped text
- Works with auto-generated captions
- Simple Python package
Cons:
- Breaks frequently (scraping-based)
- No video discovery
- Rate limiting issues
- Requires video URL upfront
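Because extraction starts from a known URL and ends with timestamped text, the two small steps around the transcript call can be sketched as below. This is a minimal sketch, not the package's own code: the helper names are hypothetical, and the segment shape (a dict with `text`, `start` in seconds, and `duration`) is an assumption about what a transcript tool hands back, so adjust the keys to match the tool you actually use.

```python
from urllib.parse import urlparse, parse_qs

# Hosts treated as YouTube watch pages in this sketch (assumption, not exhaustive).
WATCH_HOSTS = {"youtube.com", "www.youtube.com", "m.youtube.com"}


def video_id_from_url(url: str) -> str:
    """Pull the video ID out of a watch URL or a youtu.be short link."""
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()
    if host == "youtu.be":
        # Short links carry the ID as the first path component.
        return parsed.path.lstrip("/").split("/")[0]
    if host in WATCH_HOSTS:
        ids = parse_qs(parsed.query).get("v", [])
        if ids:
            return ids[0]
    raise ValueError(f"no video ID found in {url!r}")


def format_timestamp(seconds: float) -> str:
    """Render seconds as MM:SS, or H:MM:SS for videos over an hour."""
    total = int(seconds)
    hours, rem = divmod(total, 3600)
    minutes, secs = divmod(rem, 60)
    if hours:
        return f"{hours}:{minutes:02d}:{secs:02d}"
    return f"{minutes:02d}:{secs:02d}"


def segments_to_lines(segments):
    """Turn assumed {'text', 'start', 'duration'} segments into note lines."""
    lines = []
    for seg in segments:
        # Collapse the line breaks and padding that caption text often contains.
        text = " ".join(seg["text"].split())
        if text:
            lines.append(f"[{format_timestamp(seg['start'])}] {text}")
    return lines
```

Keeping the URL parsing and the formatting separate from the fetch itself means a breakage in the scraping-based step does not touch the rest of the script.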
Whisper/Deepgram
Best for: Accurate transcription from audio
Pros:
- Higher accuracy than auto-captions
- Speaker diarization
- Custom vocabulary
- Works with any audio source
Cons:
- Requires downloading audio first
- Local Whisper needs GPU
- More complex pipeline
- Cost per minute adds up
Firecrawl
Best for: Extracting full page content from video pages
Pros:
- Extracts page content including descriptions
- JavaScript rendering
- Structured data extraction
- Good for video metadata
Cons:
- May not get transcript reliably
- Credit multipliers for AI extraction
- Not transcript-focused
- Expensive for bulk extraction
Obsidian YouTube plugins
Best for: Direct integration with Obsidian vault
Pros:
- Direct vault integration
- Template-based note creation
- Community maintained
- Free
Cons:
- Plugin quality varies
- Limited video discovery
- Often scraping-based (breaks)
- No bulk processing
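For readers who script the note step instead of relying on a plugin, template-based note creation can be approximated in a few lines. The sketch below is illustrative only: the function names, the frontmatter keys (`title`, `channel`, `source`, `tags`), and the set of characters stripped from file names are assumptions, not something any particular plugin prescribes.

```python
import re


def yaml_quote(value: str) -> str:
    """Wrap a value in double quotes, escaping backslashes and quotes."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def safe_filename(title: str) -> str:
    """Drop characters that commonly cause trouble in file names or links."""
    cleaned = re.sub(r'[\\/:*?"<>|#^\[\]]', " ", title)
    cleaned = " ".join(cleaned.split())
    return (cleaned or "Untitled") + ".md"


def render_note(title, url, channel, transcript_lines):
    """Build a Markdown note with YAML frontmatter and a transcript section."""
    frontmatter = [
        "---",
        f"title: {yaml_quote(title)}",
        f"channel: {yaml_quote(channel)}",
        f"source: {yaml_quote(url)}",
        "tags:",
        "  - youtube",
        "---",
    ]
    body = [f"# {title}", "", "## Transcript", ""] + list(transcript_lines)
    return "\n".join(frontmatter + [""] + body) + "\n"
```

Writing the returned string to `safe_filename(title)` inside the vault folder is enough for Obsidian to pick the note up; looping over a list of videos gives the bulk processing the plugins lack.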
Side-by-Side Comparison
| Criteria | Scavio | YouTube Transcript API | Whisper/Deepgram |
|---|---|---|---|
| Video Discovery | Yes (YouTube search) | No (URL required) | No |
| Transcript Extraction | No | Yes | Yes (audio) |
| Structured Output | JSON (video metadata) | Timestamped text | Timestamped text |
| Reliability | High (API) | Low (scraping) | High |
| Cost for 100 videos/mo | $0.50 (discovery) | Free | $0.43-$4.30 |
| Obsidian Integration | Via automation | Via script | Via script |
Why Scavio Wins
- Discovery is the bottleneck. Finding which 5 videos out of 500 are worth transcribing saves more time than faster transcription of the wrong videos.
- The YouTube search endpoint returns structured metadata (title, channel, views, publish date), so you can filter by recency and relevance before running the more expensive transcript extraction.
- 500 free searches/month cover daily video discovery for research at no cost.
- Combined pipeline: Scavio finds videos -> transcript tool extracts text -> automation formats for Obsidian. Each tool does what it does best.
- Reddit search can surface recommended videos from community threads, catching content that YouTube's algorithm might not surface for your query.
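The "filter before you transcribe" idea from the list above can be sketched as a small shortlist function. Everything about the input shape here is an assumption for illustration: the field names `title`, `views`, `published` (an ISO date string), and `url` are hypothetical and would need to be mapped from whatever the search response actually returns. Relevance is approximated crudely by counting keyword hits in the title.

```python
from datetime import date


def shortlist(results, keywords, max_age_days=365, top_n=5, today=None):
    """Keep recent, on-topic videos and rank them by keyword hits, then views."""
    today = today or date.today()
    wanted = [k.lower() for k in keywords]
    scored = []
    for item in results:
        # Recency filter: skip anything older than the cutoff.
        age_days = (today - date.fromisoformat(item["published"])).days
        if age_days > max_age_days:
            continue
        # Crude relevance: how many keywords appear in the title.
        title = item["title"].lower()
        hits = sum(1 for k in wanted if k in title)
        if hits == 0:
            continue
        scored.append((hits, item["views"], item))
    # Sort on the numeric parts only so the dicts are never compared.
    scored.sort(key=lambda entry: (entry[0], entry[1]), reverse=True)
    return [item for _, _, item in scored[:top_n]]
```

Only the URLs that survive this step are passed on to the transcript tool, which keeps rate limits and per-minute transcription costs tied to the handful of videos worth reading.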