Solution

Build a Searchable YouTube Transcript Knowledge Base

Valuable knowledge is locked inside YouTube videos with no structured way to search across transcripts. Developers want to build internal knowledge bases from tutorials, conference

The Problem

Valuable knowledge is locked inside YouTube videos with no structured way to search across transcripts. Developers want to build internal knowledge bases from tutorials, conference talks, and educational content but YouTube has no transcript search API.

The Scavio Solution

Three-step pipeline: (1) Use Scavio YouTube search to discover relevant videos by topic, (2) Extract transcripts using youtube-transcript-api, (3) Index transcripts in MongoDB with weighted text indexes. MongoDB text search with 10x weight on title and 1x on transcript ensures short title matches rank above incidental transcript keyword hits.

Before

Before the knowledge base, a developer education team manually searched YouTube for tutorials, watched parts of each video, and kept links in a spreadsheet. Finding 'which video covered React Server Components error handling' required checking 20 bookmarks.

After

After building the knowledge base, the team searches 'React Server Components error handling' and gets 8 matching transcripts ranked by relevance, with direct links. New videos are discovered daily via automated topic searches. The knowledge base contains 2,400 indexed transcripts after 3 months of daily ingestion.

Who It Is For

Developer education teams, knowledge management programs, and content researchers who want searchable access to YouTube video content.

Key Benefits

  • YouTube search discovers relevant videos automatically
  • MongoDB text indexes enable full-text search across transcripts
  • Weighted indexes prioritize title matches over transcript noise
  • Daily pipeline adds new videos without manual curation
  • 50 daily discovery queries cost $0.25

Python Example

Python
import requests, os, json
from pymongo import MongoClient

H = {'x-api-key': os.environ['SCAVIO_API_KEY']}
client = MongoClient(os.environ.get('MONGO_URI', 'mongodb://localhost:27017'))
db = client['youtube_kb']
collection = db['transcripts']

def discover_videos(topic: str) -> list:
    """Find relevant YouTube videos via search."""
    r = requests.post('https://api.scavio.dev/api/v1/search', headers=H,
        json={'platform': 'youtube', 'query': topic}, timeout=10).json()
    return [{'title': o.get('title'), 'url': o.get('link'),
             'channel': o.get('channel')}
            for o in r.get('organic_results', [])[:10]]

def index_transcript(video: dict, transcript_text: str):
    collection.update_one(
        {'url': video['url']},
        {'$set': {'title': video['title'], 'transcript': transcript_text,
                  'channel': video.get('channel')}},
        upsert=True)

# Create weighted text index
collection.create_index([('title', 'text'), ('transcript', 'text')],
    weights={'title': 10, 'transcript': 1})

videos = discover_videos('react server components tutorial')
for v in videos[:3]:
    print(f"{v['title']}: {v['url']}")

JavaScript Example

JavaScript
const H = { 'x-api-key': process.env.SCAVIO_API_KEY, 'Content-Type': 'application/json' };

async function discoverVideos(topic) {
  const r = await fetch('https://api.scavio.dev/api/v1/search', {
    method: 'POST', headers: H,
    body: JSON.stringify({ platform: 'youtube', query: topic })
  }).then(r => r.json());
  return (r.organic_results || []).slice(0, 10).map(o => ({
    title: o.title, url: o.link, channel: o.channel
  }));
}

const videos = await discoverVideos('react server components tutorial');
videos.slice(0, 3).forEach(v => console.log(`${v.title}: ${v.url}`));

Platforms Used

YouTube

Video search with transcripts and metadata

Frequently Asked Questions

Valuable knowledge is locked inside YouTube videos with no structured way to search across transcripts. Developers want to build internal knowledge bases from tutorials, conference talks, and educational content but YouTube has no transcript search API.

Three-step pipeline: (1) Use Scavio YouTube search to discover relevant videos by topic, (2) Extract transcripts using youtube-transcript-api, (3) Index transcripts in MongoDB with weighted text indexes. MongoDB text search with 10x weight on title and 1x on transcript ensures short title matches rank above incidental transcript keyword hits.

Developer education teams, knowledge management programs, and content researchers who want searchable access to YouTube video content.

Yes. Scavio's free tier includes 250 credits per month with no credit card required. That is enough to validate this solution in your workflow.

Build a Searchable YouTube Transcript Knowledge Base

Three-step pipeline: (1) Use Scavio YouTube search to discover relevant videos by topic, (2) Extract transcripts using youtube-transcript-api, (3) Index transcripts in MongoDB with