How long does this tutorial take?

Most developers complete this tutorial in 15 to 30 minutes. You will need a Scavio API key (free tier works) and a working Python or JavaScript environment.

What do I need before starting?

Python 3.8+ installed, the pymongo and requests libraries, MongoDB running locally or an Atlas connection string, and a Scavio API key from scavio.dev. A new key comes with 50 free credits on signup.

Can I run this tutorial with the free tier?

Yes. The free tier includes 50 credits on signup, which is more than enough to complete this tutorial and prototype a working solution.

What frameworks does this work with?

Scavio has a native LangChain package (langchain-scavio), an MCP server, and a plain REST API that works with any HTTP client. This tutorial uses the raw REST API, but you can adapt it to your framework of choice.

Build a YouTube Knowledge Base with MongoDB and the Scavio API (2026)

Build a YouTube knowledge base in MongoDB by searching for topic-relevant videos through the Scavio API, extracting their metadata, storing them as text-indexed documents, and querying the result with full-text search. YouTube holds expert knowledge across many domains, but it is locked inside video. A searchable knowledge base turns the text around each video (title, description snippet, channel) into queryable documents for research agents, chatbots, and internal tools. The code in this walkthrough indexes search-result metadata; it does not fetch transcripts.

Prerequisites

Python 3.8+ installed
pymongo and requests libraries installed (pip install pymongo requests)
MongoDB running locally or Atlas connection string
A Scavio API key from scavio.dev

Walkthrough

Step 1: Search YouTube topics

Query YouTube through Scavio to find relevant videos on your target topics.

Python

import os, requests
from pymongo import MongoClient
from datetime import datetime

API_KEY = os.environ['SCAVIO_API_KEY']
MONGO_URI = os.environ.get('MONGO_URI', 'mongodb://localhost:27017')

client = MongoClient(MONGO_URI)
db = client['youtube_kb']
collection = db['videos']

def search_youtube(topic: str) -> list:
    resp = requests.post('https://api.scavio.dev/api/v1/search',
        headers={'x-api-key': API_KEY},
        json={'platform': 'youtube', 'query': topic}, timeout=15)
    resp.raise_for_status()
    return resp.json().get('organic_results', [])

results = search_youtube('python async programming tutorial')
print(f'Found {len(results)} videos')
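
Network calls like the one above can fail transiently. As a minimal sketch, a small wrapper could retry the search with exponential backoff. The name with_retries and its parameters are hypothetical and not part of Scavio or requests; which errors are actually worth retrying for this API is an assumption left to the caller.

```python
import time
from typing import Callable, List, Tuple, Type


def with_retries(fetch: Callable[[str], List[dict]], topic: str,
                 attempts: int = 3, base_delay: float = 1.0,
                 retry_on: Tuple[Type[BaseException], ...] = (Exception,),
                 sleep: Callable[[float], None] = time.sleep) -> List[dict]:
    """Call fetch(topic), retrying with exponential backoff on failure."""
    if attempts < 1:
        raise ValueError('attempts must be at least 1')
    last_error = None
    for attempt in range(attempts):
        try:
            return fetch(topic)
        except retry_on as exc:
            last_error = exc
            if attempt < attempts - 1:
                # Wait base_delay, then 2x, then 4x, ... between tries
                sleep(base_delay * 2 ** attempt)
    # Every attempt failed: surface the last error to the caller
    raise last_error
```

With the function from this step, a call might look like with_retries(search_youtube, 'python async programming tutorial', retry_on=(requests.ConnectionError, requests.Timeout)). Limiting retry_on to connection and timeout errors avoids re-sending requests that failed with a client-side HTTP error.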

Step 2: Extract video metadata

Parse the search results to extract title, channel, description, views, and duration for each video.

Python

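from datetime import timezone

def extract_metadata(results: list, topic: str) -> list:
    docs = []
    for r in results:
        if not r.get('link'):
            continue  # no URL means nothing to key the upsert on in Step 3
        # Resolve each field once, falling back to alternate keys if present
        title = r.get('title', '')
        channel = r.get('channel', r.get('author', ''))
        description = r.get('snippet', r.get('description', ''))
        doc = {
            'title': title,
            'url': r['link'],
            'channel': channel,
            'description': description,
            'views': r.get('views', ''),
            'duration': r.get('duration', ''),
            'topic': topic,
            # Timezone-aware UTC timestamp (datetime.utcnow() is deprecated)
            'indexed_at': datetime.now(timezone.utc),
            # Reuse the resolved values so fallback fields are searchable too
            'searchable_text': f'{title} {description} {channel}',
        }
        docs.append(doc)
    return docs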

docs = extract_metadata(results, 'python async')
print(f'Extracted {len(docs)} video documents')
if docs:
    print(f'Sample: {docs[0]["title"][:60]}')
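
Step 2 stores duration as whatever string the API returns. If that string is clock-style, such as 12:34 or 1:02:03 (an assumption; the response format is not documented in this tutorial), a helper along these lines could convert it to seconds so MongoDB can filter or sort by length. The name duration_to_seconds is hypothetical.

```python
from typing import Optional


def duration_to_seconds(text: str) -> Optional[int]:
    """Convert 'SS', 'MM:SS' or 'HH:MM:SS' to seconds; None if unparseable."""
    parts = text.strip().split(':')
    # Accept one to three purely numeric fields, reject everything else
    if not 1 <= len(parts) <= 3 or not all(p.isdecimal() for p in parts):
        return None
    seconds = 0
    for part in parts:
        # Each field to the right is worth 1/60 of the one before it
        seconds = seconds * 60 + int(part)
    return seconds
```

The result could be stored in an extra field such as duration_seconds next to the original string, leaving the raw value intact for cases the helper cannot parse.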

Step 3: Create MongoDB text index

Store the documents in MongoDB and create a text index on the searchable fields for full-text queries.

Python

def index_videos(docs: list) -> int:
    if not docs:
        return 0
    # Upsert to avoid duplicates
    inserted = 0
    for doc in docs:
        result = collection.update_one(
            {'url': doc['url']},
            {'$set': doc},
            upsert=True
        )
        if result.upserted_id:
            inserted += 1
    return inserted

# Create text index
collection.create_index([
    ('searchable_text', 'text'),
    ('title', 'text'),
    ('description', 'text'),
])

count = index_videos(docs)
print(f'Indexed {count} new videos (total: {collection.count_documents({})})')
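
The upsert above keys on the full URL, so the same video reached through two different links (for example with and without a timestamp parameter) would be stored twice. One possible refinement is to key on the video ID instead. The sketch below assumes the link field holds standard YouTube URL shapes, which this tutorial does not confirm; youtube_video_id is a hypothetical helper name.

```python
from typing import Optional
from urllib.parse import parse_qs, urlparse


def youtube_video_id(url: str) -> Optional[str]:
    """Extract the video ID from common YouTube URL shapes, else None."""
    parsed = urlparse(url)
    host = (parsed.hostname or '').lower()
    if host == 'youtu.be':
        # Short links carry the ID as the first path segment
        return parsed.path.lstrip('/').split('/')[0] or None
    if host == 'youtube.com' or host.endswith('.youtube.com'):
        if parsed.path == '/watch':
            # Standard links carry the ID in the v query parameter
            return parse_qs(parsed.query).get('v', [None])[0]
        for prefix in ('/shorts/', '/embed/'):
            if parsed.path.startswith(prefix):
                return parsed.path[len(prefix):].split('/')[0] or None
    return None
```

In index_videos, the filter could then become {'video_id': youtube_video_id(doc['url'])}, falling back to the URL when the helper returns None.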

Step 4: Search the knowledge base

Query your MongoDB knowledge base using full-text search to find relevant videos by content.

Python

def search_kb(query: str, limit: int = 5) -> list:
    cursor = collection.find(
        {'$text': {'$search': query}},
        {'score': {'$meta': 'textScore'}}
    ).sort([('score', {'$meta': 'textScore'})]).limit(limit)
    results = []
    for doc in cursor:
        results.append({
            'title': doc['title'],
            'url': doc['url'],
            'channel': doc.get('channel', ''),
            'score': doc.get('score', 0),
        })
    return results

results = search_kb('async await python')
for r in results:
    print(f"{r['score']:.1f} | {r['title'][:50]} | {r['channel']}")
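
MongoDB's $text operator treats space-separated terms as a logical OR, matches text wrapped in double quotes as an exact phrase, and excludes any term prefixed with a hyphen. A small builder can keep that syntax out of calling code. This is a sketch; build_text_search and its parameters are hypothetical names.

```python
from typing import Iterable


def build_text_search(terms: Iterable[str] = (),
                      phrases: Iterable[str] = (),
                      exclude: Iterable[str] = ()) -> str:
    """Assemble a $search string from terms, exact phrases, and exclusions."""
    parts = list(terms)
    # Wrap phrases in double quotes, dropping any quotes inside them
    parts += ['"' + p.replace('"', '') + '"' for p in phrases]
    # A leading hyphen tells $text to exclude documents containing the term
    parts += ['-' + word for word in exclude]
    return ' '.join(parts)
```

For example, search_kb(build_text_search(['python'], phrases=['async await'], exclude=['javascript'])) would look for documents that contain the phrase "async await" and do not mention javascript.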

Step 5: Automate daily indexing

Build a daily job that re-runs a fixed list of topic searches and adds any videos not yet in the knowledge base.

Python

DAILY_TOPICS = [
    'python best practices 2026',
    'machine learning tutorial',
    'system design interview',
    'cloud architecture patterns',
]

def daily_index() -> dict:
    total_new = 0
    for topic in DAILY_TOPICS:
        results = search_youtube(topic)
        docs = extract_metadata(results, topic)
        new = index_videos(docs)
        total_new += new
        print(f'{topic}: {new} new videos indexed')
    total = collection.count_documents({})
    print(f'Daily index complete: {total_new} new, {total} total')
    return {'new': total_new, 'total': total}

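# Run daily via cron: python -c 'from kb import daily_index; daily_index()'
# (assumes this file is saved as kb.py). The guard below keeps that import
# from triggering a second run.
if __name__ == '__main__':
    daily_index()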
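
In an unattended job, one failing topic (a timeout, a rate limit) would abort the whole loop above. A possible variation, sketched here with hypothetical names, isolates each topic and reports failures at the end.

```python
from typing import Callable, Dict, Iterable


def run_topics(topics: Iterable[str],
               index_one: Callable[[str], int]) -> Dict[str, object]:
    """Index each topic independently; collect errors instead of aborting."""
    new_total = 0
    failed = {}
    for topic in topics:
        try:
            new_total += index_one(topic)
        except Exception as exc:
            # Record the failure and move on to the next topic
            failed[topic] = repr(exc)
    return {'new': new_total, 'failed': failed}
```

With the functions from the earlier steps, index_one could be lambda t: index_videos(extract_metadata(search_youtube(t), t)), and a non-empty failed dict could be logged or used to set a non-zero exit code for cron.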

Python Example

Python

import os, requests
from pymongo import MongoClient

HEADERS = {'x-api-key': os.environ['SCAVIO_API_KEY']}
client = MongoClient(os.environ.get('MONGO_URI', 'mongodb://localhost:27017'))
db = client['youtube_kb']

def index_topic(topic):
    resp = requests.post('https://api.scavio.dev/api/v1/search', headers=HEADERS,
        json={'platform': 'youtube', 'query': topic}, timeout=15)
    resp.raise_for_status()  # fail loudly on HTTP errors
    for r in resp.json().get('organic_results', []):
        if not r.get('link'):
            continue  # no URL to key the upsert on
        db.videos.update_one({'url': r['link']}, {'$set': {
            'title': r.get('title', ''), 'url': r['link'],
            'text': f"{r.get('title', '')} {r.get('snippet', '')}"
        }}, upsert=True)

index_topic('python async tutorial')
print(f'{db.videos.count_documents({})} videos in KB')

JavaScript Example

JavaScript

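// Requires Node 18+ for the built-in fetch
const HEADERS = {'x-api-key': process.env.SCAVIO_API_KEY, 'Content-Type': 'application/json'};
async function searchYouTube(topic) {
  const r = await fetch('https://api.scavio.dev/api/v1/search', {
    method: 'POST', headers: HEADERS, body: JSON.stringify({platform: 'youtube', query: topic})
  });
  // Surface HTTP errors instead of silently returning an empty list
  if (!r.ok) throw new Error(`Scavio request failed: ${r.status}`);
  return (await r.json()).organic_results || [];
}
searchYouTube('python async tutorial')
  .then(r => console.log(r.length + ' videos found'))
  .catch(err => console.error(err.message));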

Expected Output

A MongoDB-backed knowledge base populated with YouTube video metadata from search results, queryable via full-text search and updated daily.
