Overview
Teams building knowledge bases from video content need a repeatable pipeline to discover new YouTube videos on their topics, extract transcripts, and make them searchable. This workflow runs daily at 6 AM UTC: it searches YouTube via Scavio for your configured topics, extracts transcript data from the top results, indexes the content into MongoDB with full-text search enabled, and runs a verification query to confirm the new content is retrievable. No manual video hunting or copy-pasting transcripts.
Trigger
Cron schedule (daily at 6 AM UTC)
Schedule
Daily at 6 AM UTC
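Under a standard cron setup, this trigger can be expressed as a crontab entry. The interpreter and file paths below are placeholders for your deployment, and the schedule assumes the host clock is set to UTC:

```
0 6 * * * /usr/bin/python3 /opt/pipelines/youtube_transcripts.py >> /var/log/youtube_pipeline.log 2>&1
```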
Workflow Steps
Search YouTube for topics
Query Scavio YouTube platform for each configured topic. Collect video IDs, titles, and descriptions from top results.
Extract transcripts
For each discovered video, fetch the transcript text. Skip videos already in the database to avoid duplicates.
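A minimal sketch of this step, assuming the third-party youtube-transcript-api package for caption retrieval; `known_ids` stands in for a set of video IDs already present in MongoDB (in the real pipeline it would come from a query against the collection):

```python
def new_videos(videos, known_ids):
    """Filter out videos with no ID or an ID that is already indexed."""
    return [v for v in videos if v["video_id"] and v["video_id"] not in known_ids]

def flatten_transcript(segments):
    """Join timed transcript segments into a single text blob."""
    return " ".join(s["text"].strip() for s in segments)

def fetch_transcripts(videos, known_ids):
    """Fetch transcripts for videos not yet in the database."""
    # pip install youtube-transcript-api
    from youtube_transcript_api import YouTubeTranscriptApi
    out = []
    for v in new_videos(videos, known_ids):
        try:
            segments = YouTubeTranscriptApi.get_transcript(v["video_id"])
            out.append({**v, "transcript": flatten_transcript(segments)})
        except Exception:
            continue  # no captions available for this video; skip it
    return out
```

Videos without captions are skipped rather than failing the whole run, so a single private or caption-less video cannot stall the pipeline.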
Index into MongoDB
Insert transcript documents with metadata (video ID, title, channel, publish date, topic) into MongoDB with text index.
Run search verification
Execute a test query against the MongoDB text index to confirm new transcripts are retrievable and ranked correctly.
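A sketch of the verification step, assuming a pymongo `db` handle and the text index created during indexing; the query phrase and the `min_hits` threshold are illustrative:

```python
def build_verification_query(phrase):
    """Build a $text filter, a relevance-score projection, and a sort spec."""
    filt = {"$text": {"$search": phrase}}
    proj = {"score": {"$meta": "textScore"}, "title": 1, "link": 1}
    sort = [("score", {"$meta": "textScore"})]
    return filt, proj, sort

def verify_search(db, phrase, min_hits=1):
    """Return (ok, hits): ok is True when at least min_hits documents match."""
    filt, proj, sort = build_verification_query(phrase)
    hits = list(db.videos.find(filt, proj).sort(sort).limit(5))
    return len(hits) >= min_hits, hits

# Usage (hypothetical connection):
# from pymongo import MongoClient
# db = MongoClient(os.environ["MONGO_URI"]).transcripts
# ok, hits = verify_search(db, "serp api tutorial")
```

Sorting on the `textScore` metadata confirms not just that documents match but that relevance ranking is working against the index.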
Python Implementation
import os
import requests
from datetime import datetime, timezone

H = {"x-api-key": os.environ["SCAVIO_API_KEY"]}
TOPICS = ["search api integration 2026", "ai agent grounding tools", "serp api tutorial"]

def search_youtube_topics(topic):
    """Search YouTube for a topic and return video metadata."""
    resp = requests.post("https://api.scavio.dev/api/v1/search", headers=H,
                         json={"platform": "youtube", "query": topic}, timeout=10)
    resp.raise_for_status()
    videos = []
    for item in resp.json().get("organic", [])[:5]:
        link = item.get("link", "")
        # "v=" may be followed by other query params; keep only the video ID
        video_id = link.split("v=")[-1].split("&")[0] if "v=" in link else ""
        videos.append({
            "video_id": video_id,
            "title": item.get("title", ""),
            "description": item.get("snippet", ""),
            "link": link,
            "topic": topic,
            "indexed_at": datetime.now(timezone.utc).isoformat()
        })
    return videos

# Collect all videos
all_videos = []
for topic in TOPICS:
    videos = search_youtube_topics(topic)
    all_videos.extend(videos)
    print(f"[YOUTUBE] {topic}: {len(videos)} videos found")

# MongoDB insert (pseudo-code - replace with your pymongo connection)
# from pymongo import MongoClient
# db = MongoClient(os.environ["MONGO_URI"]).transcripts
# db.videos.create_index([("title", "text"), ("description", "text")])
# for v in all_videos:
#     db.videos.update_one({"video_id": v["video_id"]}, {"$set": v}, upsert=True)

print(f"\nTotal videos to index: {len(all_videos)}")
for v in all_videos[:3]:
    print(f"  {v['title'][:80]} | {v['link']}")

JavaScript Implementation
const H = {"x-api-key": process.env.SCAVIO_API_KEY, "Content-Type": "application/json"};
const TOPICS = ["search api integration 2026", "ai agent grounding tools", "serp api tutorial"];

async function searchYoutubeTopics(topic) {
  const res = await fetch("https://api.scavio.dev/api/v1/search", {
    method: "POST", headers: H,
    body: JSON.stringify({platform: "youtube", query: topic})
  });
  if (!res.ok) throw new Error(`Search failed: ${res.status}`);
  const r = await res.json();
  return (r.organic || []).slice(0, 5).map(item => {
    const link = item.link || "";
    return {
      // "v=" may be followed by other query params; keep only the video ID
      videoId: link.includes("v=") ? link.split("v=").pop().split("&")[0] : "",
      title: item.title || "",
      description: item.snippet || "",
      link,
      topic,
      indexedAt: new Date().toISOString()
    };
  });
}

(async () => {
  const allVideos = [];
  for (const topic of TOPICS) {
    const videos = await searchYoutubeTopics(topic);
    allVideos.push(...videos);
    console.log(`[YOUTUBE] ${topic}: ${videos.length} videos found`);
  }

  // MongoDB insert (pseudo-code - replace with your mongodb connection)
  // const { MongoClient } = require("mongodb");
  // const db = (await MongoClient.connect(process.env.MONGO_URI)).db("transcripts");
  // await db.collection("videos").createIndex({title: "text", description: "text"});
  // for (const v of allVideos) {
  //   await db.collection("videos").updateOne({videoId: v.videoId}, {$set: v}, {upsert: true});
  // }

  console.log(`\nTotal videos to index: ${allVideos.length}`);
  allVideos.slice(0, 3).forEach(v => console.log(`  ${v.title.slice(0, 80)} | ${v.link}`));
})();

Platforms Used
YouTube
Video search with transcripts and metadata