How to Build a YouTube Data Agent Pipeline

Build an AI agent that researches YouTube channels, analyzes video trends, and generates reports using Scavio YouTube search at $0.005/query.

YouTube search data reveals what content performs in any niche, which creators dominate, and what gaps exist. This pipeline builds an agent that researches a topic on YouTube, ranks channels by how often they appear across related searches, and outputs a competitive landscape report. Each YouTube search costs $0.005 through Scavio, so the 10 or so queries in this walkthrough cost about $0.05, and a fuller niche analysis stays under $0.10.
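
The cost arithmetic can be sketched in a few lines. The helper below is hypothetical (`estimate_cost` is not part of any Scavio SDK) and assumes the flat $0.005 per query quoted above, with no retries or failed requests. It uses `Decimal` so that small prices add up without floating-point drift.

```python
from decimal import Decimal

# Assumption: the flat per-query price quoted in this tutorial.
PRICE_PER_QUERY = Decimal("0.005")

def estimate_cost(landscape_queries: int, gap_queries: int) -> Decimal:
    """Return the estimated spend in USD for one niche analysis."""
    return PRICE_PER_QUERY * (landscape_queries + gap_queries)

# Five search angles plus six subtopic checks, as in Steps 2 and 3 of the walkthrough.
print(f"${estimate_cost(5, 6)}")
```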

Prerequisites

  • Python 3.8+
  • requests library
  • A Scavio API key from scavio.dev
  • Target niches or topics to research

Walkthrough

Step 1: Search YouTube for topic coverage

Query YouTube for videos on a topic and extract channel data.

Python
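import os, json
import requests
from collections import defaultdict

API_KEY = os.environ['SCAVIO_API_KEY']
SH = {'x-api-key': API_KEY, 'Content-Type': 'application/json'}

def youtube_search(query):
    """Return up to 10 simplified video records for a YouTube query."""
    resp = requests.post('https://api.scavio.dev/api/v1/search',
        headers=SH, json={'query': query, 'platform': 'youtube', 'country_code': 'us'},
        timeout=30)
    resp.raise_for_status()  # surface auth or quota errors instead of parsing an error body
    data = resp.json()
    # Results may arrive under either key depending on the response shape.
    results = data.get('organic_results', data.get('video_results', []))[:10]
    videos = []
    for r in results:
        ch = r.get('channel')
        # The channel may be a nested object or a flat string; fall back to channel_name.
        name = (ch.get('name') if isinstance(ch, dict) else ch) or r.get('channel_name', '')
        videos.append({'title': r.get('title', ''), 'channel': name,
                       'views': r.get('views', 0), 'date': r.get('published_date', ''),
                       'link': r.get('link', '')})
    return videos

videos = youtube_search('python web scraping tutorial 2026')
print(f'Found {len(videos)} videos')
for v in videos[:5]:
    print(f'  {v["channel"]:25} | {v["title"][:50]}')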
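
The landscape step that follows only sums `views` when the value is already an integer. If the API returns view counts as display strings such as "1.2M views" (an assumption; inspect a real response to confirm), those videos would contribute nothing to the totals. A minimal sketch of a normalizer, where `parse_views` is a hypothetical helper and the accepted formats are guesses:

```python
import re

# Multipliers for the abbreviated suffixes commonly used in view counts.
_SUFFIXES = {"k": 1_000, "m": 1_000_000, "b": 1_000_000_000}

def parse_views(raw) -> int:
    """Best-effort conversion of a view count to int; returns 0 if unparseable."""
    if isinstance(raw, bool):  # bool is a subclass of int, so exclude it first
        return 0
    if isinstance(raw, int):
        return raw
    if not isinstance(raw, str):
        return 0
    # A number with optional thousands separators and decimals, then an optional suffix.
    match = re.search(r"(\d[\d,]*(?:\.\d+)?)\s*([kmb]\b)?", raw.lower())
    if not match:
        return 0
    number = float(match.group(1).replace(",", ""))
    return int(round(number * _SUFFIXES.get(match.group(2), 1)))
```

Calling `parse_views(v['views'])` before summing would let string counts contribute to `total_views` instead of being skipped.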

Step 2: Map the channel landscape

Search multiple angles to build a map of who covers the topic.

Python
def map_landscape(topic, angles=None):
    if not angles:
        angles = [f'{topic} tutorial', f'{topic} guide 2026', f'{topic} for beginners',
                  f'best {topic}', f'{topic} tips']
    channels = defaultdict(lambda: {'videos': 0, 'total_views': 0, 'titles': []})
    for angle in angles:
        videos = youtube_search(angle)
        for v in videos:
            ch = v['channel'] or 'Unknown'
            channels[ch]['videos'] += 1
            channels[ch]['total_views'] += v.get('views', 0) if isinstance(v.get('views'), int) else 0
            channels[ch]['titles'].append(v['title'][:60])
    cost = len(angles) * 0.005
    print(f'\nLandscape for "{topic}" ({len(angles)} queries, ${cost:.3f}):')
    sorted_ch = sorted(channels.items(), key=lambda x: x[1]['videos'], reverse=True)
    for name, data in sorted_ch[:10]:
        print(f'  {name:25} | {data["videos"]} videos | views: {data["total_views"]:,}')
    return dict(channels), cost

channels, cost = map_landscape('web scraping')
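
One way to make the ranking more robust is to count the distinct search angles a channel appears in rather than raw result rows, so a single video that ranks for several queries is not counted as several videos. The sketch below is an alternative to the counting above, not the tutorial's method. `rank_by_consistency` and its input shape (a dict of query to video list, in the record format Step 1 returns) are assumptions, and the sample data is made up.

```python
from collections import defaultdict

def rank_by_consistency(results_by_angle):
    """Rank channels by the number of distinct angles they appear in.

    results_by_angle: dict mapping a query string to a list of video dicts
    with 'channel' and 'link' keys.
    Returns a list of (channel, angles_covered, unique_videos) tuples.
    """
    angles_seen = defaultdict(set)    # channel -> angles it ranked for
    unique_videos = defaultdict(set)  # channel -> distinct video links
    for angle, videos in results_by_angle.items():
        for v in videos:
            ch = v.get('channel') or 'Unknown'
            angles_seen[ch].add(angle)
            unique_videos[ch].add(v.get('link') or v.get('title', ''))
    rows = [(ch, len(angles_seen[ch]), len(unique_videos[ch])) for ch in angles_seen]
    # Sort by angle coverage, then unique videos, then name for a stable order.
    return sorted(rows, key=lambda r: (-r[1], -r[2], r[0]))

# Made-up sample data for illustration only.
sample = {
    'web scraping tutorial': [{'channel': 'Channel A', 'link': 'a1'},
                              {'channel': 'Channel B', 'link': 'b1'}],
    'web scraping tips': [{'channel': 'Channel A', 'link': 'a1'},
                          {'channel': 'Channel C', 'link': 'c1'}],
}
for name, angles, vids in rank_by_consistency(sample):
    print(f'{name}: {angles} angles, {vids} unique videos')
```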

Step 3: Identify content gaps

Find subtopics with low competition or outdated coverage.

Python
def find_gaps(topic, subtopics):
    gaps = []
    for sub in subtopics:
        query = f'{topic} {sub}'
        videos = youtube_search(query)
        recent = [v for v in videos if '2026' in v.get('date', '') or '2025' in v.get('date', '')]
        total = len(videos)
        print(f'  {sub:25} | {total} results | {len(recent)} recent')
        if total < 5 or len(recent) < 2:
            gaps.append({'subtopic': sub, 'total': total, 'recent': len(recent),
                'opportunity': 'low competition' if total < 5 else 'outdated content'})
    print(f'\nGaps found: {len(gaps)}')
    for g in gaps:
        print(f'  {g["subtopic"]}: {g["opportunity"]} ({g["total"]} total, {g["recent"]} recent)')
    return gaps

subtopics = ['playwright', 'scrapy', 'beautifulsoup', 'selenium', 'httpx', 'curl_cffi']
gaps = find_gaps('web scraping', subtopics)
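
The recency check above looks for a hard-coded year inside `published_date`. YouTube often displays relative dates such as "3 months ago", and if the API passes those through unchanged (an assumption; check a real response), every video would be counted as not recent. The sketch below handles both forms. `is_recent` and its two-year default window are hypothetical choices.

```python
import re
from datetime import date

# Approximate day counts per unit; sub-day units count as zero days old.
_DAYS_PER_UNIT = {'minute': 0, 'hour': 0, 'day': 1, 'week': 7, 'month': 30, 'year': 365}

def is_recent(date_str: str, today: date, max_age_days: int = 730) -> bool:
    """Return True if date_str looks no older than max_age_days.

    Handles relative strings ('3 months ago') and strings containing a
    four-digit year ('Mar 4, 2025'). Unknown formats return False.
    """
    text = (date_str or '').lower()
    rel = re.search(r'(\d+)\s+(minute|hour|day|week|month|year)s?\s+ago', text)
    if rel:
        return int(rel.group(1)) * _DAYS_PER_UNIT[rel.group(2)] <= max_age_days
    year = re.search(r'\b(?:19|20)\d{2}\b', text)
    if year:
        # Year-level precision only: compare against the cutoff year.
        cutoff_year = today.year - max_age_days // 365
        return int(year.group(0)) >= cutoff_year
    return False
```

Inside `find_gaps`, the list comprehension could then filter with `is_recent(v.get('date', ''), date.today())` so the window moves with the calendar instead of being pinned to specific years.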

Step 4: Generate the landscape report

Combine channel mapping and gap analysis into a report.

Python
def generate_report(topic, subtopics=None):
    if not subtopics:
        subtopics = ['beginner', 'advanced', 'tools', 'automation', 'api']
    print(f'=== YouTube Landscape Report: {topic} ===')
    channels, map_cost = map_landscape(topic)
    print(f'\n--- Content Gaps ---')
    gaps = find_gaps(topic, subtopics)
    gap_cost = len(subtopics) * 0.005
    total_cost = map_cost + gap_cost
    top_channels = sorted(channels.items(), key=lambda x: x[1]['videos'], reverse=True)[:5]
    report = {
        'topic': topic,
        'total_channels': len(channels),
        'top_5': [{'name': n, 'videos': d['videos']} for n, d in top_channels],
        'gaps': gaps,
        'cost': total_cost
    }
    with open(f'yt_report_{topic.replace(" ", "_")}.json', 'w') as f:
        json.dump(report, f, indent=2)
    print(f'\nTotal cost: ${total_cost:.3f}. Saved report.')
    return report

generate_report('web scraping', subtopics)  # reuse the Step 3 subtopic list
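
The JSON file is convenient for downstream tools. For a quick human-readable view, the same dict can be rendered as text. This is a sketch only: `summarize_report` is a hypothetical helper that assumes the report keys built in `generate_report` above, and the sample values are placeholders.

```python
def summarize_report(report: dict) -> str:
    """Render the Step 4 report dict as a short plain-text summary."""
    lines = [f"YouTube landscape: {report['topic']}",
             f"Channels found: {report['total_channels']}",
             "Top channels:"]
    for ch in report.get('top_5', []):
        lines.append(f"  - {ch['name']} ({ch['videos']} videos)")
    gaps = report.get('gaps', [])
    lines.append(f"Content gaps: {len(gaps)}")
    for g in gaps:
        lines.append(f"  - {g['subtopic']}: {g['opportunity']}")
    lines.append(f"Estimated cost: ${report['cost']:.3f}")
    return "\n".join(lines)

# Placeholder report for illustration; real values come from generate_report().
example = {'topic': 'web scraping', 'total_channels': 2,
           'top_5': [{'name': 'Channel A', 'videos': 3}],
           'gaps': [{'subtopic': 'httpx', 'total': 3, 'recent': 1,
                     'opportunity': 'low competition'}],
           'cost': 0.055}
print(summarize_report(example))
```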

Python Example

Python
import os, requests
from collections import Counter
SH = {'x-api-key': os.environ['SCAVIO_API_KEY'], 'Content-Type': 'application/json'}

def yt_landscape(topic):
    channels = Counter()
    for suffix in ['tutorial', 'guide 2026', 'for beginners']:
        data = requests.post('https://api.scavio.dev/api/v1/search',
            headers=SH, json={'query': f'{topic} {suffix}', 'platform': 'youtube', 'country_code': 'us'}).json()
        for r in data.get('organic_results', [])[:10]:
            channels[r.get('channel', {}).get('name', 'Unknown')] += 1
    print(f'{topic} YouTube landscape ({len(channels)} channels):')
    for ch, count in channels.most_common(5):
        print(f'  {ch}: {count} videos')

yt_landscape('web scraping')

JavaScript Example

JavaScript
const SH = { 'x-api-key': process.env.SCAVIO_API_KEY, 'Content-Type': 'application/json' };
async function ytLandscape(topic) {
  const channels = {};
  for (const suffix of ['tutorial', 'guide 2026', 'for beginners']) {
    const data = await fetch('https://api.scavio.dev/api/v1/search', {
      method: 'POST', headers: SH,
      body: JSON.stringify({ query: `${topic} ${suffix}`, platform: 'youtube', country_code: 'us' })
    }).then(r => r.json());
    for (const r of (data.organic_results || []).slice(0, 10)) {
      const ch = r.channel?.name || 'Unknown';
      channels[ch] = (channels[ch] || 0) + 1;
    }
  }
  const sorted = Object.entries(channels).sort((a,b) => b[1]-a[1]);
  console.log(`${topic}: ${sorted.length} channels`);
  sorted.slice(0, 5).forEach(([ch, n]) => console.log(`  ${ch}: ${n} videos`));
}
ytLandscape('web scraping').catch(console.error);

Expected Output

Text
Landscape for "web scraping" (5 queries, $0.025):
  Tech With Tim              | 4 videos | views: 2,340,000
  Corey Schafer              | 3 videos | views: 1,890,000
  John Watson Rooney         | 3 videos | views: 456,000
  NetworkChuck               | 2 videos | views: 3,200,000
  Fireship                   | 2 videos | views: 5,100,000

--- Content Gaps ---
  playwright                 | 8 results | 3 recent
  scrapy                     | 9 results | 2 recent
  beautifulsoup              | 10 results | 1 recent
  httpx                      | 3 results | 1 recent
  curl_cffi                  | 2 results | 0 recent

Gaps found: 3
  beautifulsoup: outdated content (10 total, 1 recent)
  httpx: low competition (3 total, 1 recent)
  curl_cffi: low competition (2 total, 0 recent)

Total cost: $0.055. Saved report.

Frequently Asked Questions

Most developers complete this tutorial in 15 to 30 minutes. You will need a Scavio API key (free tier works) and a working Python or JavaScript environment.

You need Python 3.8+, the requests library, a Scavio API key from scavio.dev, and target niches or topics to research. A Scavio API key gives you 250 free credits per month.

Yes. The free tier includes 250 credits per month, which is more than enough to complete this tutorial and prototype a working solution.

Scavio has a native LangChain package (langchain-scavio), an MCP server, and a plain REST API that works with any HTTP client. This tutorial uses the raw REST API, but you can adapt to your framework of choice.
