TikTok Retention Analysis with API Data
TikTok Studio shows basic retention. API data gives video-level engagement across your catalog. Build your own top-vs-low performer analysis.
TikTok Studio shows basic retention curves for individual videos, but it does not let you compare retention patterns across your entire catalog or identify what separates your top-performing content from everything else. API data gives you video-level engagement metrics (play_count, digg_count, comment_count, share_count) across all your videos, enabling systematic analysis that the built-in dashboard cannot provide.
What TikTok Studio gives you
TikTok Studio provides per-video retention curves showing the percentage of viewers still watching at each second of the video. It also shows average watch time, total views, and basic demographic breakdowns. The limitations: you can view only one video at a time, there is no export function for bulk analysis, and you cannot correlate retention patterns with other engagement metrics at scale.
For creators with 10-20 videos, this is manageable. For creators with 100+ videos, manually checking retention on each video is impractical. You need programmatic access to build a picture of what actually drives engagement across your content library.
What API data adds
The TikTok API exposes video-level metrics that you can pull for every video on an account: play_count, digg_count (likes), comment_count, share_count, and video metadata (duration, hashtags, music, create_time). While the API does not expose the granular second-by-second retention curve, these engagement metrics serve as strong proxies for retention quality.
```python
import httpx
import pandas as pd


async def fetch_creator_videos(username: str, api_key: str) -> list:
    """Fetch all videos for a TikTok creator via the API."""
    async with httpx.AsyncClient() as client:
        resp = await client.post(
            "https://api.scavio.dev/api/v1/tiktok/user/videos",
            headers={"Authorization": f"Bearer {api_key}"},
            json={"username": username, "limit": 50},
        )
        resp.raise_for_status()
        return resp.json().get("videos", [])


async def analyze_retention_proxies(username: str, api_key: str) -> pd.DataFrame:
    """Build retention analysis from engagement metrics."""
    videos = await fetch_creator_videos(username, api_key)
    df = pd.DataFrame([
        {
            "id": v.get("id"),
            "desc": v.get("desc", "")[:80],
            "duration": v.get("duration", 0),
            "plays": v.get("play_count", 0),
            "likes": v.get("digg_count", 0),
            "comments": v.get("comment_count", 0),
            "shares": v.get("share_count", 0),
            "create_time": v.get("create_time"),
            "hashtags": ", ".join(v.get("hashtags", [])),
        }
        for v in videos
    ])
    # Guard against division by zero for videos with no recorded plays
    plays = df["plays"].where(df["plays"] > 0)
    # Engagement rate as retention proxy
    df["engagement_rate"] = (
        (df["likes"] + df["comments"] + df["shares"]) / plays
    ).round(4)
    # Share rate: high share rate correlates with watch-through
    df["share_rate"] = (df["shares"] / plays).round(4)
    # Comment rate: high comment rate signals strong hooks
    df["comment_rate"] = (df["comments"] / plays).round(4)
    return df
```

Identifying top vs low performers
Split your videos into quartiles by engagement rate. The top quartile represents content with the strongest retention signals. Compare the top quartile against the bottom quartile across every dimension: duration, hashtags, posting time, and content type.
```python
def compare_performance_tiers(df: pd.DataFrame) -> dict:
    """Compare top 25% vs bottom 25% of videos by engagement rate."""
    q75 = df["engagement_rate"].quantile(0.75)
    q25 = df["engagement_rate"].quantile(0.25)
    top = df[df["engagement_rate"] >= q75]
    bottom = df[df["engagement_rate"] <= q25]
    comparison = {
        "top_25_pct": {
            "count": len(top),
            "avg_duration": round(top["duration"].mean(), 1),
            "avg_engagement": round(top["engagement_rate"].mean(), 4),
            "avg_share_rate": round(top["share_rate"].mean(), 4),
            "avg_comment_rate": round(top["comment_rate"].mean(), 4),
            "avg_plays": int(top["plays"].mean()),
        },
        "bottom_25_pct": {
            "count": len(bottom),
            "avg_duration": round(bottom["duration"].mean(), 1),
            "avg_engagement": round(bottom["engagement_rate"].mean(), 4),
            "avg_share_rate": round(bottom["share_rate"].mean(), 4),
            "avg_comment_rate": round(bottom["comment_rate"].mean(), 4),
            "avg_plays": int(bottom["plays"].mean()),
        },
    }
    # Duration sweet spot analysis
    duration_buckets = pd.cut(
        df["duration"],
        bins=[0, 15, 30, 60, 120, 300],
        labels=["0-15s", "15-30s", "30-60s", "1-2min", "2-5min"],
    )
    comparison["engagement_by_duration"] = (
        df.groupby(duration_buckets, observed=True)["engagement_rate"]
        .mean()
        .round(4)
        .to_dict()
    )
    return comparison
```

What the analysis reveals
Common patterns creators discover when running this analysis across 100+ videos:
Duration sweet spots are real but niche-specific. Cooking content performs best at 45-90 seconds. Tech reviews peak at 2-3 minutes. The aggregate "keep it under 60 seconds" advice is wrong for many niches.
Share rate is a better retention proxy than like rate. Videos with high share rates almost always have strong retention because people share content they watched through. Like rate is more impulsive and less correlated with watch time.
Comment rate spikes indicate strong hooks. Videos with above-average comment rates typically have strong opening hooks that provoke a reaction, which correlates with early retention.
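The spike detection behind that last pattern can be sketched in a few lines. This is a minimal illustration with made-up sample numbers, flagging videos whose comment rate sits more than one standard deviation above the account mean, using the same comment_rate column built earlier:

```python
import pandas as pd

# Hypothetical per-video comment rates (made-up sample data)
df = pd.DataFrame({
    "id": ["v1", "v2", "v3", "v4", "v5"],
    "comment_rate": [0.0020, 0.0018, 0.0095, 0.0022, 0.0019],
})

# Flag videos more than one standard deviation above the mean:
# likely strong-hook outliers worth studying
threshold = df["comment_rate"].mean() + df["comment_rate"].std()
strong_hooks = df[df["comment_rate"] > threshold]
print(strong_hooks["id"].tolist())  # → ['v3']
```

Pull up the flagged videos in TikTok Studio and look at their first three seconds: that is usually where the hook difference lives.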
Building a retention dashboard
Pull data weekly, append to a running dataset, and track trends over time. This is more valuable than point-in-time analysis because you can see how changes to your content strategy affect engagement metrics across your catalog.
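The "track trends over time" half can be sketched as a loader that reads the accumulated snapshot files back into a time series. This assumes each file is a JSON object with date and comparison keys, matching the snapshot format the weekly_snapshot function in this section writes:

```python
import glob
import json

import pandas as pd


def load_engagement_trend(output_dir: str) -> pd.DataFrame:
    """Read all weekly snapshots and build a time series of tier engagement."""
    rows = []
    # Filenames embed YYYYMMDD, so lexicographic sort is chronological
    for path in sorted(glob.glob(f"{output_dir}/tiktok_snapshot_*.json")):
        with open(path) as f:
            snap = json.load(f)
        rows.append({
            "date": snap["date"],
            "top_avg_engagement": snap["comparison"]["top_25_pct"]["avg_engagement"],
            "bottom_avg_engagement": snap["comparison"]["bottom_25_pct"]["avg_engagement"],
        })
    return pd.DataFrame(rows)
```

Plot the two columns over time: a widening gap suggests your top content is pulling away from the rest, while both lines rising together suggests the whole catalog is improving.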
```python
import json
from datetime import datetime


async def weekly_snapshot(username: str, api_key: str, output_dir: str):
    """Take a weekly snapshot for longitudinal analysis."""
    df = await analyze_retention_proxies(username, api_key)
    comparison = compare_performance_tiers(df)
    snapshot = {
        "date": datetime.now().isoformat(),
        "total_videos": len(df),
        "comparison": comparison,
        "top_performers": df.nlargest(5, "engagement_rate")[
            ["id", "desc", "duration", "plays", "engagement_rate"]
        ].to_dict("records"),
    }
    filename = f"{output_dir}/tiktok_snapshot_{datetime.now().strftime('%Y%m%d')}.json"
    with open(filename, "w") as f:
        json.dump(snapshot, f, indent=2, default=str)
    return snapshot
```

Cost
Pulling data for 50 of a creator's videos takes one API call at $0.005. Running the analysis weekly for a single account costs about $0.02/month. Even analyzing 20 competitor accounts weekly costs only about $0.40/month. The bottleneck is not cost but building the analysis pipeline, and the code above handles the core of it.
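The arithmetic behind those numbers, assuming the $0.005-per-call rate quoted above and one 50-video pull per account per week (roughly four per month):

```python
COST_PER_CALL = 0.005   # assumed price per 50-video request
CALLS_PER_MONTH = 4     # one pull per week

single_account = COST_PER_CALL * CALLS_PER_MONTH   # 0.02
competitors_20 = single_account * 20               # 0.40

print(f"${single_account:.2f}/month, ${competitors_20:.2f}/month for 20 accounts")
```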
Limitations
API data does not replace TikTok Studio's second-by-second retention curves. If you need to know exactly where viewers drop off within a specific video, you still need Studio. What API data gives you is the macro view: which patterns correlate with high engagement across your entire catalog, and how those patterns change over time. The two data sources complement each other rather than compete.