TikTok Retention Analysis with API Data
TikTok Studio shows basic retention. API data gives video-level engagement across your catalog. Build your own top-vs-low performer analysis.
TikTok Studio shows basic retention curves for individual videos, but it does not let you compare retention patterns across your entire catalog or identify what separates your top-performing content from everything else. API data gives you video-level engagement metrics (play_count, digg_count, comment_count, share_count) across all your videos, enabling systematic analysis that the built-in dashboard cannot provide.
What TikTok Studio gives you
TikTok Studio provides per-video retention curves showing the percentage of viewers still watching at each second of the video. It also shows average watch time, total views, and basic demographic breakdowns. The limitations: you can view only one video at a time, there is no export function for bulk analysis, and you cannot correlate retention patterns with other engagement metrics at scale.
For creators with 10-20 videos, this is manageable. For creators with 100+ videos, manually checking retention on each video is impractical. You need programmatic access to build a picture of what actually drives engagement across your content library.
What API data adds
The TikTok API exposes video-level metrics that you can pull for every video on an account: play_count, digg_count (likes), comment_count, share_count, and video metadata (duration, hashtags, music, create_time). While the API does not expose the granular second-by-second retention curve, these engagement metrics serve as strong proxies for retention quality.
```python
import httpx
import pandas as pd


async def fetch_creator_videos(username: str, api_key: str) -> list:
    """Fetch all videos for a TikTok creator via the API."""
    async with httpx.AsyncClient() as client:
        resp = await client.post(
            "https://api.scavio.dev/api/v1/tiktok/user/videos",
            headers={"Authorization": f"Bearer {api_key}"},
            json={"username": username, "limit": 50},
        )
        resp.raise_for_status()
        return resp.json().get("videos", [])


async def analyze_retention_proxies(username: str, api_key: str) -> pd.DataFrame:
    """Build retention analysis from engagement metrics."""
    videos = await fetch_creator_videos(username, api_key)
    df = pd.DataFrame([
        {
            "id": v.get("id"),
            "desc": v.get("desc", "")[:80],
            "duration": v.get("duration", 0),
            "plays": v.get("play_count", 0),
            "likes": v.get("digg_count", 0),
            "comments": v.get("comment_count", 0),
            "shares": v.get("share_count", 0),
            "create_time": v.get("create_time"),
            "hashtags": ", ".join(v.get("hashtags", [])),
        }
        for v in videos
    ])
    # Guard against division by zero for videos with no recorded plays
    plays = df["plays"].where(df["plays"] > 0)
    # Engagement rate as retention proxy
    df["engagement_rate"] = (
        (df["likes"] + df["comments"] + df["shares"]) / plays
    ).round(4)
    # Share rate: high share rate correlates with watch-through
    df["share_rate"] = (df["shares"] / plays).round(4)
    # Comment rate: high comment rate signals strong hooks
    df["comment_rate"] = (df["comments"] / plays).round(4)
    return df
```

Identifying top vs low performers
Split your videos into quartiles by engagement rate. The top quartile represents content with the strongest retention signals. Compare the top quartile against the bottom quartile across every dimension: duration, hashtags, posting time, and content type.
```python
def compare_performance_tiers(df: pd.DataFrame) -> dict:
    """Compare top 25% vs bottom 25% of videos by engagement rate."""
    q75 = df["engagement_rate"].quantile(0.75)
    q25 = df["engagement_rate"].quantile(0.25)
    top = df[df["engagement_rate"] >= q75]
    bottom = df[df["engagement_rate"] <= q25]
    comparison = {
        "top_25_pct": {
            "count": len(top),
            "avg_duration": round(top["duration"].mean(), 1),
            "avg_engagement": round(top["engagement_rate"].mean(), 4),
            "avg_share_rate": round(top["share_rate"].mean(), 4),
            "avg_comment_rate": round(top["comment_rate"].mean(), 4),
            "avg_plays": int(top["plays"].mean()),
        },
        "bottom_25_pct": {
            "count": len(bottom),
            "avg_duration": round(bottom["duration"].mean(), 1),
            "avg_engagement": round(bottom["engagement_rate"].mean(), 4),
            "avg_share_rate": round(bottom["share_rate"].mean(), 4),
            "avg_comment_rate": round(bottom["comment_rate"].mean(), 4),
            "avg_plays": int(bottom["plays"].mean()),
        },
    }
    # Duration sweet spot analysis
    duration_buckets = pd.cut(
        df["duration"],
        bins=[0, 15, 30, 60, 120, 300],
        labels=["0-15s", "15-30s", "30-60s", "1-2min", "2-5min"],
    )
    comparison["engagement_by_duration"] = (
        df.groupby(duration_buckets, observed=True)["engagement_rate"]
        .mean()
        .round(4)
        .to_dict()
    )
    return comparison
```

What the analysis reveals
Common patterns creators discover when running this analysis across 100+ videos:
Duration sweet spots are real but niche-specific. Cooking content performs best at 45-90 seconds. Tech reviews peak at 2-3 minutes. The aggregate "keep it under 60 seconds" advice is wrong for many niches.
Share rate is a better retention proxy than like rate. Videos with high share rates almost always have strong retention because people share content they watched through. Like rate is more impulsive and less correlated with watch time.
Comment rate spikes indicate strong hooks. Videos with above-average comment rates typically have strong opening hooks that provoke a reaction, which correlates with early retention.
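The spike detection behind that last pattern can be sketched in a few lines. This is a minimal illustration with made-up sample numbers, flagging videos whose comment rate sits more than one standard deviation above the account mean, using the same comment_rate column built earlier:

```python
import pandas as pd

# Hypothetical per-video comment rates (made-up sample data)
df = pd.DataFrame({
    "id": ["v1", "v2", "v3", "v4", "v5"],
    "comment_rate": [0.0020, 0.0018, 0.0095, 0.0022, 0.0019],
})

# Flag videos more than one standard deviation above the mean:
# likely strong-hook outliers worth studying
threshold = df["comment_rate"].mean() + df["comment_rate"].std()
strong_hooks = df[df["comment_rate"] > threshold]
print(strong_hooks["id"].tolist())  # → ['v3']
```

Pull up the flagged videos in TikTok Studio and look at their first three seconds: that is usually where the hook difference lives.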
Building a retention dashboard
Pull data weekly, append to a running dataset, and track trends over time. This is more valuable than point-in-time analysis because you can see how changes to your content strategy affect engagement metrics across your catalog.
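The "track trends over time" half can be sketched as a loader that reads the accumulated snapshot files back into a time series. This assumes each file is a JSON object with date and comparison keys, matching the snapshot format the weekly_snapshot function in this section writes:

```python
import glob
import json

import pandas as pd


def load_engagement_trend(output_dir: str) -> pd.DataFrame:
    """Read all weekly snapshots and build a time series of tier engagement."""
    rows = []
    # Filenames embed YYYYMMDD, so lexicographic sort is chronological
    for path in sorted(glob.glob(f"{output_dir}/tiktok_snapshot_*.json")):
        with open(path) as f:
            snap = json.load(f)
        rows.append({
            "date": snap["date"],
            "top_avg_engagement": snap["comparison"]["top_25_pct"]["avg_engagement"],
            "bottom_avg_engagement": snap["comparison"]["bottom_25_pct"]["avg_engagement"],
        })
    return pd.DataFrame(rows)
```

Plot the two columns over time: a widening gap suggests your top content is pulling away from the rest, while both lines rising together suggests the whole catalog is improving.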
```python
import json
from datetime import datetime


async def weekly_snapshot(username: str, api_key: str, output_dir: str):
    """Take a weekly snapshot for longitudinal analysis."""
    df = await analyze_retention_proxies(username, api_key)
    comparison = compare_performance_tiers(df)
    snapshot = {
        "date": datetime.now().isoformat(),
        "total_videos": len(df),
        "comparison": comparison,
        "top_performers": df.nlargest(5, "engagement_rate")[
            ["id", "desc", "duration", "plays", "engagement_rate"]
        ].to_dict("records"),
    }
    filename = f"{output_dir}/tiktok_snapshot_{datetime.now().strftime('%Y%m%d')}.json"
    with open(filename, "w") as f:
        json.dump(snapshot, f, indent=2, default=str)
    return snapshot
```

Cost
Pulling data for 50 of a creator's videos takes one API call at $0.005. Running the analysis weekly for a single account costs about $0.02/month. Even analyzing 20 competitor accounts weekly costs only about $0.40/month. The bottleneck is not cost but building the analysis pipeline, and the code above handles the core of it.
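The arithmetic behind those numbers, assuming the $0.005-per-call rate quoted above and one 50-video pull per account per week (roughly four per month):

```python
COST_PER_CALL = 0.005   # assumed price per 50-video request
CALLS_PER_MONTH = 4     # one pull per week

single_account = COST_PER_CALL * CALLS_PER_MONTH   # 0.02
competitors_20 = single_account * 20               # 0.40

print(f"${single_account:.2f}/month, ${competitors_20:.2f}/month for 20 accounts")
```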
Limitations
API data does not replace TikTok Studio's second-by-second retention curves. If you need to know exactly where viewers drop off within a specific video, you still need Studio. What API data gives you is the macro view: which patterns correlate with high engagement across your entire catalog, and how those patterns change over time. The two data sources complement each other rather than compete.