analyticsmcpgsc

GSC + GA4 MCP Analytics Pipeline

Connect Google Search Console and GA4 to AI agents via MCP. Add SERP context for keywords GSC shows. Architecture and implementation guide.

9 min

Build an analytics pipeline where n8n pulls Google Search Console and GA4 data via their APIs, saves pre-aggregated reports to Google Drive, and Claude reads them via MCP to write analysis into Obsidian. The key insight: pre-aggregate data before sending to Claude. Raw GSC exports with 50,000 rows overwhelm context windows. Summarize first, analyze second.

Architecture

Python
# Pipeline stages:
#
# [n8n Cron: daily 6am]
#     |
#     v
# [GSC API] --> aggregate by page, query, date
#     |
# [GA4 API] --> aggregate by page, source, events
#     |
#     v
# [n8n Function: pre-aggregate]
#     |
#     v
# [Google Drive: /analytics/daily-reports/]
#     |
#     v
# [Claude via MCP: reads Drive files]
#     |
#     v
# [Obsidian vault: /analysis/weekly-seo-report.md]

Step 1: pull GSC data with n8n

Python
# n8n HTTP Request node: GSC Search Analytics
# Method: POST
# URL: https://www.googleapis.com/webmasters/v3/sites/
#      https%3A%2F%2Fyoursite.com/searchAnalytics/query
# Auth: OAuth2 (Google credentials)
#
# Body:
# {
#   "startDate": "{{ $now.minus(7, 'days').format('yyyy-MM-dd') }}",
#   "endDate": "{{ $now.minus(1, 'day').format('yyyy-MM-dd') }}",
#   "dimensions": ["query", "page", "date"],
#   "rowLimit": 5000,
#   "dimensionFilterGroups": [{
#     "filters": [{
#       "dimension": "country",
#       "expression": "usa"
#     }]
#   }]
# }

# This returns up to 5,000 rows with clicks, impressions, CTR, position
# for each query+page+date combination

Step 2: pre-aggregate before Claude sees it

Python
# n8n Function node: aggregate GSC data
# Place after GSC HTTP Request node

const rows = $input.first().json.rows || [];

// Aggregate by page (sum clicks/impressions, avg position)
const byPage = {};
for (const row of rows) {
  const page = row.keys[1]; // page dimension
  if (!byPage[page]) {
    byPage[page] = { clicks: 0, impressions: 0, positions: [], queries: new Set() };
  }
  byPage[page].clicks += row.clicks;
  byPage[page].impressions += row.impressions;
  byPage[page].positions.push(row.position);
  byPage[page].queries.add(row.keys[0]);
}

// Create summary (fits in Claude's context)
const summary = Object.entries(byPage)
  .map(([page, d]) => ({
    page,
    clicks: d.clicks,
    impressions: d.impressions,
    avg_position: (d.positions.reduce((a, b) => a + b, 0) / d.positions.length).toFixed(1),
    unique_queries: d.queries.size,
  }))
  .sort((a, b) => b.clicks - a.clicks)
  .slice(0, 50); // Top 50 pages only

return [{ json: { summary, total_pages: Object.keys(byPage).length, period: "last_7_days" } }];

Step 3: pull GA4 data

Python
# n8n HTTP Request node: GA4 Data API
# Method: POST
# URL: https://analyticsdata.googleapis.com/v1beta/
#      properties/YOUR_PROPERTY_ID:runReport
# Auth: OAuth2 (Google credentials)
#
# Body:
# {
#   "dateRanges": [{"startDate": "7daysAgo", "endDate": "yesterday"}],
#   "dimensions": [
#     {"name": "pagePath"},
#     {"name": "sessionSource"}
#   ],
#   "metrics": [
#     {"name": "sessions"},
#     {"name": "engagedSessions"},
#     {"name": "conversions"}
#   ],
#   "limit": 1000,
#   "orderBys": [{"metric": {"metricName": "sessions"}, "desc": true}]
# }

# Aggregate GA4 in a Function node similar to GSC:
# Group by pagePath, sum sessions/conversions, list top sources

Step 4: save to Google Drive

Python
# n8n Google Drive node: Create File
# Operation: Upload
# File Name: analytics_{{ $now.format('yyyy-MM-dd') }}.json
# Folder: /analytics/daily-reports/
# Content: {{ JSON.stringify($json) }}

# The file contains pre-aggregated data:
# {
#   "gsc": { "summary": [...top 50 pages...], "total_pages": 230, "period": "last_7_days" },
#   "ga4": { "summary": [...top 50 pages...], "total_sessions": 4500, "period": "last_7_days" }
# }
#
# This is ~5-10KB instead of the raw 500KB+ export

Step 5: Claude reads via MCP and writes analysis

Python
# MCP configuration for Claude (in .mcp.json or settings):
# {
#   "mcpServers": {
#     "google-drive": {
#       "command": "npx",
#       "args": ["-y", "@anthropic/google-drive-mcp"],
#       "env": { "GOOGLE_CLIENT_ID": "...", "GOOGLE_CLIENT_SECRET": "..." }
#     }
#   }
# }

# Claude prompt for weekly analysis:
# "Read the latest analytics file from Google Drive at
#  /analytics/daily-reports/. Compare this week's top pages
#  to last week. Identify:
#  1. Pages with biggest position improvements
#  2. Pages losing clicks despite stable position (CTR drop)
#  3. New queries driving traffic this week
#  4. Pages with high impressions but low CTR (optimization targets)
#  Write the analysis to my Obsidian vault."

Why pre-aggregation matters

  • Raw GSC export: 50,000 rows = 5MB+ = exceeds Claude's practical context window
  • Pre-aggregated: 50 top pages with metrics = 10KB = fits easily, faster response
  • Claude analyzes patterns in summaries better than scanning raw rows
  • Cheaper: fewer tokens = lower API cost if using Claude API
  • Reproducible: saved aggregates serve as historical records

Enriching with SERP data

Python
import requests, os

def enrich_gsc_with_serp(gsc_summary: list) -> list:
    """Add SERP feature data to top GSC pages."""
    for page in gsc_summary[:10]:  # Top 10 pages only
        # Find the top query for this page
        top_query = page.get("top_query", page["page"].split("/")[-1].replace("-", " "))

        resp = requests.post(
            "https://api.scavio.dev/api/v1/search",
            headers={"x-api-key": os.environ["SCAVIO_API_KEY"]},
            json={"query": top_query, "platform": "google"},
            timeout=10,
        )
        serp = resp.json()
        page["serp_features"] = {
            "ai_overview": bool(serp.get("ai_overview")),
            "featured_snippet": bool(serp.get("featured_snippet")),
            "paa_count": len(serp.get("people_also_ask", [])),
        }

    return gsc_summary

# 10 credits = $0.05 to enrich top pages with SERP context
# Helps Claude identify WHY CTR changed (AI Overview appeared, etc.)

This pipeline costs under $5/month: n8n self-hosted is free, GSC/GA4 APIs are free, Google Drive is free, and optional SERP enrichment via Scavio is $1-2/month. The value is not the data collection -- it is the pre-aggregation that makes Claude's analysis actually useful instead of drowning in raw rows.