Feature: content

Content Extraction with JS Rendering

Render any URL in a headless browser and extract clean article content, title, byline, and structured metadata.

What is Content Extraction with JS Rendering?

Scavio's content extraction endpoint takes any URL and returns the clean, readable article body rendered in a full headless browser, so single-page apps and JavaScript-heavy sites work the same as static pages. We strip nav, footers, sidebars, and ad slots using a readability-style extractor tuned on 2026 web layouts, then return the article title, author, publish and updated dates, hero image, cleaned HTML body, plain-text body, word count, language, and discovered OpenGraph and JSON-LD metadata. Optional parameters let you wait for a CSS selector, run a custom JS snippet, or take a screenshot. This is the workhorse endpoint for RAG pipelines, content monitoring, and any agent that needs the actual article behind a link.
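
The chrome-stripping idea can be illustrated with the standard library's `html.parser`. This is a deliberately simplified, tag-based filter, not Scavio's extractor. It discards text inside `nav`, `header`, `footer`, `aside`, `script`, `style`, `noscript`, and `form` elements (an assumed list) and keeps everything else. It does no content-density scoring and does not detect ad slots by class name, and a real readability-style extractor needs both. `ArticleTextExtractor` and `extract_text` are hypothetical names.

```python
from html.parser import HTMLParser
from typing import List

# Elements treated as page chrome rather than article text.
# Illustrative assumption only; not Scavio's actual rule set.
CHROME_TAGS = {"nav", "header", "footer", "aside", "script", "style", "noscript", "form"}


class ArticleTextExtractor(HTMLParser):
    """Collect visible text, skipping anything nested inside a chrome element."""

    def __init__(self) -> None:
        super().__init__()
        self.skip_depth = 0          # > 0 while inside one or more chrome elements
        self.chunks: List[str] = []  # text fragments that survive filtering

    def handle_starttag(self, tag, attrs):
        if tag in CHROME_TAGS:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in CHROME_TAGS and self.skip_depth > 0:
            self.skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and self.skip_depth == 0:
            self.chunks.append(text)


def extract_text(html: str) -> str:
    """Return the page's visible text with chrome elements removed."""
    parser = ArticleTextExtractor()
    parser.feed(html)
    parser.close()  # flush any buffered text
    return " ".join(parser.chunks)
```

Counting depth only for chrome tags keeps the sketch robust to unclosed `<p>` or `<li>` tags elsewhere in the page. The trade-off is that boilerplate inside plain `div`s still leaks through.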

Example Response

JSON
{
  "url": "https://example.com/ai-agents-are-eating-saas",
  "final_url": "https://example.com/ai-agents-are-eating-saas",
  "status": 200,
  "title": "AI Agents Are Eating SaaS",
  "author": "Jamie Chen",
  "published": "2026-04-02T08:00:00Z",
  "updated": "2026-04-05T14:30:00Z",
  "language": "en",
  "word_count": 1842,
  "hero_image": "https://example.com/og/ai-agents-saas.png",
  "text": "The first wave of AI agents shipped in 2024 as chat demos. By 2026 they replace entire SaaS categories...",
  "html": "<article><h1>AI Agents Are Eating SaaS</h1><p>The first wave...</p></article>",
  "metadata": {
    "og:title": "AI Agents Are Eating SaaS",
    "og:type": "article",
    "article:section": "AI",
    "article:tag": ["agents", "saas", "langgraph"]
  },
  "render_time_ms": 1842
}
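
When you consume this response outside LangChain, it helps to load the JSON into a typed object and parse the ISO 8601 timestamps once. The minimal sketch below uses only the standard library. The field names come from the example response above. Treating `author`, `published`, and `updated` as possibly missing is an assumption, because not every page exposes a byline or dates. `ExtractedPage` and `parse_page` are hypothetical helpers.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Optional


def parse_timestamp(value: Optional[str]) -> Optional[datetime]:
    """Parse timestamps like '2026-04-02T08:00:00Z'; return None when absent."""
    if not value:
        return None
    # datetime.fromisoformat accepts a trailing 'Z' only from Python 3.11, so normalize it.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


@dataclass
class ExtractedPage:
    url: str
    final_url: str
    status: int
    title: str
    author: Optional[str]
    published: Optional[datetime]
    updated: Optional[datetime]
    word_count: int
    text: str
    metadata: dict = field(default_factory=dict)

    @property
    def was_updated(self) -> bool:
        """True when the page reports an update after its original publish time."""
        return bool(self.published and self.updated and self.updated > self.published)

    @property
    def redirected(self) -> bool:
        """True when the requested URL resolved to a different final URL."""
        return self.url != self.final_url


def parse_page(payload: dict) -> ExtractedPage:
    """Build an ExtractedPage from a decoded response body, tolerating missing optional fields."""
    return ExtractedPage(
        url=payload["url"],
        final_url=payload.get("final_url") or payload["url"],
        status=payload["status"],
        title=payload.get("title", ""),
        author=payload.get("author"),
        published=parse_timestamp(payload.get("published")),
        updated=parse_timestamp(payload.get("updated")),
        word_count=payload.get("word_count", 0),
        text=payload.get("text", ""),
        metadata=payload.get("metadata", {}),
    )
```

A monitoring job could, for example, use `was_updated` to re-index only articles whose `updated` timestamp moved past `published`.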

Use Cases

  • RAG ingestion pipelines for enterprise knowledge bases
  • News and competitor content monitoring
  • Clean article extraction for summarization agents
  • JavaScript-site scraping without managing browsers
  • Content deduplication via canonical URL resolution
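
For the deduplication use case, one approach is to key each page on both its resolved URL and a fingerprint of its extracted text. That way, the same article reached through a redirect, a differently cased host, or a mirror collapses to one entry. This sketch assumes `final_url` holds the post-redirect URL, as in the example response. It keeps the query string, which can be meaningful on some sites, and it does not look for a separate canonical link tag. `normalize_url`, `text_fingerprint`, and `dedupe_pages` are hypothetical helpers.

```python
import hashlib
from typing import Dict, List
from urllib.parse import urlsplit, urlunsplit


def normalize_url(url: str) -> str:
    """Lowercase scheme and host, drop the #fragment, and strip a trailing slash from the path."""
    parts = urlsplit(url)
    path = parts.path.rstrip("/") or "/"
    return urlunsplit((parts.scheme.lower(), parts.netloc.lower(), path, parts.query, ""))


def text_fingerprint(text: str) -> str:
    """SHA-256 of lowercased, whitespace-collapsed text, so spacing-only differences still match."""
    normalized = " ".join(text.lower().split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def dedupe_pages(pages: List[Dict]) -> List[Dict]:
    """Keep the first page seen for each resolved URL and for each distinct article body."""
    seen_urls = set()
    seen_texts = set()
    unique = []
    for page in pages:
        url_key = normalize_url(page.get("final_url") or page["url"])
        text = page.get("text", "")
        # Empty bodies (e.g. failed extractions) get no text key, so they never match each other.
        text_key = text_fingerprint(text) if text.strip() else None
        if url_key in seen_urls or (text_key is not None and text_key in seen_texts):
            continue
        seen_urls.add(url_key)
        if text_key is not None:
            seen_texts.add(text_key)
        unique.append(page)
    return unique
```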

Why Content Extraction with JS Rendering Matters

Running a headless browser fleet in-house is expensive and fragile, and static reader-mode libraries often return empty or partial text on JS-heavy sites because the article is not in the initial HTML. Scavio delivers readability-quality extraction with full JS rendering in a single API call, replacing Playwright clusters and homegrown Readability forks. RAG teams typically see retrieval quality improve because navigation and ad noise no longer pollutes their embeddings, and engineering teams can retire the browser infrastructure they would otherwise maintain.

LangChain Example

Drop rendered article content into a LangChain RAG pipeline in a few lines:

Python
from langchain_scavio import ScavioContentTool
from langchain_community.vectorstores import Chroma
# Any LangChain Embeddings class works here; OpenAIEmbeddings reads OPENAI_API_KEY from the environment
from langchain_openai import OpenAIEmbeddings

tool = ScavioContentTool(api_key="your_scavio_api_key")

# Render the page in a headless browser and extract the clean article body
page = tool.invoke({
    "url": "https://example.com/ai-agents-are-eating-saas",
    "render_js": True,
})

# Chroma needs an embedding model to vectorize the text; keep the source URL and title as metadata
store = Chroma.from_texts(
    texts=[page["text"]],
    embedding=OpenAIEmbeddings(),
    metadatas=[{"url": page["url"], "title": page["title"]}],
)
print(f"Indexed {page['word_count']} words from {page['title']}")
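
The example above indexes each article as a single document. For long articles, splitting the plain-text body into overlapping chunks before embedding usually gives tighter retrieval. The following is a minimal plain-Python sketch. It assumes the `text`, `title`, and `final_url` fields from the example response. The 200-word window and 40-word overlap are arbitrary defaults. `chunk_words` and `to_documents` are hypothetical helpers, not part of `langchain_scavio`.

```python
from typing import Dict, List, Tuple


def chunk_words(text: str, size: int = 200, overlap: int = 40) -> List[str]:
    """Split text into windows of `size` words, each sharing `overlap` words with the previous one."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be >= 0 and smaller than size")
    words = text.split()
    step = size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):  # this window already reached the last word
            break
    return chunks


def to_documents(page: Dict, size: int = 200, overlap: int = 40) -> Tuple[List[str], List[Dict]]:
    """Return (texts, metadatas) ready for a vector store, one metadata dict per chunk."""
    texts = chunk_words(page["text"], size, overlap)
    source = page.get("final_url", page["url"])
    metadatas = [
        {"url": source, "title": page.get("title", ""), "chunk": i}
        for i in range(len(texts))
    ]
    return texts, metadatas
```

The two lists can be passed straight to `Chroma.from_texts(texts, embedding, metadatas=metadatas)`. The `chunk` index lets you reassemble neighbouring passages at answer time.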

Frequently Asked Questions

How do I get Content Extraction with JS Rendering data?

Send the page URL to the content extraction endpoint, and Scavio returns the rendered, cleaned article in the response. See the example response above for the exact field names.

Is the data fetched in real time?

Yes. Scavio fetches and renders each page in real time on every request. There is no caching layer, so results are never stale.

What is Content Extraction with JS Rendering?

Scavio's content extraction endpoint takes any URL and returns the clean, readable article body rendered in a full headless browser, so single-page apps and JavaScript-heavy sites work the same as static pages.

How much does it cost?

Each content extraction request costs 1 credit. The free tier includes 500 credits per month.
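
Every request is a fresh fetch billed at 1 credit, so the cost of a content-monitoring job scales linearly with the number of URLs and how often you re-check them. This quick sketch of the arithmetic assumes a 30-day month, one billed request per check, and no retries. For example, 10 URLs checked once a day cost 10 × 1 × 30 = 300 credits, which fits inside the 500-credit free tier. At 20 URLs a day (600 credits), the job no longer fits. `monthly_credits` and `fits_free_tier` are hypothetical helpers.

```python
import math

CREDITS_PER_REQUEST = 1   # per the pricing stated above
FREE_TIER_CREDITS = 500   # per month, per the pricing stated above


def monthly_credits(num_urls: int, checks_per_day: float, days: int = 30) -> int:
    """Credits needed to re-extract every URL `checks_per_day` times for `days` days."""
    raw = num_urls * checks_per_day * days * CREDITS_PER_REQUEST
    # round() first so float noise such as 9.000000000000002 does not bump the ceiling
    return math.ceil(round(raw, 6))


def fits_free_tier(num_urls: int, checks_per_day: float, days: int = 30) -> bool:
    """True when the monitoring schedule stays within the monthly free-tier allowance."""
    return monthly_credits(num_urls, checks_per_day, days) <= FREE_TIER_CREDITS
```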
