What is Content Extraction with JS Rendering?
Scavio's content extraction endpoint takes any URL and returns the clean, readable article body rendered in a full headless browser, so single-page apps and JavaScript-heavy sites work the same as static pages. We strip nav, footers, sidebars, and ad slots using a readability-style extractor tuned on 2026 web layouts, then return the article title, author, publish and updated dates, hero image, cleaned HTML body, plain-text body, word count, language, and discovered OpenGraph and JSON-LD metadata. Optional parameters let you wait for a CSS selector, run a custom JS snippet, or take a screenshot. This is the workhorse endpoint for RAG pipelines, content monitoring, and any agent that needs the actual article behind a link.
Example Response
{
"url": "https://example.com/ai-agents-are-eating-saas",
"final_url": "https://example.com/ai-agents-are-eating-saas",
"status": 200,
"title": "AI Agents Are Eating SaaS",
"author": "Jamie Chen",
"published": "2026-04-02T08:00:00Z",
"updated": "2026-04-05T14:30:00Z",
"language": "en",
"word_count": 1842,
"hero_image": "https://example.com/og/ai-agents-saas.png",
"text": "The first wave of AI agents shipped in 2024 as chat demos. By 2026 they replace entire SaaS categories...",
"html": "<article><h1>AI Agents Are Eating SaaS</h1><p>The first wave...</p></article>",
"metadata": {
"og:title": "AI Agents Are Eating SaaS",
"og:type": "article",
"article:section": "AI",
"article:tag": ["agents", "saas", "langgraph"]
},
"render_time_ms": 1842
}Use Cases
- RAG ingestion pipelines for enterprise knowledge bases
- News and competitor content monitoring
- Clean article extraction for summarization agents
- JavaScript-site scraping without managing browsers
- Content deduplication via canonical URL resolution
Why Content Extraction with JS Rendering Matters
Running a headless browser fleet in-house is expensive and fragile, and most read-mode libraries fail on modern JS-heavy sites. Scavio delivers readability-quality extraction with full JS rendering as a single API call, eliminating Playwright clusters and homegrown Readability forks. RAG teams see immediate quality lifts because noise from nav and ads is no longer polluting their embeddings, and engineering teams cut browser infrastructure spend significantly.
LangChain Example
Drop content extraction with js rendering data into your LangChain agent in a few lines:
from langchain_scavio import ScavioContentTool
from langchain_community.vectorstores import Chroma
from langchain_anthropic import ChatAnthropic
tool = ScavioContentTool(api_key="your_scavio_api_key")
page = tool.invoke({
"url": "https://example.com/ai-agents-are-eating-saas",
"render_js": True,
})
docs = [{"text": page["text"], "metadata": {"url": page["url"]}}]
store = Chroma.from_texts([d["text"] for d in docs], metadatas=[d["metadata"] for d in docs])
print(f"Indexed {page['word_count']} words from {page['title']}")