Definition
Google Dorks are advanced Google search operators — `site:`, `filetype:`, `intitle:`, `inurl:`, `intext:`, `before:`, `after:` — used together to surface specific documents, pages, or PDFs that plain keyword queries miss.
In Depth
Dorks are how researchers, security analysts, and AI agents find documents that Google has indexed but that don't rank for obvious keywords: government bid PDFs, court filings, leaked configuration files, deeply buried pages on specific domains. An r/LangChain post in 2026 documented an SDR agent that uses Google Dorks (`site:gov.br filetype:pdf 2026 contratos`) plus Llama-3 to extract typed JSON from the discovered PDFs. Dorks work as well through search APIs as through the Google web UI; the API just returns the results as structured JSON instead of HTML, which makes those results easier to chain into agent loops.
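As a minimal sketch of how a dork is assembled, the Python below joins operator and value pairs with free-text terms into one query string, then URL-encodes it for the public Google results page. The helper names `build_dork` and `google_search_url` are hypothetical and exist only for illustration; a search API would typically take the same query string in its own request parameter.

```python
from urllib.parse import urlencode


def build_dork(*terms, **operators):
    """Join operator:value pairs and free-text terms into one query string."""
    parts = []
    for name, value in operators.items():
        # Quote multi-word values so the operator applies to the whole phrase.
        if " " in value:
            value = f'"{value}"'
        parts.append(f"{name}:{value}")
    parts.extend(terms)
    return " ".join(parts)


def google_search_url(query):
    """URL-encode the query for the public Google results page."""
    return "https://www.google.com/search?" + urlencode({"q": query})


query = build_dork("2026", "contratos", site="gov.br", filetype="pdf")
print(query)
# site:gov.br filetype:pdf 2026 contratos
print(google_search_url(query))
# https://www.google.com/search?q=site%3Agov.br+filetype%3Apdf+2026+contratos
```

Keeping the operators as keyword arguments makes it easy for an agent to vary one constraint (for example the `site:` domain) while leaving the rest of the dork unchanged.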
Example Usage
The CrewAI agent ran `site:europa.eu filetype:pdf AI act` as its discovery dork, then passed the resulting PDF URLs through an extract endpoint and an LLM to produce typed JSON of regulatory updates.
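The discovery, extraction, and LLM stages described above can be sketched as a small pipeline. This is an illustration under stated assumptions, not the agent's actual code: the `search`, `extract`, and `summarize` callables are hypothetical stand-ins for a search API, an extract endpoint, and an LLM call, and the `"link"` key in each result is an assumed response shape.

```python
from urllib.parse import urlparse


def pdf_urls(results):
    """Keep only result links whose URL path ends in .pdf."""
    urls = []
    for item in results:
        link = item.get("link", "")  # "link" is an assumed field name
        if urlparse(link).path.lower().endswith(".pdf"):
            urls.append(link)
    return urls


def run_pipeline(dork, search, extract, summarize):
    """Run discovery -> extraction -> LLM, with each stage injected as a callable."""
    results = search(dork)
    return [summarize(extract(url)) for url in pdf_urls(results)]


# Stub stages so the sketch runs without any network access.
def fake_search(dork):
    return [
        {"link": "https://example.org/docs/ai-act.pdf"},
        {"link": "https://example.org/news/ai-act.html"},
    ]


def fake_extract(url):
    return f"text of {url}"


def fake_summarize(text):
    return {"source_text": text}


updates = run_pipeline(
    "site:europa.eu filetype:pdf AI act", fake_search, fake_extract, fake_summarize
)
print(updates)
# [{'source_text': 'text of https://example.org/docs/ai-act.pdf'}]
```

Filtering on the URL path is a second check on top of `filetype:pdf`, since a search result can still point at an HTML landing page rather than the PDF itself.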
Platforms
Google Dorks is relevant across the following platforms, all accessible through Scavio's unified API:
Related Terms
Data as a Service (DaaS)
Data as a Service (DaaS) is a delivery model where structured data is exposed via API or query layer rather than as a on...
Agent Architecture
Agent architecture is the set of design choices that turn an LLM prompt into a production system: routing and classifica...
Multi-Platform Search API
A multi-platform search API is a single REST endpoint that returns structured JSON from several public surfaces — Google...