CrawlGuard — Enterprise AI Bot Detection

53% of web traffic is now bots

22+ AI crawlers identified

Compensation by default

100% per-bot visibility

Workflow

Three minutes from script tag to control.

No DNS changes, no proxy install, no platform lock-in. One script. Works on WordPress, Ghost, Substack, a custom CMS, or anything else that renders HTML.

Step 01

See who's reading you

A real-time dashboard breaks down every AI crawler hitting your site: GPTBot, ClaudeBot, PerplexityBot, Bytespider, and the long tail of smaller scrapers fetching under a million pages a month.

Step 02

Choose your policy

Block specific crawlers. Allow only those that respect robots.txt. Throttle aggressive ones. Or charge per crawl with the pay-per-crawl module.
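To make the policy step concrete, here is a minimal Python sketch of per-crawler policy resolution. It assumes the crawler has already been identified by name; the `Action` values, the `Policy` class, and the default of blocking unlisted crawlers are hypothetical, not CrawlGuard's actual configuration schema.

```python
from dataclasses import dataclass, field
from enum import Enum


class Action(Enum):
    ALLOW = "allow"
    BLOCK = "block"
    THROTTLE = "throttle"
    CHARGE = "charge"  # hand off to pay-per-crawl


@dataclass
class Policy:
    # Hypothetical schema: explicit per-crawler actions, a set of crawlers
    # known to respect robots.txt, and a fallback for everything else.
    per_crawler: dict[str, Action] = field(default_factory=dict)
    respects_robots: set[str] = field(default_factory=set)
    default: Action = Action.BLOCK

    def decide(self, crawler: str) -> Action:
        # 1. An explicit rule for this crawler always wins.
        if crawler in self.per_crawler:
            return self.per_crawler[crawler]
        # 2. "Allow only those that respect robots.txt."
        if crawler in self.respects_robots:
            return Action.ALLOW
        # 3. Anything else falls through to the site default.
        return self.default


policy = Policy(
    per_crawler={"Bytespider": Action.BLOCK, "GPTBot": Action.CHARGE,
                 "PerplexityBot": Action.THROTTLE},
    respects_robots={"ClaudeBot"},
)
print(policy.decide("GPTBot"))  # Action.CHARGE
```

Explicit rules are checked before the robots.txt fallback, so a crawler you have priced stays priced even if it also respects robots.txt.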

Step 03

Enforce + measure

Per-crawler verdict logs. Track blocks, allows, and (if you monetize) revenue. An audit-friendly trail for compliance and legal teams.

Threats

Who's reading you without permission.

The visible AI crawlers are just the start. Much of the scraping for AI training runs through headless Chromium fleets paid for by data brokers.

AI training scrapers

OpenAI's GPTBot, Anthropic's ClaudeBot, Google's GoogleOther, and ByteDance's Bytespider all crawl for model training corpora. Each is identified by user-agent, verified by reverse DNS, and blocked at the edge.
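User-agent identification amounts to a token lookup. A minimal sketch follows; the token table is a hypothetical subset (the tokens are the product names these crawlers advertise), and `identify_crawler` is an illustrative name. Because any client can send any user-agent, a match only nominates a crawler; the reverse DNS step is what confirms it.

```python
from typing import Optional

# Hypothetical subset: UA token -> vendor. The tokens are the product
# names these crawlers advertise in their User-Agent strings.
KNOWN_AI_CRAWLERS = {
    "GPTBot": "OpenAI",
    "ClaudeBot": "Anthropic",
    "GoogleOther": "Google",
    "Bytespider": "ByteDance",
    "PerplexityBot": "Perplexity",
}


def identify_crawler(user_agent: str) -> Optional[tuple[str, str]]:
    """Return (token, vendor) for the first known token found, else None.

    Matching is case-insensitive. A hit is only a claim: the UA header
    is trivially spoofed, so it still needs network-level verification.
    """
    ua = user_agent.lower()
    for token, vendor in KNOWN_AI_CRAWLERS.items():
        if token.lower() in ua:
            return token, vendor
    return None
```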

Search-and-summarize

PerplexityBot, You.com, Phind, and Bing Copilot grab your content to generate answers, keeping users on their interface and away from your ads. Each is detected and blockable per crawler.

Aggregator scrapers

Headless Chromium fleets harvest your articles for downstream resale, summary feeds, and SEO clone sites. They are caught via fingerprint mismatch and behavioral signals, even when they rotate IPs.
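Fingerprint mismatch means a request's headers don't fit the browser it claims to be. Below is a deliberately simple sketch of three header-level checks; the function and signal names are hypothetical, and a production detector would combine many more inputs (TLS, JavaScript, behavior) than these.

```python
def fingerprint_mismatch_signals(headers: dict[str, str]) -> list[str]:
    """Flag header combinations that don't fit the browser the UA claims.

    Signal names are hypothetical; real detectors weigh many more inputs.
    """
    h = {k.lower(): v for k, v in headers.items()}  # header names are case-insensitive
    ua = h.get("user-agent", "")
    signals = []
    if "HeadlessChrome" in ua:
        # Headless Chrome has historically advertised this token in its default UA.
        signals.append("headless-ua")
    if "Chrome/" in ua:
        # Current Chromium browsers send Sec-CH-UA client hints on HTTPS
        # requests by default; a "Chrome" without them is suspicious.
        if "sec-ch-ua" not in h:
            signals.append("missing-client-hints")
        # Real browsers almost always send Accept-Language.
        if "accept-language" not in h:
            signals.append("missing-accept-language")
    return signals
```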

Anti-detect harvesters

Browserbase, ScrapingBee Premium, ZenRows, and Hyperbrowser are paid services optimized to evade traditional detection. They are caught via canvas double-render checks and ML on request-timing distributions.
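A timing-distribution model needs features. One plausible input, sketched here as an assumption rather than CrawlGuard's actual feature set, is the coefficient of variation of the gaps between a client's requests: their standard deviation divided by their mean. `interval_cv` is a hypothetical name.

```python
from statistics import mean, pstdev


def interval_cv(timestamps: list[float]) -> float:
    """Coefficient of variation (stdev / mean) of inter-request gaps.

    Near 0 means metronome-regular traffic, typical of naive fetch loops;
    human browsing is burstier. One hypothetical feature, not a classifier.
    """
    if len(timestamps) < 3:
        raise ValueError("need at least 3 timestamps (2 gaps)")
    ts = sorted(timestamps)
    gaps = [later - earlier for earlier, later in zip(ts, ts[1:])]
    avg = mean(gaps)
    if avg == 0:
        return 0.0  # every request at the same instant: degenerate, but regular
    return pstdev(gaps) / avg
```

Anti-detect services often add random jitter, so a single feature like this is weak on its own; that is why it would feed a model rather than a hard threshold.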

Worked example

Blocking Bytespider before it touches an article.

ByteDance's crawler arrives. We identify it by user-agent. We verify with reverse DNS. We block per your site policy. We log it for your records.

GET /article/the-future-of-ai · 95.142.121.12 · Bytespider

Identity · user-agent matches 'Bytespider' · 0.95
Verification · rDNS confirms .bytespider.com origin · verified
Policy · site rule: block all known AI crawlers · match
Logging · request logged for compliance audit · logged
Verdict · Capped at 1.0; blocked · BLOCK
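Reverse DNS verification is usually done as forward-confirmed reverse DNS: look up the PTR record for the IP, check the hostname's domain, then resolve that hostname forward and confirm it maps back to the same IP. A minimal sketch using Python's standard `socket` resolver; `verify_fcrdns` and its suffix parameter are hypothetical, the resolvers are injectable so the logic can be tested offline, and `gethostbyname_ex` covers IPv4 only.

```python
import socket
from typing import Callable, Iterable


def _has_allowed_suffix(hostname: str, suffixes: Iterable[str]) -> bool:
    for suffix in suffixes:
        suffix = suffix.strip(".").lower()
        # Exact domain or a true subdomain; "evilexample.com" must not match "example.com".
        if hostname == suffix or hostname.endswith("." + suffix):
            return True
    return False


def verify_fcrdns(
    ip: str,
    allowed_suffixes: Iterable[str],
    reverse: Callable = socket.gethostbyaddr,
    forward: Callable = socket.gethostbyname_ex,
) -> bool:
    """Forward-confirmed reverse DNS for an IPv4 address."""
    try:
        hostname, _aliases, _addrs = reverse(ip)  # 1. PTR lookup
    except OSError:  # socket.herror / gaierror are OSError subclasses
        return False
    hostname = hostname.rstrip(".").lower()
    if not _has_allowed_suffix(hostname, allowed_suffixes):  # 2. domain check
        return False
    try:
        _name, _aliases, addresses = forward(hostname)  # 3. A-record lookup
    except OSError:
        return False
    # 4. Whoever controls the PTR can write any hostname; only the round trip proves it.
    return ip in addresses
```

For the worked example you would pass the domain from the verification line, e.g. `verify_fcrdns("95.142.121.12", [".bytespider.com"])`. The forward step is what stops a scraper that merely sets a flattering PTR record on its own IP.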

What's in the box

Built for publishers, by people who get the stakes.

Crawler dashboard

Per-crawler view: requests, bytes, top paths, peak hours, and revenue (if monetizing). Filter by AI vendor over the last 30 days.
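The dashboard numbers reduce to a group-by over request logs. A minimal sketch, assuming a hypothetical `Hit` record per identified crawler request; revenue and the 30-day window filter are left out to keep it short.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass


@dataclass
class Hit:
    # Hypothetical minimal log entry for one identified crawler request.
    crawler: str
    path: str
    bytes_sent: int
    hour: int  # 0-23, hour of day the request arrived


def summarize(hits: list[Hit]) -> dict[str, dict]:
    """Per-crawler requests, bytes, top 3 paths, and peak hour."""
    by_crawler: dict[str, list[Hit]] = defaultdict(list)
    for h in hits:
        by_crawler[h.crawler].append(h)
    summary = {}
    for crawler, rows in by_crawler.items():
        paths = Counter(r.path for r in rows)
        hours = Counter(r.hour for r in rows)
        summary[crawler] = {
            "requests": len(rows),
            "bytes": sum(r.bytes_sent for r in rows),
            "top_paths": [p for p, _ in paths.most_common(3)],
            "peak_hour": hours.most_common(1)[0][0],
        }
    return summary
```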

Block + allow policies

One toggle per known crawler, with granular path scoping: block GPTBot from /premium/* but allow it on /free/*.
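Path scoping can be modeled as an ordered rule list where the first match wins. A minimal sketch with Python's `fnmatch`; the rule tuples, `path_action`, and the catch-all block are hypothetical. Note that `fnmatch`'s `*` also matches `/`, so `/premium/*` covers nested paths too.

```python
from fnmatch import fnmatchcase

# Hypothetical rules for already-identified AI crawlers:
# (crawler glob, path glob, action), evaluated top to bottom, first match wins.
RULES = [
    ("GPTBot", "/premium/*", "block"),
    ("GPTBot", "/free/*", "allow"),
    ("*", "*", "block"),  # everything else: block known AI crawlers
]


def path_action(crawler: str, path: str,
                rules: list[tuple[str, str, str]] = RULES) -> str:
    for crawler_glob, path_glob, action in rules:
        # fnmatchcase is case-sensitive and its "*" also matches "/".
        if fnmatchcase(crawler, crawler_glob) and fnmatchcase(path, path_glob):
            return action
    return "allow"  # no rule matched
```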

Content opt-out flag

Respects existing AI opt-out directives (noai, noimageai) and emits them in the X-Robots-Tag header on every response. Crawlers that honor these directives back off automatically.
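A sketch of the emitting side, assuming response headers arrive as a plain dict. `X-Robots-Tag` is a widely supported HTTP header, but `noai` and `noimageai` are non-standard directives that only some crawlers read. `add_ai_optout_headers` is a hypothetical name, and it assumes the existing header carries no user-agent-scoped directives.

```python
AI_OPTOUT = ("noai", "noimageai")


def add_ai_optout_headers(headers: dict[str, str]) -> dict[str, str]:
    """Return a copy of the response headers with noai/noimageai present.

    Keeps directives the site already sets (e.g. "noarchive") and appends
    only the missing opt-out ones. Assumes no user-agent-scoped directives.
    """
    out = dict(headers)
    # Reuse the existing header name if present, whatever its casing.
    key = next((k for k in out if k.lower() == "x-robots-tag"), "X-Robots-Tag")
    directives = [d.strip() for d in out.get(key, "").split(",") if d.strip()]
    present = {d.lower() for d in directives}
    directives += [d for d in AI_OPTOUT if d not in present]
    out[key] = ", ".join(directives)
    return out
```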

Pay-per-crawl

Monetize crawler traffic. Set a per-request price in your currency. Crawlers either pay or get blocked. Revenue reports in the dashboard.
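A minimal sketch of the pay-or-block gate. Answering unpaid requests with HTTP 402 Payment Required is an assumption (the page only says unpaid crawlers are blocked), as are the `PayPerCrawl` class and how a request proves payment, which is reduced to a boolean here. Money is kept in `Decimal` to avoid float rounding in revenue reports.

```python
from decimal import Decimal


class PayPerCrawl:
    """Hypothetical pay-per-crawl gate: paid requests are served and
    metered, unpaid ones get HTTP 402 Payment Required."""

    def __init__(self, price_per_request: Decimal, currency: str = "USD"):
        self.price = price_per_request
        self.currency = currency
        self.revenue: dict[str, Decimal] = {}  # per-crawler running total

    def handle(self, crawler: str, paid: bool) -> int:
        """Return the HTTP status to send. How `paid` is established
        (API key, signed header, invoice) is deliberately left open."""
        if not paid:
            return 402
        self.revenue[crawler] = self.revenue.get(crawler, Decimal("0")) + self.price
        return 200
```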

Content watermarks

Invisible zero-width markers embedded in your text. If your content later surfaces in an AI dataset with the markers intact, they show where it came from.
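One common way to build such markers is to encode an ID as bits, with ZERO WIDTH SPACE (U+200B) for 0 and ZERO WIDTH NON-JOINER (U+200C) for 1. A minimal sketch; the 32-bit payload, the insertion point after the first space, and the `embed`/`extract` names are hypothetical choices, and it assumes the text contains no zero-width characters beforehand.

```python
from typing import Optional

ZERO = "\u200b"  # ZERO WIDTH SPACE encodes bit 0
ONE = "\u200c"   # ZERO WIDTH NON-JOINER encodes bit 1
BITS = 32        # hypothetical fixed payload width


def embed(text: str, marker_id: int) -> str:
    """Hide marker_id as invisible bits right after the first space."""
    if not 0 <= marker_id < 2**BITS:
        raise ValueError("marker_id must fit in BITS bits")
    # Most significant bit first.
    payload = "".join(ONE if (marker_id >> i) & 1 else ZERO
                      for i in reversed(range(BITS)))
    cut = text.find(" ")
    cut = len(text) if cut == -1 else cut + 1
    return text[:cut] + payload + text[cut:]


def extract(text: str) -> Optional[int]:
    """Recover the first marker, or None if it was stripped or truncated."""
    bits = [c for c in text if c in (ZERO, ONE)]
    if len(bits) < BITS:
        return None
    return int("".join("1" if c == ONE else "0" for c in bits[:BITS]), 2)
```

Any pipeline that strips zero-width characters erases the marker, so `extract` returning `None` is the expected result for scrubbed copies.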

Forensic logs

Per-request audit trail — full headers, score, signals, geo. Export for legal action or compliance reporting.
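A forensic trail is easiest to export when each request is one self-describing JSON line. A sketch using hypothetical field names that mirror the items listed above (full headers, score, signals, geo); `forensic_record` is an illustrative name, and the geo lookup itself is out of scope, so the country is passed in.

```python
import json
from datetime import datetime, timezone
from typing import Optional


def forensic_record(
    method: str,
    path: str,
    ip: str,
    headers: dict[str, str],
    score: float,
    signals: list[str],
    verdict: str,
    country: Optional[str] = None,
) -> str:
    """Serialize one request as a single JSON line for an append-only log.

    Field names are hypothetical; geo is passed in, not looked up here.
    """
    record = {
        "ts": datetime.now(timezone.utc).isoformat(),
        "method": method,
        "path": path,
        "ip": ip,
        "headers": dict(headers),  # full headers, copied as received
        "score": score,
        "signals": list(signals),
        "verdict": verdict,
        "geo": {"country": country},
    }
    # sort_keys gives a stable field order, which makes diffs and audits easier.
    return json.dumps(record, sort_keys=True)
```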

Stop training someone else's AI for free.

Content Protection

AI Crawler Blocking

robots.txt Generator

Content Watermarking

Compliance Monitor

Traffic Transparency

TDM Headers

Cost Calculator

53% of internet traffic is automated.
How much of yours?
