TheSkinnyAI Crawler · Bot Documentation
Public documentation of our crawler's identity, behavior, and operations for Cloudflare verified bot review and for site owners.
Bot Purpose & Use Case
- Purpose: Discover and index public website content for customers who embed TheSkinnyAI assistant on their own domains. The indexed content is used to answer end‑user questions about the customer’s products and services.
- Scope: Public pages only; no login‑gated or paywalled content. We honor robots.txt and customer‑provided exclusions (e.g., “Do Not Crawl URLs”).
- Intent: Benign and helpful. We do not perform competitive scraping, price scraping, or any activity intended to harm site performance or business interests.
- Targets: Customer websites that have explicitly onboarded to TheSkinnyAI. Discovery begins from a customer‑provided starting URL and/or /sitemap.xml.
User‑Agent String
Our crawler identifies itself with the following User‑Agent string (or newer minor versions):
TheSkinnyCrawler/1.0 (+https://theskinnyai.com/bot-docs/)
Site owners and Cloudflare rules can reliably match on the token TheSkinnyCrawler.
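For site owners who filter requests in application code or analyze access logs, the matching described above can be sketched in Python as follows. This is a minimal illustration, not our production logic: the helper names is_theskinny_crawler and crawler_version are hypothetical, and the version pattern assumes future releases keep the "TheSkinnyCrawler/<major>.<minor>" shape shown in the string above.

```python
import re

# Product token taken from the documented User-Agent string.
CRAWLER_TOKEN = "TheSkinnyCrawler"

# Assumption: newer versions keep the "TheSkinnyCrawler/<major>.<minor>" form.
UA_VERSION_PATTERN = re.compile(r"\bTheSkinnyCrawler/(\d+)\.(\d+)\b")


def is_theskinny_crawler(user_agent: str) -> bool:
    """Return True if the User-Agent header contains the crawler token."""
    return CRAWLER_TOKEN in user_agent


def crawler_version(user_agent: str):
    """Return (major, minor) parsed from the header, or None if absent."""
    match = UA_VERSION_PATTERN.search(user_agent)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


documented_ua = "TheSkinnyCrawler/1.0 (+https://theskinnyai.com/bot-docs/)"
matched = is_theskinny_crawler(documented_ua)
version = crawler_version(documented_ua)
```

Matching on the token alone, rather than the full string, keeps a rule valid when the minor version changes.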
IP Addresses & ASN
For verified‑bot review requiring static egress, we can operate through a fixed IP set dedicated to TheSkinnyCrawler.
- Status: Static egress IPs will be published here when enabled. If you require IP allow‑listing sooner, contact https://theskinnyai.com/contact or email support@theskinnyai.com to obtain a dedicated IP (or small CIDR) reserved for crawling.
- ASN: We do not operate our own ASN. If fixed egress is used, the ASN will correspond to our hosting/provider and will be documented alongside the published IPs.
Expected Behavior
- Robots compliance: We fetch and honor /robots.txt (Allow/Disallow rules, plus Crawl‑delay if provided, with a minimum delay of 1 second when one is specified).
- Crawl rate: 0.5 requests/second per domain by default (one request every 2 seconds). Rates can be lowered to 0.1 requests/second (one request every 10 seconds) upon site‑owner request.
- Discovery: From a customer’s starting URL and, if permitted, from /sitemap.xml. Explicit “Do Not Crawl” rules are enforced.
- Data handling & retention: We store only public page text, title, and structural metadata to support question‑answering for that customer. Data is encrypted at rest, retained for as long as the customer account is active, and removed within 30 days of contract termination.
- No evasion: We do not rotate identities to bypass bot protection. If blocked, we request allow‑listing or fall back to customer‑provided content.
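The robots and crawl‑rate rules above can be illustrated with Python's standard urllib.robotparser module. This is a hedged sketch rather than the crawler's actual implementation: the function names, the do_not_crawl prefix list, and the choice to take the larger of the rate‑based interval and the Crawl‑delay value are all assumptions made for illustration.

```python
from urllib.robotparser import RobotFileParser

USER_AGENT = "TheSkinnyCrawler"  # token from the documented User-Agent string
DEFAULT_RATE = 0.5               # requests/second per domain (documented default)
MIN_CRAWL_DELAY = 1.0            # seconds; floor applied when Crawl-delay is specified


def load_robots(text: str) -> RobotFileParser:
    """Parse robots.txt content that has already been fetched."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


def is_allowed(parser: RobotFileParser, url: str, do_not_crawl=()) -> bool:
    """True if robots.txt permits the URL and no customer exclusion matches.

    do_not_crawl is a hypothetical tuple of URL prefixes standing in for
    customer-provided "Do Not Crawl" rules.
    """
    if any(url.startswith(prefix) for prefix in do_not_crawl):
        return False
    return parser.can_fetch(USER_AGENT, url)


def request_interval(parser: RobotFileParser, rate: float = DEFAULT_RATE) -> float:
    """Seconds to wait between requests to one domain.

    Assumption: the rate-based interval and Crawl-delay are combined by
    taking the larger value, so the crawler never goes faster than either.
    """
    base = 1.0 / rate
    delay = parser.crawl_delay(USER_AGENT)
    if delay is None:
        return base
    return max(base, float(delay), MIN_CRAWL_DELAY)


robots = load_robots("User-agent: *\nDisallow: /private/\nCrawl-delay: 5\n")
allowed = is_allowed(robots, "https://example.com/products")
interval = request_interval(robots)
```

With the sample rules, /private/ paths are skipped and the wait between requests rises from the default 2 seconds to the 5 seconds requested by Crawl‑delay.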
Contact & Verification
Compliance & Policy
- We adhere to Cloudflare’s Verified Bots policy and site owners’ robots and rate‑limit preferences. Breaches lead to immediate remediation and potential removal of a domain from crawling.
- We keep this page updated to reflect any changes to identity (User‑Agent), egress IPs, or behavior.
Cloudflare Allow‑List Guidance
To allow our bot in Cloudflare (Security > WAF > Firewall Rules or Security > Bots), use:
Expression: (http.user_agent contains "TheSkinnyCrawler")
Action: Allow
Features: Bypass Bot Management, Skip Managed Challenges
Optionally, allow‑list /sitemap.xml and specific content paths (e.g., /team, /advisors) to ensure discovery and ingestion succeed.
Change Log
- 2025‑10‑05: Initial publication of bot documentation.
This page is the official Bot Documentation URL for TheSkinnyAI crawler and may be referenced in Cloudflare’s verified bot submission.