TheSkinnyAI Crawler · Bot Documentation

Public documentation of our crawler's identity, behavior, and operations for Cloudflare verified bot review and for site owners.

Bot Purpose & Use Case

Purpose: Discover and index public website content for customers who embed the TheSkinnyAI assistant on their own domains. The indexed content is used to answer end‑user questions about the customer’s products and services.
Scope: Public pages only; no login‑gated or paywalled content. We honor robots.txt and customer‑provided exclusions (e.g., “Do Not Crawl URLs”).
Intent: Benign and helpful. We do not perform competitive scraping, price scraping, or any activity intended to harm site performance or business interests.
Targets: Customer websites that have explicitly onboarded to TheSkinnyAI. Discovery begins from a customer‑provided starting URL and/or /sitemap.xml.

Our crawler identifies itself with the following User‑Agent string (or newer minor versions):

TheSkinnyCrawler/1.0 (+https://theskinnyai.com/bot-docs/)

Site owners and Cloudflare rules can reliably match on the substring TheSkinnyCrawler, which stays the same when the minor version changes.
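
For site owners who filter traffic in application code rather than in Cloudflare, the match can be sketched as below. This is a minimal illustration, not our implementation: the function name parse_crawler_version is hypothetical, and the pattern assumes future versions keep the TheSkinnyCrawler/<major>.<minor> shape shown above.

```python
import re

# Assumption: newer versions keep the "TheSkinnyCrawler/<major>.<minor>" token shape.
CRAWLER_TOKEN = re.compile(r"\bTheSkinnyCrawler/(\d+)\.(\d+)\b")


def parse_crawler_version(user_agent: str):
    """Return (major, minor) if the User-Agent names TheSkinnyCrawler, else None."""
    match = CRAWLER_TOKEN.search(user_agent)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))
```

A User‑Agent header can be set by any client, so a match identifies the claimed crawler only; pair it with IP or signature checks where stronger assurance is needed.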

For verified‑bot review requiring static egress, we can operate through a fixed IP set dedicated to TheSkinnyCrawler.

Status: Static egress IPs will be published here when enabled. If you require IP allow‑listing sooner, contact us at https://theskinnyai.com/contact or email support@theskinnyai.com to obtain a dedicated IP (or small CIDR) reserved for crawling.
ASN: We do not operate our own ASN. If fixed egress is used, the ASN will correspond to our hosting/provider and will be documented alongside the published IPs.

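Robots compliance: We fetch and honor /robots.txt, including allow and disallow rules and any crawl‑delay directive. When a crawl‑delay is specified, we wait at least 1 second between requests.
Crawl rate: 0.5 requests/second per domain by default (one request every 2 seconds). The rate can be lowered to 0.1 requests/second (one request every 10 seconds) at the site owner’s request.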
Discovery: From a customer’s starting URL and, if permitted, from /sitemap.xml. Explicit “Do Not Crawl” rules are enforced.
Data handling & retention: We store only public page text, title, and structural metadata to support question‑answering for that customer. Data is encrypted at rest, retained for as long as the customer account is active, and removed within 30 days of contract termination.
No evasion: We do not rotate identities to bypass bot protection. If blocked, we request allow‑listing or fall back to customer‑provided content.
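
The robots and crawl‑rate rules above can be sketched with Python's standard urllib.robotparser module. This is an illustrative sketch, not the crawler's actual code: the helper names, the sample robots.txt, and the choice to take the slowest of the configured rate, the 1‑second floor, and the site's crawl‑delay are all assumptions made for the example.

```python
from urllib.robotparser import RobotFileParser

USER_AGENT = "TheSkinnyCrawler"  # product token from the User-Agent string
DEFAULT_RATE = 0.5               # requests/second per domain (documented default)
MIN_CRAWL_DELAY = 1.0            # seconds; documented floor when crawl-delay is set

# Hypothetical robots.txt a site owner might publish for this crawler.
SAMPLE_ROBOTS = """\
User-agent: TheSkinnyCrawler
Disallow: /private/
Crawl-delay: 5
"""


def load_rules(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text that has already been fetched."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def is_allowed(parser: RobotFileParser, url: str) -> bool:
    """True if the rules permit TheSkinnyCrawler to fetch this URL."""
    return parser.can_fetch(USER_AGENT, url)


def effective_delay(parser: RobotFileParser, rate: float = DEFAULT_RATE) -> float:
    """Seconds to wait between requests to one domain.

    Assumption: the slowest applicable limit wins.
    """
    base = 1.0 / rate  # 0.5 req/s -> 2.0 s between requests
    crawl_delay = parser.crawl_delay(USER_AGENT)
    if crawl_delay is None:
        return base
    return max(base, MIN_CRAWL_DELAY, float(crawl_delay))
```

With the sample file, /private/ URLs are skipped and the delay between requests is 5 seconds; with no crawl‑delay directive, the default rate gives 2 seconds.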

Owner: TheSkinnyAI, LLC
Contact: https://theskinnyai.com/contact or support@theskinnyai.com (rate changes, allow‑listing, or block requests)
Abuse reports: Email abuse@theskinnyai.com and include IP, timestamps, and requested URLs for investigation.
Web Bot Auth: Support for cryptographically signed requests is planned for Q1 2026. Public keys will be published as a JSON Web Key Set at https://theskinnyai.com/.well-known/jwks.json upon activation.

We adhere to Cloudflare’s Verified Bots policy and to site owners’ robots and rate‑limit preferences. If our crawler breaches any of these, we remediate immediately and may remove the affected domain from crawling.
We keep this page updated to reflect any changes to identity (User‑Agent), egress IPs, or behavior.

To allow our bot in Cloudflare (Security > WAF > Firewall Rules or Security > Bots), use:

Expression: (http.user_agent contains "TheSkinnyCrawler")
Action: Allow
Features: Bypass Bot Management, Skip Managed Challenges

Optionally, allow‑list /sitemap.xml and specific content paths (e.g., /team, /advisors) to ensure discovery and ingestion succeed.
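
After adding the rule, a site owner may want to confirm that requests carrying the crawler's User‑Agent are no longer challenged. The sketch below uses only Python's standard library; the helper names build_probe and probe_status are hypothetical, and the probe exercises the User‑Agent rule only, since it is sent from the tester's own IP rather than from our crawler.

```python
import urllib.error
import urllib.request

CRAWLER_UA = "TheSkinnyCrawler/1.0 (+https://theskinnyai.com/bot-docs/)"


def build_probe(url: str) -> urllib.request.Request:
    """Build a GET request that presents the crawler's User-Agent string."""
    return urllib.request.Request(url, headers={"User-Agent": CRAWLER_UA})


def probe_status(url: str, timeout: float = 10.0) -> int:
    """Return the HTTP status code the site sends back for the probe."""
    try:
        with urllib.request.urlopen(build_probe(url), timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # Blocks and challenges usually surface as 4xx/5xx responses.
        return err.code
```

For example, calling probe_status on your own /sitemap.xml should return 200 once the rule is active; a 403 or 503 suggests the request is still being blocked or challenged.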

This page is the official Bot Documentation URL for TheSkinnyAI crawler and may be referenced in Cloudflare’s verified bot submission.