About Assaybot

Information for web publishers on Index Exchange’s site crawler bot.
Assaybot is Index Exchange’s automated content analysis crawler designed to ensure brand safety across our advertising exchange. It uses a multi-stage AI classification pipeline to analyze web page content, detect potential brand safety concerns, and help maintain high-quality inventory standards that protect both advertiser and publisher interests.
Purpose
Assaybot operates as part of Index Exchange’s quality assurance infrastructure. The system:
- Analyzes publisher page content for brand safety compliance using industry-leading AI models
- Identifies potential concerns including adult content, hate speech, violence, child sexual abuse material (CSAM), and other material that may affect advertiser confidence
- Helps publishers maintain and grow advertiser demand by ensuring inventory meets brand safety standards
- Operates entirely outside of the ad serving path — it has zero impact on ad delivery latency or page load performance
How It Benefits Publishers
Brand safety is a shared priority. When advertisers trust the quality of your inventory, it drives stronger demand and better monetization outcomes. Assaybot helps by:
- Proactively identifying content issues before they affect your revenue
- Providing transparent, consistent quality assessments across Index Exchange
- Reducing the need for manual review processes that can delay issue resolution
- Ensuring your inventory remains eligible for premium advertiser demand
Assaybot does not affect your site’s search engine rankings or visibility. It does not index content for public search, and it does not redistribute your content in any form. It is exclusively used for content quality assessment within Index Exchange’s advertising ecosystem.
This documentation is maintained by Index Exchange and reflects the current state of the Assaybot system. Publishers will be notified of significant changes to crawl behavior or capabilities.
User Agent and Network
Assaybot identifies itself using the following HTTP user-agent request header:
Mozilla/5.0 (compatible; Assaybot/0.1; +http://www.indexexchange.dev/bot.html)
Assaybot always sends this user-agent string with every request. It does not attempt to disguise itself as a browser or any other client.
Important Security Note: The HTTP user-agent request header can be spoofed by other crawlers. For verification purposes, publishers should validate requests using IP address verification rather than relying solely on the user-agent string.
Allowing Assaybot in Your robots.txt File
If your robots.txt file disallows all crawlers by default with a global Disallow: rule and permits only a named list of crawlers, add a single line:
User-agent: Assaybot
to the allowed-crawlers group so that Assaybot is not caught by the global rule. Assaybot identifies itself with the product token Assaybot, follows RFC 9309 / Google robots.txt semantics, and will pick up the authorization.
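One way to check this change before deploying it is Python's standard urllib.robotparser module. The robots.txt content below is a hypothetical example in which Googlebot was the only previously allowed crawler. Note that urllib.robotparser applies the first matching rule rather than RFC 9309's longest-match rule, so it approximates how Assaybot reads the file rather than reproducing it.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: an allowed-crawlers group followed by a global block.
# "User-agent: Assaybot" is the one-line addition described above.
ROBOTS_TXT = """\
User-agent: Googlebot
User-agent: Assaybot
Allow: /

User-agent: *
Disallow: /
"""

def can_crawl(robots_txt: str, agent: str, url: str) -> bool:
    """Parse robots.txt text and report whether `agent` may fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)

# Pass the product token, not the full user-agent header: urllib.robotparser
# only compares the part of the name before the first "/".
print(can_crawl(ROBOTS_TXT, "Assaybot", "https://example.com/news/story"))  # True
print(can_crawl(ROBOTS_TXT, "OtherBot", "https://example.com/news/story"))  # False
```

Removing the Assaybot line from the hypothetical file makes the first call return False, because Assaybot then falls through to the wildcard Disallow: / group.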
Authorized IP Address CIDR
Any request that uses the Assaybot user-agent but originates outside the following address range can be treated as spoofed:
192.139.80.0/24
Verification Recommendations
For CDN operators and network administrators who want to verify Assaybot traffic:
- IP Verification: Confirm the source IP falls within the authorized CIDR range shown above
- User-Agent Check: Verify the user-agent string matches the format shown above
- Behavioral Pattern: Assaybot makes only standard HTTP GET requests, respects robots.txt directives, and does not attempt to bypass authentication or access controls
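The IP and user-agent checks can be combined into a single function, for example in a log-processing script. The sketch below uses Python's standard ipaddress module; the function name and its three result labels are hypothetical, and matching on the substring "Assaybot/" is an assumption based on the documented user-agent format.

```python
import ipaddress

# Authorized range from the "Authorized IP Address CIDR" section.
ASSAYBOT_NETWORK = ipaddress.ip_network("192.139.80.0/24")
# Product token from the documented user-agent string (assumed match rule).
ASSAYBOT_TOKEN = "Assaybot/"

def classify_request(source_ip: str, user_agent: str) -> str:
    """Return "verified", "spoofed", or "not-assaybot" for a single request."""
    if ASSAYBOT_TOKEN not in user_agent:
        return "not-assaybot"  # the request does not claim to be Assaybot
    try:
        addr = ipaddress.ip_address(source_ip)
    except ValueError:
        return "spoofed"  # claims to be Assaybot, but the source IP is unparseable
    # Membership is simply False (not an error) for IPv6 addresses vs. an IPv4 network.
    return "verified" if addr in ASSAYBOT_NETWORK else "spoofed"

ua = "Mozilla/5.0 (compatible; Assaybot/0.1; +http://www.indexexchange.dev/bot.html)"
print(classify_request("192.139.80.17", ua))  # verified
print(classify_request("203.0.113.5", ua))    # spoofed
```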
If you need additional assistance with verification or allowlisting, contact your Index Exchange account representative.
Crawl Behavior
Access Frequency
Assaybot is designed to minimize impact on publisher infrastructure:
- Deduplication: Assaybot maintains a multi-layer deduplication system to prevent redundant requests. Each URL is analyzed at most once within a 30-day rolling window. A short-term cache prevents duplicate requests within the same day, while a long-term filter ensures URLs are not re-crawled for up to 30 days
- Per-Domain Concurrency: Assaybot limits the number of simultaneous requests to any single domain, ensuring no individual site experiences excessive load
- Timeout Period: Each page request has a 30-second timeout — if your server does not respond within that window, Assaybot moves on
- Retry Logic: Failed requests (server errors or rate limit responses) are retried up to 3 times with exponential backoff, increasing the delay between each attempt to avoid adding pressure to an already-strained server
- Rate Limit Compliance: If your server returns a 429 Too Many Requests response, Assaybot will back off automatically and retry with increasing delays
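Index Exchange does not publish the base delay or growth factor behind this retry logic. As a rough illustration only, the sketch below assumes a 1-second base delay that doubles after each attempt (both values are hypothetical) and computes the wait before each of the three retries.

```python
def backoff_schedule(max_retries: int = 3, base_delay: float = 1.0, factor: float = 2.0) -> list[float]:
    """Delay in seconds before each retry under plain exponential backoff.

    base_delay and factor are illustrative assumptions, not Assaybot's values.
    """
    return [base_delay * factor ** attempt for attempt in range(max_retries)]

print(backoff_schedule())  # [1.0, 2.0, 4.0]
```

Under these assumed values, a URL that keeps failing is requested four times in total (the original attempt plus three retries), with about 7 seconds of waiting between attempts, in addition to up to 30 seconds per attempt if the server does not respond.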
What Assaybot Accesses
Assaybot analyzes URLs that appear in ad request traffic flowing through Index Exchange. The system:
- Processes page URLs and referrer URLs observed in ad request data
- Makes a single HTTP GET request per URL to retrieve the page content
- Extracts visible text content for brand safety analysis
- Stores analysis results internally for quality assurance reporting
- Does not index content for public search or external redistribution
- Does not execute JavaScript, submit forms, or interact with page elements
- Does not follow links on the page to discover new URLs — it only visits URLs already observed in ad traffic
Content Analysis Method
Assaybot uses a straightforward content retrieval approach:
- Makes standard HTTP GET requests using the documented user-agent string
- Extracts visible text content from the HTML response
- Strips scripts, styles, navigation elements, and other non-visible content
- Follows redirects automatically (up to 5 hops)
- Timeout: 30 seconds per request
- Cookies are disabled — Assaybot does not send or store cookies
The extracted text is then passed through a multi-stage AI classification pipeline to assess brand safety. No images, videos, or other media are downloaded or analyzed as of the last update to this guide.
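Index Exchange does not document the exact extraction rules, so the following is only an approximation that publishers can use to preview roughly which text a text-only crawler sees on a page. It uses Python's standard html.parser and drops everything inside a hypothetical set of non-visible tags (script, style, nav, and similar); that tag set is an assumption.

```python
from html.parser import HTMLParser

# Tags whose contents are treated as non-visible. This set is an assumption
# for illustration; Assaybot's actual list is not published.
SKIPPED_TAGS = {"script", "style", "noscript", "nav", "template", "head"}

class VisibleTextExtractor(HTMLParser):
    """Collect text that is not nested inside any skipped tag."""

    def __init__(self) -> None:
        super().__init__()
        self.depth = 0  # number of skipped tags currently open
        self.chunks: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in SKIPPED_TAGS:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag in SKIPPED_TAGS and self.depth > 0:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth == 0 and data.strip():
            self.chunks.append(data.strip())

def extract_visible_text(html: str) -> str:
    """Return the visible text of an HTML document, joined by single spaces."""
    parser = VisibleTextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)

page = """<html><head><title>Demo</title><style>p{color:red}</style></head>
<body><nav>Home | About</nav><p>Hello, readers.</p>
<script>trackVisit();</script><p>Second paragraph.</p></body></html>"""
print(extract_visible_text(page))  # Hello, readers. Second paragraph.
```

Because no JavaScript is executed, text that your pages inject client-side will not appear in this kind of extraction.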
Domain Safe List and Block List
Assaybot maintains curated domain lists to optimize system resources and focus analysis where it is most needed:
- Safe List: Well-known, trusted publisher domains (such as major news outlets) are automatically classified as safe and are not crawled, saving resources for both Assaybot and the publisher
- Block List: Domains that are already known to be non-compliant are excluded from crawling
These lists are maintained by Index Exchange and updated on a regular basis. If you believe your domain has been incorrectly categorized, please contact your account representative.
Data Collection & Privacy
Information Collected
For each analyzed page, Assaybot records:
- URL and Domain: The full URL and root domain of the analyzed page
- Publisher ID: Internal Index Exchange identifier linking to your account
- Extracted Text: Visible text content extracted from the HTML page (scripts, styles, and non-visible elements are excluded)
- HTTP Status: The response code returned by your server
- Brand Safety Verdict: The classification result (safe or unsafe) along with the confidence score
- Processing Metadata: Timestamps, response latency, and which classification stage produced the verdict
Assaybot does not collect:
- Personally identifiable information (PII) from page visitors
- Cookies or session data
- Form data or user-submitted content
- Images, videos, or other media files
- Information from password-protected or authenticated pages
Data Storage and Retention
- Analysis results are stored in compressed columnar format partitioned by date
- Automated lifecycle policies manage data retention and archival
- Data is accessible only to authorized Index Exchange personnel and relevant publisher account teams
Data Usage
Analysis results are used exclusively for:
- Brand safety quality assurance across Index Exchange’s supply network
- Publisher account management and content quality reporting
- Advertiser protection and inventory curation
- System performance monitoring and optimization
- Regulatory compliance reporting
Regulations
Assaybot’s content analysis is designed to comply with:
- GDPR: No personal data is intentionally collected; analysis focuses exclusively on publicly available published content
- CCPA: Text content analysis falls under business operations exemptions
- Industry Standards: Aligned with IAB brand safety guidelines and frameworks
Publishers with specific privacy concerns should contact their Index Exchange account representative.
Technical Specifications
Request Characteristics
- Protocol: HTTPS only
- HTTP Method: GET (read-only; Assaybot never POSTs data to publisher sites)
- Connection: Keep-alive
- Accept-Encoding: gzip
- Accept: text/html, application/xhtml+xml
- Accept-Language: en-US,en;
- DNT: 1 (Do Not Track enabled)
- Cookies: Disabled — Assaybot does not send or store cookies
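To see how your server, WAF, or CDN responds to a request that resembles Assaybot's, you can send one yourself. The sketch below is not Assaybot's own client: it builds a GET request with Python's urllib.request using only the documented User-Agent, Accept, and DNT values (the function names are hypothetical). Requests you send originate from your own IP address rather than the authorized range, so IP-based allowlist rules will not match them, and urllib follows up to 10 redirects rather than Assaybot's 5.

```python
import urllib.error
import urllib.request

# Header values copied from the Request Characteristics list above.
ASSAYBOT_HEADERS = {
    "User-Agent": "Mozilla/5.0 (compatible; Assaybot/0.1; +http://www.indexexchange.dev/bot.html)",
    "Accept": "text/html, application/xhtml+xml",
    "DNT": "1",
}

def build_assaybot_like_request(url: str) -> urllib.request.Request:
    """Build a GET request carrying the documented headers (no cookies are attached)."""
    return urllib.request.Request(url, headers=ASSAYBOT_HEADERS, method="GET")

def fetch_status(url: str, timeout: float = 30.0) -> int:
    """Send the request with a 30-second timeout and return the final HTTP status."""
    request = build_assaybot_like_request(url)
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code  # 4xx and 5xx responses are raised as HTTPError

# Example (performs a real network request):
# print(fetch_status("https://www.example.com/"))
```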
Content Processing
- HTML Processing: Assaybot processes the full HTML response, extracting only visible text content
- Text Extraction: Scripts, styles, navigation elements, and non-visible markup are stripped before analysis
- Media: Images, videos, and other media files are not downloaded
HTTP Status Handling
- 2xx Success: Content is extracted and analyzed normally
- 3xx Redirects: Followed automatically (up to 5 redirects per request)
- 4xx Client Errors (other than 429): Logged and not retried; Assaybot respects access restrictions
- 429 Too Many Requests: Retried with exponential backoff (automatically backs off to reduce load)
- 5xx Server Errors: Retried up to 3 times with exponential backoff, then logged as failed
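The status handling above can be restated as a small decision function, which may help when reasoning about how your own error responses will be treated. The sketch below is a hypothetical restatement: the function name, the action labels, and the retry and redirect counters passed in are illustrative, and Assaybot's internal code is not published.

```python
from typing import Literal

Action = Literal["analyze", "follow_redirect", "retry", "log_and_stop"]

MAX_REDIRECTS = 5  # "up to 5 redirects per request"
MAX_RETRIES = 3    # 5xx and rate-limit responses are "retried up to 3 times"

def action_for_status(status: int, retries_so_far: int = 0, redirects_so_far: int = 0) -> Action:
    """Map an HTTP status code to the handling described in the list above."""
    if 200 <= status < 300:
        return "analyze"
    if 300 <= status < 400:
        return "follow_redirect" if redirects_so_far < MAX_REDIRECTS else "log_and_stop"
    if status == 429 or 500 <= status < 600:
        # Retried with exponential backoff until the retry budget is spent.
        return "retry" if retries_so_far < MAX_RETRIES else "log_and_stop"
    return "log_and_stop"  # other 4xx (and anything unexpected): logged, not retried

print(action_for_status(403))                     # log_and_stop
print(action_for_status(503, retries_so_far=1))   # retry
```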
robots.txt Compliance
Assaybot fully respects the Robots Exclusion Standard. Before crawling any page, Assaybot checks the site’s robots.txt file and honors all applicable directives, including:
- User-agent: Assaybot rules (checked first)
- User-agent: * wildcard rules (used as a fallback)
- Disallow and Allow directives
- Crawl-delay specifications
robots.txt responses are cached so that your server is not repeatedly queried for the same file.
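Because Assaybot-specific rules are checked first and wildcard rules are only a fallback, a User-agent: Assaybot group replaces the wildcard group for Assaybot rather than adding to it. The hypothetical robots.txt below illustrates this with Python's urllib.robotparser, which selects groups the same way for this file (although within a group it applies rules in file order rather than by longest match).

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt with both a wildcard group and an Assaybot group.
ROBOTS_TXT = """\
User-agent: *
Disallow: /tmp/

User-agent: Assaybot
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Assaybot obeys only its own group, so /tmp/ is open to it but /private/ is not.
print(parser.can_fetch("Assaybot", "https://example.com/tmp/page"))      # True
print(parser.can_fetch("Assaybot", "https://example.com/private/page"))  # False
# Any other crawler falls back to the wildcard group.
print(parser.can_fetch("OtherBot", "https://example.com/tmp/page"))      # False
```

To keep a wildcard restriction in force for Assaybot, repeat it inside the Assaybot group.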
ℹ️ Publisher Recommendation: Publishers may add robots.txt rules for Assaybot at any time. Note: restricting access may impact eligibility to transact on Index Exchange for certain inventory. Exceptions to the robots.txt policy will be handled on a case-by-case basis through your account representative.
Managing Assaybot Access
Allowing Access (Recommended)
To ensure optimal brand safety monitoring and maintain good standing in Index Exchange’s supply network, we recommend allowing Assaybot full access to your publicly available content.
Benefits of allowing access:
- Proactive identification of potential content issues before they affect your revenue
- Faster resolution of brand safety concerns with automated, consistent analysis
- Maintained eligibility for premium advertiser demand across Index Exchange
- Transparent, data-driven content quality assessments available through your account team
Assaybot is designed to be a good citizen on your infrastructure. It respects robots.txt, limits concurrent requests per domain, backs off automatically when rate-limited, and will never crawl the same URL more than once within a 30-day window.
Restricting or Blocking Access
Assaybot fully supports robots.txt directives, giving you granular control over what it can access. Publishers who choose to restrict or block Assaybot should be aware:
- Quality Assurance Impact: Content that cannot be analyzed may require manual review processes, potentially causing delays in brand safety assessments
- Demand Eligibility: Blocking may impact eligibility to transact on Index Exchange for certain inventory, as automated brand safety verification cannot be completed
- Account Coordination: Significant restrictions may require additional coordination with your account team
To block Assaybot entirely, add the following to your robots.txt file:
User-agent: Assaybot
Disallow: /
To allow access to most of your site while restricting specific sections:
User-agent: Assaybot
Disallow: /private/
Disallow: /admin/
Allow: /
To set a crawl delay (seconds between requests):
User-agent: Assaybot
Crawl-delay: 10
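Before publishing one of these files, you can check it locally with Python's urllib.robotparser. This module applies rules in file order (first match wins) rather than by RFC 9309's longest match. The examples above give the same answers under both rules, but files that interleave Allow and Disallow lines may not. The helper name below is hypothetical.

```python
from urllib.robotparser import RobotFileParser

# The three example files from this section.
BLOCK_ALL = "User-agent: Assaybot\nDisallow: /\n"
PARTIAL = "User-agent: Assaybot\nDisallow: /private/\nDisallow: /admin/\nAllow: /\n"
DELAYED = "User-agent: Assaybot\nCrawl-delay: 10\n"

def load_robots(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text without fetching anything over the network."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser

site = "https://example.com"
print(load_robots(BLOCK_ALL).can_fetch("Assaybot", f"{site}/news/"))     # False
print(load_robots(PARTIAL).can_fetch("Assaybot", f"{site}/private/a"))   # False
print(load_robots(PARTIAL).can_fetch("Assaybot", f"{site}/news/story"))  # True
print(load_robots(DELAYED).crawl_delay("Assaybot"))                      # 10
```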
If you have questions about how access restrictions may affect your account, please contact your Index Exchange account representative.
Troubleshooting & Common Issues
High Request Volume
If you notice unexpectedly high request volume from Assaybot:
- Verify Authenticity: First, confirm the requests are genuinely from Assaybot by checking the user-agent string and verifying the source IP against Index Exchange’s authorized IP range (192.139.80.0/24). Requests from outside this range using the Assaybot user-agent are spoofed
- Check Deduplication: Apart from retries after a 429 or 5xx response, Assaybot should not request the same URL more than once within a 30-day period. If you are seeing repeated requests to the same URL, the traffic may not be from Assaybot
- Use robots.txt: You can set a Crawl-delay directive in your robots.txt file to control how frequently Assaybot makes requests to your site
- Contact Support: If the issue persists after verification, reach out to your account representative with sample request logs (timestamps, URLs, source IPs) and Index Exchange will investigate
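To check the deduplication behavior against your own logs, you can look for verified Assaybot requests that fetch the same URL again within 30 days of a successful fetch. The sketch below works on already-parsed log records so that it does not depend on any particular log format. The LogRecord layout and function names are hypothetical. Only 2xx responses start the 30-day window, so retries after 429 or 5xx responses are not flagged.

```python
import ipaddress
from datetime import datetime, timedelta
from typing import NamedTuple

ASSAYBOT_NETWORK = ipaddress.ip_network("192.139.80.0/24")
WINDOW = timedelta(days=30)

class LogRecord(NamedTuple):
    """Hypothetical pre-parsed access-log entry."""
    timestamp: datetime
    source_ip: str
    user_agent: str
    url: str
    status: int

def is_verified_assaybot(rec: LogRecord) -> bool:
    """Assaybot user-agent token AND a source IP inside the authorized range."""
    if "Assaybot/" not in rec.user_agent:
        return False
    try:
        return ipaddress.ip_address(rec.source_ip) in ASSAYBOT_NETWORK
    except ValueError:
        return False

def find_repeat_fetches(records: list[LogRecord]) -> list[tuple[str, datetime, datetime]]:
    """Return (url, last_success, repeat_time) for verified Assaybot requests that
    re-fetch a URL within 30 days of a successful (2xx) fetch of that URL."""
    last_success: dict[str, datetime] = {}
    repeats = []
    for rec in sorted(records, key=lambda r: r.timestamp):
        if not is_verified_assaybot(rec):
            continue  # spoofed or unrelated traffic is ignored here
        prev = last_success.get(rec.url)
        if prev is not None and rec.timestamp - prev < WINDOW:
            repeats.append((rec.url, prev, rec.timestamp))
        if 200 <= rec.status < 300:
            last_success[rec.url] = rec.timestamp
    return repeats

t0 = datetime(2024, 1, 1)
ua = "Mozilla/5.0 (compatible; Assaybot/0.1; +http://www.indexexchange.dev/bot.html)"
logs = [
    LogRecord(t0, "192.139.80.10", ua, "/a", 200),
    LogRecord(t0 + timedelta(days=5), "192.139.80.11", ua, "/a", 200),
]
print(len(find_repeat_fetches(logs)))  # 1
```

Any entries this reports are the kind of sample evidence (timestamps, URLs, source IPs) worth sending to your account representative.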
WAF and CDN Configuration
If Assaybot is being blocked by your Web Application Firewall (WAF) or CDN:
- Allowlist by IP: Add Index Exchange’s authorized IP range (192.139.80.0/24) to your WAF/CDN allowlist
- Allowlist by User-Agent: Add Assaybot to your bot allowlist. Note that IP verification is more secure than user-agent matching alone
- Rate Limiting: If your CDN applies rate limits, ensure they are not so restrictive that legitimate crawl traffic is blocked. Assaybot limits its own per-domain concurrency and respects 429 responses with automatic backoff
- Bot Management: If you use a bot management solution (e.g., Cloudflare Bot Management, Akamai Bot Manager), you may need to add Assaybot to your verified bot list or create an exception rule
Access Errors
If Assaybot encounters repeated access errors on your site (403, 401, etc.):
- Authentication Walls: Assaybot can only access publicly available pages. Ensure the pages that appear in ad traffic are accessible without authentication
- Geo-Restrictions: If your site restricts access by geography, ensure that Index Exchange’s IP range is permitted
- IP Allowlisting: Use the authorized CIDR range documented in the User Agent and Network section
Content Analysis Issues
If you believe Assaybot is incorrectly flagging content:
- Review Flagged Content: Your account representative can provide specific examples of content that was flagged and the reason for the classification
- Understand Criteria: Brand safety assessment covers categories including explicit sexual content, hate speech, violence, illegal activity, and other material that may affect advertiser confidence. Classifications follow IAB brand safety guidelines
- Request Review: Contact your account representative to request a manual review of specific flagged URLs
- Appeal Process: Work with the Exchange Quality team for remediation guidance. Incorrectly flagged content can be reviewed and reclassified