IATO - Website Crawler & Content Governance Platform
Welcome to IATO! This tool allows you to crawl websites, analyze their structure, find SEO issues, and generate comprehensive reports.
The dashboard is organized into the following main sections:
Click on any workspace to open its detail view:
Click your name in the top-right corner to access:
Check the health status indicator in the top-right corner to ensure all services are running:
| Setting | Description | Default |
|---|---|---|
| Website URL | Starting URL for the crawl | Required |
| Max Pages | Maximum internal pages to crawl (external links don't count) | 500 |
| Max External Links | Maximum external links to check (only when "Check external links" is ON). 0 = unlimited | 0 |
| Max Depth | How many links deep to follow from start URL | 3 |
| Max Redirects | Maximum redirect chain length | 5 |
| Timeout | Request timeout in seconds | 30 |
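Expressed as a configuration object, the limits above look like this (field names are illustrative; the SDK's actual schema may differ):

```typescript
// Crawl-limit settings mirroring the table defaults.
// Field names are assumptions, not the confirmed SDK schema.
interface CrawlLimits {
  url: string;              // Website URL (required, no default)
  maxPages: number;         // internal pages only
  maxExternalLinks: number; // 0 = unlimited
  maxDepth: number;         // link depth from the start URL
  maxRedirects: number;     // redirect chain length
  timeoutSeconds: number;   // per-request timeout
}

const defaults: CrawlLimits = {
  url: 'https://example.com',
  maxPages: 500,
  maxExternalLinks: 0,
  maxDepth: 3,
  maxRedirects: 5,
  timeoutSeconds: 30,
};

console.log(defaults);
```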
These settings control what gets crawled and stored. Default settings are optimized for minimal storage - enable only what you need.
| Setting | Description | Default |
|---|---|---|
| Crawl subdomains | Treat subdomains (blog.example.com) as internal | OFF |
| Store external links | Record links to other domains (URLs only, not checked) | ON |
| Check external links (detect broken) | Make HEAD requests to external links to check status codes. Enables broken external link detection. May slow crawls. | OFF |
| Crawl outside start folder | If starting at /blog/, also crawl /products/, /about/, etc. | ON |
| Respect robots.txt | Honor robots.txt directives. Disable only for sites you own. | ON |
| Store page content (HTML) | Save full HTML for content analysis. Required for word counts and duplicate detection. | OFF |
Default settings create the smallest database footprint (~5-10 MB per 5,000 pages). Enable additional options only as needed - a full crawl with all options can use 850 MB - 4 GB per project.
Control which resource types to discover and verify. Each has two options:
| Resource | Track | Check Size | Default |
|---|---|---|---|
| Images | Record image URLs found on pages | HEAD request to get status & file size | OFF |
| CSS | Record CSS file URLs | HEAD request to verify & get size | OFF |
| JavaScript | Record JS file URLs | HEAD request to verify & get size | OFF |
| Fonts | Record font URLs | HEAD request to verify & get size | OFF |
| Media | Record video/audio URLs | HEAD request to verify & get size | OFF |
| Other | Record PDFs, docs, etc. | HEAD request to verify & get size | OFF |
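Under the hood, each "Check Size" option amounts to a HEAD request plus a look at the status code and `Content-Length` header. A minimal sketch of that check, assuming a fetch-capable runtime (`interpretHead` and `checkResource` are our names, not IATO internals):

```typescript
interface ResourceCheck {
  ok: boolean;
  status: number;
  sizeBytes: number | null; // null when the server omits Content-Length
}

// Interpret a HEAD response: status code plus file size, if reported.
function interpretHead(status: number, contentLength: string | null): ResourceCheck {
  const size = contentLength === null ? null : Number(contentLength);
  return {
    ok: status >= 200 && status < 400, // 2xx/3xx treated as reachable
    status,
    sizeBytes: size !== null && Number.isFinite(size) ? size : null,
  };
}

// Live usage (hypothetical): issue the HEAD request with fetch.
async function checkResource(url: string): Promise<ResourceCheck> {
  const res = await fetch(url, { method: 'HEAD' });
  return interpretHead(res.status, res.headers.get('content-length'));
}
```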
Enable these to generate insights. When disabled, the corresponding metrics show 0 with a hint to enable them.
| Setting | What It Enables | Default |
|---|---|---|
| SEO Analysis | Issues tab, missing titles, meta descriptions, heading structure | OFF |
| Performance Metrics | Response times, TTFB, response time distribution chart | OFF |
| Detect Duplicates | Content similarity analysis, duplicate page detection | OFF |
| Track Redirects | Redirect chain analysis, redirect loops detection | OFF |
| Extract Hreflang | International SEO, language/region targeting analysis | OFF |
| Extract Structured Data | Schema.org markup, JSON-LD, Microdata extraction | OFF |
Use regex patterns to include/exclude URLs:
```
# Include only blog and product pages
/blog/.*
/products/.*

# Exclude admin and login pages
/admin/.*
/login.*
\?.*session.*
```
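The include/exclude semantics can be approximated in a few lines: a URL is crawled if it matches at least one include pattern (or none are defined) and matches no exclude pattern. This is a sketch of our reading of the feature, not IATO's exact matching engine:

```typescript
// Decide whether a URL passes include/exclude regex filters.
// Assumed semantics: include patterns are OR'd, any exclude vetoes.
function shouldCrawl(url: string, includes: RegExp[], excludes: RegExp[]): boolean {
  const included = includes.length === 0 || includes.some((re) => re.test(url));
  const excluded = excludes.some((re) => re.test(url));
  return included && !excluded;
}

const includes = [/\/blog\/.*/, /\/products\/.*/];
const excludes = [/\/admin\/.*/, /\/login.*/, /\?.*session.*/];

console.log(shouldCrawl('/blog/post-1', includes, excludes));        // true
console.log(shouldCrawl('/admin/settings', includes, excludes));     // false
console.log(shouldCrawl('/blog/x?session=abc', includes, excludes)); // false
```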
Enable JavaScript rendering to crawl single-page applications (SPAs) and sites with dynamic content. Uses headless browsers.
| Preset | Resolution | Type |
|---|---|---|
| Desktop 1080p | 1920×1080 | Desktop |
| Desktop 1440p | 2560×1440 | Desktop |
| iPhone 14 | 390×844 | Mobile |
| iPhone 14 Pro Max | 430×932 | Mobile |
| iPad Pro | 1024×1366 | Tablet |
| Pixel 7 | 412×915 | Mobile |
| Samsung Galaxy S23 | 360×780 | Mobile |
| Googlebot Mobile | 412×823 | Bot |
| Googlebot Desktop | 1920×1080 | Bot |
| Custom | User-defined | Custom |
| Option | Description |
|---|---|
| Network Idle | Wait until no network requests for 500ms (recommended for SPAs) |
| Page Load | Wait for the load event |
| DOM Content Loaded | Wait for DOMContentLoaded event |
| First Response | Continue after first server response |
| Wait for Selector | Wait for a specific CSS selector to appear (e.g., #main-content) |
| Extra Wait | Additional delay in milliseconds after page loads |
Block resources during rendering to speed up crawls:
Capture screenshots of each page during the crawl (requires JavaScript rendering).
| Option | Description |
|---|---|
| Full Page | Capture entire scrollable page (vs viewport only) |
| Format | PNG (lossless), JPEG (smaller), WebP (modern) |
| Quality | Compression quality for JPEG/WebP (10-100%) |
Click on any project to open the full analysis view. The left sidebar contains collapsible sections:
See dedicated sections below for details on these sidebar areas.
From the workspace list:
From the project detail header:
Workspaces help you organize projects and collaborate with team members.
| Role | Permissions |
|---|---|
| Owner | Full access, can delete workspace, manage members |
| Admin | Manage projects and members, cannot delete workspace |
| Member | Create and manage own projects, view all projects |
| Viewer | View projects only, no editing |
Set up recurring crawls that run automatically. Create a schedule from the workspace detail view by clicking "New Schedule".
Generate reports from the Reports view. The left panel has a report generation form (select job, type, and format), and the right panel shows recent reports.
Note: Report generation creates database records. Full file generation and download are planned for Phase 9.
Compare two crawls of the same website to find changes over time. Access via the Compare button in the project detail header, or from the Versions section in the sidebar. Select a baseline (older) job and a compare (newer) job, then click Compare.
The Settings page has 6 tabs for configuring your crawling environment:
Configure HTTP Basic or Digest authentication for password-protected websites. Add domain patterns with username/password, then select them when starting a crawl.
Create custom rules to extract specific data from pages using CSS selectors, XPath, or Regular Expressions.
Configure login form credentials so the crawler can authenticate before crawling. Specify the login URL, form field names, and credentials.
Configure your personal AI provider override (Bring Your Own Key). Select a provider (Anthropic or OpenAI), enter your API key, and choose a model. This overrides the system AI configuration for your account.
Platform administration (admin users only). Configure system-wide AI settings, email delivery (SendGrid/SMTP), and access the danger zone for system-level operations.
Connect external services to enrich your crawl data:
| Role | Permissions |
|---|---|
| Owner | Full workspace access, team management, delete workspaces |
| Admin | Manage users, settings, all projects in workspace |
| Member | Create projects, manage own crawls, view all data |
| Viewer | Read-only access to projects and reports |
Team plans include a set number of seats. The Team Members view shows a seat management panel where you can add or remove seats. Additional seats cost $20/month each.
The Content Inventory provides a complete catalog of all crawled pages with metadata and classification.
Access from the Inventory section in the project sidebar:
Build and manage taxonomies to organize your content. Access from the Taxonomy section in the project sidebar.
Launch from Taxonomy Overview to access the guided workflow:
Automatic quality checks for your taxonomy:
| Issue Type | Description | Action |
|---|---|---|
| Missing Definition | Terms without descriptions | Edit Term |
| Duplicate Labels | Same label used for multiple terms | Review Terms |
| Orphan Terms | Terms not connected in hierarchy | Fix Hierarchy |
| Draft Terms | Terms not yet approved | Edit Term |
The classification phase requires human approval for quality control. AI suggestions are starting points - always review before approving to ensure accuracy.
The Visual Sitemap Editor lets you plan and reorganize your website's information architecture with a drag-and-drop canvas interface.
| Control | Action |
|---|---|
| Click + Drag | Pan the canvas |
| Scroll / Pinch | Zoom in/out |
| Click Node | Select and open details panel |
| Drag Node | Reposition on canvas |
| Shift + Click | Multi-select nodes |
| Delete / Backspace | Delete selected node(s) |
| Ctrl/Cmd + C/V | Copy/paste nodes |
Click any node to open the right panel with editable properties:
Access via the gear icon in the toolbar:
The editor uses automatic hierarchical layout:
Click the AI button (✨) in the toolbar to open the AI Sitemap Assistant. Ask it to reorganize pages, create new sections, or generate content for you.
The AI Sitemap Assistant is a conversational interface for restructuring your sitemap using natural language. Ask it to create pages, reorganize sections, or generate content.
| Request Type | Example Prompts |
|---|---|
| Create pages | "Create a new FAQ section with 3 pages: General, Pricing, Support" |
| Reorganize | "Move all blog posts under a new Blog section" |
| Update metadata | "Add meta descriptions to all service pages" |
| Analyze | "What pages are missing meta descriptions?" |
| Suggest structure | "How should I organize the Services section?" |
| Generate content | "Write content for the About page" |
When AI proposes changes, you review and approve before execution:
Drag the left edge of the drawer to resize it (320px - 700px). Your preferred width is saved automatically.
The AI has access to:
The AI Assistant requires an AI provider (OpenAI or Anthropic) to be configured in Admin → AI Usage & Costs. Without this, the assistant will show an error message.
Download crawl data as comma-separated values. Scopes available:
Generate a sitemap.xml file from crawl results, compatible with Google Search Console.
Full structured data export for programmatic processing.
Export a complete redirect map CSV mapping old URLs to new destinations. Hand the file directly to your development team for server configuration.
Redirect maps aggregate data from multiple sources automatically. Here's how to build one:
There are two ways to set up redirects:
Navigate to your project's Export tab and click the Redirect Map card:
The redirect map CSV aggregates redirects from all sources:
If the same source URL appears in multiple sources, the most explicit user action takes priority.
```
Source URL, Destination URL, Status Code, Redirect Type, Chain Length, Notes
```
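The "most explicit user action takes priority" rule is essentially a keyed merge. A toy sketch of it, with priority values and source examples invented for illustration:

```typescript
interface RedirectEntry {
  sourceUrl: string;
  destinationUrl: string;
  priority: number; // higher = more explicit user action (values are ours)
}

// Merge entries from several sources; on a source-URL clash,
// keep the entry with the highest priority.
function mergeRedirects(entries: RedirectEntry[]): Map<string, RedirectEntry> {
  const map = new Map<string, RedirectEntry>();
  for (const e of entries) {
    const prev = map.get(e.sourceUrl);
    if (!prev || e.priority > prev.priority) map.set(e.sourceUrl, e);
  }
  return map;
}

const merged = mergeRedirects([
  { sourceUrl: '/old', destinationUrl: '/detected', priority: 1 },       // e.g. crawler-detected
  { sourceUrl: '/old', destinationUrl: '/chosen-by-user', priority: 3 }, // e.g. manual mapping
]);
console.log(merged.get('/old')?.destinationUrl); // '/chosen-by-user'
```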
Check if robots.txt is blocking. Try disabling "Respect robots.txt" for testing.
Increase concurrent requests (up to 10-15) and reduce delay (0.25s).
Credentials allow IATO to access password-protected areas of websites using HTTP Basic or Digest authentication.
When starting a new project, expand Advanced Options and select your credential from the dropdown. The crawler will use these credentials for all requests.
Security Note: Credentials are stored encrypted. Never share your IATO account if you have sensitive credentials stored.
Extraction rules let you pull specific structured data from crawled pages using CSS selectors, XPath, or Regular Expressions. Rules are created once and can be attached to any crawl or scheduled crawl.
| Method | Example | Use Case |
|---|---|---|
| CSS | `h1.title` | Page titles, specific elements |
| XPath | `//meta[@name='description']/@content` | Attributes, complex paths |
| Regex | `price:\s*\$(\d+\.\d{2})` | Pattern matching in text |
| Target | Description |
|---|---|
| `text` | Extract the text content of matched elements (default) |
| `html` | Extract the full HTML of matched elements |
| `attribute` | Extract a specific HTML attribute (set `target_attribute`, e.g. `href`, `src`, `content`) |
| `count` | Return the number of matched elements |
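To illustrate how the targets differ, here is a regex-only sketch of the `text` and `count` targets (the real engine also supports CSS and XPath; this simplification is ours):

```typescript
type Target = 'text' | 'count';

// Apply a regex extraction rule to raw page text.
// Returns matched strings for 'text', or the match count for 'count'.
function extract(pageText: string, pattern: RegExp, target: Target): string[] | number {
  const matches = [...pageText.matchAll(new RegExp(pattern, 'g'))];
  if (target === 'count') return matches.length;
  // Prefer the first capture group when present, else the whole match.
  return matches.map((m) => m[1] ?? m[0]);
}

const page = 'Widget price: $19.99 and gadget price: $5.25';
console.log(extract(page, /price:\s*\$(\d+\.\d{2})/, 'text'));  // ['19.99', '5.25']
console.log(extract(page, /price:\s*\$(\d+\.\d{2})/, 'count')); // 2
```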
When starting a new project or scheduled project, select extraction rules from the Extraction Rules section in the crawl configuration. Selected rules will be applied to every page during the crawl. View results in the Extracted Data tab after the crawl completes.
Tip: Always use the Test button to verify your selector works on a sample page before starting a large crawl.
Form authentication allows IATO to log into websites that use login forms before crawling.
Tip: Use your browser's developer tools to find the exact form field names. Look for `<input name="...">` attributes.
All settings in IATO are global - they apply across all your workspaces and projects.
| Setting Type | Location | Notes |
|---|---|---|
| Credentials | Settings tab | Available to all projects |
| Extraction Rules | Settings tab | Applied based on URL patterns |
| Form Auth | Settings tab | Available to all projects |
| AI Configuration | Settings tab | Personal AI provider override (BYOK) |
| Integrations | Settings tab | Google Analytics, Search Console, WordPress |
| API Key | My Account | For REST API and SDK access |
JavaScript rendering enables IATO to crawl modern single-page applications (SPAs), React/Vue/Angular sites, and any page with dynamically loaded content.
Note: JavaScript rendering is slower than standard crawling (typically 2-5 seconds per page vs 0.5 seconds). Use it only when needed.
IATO uses a headless browser engine to launch a real browser (Chromium, Firefox, or WebKit) that:
| Option | Description |
|---|---|
| Browser Engine | Chromium (default), Firefox, or WebKit (Safari) |
| Device Preset | Pre-configured viewport sizes for desktop, mobile, and tablet |
| Wait Until | When to consider the page loaded (Network Idle recommended for SPAs) |
| Wait for Selector | Wait for a specific element to appear (e.g., #main-content) |
| Extra Wait | Additional delay after page load for slow animations |
| Resource Blocking | Skip images, CSS, fonts, or media to speed up rendering |
Choose from 9 pre-configured device profiles or define a custom viewport:
- Desktop
- Mobile
- Tablet
- Bots
When enabled, IATO captures a screenshot of each page:
Tip: Use resource blocking (images, fonts) to speed up rendering when you don't need visual fidelity for SEO analysis.
Customize the look and feel of IATO to suit your preferences. All theme settings are saved to your account and sync across devices.
Choose between three display modes:
| Mode | Description |
|---|---|
| Light | Default white/gray appearance |
| Dark | Dark backgrounds, easier on the eyes in low light |
| System | Automatically matches your device's light/dark setting |
Select from 12 accent colors that apply to buttons, links, and highlights throughout the app:
Blue (default), Purple, Green, Red, Orange, Teal, Indigo, Pink, Cyan, Amber, Lime, Slate
Adjust the text size from 80% to 120% of the default:
Note: Theme settings are saved automatically and will persist across browser sessions and devices when logged in.
IATO provides a comprehensive REST API for automation and integration with other tools. The API is designed to be AI-orchestrator friendly.
The official SDK is the easiest way to integrate with IATO programmatically:
```bash
npm install iato-sdk
```

```typescript
import { IATO } from 'iato-sdk';

const iato = new IATO({ apiKey: 'iato_your_key_here' });

const job = await iato.crawls.start({
  url: 'https://example.com',
  workspace_id: 'ws_abc123',
});

const completed = await iato.crawls.waitForCompletion(job.id);
const issues = await iato.crawls.seoIssues(completed.id);
```
The SDK covers every API endpoint with full TypeScript types, automatic retries, and built-in error handling. See the API Documentation tab for the complete SDK reference and all available resources.
- `/api/manifest` - Machine-readable capabilities and endpoints
- `/api` - API index with all endpoint categories
- `/api/health/detailed` - System status with latency info

Create scoped API keys for automated access:
- Scopes: `read`, `write`, or `admin`

Include your API key in requests:
```
Authorization: Bearer iato_your_key_here
```
The API allows 60 requests per minute. Response headers tell you your status:
- `X-RateLimit-Limit`: Maximum requests (60)
- `X-RateLimit-Remaining`: Requests left in window
- `X-RateLimit-Reset`: When the window resets

For safe retries on POST/PUT/DELETE requests, include an idempotency key:
```
X-Idempotency-Key: unique-request-id-123
```
If you retry with the same key within 24 hours, you'll get the cached response.
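Conceptually, the server keeps a response cache keyed by the idempotency header. A toy model of that behavior, purely illustrative:

```typescript
// Toy model of server-side idempotency: the first call with a key runs
// the operation; replays of the same key get the cached response.
class IdempotencyCache {
  private seen = new Map<string, unknown>();

  handle<T>(key: string, operation: () => T): T {
    if (this.seen.has(key)) return this.seen.get(key) as T;
    const result = operation();
    this.seen.set(key, result);
    return result;
  }
}

const cache = new IdempotencyCache();
let deletions = 0; // counts how many times the operation actually ran
const first = cache.handle('unique-request-id-123', () => ++deletions);
const retry = cache.handle('unique-request-id-123', () => ++deletions);
console.log(first, retry, deletions); // 1 1 1
```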
- `POST /api/crawl/jobs/batch-delete` - Delete up to 100 jobs
- `POST /api/crawl/jobs/batch-export` - Export up to 20 jobs

Tip: See the full API Documentation in the User Menu for detailed endpoint documentation.
Receive HTTP notifications when crawl events occur. Perfect for integrating IATO with your CI/CD pipeline, Slack, or other automation tools.
| Event | Description |
|---|---|
| `crawl.started` | Crawl job has begun |
| `crawl.progress` | Progress update (every 10%) |
| `crawl.completed` | Crawl finished successfully |
| `crawl.failed` | Crawl encountered an error |
| `crawl.cancelled` | Crawl was cancelled by user |
```json
{
  "event": "crawl.completed",
  "timestamp": "2026-01-14T00:00:00Z",
  "data": {
    "job_id": "abc123",
    "url": "https://example.com",
    "pages_crawled": 150,
    "status": "completed"
  }
}
```
If you configured a secret, verify the webhook signature:
```
# Header:  X-Webhook-Signature: sha256=abc123...
# Compute: HMAC-SHA256(secret, payload)
```
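In Node, that verification is a few lines with the built-in `crypto` module. The `sha256=` header prefix follows the format above; the rest is a hedged sketch, not IATO-provided code:

```typescript
import { createHmac, timingSafeEqual } from 'node:crypto';

// Verify an X-Webhook-Signature header against the raw request body.
// Assumed header format: "sha256=<hex digest>".
function verifySignature(secret: string, payload: string, header: string): boolean {
  const expected = 'sha256=' + createHmac('sha256', secret).update(payload).digest('hex');
  const a = Buffer.from(expected);
  const b = Buffer.from(header);
  // Constant-time compare to avoid leaking digest bytes via timing.
  return a.length === b.length && timingSafeEqual(a, b);
}

const body = '{"event":"crawl.completed"}';
const sig = 'sha256=' + createHmac('sha256', 'my-secret').update(body).digest('hex');
console.log(verifySignature('my-secret', body, sig)); // true
```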
For real-time progress, connect via WebSocket:
```javascript
const ws = new WebSocket('ws://your-server/api/crawl/jobs/{job_id}/stream');
ws.onmessage = (e) => {
  const data = JSON.parse(e.data);
  console.log(`Progress: ${data.data.percent_complete}%`);
};
```
Testing: Use the "Test Webhook" button to send a test event to your endpoint before relying on it in production.
The Developer Portal at /developers is a self-service platform for managing API access, usage, and billing. Access it from the user dropdown menu.
Live overview of your current usage and costs:
Create and manage scoped API keys with the iato_ prefix:
- Scopes: `read`, `write`, `admin` — choose which permissions each key has

Important: API keys are shown only once at creation. Copy and save your key immediately — it cannot be retrieved later.
Get from zero to your first API call in minutes:
```bash
npm install iato-sdk
```

```typescript
import { IATO } from 'iato-sdk';

const client = new IATO({ apiKey: 'iato_YOUR_KEY_HERE' });

const job = await client.crawl.start({
  url: 'https://example.com',
  maxPages: 100
});

console.log('Job started:', job.id);
```
The quickstart page also covers cURL examples, error handling for 402 (spending cap reached) and 429 (rate limited), and cost information.
The pricing page includes tier comparison, per-unit pricing with volume discounts, and an interactive cost estimator:
| Tier | Price | Highlights |
|---|---|---|
| Free | $0 | 500 pages/crawl, 30 req/min, 100 MB storage |
| Pro | $49/mo or $468/yr | 10k pages, 10k calls, 50 AI ops, 5 GB included |
| API | Usage-based | 120 req/min, 5 concurrent, spending caps, volume discounts |
| Enterprise | Custom | 600 req/min, 20 concurrent, dedicated support |
Use the Cost Estimator on the pricing page to calculate projected costs with preset scenarios (Solo Dev, Small Agency, AI Agent).
| Purchase | Credits | Bonus |
|---|---|---|
| $50 | $50 | — |
| $100 | $120 | 20% |
| $250 | $312.50 | 25% |
| $500 | $650 | 30% |
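The bonus tiers reduce to simple arithmetic: credits = purchase + purchase × bonus%. A quick sanity check against the table (behavior for amounts between the listed purchase points is our assumption):

```typescript
// Map a purchase amount (USD) to API credits using the table's bonus tiers.
// Thresholds between listed amounts are assumed, not confirmed.
function creditsFor(purchase: number): number {
  const bonusPct =
    purchase >= 500 ? 30 :
    purchase >= 250 ? 25 :
    purchase >= 100 ? 20 :
    0;
  return purchase + (purchase * bonusPct) / 100;
}

console.log(creditsFor(50));  // 50
console.log(creditsFor(100)); // 120
console.log(creditsFor(250)); // 312.5
console.log(creditsFor(500)); // 650
```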