IATO REST API
Base URL: https://iato.ai/api
Two authentication methods are supported:
# JWT Token (from login)
Authorization: Bearer eyJhbGciOiJIUzI1NiIs...
# API Key (recommended for automation)
Authorization: Bearer iato_your_api_key_here
For safe retries on POST/PUT/DELETE, include:
X-Idempotency-Key: unique-request-id-123
Cached responses are returned for 24 hours. A replayed response carries the header X-Idempotency-Replayed: true.
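A minimal sketch of how a client might attach these headers. `buildHeaders` and `wasReplayed` are hypothetical helper names, not part of the SDK; only the header names come from the text above.

```typescript
// Hypothetical helpers; only the header names are taken from the documentation.
type HeaderMap = Record<string, string>;

const UNSAFE_METHODS = ["POST", "PUT", "DELETE"];

// Build request headers, adding an idempotency key only for POST/PUT/DELETE.
function buildHeaders(token: string, method: string, idempotencyKey?: string): HeaderMap {
  const headers: HeaderMap = { Authorization: "Bearer " + token };
  if (idempotencyKey !== undefined && UNSAFE_METHODS.indexOf(method.toUpperCase()) !== -1) {
    headers["X-Idempotency-Key"] = idempotencyKey;
  }
  return headers;
}

// True when a response carries the replay marker (header names compared case-insensitively).
function wasReplayed(responseHeaders: HeaderMap): boolean {
  return Object.keys(responseHeaders).some(
    (name) => name.toLowerCase() === "x-idempotency-replayed" && responseHeaders[name] === "true",
  );
}
```

Reusing the same key when retrying a failed POST lets the server return the cached result instead of performing the operation twice.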
The iato-sdk package is the fastest way to integrate with IATO: zero dependencies, fully typed, with built-in retries and error handling.
npm install iato-sdk
import { IATO } from 'iato-sdk';
const iato = new IATO({
apiKey: 'iato_your_key_here',
baseUrl: 'https://iato.ai'
});
// Start a crawl
const job = await iato.crawls.start({
url: 'https://example.com',
workspace_id: 'ws_abc123',
max_pages: 500,
});
// Wait for completion (polls automatically)
const completed = await iato.crawls.waitForCompletion(job.id);
// Get results
const stats = await iato.crawls.stats(completed.id);
const issues = await iato.crawls.seoIssues(completed.id);
const broken = await iato.crawls.brokenLinks(completed.id);
| Resource | Description |
|---|---|
| iato.workspaces | Create, list, update, delete workspaces |
| iato.crawls | Start crawls, get pages, stats, SEO issues, exports |
| iato.sitemaps | Visual sitemaps, nodes, AI assistant, plans |
| iato.inventory | Content inventory and statistics |
| iato.navigation | Navigation detection and validation |
| iato.taxonomy | AI taxonomy builder (5-phase wizard) |
| iato.schedules | Recurring crawl schedules |
| iato.webhooks | Webhook management and signature verification |
| iato.reports | Report generation and download |
| iato.auth | API key management |
| iato.users | Quota and preferences |
| iato.org | Organization members and invitations |
| iato.health | Health checks and API manifest |
| iato.aeo | Health checks, domain verification, and Overview endpoints |
import { IATO, NotFoundError, RateLimitError, AuthenticationError } from 'iato-sdk';
try {
await iato.crawls.get('nonexistent');
} catch (error) {
if (error instanceof NotFoundError) {
console.log('Job not found');
} else if (error instanceof RateLimitError) {
console.log(`Retry after ${error.retryAfter}s`);
} else if (error instanceof AuthenticationError) {
console.log('Invalid API key');
}
}
Features: Zero dependencies (native fetch), automatic retries with exponential backoff on 5xx/429, full TypeScript types for every endpoint, request cancellation via AbortSignal, and webhook signature verification. Requires Node.js 18+.
Source: npmjs.com/package/iato-sdk
/api/manifest
Machine-readable API capabilities. AI orchestrators use this to discover features.
/api
API index listing all endpoint categories.
/api/user/quota
Check your usage quota before starting crawls.
{
"pages_remaining": 9500,
"crawls_remaining": 47,
"can_start_crawl": true,
"plan": "pro"
}
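As a sketch of a pre-flight check built on this response: the `Quota` shape mirrors the sample above, while `quotaBlocker` is a hypothetical helper. Comparing `max_pages` against `pages_remaining` is a conservative client-side policy and an assumption; the source does not say how the server treats that case.

```typescript
// Shape taken from the sample quota response.
interface Quota {
  pages_remaining: number;
  crawls_remaining: number;
  can_start_crawl: boolean;
  plan: string;
}

// Hypothetical pre-flight check: returns a reason when the crawl should not
// be started, or null when the quota allows it.
function quotaBlocker(quota: Quota, maxPages: number): string | null {
  if (!quota.can_start_crawl) return "quota reports can_start_crawl = false";
  if (quota.crawls_remaining < 1) return "no crawls remaining";
  if (quota.pages_remaining < maxPages) {
    // Assumption: treat an oversized max_pages as a blocker on the client side.
    return "max_pages " + maxPages + " exceeds pages_remaining " + quota.pages_remaining;
  }
  return null;
}
```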
/api/health
Basic health check of all services.
{
"dashboard": "ok",
"database": "ok",
"crawler": "ok",
"redis": "ok"
}
/api/health/detailed
Detailed health with latency and capacity info. Useful for orchestrator decisions.
{
"status": "healthy",
"version": "10.1.0",
"dependencies": {
"database": {"status": "healthy", "latency_ms": 5},
"redis": {"status": "healthy", "latency_ms": 2},
"crawler": {"status": "healthy", "latency_ms": 15}
},
"capacity": {
"active_crawls": 3,
"max_concurrent_crawls": 10,
"available_slots": 7
}
}
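One way an orchestrator might use this response is sketched below. The field names follow the sample above; `startableCrawls` and its policy (start nothing unless every dependency reports "healthy") are assumptions for illustration.

```typescript
// Subset of the detailed health response shown in the sample.
interface DetailedHealth {
  status: string;
  dependencies: Record<string, { status: string; latency_ms: number }>;
  capacity: { active_crawls: number; max_concurrent_crawls: number; available_slots: number };
}

// Hypothetical policy: how many of the wanted crawls may be started right now.
function startableCrawls(health: DetailedHealth, wanted: number): number {
  if (health.status !== "healthy") return 0;
  for (const name of Object.keys(health.dependencies)) {
    if (health.dependencies[name].status !== "healthy") return 0;
  }
  // Never exceed the free slots the API reports.
  return Math.max(0, Math.min(wanted, health.capacity.available_slots));
}
```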
/api/auth/login
Login to get JWT token.
{
"email": "[email protected]",
"password": "your-password"
}
/api/auth/api-keys
List your API keys.
/api/auth/api-keys
Create a new scoped API key.
{
"name": "CI/CD Pipeline",
"scopes": ["read", "write"],
"expires_in_days": 90
}
read - View jobs, pages, reports
write - Start crawls, create webhooks
admin - Manage users, delete resources
⚠️ Important: The API key is only shown once at creation. Save it immediately!
/api/auth/api-keys/{key_id}
Revoke an API key.
/api/crawl/jobs/batch-delete
Delete multiple jobs at once (max 100).
{
"job_ids": ["abc123", "def456", "ghi789"]
}
/api/crawl/jobs/batch-export
Export multiple jobs at once (max 20).
{
"job_ids": ["abc123", "def456"],
"format": "json",
"scope": "summary" // or "pages"
}
/api/webhooks
List your webhooks.
/api/webhooks
Create a webhook to receive event notifications.
{
"name": "Slack Notification",
"url": "https://hooks.slack.com/...",
"events": ["crawl.completed", "crawl.failed"],
"secret": "optional-hmac-secret"
}
crawl.started
crawl.progress
crawl.completed
crawl.failed
crawl.cancelled
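Because a misspelled event name would likely mean a webhook that never fires, a client may want to check the `events` array before creating the webhook. The sketch below uses the five event names listed above; `unknownEvents` is a hypothetical helper, and how the server itself handles unknown names is not documented here.

```typescript
// Event names listed in the documentation.
const WEBHOOK_EVENTS = [
  "crawl.started",
  "crawl.progress",
  "crawl.completed",
  "crawl.failed",
  "crawl.cancelled",
];

// Request body shape from the sample webhook creation payload.
interface WebhookRequest {
  name: string;
  url: string;
  events: string[];
  secret?: string;
}

// Hypothetical client-side check: returns the event names that are not documented.
function unknownEvents(request: WebhookRequest): string[] {
  return request.events.filter((event) => WEBHOOK_EVENTS.indexOf(event) === -1);
}
```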
/api/webhooks/{id}/test
Send a test event to verify your webhook endpoint.
/api/webhooks/{id}
Delete a webhook.
/api/webhooks/{webhook_id}
Get webhook details
/api/webhooks/{webhook_id}
Update webhook configuration
/api/webhooks/{webhook_id}/history
Get delivery history
/api/crawl/jobs/{job_id}/stream
WebSocket endpoint for real-time progress streaming.
// Connect
const ws = new WebSocket('ws://host/api/crawl/jobs/abc123/stream');
// Receive updates
ws.onmessage = (e) => {
const msg = JSON.parse(e.data);
// msg.type: "progress" | "complete" | "error"
// msg.data: { job_id, status, percent_complete, ... }
};
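The handler above can be made more defensive by validating each frame before use. This is a sketch under assumptions: `parseStreamMessage` and `isTerminal` are hypothetical helpers, the `data` fields beyond those named in the comment are left open, and treating "complete" and "error" as the end of the stream is an inference from the type names.

```typescript
type StreamMessageType = "progress" | "complete" | "error";

interface StreamMessage {
  type: StreamMessageType;
  // Only job_id, status, and percent_complete are named in the source; the rest is elided there.
  data: { job_id?: string; status?: string; percent_complete?: number; [key: string]: unknown };
}

// Parse one WebSocket frame, returning null for malformed or unknown messages.
function parseStreamMessage(raw: string): StreamMessage | null {
  let parsed: unknown;
  try {
    parsed = JSON.parse(raw);
  } catch {
    return null; // not JSON
  }
  if (typeof parsed !== "object" || parsed === null) return null;
  const candidate = parsed as { type?: unknown; data?: unknown };
  const known =
    candidate.type === "progress" || candidate.type === "complete" || candidate.type === "error";
  if (!known || typeof candidate.data !== "object" || candidate.data === null) return null;
  return candidate as StreamMessage;
}

// Assumption: no further messages follow a "complete" or "error" message.
function isTerminal(message: StreamMessage): boolean {
  return message.type !== "progress";
}
```

Inside `ws.onmessage`, a caller could pass `e.data` to `parseStreamMessage` and close the socket once `isTerminal` returns true.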
/api/crawl/start
Start a new crawl job. All config options default to minimal footprint settings.
{
"url": "https://example.com",
"workspace_id": 1,
// Basic Settings
"max_pages": 500, // Max internal pages (external links don't count)
"max_external_links": 0, // Max external links to check (0 = unlimited)
"max_depth": 3,
"delay_ms": 100,
"timeout_seconds": 30,
"concurrent_requests": 5,
"max_redirects": 5,
// Scope Settings
"crawl_subdomains": false,
"include_external": true, // Track external link URLs
"crawl_external": false, // Check external links (HEAD request)
"crawl_outside_start_folder": true,
"respect_robots_txt": true,
"store_content": false, // Store HTML (required for word counts)
// Analysis Settings (all default to false for minimal footprint)
"seo_analysis": false,
"performance_metrics": false,
"detect_duplicates": false,
"track_redirects": false,
"extract_hreflang": false,
"extract_structured_data": false,
// Resource Tracking (all default to false)
// store_* = Track URLs only, crawl_* = HEAD request to check size
"store_images": false, "crawl_images": false,
"store_css": false, "crawl_css": false,
"store_js": false, "crawl_js": false,
"store_fonts": false, "crawl_fonts": false,
"store_media": false, "crawl_media": false,
"store_other": false, "crawl_other": false,
// URL Filtering
"include_patterns": [],
"exclude_patterns": [],
"remove_parameters": ["utm_source", "utm_medium", "fbclid"],
// Authentication (optional)
"credential_id": null,
"formauth_id": null,
// JavaScript Rendering (optional)
"js_rendering": false,
"js_browser_type": "chromium", // chromium, firefox, webkit
"js_device_preset": "desktop_1080p",
"js_wait_until": "networkidle",
// Screenshots (optional, requires js_rendering)
"capture_screenshots": false,
"screenshot_full_page": false,
"screenshot_format": "png"
}
Note: The default settings create a minimal database footprint (~5-10 MB per 5,000 pages). Enable options as needed for your use case.
Available values for js_device_preset:
desktop_1080p
desktop_1440p
iphone_14
iphone_14_pro_max
ipad_pro
pixel_7
samsung_galaxy_s23
googlebot_mobile
googlebot_desktop
custom
{
"job_id": "abc12345",
"status": "started"
}
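Some of the options above depend on each other: screenshots require `js_rendering`, and `js_device_preset` must be one of the listed values. A client could check this before sending the request, as in the following sketch. `checkCrawlOptions` is a hypothetical helper covering only a few of the documented options, and `workspace_id` is typed loosely because the samples on this page show both a number and a string.

```typescript
// Device presets listed in the documentation.
const DEVICE_PRESETS = [
  "desktop_1080p", "desktop_1440p", "iphone_14", "iphone_14_pro_max", "ipad_pro",
  "pixel_7", "samsung_galaxy_s23", "googlebot_mobile", "googlebot_desktop", "custom",
];

// Only a subset of the documented options, for brevity.
interface CrawlOptions {
  url: string;
  workspace_id: number | string;
  max_pages?: number;
  js_rendering?: boolean;
  js_device_preset?: string;
  capture_screenshots?: boolean;
}

// Hypothetical client-side validation; returns a list of problems
// (empty when the options look consistent).
function checkCrawlOptions(options: CrawlOptions): string[] {
  const problems: string[] = [];
  if (options.capture_screenshots && !options.js_rendering) {
    problems.push("capture_screenshots requires js_rendering");
  }
  if (
    options.js_device_preset !== undefined &&
    DEVICE_PRESETS.indexOf(options.js_device_preset) === -1
  ) {
    problems.push("unknown js_device_preset: " + options.js_device_preset);
  }
  return problems;
}
```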
/api/crawl/jobs
List all crawl jobs with pagination and filtering.
| Parameter | Description |
|---|---|
| workspace_id | Filter by workspace ID |
| limit | Results per page (default: 20, max: 100) |
| offset | Pagination offset (default: 0) |
| status | Filter by status: pending, running, completed, failed |
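The query parameters above can be assembled as in this sketch. `jobsQuery` is a hypothetical helper; the defaults and the maximum of 100 come from the parameter list, and clamping out-of-range values on the client is a design choice rather than documented server behavior.

```typescript
interface JobListParams {
  workspace_id?: number;
  limit?: number; // default 20, max 100
  offset?: number; // default 0
  status?: "pending" | "running" | "completed" | "failed";
}

// Build the path and query string for the job list, clamping limit to the documented range.
function jobsQuery(params: JobListParams): string {
  const limit = Math.min(Math.max(params.limit ?? 20, 1), 100);
  const offset = Math.max(params.offset ?? 0, 0);
  const parts = ["limit=" + limit, "offset=" + offset];
  if (params.workspace_id !== undefined) parts.push("workspace_id=" + params.workspace_id);
  if (params.status !== undefined) parts.push("status=" + encodeURIComponent(params.status));
  return "/api/crawl/jobs?" + parts.join("&");
}
```

To page through all jobs, a caller would increase `offset` by `limit` until a page returns fewer than `limit` results.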
/api/crawl/jobs/{job_id}
Get detailed information about a specific job.
/api/crawl/jobs/{job_id}
Delete a crawl job and all associated data.
/api/crawl/jobs/{job_id}
Update crawl job metadata
/api/crawl/jobs/{job_id}/recrawl
Start recrawl of existing job
/api/crawl/jobs/{job_id}/cancel
Cancel a running crawl
/api/crawl/jobs/{job_id}/versions
Get crawl version history
/api/crawl/jobs/{job_id}/progress
Get real-time crawl progress
/api/crawl/jobs/{job_id}/overview
Get crawl overview dashboard data
/api/crawl/jobs/{job_id}/logs
Get crawl execution logs
/api/crawl/jobs/deletion-status
Check status of job deletions
/api/crawl/jobs/{job_id}/pages
Get all pages from a crawl job.
| Parameter | Description |
|---|---|
| limit | Results per page (default: 100) |
| offset | Pagination offset |
| status_code | Filter by HTTP status (e.g., 200, 404) |
| is_internal | Filter: true/false |
| search | Search in URL or title |
/api/crawl/jobs/{job_id}/pages/{page_id}
Get detailed page data including content, headers, and links.
/api/crawl/jobs/{job_id}/stats
Get crawl statistics and metrics.
/api/crawl/jobs/{job_id}/pages/{page_id}/html
Get full HTML content of page
/api/crawl/jobs/{job_id}/pages/{page_id}/screenshot
Get page screenshot
/api/crawl/jobs/{job_id}/pages/{page_id}/thumbnail
Get page thumbnail
/api/crawl/jobs/{job_id}/screenshots
List all screenshots for crawl
/api/crawl/jobs/{job_id}/pages/{page_id}/screenshot
Delete page screenshot
/api/crawl/jobs/{job_id}/screenshots
Delete all screenshots for crawl
/api/crawl/jobs/{job_id}/broken-links
Get all broken links (4xx, 5xx status codes).
/api/crawl/jobs/{job_id}/redirects
Get redirect chains and final destinations.
/api/crawl/jobs/{job_id}/duplicates
Find duplicate titles, descriptions, and content.
/api/crawl/jobs/{job_id}/seo-issues
Get SEO issues: missing titles, short descriptions, etc.
/api/crawl/jobs/{job_id}/performance
Get page performance metrics (load times, sizes).
/api/crawl/jobs/{job_id}/resources
Get discovered resources (images, CSS, JS).
/api/crawl/jobs/{job_id}/structured-data
Get Schema.org structured data found on pages.
/api/compare
Compare two crawl jobs.
Query params: baseline_job_id, compare_job_id
/api/crawl/jobs/{job_id}/hreflang
Get hreflang tag analysis
/api/crawl/jobs/{job_id}/sitemaps
Get XML sitemaps found during crawl
/api/crawl/jobs/{job_id}/sitemap-coverage
Get sitemap URL coverage analysis
/api/crawl/jobs/{job_id}/pagination
Get pagination pattern analysis
/api/crawl/jobs/{job_id}/overview/stats
Get comprehensive statistics
/api/crawl/jobs/{job_id}/content/thin
Get thin content pages
/api/crawl/jobs/{job_id}/content/headings
Get heading structure analysis
/api/crawl/jobs/{job_id}/links/internal
Get internal link analysis
/api/crawl/jobs/{job_id}/links/external
Get external link analysis
/api/crawl/jobs/{job_id}/links/orphan
Get orphan pages (no inlinks)
/api/crawl/jobs/{job_id}/technical/indexability
Get indexability report
/api/crawl/jobs/{job_id}/technical/robots
Get robots.txt analysis
/api/crawl/jobs/{job_id}/technical/canonicals
Get canonical tag analysis
/api/crawl/jobs/{job_id}/technical/performance
Get performance metrics
/api/crawl/jobs/{job_id}/onpage/titles
Get title tag analysis
/api/crawl/jobs/{job_id}/onpage/descriptions
Get meta description analysis
/api/crawl/jobs/{job_id}/onpage/images
Get image alt text analysis
/api/crawl/jobs/{job_id}/structured-data/summary
Get structured data overview
/api/crawl/jobs/{job_id}/structured-data/errors
Get structured data validation errors
/api/crawl/jobs/{job_id}/international/hreflang
Get international hreflang report
/api/crawl/jobs/{job_id}/issues/all
Get all SEO issues with filtering
/api/crawl/jobs/{job_id}/export/csv
Export as CSV.
Query param: scope (all, internal, external, errors)
/api/crawl/jobs/{job_id}/export/json
Export as JSON.
Query param: scope (all, internal, external, errors)
/api/crawl/jobs/{job_id}/export/sitemap
Export as XML Sitemap.
Content audit workflow with page-level decisions and task assignments.
/api/crawl/jobs/{job_id}/audit/stats
Get audit statistics
/api/crawl/jobs/{job_id}/audit/pages
Get auditable pages
/api/crawl/jobs/{job_id}/audit/page/{page_id}
Get audit details for page
/api/crawl/jobs/{job_id}/audit/decision
Submit audit decision
/api/crawl/jobs/{job_id}/audit/decisions
Get all audit decisions
/api/crawl/jobs/{job_id}/audit/task/{page_id}
Get audit task for page
/api/crawl/jobs/{job_id}/audit/task/{page_id}
Update audit task
/api/crawl/jobs/{job_id}/audit/export
Export audit results
/api/node-task/{node_id}
Get task for sitemap node
/api/node-task/{node_id}
Update sitemap node task
/api/node-task/bulk
Bulk update node tasks
/api/crawl/jobs/{job_id}/suggestions
Get improvement suggestions
/api/crawl/jobs/{job_id}/suggestions/generate
Generate new suggestions
/api/crawl/jobs/{job_id}/suggestions/{suggestion_id}/apply
Apply suggestion
/api/crawl/jobs/{job_id}/suggestions/{suggestion_id}/dismiss
Dismiss suggestion
/api/shared
Create shared crawl link
/api/shared/{share_token}
Access shared crawl data
Manage visual sitemaps for planning and organizing website structure.
/api/sitemaps
List all visual sitemaps in a workspace.
/api/sitemaps
Create a new visual sitemap. Body: { name, workspace_id }
/api/sitemaps/{sitemap_id}
Get sitemap details including all nodes and hierarchy.
/api/sitemaps/{sitemap_id}
Update sitemap properties.
/api/sitemaps/{sitemap_id}/nodes
Create a new node. Body: { title, url, parent_node_id, node_type, status }
/api/sitemaps/{sitemap_id}/nodes/{node_id}
Update node properties. Body fields: title, url, path, status, action (keep, update, review, remove, redirect, merge), redirect_to (destination URL when action is redirect), page_type, assigned_to, category_id, tags, menu_ids, meta_description, target_keywords, notes
/api/sitemaps/{sitemap_id}/nodes/{node_id}
Delete a node (children become orphans).
/api/sitemaps/{sitemap_id}/nodes/bulk-position
Update positions of multiple nodes. Body: { positions: [{ id, x, y }] }
/api/sitemaps/{sitemap_id}/export
Export sitemap as JSON, CSV, standalone HTML, or redirect map.
Query params: format (json, csv, html, redirects, redirects_json), theme (light, dark — HTML only), color (blue, purple, green, red, orange, teal, indigo, pink, cyan, amber, lime, slate — HTML only), redirect_code (301 or 302 — redirects formats only, default: 301)
Redirect Map CSV (format=redirects): Downloads a CSV with columns: Source URL, Destination URL, Status Code, Redirect Type, Chain Length, Notes. Aggregates crawl-detected 3xx chains, content audit redirect decisions, sitemap URL changes, and sitemap nodes marked as redirect.
Redirect Map JSON (format=redirects_json): Returns { redirects: [...], summary: { total, crawl_detected, user_decision, url_changed, node_redirect } }
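Since `theme` and `color` apply only to HTML exports and `redirect_code` only to the redirect formats, a small builder can keep the query string consistent. This is a sketch: `sitemapExportPath` is a hypothetical helper, and the sitemap ID is treated as an opaque string.

```typescript
type ExportFormat = "json" | "csv" | "html" | "redirects" | "redirects_json";

interface SitemapExportParams {
  format: ExportFormat;
  theme?: "light" | "dark"; // HTML only
  color?: string; // HTML only
  redirect_code?: 301 | 302; // redirects formats only, default 301
}

// Build the export path, dropping parameters that do not apply to the chosen format.
function sitemapExportPath(sitemapId: string, params: SitemapExportParams): string {
  const parts = ["format=" + params.format];
  if (params.format === "html") {
    if (params.theme !== undefined) parts.push("theme=" + params.theme);
    if (params.color !== undefined) parts.push("color=" + encodeURIComponent(params.color));
  }
  if (params.format === "redirects" || params.format === "redirects_json") {
    parts.push("redirect_code=" + (params.redirect_code ?? 301));
  }
  return "/api/sitemaps/" + encodeURIComponent(sitemapId) + "/export?" + parts.join("&");
}
```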
/api/sitemaps/{sitemap_id}
Delete sitemap
/api/sitemaps/{sitemap_id}/nodes/{node_id}/duplicate
Duplicate sitemap node
/api/sitemaps/import
Import sitemap from URL
/api/sitemaps/{sitemap_id}/import
Import nodes into sitemap
/api/sitemaps/{sitemap_id}/changes
List detected changes
/api/sitemaps/{sitemap_id}/detect-changes
Detect changes since last crawl
/api/sitemaps/{sitemap_id}/changes/{change_id}/acknowledge
Acknowledge a change
/api/sitemaps/{sitemap_id}/changes/bulk-acknowledge
Bulk acknowledge changes
/api/sitemaps/{sitemap_id}/users
Get available users for assignment
/api/sitemaps/{sitemap_id}/nodes/{node_id}/found-on
Get pages where node URL was found
/api/sitemaps/{sitemap_id}/nodes/{node_id}/content
Get node content
/api/sitemaps/{sitemap_id}/nodes/{node_id}/seo-data
Get node SEO data
/api/sitemaps/{sitemap_id}/nodes/{node_id}/comments
Get node comments
/api/sitemaps/{sitemap_id}/nodes/{node_id}/comments
Add comment to node
/api/sitemaps/{sitemap_id}/nodes/{node_id}/comments/{comment_id}
Update comment
/api/sitemaps/{sitemap_id}/nodes/{node_id}/comments/{comment_id}
Delete comment
/api/sitemaps/{sitemap_id}/activity
Get sitemap activity log
/api/sitemaps/{sitemap_id}/shares
List sitemap shares
/api/sitemaps/{sitemap_id}/shares
Create sitemap share link
/api/sitemaps/{sitemap_id}/shares/{share_id}
Update share settings
/api/sitemaps/{sitemap_id}/shares/{share_id}
Delete share
/api/sitemaps/shared/{share_token}
Access shared sitemap
/api/sitemaps/{sitemap_id}/nodes/{node_id}/capture-screenshot
Capture screenshot for node
/api/sitemaps/{sitemap_id}/taxonomy
Get sitemap taxonomy
/api/sitemaps/{sitemap_id}/menus
Get sitemap navigation menus
/api/sitemaps/{sitemap_id}/taxonomy/categories
Create category
/api/sitemaps/{sitemap_id}/taxonomy/categories/{term_id}
Update category
/api/sitemaps/{sitemap_id}/taxonomy/categories/{term_id}
Delete category
/api/sitemaps/{sitemap_id}/taxonomy/tags
Create tag
/api/sitemaps/{sitemap_id}/taxonomy/tags/{tag_id}
Update tag
/api/sitemaps/{sitemap_id}/taxonomy/tags/{tag_id}
Delete tag
In-browser content editing with version history and AI assistance.
/api/sitemaps/{sitemap_id}/content-editor/pages
Get pages for content editing
/api/sitemaps/{sitemap_id}/content-editor/{node_id}
Get content for editing
/api/sitemaps/{sitemap_id}/content-editor/{node_id}/save
Save content version
/api/sitemaps/{sitemap_id}/content-editor/{node_id}/versions
Get content version history
/api/sitemaps/{sitemap_id}/content-editor/{node_id}/restore/{version_id}
Restore content version
/api/sitemaps/{sitemap_id}/content-editor/bulk-update
Bulk update content
/api/sitemaps/{sitemap_id}/content-brief
Get content brief
/api/sitemaps/{sitemap_id}/content-brief
Update content brief
/api/sitemaps/{sitemap_id}/content-brief/page/{node_id}
Get page brief override
/api/sitemaps/{sitemap_id}/content-brief/page/{node_id}
Update page brief
/api/sitemaps/{sitemap_id}/content-brief/page/{node_id}
Delete page brief
Conversational AI for sitemap organization with human-in-the-loop approval.
Note: Requires AI provider configured in Admin → AI Usage & Costs. The main chat endpoint uses Server-Sent Events (SSE) for streaming responses.
/api/sitemaps/{sitemap_id}/ai-assistant
Main chat endpoint (SSE stream). Body: { message, conversation_id? }
Events: thinking, text, plan, done, error
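For clients that read the stream by hand, the sketch below parses standard Server-Sent Events framing (`event:` and `data:` lines, with a blank line ending each event). `parseSse` is a hypothetical helper; it handles LF and CRLF line endings only, and it leaves `data` as a string because the payload format of each IATO event is not documented here.

```typescript
interface SseEvent {
  event: string; // e.g. thinking, text, plan, done, error
  data: string;
}

// Parse complete SSE events out of a text buffer. Returns the parsed events plus
// the unparsed remainder, which the caller should prepend to the next chunk.
function parseSse(buffer: string): { events: SseEvent[]; rest: string } {
  const frames = buffer.replace(/\r\n/g, "\n").split("\n\n");
  const rest = frames.pop() ?? ""; // last piece may be an incomplete event
  const events: SseEvent[] = [];
  for (const frame of frames) {
    let event = "message"; // SSE default when no event field is present
    const data: string[] = [];
    for (const line of frame.split("\n")) {
      if (line === "" || line.charAt(0) === ":") continue; // blank or comment line
      const colon = line.indexOf(":");
      const field = colon === -1 ? line : line.slice(0, colon);
      let value = colon === -1 ? "" : line.slice(colon + 1);
      if (value.charAt(0) === " ") value = value.slice(1); // strip one leading space
      if (field === "event") event = value;
      else if (field === "data") data.push(value);
    }
    const joined = data.join("\n"); // multiple data lines are joined with newlines
    if (joined !== "") events.push({ event, data: joined });
  }
  return { events, rest };
}
```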
/api/sitemaps/{sitemap_id}/ai-assistant/context
Get the context data sent to AI (useful for debugging).
/api/sitemaps/{sitemap_id}/ai-assistant/conversations
List all conversations for this sitemap.
/api/sitemaps/{sitemap_id}/ai-assistant/conversations/{id}
Get a specific conversation with full message history.
/api/sitemaps/{sitemap_id}/ai-assistant/conversations/{id}
Delete a conversation.
/api/sitemaps/{sitemap_id}/ai-assistant/plans
List all plans for this sitemap.
/api/sitemaps/{sitemap_id}/ai-assistant/plans/{plan_id}
Get plan details with operations list.
/api/sitemaps/{sitemap_id}/ai-assistant/plans/{plan_id}/execute
Execute a pending plan. Body: { selected_operations?: [0, 1, 2] }
Omit selected_operations to execute all, or pass indices to cherry-pick.
/api/sitemaps/{sitemap_id}/ai-assistant/plans/{plan_id}/reject
Reject a pending plan.
/api/sitemaps/{sitemap_id}/ai-assistant/plans/{plan_id}/undo
Undo an executed plan (if operations are reversible).
/api/sitemaps/{sitemap_id}/ai-assistant/plans/{plan_id}
Modify a pending plan (add/remove/update operations).
/api/sitemaps/{sitemap_id}/ai-assistant/quick-actions/analyze
Get AI analysis of sitemap structure and suggestions.
/api/sitemaps/{sitemap_id}/ai-assistant/quick-actions/suggest-structure
Get AI suggestions for improving site structure.
/api/sitemaps/{sitemap_id}/ai-assistant/plans/scheduled
Get scheduled plans
/api/sitemaps/{sitemap_id}/ai-assistant/plans/{plan_id}/schedule
Schedule plan execution
/api/sitemaps/{sitemap_id}/ai-assistant/plans/{plan_id}/schedule
Cancel scheduled plan
/api/sitemaps/{sitemap_id}/ai-assistant/notifications
Get AI notifications
/api/sitemaps/{sitemap_id}/ai-assistant/notifications/{notification_id}/read
Mark notification as read
/api/sitemaps/{sitemap_id}/ai-assistant/notifications/mark-all-read
Mark all notifications as read
/api/sitemaps/{sitemap_id}/ai-assistant/activity
Get AI activity log
/api/sitemaps/{sitemap_id}/ai-assistant/seo-audit
Run AI-powered SEO audit
/api/sitemaps/{sitemap_id}/ai-assistant/seo-audit/history
Get SEO audit history
/api/sitemaps/{sitemap_id}/ai-assistant/templates
List content templates
/api/sitemaps/{sitemap_id}/ai-assistant/templates/{template_id}
Get template details
/api/sitemaps/{sitemap_id}/ai-assistant/templates
Create content template
/api/sitemaps/{sitemap_id}/ai-assistant/templates/{template_id}/apply
Apply template to nodes
/api/sitemaps/{sitemap_id}/ai-assistant/analytics/config
Get analytics integration config
/api/sitemaps/{sitemap_id}/ai-assistant/analytics/low-performing
Get low-performing pages
/api/sitemaps/{sitemap_id}/ai-assistant/analytics/pages
Get page analytics data
/api/sitemaps/{sitemap_id}/ai-assistant/suggestions
Get AI improvement suggestions
/api/sitemaps/{sitemap_id}/ai-assistant/suggestions/{suggestion_id}/dismiss
Dismiss suggestion
/api/sitemaps/{sitemap_id}/ai-assistant/suggestions/{suggestion_id}/accept
Accept and apply suggestion
/api/crawl/jobs/{job_id}/inventory
Get content inventory for a crawl job.
Query params: page, per_page, content_type, status_code, search
/api/crawl/jobs/{job_id}/inventory/stats
Get inventory statistics (counts by type, status).
/api/crawl/jobs/{job_id}/inventory/sync
Sync inventory from crawl data (rebuilds inventory table).
Build and manage content taxonomies with AI assistance.
/api/crawl/jobs/{job_id}/taxonomy/terms
Get all taxonomy terms. Query: status, search, page, limit.
/api/crawl/jobs/{job_id}/taxonomy/terms
Create a new term. Body: preferred_label, definition, status.
/api/crawl/jobs/{job_id}/taxonomy/validate
Validate taxonomy and return issues (missing definitions, duplicates, etc.).
/api/crawl/jobs/{job_id}/taxonomy/ai/extract-terms
Extract terms from crawled content (titles, headings, navigation, URLs).
/api/crawl/jobs/{job_id}/taxonomy/ai/import-extracted
Import selected extracted terms into taxonomy.
/api/crawl/jobs/{job_id}/taxonomy/ai/hierarchy-tree
Get taxonomy as a nested tree structure.
/api/crawl/jobs/{job_id}/taxonomy/ai/suggest-hierarchy
Get AI-suggested parent-child relationships based on term similarity.
/api/crawl/jobs/{job_id}/taxonomy/ai/set-parent
Set or remove parent-child relationship. Body: child_term_id, parent_term_id.
/api/crawl/jobs/{job_id}/taxonomy/ai/auto-classify
Auto-classify pages by matching content against terms. Body: max_pages, min_confidence.
/api/crawl/jobs/{job_id}/taxonomy/ai/pending-classifications
Get classifications pending human review.
/api/crawl/jobs/{job_id}/taxonomy/ai/review-classification
Approve, reject, or delete a classification. Body: id, action (approve/reject/delete).
/api/crawl/jobs/{job_id}/taxonomy/ai/bulk-review
Bulk approve or reject classifications. Body: ids[], action (approve_all/reject_all).
/api/crawl/jobs/{job_id}/taxonomy/ai/export/json
Export taxonomy as JSON. Query: include_drafts (true/false).
/api/crawl/jobs/{job_id}/taxonomy/ai/export/csv
Export taxonomy as CSV (spreadsheet format).
/api/crawl/jobs/{job_id}/taxonomy/ai/export/skos
Export taxonomy as SKOS RDF/XML (W3C standard).
/api/crawl/jobs/{job_id}/taxonomy/stats
Get taxonomy statistics
/api/crawl/jobs/{job_id}/taxonomy/tree
Get full taxonomy tree
/api/crawl/jobs/{job_id}/taxonomy/term/{term_id}
Update taxonomy term
/api/crawl/jobs/{job_id}/taxonomy/term/{term_id}
Delete taxonomy term
/api/crawl/jobs/{job_id}/taxonomy/move
Move term to new parent
/api/crawl/jobs/{job_id}/taxonomy/export
Export taxonomy (JSON/CSV)
/api/crawl/jobs/{job_id}/taxonomy/import
Import taxonomy from file
/api/crawl/jobs/{job_id}/taxonomy/import-skos
Import SKOS taxonomy
/api/crawl/jobs/{job_id}/taxonomy/changelog
Get taxonomy changelog
/api/crawl/jobs/{job_id}/taxonomy/versions
List taxonomy versions
/api/crawl/jobs/{job_id}/taxonomy/versions
Create taxonomy version snapshot
/api/crawl/jobs/{job_id}/taxonomy/versions/{version_id}
Delete taxonomy version
/api/crawl/jobs/{job_id}/taxonomy/term/{term_id}/history
Get term change history
/api/crawl/jobs/{job_id}/taxonomy/changelog/export
Export changelog
/api/crawl/jobs/{job_id}/taxonomy/governance/stats
Get governance statistics
/api/crawl/jobs/{job_id}/taxonomy/governance/bulk-status
Bulk change term status
/api/crawl/jobs/{job_id}/taxonomy/term/{term_id}/status
Change individual term status
/api/crawl/jobs/{job_id}/taxonomy/tags
List taxonomy tags
/api/crawl/jobs/{job_id}/taxonomy/tag
Create taxonomy tag
/api/crawl/jobs/{job_id}/taxonomy/tag/{tag_id}
Update taxonomy tag
/api/crawl/jobs/{job_id}/taxonomy/tag/{tag_id}
Delete taxonomy tag
/api/crawl/jobs/{job_id}/taxonomy/term/{term_id}/tags
Assign tags to term
/api/crawl/jobs/{job_id}/taxonomy/term/{term_id}/tags
Get tags for term
/api/crawl/jobs/{job_id}/taxonomy/term/{term_id}/relationships
Get term relationships
/api/crawl/jobs/{job_id}/taxonomy/term/{term_id}/synonym
Add term synonym
/api/crawl/jobs/{job_id}/taxonomy/term/{term_id}/related
Add related term
/api/crawl/jobs/{job_id}/taxonomy/term/{term_id}/use-for
Add use-for relationship
/api/crawl/jobs/{job_id}/taxonomy/relationship/{relationship_id}
Delete relationship
/api/crawl/jobs/{job_id}/taxonomy/thesaurus
Get taxonomy thesaurus view
/api/crawl/jobs/{job_id}/taxonomy/ai/detect-industry
AI: Detect site industry
/api/crawl/jobs/{job_id}/taxonomy/ai/bartoc/search
AI: Search BARTOC vocabularies
/api/crawl/jobs/{job_id}/taxonomy/ai/bartoc/preview
AI: Preview BARTOC vocabulary
/api/crawl/jobs/{job_id}/taxonomy/ai/bartoc/import
AI: Import BARTOC vocabulary
/api/crawl/jobs/{job_id}/taxonomy/ai/starter-taxonomies
AI: List starter taxonomies
/api/crawl/jobs/{job_id}/taxonomy/ai/import-starter
AI: Import starter taxonomy
/api/crawl/jobs/{job_id}/taxonomy/ai/standard-vocabularies
AI: List standard vocabularies
/api/crawl/jobs/{job_id}/taxonomy/ai/vocab-types
AI: Get vocabulary types
/api/crawl/jobs/{job_id}/taxonomy/ai/import-standard
AI: Import standard vocabulary
/api/crawl/jobs/{job_id}/taxonomy/ai/import-skos
AI: Import SKOS file
/api/crawl/jobs/{job_id}/taxonomy/ai/classification-stats
AI: Get classification statistics
/api/crawl/jobs/{job_id}/taxonomy/ai/page-classifications
AI: Get page classifications
/api/crawl/jobs/{job_id}/taxonomy/ai/export/stats
AI: Get export statistics
/api/workspaces
List all workspaces for the current user.
/api/workspaces
Create a new workspace.
{
"name": "My Workspace",
"description": "Optional description",
"visibility": "private"
}
/api/workspaces/{workspace_id}
Get workspace details.
/api/workspaces/{workspace_id}
Update workspace name, description, or visibility.
/api/workspaces/{workspace_id}
Delete a workspace (owner only).
/api/workspaces/{workspace_id}/members
List workspace members and their roles.
/api/workspaces/{workspace_id}/invite
Invite a user to the workspace.
{
"email": "[email protected]",
"role": "member"
}
/api/schedules
List all scheduled jobs.
/api/schedules
Create a new schedule.
{
"name": "Daily Crawl",
"url": "https://example.com",
"cron_expression": "0 2 * * *",
"timezone": "UTC",
"config": {"max_pages": 100}
}
/api/schedules/{schedule_id}
Update schedule (pause/resume with is_active param).
/api/schedules/{schedule_id}/run
Trigger schedule to run immediately.
/api/schedules/{schedule_id}
Delete a schedule.
/api/schedules/{schedule_id}
Get schedule details
/api/schedules/{schedule_id}/toggle
Toggle schedule active/inactive
/api/schedules/{schedule_id}/history
Get schedule run history
/api/schedules/cleanup-orphans
Clean up orphaned schedule jobs
/api/reports
List generated reports.
/api/reports/generate
Generate a new report.
{
"job_id": 1,
"report_type": "summary", // summary, detailed, audit
"output_format": "html" // html, pdf, csv, json, xlsx
}
/api/reports/{report_id}/download
Download a generated report as HTML or PDF file.
Returns: Streaming file download with Content-Disposition header
/api/auth/register
Register a new user and organization.
{
"email": "[email protected]",
"password": "securepassword",
"org_name": "My Company",
"name": "John Doe"
}
/api/auth/login
Login and receive JWT token.
{
"email": "[email protected]",
"password": "securepassword"
}
{
"access_token": "eyJhbG...",
"user": {"id": 1, "email": "...", "name": "..."}
}
/api/auth/me
Get current user info. Requires Authorization header with JWT token.
Header: Authorization: Bearer <token>
/api/user/preferences
Get current user's theme preferences.
{
"theme_color": "blue",
"text_size": 100,
"theme_mode": "light"
}
/api/user/preferences
Save user's theme preferences.
{
"theme_color": "purple", // blue, purple, green, red, orange, teal, indigo, pink, cyan, amber, lime, slate
"text_size": 110, // 80-120
"theme_mode": "dark" // light, dark, system
}
/api/credentials
List saved credentials.
/api/credentials
{
"name": "Staging Server",
"domain_pattern": "staging.example.com",
"username": "admin",
"password": "secret"
}
/api/credentials/{cred_id}
Get a specific credential by ID.
/api/credentials/{cred_id}
Delete a saved credential.
/api/form-auth
List form authentication configs.
/api/form-auth
{
"name": "Admin Login",
"login_url": "https://example.com/login",
"username_field": "email",
"password_field": "password",
"username": "[email protected]",
"password": "secret"
}
/api/form-auth/{config_id}
Delete a form authentication configuration.
Create CSS, XPath, or regex rules to extract structured data from crawled pages. Attach rules to crawls via extraction_rule_ids in the crawl config.
/api/extraction-rules
List all extraction rules. Returns { rules: [...], count }.
/api/extraction-rules/{rule_id}
Get a single extraction rule by ID.
/api/extraction-rules
Create an extraction rule.
{
"name": "Product Prices",
"rule_type": "css", // css, xpath, regex
"selector": ".product-price",
"target": "text", // text, html, attribute, count
"target_attribute": null, // required when target = "attribute"
"description": "Extract prices from product pages",
"required": false, // fail page if extraction fails?
"default_value": null, // fallback if extraction fails
"match_all": false, // all matches or first only
"regex_group": 0 // capture group for regex rules
}
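The comments in the sample imply two constraints: `target_attribute` is required when `target` is "attribute", and `regex_group` is a capture-group index. A client-side check might look like this sketch. `checkRule` is a hypothetical helper, and the empty-selector and non-negative-integer checks are assumptions rather than documented server rules.

```typescript
// Subset of the documented rule fields.
interface ExtractionRule {
  name: string;
  rule_type: "css" | "xpath" | "regex";
  selector: string;
  target: "text" | "html" | "attribute" | "count";
  target_attribute?: string | null;
  regex_group?: number;
}

// Returns a list of problems (empty when the rule looks consistent).
function checkRule(rule: ExtractionRule): string[] {
  const problems: string[] = [];
  if (rule.selector.trim() === "") problems.push("selector is empty");
  if (rule.target === "attribute" && !rule.target_attribute) {
    problems.push('target_attribute is required when target is "attribute"');
  }
  if (rule.regex_group !== undefined) {
    const group = rule.regex_group;
    // Assumption: capture-group indexes are non-negative integers.
    if (group < 0 || Math.floor(group) !== group) {
      problems.push("regex_group must be a non-negative integer");
    }
  }
  return problems;
}
```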
/api/extraction-rules/{rule_id}
Update an extraction rule. Send only the fields you want to change.
/api/extraction-rules/{rule_id}
Delete an extraction rule.
/api/extraction-rules/test
Test an extraction rule against a live URL without saving it.
{
"url": "https://example.com/product/123",
"rule": {
"rule_type": "css",
"selector": ".product-price",
"target": "text"
}
}
/api/crawl/jobs/{job_id}/extracted-data
Retrieve data extracted by custom rules during a crawl. Returns results grouped by field name with success/failure counts.
Query Parameters:
field_name - Filter by specific field (optional)
limit - Max results, default 100, max 500
offset - Pagination offset, default 0
// Response
{
"data": [
{
"page_url": "https://example.com/product/1",
"rule_name": "Product Prices",
"field_name": "price",
"extracted_value": "$29.99",
"extraction_success": true
}
],
"count": 42,
"by_field": {
"price": { "count": 42, "success": 40 },
"title": { "count": 42, "success": 42 }
}
}
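The `by_field` summary makes it easy to spot rules that frequently fail. The sketch below computes a success rate per field; `successRates` is a hypothetical helper, and the interface covers only the summary fields from the sample response.

```typescript
// Summary portion of the extracted-data response.
interface ExtractedDataSummary {
  count: number;
  by_field: Record<string, { count: number; success: number }>;
}

// Success rate per field, as a fraction between 0 and 1.
function successRates(response: ExtractedDataSummary): Record<string, number> {
  const rates: Record<string, number> = {};
  for (const field of Object.keys(response.by_field)) {
    const stats = response.by_field[field];
    rates[field] = stats.count === 0 ? 0 : stats.success / stats.count;
  }
  return rates;
}
```

With the sample response, the rate for `price` is 40/42 and the rate for `title` is 1.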
/api/org/users
List users in organization.
/api/org/users/invite
{
"email": "[email protected]",
"role": "member" // owner, admin, member, viewer
}
/api/org/invitations
List pending invitations.
/api/org/users/{user_id}
Remove user from organization.
/api/org/users/{user_id}/roles
Update user roles.
/api/org/invitations/{invite_id}
Revoke invitation.
/api/org/invitations/{invite_id}/resend
Resend invitation email.
/api/org/leave
Leave organization.
/api/org/seats
Get seat usage and limits.
/api/org/seats/add
Add paid seats.
/api/org/seats/remove
Remove paid seats.
/api/invitations/{token}/info
Get invitation details.
/api/invitations/pending
Get pending invitations for user.
/api/invitations/{token}/accept
Accept invitation.
/api/invitations/{token}/decline
Decline invitation.
All responses use a consistent format:
{
"success": true | false,
"data": { ... } | null,
"error": null | {
"code": "ERROR_CODE",
"message": "Human-readable message",
"details": { ... },
"suggestions": ["Try this", "Or this"],
"retryable": true | false,
"retry_strategy": {
"retry_after_seconds": 30,
"max_retries": 3,
"backoff_multiplier": 2.0,
"retry_schedule": [30, 60, 120]
}
},
"meta": {
"request_id": "uuid",
"timestamp": "2026-01-14T00:00:00Z",
"api_version": "10.1.0"
},
"_actions": {
"next_action": {"method": "POST", "href": "/api/..."}
}
}
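A typed wrapper around this envelope lets callers branch on `success` once instead of at every call site. The following is a sketch: `Envelope`, `Result`, and `toResult` are hypothetical names, only some of the envelope fields are modeled, and the "UNKNOWN" code is a local placeholder for a failed response that carries no `error` object.

```typescript
interface ApiError {
  code: string;
  message: string;
  retryable: boolean;
  suggestions?: string[];
}

// Subset of the documented envelope.
interface Envelope<T> {
  success: boolean;
  data: T | null;
  error: ApiError | null;
  meta?: { request_id: string; timestamp: string; api_version: string };
}

type Result<T> = { ok: true; value: T } | { ok: false; error: ApiError };

// Convert an envelope into a discriminated union the compiler can narrow.
function toResult<T>(envelope: Envelope<T>): Result<T | null> {
  if (envelope.success) return { ok: true, value: envelope.data };
  const fallback: ApiError = {
    code: "UNKNOWN", // placeholder, not a documented error code
    message: "response reported failure without an error object",
    retryable: false,
  };
  return { ok: false, error: envelope.error ?? fallback };
}
```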
| HTTP Status | Error Code | Description |
|---|---|---|
| 200 | - | Success |
| 201 | - | Created |
| 400 | BAD_REQUEST | Invalid parameters |
| 401 | UNAUTHORIZED | Missing or invalid token |
| 403 | FORBIDDEN | Insufficient permissions |
| 404 | NOT_FOUND | Resource doesn't exist |
| 422 | VALIDATION_ERROR | Request validation failed |
| 429 | RATE_LIMITED | Too many requests (retryable) |
| 500 | INTERNAL_ERROR | Unexpected server error |
| Header | Description |
|---|---|
| X-RateLimit-Limit | Maximum requests per minute (60) |
| X-RateLimit-Remaining | Requests remaining in window |
| X-RateLimit-Reset | Unix timestamp when window resets |
| Retry-After | Seconds to wait (on 429 response) |
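A minimal sketch of turning these headers into a wait time follows. It assumes the headers have already been collected into a plain object with lower-cased names and that X-RateLimit-Reset is expressed in Unix seconds. The function name `rateLimitWaitMs` is hypothetical.

```typescript
type HeaderMap = Record<string, string | undefined>;

// Returns how long to pause (in milliseconds) before the next request.
function rateLimitWaitMs(status: number, headers: HeaderMap, nowMs: number): number {
  // On a 429, Retry-After (seconds) is the most direct signal.
  if (status === 429) {
    const retryAfter = Number(headers["retry-after"]);
    if (Number.isFinite(retryAfter) && retryAfter >= 0) return retryAfter * 1000;
  }
  // Otherwise, pause only when the window is exhausted, until it resets.
  const remaining = Number(headers["x-ratelimit-remaining"]);
  const reset = Number(headers["x-ratelimit-reset"]);
  if (Number.isFinite(remaining) && remaining <= 0 && Number.isFinite(reset)) {
    return Math.max(0, reset * 1000 - nowMs);
  }
  return 0;
}
```

Passing the current time as `nowMs` (for example `Date.now()`) keeps the function deterministic and easy to test.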
When error.retryable is true, use the provided retry strategy:
"retry_strategy": {
"retry_after_seconds": 30, // Initial wait
"max_retries": 3, // Don't retry more than this
"backoff_multiplier": 2.0, // Double wait each retry
"retry_schedule": [30, 60, 120] // Pre-calculated waits
}
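The strategy above can be applied with a small helper like the following sketch. `retryDelaySeconds` is a hypothetical name. Falling back to retry_after_seconds multiplied by backoff_multiplier when retry_schedule is shorter than max_retries is an assumption about how the fields relate, consistent with the sample values (30, 60, 120).

```typescript
interface RetryStrategy {
  retry_after_seconds: number;
  max_retries: number;
  backoff_multiplier: number;
  retry_schedule: number[];
}

// Returns the wait in seconds before retry number `attempt` (1-based),
// or null once the retry budget is exhausted.
function retryDelaySeconds(strategy: RetryStrategy, attempt: number): number | null {
  if (attempt < 1 || attempt > strategy.max_retries) return null;
  // Prefer the pre-calculated schedule when it covers this attempt.
  const scheduled = strategy.retry_schedule[attempt - 1];
  if (scheduled !== undefined) return scheduled;
  // Assumed fallback: exponential backoff from the initial wait.
  return strategy.retry_after_seconds * Math.pow(strategy.backoff_multiplier, attempt - 1);
}
```

A caller would loop from attempt 1 upward, sleeping for the returned number of seconds and stopping when the helper returns null or the request succeeds.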
Advanced authentication including MFA, OAuth, and account management.
/api/auth/register
Register new user account.
/api/auth/logout
Logout and invalidate session.
/api/auth/verify-email
Verify email address via token.
/api/auth/forgot-password
Request password reset email.
/api/auth/reset-password
Reset password with token.
/api/auth/oauth/{provider}
Redirect to OAuth provider (google, github).
/api/auth/oauth/google/callback
Google OAuth callback.
/api/auth/oauth/github/callback
GitHub OAuth callback.
/api/auth/profile
Update user profile.
/api/auth/password
Change current password.
/api/auth/api-key
Create new API key.
/api/auth/notifications
Get notification preferences.
/api/auth/notifications
Update notification preferences.
/api/auth/mfa/status
Get MFA enrollment status.
/api/auth/mfa/setup
Start MFA setup (generates QR code).
/api/auth/mfa/verify-setup
Complete MFA setup with TOTP code.
/api/auth/mfa
Disable MFA.
/api/auth/mfa/verify
Verify MFA code during login.
/api/auth/mfa/email-code
Send MFA verification code via email.
/api/auth/mfa/backup-codes/regenerate
Regenerate backup codes.
/api/auth/mfa/org-enforcement
Set organization MFA enforcement policy.
/api/auth/mfa/trusted-devices
List trusted devices.
/api/auth/mfa/trusted-devices/{device_id}
Remove trusted device.
/api/auth/account
Delete user account.
Single Sign-On configuration for team and enterprise plans.
/api/sso/providers
List available SSO providers.
/api/sso/connections
List SSO connections for organization.
/api/sso/connections
Create new SSO connection.
/api/sso/connections/{connection_id}
Get SSO connection details.
/api/sso/connections/{connection_id}
Update SSO connection.
/api/sso/connections/{connection_id}
Delete SSO connection.
/api/sso/connections/{connection_id}/test
Test SSO configuration.
/api/sso/connections/{connection_id}/metadata
Get SAML service provider metadata.
Automated content governance policies, change queue management, and activity logging.
/api/autopilot/{workspace_id}/policy
Get governance policy for workspace.
/api/autopilot/{workspace_id}/policy
Create or update governance policy.
/api/autopilot/{workspace_id}/queue
List queued governance changes.
/api/autopilot/queue/{change_id}/approve
Approve a queued change.
/api/autopilot/queue/{change_id}/reject
Reject a queued change.
/api/autopilot/queue/{change_id}/mark-fixed
Mark change as manually fixed.
/api/autopilot/queue/{change_id}/rollback
Rollback an applied change.
/api/autopilot/batch/{batch_id}/approve
Bulk approve entire batch.
/api/autopilot/batch/{batch_id}/reject
Bulk reject entire batch.
/api/autopilot/batch/{batch_id}/mark-fixed
Bulk mark batch as fixed.
/api/autopilot/batch/{batch_id}/rollback
Rollback entire batch.
/api/autopilot/{workspace_id}/activity
Query governance activity log.
/api/autopilot/activity/batch/{batch_id}
Get activity for specific batch.
Endpoints for making a site discoverable to AI search engines — both Google's AI Overviews / AI Mode (which lean on traditional SEO signals: crawl access, content quality, page experience) and non-Google AI engines (ChatGPT, Claude, Perplexity, et al., which crawl and rank independently). Two engine faces appear throughout the AEO surface: Google AI Search readiness and Other AI Engines readiness. Each face has its own 0–100 score, its own measurable layers, and its own remediation surface.
The endpoints below fall into four groups:
Shared infrastructure: the common healthcheck and DNS-TXT domain verification (the precondition for future write-back actions like auto-PR llms.txt and autopilot AEO fixes).
Per-tab healthchecks: gated on per-workspace feature flags (six tabs: overview, access, files, content, grounding, visibility).
Tab 1 Overview: three endpoints (/sub-scores, /priorities, /trend), shipped in M2.
Tab 2 Access: four endpoints. The matrix /{crawl_id}/matrix shipped in M3.WU 3.3; the robots-analysis /{crawl_id}/robots-analysis shipped in M3.WU 3.4; the view-as detail /page/{page_id}/view-as/{agent_name} shipped in M3.WU 3.6 paired with 3.4; the JS rendering probe POST /{crawl_id}/probe-page shipped in M3.WU 3.5.
Tabs 3–6 remain healthcheck-only stubs; their analytical endpoints ship in M3+/M4+. Through M2 the IssueType producers are stubbed: the Tab 1 Overview surface renders correctly but with sentinel values (sub-scores at 100, recommendations list empty, trend flat at 100). Real variation lands as M3+ producers populate seo_issues with aeo_* rows.
/api/aeo/common/healthz
Ungated healthcheck for shared AEO infrastructure. Does not depend on any tab feature flag.
{
"ok": true,
"feature_flag": null
}
/api/aeo/common/domains/{domain}/verify
Mint a verification token for a domain on the authenticated user's workspace. Idempotent — repeated calls for the same (workspace_id, domain) return the same token.
workspace_id (integer, required) — workspace the caller is a member of.
{
"success": true,
"data": {
"domain": "example.com",
"token": "<url-safe-base64, 64 chars>",
"instructions": "Add a TXT record at _iato-verify.example.com with the token as the value, then call GET .../verify/status."
}
}
/api/aeo/common/domains/{domain}/verify/status
Check whether the DNS TXT record at _iato-verify.<domain> matches the workspace's minted token. Updates last_checked_at; sets verified_at on first successful match.
workspace_id (integer, required) — workspace the caller is a member of.
{
"success": true,
"data": {
"domain": "example.com",
"verified": false,
"verified_at": null,
"last_checked_at": "2026-05-22T10:42:18Z"
}
}
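Before calling the status endpoint, an operator may want to confirm locally that the TXT record is in place. The sketch below assumes the records arrive in the chunked string[][] shape that Node's dns.resolveTxt produces and that the server compares the full token value exactly. Both helper names are hypothetical.

```typescript
// Builds the record name described above: _iato-verify.<domain>.
function verificationRecordName(domain: string): string {
  // Normalize case and drop a trailing root dot if present.
  const normalized = domain.trim().toLowerCase().replace(/\.$/, "");
  return `_iato-verify.${normalized}`;
}

// Each TXT record may be split into chunks; join them before comparing.
function txtRecordsContainToken(records: string[][], token: string): boolean {
  return records.some((chunks) => chunks.join("").trim() === token);
}
```

A caller could resolve `verificationRecordName("example.com")` with a DNS library of their choice and pass the result to `txtRecordsContainToken` before polling /verify/status.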
Each AEO tab exposes a single healthcheck gated on a per-workspace feature flag. Disabled tabs return 403 with {"success": false, "error": {"code": "feature_disabled", ...}}; enabled tabs return 200 with {"ok": true, "feature_flag": true}. Tab flags default to false in production.
/api/aeo/{tab}/healthz
tab is one of: overview, access, files, content, grounding, visibility.
workspace_id (integer, required) — the workspace whose feature-flag state is being probed.
{
"ok": true,
"feature_flag": true
}
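The two documented outcomes can be told apart with a small classifier such as this sketch. The `TabState` labels and the function name are hypothetical, and any response that matches neither documented shape is reported as "unknown".

```typescript
type TabState = "enabled" | "disabled" | "unknown";

function isRecord(value: unknown): value is Record<string, unknown> {
  return typeof value === "object" && value !== null;
}

// Maps an HTTP status plus parsed JSON body to a tab state.
function interpretTabHealthz(status: number, body: unknown): TabState {
  if (!isRecord(body)) return "unknown";
  // Enabled tabs: 200 with {"ok": true, "feature_flag": true}.
  if (status === 200 && body["ok"] === true && body["feature_flag"] === true) {
    return "enabled";
  }
  // Disabled tabs: 403 with error.code "feature_disabled".
  const error = body["error"];
  if (status === 403 && isRecord(error) && error["code"] === "feature_disabled") {
    return "disabled";
  }
  return "unknown";
}
```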
Three endpoints power the Tab 1 Overview surface (gauges, sub-score grid, top-5 recommendations panel, score trend chart). All three are gated on the aeo.overview tab flag and workspace ownership of the crawl. {crawl_id} in each URL is the UUID crawl_jobs.job_id (matching the dashboard's pervasive /api/crawl/jobs/{job_id}/... convention); each response echoes the resolved numeric crawl_jobs.id as data.crawl_id for diagnostic readability — UUID in, numeric out. Through M2 the issue-producers are stubbed, so sub-scores return total: 100 across the board, the recommendations list is empty, and the trend chart is a flat line at 100; real variation lands as M3+ producers populate seo_issues with aeo_* rows.
/api/aeo/overview/{crawl_id}/sub-scores
Per-face per-layer AEO sub-scores for a single crawl. Returns both engine faces (Google AI Search readiness + Other AI Engines readiness) with a clamped 0–100 total plus a six-key by_layer issue-count breakdown (Access / Discovery / Capability / Content / Tokens / Visibility). Reads live seo_issues rows for the crawl and aggregates per-face per-layer. workspace_id is derived server-side from crawl_jobs.workspace_id — the endpoint accepts no workspace_id query parameter.
Each face also carries measurable_layers — the subset of layers that have at least one IssueType mapped to this face server-side. Layers absent from this list are "not measured for this face" (rendered as gray on the sub-score grid); layers present in the list with a zero by_layer count are "awaiting analysis" until producers run (rendered as amber). Through M2 the Google face's measurable_layers is ["Content", "Tokens"] and the Other face's is all six layers, reflecting the locked engine-face partition.
{
"success": true,
"data": {
"crawl_id": 11237,
"workspace_id": 17,
"google": {
"total": 100,
"by_layer": {
"Access": 0, "Discovery": 0, "Capability": 0,
"Content": 0, "Tokens": 0, "Visibility": 0
},
"measurable_layers": ["Content", "Tokens"]
},
"other": {
"total": 100,
"by_layer": {
"Access": 0, "Discovery": 0, "Capability": 0,
"Content": 0, "Tokens": 0, "Visibility": 0
},
"measurable_layers": [
"Access", "Discovery", "Capability",
"Content", "Tokens", "Visibility"
]
}
}
}
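The gray/amber distinction described above can be derived on the client from measurable_layers and by_layer alone. The following is a sketch; the state labels and the function name are hypothetical. A zero count cannot by itself distinguish "awaiting analysis" from "analyzed, no issues", so the sketch follows the through-M2 reading given in the text.

```typescript
const AEO_LAYERS = [
  "Access", "Discovery", "Capability", "Content", "Tokens", "Visibility",
] as const;
type AeoLayer = (typeof AEO_LAYERS)[number];
type LayerState = "not_measured" | "awaiting_analysis" | "has_issues";

interface FaceScore {
  total: number;
  by_layer: Record<AeoLayer, number>;
  measurable_layers: AeoLayer[];
}

// Classifies each of the six layers for one engine face.
function layerStates(face: FaceScore): Record<AeoLayer, LayerState> {
  const states = {} as Record<AeoLayer, LayerState>;
  for (const layer of AEO_LAYERS) {
    if (face.measurable_layers.indexOf(layer) === -1) {
      states[layer] = "not_measured";      // rendered gray
    } else if (face.by_layer[layer] === 0) {
      states[layer] = "awaiting_analysis"; // rendered amber
    } else {
      states[layer] = "has_issues";
    }
  }
  return states;
}
```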
401 unauthorized — API key missing or invalid (auth middleware).
404 crawl_not_found — crawl_id (UUID) does not match any crawl_jobs row.
403 not_workspace_member — caller is not a member of the crawl's owning workspace.
403 feature_disabled — aeo.overview flag is off for the workspace.
/api/aeo/overview/{crawl_id}/priorities
Up to five prioritized AEO recommendations for the crawl, ordered by priority (high → medium → low) then effort (low → medium → high). Reads ai_suggestions filtered to suggestion_type LIKE 'aeo_%' and status = 'pending'. Through M2 with producers stubbed, the array is empty — the Top-5 panel renders the locked empty-state copy ("No recommendations yet — AEO issues will surface here as they're detected."). Real recommendations land as M3+ producers populate seo_issues with aeo_* IssueType rows and the suggestion generator emits matching ai_suggestions rows.
Each recommendation carries applies_to engine-face attribution (["google"], ["other"], or ["google", "other"]) and layer (one of the six AEO layers, or null for unknown IssueTypes). Both fields are derived server-side at read time from the static _LAYER_MAPPING partition — not persisted in the database. auto_fixable defaults false for all aeo_* rows in M2; per-type tuning is M4+ work.
{
"success": true,
"data": {
"crawl_id": 11237,
"workspace_id": 17,
"recommendations": [
{
"id": 4821,
"suggestion_type": "aeo_perplexity_blocked",
"issue_type": "aeo_perplexity_blocked",
"title": "Perplexity bot is blocked",
"description": "Improves visibility to non-Google AI engines that surface citations from Perplexity's index.",
"priority": "high",
"effort": "medium",
"applies_to": ["other"],
"layer": "Access",
"auto_fixable": false,
"affected_count": 1,
"created_at": "2026-05-26T09:47:25Z"
}
]
}
}
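The documented ordering (priority high to low, then effort low to high) can be reproduced client-side, for instance when merging recommendations from several crawls. This is a sketch only; `topRecommendations` is a hypothetical name and the server's tie-breaking beyond these two keys is not specified here.

```typescript
type Level = "high" | "medium" | "low";

interface Recommendation {
  id: number;
  priority: Level;
  effort: Level;
}

// Lower rank sorts first: high priority first, low effort first.
const PRIORITY_RANK: Record<Level, number> = { high: 0, medium: 1, low: 2 };
const EFFORT_RANK: Record<Level, number> = { low: 0, medium: 1, high: 2 };

function topRecommendations<T extends Recommendation>(items: T[], max: number = 5): T[] {
  return items
    .slice() // copy so the caller's array is not reordered
    .sort(
      (a, b) =>
        PRIORITY_RANK[a.priority] - PRIORITY_RANK[b.priority] ||
        EFFORT_RANK[a.effort] - EFFORT_RANK[b.effort],
    )
    .slice(0, max);
}
```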
401 unauthorized — API key missing or invalid (auth middleware).
404 crawl_not_found — crawl_id (UUID) does not match any crawl_jobs row.
403 not_workspace_member — caller is not a member of the crawl's owning workspace.
403 feature_disabled — aeo.overview flag is off for the workspace.
/api/aeo/overview/{crawl_id}/trend
Per-face time-series of completed crawls' total AEO scores within a backward time window in the same workspace as the URL crawl. Powers the score trend chart on the Tab 1 Overview surface. Both face arrays (data.google and data.other) are aligned by index (one entry per scored crawl in the window, repeated across both faces) and ordered ASC by x (the score's computed_at timestamp). Through M2 every scored crawl returns 100 for both faces, so the chart renders a flat line at 100; M3+ producers populate real variation naturally.
days (integer, optional, default 30, range 1–365) — backward time window for the trend in days. Values outside the 1–365 range return 400 invalid_window.
crawl_id semantics
The response carries two crawl_id fields at different tree levels with different meanings — both intentional. data.crawl_id (top level) echoes the resolved numeric crawl_jobs.id from the URL UUID and identifies the crawl the operator is currently viewing. Each point's crawl_id inside data.google[] / data.other[] is the crawl_jobs.id of the crawl that produced THAT specific score — typically a different value, since the trend includes prior crawls of the same workspace.
{
"success": true,
"data": {
"crawl_id": 11237,
"workspace_id": 17,
"window_days": 30,
"google": [
{ "x": "2026-05-08T12:34:56Z", "y": 100, "crawl_id": 11189 },
{ "x": "2026-05-16T09:15:22Z", "y": 100, "crawl_id": 11214 },
{ "x": "2026-05-26T10:02:11Z", "y": 100, "crawl_id": 11237 }
],
"other": [
{ "x": "2026-05-08T12:34:56Z", "y": 100, "crawl_id": 11189 },
{ "x": "2026-05-16T09:15:22Z", "y": 100, "crawl_id": 11214 },
{ "x": "2026-05-26T10:02:11Z", "y": 100, "crawl_id": 11237 }
]
}
}
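Because the two face arrays are index-aligned, a chart layer can merge them into one row per crawl. The sketch below does that and fails loudly if the alignment assumption is violated. The `TrendRow` shape and the `zipTrend` name are hypothetical.

```typescript
interface TrendPoint {
  x: string;        // computed_at timestamp
  y: number;        // total score for this face
  crawl_id: number; // crawl that produced this score
}

interface TrendRow {
  at: string;
  crawlId: number;
  google: number;
  other: number;
}

// Merges the index-aligned google/other arrays into one row per scored crawl.
function zipTrend(google: TrendPoint[], other: TrendPoint[]): TrendRow[] {
  if (google.length !== other.length) {
    throw new Error("face arrays are expected to have equal length");
  }
  return google.map((g, i) => {
    const o = other[i];
    if (o === undefined || o.crawl_id !== g.crawl_id) {
      throw new Error(`face arrays are misaligned at index ${i}`);
    }
    return { at: g.x, crawlId: g.crawl_id, google: g.y, other: o.y };
  });
}
```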
400 invalid_window — days is less than 1 or greater than 365. Fires before workspace-ownership lookup.
401 unauthorized — API key missing or invalid (auth middleware).
404 crawl_not_found — crawl_id (UUID) does not match any crawl_jobs row.
403 not_workspace_member — caller is not a member of the crawl's owning workspace.
403 feature_disabled — aeo.overview flag is off for the workspace.
Four endpoints power the Tab 2 Access surface:
The matrix (M3.WU 3.3) is the per-page × per-agent grid read from page_agent_access.
The robots-analysis (M3.WU 3.4) is the per-crawl per-agent robots.txt decision surface, with regression diff against the immediately-prior crawl in the same workspace and domain.
The view-as endpoint (M3.WU 3.6) is the per-(page, agent) detail read by the side-sliding panel that opens when you click a matrix cell.
The probe-page endpoint (M3.WU 3.5) is the opt-in JS rendering test — runs an HTTP-only fetch and a Playwright render side-by-side and returns a byte-diff delta with top-N CSR-only sentences, surfacing content AI crawlers can't see without JS execution.
All four are gated on the aeo.access tab flag and workspace ownership of the crawl. {crawl_id} in URLs is the UUID crawl_jobs.job_id (same convention as the Tab 1 endpoints); responses echo the resolved numeric crawl_jobs.id as data.crawl_id for diagnostic readability — UUID in, numeric out. {page_id} in the view-as URL is the integer pages.id (pages use numeric IDs throughout). Cells missing from page_agent_access for a given (page, agent) pair are synthesized server-side as {"status": "no_data"} — they are never persisted.
/api/aeo/access/{crawl_id}/matrix
Multi-agent access matrix for a single crawl. Rows are pages crawled, columns are AI agents probed, cells are status badges keyed by the page_agent_access.status ENUM. The matrix surfaces how each of nine canonical AI crawler user-agents accessed your pages during the multi-UA probe pass.
agents (string, optional) — comma-separated UA preset names to include as columns. Default: all nine presets in canonical source order (GPTBot, OAI-SearchBot, ClaudeBot, PerplexityBot, Google-Extended, Bytespider, anthropic-ai, Applebot-Extended, cohere-ai). Unknown names are dropped silently; an all-unknown filter falls back to the full nine.
limit (integer, optional, default 50, range 1–200) — page-axis page size. Pagination is offset-based on the page axis; each pages[] entry is one page, with its cells inline (cells are not paginated separately).
offset (integer, optional, default 0, range ≥0) — page-axis offset.
Cell status mirrors the page_agent_access.status ENUM plus the synthesized no_data:
ok — agent fetched the page successfully (2xx).
blocked — access refused at HTTP layer (401 / 451).
partial — partial-content response (206).
http_403 / http_429 — explicit HTTP-status badges for 403 (often AI-bot-block) and 429 (rate-limited).
timeout — no response within budget.
error — connection / parse / other failure.
no_data — synthesized; this agent did not probe this page (no page_agent_access row exists).
Three of the nine presets — Google-Extended, anthropic-ai, Applebot-Extended — are robots.txt-only tokens AI vendors publish so site operators can gate training-data ingestion separately from search/answer crawlers. No real HTTP crawler ships these exact UAs. The matrix probes them with synthetic compatible; <token>/1.0-style UA strings so server logs can branch on the token name, but the HTTP results do not reflect what real AI vendors see when they ingest the page. The response identifies these three in the data.token_only_agents array; the dashboard UI renders them in italics with a "robots.txt only" subtitle to make the caveat visible. Real per-token robots.txt semantics ship in a later release.
{
"success": true,
"data": {
"crawl_id": 11262,
"workspace_id": 17,
"agents": [
"GPTBot", "OAI-SearchBot", "ClaudeBot", "PerplexityBot",
"Google-Extended", "Bytespider", "anthropic-ai",
"Applebot-Extended", "cohere-ai"
],
"token_only_agents": ["Applebot-Extended", "Google-Extended", "anthropic-ai"],
"pages": [
{
"page_id": 220605,
"url": "https://example.com/",
"cells": {
"GPTBot": { "status": "ok",
"http_status": 200,
"response_time_ms": 27,
"blocked_by": null,
"fetched_at": "2026-05-29T00:17:25Z" },
"OAI-SearchBot": { "status": "no_data" },
"ClaudeBot": { "status": "no_data" },
"PerplexityBot": { "status": "no_data" },
"Google-Extended": { "status": "ok",
"http_status": 200,
"response_time_ms": 19,
"blocked_by": null,
"fetched_at": "2026-05-29T00:17:25Z" },
"Bytespider": { "status": "no_data" },
"anthropic-ai": { "status": "no_data" },
"Applebot-Extended": { "status": "no_data" },
"cohere-ai": { "status": "no_data" }
}
}
],
"pagination": { "limit": 50, "offset": 0, "total_pages": 1 }
}
}
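A client consuming the matrix might summarize each agent column by status. The sketch below tallies one page of results and treats a missing cell the same way the server does, as no_data. The type and function names are hypothetical.

```typescript
type CellStatus =
  | "ok" | "blocked" | "partial" | "http_403" | "http_429"
  | "timeout" | "error" | "no_data";

interface MatrixPage {
  page_id: number;
  url: string;
  cells: Record<string, { status: CellStatus } | undefined>;
}

type StatusCounts = Partial<Record<CellStatus, number>>;

// Counts cell statuses per agent across the pages in one response.
function tallyByAgent(agents: string[], pages: MatrixPage[]): Record<string, StatusCounts> {
  const tally: Record<string, StatusCounts> = {};
  for (const agent of agents) {
    const counts: StatusCounts = {};
    for (const page of pages) {
      // A missing cell is treated as no_data, mirroring the server's synthesis.
      const status: CellStatus = page.cells[agent]?.status ?? "no_data";
      counts[status] = (counts[status] ?? 0) + 1;
    }
    tally[agent] = counts;
  }
  return tally;
}
```

Since pagination is on the page axis, a full-crawl summary would need the tallies from each limit/offset page added together.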
401 UNAUTHORIZED — API key missing or invalid (auth middleware).
404 crawl_not_found — crawl_id (UUID) does not match any crawl_jobs row.
403 not_workspace_member — caller is not a member of the crawl's owning workspace. Ownership check runs before the flag gate, so this surfaces even on workspaces where aeo.access is enabled.
403 feature_disabled — aeo.access flag is off for the workspace. Contact your workspace owner to enable.
/api/aeo/access/{crawl_id}/robots-analysis
Per-crawl per-agent robots.txt decisions for the nine canonical AI agents, with a regression diff against the immediately-prior crawl in the same workspace for the same domain. Surfaces what each AI agent is allowed or disallowed to crawl per the site's published robots.txt, and flags any agent that lost access since the prior crawl. The domain is derived from crawl_jobs.url via urlparse(...).netloc; robots.txt content is read from the crawler's robots_txt_cache.
Each entry in data.per_agent is keyed by the canonical preset name and carries a triple:
current ("allow" | "disallow") — the agent's access decision per the crawl's robots.txt, evaluated at the site root.
prior ("allow" | "disallow" | null) — the agent's decision per the prior crawl's robots.txt. null when no prior crawl exists for this workspace + domain.
regression (boolean) — true only when prior == "allow" AND current == "disallow". The asymmetry is intentional — sites becoming more permissive are not regressions, and the field exists to surface AI-bot blocking that wasn't there before.
For the three robots.txt-only tokens (Google-Extended, anthropic-ai, Applebot-Extended) this endpoint is the source of truth for "is this AI vendor blocked." The matrix endpoint's synthetic HTTP probe rows for these tokens reflect what a probe with that user-agent sees; the robots-analysis output reflects what the AI vendors themselves see when reading robots.txt to decide whether to crawl. Token-only UAs are identified in data.token_only_agents for the frontend to render with the appropriate caveat.
{
"success": true,
"data": {
"crawl_id": 11262,
"workspace_id": 17,
"domain": "example.com",
"agents": [
"GPTBot", "OAI-SearchBot", "ClaudeBot", "PerplexityBot",
"Google-Extended", "Bytespider", "anthropic-ai",
"Applebot-Extended", "cohere-ai"
],
"token_only_agents": ["Applebot-Extended", "Google-Extended", "anthropic-ai"],
"robots_txt_present": false,
"prior_crawl_id": null,
"prior_robots_txt_present": false,
"per_agent": {
"GPTBot": { "current": "allow", "prior": null, "regression": false },
"OAI-SearchBot": { "current": "allow", "prior": null, "regression": false },
"ClaudeBot": { "current": "allow", "prior": null, "regression": false },
"PerplexityBot": { "current": "allow", "prior": null, "regression": false },
"Google-Extended": { "current": "allow", "prior": null, "regression": false },
"Bytespider": { "current": "allow", "prior": null, "regression": false },
"anthropic-ai": { "current": "allow", "prior": null, "regression": false },
"Applebot-Extended": { "current": "allow", "prior": null, "regression": false },
"cohere-ai": { "current": "allow", "prior": null, "regression": false }
}
}
}
When no robots.txt is cached for the crawl's domain, every agent's current defaults to "allow" per RFC 9309 §2.2.1 / urllib.robotparser permissive semantics. robots_txt_present and prior_robots_txt_present let the UI render the appropriate state pane.
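The regression rule is intentionally one-directional, which the following sketch makes explicit. It restates the documented rule (prior "allow" and current "disallow") as a predicate and lists affected agents; the function names are hypothetical.

```typescript
type Decision = "allow" | "disallow";

interface AgentDecision {
  current: Decision;
  prior: Decision | null;
}

// True only for allow -> disallow; a null prior or a more permissive change is not a regression.
function isRegression(prior: Decision | null, current: Decision): boolean {
  return prior === "allow" && current === "disallow";
}

// Lists agents that lost access since the prior crawl, sorted by name.
function regressedAgents(perAgent: Record<string, AgentDecision>): string[] {
  return Object.keys(perAgent)
    .filter((agent) => {
      const entry = perAgent[agent];
      return entry !== undefined && isRegression(entry.prior, entry.current);
    })
    .sort();
}
```

Recomputing the flag locally can serve as a consistency check against the regression field returned in data.per_agent.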
401 UNAUTHORIZED — API key missing or invalid (auth middleware).
404 crawl_not_found — crawl_id (UUID) does not match any crawl_jobs row.
403 not_workspace_member — caller is not a member of the crawl's owning workspace. Ownership check runs before the flag gate, so this surfaces even on workspaces where aeo.access is enabled.
403 feature_disabled — aeo.access flag is off for the workspace.
/api/aeo/access/page/{page_id}/view-as/{agent_name}
Per-(page, agent) detail powering the side-sliding view-as panel in the matrix UI. Surfaces the single agent's probe outcome for the single page (status / HTTP code / response time / response size / fetched-at) plus a 2 KB excerpt of the page's HTML and text content as captured during the crawl. Workspace ownership flows page→crawl→workspace: the page's parent pages.job_id is the integer FK to crawl_jobs.id, the parent crawl's UUID is resolved, and the standard resolve_crawl_access ownership-before-flag-gate sequence runs against the resolved workspace.
page_id (integer, ≥1) — numeric pages.id.
agent_name (string) — one of the nine canonical UA presets: GPTBot, OAI-SearchBot, ClaudeBot, PerplexityBot, Google-Extended, Bytespider, anthropic-ai, Applebot-Extended, cohere-ai. Unknown names are rejected with 422 unknown_agent before any DB roundtrip.
data.access — the page_agent_access row for this (page, agent) pair, or {"status": "no_data"} when no row exists. Same status ENUM as the matrix endpoint plus the synthesized no_data. blocked_by is included in the envelope for forward-compatibility but is always null in current writes (reserved for a future writer-WU).
data.content.html — the page's HTML content as captured during the crawl, truncated to 2048 characters.
data.content.text — the page's extracted text content, truncated to 2048 characters.
data.content.truncated (boolean) — true when either HTML or text exceeded the 2 KB excerpt limit. Lets the UI render a truncation badge.
When no page_content row exists for the page, data.content is {"html": null, "text": null, "truncated": false}.
{
"success": true,
"data": {
"page_id": 220605,
"crawl_id": 11262,
"workspace_id": 17,
"agent": "GPTBot",
"access": {
"status": "ok",
"http_status": 200,
"response_time_ms": 27,
"response_bytes": 4096,
"blocked_by": null,
"fetched_at": "2026-05-29T00:17:25Z"
},
"content": {
"html": "",
"text": "Example Domain. This domain is for use in illustrative examples ...",
"truncated": false
}
}
}
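To make the excerpt contract concrete, here is a sketch that mirrors the documented behavior on the client side, for example in a test fixture. It is not the server's implementation, the names are hypothetical, and it counts UTF-16 code units, which is an assumption about what "characters" means.

```typescript
const EXCERPT_LIMIT = 2048;

interface ContentExcerpt {
  html: string | null;
  text: string | null;
  truncated: boolean;
}

// Truncates each side to the excerpt limit and flags whether either side was cut.
function buildExcerpt(html: string | null, text: string | null): ContentExcerpt {
  const cut = (value: string | null): string | null =>
    value === null ? null : value.slice(0, EXCERPT_LIMIT);
  const over = (value: string | null): boolean =>
    value !== null && value.length > EXCERPT_LIMIT;
  return { html: cut(html), text: cut(text), truncated: over(html) || over(text) };
}
```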
401 UNAUTHORIZED — API key missing or invalid (auth middleware).
422 unknown_agent — agent_name is not in the nine-preset allowlist. Validated before any DB read.
404 page_not_found — no pages row with the given page_id.
403 not_workspace_member — caller is not a member of the workspace owning the page's parent crawl. Ownership check runs before the flag gate.
403 feature_disabled — aeo.access flag is off for the workspace.
/api/aeo/access/{crawl_id}/probe-page
Opt-in JS rendering probe for a single page. Runs an HTTP-only fetch (httpx, no JavaScript execution) and a full Playwright render (Chromium, JS-capable) for the supplied URL concurrently, then returns a byte-diff delta — signed char-count difference plus the top-N sentences present only in the Playwright render. Use it to surface content that AI crawlers without JS execution (the common case for HTTP-only crawlers like GPTBot and most archival bots) cannot see on the page. The probe is page-scoped: it runs once per call regardless of which matrix cell triggered it.
Wall-time is capped near 30s by running the two fetches concurrently (the operation surfaces a 30s timeout on each fetcher independently; total wall is the slower of the two). The endpoint is purely additive on the M1 single-UA Playwright flow — the default crawl behavior is not modified.
crawl_id (string, UUID) — the crawl_jobs.job_id UUID. Same shape as /{crawl_id}/matrix and /{crawl_id}/robots-analysis.
{
"page_id": 220605,
"url": "https://example.com/"
}
page_id (integer, ≥1) — numeric pages.id; the page must belong to the crawl identified by crawl_id.
url (string, 1–2048 chars) — must equal pages.url for the supplied page_id. This is a defense-in-depth check that prevents the endpoint from acting as a general-purpose probe for arbitrary URLs — the client must demonstrate they already discovered the URL via a crawl owned by the requesting user's workspace.
data.probe.http_status / data.probe.playwright_status — HTTP status codes from each fetcher (null if the fetcher errored before receiving a response).
data.probe.http_fetch_ms / data.probe.playwright_render_ms — wall-clock time per fetcher in milliseconds.
data.probe.http_error / data.probe.playwright_error — non-null when that fetcher raised; the other fetcher's results are still returned (failures are partial, not fatal).
data.delta.char_count_delta — signed integer (positive when Playwright sees more content than HTTP-only).
data.delta.http_char_count / data.delta.playwright_char_count — visible-text character counts per side, after stripping <script> / <style> / <noscript> / <template>.
data.delta.csr_only_sentence_count — total count of sentences present in the Playwright render but absent from the HTTP-only fetch (after a 30-char floor to drop nav/menu noise).
data.delta.csr_only_sentences — up to 20 of those sentences, each truncated to 200 characters. List order is the order they appear in the Playwright body. Empty array when the page is fully visible to HTTP-only crawlers.
{
"success": true,
"data": {
"crawl_id": 11266,
"workspace_id": 17,
"page_id": 220605,
"url": "https://example.com/",
"probe": {
"http_status": 200,
"http_fetch_ms": 25,
"http_error": null,
"playwright_status": 200,
"playwright_render_ms": 835,
"playwright_error": null
},
"delta": {
"char_count_delta": 0,
"http_char_count": 230,
"playwright_char_count": 230,
"csr_only_sentence_count": 0,
"csr_only_sentences": []
}
}
}
The example above shows a static-HTML target (example.com): the HTTP-only fetch and the Playwright render produce identical visible text, so the delta is zero and no CSR-only sentences are reported — that is the contract working correctly, not an error. CSR-heavy targets (e.g., a React SPA without server-side rendering) produce non-zero char_count_delta and a populated csr_only_sentences list.
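To illustrate how the delta fields relate to one another, the sketch below computes them from two already-extracted visible-text strings. It is an approximation, not the server's algorithm. In particular, the sentence splitter (punctuation-based) is an assumption, and the names `visibleSentences` and `computeDelta` are hypothetical. The 30-character floor, the cap of 20 sentences, and the 200-character truncation come from the field descriptions above.

```typescript
interface ProbeDelta {
  char_count_delta: number;
  http_char_count: number;
  playwright_char_count: number;
  csr_only_sentence_count: number;
  csr_only_sentences: string[];
}

// Assumed splitter: runs of text ending in ., ! or ?; fragments under 30 chars are dropped.
function visibleSentences(text: string): string[] {
  const parts = text.match(/[^.!?]+[.!?]*/g) ?? [];
  return parts.map((s) => s.trim()).filter((s) => s.length >= 30);
}

function computeDelta(httpText: string, renderedText: string): ProbeDelta {
  const seenInHttp = new Set(visibleSentences(httpText));
  // Keep render order; a sentence is CSR-only when the HTTP-only text lacks it.
  const csrOnly = visibleSentences(renderedText).filter((s) => !seenInHttp.has(s));
  return {
    char_count_delta: renderedText.length - httpText.length,
    http_char_count: httpText.length,
    playwright_char_count: renderedText.length,
    csr_only_sentence_count: csrOnly.length,
    csr_only_sentences: csrOnly.slice(0, 20).map((s) => s.slice(0, 200)),
  };
}
```

Identical inputs yield a zero delta and an empty sentence list, matching the static-HTML case described above.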
401 UNAUTHORIZED — API key missing or invalid (auth middleware).
404 crawl_not_found — crawl_id (UUID) does not match any crawl_jobs row.
403 not_workspace_member — caller is not a member of the crawl's owning workspace. Ownership check runs before the flag gate.
403 feature_disabled — aeo.access flag is off for the workspace.
422 page_not_in_crawl — page_id does not exist, or exists but belongs to a different crawl. The endpoint cannot probe pages outside the supplied crawl.
422 url_mismatch — url in the body does not match pages.url for the supplied page_id. This is the defense-in-depth check that gates the endpoint against arbitrary-URL probing.
500 probe_failed — an unhandled exception inside the probe execution (distinct from per-fetcher errors, which surface as non-null http_error / playwright_error in a 200 response).
Connect WordPress sites for bidirectional content sync, SEO fixes, and governance.
/api/wordpress/{workspace_id}/connections
List WordPress connections.
/api/wordpress/{workspace_id}/connect
Register new WordPress site.
/api/wordpress/{workspace_id}/connections/{connection_id}
Update connection settings.
/api/wordpress/{workspace_id}/connections/{connection_id}
Disconnect WordPress site.
/api/wordpress/{workspace_id}/connections/{connection_id}/test
Test connectivity.
/api/wordpress/{workspace_id}/test-connection
Test connection before saving.
/api/wordpress/{workspace_id}/connections/{connection_id}/sync-history
Get sync history.
/api/wordpress/sync/pages
Push pages/posts from WordPress.
/api/wordpress/sync/menus
Push navigation menus from WordPress.
/api/wordpress/sync/taxonomy
Push categories/tags from WordPress.
/api/wordpress/{workspace_id}/connections/{connection_id}/pull
Pull content from WordPress.
/api/wordpress/{workspace_id}/connections/{connection_id}/push-navigation
Push navigation to WordPress.
/api/wordpress/{workspace_id}/connections/{connection_id}/push-taxonomy
Push taxonomy to WordPress.
/api/wordpress/{workspace_id}/connections/{connection_id}/push-pages
Push pages to WordPress.
/api/wordpress/{workspace_id}/connections/{connection_id}/test-mcp-tool
Test MCP tool on WordPress.
Subscription management, usage tracking, spending caps, and prepaid credits.
/api/billing/pricing
Get available pricing plans.
/api/subscription/status
Get current subscription details.
/api/subscription/checkout
Create Stripe Checkout session.
/api/subscription/change
Change subscription plan (upgrade/downgrade).
/api/subscription/cancel
Cancel subscription at end of period.
/api/subscription/reactivate
Undo pending cancellation.
/api/billing/invoices
Get Stripe invoices.
/api/billing/portal-session
Open Stripe Customer Portal.
/api/billing/usage
Get current billing period usage.
/api/billing/usage/history
Get usage history across periods.
/api/billing/spending-cap
Get spending cap status.
/api/billing/spending-cap
Set or update spending cap.
/api/billing/credits/bundles
List available credit bundles.
/api/billing/credits/balance
Get credit balance.
/api/billing/credits/purchase
Purchase credits via Stripe.
/api/billing/credits/history
Get credit purchase history.
/api/billing/cost-estimate
Estimate costs for current period.
/api/user/tier
Get current subscription tier.
/api/user/quota
Get usage quota and limits.
/api/user/downgrade-eligibility
Check downgrade eligibility.
Connect Google Analytics and Google Search Console for enriched crawl data.
/api/integrations/{workspace_id}
List integration status.
/api/integrations/{workspace_id}/oauth/url
Generate Google OAuth consent URL.
/api/integrations/oauth/callback
OAuth callback (token exchange).
/api/integrations/{workspace_id}/connect
Save OAuth tokens.
/api/integrations/{workspace_id}/{provider}
Disconnect integration.
/api/integrations/{workspace_id}/analytics
Fetch GA + GSC data for URLs.
/api/integrations/{workspace_id}/google/properties
List GA4 properties.
/api/integrations/{workspace_id}/google/sites
List Search Console sites.
/api/integrations/{workspace_id}/select-property
Save selected GA/GSC property.