---
title: robots.txt and sitemaps
pcx_content_type: reference
sidebar:
  order: 5
---

This page provides general guidance on configuring `robots.txt` and sitemaps for websites you plan to access with Browser Rendering.

## User-Agent

Browser Rendering uses a standard browser User-Agent by default:

```txt
Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/119.0.0.0 Safari/537.36
```

This means `robots.txt` rules targeting `User-agent: *` apply to Browser Rendering requests. You can customize the User-Agent with the `userAgent` parameter in your API request, as in the sketch below.
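
For example, here is a minimal sketch of a REST API call that sets a custom User-Agent. It assumes the `/content` endpoint and a Bearer token for authentication; the account ID, token, and User-Agent string are placeholders:

```ts
// Minimal sketch: render a page via the Browser Rendering REST API with a
// custom User-Agent. <ACCOUNT_ID> and <API_TOKEN> are placeholders.
const accountId = "<ACCOUNT_ID>";
const apiToken = "<API_TOKEN>";

const response = await fetch(
	`https://api.cloudflare.com/client/v4/accounts/${accountId}/browser-rendering/content`,
	{
		method: "POST",
		headers: {
			Authorization: `Bearer ${apiToken}`,
			"Content-Type": "application/json",
		},
		body: JSON.stringify({
			url: "https://example.com/",
			// Replaces the default Chrome User-Agent shown above.
			userAgent: "MyCrawler/1.0 (+https://example.com/bot)",
		}),
	},
);

const { result } = await response.json();
console.log(result); // rendered HTML as a string
```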

## Identifying Browser Rendering requests

While Browser Rendering uses a standard browser User-Agent, its requests can be identified by the [automatic headers](/browser-rendering/reference/automatic-request-headers/) that Cloudflare attaches (see the sketch after this list):

- `cf-brapi-request-id` — Unique identifier for REST API requests
- `Signature-agent` — Points to Cloudflare's bot verification keys
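
As a minimal sketch, a Worker on the receiving site could branch on the `cf-brapi-request-id` header like this. Only the header name comes from the reference above; the handler logic is illustrative:

```ts
export default {
	async fetch(request: Request): Promise<Response> {
		// Present only on requests made through the Browser Rendering REST API.
		const brapiId = request.headers.get("cf-brapi-request-id");
		if (brapiId !== null) {
			return new Response(`Browser Rendering request ${brapiId}`);
		}
		return new Response("Regular visitor");
	},
};
```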

For Cloudflare security products, Browser Rendering has a bot detection ID of `128292352`. Use this to create WAF rules that allow or block Browser Rendering traffic.
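
As a sketch, a WAF custom rule matching this detection ID could use an expression like the following. The `cf.bot_management.detection_ids` field is an assumption here (it requires Bot Management), so verify the exact field and syntax in the WAF documentation:

```txt
any(cf.bot_management.detection_ids[*] eq 128292352)
```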

## Best practices for robots.txt

A well-configured `robots.txt` helps crawlers understand which parts of your site they can access.

### Reference your sitemap

Include a reference to your sitemap in `robots.txt` so crawlers can discover your URLs:

```txt title="robots.txt"
User-agent: *
Allow: /

Sitemap: https://example.com/sitemap.xml
```

You can list multiple sitemaps:

```txt title="robots.txt"
User-agent: *
Allow: /

Sitemap: https://example.com/sitemap.xml
Sitemap: https://example.com/blog-sitemap.xml
```
| 53 | + |
| 54 | +### Set a crawl delay |
| 55 | + |
| 56 | +Use `crawl-delay` to control how frequently crawlers request pages from your server: |
| 57 | + |
| 58 | +```txt title="robots.txt" |
| 59 | +User-agent: * |
| 60 | +Crawl-delay: 2 |
| 61 | +Allow: / |
| 62 | +
|
| 63 | +Sitemap: https://example.com/sitemap.xml |
| 64 | +``` |
| 65 | + |
| 66 | +The value is in seconds. A `crawl-delay` of 2 means the crawler waits 2 seconds between requests. |

## Best practices for sitemaps

Structure your sitemap to help crawlers process your site efficiently:

```xml title="sitemap.xml"
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://example.com/important-page</loc>
    <lastmod>2025-01-15</lastmod>
    <priority>1.0</priority>
  </url>
  <url>
    <loc>https://example.com/other-page</loc>
    <lastmod>2025-01-10</lastmod>
    <priority>0.5</priority>
  </url>
</urlset>
```

| Attribute    | Purpose                       | Recommendation                                                                    |
| ------------ | ----------------------------- | --------------------------------------------------------------------------------- |
| `<loc>`      | URL of the page               | Required. Use full, absolute URLs.                                                 |
| `<lastmod>`  | Last modification date        | Include it to help the crawler identify updated content.                           |
| `<priority>` | Relative importance (0.0-1.0) | Set higher values for important pages; higher-priority pages are processed first. |

### Recommendations

- **Include `<lastmod>`** on all URLs to help identify which pages have changed.
- **Set `<priority>`** to control processing order. Pages with higher priority are processed first.
- **Use sitemap index files** for large sites with multiple sitemaps (see the example after this list).
- **Compress large sitemaps** with gzip (`.gz`) to reduce bandwidth.
- **Keep each sitemap under 50 MB and 50,000 URLs** (the standard sitemap limits).
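
For example, a sitemap index file uses the standard `sitemapindex` schema to point at the individual sitemaps (the file name here is illustrative):

```xml title="sitemap-index.xml"
<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap>
    <loc>https://example.com/sitemap.xml</loc>
    <lastmod>2025-01-15</lastmod>
  </sitemap>
  <sitemap>
    <loc>https://example.com/blog-sitemap.xml</loc>
    <lastmod>2025-01-10</lastmod>
  </sitemap>
</sitemapindex>
```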

## Related resources

- [/crawl endpoint](/browser-rendering/rest-api/crawl-endpoint/) — Automate crawling multiple pages
- [FAQ: Will Browser Rendering bypass Cloudflare's Bot Protection?](/browser-rendering/faq/#will-browser-rendering-bypass-cloudflares-bot-protection) — Instructions for creating a WAF skip rule