
Commit 5419853

Add robots.txt and sitemaps reference page
1 parent a7e8459 commit 5419853

1 file changed: +105 -0 lines changed

Lines changed: 105 additions & 0 deletions
@@ -0,0 +1,105 @@
---
title: robots.txt and sitemaps
pcx_content_type: reference
sidebar:
  order: 5
---

This page provides general guidance on configuring `robots.txt` and sitemaps for websites you plan to access with Browser Rendering.

## User-Agent

Browser Rendering uses a standard browser User-Agent by default:

```txt
Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/119.0.0.0 Safari/537.36
```

This means `robots.txt` rules targeting `User-agent: *` will apply to Browser Rendering requests. You can customize the User-Agent using the `userAgent` parameter in your API request.
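
For example, a minimal sketch of overriding the default User-Agent through the REST API is shown below. The endpoint path, placeholders, and response handling are illustrative assumptions about a `/content` request rather than a complete recipe:

```ts
// Sketch: override the default User-Agent on a Browser Rendering REST API
// request. Replace the placeholders with your own account ID and API token.
const accountId = "<ACCOUNT_ID>"; // placeholder
const apiToken = "<API_TOKEN>"; // placeholder

const response = await fetch(
  `https://api.cloudflare.com/client/v4/accounts/${accountId}/browser-rendering/content`,
  {
    method: "POST",
    headers: {
      Authorization: `Bearer ${apiToken}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({
      url: "https://example.com/",
      // Custom User-Agent, so robots.txt rules can target this crawler by name
      userAgent: "MyCrawler/1.0 (+https://example.com/bot-info)",
    }),
  },
);

const html = await response.text();
```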

## Identifying Browser Rendering requests

While Browser Rendering uses a standard browser User-Agent, requests can be identified by the [automatic headers](/browser-rendering/reference/automatic-request-headers/) that Cloudflare attaches:

- `cf-brapi-request-id` — Unique identifier for REST API requests
- `Signature-agent` — Points to Cloudflare's bot verification keys
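
For example, if your origin runs as a Cloudflare Worker, a minimal sketch of checking for the REST API header could look like the following. The header name comes from the automatic request headers reference; the logging and response are illustrative only:

```ts
// Sketch: detect Browser Rendering REST API traffic at the origin by
// checking for the cf-brapi-request-id header. Absence of the header does
// not prove the request came from somewhere else.
export default {
  async fetch(request: Request): Promise<Response> {
    const brapiId = request.headers.get("cf-brapi-request-id");

    if (brapiId !== null) {
      // The request was made through the Browser Rendering REST API
      console.log(`Browser Rendering request: ${brapiId}`);
    }

    return new Response("ok");
  },
};
```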

For Cloudflare security products, Browser Rendering has a bot detection ID of `128292352`. Use this to create WAF rules that allow or block Browser Rendering traffic.
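
As a sketch, a custom rule expression targeting this detection ID might look like the following (this assumes the `cf.bot_management.detection_ids` field is available on your plan; see the FAQ linked under Related resources for the documented skip-rule steps):

```txt
(any(cf.bot_management.detection_ids[*] eq 128292352))
```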

## Best practices for robots.txt

A well-configured `robots.txt` helps crawlers understand which parts of your site they can access.

### Reference your sitemap

Include a reference to your sitemap in `robots.txt` so crawlers can discover your URLs:

```txt title="robots.txt"
User-agent: *
Allow: /

Sitemap: https://example.com/sitemap.xml
```

You can list multiple sitemaps:

```txt title="robots.txt"
User-agent: *
Allow: /

Sitemap: https://example.com/sitemap.xml
Sitemap: https://example.com/blog-sitemap.xml
```

### Set a crawl delay

Use `Crawl-delay` to control how frequently crawlers request pages from your server:

```txt title="robots.txt"
User-agent: *
Crawl-delay: 2
Allow: /

Sitemap: https://example.com/sitemap.xml
```

The value is in seconds. A `Crawl-delay` of 2 means the crawler waits 2 seconds between requests.

## Best practices for sitemaps

Structure your sitemap to help crawlers process your site efficiently:

```xml title="sitemap.xml"
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://example.com/important-page</loc>
    <lastmod>2025-01-15</lastmod>
    <priority>1.0</priority>
  </url>
  <url>
    <loc>https://example.com/other-page</loc>
    <lastmod>2025-01-10</lastmod>
    <priority>0.5</priority>
  </url>
</urlset>
```

| Attribute    | Purpose                       | Recommendation                                                                    |
| ------------ | ----------------------------- | --------------------------------------------------------------------------------- |
| `<loc>`      | URL of the page               | Required. Use full, absolute URLs.                                                 |
| `<lastmod>`  | Last modification date        | Include to help the crawler identify updated content.                             |
| `<priority>` | Relative importance (0.0-1.0) | Set higher values for important pages. Higher-priority pages are processed first. |

### Recommendations

- **Include `<lastmod>`** on all URLs to help identify which pages have changed.
- **Set `<priority>`** to control processing order. Pages with higher priority are processed first.
- **Use sitemap index files** for large sites with multiple sitemaps (see the sketch after this list).
- **Compress large sitemaps** using `.gz` format to reduce bandwidth.
- **Keep sitemaps under 50 MB** and 50,000 URLs per file (the standard sitemap limits).
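
As an illustration of the sitemap index recommendation, a minimal index file could look like this (file names and dates are placeholders):

```xml title="sitemap-index.xml"
<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap>
    <loc>https://example.com/sitemap.xml</loc>
    <lastmod>2025-01-15</lastmod>
  </sitemap>
  <sitemap>
    <loc>https://example.com/blog-sitemap.xml</loc>
    <lastmod>2025-01-10</lastmod>
  </sitemap>
</sitemapindex>
```

Reference the index file from `robots.txt` with a single `Sitemap:` line, the same way as an individual sitemap.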

## Related resources

- [/crawl endpoint](/browser-rendering/rest-api/crawl-endpoint/) — Automate crawling multiple pages
- [FAQ: Will Browser Rendering bypass Cloudflare's Bot Protection?](/browser-rendering/faq/#will-browser-rendering-bypass-cloudflares-bot-protection) — Instructions for creating a WAF skip rule
