
Commit 22f198d

Add robots.txt and sitemaps reference page
1 parent a7e8459 commit 22f198d

2 files changed: +220 -0 lines changed

agents.md

Lines changed: 115 additions & 0 deletions
@@ -0,0 +1,115 @@
# Agent Workflow Rules for cloudflare-docs

## Git Workflow Between Workstreams

### Critical Rules

1. **Always sync with main between workstreams**
   - Before starting new work, ensure you're on the latest main branch
   - Run `git checkout main && git pull origin main`

2. **Create clean branches for each workstream**
   - Each new workstream gets a fresh branch from main
   - Branch naming conventions:
     - Browser Rendering: `br-<descriptive-name>` (e.g., `br-update-playwright-docs`)
     - Zaraz: `zaraz-<descriptive-name>`
     - Google Tag Gateway: `gtg-<descriptive-name>`
     - General: Use descriptive names for other products

3. **Ensure PRs only contain relevant work**
   - Each PR should only include changes from its specific workstream
   - No leftover files or changes from previous workstreams
### Standard Workflow

#### Starting a New Workstream

```bash
# 1. Switch to main and update
git checkout main
git pull origin main

# 2. Create new branch for the workstream
git checkout -b <descriptive-branch-name>
```

#### During Work

```bash
# Stage and commit changes as you work
git add <files>
git commit -m "descriptive message"
```

#### Finishing a Workstream

```bash
# 1. Push branch
git push origin <branch-name>

# 2. Create PR (via GitHub UI or CLI)

# 3. After PR is merged, clean up
git checkout main
git pull origin main
git branch -d <branch-name>
```
#### Between Workstreams Checklist

- [ ] Current work is committed and pushed
- [ ] PR is created for current workstream
- [ ] Switched back to main: `git checkout main`
- [ ] Pulled latest changes: `git pull origin main`
- [ ] Ready to create new branch for next workstream
## Cloudflare Docs Specific Rules

### Changelog Locations

1. **Product-specific release notes** (routine updates): `src/content/release-notes/*.yaml`
   - Use for: version bumps, bug fixes, minor features
   - Example: `src/content/release-notes/browser-rendering.yaml`

2. **Cloudflare-wide changelog** (major announcements): `src/content/changelog/<product>/*.mdx`
   - Use for: major features, GA announcements, significant updates
   - Example: `src/content/changelog/browser-rendering/`
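For orientation, a product release-notes entry is a YAML record in the file named after the product. The sketch below is illustrative only; the field names are assumptions, so mirror an existing file such as `src/content/release-notes/browser-rendering.yaml` rather than copying this shape verbatim.

```yaml
# Illustrative sketch only - field names are assumptions; copy the structure
# of an existing file in src/content/release-notes/ instead.
---
link: "/browser-rendering/changelog/"
productName: Browser Rendering
entries:
  - publish_date: "2025-01-15"
    description: |-
      Fixed a session timeout issue and bumped the bundled Playwright version.
```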
### Content Guidelines

- Follow all rules in `.windsurf/rules/general-rules.md`
- Use absolute paths for links (e.g., `/1.1.1.1/check/`), not full URLs
- Always include a trailing slash for links without anchors
- Import components at the top of the file, below the frontmatter
- No contractions, exclamation marks, or non-standard quotes
### Common Components

- `DashButton` - Use in place of `https://dash.cloudflare.com` links in steps
- `APIRequest` - Use in place of `sh` blocks that contain API requests
- `FileTree` - Use in place of `txt` file tree blocks
- `PackageManagers` - Use in place of `sh` blocks that run npm commands
- `TypeScriptExample` - Use in place of `ts`/`typescript` code blocks (except in tutorials)
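As a minimal sketch of the import rule above, component imports in an `.mdx` page sit directly below the frontmatter. The `~/components` import path is assumed from existing pages in the repo; verify component names and props against a page that already uses them.

```mdx
---
title: Example page
pcx_content_type: how-to
---

import { DashButton, TypeScriptExample } from "~/components";

{/* Component props vary; copy usage from an existing page that already
    uses the component you need. */}
```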
## Cross-Reference Reminders

When making changes to one part of the docs, review existing pages to see if there are other locations that should also be updated. Common scenarios:

- **Updating a value** (e.g., version number, ID, limit) — Search the docs for other references
- **Adding new functionality** — Check if related pages (FAQ, tutorials, reference) need updates
- **Changing behavior** — Update any pages that describe the old behavior
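For example, before changing a value, a plain text search over the content tree finds every page that references it:

```bash
# Find every page that references the bot detection ID before updating it
grep -rn "128292352" src/content/
```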
Example values that appear in multiple places:

| Value | Source of Truth | Also Referenced In |
| ----- | --------------- | ------------------ |
| Bot detection ID (`128292352`) | `/browser-rendering/reference/automatic-request-headers.mdx` | `/browser-rendering/faq.mdx` |
## Deployment Preference

When building full-stack web apps:

- Use a single Cloudflare Worker for the frontend + API
- Configure `[assets]` in `wrangler.toml` for static files
- Use Hono for API routes
- Deploy with a single `wrangler deploy` command
- Do not use Cloudflare Pages separately
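A minimal sketch of this setup, assuming static files live in `./public` and the Worker entry point is `src/index.ts` (both names are placeholders):

```toml
# wrangler.toml - one Worker serves both static assets and the API
name = "my-app"
main = "src/index.ts"
compatibility_date = "2025-01-15"

[assets]
directory = "./public"
```

```ts
// src/index.ts - Hono handles API routes; requests that match a file in the
// [assets] directory are served as static assets before reaching the Worker
import { Hono } from "hono";

const app = new Hono();

app.get("/api/hello", (c) => c.json({ message: "Hello from the Worker" }));

export default app;
```

Running `wrangler deploy` from the project root then publishes the assets and the API together as one Worker.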
Lines changed: 105 additions & 0 deletions
@@ -0,0 +1,105 @@
---
title: robots.txt and sitemaps
pcx_content_type: reference
sidebar:
  order: 5
---

This page provides general guidance on configuring `robots.txt` and sitemaps for websites you plan to access with Browser Rendering.
## User-Agent

Browser Rendering uses a standard browser User-Agent by default:

```txt
Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/119.0.0.0 Safari/537.36
```

This means `robots.txt` rules targeting `User-agent: *` will apply to Browser Rendering requests. You can customize the User-Agent using the `userAgent` parameter in your API request.
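For example, a REST API request that overrides the default User-Agent could look like the sketch below. The `/content` endpoint, placeholder account ID, and token are assumptions; confirm the endpoint and the `userAgent` field against the REST API reference.

```bash
curl "https://api.cloudflare.com/client/v4/accounts/<ACCOUNT_ID>/browser-rendering/content" \
  --header "Authorization: Bearer <API_TOKEN>" \
  --header "Content-Type: application/json" \
  --data '{
    "url": "https://example.com/",
    "userAgent": "MyCrawler/1.0 (+https://example.com/bot-info)"
  }'
```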
## Identifying Browser Rendering requests

While Browser Rendering uses a standard browser User-Agent, requests can be identified by the [automatic headers](/browser-rendering/reference/automatic-request-headers/) that Cloudflare attaches:

- `cf-brapi-request-id` — Unique identifier for REST API requests
- `Signature-agent` — Points to Cloudflare's bot verification keys

For Cloudflare security products, Browser Rendering has a bot detection ID of `128292352`. Use this to create WAF rules that allow or block Browser Rendering traffic.
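For example, a WAF custom rule expression along these lines would match Browser Rendering traffic. Treat the exact field and syntax as an assumption and confirm it against the WAF custom rules documentation and the FAQ linked under Related resources:

```txt
any(cf.bot_management.detection_ids[*] eq 128292352)
```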
## Best practices for robots.txt

A well-configured `robots.txt` helps crawlers understand which parts of your site they can access.

### Reference your sitemap

Include a reference to your sitemap in `robots.txt` so crawlers can discover your URLs:

```txt title="robots.txt"
User-agent: *
Allow: /

Sitemap: https://example.com/sitemap.xml
```
You can list multiple sitemaps:

```txt title="robots.txt"
User-agent: *
Allow: /

Sitemap: https://example.com/sitemap.xml
Sitemap: https://example.com/blog-sitemap.xml
```
### Set a crawl delay

Use `Crawl-delay` to control how frequently crawlers request pages from your server:

```txt title="robots.txt"
User-agent: *
Crawl-delay: 2
Allow: /

Sitemap: https://example.com/sitemap.xml
```

The value is in seconds. A `Crawl-delay` of 2 means the crawler waits 2 seconds between requests.
## Best practices for sitemaps

Structure your sitemap to help crawlers process your site efficiently:

```xml title="sitemap.xml"
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://example.com/important-page</loc>
    <lastmod>2025-01-15</lastmod>
    <priority>1.0</priority>
  </url>
  <url>
    <loc>https://example.com/other-page</loc>
    <lastmod>2025-01-10</lastmod>
    <priority>0.5</priority>
  </url>
</urlset>
```
| Attribute    | Purpose                       | Recommendation                                                                     |
| ------------ | ----------------------------- | ---------------------------------------------------------------------------------- |
| `<loc>`      | URL of the page               | Required. Use full URLs.                                                            |
| `<lastmod>`  | Last modification date        | Include to help the crawler identify updated content.                               |
| `<priority>` | Relative importance (0.0-1.0) | Set higher values for important pages. Higher-priority pages are processed first.   |
### Recommendations

- **Include `<lastmod>`** on all URLs to help identify which pages have changed.
- **Set `<priority>`** to control processing order. Pages with higher priority are processed first.
- **Use sitemap index files** for large sites with multiple sitemaps (see the example below).
- **Compress large sitemaps** using `.gz` format to reduce bandwidth.
- **Keep sitemaps under 50 MB** and 50,000 URLs per file (standard sitemap limits).
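A sitemap index is itself a small XML file that lists the individual sitemaps, following the same sitemaps.org schema:

```xml title="sitemap-index.xml"
<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap>
    <loc>https://example.com/sitemap.xml</loc>
    <lastmod>2025-01-15</lastmod>
  </sitemap>
  <sitemap>
    <loc>https://example.com/blog-sitemap.xml</loc>
    <lastmod>2025-01-10</lastmod>
  </sitemap>
</sitemapindex>
```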
## Related resources

- [/crawl endpoint](/browser-rendering/rest-api/crawl-endpoint/) — Automate crawling multiple pages
- [FAQ: Will Browser Rendering bypass Cloudflare's Bot Protection?](/browser-rendering/faq/#will-browser-rendering-bypass-cloudflares-bot-protection) — Instructions for creating a WAF skip rule
