Commit 6fc2c3c

docs: document how to import the default config (#1392)
Signed-off-by: Xe Iaso <me@xeiaso.net>
1 parent 149e864 commit 6fc2c3c

2 files changed

Lines changed: 84 additions & 0 deletions

data/common/acts-like-browser.yaml

Lines changed: 55 additions & 0 deletions
@@ -0,0 +1,55 @@
+# Assert behaviour that only genuine browsers display. This ensures that modern Chrome
+# or Firefox versions will get through without a challenge.
+#
+# These rules have been known to be bypassed by some of the worst automated scrapers.
+# Use at your own risk.
+
+- name: realistic-browser-catchall
+  expression:
+    all:
+      - '"User-Agent" in headers'
+      - '( userAgent.contains("Firefox") ) || ( userAgent.contains("Chrome") ) || ( userAgent.contains("Safari") )'
+      - '"Accept" in headers'
+      - '"Sec-Fetch-Dest" in headers'
+      - '"Sec-Fetch-Mode" in headers'
+      - '"Sec-Fetch-Site" in headers'
+      - '"Accept-Encoding" in headers'
+      - '( headers["Accept-Encoding"].contains("zstd") || headers["Accept-Encoding"].contains("br") )'
+      - '"Accept-Language" in headers'
+  action: WEIGH
+  weight:
+    adjust: -10
+
+# The Upgrade-Insecure-Requests header is typically sent by browsers, but not always
+- name: upgrade-insecure-requests
+  expression: '"Upgrade-Insecure-Requests" in headers'
+  action: WEIGH
+  weight:
+    adjust: -2
+
+# Chrome should behave like Chrome
+- name: chrome-is-proper
+  expression:
+    all:
+      - userAgent.contains("Chrome")
+      - '"Sec-Ch-Ua" in headers'
+      - 'headers["Sec-Ch-Ua"].contains("Chromium")'
+      - '"Sec-Ch-Ua-Mobile" in headers'
+      - '"Sec-Ch-Ua-Platform" in headers'
+  action: WEIGH
+  weight:
+    adjust: -5
+
+- name: should-have-accept
+  expression: '!("Accept" in headers)'
+  action: WEIGH
+  weight:
+    adjust: 5
+
+# Generic catchall rule
+- name: generic-browser
+  user_agent_regex: >-
+    Mozilla|Opera
+  action: WEIGH
+  weight:
+    adjust: 10
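Read together, the rules in this file lower a request's weight when it carries the headers a real browser sends and raise it when it only claims a browser-like User-Agent. As a rough illustration, assuming the adjustments from every matching rule are summed (the diff itself does not state this), a current Chrome navigation that matches `realistic-browser-catchall`, `upgrade-insecure-requests`, `chrome-is-proper`, and `generic-browser` would net -10 - 2 - 5 + 10 = -7, while a client that sends only `User-Agent: Mozilla/5.0` and an `Accept` header would net +10. A further rule in the same shape might look like the sketch below; the rule name and the choice of header are hypothetical and not part of this commit.

```yaml
# Hypothetical rule, NOT part of this commit: raise the weight slightly
# when a client sends no Accept-Language header. It mirrors the shape of
# the should-have-accept rule above.
- name: example-missing-accept-language
  expression: '!("Accept-Language" in headers)'
  action: WEIGH
  weight:
    adjust: 5
```

Whether such a rule is worthwhile depends on your traffic, since some legitimate clients also omit this header.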

docs/docs/admin/configuration/import.mdx

Lines changed: 29 additions & 0 deletions
@@ -13,6 +13,8 @@ bots:
   - # This correlates to data/bots/ai-catchall.yaml
     import: (data)/bots/ai-catchall.yaml
   - import: (data)/bots/cloudflare-workers.yaml
+  # Import all the rules in the default configuration
+  - import: (data)/meta/default-config.yaml
 ```

 Of note, a bot rule can either have inline bot configuration or import a bot config snippet. You cannot do both in a single bot rule.
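The either/or constraint on a bot rule can be sketched as follows. This is an illustrative fragment and not taken from the commit: the inline rule's name and regex are hypothetical placeholders, and the invalid combination is left commented out so the fragment itself stays loadable.

```yaml
bots:
  # Valid: a rule that only imports a snippet.
  - import: (data)/bots/ai-catchall.yaml

  # Valid: a rule defined entirely inline.
  # The name and regex are hypothetical placeholders.
  - name: example-inline-rule
    user_agent_regex: ExampleBot
    action: DENY

  # Invalid: mixing `import` with inline fields in a single rule.
  # - name: example-mixed-rule
  #   import: (data)/bots/ai-catchall.yaml
  #   action: DENY
```

Uncommenting the last rule is the kind of configuration that would be expected to trigger the `config.BotOrImport` validation error quoted in the next hunk header.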
@@ -35,6 +37,33 @@ config.BotOrImport: rule definition is invalid, you must set either bot rules or

 Paths can either be prefixed with `(data)` to import from [the data folder in the Anubis source tree](https://github.com/TecharoHQ/anubis/tree/main/data) or anywhere on the filesystem. If you don't have access to the Anubis source tree, check /usr/share/docs/anubis/data or in the tarball you extracted Anubis from.

+## Importing the default configuration
+
+If you want to base your configuration off of the default configuration, import `(data)/meta/default-config.yaml`:
+
+```yaml
+bots:
+  - import: (data)/meta/default-config.yaml
+  # Write your rules here
+```
+
+This will keep your configuration up to date as Anubis adapts to emerging threats.
+
+## How do I exempt most modern browsers from Anubis challenges?
+
+If you want to exempt most modern browsers from Anubis challenges, import `(data)/common/acts-like-browser.yaml`:
+
+```yaml
+bots:
+  - import: (data)/meta/default-config.yaml
+  - import: (data)/common/acts-like-browser.yaml
+  # Write your rules here
+```
+
+These rules will allow traffic that "looks like" it's from a modern copy of Edge, Safari, Chrome, or Firefox. These rules used to be enabled by default, however user reports have suggested that AI scraper bots have adapted to conform to these rules to scrape without regard for the infrastructure they are attacking.
+
+Use these rules at your own risk.
+
 ## Importing from imports

 You can also import from an imported file in case you want to import an entire folder of rules at once.
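A nested import might be laid out as in the sketch below. The file path `/etc/anubis/rules/all.yaml` is a hypothetical location chosen for illustration, and the sketch assumes an imported snippet is a top-level list of rules, as `acts-like-browser.yaml` in this commit is.

```yaml
# /etc/anubis/rules/all.yaml (hypothetical path)
# A snippet whose only job is to pull in other snippets.
- import: (data)/bots/ai-catchall.yaml
- import: (data)/bots/cloudflare-workers.yaml
```

```yaml
# Main policy file: one import brings in everything listed above.
bots:
  - import: /etc/anubis/rules/all.yaml
  # Write your rules here
```

Collecting imports in one file this way keeps the main policy short when the set of shared rules grows.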