Commit 87ed8b1

Copilot and Mossaka authored
docs: add blocklist documentation to docs-site (#150)
* Initial plan
* docs: add blocklist documentation to docs-site

Co-authored-by: Mossaka <5447827+Mossaka@users.noreply.github.com>

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: Mossaka <5447827+Mossaka@users.noreply.github.com>
1 parent 45da48d commit 87ed8b1

4 files changed

Lines changed: 299 additions & 20 deletions

Lines changed: 263 additions & 0 deletions
Original file line number | Diff line number | Diff line change
@@ -0,0 +1,263 @@
1+
---
2+
title: Domain Filtering
3+
description: Control network access with allowlists, blocklists, and wildcard patterns.
4+
---
5+
6+
Control which domains your AI agents can access using allowlists and blocklists. This guide covers all domain filtering options including wildcard patterns and file-based configuration.
7+
8+
## How domain matching works
9+
10+
Domains automatically match all subdomains:
11+
12+
```bash
13+
# Allowing github.com permits:
14+
# ✓ github.com
15+
# ✓ api.github.com
16+
# ✗ raw.githubusercontent.com (different domain, allow githubusercontent.com separately)
17+
# ✗ example.com (not in allowlist)
18+
19+
sudo awf --allow-domains github.com -- curl https://api.github.com
20+
```
21+
22+
:::tip
23+
You don't need to list every subdomain. Adding the base domain covers all subdomains automatically.
24+
:::
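
As a rough illustration of the rule above, the following shell sketch treats a host as matching when it equals the listed domain or ends with a dot followed by that domain. The function name `domain_matches` is hypothetical. This models the documented behavior and is not awf's actual implementation.

```bash
#!/bin/sh
# Sketch only: models the documented subdomain rule, not awf's real code.
# domain_matches HOST RULE -> exit status 0 if HOST equals RULE or is a subdomain of it.
domain_matches() {
  host=$1
  rule=$2
  case "$host" in
    "$rule")   return 0 ;;  # exact match
    *".$rule") return 0 ;;  # subdomain: must end in ".<rule>"
    *)         return 1 ;;
  esac
}

for candidate in github.com api.github.com notgithub.com example.com; do
  if domain_matches "$candidate" github.com; then
    echo "allow $candidate"
  else
    echo "deny  $candidate"
  fi
done
```

Requiring the leading dot is what keeps `notgithub.com` from matching `github.com`, even though it ends with the same characters.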
25+
26+
## Allowlist options
27+
28+
### Command-line flag
29+
30+
Use `--allow-domains` with a comma-separated list:
31+
32+
```bash
33+
sudo awf --allow-domains github.com,npmjs.org,googleapis.com -- <command>
34+
```
35+
36+
### File-based allowlist
37+
38+
Use `--allow-domains-file` for managing large domain lists:
39+
40+
```bash
41+
# Create a domains file
42+
cat > allowed-domains.txt << 'EOF'
43+
# GitHub domains
44+
github.com
45+
api.github.com
46+
47+
# NPM registry
48+
npmjs.org, registry.npmjs.org
49+
50+
# Wildcard patterns
51+
*.googleapis.com
52+
EOF
53+
54+
# Use the file
55+
sudo awf --allow-domains-file allowed-domains.txt -- <command>
56+
```
57+
58+
**File format:**
59+
- One domain per line or comma-separated
60+
- Comments start with `#` (full line or inline)
61+
- Empty lines are ignored
62+
- Whitespace is trimmed
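
The file format rules above can be sketched with standard tools. The function name `parse_domains` is hypothetical, and the pipeline is only an approximation of the listed rules (comments, comma splitting, trimming, empty lines), not the parser awf ships.

```bash
#!/bin/sh
# Sketch only: approximates the documented file format rules.
# parse_domains FILE -> prints one domain or pattern per line.
parse_domains() {
  sed -e 's/#.*$//' "$1" |   # drop full-line and inline comments
    tr ',' '\n' |            # split comma-separated entries onto their own lines
    sed -e 's/^[[:space:]]*//' -e 's/[[:space:]]*$//' |  # trim whitespace
    grep -v '^$'             # skip empty lines
}

# Example input using the same conventions as the file shown earlier.
tmp=$(mktemp)
cat > "$tmp" << 'EOF'
# GitHub domains
github.com

npmjs.org, registry.npmjs.org  # inline comment
EOF
parse_domains "$tmp"
rm -f "$tmp"
```

Running a file through a pipeline like this before passing it to `--allow-domains-file` is one way to preview which entries a file contributes.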
63+
64+
### Combining methods
65+
66+
You can use both flags together; the resulting domain lists are merged:
67+
68+
```bash
69+
sudo awf \
70+
--allow-domains github.com \
71+
--allow-domains-file my-domains.txt \
72+
-- <command>
73+
```
74+
75+
## Wildcard patterns
76+
77+
Use `*` to match multiple domains:
78+
79+
```bash
80+
# Match any subdomain of github.com
81+
--allow-domains '*.github.com'
82+
83+
# Match api-v1.example.com, api-v2.example.com, etc.
84+
--allow-domains 'api-*.example.com'
85+
86+
# Combine plain domains and wildcards
87+
--allow-domains 'github.com,*.googleapis.com,api-*.example.com'
88+
```
89+
90+
:::caution
91+
Use quotes around patterns to prevent shell expansion of `*`.
92+
:::
93+
94+
**Pattern matching rules:**
95+
96+
| Pattern | Matches | Does Not Match |
97+
|---------|---------|----------------|
98+
| `*.github.com` | `api.github.com`, `raw.github.com` | `github.com` |
99+
| `api-*.example.com` | `api-v1.example.com`, `api-test.example.com` | `api.example.com` |
100+
| `github.com` | `github.com`, `api.github.com` | `notgithub.com` |
101+
102+
**Security restrictions:**
103+
- Overly broad patterns like `*`, `*.*`, or `*.*.*` are rejected
104+
- Patterns are case-insensitive (DNS is case-insensitive)
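
The table and restrictions above can be modeled with shell glob matching. This is a sketch under assumptions: the function names `is_too_broad` and `pattern_matches` are hypothetical, `*` is assumed to match any run of characters (including dots), and "overly broad" is approximated as "contains nothing but `*` and `.`". The real validation in awf may differ.

```bash
#!/bin/sh
# Sketch only: models the documented pattern rules, not awf's real code.

# is_too_broad PATTERN -> exit 0 if the pattern consists only of '*' and '.'
is_too_broad() {
  rest=$(printf '%s' "$1" | tr -d '*.')
  [ -z "$rest" ]
}

# pattern_matches HOST PATTERN -> exit 0 on match, 1 on no match, 2 if rejected.
pattern_matches() {
  host=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')  # case-insensitive
  pat=$(printf '%s' "$2" | tr '[:upper:]' '[:lower:]')
  if is_too_broad "$pat"; then
    return 2
  fi
  case "$pat" in
    *'*'*)  # wildcard pattern: must match exactly as written
      case "$host" in
        $pat) return 0 ;;
      esac
      return 1 ;;
  esac
  case "$host" in
    "$pat" | *".$pat") return 0 ;;  # plain domain: itself plus subdomains
  esac
  return 1
}

check() {
  if pattern_matches "$1" "$2"; then
    echo "match     $1 against $2"
  else
    echo "no match  $1 against $2"
  fi
}

check api.github.com '*.github.com'
check github.com '*.github.com'
check api-v1.example.com 'api-*.example.com'
check api.example.com 'api-*.example.com'
check api.github.com github.com
```

Note the asymmetry this reproduces from the table: the plain domain `github.com` covers the apex and its subdomains, while `*.github.com` covers subdomains only.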
105+
106+
## Blocklist options
107+
108+
Block specific domains while allowing others. **Blocked domains take precedence over allowed domains.**
109+
110+
### Basic blocklist usage
111+
112+
```bash
113+
# Allow example.com but block internal.example.com
114+
sudo awf \
115+
--allow-domains example.com \
116+
--block-domains internal.example.com \
117+
-- curl https://api.example.com # ✓ allowed
118+
119+
sudo awf \
120+
--allow-domains example.com \
121+
--block-domains internal.example.com \
122+
-- curl https://internal.example.com # ✗ blocked
123+
```
124+
125+
### Blocklist with wildcards
126+
127+
```bash
128+
# Allow all of example.com except internal-* subdomains
129+
sudo awf \
130+
--allow-domains example.com \
131+
--block-domains 'internal-*.example.com' \
132+
-- curl https://api.example.com # ✓ allowed
133+
134+
# Allow broad pattern, block sensitive subdomains
135+
sudo awf \
136+
--allow-domains '*.example.com' \
137+
--block-domains '*.secret.example.com' \
138+
-- curl https://api.example.com # ✓ allowed
139+
```
140+
141+
### File-based blocklist
142+
143+
```bash
144+
# Create a blocklist file
145+
cat > blocked-domains.txt << 'EOF'
146+
# Internal services that should never be accessed
147+
internal.example.com
148+
admin.example.com
149+
150+
# Block all subdomains of sensitive.org
151+
*.sensitive.org
152+
EOF
153+
154+
# Use the blocklist file
155+
sudo awf \
156+
--allow-domains example.com,sensitive.org \
157+
--block-domains-file blocked-domains.txt \
158+
-- <command>
159+
```
160+
161+
### Combining all options
162+
163+
```bash
164+
sudo awf \
165+
--allow-domains github.com \
166+
--allow-domains-file allowed.txt \
167+
--block-domains internal.github.com \
168+
--block-domains-file blocked.txt \
169+
-- <command>
170+
```
171+
172+
## Common use cases
173+
174+
### AI agent with API access
175+
176+
Allow an AI agent to access specific APIs while blocking internal services:
177+
178+
```bash
179+
sudo awf \
180+
--allow-domains 'api.openai.com,*.github.com' \
181+
--block-domains 'internal.github.com,admin.github.com' \
182+
-- npx @github/copilot@latest --prompt "Analyze this code"
183+
```
184+
185+
### CI/CD pipeline restrictions
186+
187+
Restrict network access during builds:
188+
189+
```bash
190+
sudo awf \
191+
--allow-domains npmjs.org,registry.npmjs.org,github.com \
192+
--block-domains-file ci-blocklist.txt \
193+
-- bash -c 'npm install && npm test'  # quoted so both commands run inside the firewall
194+
```
195+
196+
### MCP server isolation
197+
198+
Test MCP servers with controlled network access:
199+
200+
```bash
201+
sudo awf \
202+
--allow-domains arxiv.org,api.github.com \
203+
-- npx @github/copilot@latest \
204+
--mcp-server ./my-mcp-server.js \
205+
--prompt "Search for papers"
206+
```
207+
208+
## Normalization
209+
210+
Domains are normalized before matching:
211+
212+
- **Case-insensitive**: `GitHub.COM` = `github.com`
213+
- **Whitespace trimmed**: `" github.com "` = `github.com`
214+
- **Trailing dots removed**: `github.com.` = `github.com`
215+
- **Protocols stripped**: `https://github.com` = `github.com`
216+
217+
```bash
218+
# These are all equivalent
219+
--allow-domains github.com
220+
--allow-domains " GitHub.COM. "
221+
--allow-domains "https://github.com"
222+
```
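
The normalization steps listed above could be approximated as follows. The function name `normalize_domain` is hypothetical, and stripping only `http://` and `https://` is an assumption, since the guide shows just the `https://` case.

```bash
#!/bin/sh
# Sketch only: approximates the documented normalization steps.
# normalize_domain RAW -> prints the normalized domain.
normalize_domain() {
  printf '%s' "$1" |
    tr '[:upper:]' '[:lower:]' |                              # case-insensitive
    sed -e 's/^[[:space:]]*//' -e 's/[[:space:]]*$//' \
        -e 's|^https\{0,1\}://||' \
        -e 's/\.$//'                                          # trim, strip protocol, strip trailing dot
}

for raw in 'github.com' ' GitHub.COM. ' 'https://github.com'; do
  printf '%s\n' "$(normalize_domain "$raw")"
done
```

All three inputs in the loop normalize to `github.com`, matching the equivalence shown in the example above.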
223+
224+
## Debugging domain filtering
225+
226+
### Enable debug logging
227+
228+
See which domains are being allowed or blocked:
229+
230+
```bash
231+
sudo awf \
232+
--allow-domains github.com \
233+
--block-domains internal.github.com \
234+
--log-level debug \
235+
-- <command>
236+
```
237+
238+
### Check Squid logs
239+
240+
View traffic decisions after execution:
241+
242+
```bash
243+
# Find blocked requests
244+
sudo grep "TCP_DENIED" /tmp/squid-logs-*/access.log
245+
246+
# Find allowed requests
247+
sudo grep "TCP_TUNNEL" /tmp/squid-logs-*/access.log
248+
```
249+
250+
### Use the logs command
251+
252+
```bash
253+
# View recent traffic with formatting
254+
awf logs
255+
256+
# Filter to blocked requests only
257+
awf logs --format json | jq 'select(.isAllowed == false)'
258+
```
259+
260+
## See also
261+
262+
- [CLI Reference](/gh-aw-firewall/reference/cli-reference) - Complete option documentation
263+
- [Security Architecture](/gh-aw-firewall/reference/security-architecture) - How filtering works

docs-site/src/content/docs/index.md

Lines changed: 19 additions & 10 deletions
@@ -14,7 +14,7 @@ This project is part of GitHub Next's explorations of [Agentic Workflows](https:
1414
When AI agents like GitHub Copilot CLI run with access to tools and MCP servers, they can make network requests to any domain. This firewall provides **L7 (HTTP/HTTPS) egress control** using domain whitelisting, ensuring agents can only access approved domains while blocking all unauthorized network traffic.
1515

1616
**Key Capabilities:**
17-
- **Domain Whitelisting**: Allow only specific domains (automatically includes subdomains)
17+
- **Domain Allowlist & Blocklist**: Allow specific domains and block exceptions with wildcard pattern support
1818
- **Docker-in-Docker Enforcement**: Spawned containers inherit firewall restrictions
1919
- **Host-Level Protection**: Uses iptables DOCKER-USER chain for defense-in-depth
2020
- **Zero Trust**: Block all traffic by default, allow only what you explicitly permit
@@ -174,36 +174,42 @@ The firewall uses a containerized architecture with three security layers:
174174

175175
<div class="sl-steps">
176176

177-
1. **Understand Security**
177+
1. **Learn Domain Filtering**
178178

179-
Review the [Security Architecture](/gh-aw-firewall/reference/security-architecture/) to learn how the firewall protects against attacks.
179+
Master [allowlists, blocklists, and wildcards](/gh-aw-firewall/guides/domain-filtering/) for fine-grained network control.
180180

181-
2. **Read Full Documentation**
181+
2. **Understand Security**
182182

183-
Check the [README](https://github.com/githubnext/gh-aw-firewall#readme) for detailed usage examples and configuration options.
183+
Review the [Security Architecture](/gh-aw-firewall/reference/security-architecture/) to learn how the firewall protects against attacks.
184184

185-
3. **Debug Issues**
185+
3. **CLI Reference**
186186

187-
See the [troubleshooting guide](https://github.com/githubnext/gh-aw-firewall/blob/main/docs/troubleshooting.md) for common problems and solutions.
187+
See the [CLI Reference](/gh-aw-firewall/reference/cli-reference/) for all available options.
188188

189-
4. **Explore Examples**
189+
4. **Debug Issues**
190190

191-
Browse the [examples directory](https://github.com/githubnext/gh-aw-firewall/tree/main/examples) for real-world use cases.
191+
Check the [troubleshooting guide](https://github.com/githubnext/gh-aw-firewall/blob/main/docs/troubleshooting.md) for common problems and solutions.
192192

193193
</div>
194194

195195
## Key Features
196196

197197
### Domain Whitelisting
198198

199-
Domains automatically match all subdomains:
199+
Domains automatically match all subdomains. Use a blocklist for fine-grained control:
200200

201201
```bash
202202
# Whitelisting github.com allows:
203203
# ✓ github.com
204204
# ✓ api.github.com
205205
# ✓ raw.githubusercontent.com
206206
# ✗ example.com (not whitelisted)
207+
208+
# Block specific subdomains while allowing parent domain:
209+
sudo awf \
210+
--allow-domains example.com \
211+
--block-domains internal.example.com \
212+
-- curl https://api.example.com # ✓ allowed
207213
```
208214

209215
### Protocol-Specific Filtering
@@ -260,6 +266,9 @@ sudo awf --allow-domains github.com,arxiv.org,npmjs.org -- <command>
260266

261267
# From file
262268
sudo awf --allow-domains-file domains.txt -- <command>
269+
270+
# With blocklist for fine-grained control
271+
sudo awf --allow-domains '*.example.com' --block-domains 'internal.example.com' -- <command>
263272
```
264273

265274
## Architecture Highlights

docs-site/src/content/docs/reference/cli-reference.md

Lines changed: 1 addition & 3 deletions
@@ -367,7 +367,5 @@ awf logs summary --format pretty
367367

368368
## See Also
369369

370-
- [Quick Start Guide](/gh-aw-firewall/quickstart) - Getting started with examples
371-
- [Usage Guide](/gh-aw-firewall/usage) - Detailed usage patterns and examples
372-
- [Troubleshooting](/gh-aw-firewall/troubleshooting) - Common issues and solutions
370+
- [Domain Filtering Guide](/gh-aw-firewall/guides/domain-filtering) - Allowlists, blocklists, and wildcards
373371
- [Security Architecture](/gh-aw-firewall/reference/security-architecture) - How the firewall works internally

docs-site/src/content/docs/reference/security-architecture.md

Lines changed: 16 additions & 7 deletions
@@ -95,7 +95,13 @@ graph TB
9595

9696
**Container iptables (NAT table)** — Inside the agent container, NAT rules intercept outbound HTTP (port 80) and HTTPS (port 443) traffic, rewriting the destination to Squid at `172.30.0.10:3128`. This handles traffic from the agent process itself and any child processes (including stdio MCP servers).
9797

98-
**Squid ACL** — The primary control point. Squid receives CONNECT requests, extracts the target domain from SNI (for HTTPS) or Host header (for HTTP), and checks against the allowlist. Unlisted domains get `403 Forbidden`. No SSL inspection—we read SNI from the TLS ClientHello without decrypting traffic.
98+
**Squid ACL** — The primary control point. Squid receives CONNECT requests, extracts the target domain from SNI (for HTTPS) or Host header (for HTTP), and checks against the allowlist and blocklist. The evaluation order is:
99+
100+
1. **Blocklist check first**: If domain matches a blocked pattern, deny immediately
101+
2. **Allowlist check second**: If domain matches an allowed pattern, permit
102+
3. **Default deny**: All other domains get `403 Forbidden`
103+
104+
This allows fine-grained control like allowing `*.example.com` while blocking `internal.example.com`. No SSL inspection—we read SNI from the TLS ClientHello without decrypting traffic.
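
The evaluation order described above can be modeled in a few lines of shell. This is a sketch of the decision logic only, not Squid configuration or awf's implementation. The function names `matches_any` and `decide` are hypothetical, and the lists are passed as space-separated strings for brevity.

```bash
#!/bin/sh
# Sketch only: models blocklist-first, allowlist-second, default-deny.
set -f  # disable filename globbing so patterns like *.example.com stay literal

# matches_any HOST PATTERN... -> exit 0 if HOST matches any pattern.
matches_any() {
  host=$1
  shift
  for pat in "$@"; do
    case "$pat" in
      *'*'*) case "$host" in $pat) return 0 ;; esac ;;             # wildcard pattern
      *)     case "$host" in "$pat" | *".$pat") return 0 ;; esac ;; # plain domain
    esac
  done
  return 1
}

# decide HOST ALLOWLIST BLOCKLIST -> prints the decision for HOST.
decide() {
  host=$1
  allow=$2
  block=$3
  if matches_any "$host" $block; then      # 1. blocklist check first
    echo "403 (blocklist)"
  elif matches_any "$host" $allow; then    # 2. allowlist check second
    echo "allow"
  else                                     # 3. default deny
    echo "403 (default deny)"
  fi
}

decide api.example.com      '*.example.com' 'internal.example.com'
decide internal.example.com '*.example.com' 'internal.example.com'
decide example.org          '*.example.com' 'internal.example.com'
```

Checking the blocklist before the allowlist is what lets a narrow block entry override a broader allow pattern that also matches the same host.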
99105

100106
---
101107

@@ -117,9 +123,13 @@ sequenceDiagram
117123
NAT->>Squid: TCP to proxy port
118124
119125
Squid->>Squid: Parse CONNECT api.github.com:443
120-
Squid->>Squid: Check domain against ACL
126+
Squid->>Squid: Check blocklist first
127+
Squid->>Squid: Check allowlist second
121128
122-
alt api.github.com in allowlist
129+
alt Domain in blocklist
130+
Squid-->>Agent: HTTP 403 Forbidden
131+
Note over Agent: Blocked by blocklist
132+
else Domain in allowlist
123133
Squid->>Host: Outbound to api.github.com:443
124134
Note over Host: Source is Squid IP (172.30.0.10)<br/>→ ACCEPT (unrestricted)
125135
Host->>Net: TCP connection
@@ -128,7 +138,7 @@ sequenceDiagram
128138
Note over Agent,Net: End-to-end encrypted tunnel
129139
else Domain not in allowlist
130140
Squid-->>Agent: HTTP 403 Forbidden
131-
Note over Agent: Connection refused
141+
Note over Agent: Not in allowlist
132142
end
133143
```
134144

@@ -305,6 +315,5 @@ Use `sudo -E` to preserve environment variables (like `GITHUB_TOKEN`) through su
305315

306316
## Related Documentation
307317

308-
- [Architecture Overview](/reference/architecture) — Component details and code structure
309-
- [CLI Reference](/reference/cli-options) — Complete command-line options
310-
- [Logging](/guides/logging) — Audit trail configuration and analysis
318+
- [Domain Filtering](/gh-aw-firewall/guides/domain-filtering/) — Allowlists, blocklists, and wildcard patterns
319+
- [CLI Reference](/gh-aw-firewall/reference/cli-reference/) — Complete command-line options
