Commit dfd5209

Copilot and Mossaka authored
feat: add blocklist support for domain filtering (#114)
* Initial plan

* feat: add blocklist support for domain filtering

  Co-authored-by: Mossaka <5447827+Mossaka@users.noreply.github.com>

* docs: add blocklist documentation to cli reference and usage guide

  Co-authored-by: Mossaka <5447827+Mossaka@users.noreply.github.com>

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: Mossaka <5447827+Mossaka@users.noreply.github.com>
1 parent 7930bb8 commit dfd5209

7 files changed

Lines changed: 371 additions & 18 deletions

docs-site/src/content/docs/reference/cli-reference.md

Lines changed: 24 additions & 1 deletion
@@ -21,6 +21,8 @@ awf [options] -- <command>
 |--------|------|---------|-------------|
 | `--allow-domains <domains>` | string || Comma-separated list of allowed domains (required unless `--allow-domains-file` used) |
 | `--allow-domains-file <path>` | string || Path to file containing allowed domains |
+| `--block-domains <domains>` | string || Comma-separated list of blocked domains (takes precedence over allowed) |
+| `--block-domains-file <path>` | string || Path to file containing blocked domains |
 | `--log-level <level>` | string | `info` | Logging verbosity: `debug`, `info`, `warn`, `error` |
 | `--keep-containers` | flag | `false` | Keep containers running after command exits |
 | `--tty` | flag | `false` | Allocate pseudo-TTY for interactive tools |
@@ -40,10 +42,11 @@ awf [options] -- <command>
 
 ### `--allow-domains <domains>`
 
-Comma-separated list of allowed domains. Domains automatically match all subdomains.
+Comma-separated list of allowed domains. Domains automatically match all subdomains. Supports wildcard patterns.
 
 ```bash
 --allow-domains github.com,npmjs.org
+--allow-domains '*.github.com,api-*.example.com'
 ```
 
 ### `--allow-domains-file <path>`
@@ -54,6 +57,26 @@ Path to file with allowed domains. Supports comments (`#`) and one domain per li
 --allow-domains-file ./allowed-domains.txt
 ```
 
+### `--block-domains <domains>`
+
+Comma-separated list of blocked domains. **Blocked domains take precedence over allowed domains**, enabling fine-grained control. Supports the same wildcard patterns as `--allow-domains`.
+
+```bash
+# Block specific subdomain while allowing parent domain
+--allow-domains example.com --block-domains internal.example.com
+
+# Block with wildcards
+--allow-domains '*.example.com' --block-domains '*.secret.example.com'
+```
+
+### `--block-domains-file <path>`
+
+Path to file with blocked domains. Supports the same format as `--allow-domains-file`.
+
+```bash
+--block-domains-file ./blocked-domains.txt
+```
+
 ### `--log-level <level>`
 
 Set logging verbosity.

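The file format referenced above (one domain per line or comma-separated, with `#` comments) can be illustrated with a small parser. This is a minimal sketch, not the project's `parseDomainsFile`: the name `parseDomainList` is hypothetical, and treating `#` as starting a comment anywhere on a line is an assumption.

```typescript
// Hypothetical helper illustrating the documented domains-file format.
// Assumption: '#' starts a comment anywhere on a line, not only at column 0.
function parseDomainList(text: string): string[] {
  const domains: string[] = [];
  for (const rawLine of text.split(/\r?\n/)) {
    // Keep only the part of the line before the first '#'.
    const hashIndex = rawLine.indexOf('#');
    const line = hashIndex === -1 ? rawLine : rawLine.slice(0, hashIndex);
    // A line may hold several comma-separated entries.
    for (const entry of line.split(',')) {
      const domain = entry.trim();
      if (domain.length > 0) {
        domains.push(domain);
      }
    }
  }
  return domains;
}

const sampleFile = [
  '# Internal services that should never be accessed',
  'internal.example.com',
  'admin.example.com, ops.example.com',
  '',
  '*.sensitive.org',
].join('\n');

console.log(parseDomainList(sampleFile));
```

Blank lines and comment-only lines produce no entries, so the same file can be annotated freely without affecting the resulting list.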
docs/usage.md

Lines changed: 95 additions & 5 deletions
@@ -121,6 +121,34 @@ Domains automatically match all subdomains:
 sudo awf --allow-domains github.com "curl https://api.github.com" # ✓ works
 ```
 
+### Wildcard Patterns
+
+You can use wildcard patterns with `*` to match multiple domains:
+
+```bash
+# Match any subdomain of github.com
+--allow-domains '*.github.com'
+
+# Match api-v1.example.com, api-v2.example.com, etc.
+--allow-domains 'api-*.example.com'
+
+# Combine plain domains and wildcards
+--allow-domains 'github.com,*.googleapis.com,api-*.example.com'
+```
+
+**Pattern rules:**
+- `*` matches any characters (converted to regex `.*`)
+- Patterns are case-insensitive (DNS is case-insensitive)
+- Overly broad patterns like `*`, `*.*`, or `*.*.*` are rejected for security
+- Use quotes around patterns to prevent shell expansion
+
+**Examples:**
+| Pattern | Matches | Does Not Match |
+|---------|---------|----------------|
+| `*.github.com` | `api.github.com`, `raw.github.com` | `github.com` |
+| `api-*.example.com` | `api-v1.example.com`, `api-test.example.com` | `api.example.com` |
+| `github.com` | `github.com`, `api.github.com` | `notgithub.com` |
+
 ### Multiple Domains
 
 ```bash
@@ -155,17 +183,79 @@ For MCP servers:
 mcp.deepwiki.com
 ```
 
-## Limitations
+## Domain Blocklist
 
-### No Wildcard Syntax
+You can explicitly block specific domains using `--block-domains` and `--block-domains-file`. **Blocked domains take precedence over allowed domains**, enabling fine-grained control.
 
-Wildcards are not needed - subdomains match automatically:
+### Basic Blocklist Usage
 
 ```bash
---allow-domains '*.github.com' # ✗ syntax not supported
---allow-domains github.com # ✓ matches *.github.com automatically
+# Allow example.com but block internal.example.com
+sudo awf \
+  --allow-domains example.com \
+  --block-domains internal.example.com \
+  -- curl https://api.example.com # ✓ works
+
+sudo awf \
+  --allow-domains example.com \
+  --block-domains internal.example.com \
+  -- curl https://internal.example.com # ✗ blocked
 ```
 
+### Blocklist with Wildcards
+
+```bash
+# Allow all of example.com except any subdomain starting with "internal-"
+sudo awf \
+  --allow-domains example.com \
+  --block-domains 'internal-*.example.com' \
+  -- curl https://api.example.com # ✓ works
+
+# Block all subdomains matching the pattern
+sudo awf \
+  --allow-domains '*.example.com' \
+  --block-domains '*.secret.example.com' \
+  -- curl https://api.example.com # ✓ works
+```
+
+### Using a Blocklist File
+
+```bash
+# Create a blocklist file
+cat > blocked-domains.txt << 'EOF'
+# Internal services that should never be accessed
+internal.example.com
+admin.example.com
+
+# Block all subdomains of sensitive.org
+*.sensitive.org
+EOF
+
+# Use the blocklist file
+sudo awf \
+  --allow-domains example.com,sensitive.org \
+  --block-domains-file blocked-domains.txt \
+  -- curl https://api.example.com
+```
+
+**Combining flags:**
+```bash
+# You can combine all domain flags
+sudo awf \
+  --allow-domains github.com \
+  --allow-domains-file allowed.txt \
+  --block-domains internal.github.com \
+  --block-domains-file blocked.txt \
+  -- your-command
+```
+
+**Use cases:**
+- Allow a broad domain (e.g., `*.example.com`) but block specific sensitive subdomains
+- Block known bad domains while allowing a curated list
+- Prevent access to internal services from AI agents
+
+## Limitations
+
 ### No Internationalized Domains
 
 Use punycode instead:

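The pattern rules and the blocklist-precedence behaviour documented in `docs/usage.md` can be modelled in a few lines. The sketch below is illustrative only: in the tool itself the filtering is carried out by the generated Squid configuration, and the helper names (`wildcardToRegex`, `matchesEntry`, `isPermitted`) are hypothetical. It assumes "overly broad" means a pattern made up solely of `*` and `.` characters.

```typescript
// Hypothetical model of the documented matching rules; not the project's code.

// Assumption: a pattern consisting only of '*' and '.' counts as overly broad.
function assertNotTooBroad(pattern: string): void {
  if (/^[*.]+$/.test(pattern)) {
    throw new Error(`Pattern '${pattern}' is too broad`);
  }
}

// Escape regex metacharacters, then turn each '*' into '.*'.
function wildcardToRegex(pattern: string): RegExp {
  assertNotTooBroad(pattern);
  const escaped = pattern
    .replace(/[.+?^${}()|[\]\\]/g, '\\$&')
    .replace(/\*/g, '.*');
  return new RegExp(`^${escaped}$`, 'i'); // 'i': matching is case-insensitive
}

function matchesEntry(host: string, entry: string): boolean {
  if (entry.includes('*')) {
    return wildcardToRegex(entry).test(host);
  }
  // Plain domains match themselves and any subdomain.
  const h = host.toLowerCase();
  const e = entry.toLowerCase();
  return h === e || h.endsWith(`.${e}`);
}

// Blocked entries are checked first, so they take precedence over allowed ones.
function isPermitted(host: string, allowed: string[], blocked: string[] = []): boolean {
  if (blocked.some((entry) => matchesEntry(host, entry))) {
    return false;
  }
  return allowed.some((entry) => matchesEntry(host, entry));
}

console.log(matchesEntry('api.github.com', '*.github.com')); // true
console.log(matchesEntry('github.com', '*.github.com')); // false
console.log(matchesEntry('api.example.com', 'api-*.example.com')); // false
console.log(matchesEntry('notgithub.com', 'github.com')); // false
console.log(isPermitted('api.example.com', ['example.com'], ['internal.example.com'])); // true
console.log(isPermitted('internal.example.com', ['example.com'], ['internal.example.com'])); // false
```

Because `*` becomes `.*`, a pattern such as `*.github.com` also matches deeper names like `a.b.github.com` in this model, which is consistent with the "matches any characters" rule.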
src/cli.ts

Lines changed: 44 additions & 0 deletions
@@ -309,6 +309,14 @@ program
     '--allow-domains-file <path>',
     'Path to file containing allowed domains (one per line or comma-separated, supports # comments)'
   )
+  .option(
+    '--block-domains <domains>',
+    'Comma-separated list of blocked domains (takes precedence over allowed domains). Supports wildcards.'
+  )
+  .option(
+    '--block-domains-file <path>',
+    'Path to file containing blocked domains (one per line or comma-separated, supports # comments)'
+  )
   .option(
     '--log-level <level>',
     'Log level: debug, info, warn, error',
@@ -457,6 +465,38 @@ program
       }
     }
 
+    // Parse blocked domains from both --block-domains flag and --block-domains-file
+    let blockedDomains: string[] = [];
+
+    // Parse blocked domains from command-line flag if provided
+    if (options.blockDomains) {
+      blockedDomains = parseDomains(options.blockDomains);
+    }
+
+    // Parse blocked domains from file if provided
+    if (options.blockDomainsFile) {
+      try {
+        const fileBlockedDomainsArray = parseDomainsFile(options.blockDomainsFile);
+        blockedDomains.push(...fileBlockedDomainsArray);
+      } catch (error) {
+        logger.error(`Failed to read blocked domains file: ${error instanceof Error ? error.message : error}`);
+        process.exit(1);
+      }
+    }
+
+    // Remove duplicates from blocked domains
+    blockedDomains = [...new Set(blockedDomains)];
+
+    // Validate all blocked domains and patterns
+    for (const domain of blockedDomains) {
+      try {
+        validateDomainOrPattern(domain);
+      } catch (error) {
+        logger.error(`Invalid blocked domain or pattern: ${error instanceof Error ? error.message : error}`);
+        process.exit(1);
+      }
+    }
+
     // Parse additional environment variables from --env flags
     let additionalEnv: Record<string, string> = {};
     if (options.env && Array.isArray(options.env)) {
@@ -492,6 +532,7 @@ program
 
     const config: WrapperConfig = {
       allowedDomains,
+      blockedDomains: blockedDomains.length > 0 ? blockedDomains : undefined,
       agentCommand,
       logLevel,
       keepContainers: options.keepContainers,
@@ -521,6 +562,9 @@ program
     };
     logger.debug('Configuration:', JSON.stringify(redactedConfig, null, 2));
     logger.info(`Allowed domains: ${allowedDomains.join(', ')}`);
+    if (blockedDomains.length > 0) {
+      logger.info(`Blocked domains: ${blockedDomains.join(', ')}`);
+    }
     logger.debug(`DNS servers: ${dnsServers.join(', ')}`);
 
     let exitCode = 0;

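The merge, de-duplicate, and validate sequence in `src/cli.ts` can be isolated into a standalone function for experimentation. This is a sketch under stated assumptions: `collectBlockedDomains` and `rejectBareWildcard` are hypothetical names, and the validator is a stand-in for `validateDomainOrPattern`, whose body is not part of this diff.

```typescript
// Hypothetical stand-in types and helpers; not taken from the project.
type Validator = (domain: string) => void;

function collectBlockedDomains(
  fromFlag: string[],
  fromFile: string[],
  validate: Validator,
): string[] | undefined {
  // A Set keeps the first occurrence of each entry, in insertion order.
  const merged = Array.from(new Set([...fromFlag, ...fromFile]));
  for (const domain of merged) {
    validate(domain); // expected to throw on an invalid entry
  }
  // Mirror the config construction in the diff: an empty list becomes undefined.
  return merged.length > 0 ? merged : undefined;
}

// Stand-in validator that only rejects the bare '*' pattern.
const rejectBareWildcard: Validator = (domain) => {
  if (domain === '*') {
    throw new Error(`Invalid blocked domain or pattern: ${domain}`);
  }
};

console.log(
  collectBlockedDomains(
    ['internal.github.com', 'admin.github.com'],
    ['admin.github.com', '*.secret.example.com'],
    rejectBareWildcard,
  ),
);
```

Note that `Set` de-duplicates by exact string equality, so entries that differ only in case would both survive this step.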
src/docker-manager.ts

Lines changed: 1 addition & 0 deletions
@@ -438,6 +438,7 @@ export async function writeConfigs(config: WrapperConfig): Promise<void> {
   // Write Squid config
   const squidConfig = generateSquidConfig({
     domains: config.allowedDomains,
+    blockedDomains: config.blockedDomains,
     port: SQUID_PORT,
   });
   const squidConfigPath = path.join(config.workDir, 'squid.conf');

src/squid-config.test.ts

Lines changed: 114 additions & 0 deletions
@@ -692,4 +692,118 @@ describe('generateSquidConfig', () => {
       expect(result).toContain('# ACL definitions for allowed domain patterns');
     });
   });
+
+  describe('Blocklist Support', () => {
+    it('should generate blocked domain ACL for plain domain', () => {
+      const config: SquidConfig = {
+        domains: ['github.com'],
+        blockedDomains: ['internal.github.com'],
+        port: defaultPort,
+      };
+      const result = generateSquidConfig(config);
+      expect(result).toContain('acl blocked_domains dstdomain .internal.github.com');
+      expect(result).toContain('http_access deny blocked_domains');
+    });
+
+    it('should generate blocked domain ACL for wildcard pattern', () => {
+      const config: SquidConfig = {
+        domains: ['example.com'],
+        blockedDomains: ['*.internal.example.com'],
+        port: defaultPort,
+      };
+      const result = generateSquidConfig(config);
+      expect(result).toContain('acl blocked_domains_regex dstdom_regex -i');
+      expect(result).toContain('^.*\\.internal\\.example\\.com$');
+      expect(result).toContain('http_access deny blocked_domains_regex');
+    });
+
+    it('should handle both plain and wildcard blocked domains', () => {
+      const config: SquidConfig = {
+        domains: ['example.com'],
+        blockedDomains: ['internal.example.com', '*.secret.example.com'],
+        port: defaultPort,
+      };
+      const result = generateSquidConfig(config);
+      expect(result).toContain('acl blocked_domains dstdomain .internal.example.com');
+      expect(result).toContain('acl blocked_domains_regex dstdom_regex -i');
+      expect(result).toContain('http_access deny blocked_domains');
+      expect(result).toContain('http_access deny blocked_domains_regex');
+    });
+
+    it('should place blocked domains deny rule before allowed domains deny rule', () => {
+      const config: SquidConfig = {
+        domains: ['github.com'],
+        blockedDomains: ['internal.github.com'],
+        port: defaultPort,
+      };
+      const result = generateSquidConfig(config);
+      const blockRuleIndex = result.indexOf('http_access deny blocked_domains');
+      const allowRuleIndex = result.indexOf('http_access deny !allowed_domains');
+      expect(blockRuleIndex).toBeLessThan(allowRuleIndex);
+    });
+
+    it('should include blocklist comment section', () => {
+      const config: SquidConfig = {
+        domains: ['github.com'],
+        blockedDomains: ['internal.github.com'],
+        port: defaultPort,
+      };
+      const result = generateSquidConfig(config);
+      expect(result).toContain('# ACL definitions for blocked domains');
+      expect(result).toContain('# Deny requests to blocked domains (blocklist takes precedence)');
+    });
+
+    it('should work without blocklist (backward compatibility)', () => {
+      const config: SquidConfig = {
+        domains: ['github.com'],
+        port: defaultPort,
+      };
+      const result = generateSquidConfig(config);
+      expect(result).not.toContain('blocked_domains');
+      expect(result).toContain('acl allowed_domains dstdomain .github.com');
+    });
+
+    it('should work with empty blocklist', () => {
+      const config: SquidConfig = {
+        domains: ['github.com'],
+        blockedDomains: [],
+        port: defaultPort,
+      };
+      const result = generateSquidConfig(config);
+      expect(result).not.toContain('blocked_domains');
+      expect(result).toContain('acl allowed_domains dstdomain .github.com');
+    });
+
+    it('should normalize blocked domains (remove protocol)', () => {
+      const config: SquidConfig = {
+        domains: ['github.com'],
+        blockedDomains: ['https://internal.github.com'],
+        port: defaultPort,
+      };
+      const result = generateSquidConfig(config);
+      expect(result).toContain('acl blocked_domains dstdomain .internal.github.com');
+      expect(result).not.toContain('https://');
+    });
+
+    it('should handle multiple blocked domains', () => {
+      const config: SquidConfig = {
+        domains: ['example.com'],
+        blockedDomains: ['internal.example.com', 'secret.example.com', 'admin.example.com'],
+        port: defaultPort,
+      };
+      const result = generateSquidConfig(config);
+      expect(result).toContain('acl blocked_domains dstdomain .internal.example.com');
+      expect(result).toContain('acl blocked_domains dstdomain .secret.example.com');
+      expect(result).toContain('acl blocked_domains dstdomain .admin.example.com');
+    });
+
+    it('should throw error for invalid blocked domain pattern', () => {
+      const config: SquidConfig = {
+        domains: ['github.com'],
+        blockedDomains: ['*'],
+        port: defaultPort,
+      };
+      expect(() => generateSquidConfig(config)).toThrow();
+    });
+  });
 });

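The assertions above pin down the shape of the generated blocklist rules without showing the generator itself, since `src/squid-config.ts` is not among the files displayed here. The following is a minimal sketch of a function that would emit lines satisfying those assertions. `blocklistRules` and `normalizeDomain` are hypothetical names, and the "too broad" check (patterns made only of `*` and `.`) is an assumption rather than the project's actual validation.

```typescript
// A sketch only; the project's generateSquidConfig may be structured differently.

// Strip a leading http:// or https:// and any trailing slashes.
function normalizeDomain(domain: string): string {
  return domain.replace(/^https?:\/\//i, '').replace(/\/+$/, '');
}

function blocklistRules(blockedDomains: string[] = []): string[] {
  const plain: string[] = [];
  const regexes: string[] = [];

  for (const raw of blockedDomains) {
    const domain = normalizeDomain(raw);
    // Assumption: patterns made only of '*' and '.' are rejected as too broad.
    if (/^[*.]+$/.test(domain)) {
      throw new Error(`Pattern '${domain}' is too broad`);
    }
    if (domain.includes('*')) {
      // Escape regex metacharacters, then expand '*' to '.*'.
      const escaped = domain
        .replace(/[.+?^${}()|[\]\\]/g, '\\$&')
        .replace(/\*/g, '.*');
      regexes.push(`^${escaped}$`);
    } else {
      plain.push(domain);
    }
  }

  if (plain.length === 0 && regexes.length === 0) {
    return []; // no blocklist: emit nothing, keeping older configs unchanged
  }

  const lines: string[] = ['# ACL definitions for blocked domains'];
  for (const d of plain) {
    lines.push(`acl blocked_domains dstdomain .${d}`);
  }
  for (const r of regexes) {
    lines.push(`acl blocked_domains_regex dstdom_regex -i ${r}`);
  }
  lines.push('# Deny requests to blocked domains (blocklist takes precedence)');
  if (plain.length > 0) {
    lines.push('http_access deny blocked_domains');
  }
  if (regexes.length > 0) {
    lines.push('http_access deny blocked_domains_regex');
  }
  return lines;
}

console.log(
  blocklistRules(['https://internal.example.com', '*.secret.example.com']).join('\n'),
);
```

Squid evaluates `http_access` rules in order and stops at the first match, so emitting these deny lines ahead of `http_access deny !allowed_domains` is what lets a blocked entry override an allowed parent domain.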