
T8963: policy-route: trigger domain resolver for domain groups #5254

Open
jd82k wants to merge 1 commit into vyos:rolling from jd82k:domain-group

Conversation

@jd82k

@jd82k jd82k commented Jun 3, 2026

Contributor

Change summary

Fix policy route domain-group handling so vyos-domain-resolver is triggered after PBR nftables sets are recreated.

Previously, PBR rules could reference firewall domain-group sets in ip vyos_mangle, but policy_route.py did not notify vyos-domain-resolver. As a result, D_* sets were created empty and only populated on the resolver’s next periodic interval, causing PBR rules using domain groups to be ineffective for several minutes after commit.

This change adds policy-route resolver usage tracking and restarts/stops vyos-domain-resolver.service consistently with the existing firewall/NAT behavior. A smoketest was added to verify that a PBR rule using a domain group gets its ip vyos_mangle set populated promptly.

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Code style update (formatting, renaming)
  • Refactoring (no functional changes)
  • Migration from an old Vyatta component to vyos-1x, please link to related PR inside obsoleted component
  • Other (please describe):

Related Task(s)

https://vyos.dev/T8963

Related PR(s)

How to test / Smoketest result

vyos@vyos:~$ sudo /usr/libexec/vyos/tests/smoke/cli/test_policy_route.py TestPolicyRoute.test_pbr_domain_group
test_pbr_domain_group (__main__.TestPolicyRoute.test_pbr_domain_group) ... ok

----------------------------------------------------------------------
Ran 1 test in 33.669s

OK

Checklist:

  • I have read the CONTRIBUTING document
  • I have linked this PR to one or more Phabricator Task(s)
  • I have run the components SMOKETESTS if applicable
  • My commit headlines contain a valid Task id
  • My change requires a change to the documentation
  • I have updated the documentation accordingly

@coderabbitai

coderabbitai Bot commented Jun 3, 2026


Review Change Stack

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Repository YAML (base), Central YAML (inherited), Organization UI (inherited)

Review profile: CHILL

Plan: Pro

Run ID: b82ef809-e47f-4e3c-aace-df74ce88d630

📥 Commits

Reviewing files that changed from the base of the PR and between cd4daf6 and 069959a.

📒 Files selected for processing (2)
  • smoketest/scripts/cli/test_policy_route.py
  • src/conf_mode/policy_route.py
🚧 Files skipped from review as they are similar to previous changes (2)
  • smoketest/scripts/cli/test_policy_route.py
  • src/conf_mode/policy_route.py
📜 Recent review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: build_iso
  • GitHub Check: Mergify Merge Protections
  • GitHub Check: Summary
🧰 Additional context used
🔍 Remote MCP

Relevant Context for Pull Request #5254 Review

Based on the search results, here is the relevant technical context for reviewing this PR:

Background on the Issue Being Fixed

Domain-group matching was already exposed in route policy but did not work: the rule referenced the domain group, yet the named set was defined only in the vyos_filter table and not in the mangle table. This is the underlying issue that task T8963 addresses.

Domain-Group and Policy Route Configuration

Policy route rules support matching based on source or destination groups, including domain-groups. Firewall groups represent collections of IP addresses, networks, ports, MAC addresses, domains, or interfaces, and can be referenced in firewall, NAT, and policy route rules.

Domain Resolver Integration Pattern

The PR implements a pattern consistent with existing VyOS architecture. Previous work on firewall domain groups refactored the domain resolver daemon and enabled/updated domain groups in the policy route and NAT tables, establishing vyos-domain-resolver service coordination as the appropriate mechanism for handling domain-group resolution across multiple VyOS subsystems.

Key Review Points

  1. Service Coordination: The PR adds marker file-based tracking (/run/use-vyos-domain-resolver-policy-route) to coordinate the vyos-domain-resolver service, following the established pattern for firewall and NAT implementations.

  2. Timing Issue Resolution: Previously, domain-group sets were created empty and populated only through the resolver's periodic interval, causing the delay described in the PR objectives. The PR restarts the resolver immediately after PBR table recreation to ensure prompt population.

  3. Smoke Test Coverage: The new test test_pbr_domain_group validates the complete flow: static host mapping → domain-group creation → PBR rule configuration → nftables set population, confirming immediate effectiveness of domain-group based PBR rules.



📝 Walkthrough

Summary by CodeRabbit

  • New Features

    • Policy-based routing now supports domain groups for routing by domain and automatically manages the domain-resolver service when domain groups are in use.
  • Tests

    • New smoke test verifies domain-group policy-route behavior end-to-end, including resolver-driven IP resolution, routing decision application, and cleanup after test.

Walkthrough

Policy-route now detects destination domain_group usage and manages the vyos-domain-resolver systemd lifecycle via a /run marker. apply() invokes the update step after table marks. Smoketests extend teardown, import run, and add test_pbr_domain_group that polls the nft resolver set and verifies PBR rule generation.
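
The detection step described in the walkthrough can be sketched roughly as follows. This is not the code from the PR: the function name `uses_domain_group`, the nested dict layout (`route`/`route6` -> route name -> `rule` -> rule id) and the `disable` key are assumptions made for illustration, and the sketch checks both source and destination matchers.

```python
def uses_domain_group(policy: dict) -> bool:
    """Return True if any enabled PBR rule references a domain group.

    Hypothetical helper; the config dict layout is an assumption.
    """
    for family in ('route', 'route6'):
        for route_conf in policy.get(family, {}).values():
            for rule_conf in route_conf.get('rule', {}).values():
                if 'disable' in rule_conf:
                    continue  # skip rules marked as disabled
                for side in ('source', 'destination'):
                    group = rule_conf.get(side, {}).get('group', {})
                    if 'domain_group' in group:
                        return True
    return False


# Hypothetical config dict: rule 10 is enabled, rule 20 is disabled.
example_policy = {
    'route': {
        'smoketest': {
            'rule': {
                '10': {
                    'destination': {'group': {'domain_group': 'smoketest_domain'}},
                    'set': {'mark': '100'},
                },
                '20': {
                    'disable': {},
                    'source': {'group': {'domain_group': 'unused_domain'}},
                },
            }
        }
    }
}

print(uses_domain_group(example_policy))
```

Returning on the first match keeps the scan cheap, since the caller only needs a yes/no answer to decide whether the resolver is required.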

Changes

Policy Route Domain Group Support

| Layer / File(s) | Summary |
| --- | --- |
| Domain resolver service lifecycle management: src/conf_mode/policy_route.py (import glob ~L19; write_file, call imports ~L29-30; constants ~L44-45; logic ~L195-230) | Adds glob/write_file/call imports, defines domain_resolver_usage = '/run/use-vyos-domain-resolver-policy-route' and a glob, implements domain_group_used(policy) to scan enabled policy/route and policy/route6 for domain_group, and update_domain_resolver(policy) to write/remove the marker and run systemctl restart or conditional systemctl stop. |
| Domain resolver integration in apply flow: src/conf_mode/policy_route.py (apply wiring ~L284-285) | Calls update_domain_resolver(policy) immediately after apply_table_marks(policy) and before GeoIP refresh/update handling. |
| Domain group smoke tests: smoketest/scripts/cli/test_policy_route.py (import ~L21-22; tearDown edits ~L58-66; test ~L102-141) | Adds run import (~L21-22), extends TestPolicyRoute.tearDown() to remove smoketest_domain firewall domain-group and the pbr.example.com static-host-mapping (commit), and adds test_pbr_domain_group() which creates the static-host mapping and domain-group, installs a PBR destination-match rule that sets mark, polls nft get element until the domain-derived IP appears in D_smoketest_domain, and asserts the nftables jump, set membership/element, destination match against @D_smoketest_domain, and meta mark set. |
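
The polling step in the smoke test can be illustrated with a small retry helper. This is a sketch under assumptions, not the test from the PR: `wait_until` and `set_contains` are hypothetical names, the timeout and interval values are arbitrary, and the `nft get element` invocation assumes the set lives in table `ip vyos_mangle` as described above.

```python
import subprocess
import time
from typing import Callable


def wait_until(check: Callable[[], bool], timeout: float = 30.0,
               interval: float = 1.0) -> bool:
    """Call check() repeatedly until it returns True or the timeout expires."""
    deadline = time.monotonic() + timeout
    while True:
        if check():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


def set_contains(set_name: str, address: str) -> bool:
    """Assumed check: `nft get element` exits 0 when the element is present."""
    result = subprocess.run(
        ['nft', 'get', 'element', 'ip', 'vyos_mangle', set_name,
         f'{{ {address} }}'],
        capture_output=True,
    )
    return result.returncode == 0


# Stand-in for the resolver: the "set" becomes populated on the third poll.
polls = {'count': 0}

def fake_check() -> bool:
    polls['count'] += 1
    return polls['count'] >= 3

print(wait_until(fake_check, timeout=5.0, interval=0.01))
```

On a router, the check would be something like `lambda: set_contains('D_smoketest_domain', '192.0.2.1')`, where the address is whatever the static host mapping resolves to. Using `time.monotonic()` keeps the deadline immune to wall-clock adjustments during the test.
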
🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | Title clearly and concisely summarizes the main change: fixing policy-route to trigger domain resolver for domain groups. |
| Description check | ✅ Passed | Description details the bug fix, root cause, solution approach, and includes smoketest results directly related to the changeset. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


@github-actions

github-actions Bot commented Jun 3, 2026


✅ No typos found in changed files.

@mergify mergify Bot added the rolling label Jun 3, 2026
@mergify mergify Bot assigned jd82k Jun 3, 2026

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

🧹 Nitpick comments (1)
src/conf_mode/policy_route.py (1)

213-228: 💤 Low value

Logic check on resolver lifecycle: OK.

glob('/run/use-vyos-domain-resolver*') runs after the policy-route marker is unlinked, so the stop decision correctly ignores this script's own marker and respects other consumers (firewall/NAT). Restart-on-every-commit when a domain group is present matches existing firewall/NAT behavior.

Note: unconditional restart on each commit briefly drops resolved sets for all consumers mid-resolution. Acceptable for parity, but a reload/no-op-when-unchanged path would be gentler if the service supports it.
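
The ordering argument above (unlink this script's marker first, then glob for remaining markers) can be sketched as a small decision function. This is an illustration modeled on the behavior the review describes, not the PR's code: `resolver_action` is a hypothetical name, the marker text is made up, and the demo runs in a temporary directory instead of /run and does not call systemctl.

```python
import tempfile
from glob import glob
from pathlib import Path


def resolver_action(marker: Path, marker_glob: str, group_used: bool) -> str:
    """Decide which systemctl action to apply to the resolver service.

    'restart' is the default; 'stop' is chosen only when this consumer no
    longer needs the resolver and no other consumer marker remains.
    """
    action = 'restart'
    if group_used:
        marker.write_text('# Automatically generated marker (illustrative)\n')
    else:
        marker.unlink(missing_ok=True)
        # Glob only after unlinking, so our own marker cannot keep the
        # service alive; markers from other consumers still count.
        if not glob(marker_glob):
            action = 'stop'
    return action


with tempfile.TemporaryDirectory() as run_dir:
    marker = Path(run_dir) / 'use-vyos-domain-resolver-policy-route'
    pattern = str(Path(run_dir) / 'use-vyos-domain-resolver*')

    first = resolver_action(marker, pattern, group_used=True)
    second = resolver_action(marker, pattern, group_used=False)

    # Simulate another consumer (file name is an assumption).
    other = Path(run_dir) / 'use-vyos-domain-resolver-firewall'
    other.write_text('in use\n')
    third = resolver_action(marker, pattern, group_used=False)

print(first, second, third)
```

If the glob ran before the unlink, the policy-route marker would always match its own pattern and the service would never be stopped by this script.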

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/conf_mode/policy_route.py` around lines 213 - 228, The
update_domain_resolver function currently unconditionally sets domain_action =
'restart' (when domain_group_used(policy) or marker exists) which briefly drops
resolved sets; change the invoked action to a gentler option by using systemctl
reload-or-restart (or reload if the service supports a true reload) instead of
plain restart: update the logic in update_domain_resolver so that when you would
set domain_action = 'restart' you set domain_action = 'reload-or-restart' (or
'reload' if you verify vyos-domain-resolver supports it), then call
call(f'systemctl {domain_action} vyos-domain-resolver.service'); keep references
to domain_resolver_usage and domain_group_used(policy) intact so other consumers
are respected.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/conf_mode/policy_route.py`:
- Around line 217-218: Two f-strings that have no interpolation should be plain
strings: change the assignment to variable text (the string passed to
write_file(domain_resolver_usage, text)) from f'# Automatically generated by
policy_route.py\nThis file ...\n' to a regular string without the f-prefix, and
similarly change the ConfigError raise in the ConfigError(...) call (raise
ConfigError(f'Cannot match a tcp flag as set and not set')) to remove the
f-prefix so it reads raise ConfigError('Cannot match a tcp flag as set and not
set'); this removes the unnecessary f-prefixes flagged by ruff F541.

---

Nitpick comments:
In `@src/conf_mode/policy_route.py`:
- Around line 213-228: The update_domain_resolver function currently
unconditionally sets domain_action = 'restart' (when domain_group_used(policy)
or marker exists) which briefly drops resolved sets; change the invoked action
to a gentler option by using systemctl reload-or-restart (or reload if the
service supports a true reload) instead of plain restart: update the logic in
update_domain_resolver so that when you would set domain_action = 'restart' you
set domain_action = 'reload-or-restart' (or 'reload' if you verify
vyos-domain-resolver supports it), then call call(f'systemctl {domain_action}
vyos-domain-resolver.service'); keep references to domain_resolver_usage and
domain_group_used(policy) intact so other consumers are respected.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository YAML (base), Central YAML (inherited), Organization UI (inherited)

Review profile: CHILL

Plan: Pro

Run ID: 2154e174-54bd-4c06-b987-aec6f83d2a49

📥 Commits

Reviewing files that changed from the base of the PR and between 88dfc8c and 4fa7194.

📒 Files selected for processing (2)
  • smoketest/scripts/cli/test_policy_route.py
  • src/conf_mode/policy_route.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: build_iso
  • GitHub Check: Mergify Merge Protections
  • GitHub Check: Summary
🧰 Additional context used
📓 Path-based instructions (3)
**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

**/*.py: Python code must target Python 3.11 or higher
Prefer storing Jinja2 templates as discrete files under data/templates/ rather than inline Python strings
Use ruff (version 0.6.4), darker, pylint W0611, and Jinja2 lint for linting; configuration via ruff.toml and nose2.cfg at repository root

Files:

  • smoketest/scripts/cli/test_policy_route.py
  • src/conf_mode/policy_route.py
smoketest/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

smoketest/**/*.py: Use nose2 for running tests with configuration in nose2.cfg
Runtime smoketests must be located under smoketest/ and are used by vyos-build when assembling and testing ISO images

Files:

  • smoketest/scripts/cli/test_policy_route.py
src/conf_mode/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

Conf-mode entry-point scripts must be named after CLI components and placed in src/conf_mode/

Files:

  • src/conf_mode/policy_route.py
🧠 Learnings (3)
📚 Learning: 2026-06-01T00:03:28.710Z
Learnt from: CR
Repo: vyos/vyos-1x PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-06-01T00:03:28.710Z
Learning: Applies to smoketest/**/*.py : Runtime smoketests must be located under `smoketest/` and are used by vyos-build when assembling and testing ISO images

Applied to files:

  • smoketest/scripts/cli/test_policy_route.py
📚 Learning: 2026-05-26T06:04:29.163Z
Learnt from: c-po
Repo: vyos/vyos-1x PR: 5109
File: smoketest/scripts/cli/test_service_https.py:118-120
Timestamp: 2026-05-26T06:04:29.163Z
Learning: In VyOS smoketest scripts under `smoketest/scripts/cli/`, it is intentional to call `self.cli_delete(['vrf'])` in both `setUpClass` and `tearDown` to wipe the entire VRF subtree and ensure a clean slate. During code review, do not recommend narrowing the delete to specific VRF identifiers or name subsets (e.g., `['vrf', 'name', 'mgmt']`)—the broad teardown behavior is the established project-wide pattern for these tests.

Applied to files:

  • smoketest/scripts/cli/test_policy_route.py
📚 Learning: 2026-05-26T06:03:59.703Z
Learnt from: c-po
Repo: vyos/vyos-1x PR: 5109
File: smoketest/scripts/cli/test_service_https.py:206-207
Timestamp: 2026-05-26T06:03:59.703Z
Learning: In VyOS smoketests that verify processes running inside a VRF using iproute2, remember that `ip vrf pids <vrf>` outputs one entry per line as `<pid> <process_name>` (e.g., `300431 nginx`), not PIDs alone. Therefore, assertions should check for the presence of the expected process name in the command output (e.g., `assertIn(PROCESS_NAME, cmd(f'ip vrf pids {vrf}'))`) rather than trying to match PID-only output.

Applied to files:

  • smoketest/scripts/cli/test_policy_route.py
🔍 Remote MCP

Summary of Relevant PR Context

Based on the research performed, here is the relevant context for effectively reviewing PR #5254:

Problem Context

The source/destination domain-group matcher is available in route policy, but it wasn't working—domain-group usage was written in the rule, but the named_set was not defined in the mangle table (it was only written in the vyos_filter table).

Domain-Group Architecture

Firewall groups can be referenced in firewall, NAT, and policy route rules as either a source or destination matcher. The domain resolver runs as a system service (vyos-domain-resolver.service), periodically resolves configured domain names to IP addresses, updates nftables sets with the resolved IP addresses, and is automatically started/stopped based on firewall configuration.

How Domain-Groups Work in VyOS Firewall

Domain groups allow filtering addresses by domain name, with resolved addresses stored as named "nft sets" used in nftables rules. The resolver is a systemd daemon (vyos-domain-group-resolve.service) that re-resolves domain-group addresses every 300 seconds.

The Root Issue This PR Fixes

The PR addresses the gap where policy-route configuration with domain-groups did not trigger vyos-domain-resolver, meaning the nftables sets (D_* sets) were created empty and populated only on the resolver's next periodic interval (300-second timeout), causing PBR rules using domain groups to be ineffective for several minutes after commit.

Precedent Pattern

Previous work established using named sets in nftables instead of anonymous sets for firewall groups, and refactored nftables cleanup code. The PR extends this pattern to policy-route domain-group handling by aligning it with existing firewall/NAT behavior.

Test Coverage

The included smoketest verifies that a PBR rule using a domain group gets its ip vyos_mangle set populated promptly (passing in 33.669s), which is critical validation that the domain resolver is properly triggered on commit.

🔇 Additional comments (6)
src/conf_mode/policy_route.py (3)

19-19: LGTM!

Also applies to: 29-30


194-211: LGTM!


283-283: LGTM!

smoketest/scripts/cli/test_policy_route.py (3)

19-23: LGTM!

Also applies to: 60-61


82-91: LGTM!


114-140: LGTM!

Comment thread src/conf_mode/policy_route.py Outdated

Copilot AI left a comment

Contributor

Pull request overview

Ensures policy-based routing (PBR) commits that reference firewall domain-group sets promptly trigger vyos-domain-resolver, so the D_* nftables sets used in ip vyos_mangle are populated immediately rather than waiting for the resolver’s periodic refresh.

Changes:

  • Add resolver-usage tracking to policy_route.py and restart/stop vyos-domain-resolver.service based on whether PBR rules reference domain-group.
  • Add a smoketest that configures a PBR rule using a domain group and waits for the corresponding D_* set to be populated.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| src/conf_mode/policy_route.py | Detects PBR domain-group usage and restarts/stops vyos-domain-resolver.service via /run/use-vyos-domain-resolver-policy-route. |
| smoketest/scripts/cli/test_policy_route.py | Adds a smoketest for PBR + domain-group and waits for resolver-populated nftables set elements. |

Comment thread smoketest/scripts/cli/test_policy_route.py
@jd82k jd82k force-pushed the domain-group branch 2 times, most recently from 4b2558d to cd4daf6 Compare June 4, 2026 16:15

@sarthurdev sarthurdev left a comment

Member

Triggers resolver restart on policy change. Smoketest added/passes, tested locally.

Comment thread smoketest/scripts/cli/test_policy_route.py Outdated
Comment thread src/conf_mode/policy_route.py Outdated
Signed-off-by: Miaosen Wang <secretandanon@gmail.com>
@jd82k jd82k requested a review from c-po June 8, 2026 18:58
@github-actions

github-actions Bot commented Jun 8, 2026


CI integration ❌ failed!

Details

CI logs

  • CLI Smoketests 👍 passed
  • CLI Smoketests (interfaces only) ❌ failed
  • Config tests 👍 passed
  • RAID1 tests 👍 passed
  • CLI Smoketests VPP 👍 passed
  • Config tests VPP 👍 passed
  • TPM tests 👍 passed


4 participants