Make fedify bench safety explicit for staging targets#795
Conversation
Resolve public-looking benchmark targets through DNS before applying the safety gate, and keep unresolved or mixed-public answers in the public tier. Require unsafe public-target overrides to name the target on the command line and to set load and duration explicitly. Make dry-run mode perform discovery planning for runnable scenarios while still avoiding benchmark load, so inbox plans show the resolved actor and inbox destination before a real run. fedify-dev#744 fedify-dev#784 Assisted-by: Codex:gpt-5.5
Move the local benchmark-mode Fedify target used by fedify bench integration tests into test/bench. This gives the scenario tests a shared fixture app that exercises WebFinger, actor discovery, signature verification, and inbox handling through the same benchmark target. fedify-dev#744 fedify-dev#784 Assisted-by: Codex:gpt-5.5
Update the benchmark manual and CLI help to describe discovery-aware dry runs, DNS-backed target classification, and the narrower unsafe public-target override. Link deployment guidance to fedify bench so staging and CI checks are part of production preparation. fedify-dev#744 fedify-dev#784 Assisted-by: Codex:gpt-5.5
Make the shared benchmark fixture work under Node and Bun by replacing the Deno-only test server with a runtime-neutral node:http bridge. Use the resolved target tier when gating same-origin inbox destinations, and apply the unsafe-target bounds to separate public inbox destinations before signed load can be sent to them. fedify-dev#744 fedify-dev#784 Assisted-by: Codex:gpt-5.5
Pass suite-level load and duration defaults into the unsafe public inbox destination check. Public inbox runs allowed with --allow-unsafe-target now follow the same documented explicit-default contract as the top-level target gate. fedify-dev#744 fedify-dev#784 Assisted-by: Codex:gpt-5.5
Classify off-origin inbox destinations with the same DNS-aware resolver used for benchmark targets before applying the inbox safety gate. This lets private staging shared inboxes run without the unsafe public override while keeping public destination override checks bounded. fedify-dev#744 fedify-dev#784 Assisted-by: Codex:gpt-5.5
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 WalkthroughWalkthroughThis PR extends the ChangesBenchmark Safety Gating with DNS-Resolved Tiers
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Possibly related PRs
Suggested labels
Suggested reviewers
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
|
@codex review |
There was a problem hiding this comment.
Code Review
This pull request enhances the safety and robustness of the fedify bench command by resolving DNS for public-looking hostnames to classify targets into risk tiers, and enforcing stricter, more bounded constraints on unsafe public-target overrides (requiring an explicit CLI target, load, and duration). It also improves the --dry-run mode to resolve discovery and display planned destinations without sending benchmark load, and introduces a reusable local benchmark target fixture for testing. A review comment suggests wrapping the new URL instantiation inside the DNS-resolved IP classification loop in a try-catch block to gracefully handle malformed IP strings and prevent the safety gate from crashing.
Important
The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.
|
Codex Review: Didn't find any major issues. What shall we delve into next? ℹ️ About Codex in GitHubYour team has set up Codex to review pull requests in this repo. Reviews are triggered when you
If Codex has suggestions, it will comment; otherwise it will react with 👍. Codex can also answer questions or update the PR. Try commenting "@codex address that feedback". |
There was a problem hiding this comment.
Actionable comments posted: 3
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@test/bench/fixture.ts`:
- Line 138: Replace the Uint8Array construction with the Node.js Buffer idiom:
in the outgoing.end call (symbol: outgoing.end and the variable response),
convert the response.arrayBuffer() result using Buffer.from(...) instead of new
Uint8Array(...); update the call to outgoing.end to pass Buffer.from(await
response.arrayBuffer()) so it uses the Node Buffer type which is more idiomatic
and may avoid unnecessary copying.
- Line 89: The close helper currently resolves unconditionally via
server.close(() => resolve()), which ignores potential errors; update the
exported close function to return a Promise<void> that calls server.close((err)
=> err ? reject(err) : resolve()) so the Promise rejects on server close
errors—look for the close arrow function and the server.close call in the
fixture.ts and change its callback to accept an error parameter and reject when
present.
- Line 83: The Promise around server.listen currently only resolves on
successful bind and will hang if listen fails; update the logic that creates the
Promise for server.listen to also attach an 'error' listener that rejects the
Promise (and remove the listener on success) so the Promise is rejected when
server emits an error. Locate the await new Promise<void>((resolve) =>
server.listen(0, "127.0.0.1", resolve)); call and change it so the Promise
executor takes (resolve, reject), registers server.once('error', reject) before
calling server.listen, and ensures the error listener is cleaned up when resolve
runs.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: b605dc63-96ae-4ae8-b212-e392db861d60
📒 Files selected for processing (13)
CHANGES.mddocs/manual/benchmarking.mddocs/manual/deploy.mdpackages/cli/src/bench/action.test.tspackages/cli/src/bench/action.tspackages/cli/src/bench/command.tspackages/cli/src/bench/safety/gate.test.tspackages/cli/src/bench/safety/gate.tspackages/cli/src/bench/safety/tiers.test.tspackages/cli/src/bench/safety/tiers.tspackages/cli/src/bench/scenarios/inbox.tspackages/cli/src/bench/scenarios/runner.tstest/bench/fixture.ts
Malformed resolver output should not escape the benchmark safety classifier. Fall back to the public tier so the gate stays conservative when a custom resolver returns an address string that cannot be parsed as a URL host. fedify-dev#795 (comment) Assisted-by: Codex:gpt-5.5
Make fixture server startup and shutdown reject on errors instead of hanging or resolving unconditionally. Use the Node Buffer path when bridging fetch responses back to the HTTP server. fedify-dev#795 (comment) fedify-dev#795 (comment) fedify-dev#795 (comment) Assisted-by: Codex:gpt-5.5
|
@codex review |
|
/gemini review |
There was a problem hiding this comment.
Code Review
This pull request enhances the safety and planning capabilities of the fedify bench command. It introduces DNS-resolved target classification to accurately identify loopback, private, and public destinations, treating DNS failures conservatively as public. The safety gate is strengthened by requiring that any unsafe public target override (--allow-unsafe-target) be paired with an explicit command-line --target and explicit load and duration settings for all scenarios. Additionally, the --dry-run mode is upgraded to perform safe discovery (resolving WebFinger and actor inboxes) and print a detailed destination plan without sending benchmark load. A shared local benchmark target fixture has also been introduced to simplify and unify scenario testing. As there are no review comments provided, I have no feedback to offer on the review itself.
Important
The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: a685876f76
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
Move the benchmark target fixture under the CLI package so Node resolves its workspace dependencies from the package that declares them. This keeps the repo-root Node test command from resolving @fedify/fedify and @fedify/vocab through the root package scope. fedify-dev#795 (comment) Assisted-by: Codex:gpt-5.5
|
@codex review |
|
/gemini review |
There was a problem hiding this comment.
Code Review
This pull request enhances the safety gates and dry-run capabilities of the fedify bench command. It introduces DNS resolution for public-looking hostnames to accurately classify target risk tiers, restricts unsafe public-target overrides by requiring explicit command-line targets and scenario load/duration configurations, and improves the --dry-run option to resolve and display concrete discovery destinations without sending load. Additionally, the test suite has been refactored to use a shared benchmark target fixture. The review feedback suggests two defensive programming improvements to prevent potential runtime TypeErrors: verifying that the resolved addresses are returned as a valid array, and ensuring load is an object before using the in operator.
Important
The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.
|
Codex Review: Didn't find any major issues. Keep it up! ℹ️ About Codex in GitHubYour team has set up Codex to review pull requests in this repo. Reviews are triggered when you
If Codex has suggestions, it will comment; otherwise it will react with 👍. Codex can also answer questions or update the PR. Try commenting "@codex address that feedback". |
Treat malformed resolver return values as public before reading or iterating the result. Also guard the explicit-load helper against non-object load values before using property checks. fedify-dev#795 (comment) fedify-dev#795 (comment) Assisted-by: Codex:gpt-5.5
|
@codex review |
|
/gemini review |
There was a problem hiding this comment.
Code Review
This pull request enhances the safety and planning capabilities of the fedify bench command. It introduces DNS resolution for target classification (classifying public-looking hostnames based on their resolved IP addresses), enforces stricter rules for unsafe public target overrides (requiring an explicit CLI target, explicit load, and explicit duration), and improves the --dry-run mode to resolve and display discovery plans (such as WebFinger and actor inbox lookups) without sending benchmark load. Additionally, a new local benchmark target fixture is introduced for testing. Feedback on the changes suggests wrapping the discoverInbox call in a try-catch block during dry-run planning to prevent a single recipient resolution failure from crashing the entire preflight check.
Important
The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.
|
Codex Review: Didn't find any major issues. 🚀 ℹ️ About Codex in GitHubYour team has set up Codex to review pull requests in this repo. Reviews are triggered when you
If Codex has suggestions, it will comment; otherwise it will react with 👍. Codex can also answer questions or update the PR. Try commenting "@codex address that feedback". |
Report inbox discovery failures per recipient during dry runs so one bad actor lookup does not hide the rest of the benchmark plan. fedify-dev#795 (comment) Assisted-by: Codex:gpt-5.5
|
@codex review |
|
/gemini review |
There was a problem hiding this comment.
Code Review
This pull request enhances the safety and planning capabilities of the fedify bench command. It introduces DNS-resolved target risk classification to classify public-looking hostnames into loopback, private, or public tiers. The --dry-run option is improved to resolve scenario discovery (such as WebFinger and actor inbox lookups) and print a detailed plan showing concrete destinations and safety classifications without sending benchmark load. Additionally, safety gates are tightened: the --allow-unsafe-target override is restricted to require an explicit CLI --target and explicit load and duration configurations for all scenarios, preventing accidental public benchmarks. The test suite has been updated and expanded with a new HTTP-based benchmark target fixture. There are no review comments, so I have no feedback to provide.
Important
The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.
|
Codex Review: Didn't find any major issues. Hooray! ℹ️ About Codex in GitHubYour team has set up Codex to review pull requests in this repo. Reviews are triggered when you
If Codex has suggestions, it will comment; otherwise it will react with 👍. Codex can also answer questions or update the PR. Try commenting "@codex address that feedback". |
2chanhaeng
left a comment
There was a problem hiding this comment.
Looks great! The flow seems fine. Just a few nitpicks.
Use stable fixture key pairs without a mutable cache, share error message formatting through the CLI utilities, pass suite defaults to unsafe override planning directly, and fold DNS tier fallback handling into a single conservative catch path. fedify-dev#795 (comment) fedify-dev#795 (comment) fedify-dev#795 (comment) fedify-dev#795 (comment) Assisted-by: Codex:gpt-5.5
|
@codex review |
|
/gemini review |
There was a problem hiding this comment.
Code Review
This pull request enhances the safety and robustness of the fedify bench command by introducing DNS-resolved target risk classification. It resolves public-looking hostnames to their IP addresses to classify them into loopback, private, or public tiers, ensuring a more conservative safety gate. Additionally, it refines the --dry-run behavior to perform WebFinger and actor inbox discovery without sending benchmark load, introduces stricter validation for unsafe overrides (requiring explicit CLI targets, duration, and load), and adds a local benchmark target fixture for testing. No review comments were provided for this pull request, so there is no feedback to address.
Important
The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.
|
Codex Review: Didn't find any major issues. Swish! ℹ️ About Codex in GitHubYour team has set up Codex to review pull requests in this repo. Reviews are triggered when you
If Codex has suggestions, it will comment; otherwise it will react with 👍. Codex can also answer questions or update the PR. Try commenting "@codex address that feedback". |
Closes #784. Part of #744.
This PR tightens the safety model for
fedify benchso benchmark runs are allowed only when the target and any signed inbox destinations are clearly safe, explicitly cooperative, or explicitly overridden with bounded load.The main change is that hostnames are now classified with DNS resolution before the benchmark safety gate runs. A staging host that resolves to a private address can be treated as private, while mixed or failed DNS answers still fall back to public. Inbox scenarios use the same resolved-address classification for off-origin shared inboxes, so a private staging target can send signed benchmark traffic to another private staging inbox without needing the unsafe public override.
The unsafe public override is also narrower.
--allow-unsafe-targetnow requires an explicit--target, plus explicit load and duration values either on the scenario or in suite defaults. Public inbox destinations follow the same rule, including suite-level defaults.Dry runs now perform the discovery work needed to print the actual plan, including WebFinger/inbox discovery and destination safety status, but still do not send benchmark load. This makes
fedify bench --dry-runuseful as a preflight check before a signed benchmark run.The shared benchmark fixture was moved into test/bench/fixture.ts and kept runtime-agnostic so the CLI tests continue to run under Deno, Node.js, and Bun.