feat(worker): fetch inventory policy from control plane at agentd startup (#30)

skullcrushercmd · skullcmd · web-flow · commit ef60f6f1eab2 · 2026-04-26T21:17:19.000-04:00
* feat(core): add InventoryPolicySnapshot for worker policy fetch

* feat(config): inventory_policy_snapshot accessor + with_inventory_policy_snapshot apply

* feat(worker-api): LoadInventoryPolicy request/response variant + client method

* feat(api): serve LoadInventoryPolicy from worker_control endpoint

* feat(worker): fetch inventory policy from control plane at agentd startup

  Worker now fetches the live allowed_host_suffixes / allowed_hosts / allowed_cidrs / allowed_ports from anyscan-api at startup before claiming any port-scans, and refreshes on a configurable cadence (default 300s, env AGENT_INVENTORY_REFRESH_SECONDS).

  This unblocks the prod scan #16 failure mode where workers fell back to InventoryConfig::default() (allowed_host_suffixes=["localhost"]), causing the streaming follow-on flusher to drop every internet IP via config.normalize_target_definition's host_is_allowed gate even after the API allowlist had been widened.

  Local /etc/agentd/runtime.env values remain a fallback when the control plane is unreachable; fetch failures log warn! and keep the prior policy in memory. Workers do NOT crash on fetch failure.

* test(worker): make inventory refresh interval test pure to avoid env-var races

  The previous test used unsafe std::env::set_var/remove_var, which flaked under cargo test's default multithreaded executor (set_var races with other threads reading env). Extract a pure parse_inventory_refresh_interval(Option<&str>) that takes the raw env value as a parameter; the env-reading wrapper is one line and trivially correct. The test now drives the pure function with no shared mutable state.

* fix(worker): always run inventory refresh + register before run_once fetch

  Two review issues from codex on PR #30:

  1. run_daemon: the periodic refresh check was placed near the bottom of the loop, after every claim arm that calls 'continue'. On a busy worker repeatedly claiming bootstrap_jobs / port_scans / runs, the refresh block was never reached, so control-plane allowlist changes never propagated until the worker idled, defeating the whole point of the periodic refresh for the heavy-traffic case. Move the refresh to immediately after seed_bootstrap_inventory / queue_due_schedules_with_events (which always run), before any continue-able claim attempt. Bonus: claims spawned in the same iteration now operate against the freshest fetched policy.

  2. run_once: the initial fetch ran before register_worker_or_bail. But non-register /api/worker/control requests in worker_control require an already-registered worker token; a fresh agent's first LoadInventoryPolicy returned 401, fell through to local fallback, and run_once never refreshed again. Register first, then fetch; this matches the run_daemon ordering already in place.

* fix(worker): hoist inventory refresh above all continue-able loop branches

  Codex P2 follow-up: the previous fix (dde4324) placed the refresh after seed_bootstrap_inventory / queue_due_schedules_with_events, which fixed the bootstrap-job / port-scan / run claim shadowing, but two earlier branches still continue above it: claim_next_pending_remote_command and the remote_update scheduling block. A worker continuously receiving remote debug commands or remote updates would loop forever without re-evaluating the refresh predicate. Hoist the refresh to the very top of each loop iteration, immediately after the try_join_next completion-collection (which doesn't continue). Now every iteration runs the refresh predicate exactly once regardless of which downstream fast-path fires.

---------

Co-authored-by: skullcmd <skullcmd@anyvm.tech>
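The refresh-starvation issue described in the two fix commits can be reproduced in isolation. The following is a minimal sketch, not the worker's actual loop: `simulate_worker_loop`, its parameters, and the refresh counter are hypothetical names introduced for illustration, and the interval is set to zero so the refresh is due on every pass.

```rust
use std::time::{Duration, Instant};

/// Hypothetical model of the daemon loop (not the real run_daemon).
/// Runs `iterations` passes and returns how many times the refresh ran.
/// `busy` means every pass claims work and hits `continue`.
/// `refresh_at_top` selects where the refresh predicate sits in the loop.
fn simulate_worker_loop(iterations: u32, busy: bool, refresh_at_top: bool) -> u32 {
    // Assumption for the sketch: a zero interval, so a refresh is always due.
    let interval = Duration::ZERO;
    let mut last_refresh_at = Instant::now();
    let mut refresh_count = 0;

    for _ in 0..iterations {
        // Hoisted placement: evaluated before any branch that can `continue`.
        if refresh_at_top && last_refresh_at.elapsed() >= interval {
            refresh_count += 1;
            last_refresh_at = Instant::now();
        }

        // Stand-in for the claim arms: a successful claim restarts the loop.
        if busy {
            continue;
        }

        // Bottom placement: only reached when no claim arm fired.
        if !refresh_at_top && last_refresh_at.elapsed() >= interval {
            refresh_count += 1;
            last_refresh_at = Instant::now();
        }
    }

    refresh_count
}

fn main() {
    println!("busy, refresh at top:    {}", simulate_worker_loop(5, true, true));
    println!("busy, refresh at bottom: {}", simulate_worker_loop(5, true, false));
    println!("idle, refresh at bottom: {}", simulate_worker_loop(5, false, false));
}
```

With the refresh at the bottom, a worker that is busy on every pass never reaches it, while an idle worker does; with the refresh at the top, placement no longer depends on which claim arm fires.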
diff --git a/src/bin/anyscan-api.rs b/src/bin/anyscan-api.rs
@@ -2019,6 +2019,9 @@ async fn worker_control(
                 .load_scan_settings()
                 .map_err(|_| StatusCode::INTERNAL_SERVER_ERROR)?,
         },
+        WorkerControlRequest::LoadInventoryPolicy => WorkerControlResponse::InventoryPolicy {
+            policy: state.config.inventory_policy_snapshot(),
+        },
     };
 
     Ok(Json(response))
diff --git a/src/bin/anyscan-worker.rs b/src/bin/anyscan-worker.rs
@@ -462,7 +462,7 @@ async fn run_daemon_with_retry(
 }
 
 async fn run_daemon(
-    config: AppConfig,
+    mut config: AppConfig,
     worker_id: String,
     store: AnyScanStore,
     detectors: DetectorEngine,
@@ -476,6 +476,19 @@ async fn run_daemon(
         &worker_runtime.registration,
         worker_registration_ttl,
     )?;
+    // Initial inventory policy fetch — best-effort. Failure here keeps the
+    // local /etc/agentd/runtime.env fallback (or InventoryConfig::default())
+    // in place so the worker can still claim non-policy-gated tasks. The
+    // streaming follow-on flusher's host_is_allowed check will keep dropping
+    // hosts outside the local allowlist until a refresh succeeds.
+    if let Err(error) = refresh_inventory_policy_from_control_plane(&mut config, &store) {
+        warn!(
+            %error,
+            "initial inventory policy fetch failed; using local fallback (workers may drop hosts outside the local allowlist)"
+        );
+    }
+    let inventory_refresh_interval = inventory_refresh_interval();
+    let mut last_inventory_refresh_at = Instant::now();
     let (remote_update_tx, remote_update_rx) =
         watch::channel(registered_worker.remote_update_requested_at);
     let (registration_shutdown_tx, registration_shutdown_rx) = oneshot::channel();
@@ -520,6 +533,25 @@ async fn run_daemon(
                 }
             }
 
+            // Refresh inventory policy at the very top of every iteration,
+            // before any of the work-claim arms below. Multiple branches in
+            // this loop (remote-debug commands, remote-update scheduling,
+            // bootstrap-job / port-scan / runnable-run claims, archive pass)
+            // each `continue` back to the loop top on success. If the
+            // refresh sat below any of them, a worker continuously feeding
+            // on that signal would never re-evaluate the refresh predicate
+            // and the control-plane allowlist changes would never propagate.
+            // Placing it above every `continue`-able branch guarantees it
+            // runs once per iteration regardless of which fast path fires.
+            if last_inventory_refresh_at.elapsed() >= inventory_refresh_interval {
+                if let Err(error) =
+                    refresh_inventory_policy_from_control_plane(&mut config, &store)
+                {
+                    warn!(%error, "inventory policy refresh failed; keeping prior policy");
+                }
+                last_inventory_refresh_at = Instant::now();
+            }
+
             if worker_runtime.registration.supports_remote_debug_commands {
                 if let Some(command) = store.claim_next_pending_remote_command()? {
                     info!(
@@ -706,14 +738,27 @@ async fn run_once(
     detectors: DetectorEngine,
     worker_runtime: &WorkerRuntime,
 ) -> Result<()> {
-    let worker_registration_ttl = worker_registration_ttl_seconds(config);
+    let mut config = config.clone();
+    let worker_registration_ttl = worker_registration_ttl_seconds(&config);
     let worker_registration_interval =
-        worker_registration_refresh_interval(config, worker_registration_ttl);
+        worker_registration_refresh_interval(&config, worker_registration_ttl);
     let registered_worker = register_worker_or_bail(
         &store,
         &worker_runtime.registration,
         worker_registration_ttl,
     )?;
+    // Fetch inventory policy AFTER register_worker_or_bail — non-register
+    // /api/worker/control requests in worker_control authenticate as an
+    // already-registered worker, so a fresh agent has to register first or
+    // the LoadInventoryPolicy call returns 401 and falls back to the local
+    // policy for the entire one-shot run.
+    if let Err(error) = refresh_inventory_policy_from_control_plane(&mut config, &store) {
+        warn!(
+            %error,
+            "initial inventory policy fetch failed; using local fallback"
+        );
+    }
+    let config = &config;
     let (remote_update_tx, remote_update_rx) =
         watch::channel(registered_worker.remote_update_requested_at);
     let (registration_shutdown_tx, registration_shutdown_rx) = oneshot::channel();
@@ -5507,12 +5552,52 @@ fn load_effective_runtime_config(
     base_config.with_scan_defaults_summary(&scan_settings)
 }
 
+const INVENTORY_REFRESH_INTERVAL_ENV: &str = "AGENT_INVENTORY_REFRESH_SECONDS";
+const DEFAULT_INVENTORY_REFRESH_INTERVAL_SECONDS: u64 = 300;
+
+fn parse_inventory_refresh_interval(raw: Option<&str>) -> Duration {
+    let parsed = raw
+        .and_then(|value| value.trim().parse::<u64>().ok())
+        .filter(|value| *value > 0)
+        .unwrap_or(DEFAULT_INVENTORY_REFRESH_INTERVAL_SECONDS);
+    Duration::from_secs(parsed)
+}
+
+fn inventory_refresh_interval() -> Duration {
+    parse_inventory_refresh_interval(env::var(INVENTORY_REFRESH_INTERVAL_ENV).ok().as_deref())
+}
+
+fn apply_inventory_policy_snapshot_to_config(
+    config: &mut AppConfig,
+    snapshot: &anyscan::core::InventoryPolicySnapshot,
+) -> Result<()> {
+    let updated = config.with_inventory_policy_snapshot(snapshot)?;
+    config.inventory = updated.inventory;
+    Ok(())
+}
+
+fn refresh_inventory_policy_from_control_plane(
+    config: &mut AppConfig,
+    store: &AnyScanStore,
+) -> Result<()> {
+    let snapshot = store.load_inventory_policy()?;
+    apply_inventory_policy_snapshot_to_config(config, &snapshot)?;
+    info!(
+        allowed_host_suffixes = config.inventory.allowed_host_suffixes.len(),
+        allowed_hosts = config.inventory.allowed_hosts.len(),
+        allowed_cidrs = config.inventory.allowed_cidrs.len(),
+        allowed_ports = config.inventory.allowed_ports.len(),
+        "refreshed inventory policy from control plane"
+    );
+    Ok(())
+}
+
 #[cfg(test)]
 mod tests {
     use super::{
         DiscoveredEndpoint, PortScanFollowOnSelectionMode, ReportedProtocolPluginFinding,
         ScannerOutputCounter, WORKER_REGISTRATION_TTL_MULTIPLIER,
-        apply_follow_on_selection_mode_to_targets,
+        apply_follow_on_selection_mode_to_targets, apply_inventory_policy_snapshot_to_config,
         derive_protocol_plugin_findings_with_active_mode, endpoint_cache_key,
         filter_endpoints_excluding_streamed, normalize_platform_architecture,
         normalize_platform_operating_system, parse_endpoint_token, parse_ip_addr_show_output,
@@ -6524,4 +6609,85 @@ mod tests {
 
         let _ = fs::remove_file(&path);
     }
+
+    #[test]
+    fn apply_inventory_policy_snapshot_overwrites_allowlists_in_config() {
+        use anyscan::core::InventoryPolicySnapshot;
+
+        let mut config = AppConfig::default();
+        // local fallback default: only "localhost" is allowed
+        assert_eq!(
+            config.inventory.allowed_host_suffixes,
+            vec!["localhost".to_string()]
+        );
+
+        let snapshot = InventoryPolicySnapshot {
+            allowed_host_suffixes: vec!["example.com".to_string()],
+            allowed_hosts: vec!["box.example.net".to_string()],
+            allowed_cidrs: vec!["10.0.0.0/8".to_string()],
+            allowed_ports: vec![80, 443, 8080],
+        };
+
+        apply_inventory_policy_snapshot_to_config(&mut config, &snapshot)
+            .expect("snapshot should apply cleanly");
+
+        assert_eq!(
+            config.inventory.allowed_host_suffixes,
+            vec!["example.com".to_string()]
+        );
+        assert_eq!(
+            config.inventory.allowed_hosts,
+            vec!["box.example.net".to_string()]
+        );
+        assert_eq!(config.inventory.allowed_cidrs, vec!["10.0.0.0/8".to_string()]);
+        assert_eq!(config.inventory.allowed_ports, vec![80, 443, 8080]);
+
+        // The previous "localhost" suffix is gone — the API host's policy wins.
+        assert!(config.host_is_allowed("box.example.net"));
+        assert!(config.host_is_allowed("api.example.com"));
+        assert!(config.host_is_allowed("10.0.0.42"));
+        assert!(!config.host_is_allowed("evil.test"));
+        assert!(!config.host_is_allowed("localhost"));
+    }
+
+    #[test]
+    fn parse_inventory_refresh_interval_handles_unset_zero_and_invalid_inputs() {
+        use super::parse_inventory_refresh_interval;
+
+        // Unset / None -> default
+        assert_eq!(
+            parse_inventory_refresh_interval(None),
+            Duration::from_secs(300)
+        );
+
+        // Valid positive integer
+        assert_eq!(
+            parse_inventory_refresh_interval(Some("42")),
+            Duration::from_secs(42)
+        );
+
+        // Whitespace is trimmed
+        assert_eq!(
+            parse_inventory_refresh_interval(Some("  42  ")),
+            Duration::from_secs(42)
+        );
+
+        // Zero is rejected — falls back to default
+        assert_eq!(
+            parse_inventory_refresh_interval(Some("0")),
+            Duration::from_secs(300)
+        );
+
+        // Non-numeric -> default
+        assert_eq!(
+            parse_inventory_refresh_interval(Some("not-a-number")),
+            Duration::from_secs(300)
+        );
+
+        // Empty string -> default
+        assert_eq!(
+            parse_inventory_refresh_interval(Some("")),
+            Duration::from_secs(300)
+        );
+    }
 }
diff --git a/src/config.rs b/src/config.rs
@@ -2907,6 +2907,56 @@ impl AppConfig {
         }
     }
 
+    pub fn inventory_policy_snapshot(&self) -> crate::core::InventoryPolicySnapshot {
+        crate::core::InventoryPolicySnapshot {
+            allowed_host_suffixes: self.inventory.allowed_host_suffixes.clone(),
+            allowed_hosts: self.inventory.allowed_hosts.clone(),
+            allowed_cidrs: self.inventory.allowed_cidrs.clone(),
+            allowed_ports: self.inventory.allowed_ports.clone(),
+        }
+    }
+
+    pub fn with_inventory_policy_snapshot(
+        &self,
+        snapshot: &crate::core::InventoryPolicySnapshot,
+    ) -> Result<Self> {
+        let mut config = self.clone();
+
+        let mut suffixes = snapshot
+            .allowed_host_suffixes
+            .iter()
+            .filter_map(|value| normalize_inventory_host(value))
+            .collect::<Vec<_>>();
+        suffixes.sort();
+        suffixes.dedup();
+        config.inventory.allowed_host_suffixes = suffixes;
+
+        let mut hosts = snapshot
+            .allowed_hosts
+            .iter()
+            .filter_map(|value| normalize_inventory_host(value))
+            .collect::<Vec<_>>();
+        hosts.sort();
+        hosts.dedup();
+        config.inventory.allowed_hosts = hosts;
+
+        let mut cidrs = snapshot
+            .allowed_cidrs
+            .iter()
+            .map(|value| normalize_inventory_cidr(value))
+            .collect::<Result<Vec<_>>>()?;
+        cidrs.sort();
+        cidrs.dedup();
+        config.inventory.allowed_cidrs = cidrs;
+
+        let mut ports = snapshot.allowed_ports.clone();
+        ports.sort_unstable();
+        ports.dedup();
+        config.inventory.allowed_ports = ports;
+
+        Ok(config)
+    }
+
     pub fn with_scan_defaults_summary(&self, summary: &ScanDefaultsSummary) -> Result<Self> {
         let mut config = self.clone();
         config.scan.request_engine_mode = summary.request_engine_mode;
@@ -3627,8 +3677,8 @@ mod tests {
     };
 
     use crate::core::{
-        GobusterTargetConfig, PortScanRequest, RepositoryDefinition, RequestEngineMode,
-        ScanDefaultsSummary, TargetDefinition, TargetStrategy,
+        GobusterTargetConfig, InventoryPolicySnapshot, PortScanRequest, RepositoryDefinition,
+        RequestEngineMode, ScanDefaultsSummary, TargetDefinition, TargetStrategy,
     };
 
     use super::{
@@ -3655,6 +3705,51 @@ mod tests {
         }
     }
 
+    #[test]
+    fn inventory_policy_snapshot_round_trips_into_app_config() {
+        let mut base = AppConfig::default();
+        base.inventory.allowed_host_suffixes = vec!["localhost".to_string()];
+        base.inventory.allowed_hosts.clear();
+        base.inventory.allowed_cidrs.clear();
+        base.inventory.allowed_ports.clear();
+
+        let snapshot = InventoryPolicySnapshot {
+            allowed_host_suffixes: vec![".example.com".to_string(), "Example.NET".to_string()],
+            allowed_hosts: vec!["BOX.Example.NET".to_string()],
+            allowed_cidrs: vec!["10.0.0.0/8".to_string()],
+            allowed_ports: vec![443, 80, 80],
+        };
+
+        let updated = base
+            .with_inventory_policy_snapshot(&snapshot)
+            .expect("snapshot should apply cleanly");
+
+        // host suffixes are lowercased, sorted, deduped (matches normalize_inventory)
+        assert_eq!(
+            updated.inventory.allowed_host_suffixes,
+            vec![".example.com".to_string(), "example.net".to_string()]
+        );
+        assert_eq!(
+            updated.inventory.allowed_hosts,
+            vec!["box.example.net".to_string()]
+        );
+        assert_eq!(
+            updated.inventory.allowed_cidrs,
+            vec!["10.0.0.0/8".to_string()]
+        );
+        assert_eq!(updated.inventory.allowed_ports, vec![80, 443]);
+
+        // round-trip the normalized policy back out and confirm field equality
+        let round_trip = updated.inventory_policy_snapshot();
+        assert_eq!(
+            round_trip.allowed_host_suffixes,
+            updated.inventory.allowed_host_suffixes
+        );
+        assert_eq!(round_trip.allowed_hosts, updated.inventory.allowed_hosts);
+        assert_eq!(round_trip.allowed_cidrs, updated.inventory.allowed_cidrs);
+        assert_eq!(round_trip.allowed_ports, updated.inventory.allowed_ports);
+    }
+
     #[test]
     fn exact_allowed_hosts_accept_ip_literals_and_ipv6_endpoints() {
         let mut config = AppConfig::default();
diff --git a/src/core.rs b/src/core.rs
@@ -2513,6 +2513,18 @@ pub struct ScanDefaultsSummary {
     pub directory_probing_discover_backup: bool,
 }
 
+#[derive(Debug, Clone, Default, Serialize, Deserialize, PartialEq, Eq)]
+pub struct InventoryPolicySnapshot {
+    #[serde(default)]
+    pub allowed_host_suffixes: Vec<String>,
+    #[serde(default)]
+    pub allowed_hosts: Vec<String>,
+    #[serde(default)]
+    pub allowed_cidrs: Vec<String>,
+    #[serde(default)]
+    pub allowed_ports: Vec<u16>,
+}
+
 pub fn merge_coverage_source_stat(
     stats: &mut Vec<CoverageSourceStat>,
     source: &str,
diff --git a/src/worker_api.rs b/src/worker_api.rs