Merged
1 change: 1 addition & 0 deletions CHANGELOG.md
@@ -9,6 +9,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
- Lading now built with edition 2024
- Removed use of compromised `tj-actions/changed-files` action from project's GitHub CI configuration
- Fixed devcontainer configuration to ensure the `rust-analyzer` can run successfully within IDEs
+ - Added a new gauge `processes_found` and a new warning log for processes we skipped

## [0.25.6]
## Fixed
48 changes: 36 additions & 12 deletions lading/src/observer/linux/procfs.rs
@@ -77,12 +77,15 @@ impl Sampler {
clippy::too_many_lines,
clippy::cast_sign_loss,
clippy::cast_possible_truncation,
- clippy::cast_possible_wrap
+ clippy::cast_possible_wrap,
+ clippy::cast_lossless
)]
pub(crate) async fn poll(&mut self, include_smaps: bool) -> Result<(), Error> {
// A tally of the total RSS and PSS consumed by the parent process and
// its children.
let mut aggr = memory::smaps_rollup::Aggregator::default();
+ let mut processes_found: i32 = 0;
+ let mut pids_skipped: FxHashSet<i32> = FxHashSet::default();

// Every sample run we collect all the child processes rooted at the
// parent. As noted by the procfs documentation is this done by
@@ -119,9 +122,18 @@ impl Sampler {
}
}

+ processes_found += 1;
let pid = process.pid();
- if let Err(e) = self.handle_process(process, &mut aggr, include_smaps).await {
-     warn!("Encountered uncaught error when handling `/proc/{pid}/`: {e}");
+ match self.handle_process(process, &mut aggr, include_smaps).await {
+     Ok(true) => {
+         // handled successfully
+     }
+     Ok(false) => {
+         pids_skipped.insert(pid);
+     }
+     Err(e) => {
+         warn!("Encountered uncaught error when handling `/proc/{pid}/`: {e}");
+     }
}
}

@@ -130,10 +142,22 @@ impl Sampler {

gauge!("total_rss_bytes").set(aggr.rss as f64);
gauge!("total_pss_bytes").set(aggr.pss as f64);
+ gauge!("processes_found").set(processes_found as f64);
Contributor Author
We decided to go with consistency rather than correctness here.

Contributor
This lint doesn't need a change in the types, just in conversion. I think you can replace this with

Suggested change
- gauge!("processes_found").set(processes_found as f64);
+ gauge!("processes_found").set(processes_found.into());

Contributor Author
See our previous conversation: #1288 (comment)

I'd prefer to do a pass after where we address all of these together.
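
For context on the lint discussed in this thread: `clippy::cast_lossless` flags `as` casts where the conversion can never lose information, such as `i32` to `f64`, and suggests a `From`-based conversion instead. Below is a minimal sketch of the two spellings, using hypothetical helper functions rather than the real `gauge!` call. Whether the gauge's `set` method accepts `.into()` directly depends on its signature in the metrics crate, which is not verified here.

```rust
/// Hypothetical helper (not from the PR): converts a process count with `as`.
/// This compiles and gives the right value, but `clippy::cast_lossless`
/// flags it because a lossless `From` conversion exists for `i32` to `f64`.
fn count_to_f64_with_cast(count: i32) -> f64 {
    count as f64
}

/// Hypothetical helper (not from the PR): the same conversion through `From`.
/// `From<i32> for f64` is part of the standard library; every `i32` value is
/// exactly representable as an `f64`, so no information is lost.
fn count_to_f64_with_from(count: i32) -> f64 {
    f64::from(count)
}
```

Both helpers return identical values for every `i32`. The difference is only that the `From` form stops compiling if the source type later changes to one that cannot be converted losslessly, whereas `as` would silently truncate or round.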


+ // If we skipped any processes, log a warning.
+ if !pids_skipped.is_empty() {
+     warn!(
+         "Skipped {} processes: {:?}",
+         pids_skipped.len(),
+         pids_skipped
+     );
+ }

Ok(())
}

+ /// Handle a process. Returns true if the process was handled successfully,
+ /// false if it was skipped for any reason.
#[allow(
clippy::similar_names,
clippy::too_many_lines,
@@ -146,7 +170,7 @@ impl Sampler {
process: Process,
aggr: &mut memory::smaps_rollup::Aggregator,
include_smaps: bool,
- ) -> Result<(), Error> {
+ ) -> Result<bool, Error> {
let pid = process.pid();

// `/proc/{pid}/status`
@@ -156,12 +180,12 @@ impl Sampler {
warn!("Couldn't read status: {:?}", e);
// The pid may have exited since we scanned it or we may not
// have sufficient permission.
- return Ok(());
+ return Ok(false);
}
};
if status.tgid != pid {
// This is a thread, not a process and we do not wish to scan it.
- return Ok(());
+ return Ok(false);
}

// If we haven't seen this process before, initialize its ProcessInfo.
@@ -174,7 +198,7 @@ impl Sampler {
warn!("Couldn't read exe for pid {}: {:?}", pid, e);
// The pid may have exited since we scanned it or we may not
// have sufficient permission.
- return Ok(());
+ return Ok(false);
}
};
let comm = match proc_comm(pid).await {
@@ -183,7 +207,7 @@ impl Sampler {
warn!("Couldn't read comm for pid {}: {:?}", pid, e);
// The pid may have exited since we scanned it or we may not
// have sufficient permission.
- return Ok(());
+ return Ok(false);
}
};
let cmdline = match proc_cmdline(pid).await {
@@ -192,7 +216,7 @@ impl Sampler {
warn!("Couldn't read cmdline for pid {}: {:?}", pid, e);
// The pid may have exited since we scanned it or we may not
// have sufficient permission.
- return Ok(());
+ return Ok(false);
}
};
let pid_s = format!("{pid}");
@@ -238,7 +262,7 @@ impl Sampler {
// which will happen if we don't have permissions or, more
Contributor Author
The GH UI is preventing me from commenting on the right line, but this call:

let uptime = uptime::poll().await?;

is what prevents us from simplifying the return type to a plain `bool`.
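
To illustrate the point in this comment: as long as one step in the function propagates a failure with `?`, the function has to return a `Result`, so the skip signal travels as `Ok(false)` rather than as a bare `bool`. The sketch below is synchronous and uses hypothetical stand-ins (`Error`, `read_uptime`, `read_status`) for the real sampler code; it only shows the shape of the control flow, not the actual implementation.

```rust
/// Hypothetical error type standing in for the sampler's `Error`.
#[derive(Debug)]
struct Error(String);

/// Hypothetical stand-in for a system-wide read such as `uptime::poll()`.
/// A failure here is not specific to one pid, so it is treated as a hard error.
fn read_uptime(fail: bool) -> Result<u64, Error> {
    if fail {
        Err(Error("uptime unavailable".to_string()))
    } else {
        Ok(100)
    }
}

/// Hypothetical stand-in for a per-process read that can fail because the
/// pid exited or permissions are missing. `None` models that failure.
fn read_status(pid: i32) -> Option<i32> {
    if pid > 0 { Some(pid) } else { None }
}

/// Returns `Ok(true)` when handled, `Ok(false)` when skipped, and `Err` when
/// the `?` below propagates a hard failure. Dropping the `Result` wrapper
/// would require handling the uptime failure locally instead of using `?`.
fn handle_process(pid: i32, uptime_fails: bool) -> Result<bool, Error> {
    let _tgid = match read_status(pid) {
        Some(tgid) => tgid,
        // Per-process failure: skip this pid, but do not abort the caller.
        None => return Ok(false),
    };
    // Hard failure: propagated to the caller through `?`.
    let _uptime = read_uptime(uptime_fails)?;
    Ok(true)
}
```

With this shape the caller can distinguish three outcomes: handled, skipped, and failed.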

// likely, the process has exited.
warn!("Couldn't process `/proc/{pid}/stat`: {e}");
- return Ok(());
+ return Ok(false);
}

if include_smaps {
@@ -317,10 +341,10 @@ impl Sampler {
// which will happen if we don't have permissions or, more
// likely, the process has exited.
warn!("Couldn't process `/proc/{pid}/smaps_rollup`: {err}");
- return Ok(());
+ return Ok(false);
}

- Ok(())
+ Ok(true)
}
}
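
The bookkeeping this diff adds to `poll` can be sketched in isolation as follows. This is an assumption-laden illustration, not the project's code: `PollSummary` and `tally` are hypothetical names, the standard library's `HashSet` is swapped in for `FxHashSet`, errors are plain strings, and the handler is a synchronous closure instead of the async `handle_process`.

```rust
use std::collections::HashSet;

/// Hypothetical summary of one polling pass (not a type from the PR).
#[derive(Debug, Default)]
struct PollSummary {
    /// Every pid visited, including ones that were later skipped or failed.
    processes_found: i32,
    /// Pids whose handler returned `Ok(false)`.
    pids_skipped: HashSet<i32>,
    /// Messages for pids whose handler returned `Err`.
    errors: Vec<String>,
}

/// Visits each pid, counts it, and records skips and errors separately,
/// mirroring the three-armed `match` in the diff.
fn tally<F>(pids: &[i32], mut handle: F) -> PollSummary
where
    F: FnMut(i32) -> Result<bool, String>,
{
    let mut summary = PollSummary::default();
    for &pid in pids {
        // Counted before handling, as in the diff, so skipped pids are included.
        summary.processes_found += 1;
        match handle(pid) {
            Ok(true) => {
                // Handled successfully; nothing to record.
            }
            Ok(false) => {
                summary.pids_skipped.insert(pid);
            }
            Err(e) => summary.errors.push(format!("pid {pid}: {e}")),
        }
    }
    summary
}
```

Note that in this sketch, as in the diff, the found count is incremented before the handler runs, so it reflects pids visited rather than pids successfully sampled; subtracting the size of the skipped set gives a closer estimate of the latter.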
