added totals to the statistical captions by KarlDeck · Pull Request #53 · SchmiedmayerLab/SensorTSLM

KarlDeck · 2026-04-12T17:42:22Z

Add daily totals for step, distance, flights, and activity duration captions

♻️ Current situation & Problem

The statistical extractor currently only produces mean/max/min/std summaries for continuous channels. That misses some of the most natural daily summaries for HealthKit-derived signals, especially cumulative metrics such as step count, walking/running distance, and flights climbed, and it does not provide compact day-level duration summaries for binary activity channels such as sleep and in-bed.

This PR extends the statistical extraction flow so daily captions can include total step count and total distance for both devices (iPhone and Apple Watch), total flights climbed, and readable duration summaries for activity-style channels, such as "8h in bed".

⚙️ Release Notes

Add daily total captions for iPhone and Apple Watch step count channels
Add daily total captions for iPhone and Apple Watch walking/running distance channels
Add daily total captions for flights climbed
Add duration-based statistical captions for sleep and activity channels such as sleeping, in bed, and workout activities

Examples of newly generated captions:

Daily total iPhone step count: 8421.0 steps.
Daily total Apple Watch distance: 6150.000 m.
Daily total flights climbed (iPhone): 12.0.
8h in bed.
7h 24m sleeping.

This change is not intended to be breaking and does not change the external public interface.

📚 Documentation

The implementation introduces specialized statistical aggregation paths for:

cumulative channels that should report totals in addition to standard summary statistics
binary sleep/activity channels that should report accumulated duration in human-readable form

The change is kept within the existing extractor/configuration structure by:

extending the aggregation result model to support totals and durations
updating the statistical extractor to emit additional caption variants when configured
wiring the relevant MHC channels to the new aggregators in the dataset config
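
The two aggregation paths described above can be sketched roughly as follows. This is a minimal illustration, not the code from the PR: the field names `stats`, `total`, and `duration_units` come from the review walkthrough, while the function names `aggregate_total` and `aggregate_duration`, the use of plain Python sequences in place of NumPy arrays, and the reading of each positive sample as one active time step are assumptions.

```python
from __future__ import annotations

from dataclasses import dataclass
from statistics import fmean, pstdev
from typing import Sequence


@dataclass
class AggregationResult:
    # Every field is optional so an aggregator fills in only what it computes.
    stats: tuple[float, float, float, float] | None = None  # (mean, max, min, std)
    total: float | None = None
    duration_units: float | None = None


def aggregate_total(series: Sequence[float]) -> AggregationResult | None:
    """Summary statistics plus the sum, for cumulative channels such as steps."""
    if len(series) == 0:
        return None
    return AggregationResult(
        stats=(
            float(fmean(series)),
            float(max(series)),
            float(min(series)),
            float(pstdev(series)),  # population std, like np.std's default
        ),
        total=float(sum(series)),
    )


def aggregate_duration(series: Sequence[float]) -> AggregationResult | None:
    """Accumulated duration for binary channels: the sum of positive samples."""
    active = [v for v in series if v > 0]  # assumption: 1 marks an active step
    if not active:
        return None
    return AggregationResult(duration_units=float(sum(active)))
```

Under these assumptions, a binary in-bed channel sampled once per minute yields `duration_units` equal to the number of minutes spent in bed, which the extractor can then render in hours and minutes.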

✅ Testing

Verified that the updated files compile successfully with `python3.11 -m compileall`
Ran a focused synthetic-row sanity check to confirm the extractor emits the expected new captions for:
- daily total steps
- daily total distance
- daily total flights
- duration captions such as "8h in bed" and "7h sleeping"

I did not run the full dataset pipeline in this workspace because optional runtime dependencies required by the full stack are not currently installed locally.

Code of Conduct & Contributing Guidelines

By creating and submitting this pull request, you agree to follow our Code of Conduct and Contributing Guidelines:

I agree to follow the Code of Conduct and Contributing Guidelines.

coderabbitai · 2026-04-12T17:42:37Z


ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: dd451907-4ee0-4aa2-b070-d187053218c2

📥 Commits

Reviewing files that changed from the base of the PR and between 32722c4 and a70682e.

📒 Files selected for processing (2)

extractors/statistical.py
mhc/constants.py

📝 Walkthrough

A new AggregationResult dataclass now structures metric aggregation outputs with optional stats, total, and duration_units fields. The MetricAggregator.aggregate() return type changed from an unstructured tuple to this typed result. Two new aggregator classes (TotalAggregator and DurationAggregator) were introduced alongside updates to the statistical extractor and channel aggregation configuration.

Changes

| Cohort / File(s) | Summary |
|---|---|
| Aggregation Framework `aggregators.py` | Introduced `AggregationResult` dataclass with optional `stats`, `total`, and `duration_units` fields. Updated `MetricAggregator.aggregate()` to return `AggregationResult \| None` instead of an unstructured tuple. Added `TotalAggregator` (returns stats and sum) and `DurationAggregator` (filters positive values and returns duration sum). |
| Statistical Extractor `extractors/statistical.py` | Updated to import and handle `AggregationResult`. Expanded channel selection to include aggregator-configured channels. Modified aggregator result handling to access the structured `aggregated.stats` field. Added fallback metadata derivation from `display_name`. Introduced helper methods (`_period_label`, `_total_unit`, `_format_duration`, `_duration_label`) and `_extra_captions()` for generating additional caption outputs per channel. |
| Health Metrics Configuration `mhc/constants.py` | Replaced single heart-rate aggregator with expanded `aggregators` mapping. Added `TotalAggregator()` for step count, distance, and flights climbed metrics. Added `DurationAggregator()` entries for all activity and sleep channels via dictionary comprehensions. Retained `NonZeroAggregator()` only for heart rate. |
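
The configuration change summarized in the last row might look something like the sketch below. The aggregator class names are taken from the walkthrough, but the classes here are empty placeholders, and the constant names and channel keys (`CUMULATIVE_CHANNELS`, `"iphone_step_count"`, and so on) are hypothetical stand-ins, since the actual identifiers in `mhc/constants.py` are not shown in this thread.

```python
class NonZeroAggregator:
    """Placeholder for the existing heart-rate aggregator."""


class TotalAggregator:
    """Placeholder for the aggregator that adds a sum to the statistics."""


class DurationAggregator:
    """Placeholder for the aggregator that accumulates active duration."""


# Hypothetical channel keys; the real names live in mhc/constants.py.
CUMULATIVE_CHANNELS = [
    "iphone_step_count",
    "watch_step_count",
    "iphone_distance",
    "watch_distance",
    "flights_climbed",
]
DURATION_CHANNELS = ["sleeping", "in_bed", "workout_running"]

# One mapping from channel key to aggregator instance. Dictionary
# comprehensions keep the per-group wiring to a single line each.
AGGREGATORS = {
    "heart_rate": NonZeroAggregator(),
    **{name: TotalAggregator() for name in CUMULATIVE_CHANNELS},
    **{name: DurationAggregator() for name in DURATION_CHANNELS},
}
```

Grouping channels by aggregation behavior means a newly added activity channel only needs an entry in the relevant list.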

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~22 minutes

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The title 'added totals to the statistical captions' accurately summarizes the main change: adding daily totals (and duration) captions to the statistical extraction flow. |
| Description check | ✅ Passed | The description is comprehensive and directly related to the changeset, explaining the problem, solution, examples, and testing, all aligned with the implemented changes. |


coderabbitai

🧹 Nitpick comments (3)

extractors/statistical.py (2)

47-47: Avoid shadowing format builtin and prefer def over lambda.

The variable name format shadows Python's built-in function, which can cause subtle issues. Additionally, PEP 8 recommends using def instead of assigning a lambda expression to a variable.

♻️ Suggested refactor

-            format = lambda v, d=decimals: f"{v:.{d}f}"
+            def fmt(v: float, d: int = decimals) -> str:
+                return f"{v:.{d}f}"

Then update usages below to use fmt instead of format.
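
A self-contained sketch of the suggested shape follows. The wrapper `make_formatter` is a hypothetical name used only to make the example runnable on its own; in the PR the function would be defined inline where the lambda currently sits.

```python
def make_formatter(decimals: int):
    """Return a formatter that renders a number with a fixed decimal count."""

    def fmt(v: float, d: int = decimals) -> str:
        # The default argument binds the current value of `decimals`
        # at definition time, exactly as the original lambda did.
        return f"{v:.{d}f}"

    return fmt


fmt_steps = make_formatter(1)
fmt_distance = make_formatter(3)

print(fmt_steps(8421))      # 8421.0
print(fmt_distance(6150))   # 6150.000
print(format(0.5, ".0%"))   # the builtin is still reachable: 50%
```

Because the name `format` is no longer rebound, later code in the same scope can still call the builtin without surprises.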

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@extractors/statistical.py` at line 47, The current assignment "format =
lambda v, d=decimals: f\"{v:.{d}f}\"" shadows the built-in format and uses a
lambda; change it to a named function (e.g., def fmt(v, d=decimals): return
f"{v:.{d}f}") and replace all subsequent uses of format with fmt (referencing
the binding where the lambda was created in extractors/statistical.py) to avoid
builtin shadowing and follow PEP 8.

106-118: Redundant int() around round().

In Python 3, round(x) on a float returns an int when called without the second argument, so wrapping it in int() is unnecessary.

🧹 Minor cleanup

     def _format_duration(self, duration_units: float) -> str:
         if self.config.time_unit == "minutes":
-            total_minutes = int(round(duration_units))
+            total_minutes = round(duration_units)
             hours, minutes = divmod(total_minutes, 60)
             if hours and minutes:
                 return f"{hours}h {minutes}m"
             if hours:
                 return f"{hours}h"
             return f"{minutes}m"
         if self.config.time_unit == "hours":
-            rounded = int(round(duration_units))
+            rounded = round(duration_units)
             return f"{rounded}h"
         return f"{duration_units:g} {self.config.time_unit}"
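
To check the behavior in isolation, the method can be rewritten as a free function. This is a sketch under the assumption that `time_unit` is passed in directly rather than read from `self.config`; the function name `format_duration` is likewise chosen for the example.

```python
def format_duration(duration_units: float, time_unit: str) -> str:
    """Render an accumulated duration in a compact, readable form."""
    if time_unit == "minutes":
        total_minutes = round(duration_units)  # already an int in Python 3
        hours, minutes = divmod(total_minutes, 60)
        if hours and minutes:
            return f"{hours}h {minutes}m"
        if hours:
            return f"{hours}h"
        return f"{minutes}m"
    if time_unit == "hours":
        return f"{round(duration_units)}h"
    # Fallback for any other unit: plain number followed by the unit name.
    return f"{duration_units:g} {time_unit}"


print(format_duration(480.0, "minutes"))  # 8h
print(format_duration(444.0, "minutes"))  # 7h 24m
print(format_duration(45.0, "minutes"))   # 45m
print(format_duration(7.6, "hours"))      # 8h
```

One detail worth keeping in mind: `round()` uses round-half-to-even, so `round(2.5)` is `2` while `round(3.5)` is `4`. Removing the `int()` wrapper does not change this.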

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@extractors/statistical.py` around lines 106 - 118, In _format_duration,
remove redundant int(...) wrappers around round(...) — replace
int(round(duration_units)) assigned to total_minutes and
int(round(duration_units)) assigned to rounded with just round(duration_units)
(method _format_duration, variables total_minutes and rounded, and the branches
checking self.config.time_unit "minutes" and "hours"); ensure resulting values
are used as integers where needed (e.g., divmod(total_minutes, 60)) so behavior
remains unchanged.

aggregators.py (1)

45-58: Consider extracting shared stats computation.

TotalAggregator.aggregate() duplicates the stats tuple construction from MetricAggregator.aggregate(). If the stats formula changes, both locations need updating.

♻️ One approach: extract a helper method

 class MetricAggregator:
     def prepare(self, series: np.ndarray) -> np.ndarray:
         return series  # identity by default

+    def _compute_stats(self, prepared: np.ndarray) -> tuple[float, float, float, float]:
+        return (
+            float(np.mean(prepared)),
+            float(np.max(prepared)),
+            float(np.min(prepared)),
+            float(np.std(prepared)),
+        )
+
     def aggregate(self, series: np.ndarray) -> AggregationResult | None:
         prepared = self.prepare(series)
         if len(prepared) == 0:
             return None
-        return AggregationResult(
-            stats=(
-                float(np.mean(prepared)),
-                float(np.max(prepared)),
-                float(np.min(prepared)),
-                float(np.std(prepared)),
-            )
-        )
+        return AggregationResult(stats=self._compute_stats(prepared))


 class TotalAggregator(MetricAggregator):
     def aggregate(self, series: np.ndarray) -> AggregationResult | None:
         prepared = self.prepare(series)
         if len(prepared) == 0:
             return None
         return AggregationResult(
-            stats=(
-                float(np.mean(prepared)),
-                float(np.max(prepared)),
-                float(np.min(prepared)),
-                float(np.std(prepared)),
-            ),
+            stats=self._compute_stats(prepared),
             total=float(np.sum(prepared)),
         )

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@aggregators.py` around lines 45 - 58, TotalAggregator.aggregate duplicates
the stats tuple construction logic from MetricAggregator.aggregate (mean, max,
min, std); extract a shared helper on MetricAggregator (e.g., a protected method
like _compute_stats(prepared: np.ndarray) -> tuple[float,float,float,float])
that returns (mean,max,min,std) and use it from both MetricAggregator.aggregate
and TotalAggregator.aggregate, keeping existing calls to prepare() and
AggregationResult construction (referencing TotalAggregator.aggregate,
MetricAggregator.aggregate, prepare, and AggregationResult) so stats computation
is centralized.


ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 410b6712-bc5d-4e11-86fc-7f7bb0d249dd

📥 Commits

Reviewing files that changed from the base of the PR and between b268ec4 and 32722c4.

📒 Files selected for processing (3)

aggregators.py
extractors/statistical.py
mhc/constants.py

max-rosenblattl · 2026-04-24T01:41:50Z

+    def _period_label(self, row: Recording) -> str:
+        if self.config.time_unit == "minutes" and row.values.shape[1] == 1440:
+            return "Daily"
+        if self.config.time_unit == "hours" and row.values.shape[1] == 24 * 7:
+            return "Weekly"
+        return "Total"


Could lead to "Total total …"

Addressed in a70682e
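
The diff for a70682e is not shown in this thread, so the following is only one possible way to avoid the duplicated word, not necessarily the fix that was committed. It assumes the caption is built as `"<label> total <name>: <value> <unit>."`, which matches the example captions in the PR description. The function names and parameters are hypothetical.

```python
from typing import Optional


def period_label(time_unit: str, n_steps: int) -> Optional[str]:
    """Return a period name, or None when the window is not a known period."""
    if time_unit == "minutes" and n_steps == 1440:
        return "Daily"
    if time_unit == "hours" and n_steps == 24 * 7:
        return "Weekly"
    return None  # no "Total" fallback here, so it cannot be doubled later


def total_caption(
    display_name: str,
    total: float,
    unit: str,
    time_unit: str,
    n_steps: int,
    decimals: int = 1,
) -> str:
    """Build a total caption, adding the word "total" exactly once."""
    label = period_label(time_unit, n_steps)
    prefix = f"{label} total" if label else "Total"
    suffix = f" {unit}" if unit else ""
    return f"{prefix} {display_name}: {total:.{decimals}f}{suffix}."


print(total_caption("iPhone step count", 8421, "steps", "minutes", 1440))
# Daily total iPhone step count: 8421.0 steps.
print(total_caption("iPhone step count", 8421, "steps", "minutes", 600))
# Total iPhone step count: 8421.0 steps.
```

Returning `None` for the unknown case moves the decision about the word "total" into a single place, the caption builder.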

added totals to the statistical captions

32722c4

KarlDeck requested a review from max-rosenblattl April 12, 2026 17:42

coderabbitai Bot reviewed Apr 12, 2026


Merge branch 'main' into KarlDeck/Statistical-Totals

3715bf9

max-rosenblattl reviewed Apr 24, 2026


KarlDeck and others added 2 commits April 27, 2026 19:14

Merge branch 'main' into KarlDeck/Statistical-Totals

1dd0192

Removed "Total total …" error

a70682e

KarlDeck requested a review from max-rosenblattl April 27, 2026 17:28

max-rosenblattl requested a review from milanagm April 30, 2026 05:39

added totals to the statistical captions #53
KarlDeck wants to merge 4 commits into main from KarlDeck/Statistical-Totals
