
added totals to the statistical captions#53

Open
KarlDeck wants to merge 4 commits into main from KarlDeck/Statistical-Totals

Conversation

@KarlDeck

Collaborator

Add daily totals for step, distance, flights, and activity duration captions

♻️ Current situation & Problem

The statistical extractor currently produces only mean/max/min/std summaries for continuous channels. That misses some of the most natural daily summaries for HealthKit-derived signals, especially totals for cumulative metrics such as step count, walking/running distance, and flights climbed. It also provides no compact day-level duration summaries for binary activity channels such as sleep and in-bed.

This PR extends the statistical extraction flow so daily captions can include total step count and total distance for both devices, total flights climbed, and readable duration summaries for activity-style channels, such as "8h in bed".

⚙️ Release Notes

  • Add daily total captions for iPhone and Apple Watch step count channels
  • Add daily total captions for iPhone and Apple Watch walking/running distance channels
  • Add daily total captions for flights climbed
  • Add duration-based statistical captions for sleep and activity channels such as sleeping, in bed, and workout activities

Examples of newly generated captions:

Daily total iPhone step count: 8421.0 steps.
Daily total Apple Watch distance: 6150.000 m.
Daily total flights climbed (iPhone): 12.0.
8h in bed.
7h 24m sleeping.
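
As a rough illustration of how total captions of this shape could be assembled, here is a minimal standalone sketch. The names `period_label` and `total_caption`, their parameters, and the choice to return an empty prefix for unknown windows are assumptions made for this example and are not taken from the PR's code.

```python
def period_label(time_unit: str, n_samples: int) -> str:
    """Return a period prefix, or "" when the window is not a known period."""
    if time_unit == "minutes" and n_samples == 1440:  # one day of minutes
        return "Daily"
    if time_unit == "hours" and n_samples == 24 * 7:  # one week of hours
        return "Weekly"
    return ""  # assumption: no prefix rather than a literal "Total"


def total_caption(
    name: str,
    total: float,
    *,
    unit: str = "",
    decimals: int = 1,
    period: str = "",
) -> str:
    """Build a caption such as 'Daily total iPhone step count: 8421.0 steps.'"""
    # Only one "total" ever appears, whether or not a period prefix exists.
    head = f"{period} total" if period else "Total"
    tail = f" {unit}" if unit else ""
    return f"{head} {name}: {total:.{decimals}f}{tail}."


daily = period_label("minutes", 1440)
print(total_caption("iPhone step count", 8421, unit="steps", period=daily))
print(total_caption("Apple Watch distance", 6150, unit="m", decimals=3, period=daily))
print(total_caption("flights climbed (iPhone)", 12, period=daily))
```

Returning an empty prefix for unrecognized windows is one way to keep the word "total" from being repeated when the prefix and the caption template are combined.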

This change is not intended to be breaking and does not change the external public interface.

📚 Documentation

The implementation introduces specialized statistical aggregation paths for:

  • cumulative channels that should report totals in addition to standard summary statistics
  • binary sleep/activity channels that should report accumulated duration in human-readable form

The change is kept within the existing extractor/configuration structure by:

  • extending the aggregation result model to support totals and durations
  • updating the statistical extractor to emit additional caption variants when configured
  • wiring the relevant MHC channels to the new aggregators in the dataset config
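
The structure described above can be sketched roughly as follows. This is a dependency-free approximation that uses the standard-library `statistics` module in place of NumPy. The field and class names follow the PR discussion, but the method bodies, the `_stats` helper, and the choice to return `None` when no positive samples exist are assumptions.

```python
from __future__ import annotations

import statistics
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class AggregationResult:
    """Structured aggregation output; every field is optional."""
    stats: Optional[tuple[float, float, float, float]] = None  # (mean, max, min, std)
    total: Optional[float] = None
    duration_units: Optional[float] = None


class MetricAggregator:
    def prepare(self, series: Sequence[float]) -> list[float]:
        return [float(v) for v in series]  # identity by default

    def _stats(self, prepared: list[float]) -> tuple[float, float, float, float]:
        # pstdev is the population standard deviation, matching np.std's default.
        return (
            statistics.fmean(prepared),
            max(prepared),
            min(prepared),
            statistics.pstdev(prepared),
        )

    def aggregate(self, series: Sequence[float]) -> Optional[AggregationResult]:
        prepared = self.prepare(series)
        if not prepared:
            return None
        return AggregationResult(stats=self._stats(prepared))


class TotalAggregator(MetricAggregator):
    """Reports the sum in addition to the standard summary statistics."""

    def aggregate(self, series: Sequence[float]) -> Optional[AggregationResult]:
        prepared = self.prepare(series)
        if not prepared:
            return None
        return AggregationResult(stats=self._stats(prepared), total=sum(prepared))


class DurationAggregator(MetricAggregator):
    """Keeps positive samples and reports their sum as a duration."""

    def prepare(self, series: Sequence[float]) -> list[float]:
        return [float(v) for v in series if v > 0]

    def aggregate(self, series: Sequence[float]) -> Optional[AggregationResult]:
        prepared = self.prepare(series)
        if not prepared:
            return None  # assumption: no caption when the activity never occurs
        return AggregationResult(duration_units=sum(prepared))
```

For a binary channel sampled once per minute, the sum of positive samples equals the number of active minutes, which is what a duration caption needs.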

✅ Testing

  • Verified that the updated files compile successfully with python3.11 -m compileall
  • Ran a focused synthetic-row sanity check to confirm the extractor emits the expected new captions for:
    • daily total steps
    • daily total distance
    • daily total flights
    • duration captions such as "8h in bed" and "7h sleeping"

I did not run the full dataset pipeline in this workspace because optional runtime dependencies required by the full stack are not currently installed locally.

Code of Conduct & Contributing Guidelines

By creating and submitting this pull request, you agree to follow our Code of Conduct and Contributing Guidelines:

@coderabbitai

coderabbitai Bot commented Apr 12, 2026

Contributor

Warning

Rate limit exceeded

@KarlDeck has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 42 minutes and 37 seconds before requesting another review.

To keep reviews running without waiting, you can enable usage-based add-on for your organization. This allows additional reviews beyond the hourly cap. Account admins can enable it under billing.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: dd451907-4ee0-4aa2-b070-d187053218c2

📥 Commits

Reviewing files that changed from the base of the PR and between 32722c4 and a70682e.

📒 Files selected for processing (2)
  • extractors/statistical.py
  • mhc/constants.py
📝 Walkthrough


A new AggregationResult dataclass now structures metric aggregation outputs with optional stats, total, and duration_units fields. The MetricAggregator.aggregate() return type changed from an unstructured tuple to this typed result. Two new aggregator classes (TotalAggregator and DurationAggregator) were introduced alongside updates to the statistical extractor and channel aggregation configuration.

Changes

| Cohort / File(s) | Summary |
|---|---|
| Aggregation Framework (aggregators.py) | Introduced AggregationResult dataclass with optional stats, total, and duration_units fields. Updated MetricAggregator.aggregate() to return AggregationResult \| None instead of unstructured tuple. Added TotalAggregator (returns stats and sum) and DurationAggregator (filters positive values and returns duration sum). |
| Statistical Extractor (extractors/statistical.py) | Updated to import and handle AggregationResult. Expanded channel selection to include aggregator-configured channels. Modified aggregator result handling to access structured aggregated.stats field. Added fallback metadata derivation from display_name. Introduced helper methods (_period_label, _total_unit, _format_duration, _duration_label) and _extra_captions() for generating additional caption outputs per channel. |
| Health Metrics Configuration (mhc/constants.py) | Replaced single heart-rate aggregator with expanded aggregators mapping. Added TotalAggregator() for step count, distance, and flights climbed metrics. Added DurationAggregator() entries for all activity and sleep channels via dictionary comprehensions. Retained NonZeroAggregator() only for heart rate. |
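
A mapping of the kind summarized for mhc/constants.py might look roughly like the sketch below. The channel keys, the tuple names, and the stub classes are all hypothetical placeholders chosen for this example; the real identifiers in the repository are not shown in this conversation.

```python
class TotalAggregator:
    """Stub standing in for the real aggregator class."""


class DurationAggregator:
    """Stub standing in for the real aggregator class."""


class NonZeroAggregator:
    """Stub standing in for the real aggregator class."""


# Hypothetical channel names, for illustration only.
SLEEP_CHANNELS = ("sleep:asleep", "sleep:in_bed")
WORKOUT_CHANNELS = ("workout:running", "workout:cycling")

AGGREGATORS = {
    # Heart rate keeps its existing non-zero aggregation.
    "heart_rate": NonZeroAggregator(),
    # Cumulative metrics report totals on top of the usual statistics.
    "iphone:step_count": TotalAggregator(),
    "watch:step_count": TotalAggregator(),
    "iphone:distance": TotalAggregator(),
    "watch:distance": TotalAggregator(),
    "iphone:flights_climbed": TotalAggregator(),
    # Binary sleep/activity channels report accumulated duration.
    **{name: DurationAggregator() for name in SLEEP_CHANNELS},
    **{name: DurationAggregator() for name in WORKOUT_CHANNELS},
}
```

Building the duration entries with dictionary comprehensions means a new sleep or workout channel only needs to be added to its tuple.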

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~22 minutes

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The title 'added totals to the statistical captions' accurately summarizes the main change: adding daily totals (and duration) captions to the statistical extraction flow. |
| Description check | ✅ Passed | The description is comprehensive and directly related to the changeset, explaining the problem, solution, examples, and testing, all aligned with the implemented changes. |

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch KarlDeck/Statistical-Totals


Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai coderabbitai Bot left a comment

Contributor


🧹 Nitpick comments (3)
extractors/statistical.py (2)

47-47: Avoid shadowing format builtin and prefer def over lambda.

The variable name format shadows Python's built-in function, which can cause subtle issues. Additionally, PEP 8 recommends using def instead of assigning a lambda expression to a variable.

♻️ Suggested refactor
-            format = lambda v, d=decimals: f"{v:.{d}f}"
+            def fmt(v: float, d: int = decimals) -> str:
+                return f"{v:.{d}f}"

Then update usages below to use fmt instead of format.
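
To make the suggestion concrete, the following standalone sketch shows the named-function pattern inside a loop. The function `make_formatters` and its input mapping are hypothetical and exist only for this illustration. The `d=decimals` default is what freezes the per-channel precision at definition time, and the built-in `format` stays usable because nothing rebinds it.

```python
from __future__ import annotations

from typing import Callable


def make_formatters(decimals_by_channel: dict[str, int]) -> dict[str, Callable[[float], str]]:
    """Build one fixed-precision formatter per channel."""
    formatters: dict[str, Callable[[float], str]] = {}
    for channel, decimals in decimals_by_channel.items():
        # The default argument captures the current loop value of `decimals`;
        # a plain closure would see only the last value after the loop ends.
        def fmt(v: float, d: int = decimals) -> str:
            return f"{v:.{d}f}"

        formatters[channel] = fmt
    return formatters


formatters = make_formatters({"steps": 1, "distance": 3})
print(formatters["steps"](8421))      # 8421.0
print(formatters["distance"](6150))   # 6150.000
print(format(2.5, ".1f"))             # built-in format is still available: 2.5
```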

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@extractors/statistical.py` at line 47, The current assignment "format =
lambda v, d=decimals: f\"{v:.{d}f}\"" shadows the built-in format and uses a
lambda; change it to a named function (e.g., def fmt(v, d=decimals): return
f"{v:.{d}f}") and replace all subsequent uses of format with fmt (referencing
the binding where the lambda was created in extractors/statistical.py) to avoid
builtin shadowing and follow PEP 8.

106-118: Redundant int() around round().

In Python 3, round(x) returns an int when called without the second argument, so wrapping it in int() is unnecessary.

🧹 Minor cleanup
     def _format_duration(self, duration_units: float) -> str:
         if self.config.time_unit == "minutes":
-            total_minutes = int(round(duration_units))
+            total_minutes = round(duration_units)
             hours, minutes = divmod(total_minutes, 60)
             if hours and minutes:
                 return f"{hours}h {minutes}m"
             if hours:
                 return f"{hours}h"
             return f"{minutes}m"
         if self.config.time_unit == "hours":
-            rounded = int(round(duration_units))
+            rounded = round(duration_units)
             return f"{rounded}h"
         return f"{duration_units:g} {self.config.time_unit}"
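
For readers who want to try the formatting logic in isolation, here is a standalone sketch of the same branches written as a free function. The signature `format_duration(duration_units, time_unit)` is an assumption for this example; the PR's version is a method that reads `self.config.time_unit`.

```python
def format_duration(duration_units: float, time_unit: str) -> str:
    """Render a duration as '7h 24m', '8h', '45m', or a generic fallback."""
    if time_unit == "minutes":
        # round() with one argument already returns an int for a built-in float.
        total_minutes = round(duration_units)
        hours, minutes = divmod(total_minutes, 60)
        if hours and minutes:
            return f"{hours}h {minutes}m"
        if hours:
            return f"{hours}h"
        return f"{minutes}m"
    if time_unit == "hours":
        return f"{round(duration_units)}h"
    # Fallback: compact number followed by the raw unit name.
    return f"{duration_units:g} {time_unit}"


print(format_duration(480.0, "minutes"))  # 8h
print(format_duration(444.0, "minutes"))  # 7h 24m
print(format_duration(45.0, "minutes"))   # 45m
print(type(round(7.6)).__name__)          # int
print(type(round(7.6, 0)).__name__)       # float
```

Note that `round()` on built-in floats rounds halves to the nearest even integer, so `round(2.5)` is 2. That behavior is the same with or without the `int()` wrapper.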
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@extractors/statistical.py` around lines 106 - 118, In _format_duration,
remove redundant int(...) wrappers around round(...) — replace
int(round(duration_units)) assigned to total_minutes and
int(round(duration_units)) assigned to rounded with just round(duration_units)
(method _format_duration, variables total_minutes and rounded, and the branches
checking self.config.time_unit "minutes" and "hours"); ensure resulting values
are used as integers where needed (e.g., divmod(total_minutes, 60)) so behavior
remains unchanged.
aggregators.py (1)

45-58: Consider extracting shared stats computation.

TotalAggregator.aggregate() duplicates the stats tuple construction from MetricAggregator.aggregate(). If the stats formula changes, both locations need updating.

♻️ One approach: extract a helper method
 class MetricAggregator:
     def prepare(self, series: np.ndarray) -> np.ndarray:
         return series  # identity by default

+    def _compute_stats(self, prepared: np.ndarray) -> tuple[float, float, float, float]:
+        return (
+            float(np.mean(prepared)),
+            float(np.max(prepared)),
+            float(np.min(prepared)),
+            float(np.std(prepared)),
+        )
+
     def aggregate(self, series: np.ndarray) -> AggregationResult | None:
         prepared = self.prepare(series)
         if len(prepared) == 0:
             return None
-        return AggregationResult(
-            stats=(
-                float(np.mean(prepared)),
-                float(np.max(prepared)),
-                float(np.min(prepared)),
-                float(np.std(prepared)),
-            )
-        )
+        return AggregationResult(stats=self._compute_stats(prepared))


 class TotalAggregator(MetricAggregator):
     def aggregate(self, series: np.ndarray) -> AggregationResult | None:
         prepared = self.prepare(series)
         if len(prepared) == 0:
             return None
         return AggregationResult(
-            stats=(
-                float(np.mean(prepared)),
-                float(np.max(prepared)),
-                float(np.min(prepared)),
-                float(np.std(prepared)),
-            ),
+            stats=self._compute_stats(prepared),
             total=float(np.sum(prepared)),
         )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@aggregators.py` around lines 45 - 58, TotalAggregator.aggregate duplicates
the stats tuple construction logic from MetricAggregator.aggregate (mean, max,
min, std); extract a shared helper on MetricAggregator (e.g., a protected method
like _compute_stats(prepared: np.ndarray) -> tuple[float,float,float,float])
that returns (mean,max,min,std) and use it from both MetricAggregator.aggregate
and TotalAggregator.aggregate, keeping existing calls to prepare() and
AggregationResult construction (referencing TotalAggregator.aggregate,
MetricAggregator.aggregate, prepare, and AggregationResult) so stats computation
is centralized.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 410b6712-bc5d-4e11-86fc-7f7bb0d249dd

📥 Commits

Reviewing files that changed from the base of the PR and between b268ec4 and 32722c4.

📒 Files selected for processing (3)
  • aggregators.py
  • extractors/statistical.py
  • mhc/constants.py

Comment thread extractors/statistical.py Outdated
Comment on lines +92 to +97
    def _period_label(self, row: Recording) -> str:
        if self.config.time_unit == "minutes" and row.values.shape[1] == 1440:
            return "Daily"
        if self.config.time_unit == "hours" and row.values.shape[1] == 24 * 7:
            return "Weekly"
        return "Total"

Collaborator


Could lead to "Total total …"

Collaborator Author


Addressed in a70682e



2 participants