added totals to the statistical captions#53
Conversation
|
Warning Rate limit exceeded
To keep reviews running without waiting, you can enable usage-based add-on for your organization. This allows additional reviews beyond the hourly cap. Account admins can enable it under billing. ⌛ How to resolve this issue?After the wait time has elapsed, a review can be triggered using the We recommend that you space out your commits to avoid hitting the rate limit. 🚦 How do rate limits work?CodeRabbit enforces hourly rate limits for each developer per organization. Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout. Please see our FAQ for further information. ℹ️ Review info⚙️ Run configurationConfiguration used: Organization UI Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (2)
📝 WalkthroughWalkthroughA new Changes
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~22 minutes 🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
🧹 Nitpick comments (3)
extractors/statistical.py (2)
47-47: Avoid shadowingformatbuiltin and preferdefoverlambda.The variable name
formatshadows Python's built-in function, which can cause subtle issues. Additionally, PEP 8 recommends usingdefinstead of assigning a lambda expression to a variable.♻️ Suggested refactor
- format = lambda v, d=decimals: f"{v:.{d}f}" + def fmt(v: float, d: int = decimals) -> str: + return f"{v:.{d}f}"Then update usages below to use
fmtinstead offormat.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@extractors/statistical.py` at line 47, The current assignment "format = lambda v, d=decimals: f\"{v:.{d}f}\"" shadows the built-in format and uses a lambda; change it to a named function (e.g., def fmt(v, d=decimals): return f"{v:.{d}f}") and replace all subsequent uses of format with fmt (referencing the binding where the lambda was created in extractors/statistical.py) to avoid builtin shadowing and follow PEP 8.
106-118: Redundantint()aroundround().In Python 3,
round(x)returns anintwhen called without the second argument, so wrapping it inint()is unnecessary.🧹 Minor cleanup
def _format_duration(self, duration_units: float) -> str: if self.config.time_unit == "minutes": - total_minutes = int(round(duration_units)) + total_minutes = round(duration_units) hours, minutes = divmod(total_minutes, 60) if hours and minutes: return f"{hours}h {minutes}m" if hours: return f"{hours}h" return f"{minutes}m" if self.config.time_unit == "hours": - rounded = int(round(duration_units)) + rounded = round(duration_units) return f"{rounded}h" return f"{duration_units:g} {self.config.time_unit}"🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@extractors/statistical.py` around lines 106 - 118, In _format_duration, remove redundant int(...) wrappers around round(...) — replace int(round(duration_units)) assigned to total_minutes and int(round(duration_units)) assigned to rounded with just round(duration_units) (method _format_duration, variables total_minutes and rounded, and the branches checking self.config.time_unit "minutes" and "hours"); ensure resulting values are used as integers where needed (e.g., divmod(total_minutes, 60)) so behavior remains unchanged.aggregators.py (1)
45-58: Consider extracting shared stats computation.
TotalAggregator.aggregate()duplicates the stats tuple construction fromMetricAggregator.aggregate(). If the stats formula changes, both locations need updating.♻️ One approach: extract a helper method
class MetricAggregator: def prepare(self, series: np.ndarray) -> np.ndarray: return series # identity by default + def _compute_stats(self, prepared: np.ndarray) -> tuple[float, float, float, float]: + return ( + float(np.mean(prepared)), + float(np.max(prepared)), + float(np.min(prepared)), + float(np.std(prepared)), + ) + def aggregate(self, series: np.ndarray) -> AggregationResult | None: prepared = self.prepare(series) if len(prepared) == 0: return None - return AggregationResult( - stats=( - float(np.mean(prepared)), - float(np.max(prepared)), - float(np.min(prepared)), - float(np.std(prepared)), - ) - ) + return AggregationResult(stats=self._compute_stats(prepared)) class TotalAggregator(MetricAggregator): def aggregate(self, series: np.ndarray) -> AggregationResult | None: prepared = self.prepare(series) if len(prepared) == 0: return None return AggregationResult( - stats=( - float(np.mean(prepared)), - float(np.max(prepared)), - float(np.min(prepared)), - float(np.std(prepared)), - ), + stats=self._compute_stats(prepared), total=float(np.sum(prepared)), )🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@aggregators.py` around lines 45 - 58, TotalAggregator.aggregate duplicates the stats tuple construction logic from MetricAggregator.aggregate (mean, max, min, std); extract a shared helper on MetricAggregator (e.g., a protected method like _compute_stats(prepared: np.ndarray) -> tuple[float,float,float,float]) that returns (mean,max,min,std) and use it from both MetricAggregator.aggregate and TotalAggregator.aggregate, keeping existing calls to prepare() and AggregationResult construction (referencing TotalAggregator.aggregate, MetricAggregator.aggregate, prepare, and AggregationResult) so stats computation is centralized.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@aggregators.py`:
- Around line 45-58: TotalAggregator.aggregate duplicates the stats tuple
construction logic from MetricAggregator.aggregate (mean, max, min, std);
extract a shared helper on MetricAggregator (e.g., a protected method like
_compute_stats(prepared: np.ndarray) -> tuple[float,float,float,float]) that
returns (mean,max,min,std) and use it from both MetricAggregator.aggregate and
TotalAggregator.aggregate, keeping existing calls to prepare() and
AggregationResult construction (referencing TotalAggregator.aggregate,
MetricAggregator.aggregate, prepare, and AggregationResult) so stats computation
is centralized.
In `@extractors/statistical.py`:
- Line 47: The current assignment "format = lambda v, d=decimals:
f\"{v:.{d}f}\"" shadows the built-in format and uses a lambda; change it to a
named function (e.g., def fmt(v, d=decimals): return f"{v:.{d}f}") and replace
all subsequent uses of format with fmt (referencing the binding where the lambda
was created in extractors/statistical.py) to avoid builtin shadowing and follow
PEP 8.
- Around line 106-118: In _format_duration, remove redundant int(...) wrappers
around round(...) — replace int(round(duration_units)) assigned to total_minutes
and int(round(duration_units)) assigned to rounded with just
round(duration_units) (method _format_duration, variables total_minutes and
rounded, and the branches checking self.config.time_unit "minutes" and "hours");
ensure resulting values are used as integers where needed (e.g.,
divmod(total_minutes, 60)) so behavior remains unchanged.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 410b6712-bc5d-4e11-86fc-7f7bb0d249dd
📒 Files selected for processing (3)
aggregators.pyextractors/statistical.pymhc/constants.py
| def _period_label(self, row: Recording) -> str: | ||
| if self.config.time_unit == "minutes" and row.values.shape[1] == 1440: | ||
| return "Daily" | ||
| if self.config.time_unit == "hours" and row.values.shape[1] == 24 * 7: | ||
| return "Weekly" | ||
| return "Total" |
There was a problem hiding this comment.
Could lead to "Total total …"
Add daily totals for step, distance, flights, and activity duration captions
♻️ Current situation & Problem
The statistical extractor currently only produces mean/max/min/std summaries for continuous channels. That misses some of the most natural daily summaries for HealthKit-derived signals, especially cumulative metrics such as step count, walking/running distance, and flights climbed, and it does not provide compact day-level duration summaries for binary activity channels such as sleep and in-bed.
This PR extends the statistical extraction flow so daily captions can include total step count and total distance for both devices, total flights climbed, and readable duration summaries for activity-style channels such as
8h in bed.⚙️ Release Notes
sleeping,in bed, and workout activitiesExamples of newly generated captions:
This change is not intended to be breaking and does not change the external public interface.
📚 Documentation
The implementation introduces specialized statistical aggregation paths for:
The change is kept within the existing extractor/configuration structure by:
✅ Testing
python3.11 -m compileall8h in bedand7h sleepingI did not run the full dataset pipeline in this workspace because optional runtime dependencies required by the full stack are not currently installed locally.
Code of Conduct & Contributing Guidelines
By creating and submitting this pull request, you agree to follow our Code of Conduct and Contributing Guidelines: