fix: Solcast Accept header and sub-30min resampling #739

Merged: davidusb-geek merged 2 commits into davidusb-geek:master from kallegrens:fix/solcast-header-and-resampling on Mar 10, 2026

Conversation

@kallegrens (Contributor) commented Mar 5, 2026

DISCLAIMER:
This is a Claude-generated PR; the code was, however, tested live on my emhass instance and seemed to work. Feel free to close if this is against policy.


Summary

Two bugs in _get_weather_solcast() prevent Solcast forecasts from working correctly:

Bug 1: Wrong HTTP header

The request uses "content-type": "application/json" instead of "Accept": "application/json". The content-type header describes the request body (which is empty for a GET), while Accept tells the server what format the client wants. Without the correct Accept header, Solcast returns HTML instead of JSON, causing a parse error.
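For illustration, a minimal sketch of the corrected request setup. The URL shape, key names, and values here are placeholders for illustration, not EMHASS's actual code:

```python
# Hypothetical values for illustration only; not EMHASS's actual configuration.
api_key = "YOUR_SOLCAST_API_KEY"
roof_id = "your-rooftop-site-id"
url = f"https://api.solcast.com.au/rooftop_sites/{roof_id}/forecasts?hours=24"

headers = {
    # "Accept" declares the response format the client wants back.
    # "content-type" would describe the (empty) GET request body, so it
    # does not ask for JSON, and the server may answer with HTML instead.
    "Accept": "application/json",
    "Authorization": f"Bearer {api_key}",
}

# The actual call would then be, e.g. with the requests library:
# response = requests.get(url, headers=headers)
# data = response.json()
```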

Bug 2: No resampling for sub-30min timesteps

Solcast returns data at 30-minute intervals. When optimization_time_step < 30 (e.g. 15 min), the code compared len(data_list) against len(self.forecast_dates) and always found "not enough data" because 48 Solcast points < 96 forecast slots.

The fix builds a timestamped DataFrame from Solcast's period_end timestamps and resamples using reindex() + interpolate(method='time'), matching the pattern already used in _get_weather_solar_forecast().
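The pattern can be sketched standalone as follows (simplified; the variable names mirror the description above but this is not the exact EMHASS code, and the timestamps and values are made up):

```python
import pandas as pd

# 30-min Solcast samples (yhat in W), indexed by period_end timestamps
solcast_timestamps = pd.date_range(
    "2026-03-05 08:00", periods=3, freq="30min", tz="UTC"
)
data_tmp = pd.DataFrame({"yhat": [100.0, 500.0, 300.0]}, index=solcast_timestamps)

# Target 15-min optimization slots (stand-in for self.forecast_dates)
forecast_dates = pd.date_range("2026-03-05 08:00", periods=5, freq="15min", tz="UTC")

# Union both indexes, time-interpolate, then keep only the target slots,
# zero-filling anything still missing at the edges
combined_index = data_tmp.index.union(forecast_dates)
data_tmp = data_tmp.reindex(combined_index).interpolate(method="time")
data_tmp = data_tmp.reindex(forecast_dates).fillna(0.0)

print(data_tmp["yhat"].tolist())  # [100.0, 300.0, 500.0, 400.0, 300.0]
```

Note the intermediate 15-min slots land on the linear midpoints of the surrounding 30-min samples, which is exactly the behavior validated in the testing section below.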

Also

  • Removes the now-unused zip_longest import

Testing

Unit tests (pytest, local Python 3.12):

  • All 4 Solcast tests pass (3 existing + 1 new)
  • New: test_get_weather_forecast_solcast_15min_resampling_mock

Production pod validation (v0.17.0 on Kubernetes):
Patched forecast.py was kubectl cp'd into the running pod and validated:

  1. 30-min timestep (default): 48 slots, peak 9000 W — direct mapping ✅
  2. 15-min timestep (production): 96 slots, peak 9000 W — interpolated correctly ✅
    • e.g. between 08:00 (100 W) and 08:30 (500 W), 08:15 → 300 W (linear midpoint)
  3. Failure modes — all return False gracefully:
    • 429 rate limit ✅
    • 403 bad API key ✅
    • Empty forecasts ✅
    • Missing API key ✅

Live Solcast API call also confirmed working (got 200 + valid JSON before hitting free-tier daily quota).
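The interpolation spot-check from point 2 above can be reproduced in a few lines (timestamps are illustrative):

```python
import pandas as pd

# With time-based linear interpolation, the 08:15 value between
# 08:00 (100 W) and 08:30 (500 W) should be the midpoint, 300 W.
idx = pd.DatetimeIndex(
    ["2026-03-05 08:00", "2026-03-05 08:15", "2026-03-05 08:30"], tz="UTC"
)
s = pd.Series([100.0, None, 500.0], index=idx).interpolate(method="time")
print(s.iloc[1])  # 300.0
```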

Summary by Sourcery

Fix Solcast weather forecast retrieval and resampling to support sub-30-minute optimization time steps and ensure valid JSON responses.

Bug Fixes:

  • Correct Solcast HTTP request header to use the Accept header so the API returns JSON instead of HTML.
  • Handle empty Solcast forecast responses and return a graceful failure instead of proceeding with invalid data.
  • Fix Solcast data length mismatch for sub-30-minute optimization time steps by resampling provider data to match forecast dates.

Enhancements:

  • Aggregate multi-rooftop Solcast data using a timestamped DataFrame with time-based interpolation and zero-filled edges instead of manual list accumulation.

Tests:

  • Add a Solcast unit test verifying 30-minute input data is correctly resampled to a 15-minute optimization time step and contains no NaN values.

@sourcery-ai Bot (Contributor) commented Mar 5, 2026

Reviewer's Guide

Fixes Solcast forecast integration by sending the correct HTTP Accept header and by converting Solcast’s 30‑minute PV forecast series into a timestamped DataFrame that is resampled/interpolated to match arbitrary optimization_time_step (e.g. 15 minutes), plus adds a regression test and minor cleanup.

Sequence diagram for Solcast forecast retrieval and resampling

```mermaid
sequenceDiagram
    participant Forecast as ForecastService
    participant SolcastAPI
    participant Pandas as PandasDataFrame

    Forecast->>Forecast: _get_weather_solcast(w_forecast_cache_path)
    Forecast->>Forecast: Build headers with Accept=header_accept
    Forecast->>Forecast: Compute days_solcast and roof_ids
    Forecast->>Forecast: Initialize total_data as empty DataFrame

    loop For each roof_id in roof_ids
        Forecast->>SolcastAPI: GET /rooftop_sites/{roof_id}/forecasts
        SolcastAPI-->>Forecast: HTTP response (status, body)

        alt HTTP 429 or 403 or other error
            Forecast->>Forecast: Log Solcast error
            Forecast-->>Forecast: Return False
        else HTTP 200
            Forecast->>Forecast: Parse JSON body into data
            alt data[forecasts] is empty
                Forecast->>Forecast: Log no data retrieved
                Forecast-->>Forecast: Return False
            else forecasts present
                Forecast->>Pandas: Build solcast_timestamps from period_end
                Forecast->>Pandas: Build data_tmp DataFrame(yhat=pv_estimate*1000, index=solcast_timestamps)
                Forecast->>Pandas: Localize to UTC if tz naive
                Forecast->>Pandas: Convert index tz to forecast_dates.tz
                Forecast->>Pandas: combined_index = union(index, forecast_dates)
                Forecast->>Pandas: Reindex to combined_index
                Forecast->>Pandas: Interpolate(method=time)
                Forecast->>Pandas: Reindex to forecast_dates
                Forecast->>Pandas: Fill NaN with 0.0

                alt total_data is empty
                    Forecast->>Forecast: total_data = data_tmp copy
                else total_data already has data
                    Forecast->>Forecast: total_data = total_data + data_tmp
                end
            end
        end
    end

    Forecast->>Forecast: data = total_data
    alt weather_forecast_cache enabled
        Forecast->>Forecast: set_cached_forecast_data(w_forecast_cache_path, data)
        Forecast-->>Forecast: cached data
    end
    Forecast-->>Forecast: Return DataFrame data
```

File-Level Changes

Use correct HTTP header when calling Solcast so the API returns JSON instead of HTML.
  • Replace use of the content-type header with the Accept header when requesting Solcast forecasts.
  • Retain existing User-Agent and Authorization headers and Solcast URL construction.
src/emhass/forecast.py
Refactor Solcast forecast handling to work with 30-minute API data and arbitrary forecast timestep, aggregating across multiple rooftops.
  • Replace list-based accumulation of pv_estimate values with a timestamped pandas DataFrame built from Solcast period_end timestamps.
  • Return early with a logged error when Solcast forecasts array is empty instead of checking only for "not enough data" via length comparison.
  • Localize Solcast timestamps to UTC if naive, then convert to the forecast timezone used by forecast_dates.
  • Reindex the Solcast time series onto a combined index of original Solcast timestamps and target forecast_dates, then interpolate with method='time' and finally reindex to forecast_dates, filling remaining NaNs with zero.
  • Accumulate multiple rooftop forecasts by summing aligned DataFrames into a single total_data DataFrame, and return this as the result instead of constructing from parallel lists.
  • Remove now-unused zip_longest-based padding logic and list initialisation for total_data_list.
src/emhass/forecast.py
Add regression test to verify 30-minute Solcast data is resampled correctly for 15-minute optimization_time_step.
  • Add async test that sets freq and optimization_time_step to 15 minutes and rebuilds forecast_dates at 15-minute intervals.
  • Mock Solcast API response using stored pbz2 payload and aioresponses, checking that the constructed URL matches the expected hours query parameter.
  • Assert the returned DataFrame type, datetime index timezone, length equal to 15-minute forecast_dates, and absence of NaN values after interpolation.
  • Restore original forecast configuration and cached weather_forecast_data.pkl after the test to avoid side effects.
tests/test_forecast.py


@sourcery-ai Bot (Contributor) left a comment:
Hey - I've found 1 issue, and left some high level feedback:

  • In _get_weather_solcast, instead of checking if len(total_data) == 0 to detect the initial iteration, consider initializing total_data to None and using an explicit if total_data is None: guard to avoid relying on DataFrame truthiness/length semantics.
  • The reindexing flow (union of data_tmp.index and self.forecast_dates, then reindex twice) could be simplified by interpolating on the original Solcast index and then directly reindexing to self.forecast_dates, which would avoid creating the extra combined index and reduce overhead while keeping the same behavior.
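The first suggestion could look roughly like this (a minimal sketch; fetch_and_resample is a hypothetical stand-in for the per-rooftop fetch-and-resample logic, not an actual EMHASS function, and the values are made up):

```python
import pandas as pd

def fetch_and_resample(roof_id: str) -> pd.DataFrame:
    # Hypothetical stand-in: returns one rooftop's forecast already
    # aligned to the target forecast_dates index.
    idx = pd.date_range("2026-03-05 08:00", periods=3, freq="15min", tz="UTC")
    return pd.DataFrame({"yhat": [100.0, 200.0, 300.0]}, index=idx)

total_data = None  # explicit None guard instead of len(total_data) == 0
for roof_id in ["roof-a", "roof-b"]:
    data_tmp = fetch_and_resample(roof_id)
    if total_data is None:
        total_data = data_tmp.copy()
    else:
        # Indexes are aligned, so addition sums per time slot
        total_data = total_data + data_tmp

print(total_data["yhat"].tolist())  # [200.0, 400.0, 600.0]
```

The `is None` guard makes the "first iteration" case explicit instead of relying on DataFrame length semantics.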
## Individual Comments

### Comment 1
<location path="tests/test_forecast.py" line_range="471-479" />
<code_context>
+
+            self.assertIsInstance(df_weather_scrap, type(pd.DataFrame()))
+            self.assertIsInstance(df_weather_scrap.index, pd.core.indexes.datetimes.DatetimeIndex)
+            self.assertEqual(df_weather_scrap.index.tz, self.fcst.time_zone)
+            # Key assertion: output length must match the 15-min forecast_dates
+            self.assertEqual(len(df_weather_scrap), len(self.fcst.forecast_dates))
+            # Verify no NaN values after interpolation
+            self.assertFalse(df_weather_scrap["yhat"].isna().any())
+
+        # Restore original freq/forecast_dates
</code_context>
<issue_to_address>
**suggestion (testing):** Consider asserting actual interpolation correctness at a few known timestamps, not just length and non-NaN values

Right now the test only validates shape, timezone, and absence of NaNs. Please also assert specific interpolated values at a few timestamps (e.g., take two known 30‑min points from the fixture, compute the expected linear midpoint at 15 minutes, and check that `yhat` at that time matches within a small tolerance). This will catch regressions where the interpolation method or configuration changes while still passing the current checks.

```suggestion
            self.assertIsInstance(df_weather_scrap, type(pd.DataFrame()))
            self.assertIsInstance(df_weather_scrap.index, pd.core.indexes.datetimes.DatetimeIndex)
            self.assertEqual(df_weather_scrap.index.tz, self.fcst.time_zone)
            # Key assertion: output length must match the 15-min forecast_dates
            self.assertEqual(len(df_weather_scrap), len(self.fcst.forecast_dates))
            # Verify no NaN values after interpolation
            self.assertFalse(df_weather_scrap["yhat"].isna().any())

            # Verify interpolation correctness at a midpoint between two 30-min source timestamps
            # Pick a midpoint index to avoid edge effects
            midpoint_idx = len(df_weather_scrap.index) // 2
            ts_mid = df_weather_scrap.index[midpoint_idx]
            ts_prev = ts_mid - pd.Timedelta(minutes=15)
            ts_next = ts_mid + pd.Timedelta(minutes=15)

            # Ensure the neighboring timestamps exist in the index
            self.assertIn(ts_prev, df_weather_scrap.index)
            self.assertIn(ts_next, df_weather_scrap.index)

            y_prev = df_weather_scrap.loc[ts_prev, "yhat"]
            y_mid = df_weather_scrap.loc[ts_mid, "yhat"]
            y_next = df_weather_scrap.loc[ts_next, "yhat"]

            # Expected linear interpolation at the midpoint
            expected_mid = (y_prev + y_next) / 2.0

            # Check that the interpolated midpoint matches the expected linear value
            self.assertAlmostEqual(y_mid, expected_mid, places=6)

        # Restore original freq/forecast_dates
```
</issue_to_address>

@codecov Bot commented Mar 5, 2026

Codecov Report

❌ Patch coverage is 88.88889% with 2 lines in your changes missing coverage. Please review.
✅ Project coverage is 81.03%. Comparing base (eb44041) to head (06a644d).
⚠️ Report is 3 commits behind head on master.

Files with missing lines: src/emhass/forecast.py (patch 88.88%, 2 lines missing ⚠️)
Additional details and impacted files
```
@@           Coverage Diff           @@
##           master     #739   +/-   ##
=======================================
  Coverage   81.03%   81.03%           
=======================================
  Files          10       10           
  Lines        5611     5617    +6     
=======================================
+ Hits         4547     4552    +5     
- Misses       1064     1065    +1     
```


Bug 1: The HTTP header 'content-type' was used instead of 'Accept' when
requesting JSON from the Solcast API. This caused Solcast to return HTML
instead of JSON, breaking the forecast retrieval.

Bug 2: When optimization_time_step < 30 min (e.g. 15 min), the Solcast
function failed because it compared the number of 30-min data points
against the number of sub-30-min forecast slots, always finding 'not
enough data'. The fix builds a timestamped DataFrame from Solcast's
period_end timestamps and resamples via reindex + time interpolation
(matching the pattern used in _get_weather_solar_forecast).

Also removes the now-unused zip_longest import.

Adds test: test_get_weather_forecast_solcast_15min_resampling_mock
that verifies correct output length and no NaN values when
optimization_time_step=15min with 30-min Solcast source data.
@kallegrens force-pushed the fix/solcast-header-and-resampling branch from 413627a to ab9ca9d on March 7, 2026 at 23:58
@davidusb-geek (Owner) commented:

This is a Claude-generated PR, the code was however tested live on my emhass instance and seemed to work. Feel free to close if this is against policy.

Hi, thanks for your contribution, and no problem accepting PRs created using AI, as long as they are well constructed and the code is tested. The best way is to propose unit tests that directly test the functionality of your new code.

@kallegrens (Contributor, Author) replied:

Hi thanks for your contribution and no problem accepting PR created using AI. As long as they are well constructed and the code is tested. The best way is to propose unit tests to directly test the functionalities of your new code

Okay, nice. There is a unit test in the PR too; earlier I just meant that I'm also running this at home now and it is working well. I pushed a container to GHCR on my fork in the meantime, until this is merged.

Anything specific you require before merging this?

Co-authored-by: sourcery-ai[bot] <58596630+sourcery-ai[bot]@users.noreply.github.com>
@davidusb-geek davidusb-geek merged commit d2f8c50 into davidusb-geek:master Mar 10, 2026
18 of 19 checks passed