fix: Solcast Accept header and sub-30min resampling#739
Conversation
Reviewer's GuideFixes Solcast forecast integration by sending the correct HTTP Accept header and by converting Solcast’s 30‑minute PV forecast series into a timestamped DataFrame that is resampled/interpolated to match arbitrary optimization_time_step (e.g. 15 minutes), plus adds a regression test and minor cleanup. Sequence diagram for Solcast forecast retrieval and resamplingsequenceDiagram
participant Forecast as ForecastService
participant SolcastAPI
participant Pandas as PandasDataFrame
Forecast->>Forecast: _get_weather_solcast(w_forecast_cache_path)
Forecast->>Forecast: Build headers with Accept=header_accept
Forecast->>Forecast: Compute days_solcast and roof_ids
Forecast->>Forecast: Initialize total_data as empty DataFrame
loop For each roof_id in roof_ids
Forecast->>SolcastAPI: GET /rooftop_sites/{roof_id}/forecasts
SolcastAPI-->>Forecast: HTTP response (status, body)
alt HTTP 429 or 403 or other error
Forecast->>Forecast: Log Solcast error
Forecast-->>Forecast: Return False
else HTTP 200
Forecast->>Forecast: Parse JSON body into data
alt data[forecasts] is empty
Forecast->>Forecast: Log no data retrieved
Forecast-->>Forecast: Return False
else forecasts present
Forecast->>Pandas: Build solcast_timestamps from period_end
Forecast->>Pandas: Build data_tmp DataFrame(yhat=pv_estimate*1000, index=solcast_timestamps)
Forecast->>Pandas: Localize to UTC if tz naive
Forecast->>Pandas: Convert index tz to forecast_dates.tz
Forecast->>Pandas: combined_index = union(index, forecast_dates)
Forecast->>Pandas: Reindex to combined_index
Forecast->>Pandas: Interpolate(method=time)
Forecast->>Pandas: Reindex to forecast_dates
Forecast->>Pandas: Fill NaN with 0.0
alt total_data is empty
Forecast->>Forecast: total_data = data_tmp copy
else total_data already has data
Forecast->>Forecast: total_data = total_data + data_tmp
end
end
end
end
Forecast->>Forecast: data = total_data
alt weather_forecast_cache enabled
Forecast->>Forecast: set_cached_forecast_data(w_forecast_cache_path, data)
Forecast-->>Forecast: cached data
end
Forecast-->>Forecast: Return DataFrame data
File-Level Changes
Tips and commandsInteracting with Sourcery
Customizing Your ExperienceAccess your dashboard to:
Getting Help
|
There was a problem hiding this comment.
Hey - I've found 1 issue, and left some high level feedback:
- In
_get_weather_solcast, instead of checkingif len(total_data) == 0to detect the initial iteration, consider initializingtotal_datatoNoneand using an explicitif total_data is None:guard to avoid relying on DataFrame truthiness/length semantics. - The reindexing flow (
unionofdata_tmp.indexandself.forecast_dates, thenreindextwice) could be simplified by interpolating on the original Solcast index and then directly reindexing toself.forecast_dates, which would avoid creating the extra combined index and reduce overhead while keeping the same behavior.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- In `_get_weather_solcast`, instead of checking `if len(total_data) == 0` to detect the initial iteration, consider initializing `total_data` to `None` and using an explicit `if total_data is None:` guard to avoid relying on DataFrame truthiness/length semantics.
- The reindexing flow (`union` of `data_tmp.index` and `self.forecast_dates`, then `reindex` twice) could be simplified by interpolating on the original Solcast index and then directly reindexing to `self.forecast_dates`, which would avoid creating the extra combined index and reduce overhead while keeping the same behavior.
## Individual Comments
### Comment 1
<location path="tests/test_forecast.py" line_range="471-479" />
<code_context>
+
+ self.assertIsInstance(df_weather_scrap, type(pd.DataFrame()))
+ self.assertIsInstance(df_weather_scrap.index, pd.core.indexes.datetimes.DatetimeIndex)
+ self.assertEqual(df_weather_scrap.index.tz, self.fcst.time_zone)
+ # Key assertion: output length must match the 15-min forecast_dates
+ self.assertEqual(len(df_weather_scrap), len(self.fcst.forecast_dates))
+ # Verify no NaN values after interpolation
+ self.assertFalse(df_weather_scrap["yhat"].isna().any())
+
+ # Restore original freq/forecast_dates
</code_context>
<issue_to_address>
**suggestion (testing):** Consider asserting actual interpolation correctness at a few known timestamps, not just length and non-NaN values
Right now the test only validates shape, timezone, and absence of NaNs. Please also assert specific interpolated values at a few timestamps (e.g., take two known 30‑min points from the fixture, compute the expected linear midpoint at 15 minutes, and check that `yhat` at that time matches within a small tolerance). This will catch regressions where the interpolation method or configuration changes while still passing the current checks.
```suggestion
self.assertIsInstance(df_weather_scrap, type(pd.DataFrame()))
self.assertIsInstance(df_weather_scrap.index, pd.core.indexes.datetimes.DatetimeIndex)
self.assertEqual(df_weather_scrap.index.tz, self.fcst.time_zone)
# Key assertion: output length must match the 15-min forecast_dates
self.assertEqual(len(df_weather_scrap), len(self.fcst.forecast_dates))
# Verify no NaN values after interpolation
self.assertFalse(df_weather_scrap["yhat"].isna().any())
# Verify interpolation correctness at a midpoint between two 30-min source timestamps
# Pick a midpoint index to avoid edge effects
midpoint_idx = len(df_weather_scrap.index) // 2
ts_mid = df_weather_scrap.index[midpoint_idx]
ts_prev = ts_mid - pd.Timedelta(minutes=15)
ts_next = ts_mid + pd.Timedelta(minutes=15)
# Ensure the neighboring timestamps exist in the index
self.assertIn(ts_prev, df_weather_scrap.index)
self.assertIn(ts_next, df_weather_scrap.index)
y_prev = df_weather_scrap.loc[ts_prev, "yhat"]
y_mid = df_weather_scrap.loc[ts_mid, "yhat"]
y_next = df_weather_scrap.loc[ts_next, "yhat"]
# Expected linear interpolation at the midpoint
expected_mid = (y_prev + y_next) / 2.0
# Check that the interpolated midpoint matches the expected linear value
self.assertAlmostEqual(y_mid, expected_mid, places=6)
# Restore original freq/forecast_dates
```
</issue_to_address>Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #739 +/- ##
=======================================
Coverage 81.03% 81.03%
=======================================
Files 10 10
Lines 5611 5617 +6
=======================================
+ Hits 4547 4552 +5
- Misses 1064 1065 +1 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
Bug 1: The HTTP header 'content-type' was used instead of 'Accept' when requesting JSON from the Solcast API. This caused Solcast to return HTML instead of JSON, breaking the forecast retrieval. Bug 2: When optimization_time_step < 30 min (e.g. 15 min), the Solcast function failed because it compared the number of 30-min data points against the number of sub-30-min forecast slots, always finding 'not enough data'. The fix builds a timestamped DataFrame from Solcast's period_end timestamps and resamples via reindex + time interpolation (matching the pattern used in _get_weather_solar_forecast). Also removes the now-unused zip_longest import. Adds test: test_get_weather_forecast_solcast_15min_resampling_mock that verifies correct output length and no NaN values when optimization_time_step=15min with 30-min Solcast source data.
413627a to
ab9ca9d
Compare
Hi thanks for your contribution and no problem accepting PR created using AI. As long as they are well constructed and the code is tested. The best way is to propose unit tests to directly test the functionalities of your new code |
Okay, nice. There is a unit test in the PR too. Before I just meant that I'm also running this at home now and it is working well. I pushed a container to GHCR on my fork in the meantime until this is merged. Anything specific you require before merging this? |
Co-authored-by: sourcery-ai[bot] <58596630+sourcery-ai[bot]@users.noreply.github.com>
DISCLAIMER:
This is a Claude-generated PR, the code was however tested live on my emhass instance and seemed to work. Feel free to close if this is against policy.
Summary
Two bugs in
_get_weather_solcast()prevent Solcast forecasts from working correctly:Bug 1: Wrong HTTP header
The request uses
"content-type": "application/json"instead of"Accept": "application/json". Thecontent-typeheader describes the request body (which is empty for a GET), whileAccepttells the server what format the client wants. Without the correctAcceptheader, Solcast returns HTML instead of JSON, causing a parse error.Bug 2: No resampling for sub-30min timesteps
Solcast returns data at 30-minute intervals. When
optimization_time_step < 30(e.g. 15 min), the code comparedlen(data_list)againstlen(self.forecast_dates)and always found "not enough data" because 48 Solcast points < 96 forecast slots.The fix builds a timestamped DataFrame from Solcast's
period_endtimestamps and resamples usingreindex() + interpolate(method='time'), matching the pattern already used in_get_weather_solar_forecast().Also
zip_longestimportTesting
Unit tests (pytest, local Python 3.12):
test_get_weather_forecast_solcast_15min_resampling_mockProduction pod validation (v0.17.0 on Kubernetes):
Patched
forecast.pywaskubectl cp'd into the running pod and validated:Falsegracefully:Live Solcast API call also confirmed working (got 200 + valid JSON before hitting free-tier daily quota).
Summary by Sourcery
Fix Solcast weather forecast retrieval and resampling to support sub-30-minute optimization time steps and ensure valid JSON responses.
Bug Fixes:
Enhancements:
Tests: