Skip to content

Conversation

@TjebbeH
Copy link
Collaborator

@TjebbeH TjebbeH commented May 9, 2025

  • update to json api extractor (in particular make data_field optional and fix typehints)
  • added extractor for world pop age structure data
  • for this added functions to write to blob storage
  • integration test of world pop extractor

@TjebbeH TjebbeH marked this pull request as ready for review May 9, 2025 12:12
Copy link
Collaborator

@castledan castledan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The code looks super cool!

@TjebbeH I left some comments here and there. Also, as we discussed, it would be good to address the following:

  • The list of countries in the config file does not necessarily coincide with the list of countries for which we should download raster data. In fact, that depends on the intersection between the country polygons, respectively according to our definition and WP definition.
  • Related to the item above, Extract should also take care of downloading the WP country borders, in order to compute intersections.

@p-phung I don't think you need to check everything, but maybe pay attention to the following items:

  • I didn't check much the methods (and related tests) in blob_storage, as I'm not familiar with those;
  • I find the structure of the pipeline very clear, but it does have some differences with the flood pipeline. I'll leave to you to judge whether this is a concern or not.

@TjebbeH
Copy link
Collaborator Author

TjebbeH commented Jun 1, 2025

The code looks super cool!

@TjebbeH I left some comments here and there. Also, as we discussed, it would be good to address the following:

* The list of countries in the config file does not necessarily coincide with the list of countries for which we should download raster data. In fact, that depends on the intersection between the country polygons, respectively according to our definition and WP definition.

* Related to the item above, Extract should also take care of downloading the [WP country borders](https://hub.worldpop.org/geodata/summary?id=29691), in order to compute intersections.

@castledan I'll do both in next PRs ok? As that would be easier when I have the ibf boundaries extraction inplemented.
I created an issue as reminder: #7

Copy link
Contributor

@p-phung p-phung left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks nice and clear for me. Well done @TjebbeH.

I run a test locally. No error at all. Only a few warnings like one below. Is it expected?

WARNING  | retrievalpipeline.extract.api_json_extractor:validate_and_parse:119 - data retrieved from api does not have the expected schema: 1 validation error for AgeStructureSchema
files
  Value error, group '_m_1_' not found in files [type=value_error, input_value=['https://data.worldpop.o...O/LSO_SAP_1km_2020.zip'], input_type=list]

@TjebbeH
Copy link
Collaborator Author

TjebbeH commented Jun 16, 2025

Looks nice and clear for me. Well done @TjebbeH.

I run a test locally. No error at all. Only a few warnings like one below. Is it expected?

WARNING  | retrievalpipeline.extract.api_json_extractor:validate_and_parse:119 - data retrieved from api does not have the expected schema: 1 validation error for AgeStructureSchema
files
  Value error, group '_m_1_' not found in files [type=value_error, input_value=['https://data.worldpop.o...O/LSO_SAP_1km_2020.zip'], input_type=list]

Yes it's expected. Here i think it's because i didnt want to put 36 files in the test as this was not needed for the thing i wanted to test. Sometimes the printed warnings of tests also come from test that test if a 'bad' situation is handled correctly (search for tests with with pytest.raises(...):)

@p-phung
Copy link
Contributor

p-phung commented Jun 23, 2025

I did run some tests locally. Really like this warning part.
image

I am curious if you have some suggestion how we can streamline the notification because this pipeline will be run in a container instance remotely. FYI this is not a priority atm, no need to implement anything, just would like to collect some idea.

@TjebbeH
Copy link
Collaborator Author

TjebbeH commented Jun 23, 2025

I did run some tests locally. Really like this warning part. image

I am curious if you have some suggestion how we can streamline the notification because this pipeline will be run in a container instance remotely. FYI this is not a priority atm, no need to implement anything, just would like to collect some idea.

Maybe the same way you did before? collect the warnings and errors and send an alert?

@TjebbeH TjebbeH merged commit ba32e2c into dev Jul 1, 2025
1 check passed
@TjebbeH TjebbeH deleted the feature/world_pop_age_structure branch July 1, 2025 09:04
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants