Skip to content

Conversation

@p-phung
Copy link
Contributor

@p-phung p-phung commented Oct 7, 2025

  • new extractor for IPC acute food insecurity
  • define config for IPC data request
  • extractor also take cares of getting the right metadata for the IPC food insecurity
  • update readme accordingly

Todo:

  • proper way to load secrets
  • add IPC api key loading to the extractor

Copy link
Collaborator

@TjebbeH TjebbeH left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work!

I think main suggestion is to fail fast when things are unexpected rather than passing on None values or empty dataframes which then later have to be checked if they are None/empty. And quite some of the complexity of the code can be reduced by validating a bit stronger/earlier.

Let me know if you have any questions about my comments :)

load_dotenv()


class IpcPropertiesSchema(BaseModel):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Many fields are optional and are allowed to be None, is this what you want? Is data were all these fields are None valid? Downside of allowing None's is that then later in the code you have to keep checking if things are None or not. This often leads to difficult to read code with many if statements. So it's advised to keep the locations where you allow None to a minimum and directly raise an error when things are not as the are expected to be.

I would suggets to think about if you really need these fields. If not remove them. If you do need them you probably need them to be something else then None?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am aware of it. The data extracted from the API is pretty messy. There are nested geometry features in those features with from_date and to_date None. Only after exploding the nested features, I can extract the right data from there.

I wonder what else we could do to refine it from the tap like here?

if geom_dict.get("type") == "GeometryCollection" and geom_dict.get("coordinates") is None:
geom_dict.pop("coordinates", None)

# skip invalid geometry
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I dont know exactly what a geometry should look like, but can invalid geomeries not be filtered out by means of the pydantic schema's above?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the nested features issue seems to be quite random. And there are too many features to go through (Somalia has about 3500 features after filtered). Thats why I couldn't structure it in the pydantic schema and opt for "get first, filter later"

geom_dict = copy.deepcopy(feature["geometry"])

# fix invalid GeometryCollection: remove "coordinates": None
if geom_dict.get("type") == "GeometryCollection" and geom_dict.get("coordinates") is None:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a sympthom of allowing None's

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is one of the issues caused by the messy nested features problem above. Happy to brainstorm about this to find another solution

@TjebbeH
Copy link
Collaborator

TjebbeH commented Oct 11, 2025

The integration test fails on retrieving env var, when you add @pytest.mark.needs_secrets, it will retrieve the env vars locally and skip the test in the CI. What was your idea to mock the api in the integration test?

Copy link
Contributor Author

@p-phung p-phung left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

many of the issue causes by lack of type check for IpcGeometrySchema. I have added this type check Literal as you suggests them and it seem those issues are gone.

still one of the major issue is the input data from IPC api is quite messy with lots of random nested features that makes it difficult to do pydantic check. If you have some advice, happy to learn about it

load_dotenv()


class IpcPropertiesSchema(BaseModel):
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am aware of it. The data extracted from the API is pretty messy. There are nested geometry features in those features with from_date and to_date None. Only after exploding the nested features, I can extract the right data from there.

I wonder what else we could do to refine it from the tap like here?


id: str = Field(description="Identifier of data.", examples=["12511433"])
title: str = Field(description="Title of data.", examples=["Somalia - Acute Food Insecurity July 2017"])
link: str | None = Field(
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is not actually needed. Should I make it optional instead?

description="Type of IPC Analysis (condition).",
examples=["A", "C"],
)
created: str = Field(description="Creation date of IPC data.", examples=["2018-06-04"])
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how handy!

geom_dict = copy.deepcopy(feature["geometry"])

# fix invalid GeometryCollection: remove "coordinates": None
if geom_dict.get("type") == "GeometryCollection" and geom_dict.get("coordinates") is None:
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is one of the issues caused by the messy nested features problem above. Happy to brainstorm about this to find another solution

if geom_dict.get("type") == "GeometryCollection" and geom_dict.get("coordinates") is None:
geom_dict.pop("coordinates", None)

# skip invalid geometry
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the nested features issue seems to be quite random. And there are too many features to go through (Somalia has about 3500 features after filtered). Thats why I couldn't structure it in the pydantic schema and opt for "get first, filter later"

@p-phung p-phung marked this pull request as ready for review October 23, 2025 13:51
except Exception as e:
logger.error(f"Validation failed for {iso2}: {e}")

return gdf_iso2
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

gdf_iso2 does not exist if data is none or if an exception is caught in the try explode geometries.

@p-phung p-phung merged commit d8f6c86 into somalia/dev Oct 24, 2025
1 check passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants