Improvements on correlation-id #136

@ranjan-stha

Issue: Sometimes (or even frequently), the same correlation ID is generated for different events.
Case 1: If events of the same hazard occur on the same date at different locations, the generated correlation ID will be the same, because the string uses only ISO3-level location information. Note that the current implementation does not include time information, as not all sources provide it.
Case 2: For earthquakes in the sea/ocean, the country (ISO3) is set to UNK (unknown). So events occurring on the same date will all have the same correlation ID.

Solution:

  1. Partition the Earth into blocks of 0.2 degrees (lat and long) and assign a number to each block. Based on the point geometry or the bbox centroid, we get the block ID and attach it to the correlation ID. This should make the correlation ID unique in most cases, but not 100%: 0.2 degrees in lat and long covers an area of roughly 20 km by 20 km (about 22 km per side at the equator, narrower in longitude at higher latitudes). If different events (same hazard) occur on the same date within the same block, they would still get the same block ID.

I shall save the block data in Parquet format; it should be around 7 MB with the above configuration of 0.2 degrees (lat, long).
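A minimal sketch of how a block ID could be derived from a point or bbox centroid. It computes the ID arithmetically rather than looking it up in the Parquet file, and the function name `block_id`, the row-major numbering, and the south-west origin are all assumptions for illustration, not the planned implementation:

```python
import math

CELL_DEG = 0.2  # block size in degrees, as proposed above


def block_id(lat: float, lon: float, cell_deg: float = CELL_DEG) -> int:
    """Return a row-major block number for a WGS84 point (hypothetical scheme).

    Blocks are numbered from the south-west corner (-90, -180), moving east
    along each row and then north row by row.
    """
    if not (-90.0 <= lat <= 90.0 and -180.0 <= lon <= 180.0):
        raise ValueError(f"coordinates out of range: lat={lat}, lon={lon}")

    n_rows = round(180 / cell_deg)  # 900 rows for 0.2 degrees
    n_cols = round(360 / cell_deg)  # 1800 columns for 0.2 degrees

    # Clamp so that lat=90 and lon=180 fall into the last row/column
    # instead of one past the end.
    row = min(math.floor((lat + 90.0) / cell_deg), n_rows - 1)
    col = min(math.floor((lon + 180.0) / cell_deg), n_cols - 1)
    return row * n_cols + col


# Two points inside the same 0.2-degree block share an ID ...
same_a = block_id(0.05, 0.05)
same_b = block_id(0.15, 0.15)
# ... while a point two blocks further east gets a different one.
other = block_id(0.1, 0.5)
```

With 0.2 degrees this numbering yields 900 x 1800 = 1,620,000 blocks. As noted above, two events inside the same block on the same date would still collide.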

  2. Include the time in the correlation ID whenever it is available. However, time information is not available in all sources (e.g. EMDAT does not record what time an event occurred).
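The optional time component could be sketched as follows. The helper name `time_component` and the compact `YYYYMMDDTHHMMSSZ` layout are assumptions for illustration and not the existing correlation ID format; the sketch also assumes that any `datetime` passed in is already in UTC:

```python
from datetime import date, datetime
from typing import Union


def time_component(event_time: Union[date, datetime]) -> str:
    """Format the date part of an ID, adding the time only when it is known.

    Hypothetical helper: sources with a full timestamp pass a datetime
    (assumed to be UTC), date-only sources such as EMDAT pass a plain date.
    """
    # datetime is a subclass of date, so it must be checked first.
    if isinstance(event_time, datetime):
        return event_time.strftime("%Y%m%dT%H%M%SZ")
    return event_time.strftime("%Y%m%d")


with_time = time_component(datetime(2024, 1, 5, 13, 45, 0))
date_only = time_component(date(2024, 1, 5))
```

One thing to watch for with this approach: an ID that includes the time will not be identical to a date-only ID for the same event from another source, so cross-source matching may need to compare only the date prefix.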

@emmanuelmathot Any thoughts/quick improvements on this?

I will start with Solution 1 and send the PR.

cc @subinasr

Labels: enhancement (New feature or request)
