Description
Issue: Sometimes (or even frequently), the same correlation ID is generated for different events.
Case 1: If events of the same hazard occur on the same date at different locations within the same country, the generated correlation ID will be the same, because the string only carries ISO3-level location info. Note that in the current implementation, time information is not included (it is not available in all sources).
Case 2: For earthquakes in the sea/ocean, the country (ISO3) is set to UNK (unknown). So all such events occurring on the same date will have the same correlation ID.
Solution:
- Partition the Earth into blocks of 0.2 degrees (lat and long) and assign a number to each block. Based on the point geometry or bbox centroid, we get the block ID and append it to the correlation ID. This should make the correlation ID unique in most cases, but not 100% of the time: 0.2 degrees in lat and long covers an area of roughly 20 km by 20 km (about 22 km by 22 km at the equator, narrower in longitude at higher latitudes). If different events (same hazard) occur on the same date within the same block, they would still get the same block ID and therefore the same correlation ID.
I will save the block data in Parquet format, which should be around 7 MB with the above config of 0.2 degrees (lat, long).
- Include the time in the correlation ID whenever it is available. However, time information is not available in all sources (e.g. EMDAT doesn't record at what time of day an event occurred).
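A minimal sketch of the two ideas above, to make the discussion concrete. It is not the existing implementation: the function names, the `-` separator, the `HHMM` time suffix, and the row-major block numbering are all assumptions for illustration. It computes the block ID arithmetically from the coordinates (900 rows x 1800 columns = 1,620,000 blocks at 0.2 degrees).

```python
import math
from datetime import datetime
from typing import Optional

# Assumption: 0.2-degree blocks, i.e. 5 blocks per degree.
CELLS_PER_DEGREE = 5
N_ROWS = 180 * CELLS_PER_DEGREE  # 900 latitude rows
N_COLS = 360 * CELLS_PER_DEGREE  # 1800 longitude columns


def block_id(lat: float, lon: float) -> int:
    """Return a row-major block number for a point (hypothetical numbering)."""
    if not (-90.0 <= lat <= 90.0 and -180.0 <= lon <= 180.0):
        raise ValueError(f"coordinates out of range: lat={lat}, lon={lon}")
    # Clamp so that lat=90 and lon=180 fall into the last row/column.
    row = min(math.floor((lat + 90.0) * CELLS_PER_DEGREE), N_ROWS - 1)
    col = min(math.floor((lon + 180.0) * CELLS_PER_DEGREE), N_COLS - 1)
    return row * N_COLS + col


def extend_correlation_id(
    base_id: str, lat: float, lon: float, event_time: Optional[datetime] = None
) -> str:
    """Append the block ID, and the time when known, to an existing ID.

    Hypothetical helper: the real correlation ID layout may differ.
    The caller is expected to pass event_time already in UTC.
    """
    parts = [base_id, str(block_id(lat, lon))]
    if event_time is not None:
        parts.append(event_time.strftime("%H%M"))
    return "-".join(parts)
```

Two same-day events in the same country now get different IDs as long as their points (or bbox centroids) fall in different blocks. Events in the same block still collide unless a time is available, which matches the limitation described above.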
@emmanuelmathot Any thoughts or quick improvements on this?
I will start with Solution 1 and send the PR.
cc @subinasr