
Feature/density tracks #1003

Open: wants to merge 24 commits into develop
Conversation

@NicolasColombi (Collaborator) commented Feb 6, 2025

Changes proposed in this PR:

This PR adds a function to compute track density. Tracks are treated as lines but are in reality a succession of points. The function ensures equal time steps for all tracks by calling equal_timestep(), then proceeds to count the number of points per grid cell at the desired resolution. compute_track_density() computes the data for plot_track_density().
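As a rough illustration of that flow, here is a minimal sketch under stated assumptions (point_density, its defaults, and the binning are hypothetical; equal_timestep() and the point counting follow the description above), not the PR's actual implementation:

import numpy as np

def point_density(tc_tracks, res=1.0, time_step_h=1.0):
    """Count track points per grid cell (assumes a climada TCTracks-like
    object with an equal_timestep() resampling method)."""
    tc_tracks.equal_timestep(time_step_h=time_step_h)  # equal time steps first
    lat_bins = np.linspace(-90, 90, int(round(180 / res)) + 1)
    lon_bins = np.linspace(-180, 180, int(round(360 / res)) + 1)
    hist = np.zeros((len(lat_bins) - 1, len(lon_bins) - 1))
    for track in tc_tracks.data:
        h, _, _ = np.histogram2d(
            track.lat.values, track.lon.values, bins=[lat_bins, lon_bins]
        )
        hist += h  # every point counts, so slow-moving tracks weigh more
    return hist, lat_bins, lon_bins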

Functions added:

  • compute_track_density()
  • test_compute_track_density()
  • compute_grid_area()
  • plot_track_density()

[image: tropical_cyclone_track_density]

This PR fixes #

PR Author Checklist

PR Reviewer Checklist

@NicolasColombi changed the base branch from main to develop, February 6, 2025 10:29
Comment on lines 2875 to 2876
"""Compute absolute and normalized tropical cyclone track density as the number of points per
grid cell.
Member:

Interesting addition. I think though that even with the explanation in the pull request, I do not understand what this method does (and what is the use of it). Can you expand on the docstring to make it clearer?

Collaborator Author:

I was doing exactly that at this very moment ;)

Collaborator Author:

I updated the docstring, which should be clearer now. The role of this function is to create the data for another plotting function (which will follow soon) and should give something like this (first approximation):

[image: track density plot]

A plot like this allows you to compare TC activities between different climate scenarios, or different time periods, or models. It's a visualization for the hazard.

Member:

What is the exact goal? Because the method now counts the number of track points per pixel. Thus, if a track just stays a long time in the same place, it counts as high density. But it is still just a single track in this pixel. Conversely, if a track is moving fast, it gets low density. I am not sure what this would indicate, to be honest.

Collaborator Author:

That is true, but the goal is to display TC activity, and if a track stays longer in a grid cell, this rightfully represents higher TC activity in my opinion. The question then becomes: how do you define TC activity? Is it how many different tracks cross a grid cell, or how many tracks cross a grid cell and how long they stay in it (the current version)?

If you consider two different cyclones with different translational speeds but the same angular wind speed, the slower one will have more impact because it stays longer with the same intensity as the other. So in an impact context, track density (a measure of TC activity) should reflect this.

But I see your point, and it's worth discussing with Kerry and Simona next week; frankly, both methods would be valid, they just represent something slightly different.

For clarification, the image I sent represents the track density (defined as in this message) for 300 tracks in 2025 with CMIP5 data.

Member:

I agree with the analysis. But then I would ask: why not look at wind intensity density? Because if the argument is that a track staying longer in a position creates more impact, then you would also have to consider that higher winds create more impact. Hence, I would say:
  1. If you are interested in impacts, you should consider the windfield distributions.
  2. If you are interested in track density, then you should consider the density of distinct tracks.
But maybe I am missing the point, and the track point density approach is the way to go?

Collaborator Author:

Fair, I agree. The point approach was just the practical way I thought of to program it. I need to rethink it a bit then 🤓. I'll change it so that at most one point per track is counted in each grid cell.
And at a later stage, we can think about how to compute wind field densities.

Member:

Maybe it can be an argument in the method (if it is easy to implement)?

Collaborator Author:

Yes, it's easy.

Comment on lines 2904 to 2905
lat_bins = np.arange(-90, 91, res) # 91 and not 90 for the bins (90 included)
lon_bins = np.arange(-180, 181, res)
Member:

That would lead to values larger than 90 if, say, res=0.3. I think this is not what you want. Also, note that it is probably better to use np.linspace than np.arange for non-integer steps (see https://numpy.org/doc/stable/reference/generated/numpy.arange.html).
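A small standalone check of this point (mine, not code from the PR):

import numpy as np

res = 0.3
lat_edges = np.arange(-90, 91, res)
print(lat_edges[-1])  # ~90.9: overshoots the pole, plus accumulated float error

# np.linspace with an explicit number of edges keeps the endpoints exact:
lat_bins = np.linspace(-90, 90, int(round(180 / res)) + 1)
lon_bins = np.linspace(-180, 180, int(round(360 / res)) + 1)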

Collaborator Author:

True, thanks for catching that!

Comment on lines 2912 to 2914
hist = hist / hist.sum() if mode == "normalized" else hist

return hist, lat_bins, lon_bins
Member:

Would it not be better to work with sparse matrices and rasters? Because in general, the output will be very very sparse.

Collaborator Author:

Good point!

@@ -2890,6 +2896,10 @@ def compute_track_density(
density: bool (optional) default: False
If False it returns the number of samples in each bin. If True, returns the
probability density function at each bin computed as count_bin / tot_count.
filter_tracks: bool (optional) default: True
Collaborator:

perhaps calling this argument something like count_tracks would be more explicit?

@NicolasColombi (Collaborator Author) Feb 7, 2025:

I agree, filter_tracks is a bit cryptic...

"""

# ensure equal time step
# tc_track.equal_timestep(time_step_h=time_step)
Collaborator:

I think there should be a reasonable check on the track's temporal resolution versus res to make sure all tracks will be counted (see my comment in the other thread).

@NicolasColombi (Collaborator Author) Feb 7, 2025:

True, there should be a limit ratio between temporal resolution and grid cell resolution. Considering that the fastest recorded translational speed of a TC is around 110 km/h, a temporal resolution of 1 h and a grid resolution of 1° (~100 km) should be the limit. In this case, I would make time_step a function of res and not the opposite.

If this slows down the code too much (probably not, given the low average translational speed of ~20 km/h), we could adjust it.
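A hedged sketch of that idea (the constant, function name, and default speed are illustrative, not from the PR):

KM_PER_DEG = 111.0  # approximate length of one degree at the equator

def max_time_step_h(res_deg, max_speed_kmh=110.0):
    """Largest time step (in hours) such that even the fastest recorded TC
    cannot cross a full grid cell between two consecutive track points."""
    return res_deg * KM_PER_DEG / max_speed_kmh

With res_deg=1 this gives roughly one hour, matching the limit suggested above.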

track.lat.values, track.lon.values, bins=[lat_bins, lon_bins], density=False
)
hist_new = csr_matrix(hist_new)
hist_new[hist_new > 1] = 1 if filter_tracks else hist_new
Collaborator:

Why not something like this (it seems easier to read)?

if filter_tracks:
    np.clip(hist_new, a_min=None, a_max=1, out=hist_new)

Collaborator:

You might need to do this before converting to a csr_matrix

Collaborator Author:

Yes, I like it. But then I guess it doesn't really make sense to use sparse matrices anymore, since the core function np.histogram2d doesn't exist for sparse arrays, and the later operations are better off in numpy.

Collaborator Author:

Mmh, actually np.clip creates some issues later on in the tests; I'll keep it as it is.

@chahank (Member) Feb 7, 2025:

If you keep it dense, then you should just make it clear in the docstring that this code does not work well at high resolution. It would be good to do some simple testing (res=1, 0.1, 0.01, 0.001) and see how much memory / time it takes.

@bguillod (Collaborator) Feb 7, 2025:

Then maybe:

if filter_tracks:
    hist_new = np.minimum(hist_new, 1)

Collaborator:

Or use np.where(x > 1, 1, x).

@NicolasColombi (Collaborator Author) Feb 7, 2025:

> If you keep it dense, then you should just make it clear in the docstring that this code does not work well at high resolution. It would be good to do some simple testing (res=1, 0.1, 0.01, 0.001) and see how much memory / time it takes.

At the moment, using sparse (and numpy), 300 tracks at global scale take:

1°: 141 ms
0.1°: 9.1 s
0.01°: 17 minutes
0.001°: ...

If, and only if, it makes sense to use this method at really high resolutions (e.g., finer than 0.1°) will I try to optimize it. But it might be worth having a faster method for larger track datasets.
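On the memory side of the same trade-off, a quick back-of-envelope (my numbers, not measurements from the PR) shows why a dense global grid breaks down at very fine resolutions:

# Size of a dense float64 histogram of the global grid at each resolution.
for res in (1, 0.1, 0.01, 0.001):
    cells = round(180 / res) * round(360 / res)
    print(f"res={res}: {cells:.1e} cells = {cells * 8 / 1e9:.1f} GB dense")
# res=1: 6.5e+04 cells = 0.0 GB
# res=0.1: 6.5e+06 cells = 0.1 GB
# res=0.01: 6.5e+08 cells = 5.2 GB
# res=0.001: 6.5e+10 cells = 518.4 GB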

Member:

I do not know whether higher resolution is needed, but it would be good to mention in the docstring what kind of resolution is reasonable.

Collaborator:

In my experience, anything below 0.1 degrees is far too noisy; typically one uses something like 1, 5, or even 10 degrees.

@bguillod (Collaborator) left a comment:

I like the overall idea a lot! We actually had implemented something quite similar to validate the new (still to be incorporated) TC stochastic tracks model. However, we had done it a bit differently:

  • First, there was a parameter for the "minimum intensity" threshold to consider (something like min_max_sustained_wind), such that you can plot the density of tracks exceeding a certain wind speed (or Saffir-Simpson category). This enabled us to find out where there were too many or too few tracks in general (when taking, for example, the difference between probabilistic and historical tracks) versus how more intense events were over-/under-estimated. I'd suggest adding this feature here as well.
  • Ideally there is a valid unit to the "track density". To do so, we counted not the number of points (which is kind of the total duration of tracks being there) but instead the number of tracks (after identifying the points within each grid cell, we counted the unique cyc_id values of these points), which leads to units of "number of tracks per year" if you then scale by the number of years in your dataset. You might want to consider doing the same, or at least enabling the user to choose between both approaches (see the sketch after this list).
  • We had implemented that using lines, but it took ages to calculate (well, it was on a 10'000-year event set...), so using points might be good - and I am already relieved to see that the validation of the new synthetic tracks model will be made easier thanks to this method 😄. In fact, there should then be a reasonable default for the temporal resolution of the tracks as a function of the raster cell size (i.e., if the temporal resolution is too low, there might be no point in a grid cell because the track jumped from the east to the west of the cell but is not counted).
  • Finally, just to chime in on the "track versus windfield" discussion in earlier comments: I think both are very valuable and complementary. One probably wants to be able to validate the tracks first, and validate wind fields in a second step. This is the approach we had taken back then, and it enabled tuning the tracks algorithm without doing the more expensive windfield computation for each test.
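A hedged sketch of the track-count variant from the second bullet (the function name and interface are illustrative; per the discussion above, this is also what the PR's filter_tracks flag achieves):

import numpy as np

def track_count_density(tracks, lat_bins, lon_bins):
    """Each track contributes at most 1 to every grid cell it touches."""
    hist = np.zeros((len(lat_bins) - 1, len(lon_bins) - 1))
    for track in tracks:
        h, _, _ = np.histogram2d(
            track.lat.values, track.lon.values, bins=[lat_bins, lon_bins]
        )
        hist += np.minimum(h, 1)  # binary occupancy per track
    return hist  # divide by the number of years to get "tracks per year"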

@NicolasColombi mentioned this pull request Feb 7, 2025
@NicolasColombi (Collaborator Author):

> I like the overall idea a lot! We actually had implemented something quite similar to validate the new (still to be incorporated) TC stochastic tracks model. However, we had done it a bit differently:

Good to hear! Is your version somewhere on a climada branch?

@bguillod (Collaborator) commented Feb 8, 2025:

> Good to hear! Is your version somewhere on a climada branch?

No, it was in our code base at CelsiusPro, so I don't have it anymore... @ChrisFairless probably has access to it, but I think your approach is easier, so I'm not sure you'd gain much (?). There was a class called TcTracksValidation or so, which one could use to also plot the fields and their differences, if I recall correctly.

The main thing missing so far in your code is being able to filter points by minimum wind speed.

@NicolasColombi (Collaborator Author):

> The main thing missing so far in your code is being able to filter points by minimum wind speed.

I see. I have just added the feature 👍 Now we can select points above, below, and in between wind speeds.
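A sketch of what such a filter might look like (the function name and keyword arguments are mine, not the PR's; max_sustained_wind is the climada TCTracks per-point intensity variable):

import numpy as np

def filter_points_by_wind(track, wind_min=None, wind_max=None):
    """Return the track points whose intensity lies in [wind_min, wind_max)."""
    wind = track.max_sustained_wind.values
    mask = np.ones(wind.shape, dtype=bool)
    if wind_min is not None:
        mask &= wind >= wind_min  # "above"
    if wind_max is not None:
        mask &= wind < wind_max  # "below"; both bounds give "in between"
    return track.lat.values[mask], track.lon.values[mask]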

@NicolasColombi NicolasColombi marked this pull request as ready for review February 12, 2025 08:38
@NicolasColombi (Collaborator Author):

@chahank @bguillod I added two functions to compute the grid cell area: one accepts different projections, and one uses a spherical approximation, which is a more refined version of an existing climada function (that does not account for the curvature of a tile versus a rectangular tile). The two functions differ by 0.27% on a spherical projection (which mutually validates them, or not). I also added the plotting function. Let me know your thoughts on the current state of the PR.
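For reference, a minimal sketch of the spherical approximation (the standard formula for the area of a cell bounded by two parallels and two meridians; not the PR's code, and the function name is illustrative):

import numpy as np

EARTH_RADIUS_KM = 6371.0

def spherical_cell_area_km2(lat0, lat1, lon0, lon1):
    """Area in km^2 of the cell between parallels lat0/lat1 and meridians
    lon0/lon1 (degrees); accounts for the convergence of meridians."""
    dlon = np.radians(lon1 - lon0)
    return EARTH_RADIUS_KM**2 * dlon * (
        np.sin(np.radians(lat1)) - np.sin(np.radians(lat0))
    )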

@ChrisFairless (Collaborator):

Hey! Late to the party here! I really like this, thanks!

A few responses to the ongoing discussions:

Lines vs points

This is complicated. When a track passes through the edge of a cell, a line-based approach would count this as an intersection (depending on the algorithm – see below), whereas with a point-based approach it would depend on your chosen time step, the speed of the system, the latitude of the cell, and the length of the intersection. I think this is fine, actually, but it needs to be documented. I would agree that the time step should be set algorithmically, with the option to pass it as a manual parameter.

Aside: for a really reliable approach, future work could even copy the TCTracks object and change the timestamps so that the systems all travel at a constant speed. Then the choice of time step can be made so that any track intersection longer than e.g. 1/4 of a cell width is guaranteed to have a point in it (at least at the equator – I don't know how to standardise this part without reprojection shenanigans).

Line-based intersection methods are in theory a better solution, but the implementation isn't simple. You also have to make some quite subjective decisions about what counts as an intersection (e.g. counting all intersections, or using something like https://en.wikipedia.org/wiki/Bresenham%27s_line_algorithm). We also found a very specific bug in the version of GDAL that CLIMADA uses, which gave incorrect results when rasterizing lines; IIRC it did some double-counting of intersections. I thiiiink it's been fixed in more recent versions, but I couldn't find the Issue with a quick Google.

But in short: I like the approach here as a pragmatic solution.

Wind frequency vs track frequency

Both of these are nice. From an insurance industry perspective (and from most academic studies), track frequency is far more common, and maybe even expected. It's also cheaper to calculate and doesn't depend on your choice of wind algorithm.

Other validation scripts

@bguillod is right – we had an internal suite of tools at CelsiusPro for this kind of analysis. I don't think it's something that could be shared easily, since it got very bloated over the years and was also quite tied to the databases that we used.
