Description
When I tried to sync a large database I ran into a few errors, for example Request Entity Too Large, which I could not fix by increasing the max-points-on-write parameter; similar issues with large amounts of data have already been discussed here. But that is not the main point of this issue.
My data consists of ~50k points that all fall within about one minute, and I tried to sync the last month. To reduce the number of points per chunk, I would have to choose a chunk-interval of only a few seconds, which produces a huge number of empty chunks for the rest of the month. So I wondered: why is the data divided by time rather than by the actual number of points?
Granted, my example is a bit extreme, but whenever the data distribution is uneven or spiky, the time-based approach may not be the best fit. Instead, it might be better to let the user define a chunk size, for example 1000 points, and have syncflux query the first 1000 points, then the next 1000, and so on (see the sketch below), resulting in even and adjustable chunk sizes.
InfluxQL already supports this with the LIMIT and OFFSET clauses.
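For illustration, a rough sketch of what count-based chunking could look like using LIMIT/OFFSET paging (the measurement name and chunk size are just placeholders, not anything syncflux uses today):

```sql
-- Page through a measurement in fixed-size chunks of 1000 points.
-- "my_measurement" is a placeholder name.
SELECT * FROM "my_measurement" LIMIT 1000 OFFSET 0
SELECT * FROM "my_measurement" LIMIT 1000 OFFSET 1000
SELECT * FROM "my_measurement" LIMIT 1000 OFFSET 2000
-- ...and so on, until a query returns fewer than 1000 points.
-- Note: in InfluxQL, LIMIT and OFFSET apply per series.
```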
I cannot even think of a reason why chunking data by time would be better than simply chunking by point count as described. Am I missing something? What do you think?