Description
Loading the schema from a DB with a large number of measurements takes a long time. I've observed anywhere from 8 to 20 minutes before `GetSchema` completes.
I suspect the long load times are caused by this line:
syncflux/pkg/agent/hacluster.go, line 147 in 9d69de4
`mf[m.Name].Fields = GetFields(hac.Master.cli, db, m.Name, rp.Name)`
This makes a separate API call for each measurement to fetch its field keys.
I was thinking that it may be possible to use `show field keys on <sdb>`, so that the API responds with field keys for ALL measurements in the selected DB. I think this would work, but I haven't investigated whether there are any size limits on InfluxDB JSON responses or in the REST client used.
With 1000 measurements, the API took 12s to respond with a 1.72MB JSON payload. By comparison, a request for the fields of a single measurement took 500-800ms across a small sample of requests.
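To make the bulk idea concrete, here is a minimal sketch of how a single `show field keys on <sdb>` response could be regrouped per measurement on the client side. It assumes the standard InfluxDB 1.x `/query` JSON layout (one series per measurement, with `fieldKey` and `fieldType` columns) and parses the raw body with `encoding/json` rather than the client syncflux actually uses. `groupFieldKeys` and `queryResponse` are hypothetical names, not existing syncflux code, and the sketch ignores the retention policy that `GetFields` currently receives.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// queryResponse mirrors only the parts of an InfluxDB 1.x /query JSON body
// that this sketch needs. Hypothetical type, not part of syncflux.
type queryResponse struct {
	Results []struct {
		Series []struct {
			Name   string          `json:"name"`
			Values [][]interface{} `json:"values"`
		} `json:"series"`
		Error string `json:"error"`
	} `json:"results"`
}

// groupFieldKeys turns one bulk "show field keys" response into
// measurement -> field key -> field type.
func groupFieldKeys(body []byte) (map[string]map[string]string, error) {
	var resp queryResponse
	if err := json.Unmarshal(body, &resp); err != nil {
		return nil, err
	}
	out := make(map[string]map[string]string)
	for _, r := range resp.Results {
		if r.Error != "" {
			return nil, fmt.Errorf("influxdb: %s", r.Error)
		}
		for _, s := range r.Series {
			fields := out[s.Name]
			if fields == nil {
				fields = make(map[string]string)
				out[s.Name] = fields
			}
			for _, row := range s.Values {
				if len(row) < 2 {
					continue // skip malformed rows
				}
				key, okKey := row[0].(string)
				typ, okTyp := row[1].(string)
				if okKey && okTyp {
					fields[key] = typ
				}
			}
		}
	}
	return out, nil
}

func main() {
	// Hand-written sample body, not captured from a real server.
	sample := []byte(`{"results":[{"statement_id":0,"series":[
		{"name":"cpu","columns":["fieldKey","fieldType"],"values":[["usage_idle","float"]]},
		{"name":"disk","columns":["fieldKey","fieldType"],"values":[["free","integer"],["used","integer"]]}
	]}]}`)
	fields, err := groupFieldKeys(sample)
	if err != nil {
		panic(err)
	}
	fmt.Println(len(fields), fields["disk"])
}
```

With something like this, the per-measurement loop could look fields up in the returned map instead of issuing a request each time.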
An alternative could be to split the list of measurements and fetch field keys in batches, but this could also be very slow. For example, `show field keys from disk,diskio,interrupts,kernel` would take upward of 12s, sometimes even returning an empty response. Maybe InfluxDB does not index for this sort of query?
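For reference, the batching variant could be sketched roughly as below: split the measurement list into fixed-size chunks and build one query string per chunk. `fieldKeyQueries` and `quoteIdent` are hypothetical helpers, the batch size is an arbitrary parameter, and the identifier quoting (double quotes with backslash escapes) is my assumption about what InfluxQL expects. This only builds the query strings and says nothing about whether the server handles them quickly.

```go
package main

import (
	"fmt"
	"strings"
)

// quoteIdent wraps an identifier in double quotes, escaping backslashes and
// embedded quotes. Hypothetical helper; the escaping rules are an assumption.
func quoteIdent(s string) string {
	s = strings.ReplaceAll(s, `\`, `\\`)
	s = strings.ReplaceAll(s, `"`, `\"`)
	return `"` + s + `"`
}

// fieldKeyQueries builds one "show field keys" query per batch of at most
// size measurements.
func fieldKeyQueries(db string, measurements []string, size int) []string {
	if size < 1 {
		size = 1
	}
	var queries []string
	for start := 0; start < len(measurements); start += size {
		end := start + size
		if end > len(measurements) {
			end = len(measurements)
		}
		quoted := make([]string, 0, end-start)
		for _, m := range measurements[start:end] {
			quoted = append(quoted, quoteIdent(m))
		}
		queries = append(queries, fmt.Sprintf("show field keys on %s from %s",
			quoteIdent(db), strings.Join(quoted, ",")))
	}
	return queries
}

func main() {
	ms := []string{"disk", "diskio", "interrupts", "kernel", "cpu"}
	for _, q := range fieldKeyQueries("telegraf", ms, 4) {
		fmt.Println(q)
	}
}
```

Each resulting query returns one series per measurement, so the responses can be merged into the same per-measurement map as a bulk response.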
In my limited testing, I am running InfluxDB 1.7.7 with queries routed through influxdb-srelay. Queries made directly to the master were slightly faster: all fields were returned in 4s, and batches of 4 measurements varied between 4-12s per request.
It would be awesome if we could set a flag at the command line to force bulk loading of all field keys in a single request, or have some logic that automatically switches to bulk loading once a DB has more than a certain number of measurements. If batching requests is workable with additional configuration in InfluxDB, that would also be great.
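As a rough illustration of the flag plus automatic switch, something like the following could work. The flag names `-bulk-field-keys` and `-bulk-threshold` are invented for this sketch and do not exist in syncflux today, and the threshold of 500 in the example is arbitrary.

```go
package main

import (
	"flag"
	"fmt"
)

// options holds the two hypothetical settings proposed in this issue.
type options struct {
	forceBulk     bool // always fetch field keys in one request
	bulkThreshold int  // switch to bulk at this many measurements; 0 disables
}

// parseOptions parses the hypothetical flags from args using its own FlagSet,
// so it does not touch the global command line.
func parseOptions(args []string) (options, error) {
	var o options
	fs := flag.NewFlagSet("syncflux", flag.ContinueOnError)
	fs.BoolVar(&o.forceBulk, "bulk-field-keys", false,
		"load field keys for all measurements in a single request")
	fs.IntVar(&o.bulkThreshold, "bulk-threshold", 0,
		"switch to bulk loading at this many measurements (0 disables)")
	err := fs.Parse(args)
	return o, err
}

// useBulk decides whether a DB with numMeasurements should be bulk loaded.
func useBulk(o options, numMeasurements int) bool {
	if o.forceBulk {
		return true
	}
	return o.bulkThreshold > 0 && numMeasurements >= o.bulkThreshold
}

func main() {
	o, err := parseOptions([]string{"-bulk-threshold=500"})
	if err != nil {
		panic(err)
	}
	fmt.Println(useBulk(o, 1000), useBulk(o, 10))
}
```

Keeping the decision in one small function would leave the existing per-measurement path as the default, so current behaviour would not change unless one of the options is set.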
I'd be happy to submit a PR with my proposed solution, but would appreciate some feedback on the correct approach to take.