-
Notifications
You must be signed in to change notification settings - Fork 3
Description
CKAN version
CKAN 2.10.1
Describe the bug
On legacy large CKAN instances (www.data.qld.gov.au / www.publications.qld.gov.au), when a new dataset is created and resources are being added, or adding resources to a dataset that does not have many activity/tracking_summary details. Page load times are very slow for 'author/sysadmin' users.
Steps to reproduce
Have a very large database with lots of data in the tracking_summary "9,436,568 rows" and activity "2,131,004 rows" tables that is unrelated to the dataset that is about to be edited.
Then attempt to update various datasets including uploading a new document. As well as create a new dataset then edit it right after.
Page will take longer than 180 seconds to load. With our setup, the CDN will give up waiting for the system and return HTTP 502.
Expected behavior
Normal dataset/resource updating/adding without waiting more than the upload time for the file. Or under 3 seconds for url resource creation.
Additional details
When reviewing the performance metrics on the database, it was noticed that it took 27seconds to return this query:
SELECT tracking_summary.url AS tracking_summary_url, tracking_summary.package_id AS tracking_summary_package_id, tracking_summary.tracking_type AS tracking_summary_tracking_type, tracking_summary.count AS tracking_summary_count, tracking_summary.running_total AS tracking_summary_running_total, tracking_summary.recent_views AS tracking_summary_recent_views, tracking_summary.tracking_date AS tracking_summary_tracking_date
FROM tracking_summary
WHERE tracking_summary.package_id = ':package_id:' ORDER BY tracking_date desc
LIMIT 1
AND 1.27 calls / sec, 7.26 avg latency (ms)/call.
SELECT activity.id AS activity_id, activity.timestamp AS activity_timestamp, activity.user_id AS activity_user_id, activity.object_id AS activity_object_id, activity.revision_id AS activity_revision_id, activity.activity_type AS activity_activity_type, activity.data AS activity_data
FROM activity LEFT OUTER JOIN package ON package.id = activity.object_id AND package.private = false
WHERE (package.owner_org = ':packageOwnerOrg:' OR activity.object_id = ':ObjectID:') AND (activity.user_id NOT IN (':UserID:')) ORDER BY activity.timestamp DESC
LIMIT 32
When the following indexes were manually created, the backend author views became performant again.
CREATE INDEX IF NOT EXISTS activity_object_id_user_id_order_by_timestamp_desc
ON public.activity USING btree
(user_id COLLATE pg_catalog."default" ASC NULLS LAST, object_id COLLATE pg_catalog."default" ASC NULLS LAST, "timestamp" DESC NULLS FIRST)
TABLESPACE pg_default;
CREATE INDEX IF NOT EXISTS tracking_summary_package_id_orderby_tracking_date_desc
ON public.tracking_summary USING btree
(package_id COLLATE pg_catalog."default" ASC NULLS LAST, tracking_date DESC NULLS FIRST)
TABLESPACE pg_default;
It was noted that there are indeed indexes on these tables, but they are set to ASC mode and may not being used and full table scan's are occuring.
i.e.
CREATE INDEX IF NOT EXISTS idx_activity_object_id
ON public.activity USING btree
(object_id COLLATE pg_catalog."default" ASC NULLS LAST, "timestamp" ASC NULLS LAST)
TABLESPACE pg_default;
CREATE INDEX IF NOT EXISTS idx_activity_user_id
ON public.activity USING btree
(user_id COLLATE pg_catalog."default" ASC NULLS LAST, "timestamp" ASC NULLS LAST)
TABLESPACE pg_default;
CREATE INDEX IF NOT EXISTS tracking_summary_package_id
ON public.tracking_summary USING btree
(package_id COLLATE pg_catalog."default" ASC NULLS LAST)
TABLESPACE pg_default;