-
Notifications
You must be signed in to change notification settings - Fork 84
Tracker indexes #2011
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Tracker indexes #2011
Conversation
WalkthroughThe changes update the MongoDB index definitions in the Changes
Sequence Diagram(s)sequenceDiagram
participant App
participant KMongoTrackerStore
participant MongoDB
App->>KMongoTrackerStore: Initialize or update tracker store
KMongoTrackerStore->>MongoDB: Ensure new set of indexes on conversations collection
MongoDB-->>KMongoTrackerStore: Indexes created/updated as per new definitions
Possibly related PRs
Suggested reviewers
Poem
✨ Finishing Touches
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
SupportNeed help? Create a ticket on our support page for assistance with any issues or questions. Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🧹 Nitpick comments (1)
kairon/shared/trackers.py (1)
70-70: Field-order change may improve current queries but hurts equality look-upsPlacing
event.timestampbeforeevent.eventmeans queries that filter onevent.event(exact match) but do not constrain the timestamp will now have to scan the whole timestamp range. If any analytics path relies onevent.event = "session_started"without a timestamp predicate, the previous order was more selective.Double-check typical patterns; if both exist you might need two separate compound indexes rather than re-ordering one.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
kairon/shared/trackers.py(1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (3)
- GitHub Check: Codacy Static Code Analysis
- GitHub Check: Python CI
- GitHub Check: Analyze (python)
🔇 Additional comments (1)
kairon/shared/trackers.py (1)
68-68: Asc/Desc switch – verify query plansChanging
event.timestampfromDESCENDINGtoASCENDINGaligns with the$sort : {"event.timestamp" : 1}used later, so looks correct.
However, confirm there are no dashboards / cron jobs that still requestsort({"event.timestamp": -1}); they will no longer be covered.
| def _ensure_indices(self) -> None: | ||
| indexes = [ | ||
| IndexModel([("sender_id", ASCENDING), ("event.event", ASCENDING)]), | ||
| IndexModel([("type", ASCENDING), ("timestamp", ASCENDING)]), | ||
| IndexModel([("sender_id", ASCENDING), ("conversation_id", ASCENDING)]), | ||
| IndexModel([("event.event", ASCENDING), ("event.timestamp", DESCENDING)]), | ||
| IndexModel([("event.name", ASCENDING), ("event.timestamp", DESCENDING)]), | ||
| IndexModel([("event.timestamp", DESCENDING)]), | ||
| IndexModel([("event.timestamp", ASCENDING), ("event.event", ASCENDING)]), | ||
| IndexModel([("sender_id", ASCENDING), ("type", ASCENDING), ("event.event", ASCENDING)]), | ||
| IndexModel([("sender_id", ASCENDING), ("type", ASCENDING), ("event.event", ASCENDING), ("event.timestamp", ASCENDING)]), | ||
| IndexModel([("sender_id", ASCENDING), ("type", ASCENDING), ("event.timestamp", ASCENDING), ("event.event", ASCENDING)]), | ||
| ] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
Potential index bloat & missing maintenance hooks
The new compound indexes are created un-named and without first removing the obsolete ones that existed in production (type+timestamp, event.event+timestamp, etc.).
Because MongoDB treats [(a,1),(b,1)] and [(a,1),(b,-1)] and different field orders as distinct, create_indexes() will happily add the new keys and keep the old ones. On a busy conversations collection this can:
• double the on-disk index size and memory footprint
• slow down writes while the extra indexes are being built
• leave the optimiser with too many similar choices
I strongly recommend:
-indexes = [
- IndexModel([("sender_id", ASCENDING), ("event.event", ASCENDING)]),
- ...
- IndexModel([("event.timestamp", ASCENDING), ("event.event", ASCENDING)]),
- ...
-]
+# obsolete = [
+# ("type_1_timestamp_-1", True),
+# ("event.event_1_event.timestamp_-1", True),
+# ("event.name_1_event.timestamp_-1", True),
+# ]
+# for name, drop in obsolete:
+# if drop and name in self.conversations.index_information():
+# self.conversations.drop_index(name)
+
+indexes = [
+ IndexModel([("sender_id", ASCENDING), ("event.event", ASCENDING)],
+ name="sender_event_idx", background=True),
+ IndexModel([("sender_id", ASCENDING), ("conversation_id", ASCENDING)],
+ name="sender_cid_idx", background=True),
+ IndexModel([("event.timestamp", ASCENDING), ("event.event", ASCENDING)],
+ name="ts_event_idx", background=True),
+ IndexModel([("sender_id", ASCENDING), ("type", ASCENDING), ("event.timestamp", ASCENDING), ("event.event", ASCENDING)],
+ name="sender_type_ts_evt_idx", background=True),
+]• Give every index an explicit name so future migrations can be idempotent and easy to drop.
• Build them with background=True (or {"background":True} on legacy servers) to avoid blocking writes.
• Drop the superseded keys inside _ensure_indices() – it already runs at start-up when only one process owns the connection, so it’s safe.
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| def _ensure_indices(self) -> None: | |
| indexes = [ | |
| IndexModel([("sender_id", ASCENDING), ("event.event", ASCENDING)]), | |
| IndexModel([("type", ASCENDING), ("timestamp", ASCENDING)]), | |
| IndexModel([("sender_id", ASCENDING), ("conversation_id", ASCENDING)]), | |
| IndexModel([("event.event", ASCENDING), ("event.timestamp", DESCENDING)]), | |
| IndexModel([("event.name", ASCENDING), ("event.timestamp", DESCENDING)]), | |
| IndexModel([("event.timestamp", DESCENDING)]), | |
| IndexModel([("event.timestamp", ASCENDING), ("event.event", ASCENDING)]), | |
| IndexModel([("sender_id", ASCENDING), ("type", ASCENDING), ("event.event", ASCENDING)]), | |
| IndexModel([("sender_id", ASCENDING), ("type", ASCENDING), ("event.event", ASCENDING), ("event.timestamp", ASCENDING)]), | |
| IndexModel([("sender_id", ASCENDING), ("type", ASCENDING), ("event.timestamp", ASCENDING), ("event.event", ASCENDING)]), | |
| ] | |
| def _ensure_indices(self) -> None: | |
| # drop any obsolete indexes before creating the new ones | |
| obsolete = [ | |
| ("type_1_timestamp_-1", True), | |
| ("event.event_1_event.timestamp_-1", True), | |
| ("event.name_1_event.timestamp_-1", True), | |
| ] | |
| for name, drop in obsolete: | |
| if drop and name in self.conversations.index_information(): | |
| self.conversations.drop_index(name) | |
| indexes = [ | |
| IndexModel( | |
| [("sender_id", ASCENDING), ("event.event", ASCENDING)], | |
| name="sender_event_idx", | |
| background=True, | |
| ), | |
| IndexModel( | |
| [("sender_id", ASCENDING), ("conversation_id", ASCENDING)], | |
| name="sender_cid_idx", | |
| background=True, | |
| ), | |
| IndexModel( | |
| [("event.timestamp", ASCENDING), ("event.event", ASCENDING)], | |
| name="ts_event_idx", | |
| background=True, | |
| ), | |
| IndexModel( | |
| [ | |
| ("sender_id", ASCENDING), | |
| ("type", ASCENDING), | |
| ("event.timestamp", ASCENDING), | |
| ("event.event", ASCENDING), | |
| ], | |
| name="sender_type_ts_evt_idx", | |
| background=True, | |
| ), | |
| ] | |
| # … rest of _ensure_indices, e.g.: | |
| # self.conversations.create_indexes(indexes) |
🤖 Prompt for AI Agents
In kairon/shared/trackers.py around lines 64 to 71, the compound indexes are
created without explicit names and without removing obsolete indexes, which can
cause index bloat and performance issues. To fix this, assign explicit unique
names to each IndexModel, set background=True to avoid blocking writes during
index creation, and add logic in _ensure_indices() to drop any superseded or
obsolete indexes before creating the new ones, ensuring the process is
idempotent and safe during startup.
Summary by CodeRabbit