7 changes: 2 additions & 5 deletions kairon/shared/trackers.py
@@ -64,13 +64,10 @@ def conversations(self) -> Collection:
def _ensure_indices(self) -> None:
    indexes = [
        IndexModel([("sender_id", ASCENDING), ("event.event", ASCENDING)]),
        IndexModel([("type", ASCENDING), ("timestamp", ASCENDING)]),
        IndexModel([("sender_id", ASCENDING), ("conversation_id", ASCENDING)]),
        IndexModel([("event.event", ASCENDING), ("event.timestamp", DESCENDING)]),
        IndexModel([("event.name", ASCENDING), ("event.timestamp", DESCENDING)]),
        IndexModel([("event.timestamp", DESCENDING)]),
        IndexModel([("event.timestamp", ASCENDING), ("event.event", ASCENDING)]),
        IndexModel([("sender_id", ASCENDING), ("type", ASCENDING), ("event.event", ASCENDING)]),
        IndexModel([("sender_id", ASCENDING), ("type", ASCENDING), ("event.event", ASCENDING), ("event.timestamp", ASCENDING)]),
        IndexModel([("sender_id", ASCENDING), ("type", ASCENDING), ("event.timestamp", ASCENDING), ("event.event", ASCENDING)]),
    ]
Comment on lines 64 to 71
🛠️ Refactor suggestion

⚠️ Potential issue

Potential index bloat & missing maintenance hooks

The new compound indexes are created unnamed and without first removing the obsolete ones that existed in production (type+timestamp, event.event+timestamp, etc.).
Because MongoDB treats [(a,1),(b,1)], [(a,1),(b,-1)], and different field orders as distinct indexes, create_indexes() will happily add the new keys while keeping the old ones. On a busy conversations collection this can:

• double the on-disk index size and memory footprint
• slow down writes while the extra indexes are being built
• leave the query planner with too many near-identical candidate indexes to choose between
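To make the last point concrete: among the key patterns in this diff, (sender_id, type, event.event) is a strict prefix of (sender_id, type, event.event, event.timestamp), so it adds build and storage cost without adding query coverage. A pure-Python sketch that flags such prefix-redundant patterns (no server needed; key lists copied from the diff, with 1 standing in for ASCENDING):

```python
def find_prefix_redundant(indexes):
    """Return key patterns that are a strict prefix of another index's keys.

    An index whose keys are a prefix of a longer compound index is usually
    redundant: MongoDB can answer the same queries with the longer one.
    """
    redundant = []
    for a in indexes:
        for b in indexes:
            if a is not b and len(a) < len(b) and tuple(b[: len(a)]) == tuple(a):
                redundant.append(a)
                break
    return redundant

# Key patterns proposed in this PR (1 = ascending):
new_indexes = [
    [("sender_id", 1), ("event.event", 1)],
    [("sender_id", 1), ("conversation_id", 1)],
    [("event.timestamp", 1), ("event.event", 1)],
    [("sender_id", 1), ("type", 1), ("event.event", 1)],
    [("sender_id", 1), ("type", 1), ("event.event", 1), ("event.timestamp", 1)],
    [("sender_id", 1), ("type", 1), ("event.timestamp", 1), ("event.event", 1)],
]

print(find_prefix_redundant(new_indexes))
# → [[('sender_id', 1), ('type', 1), ('event.event', 1)]]
```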

I strongly recommend:

-indexes = [
-    IndexModel([("sender_id", ASCENDING), ("event.event", ASCENDING)]),
-    ...
-    IndexModel([("event.timestamp", ASCENDING), ("event.event", ASCENDING)]),
-    ...
-]
+# obsolete = [
+#     ("type_1_timestamp_-1",            True),
+#     ("event.event_1_event.timestamp_-1", True),
+#     ("event.name_1_event.timestamp_-1", True),
+# ]
+# for name, drop in obsolete:
+#     if drop and name in self.conversations.index_information():
+#         self.conversations.drop_index(name)
+
+indexes = [
+    IndexModel([("sender_id", ASCENDING), ("event.event", ASCENDING)],
+               name="sender_event_idx", background=True),
+    IndexModel([("sender_id", ASCENDING), ("conversation_id", ASCENDING)],
+               name="sender_cid_idx",    background=True),
+    IndexModel([("event.timestamp", ASCENDING), ("event.event", ASCENDING)],
+               name="ts_event_idx",      background=True),
+    IndexModel([("sender_id", ASCENDING), ("type", ASCENDING), ("event.timestamp", ASCENDING), ("event.event", ASCENDING)],
+               name="sender_type_ts_evt_idx", background=True),
+]

• Give every index an explicit name so future migrations can be idempotent and easy to drop.
• Build them with background=True so pre-4.2 servers don’t block writes during the build (MongoDB 4.2+ ignores the option and always uses its non-blocking build process).
• Drop the superseded keys inside _ensure_indices() – it already runs at start-up when only one process owns the connection, so it’s safe.
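Dropping by name is only idempotent if the names match what the server auto-generated when the old indexes were built unnamed. MongoDB's default name is simply each key field and its direction joined with underscores, which is easy to recompute locally — a minimal sketch, assuming the defaults were never overridden:

```python
def default_index_name(keys):
    """Recompute MongoDB's default index name: '<field>_<direction>' pairs joined by '_'."""
    return "_".join(f"{field}_{direction}" for field, direction in keys)

# The old [("type", ASCENDING), ("timestamp", ASCENDING)] index was built
# ascending/ascending, so its auto-generated name ends in _1, not _-1:
print(default_index_name([("type", 1), ("timestamp", 1)]))                # → type_1_timestamp_1
print(default_index_name([("event.event", 1), ("event.timestamp", -1)]))  # → event.event_1_event.timestamp_-1
```

Recomputing the names this way keeps the obsolete-index list honest against what `index_information()` will actually report.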

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
def _ensure_indices(self) -> None:
    # drop any obsolete indexes before creating the new ones
    # (names follow MongoDB's default "<field>_<direction>" convention;
    #  the old type+timestamp index was ascending/ascending, hence _1)
    obsolete = [
        ("type_1_timestamp_1", True),
        ("event.event_1_event.timestamp_-1", True),
        ("event.name_1_event.timestamp_-1", True),
    ]
    for name, drop in obsolete:
        if drop and name in self.conversations.index_information():
            self.conversations.drop_index(name)
    indexes = [
        IndexModel(
            [("sender_id", ASCENDING), ("event.event", ASCENDING)],
            name="sender_event_idx",
            background=True,
        ),
        IndexModel(
            [("sender_id", ASCENDING), ("conversation_id", ASCENDING)],
            name="sender_cid_idx",
            background=True,
        ),
        IndexModel(
            [("event.timestamp", ASCENDING), ("event.event", ASCENDING)],
            name="ts_event_idx",
            background=True,
        ),
        IndexModel(
            [
                ("sender_id", ASCENDING),
                ("type", ASCENDING),
                ("event.timestamp", ASCENDING),
                ("event.event", ASCENDING),
            ],
            name="sender_type_ts_evt_idx",
            background=True,
        ),
    ]
    # … rest of _ensure_indices, e.g.:
    # self.conversations.create_indexes(indexes)
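The drop-then-create flow can be exercised without a live MongoDB by faking the two collection methods it touches. A minimal sketch — FakeCollection and the index names in it are illustrative, not the real collection's:

```python
def drop_obsolete(coll, names):
    """Idempotently drop indexes by name; names that do not exist are skipped."""
    existing = coll.index_information()
    for name in names:
        if name in existing:
            coll.drop_index(name)

class FakeCollection:
    """Stand-in implementing only the two methods drop_obsolete uses."""
    def __init__(self, names):
        self._info = {n: {} for n in names}
    def index_information(self):
        return dict(self._info)
    def drop_index(self, name):
        del self._info[name]

# Sample index names for the demo:
coll = FakeCollection(["_id_", "type_1_timestamp_1", "event.timestamp_-1"])
drop_obsolete(coll, ["type_1_timestamp_1", "no_such_index"])  # second name is a no-op
print(sorted(coll.index_information()))  # → ['_id_', 'event.timestamp_-1']
```

The same helper should work against a real pymongo Collection, since it relies only on index_information() and drop_index(), which pymongo provides with these signatures.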
🤖 Prompt for AI Agents
In kairon/shared/trackers.py around lines 64 to 71, the compound indexes are
created without explicit names and without removing obsolete indexes, which can
cause index bloat and performance issues. To fix this, assign explicit unique
names to each IndexModel, set background=True to avoid blocking writes during
index creation, and add logic in _ensure_indices() to drop any superseded or
obsolete indexes before creating the new ones, ensuring the process is
idempotent and safe during startup.

self.conversations.create_indexes(indexes)
