
Conversation

@Aniketkharkia (Contributor) commented Nov 24, 2025

Summary by CodeRabbit

Release Notes

  • New Features

    • Added analytics data model and management: scripts can add analytics entries, retrieve entries by collection name, and mark entries as processed with user attribution for auditability.
  • Tests

    • Added unit tests covering analytics data validation, storage, retrieval, and marking-as-processed behaviors.


@coderabbitai bot commented Nov 24, 2025

Walkthrough

This PR adds an AnalyticsCollectionData model and three CallbackScriptUtility methods (add_data_analytics, get_data_analytics, mark_as_processed), exposes them as predefined PyScript objects, and introduces unit tests and validation around analytics data handling.

Changes

  • Data Model (kairon/shared/cognition/data_objects.py): Adds AnalyticsCollectionData(Auditlog) with fields bot, collection_name, user, source, data, received_at, and is_data_processed; includes clean and validate methods enforcing a non-empty, normalized collection_name and dict-typed data, plus an index on bot.
  • Analytics Utility Methods (kairon/shared/pyscript/callback_pyscript_utils.py): Adds CallbackScriptUtility.get_data_analytics(collection_name, bot=None), add_data_analytics(user, payload, bot=None), and mark_as_processed(user, collection_name, bot=None); imports AnalyticsCollectionData and implements validation, normalization, querying, creation, and marking-as-processed logic.
  • PyScript Environment Setup (kairon/async_callback/utils.py): Registers three new predefined PyScript objects in execute_script (add_data_analytics, get_data_analytics, and mark_as_processed), bound as partials to CallbackScriptUtility methods with the bot context.
  • Tests (tests/unit_test/callback/pyscript_handler_test.py, tests/unit_test/data_processor/data_processor2_test.py): Adds and updates tests for the new utilities and the AnalyticsCollectionData model: creation, validation (type and required fields), normalization, retrieval shaping, and mark-as-processed behavior.
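The binding pattern described above can be sketched as follows. This is a minimal, self-contained illustration of pre-binding the bot context with functools.partial; the class body and return shape are assumptions for the example, not the actual kairon implementation.

```python
from functools import partial

class CallbackScriptUtility:
    """Illustrative stand-in for the real utility class."""

    @staticmethod
    def get_data_analytics(collection_name, bot=None):
        if not bot:
            raise Exception("Missing bot id")
        # Normalize the collection name the way the walkthrough describes.
        return {"data": [], "collection": collection_name.strip().lower()}

def build_predefined_objects(bot):
    # Bind the bot context once, so scripts call the helper with only
    # the script-facing arguments and cannot forge another bot's id.
    return {
        "get_data_analytics": partial(CallbackScriptUtility.get_data_analytics, bot=bot),
    }

env = build_predefined_objects("demo_bot")
result = env["get_data_analytics"]("  Sales  ")
```

Because the bot id is applied at registration time rather than call time, the PyScript author never sees or supplies it.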

Sequence Diagram(s)

sequenceDiagram
    participant PyScript as PyScript
    participant Utils as CallbackScriptUtility
    participant DB as AnalyticsCollectionData (DB)

    rect rgb(200,220,240)
    note over PyScript,DB: Create analytics record
    PyScript->>Utils: add_data_analytics(user, payload, bot)
    Utils->>Utils: validate bot, extract fields
    Utils->>DB: create & save record
    DB-->>Utils: saved record (id)
    Utils-->>PyScript: {"message":"success","id":id}
    end

    rect rgb(220,240,200)
    note over PyScript,DB: Retrieve analytics records
    PyScript->>Utils: get_data_analytics(collection_name, bot)
    Utils->>Utils: normalize collection_name
    Utils->>DB: query by bot & collection_name
    DB-->>Utils: list of records
    Utils->>Utils: map records -> dict list
    Utils-->>PyScript: {"data":[...]}
    end

    rect rgb(240,220,200)
    note over PyScript,DB: Mark records processed
    PyScript->>Utils: mark_as_processed(user, collection_name, bot)
    Utils->>DB: query matching records
    DB-->>Utils: records
    Utils->>DB: update each record (user, is_data_processed=true) & save
    DB-->>Utils: confirm saves
    Utils-->>PyScript: {"message":"marked processed"}
    end

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Points to focus on:

  • Validation and normalization in AnalyticsCollectionData.clean() and validate().
  • Input validation, error messages, and return shapes in add_data_analytics, get_data_analytics, and mark_as_processed.
  • Query/update patterns and save semantics in mark_as_processed (bulk vs per-record saves).
  • Tests covering edge cases (missing bot, missing collection_name, non-dict data).

Suggested reviewers

  • hiteshghuge

Poem

🐇 With a twitch and a hop I log and then trace,
I store little packets in tidy, soft space.
Add, fetch, mark as done — a rabbit’s small art,
PyScript now hums with analytics heart. ✨📊

Pre-merge checks and finishing touches

❌ Failed checks (1 warning, 1 inconclusive)
  • Docstring Coverage (⚠️ Warning): Docstring coverage is 0.00%, below the required threshold of 80.00%. Run @coderabbitai generate docstrings to improve coverage.
  • Title Check (❓ Inconclusive): The title 'Analytics pipeline creation 2' is vague and generic; its numbered suffix doesn't convey the specific changes in the changeset. Provide a more descriptive title, such as 'Add analytics data management utilities' or 'Implement analytics collection and processing methods'.
✅ Passed checks (1 passed)
  • Description Check: Check skipped because CodeRabbit's high-level summary is enabled.

📜 Recent review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 027bbe0 and 7ab4f1f.

📒 Files selected for processing (1)
  • tests/unit_test/data_processor/data_processor2_test.py (2 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
tests/unit_test/data_processor/data_processor2_test.py (1)
kairon/shared/cognition/data_objects.py (1)
  • AnalyticsCollectionData (126-149)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: Codacy Static Code Analysis
  • GitHub Check: Python CI
  • GitHub Check: Analyze (python)
🔇 Additional comments (4)
tests/unit_test/data_processor/data_processor2_test.py (4)

8-8: LGTM! Import updates are appropriate.

The import changes correctly support the new AnalyticsCollectionData test cases. ValidationError is used for testing validation failures, and AnalyticsCollectionData is the model under test.

Also applies to: 13-13


1530-1533: LGTM! Test correctly validates case-insensitive file extensions.

The test ensures that file type validation handles uppercase extensions properly.


1534-1549: LGTM! Test correctly validates the success path with cleaning.

The test verifies that validate(clean=True) properly normalizes the collection_name by trimming whitespace and converting to lowercase, which aligns with the model's clean() method behavior.


1551-1594: LGTM! Comprehensive validation tests for AnalyticsCollectionData.

The three test cases effectively cover:

  1. Invalid data type rejection: Ensures non-dict data raises ValidationError
  2. Missing collection_name rejection: Ensures whitespace-only collection_name raises ValidationError
  3. Clean method behavior: Verifies that clean() trims and lowercases collection_name

All tests align correctly with the model's validation logic defined in kairon/shared/cognition/data_objects.py.
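The normalization and validation behavior these tests exercise can be sketched as a stand-alone function. The real model subclasses Auditlog and implements this inside clean() and validate(), so the function below is an assumption for illustration only; the error messages are hypothetical.

```python
class ValidationError(Exception):
    """Stand-in for the MongoEngine ValidationError used in the tests."""
    pass

def clean_and_validate(collection_name, data):
    # clean(): trim surrounding whitespace and lowercase the name.
    collection_name = (collection_name or "").strip().lower()
    # validate(): reject empty names and non-dict payload data.
    if not collection_name:
        raise ValidationError("collection_name is required")
    if not isinstance(data, dict):
        raise ValidationError("data must be a dictionary")
    return collection_name, data
```

Under this sketch, "  SALES  " normalizes to "sales", while a whitespace-only name or list-typed data raises ValidationError, matching the three test cases above.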




@coderabbitai bot left a comment

Actionable comments posted: 2

🧹 Nitpick comments (3)
kairon/shared/pyscript/callback_pyscript_utils.py (2)

342-342: Use explicit type hints for optional parameters.

The bot parameter is optional but lacks explicit Optional or | None type annotation. Per PEP 484, implicit optional is discouraged.

Apply this diff:

     @staticmethod
-    def get_data_analytics(collection_name: str, bot: Text = None):
+    def get_data_analytics(collection_name: str, bot: Text | None = None):
         if not bot:
             raise Exception("Missing bot id")
 
         # ... rest of method
 
     @staticmethod
-    def add_data_analytics(user: str, payload: dict, bot: str = None):
+    def add_data_analytics(user: str, payload: dict, bot: str | None = None):
         if not bot:
             raise Exception("Missing bot id")
 
         # ... rest of method
 
     @staticmethod
-    def mark_as_processed(user: str, collection_name: str, bot: str = None):
+    def mark_as_processed(user: str, collection_name: str, bot: str | None = None):
         if not bot:
             raise Exception("Missing bot id")

Also applies to: 372-372, 395-395


372-392: Consider validating payload fields in add_data_analytics.

The method extracts source and received_at from the payload without validation. If received_at is provided as a string rather than a datetime object, it could cause issues. Consider adding validation or documenting expected payload structure.
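One way to handle the received_at concern is a small coercion helper: pass datetimes through, parse ISO-8601 strings, and default to the current time when absent. This is a hedged sketch of the suggestion, not kairon's code; the function name and error message are hypothetical.

```python
from datetime import datetime

def extract_received_at(payload: dict) -> datetime:
    # Accept datetime objects as-is, coerce ISO-8601 strings,
    # and fall back to "now" when the field is missing.
    value = payload.get("received_at")
    if value is None:
        return datetime.utcnow()
    if isinstance(value, datetime):
        return value
    if isinstance(value, str):
        return datetime.fromisoformat(value)
    raise ValueError("received_at must be a datetime or an ISO-8601 string")
```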

tests/unit_test/callback/pyscript_handler_test.py (1)

3955-4034: Good test coverage for the new analytics methods.

The tests validate the core functionality of add_data_analytics, get_data_analytics, and mark_as_processed. They properly use mocks and verify the expected behavior.

However, consider adding tests for edge cases:

  • add_data_analytics with missing collection_name in payload
  • get_data_analytics with non-existent collection
  • mark_as_processed when no records exist (to verify the fix for the queryset check issue flagged in callback_pyscript_utils.py)
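The missing no-records test could be sketched against an in-memory stand-in for the MongoEngine collection. Everything below is hypothetical scaffolding; AppException mirrors kairon.exceptions, and the list-based store replaces the real queryset.

```python
class AppException(Exception):
    """Stand-in for kairon.exceptions.AppException."""
    pass

def mark_as_processed(records, user, collection_name, bot):
    # In-memory analogue of the utility: filter, check emptiness, update.
    matches = [r for r in records
               if r["bot"] == bot and r["collection_name"] == collection_name]
    if len(matches) == 0:  # explicit emptiness check, not queryset truthiness
        raise AppException("No records found for given bot and collection_name")
    for record in matches:
        record["user"] = user
        record["is_data_processed"] = True
    return {"message": "marked processed"}

def test_mark_as_processed_no_records():
    # The edge case flagged above: no matching records must raise.
    try:
        mark_as_processed([], "tester", "sales", "demo_bot")
        assert False, "expected AppException"
    except AppException:
        pass

test_mark_as_processed_no_records()
```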
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between dd7ddc4 and 027bbe0.

📒 Files selected for processing (4)
  • kairon/async_callback/utils.py (1 hunks)
  • kairon/shared/cognition/data_objects.py (1 hunks)
  • kairon/shared/pyscript/callback_pyscript_utils.py (2 hunks)
  • tests/unit_test/callback/pyscript_handler_test.py (3 hunks)
🧰 Additional context used
🧬 Code graph analysis (4)
kairon/shared/cognition/data_objects.py (1)
kairon/shared/data/audit/data_objects.py (1)
  • Auditlog (24-77)
kairon/async_callback/utils.py (1)
kairon/shared/pyscript/callback_pyscript_utils.py (3)
  • add_data_analytics (372-392)
  • get_data_analytics (342-369)
  • mark_as_processed (395-411)
kairon/shared/pyscript/callback_pyscript_utils.py (2)
kairon/shared/cognition/data_objects.py (1)
  • AnalyticsCollectionData (126-149)
kairon/exceptions.py (1)
  • AppException (1-3)
tests/unit_test/callback/pyscript_handler_test.py (2)
kairon/shared/cognition/data_objects.py (1)
  • AnalyticsCollectionData (126-149)
kairon/shared/pyscript/callback_pyscript_utils.py (3)
  • add_data_analytics (372-392)
  • get_data_analytics (342-369)
  • mark_as_processed (395-411)
🪛 Ruff (0.14.5)
kairon/shared/cognition/data_objects.py

135-135: Mutable class attributes should be annotated with typing.ClassVar

(RUF012)


142-142: Avoid specifying long messages outside the exception class

(TRY003)


145-145: Avoid specifying long messages outside the exception class

(TRY003)

kairon/shared/pyscript/callback_pyscript_utils.py

339-339: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)


339-339: Create your own exception

(TRY002)


339-339: Avoid specifying long messages outside the exception class

(TRY003)


339-339: Use explicit conversion flag

Replace with conversion flag

(RUF010)


342-342: PEP 484 prohibits implicit Optional

Convert to T | None

(RUF013)


344-344: Create your own exception

(TRY002)


344-344: Avoid specifying long messages outside the exception class

(TRY003)


372-372: PEP 484 prohibits implicit Optional

Convert to T | None

(RUF013)


374-374: Create your own exception

(TRY002)


374-374: Avoid specifying long messages outside the exception class

(TRY003)


395-395: PEP 484 prohibits implicit Optional

Convert to T | None

(RUF013)


397-397: Create your own exception

(TRY002)


397-397: Avoid specifying long messages outside the exception class

(TRY003)


402-402: Avoid specifying long messages outside the exception class

(TRY003)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: Codacy Static Code Analysis
  • GitHub Check: Python CI
  • GitHub Check: Analyze (python)
🔇 Additional comments (2)
kairon/shared/cognition/data_objects.py (1)

126-149: LGTM! The new analytics data model is well-structured.

The AnalyticsCollectionData class follows established patterns in the codebase (similar to CollectionData and CognitionSchema). The validation ensures data integrity by enforcing non-empty collection names and type-checking the data field.

kairon/async_callback/utils.py (1)

85-87: LGTM! The new analytics functions are correctly exposed.

The three new analytics methods are properly bound using partial with the bot parameter, following the established pattern for other predefined objects in the PyScript environment.

            return media_id
        except Exception as e:
            raise Exception(f"encryption failed-{str(e)}")
⚠️ Potential issue | 🟡 Minor

Fix the error message in the exception.

The error message says "encryption failed" but this method is save_as_pdf. This appears to be a copy-paste error.

Apply this diff:

-            raise Exception(f"encryption failed-{str(e)}")
+            raise Exception(f"PDF generation failed-{str(e)}")
🤖 Prompt for AI Agents
In kairon/shared/pyscript/callback_pyscript_utils.py around line 339, the raised
exception message incorrectly says "encryption failed" in the save_as_pdf
method; update the exception text to reflect the actual operation (e.g.,
"save_as_pdf failed" or "PDF save failed") and include the original exception
details (str(e)) in the message so the log accurately describes the failure and
preserves error context.

Comment on lines +399 to +402

        queryset = AnalyticsCollectionData.objects(bot=bot, collection_name=collection_name)

        if not queryset:
            raise AppException("No records found for given bot and collection_name")
⚠️ Potential issue | 🟠 Major

MongoEngine queryset truthiness check may not work as expected.

The check if not queryset: on line 401 will not work correctly. MongoEngine querysets are always truthy, even when empty. To check if a queryset has no results, use .count() or iterate and check.

Apply this diff:

     queryset = AnalyticsCollectionData.objects(bot=bot, collection_name=collection_name)
 
-    if not queryset:
+    if queryset.count() == 0:
         raise AppException("No records found for given bot and collection_name")

🤖 Prompt for AI Agents
In kairon/shared/pyscript/callback_pyscript_utils.py around lines 399 to 402,
the truthiness check `if not queryset:` is invalid because MongoEngine querysets
are always truthy; replace it with an explicit emptiness check such as `if
queryset.count() == 0:` or `if queryset.first() is None:` so the code correctly
detects no matching records and then raise the AppException.
