Conversation

@katamreddyganesh (Contributor) commented Jan 4, 2026

Summary by CodeRabbit

  • Bug Fixes

    • Improved file discovery and path resolution to handle missing or empty folders and avoid processing failures
    • Safer cleanup: only removes actual files and now cleans up related folders safely after processing
  • Tests

    • Expanded unit tests for file discovery edge cases and uploader behavior, including verification that temporary files are deleted after processing failures

coderabbitai bot commented Jan 4, 2026

📝 Walkthrough

Refactors upload resolution to a two-step latest-file lookup, updates FileImporter usage and cleanup to remove files safely, hardens Utility.get_latest_file to handle empty/no-match cases, and adds/adjusts unit tests around file resolution and post-processing cleanup.

Changes

| Cohort / File(s) | Summary |
|---|---|
| Upload flow & cleanup: kairon/events/definitions/crud_file_upload.py | Two-step resolution: call Utility.get_latest_file('file_content_upload_records', bot) to get a folder identifier, then Utility.get_latest_file(folder_path) to get the actual file. File cleanup now checks os.path.isfile(path) before os.remove(path) and uses Utility.delete_directory(folder_path, True) to remove the folder. Adds import os. |
| FileImporter preprocessing: kairon/importer/file_importer.py | Removed the line that set file_path = os.path.join(self.path, self.file_received) in preprocess, but subsequent code still references file_path, leaving it undefined (likely a bug requiring follow-up). |
| Utility: latest-file resolution: kairon/shared/utils.py | get_latest_file now collects matches into a list, logs matches, raises AppException when no files are found, and returns the newest file using max(..., key=os.path.getctime) to avoid generator-empty failures. |
| Tests (utility & data processor): tests/unit_test/data_processor/data_processor_test.py | Added tests for Utility.get_latest_file (nonexistent folder, empty folder, returns latest file) and retains/expands tests for story/action management and edge cases. |
| Tests (upload handler): tests/unit_test/data_processor/upload_handler_log_processor_test.py | Added CrudFileUploader import and tests ensuring processed files are deleted in both exception and finally paths (test_execute_deletes_file_after_processing, test_execute_deletes_file_in_finally); removed previous test_get_latest_event_file_name. Mocks added for Utility.get_latest_file and FileImporter behavior. |
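
For readers without the diff at hand, the hardened lookup summarized above could look roughly like the sketch below. This is not the code from kairon/shared/utils.py: AppException is a local stand-in, the message texts are illustrative, and raising for a nonexistent folder is an assumption based on the test names mentioned in the summary.

```python
import glob
import logging
import os

logger = logging.getLogger(__name__)


class AppException(Exception):
    """Local stand-in for kairon.exceptions.AppException (assumed to be a plain Exception subclass)."""


def get_latest_file(folder, extension_pattern="*"):
    """Return the most recently created entry in `folder` that matches the pattern."""
    if not os.path.exists(folder):
        # Assumption: a missing folder is reported the same way as an empty one.
        raise AppException(f"Folder {folder} does not exist.")
    # Collect matches into a list so that an empty result is detected explicitly
    # instead of surfacing as a ValueError from max() on an empty sequence.
    matches = glob.glob(os.path.join(folder, extension_pattern))
    logger.debug("Matching files in %s: %s", folder, matches)
    if not matches:
        raise AppException(f"No files found in folder {folder}.")
    return max(matches, key=os.path.getctime)
```

Callers then only need to handle AppException, whether the folder is missing or simply empty.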

Sequence Diagram(s)

sequenceDiagram
  autonumber
  participant Scheduler as Scheduler/Trigger
  participant Uploader as CrudFileUploader
  participant Utility as Utility.get_latest_file
  participant Importer as FileImporter
  participant FS as Filesystem

  Note over Scheduler,Uploader: Upload processing starts
  Scheduler->>Uploader: execute()
  Uploader->>Utility: get_latest_file('file_content_upload_records', bot)
  Utility-->>Uploader: folder_path
  Uploader->>Utility: get_latest_file(folder_path)
  Utility-->>Uploader: file_path
  Uploader->>Importer: instantiate with folder_path
  Importer->>FS: open/parse file_path
  alt processing succeeds
    Importer-->>Uploader: processed
  else processing error
    Importer-->>Uploader: raises/throws
  end
  Uploader->>FS: if os.path.isfile(path) then os.remove(path)
  Uploader->>Utility: delete_directory(folder_path, True)
  Note over Uploader,FS: cleanup ensures file removed and folder deleted safely
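
The same flow, reduced to a Python sketch. The names here are hypothetical: `importer` is a plain callable standing in for FileImporter, and shutil.rmtree stands in for Utility.delete_directory, whose implementation is not shown in this conversation.

```python
import os
import shutil
import tempfile


def process_latest_upload(folder_path, file_path, importer):
    """Run `importer(file_path)` and always clean up afterwards."""
    try:
        importer(file_path)
        return "processed"
    except Exception as exc:
        # The real handler records a failure log; here the error is only reported back.
        return f"failed: {exc}"
    finally:
        # Only remove real files; a missing path or a directory is skipped.
        if file_path and os.path.isfile(file_path):
            os.remove(file_path)
        # Assumption: shutil.rmtree approximates Utility.delete_directory(folder_path, True).
        if folder_path and os.path.isdir(folder_path):
            shutil.rmtree(folder_path, ignore_errors=True)
```

Because the cleanup sits in `finally`, it runs on both branches of the `alt` block in the diagram.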

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Poem

🐇 I hop between folders, sniff the latest trail,

two-step and tidy so no data goes stale.
If files misbehave, I catch and then cheer,
I remove what's needed and leave no loose gear.
A neat little hop — uploads safe and clear!

Pre-merge checks and finishing touches

❌ Failed checks (1 warning, 1 inconclusive)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 21.43% which is insufficient. The required threshold is 80.00%. | You can run @coderabbitai generate docstrings to improve docstring coverage. |
| Title check | ❓ Inconclusive | The title 'Fix error in file upload' is vague and generic, lacking specificity about which error or what aspect of file upload is being fixed. | Provide a more specific title that describes the actual error being fixed, such as 'Fix undefined file_path in file upload preprocessing' or 'Fix get_latest_file to handle empty folders in file upload'. |

✅ Passed checks (1 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🧹 Nitpick comments (4)
kairon/shared/utils.py (1)

408-413: Consider a less speculative error message.

The error message on line 412 suggests "Another upload may still be in progress" which may not always be accurate. The folder could be empty for other reasons such as incorrect path, permission issues, or files already processed. Consider a more generic message like:

-        raise AppException(f"No files found in folder {folder}. Another upload may still be in progress.")
+        raise AppException(f"No files found in folder {folder}.")

Alternatively, if you want to keep the helpful context, you could phrase it as a possibility rather than a statement:

-        raise AppException(f"No files found in folder {folder}. Another upload may still be in progress.")
+        raise AppException(f"No files found in folder {folder}. This may indicate that another upload is still in progress or that the folder has not been populated yet.")
kairon/events/definitions/crud_file_upload.py (1)

61-62: Consider adding a clarifying comment for the two-step path resolution.

The two-step approach to resolve the file path is not immediately obvious. The first call retrieves the bot-specific folder, and the second retrieves the latest file within that folder. Adding a brief comment would improve code readability:

💡 Suggested improvement
+            # Step 1: Get the latest folder for this bot's uploaded files
             folder_path = Utility.get_latest_file('file_content_upload_records', self.bot)
+            # Step 2: Get the latest file within that folder
             path = Utility.get_latest_file(folder_path)
tests/unit_test/data_processor/data_processor_test.py (2)

9943-9947: Minor formatting: missing space after comma in method signature.

The test logic correctly validates the expected behavior. However, there's a minor PEP 8 style issue with the method signature.

🔎 Suggested fix
-    def test_get_latest_file_folder_not_exists(self,tmp_path):
+    def test_get_latest_file_folder_not_exists(self, tmp_path):

9957-9972: Remove unused import and consider test performance.

  1. The import os on line 9960 is unused within this test - remove it.
  2. The time.sleep(1) is necessary for reliable getctime differentiation but adds 1 second to test execution. This is acceptable given the nature of the test.
  3. Same minor formatting issue with the method signature.
🔎 Suggested fix
-    def test_get_latest_file_returns_latest(self,tmp_path):
+    def test_get_latest_file_returns_latest(self, tmp_path):
         folder = tmp_path / "files"
         import time
-        import os
         folder.mkdir()
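
If the one-second sleep ever becomes a concern, one possible alternative is to patch os.path.getctime with fixed timestamps. The sketch below is self-contained and uses a hypothetical `newest` helper in place of Utility.get_latest_file; it assumes the real function also looks up os.path.getctime at call time.

```python
import os
import tempfile
from unittest import mock


def newest(paths):
    """Stand-in for the selection step: pick the path with the largest ctime."""
    return max(paths, key=os.path.getctime)


def check_newest_without_sleep():
    with tempfile.TemporaryDirectory() as folder:
        old = os.path.join(folder, "old.csv")
        new = os.path.join(folder, "new.csv")
        for path in (old, new):
            with open(path, "w") as handle:
                handle.write("x")
        # Fake creation times so the check does not depend on clock resolution.
        fake_ctimes = {old: 100.0, new: 200.0}
        with mock.patch("os.path.getctime", side_effect=fake_ctimes.__getitem__):
            return newest([old, new]) == new
```

The trade-off is that the patched version no longer exercises the real filesystem timestamps, so keeping the sleep-based test is also a reasonable choice.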
📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between f7e2592 and 3b51efc.

📒 Files selected for processing (5)
  • kairon/events/definitions/crud_file_upload.py
  • kairon/importer/file_importer.py
  • kairon/shared/utils.py
  • tests/unit_test/data_processor/data_processor_test.py
  • tests/unit_test/data_processor/upload_handler_log_processor_test.py
🧰 Additional context used
🧬 Code graph analysis (4)
tests/unit_test/data_processor/data_processor_test.py (2)
kairon/exceptions.py (1)
  • AppException (1-3)
kairon/shared/utils.py (1)
  • get_latest_file (394-413)
tests/unit_test/data_processor/upload_handler_log_processor_test.py (1)
kairon/events/definitions/crud_file_upload.py (2)
  • CrudFileUploader (13-78)
  • execute (54-78)
kairon/shared/utils.py (1)
kairon/exceptions.py (1)
  • AppException (1-3)
kairon/events/definitions/crud_file_upload.py (1)
kairon/shared/utils.py (1)
  • get_latest_file (394-413)
🪛 Ruff (0.14.10)
kairon/shared/utils.py

412-412: Avoid specifying long messages outside the exception class

(TRY003)
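
One way to satisfy TRY003 is to move the message into a dedicated exception class. The sketch below uses a hypothetical NoFilesFoundError and helper; neither exists in the repository as far as this conversation shows.

```python
class NoFilesFoundError(Exception):
    """Hypothetical exception that builds its own message from the folder name."""

    def __init__(self, folder):
        self.folder = folder
        super().__init__(f"No files found in folder {folder}.")


def ensure_not_empty(matches, folder):
    # The raise site passes data only; the wording lives in the exception class.
    if not matches:
        raise NoFilesFoundError(folder)
    return matches
```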

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: Codacy Static Code Analysis
  • GitHub Check: Python CI
  • GitHub Check: Analyze (python)
🔇 Additional comments (3)
kairon/events/definitions/crud_file_upload.py (1)

77-78: LGTM! Improved cleanup logic.

The change to use os.path.isfile(path) and os.remove(path) is more precise and safer than the previous approach. This ensures that only files are deleted in the cleanup, preventing accidental directory removal.

tests/unit_test/data_processor/upload_handler_log_processor_test.py (1)

171-231: LGTM! Good test coverage for cleanup behavior.

The new tests effectively verify that files are properly deleted in the finally block of CrudFileUploader.execute(), covering both exception scenarios:

  1. test_execute_deletes_file_after_processing: Verifies deletion when preprocess fails
  2. test_execute_deletes_file_in_finally: Verifies deletion when FileImporter instantiation fails

The tests appropriately use mocking to isolate the cleanup logic and use tmp_path fixture for safe temporary file creation.

tests/unit_test/data_processor/data_processor_test.py (1)

9949-9954: LGTM - Test correctly validates empty folder handling.

Same minor formatting issue with the method signature (missing space after comma). Consider applying the same fix as suggested above.

Comment on lines 26 to 28
     def preprocess(self):
-        file_path = os.path.join(self.path, self.file_received)

         try:
-            df = pd.read_csv(file_path)
+            df = pd.read_csv(self.path)
⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

# First, let's look at the file structure and context around the mentioned lines
head -80 kairon/importer/file_importer.py | cat -n

Repository: digiteinfotech/kairon

Length of output: 2348


🏁 Script executed:

# Search for all file_received usages in the repository
rg -n 'file_received' --type=py -B 2 -A 2

Repository: digiteinfotech/kairon

Length of output: 21041


🏁 Script executed:

# Check specifically in file_importer.py for self.file_received
rg -n 'self\.file_received' kairon/importer/file_importer.py

Repository: digiteinfotech/kairon

Length of output: 110


Remove unused file_received parameter from FileImporter.__init__.

The file_received parameter is stored as self.file_received in __init__ (line 19) but is never used in any method. The preprocess() method reads directly from self.path and neither validate() nor import_data() reference it. This parameter should be removed from the constructor signature to eliminate dead code.

🤖 Prompt for AI Agents
In kairon/importer/file_importer.py around lines 19 to 28, the constructor
stores a file_received parameter that is never used; remove file_received from
the FileImporter.__init__ signature and delete the self.file_received
assignment, then update any instantiations elsewhere in the codebase (tests and
callers) to stop passing that argument; ensure no other methods reference
self.file_received and run tests to confirm nothing else relies on it.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🧹 Nitpick comments (1)
kairon/events/definitions/crud_file_upload.py (1)

61-62: Clarify folder path construction to respect API semantics.

Line 61 passes self.bot as the extension_pattern parameter, but this parameter is documented for file extension patterns (e.g., ".tar.gz", ".zip"), not bot identifiers. While glob patterns can match directory names and the code functionally works, this violates the API's intended semantics and creates maintainability confusion.

Instead of the two-step approach:

folder_path = Utility.get_latest_file('file_content_upload_records', self.bot)
path = Utility.get_latest_file(folder_path)

Consider constructing the path explicitly:

path = Utility.get_latest_file(os.path.join('file_content_upload_records', self.bot))

This pattern is used correctly elsewhere in the codebase (e.g., data_importer.py line 71). Note: The same issue appears in content_importer.py line 74.
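
To illustrate why the two-step form happens to work, here is a small sketch. It assumes the lookup globs `folder/extension_pattern` and picks the newest match; `latest_match` is a hypothetical stand-in, not the actual Utility.get_latest_file, and the bot identifier is made up.

```python
import glob
import os
import tempfile


def latest_match(folder, extension_pattern="*"):
    """Assumed shape of the lookup: glob `folder/extension_pattern`, pick the newest."""
    matches = glob.glob(os.path.join(folder, extension_pattern))
    return max(matches, key=os.path.getctime)


with tempfile.TemporaryDirectory() as root:
    bot = "bot123"  # hypothetical bot identifier
    bot_dir = os.path.join(root, bot)
    os.makedirs(bot_dir)
    upload = os.path.join(bot_dir, "data.csv")
    open(upload, "w").close()

    # Two-step form: the bot id is passed as the glob pattern and matches the directory.
    two_step = latest_match(latest_match(root, bot))
    # Explicit form: build the directory path, then do a single lookup.
    explicit = latest_match(os.path.join(root, bot))
    same_result = two_step == explicit == upload
```

Both forms resolve to the same file under these assumptions; the explicit form simply states the intent without relying on a pattern parameter matching a directory name.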

📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 3b51efc and 4c0aeab.

📒 Files selected for processing (2)
  • kairon/events/definitions/crud_file_upload.py
  • kairon/importer/file_importer.py
💤 Files with no reviewable changes (1)
  • kairon/importer/file_importer.py
🧰 Additional context used
🧬 Code graph analysis (1)
kairon/events/definitions/crud_file_upload.py (4)
kairon/shared/utils.py (1)
  • get_latest_file (394-413)
kairon/shared/upload_handler/upload_handler_log_processor.py (2)
  • UploadHandlerLogProcessor (10-120)
  • add_log (16-69)
kairon/shared/data/constant.py (1)
  • EVENT_STATUS (93-111)
kairon/importer/file_importer.py (1)
  • FileImporter (10-62)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: Codacy Static Code Analysis
  • GitHub Check: Python CI
  • GitHub Check: Analyze (python)
🔇 Additional comments (2)
kairon/events/definitions/crud_file_upload.py (2)

1-1: LGTM!

The os import is necessary for the new file existence check and removal logic in the finally block.


64-64: LGTM!

Correctly passing folder_path to FileImporter, which expects a directory path and will join it with file_received internally. This aligns with the FileImporter implementation pattern.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🧹 Nitpick comments (1)
kairon/events/definitions/crud_file_upload.py (1)

77-81: Consider simplifying the cleanup logic.

The finally block first removes the individual file (lines 78-79), then deletes the entire folder_path directory (lines 80-81). Since Utility.delete_directory(folder_path, True) recursively removes the directory and all its contents, the individual file removal is redundant.

🔎 Simplified cleanup approach
         finally:
-            if path and os.path.isfile(path):
-                os.remove(path)
             if folder_path and os.path.isdir(folder_path):
                 Utility.delete_directory(folder_path, True)

This reduces filesystem operations while achieving the same cleanup result. The directory deletion encompasses all files within it.
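
A quick way to convince yourself of this, assuming Utility.delete_directory behaves like a recursive removal (shutil.rmtree is used here as a stand-in, since the helper's body is not shown in this thread):

```python
import os
import shutil
import tempfile


def cleanup(folder_path):
    """Remove the folder and everything in it; rmtree stands in for Utility.delete_directory."""
    if folder_path and os.path.isdir(folder_path):
        shutil.rmtree(folder_path, ignore_errors=True)


folder = tempfile.mkdtemp()
leftover = os.path.join(folder, "data.csv")
open(leftover, "w").close()

cleanup(folder)
file_gone = not os.path.exists(leftover)
folder_gone = not os.path.exists(folder)
```

Note that this only holds while `path` always lives inside `folder_path`; if the file could ever be resolved from a different location, the explicit file removal would still be needed.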

📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 4c0aeab and 0a2891e.

📒 Files selected for processing (1)
  • kairon/events/definitions/crud_file_upload.py
🧰 Additional context used
🧬 Code graph analysis (1)
kairon/events/definitions/crud_file_upload.py (3)
kairon/shared/upload_handler/upload_handler_log_processor.py (1)
  • get_latest_event_file_name (110-114)
kairon/shared/utils.py (2)
  • get_latest_file (394-413)
  • delete_directory (500-509)
kairon/importer/file_importer.py (1)
  • FileImporter (10-62)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: Codacy Static Code Analysis
  • GitHub Check: Analyze (python)
  • GitHub Check: Python CI
🔇 Additional comments (2)
kairon/events/definitions/crud_file_upload.py (2)

1-1: LGTM!

The os import is correctly added to support file and directory existence checks in the finally block.


65-65: LGTM - Correct usage of FileImporter with directory path.

The FileImporter is now correctly instantiated with folder_path (a directory) rather than a file path. This aligns with the FileImporter implementation, which expects a directory path and internally joins it with the filename in the preprocess() method.

Comment on lines +62 to +63
folder_path = Utility.get_latest_file('file_content_upload_records', self.bot)
path = Utility.get_latest_file(folder_path)
⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# First, find and examine the target file
echo "=== Examining target file ==="
cat -n kairon/events/definitions/crud_file_upload.py | sed -n '55,70p'

Repository: digiteinfotech/kairon

Length of output: 1279


🏁 Script executed:

#!/bin/bash
# Find Utility.get_latest_file definition
echo "=== Finding Utility class and get_latest_file method ==="
fd -t f "Utility.py\|utility.py" | head -20

Repository: digiteinfotech/kairon

Length of output: 121


🏁 Script executed:

#!/bin/bash
# Search for get_latest_file method definition and usages
echo "=== Searching for get_latest_file definition ==="
rg -n "def get_latest_file" --type py -A 10

Repository: digiteinfotech/kairon

Length of output: 882


🏁 Script executed:

#!/bin/bash
# Find usages of get_latest_file with multiple arguments to understand patterns
echo "=== Checking get_latest_file usages ==="
rg -n "get_latest_file\(" --type py -B 2 -A 2 | head -60

Repository: digiteinfotech/kairon

Length of output: 5677


🏁 Script executed:

#!/bin/bash
# Search for file_content_upload_records folder structure patterns
echo "=== Checking file_content_upload_records usage patterns ==="
rg -n "file_content_upload_records" --type py -B 2 -A 2

Repository: digiteinfotech/kairon

Length of output: 2418


Fix incorrect folder path construction at line 62.

Line 62 passes self.bot as the second argument to Utility.get_latest_file, but the method signature is get_latest_file(folder, extension_pattern="*"). The second parameter expects a file extension pattern (e.g., "*.tar.gz"), not a bot identifier.

The folder structure throughout the codebase is consistently os.path.join('file_content_upload_records', bot) (see kairon/shared/data/processor.py:8999 and kairon/events/definitions/upload_handler.py:49). All existing usages of get_latest_file pass only the folder path argument; the two-argument pattern here is incorrect.

Recommended fix
-            folder_path = Utility.get_latest_file('file_content_upload_records', self.bot)
+            folder_path = os.path.join('file_content_upload_records', self.bot)
             path = Utility.get_latest_file(folder_path)
🤖 Prompt for AI Agents
In kairon/events/definitions/crud_file_upload.py around lines 62-63, the code
incorrectly calls Utility.get_latest_file('file_content_upload_records',
self.bot) — the second parameter should be an extension pattern, not the bot id.
Construct the folder path as os.path.join('file_content_upload_records',
self.bot) and then call Utility.get_latest_file(folder_path) (i.e., folder_path
= os.path.join('file_content_upload_records', self.bot); path =
Utility.get_latest_file(folder_path)); add an import os at the top if not
present.
