-
Notifications
You must be signed in to change notification settings - Fork 84
Fix error in file upload #2313
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Fix error in file upload #2313
Conversation
📝 WalkthroughWalkthroughRefactors upload resolution to a two-step latest-file lookup, updates FileImporter usage and cleanup to remove files safely, hardens Utility.get_latest_file to handle empty/no-match cases, and adds/adjusts unit tests around file resolution and post-processing cleanup. Changes
Sequence Diagram(s)sequenceDiagram
autonumber
participant Scheduler as Scheduler/Trigger
participant Uploader as CrudFileUploader
participant Utility as Utility.get_latest_file
participant Importer as FileImporter
participant FS as Filesystem
Note over Scheduler,Uploader: Upload processing starts
Scheduler->>Uploader: execute()
Uploader->>Utility: get_latest_file('file_content_upload_records', bot)
Utility-->>Uploader: folder_path
Uploader->>Utility: get_latest_file(folder_path)
Utility-->>Uploader: file_path
Uploader->>Importer: instantiate with folder_path
Importer->>FS: open/parse file_path
alt processing succeeds
Importer-->>Uploader: processed
else processing error
Importer-->>Uploader: raises/throws
end
Uploader->>FS: if os.path.isfile(path) then os.remove(path)
Uploader->>Utility: delete_directory(folder_path, True)
Note over Uploader,FS: cleanup ensures file removed and folder deleted safely
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Poem
Pre-merge checks and finishing touches❌ Failed checks (1 warning, 1 inconclusive)
✅ Passed checks (1 passed)
✨ Finishing touches
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🧹 Nitpick comments (4)
kairon/shared/utils.py (1)
408-413: Consider a less speculative error message.The error message on line 412 suggests "Another upload may still be in progress" which may not always be accurate. The folder could be empty for other reasons such as incorrect path, permission issues, or files already processed. Consider a more generic message like:
- raise AppException(f"No files found in folder {folder}. Another upload may still be in progress.") + raise AppException(f"No files found in folder {folder}.")Alternatively, if you want to keep the helpful context, you could phrase it as a possibility rather than a statement:
- raise AppException(f"No files found in folder {folder}. Another upload may still be in progress.") + raise AppException(f"No files found in folder {folder}. This may indicate that another upload is still in progress or that the folder has not been populated yet.")kairon/events/definitions/crud_file_upload.py (1)
61-62: Consider adding a clarifying comment for the two-step path resolution.The two-step approach to resolve the file path is not immediately obvious. The first call retrieves the bot-specific folder, and the second retrieves the latest file within that folder. Adding a brief comment would improve code readability:
💡 Suggested improvement
+ # Step 1: Get the latest folder for this bot's uploaded files folder_path = Utility.get_latest_file('file_content_upload_records', self.bot) + # Step 2: Get the latest file within that folder path = Utility.get_latest_file(folder_path)tests/unit_test/data_processor/data_processor_test.py (2)
9943-9947: Minor formatting: missing space after comma in method signature.The test logic correctly validates the expected behavior. However, there's a minor PEP 8 style issue with the method signature.
🔎 Suggested fix
- def test_get_latest_file_folder_not_exists(self,tmp_path): + def test_get_latest_file_folder_not_exists(self, tmp_path):
9957-9972: Remove unused import and consider test performance.
- The
import oson line 9960 is unused within this test - remove it.- The
time.sleep(1)is necessary for reliablegetctimedifferentiation but adds 1 second to test execution. This is acceptable given the nature of the test.- Same minor formatting issue with the method signature.
🔎 Suggested fix
- def test_get_latest_file_returns_latest(self,tmp_path): + def test_get_latest_file_returns_latest(self, tmp_path): folder = tmp_path / "files" import time - import os folder.mkdir()
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (5)
kairon/events/definitions/crud_file_upload.pykairon/importer/file_importer.pykairon/shared/utils.pytests/unit_test/data_processor/data_processor_test.pytests/unit_test/data_processor/upload_handler_log_processor_test.py
🧰 Additional context used
🧬 Code graph analysis (4)
tests/unit_test/data_processor/data_processor_test.py (2)
kairon/exceptions.py (1)
AppException(1-3)kairon/shared/utils.py (1)
get_latest_file(394-413)
tests/unit_test/data_processor/upload_handler_log_processor_test.py (1)
kairon/events/definitions/crud_file_upload.py (2)
CrudFileUploader(13-78)execute(54-78)
kairon/shared/utils.py (1)
kairon/exceptions.py (1)
AppException(1-3)
kairon/events/definitions/crud_file_upload.py (1)
kairon/shared/utils.py (1)
get_latest_file(394-413)
🪛 Ruff (0.14.10)
kairon/shared/utils.py
412-412: Avoid specifying long messages outside the exception class
(TRY003)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: Codacy Static Code Analysis
- GitHub Check: Python CI
- GitHub Check: Analyze (python)
🔇 Additional comments (3)
kairon/events/definitions/crud_file_upload.py (1)
77-78: LGTM! Improved cleanup logic.The change to use
os.path.isfile(path)andos.remove(path)is more precise and safer than the previous approach. This ensures that only files are deleted in the cleanup, preventing accidental directory removal.tests/unit_test/data_processor/upload_handler_log_processor_test.py (1)
171-231: LGTM! Good test coverage for cleanup behavior.The new tests effectively verify that files are properly deleted in the
finallyblock ofCrudFileUploader.execute(), covering both exception scenarios:
test_execute_deletes_file_after_processing: Verifies deletion whenpreprocessfailstest_execute_deletes_file_in_finally: Verifies deletion whenFileImporterinstantiation failsThe tests appropriately use mocking to isolate the cleanup logic and use
tmp_pathfixture for safe temporary file creation.tests/unit_test/data_processor/data_processor_test.py (1)
9949-9954: LGTM - Test correctly validates empty folder handling.Same minor formatting issue with the method signature (missing space after comma). Consider applying the same fix as suggested above.
kairon/importer/file_importer.py
Outdated
| def preprocess(self): | ||
| file_path = os.path.join(self.path, self.file_received) | ||
|
|
||
| try: | ||
| df = pd.read_csv(file_path) | ||
| df = pd.read_csv(self.path) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🧩 Analysis chain
🏁 Script executed:
# First, let's look at the file structure and context around the mentioned lines
head -80 kairon/importer/file_importer.py | cat -nRepository: digiteinfotech/kairon
Length of output: 2348
🏁 Script executed:
# Search for all file_received usages in the repository
rg -n 'file_received' --type=py -B 2 -A 2Repository: digiteinfotech/kairon
Length of output: 21041
🏁 Script executed:
# Check specifically in file_importer.py for self.file_received
rg -n 'self\.file_received' kairon/importer/file_importer.pyRepository: digiteinfotech/kairon
Length of output: 110
Remove unused file_received parameter from FileImporter.__init__.
The file_received parameter is stored as self.file_received in __init__ (line 19) but is never used in any method. The preprocess() method reads directly from self.path and neither validate() nor import_data() reference it. This parameter should be removed from the constructor signature to eliminate dead code.
🤖 Prompt for AI Agents
In kairon/importer/file_importer.py around lines 19 to 28, the constructor
stores a file_received parameter that is never used; remove file_received from
the FileImporter.__init__ signature and delete the self.file_received
assignment, then update any instantiations elsewhere in the codebase (tests and
callers) to stop passing that argument; ensure no other methods reference
self.file_received and run tests to confirm nothing else relies on it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🧹 Nitpick comments (1)
kairon/events/definitions/crud_file_upload.py (1)
61-62: Clarify folder path construction to respect API semantics.Line 61 passes
self.botas theextension_patternparameter, but this parameter is documented for file extension patterns (e.g., ".tar.gz", ".zip"), not bot identifiers. While glob patterns can match directory names and the code functionally works, this violates the API's intended semantics and creates maintainability confusion.Instead of the two-step approach:
folder_path = Utility.get_latest_file('file_content_upload_records', self.bot) path = Utility.get_latest_file(folder_path)Consider constructing the path explicitly:
path = Utility.get_latest_file(os.path.join('file_content_upload_records', self.bot))This pattern is used correctly elsewhere in the codebase (e.g.,
data_importer.pyline 71). Note: The same issue appears incontent_importer.pyline 74.
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
kairon/events/definitions/crud_file_upload.pykairon/importer/file_importer.py
💤 Files with no reviewable changes (1)
- kairon/importer/file_importer.py
🧰 Additional context used
🧬 Code graph analysis (1)
kairon/events/definitions/crud_file_upload.py (4)
kairon/shared/utils.py (1)
get_latest_file(394-413)kairon/shared/upload_handler/upload_handler_log_processor.py (2)
UploadHandlerLogProcessor(10-120)add_log(16-69)kairon/shared/data/constant.py (1)
EVENT_STATUS(93-111)kairon/importer/file_importer.py (1)
FileImporter(10-62)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: Codacy Static Code Analysis
- GitHub Check: Python CI
- GitHub Check: Analyze (python)
🔇 Additional comments (2)
kairon/events/definitions/crud_file_upload.py (2)
1-1: LGTM!The
osimport is necessary for the new file existence check and removal logic in the finally block.
64-64: LGTM!Correctly passing
folder_pathtoFileImporter, which expects a directory path and will join it withfile_receivedinternally. This aligns with theFileImporterimplementation pattern.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🧹 Nitpick comments (1)
kairon/events/definitions/crud_file_upload.py (1)
77-81: Consider simplifying the cleanup logic.The finally block first removes the individual file (lines 78-79), then deletes the entire
folder_pathdirectory (lines 80-81). SinceUtility.delete_directory(folder_path, True)recursively removes the directory and all its contents, the individual file removal is redundant.🔎 Simplified cleanup approach
finally: - if path and os.path.isfile(path): - os.remove(path) if folder_path and os.path.isdir(folder_path): Utility.delete_directory(folder_path, True)This reduces filesystem operations while achieving the same cleanup result. The directory deletion encompasses all files within it.
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
kairon/events/definitions/crud_file_upload.py
🧰 Additional context used
🧬 Code graph analysis (1)
kairon/events/definitions/crud_file_upload.py (3)
kairon/shared/upload_handler/upload_handler_log_processor.py (1)
get_latest_event_file_name(110-114)kairon/shared/utils.py (2)
get_latest_file(394-413)delete_directory(500-509)kairon/importer/file_importer.py (1)
FileImporter(10-62)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: Codacy Static Code Analysis
- GitHub Check: Analyze (python)
- GitHub Check: Python CI
🔇 Additional comments (2)
kairon/events/definitions/crud_file_upload.py (2)
1-1: LGTM!The
osimport is correctly added to support file and directory existence checks in the finally block.
65-65: LGTM - Correct usage of FileImporter with directory path.The FileImporter is now correctly instantiated with
folder_path(a directory) rather than a file path. This aligns with the FileImporter implementation, which expects a directory path and internally joins it with the filename in thepreprocess()method.
| folder_path = Utility.get_latest_file('file_content_upload_records', self.bot) | ||
| path = Utility.get_latest_file(folder_path) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# First, find and examine the target file
echo "=== Examining target file ==="
cat -n kairon/events/definitions/crud_file_upload.py | sed -n '55,70p'Repository: digiteinfotech/kairon
Length of output: 1279
🏁 Script executed:
#!/bin/bash
# Find Utility.get_latest_file definition
echo "=== Finding Utility class and get_latest_file method ==="
fd -t f "Utility.py\|utility.py" | head -20Repository: digiteinfotech/kairon
Length of output: 121
🏁 Script executed:
#!/bin/bash
# Search for get_latest_file method definition and usages
echo "=== Searching for get_latest_file definition ==="
rg -n "def get_latest_file" --type py -A 10Repository: digiteinfotech/kairon
Length of output: 882
🏁 Script executed:
#!/bin/bash
# Find usages of get_latest_file with multiple arguments to understand patterns
echo "=== Checking get_latest_file usages ==="
rg -n "get_latest_file\(" --type py -B 2 -A 2 | head -60Repository: digiteinfotech/kairon
Length of output: 5677
🏁 Script executed:
#!/bin/bash
# Search for file_content_upload_records folder structure patterns
echo "=== Checking file_content_upload_records usage patterns ==="
rg -n "file_content_upload_records" --type py -B 2 -A 2Repository: digiteinfotech/kairon
Length of output: 2418
Fix incorrect folder path construction at line 62.
Line 62 passes self.bot as the second argument to Utility.get_latest_file, but the method signature is get_latest_file(folder, extension_pattern="*"). The second parameter expects a file extension pattern (e.g., "*.tar.gz"), not a bot identifier.
The folder structure throughout the codebase is consistently os.path.join('file_content_upload_records', bot) (see kairon/shared/data/processor.py:8999 and kairon/events/definitions/upload_handler.py:49). All existing usages of get_latest_file pass only the folder path argument; the two-argument pattern here is incorrect.
Recommended fix
- folder_path = Utility.get_latest_file('file_content_upload_records', self.bot)
+ folder_path = Utility.get_latest_file(f'file_content_upload_records/{self.bot}')
path = Utility.get_latest_file(folder_path)🤖 Prompt for AI Agents
In kairon/events/definitions/crud_file_upload.py around lines 62-63, the code
incorrectly calls Utility.get_latest_file('file_content_upload_records',
self.bot) — the second parameter should be an extension pattern, not the bot id.
Construct the folder path as os.path.join('file_content_upload_records',
self.bot) and then call Utility.get_latest_file(folder_path) (i.e., folder_path
= os.path.join('file_content_upload_records', self.bot); path =
Utility.get_latest_file(folder_path)); add an import os at the top if not
present.
Summary by CodeRabbit
Bug Fixes
Tests
✏️ Tip: You can customize this high-level summary in your review settings.