-
Notifications
You must be signed in to change notification settings - Fork 8.2k
feat: create load file component #11133
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
|
Important Review skippedAuto incremental reviews are disabled on this repository. Please check the settings in the CodeRabbit UI or the You can disable this status message by setting the WalkthroughThe PR introduces a new LoadFileComponent for loading files from multiple storage backends (local, AWS S3, Google Drive) and removes the Local storage option from several starter project configurations, streamlining available storage choices to cloud providers. Changes
Sequence DiagramsequenceDiagram
participant User as User/System
participant LFC as LoadFileComponent
participant StorageMux as Storage Router
participant Local as Local FS
participant AWS as AWS S3
participant GDrive as Google Drive
participant Result as Result Handler
User->>LFC: Load file (with storage_location)
LFC->>LFC: Validate & resolve paths
LFC->>StorageMux: Route by storage type
alt Local Storage
StorageMux->>Local: Read file from path
Local-->>Result: File content / BaseFile
else AWS S3
StorageMux->>AWS: Validate credentials
AWS->>AWS: Download to temp file
AWS-->>Result: BaseFile (temp path, delete-flag)
else Google Drive
StorageMux->>GDrive: Validate credentials & file_id
GDrive->>GDrive: Download to temp file
GDrive-->>Result: BaseFile (temp path, delete-flag)
end
Result->>LFC: Process files (add Data/paths)
LFC->>LFC: Compose Message with file paths
LFC-->>User: Return Message (files field)
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Pre-merge checks and finishing touchesImportant Pre-merge checks failedPlease resolve all errors before merging. Addressing warnings is optional. ❌ Failed checks (4 inconclusive)
✅ Passed checks (3 passed)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## main #11133 +/- ##
=======================================
Coverage 33.24% 33.24%
=======================================
Files 1394 1394
Lines 65986 65986
Branches 9770 9770
=======================================
Hits 21934 21934
Misses 42925 42925
Partials 1127 1127
Flags with carried forward coverage won't be shown. Click here to find out more. 🚀 New features to boost your workflow:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/backend/base/langflow/initial_setup/starter_projects/Portfolio Website Code Generator.json (1)
876-1555: Unresolved merge conflict in flow definition; JSON is invalid and requires immediate resolution.This starter project contains unresolved git conflict markers that render the file unparseable:
- Conflict markers present:
<<<<<<< Updated upstream,=======,>>>>>>> Stashed changes.- Two conflicting File node definitions:
File-kiTPL(upstream) andFile-gVVqw(downstream) with overlapping purposes and different template configurations.- The file cannot be loaded as valid JSON until the conflict is resolved.
The two File node variants also differ in
storage_location.options—one includes cloud-only options (AWS, Google Drive) while the context suggests the other may differ. This inconsistency must be aligned when resolving the conflict.Action required:
- Resolve the merge conflict by selecting one File node implementation and removing all conflict markers.
- Delete the unused File node to eliminate duplication.
- Verify the chosen implementation's configuration aligns with intended deployment context.
- Ensure all edges reference the correct, single File node ID after resolution.
- Validate the resulting JSON is well-formed before committing.
🧹 Nitpick comments (6)
src/lfx/src/lfx/components/files_and_knowledge/__init__.py (1)
7-23: Wire-up ofLoadFileComponentlooks correct; consider alphabetizing imports/mapping.The TYPE_CHECKING imports and
_dynamic_importsentry forLoadFileComponentare correct and consistent withload_file.py. To align with the repo convention of alphabetically sorted imports/exports in these component__init__modules, you may want to moveLoadFileComponentafterKnowledgeRetrievalComponentin the TYPE_CHECKING block and_dynamic_importsdict (the__all__list is already sorted).
Based on learnings, imports in these files are expected to be alphabetically sorted.src/lfx/src/lfx/components/files_and_knowledge/load_file.py (3)
1-32: Minor: adjust module docstring to match behavior (paths only, not bytes).The top-level docstring mentions returning “raw file paths/bytes”, but this component only ever exposes paths (bytes are only used internally when downloading from AWS/GDrive). Consider tightening the wording to “raw file paths” to avoid confusion.
26-31: Duplicate storage-location helper vsFileComponent; consider centralizing.
_get_storage_location_optionsand thestorage_locationinput definition are effectively the same as inFileComponent. If both components are meant to stay in sync (e.g., when adding/removing backends or Astra-specific behavior), extracting this into a shared helper or constant would reduce drift between components.Also applies to: 120-206
376-414: Clarify S3 behavior when environment storage is S3 vs. explicit AWS/Drive selection.
load_files_rawusessettings.storage_type == "s3"to decide to returnfile.pathwithout an existence check (treating it as a virtual key), while Local storage filters onfile.path.exists(). For files coming from_read_from_aws_s3/_read_from_google_drive,file.pathis always a local temp path.If this component is ever used in an environment where
settings.storage_type == "s3"andstorage_locationis"AWS"or"Google Drive", those temp paths would be treated as S3 keys rather than local paths, which may confuse downstream components that rely on the environment storage abstraction.Consider guarding the S3 branch with the selected storage location, e.g. only treating paths as virtual keys when
storage_locationisLocal/ default, and always using the “local-path” branch for explicit AWS/GDrive locations.Example tweak (illustrative)
- settings = get_settings_service().settings - - # Collect file paths - for S3 storage, use virtual storage keys - # For local storage, use actual file paths - if settings.storage_type == "s3": - file_paths = [file.path.as_posix() for file in files] - else: - file_paths = [file.path.as_posix() for file in files if file.path.exists()] + settings = get_settings_service().settings + storage_location = self._get_selected_storage_location() + + # For env-level S3 storage, only treat paths as virtual keys when using the + # platform storage (Local/default). Explicit AWS/Google Drive selections + # should keep using the local temp paths created by this component. + if settings.storage_type == "s3" and storage_location in ("Local", "", None): + file_paths = [file.path.as_posix() for file in files] + else: + file_paths = [file.path.as_posix() for file in files if file.path.exists()]src/backend/base/langflow/initial_setup/starter_projects/Vector Store RAG.json (2)
2639-3308: Two different File nodes embed divergent copies ofFileComponentcodeThe graph defines two File nodes:
File-7D3hP(displayed as “File”, modulelfx.components.files_and_knowledge.file.FileComponent, code hash5008cc086d7f)File-CxEQk(displayed as “Read File”, same module, older code hash9d57e0bfda44)Each node inlines a large, slightly different
FileComponentimplementation in itstemplate.code.value. OnlyFile-CxEQkis actually wired into the flow (File-CxEQk → SplitText-1fBfh), whileFile-7D3hPappears unused.This duplication is risky:
- Behaviour for the same module (
FileComponent) can diverge between nodes in the same template.- Future changes to the backend File component are likely to get out of sync with one or both embedded copies.
- The unused
File-7D3hPnode adds noise and may confuse users browsing the starter.Consider simplifying:
- Keep a single File node in this flow (probably the new behaviour you intend) and remove the unused one.
- Ensure that node’s template matches the canonical backend
FileComponent(fields, defaults, andstorage_locationoptions).- Update edges to point at the chosen File node.
This will make the starter project easier to maintain and less surprising for users.
Also applies to: 5438-6157
2816-3290:_get_selected_storage_locationsilently defaulting to"Local"may bypass the “no Local in cloud” intentInside the embedded
FileComponentcode (both copies),_get_selected_storage_locationreturns"Local"wheneverself.storage_locationis unset/falsy:def _get_selected_storage_location(self) -> str: if hasattr(self, "storage_location") and self.storage_location: ... return "Local" # Default to Local if not specifiedAt the same time:
_get_storage_location_options()omits Local whenis_astra_cloud_environment()is true.update_build_configupdatesstorage_location.optionsdynamically from_get_storage_location_options.This means in Astra cloud environments:
- The UI hides the Local option, but if the user never explicitly chooses a storage location, the runtime still treats it as
"Local"via this default.- That can be surprising and may produce confusing errors when no local files are actually available.
It may be safer to:
- Return
""orNonewhen nothing is selected, and:
- Either raise a clear error (“Storage Location is required”) in
_validate_and_resolve_paths, or- Default to a cloud backend consistent with how the template is positioned.
If you want to keep defaulting to Local for self‑hosted installs, you could gate that default on
not is_astra_cloud_environment().Also applies to: 5630-6023
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
src/frontend/package-lock.jsonis excluded by!**/package-lock.json
📒 Files selected for processing (7)
src/backend/base/langflow/initial_setup/starter_projects/Document Q&A.jsonsrc/backend/base/langflow/initial_setup/starter_projects/News Aggregator.jsonsrc/backend/base/langflow/initial_setup/starter_projects/Portfolio Website Code Generator.jsonsrc/backend/base/langflow/initial_setup/starter_projects/Text Sentiment Analysis.jsonsrc/backend/base/langflow/initial_setup/starter_projects/Vector Store RAG.jsonsrc/lfx/src/lfx/components/files_and_knowledge/__init__.pysrc/lfx/src/lfx/components/files_and_knowledge/load_file.py
💤 Files with no reviewable changes (3)
- src/backend/base/langflow/initial_setup/starter_projects/Document Q&A.json
- src/backend/base/langflow/initial_setup/starter_projects/Text Sentiment Analysis.json
- src/backend/base/langflow/initial_setup/starter_projects/News Aggregator.json
🧰 Additional context used
🧠 Learnings (4)
📚 Learning: 2025-11-24T19:46:09.104Z
Learnt from: CR
Repo: langflow-ai/langflow PR: 0
File: .cursor/rules/backend_development.mdc:0-0
Timestamp: 2025-11-24T19:46:09.104Z
Learning: Applies to src/backend/base/langflow/components/**/*.py : Add new components to the appropriate subdirectory under `src/backend/base/langflow/components/` (agents/, data/, embeddings/, input_output/, models/, processing/, prompts/, tools/, or vectorstores/)
Applied to files:
src/lfx/src/lfx/components/files_and_knowledge/load_file.pysrc/lfx/src/lfx/components/files_and_knowledge/__init__.pysrc/backend/base/langflow/initial_setup/starter_projects/Vector Store RAG.json
📚 Learning: 2025-11-24T19:47:28.997Z
Learnt from: CR
Repo: langflow-ai/langflow PR: 0
File: .cursor/rules/testing.mdc:0-0
Timestamp: 2025-11-24T19:47:28.997Z
Learning: Applies to src/backend/tests/**/*.py : Test component versioning and backward compatibility using `file_names_mapping` fixture with `VersionComponentMapping` objects mapping component files across Langflow versions
Applied to files:
src/lfx/src/lfx/components/files_and_knowledge/load_file.py
📚 Learning: 2025-11-24T19:46:09.104Z
Learnt from: CR
Repo: langflow-ai/langflow PR: 0
File: .cursor/rules/backend_development.mdc:0-0
Timestamp: 2025-11-24T19:46:09.104Z
Learning: Applies to src/backend/base/langflow/components/**/__init__.py : Update `__init__.py` with alphabetically sorted imports when adding new components
Applied to files:
src/lfx/src/lfx/components/files_and_knowledge/__init__.py
📚 Learning: 2025-11-24T19:46:09.104Z
Learnt from: CR
Repo: langflow-ai/langflow PR: 0
File: .cursor/rules/backend_development.mdc:0-0
Timestamp: 2025-11-24T19:46:09.104Z
Learning: Backend components should be structured with clear separation of concerns: agents, data processing, embeddings, input/output, models, text processing, prompts, tools, and vector stores
Applied to files:
src/backend/base/langflow/initial_setup/starter_projects/Vector Store RAG.json
🧬 Code graph analysis (2)
src/lfx/src/lfx/components/files_and_knowledge/load_file.py (6)
src/lfx/src/lfx/base/data/base_file.py (2)
BaseFileComponent(28-820)BaseFile(38-107)src/lfx/src/lfx/schema/message.py (1)
Message(34-315)src/lfx/src/lfx/utils/validate_cloud.py (1)
is_astra_cloud_environment(10-19)src/lfx/src/lfx/custom/custom_component/component.py (1)
get_base_inputs(183-186)src/lfx/src/lfx/schema/data.py (1)
Data(26-288)src/lfx/src/lfx/base/data/cloud_storage_utils.py (3)
create_s3_client(33-59)validate_aws_credentials(13-30)create_google_drive_service(127-156)
src/lfx/src/lfx/components/files_and_knowledge/__init__.py (1)
src/lfx/src/lfx/components/files_and_knowledge/load_file.py (1)
LoadFileComponent(34-488)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: Test Docker Images / Test docker images
- GitHub Check: Test Starter Templates
- GitHub Check: test-starter-projects
🔇 Additional comments (4)
src/lfx/src/lfx/components/files_and_knowledge/load_file.py (3)
222-259: Cloud path resolution and temp-file handling look consistent with existing patterns.The overrides for
_validate_and_resolve_paths,_read_from_aws_s3, and_read_from_google_drivecorrectly:
- Prioritize cloud backends over
file_path_strand the default FileInput.- Validate credentials and required keys/IDs.
- Download to temp files and wrap them in
BaseFileinstances withdelete_after_processing=True.- Clean up temp files on error with
contextlib.suppress.This matches the conventions used in other file components and looks solid.
Also applies to: 260-295, 297-353
355-374:process_filesaligns with “no-op” semantics and Data layout expectations.Overriding
process_filesto only ensure eachBaseFilehas aDataobject withSERVER_FILE_PATH_FIELDNAME(and not parsing content) is consistent with the LoadFile design and should keep downstream helpers that expect that key happy.
416-488: Build-config toggling for storage backends is clear and consistent with other components.
update_build_config:
- Refreshes
storage_location.optionsvia_get_storage_location_options()(respecting Astra cloud rules).- Hides all storage-specific fields by default, then selectively shows AWS or Google Drive fields and hides the
pathFileInput when a cloud backend is chosen.- Falls back to showing
pathwhen nothing is selected.This matches the UI behavior of the other multi-storage file components and should make the UX predictable.
src/backend/base/langflow/initial_setup/starter_projects/Vector Store RAG.json (1)
2639-3315: Unresolved merge conflict markers make this JSON invalidThis section still contains Git conflict markers (
<<<<<<< Updated upstream,=======,>>>>>>> Stashed changes) around the node data for what looks like the newFile-7D3hPnode and aLanguageModelComponentnode. That means:
- The JSON is syntactically invalid and the starter project cannot be loaded.
- The
data.id/nodekeys are effectively duplicated/ambiguous in this object.You need to resolve the merge by hand:
- Remove all
<<<<<<<,=======,>>>>>>>lines.- Split this into clean, separate node objects (e.g., one for the new File component, one for the Language Model) with consistent
idvalues.- Ensure edges that currently point to
LanguageModelComponent-rL7g3(and any new File node) reference the correct node IDs.Until this is fixed the template will fail to parse and won’t be usable.
⛔ Skipped due to learnings
Learnt from: edwinjosechittilappilly Repo: langflow-ai/langflow PR: 8504 File: src/backend/base/langflow/initial_setup/starter_projects/Image Sentiment Analysis.json:391-393 Timestamp: 2025-06-12T15:25:01.072Z Learning: The repository owner prefers CodeRabbit not to review or comment on JSON files because they are autogenerated.
Summary by CodeRabbit
New Features
Chores
✏️ Tip: You can customize this high-level summary in your review settings.