
@HimavarshaVS
Collaborator

@HimavarshaVS HimavarshaVS commented Dec 22, 2025

Summary by CodeRabbit

  • New Features

    • Added a new file loading component with enhanced multi-storage support, enabling AWS S3 and Google Drive integration for downstream processing.
  • Chores

    • Removed Local storage option from multiple starter project configurations (Document Q&A, News Aggregator, Portfolio Website, Text Sentiment Analysis, Vector Store RAG); AWS and Google Drive storage options remain available.


@HimavarshaVS HimavarshaVS changed the title create load file component feat: create load file component Dec 22, 2025
@coderabbitai
Contributor

coderabbitai bot commented Dec 22, 2025

Important

Review skipped

Auto incremental reviews are disabled on this repository.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.


Walkthrough

The PR introduces a new LoadFileComponent for loading files from multiple storage backends (local, AWS S3, Google Drive) and removes the Local storage option from several starter project configurations, streamlining available storage choices to cloud providers.

Changes

Cohort / File(s) Summary
Starter Projects: Storage Option Removals
src/backend/base/langflow/initial_setup/starter_projects/Document Q&A.json, News Aggregator.json, Text Sentiment Analysis.json
Removed Local storage option (hard-drive icon) from storage_location dropdown menus; AWS and Google Drive remain available.
Starter Projects: File Component Integration
src/backend/base/langflow/initial_setup/starter_projects/Portfolio Website Code Generator.json
Contains unresolved merge conflict markers; introduces File component node (File-gVVqw) with cloud-aware file handling, Docling-based document processing, and storage credential management; partially merges with existing LanguageModelComponent.
Starter Projects: File Component Replacement
src/backend/base/langflow/initial_setup/starter_projects/Vector Store RAG.json
Replaced root LanguageModelComponent node (LanguageModelComponent-rL7g3) with new File component (File-7D3hP); expanded node definition includes comprehensive file upload/read logic, multi-storage support (AWS S3, Google Drive, local paths), Docling processing pipeline, and extensive configuration inputs.
Module Export Registration
src/lfx/src/lfx/components/files_and_knowledge/__init__.py
Added LoadFileComponent to public exports via the TYPE_CHECKING import, the dynamic imports mapping, and the __all__ list.
New File Loading Component
src/lfx/src/lfx/components/files_and_knowledge/load_file.py
Introduced LoadFileComponent class supporting file loading from local, AWS S3, and Google Drive sources; includes dynamic storage location filtering by Astra cloud environment, credential validation, temporary file handling with post-processing cleanup, and path resolution logic for tool-mode usage.

Sequence Diagram

sequenceDiagram
    participant User as User/System
    participant LFC as LoadFileComponent
    participant StorageMux as Storage Router
    participant Local as Local FS
    participant AWS as AWS S3
    participant GDrive as Google Drive
    participant Result as Result Handler

    User->>LFC: Load file (with storage_location)
    LFC->>LFC: Validate & resolve paths
    LFC->>StorageMux: Route by storage type

    alt Local Storage
        StorageMux->>Local: Read file from path
        Local-->>Result: File content / BaseFile
    else AWS S3
        StorageMux->>AWS: Validate credentials
        AWS->>AWS: Download to temp file
        AWS-->>Result: BaseFile (temp path, delete-flag)
    else Google Drive
        StorageMux->>GDrive: Validate credentials & file_id
        GDrive->>GDrive: Download to temp file
        GDrive-->>Result: BaseFile (temp path, delete-flag)
    end

    Result->>LFC: Process files (add Data/paths)
    LFC->>LFC: Compose Message with file paths
    LFC-->>User: Return Message (files field)
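The "Route by storage type" step in the diagram can be pictured as a dispatch table keyed by the selected storage location. This is an illustrative sketch only: `route_load`, `load_local`, and the `(path, delete_flag)` tuple are hypothetical names, not the component's API. Cloud handlers, which in the component download to a temp file and set the delete flag, would be registered in the same table.

```python
from pathlib import Path
from typing import Callable, Dict, Tuple

# Hypothetical record for a resolved file: (local path, delete_after_processing flag).
Loaded = Tuple[Path, bool]


def load_local(source: str) -> Loaded:
    """Resolve a local path; local files are never deleted after processing."""
    path = Path(source)
    if not path.exists():
        raise FileNotFoundError(source)
    return path, False


def route_load(storage_location: str, source: str,
               handlers: Dict[str, Callable[[str], Loaded]]) -> Loaded:
    """Dispatch to the handler registered for the selected storage location."""
    handler = handlers.get(storage_location)
    if handler is None:
        raise ValueError(f"Unsupported storage location: {storage_location!r}")
    return handler(source)
```

A cloud handler would share the `(str) -> (Path, bool)` shape and return `True` for the flag, so the temp file is removed once the result handler has finished with it.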

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Pre-merge checks and finishing touches

Important

Pre-merge checks failed

Please resolve all errors before merging. Addressing warnings is optional.

❌ Failed checks (4 inconclusive)
| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Test Coverage For New Implementations | ❓ Inconclusive | Unable to locate specific test files for LoadFileComponent implementation in the provided context. | Please provide the PR details, file structure, or test file locations to verify test coverage adequacy. |
| Test Quality And Coverage | ❓ Inconclusive | Unable to assess pull request test quality and coverage without access to repository structure, implementation details, or test files. | Please provide the repository contents, pull request details, or specific files to analyze for test coverage and quality assessment. |
| Test File Naming And Structure | ❓ Inconclusive | Unable to examine pull request without access to specific repository or pull request details. | Provide pull request URL, repository name, or specific file paths to verify test naming patterns and structure. |
| Excessive Mock Usage Warning | ❓ Inconclusive | PR introduces LoadFileComponent without a corresponding test file, making it impossible to assess test quality and mock usage patterns. | Add a test file for LoadFileComponent following existing patterns, using real objects for core logic and mocks only for external dependencies like AWS S3 and Google Drive APIs. |
✅ Passed checks (3 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped: CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title 'feat: create load file component' accurately describes the main change: introducing a new LoadFileComponent for file handling across storage backends. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00%, which is sufficient. The required threshold is 80.00%. |
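To make the "mocks only for external dependencies" resolution concrete, here is a minimal sketch that keeps the core logic real and mocks only the S3 client. `fetch_s3_bytes` is a hypothetical helper, not part of LoadFileComponent; the mocked call mirrors boto3's `get_object(Bucket=..., Key=...)`, whose response dict carries a readable `Body`.

```python
from unittest.mock import MagicMock


def fetch_s3_bytes(client, bucket: str, key: str) -> bytes:
    """Hypothetical helper: download one object's bytes via a boto3-style client."""
    if not bucket or not key:
        raise ValueError("bucket and key are required")
    response = client.get_object(Bucket=bucket, Key=key)
    return response["Body"].read()


def test_fetch_s3_bytes_uses_client_only_at_the_boundary():
    client = MagicMock()  # only the external dependency is mocked
    client.get_object.return_value = {"Body": MagicMock(read=MagicMock(return_value=b"hello"))}

    assert fetch_s3_bytes(client, "my-bucket", "docs/a.txt") == b"hello"
    client.get_object.assert_called_once_with(Bucket="my-bucket", Key="docs/a.txt")


def test_fetch_s3_bytes_rejects_missing_key():
    client = MagicMock()
    try:
        fetch_s3_bytes(client, "my-bucket", "")
    except ValueError:
        pass
    else:
        raise AssertionError("expected ValueError")
    client.get_object.assert_not_called()  # validation runs before any network call
```

The validation path needs no mock behaviour at all; the mock only stands in for the network boundary, which is the pattern the check recommends.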


@github-actions
Contributor

Frontend Unit Test Coverage Report

Coverage Summary

| Lines | Statements | Branches | Functions |
| --- | --- | --- | --- |
| 17% | 16.67% (4703/28209) | 10% (2183/21809) | 10.94% (678/6192) |

Unit Test Results

| Tests | Skipped | Failures | Errors | Time |
| --- | --- | --- | --- | --- |
| 1829 | 0 💤 | 0 ❌ | 0 🔥 | 25.051s ⏱️ |

@codecov

codecov bot commented Dec 22, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 33.24%. Comparing base (87f5721) to head (b2d29e0).

Additional details and impacted files


@@           Coverage Diff           @@
##             main   #11133   +/-   ##
=======================================
  Coverage   33.24%   33.24%           
=======================================
  Files        1394     1394           
  Lines       65986    65986           
  Branches     9770     9770           
=======================================
  Hits        21934    21934           
  Misses      42925    42925           
  Partials     1127     1127           
| Flag | Coverage Δ |
| --- | --- |
| frontend | 15.36% <ø> (ø) |



@github-actions github-actions bot added the enhancement New feature or request label Dec 22, 2025
@jordanrfrazier jordanrfrazier marked this pull request as draft December 22, 2025 22:28
Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/backend/base/langflow/initial_setup/starter_projects/Portfolio Website Code Generator.json (1)

876-1555: Unresolved merge conflict in flow definition; JSON is invalid and requires immediate resolution.

This starter project contains unresolved git conflict markers that render the file unparseable:

  • Conflict markers present: <<<<<<< Updated upstream, =======, >>>>>>> Stashed changes.
  • Two conflicting File node definitions: File-kiTPL (updated upstream) and File-gVVqw (stashed changes), with overlapping purposes and different template configurations.
  • The file cannot be loaded as valid JSON until the conflict is resolved.

The two File node variants may also differ in storage_location.options: one lists only the cloud options (AWS, Google Drive), and the other may not. Align these options when resolving the conflict.

Action required:

  1. Resolve the merge conflict by selecting one File node implementation and removing all conflict markers.
  2. Delete the unused File node to eliminate duplication.
  3. Verify the chosen implementation's configuration aligns with intended deployment context.
  4. Ensure all edges reference the correct, single File node ID after resolution.
  5. Validate the resulting JSON is well-formed before committing.
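Steps 1 and 5 can be checked mechanically before committing. The sketch below is a hypothetical pre-commit helper, not an existing Langflow script: it reports Git conflict markers (which appear at column 0) and then tries to parse the text as JSON.

```python
import json
import re

# Git conflict markers start at column 0; "=======" is the bare separator line.
CONFLICT_MARKER = re.compile(r"^(<<<<<<< |=======$|>>>>>>> )", re.MULTILINE)


def check_starter_project(text: str) -> list[str]:
    """Return a list of problems found in a starter-project JSON document."""
    problems = []
    for match in CONFLICT_MARKER.finditer(text):
        line_no = text.count("\n", 0, match.start()) + 1
        problems.append(f"conflict marker on line {line_no}")
    try:
        json.loads(text)
    except json.JSONDecodeError as exc:
        problems.append(f"invalid JSON: {exc.msg} (line {exc.lineno})")
    return problems
```

Running it over each file in starter_projects/ and failing on any non-empty result would have caught this conflict before review. Embedded component code is stored as an escaped JSON string, so it cannot produce false column-0 matches.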
🧹 Nitpick comments (6)
src/lfx/src/lfx/components/files_and_knowledge/__init__.py (1)

7-23: Wire-up of LoadFileComponent looks correct; consider alphabetizing imports/mapping.

The TYPE_CHECKING imports and _dynamic_imports entry for LoadFileComponent are correct and consistent with load_file.py. To align with the repo convention of alphabetically sorted imports/exports in these component __init__ modules, you may want to move LoadFileComponent after KnowledgeRetrievalComponent in the TYPE_CHECKING block and _dynamic_imports dict (the __all__ list is already sorted).
Based on learnings, imports in these files are expected to be alphabetically sorted.

src/lfx/src/lfx/components/files_and_knowledge/load_file.py (3)

1-32: Minor: adjust module docstring to match behavior (paths only, not bytes).

The top-level docstring mentions returning “raw file paths/bytes”, but this component only ever exposes paths (bytes are only used internally when downloading from AWS/GDrive). Consider tightening the wording to “raw file paths” to avoid confusion.


26-31: Duplicate storage-location helper vs FileComponent; consider centralizing.

_get_storage_location_options and the storage_location input definition are effectively the same as in FileComponent. If both components are meant to stay in sync (e.g., when adding/removing backends or Astra-specific behavior), extracting this into a shared helper or constant would reduce drift between components.

Also applies to: 120-206


376-414: Clarify S3 behavior when environment storage is S3 vs. explicit AWS/Drive selection.

load_files_raw uses settings.storage_type == "s3" to decide to return file.path without an existence check (treating it as a virtual key), while Local storage filters on file.path.exists(). For files coming from _read_from_aws_s3 / _read_from_google_drive, file.path is always a local temp path.

If this component is ever used in an environment where settings.storage_type == "s3" and storage_location is "AWS" or "Google Drive", those temp paths would be treated as S3 keys rather than local paths, which may confuse downstream components that rely on the environment storage abstraction.

Consider guarding the S3 branch with the selected storage location, e.g. only treating paths as virtual keys when storage_location is Local / default, and always using the “local-path” branch for explicit AWS/GDrive locations.

Example tweak (illustrative)
-        settings = get_settings_service().settings
-
-        # Collect file paths - for S3 storage, use virtual storage keys
-        # For local storage, use actual file paths
-        if settings.storage_type == "s3":
-            file_paths = [file.path.as_posix() for file in files]
-        else:
-            file_paths = [file.path.as_posix() for file in files if file.path.exists()]
+        settings = get_settings_service().settings
+        storage_location = self._get_selected_storage_location()
+
+        # For env-level S3 storage, only treat paths as virtual keys when using the
+        # platform storage (Local/default). Explicit AWS/Google Drive selections
+        # should keep using the local temp paths created by this component.
+        if settings.storage_type == "s3" and storage_location in ("Local", "", None):
+            file_paths = [file.path.as_posix() for file in files]
+        else:
+            file_paths = [file.path.as_posix() for file in files if file.path.exists()]
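The guarded branch from the suggested diff can be isolated into a pure function and tested without a settings service. `collect_file_paths` and `PLATFORM_LOCATIONS` are hypothetical names; the logic follows the diff: virtual S3 keys pass through unchecked only for platform storage, while explicit AWS or Google Drive selections keep their local temp paths and must exist on disk.

```python
from pathlib import Path
from typing import Optional

# Selections that mean "use the platform's own storage" (per the suggested diff).
PLATFORM_LOCATIONS = ("Local", "", None)


def collect_file_paths(paths: list[Path], storage_type: str,
                       storage_location: Optional[str]) -> list[str]:
    """Return POSIX path strings, treating them as S3 keys only for platform storage."""
    if storage_type == "s3" and storage_location in PLATFORM_LOCATIONS:
        return [p.as_posix() for p in paths]  # virtual keys: no filesystem check
    return [p.as_posix() for p in paths if p.exists()]  # temp or local files must exist
```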
src/backend/base/langflow/initial_setup/starter_projects/Vector Store RAG.json (2)

2639-3308: Two different File nodes embed divergent copies of FileComponent code

The graph defines two File nodes:

  • File-7D3hP (displayed as “File”, module lfx.components.files_and_knowledge.file.FileComponent, code hash 5008cc086d7f)
  • File-CxEQk (displayed as “Read File”, same module, older code hash 9d57e0bfda44)

Each node inlines a large, slightly different FileComponent implementation in its template.code.value. Only File-CxEQk is actually wired into the flow (File-CxEQk → SplitText-1fBfh), while File-7D3hP appears unused.

This duplication is risky:

  • Behaviour for the same module (FileComponent) can diverge between nodes in the same template.
  • Future changes to the backend File component are likely to get out of sync with one or both embedded copies.
  • The unused File-7D3hP node adds noise and may confuse users browsing the starter.

Consider simplifying:

  • Keep a single File node in this flow (probably the new behaviour you intend) and remove the unused one.
  • Ensure that node’s template matches the canonical backend FileComponent (fields, defaults, and storage_location options).
  • Update edges to point at the chosen File node.

This will make the starter project easier to maintain and less surprising for users.

Also applies to: 5438-6157
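To confirm which File node is actually wired in before deleting the other, a small audit over the flow JSON helps. This sketch assumes the flow layout `{"data": {"nodes": [...], "edges": [...]}}`, with each node carrying `id` and `data.type` and each edge carrying `source`/`target`; treat that schema as an assumption and check it against the real file.

```python
from collections import defaultdict


def audit_nodes(flow: dict, node_type: str) -> dict[str, list[str]]:
    """Split nodes of one type into "wired" and "unwired" based on edge endpoints."""
    graph = flow["data"]
    connected = {e["source"] for e in graph["edges"]} | {e["target"] for e in graph["edges"]}
    result: dict[str, list[str]] = defaultdict(list)
    for node in graph["nodes"]:
        if node.get("data", {}).get("type") != node_type:
            continue
        bucket = "wired" if node["id"] in connected else "unwired"
        result[bucket].append(node["id"])
    return dict(result)
```

Loading Vector Store RAG.json with `json.load` and calling `audit_nodes(flow, "File")` should list File-CxEQk as wired and File-7D3hP as unwired, if the review's reading of the edges is correct.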


2816-3290: _get_selected_storage_location silently defaulting to "Local" may bypass the “no Local in cloud” intent

Inside the embedded FileComponent code (both copies), _get_selected_storage_location returns "Local" whenever self.storage_location is unset/falsy:

def _get_selected_storage_location(self) -> str:
    if hasattr(self, "storage_location") and self.storage_location:
        ...
    return "Local"  # Default to Local if not specified

At the same time:

  • _get_storage_location_options() omits Local when is_astra_cloud_environment() is true.
  • update_build_config updates storage_location.options dynamically from _get_storage_location_options.

This means in Astra cloud environments:

  • The UI hides the Local option, but if the user never explicitly chooses a storage location, the runtime still treats it as "Local" via this default.
  • That can be surprising and may produce confusing errors when no local files are actually available.

It may be safer to:

  • Return "" or None when nothing is selected, and:
    • Either raise a clear error (“Storage Location is required”) in _validate_and_resolve_paths, or
    • Default to a cloud backend consistent with how the template is positioned.

If you want to keep defaulting to Local for self‑hosted installs, you could gate that default on not is_astra_cloud_environment().

Also applies to: 5630-6023
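The gated default suggested above might look like the following. `resolve_storage_location` is a hypothetical standalone helper; the cloud check is passed in as a boolean so the component could call it with `is_astra_cloud_environment()`. Self-hosted installs keep the historical "Local" default, while cloud environments fail loudly instead of silently falling back.

```python
from typing import Optional


def resolve_storage_location(selected: Optional[str], is_cloud: bool) -> str:
    """Return the effective storage location, defaulting to Local only when self-hosted."""
    if selected:
        return selected
    if is_cloud:
        # Local is hidden in the cloud UI, so an empty selection is a user error.
        raise ValueError("Storage Location is required")
    return "Local"
```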

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 87f5721 and b2d29e0.

⛔ Files ignored due to path filters (1)
  • src/frontend/package-lock.json is excluded by !**/package-lock.json
📒 Files selected for processing (7)
  • src/backend/base/langflow/initial_setup/starter_projects/Document Q&A.json
  • src/backend/base/langflow/initial_setup/starter_projects/News Aggregator.json
  • src/backend/base/langflow/initial_setup/starter_projects/Portfolio Website Code Generator.json
  • src/backend/base/langflow/initial_setup/starter_projects/Text Sentiment Analysis.json
  • src/backend/base/langflow/initial_setup/starter_projects/Vector Store RAG.json
  • src/lfx/src/lfx/components/files_and_knowledge/__init__.py
  • src/lfx/src/lfx/components/files_and_knowledge/load_file.py
💤 Files with no reviewable changes (3)
  • src/backend/base/langflow/initial_setup/starter_projects/Document Q&A.json
  • src/backend/base/langflow/initial_setup/starter_projects/Text Sentiment Analysis.json
  • src/backend/base/langflow/initial_setup/starter_projects/News Aggregator.json
🧰 Additional context used
🧠 Learnings (4)
📚 Learning: 2025-11-24T19:46:09.104Z
Learnt from: CR
Repo: langflow-ai/langflow PR: 0
File: .cursor/rules/backend_development.mdc:0-0
Timestamp: 2025-11-24T19:46:09.104Z
Learning: Applies to src/backend/base/langflow/components/**/*.py : Add new components to the appropriate subdirectory under `src/backend/base/langflow/components/` (agents/, data/, embeddings/, input_output/, models/, processing/, prompts/, tools/, or vectorstores/)

Applied to files:

  • src/lfx/src/lfx/components/files_and_knowledge/load_file.py
  • src/lfx/src/lfx/components/files_and_knowledge/__init__.py
  • src/backend/base/langflow/initial_setup/starter_projects/Vector Store RAG.json
📚 Learning: 2025-11-24T19:47:28.997Z
Learnt from: CR
Repo: langflow-ai/langflow PR: 0
File: .cursor/rules/testing.mdc:0-0
Timestamp: 2025-11-24T19:47:28.997Z
Learning: Applies to src/backend/tests/**/*.py : Test component versioning and backward compatibility using `file_names_mapping` fixture with `VersionComponentMapping` objects mapping component files across Langflow versions

Applied to files:

  • src/lfx/src/lfx/components/files_and_knowledge/load_file.py
📚 Learning: 2025-11-24T19:46:09.104Z
Learnt from: CR
Repo: langflow-ai/langflow PR: 0
File: .cursor/rules/backend_development.mdc:0-0
Timestamp: 2025-11-24T19:46:09.104Z
Learning: Applies to src/backend/base/langflow/components/**/__init__.py : Update `__init__.py` with alphabetically sorted imports when adding new components

Applied to files:

  • src/lfx/src/lfx/components/files_and_knowledge/__init__.py
📚 Learning: 2025-11-24T19:46:09.104Z
Learnt from: CR
Repo: langflow-ai/langflow PR: 0
File: .cursor/rules/backend_development.mdc:0-0
Timestamp: 2025-11-24T19:46:09.104Z
Learning: Backend components should be structured with clear separation of concerns: agents, data processing, embeddings, input/output, models, text processing, prompts, tools, and vector stores

Applied to files:

  • src/backend/base/langflow/initial_setup/starter_projects/Vector Store RAG.json
🧬 Code graph analysis (2)
src/lfx/src/lfx/components/files_and_knowledge/load_file.py (6)
src/lfx/src/lfx/base/data/base_file.py (2)
  • BaseFileComponent (28-820)
  • BaseFile (38-107)
src/lfx/src/lfx/schema/message.py (1)
  • Message (34-315)
src/lfx/src/lfx/utils/validate_cloud.py (1)
  • is_astra_cloud_environment (10-19)
src/lfx/src/lfx/custom/custom_component/component.py (1)
  • get_base_inputs (183-186)
src/lfx/src/lfx/schema/data.py (1)
  • Data (26-288)
src/lfx/src/lfx/base/data/cloud_storage_utils.py (3)
  • create_s3_client (33-59)
  • validate_aws_credentials (13-30)
  • create_google_drive_service (127-156)
src/lfx/src/lfx/components/files_and_knowledge/__init__.py (1)
src/lfx/src/lfx/components/files_and_knowledge/load_file.py (1)
  • LoadFileComponent (34-488)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: Test Docker Images / Test docker images
  • GitHub Check: Test Starter Templates
  • GitHub Check: test-starter-projects
🔇 Additional comments (4)
src/lfx/src/lfx/components/files_and_knowledge/load_file.py (3)

222-259: Cloud path resolution and temp-file handling look consistent with existing patterns.

The overrides for _validate_and_resolve_paths, _read_from_aws_s3, and _read_from_google_drive correctly:

  • Prioritize cloud backends over file_path_str and the default FileInput.
  • Validate credentials and required keys/IDs.
  • Download to temp files and wrap them in BaseFile instances with delete_after_processing=True.
  • Clean up temp files on error with contextlib.suppress.

This matches the conventions used in other file components and looks solid.

Also applies to: 260-295, 297-353
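The temp-file pattern praised here (download, write to a temp file, remove it on error with `contextlib.suppress`) can be sketched independently of any cloud SDK. `download_to_temp` is hypothetical and `fetch` stands in for the S3 or Google Drive download call; per the review, the component then wraps the returned path in a BaseFile with `delete_after_processing=True`.

```python
import contextlib
import os
import tempfile
from pathlib import Path
from typing import Callable


def download_to_temp(fetch: Callable[[], bytes], suffix: str = "") -> Path:
    """Write fetched bytes to a new temp file, deleting it if the download fails.

    The caller owns the returned path and should delete it after processing.
    """
    fd, name = tempfile.mkstemp(suffix=suffix)
    try:
        with os.fdopen(fd, "wb") as fh:
            fh.write(fetch())
    except Exception:
        with contextlib.suppress(OSError):
            os.unlink(name)  # best-effort cleanup; re-raise the original error
        raise
    return Path(name)
```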


355-374: process_files aligns with “no-op” semantics and Data layout expectations.

Overriding process_files to only ensure each BaseFile has a Data object with SERVER_FILE_PATH_FIELDNAME (and not parsing content) is consistent with the LoadFile design and should keep downstream helpers that expect that key happy.
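The "no-op" semantics can be illustrated with stand-in types. Everything below is hypothetical: `FakeData` and `FakeBaseFile` stand in for lfx's Data and BaseFile, and the value of `SERVER_FILE_PATH_FIELDNAME` is a placeholder, since the real constant is defined on the base component. The point is that each file gains a path-bearing Data record and no content is ever read.

```python
from dataclasses import dataclass, field
from pathlib import Path
from typing import Optional

SERVER_FILE_PATH_FIELDNAME = "file_path"  # placeholder value for illustration


@dataclass
class FakeData:  # stand-in for lfx.schema.data.Data
    data: dict = field(default_factory=dict)


@dataclass
class FakeBaseFile:  # stand-in for BaseFile
    path: Path
    data: Optional[list] = None


def process_files(files: list[FakeBaseFile]) -> list[FakeBaseFile]:
    """Attach the server file path to each file's Data without parsing content."""
    for f in files:
        if not f.data:
            f.data = [FakeData({SERVER_FILE_PATH_FIELDNAME: f.path.as_posix()})]
        else:
            for d in f.data:
                d.data.setdefault(SERVER_FILE_PATH_FIELDNAME, f.path.as_posix())
    return files
```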


416-488: Build-config toggling for storage backends is clear and consistent with other components.

update_build_config:

  • Refreshes storage_location.options via _get_storage_location_options() (respecting Astra cloud rules).
  • Hides all storage-specific fields by default, then selectively shows AWS or Google Drive fields and hides the path FileInput when a cloud backend is chosen.
  • Falls back to showing path when nothing is selected.

This matches the UI behavior of the other multi-storage file components and should make the UX predictable.
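The toggling rules above reduce to "hide everything, then show one backend's fields". This sketch assumes each build-config entry is a dict with a `show` key, and the field names are hypothetical placeholders rather than the actual inputs in load_file.py.

```python
from typing import Optional

# Hypothetical field names for illustration only.
AWS_FIELDS = ("aws_access_key_id", "aws_secret_access_key", "bucket_name", "s3_file_key")
GDRIVE_FIELDS = ("service_account_key", "file_id")
CLOUD_LOCATIONS = ("AWS", "Google Drive")


def toggle_storage_fields(build_config: dict, storage_location: Optional[str]) -> dict:
    """Show only the inputs relevant to the selected storage backend."""
    for name in AWS_FIELDS + GDRIVE_FIELDS:
        build_config[name]["show"] = False  # hide all storage-specific fields first
    if storage_location == "AWS":
        visible = AWS_FIELDS
    elif storage_location == "Google Drive":
        visible = GDRIVE_FIELDS
    else:
        visible = ()
    for name in visible:
        build_config[name]["show"] = True
    # The local FileInput ("path") is shown for Local or when nothing is selected.
    build_config["path"]["show"] = storage_location not in CLOUD_LOCATIONS
    return build_config
```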

src/backend/base/langflow/initial_setup/starter_projects/Vector Store RAG.json (1)

2639-3315: Unresolved merge conflict markers make this JSON invalid

This section still contains Git conflict markers (<<<<<<< Updated upstream, =======, >>>>>>> Stashed changes) around the node data for what looks like the new File-7D3hP node and a LanguageModelComponent node. That means:

  • The JSON is syntactically invalid and the starter project cannot be loaded.
  • The data.id / node keys are effectively duplicated/ambiguous in this object.

You need to resolve the merge by hand:

  • Remove all <<<<<<<, =======, >>>>>>> lines.
  • Split this into clean, separate node objects (e.g., one for the new File component, one for the Language Model) with consistent id values.
  • Ensure edges that currently point to LanguageModelComponent-rL7g3 (and any new File node) reference the correct node IDs.

Until this is fixed the template will fail to parse and won’t be usable.
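After resolving the conflict by hand, two quick checks cover the remaining risks: edges pointing at node IDs that no longer exist, and node IDs defined twice. The helpers are hypothetical and assume the same `{"data": {"nodes": [...], "edges": [...]}}` layout, with `id` on nodes and `source`/`target` on edges.

```python
from collections import Counter


def dangling_edges(flow: dict) -> list[tuple[str, str]]:
    """Return (source, target) pairs whose endpoints are not defined nodes."""
    graph = flow["data"]
    node_ids = {node["id"] for node in graph["nodes"]}
    return [
        (e["source"], e["target"])
        for e in graph["edges"]
        if e["source"] not in node_ids or e["target"] not in node_ids
    ]


def duplicate_node_ids(flow: dict) -> list[str]:
    """Return node IDs that appear more than once, sorted."""
    counts = Counter(node["id"] for node in flow["data"]["nodes"])
    return sorted(node_id for node_id, n in counts.items() if n > 1)
```

Both lists should be empty before the template is committed.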

⛔ Skipped due to learnings
Learnt from: edwinjosechittilappilly
Repo: langflow-ai/langflow PR: 8504
File: src/backend/base/langflow/initial_setup/starter_projects/Image Sentiment Analysis.json:391-393
Timestamp: 2025-06-12T15:25:01.072Z
Learning: The repository owner prefers CodeRabbit not to review or comment on JSON files because they are autogenerated.

@github-actions github-actions bot added enhancement New feature or request and removed enhancement New feature or request labels Dec 22, 2025
@github-actions github-actions bot added enhancement New feature or request and removed enhancement New feature or request labels Dec 23, 2025

Labels

enhancement New feature or request
