fix: Serialize metadata for documents in PGVectorStoreComponent #11031

marcioibm · 2025-12-15T19:43:43Z

When ingesting a single file into PGVector, you get this error:

Error building Component PGVector: (builtins.TypeError) Object of type Properties is not JSON serializable

Serializing the metadata the same way as the AstraDB fix (#9777) resolves #10213.
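
As a rough illustration of the error and of the kind of fix described here, a minimal sketch follows. The `Properties` class and the `serialize_metadata` helper are hypothetical stand-ins, not the code in this PR; the idea is only to convert nested objects to plain JSON types before the metadata is stored.

```python
import json
from dataclasses import asdict, dataclass, is_dataclass


@dataclass
class Properties:
    """Hypothetical stand-in for the metadata object that json cannot encode."""

    source: str
    pages: int


def serialize_metadata(metadata: dict) -> dict:
    """Return a copy of metadata that contains only JSON-safe values (assumed helper)."""

    def default(obj):
        # Dataclass instances become plain dicts; anything else falls back to str().
        if is_dataclass(obj) and not isinstance(obj, type):
            return asdict(obj)
        return str(obj)

    # Round-trip through JSON so every nested value ends up as a plain type.
    return json.loads(json.dumps(metadata, default=default))


raw = {"file": "report.pdf", "properties": Properties(source="upload", pages=3)}
clean = serialize_metadata(raw)
```

Calling `json.dumps(raw)` directly raises the same kind of `TypeError` quoted above, while `clean` can be dumped without a custom encoder.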

Summary by CodeRabbit

Release Notes

Bug Fixes
- Session continuity now preserved when reusing previous messages in conversations
- Improved source attribution tracking for message outputs
New Features
- Unified language model interface with consolidated provider selection, replacing multiple provider-specific fields
- API key field added to Structured Output component
Improvements
- Streamlined component configuration with simplified model and provider selection
- Enhanced message session handling across conversation workflows

coderabbitai · 2025-12-15T19:44:01Z

Important

Review skipped

Auto incremental reviews are disabled on this repository.

Walkthrough

This PR updates 16 starter project JSON templates across the Langflow initial setup. Changes focus on three main areas: ChatOutput now preserves incoming Message session IDs via fallback chaining, LanguageModelComponent is refactored to delegate to centralized helper functions (get_llm, update_model_options_in_build_config) instead of inline provider logic, and StructuredOutputComponent adds API key field exposure. All code_hash metadata values updated accordingly.

Changes

Cohort / File(s)	Summary
ChatOutput Session ID & Source Preservation `Basic Prompt Chaining.json`, `Basic Prompting.json`, `Blog Writer.json`, `Custom Component Generator.json`, `Document Q&A.json`, `Financial Report Parser.json`, `Hybrid Search RAG.json`, `Image Sentiment Analysis.json`, `Knowledge Retrieval.json`, `Meeting Summary.json`, `Memory Chatbot.json`, `Research Translation Loop.json`, `SEO Keyword Generator.json`	Updated message_response to capture and preserve existing_session_id from incoming Message, using fallback chain: self.session_id → existing_session_id → graph.session_id → "". Enhanced _build_source to resolve source from objects with model_name/model attributes. Updated code_hash metadata to reflect changes.
LanguageModelComponent Refactor `Basic Prompt Chaining.json`, `Basic Prompting.json`, `Memory Chatbot.json`, `Research Translation Loop.json`	Replaced inline provider-specific logic with delegated helper functions (get_llm, update_model_options_in_build_config, get_language_model_options). In some files, update_build_config signature changed from (build_config: dotdict, field_value: Any, field_name: str \| None = None) → dotdict to (build_config: dict, field_value: str, field_name: str \| None = None) → dict. Consolidated model construction and provider handling via centralized utilities.
StructuredOutputComponent API Key Exposure `Financial Report Parser.json`, `Hybrid Search RAG.json`, `Image Sentiment Analysis.json`	Added api_key field (SecretStrInput) to component template with display_name, security properties, and load_from_db behavior for OpenAI API key input in UI.
LanguageModelComponent Input Rename (llm → model) `Hybrid Search RAG.json`, `Portfolio Website Code Generator.json`	Renamed primary language model input from llm (HandleInput) to model (ModelInput) and added external_options field for connecting alternative models. Updated template metadata (model_type, placeholder, refresh controls, tool_mode).
Agent Component Restructuring `Invoice Summarizer.json`, `Price Deal Finder.json`	Replaced agent_llm dropdown and per-provider fields (model_name, max_output_tokens, max_tokens, temperature, timeout, base_url, openai_api_base, project_id, max_retries, seed) with unified model (ModelInput) input with external_options for provider-aware configuration. Updated API key field label/description. Removed memory inputs.
Enhanced Data Serialization & StructuredOutputComponent Refactor `Financial Report Parser.json`	Added _serialize_data method for JSON-encoding Data.data with orjson, returning Markdown code block. Updated StructuredOutputComponent to use model input (instead of llm) with parameter order (llm, schema, config_dict) and llm.with_structured_output integration.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45–60 minutes

Areas requiring extra attention:

LanguageModelComponent refactor across 4 files: Verify that delegation to get_llm and update_model_options_in_build_config correctly handles all provider branches (OpenAI, Anthropic, Google, IBM watsonx.ai, Ollama), especially edge cases around API key validation and Ollama URL normalization.
update_build_config signature changes in Basic Prompting.json and Memory Chatbot.json: Confirm that the transition from (dotdict, Any) → (dict, str) parameters maintains backward compatibility and all callers are updated.
Session ID preservation logic: Test fallback chain (self.session_id → existing_session_id → graph.session_id → "") across different input flows (Message vs. non-Message, chat-input-connected vs. standalone).
Agent component restructuring (Invoice Summarizer.json, Price Deal Finder.json): Validate that the new unified model input with external_options correctly surfaces all necessary provider configuration fields and maintains feature parity with the prior per-provider field approach.
Input rename consistency (llm → model): Ensure all references in node edges, display metadata, and connected workflows correctly use the new model field name.

Possibly related PRs

feat: Add a Unified Model Providers configuration #10565: Involves the same LanguageModelComponent refactor pattern (ModelInput, get_llm, update_model_options_in_build_config, and update_build_config signature changes) across starter project templates.
fix: Image upload for Gemini/Anthropic #10867: Modifies ChatOutput.message_response logic to preserve and reuse existing incoming Message.session_id in the same way.
deps: upgrade altk #10804: Overlaps on modifications to starter-project ChatOutput, _build_source, and LanguageModelComponent template code.

Suggested labels

refactoring, starter-projects

Suggested reviewers

edwinjosechittilappilly
carlosrcoelho
erichare

Pre-merge checks and finishing touches

Important

Pre-merge checks failed

Please resolve all errors before merging. Addressing warnings is optional.

❌ Failed checks (2 warnings, 1 inconclusive)

Check name	Status	Explanation	Resolution
Test Quality And Coverage	⚠️ Warning	Pull request introduces PGVectorStoreComponent with metadata serialization fix but contains zero test coverage for the production-blocking bug fix.	Add comprehensive pytest tests in src/lfx/tests/unit/components/pgvector/ validating metadata serialization, document ingestion, search queries, and error handling.
Test File Naming And Structure	⚠️ Warning	PGVectorStoreComponent source code modification adds metadata serialization but lacks corresponding test file following established pytest patterns.	Create test_pgvector_store_component.py inheriting from ComponentTestBaseWithoutClient with metadata serialization test methods matching existing vector store test structure.
Test Coverage For New Implementations	❓ Inconclusive	Test coverage cannot be properly assessed due to critical scope mismatch between PR objective (fix PGVectorStoreComponent metadata serialization) and actual changes (13 starter project JSON files with no visible test files).	Clarify actual scope: confirm PGVectorStoreComponent changes, identify test files included, verify regression tests for issue #10213 exist, and confirm starter project changes are intentional.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The PR title 'fix: Serialize metadata for documents in PGVectorStoreComponent' directly addresses the core change: serializing metadata for documents in the PGVectorStoreComponent to fix JSON serialization errors.
Docstring Coverage	✅ Passed	No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
Excessive Mock Usage Warning	✅ Passed	The PR exclusively modifies JSON configuration files and does not include any Python test files with mock objects.

coderabbitai

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)

src/backend/base/langflow/initial_setup/starter_projects/Knowledge Retrieval.json (1)
235-310: Fix invalid isinstance usage with | unions in ChatOutput

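In the embedded ChatOutput class, _validate_input passes Message | Data | DataFrame | str (and similar unions) directly to isinstance. Python 3.10 and later accept X | Y unions of classes in isinstance (PEP 604), but earlier versions raise a TypeError as soon as the union expression is evaluated, so whether these checks fail depends on the interpreter that runs the template.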

Update the checks to pass a tuple of types instead of a union:
-        if isinstance(self.input_value, list) and not all(
-            isinstance(item, Message | Data | DataFrame | str) for item in self.input_value
-        ):
+        if isinstance(self.input_value, list) and not all(
+            isinstance(item, (Message, Data, DataFrame, str)) for item in self.input_value
+        ):
@@
-        if not isinstance(
-            self.input_value,
-            Message | Data | DataFrame | str | list | Generator | type(None),
-        ):
+        if not isinstance(
+            self.input_value,
+            (Message, Data, DataFrame, str, list, Generator, type(None)),
+        ):
Without this, input reaching _validate_input raises a TypeError on interpreters older than Python 3.10, where the | union cannot be evaluated.
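
For reference, a minimal sketch of the tuple-based check suggested in the diff above. The `Message` and `Data` classes are empty placeholders rather than Langflow's types, and `all_items_valid` is a hypothetical helper name; the tuple form is accepted by `isinstance` on every supported Python version.

```python
class Message:
    """Placeholder for the real Message type."""


class Data:
    """Placeholder for the real Data type."""


# A tuple of classes is the portable way to test against several types at once.
ALLOWED_ITEM_TYPES = (Message, Data, str)


def all_items_valid(items: list) -> bool:
    """Return True when every list item is one of the allowed types."""
    return all(isinstance(item, ALLOWED_ITEM_TYPES) for item in items)


ok = all_items_valid([Message(), Data(), "text"])
bad = all_items_valid([Message(), 42])
```
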
src/backend/base/langflow/initial_setup/starter_projects/Meeting Summary.json (1)

1-3612: Critical mismatch between PR objectives and file content.

The PR objectives state this change is to "fix: Serialize metadata for documents in PGVectorStoreComponent" to resolve issue #10213, but the provided file is a starter project JSON template ("Meeting Summary.json") containing ChatOutput and LanguageModelComponent component updates. There is no PGVectorStoreComponent implementation in this file.

The PR description and the code under review are misaligned. Please clarify:

Is this the correct file for the PGVectorStoreComponent serialization fix?

Should the review focus on starter project template changes instead?

♻️ Duplicate comments (5)

src/backend/base/langflow/initial_setup/starter_projects/SEO Keyword Generator.json (1)

561-636: Same ChatOutput isinstance union bug as in Knowledge Retrieval

This ChatOutput template is identical to the one in Knowledge Retrieval.json and has the same invalid isinstance(... Message | Data | DataFrame | str) usage. Please apply the same tuple-based fix here as well to avoid runtime TypeError.

src/backend/base/langflow/initial_setup/starter_projects/Research Translation Loop.json (2)

405-481: ChatOutput shares the same isinstance union bug

This ChatOutput definition matches the ones already reviewed and carries the same invalid isinstance union usage; please align it with the tuple-based fix described in Knowledge Retrieval.json.

1307-1408: LanguageModelComponent identical to previously approved template

The Language Model component here is the same refactored version already approved in SEO Keyword Generator.json; apply any fixes or future adjustments there consistently here as well.

src/backend/base/langflow/initial_setup/starter_projects/Memory Chatbot.json (2)

424-499: ChatOutput: same isinstance union issue as other starter projects

This ChatOutput template reuses the same _validate_input implementation with Message | Data | DataFrame | str unions in isinstance. Please update to the tuple-of-types form as described in the first file so this template doesn't hit runtime TypeError.

1261-1362: LanguageModelComponent matches the already-reviewed unified implementation

The Language Model component here is the same unified implementation already reviewed and approved; keep it in sync with any fixes (e.g., if you later adjust helper signatures) applied in SEO Keyword Generator.json.

🧹 Nitpick comments (4)

src/backend/base/langflow/initial_setup/starter_projects/Hybrid Search RAG.json (2)

698-804: ChatOutput: session handling, source construction, and validation look solid

The updated ChatOutput implementation is coherent: _build_source safely extracts model identifiers, message_response preserves existing session_id when reusing a Message and falls back cleanly to component/graph IDs, and _validate_input plus convert_to_string cover the expected input shapes without obvious edge‑case breaks. One nit: _serialize_data is currently unused; either wire it into convert_to_string for Data inputs or drop it to avoid dead code.
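
To make the `_serialize_data` nit concrete, here is a minimal sketch of a serializer that returns a Markdown code block. It is an assumption about the method's shape, not the template's code: it uses the standard-library `json` module instead of orjson, and the name `serialize_data` is illustrative.

```python
import json

# Built programmatically so the fence characters do not end this example early.
FENCE = "`" * 3


def serialize_data(data: dict) -> str:
    """Encode a dict as indented JSON wrapped in a Markdown json code block."""
    payload = json.dumps(data, indent=2, default=str)
    return f"{FENCE}json\n{payload}\n{FENCE}"


block = serialize_data({"total": 42, "currency": "USD"})
```

Wiring something like this into `convert_to_string` for Data inputs would remove the dead code the comment points out.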

2698-2925: StructuredOutputComponent: model/api_key wiring is good, but “llm” vs “model” naming may confuse the template

The Structured Output code and template correctly add an api_key field and switch to a ModelInput named model, and the Python component uses get_llm(model=self.model, ...) plus the shared update_model_options_in_build_config helper as expected.

However, in this node:

field_order still lists "llm" instead of "model".

The edge from LanguageModelComponent-aH5Bi to this node still targets fieldName: "llm".

That mismatch between llm (graph wiring / ordering) and model (template + code) can confuse the UI and potentially break automatic connections to the new ModelInput.

I recommend aligning the JSON metadata so everything consistently refers to model (or intentionally removing the now‑unused llm handle) and quickly validating the flow in the UI.

src/backend/base/langflow/initial_setup/starter_projects/Financial Report Parser.json (2)

152-230: ChatOutput: consistent improvements; unused serializer is minor clean‑up

This ChatOutput block mirrors the other template: message/session handling, source extraction, and input validation are coherent and should behave well across Data / DataFrame / Message inputs. As before, _serialize_data isn’t used anywhere; consider integrating it for Data inputs or removing it.

1092-1210: StructuredOutputComponent: API key exposure and model selection are wired correctly

The Structured Output component now exposes an api_key field and uses a ModelInput named model, with build_structured_output_base calling get_llm(model=self.model, user_id=self.user_id, api_key=self.api_key) and the build‑config helper. That aligns with the intended provider‑agnostic setup and should support both built‑in and externally connected models (via external_options).

One consistency concern (same as in the other template): the node’s field_order and the edge from LanguageModelComponent-iAML1 still reference "llm" while the template/code use "model". It would be safer to:

Update field_order and edge fieldName to "model", or

Explicitly drop the obsolete llm handle if you intend StructuredOutput to be configured only via model.

Please verify in the UI that the Language Model connection to Structured Output still works as expected after this rename.
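
A check like the following could catch the `llm` versus `model` mismatch before opening the UI. It is only a sketch: the JSON layout it assumes (`data.nodes[].data.node.template` and `data.edges[].data.targetHandle.fieldName`) and the helper name `find_dangling_edges` are assumptions and should be confirmed against the actual starter project files.

```python
def find_dangling_edges(flow: dict) -> list[tuple[str, str]]:
    """Return (node_id, field_name) pairs for edges that target a missing template field."""
    # Map each node id to the set of field names its template defines.
    templates = {
        node["id"]: node["data"]["node"]["template"]
        for node in flow["data"]["nodes"]
    }
    dangling = []
    for edge in flow["data"]["edges"]:
        field = edge["data"]["targetHandle"]["fieldName"]
        target = edge["target"]
        if field not in templates.get(target, {}):
            dangling.append((target, field))
    return dangling


# Tiny hand-written flow that reproduces the mismatch described above.
flow = {
    "data": {
        "nodes": [
            {
                "id": "StructuredOutput-1",
                "data": {"node": {"template": {"model": {}, "api_key": {}}}},
            },
        ],
        "edges": [
            {
                "target": "StructuredOutput-1",
                "data": {"targetHandle": {"fieldName": "llm"}},
            },
        ],
    }
}
dangling = find_dangling_edges(flow)
```

Here `dangling` reports the edge that still targets `llm` on a node whose template only defines `model` and `api_key`.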

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 53015c1 and b941846.

📒 Files selected for processing (16)

src/backend/base/langflow/initial_setup/starter_projects/Basic Prompt Chaining.json (5 hunks)
src/backend/base/langflow/initial_setup/starter_projects/Basic Prompting.json (3 hunks)
src/backend/base/langflow/initial_setup/starter_projects/Blog Writer.json (3 hunks)
src/backend/base/langflow/initial_setup/starter_projects/Custom Component Generator.json (3 hunks)
src/backend/base/langflow/initial_setup/starter_projects/Document Q&A.json (3 hunks)
src/backend/base/langflow/initial_setup/starter_projects/Financial Report Parser.json (7 hunks)
src/backend/base/langflow/initial_setup/starter_projects/Hybrid Search RAG.json (8 hunks)
src/backend/base/langflow/initial_setup/starter_projects/Image Sentiment Analysis.json (8 hunks)
src/backend/base/langflow/initial_setup/starter_projects/Invoice Summarizer.json (6 hunks)
src/backend/base/langflow/initial_setup/starter_projects/Knowledge Retrieval.json (2 hunks)
src/backend/base/langflow/initial_setup/starter_projects/Meeting Summary.json (8 hunks)
src/backend/base/langflow/initial_setup/starter_projects/Memory Chatbot.json (3 hunks)
src/backend/base/langflow/initial_setup/starter_projects/Portfolio Website Code Generator.json (8 hunks)
src/backend/base/langflow/initial_setup/starter_projects/Price Deal Finder.json (6 hunks)
src/backend/base/langflow/initial_setup/starter_projects/Research Translation Loop.json (3 hunks)
src/backend/base/langflow/initial_setup/starter_projects/SEO Keyword Generator.json (3 hunks)

🧰 Additional context used

🧠 Learnings (5)

📓 Common learnings

Learnt from: ogabrielluiz
Repo: langflow-ai/langflow PR: 0
File: :0-0
Timestamp: 2025-06-26T19:43:18.260Z
Learning: In langflow custom components, the `module_name` parameter is now propagated through template building functions to add module metadata and code hashes to frontend nodes for better component tracking and debugging.

Learnt from: edwinjosechittilappilly
Repo: langflow-ai/langflow PR: 0
File: :0-0
Timestamp: 2025-08-07T20:23:23.569Z
Learning: The Langflow codebase has an excellent structlog implementation that follows best practices, with proper global configuration, environment-based output formatting, and widespread adoption across components. The main cleanup needed is updating starter project templates and documentation examples that still contain legacy `from loguru import logger` imports.

Learnt from: edwinjosechittilappilly
Repo: langflow-ai/langflow PR: 0
File: :0-0
Timestamp: 2025-08-07T20:23:23.569Z
Learning: Some Langflow starter project files and components still use `from loguru import logger` instead of the centralized structlog logger from `langflow.logging.logger`. These should be updated to ensure consistent structured logging across the entire codebase.

📚 Learning: 2025-11-24T19:46:09.104Z

Learnt from: CR
Repo: langflow-ai/langflow PR: 0
File: .cursor/rules/backend_development.mdc:0-0
Timestamp: 2025-11-24T19:46:09.104Z
Learning: Backend components should be structured with clear separation of concerns: agents, data processing, embeddings, input/output, models, text processing, prompts, tools, and vector stores

Applied to files:

src/backend/base/langflow/initial_setup/starter_projects/Hybrid Search RAG.json
src/backend/base/langflow/initial_setup/starter_projects/Custom Component Generator.json
src/backend/base/langflow/initial_setup/starter_projects/Basic Prompt Chaining.json
src/backend/base/langflow/initial_setup/starter_projects/Image Sentiment Analysis.json

📚 Learning: 2025-06-26T19:43:18.260Z

Learnt from: ogabrielluiz
Repo: langflow-ai/langflow PR: 0
File: :0-0
Timestamp: 2025-06-26T19:43:18.260Z
Learning: In langflow custom components, the `module_name` parameter is now propagated through template building functions to add module metadata and code hashes to frontend nodes for better component tracking and debugging.

Applied to files:

src/backend/base/langflow/initial_setup/starter_projects/Custom Component Generator.json

📚 Learning: 2025-08-11T16:52:26.755Z

Learnt from: edwinjosechittilappilly
Repo: langflow-ai/langflow PR: 9336
File: src/backend/base/langflow/base/models/openai_constants.py:29-33
Timestamp: 2025-08-11T16:52:26.755Z
Learning: The "gpt-5-chat-latest" model in the OpenAI models configuration does not support tool calling, so tool_calling should be set to False for this model in src/backend/base/langflow/base/models/openai_constants.py.

Applied to files:

src/backend/base/langflow/initial_setup/starter_projects/Invoice Summarizer.json
src/backend/base/langflow/initial_setup/starter_projects/Basic Prompting.json
src/backend/base/langflow/initial_setup/starter_projects/Price Deal Finder.json

📚 Learning: 2025-11-24T19:46:09.104Z

Learnt from: CR
Repo: langflow-ai/langflow PR: 0
File: .cursor/rules/backend_development.mdc:0-0
Timestamp: 2025-11-24T19:46:09.104Z
Learning: Applies to src/backend/base/langflow/components/**/*.py : Add new components to the appropriate subdirectory under `src/backend/base/langflow/components/` (agents/, data/, embeddings/, input_output/, models/, processing/, prompts/, tools/, or vectorstores/)

Applied to files:

src/backend/base/langflow/initial_setup/starter_projects/Basic Prompting.json

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (66)

GitHub Check: Run Frontend Tests / Playwright Tests - Shards 1/50 through 50/50 (50 checks)
GitHub Check: Run Backend Tests / Unit Tests - Python 3.10 - Groups 1 through 5 (5 checks)
GitHub Check: Run Backend Tests / LFX Tests - Python 3.10
GitHub Check: Run Backend Tests / Integration Tests - Python 3.10
GitHub Check: Lint Backend / Run Mypy (3.11, 3.12, 3.13) (3 checks)
GitHub Check: Test Docker Images / Test docker images
GitHub Check: Test Starter Templates
GitHub Check: test-starter-projects
GitHub Check: Optimize new Python code in this PR
GitHub Check: Update Component Index
GitHub Check: Run Ruff Check and Format

🔇 Additional comments (26)

src/backend/base/langflow/initial_setup/starter_projects/Invoice Summarizer.json (2)
1-1: ⚠️ Critical Scope Mismatch: PR Description vs. File Content

The PR objectives describe a fix for "PGVectorStoreComponent" metadata serialization, but this file is a starter project template with no reference to PGVectorStoreComponent. The actual changes involve refactoring ChatOutput session handling and Agent model/provider wiring. Please confirm:

Is this the correct file for the PR?

Does the PR scope include both PGVectorStoreComponent fixes and starter project updates?

Should the Invoice Summarizer updates be in a separate PR?

This mismatch needs clarification before merging.

308-308: ChatOutput session_id Fallback Chaining Implementation

The ChatOutput code now preserves incoming Message session_id via fallback logic (line 383):
message.session_id = (
    self.session_id or existing_session_id or (self.graph.session_id if hasattr(self, "graph") else None) or ""
)
This aligns with the learnings about session_id preservation. Verify that the fallback order is correct: explicit input > existing message > graph session > empty string.

Also applies to: 383-383
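
The fallback order can be checked in isolation with a small sketch. `resolve_session_id` is a hypothetical free function that mirrors the `or` chain quoted above; it is not a method of the component.

```python
from typing import Optional


def resolve_session_id(
    component_session_id: Optional[str],
    existing_session_id: Optional[str],
    graph_session_id: Optional[str],
) -> str:
    """Pick the first truthy session id: explicit input, incoming message, graph, then ""."""
    # `or` skips None and empty strings alike, so "" counts as "not set".
    return component_session_id or existing_session_id or graph_session_id or ""


explicit = resolve_session_id("explicit", "from-message", "from-graph")
from_message = resolve_session_id(None, "from-message", "from-graph")
from_graph = resolve_session_id("", None, "from-graph")
empty = resolve_session_id(None, None, None)
```

Because the chain relies on truthiness, an empty string on the component does not override a session id carried by the incoming message.
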
src/backend/base/langflow/initial_setup/starter_projects/Basic Prompt Chaining.json (4)

631-631: Verify ChatOutput session_id preservation logic and implementation correctness.

The refactored message_response() method now implements a fallback chain for session_id: self.session_id → existing_session_id → graph.session_id → "". While the intent (preserving incoming Message session IDs) aligns with the summary, verify that:

The existing_session_id capture (the line existing_session_id = message.session_id) occurs only when the input is a Message that is not connected to a chat input. This logic appears correct, but confirm that is_connected_to_chat_input() exists and works as expected.

All error cases in _validate_input() and convert_to_string() are handled properly, especially the Generator type check.

The _serialize_data() method correctly handles edge cases (e.g., null/empty Data objects).

Also applies to: 705-705

1261-1261: Verify LanguageModelComponent centralized helper functions exist and match the refactored code.

All three LanguageModelComponent instances now delegate to centralized helpers (get_llm, update_model_options_in_build_config, get_language_model_options) from lfx.base.models.unified_models. Confirm:

These helper functions exist in the codebase and have the expected signatures.

The get_llm() call with parameters (model, user_id, api_key, temperature, stream) is correct—verify the model parameter type and how it differs from the old provider/model_name split.

The update_build_config() delegation is complete and does not lose any prior functionality.

The import of LCModelComponent base class is correct and the inheritance is compatible.

Also applies to: 1583-1583, 1904-1904

753-753: Verify input field type definition changes.

The input_value field's _input_type is set to "MessageInput" (not explicitly shown in old code, assumed changed). Confirm this matches the actual component input definition and that the component's MessageInput class is compatible with the template field type.

Also applies to: 1264-1264, 1586-1586, 1907-1907

1-10: This file is newly created as part of an automated repository setup commit ([autofix.ci] apply automated fixes), not a feature PR. There is no stated PR objective about PGVectorStoreComponent. The file contains a valid starter project template and requires no changes.

Likely an incorrect or invalid review comment.

src/backend/base/langflow/initial_setup/starter_projects/SEO Keyword Generator.json (1)

946-963: LanguageModelComponent refactor to shared helpers looks sound

The Language Model component now cleanly delegates model construction and option updates to get_llm / update_model_options_in_build_config, with ModelInput driving provider/model selection. Inputs and helper usage are consistent with the unified-models API; no issues spotted.

src/backend/base/langflow/initial_setup/starter_projects/Custom Component Generator.json (1)

1-2816: Inconsistency between PR objectives and provided files.

The PR objectives state the fix addresses "Serialize metadata for documents in PGVectorStoreComponent" and resolves issue #10213 regarding JSON serialization of Properties objects. However, the provided file is a starter project template (Custom Component Generator.json) that contains changes to ChatOutput (line 2230 code_hash) and LanguageModelComponent (line 2618 code_hash), with no PGVectorStoreComponent present.

The AI-generated summary also describes updates to ChatOutput and LanguageModelComponent refactoring, not vector store metadata serialization. This suggests either:

The wrong files were provided for review, or

The PR scope includes updating starter project templates to reflect changes to these components

Please clarify: Are these starter project updates intentional as part of this PR, or should the actual PGVectorStoreComponent implementation files be provided for review?

src/backend/base/langflow/initial_setup/starter_projects/Basic Prompting.json (1)

583-583: Remove verification request for code_hash matching against src/backend/base/langflow/components/.

The code hashes in the starter project JSON files are consistent across all templates (ChatOutput: "8c87e536cca4", LanguageModel: "bb5f8714781b"), but these components are imported from the external lfx library (v0.2.0), not from src/backend/base/langflow/components/. The starter projects import ChatOutput from lfx.components.input_output and use lfx-based language model components, so these code hashes reference external component versions, not internal langflow implementations.

If verifying code synchronization is necessary, it should target the lfx library dependencies, not langflow's own component directory. Otherwise, the consistent hash values across all starter project templates are already correct.

Likely an incorrect or invalid review comment.

src/backend/base/langflow/initial_setup/starter_projects/Document Q&A.json (2)

486-486: ChatOutput code refactoring with session_id preservation logic.

The updated implementation now preserves an incoming Message's session_id through a fallback chain (self.session_id → existing_session_id → graph.session_id → ""). The _build_source method adds logic to extract source properties (model_name or model attribute) from the source object.

While the logic appears sound, verify that:

The session_id fallback chain aligns with the intended behavior across all starter projects

The _build_source method handles all expected source object types without raising AttributeError

This change is compatible with existing flows that may depend on the previous session_id assignment logic
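The fallback chain described above can be illustrated with a small stand-alone sketch. The `Message` class here is a stand-in dataclass, not the real Langflow type, and `resolve_session_id` is a hypothetical helper name; only the ordering of the fallbacks is taken from the review text.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Message:
    """Stand-in for the real Message type; only session_id matters here."""

    text: str
    session_id: Optional[str] = None


def resolve_session_id(component_session_id, incoming, graph_session_id):
    """Apply the chain: component input -> incoming message -> graph -> ""."""
    # Only Message inputs can carry an existing session_id; other inputs contribute None.
    existing_session_id = incoming.session_id if isinstance(incoming, Message) else None
    # `or` skips empty strings and None, so the first non-empty value wins.
    return component_session_id or existing_session_id or graph_session_id or ""
```

Because the chain uses `or`, an empty-string session ID on the component is treated the same as a missing one, which is worth keeping in mind when checking compatibility with existing flows.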

982-982: LanguageModelComponent refactored to use unified model helper functions.

The code has been substantially refactored to delegate to centralized helpers (get_llm, update_model_options_in_build_config) instead of inline provider logic. The input definitions have changed—notably the model field is now a ModelInput (instead of separate provider/model_name), suggesting a more unified provider/model selection UI.

Key considerations:

Ensure backward compatibility if existing flows reference the old "provider" and "model_name" fields

Verify that get_llm and update_model_options_in_build_config are available and stable in the lfx.base.models.unified_models module

Confirm that the removal of inline provider logic (OpenAI, Anthropic, etc.) doesn't break any custom or edge-case configurations
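For illustration only, the delegation pattern can be sketched as below. Nothing here reflects the actual signatures in `lfx.base.models.unified_models`: `get_llm_sketch`, `PROVIDER_BUILDERS`, `FakeModel`, and the shape of the `model` dict are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FakeModel:
    """Placeholder for a provider client; a real helper would return an LLM object."""

    provider: str
    name: str
    api_key: Optional[str]


def _build_fake(provider):
    # Returns a builder bound to one provider name.
    def builder(name, api_key):
        return FakeModel(provider, name, api_key)

    return builder


# Hypothetical registry: provider-specific construction lives in one place.
PROVIDER_BUILDERS = {
    "OpenAI": _build_fake("OpenAI"),
    "Anthropic": _build_fake("Anthropic"),
}


def get_llm_sketch(model, api_key=None):
    """Centralized helper: pick the builder for the selected provider."""
    provider = model.get("provider")
    builder = PROVIDER_BUILDERS.get(provider)
    if builder is None:
        raise ValueError(f"Unknown provider: {provider!r}")
    return builder(model["name"], api_key)


class LanguageModelSketch:
    """The component holds one unified `model` value and delegates construction."""

    def __init__(self, model, api_key=None):
        self.model = model
        self.api_key = api_key

    def build_model(self):
        return get_llm_sketch(self.model, self.api_key)
```

The point of the pattern is that the component no longer branches on provider names itself, so any edge-case configuration has to be supported by the shared helper.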

src/backend/base/langflow/initial_setup/starter_projects/Blog Writer.json (3)

476-476: Consistency check: ChatOutput changes are identical across starter projects.

The ChatOutput code_hash ("8c87e536cca4") and implementation (including session_id preservation and _build_source logic) are identical in both Document Q&A.json and Blog Writer.json. This indicates a coordinated, consistent refactor across starter projects.

Positive observation: The consistent application of the refactoring reduces the risk of divergence between starter templates.

Also applies to: 550-550
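The `_build_source` behaviour mentioned above (resolving a source from `model_name` or `model`) can be approximated with `getattr` defaults so that unexpected object types do not raise `AttributeError`. This is a sketch under assumptions: `resolve_source_name` is a hypothetical name, and the final `str(source)` fallback is my own choice rather than something the review confirms.

```python
from types import SimpleNamespace


def resolve_source_name(source):
    """Return a display name for a source object without raising AttributeError."""
    if source is None:
        return None
    if isinstance(source, str):
        return source
    # Try the attributes named in the review, in order.
    for attr in ("model_name", "model"):
        value = getattr(source, attr, None)
        if value:
            return str(value)
    # Assumed fallback: use the object's string form when neither attribute exists.
    return str(source)


# Usage with stand-in objects:
print(resolve_source_name(SimpleNamespace(model_name="example-model")))
print(resolve_source_name(SimpleNamespace(model="other-model")))
```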

1457-1457: LanguageModelComponent refactoring is consistent across starter projects.

The LanguageModelComponent refactoring (line 1457 in this file, line 982 in Document Q&A.json) is identical in both files, confirming a coordinated update across all affected starter projects. The shift to unified model selection (ModelInput, get_llm, update_model_options_in_build_config) appears intentional and widespread.

However, verify that this refactoring does not introduce breaking changes for users with existing flows that may have custom configurations or hardcoded references to the old provider/model_name structure.
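One way to look for such references is to scan a saved flow for keys named after the legacy fields. The sketch below is generic and makes no assumption about the flow layout beyond it being parsed JSON; `find_legacy_fields` and `LEGACY_FIELDS` are hypothetical names, and a key match is only a hint that a flow may need attention.

```python
LEGACY_FIELDS = ("provider", "model_name")


def find_legacy_fields(node, path=""):
    """Return dotted paths of every dict key that matches a legacy field name."""
    hits = []
    if isinstance(node, dict):
        for key, value in node.items():
            child = f"{path}.{key}" if path else key
            if key in LEGACY_FIELDS:
                hits.append(child)
            hits.extend(find_legacy_fields(value, child))
    elif isinstance(node, list):
        for index, item in enumerate(node):
            hits.extend(find_legacy_fields(item, f"{path}[{index}]"))
    return hits
```

Running this over a flow loaded with `json.load` lists where the old field names still occur, which can then be reviewed by hand.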

1-1: The PR does include the PGVectorStoreComponent with metadata serialization implementation.

PGVectorStoreComponent is located in src/lfx/src/lfx/components/pgvector/pgvector.py (not in src/backend/base/langflow/components/). Line 40 shows the metadata serialization fix:
documents[-1].metadata = serialize(documents[-1].metadata, to_str=True)
The PR objectives are met—metadata for documents in PGVectorStoreComponent is properly serialized. The implementation follows the same pattern as other vector store components (e.g., AstraDB vectorstore).
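The failure mode and the general shape of the fix can be reproduced with the standard library alone. This is a simplified stand-in, not the `serialize` helper used in the PR: `Properties` below is a dummy class and `serialize_metadata` is a hypothetical function that only shows the idea of converting unknown values to strings before JSON encoding.

```python
import json


class Properties:
    """Dummy stand-in for a metadata value that json cannot encode."""

    def __init__(self, **fields):
        self.fields = fields

    def __str__(self):
        return f"Properties({self.fields})"


def serialize_metadata(value):
    """Recurse into containers and stringify anything json cannot encode."""
    if isinstance(value, dict):
        return {str(key): serialize_metadata(item) for key, item in value.items()}
    if isinstance(value, (list, tuple)):
        return [serialize_metadata(item) for item in value]
    if value is None or isinstance(value, (str, int, float, bool)):
        return value
    return str(value)


metadata = {"source": "a.pdf", "props": Properties(pages=3)}

try:
    json.dumps(metadata)
except TypeError as exc:
    # Same class of error as reported in the PR description.
    print(f"before: {exc}")

print("after:", json.dumps(serialize_metadata(metadata)))
```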

Likely an incorrect or invalid review comment.

src/backend/base/langflow/initial_setup/starter_projects/Portfolio Website Code Generator.json (4)

405-406: Session ID preservation logic looks solid.

The ChatOutput refactor correctly chains session ID fallbacks (component input → incoming message → graph session) with a safe empty string default. The explicit preservation of existing_session_id from incoming Message objects avoids losing user-set context when reusing messages. Implementation includes proper validation and error handling.

1531-1532: LanguageModelComponent refactor to ModelInput is consistent and well-structured.

Both instances use the same refactored code, delegating provider/model logic to centralized get_llm() and update_model_options_in_build_config() helpers. This reduces duplication and makes the component intent clearer. Fallback inputs for provider-specific config (API keys, base URLs) are retained, ensuring backward compatibility with different LLM providers.

Also applies to: 1858-1859

2159-2178: StructuredOutputComponent API key exposure and ModelInput migration are appropriate.

Exposing api_key as an explicit advanced input provides flexibility for per-component API key overrides while load_from_db=true ensures good user experience. The external_options configuration in the model field correctly hints at the connection UI for linking other models. Refactor to use ModelInput and centralized helpers (get_llm, update_model_options_in_build_config) mirrors the LanguageModelComponent pattern, reducing code duplication.

Also applies to: 2222-2257

1-10: Clarify PR scope: file appears unrelated to stated PGVectorStoreComponent metadata serialization fix.

The PR title references fixing PGVectorStoreComponent metadata serialization (issue #10213), but this file is a starter project template for "Portfolio Website Code Generator" containing ChatOutput, LanguageModelComponent, and StructuredOutputComponent. The changes here align with the AI summary (session ID preservation, ModelInput refactoring, API key exposure) rather than PGVectorStoreComponent serialization. Verify this is intentional or confirm the correct file is under review.

src/backend/base/langflow/initial_setup/starter_projects/Hybrid Search RAG.json (1)

1166-1472: LanguageModelComponent refactor appears correct and provider‑agnostic

The new LanguageModelComponent code that delegates to get_llm and update_model_options_in_build_config is internally consistent: inputs are defined once, build_model simply returns the unified model instance, and update_build_config defers to the shared helper with a clear cache key. I don’t see correctness or wiring issues in this block.

src/backend/base/langflow/initial_setup/starter_projects/Financial Report Parser.json (1)

769-875: LanguageModelComponent: unified model provisioning looks correct

The refactored LanguageModelComponent here is the same as in the other starter: build_model delegates to get_llm with the right parameters, and update_build_config uses update_model_options_in_build_config with a clear cache key. The inputs defined in the template match what the Python class expects.

src/backend/base/langflow/initial_setup/starter_projects/Image Sentiment Analysis.json (3)

509-509: Enhanced ChatOutput component with proper metadata serialization and session_id preservation.

The ChatOutput code has been updated to include:

Import of jsonable_encoder from FastAPI for JSON-serializable metadata handling

New _serialize_data() method for proper Data object serialization using orjson

New _validate_input() method for type validation

Session ID preservation logic that uses existing message session_id as a fallback

These changes align with the PR objective of serializing metadata properly. The fallback chaining for session_id (self.session_id or existing_session_id or ...) ensures session continuity across message transformations.

Also applies to: 583-583

1229-1229: LanguageModelComponent refactored to use centralized provider-agnostic helpers.

The component now:

Delegates LLM instantiation to get_llm() instead of inline provider logic

Uses update_model_options_in_build_config() for dynamic model filtering

Replaces legacy per-provider fields with a unified ModelInput (name="model")

Imports from lfx.base.models.unified_models for centralized configuration

This refactor reduces code duplication and improves maintainability by centralizing provider selection logic. Verify that the get_llm() function properly handles the new model parameter signature and all provider types.

Also applies to: 1230-1230, 1250-1290

1827-1846: StructuredOutputComponent adds API key exposure and unified model selection via ModelInput.

New additions:

api_key field (lines 1827-1846): A SecretStrInput allowing provider-specific API key configuration (advanced, optional)

model field (lines 1890-1926): A ModelInput with external_options providing a unified provider selection UI with "Connect other models" option

The component now mirrors LanguageModelComponent's provider-agnostic pattern. Ensure get_llm() in the component code (line 1863) correctly uses self.model (not legacy self.llm or self.agent_llm).

Also applies to: 1890-1926

src/backend/base/langflow/initial_setup/starter_projects/Price Deal Finder.json (3)

419-419: ChatOutput code consistent with Image Sentiment Analysis template.

The ChatOutput component has identical code and code_hash ("8c87e536cca4") to the Image Sentiment Analysis starter project, confirming consistent implementation across templates.

Also applies to: 495-495

1606-1606: Agent component refactored with provider-agnostic model selection and updated dependencies.

Changes include:

Code hash updated to "1834a4d901fa" reflecting significant refactoring

New dependency: langchain_core version 0.3.80 (lines 1618-1620)

API key field display name changed from "OpenAI API Key" to generic "API Key" (line 1696)

API key info text updated to "Model Provider API key" (line 1698)

These changes prepare the component for the new unified model provider pattern. However, verify that all internal references to legacy fields (e.g., agent_llm, OpenAI-specific configuration) have been fully replaced in the refactored code (line 1727).

Also applies to: 1616-1620, 1696-1696, 1698-1698

1836-1872: Agent component adds ModelInput field with external_options for provider selection.

The new model field (lines 1836-1872) provides:

ModelInput type with model_type: "language" and refresh_button: true

external_options enabling "Connect other models" option in the UI

Placeholder text "Setup Provider" to guide users

This unifies the Agent's model selection interface with other components. Verify that the Agent's update_build_config() method (should be in code section at line 1727) properly filters for tool-calling-capable models when updating the model field.
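The filtering step the comment asks about might look like the sketch below. It is hypothetical throughout: the `tool_calling` flag and the option dict shape are assumptions for illustration, not the schema used by `update_model_options_in_build_config`.

```python
def filter_tool_calling_models(options):
    """Keep only model options flagged as supporting tool calling."""
    # Assumption: each option is a dict with an optional boolean "tool_calling" key.
    return [option for option in options if option.get("tool_calling", False)]


# Usage with made-up option entries:
options = [
    {"name": "model-a", "tool_calling": True},
    {"name": "model-b", "tool_calling": False},
    {"name": "model-c"},
]
print([option["name"] for option in filter_tool_calling_models(options)])
```

Treating a missing flag as "not capable" is the conservative default here; the opposite default would surface models that may fail when the Agent tries to bind tools.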

coderabbitai · 2025-12-15T19:57:55Z

src/backend/base/langflow/initial_setup/starter_projects/Document Q&A.json

            "lf_version": "1.4.3",
            "metadata": {
-              "code_hash": "cae45e2d53f6",
+              "code_hash": "8c87e536cca4",


⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

# Search for PGVectorStoreComponent in the repository
find . -type f -name "*.py" -o -name "*.json" | head -100
rg -l "PGVectorStoreComponent" --type py

Repository: langflow-ai/langflow

Length of output: 6055

🏁 Script executed:

# Check if there are any recent changes related to PGVectorStore
rg "PGVectorStore" --type py -A 2 -B 2 | head -100

Repository: langflow-ai/langflow

Length of output: 1215

🏁 Script executed:

# Look for the document Q&A starter project file and check its structure
cat -n src/backend/base/langflow/initial_setup/starter_projects/Document\ Q\&A.json | head -50

Repository: langflow-ai/langflow

Length of output: 2121

🏁 Script executed:

# Check for any references to PGVector in the starter projects directory
rg -i "pgvector" src/backend/base/langflow/initial_setup/starter_projects/

Repository: langflow-ai/langflow

Length of output: 46

🏁 Script executed:

# Get context around line 412 (ChatOutput code_hash) in Document Q&A.json
cat -n "src/backend/base/langflow/initial_setup/starter_projects/Document Q&A.json" | sed -n '405,420p'

Repository: langflow-ai/langflow

Length of output: 662

🏁 Script executed:

# Get context around line 486 (ChatOutput code) in Document Q&A.json
cat -n "src/backend/base/langflow/initial_setup/starter_projects/Document Q&A.json" | sed -n '480,495p'

Repository: langflow-ai/langflow

Length of output: 8447

🏁 Script executed:

# Get context around line 982 (LanguageModelComponent code) in Document Q&A.json
cat -n "src/backend/base/langflow/initial_setup/starter_projects/Document Q&A.json" | sed -n '975,990p'

Repository: langflow-ai/langflow

Length of output: 4801

🏁 Script executed:

# Search for any git diff or metadata about what files were changed
git status 2>/dev/null || echo "Not a git repo or git unavailable"

Repository: langflow-ai/langflow

Length of output: 195

🏁 Script executed:

# Check if there are any other starter project files that reference PGVector
rg -i "pgvector" . --type json

Repository: langflow-ai/langflow

Length of output: 50380

Clarify scope mismatch: PGVectorStoreComponent changes vs. Document Q&A starter project updates.

The PR description states the goal is to "serialize metadata for documents in PGVectorStoreComponent," but the Document Q&A.json starter project file contains updates to ChatOutput and LanguageModelComponent components. PGVectorStoreComponent resides in a separate location (src/lfx/src/lfx/components/pgvector/pgvector.py) and is not present in this starter project template. The changes shown (code_hash and code implementations for ChatOutput and LanguageModelComponent) are unrelated to the stated PR objectives.

Confirm whether these changes are in scope for the PR or if PGVectorStoreComponent modifications are being reviewed separately.

🤖 Prompt for AI Agents

In src/backend/base/langflow/initial_setup/starter_projects/Document Q&A.json around line 412, the file shows edits to ChatOutput and LanguageModelComponent (code_hash changes) that are unrelated to the PR goal of serializing metadata in src/lfx/src/lfx/components/pgvector/pgvector.py. Either revert the unrelated starter-project modifications in this JSON (restore the previous code_hash and component code) so the PR contains only PGVectorStoreComponent changes, or update the PR description to explicitly include and justify these starter-project edits. In either case, ensure the PGVectorStoreComponent edits remain in src/lfx/.../pgvector.py and are the only functional changes tied to the stated objective.

fix: Serialize metadata for documents in PGVectorStoreComponent

5e2a53e

github-actions bot added the community Pull Request from an external contributor label Dec 15, 2025

github-actions bot added the bug Something isn't working label Dec 15, 2025

fix: reordering import to keep imports sorted

e58e115

github-actions bot added bug Something isn't working and removed bug Something isn't working labels Dec 15, 2025

[autofix.ci] apply automated fixes

d19c693

github-actions bot added bug Something isn't working and removed bug Something isn't working labels Dec 15, 2025

[autofix.ci] apply automated fixes (attempt 2/3)

b941846

github-actions bot added bug Something isn't working and removed bug Something isn't working labels Dec 15, 2025

[autofix.ci] apply automated fixes (attempt 3/3)

ddcb7e0

github-actions bot added bug Something isn't working and removed bug Something isn't working labels Dec 15, 2025

coderabbitai bot reviewed Dec 15, 2025
