[Explore vis] add data transformation panel #11944

Merged
ruanyl merged 8 commits into opensearch-project:main from Qxisylolo:feat/add_data_transformation on May 26, 2026

@Qxisylolo

Collaborator

Description

This PR adds a data transformation panel.

Data Transformation is a new feature in the Explore visualization editor that lets users reshape query results before they reach the visualization layer, without changing the underlying query.

Transformations are applied through a user-defined pipeline, where each step builds on the output of the previous one. This creates a flexible, composable workflow for manipulating data directly within the editor.
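
A minimal sketch of the "each step builds on the previous output" idea described above. The names `Row`, `TransformStep`, and `runPipeline` are hypothetical and chosen for illustration only; the interfaces in this PR may look different.

```typescript
// Hypothetical types for illustration; the PR's real interfaces may differ.
type Row = Record<string, unknown>;

interface TransformStep {
  id: string;
  apply: (rows: Row[]) => Row[];
}

// Feed the output of each step into the next one. The input array is never modified.
function runPipeline(rows: Row[], steps: TransformStep[]): Row[] {
  return steps.reduce((current, step) => step.apply(current), rows);
}

// Step 1: keep only rows whose status is a server error.
const filterErrors: TransformStep = {
  id: 'filter-errors',
  apply: (rows) => rows.filter((row) => Number(row.status) >= 500),
};

// Step 2: rename the "status" column to "statusCode".
const renameStatus: TransformStep = {
  id: 'rename-status',
  apply: (rows) => rows.map(({ status, ...rest }) => ({ ...rest, statusCode: status })),
};

const queryResults: Row[] = [
  { host: 'a', status: 200 },
  { host: 'b', status: 503 },
];

const transformed = runPipeline(queryResults, [filterErrors, renameStatus]);
```

Because every step returns a new array, the original query results stay available for the visualization layer to fall back on when the pipeline is empty.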

RFC:
#11928

Issues Resolved

Screenshot

Testing the changes

Check List

  • All tests pass
    • yarn test:jest
    • yarn test:jest_integration
  • New functionality includes testing.
  • New functionality has been documented.
  • Commits are signed per the DCO using --signoff

@Qxisylolo force-pushed the feat/add_data_transformation branch from aa750d5 to 84b4f1c on May 21, 2026 03:37
@github-actions

Contributor

Persistent review updated to latest commit 84b4f1c

Qxisylolo added 4 commits May 22, 2026 15:23
Signed-off-by: Qxisylolo <qianxisy@amazon.com>
Signed-off-by: Qxisylolo <qianxisy@amazon.com>
Signed-off-by: Qxisylolo <qianxisy@amazon.com>
Signed-off-by: Qxisylolo <qianxisy@amazon.com>
@Qxisylolo force-pushed the feat/add_data_transformation branch from 84b4f1c to af57f2c on May 22, 2026 07:24
@github-actions

Contributor

Persistent review updated to latest commit af57f2c

@Qxisylolo force-pushed the feat/add_data_transformation branch from af57f2c to b66561a on May 22, 2026 08:56
@github-actions

Contributor

Persistent review updated to latest commit b66561a

Signed-off-by: Qxisylolo <qianxisy@amazon.com>
@Qxisylolo force-pushed the feat/add_data_transformation branch from b66561a to ec60b3e on May 22, 2026 10:19
@github-actions

Contributor

Persistent review updated to latest commit ec60b3e

const pipeline = transformationService.pipeline$.getValue();
const serializedPipeline = pipeline.map((instance) => ({
  definitionId: instance.definition_id,
  instanceId: instance.instance_id,

Member

Why do we need to persist instance_id? It seems to be purely a runtime identifier.

Collaborator Author

Yes, we don't need to save it; I will delete it.
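
One way the agreed change could look, as a rough sketch: persist only the definition id and config, and hand out a fresh runtime id when a saved pipeline is loaded. `definition_id` and `instance_id` come from the snippet under review; `SavedStep`, `serializePipeline`, `restorePipeline`, and the `config` shape are assumptions made for this example.

```typescript
// Hypothetical shapes; definition_id and instance_id follow the snippet under review,
// while config and SavedStep are assumptions made for this sketch.
interface PipelineInstance {
  definition_id: string;
  instance_id: string; // runtime-only identifier, not persisted
  config: Record<string, unknown>;
}

interface SavedStep {
  definitionId: string;
  config: Record<string, unknown>;
}

// Persist only what is needed to rebuild the step.
function serializePipeline(pipeline: PipelineInstance[]): SavedStep[] {
  return pipeline.map(({ definition_id, config }) => ({
    definitionId: definition_id,
    config,
  }));
}

// Assign a fresh runtime id to every step when a saved pipeline is loaded.
let nextInstanceNumber = 0;
function restorePipeline(saved: SavedStep[]): PipelineInstance[] {
  return saved.map((step) => ({
    definition_id: step.definitionId,
    instance_id: `instance-${nextInstanceNumber++}`,
    config: step.config,
  }));
}

const savedSteps = serializePipeline([
  { definition_id: 'filter', instance_id: 'runtime-42', config: { field: 'status' } },
]);
const restoredSteps = restorePipeline(savedSteps);
```

The counter is only a placeholder for whatever id generator the service already uses.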

splitField: visConfig?.splitField,
splitLayout: visConfig?.splitLayout,
showSplitLabel: visConfig?.showSplitLabel,
dataTransformationJSON: JSON.stringify(serializedPipeline),

Member

Curious: why does serializedPipeline need JSON.stringify()?

Collaborator Author

We don't need to double-stringify this; I assume it is leftover code from the initial changes.
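
To illustrate what double-stringifying means here, a small sketch, assuming the surrounding object is itself serialized with JSON.stringify when it is saved. The `dataTransformation` field name and the sample step are made up for the comparison.

```typescript
const steps = [{ definitionId: 'filter' }];

// Inner stringify plus outer stringify: the pipeline is stored as an escaped string.
const doubleEncoded = JSON.stringify({ dataTransformationJSON: JSON.stringify(steps) });

// Single stringify: the pipeline is stored as a regular JSON array.
const singleEncoded = JSON.stringify({ dataTransformation: steps });

// Reading the double-encoded value back yields a string that needs a second JSON.parse.
const fromDouble = JSON.parse(doubleEncoded).dataTransformationJSON;
const fromSingle = JSON.parse(singleEncoded).dataTransformation;

console.log(typeof fromDouble, Array.isArray(fromSingle));
```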

},
[services, dispatch]
);
}, [visualizationBuilder, results, pipeline]);

@ruanyl ruanyl May 24, 2026

Member

VisualizationContainer should not know that the pipeline exists; it would be better to decouple the transformation pipeline from VisualizationContainer.

I guess you added pipeline to the deps array because you want it to trigger handleData and apply the data transformations. But I think this should be internal logic of VisualizationBuilder.

Collaborator Author

Got it, VisualizationBuilder will do the subscription.
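
A rough sketch of the decoupling discussed in this thread, under stated assumptions: the builder subscribes to the pipeline stream itself, so the container only hands over raw results. `ValueSubject` is a tiny dependency-free stand-in for an RxJS BehaviorSubject, and `SketchVisualizationBuilder`, `StepFn`, and `DataRow` are hypothetical names, not the classes in this PR.

```typescript
type Listener<T> = (value: T) => void;

// Tiny stand-in for an RxJS BehaviorSubject so the sketch has no dependencies.
class ValueSubject<T> {
  private listeners = new Set<Listener<T>>();
  private value: T;

  constructor(initial: T) {
    this.value = initial;
  }

  next(value: T): void {
    this.value = value;
    this.listeners.forEach((listener) => listener(value));
  }

  subscribe(listener: Listener<T>): () => void {
    this.listeners.add(listener);
    listener(this.value); // emit the current value immediately, like BehaviorSubject
    return () => {
      this.listeners.delete(listener);
    };
  }
}

type DataRow = Record<string, unknown>;
type StepFn = (rows: DataRow[]) => DataRow[];

// Hypothetical builder that owns the pipeline subscription.
class SketchVisualizationBuilder {
  output: DataRow[] = [];
  private rawRows: DataRow[] = [];
  private steps: StepFn[] = [];
  private unsubscribe: () => void;

  constructor(pipeline$: ValueSubject<StepFn[]>) {
    // Re-apply the transformations whenever the pipeline changes.
    this.unsubscribe = pipeline$.subscribe((steps) => {
      this.steps = steps;
      this.recompute();
    });
  }

  // The container calls this with raw query results and nothing else.
  handleData(rows: DataRow[]): void {
    this.rawRows = rows;
    this.recompute();
  }

  dispose(): void {
    this.unsubscribe();
  }

  private recompute(): void {
    this.output = this.steps.reduce((rows, step) => step(rows), this.rawRows);
  }
}

const pipeline$ = new ValueSubject<StepFn[]>([]);
const builder = new SketchVisualizationBuilder(pipeline$);
builder.handleData([{ n: 1 }, { n: 2 }]);
pipeline$.next([(rows) => rows.filter((row) => row.n === 2)]);
```

With this shape the container's effect can depend on the results alone, and a pipeline edit still refreshes the output.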

Signed-off-by: Qxisylolo <qianxisy@amazon.com>
@github-actions

Contributor

Persistent review updated to latest commit 1fe50fd

@Hailong-am Hailong-am left a comment

Collaborator

Review Comments

1. Mutation inside applyPipeline (transformation_service.ts:148)

if (cleaned !== instance.config) {
  instance.config = cleaned;  // mutates the object inside the BehaviorSubject
  pipelineChanged = true;
}

This directly mutates an object that's currently held by pipeline$. While it won't cause an infinite loop (the debounced re-invocation self-terminates because the cleaned config becomes stable), it violates the otherwise-immutable contract established by updateTransformationConfig, removeTransformation, etc. (which all create new objects).

The practical risk: any code that captured a reference to pipeline$.getValue() before applyPipeline ran would see its instances mutated in place unexpectedly.

Suggested fix — use an index-based loop and clone the instance:

let pipelineChanged = false;
const updatedInstances = [...instances];

for (let i = 0; i < updatedInstances.length; i++) {
  let instance = updatedInstances[i];
  schemaMap.set(instance.instance_id, currentSchema);

  if (instance.validateConfig) {
    const cleaned = instance.validateConfig(instance.config, currentSchema);
    if (cleaned !== instance.config) {
      instance = { ...instance, config: cleaned };
      updatedInstances[i] = instance;
      pipelineChanged = true;
    }
  }
  // ...rest of loop
}

if (pipelineChanged) {
  this.pipeline$.next(updatedInstances);
}
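
To make the risk described above concrete, here is a self-contained sketch of the copy-on-change approach. `StepInstance`, the standalone `validateConfig`, and `cleanPipeline` are hypothetical simplifications of the service code, written only to show that a previously captured reference is left untouched.

```typescript
interface StepInstance {
  instance_id: string;
  config: { fields: string[] };
}

// Hypothetical validator: drops fields that are missing from the schema and
// returns the same object when nothing needs to change.
function validateConfig(config: StepInstance['config'], schema: string[]): StepInstance['config'] {
  const kept = config.fields.filter((field) => schema.includes(field));
  return kept.length === config.fields.length ? config : { fields: kept };
}

// Copy-on-change: clone only the instances whose config was cleaned.
function cleanPipeline(instances: StepInstance[], schema: string[]): StepInstance[] {
  const updated = [...instances];
  let changed = false;
  for (let i = 0; i < updated.length; i++) {
    const cleaned = validateConfig(updated[i].config, schema);
    if (cleaned !== updated[i].config) {
      updated[i] = { ...updated[i], config: cleaned };
      changed = true;
    }
  }
  // Returning the original array when nothing changed lets callers skip an emit.
  return changed ? updated : instances;
}

const before: StepInstance[] = [{ instance_id: 'a', config: { fields: ['host', 'gone'] } }];
const captured = before[0]; // a reference taken before cleaning
const after = cleanPipeline(before, ['host']);
```

Here `captured.config.fields` still holds both entries after the call, while `after[0]` carries the cleaned config.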

2. Unused transformationService prop on VisualizationContainer

In visualization_editor_bottom_left_container.tsx:

return <VisualizationContainer transformationService={transformServices} />;

But VisualizationContainer is defined with React.memo() and uses hooks internally — it never declares or consumes a transformationService prop. This prop is silently ignored (React doesn't error on extra props). It should be removed to avoid confusion.

Hailong-am previously approved these changes May 25, 2026
   * transformation catalog management
   */
  registerDefinition<TConfig>(definition: TransformationDefinition<TConfig>): void {
    this.definitions.set(definition.id, definition as TransformationDefinition);

Collaborator

If the id is a duplicate, should we log a warning message here?

Collaborator Author

Thanks for the suggestion, fixed.
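
A sketch of what a duplicate-id warning could look like. The thread does not show the actual fix, so `DefinitionRegistry`, `SketchDefinition`, the injectable `warn` callback, and the message text are all assumptions. Overwriting on a duplicate mirrors the `Map.set` call in the snippet under review.

```typescript
interface SketchDefinition {
  id: string;
  label: string;
}

class DefinitionRegistry {
  private definitions = new Map<string, SketchDefinition>();
  private warn: (message: string) => void;

  // The logger is injectable so the warning can be observed in tests.
  constructor(warn: (message: string) => void = (message) => console.warn(message)) {
    this.warn = warn;
  }

  registerDefinition(definition: SketchDefinition): void {
    if (this.definitions.has(definition.id)) {
      this.warn(`Transformation definition "${definition.id}" is already registered; overwriting.`);
    }
    this.definitions.set(definition.id, definition);
  }

  getDefinition(id: string): SketchDefinition | undefined {
    return this.definitions.get(id);
  }
}

const warnings: string[] = [];
const registry = new DefinitionRegistry((message) => warnings.push(message));
registry.registerDefinition({ id: 'filter', label: 'Filter' });
registry.registerDefinition({ id: 'filter', label: 'Filter v2' });
```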

ruanyl previously approved these changes May 25, 2026
Signed-off-by: Qxisylolo <qianxisy@amazon.com>
@Qxisylolo Qxisylolo dismissed stale reviews from ruanyl and Hailong-am via 13501e0 May 25, 2026 09:35
@github-actions

Contributor

Persistent review updated to latest commit a6f9deb

@ruanyl ruanyl merged commit 9fe2e21 into opensearch-project:main May 26, 2026
152 of 154 checks passed
4 participants