Skip to content

Conversation

swalrus1
Copy link
Collaborator

@swalrus1 swalrus1 commented Oct 16, 2025

Changelog entry

Reduce volume of reads for deduplication on CS

Changelog category

  • Performance improvement

Description for reviewers

...

@swalrus1 swalrus1 requested a review from a team as a code owner October 16, 2025 13:21
Copy link

github-actions bot commented Oct 16, 2025

2025-10-16 13:22:29 UTC Pre-commit check linux-x86_64-relwithdebinfo for 2f5707c has started.
2025-10-16 13:22:43 UTC Artifacts will be uploaded here
2025-10-16 13:26:13 UTC ya make is running...
🟡 2025-10-16 14:48:58 UTC Some tests failed, follow the links below. Going to retry failed tests...

Ya make output | Test bloat

TESTS PASSED ERRORS FAILED SKIPPED MUTED?
31267 29870 0 1 1373 23

2025-10-16 14:49:08 UTC ya make is running... (failed tests rerun, try 2)
🟢 2025-10-16 15:03:58 UTC Tests successful.

Ya make output | Test bloat | Test bloat

TESTS PASSED ERRORS FAILED SKIPPED MUTED?
585 (only retried tests) 565 0 0 0 20

🟢 2025-10-16 15:04:01 UTC Build successful.
🟢 2025-10-16 15:04:17 UTC ydbd size 2.3 GiB changed* by +6.0 KiB, which is < 100.0 KiB vs main: OK

ydbd size dash main: 0cf099b merge: 2f5707c diff diff %
ydbd size 2 424 209 616 Bytes 2 424 215 776 Bytes +6.0 KiB +0.000%
ydbd stripped size 516 072 648 Bytes 516 073 352 Bytes +704 Bytes +0.000%

*please be aware that the difference is based on comparing your commit and the last completed build from the post-commit, check comparation

Copy link

github-actions bot commented Oct 16, 2025

2025-10-16 13:22:46 UTC Pre-commit check linux-x86_64-release-asan for 2f5707c has started.
2025-10-16 13:22:59 UTC Artifacts will be uploaded here
2025-10-16 13:26:32 UTC ya make is running...
🟡 2025-10-16 15:34:05 UTC Some tests failed, follow the links below. This fail is not in blocking policy yet

Ya make output | Test bloat

TESTS PASSED ERRORS FAILED SKIPPED MUTED?
15481 14968 0 202 293 18

🟢 2025-10-16 15:34:10 UTC Build successful.
🟢 2025-10-16 15:34:32 UTC ydbd size 3.8 GiB changed* by +4.9 KiB, which is < 100.0 KiB vs main: OK

ydbd size dash main: b84ab08 merge: 2f5707c diff diff %
ydbd size 4 033 345 384 Bytes 4 033 350 400 Bytes +4.9 KiB +0.000%
ydbd stripped size 1 498 040 928 Bytes 1 498 041 632 Bytes +704 Bytes +0.000%

*please be aware that the difference is based on comparing your commit and the last completed build from the post-commit, check comparation

Copy link

github-actions bot commented Oct 16, 2025

🟢 2025-10-16 13:30:30 UTC The validation of the Pull Request description is successful.

@swalrus1 swalrus1 enabled auto-merge (squash) October 16, 2025 16:42
@swalrus1 swalrus1 requested a review from Vladilen October 16, 2025 16:42
@Vladilen Vladilen requested a review from Copilot October 17, 2025 08:18
Copy link
Contributor

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR optimizes data fetching in the columnshard component by reducing the volume of reads during deduplication operations. The changes ensure that only the requested columns are fetched and assembled, rather than reading entire portions.

Key Changes

  • Modified TAskDataStep to correctly apply column filtering based on the presence of ColumnIds
  • Updated TAssembleDataStep to accept and use column IDs for creating filtered schemas
  • Integrated TFilteredSnapshotSchema to limit schema operations to only requested columns

Reviewed Changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

File Description
ydb/core/tx/columnshard/data_reader/fetching_steps.h Fixed conditional logic in TAskDataStep, added column filtering to TAssembleDataStep via TFilteredSnapshotSchema, and updated constructor to accept column IDs
ydb/core/tx/columnshard/data_reader/fetcher.cpp Updated instantiation of TAssembleDataStep to pass entityIds parameter in both fetching methods

Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.

private:
std::shared_ptr<NReader::NCommon::TColumnsSetIds> ColumnIds;

private:
Copy link

Copilot AI Oct 17, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Duplicate private: access specifier. The second private: on line 313 is redundant since line 310 already establishes private access. Remove the duplicate specifier.

Suggested change
private:

Copilot uses AI. Check for mistakes.

Copy link
Collaborator

@Vladilen Vladilen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

как это вообще работало?)

@swalrus1 swalrus1 merged commit fdc227e into ydb-platform:main Oct 17, 2025
15 of 16 checks passed
@swalrus1 swalrus1 deleted the fix-column-fetching branch October 17, 2025 12:33
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants