Skip to content

HPCC-34144 Remove code for CRC calculation when writing and reading files #19876

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

ghalliday
Copy link
Member

@ghalliday ghalliday commented May 16, 2025

Type of change:

  • This change is a bug fix (non-breaking change which fixes an issue).
  • This change is a new feature (non-breaking change which adds functionality).
  • This change improves the code (refactor or other change that does not change the functionality)
  • This change fixes warnings (the fix does not alter the functionality or the generated code)
  • This change is a breaking change (fix or feature that will cause existing behavior to change).
  • This change alters the query API (existing queries will have to be recompiled)

Checklist:

  • My code follows the code style of this project.
    • My code does not create any new warnings from compiler, build system, or lint.
  • The commit message is properly formatted and free of typos.
    • The commit message title makes sense in a changelog, by itself.
    • The commit is signed.
  • My change requires a change to the documentation.
    • I have updated the documentation accordingly, or...
    • I have created a JIRA ticket to update the documentation.
    • Any new interfaces or exported functions are appropriately commented.
  • I have read the CONTRIBUTORS document.
  • The change has been fully tested:
    • I have added tests to cover my changes.
    • All new and existing tests passed.
    • I have checked that this change does not introduce memory leaks.
    • I have used Valgrind or similar tools to check for potential issues.
  • I have given due consideration to all of the following potential concerns:
    • Scalability
    • Performance
    • Security
    • Thread-safety
    • Cloud-compatibility
    • Premature optimization
    • Existing deployed queries will not be broken
    • This change fixes the problem, not just the symptom
    • The target branch of this pull request is appropriate for such a change.
  • There are no similar instances of the same problem that should be addressed
    • I have addressed them here
    • I have raised JIRA issues to address them separately
  • This is a user interface / front-end modification
    • I have tested my changes in multiple modern browsers
    • The component(s) render as expected

Smoketest:

  • Send notifications about my Pull Request position in Smoketest queue.
  • Test my draft Pull Request.

Testing:

Copy link

Jira Issue: https://hpccsystems.atlassian.net//browse/HPCC-34144

Jirabot Action Result:
Assigning user: [email protected]
Workflow Transition To: Merge Pending
Updated PR

Copy link

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR removes CRC calculation logic from file reading and writing routines in the HPCC codebase, in accordance with the HPCC-34144 issue. The changes include the removal of CRC checks and tallying across various disk I/O activities and related helper functions, while introducing an option to verify file dates.

Reviewed Changes

Copilot reviewed 17 out of 17 changed files in this pull request and generated 1 comment.

Show a summary per file
File Description
thorlcr/thorutil/thormisc.hpp Added a new option (THOROPT_CHECK_FILE_DATES) for file date verification and a minor comment typo
thorlcr/thorutil/thmem.cpp Updated flush() calls by removing unused parameters
thorlcr/activities/* Removed CRC tallies and related logic in XML read/write, disk read/write, spill, and CSV activities
system/jlib/jfile.hpp & jfile.cpp Revised serial stream creation functions to remove deprecated callback parameters
ecl/hthor/hthor.cpp Removed flush() calls with NULL for CRC
common/thorhelper/thorcommon.* Removed rw_crc flag and adjusted row stream writer interfaces accordingly

@ghalliday ghalliday force-pushed the removecallback branch 2 times, most recently from 482dea8 to aebe1e5 Compare May 16, 2025 11:19
Copy link
Contributor

@dcamper dcamper left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks fine. Two minor questions.

}
virtual void reset(offset_t _offset, offset_t _flen)
{
in->reset(_offset, _flen);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Question: Should the CRC be (somehow) modified/adjusted/discarded when reset() is called? IFileSerialStreamCallback does not support anything like that, but should it?

virtual void skip(size32_t sz) override
{
throwUnexpectedX("Skip called on an input stream that is being tallied");
in->skip(sz);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will this line ever be hit?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

code now deleted.

Copy link
Member

@jakesmith jakesmith left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ghalliday - a good clearout. I didn't spot it in review, but it's causing some regression tests to fail.

@@ -138,7 +132,7 @@ class SpillSlaveActivity : public CSlaveActivity
Owned<IFile> ifile = createIFile(fileName.str());
offset_t sz = ifile->size();
mb.append(getDataLinkCount()).append(compress?uncompressedBytesWritten:sz).append(sz);
unsigned crc = compress?~0:fileCRC.get();
unsigned crc = ~0;
mb.append(crc);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would be good to get rid of this unused serialization element at the same time?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done - as part of fixing the thor problems.

@ghalliday ghalliday requested a review from jakesmith May 22, 2025 06:49
Copy link
Member

@jakesmith jakesmith left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ghalliday - looks good.

I didn't understand the relevance of the last commit (Fix issues related to modified time granularity in roxie), but discussed offline.
Maybe worth clarifying in the squashed commit message.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants