Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Initial commit for gcsync command line tool #796

Open
wants to merge 10 commits into
base: main
Choose a base branch
from

Conversation

tonyxiaowei
Copy link
Contributor

Implementation of gcsync command

Fix of an issue in the core rsync algorithm

Replace the use of slice()

#. Use full 256 bits checksum

Unit test for gcsyncClient


public static final String INSTRUCTION_FILE_SUFFIX = "instruction";

public static final String TMP_FILE_SUFFIX = "updated";
Copy link
Collaborator

@misolt misolt Mar 18, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd consider adding an enum instead of the *_FILE_SUFFIX constants.


public static final String FILES_TO_RSYNC_FILE_NAME = "filesToRsync.txt";

public static final String JAR_FILE_NAME = "gcsync-all.jar";
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For this and other constants which are only used in one file, I'd move it to that file.

private GcsyncClient clientUnderTest;

// For test simplicity, define a smaller threshold for "rsync" logic
private static final long RSYNC_SIZE_THRESHOLD = 1024; // e.g. 1KB
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe the size threshold could be changed to be a parameter of GsyncClient instead of a constant?

This way we can initialize the test version with a different value rather than using different versions of the constant.

copyStart = -1;
copyLength = 0;
}
this.sourceStream = in.openStream();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it work to move this openStream call outside the constructor?

}

private void flushCopy() throws IOException {
if (copyStart == -1) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does the value -1 mean for copyStart?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It means copyStart is unset

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please document this, e.g.

...
// the start index for copy() if >= 0, or -1 if it is unset
@CheckForSigned
private int copyStart = -1
...

Copy link
Collaborator

@shevek-google shevek-google left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Publishing comments.

@@ -76,7 +76,7 @@ public void generate(Consumer<? super Instruction> out, ByteSource in,
if (cc != null) {
HashCode strongHashCode = rollingChecksum.getStrongHashCode();
for (Checksum c : cc) {
if (c.getStrongChecksum() == strongHashCode.asLong()) {
if (c.getStrongChecksum().equals(strongHashCode.toString())) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Compute toString() once outside the loop.
Or compare the asBytes() values using Arrays.equals()?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

copyLength = 0;
}

private void copy(long copyLength) throws IOException {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This, I think, is ByteStreams.copy()

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, instead of setting limit here, I find it easier to just implement it which is pretty straightforward.

}
}
}

private void skip(long length) throws IOException {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Assumption here: The fastest way to skip is to read/skip bytes or call skip().

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably yes in the case of GCS. Guava's implementation of SlicedByteSource use skip() too.

@@ -13,7 +13,7 @@ message Checksum {
uint64 blockOffset = 1;
uint32 blockLength = 2;
int32 weakChecksum = 3;
int64 strongChecksum = 4;
string strongChecksum = 4;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This might be safer as bytes - although bytes are really expensive in protobuf, so it might as well be string.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can leave it as string.

writer.write(file.getFileName().toString());
writer.newLine();
}
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Files.writeLines()
CharSink.writeLines is probably easier.

Justification for attempting to avoid a temp file is every so often some customer has java.io.tmpdir misconfigured and this breaks.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

for (Path file : filesToRsync) {
Path tmpCheckSumFile = getTemporaryCheckSumFileName(file);
try (OutputStream instructionSink = gcsStorage.newByteSink(
new URI(tmpBucket).resolve(Util.getInstructionFileName(file.getFileName().toString())))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is getTemporaryChecksumFileName here, but getInstructionFileName in util?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

instruction.writeDelimitedTo(instructionSink);
} catch (IOException e) {
throw new RuntimeException(e);
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would be tempted to modify generate() to permit the throwing of IOException, and use a custom class instead of Consumer.

interface InstructionConsumer<X extends Exception> { public void accept(Instruction i) throws X; }

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

checksum.writeDelimitedTo(bufferedOutputStream);
} catch (IOException e) {
throw new RuntimeException("Failed to generate checksum", e);
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again the typed-exception-in-interface trick.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants