flush behavior during packing #33

Open · strssndktn opened this issue Jul 8, 2020 · 3 comments

@strssndktn commented Jul 8, 2020

Two issues popped up in transit-clj which are better recorded here:

In essence:

The flushwriter flushes the output stream after serializing every single element of a data structure. This causes a performance drop because the process has to wait for the operating system to force each newly serialized chunk of data out to disk (or, in the case of other streams, the data gets fragmented at the network level).

I don't think the flush is necessary there; it should only happen at the end of the whole serialization process, in write.
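
In the meantime, a possible caller-side workaround (just a sketch, not anything transit-java provides) is to wrap the target stream so that the intermediate flushes become no-ops, and flush the underlying stream yourself once at the end:

    import java.io.FilterOutputStream;
    import java.io.IOException;
    import java.io.OutputStream;

    // Sketch: swallow flush() calls issued during serialization; the caller
    // flushes the wrapped stream explicitly after write() returns.
    class NoFlushOutputStream extends FilterOutputStream {
        NoFlushOutputStream(OutputStream out) {
            super(out);
        }

        @Override
        public void write(byte[] b, int off, int len) throws IOException {
            out.write(b, off, len); // bypass FilterOutputStream's byte-at-a-time write
        }

        @Override
        public void flush() {
            // intentionally empty; call the underlying stream's flush() when done
        }
    }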

I would be happy to help or provide patches, just let me know. Thanks! :)

@puredanger (Contributor)

Flushing is always a balancing act between efficiency and immediacy. As such I'm not sure there is one right answer to this question. This really needs a top-down design process to consider what level of control is possible and useful in different use cases.

@strssndktn (Author)

> Flushing is always a balancing act between efficiency and immediacy. As such I'm not sure there is one right answer to this question. This really needs a top-down design process to consider what level of control is possible and useful in different use cases.

Yes, makes sense, there is a trade-off.

Yesterday I sketched out a small test to see what the current behaviour looks like:

    import java.io.ByteArrayOutputStream;
    import java.io.IOException;
    import java.util.Arrays;
    import java.util.Map;

    import com.cognitect.transit.TransitFactory;
    import com.cognitect.transit.Writer;

    public void testFlushBehavior() {
        int[] flushes = {0};
        Map testData = Map.of(
                Arrays.asList("Key1", "Key2"), Arrays.asList("Value1", "Value2"),
                Arrays.asList("Key3", "Key4"), Arrays.asList("Value3", "Value4"));
        // Count every flush() that reaches the underlying stream.
        ByteArrayOutputStream out = new ByteArrayOutputStream() {
            @Override
            public void flush() throws IOException {
                super.flush();
                flushes[0]++;
            }
        };

        for (TransitFactory.Format format : TransitFactory.Format.values()) {
            Writer writer = TransitFactory.writer(format, out);
            writer.write(testData);
            System.err.format("Number of flushes for format %s: %d\n", format, flushes[0]);
            out.reset();
            flushes[0] = 0;
        }
    }

The result with an otherwise pristine transit-java source tree is:

Number of flushes for format JSON: 15
Number of flushes for format MSGPACK: 1
Number of flushes for format JSON_VERBOSE: 15

It looks to me like the current behaviour differs between MSGPACK and JSON, and the MSGPACK behaviour is the desirable one. The reason for the difference is that org.msgpack.io.StreamOutput swallows the flush: its flush() method has an empty body.

The reason I am a bit concerned about the number of flushes for the JSON writer is that I worry about wear leveling on SSDs, since for FileOutputStreams the operating system forces all data to disk and blocks. The same goes for unnecessarily splitting up data units in network protocols.

Thinking about policy, do you think it makes sense to introduce some kind of policy argument during the construction of the TransitWriters? I see use cases for both behaviours but find the current one a bit surprising and inconsistent. ;)
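
For illustration only, such a policy could be a hypothetical extra argument at construction time, roughly along these lines (FlushPolicy and the three-argument writer overload do not exist today; this is just to make the idea concrete):

    // Hypothetical API sketch, not the current transit-java interface:
    enum FlushPolicy {
        AFTER_EACH_ELEMENT, // what the JSON writer effectively does now
        AFTER_EACH_WRITE,   // flush once per top-level write() call
        NEVER               // leave flushing entirely to the caller
    }

    // e.g. TransitFactory.writer(TransitFactory.Format.JSON, out, FlushPolicy.AFTER_EACH_WRITE);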

@tonsky commented Jul 21, 2023

@puredanger how about flushing only at the boundary with user code? When I call transit/write I don’t expect any intermediate flushes, at most one at the end.

The best option would probably be not to do any flushes at all and leave them entirely to the user, but that change could be backwards-incompatible.
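
For what it's worth, the caller-side pattern under that scheme would presumably be just this (a sketch assuming the existing TransitFactory/Writer API and that the library no longer flushes internally; the method name is made up):

    static void writeThenFlushOnce(Object data) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        Writer writer = TransitFactory.writer(TransitFactory.Format.JSON, out);
        writer.write(data); // no intermediate flushes expected
        out.flush();        // at most one flush, at the user-code boundary
    }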
