Make RandomAccess.Read|Write methods work with non-seekable files #96711

adamsitnik · 2024-01-09T19:48:10Z

…nges required)

ghost · 2024-01-09T19:48:21Z

Tagging subscribers to this area: @dotnet/area-system-io
See info in area-owners.md if you want to be subscribed.

Issue Details

Author:	adamsitnik
Assignees:	adamsitnik
Labels:	`area-System.IO`
Milestone:	-

…ection

jozkee · 2024-01-23T01:14:36Z

src/libraries/System.Runtime/tests/System.IO.FileSystem.Tests/RandomAccess/NonSeekable.cs

+                ex = await Assert.ThrowsAsync<TaskCanceledException>(() => RandomAccess.ReadAsync(writeHandle, GenerateVectors(1, 1), 0, token).AsTask());
+                Assert.Equal(token, ex.CancellationToken);
+                ex = await Assert.ThrowsAsync<TaskCanceledException>(() => RandomAccess.WriteAsync(writeHandle, GenerateReadOnlyVectors(1, 1), 0, token).AsTask());
+                Assert.Equal(token, ex.CancellationToken);


Can't you avoid the duplicated calls by doing sth like this?

Task t = RandomAccess.WriteAsync(writeHandle, GenerateReadOnlyVectors(1, 1), 0, token).AsTask(); Assert.True(t.IsCanceled); ex = await Assert.ThrowsAsync<TaskCanceledException>(() => t); Assert.Equal(token, ex.CancellationToken);

jozkee · 2024-01-23T01:26:58Z

src/libraries/System.Runtime/tests/System.IO.FileSystem.Tests/RandomAccess/NonSeekable.cs

+            int bytesRead = 0;
+            do
+            {
+                bytesRead += RandomAccess.Read(readHandle, buffer.Slice(bytesRead), fileOffset: 0); // fileOffset NOT set to bytesRead on purpose


Would it be good to throw or break out of the loop if RandomAccess.Read returns 0 to avoid a potential infinite loop?

jozkee · 2024-01-23T01:33:00Z

src/libraries/System.Runtime/tests/System.IO.FileSystem.Tests/RandomAccess/NonSeekable.cs

+
+                ReadExactly(readHandle, buffer, content.Length); // what is required for the above write to succeed
+
+                Assert.Equal(content, buffer.AsSpan(0, content.Length).ToArray());


I thought we always used AssertExtensions.SequenceEqual for array comparisons.

jozkee · 2024-01-23T01:50:35Z

src/libraries/System.Runtime/tests/System.IO.FileSystem.Tests/RandomAccess/NonSeekable.cs

+                byte[] buffer = new byte[content.Length * 2];
+                int readFromOffset456 = RandomAccess.Read(readHandle, buffer, fileOffset: 456);
+
+                Assert.InRange(readFromOffset456, 1, content.Length);


Is the InRange used because it is not right to assume that Read will fill the whole buffer?

jozkee · 2024-01-23T01:54:08Z

src/libraries/System.Runtime/tests/System.IO.FileSystem.Tests/RandomAccess/NonSeekable.cs

+            {
+                byte[] content = RandomNumberGenerator.GetBytes(BufferSize);
+                Task writeToOffset123 = RandomAccess.WriteAsync(writeHandle, content, fileOffset: 123).AsTask();
+                byte[] buffer = new byte[content.Length * 2];


Why is read buffer's size double the size of content?

jozkee · 2024-01-23T01:57:07Z

src/libraries/System.Runtime/tests/System.IO.FileSystem.Tests/RandomAccess/NonSeekable.cs

+                readOnlyVectors.SelectMany(vector => vector.ToArray()).Take(byteCount),
+                writableVectors.SelectMany(vector => vector.ToArray()).Take(byteCount));
+    }
+}


I think you are missing tests for Get/SetLength.

danmoseley · 2024-01-23T03:12:14Z

src/libraries/System.Private.CoreLib/src/System/IO/RandomAccess.Unix.cs

+        {
+            // We need to fallback to the non-offset version for certain file types
+            // e.g: character devices (such as /dev/tty), pipes, and sockets.
+            Interop.ErrorInfo errorInfo = Interop.Sys.GetLastErrorInfo();


Would it be easy for a future refactor to introduce something that resets last error before this is called? Maybe last error should be passed in so it's visible.

tmds · 2024-01-23T06:10:45Z

I assume this PR is about bringing the capability to pass multiple buffers to a read/write with a non-seekable handle?

I think it would make sense to add new methods that don't take an offset argument.

Rather than try to handle it for the user, I think we can expect the user to not use non -seekable handles with methods that take an offset. We don't need to try to detect and handle it.

@adamsitnik what do you think?

adamsitnik · 2024-01-23T18:18:01Z

I assume this PR is about bringing the capability to pass multiple buffers to a read/write with a non-seekable handle?

The multiple buffers is a secondary goal, the main one is to allow users to read/write to/from non-seekable files by using only SafeFileHandle.

I think it would make sense to add new methods that don't take an offset argument.

Yes and no. The main blocker to me is... the name of the type: RandomAccess, as what you describe is more of a SequentialAccess. When we were introduced this API we have discussed other names, the main problem was that all best names were already taken: File, FileAccess. Adding them to SafeFileHandle was not received well.

So we have 3 options right now:

Relax the requirements for RandomAccess (this PR) and stop throwing for non-seekable files.
Introduce non-offset based to RandomAccess type
Introduce a new type, SequentialAccess and move the new methods there.

@stephentoub @jozkee what are your preferences here?

stephentoub · 2024-01-23T18:52:32Z

@stephentoub @jozkee what are your preferences here?

I still prefer (1).

We already found it valuable to be able to do that internally, which is why the single buffer methods already support this for internal use.

tmds · 2024-01-23T20:00:19Z

We already found it valuable to be able to do that internally, which is why the single buffer methods already support this for internal use.

I think the FileStream case is a special once since it is meant to work with both seekable and non-seekable handles.

The typical user of this API will have a handle that he means to use in a seekable way or not. I prefer (2) because if he does mean to use it in a seekable way, that is clear from the API he is using.

What I dislike about (1) is that the API takes an offset parameter and if the handle turns out to be non-seekable, it is just ignored.

jozkee · 2024-01-23T20:26:46Z

What I dislike about (1) is that the API takes an offset parameter and if the handle turns out to be non-seekable, it is just ignored.

Could a middle-ground be to throw if offset is != 0 on non-seekable handles?

tmds · 2024-01-23T22:04:55Z

src/libraries/System.Runtime/tests/System.IO.FileSystem.Tests/RandomAccess/NonSeekable.cs

+            await Task.WhenAll(server.WaitForConnectionAsync(), client.ConnectAsync());
+
+            bool isAsync = (PipeOptions & PipeOptions.Asynchronous) != 0;
+            return (GetFileHandle(server, isAsync), GetFileHandle(client, isAsync));


On Unix, the handles are set to non-blocking when they are used in an async operation.
Because these handles weren't used yet, the handles are still blocking when PipeOptions.Asynchronous is set in RandomAccess_NonSeekable_AsyncHandles.

If the handles were non-blocking, some of the reads/writes should fail with EAGAIN.

When making the handles non-blocking by adding:

if (isAsync) { await client.WriteAsync(new byte[1]); await server.ReadAsync(new byte[1]); }

6 of the RandomAccess_NonSeekable_AsyncHandles tests fail with:

Error Message: System.IO.IOException : The process cannot access the file because it is being used by another process.

This is the EAGAIN that is surfacing.

This is the EAGAIN that is surfacing.

Great catch! In such case, we should most likely do a bigger refactor and use the epoll code path here

We had some discussions in the past of making the epoll path more generic and usable elsewhere.
So far we've avoided the effort and used it as-is (like in #44647).

stephentoub · 2024-01-23T22:35:25Z

What I dislike about (1) is that the API takes an offset parameter and if the handle turns out to be non-seekable, it is just ignored.

Could a middle-ground be to throw if offset is != 0 on non-seekable handles?

That's reasonable.

adamsitnik · 2024-01-24T16:43:05Z

Could a middle-ground be to throw if offset is != 0 on non-seekable handles?

The problem is that sometimes you don't expect that given file is non-seekable (you get a path and you don't know what it is exactly) and following code could fail:

int index = 0;
int count = requestedSize;
byte[] bytes = new byte[count];
while (count > 0)
{
    int n = RandomAccess.Read(sfh, bytes.AsSpan(index, count), index);
    
    index += n;
    count -= n;
}

tmds · 2024-01-24T16:58:12Z

the main one is to allow users to read/write to/from non-seekable files by using only SafeFileHandle.

Perhaps the situation is different on Windows, but in contrast to using RandomAccess instead of FileStream, it's less obvious for me what the expected gains are for example when using it with a socket instead of using the Socket class directly.

stephentoub · 2024-06-22T20:13:31Z

the main one is to allow users to read/write to/from non-seekable files by using only SafeFileHandle.

Perhaps the situation is different on Windows, but in contrast to using RandomAccess instead of FileStream, it's less obvious for me what the expected gains are for example when using it with a socket instead of using the Socket class directly.

From my perspective:

It's not obvious you can use Socket with things other than sockets.
You shouldn't need to construct a class just to perform an operation on the SafeHandle; these static APIs will let you do that.
If you don't care about offset, then it shouldn't matter whether it's ignored or not.
You shouldn't need to know the seekable-ness of the file descriptor to know whether you can use these APIs.
If we were to add new methods without offsets, I would exect them to behave the same as if a default 0 offset were supplied, which means the existing ones shouldn't fail for such cases, which means we also wouldn't need the new overloads (for usability we could make the existing parameters optional).

tmds · 2024-06-22T21:39:51Z

It's not obvious you can use Socket with things other than sockets.

I didn't mean to suggest that Socket should be used for everything. If you are dealing with a pipe, you can use an AnonymousPipeClientStream.

You shouldn't need to construct a class just to perform an operation on the SafeHandle; these static APIs will let you do that.

Yes, these APIs avoid a user having to create a class. My comment was meant to challenge if this is much of a problem.

If you don't care about offset, then it shouldn't matter whether it's ignored or not.

Non-seekable handles are limited compared to seekable handles. I think it would be good if the API shows that it will work with non-seekable handles. We're blurring this distinction by making users pass an offset which they should expect to be ignored.

stephentoub · 2024-07-22T13:24:59Z

@adamsitnik, what are you planning to do with this PR? Do we want it in .NET 9?

adamsitnik · 2024-07-22T15:57:34Z

what are you planning to do with this PR? Do we want it in .NET 9?

I wanted to have it merged for 9, but as @tmds has pointed in #96711 (comment) the current implementation is not going to work for async handles on Linux. Making it work is possible, but would require a lot of work (mostly refactoring the epoll-related code and re-using it here). On the other hand, we could just document this limitation as currently not supported and call it a day.

@stephentoub what is your opinion?

stephentoub · 2024-07-23T16:33:43Z

what are you planning to do with this PR? Do we want it in .NET 9?

I wanted to have it merged for 9, but as @tmds has pointed in #96711 (comment) the current implementation is not going to work for async handles on Linux. Making it work is possible, but would require a lot of work (mostly refactoring the epoll-related code and re-using it here). On the other hand, we could just document this limitation as currently not supported and call it a day.

@stephentoub what is your opinion?

Today the methods don't work with any non-seekable file descriptors. With this change, as far as I can tell we're making the existing methods work with more file descriptors and without any take backs. Even if there are still more beyond that we could support in the future, this seems to me like moving in the right direction.

Up to you what you want to do with it.

stephentoub · 2025-05-28T18:12:18Z

@adamsitnik, what do you want to do with this? We should probably either merge or close.

adamsitnik added 3 commits January 9, 2024 17:16

add tests

75360d5

the actual change to public API, tests are passing on Windows (no cha…

ddc88b6

…nges required)

Unix implementation

ace2855

adamsitnik self-assigned this Jan 9, 2024

ghost added the area-System.IO label Jan 9, 2024

adamsitnik changed the title ~~Relax random access~~ Make RandomAccess.Read*|Write* methods work with non-seekable files Jan 9, 2024

This was referenced Jan 9, 2024

Tracking issue for CI build timeouts #76454

Closed

pip fails to build wheels for cffi (gcc not found) in ubuntu.1804.armarch.open using image alpine-3.15-helix-arm32v7 dotnet/dnceng#1423

Closed

fix the tests by accounting for partial writes and reads, remove refl…

7536707

…ection

adamsitnik force-pushed the relaxRandomAccess branch from f5edfd5 to 7536707 Compare January 10, 2024 10:59

adamsitnik marked this pull request as ready for review January 10, 2024 12:26

adamsitnik requested review from stephentoub and jozkee January 10, 2024 12:26

adamsitnik added this to the 9.0.0 milestone Jan 10, 2024

build-analysis bot mentioned this pull request Jan 10, 2024

Tracking issue for "WORKLOAD TIMED OUT" #90309

Closed

jozkee reviewed Jan 23, 2024

View reviewed changes

danmoseley reviewed Jan 23, 2024

View reviewed changes

tmds reviewed Jan 23, 2024

View reviewed changes

alexrp mentioned this pull request Jan 27, 2024

Switch to FileStream/RandomAccess for I/O if feasible (.NET 9+) vezel-dev/cathode#76

Open

adamsitnik modified the milestones: 9.0.0, 10.0.0 Jul 31, 2024

adamsitnik added the blocked Issue/PR is blocked on something - see comments label Jul 31, 2024


		ReadExactly(readHandle, buffer, content.Length); // what is required for the above write to succeed

		Assert.Equal(content, buffer.AsSpan(0, content.Length).ToArray());

Make RandomAccess.Read*|Write* methods work with non-seekable files #96711

Are you sure you want to change the base?

Make RandomAccess.Read*|Write* methods work with non-seekable files #96711

Conversation

adamsitnik commented Jan 9, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

ghost commented Jan 9, 2024

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

tmds commented Jan 23, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

adamsitnik commented Jan 23, 2024

Uh oh!

stephentoub commented Jan 23, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

tmds commented Jan 23, 2024

Uh oh!

jozkee commented Jan 23, 2024

Uh oh!

tmds Jan 23, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

stephentoub commented Jan 23, 2024

Uh oh!

adamsitnik commented Jan 24, 2024

Uh oh!

tmds commented Jan 24, 2024

Uh oh!

stephentoub commented Jun 22, 2024

Uh oh!

tmds commented Jun 22, 2024

Uh oh!

stephentoub commented Jul 22, 2024

Uh oh!

adamsitnik commented Jul 22, 2024

Uh oh!

stephentoub commented Jul 23, 2024

Uh oh!

stephentoub commented May 28, 2025

Uh oh!

Uh oh!

Make RandomAccess.Read|Write methods work with non-seekable files #96711

Make RandomAccess.Read|Write methods work with non-seekable files #96711

adamsitnik commented Jan 9, 2024 •

edited

Loading

tmds commented Jan 23, 2024 •

edited

Loading

stephentoub commented Jan 23, 2024 •

edited

Loading

tmds Jan 23, 2024 •

edited

Loading