[history server] Improve `api/v0/logs/file` by machichima · Pull Request #4456 · ray-project/kuberay

machichima · 2026-01-29T11:27:23Z

Why are these changes needed?

This PR implements the TODOs mentioned in #4387 (comment):

Add more test cases for api/v0/logs/file and collect the test cases in a for loop to improve readability.
- For the live cluster, collect existing test cases into a for loop and add new test cases covering the timeout, attempt_number, download_filename, and filter_ansi_code parameters
- For the dead cluster, collect existing test cases into a for loop
Add implementations of attempt_number, download_filename, and filter_ansi_code for the dead cluster
- Note: timeout requires refactoring all StorageReader implementations to add context support, so it is skipped here and left for a follow-up
Add support for node_ip, actor_id, task_id, pid, and suffix
- Note: submission_id cannot be implemented yet because DriverJobDefinitionEvent lacks a submission_id field, so the submission_id -> driver_node_id mapping is unavailable

When node_id or node_ip is given, either pid or filename must also be provided. Otherwise, the user can provide actor_id or task_id directly.
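The parameter rule above can be sketched as a small validation function. This is only an illustration: `LogFileOptions` and `validate` are hypothetical names that mirror the query parameters, not the types used in router.go or types.go.

```go
package main

import (
	"errors"
	"fmt"
)

// LogFileOptions is a hypothetical stand-in for the query options of
// api/v0/logs/file; the fields only mirror the query parameter names.
type LogFileOptions struct {
	NodeID   string
	NodeIP   string
	Filename string
	PID      string
	ActorID  string
	TaskID   string
}

// validate applies the rule from the PR description: actor_id or task_id is
// sufficient by itself; otherwise a node (node_id or node_ip) must be
// combined with either pid or filename.
func validate(o LogFileOptions) error {
	if o.ActorID != "" || o.TaskID != "" {
		return nil
	}
	if o.NodeID == "" && o.NodeIP == "" {
		return errors.New("one of node_id, node_ip, actor_id, or task_id is required")
	}
	if o.PID == "" && o.Filename == "" {
		return errors.New("pid or filename is required with node_id or node_ip")
	}
	return nil
}

func main() {
	fmt.Println(validate(LogFileOptions{NodeID: "abc", Filename: "raylet.out"}))
	fmt.Println(validate(LogFileOptions{NodeIP: "10.0.0.1"}))
	fmt.Println(validate(LogFileOptions{TaskID: "t1"}))
}
```

The actual handler may order these checks differently or return different error messages; the sketch only captures the combinations described above.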

Related issue number

Related to #4387

Checks

I've made sure the tests are passing.
Testing Strategy
- Unit tests
- Manual tests
- This PR is not tested :(

Signed-off-by: machichima <nary12321@gmail.com>


machichima · 2026-02-01T09:15:59Z

I tested the parameters manually (excluding logs/stream) with the following commands on both a live and a dead cluster. E2E tests were also added.

0. List all available log files

Test if /nodes endpoint works

NODE_ID=$(curl -s -b ~/cookies.txt "http://localhost:8080/nodes?view=summary" | jq -r '.data.summary[0].raylet.nodeId')
curl -b ~/cookies.txt "http://localhost:8080/api/v0/logs?node_id=${NODE_ID}"

1. Use filename (specify the file directly)

NODE_ID=$(curl -s -b ~/cookies.txt "http://localhost:8080/nodes?view=summary" | jq -r '.data.summary[0].raylet.nodeId')
curl -b ~/cookies.txt "http://localhost:8080/api/v0/logs/file?node_id=${NODE_ID}&filename=raylet.out&lines=100"

2. Use node_ip (alternative to node_id)

Does not work for a dead cluster: the /nodes endpoint always returns UNKNOWN for the ip field.

kuberay/historyserver/pkg/historyserver/reader.go

Line 434 in e562958

"ip": "UNKNOWN",

NODE_IP=$(curl -s -b ~/cookies.txt "http://localhost:8080/nodes?view=summary" | jq -r '.data.summary[0].ip')
curl -b ~/cookies.txt "http://localhost:8080/api/v0/logs/file?node_ip=${NODE_IP}&filename=raylet.out"

3. Use pid

Obtain the pid from an existing worker log.

Does not work for a dead cluster, as there is no way to get the pid there.

kuberay/historyserver/test/e2e/historyserver_test.go

Lines 718 to 725 in e562958

    
	// Sub-test for pid parameter (dead cluster)
	// NOTE: This test is skipped because Ray export events don't include worker_pid.
	// See: https://github.com/ray-project/ray/issues/60129
	// Worker lifecycle events are not yet exported, so we cannot obtain worker PIDs
	// from historical data for dead clusters.
	test.T().Run("pid parameter", func(t *testing.T) {
		t.Skip("Skipping pid parameter test for dead cluster: worker_pid not available in Ray export events (see https://github.com/ray-project/ray/issues/60129)")
	})

NODE_ID=$(curl -s -b ~/cookies.txt "http://localhost:8080/nodes?view=summary" | jq -r '.data.summary[0].raylet.nodeId')
PID=$(curl -s -b ~/cookies.txt "http://localhost:8080/api/v0/logs?node_id=${NODE_ID}" | jq -r '.data.result.core_worker[0]' | sed -E 's/.*_([0-9]+)\.log/\1/')
curl -b ~/cookies.txt "http://localhost:8080/api/v0/logs/file?node_id=${NODE_ID}&pid=${PID}&suffix=out"
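The sed expression in the commands above takes the digits between the last underscore and the `.log` extension of a worker log name. A rough Go equivalent might look like the following; `pidFromLogName` is a hypothetical helper, and the sample filename in `main` is made up for illustration.

```go
package main

import (
	"fmt"
	"regexp"
)

// pidPattern mirrors the sed expression: digits between an underscore and a
// trailing ".log". Unlike the sed version, it is anchored at the end.
var pidPattern = regexp.MustCompile(`_([0-9]+)\.log$`)

// pidFromLogName returns the pid embedded in a worker log filename, or false
// when the name does not follow the "<prefix>_<pid>.log" shape.
func pidFromLogName(name string) (string, bool) {
	m := pidPattern.FindStringSubmatch(name)
	if m == nil {
		return "", false
	}
	return m[1], true
}

func main() {
	// Illustrative filename only; real names come from /api/v0/logs.
	pid, ok := pidFromLogName("python-core-worker-0a1b2c_4242.log")
	fmt.Println(pid, ok)
}
```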

4. Use actor_id (stdout)

ACTOR_ID=$(curl -s -b ~/cookies.txt "http://localhost:8080/logical/actors" | jq -r '.data.actors | keys[0]')
curl -b ~/cookies.txt "http://localhost:8080/api/v0/logs/file?actor_id=${ACTOR_ID}&suffix=out"

5. Use actor_id (stderr)

ACTOR_ID=$(curl -s -b ~/cookies.txt "http://localhost:8080/logical/actors" | jq -r '.data.actors | keys[0]')
curl -b ~/cookies.txt "http://localhost:8080/api/v0/logs/file?actor_id=${ACTOR_ID}&suffix=err"

6. Use task_id

Choose a normal task

TASK_ID=$(curl -s -b ~/cookies.txt "http://localhost:8080/api/v0/tasks" | jq -r '.data.result.result[] | select(.type == "NORMAL_TASK") | .task_id' | head -1)
curl -b ~/cookies.txt "http://localhost:8080/api/v0/logs/file?task_id=${TASK_ID}&suffix=out"

7. Use task_id and specify attempt_number

Choose a normal task

TASK_ID=$(curl -s -b ~/cookies.txt "http://localhost:8080/api/v0/tasks" | jq -r '.data.result.result[] | select(.type == "NORMAL_TASK") | .task_id' | head -1)
curl -b ~/cookies.txt "http://localhost:8080/api/v0/logs/file?task_id=${TASK_ID}&suffix=out&attempt_number=0"

8. Use the download_filename parameter to download the file

You will see a new file named newfile.txt in your directory

NODE_ID=$(curl -s -b ~/cookies.txt "http://localhost:8080/nodes?view=summary" | jq -r '.data.summary[0].raylet.nodeId')
curl -b ~/cookies.txt "http://localhost:8080/api/v0/logs/file?node_id=${NODE_ID}&filename=raylet.out&download_filename=newfile.txt" -J -O

9. Use filter_ansi_code parameter to filter ANSI codes

NODE_ID=$(curl -s -b ~/cookies.txt "http://localhost:8080/nodes?view=summary" | jq -r '.data.summary[0].raylet.nodeId')
curl -b ~/cookies.txt "http://localhost:8080/api/v0/logs/file?node_id=${NODE_ID}&filename=raylet.out&filter_ansi_code=true"
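For readers unfamiliar with ANSI filtering, a minimal sketch of what filter_ansi_code=true could do is shown here. The regular expression covers only common CSI sequences such as colors and is an assumption, not the pattern used by the history server.

```go
package main

import (
	"fmt"
	"regexp"
)

// ansiPattern matches CSI escape sequences such as "\x1b[31m" or "\x1b[0m".
// It is an illustrative pattern and does not cover every ANSI escape form.
var ansiPattern = regexp.MustCompile(`\x1b\[[0-9;]*[A-Za-z]`)

// stripANSI removes the matched escape sequences from a log line.
func stripANSI(s string) string {
	return ansiPattern.ReplaceAllString(s, "")
}

func main() {
	fmt.Println(stripANSI("\x1b[31mERROR\x1b[0m done")) // prints "ERROR done"
}
```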

10. Combination test (actor_id + attempt_number + all params)

ACTOR_ID=$(curl -s -b ~/cookies.txt "http://localhost:8080/logical/actors" | jq -r '.data.actors | keys[0]')
curl -b ~/cookies.txt "http://localhost:8080/api/v0/logs/file?actor_id=${ACTOR_ID}&attempt_number=0&suffix=out&lines=1000&filter_ansi_code=true"

11. Combination test (task_id + attempt_number + all params)

Choose a normal task

TASK_ID=$(curl -s -b ~/cookies.txt "http://localhost:8080/api/v0/tasks" | jq -r '.data.result.result[] | select(.type == "NORMAL_TASK") | .task_id' | head -1)
curl -b ~/cookies.txt "http://localhost:8080/api/v0/logs/file?task_id=${TASK_ID}&attempt_number=0&suffix=out&lines=1000&filter_ansi_code=true"

13. Test pid with node_ip

Obtain the pid from an existing worker log.

Does not work for a dead cluster, as there is no way to get the pid there.

kuberay/historyserver/test/e2e/historyserver_test.go

Lines 718 to 725 in e562958

    
	// Sub-test for pid parameter (dead cluster)
	// NOTE: This test is skipped because Ray export events don't include worker_pid.
	// See: https://github.com/ray-project/ray/issues/60129
	// Worker lifecycle events are not yet exported, so we cannot obtain worker PIDs
	// from historical data for dead clusters.
	test.T().Run("pid parameter", func(t *testing.T) {
		t.Skip("Skipping pid parameter test for dead cluster: worker_pid not available in Ray export events (see https://github.com/ray-project/ray/issues/60129)")
	})

NODE_IP=$(curl -s -b ~/cookies.txt "http://localhost:8080/nodes?view=summary" | jq -r '.data.summary[0].ip')
NODE_ID=$(curl -s -b ~/cookies.txt "http://localhost:8080/nodes?view=summary" | jq -r '.data.summary[0].raylet.nodeId')
PID=$(curl -s -b ~/cookies.txt "http://localhost:8080/api/v0/logs?node_id=${NODE_ID}" | jq -r '.data.result.core_worker[0]' | sed -E 's/.*_([0-9]+)\.log/\1/')
curl -b ~/cookies.txt "http://localhost:8080/api/v0/logs/file?node_ip=${NODE_IP}&pid=${PID}&suffix=out"

14. Test actor_id without node_id or node_ip (the node is resolved automatically from the actor)

ACTOR_ID=$(curl -s -b ~/cookies.txt "http://localhost:8080/logical/actors" | jq -r '.data.actors | keys[0]')
curl -b ~/cookies.txt "http://localhost:8080/api/v0/logs/file?actor_id=${ACTOR_ID}&suffix=out&lines=200"

15. Test task_id without node_id or node_ip (the node is resolved automatically from the task)

Choose a normal task

TASK_ID=$(curl -s -b ~/cookies.txt "http://localhost:8080/api/v0/tasks" | jq -r '.data.result.result[] | select(.type == "NORMAL_TASK") | .task_id' | head -1)
curl -b ~/cookies.txt "http://localhost:8080/api/v0/logs/file?task_id=${TASK_ID}&suffix=out&lines=300"


machichima · 2026-02-01T12:22:49Z

Test logs/stream with the following command:

❯ NODE_ID=$(curl -s -b ~/cookies.txt "http://localhost:8080/nodes?view=summary" | jq -r '.data.summary[0].raylet.nodeId')
curl -b ~/cookies.txt "http://localhost:8080/api/v0/logs/stream?node_id=${NODE_ID}&filename=raylet.out&filter_ansi_code=true&lines=20"

[state-dump]    RaySyncer.BroadcastMessage - 2 total (0 active), Execution time: mean = 0.03ms, total = 0.07ms, Queueing time: mean = 0.00ms, max = 0.00ms, min = 0.00ms, total = 0.00ms
[state-dump]    RaySyncerRegister - 2 total (0 active), Execution time: mean = 0.00ms, total = 0.00ms, Queueing time: mean = 0.00ms, max = 0.00ms, min = 0.00ms, total = 0.00ms
[state-dump]    ReporterService.grpc_client.HealthCheck - 2 total (0 active), Execution time: mean = 0.51ms, total = 1.02ms, Queueing time: mean = 0.00ms, max = -0.00ms, min = 9223372036854.78ms, total = 0.00ms
[state-dump]     - 2 total (0 active), Execution time: mean = 0.03ms, total = 0.07ms, Queueing time: mean = 0.02ms, max = 0.02ms, min = 0.01ms, total = 0.04ms
[state-dump]    NodeManager.GCTaskFailureReason - 2 total (1 active), Execution time: mean = 3.40ms, total = 6.80ms, Queueing time: mean = 0.02ms, max = 0.04ms, min = 0.04ms, total = 0.04ms
[state-dump]    ReporterService.grpc_client.HealthCheck.OnReplyReceived - 2 total (0 active), Execution time: mean = 0.06ms, total = 0.11ms, Queueing time: mean = 0.43ms, max = 0.84ms, min = 0.02ms, total = 0.85ms
[state-dump]    ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch - 2 total (0 active), Execution time: mean = 0.60ms, total = 1.20ms, Queueing time: mean = 0.00ms, max = -0.00ms, min = 9223372036854.78ms, total = 0.00ms
[state-dump]    MetricsAgentClient.WaitForServerReadyWithRetry - 1 total (0 active), Execution time: mean = 0.17ms, total = 0.17ms, Queueing time: mean = 1000.67ms, max = 1000.67ms, min = 1000.67ms, total = 1000.67ms
[state-dump]    ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 0.01ms, total = 0.01ms, Queueing time: mean = 0.03ms, max = 0.03ms, min = 0.03ms, total = 0.03ms
[state-dump]    ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode.OnReplyReceived - 1 total (0 active), Execution time: mean = 0.11ms, total = 0.11ms, Queueing time: mean = 0.02ms, max = 0.02ms, min = 0.02ms, total = 0.02ms
[state-dump]    Subscriber.HandlePublishedMessage_GCS_NODE_ADDRESS_AND_LIVENESS_CHANNEL - 1 total (0 active), Execution time: mean = 0.05ms, total = 0.05ms, Queueing time: mean = 0.44ms, max = 0.44ms, min = 0.44ms, total = 0.44ms
[state-dump]    ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeAddressAndLiveness.OnReplyReceived - 1 total (0 active), Execution time: mean = 0.06ms, total = 0.06ms, Queueing time: mean = 0.02ms, max = 0.02ms, min = 0.02ms, total = 0.02ms
[state-dump]    ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode - 1 total (0 active), Execution time: mean = 1.12ms, total = 1.12ms, Queueing time: mean = 0.00ms, max = -0.00ms, min = 9223372036854.78ms, total = 0.00ms
[state-dump]    ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeAddressAndLiveness - 1 total (0 active), Execution time: mean = 1.74ms, total = 1.74ms, Queueing time: mean = 0.00ms, max = -0.00ms, min = 9223372036854.78ms, total = 0.00ms
[state-dump]    ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig.OnReplyReceived - 1 total (0 active), Execution time: mean = 8.97ms, total = 8.97ms, Queueing time: mean = 0.01ms, max = 0.01ms, min = 0.01ms, total = 0.01ms
[state-dump]    ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo - 1 total (0 active), Execution time: mean = 0.21ms, total = 0.21ms, Queueing time: mean = 0.00ms, max = -0.00ms, min = 9223372036854.78ms, total = 0.00ms
[state-dump]    ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig - 1 total (0 active), Execution time: mean = 0.32ms, total = 0.32ms, Queueing time: mean = 0.00ms, max = -0.00ms, min = 9223372036854.78ms, total = 0.00ms
[state-dump] DebugString() time ms: 0
[state-dump]

JiangJiaWei1103 · 2026-02-02T02:29:36Z

historyserver/test/e2e/historyserver_test.go

+			if resp.StatusCode != tc.expectedStatus {
+				LogWithTimestamp(t, "Test case '%s' failed: expected %d, got %d, body: %s",
+					tc.name, tc.expectedStatus, resp.StatusCode, string(body))
+			}
+
+			g.Expect(resp.StatusCode).To(Equal(tc.expectedStatus),
+				"Test case '%s' failed: expected %d, got %d", tc.name, tc.expectedStatus, resp.StatusCode)


Suggested change

-			if resp.StatusCode != tc.expectedStatus {
-				LogWithTimestamp(t, "Test case '%s' failed: expected %d, got %d, body: %s",
-					tc.name, tc.expectedStatus, resp.StatusCode, string(body))
-			}
-
-			g.Expect(resp.StatusCode).To(Equal(tc.expectedStatus),
-				"Test case '%s' failed: expected %d, got %d", tc.name, tc.expectedStatus, resp.StatusCode)
+			g.Expect(resp.StatusCode).To(Equal(tc.expectedStatus),
+				"Test case '%s' failed: expected %d, got %d", tc.name, tc.expectedStatus, resp.StatusCode)

Thanks Nary! This seems redundant.

Thank you for pointing this out! I would prefer keeping

if resp.StatusCode != tc.expectedStatus {
	LogWithTimestamp(t, "Test case '%s' failed: expected %d, got %d, body: %s", tc.name, tc.expectedStatus, resp.StatusCode, string(body))
}

and removing the one below. That way, when the status code mismatches, the body is printed so we can see which error we are getting.

Done in 200d56c
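The pattern discussed in this thread, a loop over test cases that reports the response body on a status mismatch, can be sketched without the e2e harness as follows. `statusCase`, `checkCases`, and the stub handler are hypothetical; the real tests use LogWithTimestamp and Gomega against a running cluster.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
)

// statusCase is a hypothetical table entry: a query string and the status
// code the endpoint is expected to return for it.
type statusCase struct {
	name           string
	query          string
	expectedStatus int
}

// checkCases runs every case against the handler and returns one message per
// mismatch. The body is included so the underlying error is visible.
func checkCases(h http.Handler, cases []statusCase) []string {
	var failures []string
	for _, tc := range cases {
		req := httptest.NewRequest(http.MethodGet, "/api/v0/logs/file?"+tc.query, nil)
		rec := httptest.NewRecorder()
		h.ServeHTTP(rec, req)
		if rec.Code != tc.expectedStatus {
			failures = append(failures, fmt.Sprintf(
				"Test case '%s' failed: expected %d, got %d, body: %s",
				tc.name, tc.expectedStatus, rec.Code, rec.Body.String()))
		}
	}
	return failures
}

// stubHandler stands in for the real endpoint: it rejects requests that lack
// a filename parameter.
var stubHandler = http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
	if r.URL.Query().Get("filename") == "" {
		http.Error(w, "filename is required", http.StatusBadRequest)
		return
	}
	fmt.Fprint(w, "log line")
})

func main() {
	cases := []statusCase{
		{"valid filename", "filename=raylet.out", http.StatusOK},
		{"missing filename", "lines=10", http.StatusOK}, // deliberately wrong expectation
	}
	for _, f := range checkCases(stubHandler, cases) {
		fmt.Print(f)
	}
}
```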


AndySung320 · 2026-02-02T20:45:58Z

historyserver/pkg/historyserver/reader.go

+// 1. Already hex format - returns as-is
+// 2. Base64-encoded - decodes to hex
+// It tries RawURLEncoding first (Ray's default), falling back to StdEncoding if that fails.
+func decodeBase64ToHex(id string) (string, error) {


I noticed that we currently have multiple places dealing with Base64 → hex ID conversion:

reader.go defines decodeBase64ToHex(...)

historyserver/pkg/utils/utils.go already has a similar helper

router.go also has a TODO noting that Ray Base Event IDs are Base64-encoded, while other APIs use hex, and that Base64 IDs can break URL routing

Among these, the implementation in reader.go is actually more robust.
Would it make sense to centralize this logic in pkg/utils and reuse it from reader.go and router.go or some other component?

Makes sense! I updated ConvertBase64ToHex in utils.go with the more robust logic and now use it in reader.go: d9b7dba
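A minimal sketch of the conversion discussed in this thread is given below, assuming the behavior described in the code comment (hex passes through unchanged, Base64 is tried with RawURLEncoding first and StdEncoding second). `toHexID` is a hypothetical name, and the case-insensitive hex check is an assumption rather than a copy of ConvertBase64ToHex.

```go
package main

import (
	"encoding/base64"
	"encoding/hex"
	"fmt"
	"regexp"
)

// hexPattern treats any string made only of hex digits as already converted.
var hexPattern = regexp.MustCompile(`^[0-9a-fA-F]+$`)

// toHexID returns id unchanged when it already looks like hex; otherwise it
// tries URL-safe unpadded Base64 and then standard padded Base64.
func toHexID(id string) (string, error) {
	if hexPattern.MatchString(id) {
		return id, nil
	}
	for _, enc := range []*base64.Encoding{base64.RawURLEncoding, base64.StdEncoding} {
		if b, err := enc.DecodeString(id); err == nil {
			return hex.EncodeToString(b), nil
		}
	}
	return "", fmt.Errorf("id %q is neither hex nor Base64", id)
}

func main() {
	fmt.Println(toHexID("abc123"))
	fmt.Println(toHexID("AQID"))
	fmt.Println(toHexID("YL1IU+OndDKZT2iTAgAAAA=="))
}
```

One limitation of any such heuristic: a Base64 string that happens to consist only of hex digits is indistinguishable from hex and is returned as-is.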


cursor

Cursor Bugbot has reviewed your changes and found 2 potential issues.

cursor · 2026-02-03T12:05:10Z

historyserver/pkg/historyserver/router.go

+	})
+
+	if disposition == "" {
+		logrus.Errorf("Failed to format Content-Disposition header for filename %q: %v", downloadFilename, err)


Error variable is nil when logged

Low Severity

The error log at line 775 references err from the earlier _getNodeLogFile call, but at this point err is guaranteed to be nil (the function returns early if there was an error). The mime.FormatMediaType function doesn't return an error - it returns an empty string on failure. This results in a misleading log message showing <nil> as the error value.

Fixed in 145ab04
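To illustrate the point about `mime.FormatMediaType`, here is a small sketch that builds a Content-Disposition value and treats the empty string as the failure signal. `contentDisposition` is a hypothetical helper, not the code in router.go.

```go
package main

import (
	"fmt"
	"mime"
)

// contentDisposition formats an attachment header value for filename.
// mime.FormatMediaType has no error return; it reports failure by returning
// an empty string, so that is the only condition checked here.
func contentDisposition(filename string) (string, error) {
	v := mime.FormatMediaType("attachment", map[string]string{"filename": filename})
	if v == "" {
		return "", fmt.Errorf("cannot format Content-Disposition for filename %q", filename)
	}
	return v, nil
}

func main() {
	v, err := contentDisposition("newfile.txt")
	fmt.Println(v, err)
}
```

Because the failure is built into a fresh error inside the helper, there is no stale `err` variable left to log by mistake.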

cursor · 2026-02-03T12:05:10Z

historyserver/pkg/utils/utils.go

+func ConvertBase64ToHex(id string) (string, error) {
+	// Check if already hex (only [0-9a-f])
+	if matched, _ := regexp.MatchString("^[0-9a-f]+$", id); matched {
+		return id, nil


Hex detection regex ignores uppercase characters

Low Severity

The regex ^[0-9a-f]+$ only matches lowercase hex characters. If an ID arrives as uppercase hex (e.g., "ABC123DEF"), it won't match and the function attempts Base64 decoding. Since uppercase letters are valid Base64 characters, the decode may "succeed" but produce completely wrong byte data. The hex string "ABCD" represents bytes 0xAB 0xCD, but Base64-decoding "ABCD" produces entirely different bytes.

Fixed in 145ab04
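The mismatch described in this finding is easy to reproduce. The sketch below decodes the same four characters once as hex and once as Base64; the helper names are hypothetical.

```go
package main

import (
	"encoding/base64"
	"encoding/hex"
	"fmt"
)

// hexBytes decodes s as hex and returns the bytes in lowercase hex notation.
func hexBytes(s string) string {
	b, err := hex.DecodeString(s)
	if err != nil {
		return ""
	}
	return hex.EncodeToString(b)
}

// base64Bytes decodes s as URL-safe unpadded Base64 and returns the bytes in
// lowercase hex notation.
func base64Bytes(s string) string {
	b, err := base64.RawURLEncoding.DecodeString(s)
	if err != nil {
		return ""
	}
	return hex.EncodeToString(b)
}

func main() {
	fmt.Println(hexBytes("ABCD"))    // prints "abcd"
	fmt.Println(base64Bytes("ABCD")) // prints "001083"
}
```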


machichima · 2026-02-03T14:01:03Z

I think right now the following two tests are flaky:

TestHistoryServer//v0/logs/file_endpoint_(dead_cluster)/task_id_parameter
TestHistoryServer//v0/logs/file_endpoint_(dead_cluster)/actor_id_parameter

Both rely on /api/v0/tasks and /logical/actors to get the available task and actor IDs, so we need to look into those two endpoints to find out why they sometimes return 0 task or actor IDs.

Should we remove or comment out those tests for now and open a follow-up issue to solve this?

AndySung320

One thing to note: req.QueryParameter("actor_id") returns a URL-decoded value.

If actor_id contains standard Base64 characters such as + (or /, =), + will be decoded into a space when parsing the query string. This makes the original Base64 unrecoverable.

We can already see this in the test logs:
YL1IU+OndDKZT2iTAgAAAA== becomes YL1IU OndDKZT2iTAgAAAA==, which then causes the lookup to fail.
Maybe we should avoid passing standard Base64 IDs directly via query parameters, or ensure they are URL-encoded / normalized (e.g. to hex) earlier.
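The decoding behavior described above can be reproduced with the standard library alone. This sketch uses `net/url` rather than the router's `req.QueryParameter`, on the assumption that both follow the usual query-string rules; the helper names are hypothetical.

```go
package main

import (
	"fmt"
	"net/url"
)

// parsedActorID returns the actor_id value after standard query-string
// decoding, where an unescaped "+" becomes a space.
func parsedActorID(rawQuery string) string {
	v, err := url.ParseQuery(rawQuery)
	if err != nil {
		return ""
	}
	return v.Get("actor_id")
}

// buildQuery escapes the ID so that "+", "/", and "=" survive the round trip.
func buildQuery(actorID string) string {
	return "actor_id=" + url.QueryEscape(actorID)
}

func main() {
	id := "YL1IU+OndDKZT2iTAgAAAA=="
	fmt.Printf("%q\n", parsedActorID("actor_id="+id))  // "+" is lost
	fmt.Printf("%q\n", parsedActorID(buildQuery(id))) // original ID is preserved
}
```

Escaping on the client side and normalizing to hex on the server side are complementary; either one alone avoids the failed lookup shown in the test logs.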


machichima · 2026-02-04T12:24:07Z

@AndySung320 Nice catch!! Just updated in 9980d22. I ran the tests 7 times locally and they all passed:

❯ go test -run TestHistoryServer
PASS
ok      github.com/ray-project/kuberay/historyserver/test/e2e   113.564s

~/workData/open-source/kuberay/historyserver/test/e2e improve-logs-file-test *4 !1 ?4                                                                                                                                                  1m 55s
❯ go test -run TestHistoryServer
PASS
ok      github.com/ray-project/kuberay/historyserver/test/e2e   112.837s

~/workData/open-source/kuberay/historyserver/test/e2e improve-logs-file-test *4 !1 ?4                                                                                                                                                  1m 56s
❯ go test -run TestHistoryServer
PASS
ok      github.com/ray-project/kuberay/historyserver/test/e2e   116.831s

~/workData/open-source/kuberay/historyserver/test/e2e improve-logs-file-test *4 !1 ?4                                                                                                                                                   2m 0s
❯ go test -run TestHistoryServer
PASS
ok      github.com/ray-project/kuberay/historyserver/test/e2e   115.684s

~/workData/open-source/kuberay/historyserver/test/e2e improve-logs-file-test *4 !1 ?4                                                                                                                                                  1m 57s
❯ go test -run TestHistoryServer
PASS
ok      github.com/ray-project/kuberay/historyserver/test/e2e   115.424s

~/workData/open-source/kuberay/historyserver/test/e2e improve-logs-file-test *4 !1 ?4                                                                                                                                                  1m 58s
❯ go test -run TestHistoryServer
PASS
ok      github.com/ray-project/kuberay/historyserver/test/e2e   116.670s

~/workData/open-source/kuberay/historyserver/test/e2e improve-logs-file-test *4 !1 ?4                                                                                                                                                   2m 0s
❯ go test -run TestHistoryServer
PASS
ok      github.com/ray-project/kuberay/historyserver/test/e2e   116.547s

machichima added 6 commits January 29, 2026 19:20

test: add more test case for live cluster e2e test

1eb0afb

Signed-off-by: machichima <nary12321@gmail.com>

test: more test case for dead cluster e2e test

64df48d

Signed-off-by: machichima <nary12321@gmail.com>

refactor: clean up comments

4a4f729

Signed-off-by: machichima <nary12321@gmail.com>

feat: move query options to struct&add more options

0d89338

Signed-off-by: machichima <nary12321@gmail.com>

feat: implement attempt_numbe, download_file, filter_ansi_code function

5dc5f12

Signed-off-by: machichima <nary12321@gmail.com>

feat: e2e test for attempt_numbe, download_file, filter_ansi_code

6de7601

timeout only test validation, not testing the behavior Signed-off-by: machichima <nary12321@gmail.com>

machichima changed the title ~~[Test][history server] Improve api/v0/logs/file test~~ [Test][history server] Improve api/v0/logs/file Jan 30, 2026

cursor bot reviewed Jan 30, 2026


cursor bot reviewed Jan 30, 2026


machichima added 4 commits January 30, 2026 21:52

fix: update live cluster invalide param status code

8d11aff

Signed-off-by: machichima <nary12321@gmail.com>

feat: add id related param&implement task_id+suffix

ff75e12

Signed-off-by: machichima <nary12321@gmail.com>

test: remove eventual & print body when status code mismatch

4f56280

Signed-off-by: machichima <nary12321@gmail.com>

feat: logic to find logs based on worker ID

e01c113

Signed-off-by: machichima <nary12321@gmail.com>

machichima force-pushed the improve-logs-file-test branch from 18fc3cb to e01c113 Compare January 31, 2026 10:12

cursor bot reviewed Jan 31, 2026


machichima added 2 commits January 31, 2026 18:35

test: for suffix and task_id

41e5970

Signed-off-by: machichima <nary12321@gmail.com>

fix: update rayjob.yaml to ensure produce log.out file

6497225

Signed-off-by: machichima <nary12321@gmail.com>

cursor bot reviewed Jan 31, 2026


machichima added 2 commits January 31, 2026 20:16

feat+test: support actor_id query

6b66a30

Signed-off-by: machichima <nary12321@gmail.com>

feat+test: support pid query

d204ef1

Signed-off-by: machichima <nary12321@gmail.com>

cursor bot reviewed Jan 31, 2026


machichima added 2 commits February 1, 2026 09:42

docs: todo comment for submission_id

f739e5a

Signed-off-by: machichima <nary12321@gmail.com>

feat+test: add node_ip support

e562958

Signed-off-by: machichima <nary12321@gmail.com>

cursor bot reviewed Feb 1, 2026


machichima added 3 commits February 1, 2026 16:13

test: move pid invalid test to logFileTestCases

f2f0161

Signed-off-by: machichima <nary12321@gmail.com>

fix: add download_filename rather than download_file flag

e0151db

Signed-off-by: machichima <nary12321@gmail.com>

refactor: Base64 to hex conversion logic to util function

892d41a

Signed-off-by: machichima <nary12321@gmail.com>

cursor bot reviewed Feb 1, 2026


machichima added 2 commits February 1, 2026 16:27

fix: skip convert to hex if already is

d254463

Signed-off-by: machichima <nary12321@gmail.com>

fix: close reader to prevent connection leak

ccd8243

Signed-off-by: machichima <nary12321@gmail.com>

fix: remove duplicate suffix validation

4bd16da

Signed-off-by: machichima <nary12321@gmail.com>

cursor bot reviewed Feb 1, 2026


refactor: remove not yet implemented comment

f8dd9ee

Signed-off-by: machichima <nary12321@gmail.com>

machichima mentioned this pull request Feb 1, 2026

[history server] Add timeout parameter support for api/v0/logs/file #4471

Open

machichima changed the title ~~[Test][history server] Improve api/v0/logs/file~~ [history server] Improve api/v0/logs/file Feb 1, 2026

feat+test: add logs/stream endpoint

fe49c92

Signed-off-by: machichima <nary12321@gmail.com>

JiangJiaWei1103 reviewed Feb 2, 2026


fix: remove redundant status code check

200d56c

Signed-off-by: machichima <nary12321@gmail.com>

cursor bot reviewed Feb 2, 2026


AndySung320 reviewed Feb 2, 2026


machichima added 3 commits February 3, 2026 18:51

fix: update format of worker log in comment

c41d874

Signed-off-by: machichima <nary12321@gmail.com>

Merge branch 'master' of github.com:ray-project/kuberay into improve-…

dd8a77f

…logs-file-test Signed-off-by: machichima <nary12321@gmail.com>

feat: more robust ConvertBase64ToHex and centralize the logic

d9b7dba

put in utils and use in reader Signed-off-by: machichima <nary12321@gmail.com>

machichima force-pushed the improve-logs-file-test branch from bf3ff58 to d9b7dba Compare February 3, 2026 11:06

cursor bot reviewed Feb 3, 2026


fix: return original id if cannot decode

f1799d1

Signed-off-by: machichima <nary12321@gmail.com>

cursor bot reviewed Feb 3, 2026


machichima added 4 commits February 3, 2026 19:38

fix: escape filename correctly

e848198

Signed-off-by: machichima <nary12321@gmail.com>

fix: add sessionID == "" check in resolveActorLogFilename

4e30f18

Signed-off-by: machichima <nary12321@gmail.com>

fix: use correct cluster name to query task and actor id

1432037

Signed-off-by: machichima <nary12321@gmail.com>

test: filename header use no ""

aba1b9e

Signed-off-by: machichima <nary12321@gmail.com>

cursor bot reviewed Feb 3, 2026


fix: redundant err and correct regex for base64

145ab04

Signed-off-by: machichima <nary12321@gmail.com>

AndySung320 reviewed Feb 4, 2026


fix: properly encode url parameter for task and actor id

9980d22

Signed-off-by: machichima <nary12321@gmail.com>

Copilot AI mentioned this pull request Feb 5, 2026

Review all open pull requests #4482

Closed

4 tasks
