[Feature][history server] support endpoint `/api/v0/tasks/timeline` by AndySung320 · Pull Request #4437 · ray-project/kuberay

AndySung320 · 2026-01-25T09:03:26Z

Why are these changes needed?

This PR implements the /api/v0/tasks/timeline endpoint for History Server to support task execution timeline visualization.

Currently, Ray Dashboard provides timeline data only for live clusters. Once a cluster is deleted, this profiling information is lost. This implementation enables users to query task timelines from historical data stored in MinIO.

The timeline data is formatted in Chrome Tracing Format.

Manual test: #4437 (comment)

Related issue number

Closes #4390

Checks

I've made sure the tests are passing.
Testing Strategy
- Unit tests
- Manual tests
- This PR is not tested :(

AndySung320 · 2026-01-25T09:24:13Z

Manual test:

Following setup guide here, but skip "7. Delete Ray Cluster (Trigger Log Upload)" step
Get timeline from the live cluster

SESSION="live"
curl -c ~/cookies.txt "http://localhost:8080/enter_cluster/default/raycluster-historyserver/$SESSION"

curl -b ~/cookies.txt "http://localhost:8080/api/v0/tasks/timeline"

Delete the RayCluster to trigger the upload
Get timeline from the dead cluster

curl "http://localhost:8080/clusters"  #to get the session number

SESSION="YOUR SESSION"
curl -c ~/cookies.txt "http://localhost:8080/enter_cluster/default/raycluster-historyserver/$SESSION"

curl -b ~/cookies.txt "http://localhost:8080/api/v0/tasks/timeline"

Missing Profile Events

Based on testing, the following task detail events are often not present in the collected logs:

task:deserialize_arguments
task:execute
task:store_outputs

The root cause is still under investigation.

Additional Events

The history server may include extra events not shown in Dashboard:

submit_task events from the driver node
Note: In the Ray Dashboard API, ProfileEvents is defined as optional (not repeated) in the Protobuf. When events are merged with MergeFrom, profile events from different component types overwrite each other. Because worker events usually arrive later than driver events, the driver’s profile events get overwritten by the worker’s.
In contrast, the event export API emits driver and worker as separate RayEvents, so they are not merged and are sent independently. Thus, that's why we could see the extra events.

For best results, wait 1-2 minutes after job completion before deleting the cluster.

historyserver/test/support/raycluster.go

historyserver/pkg/eventserver/eventserver.go

historyserver/test/e2e/historyserver_test.go

historyserver/pkg/eventserver/eventserver.go

fscnick · 2026-01-27T12:57:55Z

historyserver/pkg/eventserver/eventserver.go

+			return fmt.Errorf("event does not have 'taskProfileEvents'")
+		}
+
+		var profileData struct {


Would it be better not to use anonymous struct?

yes, i will refactor it.

fscnick · 2026-01-27T13:00:41Z

historyserver/pkg/eventserver/eventserver.go

+		}
+
+		// Convert events to ProfileEventRaw format
+		var rawEvents []types.ProfileEventRaw


nit:

Suggested change

var rawEvents []types.ProfileEventRaw

var rawEvents = make([]types.ProfileEventRaw, 0, len(profileData.ProfileEvents.Events))

thanks for pointing it out~

fscnick · 2026-01-27T13:06:40Z

historyserver/pkg/eventserver/eventserver.go

+			existingKeys := make(map[eventKey]bool)
+			for _, e := range t.ProfileData.Events {
+				existingKeys[eventKey{e.EventName, e.StartTime, e.EndTime}] = true
+			}
+
+			for _, e := range rawEvents {
+				key := eventKey{e.EventName, e.StartTime, e.EndTime}
+				if !existingKeys[key] {
+					t.ProfileData.Events = append(t.ProfileData.Events, e)
+					existingKeys[key] = true


nit:

Suggested change

existingKeys := make(map[eventKey]bool)

for _, e := range t.ProfileData.Events {

existingKeys[eventKey{e.EventName, e.StartTime, e.EndTime}] = true

}

for _, e := range rawEvents {

key := eventKey{e.EventName, e.StartTime, e.EndTime}

if !existingKeys[key] {

t.ProfileData.Events = append(t.ProfileData.Events, e)

existingKeys[key] = true

existingKeys := make(map[eventKey]struct{})

for _, e := range t.ProfileData.Events {

existingKeys[eventKey{e.EventName, e.StartTime, e.EndTime}] = struct{}{}

}

for _, e := range rawEvents {

key := eventKey{e.EventName, e.StartTime, e.EndTime}

if _, ok := existingKeys[key]; !ok {

t.ProfileData.Events = append(t.ProfileData.Events, e)

existingKeys[key] = struct{}{}

fscnick · 2026-01-27T13:37:03Z

historyserver/test/e2e/historyserver_test.go

+				args, ok := event["args"].(map[string]any)
+				gg.Expect(ok).To(BeTrue(), "process_name should have args")


It seems the ok means the type assertion works or not, which might be a bit confused with the following error message. May I know what is aim for?

Good catch! The ok variable checks whether the type assertion succeeded (i.e., whether event["args"] is a map[string]any), but the error message "process_name should have args" might be misleading since it could be interpreted as checking for the existence of args rather than its type.

I can refactor this to make it more explicit. Same as line:407

argsAny, exists := event["args"] gg.Expect(exists).To(BeTrue(), "process_name should have 'args' field") args, ok := argsAny.(map[string]any) gg.Expect(ok).To(BeTrue(), "process_name args should be a map[string]any")

updated in 218d97b .

historyserver/pkg/eventserver/eventserver.go

historyserver/pkg/eventserver/types/task.go

machichima · 2026-01-28T12:46:22Z

historyserver/pkg/eventserver/eventserver.go

@@ -223,11 +224,23 @@ func (h *EventHandler) storeEvent(eventMap map[string]any) error {
 		taskMap.CreateOrMergeAttempt(currTask.TaskID, currTask.AttemptNumber, func(t *types.Task) {
 			// Merge definition fields (preserve existing Events if any)


Suggested change

// Merge definition fields (preserve existing Events if any)

// Merge definition fields (preserve existing Events, ProfileData, and identifiers if any)

nit

machichima · 2026-01-28T13:38:02Z

historyserver/pkg/eventserver/eventserver.go

+	}
+
+	// Direct mapping for known event names
+	switch eventName {


I think this is referenced from: https://github.com/ray-project/ray/blob/68d01c4c48a59c7768ec9c2359a1859966c446b6/python/ray/_private/profiling.py#L25-L25?
Could we add a comment here to show that this part is following the above link in ray?

sure. i will add it

historyserver/pkg/eventserver/eventserver.go

machichima · 2026-01-29T12:10:02Z

historyserver/pkg/eventserver/eventserver.go

+			if strings.HasPrefix(profEvent.EventName, "task::") {
+				if extraData != nil {
+					if name, ok := extraData["name"].(string); ok {
+						args["name"] = name
+					}
+				}
+			}
+
+			// Determine event name for display
+			eventName := profEvent.EventName
+			displayName := profEvent.EventName
+
+			// For overall task events like "task::slow_task", use the full name from extraData
+			if strings.HasPrefix(profEvent.EventName, "task::") && extraData != nil {
+				if name, ok := extraData["name"].(string); ok {
+					displayName = name
+				}
+			}


I think those two if block can be merged? Can we just do

// Determine event name for display eventName := profEvent.EventName displayName := profEvent.EventName // For overall task events like "task::slow_task", extract name from extraData if strings.HasPrefix(profEvent.EventName, "task::") && extraData != nil { if name, ok := extraData["name"].(string); ok { args["name"] = name displayName = name } }

sure, updated in 18a6f71

machichima · 2026-01-29T12:11:32Z

historyserver/pkg/eventserver/eventserver.go

+				Name:  "thread_name",
+				PID:   pid,
+				TID:   &tidVal,
+				Phase: "M",


Just curious, what does phase "M" and "X" mean? Could we document it in ChromeTraceEvent struct? As it's not obvious without a comment

"M" means Metadata event (e.g., process_name, thread_name)
"X" means Complete event (duration events with start and end times)
I will add the comment in ChromeTraceEvent struct.

machichima · 2026-01-29T12:30:14Z

historyserver/pkg/eventserver/eventserver.go

+			continue
+		}
+		nodeIP := task.ProfileData.NodeIPAddress
+		workerID := task.ProfileData.ComponentID


In Ray Dashboard, we filter out component type that's not worker or driver. Do we need to add it here?

https://github.com/ray-project/ray/blob/68d01c4c48a59c7768ec9c2359a1859966c446b6/python/ray/_private/profiling.py#L166-L167

yes, thanks for pointing this out.
we should filter it out to align with Ray Dashboard.

machichima · 2026-01-29T12:34:24Z

historyserver/pkg/eventserver/types/task.go

+	StartTime int64  `json:"start_time"` // nanoseconds
+	EndTime   int64  `json:"end_time"`   // nanoseconds


Could we add a comment showing why they are in nanoseconds?

I think it's here: https://github.com/ray-project/ray/blob/68d01c4c48a59c7768ec9c2359a1859966c446b6/src/ray/core_worker/profile_event.cc#L51-L51

sure, i will add a comment

machichima · 2026-01-29T12:49:09Z

historyserver/pkg/eventserver/eventserver.go

+			continue
+		}
+		nodeIP := task.ProfileData.NodeIPAddress
+		workerID := task.ProfileData.ComponentID


In Ray Dashboard, they use component type + component id to form cluster id, which I think is similar to our workerID here? Could you explain why we just use workerID rather than cluster id that consider different component types?

https://github.com/ray-project/ray/blob/68d01c4c48a59c7768ec9c2359a1859966c446b6/python/ray/_private/profiling.py#L164-L164

You’re right that in Ray Dashboard they form a single logical id from component_type and component_id (e.g. "worker:xyz", "driver:abc"), which is effectively the “cluster id” for that component.
In current implementation I only used ComponentID (the raw id) as the key and didn’t take ComponentType into account. I didn’t consider that distinction at the time.

Using the composite key is safer (no collision if the same id is reused across driver/worker on a node) and matches the Dashboard. I’ll update our code with that and use component_type + ":" + component_id for both the internal key and the displayed thread name.

updated in 18a6f71

fscnick · 2026-01-30T13:22:10Z

historyserver/pkg/eventserver/eventserver.go

+			if existingProfileData != nil {
+				t.ProfileData = existingProfileData
+			}
+			if existingNodeID != "" {
+				t.NodeID = existingNodeID
+			}
+			if existingWorkerID != "" {
+				t.WorkerID = existingWorkerID
+			}


What does these lines mean ? It seems to get existingProfileData from t.ProfileData and set it back.

These lines preserve ProfileData, NodeID, and WorkerID that may have been set by earlier events (TASK_PROFILE_EVENT or TASK_LIFECYCLE_EVENT) before the TASK_DEFINITION_EVENT arrives. Since events are processed concurrently and order is not guaranteed, we save these values before overwriting with *t = currTask (line 230), then restore them to prevent data loss.
This follows the same pattern as ACTOR_DEFINITION_EVENT handling.

Thanks for explanation.

cursor · 2026-01-30T20:22:31Z

historyserver/pkg/eventserver/eventserver.go

+		// Skip if clusterID is empty (consistent with first pass)
+		if clusterID == "" {
+			continue
+		}


Empty string check for clusterID is always false

Low Severity

The check if clusterID == "" can never be true because clusterID is constructed as ComponentType + ":" + ComponentID. Even when both fields are empty, clusterID will be ":", not an empty string. The comment says "Skip if clusterID is empty" but this code path is unreachable. Similarly, the check if clusterID != "" at line 871 always evaluates to true. If the intent is to skip tasks with empty ComponentID, the check needs to validate ComponentID directly rather than checking the concatenated string.

Additional Locations (1)

historyserver/pkg/eventserver/eventserver.go#L870-L876

updated in 83ba2dc

cursor · 2026-01-30T20:22:31Z

historyserver/pkg/eventserver/eventserver.go

+				if name, ok := extraData["name"].(string); ok {
+					displayName = name
+					args["name"] = name
+				}


Missing empty string check for displayName assignment

Low Severity

The condition at line 979 checks if name, ok := extraData["name"].(string); ok without verifying that name is non-empty. This is inconsistent with line 626 which correctly checks ok && name != "". If extraData["name"] is an empty string, displayName will become empty instead of keeping the original profEvent.EventName as a fallback. This would result in trace events with empty names in the Chrome Tracing visualization.

updated in 83ba2dc

cursor · 2026-01-30T20:53:15Z

historyserver/pkg/eventserver/eventserver.go

+			}
+			if existingWorkerID != "" {
+				t.WorkerID = existingWorkerID
+			}


TASK_DEFINITION_EVENT overwrites FuncOrClassName set by profile event

Medium Severity

When TASK_PROFILE_EVENT arrives before TASK_DEFINITION_EVENT, the FuncOrClassName and Name fields extracted from the profile event's extraData are lost. The TASK_DEFINITION_EVENT handler preserves Events, ProfileData, NodeID, and WorkerID before doing *t = currTask, but doesn't preserve FuncOrClassName or Name. Since profile events contain more detailed naming information (e.g., task::Counter.increment format for actor methods), this causes the timeline to show less informative or empty function names when events arrive out of order.

Additional Locations (1)

historyserver/pkg/eventserver/eventserver.go#L620-L635

updated in ddbc3d0

cursor · 2026-01-30T21:44:16Z

historyserver/pkg/eventserver/eventserver.go

+	// Check if all Fs (no actor)
+	if actorPortion == "ffffffffffffffffffffffff" {
+		return ""
+	}


Case-sensitive hex comparison may fail for uppercase IDs

Low Severity

The extractActorIDFromTaskID function checks if a task has no associated actor by comparing actorPortion == "ffffffffffffffffffffffff" (all lowercase). This comparison is case-sensitive, so if Ray ever outputs task IDs with uppercase hex characters (e.g., "FFFFFFFFFFFFFFFFFFFFFFFF"), the check would fail. This would cause tasks without actors to incorrectly have an actor_id value set in the timeline output instead of nil.

updated in 6e3e214

cursor

Cursor Bugbot has reviewed your changes and found 1 potential issue.

cursor · 2026-02-04T05:14:21Z

historyserver/pkg/historyserver/router.go

+	jobID := req.QueryParameter("job_id")
+	download := req.QueryParameter("download")
+
+	timeline := s.eventHandler.GetTasksTimeline(clusterNameID, jobID)


Missing session name in cluster key for timeline lookup

High Severity

The getTasksTimeline function constructs clusterNameID as clusterName + "_" + clusterNamespace, but task data is stored using utils.BuildClusterSessionKey(clusterName, clusterNamespace, sessionName) which includes the session name. This key mismatch causes GetTasksTimeline to always return an empty timeline since it can't find any tasks. Other task endpoints like getTaskSummarize and getTasks correctly use utils.BuildClusterSessionKey.

updated in b56e672

cursor

Cursor Bugbot has reviewed your changes and found 1 potential issue.

cursor · 2026-02-04T08:50:24Z

historyserver/pkg/eventserver/eventserver.go

+	for _, task := range tasks {
+		if task.ProfileData == nil {
+			continue
+		}


Inconsistent filtering causes orphan metadata events

Low Severity

The first pass that collects node/worker mappings only checks task.ProfileData == nil, while the second pass that generates trace events additionally checks len(task.ProfileData.Events) == 0. This discrepancy causes process_name and thread_name metadata events to be emitted for nodes/workers that have no actual trace events, producing orphaned metadata entries in the Chrome Tracing output.

Additional Locations (1)

historyserver/pkg/eventserver/eventserver.go#L1128-L1131

updated in 0439a98

AndySung320 · 2026-02-04T09:22:52Z

hi @Future-Outlier PTAL ~

machichima · 2026-02-04T11:43:45Z

historyserver/pkg/eventserver/eventserver.go

+			type eventKey struct {
+				EventName string
+				StartTime int64
+				EndTime   int64
+			}


Just curious, only using EventName as key is not enough?

I think this key is sufficient because:

Deduplication is scoped to a single task attempt (TaskID + AttemptNumber already isolate different tasks)

Ray's profiling semantics: a task cannot have two events with the same name at the same time

Consistent with TASK_LIFECYCLE_EVENT deduplication pattern (State + Timestamp)

machichima · 2026-02-04T11:48:30Z

historyserver/pkg/eventserver/eventserver.go

+				StartTime int64
+				EndTime   int64
+			}
+			existingKeys := make(map[eventKey]struct{})


Suggested change

existingKeys := make(map[eventKey]struct{})

existingKeys := make(map[eventKey]struct{}, len(t.ProfileData.Events)+len(rawEvents))

I think we can pre-allocate the size to prevent resizing, as we already know the maximum element we will have

updated in 7249ace

machichima · 2026-02-04T11:52:51Z

historyserver/pkg/eventserver/eventserver.go

+
+			// Extract func_or_class_name from extraData if available
+			for _, e := range rawEvents {
+				if strings.HasPrefix(e.EventName, "task::") && e.ExtraData != "" {


nit: I think we may extract task:: into constant like TaskPrefix or something else, as I saw this in multiple places

Just minor, no need to be in this PR

yea, we can do it in other PR .

machichima · 2026-02-04T11:57:18Z

historyserver/pkg/eventserver/eventserver.go

+		// Skip if clusterID is empty
+		if clusterID == ":" {
+			continue
+		}


Will there be a case that ComponentType has value but ComponentID is empty?

Based on the log that I've seen for now, I haven't seen any case that ComponentType has value while ComponentID is empty and vise versa.
If it did happen, the impact would be minimal (something like a display issue).
I think we can fix it if this issue does occur in the future. what do you think ?

machichima · 2026-02-04T12:01:04Z

historyserver/pkg/eventserver/eventserver.go

+	// Build PID/TID mappings
+	// PID: Node IP -> numeric ID
+	// TID: clusterID (componentType:componentId) -> numeric ID per node


It seems like the PID and TID here are not getting from Ray Event, but created here by counter (0, 1, 2, ...)? Is it what Ray Dashboard do or is it just an alternative here as we currently cannot get PID and TID from Ray exported event?

Yes, PID and TID here are created by counters (0, 1, 2, …), not read from the Ray event payload. This is the same approach Ray Dashboard uses.
Ray’s task profile events do not include pid or tid at all; they only have node_ip_address, component_type, and component_id (WorkerID). The Dashboard’s timeline is built by chrome_tracing_dump() in ray/_private/profiling.py.

The PID logic matches Ray’s, but the TID logic slightly differs: Ray uses globally unique TIDs, while our implementation with Go assigns TIDs per node. Do we need to switch to a globally unique TID to align with Ray?

Yes, I think it's better for us to totally follow how Ray Dashboard API did if possible. While this is minor, I think it's fine to do in a follow-up PR.

You can create a refactor issue to deal with this and following comments:

[Feature][history server] support endpoint /api/v0/tasks/timeline #4437 (comment)

[Feature][history server] support endpoint /api/v0/tasks/timeline #4437 (review)

machichima · 2026-02-04T12:20:04Z

historyserver/pkg/eventserver/eventserver.go

+	}
+}
+
+func extractActorIDFromTaskID(taskIDHex string) string {


Could you provide where you get the rules used in this function (link to the Ray code) for future reference?

sure, i will add a comment above this function.
we can trace back to src/ray/design_docs/id_specification.md which shows TaskID = 8B unique + 16B ActorID; ActorID = 12B unique + 4B JobID.

updated in 7249ace

machichima · 2026-02-04T12:28:16Z

historyserver/pkg/historyserver/router.go

+
+	ws.Route(ws.GET("/v0/tasks/timeline").To(s.getTasksTimeline).Filter(s.CookieHandle).
+		Doc("get tasks timeline").
+		Param(ws.QueryParameter("job_id", "filter by job_id")).


Should we also add download here?

yes, thanks for pointing this out, i will add it

updated in 9f8d4a6

machichima · 2026-02-04T12:29:48Z

historyserver/pkg/historyserver/router.go

+		return
+	}
+
+	resp.Header().Set("Content-Type", "application/json")


Could we use .Produces(restful.MIME_JSON) in the router level instead to follow how other endpoints did?

sure, thanks for the suggestion.

updated in 9f8d4a6

Future-Outlier · 2026-02-05T05:37:12Z

Hi, @AndySung320
do you mind help me merge master?

Signed-off-by: AndySung320 <andysung0320@gmail.com>

…olution Signed-off-by: AndySung320 <andysung0320@gmail.com>

…econd Signed-off-by: AndySung320 <andysung0320@gmail.com>

Signed-off-by: AndySung320 <andysung0320@gmail.com>

Future-Outlier

both dead and live cluster work in my env
can you add tests for query parameters
1. download
2. job_id
For all variables that include DTO in their names, I’d like to replace them.
Could you help open an issue and assign it to yourself?
You can work on it as a separate PR.
trust @machichima and @fscnick 's detailed review
I checked ray's source code, the endpoint's response is the same as ray's dashboard, so high level LGTM

machichima

LGTM! Thank you

Just a comment for follow-up issue: #4437 (comment)

Signed-off-by: AndySung320 <andysung0320@gmail.com>

cursor

Cursor Bugbot has reviewed your changes and found 3 potential issues.

cursor · 2026-02-06T08:20:12Z

historyserver/pkg/eventserver/eventserver.go

+			// This shouldn't happen if first pass worked correctly,
+			// but skip to avoid null TID
+			continue
+		}


Inconsistent TID pointer pattern compared to nearby code

Low Severity

The code at line 1152 takes the address of tid directly (tidPtr = &tid) without first copying to a local variable. However, the metadata event generation code just 40 lines above (lines 1113-1118) uses an explicit copy pattern (tidVal := tid) before taking the address. This inconsistency within the same function is a code smell. While Go's escape analysis makes both patterns work correctly, the explicit copy pattern is more defensive and consistent with the existing code style.

Additional Locations (1)

historyserver/pkg/eventserver/eventserver.go#L1112-L1118

updated in 811d9b6

cursor · 2026-02-06T08:20:12Z

historyserver/pkg/eventserver/eventserver.go

+		// Skip if clusterID is empty
+		if clusterID == ":" {
+			continue
+		}


Empty ComponentID check is unreachable due to prior filter

Medium Severity

The check clusterID == ":" is unreachable and provides no protection. Since componentType is filtered to only "worker" or "driver" at lines 1074-1076, clusterID (built as ComponentType + ":" + ComponentID) will always be "worker:..." or "driver:...", never just ":". If ComponentID is empty, tasks would incorrectly share the same TID in timeline visualization (e.g., all workers with empty ComponentID on a node would appear as one thread).

Additional Locations (1)

historyserver/pkg/eventserver/eventserver.go#L1145-L1148

updated in 811d9b6

cursor · 2026-02-06T08:20:12Z

historyserver/pkg/eventserver/eventserver.go

+			}
+			if t.JobID == "" {
+				t.JobID = profileData.JobID
+			}


Missing AttemptNumber initialization in profile event handler

Medium Severity

The TASK_PROFILE_EVENT handler sets TaskID and JobID on the task if empty, but does not set AttemptNumber. While profileData.AttemptNumber is used for storage indexing via CreateOrMergeAttempt, the Task.AttemptNumber field remains at its default value (0). The timeline endpoint later outputs task.AttemptNumber, resulting in incorrect attempt_number values for tasks that only have profile events without a corresponding definition event.

updated in 811d9b6

…er handling Signed-off-by: AndySung320 <andysung0320@gmail.com>

cursor

Cursor Bugbot has reviewed your changes and found 1 potential issue.

cursor · 2026-02-06T09:34:15Z

historyserver/pkg/eventserver/eventserver.go

+				t.TaskID = profileData.TaskID
+			}
+			if t.JobID == "" {
+				t.JobID = profileData.JobID


JobID format inconsistency breaks timeline job filtering

High Severity

The TASK_PROFILE_EVENT handler stores profileData.JobID directly without converting it from base64 to hex format. Other event handlers (JOB_DEFINITION_EVENT at line 579, DRIVER_JOB_LIFECYCLE_EVENT at line 639) convert JobID using utils.ConvertBase64ToHex. This means tasks created from TASK_PROFILE_EVENT have base64-encoded JobIDs while the /api/jobs endpoint returns hex-encoded JobIDs. When users filter timeline by job_id from /api/jobs, the string comparison in GetTasksByJobID fails because the formats don't match. The e2e test works around this by manually converting hex to base64, but API consumers would hit this inconsistency.

I think the ID encoding issue will need more discussion,
so we may want to keep the current behavior for now ?

fixed in 701b0c6

… to hex Signed-off-by: AndySung320 <andysung0320@gmail.com>

Signed-off-by: AndySung320 <andysung0320@gmail.com>

AndySung320 marked this pull request as ready for review January 26, 2026 22:22

cursor bot reviewed Jan 26, 2026

View reviewed changes

historyserver/test/support/raycluster.go Outdated Show resolved Hide resolved

historyserver/pkg/eventserver/eventserver.go Outdated Show resolved Hide resolved

AndySung320 force-pushed the feature/history-sever-task-timeline branch from 7889ccb to bca027a Compare January 26, 2026 22:54

cursor bot reviewed Jan 26, 2026

View reviewed changes

historyserver/test/e2e/historyserver_test.go Outdated Show resolved Hide resolved

historyserver/pkg/eventserver/eventserver.go Outdated Show resolved Hide resolved

fscnick reviewed Jan 27, 2026

View reviewed changes

cursor bot reviewed Jan 27, 2026

View reviewed changes

historyserver/pkg/eventserver/eventserver.go Outdated Show resolved Hide resolved

historyserver/pkg/eventserver/eventserver.go Show resolved Hide resolved

cursor bot reviewed Jan 27, 2026

View reviewed changes

historyserver/pkg/eventserver/types/task.go Show resolved Hide resolved

machichima reviewed Jan 28, 2026

View reviewed changes

cursor bot reviewed Jan 28, 2026

View reviewed changes

historyserver/pkg/eventserver/eventserver.go Outdated Show resolved Hide resolved

cursor bot reviewed Jan 28, 2026

View reviewed changes

historyserver/pkg/eventserver/eventserver.go Show resolved Hide resolved

machichima reviewed Jan 29, 2026

View reviewed changes

fscnick reviewed Jan 30, 2026

View reviewed changes

cursor bot reviewed Jan 30, 2026

View reviewed changes

AndySung320 force-pushed the feature/history-sever-task-timeline branch from ddbc3d0 to af41ebf Compare February 4, 2026 05:02

cursor bot reviewed Feb 4, 2026

View reviewed changes

machichima reviewed Feb 4, 2026

View reviewed changes

AndySung320 added 6 commits February 4, 2026 22:04

feat: add ChromeTraceEvent type for timeline API

db17d4f

Signed-off-by: AndySung320 <andysung0320@gmail.com>

feat: implement GetTasksTimeline for /api/v0/tasks/timeline endpoint

3ce0d4d

Signed-off-by: AndySung320 <andysung0320@gmail.com>

feat: add /api/v0/tasks/timeline endpoint

c12d717

Signed-off-by: AndySung320 <andysung0320@gmail.com>

remove logging...

06df4cc

Signed-off-by: AndySung320 <andysung0320@gmail.com>

add e2e test for endpoint api/v0/tasks/timeline

16c372d

Signed-off-by: AndySung320 <andysung0320@gmail.com>

fix GetTasksTimeline, correct accidental deletion during conflict res…

12f65ef

…olution Signed-off-by: AndySung320 <andysung0320@gmail.com>

AndySung320 added 14 commits February 4, 2026 22:04

add comment, correct timestamp conversion from microseconds to millis…

b6d574f

…econd Signed-off-by: AndySung320 <andysung0320@gmail.com>

fix timestamp conversion

48d6dda

Signed-off-by: AndySung320 <andysung0320@gmail.com>

fix: avoid null tid in trace events when workerID is empty

654fb87

Signed-off-by: AndySung320 <andysung0320@gmail.com>

docs: add comments to ChromeTraceEvent struct

6b78b07

Signed-off-by: AndySung320 <andysung0320@gmail.com>

fix logic to match Ray Dashboard implementation

caf17c1

Signed-off-by: AndySung320 <andysung0320@gmail.com>

add download parameter

910ea55

Signed-off-by: AndySung320 <andysung0320@gmail.com>

set Content-Type for JSON response

fbcee06

Signed-off-by: AndySung320 <andysung0320@gmail.com>

fix minor

13fb028

Signed-off-by: AndySung320 <andysung0320@gmail.com>

preserve FuncOrClassName and Name when handling TASK_DEFINITION_EVENT

f0381bf

Signed-off-by: AndySung320 <andysung0320@gmail.com>

update getTasksTimeline to use clusterSessionKey

0d6e6c5

Signed-off-by: AndySung320 <andysung0320@gmail.com>

minor fix of extractActorIDFromTaskID

8e26017

Signed-off-by: AndySung320 <andysung0320@gmail.com>

Fix inconsistent filtering of tasks without profile events

b6aee6a

Signed-off-by: AndySung320 <andysung0320@gmail.com>

add comment for extractActorIDFromTaskID

fe2674d

Signed-off-by: AndySung320 <andysung0320@gmail.com>

Declare JSON response type at router level

ce1a652

Signed-off-by: AndySung320 <andysung0320@gmail.com>

AndySung320 force-pushed the feature/history-sever-task-timeline branch from 9f8d4a6 to ce1a652 Compare February 5, 2026 06:05

Future-Outlier approved these changes Feb 5, 2026

View reviewed changes

Future-Outlier added this to @Future-Outlier's kuberay project Feb 5, 2026

Future-Outlier moved this to can be merged in @Future-Outlier's kuberay project Feb 5, 2026

Future-Outlier moved this from can be merged to others in @Future-Outlier's kuberay project Feb 5, 2026

machichima approved these changes Feb 5, 2026

View reviewed changes

Copilot AI mentioned this pull request Feb 5, 2026

Review all open pull requests #4482

Closed

4 tasks

JiangJiaWei1103 mentioned this pull request Feb 5, 2026

[Feat] [history server] Add actor task endpoint #4463

Open

4 tasks

AndySung320 added 2 commits February 6, 2026 00:02

e2e: add timeline endpoint tests for job_id filtering and download mode

5441ec8

Signed-off-by: AndySung320 <andysung0320@gmail.com>

modify test comment

cc5eb86

Signed-off-by: AndySung320 <andysung0320@gmail.com>

cursor bot reviewed Feb 6, 2026

View reviewed changes

fix task attempt number, unreachable ComponentID check, and TID point…

811d9b6

…er handling Signed-off-by: AndySung320 <andysung0320@gmail.com>

cursor bot reviewed Feb 6, 2026

View reviewed changes

AndySung320 added 2 commits February 6, 2026 12:46

fix id conversion (task_id, job_id, worker_id, driver_id) from base64…

701b0c6

… to hex Signed-off-by: AndySung320 <andysung0320@gmail.com>

remove hexToBase64 fuction

ba135b0

Signed-off-by: AndySung320 <andysung0320@gmail.com>

AndySung320 mentioned this pull request Feb 6, 2026

[Refactor][history server] align timeline conventions with Ray Dashboard #4488

Open

2 tasks

	var rawEvents []types.ProfileEventRaw
	var rawEvents = make([]types.ProfileEventRaw, 0, len(profileData.ProfileEvents.Events))

		args, ok := event["args"].(map[string]any)
		gg.Expect(ok).To(BeTrue(), "process_name should have args")

		@@ -223,11 +224,23 @@ func (h *EventHandler) storeEvent(eventMap map[string]any) error {
		taskMap.CreateOrMergeAttempt(currTask.TaskID, currTask.AttemptNumber, func(t *types.Task) {
		// Merge definition fields (preserve existing Events if any)

		StartTime int64 `json:"start_time"` // nanoseconds
		EndTime int64 `json:"end_time"` // nanoseconds

	existingKeys := make(map[eventKey]struct{})
	existingKeys := make(map[eventKey]struct{}, len(t.ProfileData.Events)+len(rawEvents))

Conversation

AndySung320 commented Jan 25, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Why are these changes needed?

Related issue number

Checks

Uh oh!

AndySung320 commented Jan 25, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Missing Profile Events

Additional Events

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

machichima Jan 28, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

fscnick Jan 30, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

cursor bot Jan 30, 2026

Choose a reason for hiding this comment

Empty string check for clusterID is always false

Uh oh!

Choose a reason for hiding this comment

AndySung320 commented Jan 25, 2026 •

edited

Loading

AndySung320 commented Jan 25, 2026 •

edited

Loading

machichima Jan 28, 2026 •

edited

Loading

fscnick Jan 30, 2026 •

edited

Loading