
Revert "trace_processor: Parse adreno_cmdbatch_retired/submitted events (#5519)"#5793

Merged
LalitMaganti merged 1 commit into
mainfrom
dev/lalitm/revert
May 8, 2026

Conversation

@LalitMaganti
Member

This reverts commit ffaeb0c.

Unfortunately this is necessary as this patch does not work on all QCOM
devices: concretely, the events seem desynced from the render stage events,
which shouldn't really happen. Moreover, this also causes the raw ftrace event
to be shifted, which internal teams did not expect and were not happy with.

@LalitMaganti LalitMaganti requested a review from a team as a code owner May 8, 2026 22:37
@LalitMaganti
Member Author

@rossning92 unfortunately, due to some issues raised by internal clients (cc @batesj), we discovered that not all QCOM events seem to be emitting the correct timestamps in the start field, and that's causing a desync between the render stage events and these events, which is just confusing for everyone.

If we want this patch, we should figure out how to detect whether these events are reliable or not based on the device/board etc., and then make a decision on shifting the timestamps of these events.

@LalitMaganti LalitMaganti requested a review from sashwinbalaji May 8, 2026 22:40

@LalitMaganti LalitMaganti enabled auto-merge (squash) May 8, 2026 22:44
@LalitMaganti LalitMaganti merged commit eb7e77f into main May 8, 2026
24 checks passed
@LalitMaganti LalitMaganti deleted the dev/lalitm/revert branch May 8, 2026 23:03
@rossning92
Contributor

rossning92 commented May 9, 2026

Thanks for letting me know @LalitMaganti @batesj! Do you happen to know which Qualcomm SoC has the desync issue, or is there a trace I can take a look at? I'll try to debug on my end.

@LalitMaganti
Member Author

@batesj would be the best person to answer that; I'm not actually sure myself.

@batesj
Contributor

batesj commented May 9, 2026

@rossning92 Tested on an XR2+ Gen2 device, and the new events were about two frames later than the corresponding Qcom renderstage GPU events. On a different device there was also an offset, and additionally there seemed to be some events at the start of the trace that were duplicated in the new GPU ftrace tracks.

@rossning92
Contributor

rossning92 commented May 9, 2026

Thanks @batesj! I tested all the way from XR2Gen1 to XR2Gen3 devices, and both the render stage trace and the command batch trace align quite well (the difference is mostly on the order of dozens of microseconds). The kernel reporting is quite straightforward: it just emits the GPU active counter register and the kernel monotonic time at the same time, via adreno_cmdbatch_submitted events, for GPU <-> CPU sync. So I had a bit of a hard time figuring out what might go wrong (e.g., could it be an incorrect timestamp or duplicated events?).
image

If possible, would you be willing to share your trace? That would make it much easier for me to debug this.
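For readers following along, here is a minimal sketch of the sync described in the comment above: one event carries a kernel time and a GPU tick count sampled together, and any other GPU tick value is mapped onto the CPU timeline relative to that pair. The function names are hypothetical, the `secs`/`usecs` split follows the field names mentioned later in this thread, and the 52.0833 ns-per-tick constant is borrowed from the query further down (it corresponds to a counter of roughly 19.2 MHz, which is an assumption here, not something the thread confirms for every device).

```python
# Assumption: ~19.2 MHz GPU counter, i.e. about 52.0833 ns per tick
# (the same constant the SQL query in this thread uses).
NS_PER_TICK = 52.0833


def sync_point_ns(secs: int, usecs: int) -> int:
    """Combine a secs/usecs kernel timestamp pair into nanoseconds."""
    return secs * 1_000_000_000 + usecs * 1_000


def gpu_ticks_to_cpu_ns(ticks: int, sync_ticks: int, sync_ns: int) -> int:
    """Map a GPU tick value onto the CPU timeline via one sync point.

    sync_ticks and sync_ns are assumed to have been sampled at the same
    instant; ticks may be earlier or later than the sync point.
    """
    return sync_ns + round((ticks - sync_ticks) * NS_PER_TICK)


if __name__ == "__main__":
    sync_ns = sync_point_ns(secs=12, usecs=345_678)
    print(gpu_ticks_to_cpu_ns(ticks=1_000_192, sync_ticks=1_000_000, sync_ns=sync_ns))
```

The conversion is only as good as the sync pair it starts from: if the two values were not actually sampled together on a given device, every converted timestamp inherits that offset.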

@batesj
Contributor

batesj commented May 11, 2026

@rossning92 Your renderstage events look different from ours so I'm not sure what other differences there are (for us the parent event is named just "Surface" instead of your "surface#0 ..."). Are your renderstage events coming from the Qcom Qprof integration that runs in each process with a GLES/Vulkan context, via this perfetto config?

data_sources {
  config {
    name: "gpu.renderstages"
    gpu_renderstages_config { ... }
  }
}

As a quick test, let's compare what you get in your traces with this query. Do the generated events line up with your GPU Cmdbatch events?

/*
 * Copyright 2025 Google LLC
 *
 * Licensed under the Apache License, Version 2.0 (the "License");
 * you may not use this file except in compliance with the License.
 * You may obtain a copy of the License at
 *
 *     https://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */
WITH
  kgsl_gpu AS (
    WITH
      raw_events AS (SELECT * FROM ftrace_event WHERE name LIKE '%adreno_%'),
      with_args AS (
        SELECT
          raw_events.*,
          args.arg_set_id,
          min(CASE key WHEN 'active' THEN CAST(display_value AS int) END) AS active,
          min(CASE key WHEN 'retire' THEN CAST(display_value AS int) END) AS retire,
          min(CASE key WHEN 'retired_on_gmu' THEN CAST(display_value AS int) END) AS retired_on_gmu,
          min(CASE key WHEN 'start' THEN CAST(display_value AS int) END) AS start,
          min(CASE key WHEN 'submitted_to_rb' THEN CAST(display_value AS int) END)
            AS submitted_to_rb,
          min(CASE key WHEN 'ticks' THEN CAST(display_value AS int) END) AS ticks,
          min(CASE key WHEN 'timestamp' THEN CAST(display_value AS int) END) AS gpu_queue_id
        FROM raw_events
        JOIN args
          ON args.arg_set_id = raw_events.arg_set_id
        GROUP BY raw_events.arg_set_id
        ORDER BY raw_events.ts
      ),
      with_queue_id AS (
        SELECT
          *,
          max(
            CASE
              WHEN (name IS 'adreno_cmdbatch_queued' OR name IS 'kgsl_adreno_cmdbatch_queued')
                THEN id
              ELSE NULL
              END)
            OVER (PARTITION BY gpu_queue_id ORDER BY ts RANGE BETWEEN 1e9 PRECEDING AND CURRENT ROW)
            AS queue_id
        FROM with_args
        ORDER BY ts
      ),
      with_queue_info AS (
        SELECT
          queue_id,
          process.name AS process,
          thread.name AS thread,
          process.upid,
          process.pid,
          thread.utid,
          thread.tid
        FROM with_queue_id
        LEFT JOIN thread
          ON
            with_queue_id.utid = thread.utid
            AND (
              with_queue_id.name IS 'adreno_cmdbatch_queued'
              OR with_queue_id.name IS 'kgsl_adreno_cmdbatch_queued')
        LEFT JOIN process
          ON thread.upid = process.upid
        WHERE
          (
            with_queue_id.name IS 'adreno_cmdbatch_queued'
            OR with_queue_id.name IS 'kgsl_adreno_cmdbatch_queued')
          AND queue_id IS NOT NULL
        ORDER BY queue_id
      ),
      with_process_info AS (
        SELECT with_queue_id.*, process, thread, pid, tid
        FROM with_queue_id
        LEFT JOIN with_queue_info
          ON with_queue_id.queue_id = with_queue_info.queue_id
        WHERE with_queue_id.queue_id IS NOT NULL
        ORDER BY with_queue_id.ts
      )
    SELECT
      sync.queue_id,
      sync.process AS process,
      sync.thread AS thread,
      sync.pid AS pid,
      sync.tid AS tid,
      CAST(retired.active * 52.0833 AS int) AS active_dur,
      CAST((retired.retire - retired.start) * 52.0833 AS int) AS total_dur,
      sync.ts AS sync_ts,
      CAST(sync.ts + (retired.submitted_to_rb - sync.ticks) * 52.0833 AS int) AS submit_ts,
      CAST(sync.ts + (retired.start - sync.ticks) * 52.0833 AS int) AS start_ts,
      CAST(sync.ts + (retired.retire - sync.ticks) * 52.0833 AS int) AS end_ts,
      CAST(sync.ts + (retired.retired_on_gmu - sync.ticks) * 52.0833 AS int) AS exit_gpu_ts
    FROM with_process_info sync
    JOIN with_process_info retired
      ON
        sync.queue_id = retired.queue_id
        AND (sync.name IS 'adreno_cmdbatch_sync' OR sync.name IS 'kgsl_adreno_cmdbatch_sync')
        AND (
          retired.name IS 'adreno_cmdbatch_retired'
          OR retired.name IS 'kgsl_adreno_cmdbatch_retired')
    ORDER BY start_ts
  )
SELECT *, start_ts AS ts, total_dur AS dur, concat(thread, ' (', process, ')') AS name FROM kgsl_gpu
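As a reading aid, the final `SELECT` of the query above can be sketched outside SQL. The snippet below mirrors only the arithmetic of the `submit_ts`, `start_ts`, `end_ts`, `active_dur` and `total_dur` columns for one sync/retired pair that share a `queue_id`. The class and function names are hypothetical, and the numbers in the usage example are made up for illustration.

```python
from dataclasses import dataclass

# Constant copied from the query above (ns per GPU tick).
NS_PER_TICK = 52.0833


@dataclass
class SyncEvent:
    ts: int      # ftrace timestamp of the *_cmdbatch_sync event, in ns
    ticks: int   # GPU tick counter reported by the same event


@dataclass
class RetiredEvent:
    submitted_to_rb: int  # GPU ticks
    start: int            # GPU ticks
    retire: int           # GPU ticks
    active: int           # GPU ticks spent active


def to_ns(sync: SyncEvent, gpu_ticks: int) -> int:
    """Same shape as CAST(sync.ts + (x - sync.ticks) * 52.0833 AS int)."""
    return int(sync.ts + (gpu_ticks - sync.ticks) * NS_PER_TICK)


def derive_slice(sync: SyncEvent, retired: RetiredEvent) -> dict:
    """Compute the derived columns for one matched sync/retired pair."""
    return {
        "submit_ts": to_ns(sync, retired.submitted_to_rb),
        "start_ts": to_ns(sync, retired.start),
        "end_ts": to_ns(sync, retired.retire),
        "active_dur": int(retired.active * NS_PER_TICK),
        "total_dur": int((retired.retire - retired.start) * NS_PER_TICK),
    }


if __name__ == "__main__":
    sync = SyncEvent(ts=1_000_000, ticks=10_000)
    retired = RetiredEvent(submitted_to_rb=10_100, start=10_200, retire=10_500, active=250)
    print(derive_slice(sync, retired))
```

Because each batch is converted against its own sync event, the tick-to-nanosecond scaling is only ever applied over the short tick distance between that sync and the batch.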

@rossning92
Contributor

rossning92 commented May 11, 2026

> event is named just "Surface" instead of your "surface#0 ..."

@batesj ah, I used Meta's integration based on qprof, but the underlying logic should be the same, as it is all based on Qualcomm's closed-source profiling library.

I just redid the trace using the native renderstage event along with the SQL query you shared, and here are the results; they seem to align well:

image

If you could share the trace that would be extremely helpful - hopefully it's just a trivial bug!

@batesj
Contributor

batesj commented May 11, 2026

First let's look at some images showing the discrepancies that I'm seeing. If this doesn't help narrow down the problem for you let me know and I'll try to produce some sharable traces:

Before this CL on gen2:
image

After:
image

Before on newer:
image

After:
image

Also note that we're talking about SoC versions but not driver versions, which may also differ in how they produce both kgsl and renderstage data.

@rossning92
Contributor

rossning92 commented May 12, 2026

@batesj this was super helpful!

I think I may know what is going on. Originally, in my PR, I used only the first adreno_cmdbatch_submitted's secs/usecs timestamp and GPU ticks as the GPU<->CPU sync point. In your SQL query, by contrast, each adreno_cmdbatch_sync event provides a unique GPU<->CPU sync point, which should eliminate any drift happening between cmdbatches.

I opened a new PR for testing: #5817. I wonder what the best way for us to test this might be. If you have a couple of traces to share (even with only kgsl and renderstage events), I would be super happy to run through them on my end!
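To make the single-sync versus per-batch-sync distinction concrete, here is a toy simulation. It is a sketch under loud assumptions, not a diagnosis of the devices discussed in this thread: it supposes the real tick period differs from the nominal 52.0833 ns by a hypothetical 50 ppm, and that a per-batch sync point is available a fixed (also hypothetical) 1000 ticks before each batch. All names are invented for illustration.

```python
NOMINAL_NS_PER_TICK = 52.0833
# Hypothetical: the real GPU counter runs 50 ppm off the nominal period.
TRUE_NS_PER_TICK = NOMINAL_NS_PER_TICK * (1 + 50e-6)


def true_cpu_ns(ticks: float) -> float:
    """Ground-truth CPU time of a GPU tick value in this toy model."""
    return ticks * TRUE_NS_PER_TICK


def convert(ticks: float, sync_ticks: float, sync_ns: float) -> float:
    """Convert GPU ticks to CPU ns with the nominal period and one sync point."""
    return sync_ns + (ticks - sync_ticks) * NOMINAL_NS_PER_TICK


def conversion_errors_ns(batch_ticks, sync_lead_ticks=1_000):
    """Return (single_sync_errors, per_batch_errors) in ns for each batch."""
    first_sync = (0, true_cpu_ns(0))  # one sync point at the start of the trace
    single, per_batch = [], []
    for ticks in batch_ticks:
        truth = true_cpu_ns(ticks)
        single.append(abs(convert(ticks, *first_sync) - truth))
        near = ticks - sync_lead_ticks  # a sync point close to this batch
        per_batch.append(abs(convert(ticks, near, true_cpu_ns(near)) - truth))
    return single, per_batch


if __name__ == "__main__":
    # Batches roughly 1 s, 10 s and 60 s into the trace at ~19.2 MHz.
    batches = [19_200_000 * s for s in (1, 10, 60)]
    single, per_batch = conversion_errors_ns(batches)
    for t, s, p in zip(batches, single, per_batch):
        print(f"ticks={t}: single-sync error={s:.1f} ns, per-batch error={p:.1f} ns")
```

In this model the single-sync error grows with the distance from the first sync point, while the per-batch error depends only on the distance to the nearest sync point. Whether the real desync comes from this kind of drift or from something else (for example a driver reporting different values) would still need to be checked against actual traces.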

image

LalitMaganti added a commit that referenced this pull request May 13, 2026
…ts (#5519)" (#5793)
