BUG(pydarshan): `id` field in `to_df()` output should be `uint64` instead of `int64`


TL;DR: The `id` field in the DataFrame generated by `DarshanRecordCollection.to_df()` should be of type `uint64` instead of `int64`.

Currently, the pandas DataFrames created from PyDarshan POSIX records (via `report.records[modulename].to_df()`) represent the `id` field as a signed 64-bit integer (`int64`). However, this field originates from the Darshan C library, which uses a hash function (`darshan_hash()`) that returns values of type `unsigned long long`, corresponding to a 64-bit unsigned integer (`uint64`).

To stay faithful to the underlying data representation, the `id` field should be cast to `uint64` in the resulting DataFrame. With `int64`, any hash value of 2^63 or greater is shown as a negative number, which makes ids harder to compare against the unsigned values produced by the C library.
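To illustrate the interpretation problem, here is a minimal standard-library sketch. It does not use PyDarshan; the helper names `as_int64` and `as_uint64` and the sample id are made up for this example. It shows how a 64-bit value with the top bit set looks when read as signed, and how the unsigned value can be recovered.

```python
import struct

UINT64_MASK = 0xFFFFFFFFFFFFFFFF


def as_int64(unsigned_id: int) -> int:
    """Reinterpret the 64 bits of an unsigned value as a signed integer.

    This mimics what an int64 column shows for a uint64 hash.
    """
    return struct.unpack("<q", struct.pack("<Q", unsigned_id))[0]


def as_uint64(signed_id: int) -> int:
    """Recover the unsigned 64-bit value from its signed representation."""
    return signed_id & UINT64_MASK


# Hypothetical record id with the top bit set (not taken from a real log).
record_id = 2**63 + 5

shown = as_int64(record_id)
print(shown)                          # negative when read as int64
print(as_uint64(shown) == record_id)  # the bit pattern itself is unchanged
```

The bits are identical in both cases; only the interpretation differs. That suggests a user-side workaround until the dtype changes: reinterpret the column as unsigned, for example with NumPy's `ndarray.view(np.uint64)` on the underlying `int64` array.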


References:

Implementation of darshan_hash: [darshan-util/darshan-common.c](https://github.com/darshan-hpc/darshan/blob/main/darshan-util/darshan-common.c)

Header for darshan_hash: [darshan-util/darshan-common.h](https://github.com/darshan-hpc/darshan/blob/main/darshan-util/darshan-common.h)

Slack: [Issue](https://darshan-io.slack.com/archives/C04JF57PH9P/p1744668496511629), [Response](https://darshan-io.slack.com/archives/C04JF57PH9P/p1744727393390139) from @shanedsnyder

BUG(pydarshan): `id` field in `to_df()` output should be `uint64` instead of `int64` #1031