
Need a way to define new columns through functions from python #17885

Open
@acampove

Feature description

Hi,

We need something like:

variable_computer = VariableComputer()
rdf = rdf.Define('var', variable_computer)

where variable_computer is an instance of a Python class that is called on each entry of the dataframe and returns a float, which becomes the value of var for that entry. For example:

class VariableComputer:
    def __call__(self, entry):
        return entry.x + entry.y
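
To make the requested semantics concrete, here is a minimal pure-Python sketch. It is not ROOT code and says nothing about how RDataFrame would implement this: the helper `define_column`, the dict-of-lists column layout, and the use of `SimpleNamespace` as the per-entry object are all assumptions made for illustration.

```python
from types import SimpleNamespace


class VariableComputer:
    """Callable that computes one float per entry."""

    def __call__(self, entry):
        return entry.x + entry.y


def define_column(columns, name, func):
    """Return a copy of `columns` with a new column computed entry by entry.

    Hypothetical helper, not part of ROOT: `columns` maps column names to
    equal-length lists, and `func` receives one entry with attribute access.
    """
    n_entries = len(next(iter(columns.values())))
    new_values = []
    for i in range(n_entries):
        # Build the per-entry view that the callable sees (entry.x, entry.y, ...)
        entry = SimpleNamespace(**{key: values[i] for key, values in columns.items()})
        new_values.append(func(entry))
    # The input mapping is left untouched; a new mapping is returned
    return {**columns, name: new_values}


columns = {"x": [1.0, 2.0, 3.0], "y": [0.5, 0.5, 0.5]}
result = define_column(columns, "var", VariableComputer())
print(result["var"])  # [1.5, 2.5, 3.5]
```

Any callable works in place of the class instance here, for example `define_column(columns, "r2", lambda e: e.x**2 + e.y**2)`.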

I imagine that should be doable. In pandas I can do:

import pandas as pd

# Create a sample DataFrame
df = pd.DataFrame({
    "x": [i * 0.1 for i in range(10)],  # x = 0.0, 0.1, ..., 0.9
    "y": [i * 0.2 for i in range(10)]   # y = 0.0, 0.2, ..., 1.8
})

# Define a function with complex logic
def custom_logic(row):
    return row["x"]**2 + row["y"]**2 > 1.0  # Example: Outside the unit circle

# Apply the function row-wise
df["is_outside_circle"] = df.apply(custom_logic, axis=1)

# Display the DataFrame
print(df)

The logic can be a plain function too, as in pandas. In general, could you try to follow the pandas design and check what they do? Currently, the only reasons I use ROOT dataframes are:

  • I can do multithreading, which libraries like this let me do in pandas anyway.
  • I can read ROOT files directly, which I might be able to work around by using uproot.

Alternatives considered

No response

Additional context

No response
