
Feed back numpy data to RDF graph, in batches #16739

Open
@arizzi

Feature description

As discussed at CHEP24, it would be very useful not only to extract
data from an RDF graph in NumPy/PyTorch/TensorFlow format, but also to
feed data back into the RDF graph in batches (e.g. for NN inference
with models not supported by SOFIE).
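To make the requested batching semantics concrete, here is a minimal pure-Python sketch. It assumes event data is held as a dict of equal-length column lists and that the user callback maps a batch (a list of rows) to exactly one output value per row. `process_in_batches` and `fake_model` are hypothetical names for illustration only, not RDataFrame APIs; a real implementation would run inside the RDF event loop and exchange NumPy arrays.

```python
from typing import Callable, Dict, List, Sequence


def process_in_batches(
    columns: Dict[str, Sequence[float]],
    input_cols: List[str],
    process_batch: Callable[[List[List[float]]], List[float]],
    batch_size: int,
) -> List[float]:
    """Hypothetical helper: feed rows to `process_batch` in chunks of
    `batch_size` and concatenate the per-row outputs in event order."""
    n_rows = len(columns[input_cols[0]])
    # All input columns must provide one entry per event
    for name in input_cols:
        if len(columns[name]) != n_rows:
            raise ValueError(f"column {name!r} has a different length")
    outputs: List[float] = []
    for start in range(0, n_rows, batch_size):
        stop = min(start + batch_size, n_rows)
        # Row-major batch: one list of input values per event
        batch = [[columns[name][i] for name in input_cols] for i in range(start, stop)]
        result = process_batch(batch)
        if len(result) != len(batch):
            raise ValueError("process_batch must return one value per row")
        outputs.extend(result)
    return outputs


# Usage: a stand-in "model" that sums its inputs, run in batches of 2
data = {"MET_pt": [10.0, 20.0, 30.0], "HT": [1.0, 2.0, 3.0]}
calls = []


def fake_model(batch):
    calls.append(len(batch))  # record the size of each batch received
    return [sum(row) for row in batch]


out = process_in_batches(data, ["MET_pt", "HT"], fake_model, batch_size=2)
print(out, calls)  # prints: [11.0, 22.0, 33.0] [2, 1]
```

The outputs come back in event order regardless of batch size, which is what allows them to be attached to the graph as a new column.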

When exporting/importing, it would also be useful to have an option to
explode/flatten RVec (VecOps) columns of the same length, e.g. one row
per jet instead of one row per event.
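The flattening requested here, together with the `flattenRVec` / `broadCastScalars` options in the pseudocode example, could behave as in the following pure-Python sketch. It assumes vector columns are given as one list per event (all vectors within an event having the same length, and at least one vector column present) and scalar columns as one value per event. `flatten_events` and `unflatten` are hypothetical names, not ROOT functions. The per-event counts returned by the flattening step are what would let vector outputs such as `Jet_regressedPt` be regrouped per event before being fed back.

```python
from typing import Dict, List, Sequence, Tuple


def flatten_events(
    vector_cols: Dict[str, Sequence[Sequence[float]]],
    scalar_cols: Dict[str, Sequence[float]],
) -> Tuple[Dict[str, List[float]], List[int]]:
    """Hypothetical helper: explode per-event vectors of equal length into
    flat rows, repeating (broadcasting) each event's scalars once per
    vector element. Requires at least one vector column. Returns the flat
    columns plus the per-event element counts needed to undo it."""
    names = list(vector_cols)
    n_events = len(vector_cols[names[0]])
    flat: Dict[str, List[float]] = {n: [] for n in names + list(scalar_cols)}
    counts: List[int] = []
    for ev in range(n_events):
        n = len(vector_cols[names[0]][ev])
        # Only vectors of the same length within an event can be exploded together
        if any(len(vector_cols[c][ev]) != n for c in names):
            raise ValueError(f"event {ev}: vector columns differ in length")
        counts.append(n)
        for c in names:
            flat[c].extend(vector_cols[c][ev])
        for c, values in scalar_cols.items():
            flat[c].extend([values[ev]] * n)  # broadcast the scalar
    return flat, counts


def unflatten(flat_values: Sequence[float], counts: Sequence[int]) -> List[List[float]]:
    """Regroup a flat output column into one list per event."""
    out: List[List[float]] = []
    pos = 0
    for n in counts:
        out.append(list(flat_values[pos:pos + n]))
        pos += n
    return out


# Usage: three events with 2, 0 and 1 jets respectively
jets = {"Jet_pt": [[50.0, 30.0], [], [40.0]], "Jet_eta": [[0.1, -1.2], [], [2.0]]}
met = {"MET_pt": [15.0, 25.0, 35.0]}
flat, counts = flatten_events(jets, met)
regressed = [2 * pt for pt in flat["Jet_pt"]]  # stand-in for NN inference
print(unflatten(regressed, counts))  # prints: [[100.0, 60.0], [], [80.0]]
```

Events with zero jets contribute no rows, so their broadcast scalars are dropped from the flat view; the counts still record them, so the regrouped output keeps one (empty) entry per event.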

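Pseudocode example (proposed API; `BatchProcess` does not exist yet):

def processBatch(nparray):
    # do something with PyTorch, e.g. run NN inference on the batch
    ...
    return outTensor

# Column lists (not sets) so that the array column order is well defined;
# the callback is passed first because positional arguments cannot follow keywords
rdf.BatchProcess(processBatch,
                 inputCols=["Jet_pt", "Jet_eta", "Jet_mass", "MET_pt"],
                 outputVectorCols=["Jet_regressedPt", "Jet_regressedMass"],
                 outputScalarCols=[],
                 batchSize=100000, flattenRVec=True, broadCastScalars=True)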

Alternatives considered

No response

Additional context

No response

Metadata

Labels: experiment (Affects an experiment / reported by its software & computing experts), in:RDataFrame, new feature
