[RDataFrame] Unable to cacheread remote file #15028

Open
@AlkaidCheng

Check duplicate issues.

  • Checked for duplicates

Description

When the input files to RDataFrame are remote, forced caching of remote files (enabled with ROOT.TFile.SetCacheFileDir) does not work: the remote files are downloaded again on every run.

Reproducer

import os
import ROOT

user = os.environ['USER']
outdir = f"/eos/user/{user[0]}/{user}"
filename = os.path.join(outdir, "test.root")
# create dummy root file
ROOT.RDataFrame(100).Define("x", "1").Snapshot("test", filename)

ROOT.TFile.SetCacheFileDir("/tmp", True, True)
# reading the remote file here does not go through the cached copy in /tmp
ROOT.RDataFrame("test", f"root://eosuser.cern.ch/{filename}").Sum("x").GetValue()

This happens because RDataFrame internally creates its TChain via ROOT.Internal.TreeUtils.MakeChainForMT(treename), which constructs the TChain in ROOT.TChain.kWithoutGlobalRegistration mode. That mode forces the TFile open option to be "READ_WITHOUT_GLOBALREGISTRATION". As a result, the TFile is opened without caching, because the fgCacheFileForce flag is only checked when the option is "READ".
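To make the suspected cause concrete, here is a toy model of the option check in plain Python. It is not ROOT's actual code: the function name `uses_forced_cache` and its parameters are hypothetical, and the exact-match comparison on "READ" is an assumption taken from the description above.

```python
# Toy model of the behaviour described in this issue; NOT ROOT source code.
# Assumption: the forced-cache path is taken only when the open option
# is exactly "READ" and the force flag (fgCacheFileForce) is set.

def uses_forced_cache(option: str, cache_force: bool) -> bool:
    """Return True if this simplified model would read through the cache."""
    return cache_force and option.upper() == "READ"


if __name__ == "__main__":
    # Compare a plain open with the option RDataFrame's TChain ends up using.
    for option in ("READ", "READ_WITHOUT_GLOBALREGISTRATION"):
        print(option, "->", uses_forced_cache(option, cache_force=True))
```

Under this model a plain "READ" open goes through the cache, while the "READ_WITHOUT_GLOBALREGISTRATION" option fails the comparison and bypasses it, which would match the repeated downloads seen in the reproducer.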

ROOT version

6.30/04 (LCG105a)

Installation method

LCG (Swan)

Operating system

Linux

Additional context

No response
