Check duplicate issues.
- Checked for duplicates
Description
When the input files to RDataFrame are remote, forced caching of remote files does not work: the files are downloaded again on every run instead of being read from the local cache.
Reproducer
import os
import ROOT
user = os.environ['USER']
outdir = f"/eos/user/{user[0]}/{user}"
filename = os.path.join(outdir, "test.root")
# create dummy root file
ROOT.RDataFrame(100).Define("x", "1").Snapshot("test", filename)
ROOT.TFile.SetCacheFileDir("/tmp", True, True)
# this does not trigger loading of cached root file
ROOT.RDataFrame("test", f"root://eosuser.cern.ch/{filename}").Sum("x").GetValue()
This is because RDataFrame internally creates a TChain using ROOT.Internal.TreeUtils.MakeChainForMT(treename), which creates a TChain object with the mode ROOT.TChain.kWithoutGlobalRegistration. This in turn forces the TFile open option to be "READ_WITHOUT_GLOBALREGISTRATION". As a result, the TFile is opened without caching, since the fgCacheFileForce flag is only checked when the option is "READ".
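To make the chain of cause and effect easier to follow, here is a minimal sketch in plain Python that models the behaviour described above. It is not ROOT code: the function names chain_open_option and honours_forced_cache are hypothetical, and the assumption that a TChain with global registration uses the plain "READ" option is mine, not something confirmed in this report.

```python
# Simplified, hypothetical model of the behaviour described in this issue.
# This is NOT ROOT source code; both function names are made up for illustration.


def chain_open_option(global_registration: bool) -> str:
    """Return the option string a TChain is assumed to use when opening a file.

    Assumption: a chain with global registration uses plain "READ", while
    kWithoutGlobalRegistration forces "READ_WITHOUT_GLOBALREGISTRATION".
    """
    if global_registration:
        return "READ"
    return "READ_WITHOUT_GLOBALREGISTRATION"


def honours_forced_cache(option: str, force_cache_read: bool) -> bool:
    """Model of the check: forced caching only applies when the option is "READ"."""
    return force_cache_read and option.upper() == "READ"


if __name__ == "__main__":
    # The force flag is set in both cases, as in the reproducer.
    for global_registration in (True, False):
        option = chain_open_option(global_registration)
        print(option, honours_forced_cache(option, force_cache_read=True))
```

Under this model, the chain that RDataFrame builds never reaches the cache branch even though the force flag is set, which matches the repeated downloads seen in the reproducer.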
ROOT version
6.30/04 (LCG105a)
Installation method
LCG (Swan)
Operating system
Linux
Additional context
No response