-
The getting started guide says that nested structures are automatically flattened into a DataFrame with a MultiIndex when importing with Pandas. This no longer seems to be the case:
df = structured_tree.arrays(["NMuon", "Muon_Px", "Muon_Py", "Muon_Pz"], library="pd")
print(df)
So I guess the guide should be updated, and maybe an example added showing how to use the awkward accessor on the structured columns, assuming that is the intended way of going about things.
-
You're right: this is a new feature and the docs are out of date. Thanks for the heads-up!
What's happening now is that any non-flat data uses the Awkward dtype provided by awkward-pandas.
If you want to explode the data as before, you can get it as an Awkward Array and use ak.to_dataframe:
>>> import uproot
>>> import awkward as ak
>>> ak.to_dataframe(events.arrays(filter_name="/(Jet|Muon)_P[xyz]/", library="ak"))
                    Jet_Px      Jet_Py       Jet_Pz     Muon_Px     Muon_Py     Muon_Pz
entry subentry
1     0         -38.874714   19.863453    -0.894942   -0.816459  -24.404259   20.199968
3     0         -71.695213   93.571579   196.296432   22.088331  -85.835464  403.848450
      1          36.606369   21.838793    91.666283   76.691917  -13.956494  335.094208
4     0           3.880162  -75.234055  -359.601624   45.171322   67.248787  -89.695732
      1           4.979580  -39.231731    68.456718   39.750957   25.403667   20.115053
...                    ...         ...          ...         ...         ...         ...
2414  0          33.961163   58.900467   -17.006561   -9.204197  -42.204014  -64.264900
2416  0          37.071465   20.131996   225.669037  -39.285824  -14.607491   61.715790
2417  0         -33.196457  -59.664749   -29.040150   35.067146  -14.150043  160.817917
2418  0          -3.714818  -37.202377    41.012222  -29.756786  -15.303859  -52.663750
2419  0         -36.361286   10.173571   226.429214    1.141870   63.609570  162.176315
[2038 rows x 6 columns]
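To make the (entry, subentry) layout of that output concrete, here is a minimal pure-Python sketch of the flattening. It is not the Awkward implementation: the toy values and the helper name `explode` are made up for illustration, and the inner join on the index is an assumption based on what I understand to be the `how="inner"` default of `ak.to_dataframe`. That join would explain why entries lacking either a jet or a muon do not show up in the exploded table.

```python
def explode(column):
    """Map (entry, subentry) -> value for one jagged column."""
    return {
        (entry, subentry): value
        for entry, values in enumerate(column)
        for subentry, value in enumerate(values)
    }


# Toy jagged data: one inner list per event (values are made up).
jet_px = [[], [-38.9], [], [-71.7, 36.6]]
muon_px = [[-52.9, 37.7], [-0.8], [49.0], [22.1, 76.7]]

jets = explode(jet_px)
muons = explode(muon_px)

# Inner join: keep only the index pairs present in both columns,
# so events without a jet or without a muon drop out entirely.
rows = {
    key: (jets[key], muons[key])
    for key in sorted(jets.keys() & muons.keys())
}

for (entry, subentry), (jet, muon) in rows.items():
    print(entry, subentry, jet, muon)
```

In this toy data, events 0 and 2 have no jets, so only entries 1 and 3 survive the join, the same pattern as the missing entries in the real output.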
-
Oops, I take it back. Something weird is going on:
import awkward as ak
import pandas as pd
import uproot

structured_tree = uproot.open("HZZ.root")["events"]

# 1. Structured DataFrame: jagged branches become awkward-pandas columns.
df = structured_tree.arrays(["NMuon", "Muon_Px", "Muon_Py", "Muon_Pz"], library="pd")
with open("structured_df.txt", "w") as f:
    print(df, file=f)

# 2. Flattened DataFrame with an (entry, subentry) MultiIndex.
arr = structured_tree.arrays(["NMuon", "Muon_Px", "Muon_Py", "Muon_Pz"])
df = ak.to_dataframe(arr)
with open("flattened_df.txt", "w") as f:
    print(df, file=f)

# 3. Keep only subentry 0, i.e. the first muon of each event.
idx = pd.IndexSlice
df = df.loc[idx[:, 0], :]
with open("sliced_df.txt", "w") as f:
    print(df, file=f)

# 4. Alternative: select the first muon through the .ak accessor.
df = structured_tree.arrays(["NMuon", "Muon_Px", "Muon_Py", "Muon_Pz"], library="pd")
df = df[df.NMuon > 0]  # drop events without muons before indexing
df["Muon_Px"] = df.Muon_Px.ak[:, 0]
df["Muon_Py"] = df.Muon_Py.ak[:, 0]
df["Muon_Pz"] = df.Muon_Pz.ak[:, 0]
with open("sliced_alt_df.txt", "w") as f:
    print(df, file=f)
It looks like things work fine at the beginning, but then things go awkward (pun intended). The last [...]
And it gets even weirder when I use the [...]
I think I have the newest versions of awkward installed too:
-
Right, I forgot about the [...]
Just my alternative way of using the [...]
df = structured_tree.arrays(["NMuon", "Muon_Px", "Muon_Py", "Muon_Pz"], library="pd")
df = df[df.NMuon > 0]
df["Muon_Px"] = df.Muon_Px.ak[:,0]
df["Muon_Py"] = df.Muon_Py.ak[:,0]
df["Muon_Pz"] = df.Muon_Pz.ak[:,0]
print(df)
And if I do not filter out the muonless events, I get an indexing error. There can be rows without muons in the structured [...]
df = structured_tree.arrays(["NMuon", "Muon_Px", "Muon_Py", "Muon_Pz"], library="pd")
df = df[df.NMuon == 0]
print(df)
So I guess this is a bug in [...]
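The indexing error for muonless events can be illustrated without any ROOT file: taking element 0 of an empty list fails, so the selection needs either a filter beforehand or a placeholder for the missing muon. The sketch below uses plain Python lists with made-up values and a hypothetical helper `first_or_none`; on Awkward Arrays, `ak.firsts` plays a similar role by returning None for empty lists.

```python
def first_or_none(values):
    """Return the first element of a list, or None if the list is empty."""
    return values[0] if values else None


# Toy jagged column: one list of muon momenta per event (values are made up).
muon_px = [[-52.9, 37.7], [], [49.0], []]

# Naive indexing raises IndexError as soon as it hits an empty event.
try:
    leading = [values[0] for values in muon_px]
except IndexError:
    leading = None

# Placeholder approach: one row per event is kept, None marks "no muon".
leading_safe = [first_or_none(values) for values in muon_px]
print(leading_safe)
```

Whether to filter or to keep placeholders depends on the analysis: filtering changes the number of rows, while placeholders keep the rows aligned with the other per-event columns such as NMuon.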