Python Pandas Style Guide
Generally the PEP 8 Style Guide is a good reference; look into a PEP linting program for your editor. Here are some additional hints specific to Pandas.
read_counts[row, 1] is harder to understand than read_counts[row, SAMPLE_ID].
This also brings up the point of defining column header strings in CONSTANT_VARIABLES, so that the same string value can be referenced from multiple places in the code.
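As a rough sketch of the two points above, the snippet below defines column headers once as constants and compares positional with label-based access. The constant names, the column contents, and the use of .iloc and .loc are assumptions made for illustration; they are not taken from any particular codebase.

```python
import pandas as pd

# Hypothetical column-name constants; defined once, reused everywhere.
READ_COUNT = "read_count"
SAMPLE_ID = "sample_id"

# Toy data standing in for a real read-count table.
read_counts = pd.DataFrame({
    READ_COUNT: [120, 85, 240],
    SAMPLE_ID: ["s1", "s2", "s3"],
})

row = 1

# Positional access: the reader has to remember what column 1 holds.
by_position = read_counts.iloc[row, 1]

# Label access through a constant: the intent is visible at the call site.
by_label = read_counts.loc[row, SAMPLE_ID]
```

If the header string ever changes, only the constant needs updating, and every lookup that uses it follows along.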
I'm not 100% sure on this one, but it feels like there's rarely a case when you need to create a new data frame in the code. Data will almost always come from other existing sources, and can usually be turned into whatever format you want through a combination of reshaping, grouping, merging, or subsetting.
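A minimal sketch of that idea, assuming two small hypothetical tables (the column names and values are made up, and are built inline here only to stand in for data that would normally be loaded from files or a database):

```python
import pandas as pd

# Hypothetical inputs standing in for data loaded from existing sources.
coverage = pd.DataFrame({
    "sample": ["s1", "s1", "s2", "s2"],
    "method": ["a", "b", "a", "b"],
    "coverage": [10.0, 12.0, 20.0, 18.0],
})
samples = pd.DataFrame({
    "sample": ["s1", "s2"],
    "tissue": ["liver", "brain"],
})

# Merging: attach the sample annotations to each measurement.
annotated = coverage.merge(samples, on="sample", how="left")

# Subsetting: keep only the rows of interest.
liver = annotated[annotated["tissue"] == "liver"]

# Reshaping: one row per sample, one column per method.
wide = coverage.pivot(index="sample", columns="method", values="coverage")
```

Each result is derived from the inputs, so nothing downstream has to be assembled cell by cell in a fresh data frame.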
Instead of:
for method in all_methods:
    curr_table = tbl[tbl['method'] == method]
    for sample in all_samples:
        curr_loc = curr_table['sample'] == sample
        curr_avg = np.mean(curr_table.loc[curr_loc, 'coverage'])
Try:
tbl.groupby(['method', 'sample'])['coverage'].mean()
Code is good, but in general less code and less indentation are probably preferable.
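To check that the two approaches agree, here is a small self-contained sketch on made-up data. The table contents and the loop_means dictionary are assumptions added for illustration; the column names follow the loop example.

```python
import numpy as np
import pandas as pd

# Toy table with the columns used in the loop example.
tbl = pd.DataFrame({
    "method": ["a", "a", "a", "b", "b", "b"],
    "sample": ["s1", "s1", "s2", "s1", "s2", "s2"],
    "coverage": [10.0, 14.0, 20.0, 8.0, 30.0, 34.0],
})

# Loop version: collect each (method, sample) mean by hand.
loop_means = {}
for method in tbl["method"].unique():
    curr_table = tbl[tbl["method"] == method]
    for sample in curr_table["sample"].unique():
        curr_loc = curr_table["sample"] == sample
        loop_means[(method, sample)] = np.mean(curr_table.loc[curr_loc, "coverage"])

# groupby version: one expression, no nesting.
group_means = tbl.groupby(["method", "sample"])["coverage"].mean()
```

The groupby result is a Series indexed by (method, sample), so a single value can be read with group_means.loc[("a", "s1")].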