
Python Pandas Style Guide

Ian edited this page May 8, 2018 · 11 revisions

Generally, the PEP 8 Style Guide is a good reference; look into a PEP 8 linter for your editor. Here are some additional hints specific to pandas.

1. Give names to your columns

read_counts.iloc[row, 1] is harder to understand than read_counts.loc[row, SAMPLE_ID].

This also brings up the point of defining column header strings once as UPPER_CASE constants, so that the same string value can be referenced from multiple places in the code.
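A minimal sketch of the idea, assuming a two-column table; the constant names, column strings, and toy data below are hypothetical and only illustrate the pattern:

```python
import io

import pandas as pd

# Hypothetical column headers, defined once and reused everywhere.
READ_COUNT = "read_count"
SAMPLE_ID = "sample_id"

# Toy data read from in-memory CSV text, standing in for a real input file.
raw = io.StringIO("120,s1\n87,s2\n")
read_counts = pd.read_csv(raw, header=None, names=[READ_COUNT, SAMPLE_ID])

row = 1
by_position = read_counts.iloc[row, 1]     # works, but what is column 1?
by_name = read_counts.loc[row, SAMPLE_ID]  # the intent is clear at a glance
```

Both lookups return the same value; only the second one tells the reader which column it is. Renaming a column then means changing one constant rather than hunting for string literals.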

2. Avoid df = pd.DataFrame({...})

I'm not 100% sure on this one, but there is rarely a case where you need to build a new data frame by hand in the code. Data almost always comes from existing sources, and can usually be turned into whatever format you want through a combination of reshaping, grouping, merging, or subsetting.
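As a rough illustration, here is a summary table derived entirely from two inputs by merging and grouping. The table names, columns, and values are made up for the example, and the in-memory CSV text stands in for real files:

```python
import io

import pandas as pd

# Hypothetical inputs; in real code these would come from pd.read_csv(path).
coverage = pd.read_csv(io.StringIO(
    "sample,method,coverage\n"
    "s1,A,10\n"
    "s1,B,20\n"
    "s2,A,30\n"
    "s2,B,50\n"
))
samples = pd.read_csv(io.StringIO(
    "sample,tissue\n"
    "s1,liver\n"
    "s2,lung\n"
))

# Merge the annotation in, then group: the summary is derived from the
# inputs rather than assembled by hand with pd.DataFrame({...}).
summary = (
    coverage.merge(samples, on="sample")
    .groupby("tissue", as_index=False)["coverage"]
    .mean()
)
```

Here summary has one row per tissue with the mean coverage, and no intermediate dictionary of lists was needed to get there.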

3. For loops are so 00s

Instead of:

for method in all_methods:
    curr_table = tbl[tbl['method'] == method]
    for sample in all_samples:
        curr_loc = curr_table['sample'] == sample
        curr_avg = np.mean(curr_table.loc[curr_loc, 'coverage'])

Try:

tbl.groupby(['method', 'sample'])['coverage'].mean()

Code is good, but in general, less code and less indentation are preferable.
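To check that the two approaches agree, here is a small self-contained sketch. The toy values are invented, and the loop stores its results in a dictionary (an addition for the comparison) so they can be checked against the groupby output:

```python
import io

import pandas as pd

# Toy table with the same column names as the snippets above.
tbl = pd.read_csv(io.StringIO(
    "method,sample,coverage\n"
    "A,s1,10\n"
    "A,s1,14\n"
    "A,s2,30\n"
    "B,s1,20\n"
    "B,s2,40\n"
    "B,s2,60\n"
))

# Loop version: one filter per method, another per sample.
loop_means = {}
for method in tbl["method"].unique():
    curr_table = tbl[tbl["method"] == method]
    for sample in curr_table["sample"].unique():
        curr_loc = curr_table["sample"] == sample
        loop_means[(method, sample)] = curr_table.loc[curr_loc, "coverage"].mean()

# Groupby version: a single expression, indexed by (method, sample).
group_means = tbl.groupby(["method", "sample"])["coverage"].mean()

# Both produce the same mean for every (method, sample) pair.
assert loop_means == group_means.to_dict()
```

The groupby result is a Series with a (method, sample) MultiIndex, so a single value can be looked up with group_means.loc[("A", "s1")].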
