Python Pandas Style Guide

PEP 8 is a good general reference; consider installing a PEP 8 linter for your editor. Here are some additional hints specific to Pandas.

1. Give names to your columns

read_counts.iloc[row, 1] is harder to understand than read_counts.loc[row, SAMPLE_ID].

This also brings up the point of defining column header strings as constants (named in UPPER_CASE, per PEP 8), so that the same string value can be referenced from multiple places in the code.
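A minimal sketch of the idea, assuming a small read-count table. The column names `sample_id` and `read_count` and the inline CSV text are made up for illustration; in practice the data would come from a real file.

```python
import io

import pandas as pd

# Hypothetical column names, defined once and reused everywhere.
SAMPLE_ID = "sample_id"
READ_COUNT = "read_count"

# Inline CSV text stands in for a real input file (illustrative data only).
raw = io.StringIO(
    "sample_id,read_count\n"
    "s1,120\n"
    "s2,87\n"
    "s3,301\n"
)
read_counts = pd.read_csv(raw)

# Label-based access says what is being read, not which position it sits in.
first_sample = read_counts.loc[0, SAMPLE_ID]

# The same constants are reused in filters, so a renamed column is a one-line change.
high_count_ids = read_counts.loc[read_counts[READ_COUNT] > 100, SAMPLE_ID].tolist()
```

If the header in the input file changes, only the constant definitions need to be updated.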

2. Avoid `df = pd.DataFrame({...})`

I'm not 100% sure on this one, but it feels like there's rarely a case where you need to create a new data frame from scratch in the code. Data will almost always come from other existing sources, and can usually be turned into whatever format you want through a combination of reshaping, grouping, merging, or subsetting.
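As a rough illustration, the sketch below derives new tables from two existing ones by merging, reshaping, and subsetting, without building a data frame by hand. The table contents and the `tissue` column are invented for the example, and the inline CSV text stands in for real input files.

```python
import io

import pandas as pd

# Hypothetical inputs; in real code these would be files or query results.
coverage = pd.read_csv(io.StringIO(
    "sample,method,coverage\n"
    "s1,a,10\n"
    "s1,b,20\n"
    "s2,a,30\n"
    "s2,b,40\n"
))
samples = pd.read_csv(io.StringIO(
    "sample,tissue\n"
    "s1,liver\n"
    "s2,lung\n"
))

# Merging: attach sample annotations to every coverage row.
annotated = coverage.merge(samples, on="sample", how="left")

# Reshaping: one row per tissue, one column per method.
wide = annotated.pivot_table(
    index="tissue", columns="method", values="coverage", aggfunc="mean"
)

# Subsetting: keep only the rows of interest.
liver_only = annotated[annotated["tissue"] == "liver"]
```

Each result is derived from the inputs, so it stays in sync when the source data changes.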

3. For loops are so 00s

Instead of:

averages = {}
for method in all_methods:
    curr_table = tbl[tbl['method'] == method]
    for sample in all_samples:
        curr_loc = curr_table['sample'] == sample
        # Mean coverage for this method/sample pair
        averages[(method, sample)] = curr_table.loc[curr_loc, 'coverage'].mean()

Try:

tbl.groupby(['method', 'sample'])['coverage'].mean()

Code is good, but in general, less code and less indentation are probably preferable.
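To check that the two approaches agree, here is a small self-contained comparison. The table contents are invented for illustration; the column names `method`, `sample`, and `coverage` follow the snippets above.

```python
import io

import pandas as pd

# Illustrative data only; inline CSV text stands in for a real input file.
tbl = pd.read_csv(io.StringIO(
    "method,sample,coverage\n"
    "a,s1,10\n"
    "a,s1,20\n"
    "a,s2,30\n"
    "b,s1,40\n"
    "b,s2,50\n"
    "b,s2,70\n"
))

# Loop version: filter by method, then by sample, and store each mean.
loop_means = {}
for method in tbl["method"].unique():
    curr_table = tbl[tbl["method"] == method]
    for sample in curr_table["sample"].unique():
        curr_loc = curr_table["sample"] == sample
        loop_means[(method, sample)] = curr_table.loc[curr_loc, "coverage"].mean()

# groupby version: one expression, no nesting.
group_means = tbl.groupby(["method", "sample"])["coverage"].mean()

# Both give the same mean for every (method, sample) pair.
assert group_means.to_dict() == loop_means
```

The grouped result is a Series indexed by (method, sample), so a single value can be looked up with `group_means.loc[("a", "s1")]`.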
