bamboo.py

Summary

A handful of utilities monkey-patched onto the pandas DataFrame class. Some were written before I properly understood pandas, and should eventually be retired.
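
As a rough illustration of what "monkey-patched onto the DataFrame class" means, the sketch below attaches a simplified row filter to pandas.DataFrame. The name filter_rows_sketch and its body are assumptions made for this example, not the code in bamboo.py.

```python
import pandas as pd

def _filter_rows_sketch(df, predicate):
    """Keep the rows for which predicate(row) is truthy (hypothetical helper)."""
    # Row-by-row evaluation: simple, but much slower than boolean indexing.
    keep = [bool(predicate(row)) for _, row in df.iterrows()]
    return df[keep]

# Monkey-patch: every DataFrame now exposes the function as a method.
pd.DataFrame.filter_rows_sketch = _filter_rows_sketch

df = pd.DataFrame({
    "children": [1, 2, 15],
    "name": ["Fred", "Wilma", "Dino"],
    "surname": ["Flintstone", "Flintstone", None],
})
fred_only = df.filter_rows_sketch(lambda r: r["name"].startswith("F"))
print(fred_only)
```

Patching a class you do not own is convenient for interactive work, but the new attribute is global, so a distinctive method name helps avoid clashing with future pandas methods.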

Dependencies

Required: pandas, pudzu-utils.

Optional: pyparsing (for filter expressions), tqdm (for progress bars).

Documentation

>> df
   children   name     surname
0         1   Fred  Flintstone
1         2  Wilma  Flintstone
2        15   Dino         NaN

standalone functions

pd_print: print a value using the given pandas display options (e.g. min_rows=60)

>> pd_print(df, max_rows=2)
    children  name     surname
0          1  Fred  Flintstone
..       ...   ...         ...
2         15  Dino         NaN
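
A helper like this can be approximated with pandas' own option_context. The sketch below is a guess at the idea rather than the actual implementation; pd_format_sketch is a hypothetical name, and it returns the text instead of printing it so the result is easy to inspect.

```python
import pandas as pd

def pd_format_sketch(value, **options):
    """Render value as text under temporary display options (hypothetical helper)."""
    if not options:
        return str(value)
    # option_context expects alternating option names and values.
    args = [item for key, val in options.items() for item in (f"display.{key}", val)]
    with pd.option_context(*args):
        return str(value)

df = pd.DataFrame({
    "children": [1, 2, 15],
    "name": ["Fred", "Wilma", "Dino"],
    "surname": ["Flintstone", "Flintstone", None],
})
text = pd_format_sketch(df, max_rows=2)
print(text)
```

The options only apply inside the with block, so the global display settings are left as they were.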

DataFrame

filter_rows: filter rows by a row/index predicate or a filter expression (see FilterExpression docstring for details). Less efficient than boolean indexing.

>> df.filter_rows(lambda r: r['name'].startswith("F"))
   children  name     surname
0         1  Fred  Flintstone
>> df.filter_rows(lambda r, i: i % 2 == 0)
   children  name     surname
0         1  Fred  Flintstone
2        15  Dino         NaN
>> df.filter_rows("name=Fred or children>2")
   children  name     surname
0         1  Fred  Flintstone
2        15  Dino         NaN
>> df.filter_rows("*name~'^F'") # field wildcard and regex match
   children   name     surname
0         1   Fred  Flintstone
1         2  Wilma  Flintstone
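
Since filter_rows is less efficient than boolean indexing, here is how the four examples above might look in plain pandas. This is a sketch of equivalents, not what filter_rows does internally, and the reading of the `*name` wildcard as "any column whose name ends in name" is an assumption.

```python
import pandas as pd

df = pd.DataFrame({
    "children": [1, 2, 15],
    "name": ["Fred", "Wilma", "Dino"],
    "surname": ["Flintstone", "Flintstone", None],
})

# Row predicate: name starts with "F".
by_name = df[df["name"].str.startswith("F")]

# Index predicate: even index labels.
by_index = df[df.index % 2 == 0]

# Expression "name=Fred or children>2".
by_expr = df[(df["name"] == "Fred") | (df["children"] > 2)]

# Expression "*name~'^F'": any matching column starts with "F" (missing values never match).
name_cols = df.filter(regex="name$")
by_regex = df[name_cols.apply(lambda col: col.str.contains("^F", na=False)).any(axis=1)]

print(by_regex)
```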

assign_rows: assign or update columns using a row/index function or constant, with an optional row/index predicate condition. Supports progress bars using tqdm.

>> df.assign_rows(assign_if="not surname:exists", pups=lambda r: r["children"], children=None)
   children   name     surname  pups
0       1.0   Fred  Flintstone   NaN
1       2.0  Wilma  Flintstone   NaN
2       NaN   Dino         NaN  15.0
>> df.assign_rows(assign_if="not surname:exists", surname=prompt_for_value(prompt=lambda r: r["name"]))
[Dino] = Snorkasaurus
   children   name       surname
0         1   Fred    Flintstone
1         2  Wilma    Flintstone
2        15   Dino  Snorkasaurus
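
For comparison, the first assign_rows example can be reproduced with vectorised pandas calls. This is a sketch under the assumption that "not surname:exists" means the surname is missing; it is not how assign_rows is implemented.

```python
import pandas as pd

df = pd.DataFrame({
    "children": [1, 2, 15],
    "name": ["Fred", "Wilma", "Dino"],
    "surname": ["Flintstone", "Flintstone", None],
})

# Rows where the condition holds (no surname).
missing = df["surname"].isna()

out = df.assign(
    # New column: copy children only on the selected rows, NaN elsewhere.
    pups=df["children"].where(missing),
    # Existing column: blank out children on the selected rows.
    children=df["children"].mask(missing),
)
print(out)
```

Both columns become floats because NaN cannot be stored in an integer column, which matches the 1.0 / 15.0 values in the output above.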

update_columns: update existing columns using a value function or constant, with an optional value predicate condition. Supports progress bars using tqdm.

>> df.update_columns(update_if=True, surname=str.upper)
   children   name     surname
0         1   Fred  FLINTSTONE
1         2  Wilma  FLINTSTONE
2        15   Dino         NaN
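
The same result is available from the pandas string accessor, which skips missing values on its own. A minimal sketch of the plain-pandas equivalent, not of update_columns itself:

```python
import pandas as pd

df = pd.DataFrame({
    "children": [1, 2, 15],
    "name": ["Fred", "Wilma", "Dino"],
    "surname": ["Flintstone", "Flintstone", None],
})

# .str.upper() leaves missing surnames missing instead of raising.
out = df.assign(surname=df["surname"].str.upper())
print(out)
```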

groupby_rows: group rows using a row function, map, list or column name.

>> df.groupby_rows(lambda r: len(r['name'])).count()
   children  name  surname
4         2     2        1
5         1     1        1
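
In plain pandas the same grouping can be written by passing a derived Series to groupby. A sketch of the equivalent call; name_length is just an illustrative label.

```python
import pandas as pd

df = pd.DataFrame({
    "children": [1, 2, 15],
    "name": ["Fred", "Wilma", "Dino"],
    "surname": ["Flintstone", "Flintstone", None],
})

# Group key computed per row: the length of each name.
name_length = df["name"].str.len().rename("name_length")

# count() ignores missing values, hence surname is 1 for the length-4 group.
counts = df.groupby(name_length).count()
print(counts)
```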

split_columns: split column string values on a given delimiter.

>> df.assign(children=["Pebbles","Pebbles,Stony", np.nan])
        children   name     surname
0        Pebbles   Fred  Flintstone
1  Pebbles,Stony  Wilma  Flintstone
2            NaN   Dino         NaN
>> _.split_columns("children", ",")
           children   name     surname
0        (Pebbles,)   Fred  Flintstone
1  (Pebbles, Stony)  Wilma  Flintstone
2                ()   Dino         NaN
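
A minimal sketch of the splitting step in plain Python and pandas, assuming (as the output above suggests) that values become tuples and missing values become empty tuples. It is an illustration, not the code behind split_columns.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "children": ["Pebbles", "Pebbles,Stony", np.nan],
    "name": ["Fred", "Wilma", "Dino"],
    "surname": ["Flintstone", "Flintstone", np.nan],
})

# Split strings on the delimiter; anything that is not a string becomes ().
split = [tuple(v.split(",")) if isinstance(v, str) else () for v in df["children"]]
out = df.assign(children=pd.Series(split, index=df.index))
print(out)
```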

explode_to_columns: explode column sequence values to multiple columns.

>> _.explode_to_columns("children")
           children   name     surname children_0 children_1
0        (Pebbles,)   Fred  Flintstone    Pebbles        NaN
1  (Pebbles, Stony)  Wilma  Flintstone    Pebbles      Stony
2                ()   Dino         NaN        NaN        NaN
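
The expansion can be sketched with an explicit loop over positions, padding shorter sequences with NaN. The column naming follows the children_0, children_1 pattern shown above; the rest is an assumption about the behaviour rather than the real implementation.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "children": [("Pebbles",), ("Pebbles", "Stony"), ()],
    "name": ["Fred", "Wilma", "Dino"],
    "surname": ["Flintstone", "Flintstone", np.nan],
})

out = df.copy()
# One new column per position in the longest sequence.
width = max(len(seq) for seq in df["children"])
for i in range(width):
    out[f"children_{i}"] = [seq[i] if i < len(seq) else np.nan for seq in df["children"]]
print(out)
```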

combine_columns: combine columns into a tuple, ignoring NaNs and Nones, returning a series.

>> _.combine_columns(["children_1", "children_0"])
0          (Pebbles,)
1    (Stony, Pebbles)
2                  ()
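
Going the other way, combining columns into tuples while dropping missing values can be sketched as below. The helper-free form is only meant to show the idea, not how combine_columns is written.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "children_0": ["Pebbles", "Pebbles", np.nan],
    "children_1": [np.nan, "Stony", np.nan],
})

cols = ["children_1", "children_0"]
# Walk the chosen columns row by row, keeping only non-missing values.
combined = pd.Series(
    [tuple(v for v in row if pd.notna(v)) for row in zip(*(df[c] for c in cols))],
    index=df.index,
)
print(combined)
```

The order of the tuple follows the order of the column list, which is why Stony comes before Pebbles for Wilma.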
