utz ("yoots"): utilities augmenting the Python standard library; processes, Pytest, Pandas, Plotly, …
- Install
- Import: from utz import *
- Modules
  - utz.proc: subprocess wrappers; shell out commands, parse output
  - utz.collections: collection/list helpers
  - utz.env: os.environ wrapper + contextmanager
  - utz.fn: decorator/function utilities
  - utz.jsn: JsonEncoder for datetimes, dataclasses
  - utz.context: {async,}contextmanager helpers
  - utz.cli: click helpers
  - utz.mem: memray wrapper
  - utz.time: Time timer, now/today helpers
  - utz.size: humanize.naturalsize wrapper
  - utz.hash_file: hash file contents
  - utz.ym: YM (year/month) class
  - utz.cd: "change directory" contextmanagers
  - utz.gist: GitHub Gist operations
  - utz.gzip: deterministic GZip helpers
  - utz.s3: S3 utilities
  - utz.plot: Plotly helpers
  - utz.setup: setup.py helper
  - utz.version: runtime package version with git hash
  - utz.test: dataclass test cases, raises helper
  - utz.docker, utz.tmpdir, etc.
- Examples / Users
Install
pip install utz
- Requires Python 3.10+
- utz has one dependency, stdlb (wild-card standard library imports).
- "Extras" provide optional deps (e.g. Pandas, Plotly, …).
Import
I often import utz.* in Jupyter notebooks:
from utz import *
This imports most standard library modules/functions (via stdlb), as well as the utz members below.
You can also import utz.* during Python REPL startup:
cat >~/.pythonrc <<EOF
import sys
try:
    from utz import *
    err("Imported utz")
except ImportError:
    # err comes from utz, so it is unavailable here; write to stderr directly
    print("Couldn't find utz", file=sys.stderr)
EOF
export PYTHONSTARTUP=~/.pythonrc
# Configure for Python REPL in new Bash shells:
echo 'export PYTHONSTARTUP=~/.pythonrc' >> ~/.bashrc
Modules
Here are a few utz modules, in rough descending order of how often I use them:
utz.proc: subprocess wrappers; shell out commands, parse output
from utz.proc import *
# Run a command
run('git', 'commit', '-m', 'message') # Commit staged changes
# Passing a single string implies `shell=True` (for all functions listed here)
# Return `list[str]` of stdout lines
lines('git log -n5 --format=%h') # Last 5 commit SHAs
# Verify exactly one line of stdout, return it
line('git log -1 --format=%h') # Current HEAD commit SHA
# Return stdout as a single string
output('git log -1 --format=%B') # Current HEAD commit message
# Pass input to stdin
line('git mktree', input=b'100644 blob abc123\tfile.txt\n') # Create git tree from stdin
# Check whether a command succeeds, suppress output
check('git diff --exit-code --quiet') # `True` iff there are no uncommitted changes
# Nested arrays are flattened (for all commands above):
check(['git', 'diff', ['--exit-code', '--quiet']])
err("This will be output to stderr")
# Execute a "pipeline" of commands
pipeline(['seq 10', 'head -n5']) # '1\n2\n3\n4\n5\n'
See also: test_proc.py.
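For readers curious how helpers like these can sit on top of the standard library, here is a minimal sketch using only `subprocess`. It is not utz's implementation; the function bodies are assumptions that mirror the behaviour described above (a single string implies `shell=True`, `line` insists on exactly one line, `check` returns a bool).

```python
import subprocess
import sys


def output(cmd, input=None):
    """Run `cmd`, raise on non-zero exit, and return stdout as text."""
    result = subprocess.run(
        cmd,
        shell=isinstance(cmd, str),  # a single string is run through the shell
        input=input,
        stdout=subprocess.PIPE,
        check=True,
    )
    return result.stdout.decode()


def lines(cmd, input=None):
    """Return stdout as a list of lines, without line endings."""
    return output(cmd, input=input).splitlines()


def line(cmd, input=None):
    """Return the only line of stdout; raise if there is not exactly one."""
    out = lines(cmd, input=input)
    if len(out) != 1:
        raise ValueError(f"Expected 1 line, found {len(out)}")
    return out[0]


def check(cmd):
    """Return True iff `cmd` exits with status 0; discard its output."""
    result = subprocess.run(
        cmd,
        shell=isinstance(cmd, str),
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


# The current interpreter serves as a portable stand-in for shell commands
py = [sys.executable, "-c"]
print(lines(py + ["print('a'); print('b')"]))  # ['a', 'b']
print(line(py + ["print('only')"]))            # only
print(check(py + ["raise SystemExit(1)"]))     # False
```

Unlike the utz helpers, this sketch does not flatten nested argument lists.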
utz.proc.aio: async subprocess wrappers
Async versions of most utz.proc helpers are also available:
from utz.proc.aio import *
import asyncio
from asyncio import gather
async def test():
    _1, _2, _3, nums = await gather(*[
        run('sleep', '1'),
        run('sleep', '2'),
        run('sleep', '3'),
        lines('seq', '10'),
    ])
    return nums
asyncio.run(test())
# ['1', '2', '3', '4', '5', '6', '7', '8', '9', '10']
utz.collections: collection/list helpers
from utz import *
# Verify a collection has one element, return it:
singleton(["aaa"]) # ✅ "aaa"
singleton({'a': 1}) # ✅ ('a', 1); works on `dict`s
singleton([("aaa",), ("aaa",)]) # ✅ ("aaa",); dedupes by default (elems must be hashable)
singleton(["aaa", "bbb"]) # ❌ `raise utz.collections.Expected1FoundN("2 elems found: bbb,aaa")`
# `solo`, `one`, and `e1` are aliases for `singleton`:
solo(["aaa"]) # "aaa"
one(["aaa"]) # "aaa"
e1(["aaa"]) # "aaa"
# Filter by a predicate
one([2, 3, 4], lambda n: n % 2) # 3
one([{'a': 1}, {'b': 2}], lambda o: 'a' in o) # {'a': 1}
See also: test_collections.py.
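The behaviour shown above (optional predicate, dedupe by default, `dict`s treated as key/value pairs) can be approximated in a few lines. This is a sketch under those assumptions, not the library's code; `singleton_sketch` and the local `Expected1FoundN` class are hypothetical stand-ins.

```python
class Expected1FoundN(ValueError):
    """Raised when a collection does not reduce to exactly one element."""


def singleton_sketch(elems, pred=None, dedupe=True):
    """Return the only element of `elems`, after optional filtering and deduping."""
    if isinstance(elems, dict):
        elems = elems.items()  # treat a dict as its (key, value) pairs
    elems = list(elems)
    if pred is not None:
        elems = [e for e in elems if pred(e)]
    if dedupe and len(elems) > 1:
        # Order-preserving dedupe; requires hashable elements
        elems = list(dict.fromkeys(elems))
    if len(elems) != 1:
        raise Expected1FoundN(f"{len(elems)} elems found: {elems}")
    return elems[0]


print(singleton_sketch(["aaa"]))                       # aaa
print(singleton_sketch({"a": 1}))                      # ('a', 1)
print(singleton_sketch([("aaa",), ("aaa",)]))          # ('aaa',)
print(singleton_sketch([2, 3, 4], lambda n: n % 2))    # 3
```

Deduping only when more than one element remains lets unhashable elements (such as the `dict`s in the predicate example) pass through when the filter already leaves a single match.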
utz.env: os.environ wrapper + contextmanager
from utz import env, os
# Temporarily set env vars
with env(FOO='bar'):
    assert os.environ['FOO'] == 'bar'
assert 'FOO' not in os.environ
The env() contextmanager also supports configurable on_conflict and on_exit kwargs, for handling env vars that were patched, then changed while the context was active.
See also: test_env.py.
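The core idea (set variables on entry, restore the previous state on exit) can be sketched with `contextlib` alone. `temp_env` is a hypothetical name, and this version deliberately omits the `on_conflict` / `on_exit` handling mentioned above.

```python
import os
from contextlib import contextmanager


@contextmanager
def temp_env(**updates):
    """Set env vars for the duration of a `with` block, then restore them."""
    missing = object()  # sentinel meaning "this var was not set before"
    previous = {k: os.environ.get(k, missing) for k in updates}
    os.environ.update(updates)
    try:
        yield
    finally:
        for key, old in previous.items():
            if old is missing:
                os.environ.pop(key, None)  # var did not exist: remove it again
            else:
                os.environ[key] = old      # var existed: put the old value back


os.environ["UTZ_SKETCH_VAR"] = "old"
with temp_env(UTZ_SKETCH_VAR="new"):
    print(os.environ["UTZ_SKETCH_VAR"])  # new
print(os.environ["UTZ_SKETCH_VAR"])      # old
del os.environ["UTZ_SKETCH_VAR"]
```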
utz.fn: decorator/function utilities
from utz import decos
from click import option
common_opts = decos(
    option('-n', type=int),
    option('-v', is_flag=True),
)
@common_opts
def subcmd1(n: int, v: bool):
    ...
@common_opts
def subcmd2(n: int, v: bool):
    ...
# `call`: pass a function only the kwargs it accepts
from utz import call, wraps
def fn1(a, b):
    ...
@wraps(fn1)
def fn2(a, c, **kwargs):
    ...
kwargs = dict(a=11, b='22', c=33, d=44)
call(fn1, **kwargs) # passes {a, b}, not {c, d}
call(fn2, **kwargs) # passes {a, b, c}, not {d}
See also: test_fn.py.
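Both ideas above can be illustrated with `functools` and `inspect`. The names `compose_decorators` and `call_with_known_kwargs` are hypothetical, and the second function is a simplification: it only looks at a function's own named parameters and does not model the `wraps` behaviour shown for `fn2`.

```python
from functools import reduce
from inspect import Parameter, signature

_NAMED = (Parameter.POSITIONAL_OR_KEYWORD, Parameter.KEYWORD_ONLY)


def compose_decorators(*decorators):
    """Combine decorators into one; the first listed ends up outermost."""
    def composed(fn):
        return reduce(lambda wrapped, deco: deco(wrapped), reversed(decorators), fn)
    return composed


def call_with_known_kwargs(fn, /, **kwargs):
    """Call `fn` with only the keyword arguments named in its signature."""
    accepted = {
        name for name, param in signature(fn).parameters.items()
        if param.kind in _NAMED
    }
    return fn(**{k: v for k, v in kwargs.items() if k in accepted})


def tag(label):
    """Toy decorator factory: wrap the result as `label(result)`."""
    def deco(fn):
        def wrapper(*args):
            return f"{label}({fn(*args)})"
        return wrapper
    return deco


@compose_decorators(tag("outer"), tag("inner"))
def shout(word):
    return word.upper()


def fn1(a, b):
    return (a, b)


print(shout("hi"))                                             # outer(inner(HI))
print(call_with_known_kwargs(fn1, a=11, b="22", c=33, d=44))   # (11, '22')
```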
utz.jsn: JsonEncoder for datetimes, dataclasses
from utz import dataclass, Encoder, fromtimestamp, json # Convenience imports from standard library
epoch = fromtimestamp(0)
print(json.dumps({ 'epoch': epoch }, cls=Encoder))
# {"epoch": "1969-12-31 19:00:00"}  (local time; this sample was generated at UTC-5)
print(json.dumps({ 'epoch': epoch }, cls=Encoder("%Y-%m-%d"), indent=2))
# {
# "epoch": "1969-12-31"
# }
@dataclass
class A:
    n: int
print(json.dumps(A(111), cls=Encoder))
# {"n": 111}
See test_jsn.py for more examples.
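A custom encoder of this kind is typically a `json.JSONEncoder` subclass that overrides `default`. The sketch below assumes that approach; `SketchEncoder` is a hypothetical name, and it does not reproduce the `Encoder("%Y-%m-%d")` format-string form shown above.

```python
import json
from dataclasses import asdict, dataclass, is_dataclass
from datetime import date, datetime


class SketchEncoder(json.JSONEncoder):
    """JSONEncoder that also serializes datetimes and dataclass instances."""

    fmt = "%Y-%m-%d %H:%M:%S"

    def default(self, o):
        if isinstance(o, (datetime, date)):
            return o.strftime(self.fmt)
        if is_dataclass(o) and not isinstance(o, type):
            return asdict(o)  # dataclass instance -> plain dict
        return super().default(o)  # anything else: raise TypeError as usual


@dataclass
class A:
    n: int


print(json.dumps({"when": datetime(2024, 1, 2, 3, 4, 5), "a": A(111)}, cls=SketchEncoder))
# {"when": "2024-01-02 03:04:05", "a": {"n": 111}}
```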
utz.context: {async,}contextmanager helpers
- ctxs: compose contextmanagers
- actxs: compose asynccontextmanagers
- with_exit_hook: wrap a contextmanager's __exit__ method in another contextmanager
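Composing context managers can be sketched with `contextlib.ExitStack`: enter each one in order, yield their values, and let the stack unwind in reverse. `compose` below is a hypothetical helper illustrating the idea, not the signature of `ctxs`.

```python
from contextlib import ExitStack, contextmanager


@contextmanager
def compose(*cms):
    """Enter `cms` left to right, yield their values, exit in reverse order."""
    with ExitStack() as stack:
        yield tuple(stack.enter_context(cm) for cm in cms)


events = []


@contextmanager
def step(name):
    """Toy context manager that records when it is entered and exited."""
    events.append(f"enter {name}")
    try:
        yield name
    finally:
        events.append(f"exit {name}")


with compose(step("a"), step("b")) as (a, b):
    events.append(f"body {a}{b}")

print(events)
# ['enter a', 'enter b', 'body ab', 'exit b', 'exit a']
```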
utz.cli: click helpers
utz.cli provides wrappers around click.option for parsing common option formats:
- @count: "count" options, including optional value mappings (e.g. -v → "info", -vv → "debug")
- @multi: parse comma-delimited values (or other delimiter), with optional value-parse callback (e.g. -a1,2 -a3 → (1,2,3))
- @num: parse numeric values, including human-readable SI/IEC suffixes (e.g. 10k → 10_000)
- @obj: parse dictionaries from multi-value options (e.g. -eFOO=BAR -eBAZ=QUX → dict(FOO="BAR", BAZ="QUX"))
- @incs/@excs: construct an Includes or Excludes object for regex-filtering of string arguments
- @inc_exc: combination of @incs and @excs; constructs an Includes or Excludes for regex-filtering of strings, from two (mutually-exclusive) options
- @opt, @arg, @flag: wrappers for click.{option,argument}, option(is_flag=True)
Examples:
# cli.py
from utz.cli import cmd, count, incs, multi, num, obj
from utz import Includes, Literal
@cmd # Alias for `click.command`
@multi('-a', '--arr', parse=int, help="Comma-separated integers")
@obj('-e', '--env', help='Env vars, in the form `k=v`')
@incs('-i', '--include', 'includes', help="Only print `env` keys that match one of these regexes")
@num('-m', '--max-memory', help='Max memory size (e.g. "100m")')
@count('-v', '--verbosity', values=['warn', 'info', 'debug'], help='0x: "warn", 1x: "info", 2x: "debug"')
def main(
    arr: tuple[int, ...],
    env: dict[str, str],
    includes: Includes,
    max_memory: int,
    verbosity: Literal['warn', 'info', 'debug'],
):
    filtered_env = { k: v for k, v in env.items() if includes(k) }
    print(f"{arr} {filtered_env} {max_memory} {verbosity}")
if __name__ == '__main__':
    main()
Saving the above as cli.py and running will yield:
python cli.py -a1,2 -a3 -eAAA=111 -eBBB=222 -eccc=333 -i[A-Z] -m10k
# (1, 2, 3) {'AAA': '111', 'BBB': '222'} 10000 warn
python cli.py -m 1Gi -v
# () {} 1073741824 info
from utz.cli import arg, cmd, inc_exc, multi
from utz.rgx import Patterns
@cmd
@inc_exc(
    multi('-i', '--include', help="Print arguments iff they match at least one of these regexs; comma-delimited, and can be passed multiple times"),
    multi('-x', '--exclude', help="Print arguments iff they don't match any of these regexs; comma-delimited, and can be passed multiple times"),
)
@arg('vals', nargs=-1)
def main(patterns: Patterns, vals: tuple[str, ...]):
    print(' '.join([ val for val in vals if patterns(val) ]))
if __name__ == '__main__':
    main()
Saving the above as cli.py and running will yield:
python cli.py -i a.,b aa bc cb c a AA B
# aa bc cb
python cli.py -x a.,b aa bc cb c a AA B
# c a AA B
See test_cli for more examples.
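To make the `@num` suffix handling concrete, here is a standalone sketch of parsing values such as `10k` and `1Gi`. The suffix tables, the case-insensitive matching, and the `parse_num` name are assumptions for illustration; utz's parser may accept a different set of suffixes.

```python
import re

# Assumed suffix tables: SI suffixes are powers of 1000, IEC suffixes powers of 1024
SUFFIXES = {
    "k": 10**3, "m": 10**6, "g": 10**9, "t": 10**12,
    "ki": 2**10, "mi": 2**20, "gi": 2**30, "ti": 2**40,
}


def parse_num(text):
    """Parse strings like "10k" or "1Gi" into an int."""
    match = re.fullmatch(r"(\d+(?:\.\d+)?)\s*([a-zA-Z]*)", text.strip())
    if not match:
        raise ValueError(f"Unrecognized number: {text!r}")
    value, suffix = match.groups()
    suffix = suffix.lower()
    if suffix and suffix not in SUFFIXES:
        raise ValueError(f"Unrecognized suffix in {text!r}")
    scale = SUFFIXES[suffix] if suffix else 1
    return int(float(value) * scale)


print(parse_num("10k"))   # 10000
print(parse_num("1Gi"))   # 1073741824
print(parse_num("42"))    # 42
```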
utz.mem: memray wrapper
Use memray to profile memory allocations, extract stats, flamegraph HTML, and peak memory use:
from utz.mem import Tracker
from utz import iec
with (tracker := Tracker()):
    nums = list(sorted(range(1_000_000, 0, -1)))
peak_mem = tracker.peak_mem
print(f'Peak memory use: {peak_mem:,} ({iec(peak_mem)})')
# Peak memory use: 48,530,432 (46.3 MiB)
utz.time: Time timer, now/today helpers
from utz import Time, sleep
time = Time()
time("step 1")
sleep(1)
time("step 2") # Ends "step 1" timer
sleep(1)
time() # Ends "step 2" timer
print(f'Step 1 took {time["step 1"]:.1f}s, step 2 took {time["step 2"]:.1f}s.')
# Step 1 took 1.0s, step 2 took 1.0s.
# contextmanager timers can overlap/contain others
with time("run"): # ≈2s
    time("sleep-1") # ≈1s
    sleep(1)
    time("sleep-2") # ≈1s
    sleep(1)
print(f'Run took {time["run"]:.1f}s')
# Run took 2.0s
now and today are wrappers around datetime.datetime.now that expose convenient functions:
from utz import now, today
now() # 2024-10-11T15:43:54Z
today() # 2024-10-11
now().s # 1728661583
now().ms # 1728661585952
Use in conjunction with utz.bases codecs for easy timestamp-nonces:
from utz import b62, now
b62(now().s) # A18Q1l
b62(now().ms) # dZ3fYdS
b62(now().us) # G31Cn073v
Sample values for various units and codecs:
| unit | b62 | b64 | b90 |
|---|---|---|---|
| s | A2kw7P | +aYIh1 | :Kn>H |
| ds | R7FCrj | D8oM9b | "tn_BH |
| cs | CCp7kK0 | /UpIuxG | =Fc#jK |
| ms | dj4u83i | MFSOKhy | #8;HF8g |
| us | G6cozJjWb | 385u0dp8B | D>$y/9Hr |
(generated by time-slug-grid.py)
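The timestamp-nonce idea is simply a positional encoding of an integer in a larger alphabet. The sketch below assumes a digits/uppercase/lowercase alphabet; utz's `b62` may order its characters differently, so the strings produced here will not necessarily match the samples above.

```python
import string
import time

# Assumed alphabet ordering (0-9, A-Z, a-z); the library's ordering may differ
ALPHABET = string.digits + string.ascii_uppercase + string.ascii_lowercase


def b62_encode(n):
    """Encode a non-negative int as a base-62 string."""
    if n < 0:
        raise ValueError("n must be non-negative")
    if n == 0:
        return ALPHABET[0]
    digits = []
    while n:
        n, rem = divmod(n, 62)
        digits.append(ALPHABET[rem])
    return "".join(reversed(digits))


def b62_decode(text):
    """Inverse of `b62_encode`."""
    n = 0
    for ch in text:
        n = n * 62 + ALPHABET.index(ch)
    return n


nonce = b62_encode(int(time.time()))  # epoch seconds as a short slug
print(nonce, b62_decode(nonce))
```

Present-day epoch seconds fall between 62^5 and 62^6, which is why the second-resolution slugs in the table are 6 characters long.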
utz.size: humanize.naturalsize wrapper
iec wraps humanize.naturalsize, printing IEC-formatted sizes by default, to 3 sigfigs:
from utz import iec
iec(2**30 + 2**29 + 2**28 + 2**27)
# '1.88 GiB'
utz.hash_file: hash file contents
from utz import hash_file
hash_file("path/to/file") # sha256 by default
hash_file("path/to/file", 'md5')
utz.ym: YM (year/month) class
The YM class represents a year/month, e.g. 202401 for January 2024.
from utz import YM
ym = YM(202501) # Jan '25
assert ym + 1 == YM(202502) # Add one month
assert YM(202502) - YM(202406) == 8 # subtract months
YM(202401).until(YM(202501)) # 202401, 202402, ..., 202412
# `YM` constructor accepts several representations:
assert all(ym == YM(202401) for ym in [
    YM(202401),
    YM('202401'),
    YM('2024-01'),
    YM(2024, 1),
    YM(y=2024, m=1),
    YM(dict(year=2024, month=1)),
    YM(YM(202401)),
])
utz.cd: "change directory" contextmanagers
from utz import cd, cd_tmpdir
with cd('..'):
    # Inside parent dir
    ...
# Back in original dir
with cd('a/b/c', mk=True):
    # Moved into a/b/c (created it if it didn't exist)
    ...
with cd_tmpdir(dir='.', name='my_tmpdir') as tmpdir:
    # Inside a temporary subdirectory of previous working directory, with basename `my_tmpdir`
    ...
See also test_cd.py.
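A "change directory" context manager boils down to remembering the current directory and restoring it in a `finally` block. The following is a minimal sketch of that pattern; `chdir_sketch` is a hypothetical name, and its `mk` flag mirrors the example above by assumption.

```python
import os
import tempfile
from contextlib import contextmanager


@contextmanager
def chdir_sketch(path, mk=False):
    """Change into `path` for the `with` block, then return to the previous dir."""
    if mk:
        os.makedirs(path, exist_ok=True)  # create the target (and parents) first
    previous = os.getcwd()
    os.chdir(path)
    try:
        yield
    finally:
        os.chdir(previous)  # restore even if the body raised


start = os.getcwd()
with tempfile.TemporaryDirectory() as root:
    with chdir_sketch(os.path.join(root, "a", "b", "c"), mk=True):
        inside = os.getcwd()
    after = os.getcwd()

print(inside.endswith(os.path.join("a", "b", "c")), after == start)  # True True
```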
utz.gist: GitHub Gist operations
from utz.gist import create_gist, upload_files_to_gist, get_github_user
# Get current GitHub username (via `gh` CLI)
username = get_github_user()
# Create a new gist
gist_id = create_gist(description="My gist", public=True)
# Upload files to a gist
upload_files_to_gist(
    gist_id=gist_id,
    files={'hello.txt': 'Hello, world!'},
    branch='main',
    commit_message='Add hello.txt'
)
utz.gzip: deterministic GZip helpers
from utz import deterministic_gzip_open, hash_file
with deterministic_gzip_open('a.gz', 'w') as f:
    f.write('\n'.join(map(str, range(10))))
hash_file('a.gz') # dfbe03625c539cbc2a2331d806cc48652dd3e1f52fe187ac2f3420dbfb320504
See also: test_gzip.py.
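Gzip output normally varies between runs because the header embeds a modification time and, often, the file name. One common way to get reproducible bytes with the standard library is to fix both, as sketched below; whether utz does exactly this is an assumption, and `write_deterministic_gzip` / `sha256_file` are hypothetical helpers, so the digest is not expected to match the one shown above.

```python
import gzip
import hashlib
import os
import tempfile


def write_deterministic_gzip(path, data: bytes):
    """Gzip `data` to `path` with no file name and a zero mtime in the header."""
    with open(path, "wb") as raw:
        with gzip.GzipFile(filename="", mode="wb", fileobj=raw, mtime=0) as gz:
            gz.write(data)


def sha256_file(path, chunk_size=1 << 16):
    """Return the hex SHA-256 of a file, read in chunks to bound memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


data = "\n".join(map(str, range(10))).encode()
with tempfile.TemporaryDirectory() as tmp:
    a, b = os.path.join(tmp, "a.gz"), os.path.join(tmp, "b.gz")
    write_deterministic_gzip(a, data)
    write_deterministic_gzip(b, data)
    same_hash = sha256_file(a) == sha256_file(b)  # different names, same bytes
    with gzip.open(a, "rb") as f:
        roundtrip = f.read()

print(same_hash, roundtrip == data)  # True True
```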
utz.s3: S3 utilities
- client(): cached boto3 S3 client
- parse_bkt_key(args: tuple[str, ...]) -> tuple[str, str]: parse bucket and key from s3:// URL or separate arguments
- get_etag(*args: str, err_ok: bool = False, strip: bool = True) -> str | None: get ETag of S3 object
- get_etags(*args: str) -> dict[str, str]: get ETags for all objects with the given prefix
- atomic_edit(...) -> Iterator[str]: context manager for atomically editing S3 objects
from utz import s3, pd
url = 's3://bkt/key.parquet'
# `url`'s ETag is snapshotted on initial read
with s3.atomic_edit(url) as out_path:
    df = pd.read_parquet(url)
    df.sort_index(inplace=True)
    df.to_parquet(out_path)
# On contextmanager exit, `out_path` is uploaded to `url`, iff
# `url`'s ETag hasn't changed (no concurrent update has occurred).
utz.plot: Plotly helpers
Helpers for Plotly transformations I make frequently, e.g.:
from utz import plot
import plotly.express as px
fig = px.bar(x=[1, 2, 3], y=[4, 5, 6])
plot(
    fig,
    name='my-plot', # Filename stem; saves my-plot.png, my-plot.json, and optionally my-plot.html
    title=['Some Title', 'Some subtitle'], # Plot title, followed by "subtitle" line(s) (smaller font, just below)
    bg='white', xgrid='#ccc', # white background, grey x-gridlines
    hoverx=True, # show x-values on hover
    x="X-axis title", # x-axis title or configs
    y=dict(title="Y-axis title", zerolines=True), # y-axis title or configs
    # ...
)
Example usages: hudcostreets/nj-crashes, ryan-williams/arrayloader-benchmarks.
utz.setup: setup.py helper
utz/setup.py provides defaults for various setuptools.setup() params:
- name: use parent directory name
- version: parse from git tag (otherwise from git describe --tags)
- install_requires: read requirements.txt
- author_{name,email}: infer from last commit
- long_description: parse README.md (and set long_description_content_type)
- description: parse first <p> under opening <h1> from README.md
- license: parse from LICENSE file (MIT and Apache v2 supported)
For an example, see gsmo==0.0.1 (and corresponding release).
This library also "self-hosts" using utz.setup; see pyproject.toml:
[build-system]
requires = ["setuptools", "utz[setup]==0.4.2", "wheel"]
build-backend = "setuptools.build_meta"
and setup.py:
from utz.setup import setup
extras_require = {
    # …
}
# Various fields auto-populated from git, README.md, requirements.txt, …
setup(
    name="utz",
    version="0.8.0",
    extras_require=extras_require,
    url="https://github.com/runsascoded/utz",
    python_requires=">=3.10",
)
The setup helper can be installed via a pip "extra":
pip install utz[setup]
utz.version: runtime package version with git hash
Get your package version with current git commit hash at runtime, useful for verifying which exact commit is installed during local development:
# In your package's __init__.py:
from utz.version import pkg_version_with_git
__version__ = "0.1.1"
def get_version(include_git=True):
    """Get version string with optional git hash."""
    return pkg_version_with_git(pkg_version=__version__, include_git=include_git)
Usage:
import mypackage
mypackage.get_version()
# "0.1.1+git.abc1234" (clean working tree)
# "0.1.1+git.abc1234.dirty" (uncommitted changes)
mypackage.get_version(include_git=False)
# "0.1.1"
mypackage.__version__
# "0.1.1"
The +git.HASH format follows PEP 440 local version identifier conventions. It is especially helpful with pip install -e . when working on multiple interdependent packages.
Features:
- Auto-detects git repo from caller's package directory
- Falls back to plain version if git not available (e.g., PyPI installs)
- Detects uncommitted changes (.dirty suffix)
- Supports short (7-char, default) or full (40-char) hashes
Also available: utz.git.is_dirty() to check if the working tree has uncommitted changes.
utz.test: dataclass test cases, raises helper
utz.parametrize: pytest.mark.parametrize wrapper, accepts dataclass instances
from utz import parametrize
from dataclasses import dataclass
def fn(f: float, fmt: str) -> str:
    """Example function, to be tested with ``case``s below."""
    return f"{f:{fmt}}"
@dataclass
class case:
    """Container for a test-case; float, format, and expected output."""
    f: float
    fmt: str
    expected: str
    @property
    def id(self):
        return f"fmt-{self.f}-{self.fmt}"
@parametrize(
    case(1.23, "0.1f", "1.2"),
    case(123.456, "0.1e", "1.2e+02"),
    case(-123.456, ".0f", "-123"),
)
def test_fn(f, fmt, expected):
    """Example test, "parametrized" by several ``case``s."""
    assert fn(f, fmt) == expected
test_parametrize.py contains more examples, customizing test "ID"s, adding parameter sweeps, etc.
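One way to picture what a dataclass-aware `parametrize` has to do is to unpack each case into the argument names, value tuples, and IDs that `pytest.mark.parametrize` expects. The helper below is a hypothetical sketch of that unpacking step only (it does not import pytest), under the assumption that every case is an instance of the same dataclass.

```python
from dataclasses import dataclass, fields


def to_parametrize_args(*cases):
    """Turn dataclass instances into (argnames, argvalues, ids)."""
    names = [f.name for f in fields(cases[0])]
    values = [tuple(getattr(c, name) for name in names) for c in cases]
    ids = [getattr(c, "id", None) for c in cases]  # None lets pytest pick an ID
    return names, values, ids


@dataclass
class Case:
    f: float
    fmt: str
    expected: str

    @property
    def id(self):
        return f"fmt-{self.f}-{self.fmt}"


names, values, ids = to_parametrize_args(
    Case(1.23, "0.1f", "1.2"),
    Case(-123.456, ".0f", "-123"),
)
print(names)   # ['f', 'fmt', 'expected']
print(values)  # [(1.23, '0.1f', '1.2'), (-123.456, '.0f', '-123')]
print(ids)     # ['fmt-1.23-0.1f', 'fmt--123.456-.0f']
# These could then be passed on as:
#   pytest.mark.parametrize(names, values, ids=ids)
```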
utz.tmpdir: make temporary directories with a specific basename
from utz import TmpDir, tmp_ensure_dir, TmpPath
# ``TemporaryDirectory`` wrapper that creates ``dir`` (and parents), if necessary (and removes any dirs it created, on exit)
# Also adds support for specifying exact basename, via ``name`` kwarg.
with TmpDir(dir='nested/subdir', name='basename') as tmpdir:
    ...
# Yields a path with the requested basename, inside a ``TemporaryDirectory``.
# As with ``TmpDir``, ``dir`` (and parents) will be created, if necessary (and removed on exit, leaving the filesystem in the same state it started in)
with TmpPath('basename.txt', dir='nested/subdir') as tmppath:
    ...
# Multiple right-most path components can be specified exactly.
with TmpPath('dir1/dir2/basename.txt', dir='nested/subdir') as tmppath:
    ...
# Used by ``TmpDir``/``TmpPath`` above, creates ``dir`` (and parents), if necessary (and removes any dirs it created, on exit)
with tmp_ensure_dir(dir='nested/subdir'):
    ...
See also: test_tmpdir.py.
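The "exact basename inside a temporary directory" idea can be sketched on top of `tempfile.TemporaryDirectory`. `tmp_path_sketch` is a hypothetical name, and this version skips the `dir` handling (creating and later removing parent directories) described in the comments above.

```python
import os
import tempfile
from contextlib import contextmanager


@contextmanager
def tmp_path_sketch(name):
    """Yield a path ending in `name` inside a fresh temporary directory."""
    with tempfile.TemporaryDirectory() as tmpdir:
        path = os.path.join(tmpdir, name)
        # Create intermediate directories such as dir1/dir2, but not the file itself
        os.makedirs(os.path.dirname(path), exist_ok=True)
        yield path
    # The temporary directory and everything under it are removed here


with tmp_path_sketch("dir1/dir2/basename.txt") as tmppath:
    with open(tmppath, "w") as f:
        f.write("hello")
    existed_inside = os.path.exists(tmppath)

print(os.path.basename(tmppath), existed_inside, os.path.exists(tmppath))
# basename.txt True False
```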
utz.docker, utz.bases, etc.
Misc other modules:
- bases: encode/decode in various bases (62, 64, 90, …)
- escape: split/join on an arbitrary delimiter, with backslash-escaping; utz.esc escapes a specific character in a string.
- ctxs: compose contextmanagers
- o: dict wrapper exposing keys as attrs (e.g.: o({'a':1}).a == 1)
- docker: DSL for programmatically creating Dockerfiles (and building images from them)
- tmpdir: make temporary directories with a specific basename
- ssh: SSH tunnel wrapped in a context manager
- backoff: exponential-backoff utility
- git: Git helpers, wrappers around GitPython
- pnds: pandas imports and helpers
Some repos that use utz: