This document is the style guide for all code in Binary Refinery.
All refinery code must support Python 3.8 and later versions.
For example, this means that the match statement is currently not supported.
First and foremost, all code should pass flake8, with the following tests disabled:
- Disabled to allow command line argument annotations to work:
  - F821 (undefined name)
  - F722 (syntax error in forward annotation)
- Disabled because the maintainer doesn't like them:
  - E128 (continuation line is under-indented for a visual indentation)
  - E203 (colons should not have any space before them)
  - E261 (at least two spaces before inline comment)
  - W503 (line break occurred before a binary operator)
We use pyflakes for checking compliance and run the following isort command to normalize imports
inside the refinery code package (tests are not subject to this):
isort --py=38 refinery
The refinery code base uses modern type hints, i.e.:
- int | None instead of Optional[int],
- int | bool instead of Union[int, bool],
- list[bool] instead of List[bool].
To facilitate this and ensure backwards compatibility with Python 3.8, we prefix all code with
from __future__ import annotations
In the rare cases where a modern type hint has to be resolved at runtime, however,
it is permissible to import from typing and define types compatible with Python 3.8.
This is very often avoidable by making such definitions only when TYPE_CHECKING is true,
since development happens using a modern Python environment.
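The following is a minimal sketch of both situations, not code taken from refinery; the names RuntimeTable, CheckedTable, and first_entry are hypothetical. It assumes that an alias which is evaluated at runtime needs the typing generics, while an alias that only the type checker sees can use the modern syntax under TYPE_CHECKING:

```python
from __future__ import annotations

from typing import TYPE_CHECKING, Dict, List

# This alias is evaluated at runtime, so it uses the typing generics that Python 3.8 supports.
RuntimeTable = Dict[int, List[bytes]]

if TYPE_CHECKING:
    # Only the type checker evaluates this branch; Python 3.8 never executes it.
    CheckedTable = dict[int, list[bytes]]


def first_entry(table: CheckedTable, key: int) -> bytes | None:
    # Annotations are not evaluated at runtime because of the __future__ import,
    # so the modern syntax in this signature is safe on Python 3.8.
    entries = table.get(key)
    if not entries:
        return None
    return entries[0]
```

The second form is usually preferable because it keeps the typing imports out of the module.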
The goal for refinery is to be fully typed.
Use the pyright type checker for newly written code and ensure that it reports no problems.
Comments should be avoided wherever possible and used only when important information about the code cannot be communicated otherwise. Prioritize expressive, well-structured code and descriptive names for variables and functions. Especially undesired are banner comments that only separate a source file into sections or merely announce the code that follows.
All code in refinery uses LF line breaks exclusively, never CR/LF.
The line length for refinery code is 100 characters. This is a hard limit for docstrings and comments, and a soft limit for code. The hard limit for code is 140 characters, to allow occasional code lines that exceed 100 characters. Note also that lines should not wrap at less than 100 characters; in particular, do not wrap at 80 characters.
When a function call or definition becomes too long for the line width limit, it should be split like so:
    result = function_call(
        argument_1,
        argument_2,
        keyword_parameter_1=keyword_argument_1,
        keyword_parameter_2=keyword_argument_2,
    )
And for function definitions:
    def function_call(
        argument_1: int,
        argument_2: int,
        keyword_parameter_1: str = '',
        keyword_parameter_2: str = '',
    ):
        ...
In other words, each positional and keyword argument, as well as the closing parenthesis, is on its own line. Indentation is increased by one level for the arguments; the closing parenthesis is not indented. The same rule applies to comma-separated list, tuple, and set literals.
The following style is only permitted for function calls, and only if the line is broken exactly once:
    result = function_call(argument_1, argument_2, argument_3,
        argument_4, keyword_parameter_1=keyword_argument_1, keyword_parameter_2=keyword_argument_2)
Docstrings use three double quotes """ as separators. Always write docstrings like this:
    class cls:
        """
        [docstring]
        """
and never like this:
    class cls:
        """[docstring]"""
The docstrings for refinery units should be written with keyword search in mind. A short paragraph at the beginning should give a quick overview of what the unit does, followed by a lengthy explanation including the possible keywords that would help users discover it.
E203 is disabled so that large dictionaries can be written with aligned colons, like so:
    data = {
        'key1'         : 'data1',
        'a-longer-key' : 'data2',
        'other-key'    : 'data3',
    }
This can make large dictionaries with somewhat tabular data easier to read in the code.
All code in Binary Refinery aims to minimize the number of byte copy operations. To this end:
- Make functions as flexible as possible when it comes to what they accept as input; allow memoryview inputs wherever possible. You can use codecs.decode to decode memoryview objects to strings, and the refinery.lib.id module contains methods buffer_offset and buffer_contains which allow you to search within memoryview objects.
- For the top-level entry points of an API, allow bytes | bytearray | memoryview as input and pass on memoryview objects to subroutines when possible.
- When binary buffers have to be sliced, a memoryview is the best choice since slicing it has no memory cost.
- Building output should always be done in a bytearray, never by concatenating bytes objects.
- Returning bytearray objects rather than bytes is always acceptable; the two types expose the same API.
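The rules above can be sketched with the standard library alone. The helpers extract_records and decode_name are hypothetical and only illustrate the pattern; they are not part of refinery, and the sketch assumes byte-oriented buffers:

```python
from __future__ import annotations

import codecs


def extract_records(data: bytes | bytearray | memoryview, size: int) -> bytearray:
    # Hypothetical helper: keep every record that does not start with a null byte.
    view = memoryview(data)
    output = bytearray()
    for offset in range(0, len(view), size):
        # Slicing a memoryview yields another view on the same buffer; no bytes are copied.
        record = view[offset:offset + size]
        if record[0] != 0:
            # Extending a bytearray grows it in place instead of creating a new object.
            output.extend(record)
    return output


def decode_name(view: memoryview) -> str:
    # codecs.decode accepts bytes-like objects, so the view is not converted to bytes first.
    return codecs.decode(view, 'utf-8')
```

The only copy in extract_records happens when a record is appended to the output; the input is never duplicated.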
When accepting bytes | bytearray | memoryview but only requiring a memoryview internally,
cast your input to memoryview unconditionally at the start of your function:
    def _input_agnostic_function(data: bytes | bytearray | memoryview):
        view = memoryview(data)
        # work only with view
Never use the following pattern:
    view = memoryview(data) if not isinstance(data, memoryview) else data
Producing a memoryview from an existing memoryview is cheap.
Doing it unconditionally helps the type checker and any human reader.
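As a small illustration of why the unconditional cast is harmless, the following sketch wraps an existing view a second time and writes through it. The function checksum and the variable names are hypothetical:

```python
from __future__ import annotations


def checksum(data: bytes | bytearray | memoryview) -> int:
    # Hypothetical helper: the unconditional cast guarantees a memoryview from here on.
    view = memoryview(data)
    return sum(view) & 0xFF


buffer = bytearray(b'\x01\x02\x03')
outer = memoryview(buffer)
# Wrapping an existing view creates another view on the same buffer, not a copy.
inner = memoryview(outer)
# Writing through the inner view changes the original bytearray.
inner[0] = 0x10
```

Because both views share one buffer, the write through inner is visible in buffer, and checksum gives the same result for all three input types.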
Data transfer objects should not be stored in Python dictionaries.
Use a dataclass or NamedTuple with clear type hints instead.
Do not use opaque string or integer literal constants for what can be captured by an Enum.
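A minimal sketch of both rules follows. The names Compression, ArchiveEntry, and parse_method, as well as the numeric values, are hypothetical and chosen only for illustration:

```python
from __future__ import annotations

import enum

from dataclasses import dataclass


class Compression(enum.IntEnum):
    # Named members replace bare integer literals such as 0 and 8 in the code.
    STORED = 0
    DEFLATE = 8


@dataclass
class ArchiveEntry:
    # Replaces an untyped dictionary such as {'name': ..., 'method': 8, 'data': ...}.
    name: str
    method: Compression
    data: memoryview


def parse_method(value: int) -> Compression:
    # Unknown values raise a ValueError instead of being passed on silently.
    return Compression(value)
```

With this layout, a type checker can verify every field access, and a comparison such as entry.method is Compression.DEFLATE documents itself.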
For parsing structured data, the standard interface is Struct[memoryview] from refinery.lib.structures.
This should usually be preferred over manual parsing using offset calculations and the struct module.
All strings use single quotes, except for docstrings, which use three double quotes. When possible, strings should not be concatenated with string literals; use F-strings instead. For example, the code
    message = 'Hello, ' + world + '\n'
should be replaced with:
    message = F'Hello, {world}\n'