
Commit 2cd6c4e

Update docstring
1 parent 6564a9f commit 2cd6c4e


1 file changed: +20 -14 lines changed

balance/utils/model_matrix.py

Lines changed: 20 additions & 14 deletions
@@ -286,23 +286,29 @@ def _prepare_input_model_matrix(

     Args:
         sample (pd.DataFrame | Any): Input sample data as either a DataFrame or
-            a ``Sample``-like object that stores the data in ``._df``.
+            a ``Sample``-like object that stores the underlying frame in
+            ``._df``.
         target (pd.DataFrame | Any | None, optional): Optional target data as
-            either a DataFrame or a ``Sample``-like object. If provided, rows
-            are concatenated with sample rows for downstream matrix creation.
-            Defaults to None.
-        variables (List[str] | None, optional): Explicit variables to keep from
-            ``sample``/``target`` before concatenation. If None, variables are
-            inferred via ``choose_variables`` on the provided inputs.
-        add_na (bool, optional): If True, add missingness indicator columns to
-            the concatenated data. If False, drop rows with missing values and
-            preserve target-only-all-NA validation behavior. Defaults to True.
-        fix_columns_names (bool, optional): Defaults to True. If to fix the
-            column names of the DataFrame by changing special characters to
-            '_'.
+            either a DataFrame or a ``Sample``-like object. If provided, the
+            model-matrix inputs are prepared from a sample/target union of
+            variables and rows. Defaults to None.
+        variables (List[str] | None, optional): Variables to use from both
+            inputs. If provided, `choose_variables` validates that each
+            requested variable exists in both sample and target (when target is
+            supplied), otherwise it raises ``ValueError``. If None, variables
+            are inferred by `choose_variables`.
+        add_na (bool, optional): If True, add NA indicator columns before
+            model-matrix creation. If False, drop rows containing missing
+            values; this can raise ``ValueError`` if dropping rows empties the
+            sample or target. Defaults to True.
+        fix_columns_names (bool, optional): Whether to sanitize column names by
+            replacing non-word characters with ``_`` and making duplicate names
+            unique. Defaults to True.

     Raises:
-        Exception: "Variable names cannot contain characters '[' or ']'"
+        ValueError: If requested ``variables`` are not present in both inputs,
+            if variables contain ``[`` or ``]``, or if ``add_na=False`` drops
+            all rows from sample/target.

     Returns:
         Dict[str, Any]: returns a dictionary containing two keys: 'all_data' and 'sample_n'.
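
The updated docstring describes three behaviors that may be easier to follow as code. The block below is a rough sketch of those rules only, and is not the code in balance/utils/model_matrix.py. It uses plain lists and dicts instead of DataFrames, and the helper names (validate_variables, drop_rows_with_missing, sanitize_column_names), the error messages, and the `_1`, `_2` suffix scheme for duplicate names are all hypothetical.

```python
import re
from typing import Any, Dict, List, Optional

Row = Dict[str, Any]


def validate_variables(
    variables: List[str],
    sample_cols: List[str],
    target_cols: Optional[List[str]] = None,
) -> List[str]:
    """Hypothetical stand-in for the checks listed under ``Raises``."""
    # Each requested variable must exist in the sample, and in the target
    # when a target is supplied.
    missing = [
        v
        for v in variables
        if v not in sample_cols
        or (target_cols is not None and v not in target_cols)
    ]
    if missing:
        raise ValueError(f"Variables not present in all inputs: {missing}")
    # Variable names may not contain '[' or ']'.
    bracketed = [v for v in variables if "[" in v or "]" in v]
    if bracketed:
        raise ValueError(f"Variable names cannot contain '[' or ']': {bracketed}")
    return list(variables)


def drop_rows_with_missing(rows: List[Row], label: str) -> List[Row]:
    """Sketch of the ``add_na=False`` path; None stands in for a missing value."""
    kept = [row for row in rows if all(v is not None for v in row.values())]
    if not kept:
        # Dropping rows emptied this input, so fail instead of continuing.
        raise ValueError(f"Dropping rows with missing values emptied the {label}")
    return kept


def sanitize_column_names(names: List[str]) -> List[str]:
    """Replace non-word characters with '_' and make duplicates unique."""
    used = set()
    result = []
    for name in names:
        clean = re.sub(r"\W", "_", name)
        candidate, suffix = clean, 0
        # Assumed scheme: append _1, _2, ... until the name is unused.
        while candidate in used:
            suffix += 1
            candidate = f"{clean}_{suffix}"
        used.add(candidate)
        result.append(candidate)
    return result
```

Under these assumptions, `sanitize_column_names(["a b", "a_b"])` yields two distinct names, and `drop_rows_with_missing` raises ``ValueError`` when every row of an input contains a missing value. The actual function may order or combine these steps differently.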
