@@ -286,23 +286,29 @@ def _prepare_input_model_matrix(
 
     Args:
         sample (pd.DataFrame | Any): Input sample data as either a DataFrame or
-            a ``Sample``-like object that stores the data in ``._df``.
+            a ``Sample``-like object that stores the underlying frame in
+            ``._df``.
         target (pd.DataFrame | Any | None, optional): Optional target data as
-            either a DataFrame or a ``Sample``-like object. If provided, rows
-            are concatenated with sample rows for downstream matrix creation.
-            Defaults to None.
-        variables (List[str] | None, optional): Explicit variables to keep from
-            ``sample``/``target`` before concatenation. If None, variables are
-            inferred via ``choose_variables`` on the provided inputs.
-        add_na (bool, optional): If True, add missingness indicator columns to
-            the concatenated data. If False, drop rows with missing values and
-            preserve target-only-all-NA validation behavior. Defaults to True.
-        fix_columns_names (bool, optional): Defaults to True. If to fix the
-            column names of the DataFrame by changing special characters to
-            '_'.
+            either a DataFrame or a ``Sample``-like object. If provided, the
+            model-matrix inputs are prepared from a sample/target union of
+            variables and rows. Defaults to None.
+        variables (List[str] | None, optional): Variables to use from both
+            inputs. If provided, ``choose_variables`` validates that each
+            requested variable exists in both sample and target (when target is
+            supplied), otherwise it raises ``ValueError``. If None, variables
+            are inferred by ``choose_variables``.
+        add_na (bool, optional): If True, add NA indicator columns before
+            model-matrix creation. If False, drop rows containing missing
+            values; this can raise ``ValueError`` if dropping rows empties the
+            sample or target. Defaults to True.
+        fix_columns_names (bool, optional): Whether to sanitize column names by
+            replacing non-word characters with ``_`` and making duplicate names
+            unique. Defaults to True.
 
     Raises:
-        Exception: "Variable names cannot contain characters '[' or ']'"
+        ValueError: If requested ``variables`` are not present in both inputs,
+            if variables contain ``[`` or ``]``, or if ``add_na=False`` drops
+            all rows from sample/target.
 
     Returns:
         Dict[str, Any]: returns a dictionary containing two keys: 'all_data' and 'sample_n'.
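For readers of this change, the validation and sanitization rules described in the updated docstring can be illustrated with a small standalone sketch. This is not the library's implementation: `validate_variables` and `sanitize_column_names` are hypothetical helper names, the `_1`, `_2` suffix scheme for duplicates is an assumption, and the order of the checks is a guess. Only the documented behavior (a `ValueError` for bracketed or missing variables, non-word characters replaced with `_`, duplicates made unique) comes from the docstring.

```python
import re


def validate_variables(variables, sample_columns, target_columns=None):
    """Hypothetical helper: mimic the documented ``ValueError`` conditions."""
    # Documented rule: variable names may not contain '[' or ']'.
    bracketed = [v for v in variables if "[" in v or "]" in v]
    if bracketed:
        raise ValueError(f"Variable names cannot contain '[' or ']': {bracketed}")

    # Documented rule: each requested variable must exist in the sample,
    # and also in the target when a target is supplied.
    available = set(sample_columns)
    if target_columns is not None:
        available &= set(target_columns)
    missing = [v for v in variables if v not in available]
    if missing:
        raise ValueError(f"Variables not present in all provided inputs: {missing}")
    return list(variables)


def sanitize_column_names(columns):
    """Hypothetical helper: replace non-word characters and de-duplicate."""
    used = set()
    result = []
    for name in columns:
        # \W matches any character that is not a letter, digit, or underscore.
        clean = re.sub(r"\W", "_", str(name))
        candidate, suffix = clean, 0
        # Assumed scheme: append _1, _2, ... until the name is unused.
        while candidate in used:
            suffix += 1
            candidate = f"{clean}_{suffix}"
        used.add(candidate)
        result.append(candidate)
    return result
```

Under these assumptions, `sanitize_column_names(["a b", "a-b"])` maps both names to `a_b` and then disambiguates the second one with a suffix, which is the kind of collision the "making duplicate names unique" wording appears to address.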