This project includes feature-wise scalers under scalers/.
All scalers operate column-wise (axis=0) on NumPy arrays.
In this repository, the primary pattern is to construct scalers from precomputed
client statistics (self.stats) rather than calling fit() at runtime.
| Class | Transform (per feature) | Inverse transform |
|---|---|---|
| `BaseScaler` | `x` | `x` |
| `Standard` | `(x - mean) / std` | `(x * std) + mean` |
| `MinMax` | `(x - min) / (max - min)` | `x * (max - min) + min` |
| `Robust` | `(x - q1) / (q3 - q1)` | `x * (q3 - q1) + q1` |
| `MaxAbs` | `x / max_abs` | `x * max_abs` |

Here `max_abs` is the per-feature `max(abs(x))` taken over the fitted data, not over the array being transformed.
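The formula pairs in the table can be checked numerically. The following is a minimal sketch in plain NumPy, not the repository's scaler classes; the toy array and variable names are illustrative assumptions. It computes each statistic along `axis=0` and confirms that every inverse undoes its transform.

```python
import numpy as np

# Toy data: 4 samples (rows), 2 features (columns). Values are illustrative.
x = np.array([[1.0, -2.0],
              [2.0,  0.0],
              [3.0,  4.0],
              [6.0,  8.0]])

# Per-feature statistics, all computed column-wise (axis=0).
mean, std = x.mean(axis=0), x.std(axis=0)
lo, hi = x.min(axis=0), x.max(axis=0)
q1, q3 = np.percentile(x, [25, 75], axis=0)
max_abs = np.abs(x).max(axis=0)

# (transform, inverse_transform) pairs mirroring the table rows.
pairs = {
    "Standard": (lambda v: (v - mean) / std, lambda s: s * std + mean),
    "MinMax": (lambda v: (v - lo) / (hi - lo), lambda s: s * (hi - lo) + lo),
    "Robust": (lambda v: (v - q1) / (q3 - q1), lambda s: s * (q3 - q1) + q1),
    "MaxAbs": (lambda v: v / max_abs, lambda s: s * max_abs),
}

# Each inverse should recover the original array up to float rounding.
round_trip_ok = {
    name: bool(np.allclose(inverse(forward(x)), x))
    for name, (forward, inverse) in pairs.items()
}
print(round_trip_ok)
```

None of the toy columns is constant, so no denominator is zero here.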
Each scaler follows the same method pattern:

- `fit(data)`: estimate statistics from training data
- `transform(data)`: scale input data
- `inverse_transform(data)`: map scaled data back to the original space

The client pipeline initializes scalers from precomputed train stats:

```python
self.stats = self.private_data["stats"]["train"]
self.scaler = getattr(__import__("scalers"), self.scaler)(self.stats)
```

It then uses:

```python
x_scaled = self.scaler.transform(x)
y_scaled = self.scaler.transform(y)
pred = self.scaler.inverse_transform(pred_scaled)
```

You can still fit stats directly from arrays when precomputed stats are not available:

```python
scaler.fit(train_x)
train_x_scaled = scaler.transform(train_x)
test_x_scaled = scaler.transform(test_x)
pred_y = scaler.inverse_transform(pred_y_scaled)
```

`BaseScaler` is a base class with no-op defaults for `fit`, `transform`, and `inverse_transform`.
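A base class with no-op defaults might look roughly like the sketch below. This is an illustration, not the repository's code: the constructor signature, `fit` returning `self`, and the `HalfScaler` subclass are all assumptions made for the example.

```python
import numpy as np


class BaseScaler:
    """Identity scaler sketch: every method leaves the data unchanged."""

    def __init__(self, stats=None):
        # Precomputed statistics, if any (assumed signature).
        self.stats = stats

    def fit(self, data):
        # Nothing to estimate for the identity transform.
        # Returning self is an assumption that allows call chaining.
        return self

    def transform(self, data):
        return data

    def inverse_transform(self, data):
        return data


class HalfScaler(BaseScaler):
    """Hypothetical subclass that overrides only the two transform methods."""

    def transform(self, data):
        return data * 0.5

    def inverse_transform(self, data):
        return data * 2.0


x = np.array([[1.0, 2.0], [3.0, 4.0]])
identity = BaseScaler().fit(x)
half = HalfScaler().fit(x)  # fit is inherited and does nothing

print(np.array_equal(identity.transform(x), x))
print(np.allclose(half.inverse_transform(half.transform(x)), x))
```

With defaults like these, a subclass only has to override the methods whose behavior differs from the identity.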
Helper utility:

- `divide_no_nan(a, b)`: computes `a / b` and replaces `NaN`/`Inf` with `0.0`

This helper is used by scalers that need safe division.
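A helper with this behavior could be sketched as follows. This is one possible implementation written for illustration; the repository's version may differ in dtype handling or in how it suppresses warnings.

```python
import numpy as np


def divide_no_nan(a, b):
    """Element-wise a / b, with NaN and +/-Inf results replaced by 0.0."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    # Silence the RuntimeWarnings NumPy emits for x/0 and 0/0.
    with np.errstate(divide="ignore", invalid="ignore"):
        out = a / b
    # Keep finite results; substitute 0.0 everywhere else.
    return np.where(np.isfinite(out), out, 0.0)


result = divide_no_nan([1.0, 0.0, -2.0], [2.0, 0.0, 0.0])
print(result)  # 1/2 -> 0.5, 0/0 (NaN) -> 0.0, -2/0 (-Inf) -> 0.0
```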
File: `scalers/Standard.py`

- `fit`: computes per-feature `mean` and `std`
- `transform`: `divide_no_nan((x - mean), std)`
- `inverse_transform`: `(x * std) + mean`

Notes:

- If a feature has zero variance (`std == 0`), transformed values become `0` for that feature due to `divide_no_nan`.

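The zero-variance case can be traced with a small sketch. It assumes stats arrive as a per-feature dict with `mean` and `std` keys and that the dict order matches the column order of the array; both are assumptions for illustration. The safe division is inlined rather than imported from the repository.

```python
import numpy as np

# Hypothetical precomputed stats; feature_1 has zero variance.
stats = {
    "feature_0": {"mean": 10.0, "std": 2.0},
    "feature_1": {"mean": 5.0, "std": 0.0},
}

# Assumption: dict insertion order matches the column order of x.
mean = np.array([s["mean"] for s in stats.values()])
std = np.array([s["std"] for s in stats.values()])

x = np.array([[12.0, 5.0],
              [8.0, 5.0]])

# Inline stand-in for divide_no_nan: non-finite results become 0.0.
with np.errstate(divide="ignore", invalid="ignore"):
    raw = (x - mean) / std
scaled = np.where(np.isfinite(raw), raw, 0.0)

restored = scaled * std + mean

print(scaled)    # column 0 -> 1.0 and -1.0; column 1 -> 0.0 and 0.0
print(restored)  # equal to x, since the constant column sits at its mean
```

The inverse still recovers the constant column because `0 * std + mean` is just `mean`. A value that differs from the mean in a zero-variance feature would not survive the round trip.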
```python
stat = {
    "feature_0": {"mean": 10.2, "std": 3.1},
    "feature_1": {"mean": 5.0, "std": 1.7},
}
```

File: `scalers/MinMax.py`

- `fit`: computes per-feature `min` and `max`
- `transform`: `(x - min) / (max - min)`
- `inverse_transform`: `x * (max - min) + min`

Notes:

- Range is typically `[0, 1]` on data similar to the fit distribution.
- This scaler currently uses direct division; if `max == min` for a feature, division by zero can occur.

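The effect of direct division on a constant feature can be seen in the sketch below. With NumPy float arrays, dividing by zero does not raise an exception: it emits a `RuntimeWarning` and produces `nan` or `inf`. The guard shown afterwards, which replaces a zero range with `1.0`, is one possible mitigation and is an assumption rather than repository behavior.

```python
import numpy as np

lo = np.array([-4.0, 3.0])
hi = np.array([9.0, 3.0])  # second feature is constant: max == min

x = np.array([[-4.0, 3.0],
              [9.0, 3.0],
              [2.5, 3.0]])

# Direct division, as described above. Warnings are silenced only for the demo.
with np.errstate(divide="ignore", invalid="ignore"):
    direct = (x - lo) / (hi - lo)

# Possible guard (assumption): use a range of 1.0 where max == min,
# so a constant feature maps to 0.0 instead of NaN.
span = np.where(hi - lo == 0, 1.0, hi - lo)
guarded = (x - lo) / span

print(direct[:, 1])   # NaN for every row of the constant feature
print(guarded[:, 1])  # 0.0 for every row of the constant feature
```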
```python
stat = {
    "feature_0": {"min": -4.0, "max": 9.0},
    "feature_1": {"min": 0.0, "max": 3.0},
}
```

File: `scalers/Robust.py`

- `fit`: computes per-feature quartiles `q1` (25th percentile) and `q3` (75th percentile)
- `transform`: `(x - q1) / (q3 - q1)`
- `inverse_transform`: `x * (q3 - q1) + q1`

Notes:

- More resistant to outliers than mean/std scaling.
- This scaler currently uses direct division; if `q3 == q1` for a feature, division by zero can occur.

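The outlier claim can be illustrated with a single feature that contains one extreme value. The sketch below uses `np.percentile` with its default linear interpolation to obtain the quartiles. Whether the repository computes quartiles the same way is an assumption, and the data are made up for the example.

```python
import numpy as np

# One feature, eight ordinary values plus one extreme outlier.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 1000.0]).reshape(-1, 1)

# Quartile-based scaling: the outlier barely moves q1 and q3.
q1, q3 = np.percentile(x, [25, 75], axis=0)
robust = (x - q1) / (q3 - q1)

# Mean/std scaling: the outlier inflates both statistics.
standard = (x - x.mean(axis=0)) / x.std(axis=0)

# Spread of the eight ordinary values after each scaling.
robust_spread = float(robust[:-1].max() - robust[:-1].min())
standard_spread = float(standard[:-1].max() - standard[:-1].min())

print(q1, q3)  # [3.] [7.]
print(robust_spread, standard_spread)
```

Under quartile scaling the ordinary values keep a usable spread, while under mean/std scaling they are compressed into a narrow band because the outlier dominates the standard deviation.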
```python
stat = {
    "feature_0": {"q1": 2.0, "q3": 8.0},
    "feature_1": {"q1": -1.0, "q3": 1.0},
}
```

File: `scalers/MaxAbs.py`

- `fit`: computes per-feature `max_abs = max(abs(x))`
- `transform`: `divide_no_nan(x, max_abs)`
- `inverse_transform`: `x * max_abs`

Notes:

- Preserves sign.
- Useful for data centered around zero.
- If a feature is all zeros, `divide_no_nan` keeps the output stable (zeros).

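A short sketch of these properties, using plain NumPy with an inlined safe division instead of the repository's helper; the array values are illustrative assumptions.

```python
import numpy as np

# Column 0 mixes signs; column 1 is all zeros.
x = np.array([[-5.0, 0.0],
              [2.5, 0.0],
              [1.0, 0.0]])

max_abs = np.abs(x).max(axis=0)  # [5.0, 0.0]

# Inline stand-in for divide_no_nan: 0/0 becomes 0.0 instead of NaN.
with np.errstate(divide="ignore", invalid="ignore"):
    raw = x / max_abs
scaled = np.where(np.isfinite(raw), raw, 0.0)

restored = scaled * max_abs

print(scaled[:, 0])  # -1.0, 0.5, 0.2: signs preserved, magnitudes within [0, 1]
print(scaled[:, 1])  # all 0.0 for the all-zero feature
```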
```python
stat = {
    "feature_0": {"max_abs": 5.0},
    "feature_1": {"max_abs": 12.0},
}
```

`scalers/__init__.py` imports scaler classes dynamically and exposes them through `SCALERS` and `__all__`.

Class naming requirement:

- The filename and class name must match (for example, `MaxAbs.py` -> `class MaxAbs`).
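The naming requirement is what makes name-based discovery possible. One way such a dynamic import could be written is sketched below with `importlib` and `pkgutil` from the standard library. `collect_scalers` is a hypothetical helper introduced for this example; the repository's `__init__.py` may be implemented differently.

```python
import importlib
import pkgutil


def collect_scalers(package_name):
    """Import each module of a package and return {module_name: class}.

    Relies on the convention that every scaler module defines a class
    with the same name as the module (e.g. MaxAbs.py -> class MaxAbs).
    Modules without a matching class are skipped.
    """
    package = importlib.import_module(package_name)
    found = {}
    for info in pkgutil.iter_modules(package.__path__):
        module = importlib.import_module(f"{package_name}.{info.name}")
        candidate = getattr(module, info.name, None)
        if isinstance(candidate, type):
            found[info.name] = candidate
    return found


# Possible use inside a package's __init__.py (assumption, not confirmed):
#   SCALERS = collect_scalers(__name__)
#   __all__ = list(SCALERS)
```

A module whose class name does not match its filename is skipped by this lookup, which is why the convention matters.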