A blueprint for scalable ML pipelines that eliminate feature skew from training to serving
By Abhijeet Dhumal and Nikhil Kathole
Machine learning has moved far beyond experimental notebooks. Today, enterprises deploy models that drive real business decisions—from demand forecasting and fraud detection to personalized recommendations and predictive maintenance. Yet the path from a working prototype to a reliable production system remains fraught with challenges that have little to do with model architecture or hyperparameter tuning.
Google's seminal paper Hidden Technical Debt in Machine Learning Systems (NIPS 2015) established that ML systems incur massive ongoing maintenance costs—not from model code, but from data dependencies, configuration complexity, and the erosion of abstraction boundaries. The reality is that most ML failures in production stem from data inconsistencies, not algorithmic issues. This phenomenon, known as training-serving skew, occurs when features are computed differently during training versus inference—often due to inconsistent logic implementations across languages, systems, or data pipelines.
Industry leaders learned this lesson through costly experience. Uber built Michelangelo specifically to solve feature consistency after experiencing model degradation. DoorDash reported significant prediction errors in delivery time estimates before implementing their feature store. Netflix developed distributed time travel for feature generation to enable offline testing with historical data. Airbnb created Chronon, a declarative feature engineering framework, to centralize feature computation for both training and inference. Atlassian reduced feature deployment from months to days while improving data consistency from 96% to 99.9% after adopting a feature platform.
Feature stores have emerged as critical infrastructure to address these challenges, providing a single source of truth for feature definitions that both training and serving can rely on. Combined with distributed training frameworks that scale across multiple nodes and accelerators, experiment tracking systems that ensure reproducibility, and serving platforms that handle auto-scaling and traffic management, organizations can build ML pipelines that are robust, scalable, and maintainable.
This article walks you through building a complete ML pipeline for retail demand forecasting—a challenge that costs the industry dearly when done poorly. According to IHL Group research, global retailers lose approximately $1.77 trillion annually due to inventory distortion—out-of-stocks account for $1.2 trillion in lost sales, while overstocking costs another $562 billion in markdowns and waste.
Our use case is inspired by the Walmart Store Sales Forecasting challenge on Kaggle and draws from techniques used in the M5 Forecasting Competition—the largest retail forecasting benchmark to date, featuring 42,840 hierarchical time series of Walmart unit sales. We predict weekly sales across multiple stores and departments using historical patterns, temporal features, and economic indicators.
To build this pipeline, we use Red Hat OpenShift AI, which provides a comprehensive platform for the complete ML lifecycle by integrating open source projects and operators. This article demonstrates a subset of its capabilities:
- Feast for feature management — with PostgreSQL for registry, Redis for online serving, and Ray for distributed offline computations
- Kubeflow Training Operator for distributed training — supporting both NVIDIA (CUDA/NCCL) and AMD (ROCm/RCCL) accelerators
- MLflow for experiment tracking and model registry
- KServe for auto-scaling model serving
Most importantly, we show how the same feature definitions power both training and inference, eliminating the training-serving skew that plagues production ML systems.
This blueprint combines five technologies to build a production-ready ML pipeline:
| Component | Purpose |
|---|---|
| Feast | Single source of truth for features (training + serving) |
| Ray | Distributed feature retrieval that scales to millions of rows |
| Kubeflow Trainer | Multi-node PyTorch DDP training |
| MLflow | Experiment tracking and model registry |
| KServe | Auto-scaling model serving with Feast integration |
The pattern across industries is consistent: models that work perfectly in notebooks fail silently in production because feature computation differs between training and serving. As discussed above, this problem drove Uber, DoorDash, Netflix, Airbnb, and Atlassian to invest heavily in feature platforms.
| Failure Mode | Symptom | Business Impact |
|---|---|---|
| Stale features | Serving uses yesterday's data while training used point-in-time | Predictions drift over time |
| Different aggregations | Training computes rolling averages differently than serving | Model accuracy drops |
| Missing features | Serving code omits a feature that training included | Silent prediction errors |
| Type mismatches | Training uses float64, serving uses float32 | Subtle numerical differences |
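To make the "different aggregations" row concrete, here is a minimal, self-contained sketch (not taken from the pipeline in this article) of two plausible rolling-average implementations that silently disagree:

```python
import pandas as pd

# Hypothetical weekly sales for one store/department
sales = pd.Series([100.0, 120.0, 90.0, 110.0, 130.0])

# Training pipeline: 4-week rolling mean, excluding the current week
train_feature = sales.shift(1).rolling(window=4).mean()

# Serving pipeline: 4-week rolling mean, including the current week
serve_feature = sales.rolling(window=4).mean()

# The most recent row is what the model would see at inference time
print(train_feature.iloc[-1])  # 105.0  (weeks 1-4)
print(serve_feature.iloc[-1])  # 112.5  (weeks 2-5)
```

Both are "4-week rolling means," yet a model trained on the first definition quietly receives the second at serving time.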
Complete pipeline flow: Feature Engineering → Distributed Training → Model Serving. The same FeatureService definitions power both training and inference.
The architecture separates concerns across five components that work together seamlessly on OpenShift AI:
| Component | Role |
|---|---|
| Feast Operator | Manages feature store infrastructure (registry, online/offline stores) |
| Ray (KubeRay) | Distributed get_historical_features() — scales to millions of rows |
| Kubeflow Trainer | Multi-node PyTorch Distributed Training orchestration |
| MLflow Operator | Experiment tracking with workspace isolation |
| KServe | Auto-scaling inference with Feast SDK integration |
Before diving into the ML pipeline phases, let's set up the environment on OpenShift AI.
Navigate to OpenShift AI from the OpenShift console.
Create a new Data Science Project to organize your ML resources.
OpenShift AI Data Science Project with all components configured.
Attach persistent storage for model artifacts and datasets.
Select an appropriate workbench image with pre-installed ML libraries.
Connect the workbench to the Feast feature store for seamless feature access.
Jupyter workbench connected to Feast feature store and shared storage.
| Component | Purpose |
|---|---|
| OpenShift AI | Managed ML platform |
| Feast Operator | Feature store infrastructure |
| Kubeflow Training Operator | Distributed training |
| KServe | Model serving |
| MLflow Operator | Experiment tracking |
Feature engineering workflow: Generate data → Engineer features → Register with Feast → Materialize to online store.
On OpenShift AI, the Feast Operator automates feature store infrastructure:
| What It Manages | Benefit |
|---|---|
| PostgreSQL Registry | Durable metadata storage for feature definitions |
| Redis Online Store | Low-latency feature serving |
| Ray Offline Store | Distributed historical feature retrieval |
| gRPC Services | Secure communication between components |
| Client ConfigMaps | Auto-generated configuration (auto-mounted in workbenches, manually mounted in jobs) |
When you create a FeatureStore custom resource, the operator provisions all components and generates client configuration that workbenches can mount automatically.
FeatureStore Custom Resource:
apiVersion: feast.dev/v1
kind: FeatureStore
metadata:
name: salesforecasting
namespace: feast-trainer-demo
spec:
feastProject: sales_forecasting
# Feature definitions from Git repository
feastProjectDir:
git:
url: https://github.com/your-org/sales-demand-forecasting.git
ref: main
featureRepoPath: feature_repo
services:
# Offline Store: Ray for distributed historical queries
offlineStore:
persistence:
store:
type: ray
secretRef:
name: feast-data-stores
# Online Store: Redis for low-latency serving
onlineStore:
persistence:
store:
type: redis
secretRef:
name: feast-data-stores
# Registry: PostgreSQL for durable metadata
registry:
local:
persistence:
store:
type: sql
secretRef:
name: feast-data-stores
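Once the operator has reconciled the resource, you can point the Feast SDK at the generated client configuration and confirm what the registry contains. A quick check from a workbench might look like this (the repo path assumes the mount location used later in this article):

```python
from feast import FeatureStore

# Path where the operator-generated client config is mounted in the workbench
store = FeatureStore(repo_path="/opt/app-root/src/feast-config/salesforecasting")

# List what the registry knows about
print([fv.name for fv in store.list_feature_views()])
print([fs.name for fs in store.list_feature_services()])
```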
Feast UI showing feature lineage from data sources → entities → feature views → feature services.
Feature Services define which features are used for training vs inference.
The critical insight is creating two FeatureServices that share the same feature definitions but differ in one key aspect:
| Service | Purpose | Includes `weekly_sales` (Target)? |
|---|---|---|
| `training_features` | Historical feature retrieval for model training | Yes - the model learns to predict this |
| `inference_features` | Real-time feature lookup for predictions | No - this is what we're predicting |
from feast import FeatureService

# Training FeatureService - includes target column
training_features = FeatureService(
name="training_features",
features=[
sales_features[[
"weekly_sales", # Target - only in training
"lag_1", "lag_2", "lag_4", "lag_8",
"rolling_mean_4w", "rolling_std_4w", "sales_vs_avg",
"week_of_year", "month", "quarter", "is_holiday",
"temperature", "fuel_price", "cpi", "unemployment",
]],
store_features[["store_type", "store_size", "region"]],
],
)
# Inference FeatureService - excludes target column
inference_features = FeatureService(
name="inference_features",
features=[
sales_features[[
# No weekly_sales - that's what we predict!
"lag_1", "lag_2", "lag_4", "lag_8",
"rolling_mean_4w", "rolling_std_4w", "sales_vs_avg",
"week_of_year", "month", "quarter", "is_holiday",
"temperature", "fuel_price", "cpi", "unemployment",
]],
store_features[["store_type", "store_size", "region"]],
],
)Features included in both services:
| Category | Features | Importance |
|---|---|---|
| Target | `weekly_sales` | Training only - the value we predict |
| Lag | `lag_1`, `lag_2`, `lag_4`, `lag_8` | 35% - historical sales patterns |
| Rolling | `rolling_mean_4w`, `rolling_std_4w`, `sales_vs_avg` | 28% - recent trends |
| Temporal | `week_of_year`, `month`, `quarter`, `week_of_month`, `is_month_end` | 18% - seasonality |
| Holiday | `is_holiday`, `days_to_holiday` | 10% - holiday effects |
| Economic | `temperature`, `fuel_price`, `cpi`, `unemployment` | 7% - external factors |
| Store | `store_type`, `store_size`, `region` | 2% - static metadata |
Both services reference the same underlying feature definitions, ensuring that:
- Training and serving use identical feature computations
- The target (`weekly_sales`) is only included during training
- Feature updates propagate to both training and serving automatically
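For reference, the shared `sales_features` FeatureView that both services pull from might be defined roughly as follows. The field names match this article, but the entities, source, and dtypes shown here are illustrative rather than the exact definitions from the demo repository:

```python
from datetime import timedelta
from feast import Entity, FeatureView, Field, FileSource
from feast.types import Float32, Int64

store = Entity(name="store_id", join_keys=["store_id"])
dept = Entity(name="dept_id", join_keys=["dept_id"])

sales_source = FileSource(
    path="data/sales_features.parquet",   # illustrative source location
    timestamp_field="event_timestamp",
)

sales_features = FeatureView(
    name="sales_features",
    entities=[store, dept],
    ttl=timedelta(days=365),
    schema=[
        Field(name="weekly_sales", dtype=Float32),
        Field(name="lag_1", dtype=Float32),
        Field(name="rolling_mean_4w", dtype=Float32),
        Field(name="is_holiday", dtype=Int64),
        # ...remaining lag, rolling, temporal, and economic fields
    ],
    source=sales_source,
)
```

Because both FeatureServices reference this single definition, any change to a field or its computation flows to training and serving together.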
Training workflow: Submit TrainJob → Fetch features via Feast SDK → Distributed training → Log to MLflow.
Distributed training on Kubernetes is complex. Without an operator, you'd manually configure:
- Environment variables (MASTER_ADDR, WORLD_SIZE, RANK) for each pod
- Pod scheduling for network locality
- Job lifecycle (restarts, failures, gang scheduling)
- Headless services for pod-to-pod communication
- Multi-GPU allocation and NCCL/RCCL configuration
- Accelerator-specific settings (CUDA for NVIDIA, ROCm for AMD)
The Kubeflow Trainer handles all of this declaratively, supporting:
| Capability | Description |
|---|---|
| Multi-node training | Scale across 2, 4, 8, or 100+ nodes |
| Multi-GPU per node | Utilize all GPUs on each node (1, 2, 4, 8 GPUs) |
| Multi-accelerator support | NVIDIA (CUDA/NCCL) and AMD (ROCm/RCCL) |
| Automatic coordination | Environment variables and service discovery handled automatically |
PyTorch DDP initialization (coordination environment variables are set automatically by Kubeflow):
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP
from torch.utils.data import DataLoader, DistributedSampler
# Kubeflow sets MASTER_ADDR, WORLD_SIZE, RANK automatically
dist.init_process_group(backend="nccl" if torch.cuda.is_available() else "gloo")
rank, world_size = dist.get_rank(), dist.get_world_size()
# Each rank gets its own GPU
device = torch.device(f"cuda:{os.environ.get('LOCAL_RANK', 0)}")
model = MLP(input_dim).to(device)  # MLP and input_dim come from the training function
model = DDP(model)  # Wrap for distributed training
# DistributedSampler ensures each rank processes different data
sampler = DistributedSampler(train_dataset, num_replicas=world_size, rank=rank)
train_loader = DataLoader(train_dataset, batch_size=64, sampler=sampler)

The training workflow follows a proven pattern:
| Step | Component | Action |
|---|---|---|
| 1 | Kubeflow SDK | Submit TrainJob to cluster |
| 2 | Feast SDK | Retrieve historical features via gRPC |
| 3 | PyTorch DDP | Distribute training across nodes |
| 4 | MLflow | Track experiments, log model |
| 5 | Shared Storage | Save artifacts for serving |
Fetching Historical Features with Feast SDK:
import pandas as pd
from feast import FeatureStore
from datetime import datetime, timezone, timedelta
# Initialize Feast with operator-provided config
store = FeatureStore(repo_path="/opt/app-root/src/feast-config/salesforecasting")
# Build entity DataFrame (store_id, dept_id, timestamp)
entity_df = pd.DataFrame([
{"store_id": s, "dept_id": d, "event_timestamp": datetime(2022, 1, 1, tzinfo=timezone.utc) + timedelta(weeks=w)}
for w in range(104) for s in range(1, 46) for d in range(1, 15)
])
# Retrieve historical features via Feast gRPC (Ray distributes the query)
df = store.get_historical_features(
entity_df=entity_df,
features=store.get_feature_service("training_features")
).to_df()
# df now contains all features + weekly_sales target
X = df[feature_columns].values  # feature_columns: ordered list of model input columns
y = df["weekly_sales"].values
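Before handing the DataFrame to the distributed training loop, a chronological split and target-aware scaling help prevent leakage. A minimal sketch, assuming `feature_columns` is the ordered list of model inputs and an 80/20 split:

```python
from sklearn.preprocessing import StandardScaler

# Sort chronologically so the split respects time order (avoids leakage)
df = df.sort_values("event_timestamp")
X = df[feature_columns].values
y = df["weekly_sales"].values

split_idx = int(len(df) * 0.8)
X_train, X_val = X[:split_idx], X[split_idx:]
y_train, y_val = y[:split_idx], y[split_idx:]

# Fit the scaler on training data only; the same scaler is saved for serving
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_val = scaler.transform(X_val)
```

Fitting the scaler on the training slice only, then saving it with the model, is what lets the serving layer reuse the exact same transformation.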
MLflow workspace in OpenShift AI for experiment tracking.
MLflow UI comparing training runs across different configurations.
The MLflow Operator on OpenShift AI provides:
| Feature | Benefit |
|---|---|
| Workspace Isolation | Each team/project gets isolated experiment tracking |
| Model Registry | Version and stage models for deployment |
| Artifact Storage | Automatic model and scaler persistence |
| Parameter Logging | Track hyperparameters across runs |
Each training run automatically logs parameters, metrics (loss, MAPE), and artifacts (model weights, scalers, metadata).
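Inside the training function, that logging boils down to a handful of MLflow calls. A hedged sketch, with a placeholder model and metrics standing in for the real training loop (the tracking URI matches the TrainJob example below; the experiment name is illustrative):

```python
import mlflow
import mlflow.pytorch
import torch.nn as nn

mlflow.set_tracking_uri("http://mlflow.feast-trainer-demo.svc.cluster.local:5000")
mlflow.set_experiment("sales-forecasting")   # illustrative experiment name

model = nn.Linear(17, 1)                     # placeholder for the real MLP

with mlflow.start_run():
    mlflow.log_params({"num_nodes": 4, "batch_size": 64, "lr": 1e-3})
    for epoch in range(3):                   # stand-in for the real training loop
        val_mape = 0.10 - 0.01 * epoch       # placeholder metric
        mlflow.log_metric("val_mape", val_mape, step=epoch)
    mlflow.pytorch.log_model(model, artifact_path="model")
```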
The same training configuration scales from development to production:
| Scale | Configuration | Use Case |
|---|---|---|
| Development | 1 node, CPU | Debugging, quick iteration |
| Team | 2-4 nodes, GPU | Regular training runs |
| Production | 8-64 nodes | Large models, hyperparameter sweeps |
Submitting a TrainJob with Kubeflow SDK:
from kubeflow.trainer import TrainerClient, CustomTrainer
trainer = TrainerClient()
# Submit distributed training job
trainer.train(
trainer=CustomTrainer(
func=train_fn, # Your training function
num_nodes=4, # Scale to 4 nodes
resources_per_node={
"gpu": 1, # 1 GPU per node
"cpu": 4,
"memory": "16Gi"
},
packages_to_install=["feast", "mlflow", "torch"],
),
namespace="feast-trainer-demo",
env={
"FEAST_CONFIG_PATH": "/opt/app-root/src/feast-config/salesforecasting",
"MLFLOW_TRACKING_URI": "https://mlflow.apps.cluster.local",
},
)

TrainJob YAML (declarative alternative):
apiVersion: trainer.kubeflow.org/v1alpha1
kind: TrainJob
metadata:
name: sales-training
namespace: feast-trainer-demo
spec:
runtimeRef:
name: torch-distributed
kind: ClusterTrainingRuntime
trainer:
image: quay.io/modh/training:py312-cuda128-torch280
numNodes: 2
numProcPerNode: 1
resourcesPerNode:
requests:
cpu: "2"
memory: "4Gi"
limits:
cpu: "4"
memory: "8Gi"
env:
- name: FEAST_CONFIG_PATH
value: /opt/app-root/src/feast-config/salesforecasting
- name: MLFLOW_TRACKING_URI
value: "http://mlflow.feast-trainer-demo.svc.cluster.local:5000"
- name: NUM_EPOCHS
value: "50"
podTemplateOverrides:
- targetJobs:
- name: node
spec:
volumes:
- name: feast-config
configMap:
name: feast-salesforecasting-client # Created by Feast Operator
containers:
- name: node
volumeMounts:
- name: feast-config
                mountPath: /opt/app-root/src/feast-config/salesforecasting

Note: The feast-salesforecasting-client ConfigMap is created by the Feast Operator when you deploy the FeatureStore CR. In workbenches, it is auto-mounted when you connect the workbench to the feature store via the OpenShift AI UI. For TrainJobs and InferenceServices, you must explicitly mount it as shown above.
Inference workflow: Client sends entity IDs → KServe fetches features from Feast → Model predicts → Return result.
The key innovation is Feast SDK integration in the serving layer:
| Step | Component | Action |
|---|---|---|
| 1 | Client | Sends entity IDs (store_id, dept_id) |
| 2 | KServe | Receives inference request |
| 3 | Feast SDK | Fetches features via gRPC (same as training) |
| 4 | Model | Scales features, runs inference |
| 5 | KServe | Returns prediction |
Fetching Online Features in KServe Model:
import joblib
import torch
from feast import FeatureStore
from kserve import Model

class SalesForecastModel(Model):
    def load(self):
        # Same Feast SDK, same config path as training
        self.feast_store = FeatureStore(repo_path="/opt/app-root/src/feast-config/salesforecasting")
        self.model = torch.load("best_model.pt")
        self.model.eval()
        # Scaler fitted during training and saved alongside the model (path illustrative)
        self.scaler = joblib.load("scaler.pkl")
        self.ready = True

    def preprocess(self, payload):
        # Client sends only entity IDs
        entities = payload["inputs"][0]["data"]  # [{"store_id": 1, "dept_id": 5}, ...]
        # Fetch features from Redis online store via Feast SDK
        features = self.feast_store.get_online_features(
            entity_rows=entities,
            features=self.feast_store.get_feature_service("inference_features")
        ).to_dict()
        return self._build_feature_matrix(features)

    def predict(self, X):
        X_scaled = self.scaler.transform(X)
        with torch.no_grad():
            return self.model(torch.FloatTensor(X_scaled)).numpy()

| Approach | Client Sends | Pros | Cons |
|---|---|---|---|
| Without Feast | All features | Simple server | Feature drift, complex client |
| With Feast | Entity IDs only | No drift, simple client | Requires online store |
By fetching features at serving time using the same Feast SDK as training, we guarantee identical feature computation.
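From the client's side, a request therefore carries only entity IDs. A sketch of what a call might look like against the `preprocess` payload shape shown above (the route URL and response shape are illustrative):

```python
import requests

# Illustrative KServe route for the InferenceService defined below
url = "https://sales-forecast-feast-trainer-demo.apps.cluster.local/v1/models/sales-forecast:predict"

payload = {
    "inputs": [
        {"data": [
            {"store_id": 1, "dept_id": 5},
            {"store_id": 2, "dept_id": 7},
        ]}
    ]
}

resp = requests.post(url, json=payload, timeout=10)
print(resp.json())  # e.g. {"predictions": [24310.5, 18750.2]}
```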
InferenceService YAML:
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
name: sales-forecast
namespace: feast-trainer-demo
annotations:
serving.kserve.io/deploymentMode: RawDeployment
spec:
predictor:
minReplicas: 1
maxReplicas: 3
containers:
- name: kserve-container
image: quay.io/modh/training:py312-cuda128-torch280
env:
- name: MODEL_DIR
value: /shared/models
- name: FEAST_CONFIG_PATH
value: /opt/app-root/src/feast-config/salesforecasting
resources:
requests:
cpu: "500m"
memory: "2Gi"
volumeMounts:
- name: model-storage
mountPath: /shared
- name: feast-config
mountPath: /opt/app-root/src/feast-config/salesforecasting
volumes:
- name: model-storage
persistentVolumeClaim:
claimName: shared
- name: feast-config
configMap:
          name: feast-salesforecasting-client # Created by Feast Operator

The end-to-end inference latency is composed of:
| Component | Description |
|---|---|
| Feast online store lookup | Redis-backed feature retrieval |
| Feature scaling | NumPy vectorized operations |
| Model inference | PyTorch forward pass |
Using Redis as the online store ensures low-latency feature retrieval suitable for real-time serving.
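A simple way to sanity-check the first row of that table is to time an online lookup directly with the Feast SDK. A small sketch (entity values are illustrative):

```python
import time
from feast import FeatureStore

store = FeatureStore(repo_path="/opt/app-root/src/feast-config/salesforecasting")
service = store.get_feature_service("inference_features")

start = time.perf_counter()
features = store.get_online_features(
    entity_rows=[{"store_id": 1, "dept_id": 5}],
    features=service,
).to_dict()
print(f"Online lookup took {(time.perf_counter() - start) * 1000:.1f} ms")
```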
| Area | Benefit |
|---|---|
| Feature Consistency | Same feature definitions for training and serving eliminates skew |
| Scalability | Ray distributes feature retrieval; Kubeflow distributes training |
| Reduced Training Time | Multi-node DDP significantly reduces iteration time vs single-node |
| Low-Latency Serving | Redis online store enables real-time predictions |
| Reproducibility | MLflow tracks all experiments, parameters, and artifacts |
| Decision | Benefit |
|---|---|
| Feast as single source of truth | Same FeatureService for training + serving = no skew |
| Feast Operator | Automated infrastructure, secure gRPC communication |
| Ray offline store | Scales point-in-time joins from thousands to millions of rows |
| Kubeflow Trainer | Declarative distributed training, no manual coordination |
| MLflow Operator | Workspace isolation, model versioning |
| KServe + Feast SDK | Consistent feature retrieval at inference time |
| Approach | Scales | No Skew | Open Source | Vendor Lock-in |
|---|---|---|---|---|
| Pandas + custom serving | ❌ | ❌ | ✅ | None |
| Tecton | ✅ | ✅ | ❌ | High |
| Vertex AI Feature Store | ✅ | ✅ | ❌ | GCP |
| SageMaker Feature Store | ✅ | ✅ | ❌ | AWS |
| This Architecture | ✅ | ✅ | ✅ | None |
The open-source stack provides managed-service capabilities while running on any Kubernetes cluster.
Feature stores aren't just about organizing data—they're about preventing the silent failures that plague production ML systems.
This architecture provides:
- Zero training-serving skew by construction (same FeatureService definitions)
- Horizontal scalability from prototype to production
- No vendor lock-in — runs on any Kubernetes cluster
- GitOps-friendly — feature definitions are version-controlled
The key insight: feature consistency between training and serving is the foundation everything else builds on. Get this wrong, and no amount of model tuning will save you. Get it right, and you have a platform that scales with your business.
Foundational Research:
- Hidden Technical Debt in Machine Learning Systems — Google's seminal NIPS 2015 paper on ML systems maintenance
- Rules of ML: Best Practices for ML Engineering — Google's comprehensive guide to production ML
- MLOps: Continuous delivery and automation pipelines — Google Cloud Architecture Center
- MLOps Maturity Model — Microsoft's framework for assessing ML operations maturity
Industry Case Studies & Engineering Blogs:
- Uber Michelangelo ML Platform — Uber's approach to feature consistency
- DoorDash Feature Store with Redis — Building a gigascale feature store
- Netflix Distributed Time Travel — Feature generation for offline testing
- Airbnb Chronon Framework — Declarative feature engineering
- Atlassian Feature Platform Case Study — Reducing deployment time from months to days
Retail Forecasting:
- M5 Forecasting Competition — The largest retail forecasting benchmark (42,840 time series)
- Walmart Store Sales Forecasting — Classic Kaggle demand forecasting challenge
- IHL Group Inventory Distortion Research — $1.77 trillion annual retail losses
OpenShift AI Documentation:
- Red Hat OpenShift AI
- Red Hat OpenShift AI Self-Managed Documentation
- AI on OpenShift patterns and recipes
OpenShift AI Component Documentation:
- Working with Machine Learning Features (Feast) — Feature Store on OpenShift AI
- Distributed Workloads (Ray & Kubeflow) — Distributed training on OpenShift AI
- Deploying Models on Single-Model Serving Platform — KServe on OpenShift AI
- Configuring Model-Serving Platform — Runtime and serving configuration
Upstream Project Documentation:
- Feast Documentation
- KubeRay Documentation
- Kubeflow Trainer
- MLflow Documentation
- KServe Documentation
- PyTorch Distributed Data Parallel (DDP) — Multi-GPU training documentation