Add handle_missing_values call and switch to one-hot encoding for categorical features #6
Co-authored-by: wuzhi456 <217564920+wuzhi456@users.noreply.github.com>
@copilot Missing values (NaN) in training data: '?' values in training data: Remaining missing values in training data: 0
…inplace Co-authored-by: wuzhi456 <217564920+wuzhi456@users.noreply.github.com>
Found and fixed the issue in `handle_missing_values`. Changed `df[col].fillna(..., inplace=True)` to `df[col] = df[col].fillna(...)`. Fixed in commit 143022f.
@copilot `def encode_categorical_features(self, df, is_training=True):`
The encodings are stored in |
@copilot I'd like you to change the categorical encoding to one-hot encoding. Would that give better results?
Co-authored-by: wuzhi456 <217564920+wuzhi456@users.noreply.github.com>
Adds explicit missing value handling in the training notebook and improves categorical feature encoding.
Changes Made
- Added a call to `DataPreprocessor.handle_missing_values()` for both train and test dataframes in the training notebook
- `.gitignore`: Standard Python ignores for cache files, virtual envs, IDE files
- `handle_missing_values` method: Replaced `df[col].fillna(..., inplace=True)` with `df[col] = df[col].fillna(...)` to properly handle pandas copy behavior and avoid `FutureWarning`
- Replaced `LabelEncoder` with `OneHotEncoder` for categorical features. This avoids imposing ordinal relationships on categorical data and may improve model performance. Feature dimension increases from 14 to 104 due to the one-hot encoding.

Usage
The `handle_missing_values` method replaces '?' with NaN, then fills categorical columns with the mode and numerical columns with the median.
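The behaviour described above can be sketched roughly as follows. This is a minimal illustration, not the PR's actual `DataPreprocessor` code: the standalone function, the toy column names (`age`, `workclass`), and the choice to fill each dataframe from its own statistics are all assumptions made for the example. The one-hot step uses scikit-learn's `OneHotEncoder` with `handle_unknown="ignore"`, which is one plausible way to fit on training data and reuse the encoder on test data.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import OneHotEncoder


def handle_missing_values(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical stand-in for DataPreprocessor.handle_missing_values."""
    # replace() returns a new frame, so the caller's dataframe is not modified.
    df = df.replace("?", np.nan)
    for col in df.columns:
        if not df[col].isna().any():
            continue
        if pd.api.types.is_numeric_dtype(df[col]):
            fill_value = df[col].median()
        else:
            # mode() ignores NaN; assumes the column has at least one real value.
            fill_value = df[col].mode().iloc[0]
        # Assign the result back instead of calling fillna(..., inplace=True).
        df[col] = df[col].fillna(fill_value)
    return df


# Toy data with hypothetical column names.
train = pd.DataFrame({
    "age": [25.0, np.nan, 40.0, 31.0],
    "workclass": ["Private", "?", "Private", "State-gov"],
})
test = pd.DataFrame({
    "age": [52.0, 38.0],
    "workclass": ["?", "Self-emp"],
})

train_clean = handle_missing_values(train)
test_clean = handle_missing_values(test)

# Fit the encoder on training categories only, then reuse it for test data.
cat_cols = ["workclass"]
encoder = OneHotEncoder(handle_unknown="ignore")
X_train_cat = encoder.fit_transform(train_clean[cat_cols]).toarray()
X_test_cat = encoder.transform(test_clean[cat_cols]).toarray()

print(train_clean)
print(X_train_cat.shape, X_test_cat.shape)
```

Each distinct category becomes its own 0/1 column, which is why the feature count grows after encoding. With `handle_unknown="ignore"`, a category that appears only in the test data is encoded as an all-zero row rather than raising an error.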