Long Column names are unexpectedly dropped in training

**System Information (please complete the following information):**
 - OS & Version:  Windows 11 
 - ML.NET Version: ML.NET 1.6.0
 - .NET Version: .NET 6.0

**Describe the bug**
A dataset may include long column names. In my case they are about 150 characters long and consist of letters (a-z, A-Z), digits (0-9) and the dash character. Column inference reports them correctly. However, once training starts with AutoML, these columns are not used for training. If the dataset contains only long-named columns, an exception about a missing "Features" column is thrown.



**To Reproduce**
Steps to reproduce the behavior:
1. Create a dataset with long column names (the columns are numeric in my case).
2. Run column inference, which reports the columns correctly:
`ColumnInferenceResults columnInference = mlContext.Auto().InferColumns(TrainDataPath, LabelColumnName, groupColumns: false);`
3. Train:
`experimentResult = experiment.Execute(TrainDataView, ValidationDataView, columnInformation, null, progressHandler);`
4. Observe the exception about the missing "Features" column.
5. Rename the columns to shorter names, either manually or in a loop, to confirm that training now works. This can also be used as a workaround for now.
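```
// Inside a loop over the columns: i is the loop index, col is the current column.
// Copy the long-named column to a short name ("col0", "col1", ...).
var copyPipeline = mlContext.Transforms.CopyColumns("col" + i, col.Name);
// Fit and apply the same estimator that was just created.
OriginalTrainDataView = copyPipeline.Fit(OriginalTrainDataView).Transform(OriginalTrainDataView);
```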
Note: I have tree-based algorithms enabled.
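
For completeness, here is a minimal sketch of how the renaming loop from step 5 could look as a standalone helper. The names `ColumnRenamer` and `ShortenColumnNames` and the `colN` naming scheme are hypothetical and only for illustration. Dropping the original long-named columns afterwards is my assumption and may not be required. The sketch also assumes that no existing column is already named `col0`, `col1`, and so on.

```csharp
using System.Collections.Generic;
using System.Linq;
using Microsoft.ML;

public static class ColumnRenamer // hypothetical helper, not part of ML.NET
{
    // Copies every visible column except the label to a short name
    // ("col0", "col1", ...) and returns the transformed view together
    // with a map from each short name back to the original name.
    public static (IDataView View, Dictionary<string, string> ShortToOriginal) ShortenColumnNames(
        MLContext mlContext, IDataView data, string labelColumnName)
    {
        // Snapshot the names first, because the schema changes as columns are added.
        List<string> originalNames = data.Schema
            .Where(c => !c.IsHidden && c.Name != labelColumnName)
            .Select(c => c.Name)
            .ToList();

        var shortToOriginal = new Dictionary<string, string>();
        for (int i = 0; i < originalNames.Count; i++)
        {
            string shortName = "col" + i;
            shortToOriginal[shortName] = originalNames[i];

            // Same call as in the workaround above: (outputColumnName, inputColumnName).
            var copy = mlContext.Transforms.CopyColumns(shortName, originalNames[i]);
            data = copy.Fit(data).Transform(data);
        }

        // Assumption: remove the long-named originals so only the short copies remain.
        if (originalNames.Count > 0)
        {
            var drop = mlContext.Transforms.DropColumns(originalNames.ToArray());
            data = drop.Fit(data).Transform(data);
        }

        return (data, shortToOriginal);
    }
}
```

The returned map makes it possible to translate the short names back to the original ones when inspecting the trained model. Any column information passed to `Execute` would presumably have to refer to the new short names as well; I have not verified this.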

**Expected behavior**
Columns with long names should be used in training normally.
If that is not possible, an exception should be thrown. As it stands, a user might think all data is being used for training when some columns are actually ignored.

It is possible that the Verbose log level would give information about this, but it is disabled by default in AutoML. I did not run separately with verbose output.

**Additional data**
There may be many reasons why a dataset could include long column names. For example, the name, ID and settings of a measurement device may be included in the column name.

If possible, I'd like to know what the current column name length limit is, even if this gets fixed. That would help determine which fields have been ignored in earlier models.

Long Column names are unexpectedly dropped in training #6045
