Dataset import error (Datumaro v1) - ColumnNotFoundError #5763

@leoll2

Description

Steps to reproduce

  1. Import the following dataset as a new project: https://intel-my.sharepoint.com/:u:/p/leonardo_lai/IQC_nFb6QfS_TrUMTK4W3aExAWv6uhI9YrzyL0kZBN9FPfM?e=l4now7
    This dataset was exported in Datumaro format from a detection project in Geti 2.x.
  2. Select 'detection' task type and all labels

Expected

The project is successfully created and populated with the dataset.

Actual

The project is created (empty) but the import fails.

Error:

  File "/home/leoll2/training_extensions/application/backend/app/execution/dataset_import/import_as_new_project.py", line 89, in prepare_dataset
    dataset = dataset.filter_by_labels(labels=params.labels, keep_empty_samples=params.include_unannotated)
              │       │                       │      │                          │      └ True
              │       │                       │      │                          └ ImportDatasetAsNewProjectJobParams(staged_dataset_id=UUID('5432e251-8107-4660-bb78-4895069e2885'), project_name='Birds', task...
              │       │                       │      └ ['American', 'Downy', 'Pileated']
              │       │                       └ ImportDatasetAsNewProjectJobParams(staged_dataset_id=UUID('5432e251-8107-4660-bb78-4895069e2885'), project_name='Birds', task...
              │       └ <function Dataset.filter_by_labels at 0x7bb1066605e0>
              └ <datumaro.experimental.dataset.Dataset object at 0x7bb0c62d51d0>

  File "/home/leoll2/training_extensions/application/backend/.venv/lib/python3.13/site-packages/datumaro/experimental/dataset.py", line 703, in filter_by_labels
    filtered_df = filter_df_by_label_indices(
                  └ <function filter_df_by_label_indices at 0x7bb106623240>
  File "/home/leoll2/training_extensions/application/backend/.venv/lib/python3.13/site-packages/datumaro/experimental/filtering/label_filter.py", line 338, in filter_df_by_label_indices
    return _filter_list_label(df, label_field_name, label_indices, schema, label_field_instance, keep_empty_samples)
           │                  │   │                 │              │       │                     └ True
           │                  │   │                 │              │       └ LabelField(semantic='default', dtype=UInt8, multi_label=False, is_list=True)
           │                  │   │                 │              └ Schema(attributes={'id': AttributeInfo(type=str | None, field=StringField(semantic='id', is_list=False, dtype=String), catego...
           │                  │   │                 └ [1, 0, 2]
           │                  │   └ 'label'
           │                  └ shape: (48, 5)
           │                    ┌────────────┬─────────┬──────────────────────────┬──────────┬─────────────────────────────────┐
           │                    │ image_info ...
           └ <function _filter_list_label at 0x7bb106623060>
  File "/home/leoll2/training_extensions/application/backend/.venv/lib/python3.13/site-packages/datumaro/experimental/filtering/label_filter.py", line 259, in _filter_list_label
    result = df.with_columns(columns_to_update)
             │  │            └ [<Expr ['col("label").list.eval(.when(e…'] at 0x7BB0C62AC950>, <Expr ['as_struct("bboxes", col("label…'] at 0x7BB0C62ACE50>]
             │  └ <function DataFrame.with_columns at 0x7bb12991bec0>
             └ shape: (48, 5)
               ┌────────────┬─────────┬──────────────────────────┬──────────┬─────────────────────────────────┐
               │ image_info ...
  File "/home/leoll2/training_extensions/application/backend/.venv/lib/python3.13/site-packages/polars/dataframe/frame.py", line 10335, in with_columns
    .collect(optimizations=QueryOptFlags._eager())
                           │             └ <staticmethod(<function QueryOptFlags._eager at 0x7bb129a80a40>)>
                           └ <class 'polars.lazyframe.opt_flags.QueryOptFlags'>
  File "/home/leoll2/training_extensions/application/backend/.venv/lib/python3.13/site-packages/polars/_utils/deprecation.py", line 97, in wrapper
    return function(*args, **kwargs)
           │         │       └ {'optimizations': <polars.lazyframe.opt_flags.QueryOptFlags object at 0x7bb0c6201f30>}
           │         └ (<LazyFrame at 0x7BB0C62024E0>,)
           └ <function LazyFrame.collect at 0x7bb129a97600>
  File "/home/leoll2/training_extensions/application/backend/.venv/lib/python3.13/site-packages/polars/lazyframe/opt_flags.py", line 324, in wrapper
    return function(*args, **kwargs)
           │         │       └ {'optimizations': <polars.lazyframe.opt_flags.QueryOptFlags object at 0x7bb0c51801d0>}
           │         └ (<LazyFrame at 0x7BB0C62024E0>,)
           └ <function LazyFrame.collect at 0x7bb129a97560>
  File "/home/leoll2/training_extensions/application/backend/.venv/lib/python3.13/site-packages/polars/lazyframe/frame.py", line 2429, in collect
    return wrap_df(ldf.collect(engine, callback))
           │       │   │       │       └ None
           │       │   │       └ 'auto'
           │       │   └ <method 'collect' of 'builtins.PyLazyFrame' objects>
           │       └ <builtins.PyLazyFrame object at 0x7bb0c62d1290>
           └ <function wrap_df at 0x7bb129b43c40>

polars.exceptions.ColumnNotFoundError: unable to find column "label"; valid columns: ["image_info", "subset", "bboxes", "labels", "image_path"]
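
The error message points to a column-name mismatch: the filter looks up `label`, while the imported dataframe exposes `labels`. Below is a minimal diagnostic sketch that uses only the column names quoted in the error above. `suggest_column` is a hypothetical helper written for illustration; it is not part of Datumaro or Polars, and it only detects the mismatch, it does not fix the import.

```python
import difflib


def suggest_column(requested, available, cutoff=0.6):
    """Return `requested` if it exists, else the closest available name, else None.

    Hypothetical helper for illustration; not a Datumaro or Polars API.
    """
    if requested in available:
        return requested
    matches = difflib.get_close_matches(requested, available, n=1, cutoff=cutoff)
    return matches[0] if matches else None


# Column names copied from the ColumnNotFoundError message in this issue.
valid_columns = ["image_info", "subset", "bboxes", "labels", "image_path"]

# Field name passed to the label filter, as shown in the traceback.
requested = "label"

suggestion = suggest_column(requested, valid_columns)
if requested not in valid_columns:
    print(f'column "{requested}" not found; closest match: {suggestion!r}')
```

A check like this, run before the call to `filter_by_labels`, would flag the singular/plural discrepancy early. Whether the right fix is in the importer's schema or in the field name passed to the filter is not determined by this report.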

Metadata

Labels

Geti Tune Backend: Issues related to Geti Tune backend
