ExperimentalImport: IReceptor import fails with "No objects to concatenate" on unmodified iReceptor data #6

@Reihaneh-Zarrin

Hi LIgO team,

I'm encountering an error when importing data in the IReceptor format in LIgO while testing the ExperimentalImport generative model.

The dataset was downloaded directly from the iReceptor Gateway and was not modified. It includes a ZIP file containing AIRR-compliant TSV files and a metadata.tsv.

My goal was to use ExperimentalImport in its simplest form, just to make sure it works as expected, but I keep getting an error during parsing.


YAML (dataset + simulation snippet)

definitions:
  datasets:
    experimental_dataset:
      format: IReceptor
      params:
        path: data.zip
        is_repertoire: true
        import_productive: true
        import_with_stop_codon: false
        import_out_of_frame: false
        import_illegal_characters: false
        import_empty_nt_sequences: true
        import_empty_aa_sequences: false
        region_type: FULL_SEQUENCE
        separator: "\t"
        column_mapping:
          junction: sequences
          junction_aa: sequence_aas
          v_call: v_alleles
          j_call: j_alleles
          locus: chains
          duplicate_count: counts
          sequence_id: sequence_identifiers

  simulations:
    sim1:
      is_repertoire: true
      paired: false
      sequence_type: amino_acid
      simulation_strategy: Implanting
      remove_seqs_with_signals: false
      sim_items:
        experimental_dataset:
          generative_model:
            type: ExperimentalImport
            import_format: IReceptor
            import_params: {}
            tmp_import_path: tmp_ligo_import
          immune_events:
            dummy_event: true
          number_of_examples: 100
          is_noise: false
          receptors_in_repertoire_count: 50
          signals: {}

instructions:
  inst1:
    type: LigoSim
    simulation: sim1
    export_p_gens: false
    max_iterations: 1
    number_of_processes: 1
    sequence_batch_size: 100

output:
  format: HTML

Data

  • The file at data.zip was not modified after downloading from iReceptor.
  • It contains multiple rearrangement .tsv files and a valid metadata.tsv file.
  • I also tried extracting the ZIP file and pointing directly to a .tsv file inside it, but the same error occurred.
  • The column mapping in the YAML matches the field names in the dataset.
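For reference, the archive contents and column names can be checked independently of LIgO. The following is a minimal sketch using only the Python standard library; the function name `tsv_headers_in_zip` is hypothetical and not part of LIgO or immuneML, and it assumes the TSV files are UTF-8 encoded with a header row.

```python
import csv
import io
import zipfile


def tsv_headers_in_zip(zip_path):
    """Return {member name: list of column names} for every .tsv file in the archive."""
    headers = {}
    with zipfile.ZipFile(zip_path) as archive:
        for name in archive.namelist():
            # Skip directories and non-TSV members such as JSON or text files.
            if not name.lower().endswith(".tsv"):
                continue
            with archive.open(name) as raw:
                # Assumption: files are UTF-8 text with a tab-separated header row.
                text = io.TextIOWrapper(raw, encoding="utf-8", newline="")
                reader = csv.reader(text, delimiter="\t")
                # An empty file yields an empty header list.
                headers[name] = next(reader, [])
    return headers


# Example use (path taken from the YAML above):
#   for name, columns in tsv_headers_in_zip("data.zip").items():
#       print(name, columns)
```

The printed column names can then be compared against the keys of `column_mapping` (`junction`, `junction_aa`, `v_call`, `j_call`, `locus`, `duplicate_count`, `sequence_id`). An archive with no `.tsv` members, or members with an empty header, would show up here as an empty result.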

Error

When I run LIgO with this YAML file, I get the following error. The traceback shows it fails during parsing at the point where airr.load_rearrangement() is called:

...
File ".../airr/interface.py", line 103, in load_rearrangement
  df = pd.read_csv(filename, sep='\t', header=0, index_col=None,
                   dtype=schema.pandas_types(), true_values=schema.true_values,
                   false_values=schema.false_values)
...
ValueError: No objects to concatenate
...
ImmuneMLParser: an error occurred during parsing in function parse_dataset
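As a point of reference, the message `No objects to concatenate` is what pandas raises when `pd.concat` receives an empty sequence. The sketch below only reproduces that message in isolation. It does not show where LIgO fails; the assumption (not confirmed by the traceback above) is that somewhere during import the list of tables to combine ended up empty.

```python
import pandas as pd

# Concatenating an empty list of DataFrames raises a ValueError in pandas.
try:
    pd.concat([])
except ValueError as error:
    message = str(error)

print(message)
```

If that assumption holds, the problem would be that no rearrangement data was found or read, rather than that a particular column was malformed.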

What I’ve verified

  • The data structure follows the AIRR standard.
  • The ZIP includes rearrangement .tsv files and a proper metadata.tsv.
  • The YAML mappings are correct and complete.
  • The dataset was downloaded as-is, no post-processing.
  • I tried tweaking parameters like region_type, but it made no difference.
  • I simply want to run a minimal simulation that produces HTML output based on real-world data.

Question

Is this a known issue when importing iReceptor data with ExperimentalImport?
Could it be due to how LIgO handles repertoire datasets in ZIP format, or to a mismatch with the expected schema?

Any guidance would be appreciated!

Thanks in advance for your support.
