Description
Hi LIgO team,
I'm encountering an error when importing data in the `IReceptor` format in LIgO while testing the `ExperimentalImport` generative model.
The dataset was downloaded directly from the iReceptor Gateway and was not modified. It consists of a ZIP file containing AIRR-compliant rearrangement TSV files and a `metadata.tsv`.
My goal was to use `ExperimentalImport` in its simplest form, just to confirm that it works as expected, but I keep getting an error during parsing.
YAML (dataset + simulation snippet)
```yaml
definitions:
  datasets:
    experimental_dataset:
      format: IReceptor
      params:
        path: data.zip
        is_repertoire: true
        import_productive: true
        import_with_stop_codon: false
        import_out_of_frame: false
        import_illegal_characters: false
        import_empty_nt_sequences: true
        import_empty_aa_sequences: false
        region_type: FULL_SEQUENCE
        separator: "\t"
        column_mapping:
          junction: sequences
          junction_aa: sequence_aas
          v_call: v_alleles
          j_call: j_alleles
          locus: chains
          duplicate_count: counts
          sequence_id: sequence_identifiers
  simulations:
    sim1:
      is_repertoire: true
      paired: false
      sequence_type: amino_acid
      simulation_strategy: Implanting
      remove_seqs_with_signals: false
      sim_items:
        experimental_dataset:
          generative_model:
            type: ExperimentalImport
            import_format: IReceptor
            import_params: {}
            tmp_import_path: tmp_ligo_import
          immune_events:
            dummy_event: true
          number_of_examples: 100
          is_noise: false
          receptors_in_repertoire_count: 50
          signals: {}
instructions:
  inst1:
    type: LigoSim
    simulation: sim1
    export_p_gens: false
    max_iterations: 1
    number_of_processes: 1
    sequence_batch_size: 100
output:
  format: HTMLData
```
- The file at `data.zip` was not modified after downloading from iReceptor.
- It contains multiple rearrangement `.tsv` files and a valid `metadata.tsv` file.
- I also tried extracting the ZIP file and pointing directly to a `.tsv` file inside it, but the same error occurred.
- The column mapping in the YAML matches the field names in the dataset.
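As a quick way to double-check the last point, the minimal sketch below opens the archive with Python's standard `zipfile` module, reads the header row of every rearrangement `.tsv`, and reports which `column_mapping` keys from the YAML are missing. It assumes the files are UTF-8, tab-separated, and that any member whose name ends in `metadata.tsv` describes repertoires rather than rearrangements; the function name `check_zip_columns` is hypothetical and this does not replicate what LIgO or `airr` check internally.

```python
import zipfile

# Hypothetical check: these are the column_mapping keys from the YAML above,
# i.e. the column names the import is configured to read from each file.
EXPECTED_COLUMNS = {
    "junction", "junction_aa", "v_call", "j_call",
    "locus", "duplicate_count", "sequence_id",
}


def check_zip_columns(zip_path, expected=EXPECTED_COLUMNS):
    """Return {tsv_member_name: sorted list of missing expected columns}."""
    report = {}
    with zipfile.ZipFile(zip_path) as archive:
        for name in archive.namelist():
            base = name.rsplit("/", 1)[-1]
            # Only rearrangement files; skip the repertoire metadata table.
            if not base.endswith(".tsv") or base.endswith("metadata.tsv"):
                continue
            with archive.open(name) as raw:
                # "utf-8-sig" also strips a leading byte-order mark if present.
                first_line = raw.readline().decode("utf-8-sig").rstrip("\r\n")
            header = first_line.split("\t") if first_line else []
            report[name] = sorted(expected - set(header))
    return report
```

Calling `check_zip_columns("data.zip")` should return an empty list for every rearrangement file if the mapping really matches the headers; any non-empty list names the columns the import would not find.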
Error
When I run LIgO with this YAML file, I get the following error. The traceback shows that parsing fails inside the call to `airr.load_rearrangement()`:
```
...
  File ".../airr/interface.py", line 103, in load_rearrangement
    df = pd.read_csv(filename, sep='\t', header=0, index_col=None,
                     dtype=schema.pandas_types(), true_values=schema.true_values,
                     false_values=schema.false_values)
...
ValueError: No objects to concatenate
...
ImmuneMLParser: an error occurred during parsing in function parse_dataset
```
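`ValueError: No objects to concatenate` is the message pandas raises when `pd.concat` is given an empty sequence, so one thing worth ruling out is that the import ends up with no rearrangement rows or files to combine. This is only a hypothesis, not a confirmed cause. The minimal sketch below counts the data rows (non-empty lines after the header) in each `.tsv` member of the archive using only the standard library; the function name `count_rows_per_tsv` is hypothetical, and it does not reproduce how LIgO or `airr` actually locate and read the files.

```python
import zipfile


def count_rows_per_tsv(zip_path):
    """Map each .tsv member of the archive to its number of non-empty data rows."""
    counts = {}
    with zipfile.ZipFile(zip_path) as archive:
        for name in archive.namelist():
            if not name.endswith(".tsv"):
                continue
            with archive.open(name) as raw:
                # Drop blank lines so trailing newlines are not counted as rows.
                lines = [line for line in raw.read().splitlines() if line.strip()]
            # The first non-empty line is the header; everything after it is data.
            counts[name] = max(len(lines) - 1, 0)
    return counts
```

A member that reports `0`, or an archive with no `.tsv` members at all, would be a candidate for the empty input; `metadata.tsv` is counted too and can simply be ignored in the output.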
What I’ve verified
- The data structure follows the AIRR standard.
- The ZIP includes rearrangement `.tsv` files and a proper `metadata.tsv`.
- The `column_mapping` entries in the YAML are correct and complete.
- The dataset was downloaded as-is, with no post-processing.
- I tried tweaking parameters like `region_type`, but it made no difference.
- I simply want to run a minimal simulation that produces HTML output based on real-world data.
Question
Is this a known issue when importing iReceptor data with `ExperimentalImport`?
Could it be caused by how LIgO handles repertoire datasets in ZIP format, or by a mismatch with the expected schema?
Any guidance would be appreciated!
Thanks in advance for your support.