
Error running MiXCR on ONT long read data #2052

@e-antoun

Description


Hi,

We are running MiXCR on TCR-enriched long-read data generated with ONT and have been running into problems. We processed the reads with Pychopper and then used cutadapt to trim the TSO/Read1 sequence and the barcodes/UMIs present at the start of each read. Given the error rates of ONT sequencing, we then corrected the trimmed reads with isONcorrect. Finally, matching on read IDs, we prepend the UMI/barcode sequences back onto the start of the corrected reads so that MiXCR can use them in refineTagsAndSort. Without correction, MiXCR runs to completion; with the corrected files, however, we get the error below.
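A minimal sketch of the prepend step described above, not the exact script used here. It assumes a hypothetical three-column TSV saved before trimming (read ID, prefix sequence, prefix quality string) and matches each corrected read on the first whitespace-delimited token of its FASTQ header. Reads with no saved prefix are dropped, since they could not match the `^`-anchored tag pattern anyway.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: prepend_prefix PREFIX_TSV CORRECTED_FQ > OUT_FQ
# PREFIX_TSV columns (assumed): read_id <TAB> prefix_seq <TAB> prefix_qual
prepend_prefix() {
  local prefix_tsv=$1 fastq=$2
  awk -F'\t' '
    # First file: map read_id -> prefix sequence and prefix quality
    NR == FNR { seq[$1] = $2; qual[$1] = $3; next }
    # Second file: FASTQ, 4 lines per record
    {
      line = (FNR - 1) % 4
      if (line == 0) {
        id = substr($0, 2)           # drop the leading @
        sub(/[ \t].*/, "", id)       # keep only the first token as the ID
        keep = (id in seq)
        if (keep) print
      } else if (line == 1) {
        if (keep) print seq[id] $0   # prefix + corrected sequence
      } else if (line == 2) {
        if (keep) print "+"
      } else if (keep) {
        print qual[id] $0            # prefix quality + corrected quality
      }
    }
  ' "$prefix_tsv" "$fastq"
}
```

Usage (file names hypothetical): `prepend_prefix prefixes.tsv corrected.fq > corrected_with_prefix.fq`.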

For reference, the structure of the corrected reads with the prefix prepended is:

^Read1primer_CELLBARCODE{16}_UMI{12}_Read.........
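As a sanity check on this layout: the Read1 primer CTACACGACGCTCTTCCGATCT is 22 nt long, so CELL should occupy 0-based positions 22 to 38 (end-exclusive), UMI 38 to 50, and R1 should start at 50, which agrees with the tag parsing report in the align output below. The hypothetical helper here splits reads at those fixed offsets using exact primer matching only; MiXCR's own pattern matcher may tolerate mismatches, so its counts need not agree exactly.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: split_tags FASTQ [PRIMER]
# Prints read_id, CELL (16 nt), UMI (12 nt) and R1 length (tab-separated)
# for reads whose sequence starts with the exact primer (no mismatches).
split_tags() {
  local fastq=$1 primer=${2:-CTACACGACGCTCTTCCGATCT}
  awk -v p="$primer" '
    FNR % 4 == 1 { id = substr($1, 2) }            # header line: keep the ID
    FNR % 4 == 2 && index($0, p) == 1 {            # sequence starts with primer
      L = length(p)                                 # 22 for the default primer
      if (length($0) > L + 28)                      # need CELL + UMI + >=1 nt of R1
        printf "%s\t%s\t%s\t%d\n", id, substr($0, L + 1, 16), substr($0, L + 17, 12), length($0) - L - 28
    }
  ' "$fastq"
}
```

Piping the output into `wc -l` gives the number of reads that carry a complete prefix.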

Actual Result

Alignment appears to run fine, but the second step, refineTagsAndSort, is where it errors out. Even when I run refineTagsAndSort manually on the .vdjca file produced by the align step, with --dont-correct set, I still get the same error:

>>>>>>>>>>>>>>>>>>>>>>> mixcr align <<<<<<<<<<<<<<<<<<<<<<<
Running:
mixcr align -f --report ./OX6_LN_MiXCR_test.align.report.txt --json-report ./OX6_LN_MiXCR_test.align.report.json --threads 30 --preset local:10x-ont-modified-v1 --save-output-file-names ./OX6_LN_MiXCR_test.align.list.tsv --keep-non-CDR3-alignments --rna --species hsa --tag-pattern ^CTACACGACGCTCTTCCGATCT(CELL:N{16})(UMI:N{12})(R1:*) --set-whitelist CELL=file:/well/dong/users/uap089/SPTCRseq/SPTCR-Seq-Pipeline//Reference/Barcodes/visium_bc.tsv --not-aligned-R1 ./OX6_LN_MiXCR_test_not_aligned.fastq.gz /well/dong/users/uap089/SPTCRseq/Experiment2_Oct2025/OX6_LN_MiXCR_test/isOnCorrect_OX6_cutadapt/corrected_with_prefix.fq ./OX6_LN_MiXCR_test.alignments.vdjca
The following tags and their roles will be associated with each output alignment:
  Payload tags: R1
  Cell tags: CELL(SQ)
  Molecule tags: UMI(SQ)
Alignment: 0%
Alignment: 10.1%  ETA: 00:08:48
Alignment: 20.1%  ETA: 00:07:25
Alignment: 30.2%  ETA: 00:06:43
Alignment: 40.5%  ETA: 00:05:06
Alignment: 50.6%  ETA: 00:03:40
Alignment: 60.6%  ETA: 00:02:13
Alignment: 70.6%  ETA: 00:01:40
Alignment: 80.8%  ETA: 00:01:09
Alignment: 90.8%  ETA: 00:00:38
====================== report: align ======================
Analysis time: 11.92m
Total sequencing reads: 7206039
Successfully aligned reads: 1484064 (20.59%)
Coverage (percent of successfully aligned):
  CDR3: 1200381 (80.88%)
  FR3_TO_FR4: 999850 (67.37%)
  CDR2_TO_FR4: 969084 (65.3%)
  FR2_TO_FR4: 766117 (51.62%)
  CDR1_TO_FR4: 723769 (48.77%)
  VDJRegion: 644098 (43.4%)
Alignment failed: no hits (not TCR/IG?): 4564144 (63.34%)
Alignment failed: absent barcode: 1157831 (16.07%)
Overlapped: 0 (0%)
Overlapped and aligned: 0 (0%)
Overlapped and not aligned: 0 (0%)
Alignment-aided overlaps, percent of overlapped and aligned: 0 (NaN%)
No CDR3 parts alignments, percent of successfully aligned: 73657 (4.96%)
Partial aligned reads, percent of successfully aligned: 210026 (14.15%)
Realigned with forced non-floating bound: 0 (0%)
Realigned with forced non-floating right bound in left read: 0 (0%)
Realigned with forced non-floating left bound in right read: 0 (0%)
TRA chains: 1335 (0.09%)
TRA non-functional: 187 (14.01%)
TRB chains: 6451 (0.43%)
TRB non-functional: 1048 (16.25%)
TRD chains: 109 (0.01%)
TRD non-functional: 17 (15.6%)
TRG chains: 9082 (0.61%)
TRG non-functional: 35 (0.39%)
IGH chains: 80515 (5.43%)
IGH non-functional: 20620 (25.61%)
TRAD chains: 1837 (0.12%)
TRAD non-functional: 0 (0%)
IGK chains: 825456 (55.62%)
IGK non-functional: 190345 (23.06%)
IGL chains: 559279 (37.69%)
IGL non-functional: 85573 (15.3%)
Tag parsing report:
  Execution time: 0ns
  Total reads: 7206039
  Matched reads: 6048208 (83.93%)
  Projection +R1: 6048208 (83.93%)
  For variant 0:
    For projection +R1:
      CELL:Left position: 22
      UMI:Left position: 38
      CELL:Right position: 38
      R1:Left position: 50
      UMI:Right position: 50
      Variants: 0
      Cost: 0
      CELL length: 16
      UMI length: 12
      R1 length:
        26~205: + 910772 (15.06%) = 910772 (15.06%)
        206~342: + 908487 (15.02%) = 1819259 (30.08%)
        343~454: + 914904 (15.13%) = 2734163 (45.21%)
        455~555: + 911746 (15.07%) = 3645909 (60.28%)
        556~693: + 908949 (15.03%) = 4554858 (75.31%)
        694~32061: + 1493350 (24.69%) = 6048208 (100%)

>>>>>>>>>>>>>>>>> mixcr refineTagsAndSort <<<<<<<<<<<<<<<<<
Running:
mixcr refineTagsAndSort -f --report ./OX6_LN_MiXCR_test.refine.report.txt --json-report ./OX6_LN_MiXCR_test.refine.report.json ./OX6_LN_MiXCR_test.alignments.vdjca ./OX6_LN_MiXCR_test.refined.vdjca
Sorting will be applied to the following tags: CELL, UMI
The following whitelist will be used for CELL: WhitelistFromAddress....(really long output)

Initialization: progress unknown
Initialization: 27.5%
Initialization: 62.8%  ETA: 00:00:01
Initialization: 100%  ETA: 00:00:00
Writing CELL: 35.4%
Writing CELL: 76.4%  ETA: 00:00:00
Processing UMI: 0.6%
Processing UMI: 11.3%  ETA: 00:00:41
Processing UMI: 21.3%  ETA: 00:00:39
Processing UMI: 32.6%  ETA: 00:00:29
Processing UMI: 43.4%  ETA: 00:00:26
Processing UMI: 54%  ETA: 00:00:21
Processing UMI: 64.8%  ETA: 00:00:16
Processing UMI: 75.5%  ETA: 00:00:11
Processing UMI: 86.8%  ETA: 00:00:05
Processing UMI: 98.7%  ETA: 00:00:00
Filtering: progress unknown
Final sorting: 5%
Final sorting: 18%  ETA: 00:00:06
Final sorting: 29.2%  ETA: 00:00:06
Final sorting: 41.5%  ETA: 00:00:04
Final sorting: 53.3%  ETA: 00:00:04
Final sorting: 66%  ETA: 00:00:02
Final sorting: 76.9%  ETA: 00:00:02
Final sorting: 88.5%  ETA: 00:00:00
Please copy the following information along with the stacktrace:
   Version: 4.7.0; built=Wed Aug 07 20:19:48 BST 2024; rev=976ba14139; lib=repseqio.v5.1
        OS: Linux
      Java: 17.0.6
  Abs path: /gpfs3/well/dong/users/uap089/SPTCRseq/Experiment2_Oct2025/OX6_LN_MiXCR_test/isOnCorrect_OX6_cutadapt/tmp
  Cmd args: refineTagsAndSort -f --report ./OX6_LN_MiXCR_test.refine.report.txt --json-report ./OX6_LN_MiXCR_test.refine.report.json ./OX6_LN_MiXCR_test.alignments.vdjca ./OX6_LN_MiXCR_test.refined.vdjca
picocli.CommandLine$ExecutionException: Error while running command refineTagsAndSort com.milaboratory.mixcr.cli.TagCorrectionError: Error on tag correction of []
        at com.milaboratory.mixcr.cli.Main.registerExceptionHandlers$lambda-17(SourceFile:420)
        at picocli.CommandLine.execute(CommandLine.java:2088)
        at com.milaboratory.mixcr.cli.Main.execute(SourceFile:105)
        at com.milaboratory.mixcr.cli.CommandAnalyze$Cmd$PlanBuilder.executeSteps(SourceFile:543)
        at com.milaboratory.mixcr.cli.CommandAnalyze$Cmd.run0(SourceFile:500)
        at com.milaboratory.mixcr.cli.MiXCRCommand.run(SourceFile:37)
        at picocli.CommandLine.executeUserObject(CommandLine.java:1939)
        at picocli.CommandLine.access$1300(CommandLine.java:145)
        at picocli.CommandLine$RunLast.executeUserObjectOfLastSubcommandWithSameParent(CommandLine.java:2358)
        at picocli.CommandLine$RunLast.handle(CommandLine.java:2352)
        at picocli.CommandLine$RunLast.handle(CommandLine.java:2314)
        at picocli.CommandLine$AbstractParseResultHandler.execute(CommandLine.java:2179)
        at picocli.CommandLine$RunLast.execute(CommandLine.java:2316)
        at com.milaboratory.mixcr.cli.Main.registerLogger$lambda-32(SourceFile:539)
        at picocli.CommandLine.execute(CommandLine.java:2078)
        at com.milaboratory.mixcr.cli.Main.execute(SourceFile:105)
        at com.milaboratory.mixcr.cli.Main.main(SourceFile:101)
Caused by: com.milaboratory.mixcr.cli.TagCorrectionError: Error on tag correction of []
        at com.milaboratory.mixcr.cli.CommandRefineTagsAndSort$Cmd.run1(SourceFile:375)
        at com.milaboratory.mixcr.cli.MiXCRCommandWithOutputs.run0(SourceFile:69)
        at com.milaboratory.mixcr.cli.MiXCRCommand.run(SourceFile:37)
        at picocli.CommandLine.executeUserObject(CommandLine.java:1939)
        at picocli.CommandLine.access$1300(CommandLine.java:145)
        at picocli.CommandLine$RunLast.executeUserObjectOfLastSubcommandWithSameParent(CommandLine.java:2358)
        at picocli.CommandLine$RunLast.handle(CommandLine.java:2352)
        at picocli.CommandLine$RunLast.handle(CommandLine.java:2314)
        at picocli.CommandLine$AbstractParseResultHandler.execute(CommandLine.java:2179)
        at picocli.CommandLine$RunLast.execute(CommandLine.java:2316)
        at com.milaboratory.mixcr.cli.Main.registerLogger$lambda-32(SourceFile:539)
        at picocli.CommandLine.execute(CommandLine.java:2078)
        ... 15 more
Caused by: java.lang.IllegalStateException: Can't assemble logarithmic histogram for data values less then or equal to zero
        at com.milaboratory.mitool.refinement.gfilter.HistKt.collect(SourceFile:105)
        at com.milaboratory.mitool.refinement.gfilter.GroupFilter$filter$metricValues$lambda$17$$inlined$doAfterLastOrClose$1.close(SourceFile:299)
        at com.milaboratory.mitool.refinement.gfilter.GroupFilter$filter$5$createPort$lambda$11$$inlined$doAfterLastOrClose$1.take(SourceFile:275)
        at cc.redberry.pipe.util.CountingOutputPort$Companion$wrap$1.take(CountingOutputPort.kt:35)
        at com.milaboratory.o.FA.take(SourceFile:25)
        at cc.redberry.pipe.util.Chunk.readChunk(Chunk.java:78)
        at cc.redberry.pipe.CUtils$2.take(CUtils.java:169)
        at cc.redberry.pipe.CUtils$2.take(CUtils.java:161)
        at cc.redberry.pipe.blocks.O2ITransmitter.run(O2ITransmitter.java:63)
        at java.base/java.lang.Thread.run(Thread.java:833)

I understand what the error means literally, but I can't work out what it means in the context of what refineTagsAndSort is doing at that point.
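For context, the root cause in the stack trace is raised from HistKt.collect, called from GroupFilter while it collects per-group metric values. As a generic illustration (not MiXCR's actual code), a logarithmically binned histogram with lowest edge x_min and bin width w in log space assigns each value x to a bin through its logarithm, which is only defined for positive values:

$$
b(x) = \left\lfloor \frac{\log_{10} x - \log_{10} x_{\min}}{w} \right\rfloor, \qquad x > 0
$$

A single group whose metric value is 0 (where the logarithm diverges to negative infinity) is therefore enough to make such a histogram impossible to build.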

Exact MiXCR commands

INPUT="$outfolder/corrected_with_prefix.fq"
$mixcr_ex analyze local:10x-ont-modified-v2 \
    "$INPUT" \
    "./${SAMPLE_NAME}" \
    --set-whitelist CELL=file:bc.tsv \
    --species hsa \
    --threads 30 \
    --keep-non-CDR3-alignments \
    --not-aligned-R1 "./${SAMPLE_NAME}_not_aligned.fastq.gz" \
    --rna \
    --tag-pattern "^CTACACGACGCTCTTCCGATCT(CELL:N{16})(UMI:N{12})(R1:*)" \
    -f

We are using a custom preset that was created for #1681:
Works: ont-modified-v2.yaml
Doesn't work: ont-modified-v1.yaml

To get it working with the corrected reads, I had to change lines 179, 205 and 235 of the preset from 'true' to 'false'. With those lines set to 'true', MiXCR runs on the uncorrected reads; once they are set to 'false', it also runs on the corrected reads. Whether I run the correction with or without the barcodes/UMIs trimmed, those lines have to be 'false' for MiXCR to run on the isONcorrect output.
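To let readers see what lines 179, 205 and 235 contain without opening the attachment, a small hypothetical helper (plain awk, not part of MiXCR) can print each referenced line, marked with `>`, together with a few lines of context. The `CTX` environment variable controlling the context size is an assumption of this sketch.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: show_lines FILE LINE [LINE...]
# Prints each requested line (marked with ">") plus CTX lines of context
# on either side (CTX defaults to 2), each prefixed with its line number.
show_lines() {
  local file=$1; shift
  awk -v targets="$*" -v ctx="${CTX:-2}" '
    BEGIN { n = split(targets, t, " "); for (i = 1; i <= n; i++) want[t[i] + 0] = 1 }
    {
      for (i = 1; i <= n; i++) {
        if (FNR >= t[i] - ctx && FNR <= t[i] + ctx) {
          mark = (FNR in want) ? ">" : " "
          printf "%s%d: %s\n", mark, FNR, $0
          break                  # print overlapping context only once
        }
      }
    }
  ' "$file"
}
```

For this issue that would be `show_lines ont-modified-v1.yaml 179 205 235`.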

My concern is that I'm not sure what that part of the configuration does. Is it fine to just set those lines to 'false'?

Any help would be much appreciated, and please do let me know if you require any further information!

Thanks
