\documentclass[12pt]{article}
\usepackage[utf8]{inputenc}
\usepackage[letterpaper,margin=1.0in]{geometry}
%% Some formatting stuff
\usepackage{authblk}
\usepackage{fancyhdr}
%\usepackage{lineno}
\usepackage{siunitx}
\usepackage{hyperref}
\usepackage{booktabs, array, longtable}
\usepackage{multirow}
\pagestyle{fancy}
\setlength{\headheight}{14.5pt} % Fix fancyhdr warning
% Custom citep command for biblatex
\newcommand{\citep}{\parencite}
\fancyhead[R]{\textbf{Cephalopod Dosage Compensation}}
% for figures
\usepackage{graphicx}
\usepackage{wrapfig}
\usepackage{float}
\usepackage{breakcites}
\usepackage{nameref}
\usepackage{amsmath}
\usepackage{xspace}
%\usepackage[figuresonly,nolists,nomarkers]{endfloat}
%\renewcommand{\processdelayedfloats}{}
\graphicspath{ {./figures/} }
\newcommand{\comment}[1]{{\color{blue} #1}}
\newcommand{\illex}{\textit{Illex illecebrosus}\xspace}
% for hyperlinks
\hypersetup{
colorlinks=true,
citecolor=black,
urlcolor=cyan,
linkcolor=blue
}
\urlstyle{same}
% for highlighting text
\usepackage{xcolor}
\usepackage{soul}
\usepackage{tikz}
% Colors for chromatin modifier tables
\definecolor{fcol}{HTML}{C0392B}
\definecolor{mcol}{HTML}{2471A3}
\definecolor{switchcol}{HTML}{8E44AD}
\newcommand{\isoswitch}{\textsuperscript{\textcolor{switchcol}{\textdagger}}}
% bibliography
\usepackage[backend=biber,style=authoryear]{biblatex}
\addbibresource{refs.bib}
\addbibresource{methods_refs_fixed.bib}
%\linenumbers
\renewcommand*{\bibfont}{\fontsize{10}{12}\selectfont}
\newcommand{\beginsupplement}{%
\setcounter{table}{0}
\renewcommand{\thetable}{S\arabic{table}}%
\setcounter{figure}{0}
\renewcommand{\thefigure}{S\arabic{figure}}%
\renewcommand{\theHtable}{supp.\arabic{table}}%
\renewcommand{\theHfigure}{supp.\arabic{figure}}%
}
\def\changemargin#1#2{\list{}{\rightmargin#2\leftmargin#1}\item[]}
\let\endchangemargin=\endlist
\title{Dosage compensation of the Z chromosome is ancestral and conserved across coleoid cephalopods}
\author[1, 2]{Scott T. Small}
\author[1, 2]{Silas Tittes}
\author[3,4]{Thomas Desvignes}
\author[1,4]{John H. Postlethwait}
\author[1, 2]{Andrew D. Kern}
\affil[1]{\small{University of Oregon, Institute of Ecology and Evolution}}
\affil[2]{\small{University of Oregon, Department of Biology}}
\affil[3]{\small{Department of Biology, University of Alabama at Birmingham}}
\affil[4]{\small{University of Oregon, Institute of Neuroscience}}
\date{\small{\today{}}}
\begin{document}
\maketitle
\begin{abstract}
Chromosome-wide dosage compensation has evolved repeatedly in animals, but its mechanisms are
known from only a handful of model systems. Coleoid cephalopods (squids, cuttlefish, and octopuses)
share a ZZ/ZO sex determination system in which females carry a single copy of a large Z chromosome,
making dosage compensation a potential necessity and allowing a decisive test of whether compensation
is a lineage-wide feature and how it is implemented. Using a new chromosome-level genome assembly for the squid \textit{Illex illecebrosus}
and comparative transcriptomics spanning seven cephalopod species across $>$300 million years,
we show that Z-linked expression is near-equalized between sexes across coleoids, indicating that
dosage compensation originated early in coleoid evolution.
We find that the Z is unusually enriched for chromatin-modifying genes and that sex-biased isoform
usage of a small set of Z-linked chromatin regulators tracks variation in Z-linked expression,
implicating chromatin-state control in chromosome-wide equalization. Strikingly, LINE-rich regions
on the Z are enriched for genes that escape compensation, consistent with TE-associated heterochromatin
forming barriers to upregulation. We further identify cis-element motifs within gene bodies associated with
compensated Z genes and discover two sex-biased Z-linked long non-coding RNAs,
conserved across $>$300~MY of cephalopod divergence,
containing conserved RNA-binding protein motifs that suggest an additional RNA-mediated regulatory axis.
These results establish cephalopods as an independently evolved model for dosage compensation and reveal a mechanism that
reuses familiar components---chromatin regulators, cis-elements, repeats, and lncRNAs---in an
unexpected configuration.
\end{abstract}
%% Abstract option B
% \begin{abstract}
% Chromosome-wide dosage compensation has evolved in multiple animal lineages, yet its genomic
% basis is understood in only a few systems. Coleoid cephalopods share a conserved Z chromosome,
% providing an opportunity to test whether dosage compensation is widespread and to infer its
% molecular architecture in a deeply divergent clade. Here we present a chromosome-level genome
% assembly for the northern shortfin squid \textit{Illex illecebrosus} and analyze comparative
% transcriptomes from seven cephalopod species spanning five orders and >300 million years of
% divergence. Across species, Z-linked expression is near-equalized between sexes, indicating
% that Z dosage compensation originated early in coleoid evolution and predates the split
% between octopuses and decapods.
% The Z chromosome is strongly enriched for chromatin-modifying genes (16 of 188 conserved Z genes),
% and these genes are themselves compensated, consistent with a model in which Z-linked epigenetic
% factors participate in maintaining chromosome-wide regulatory state.
% Sex-biased isoform usage of four Z-linked chromatin regulators is associated with 57\% of
% variance in Z:autosome expression, with the H3K9 demethylase \textit{KDM3A} as the strongest
% predictor.
% In contrast to the mammalian X repeat hypothesis, LINE-dense regions on the Z are enriched
% for genes that escape compensation, consistent with TE-associated heterochromatin limiting
% upregulation. We identify gene-body cis-element motifs that discriminate compensated from
% escaped Z genes, functionally analogous to \textit{Drosophila} MSL Recognition Elements, and
% we discover two deeply conserved sex-biased Z-linked long non-coding RNAs (\textit{Zmast} and
% \textit{Zfest}) with conserved RNA-binding protein sites suggestive of a parallel splicing-regulatory
% axis. Together, these results establish coleoids as a tractable new model for the evolution of
% chromosome-wide dosage compensation and implicate an emergent mechanism integrating chromatin
% modifiers, cis-elements, repetitive-element domain structure, and regulatory lncRNAs.
% \end{abstract}
\section*{Introduction}
Dosage compensation is the regulatory mechanism by which gene expression between
individuals with differing numbers of sex chromosomes is equalized.
Multiple lineages have independently evolved mechanisms to address this imbalance:
in mammals, one X chromosome is transcriptionally silenced in females via the \textit{Xist} long non-coding RNA \citep{lyon1961gene, brockdorff2015dosage};
in \textit{Drosophila}, males upregulate their single X chromosome via the Male-Specific Lethal (MSL) complex \citep{lucchesi2015dosage, conrad2012dosage};
and in \textit{Caenorhabditis elegans}, hermaphrodites downregulate both X chromosomes by half \citep{meyer2005x}.
The diversity of these mechanisms reflects the fact that dosage compensation
has evolved independently in many lineages,
each time in response to the degeneration of the sex-limited chromosome
\citep{lucchesi1978dosage}.
As the Y (or W) chromosome loses functional genes,
selection favors compensatory upregulation of the remaining copy
in the heterogametic sex.
Studies of neo-sex chromosomes in \textit{Drosophila} have shown
that this process can occur rapidly and through diverse molecular paths,
with different species independently co-opting the MSL dosage compensation machinery
via distinct mutational mechanisms \citep{ellison2019convergent}.
Despite its evolutionary utility, dosage compensation is neither ubiquitous nor mechanistically uniform.
In birds (males ZZ, females ZW), bulk RNA-seq studies long suggested only incomplete,
gene-by-gene compensation \citep{itoh2007dosage, mank2009w},
but recent work has revealed multi-layered mechanisms---including
increased transcriptional burst frequency and elevated translational rates in females
\citep{papanicolaou2025multilayered}
and male-specific miRNA-mediated transcript degradation
\citep{fallahshahroudi2025mirna}---that
achieve near-complete protein-level balance.
In Lepidoptera, \textit{Bombyx mori} achieves compensation
through partial repression of both Z chromosomes in males \citep{rosin2022dosage},
and in livebearing fish, complete X chromosome compensation
has evolved de novo in some lineages but not others \citep{darolti2019extreme}.
The diversity of solutions---even among ZW systems---underscores
that the molecular paths to dosage balance are highly contingent
on the preexisting genetic architecture of each lineage.
Cephalopods were long thought to lack genetic sex chromosomes:
classical karyotype analyses failed to reveal heteromorphic chromosomes
in octopuses, squids, or cuttlefish,
leading to hypotheses of environmental or polygenic sex determination.
Recent genomic work has overturned this view.
In \textcite{coffing2025cephalopod}, we demonstrated that across cephalopods,
females are hemizygous for a large chromosome,
establishing a ZZ/ZO system shared across all coleoid (squids, cuttlefish, and octopuses) lineages surveyed.
This Z chromosome is conserved across coleoids,
originating over 300 million years ago in a common ancestor.
In \textit{Nautilus pompilius}, a representative of the only surviving non-coleoid cephalopod lineage,
the syntenic region is used as an X chromosome \citep{torrado2025nautilus},
indicating that the ancestral sex chromosome has been independently repurposed
in the two surviving cephalopod lineages.
These discoveries position coleoid cephalopods among the few known invertebrate
groups with deeply conserved sex chromosomes,
comparable in age to the insect X chromosome \citep{toups2023x}
and substantially older than mammalian ($\sim$170~MY) or avian ($\sim$100~MY) sex chromosomes.
The open question, then, is whether and how cephalopods achieve dosage compensation.
In a recent report, \textcite{papanicolaou2024z} provided the first evidence of dosage compensation
in any cephalopod, reporting M:F expression ratios of 1.04--1.15
across adult somatic tissues (brain sub-regions and arm) in two octopus species---notably
more complete than the 1.4--1.6 typical of birds \citep{mank2009w}.
They also identified two Z-linked lncRNAs with opposing sex specificities:
\textit{Zmast} (256-fold male-biased) and \textit{Zfest} (18-fold female-biased),
conserved between species separated by $\sim$2.5~MY,
whose molecular functions remain uncharacterized.
No comparable studies have been conducted outside of octopus,
leaving the prevalence and molecular basis of compensation across cephalopods unknown.
Here we present a chromosome-scale genome assembly of the northern shortfin squid,
\illex, and identify its Z chromosome via synteny and coverage analyses.
We then use transcriptomic data to test for dosage compensation across seven cephalopod species
spanning five orders and over 300 million years of divergence,
establishing that compensation of the Z chromosome is ancestral to coleoid cephalopods.
To investigate the molecular underpinnings of this compensation,
we characterize DNA methylation patterns,
identify an enrichment of chromatin-modifying genes on the conserved Z chromosome,
and show that sex-biased isoform switching of chromatin remodelers
predicts chromosome-wide compensation levels.
We find that LINE elements and CTCF insulators define structural domains on the Z,
paralleling Lyon's repeat hypothesis for the mammalian X \citep{lyon1998x},
and discover gene-body cis-elements that distinguish compensated Z genes from escapees,
analogous to the MSL Recognition Elements in \textit{Drosophila}.
Finally, we show that the sex-specific lncRNAs \textit{Zmast} and \textit{Zfest}
are conserved across all species examined.
Together, these results reveal a multi-layered dosage compensation system
in one of the oldest known sex chromosome systems in animals.
\begin{figure}[tbp]
\centering
\includegraphics[width=\linewidth]{figures/Figure1_update.pdf}
\caption{Conserved synteny of seven coleoid cephalopod genomes.
An uncalibrated ultrametric phylogeny (left) relates the seven species used in this study.
Ribbons connect syntenic blocks between adjacent species, colored by \illex\ chromosome identity.
The Z chromosome (red, rightmost) is conserved as a single syntenic unit across all species,
although its name varies (Z in \illex, \textit{O.\ bimaculoides}, and \textit{Sepia esculenta};
chromosome 46 in \textit{Sthenoteuthis oualaniensis}; 43 in \textit{D.\ pealeii}; 43 in \textit{E.\ scolopes};
NC16 in \textit{O.\ sinensis}).}
\label{fig:phylo}
\end{figure}
\section*{Results}
\subsection*{A chromosome-scale genome assembly for \textit{Illex illecebrosus}}
We generated a chromosome-scale genome assembly for an adult female \illex
using PacBio HiFi long reads, Illumina short reads, and Hi-C chromatin conformation capture data.
Hi-C scaffolding resolved 46 chromosomes,
consistent with the decapodiform karyotype of 2n = 92 \citep{gao1990karyological},
and the final assembly spans 4.14 Gb with 95.6\% BUSCO completeness
(mollusca\_odb12; Table~\ref{tab:assembly_stats}).
Gene annotation combining stranded total RNA-seq from both individuals
and PacBio Iso-Seq from the female (mantle tissue; see Methods),
together with \textit{ab initio} prediction, identified 23,192 protein-coding genes.
Because the RNA-seq libraries were prepared with a total RNA protocol (not poly-A selected),
non-polyadenylated transcripts including many lncRNAs are represented.
We also generated deep PacBio HiFi sequencing from a second individual, an adult male,
enabling direct comparison of read depth between sexes across all chromosomes.
Full assembly statistics and repeat composition are provided in the
\hyperref[sec:supp_assembly]{Supplemental Results}
(Tables~\ref{tab:assembly_stats} and \ref{tab:repeat_stats}).
\subsection*{Identification and conserved synteny of the Z chromosome in \illex}
Comparison of male and female read depth across assembled chromosomes
revealed that a single chromosome is at hemizygous coverage
in the female relative to the male, which we inferred to be the Z chromosome (Figure~\ref{fig:seq_depth}).
We next performed comparative genomic analyses with other cephalopod species for which sex chromosomes have been characterized.
Whole-genome conserved synteny analysis revealed a striking pattern of chromosomal conservation across coleoid cephalopods (Figure~\ref{fig:phylo}).
We compared the \illex assembly to published genomes of
\textit{Octopus bimaculoides}, \textit{O. sinensis}, \textit{Euprymna scolopes}, \textit{Sepia esculenta}, \textit{Doryteuthis pealeii}, and \textit{Sthenoteuthis oualaniensis}.
Chromosome-scale alignments showed that within decapods, synteny across all chromosome arms
is largely maintained, with a number of rearrangements, fusions, and fissions.
In comparisons with octopus, however, synteny breaks down on every chromosome except the Z.
The Z chromosome has maintained its integrity as a distinct chromosomal unit for over 300 million years of evolution,
making it one of the oldest known sex chromosome systems in animals.
%These syntenic relationships, combined with coverage analysis showing reduced female-to-male mapping ratios on chromosome 42,
%confirm that this chromosome represents the Z chromosome in \illex, consistent with the ZZ/ZO sex determination system described in other coleoid cephalopods.
\subsection*{Partial dosage compensation of the Z chromosome in \illex}
To assess whether \illex exhibits dosage compensation for Z-linked genes,
we analyzed sex-stratified RNA-seq data from somatic mantle tissue of adults.
We quantified gene expression levels across all chromosomes in a single male (ZZ) and female (ZO),
the same individuals used for genome assembly and coverage analysis,
using DESeq2 normalization (Figure~\ref{fig:cross_species_dc}).
We use two complementary metrics throughout:
Z:A is the ratio of mean Z-linked to mean autosomal expression level (TPM, filtered to genes with TPM~$> 1$;
a value of 1.0 indicates Z expression equal to the autosomal baseline),
and M:F is the per-gene median ratio of male to female expression
(1.0 = complete compensation; 2.0 = no compensation;
percent equalization = $(1/\text{M:F}) \times 100$,
e.g.\ M:F of 1.15 gives $(1/1.15) \times 100 = 87\%$).
Expression levels on autosomes were comparable between sexes,
and Z-linked gene expression in females was comparable to autosomal levels
despite females carrying only a single Z chromosome (Z:A = 1.59;
values above 1.0 in both sexes reflect the generally higher expression
of Z-linked genes relative to the genomic average),
indicating that the hemizygous Z undergoes transcriptional upregulation.
However, Z-linked genes showed modest but significant male bias,
with a median M:F ratio of 1.15 compared to 0.99 for autosomes
(p = 0.037, Wilcoxon rank-sum test; Figure~\ref{fig:cross_species_dc}).
These results are consistent with dosage compensation of the Z chromosome in \illex,
with females achieving $\sim$87\% of male expression levels for Z-linked genes,
though the absence of biological replicates for this species
limits the precision of this estimate.
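The two metrics defined above can be written compactly. As a sketch only, and assuming notation that does not appear in the original analysis ($x_g^{s}$ for the expression of gene $g$ in sex $s \in \{M, F\}$, and $\mathcal{Z}$ and $\mathcal{A}$ for the sets of expressed Z-linked and autosomal genes):
\begin{align*}
\text{Z:A}^{s} &= \frac{\frac{1}{|\mathcal{Z}|}\sum_{g \in \mathcal{Z}} x_g^{s}}{\frac{1}{|\mathcal{A}|}\sum_{g \in \mathcal{A}} x_g^{s}}, \\
\text{M:F} &= \operatorname*{median}_{g \in \mathcal{Z}} \; \frac{x_g^{M}}{x_g^{F}}, \\
\text{percent equalization} &= \frac{100}{\text{M:F}}.
\end{align*}
The autosomal M:F is obtained by taking the median over $\mathcal{A}$ instead of $\mathcal{Z}$.
Under this notation, an uncompensated hemizygous Z would give $x_g^{M} \approx 2\,x_g^{F}$ and hence M:F $\approx 2.0$ (50\% equalization),
whereas the observed M:F of 1.15 corresponds to $100/1.15 \approx 87\%$.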
\begin{figure}[tbp]
\centering
\includegraphics[width=\textwidth]{figures/illex_dosage_compensation_3panel.pdf}
\caption{Dosage compensation of the Z chromosome in \illex.
Left panels: gene expression levels [log2(normalized count + 1)] across all chromosomes
in females (ZO, top) and males (ZZ, bottom).
The Z chromosome (orange) shows expression comparable to autosomes (blue) in both sexes,
with Z:A ratios of 1.59 in females and 1.44 in males.
Right panel: male-to-female expression ratios for autosomal genes (n = 11,459)
and Z-linked genes (n = 188).
The Z chromosome shows a modest but significant male bias
(median M:F = 1.15 vs.\ 0.99 for autosomes; p = 0.037, Wilcoxon rank-sum test),
consistent with partial dosage compensation (n = 1 per sex).
Variation in median expression among autosomes (e.g.\ the relatively low values for chr15) reflects differences in gene number and composition across chromosomes.
Dosage compensation results for six additional cephalopod species, most with biological replication, are shown in Figures S1--S6.}
\label{fig:cross_species_dc}
\end{figure}
\begin{table}[tbp]
\centering
\caption{Dosage compensation of the Z chromosome across coleoid cephalopod species.
Z~M:F and Auto~M:F are the median per-gene male-to-female expression ratios
for Z-linked and autosomal genes, respectively
(1.0 = complete compensation; 2.0 = no compensation).
The p-value is from a Wilcoxon rank-sum test comparing Z and autosomal M:F distributions.
n~(F/M) indicates the number of female and male RNA-seq libraries.}
\label{tab:cross_species_dc}
\begin{tabular}{llrrrr}
\toprule
\textbf{Species} & \textbf{Order} & \textbf{Z M:F} & \textbf{Auto M:F} & \textbf{p-value} & \textbf{n (F/M)} \\
\midrule
\textit{Illex illecebrosus} & Oegopsida & 1.15 & 0.99 & 0.037 & 1/1 \\
\textit{Sthenoteuthis oualaniensis}$^*$ & Oegopsida & 1.13 & 1.04 & 0.010 & 16/16 \\
\textit{Doryteuthis pealeii} & Myopsida & 1.10 & 1.12 & 0.29 & 10/9 \\
\textit{Euprymna scolopes} & Sepiolida & 1.08 & 1.00 & 0.003 & 3/3 \\
\textit{Euprymna berryi} & Sepiolida & 0.97 & 0.97 & 0.48 & 2/2 \\
\textit{Sepia officinalis} & Sepiida & 1.03 & 0.99 & 0.021 & 4/4 \\
\textit{Octopus bimaculoides} & Octopoda & 1.08 & 1.03 & 0.20 & 15/15 \\
\bottomrule
\end{tabular}
\vspace{0.5em}
{\small $^*$Technical replicates from a single individual per sex; lacks biological replication.}
\end{table}
\subsection*{Dosage compensation is ancestral and conserved across coleoid cephalopods}
To determine whether dosage compensation of the Z chromosome is a shared feature of coleoid cephalopods
or a lineage-specific adaptation in \illex,
we analyzed sex-stratified, publicly available RNA-seq data from
six additional species representing five coleoid orders
across both major coleoid lineages (Decapodiformes and Octopodiformes):
\textit{Sthenoteuthis oualaniensis} (Oegopsida),
\textit{Doryteuthis pealeii} (Myopsida),
\textit{Euprymna scolopes} and \textit{Euprymna berryi} (Sepiolida),
\textit{Sepia officinalis} (Sepiida),
and \textit{Octopus bimaculoides} (Octopoda).
Sample sizes per species range from one to 16 libraries per sex
(Table~\ref{tab:cross_species_dc}).
For \textit{S. oualaniensis}, the 16 libraries per sex are technical replicates
from a single individual; for \textit{O. bimaculoides}, the 15 libraries per sex
represent multiple tissues from several individuals.
All other species have independent biological replicates.
For each species, we identified Z-linked genes via synteny with the \illex Z chromosome
and compared expression levels between males (ZZ) and females (ZO).
In every species examined, Z-linked gene expression in females was upregulated
to match or approach the level observed in males,
indicating active dosage compensation of the hemizygous Z chromosome
(Figure~\ref{fig:cross_species_dc}; Figures S1--S6).
Median male-to-female expression ratios for Z-linked genes ranged from 0.97 to 1.15 across species
(Table~\ref{tab:cross_species_dc}),
all within 15\% of 1.0 and far below the 2.0 expected in the absence of compensation.
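As a worked check, the percent-equalization conversion $100/\text{M:F}$ can be applied to each Z-linked ratio in Table~\ref{tab:cross_species_dc}. The percentages below are derived here by simple arithmetic and rounded to the nearest integer; they are not separately reported values:
\[
\begin{array}{lcc}
\text{Species} & \text{Z M:F} & 100/\text{M:F} \\
\hline
\textit{I.\ illecebrosus} & 1.15 & 87\% \\
\textit{S.\ oualaniensis} & 1.13 & 88\% \\
\textit{D.\ pealeii} & 1.10 & 91\% \\
\textit{E.\ scolopes} & 1.08 & 93\% \\
\textit{E.\ berryi} & 0.97 & 103\% \\
\textit{S.\ officinalis} & 1.03 & 97\% \\
\textit{O.\ bimaculoides} & 1.08 & 93\% \\
\end{array}
\]
A value above 100\% corresponds to a median female expression slightly higher than the male median.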
Autosomal genes showed M:F ratios close to 1.0 in all species (0.97--1.12),
confirming that the observed Z chromosome patterns reflect sex-linked regulation
rather than systematic biases in library preparation or normalization.
In three of the seven species---\textit{D. pealeii},
\textit{E. berryi}, and \textit{O. bimaculoides}---Z chromosome
M:F ratios were indistinguishable from autosomal ratios
(p = 0.20--0.48; Table~\ref{tab:cross_species_dc}),
consistent with complete dosage compensation.
\textit{E. scolopes} and \textit{S. officinalis} showed statistically significant but minimal male bias
(M:F = 1.03--1.08; p $\leq$ 0.021), indicating near-complete compensation.
In \illex{} (n = 1/1) and \textit{S. oualaniensis},
Z-linked genes showed modest but significant residual male bias
(M:F = 1.13--1.15; p $<$ 0.05),
consistent with partial compensation, though both estimates
should be interpreted cautiously because neither species has biological replication.
Even in these species, the degree of male bias is far less
than that observed in birds, where Z-linked M:F ratios typically range from 1.4 to 1.6
\citep{mank2009w}.
The presence of dosage compensation across all seven species,
spanning five orders and over 300 million years of divergence,
demonstrates that compensation of the Z chromosome is ancestral to coleoid cephalopods
and has been conserved throughout their evolutionary history.
Importantly, dosage compensation in cephalopods represents an evolutionary origin
independent of those in mammals, insects, and birds,
and thus offers the opportunity to discover novel mechanisms of sex chromosome regulation.
\begin{figure}[tbp]
\centering
\begin{tikzpicture}
\node[inner sep=0pt] (fig) {\includegraphics[height=\textwidth, angle=90, trim=9.5cm 4cm 4.3cm 2.1cm, clip]{figures/fig3.pdf}};
\fill[white] ([xshift=-0.2cm,yshift=0.2cm]fig.north west) rectangle ++(2cm,-0.9cm);
\end{tikzpicture}
\caption{Female-specific hypermethylation of the Z chromosome in \illex.
(A) Differential DNA methylation (Male $-$ Female) across all chromosomes.
Autosomal chromosomes (blue) show values centered near zero, indicating equivalent methylation between sexes.
The Z chromosome (red) shows strongly negative values, indicating higher methylation in females.
(B) Summary comparison of differential methylation between autosomes and the Z chromosome.
The Z chromosome is significantly hypermethylated in females relative to males
(p = 5.37$\times$10$^{-44}$, Wilcoxon rank-sum test).
The extreme statistical significance reflects the large number of individual genes
in the rank-sum test (188 Z-linked vs.\ 11,459 autosomal), which provides
high power to detect distributional shifts even when box-plot whiskers overlap.}
\label{fig:methylation}
\end{figure}
\subsection*{DNA methylation patterns on the Z chromosome}
The conservation of dosage compensation across coleoid cephalopods
raises the question of what molecular mechanisms underlie this regulation.
Given the role of epigenetic modifications in gene regulation and dosage compensation in other systems,
we investigated DNA methylation patterns across the \illex genome.
We leveraged PacBio HiFi kinetic signatures from the male and female individuals to assess CpG methylation levels.
Genome-wide, we observed a positive correlation between gene expression and methylation levels in both sexes,
with more highly expressed genes showing elevated methylation (females: r=0.396, p=2.16$\times$10$^{-9}$; males: r=0.318, p=5.10$\times$10$^{-6}$).
This pattern is consistent with gene-body methylation associated with active transcription,
as has been observed across invertebrates \citep{zemach2010genome, glastad2011dna}
including molluscs \citep{manner2021inference}.
Strikingly, when comparing methylation levels between sexes across chromosomes,
we found that the Z chromosome is significantly hypermethylated in females relative to males (Figure~\ref{fig:methylation}).
Differential methylation (Male $-$ Female) was calculated for each chromosome,
revealing that autosomal chromosomes show values centered near zero,
indicating equivalent methylation between sexes (Figure~\ref{fig:methylation}A).
In contrast, the Z chromosome exhibited strongly negative differential methylation,
demonstrating that females have substantially higher methylation on the Z than males
(p = 5.37$\times$10$^{-44}$, Wilcoxon rank-sum test; Figure~\ref{fig:methylation}B).
The Z chromosome also shows greater variance in differential methylation than individual autosomes,
likely reflecting the smaller number of Z-linked genes and the hemizygous state in females,
which eliminates allelic averaging.
This female-specific hypermethylation of the Z chromosome is notable given the positive correlation
between gene-body methylation and expression observed genome-wide.
The pattern suggests that elevated gene-body methylation on the hemizygous female Z
may contribute to its transcriptional upregulation toward diploid levels,
consistent with gene-body methylation facilitating active transcription.
If this association reflects a conserved feature of coleoid dosage compensation,
it would represent the longest-maintained epigenetic sex-chromosome signature described
in any animal---exceeding the $\sim$170~MY mammalian and $\sim$100~MY avian systems
by more than 100~MY.
Direct comparison of Z versus autosomal methylation within each sex
confirms that the effect is larger in females than males
(\hyperref[sec:supp_methylation]{Supplemental Results}; Figure~\ref{fig:meth_auto_vs_z}).
We report differential methylation as Male~$-$~Female rather than as a ratio
because the subtraction is symmetric around zero and directly interpretable
when comparing chromosomes with different baseline methylation levels.
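The symmetry argument can be made explicit with a small illustration. The symbols $m^{M}$ and $m^{F}$ (male and female methylation levels) and the numerical values below are hypothetical, chosen only to show the arithmetic. Writing $\Delta m = m^{M} - m^{F}$, swapping the sexes flips the sign of the difference but inverts the ratio:
\begin{align*}
(m^{M}, m^{F}) = (0.30,\, 0.60) &: \quad \Delta m = -0.30, \quad m^{M}/m^{F} = 0.5 \\
(m^{M}, m^{F}) = (0.60,\, 0.30) &: \quad \Delta m = +0.30, \quad m^{M}/m^{F} = 2.0
\end{align*}
Equal and opposite shifts therefore sit at equal distances from zero on the difference scale,
but at unequal distances from 1 on the ratio scale (0.5 below versus 1.0 above).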
However, the relationship between methylation and dosage compensation remains complex:
gene-body methylation correlates with accumulated transcript levels,
which reflect both transcription rate and mRNA stability,
and methylation differences alone do not fully explain the residual male bias in Z-linked expression.
\subsection*{The Z chromosome is enriched for chromatin-modifying genes}
Strong conservation of the Z chromosome across coleoid cephalopods implies
that its gene content has been subject to long-term selective pressures that
likely include the maintenance of dosage compensation.
If dosage compensation depends on chromosome-wide epigenetic regulation,
then genes encoding proteins that modify chromatin structure---histone
methyltransferases, acetyltransferases, remodelers, and related
factors (hereafter ``chromatin-modifying genes'')---might be selectively
retained on the Z to maintain \textit{cis}-regulatory control.
To test this prediction, we used reciprocal best-hit (RBH) orthology mapping
to identify genes with conserved Z-linkage
at increasing levels of phylogenetic stringency.
This gene-level orthology analysis complements the whole-chromosome
synteny shown in Figure~\ref{fig:phylo}: whereas synteny analysis reveals
that the Z chromosome is preserved as a macro-syntenic block,
the orthology approach asks which individual genes have remained on the Z
versus translocated to autosomes.
For these analyses, we included an eighth species,
\textit{Octopus vulgaris}, whose genome is available
and in which dosage compensation has been recently reported \citep{papanicolaou2024z}.
We anchored the analysis on \textit{O. bimaculoides}, for which we have the
deepest expression data (30 somatic samples; see below).
Of 639 genes annotated on the \textit{O. bimaculoides} Z chromosome,
20 are chromatin-modifying genes.
Requiring that a gene be Z-linked in all four of
\illex, \textit{O. bimaculoides}, \textit{S. oualaniensis}, and \textit{O. vulgaris}
retained 188 genes (29\% of the 639);
extending the requirement to all eight species reduced this to 132
(Table~\ref{tab:conserved_z_genes}).
The drop from 188 to 132 conserved genes when extending from four
to eight species corresponds to 70\% retention, a modest loss given the
$>$300~MY evolutionary distances involved.
\begin{table}[tbp]
\centering
\caption{Chromatin gene enrichment on the Z chromosome at increasing phylogenetic stringency.
As Z-linkage is required across more species, chromatin genes are preferentially retained.
OR and p-value are from one-sided Fisher's exact test comparing
conserved versus non-conserved Z genes (chromatin versus non-chromatin;
20 total chromatin genes among 639 Z-linked genes in \textit{O. bimaculoides}).}
\label{tab:z_enrichment}
\begin{tabular}{lrrrrl}
\toprule
\textbf{Stringency} & \textbf{Total genes} & \textbf{Chromatin} & \textbf{\% Chromatin} & \textbf{OR} & \textbf{p-value} \\
\midrule
4 species & 188 & 16 & 8.5\% & 10.4 & $2.9 \times 10^{-6}$ \\
5 species & 169 & 16 & 9.5\% & 12.2 & $5.6 \times 10^{-7}$ \\
6 species & 160 & 14 & 8.8\% & 7.6 & $2.2 \times 10^{-5}$ \\
7 species & 148 & 13 & 8.8\% & 6.7 & $5.9 \times 10^{-5}$ \\
8 species & 132 & 13 & 9.8\% & 7.8 & $1.5 \times 10^{-5}$ \\
\bottomrule
\end{tabular}
\end{table}
At every stringency level, the conserved Z-linked gene set
was strongly enriched for chromatin-modifying and epigenetic regulatory functions.
Of the 188 genes conserved across four species,
16 are chromatin-associated (8.5\%)---a $\sim$10-fold enrichment
over the 4 chromatin genes among 451 non-conserved Z genes
(Fisher exact $p = 2.9 \times 10^{-6}$).
Remarkably, 13 of these 16 chromatin genes remain Z-linked in all eight species,
demonstrating that chromatin modifiers are among the most deeply conserved
elements of the Z chromosome
(Table~\ref{tab:z_enrichment}).
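As a check on the four-species row of Table~\ref{tab:z_enrichment},
the odds ratio can be reproduced from the counts given above,
assuming it is the sample cross-product ratio of the $2 \times 2$ table
(16 chromatin and 172 non-chromatin genes among the 188 conserved;
4 chromatin and 447 non-chromatin genes among the 451 non-conserved):
\[
\mathrm{OR} = \frac{16 \times 447}{172 \times 4} = \frac{7152}{688} \approx 10.4 .
\]
The same calculation for the eight-species row,
$(13 \times 500)/(119 \times 7) \approx 7.8$, also matches the tabulated value.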
Among the Z-linked chromatin genes conserved across all eight species,
the most striking feature is the co-localization of regulators
from multiple independent chromatin-modifying pathways.
These include the H3K4 methyltransferase \textit{SET1B},
the MLL-complex scaffold \textit{MEN1},
the H2B ubiquitin ligase \textit{BRE1/RNF40},
the histone acetyltransferase \textit{CREBBP/CBP},
the SWR1 H2A.Z remodeler \textit{Domino/SRCAP},
the HIRA-complex chaperone \textit{UBN1},
and the heterochromatin regulator \textit{BAHD1}---all Z-linked in every species examined.
Additional deeply conserved Z-linked chromatin modifiers include
\textit{ING5} (HBO1/MOZ-associated PHD reader),
\textit{KIF4A} (chromatin condensation),
\textit{ELM2-domain} (NuRD/HDAC remodeler),
\textit{THOC6} (THO/TREX mRNA export), and \textit{PAN3} (mRNA deadenylase).
\textit{TDRD3} (Tudor-domain methyl reader), present in the 4-species core,
drops out at 6 species, suggesting lineage-specific loss from the Z.
More generally, translocation of a Z-linked gene to an autosome
would remove it from the compensation machinery that upregulates
Z-linked loci in females, potentially producing a dosage imbalance
that selects against such rearrangements.
Outright gene loss, by contrast, escapes this constraint.
Notably, the WRAD scaffold subunits shared by all COMPASS-family complexes
(\textit{WDR5}, \textit{RBBP5}, \textit{ASH2L}) are autosomal,
indicating that the Z-linked chromatin regulators
do not constitute a single complex
but rather represent convergent enrichment
of multiple independent chromatin pathways on the sex chromosome.
The involvement of multiple pathways---histone methylation, acetylation,
variant-histone deposition, and chromatin remodeling---is consistent
with the multi-layered nature of dosage compensation itself,
which likely requires coordinated regulation across several chromatin marks
rather than a single switch.
\begin{table}[tbp]
\centering
\caption{Expression of conserved Z-linked chromatin modifier genes
in \textit{O. bimaculoides} (30 somatic samples, 15F:15M).
log$_2$FC indicates male-to-female fold change; positive values indicate male bias.}
\label{tab:compass}
\vspace{0.5em}
\begin{tabular}{llrr}
\toprule
\textbf{Gene} & \textbf{Pathway} & \textbf{log$_2$FC} & \textbf{$p_\text{adj}$} \\
\midrule
\textit{THOC6} & THO/TREX & +0.52 & $9.6 \times 10^{-4}$ \\
\textit{Domino/SRCAP} & SWR1/H2A.Z & +0.59 & 0.027 \\
\textit{PAN3} & Deadenylation & +0.94 & 0.076 \\
\textit{CREBBP/CBP} & HAT & +0.61 & 0.089 \\
\textit{ELM2-domain} & NuRD/HDAC & +1.02 & 0.095 \\
\textit{MEN1} & MLL/H3K4me & +1.21 & 0.175 \\
\textit{TDRD3} & Tudor reader & $-$0.32 & 0.386 \\
\textit{SET1B} & H3K4me & +0.18 & 0.474 \\
\textit{ING5} & HBO1/MOZ & +0.15 & 0.571 \\
\textit{UBN1} & HIRA/H3.3 & +0.13 & 0.606 \\
\textit{BAHD1} & Heterochromatin & +0.23 & 0.628 \\
\textit{BRE1/RNF40} & H2Bub1 & +0.05 & 0.774 \\
\textit{KIF4A} & Condensation & +0.19 & 0.928 \\
\bottomrule
\end{tabular}
\end{table}
\subsection*{Sex-biased chromatin modifier expression in \textit{O. bimaculoides}}
To determine whether these Z-linked chromatin genes escape dosage compensation
or are themselves compensated,
we examined their expression in \textit{O. bimaculoides},
which provides the greatest statistical power
(30 somatic samples, 15F:15M).
Most conserved Z-linked chromatin genes show balanced expression
between sexes (Table~\ref{tab:compass}):
\textit{SET1B}, \textit{BRE1}, \textit{ING5}, \textit{TDRD3}, \textit{UBN1}, \textit{BAHD1}, \textit{CREBBP}, and \textit{KIF4A}
are all non-significant ($p_\text{adj} > 0.05$).
Two genes are significantly male-biased:
\textit{THOC6} (log$_2$FC = +0.52, $p_\text{adj}$ = $9.6 \times 10^{-4}$)
and \textit{Domino/SRCAP} (+0.59, $p_\text{adj}$ = 0.027).
Three others trend male-biased without reaching significance:
\textit{MEN1} (+1.21, $p_\text{adj}$ = 0.175),
\textit{ELM2-domain} (+1.02, $p_\text{adj}$ = 0.095),
and \textit{PAN3} (+0.94, $p_\text{adj}$ = 0.076).
The compensation of most Z-linked chromatin genes is consistent with
a self-reinforcing regulatory loop:
if these genes encode the epigenetic machinery
that maintains permissive chromatin on the Z chromosome,
their own compensation would sustain expression from a single copy in females.
Genome-wide, PyDESeq2 identified 2,470 significantly sex-biased genes
in \textit{O. bimaculoides}
($p_\text{adj} < 0.05$, $|\text{log}_2\text{FC}| > 1$),
including 27 chromatin modifier isoforms (15 female-biased, 12 male-biased)
spanning SWI/SNF, KDM, CHD, HAT, and heterochromatin reader families
(\hyperref[sec:supp_chromatin]{Supplemental Results};
Tables~\ref{tab:chromatin-female} and \ref{tab:chromatin-male}).
Among these, four genes---\textit{PBRM1}, \textit{SMARCC2}, \textit{KDM3A}, and \textit{Domino}---show
significant isoforms in \emph{both} directions,
producing sex-biased isoform switching rather than simple up- or down-regulation.
In \textit{PBRM1} (Polybromo-1, PBAF complex),
the dominant female isoforms (.2/.4) and male isoform (.1) each show more than 500-fold sex bias
in opposite directions ($|\text{log}_2\text{FC}| > 9$, i.e., $> 2^{9} = 512$-fold),
indicating near-exclusive sex-specific promoter usage.
\textit{SMARCC2} (BAF170, SWI/SNF core subunit) and \textit{KDM3A} (H3K9 demethylase)
show more moderate switching ($|\text{log}_2\text{FC}|$ of 1--2).
SUPPA2 alternative splicing analysis independently detects sex-biased isoform usage
in \textit{SMARCC2} (9 significant events) and \textit{CHD5} (6 significant events).
In both cases, females predominantly express full-length isoforms
while males produce truncated or NMD-targeted variants from alternative promoters.
To test whether sex-biased isoform usage of chromatin modifiers
directly predicts Z chromosome compensation levels,
we regressed expression-matched Z:A ratios on female isoform proportions
of the four genes with bidirectional isoform switching:
\textit{PBRM1}, \textit{SMARCC2}, \textit{KDM3A}, and \textit{Domino}.
A multiple regression model explained 56.8\% of Z:A variance
across 30 \textit{O. bimaculoides} somatic samples
($R^2 = 0.568$, $F = 8.20$, $p = 2.27 \times 10^{-4}$;
Figure~\ref{fig:za_regression}),
approximately four-fold more than sex alone ($R^2 = 0.144$).
\textit{KDM3A}, an H3K9 demethylase, was the sole individually significant predictor
(coefficient = +17.96, $p = 0.008$, Bonferroni $p = 0.032$;
Figure~\ref{fig:kdm3a_za}):
samples with higher proportions of the female \textit{KDM3A} isoform
showed greater Z:A compensation.
This result links isoform-level regulation of a specific chromatin modifier
to chromosome-wide dosage compensation output,
consistent with a model in which sex-biased isoform usage
of chromatin modifiers tunes Z chromosome expression.
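For reference, the reported $F$ statistic follows from $R^2$
under the assumption of an ordinary least-squares fit with an intercept,
$k = 4$ isoform-proportion predictors, and $n = 30$ samples:
\[
F = \frac{R^2 / k}{(1 - R^2)/(n - k - 1)} = \frac{0.568/4}{0.432/25} \approx 8.2 ,
\]
on $(4, 25)$ degrees of freedom.
The Bonferroni-adjusted value for \textit{KDM3A} is likewise the raw $p$
multiplied by the four predictors tested, $4 \times 0.008 = 0.032$.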
\subsection*{LINE enrichment on the Z chromosome parallels the mammalian X}
Lyon (1998) proposed that LINE-1 elements on the mammalian X chromosome
serve as ``way stations'' for the spreading of X-inactivation from the \textit{Xist} locus,
an idea supported by the observation that the human X has twice the LINE density of autosomes
and that LINE-poor regions preferentially escape inactivation.
More generally, repeat-rich landscapes may facilitate
chromosome-scale chromatin states
by providing a homogeneous substrate
for epigenetic machinery to recognize and propagate along a chromosome
\citep[e.g.,][]{huang2022species}.
We note, however, that TE accumulation on sex chromosomes
can also arise from reduced effective population size
and recombination rates
rather than positive selection for a regulatory function;
the two explanations are not mutually exclusive.
In \textcite{coffing2025cephalopod} we discovered that LINE enrichment
is a conserved feature of cephalopod Z chromosomes:
in \textit{O. bimaculoides}, the Z harbors LINEs
at approximately twice the density of the genome-wide background
(Mann-Whitney $U$ test on 1~Mb windows, $p < 10^{-4}$),
and the same pattern holds in \textit{O. sinensis},
\textit{E. scolopes}, and \textit{S. esculenta}
(all $p < 10^{-3}$).
This led us to propose LINE density
as a signature of Z chromosomes across cephalopods.
We confirmed this enrichment in \textit{O. bimaculoides}
(Z ranks 1st of 30 chromosomes at 212,484 bp/Mb
versus an autosomal mean of 136,429 bp/Mb;
1.56-fold, $z = 4.92$ relative to the autosomal standard deviation)
and additionally found that SINEs are modestly depleted on Z
(42,370 bp/Mb versus an autosomal mean of 51,737 bp/Mb;
0.82-fold, $z = -1.68$),
producing a LINE-high/SINE-low signature
that parallels the mammalian X chromosome.
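The fold changes and $z$-scores above are related as follows,
taking $z$ to be the Z-chromosome deviation from the autosomal mean
in units of the autosomal standard deviation $\sigma$
(the $\sigma$ values shown are back-calculated here for illustration,
not independently measured):
\[
\frac{212{,}484}{136{,}429} \approx 1.56, \qquad
z_{\mathrm{LINE}} = \frac{212{,}484 - 136{,}429}{\sigma_{\mathrm{LINE}}} = 4.92
\;\Rightarrow\; \sigma_{\mathrm{LINE}} \approx 15{,}460~\text{bp/Mb},
\]
\[
\frac{42{,}370}{51{,}737} \approx 0.82, \qquad
z_{\mathrm{SINE}} = \frac{42{,}370 - 51{,}737}{\sigma_{\mathrm{SINE}}} = -1.68
\;\Rightarrow\; \sigma_{\mathrm{SINE}} \approx 5{,}580~\text{bp/Mb}.
\]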
The dominant LINE family on Z by total content is RTE-BovB
(5.66~Mb, 45\% of Z LINE content; 1.53-fold enriched),
but the most enriched family is L1-Tx1 (2.06-fold, 2.38~Mb),
followed by Dong-R4 (1.69-fold, 0.71~Mb).
Crucially, LINE density on the Z predicts which genes escape dosage compensation.
Male-biased Z genes---those retaining the default twofold excess
expected from ZZ dosage---reside in regions with significantly
higher LINE density than female-biased genes
(mean 174.2 vs.\ 122.0 bp/kb in 10~kb flanking windows centered on each gene;
Mann-Whitney $U$ test, $p = 0.029$;
Figure~\ref{fig:line_density}).
This signal is driven by the aggregate LINE landscape rather than any single family:
RTE-BovB density alone does not differ significantly between sex-bias categories
(Mann-Whitney $p = 0.251$; Figure~\ref{fig:line_density}),
consistent with multiple LINE families contributing to the heterochromatic barrier.
The effect is nonetheless strikingly categorical:
in the highest RTE-BovB density quartile,
89\% of sex-biased genes are male-biased (8M:1F),
while lower quartiles show no consistent trend
(Figure~\ref{fig:rte_quartile}).
RTE-BovB and Dong-R4 occupy distinct spatial compartments on Z
(Spearman correlations across 500~kb sliding windows).
RTE-BovB is strongly correlated with gene density
($\rho$ = +0.68, $p = 7.4 \times 10^{-17}$),
marking the euchromatic compartment,
while Dong-R4 anticorrelates with gene density
($\rho$ = $-$0.60, $p = 1.3 \times 10^{-12}$)
and with RTE-BovB ($\rho$ = $-$0.61, $p = 4.6 \times 10^{-13}$),
defining the heterochromatic compartment.
FIMO scanning for CTCF binding motifs (JASPAR MA0139.2, $p < 10^{-4}$)
revealed that the Z chromosome also carries the highest CTCF density
of any chromosome after correcting for GC content and chromosome size
(+6.49 motifs/Mb above expectation;
rank 1/30 chromosomes, $z = 3.85$ relative to autosomal distribution).
The excess is predominantly intergenic (88\% of $\sim$460 excess sites),
consistent with insulator function.
One exceptional locus contains 35 CTCF motifs within a 44~kb intergenic region
separating the female-biased transcription factor \textit{MEIS2}
(log$_2$FC = $-$1.90, $p_\text{adj}$ = 0.043) from an adjacent male-biased gene
(log$_2$FC = $+$2.12, $p_\text{adj}$ = 0.036),
consistent with a strong insulator boundary between oppositely regulated domains.
CTCF and RTE-BovB are positively correlated along the Z
($\rho$ = +0.534, $p = 2.6 \times 10^{-18}$),
but at the local scale, CTCF motifs occupy gaps between LINE elements
rather than their immediate flanks
(observed inter-element distance 4,318~bp vs.\ expected 715~bp).
Hi-C chromatin conformation data from female \illex
confirmed this structural organization.
The Z chromosome exhibited the strongest A/B compartmentalization
of any chromosome in the genome ($z = 3.36$ for PC1 variance).
All Z-linked chromatin modifier genes reside in the A compartment (euchromatic),
while the pioneer transcription factor \textit{FoxA2}
uniquely occupies the B compartment---consistent with its role
in opening closed chromatin.
TE families define compartment boundaries:
in \illex, Penelope and RTE-RTE elements mark the B compartment
(Spearman $\rho$ with PC1 = $-$0.377 and $-$0.284, respectively),
while Satellite elements mark the A compartment ($\rho$ = +0.168).
Different TE families define compartments in different species---RTE-BovB in
\textit{O. bimaculoides} versus Penelope in \illex---but
the structural principle of TE-defined chromatin domains on Z is conserved.
Thus, while LINEs on the mammalian X are thought to facilitate the
\textit{spreading} of dosage compensation \citep{lyon1998x},
LINEs on the cephalopod Z instead demarcate regions that
\textit{escape} it---an inverse relationship that may reflect
fundamentally different compensation mechanisms acting on a shared
repeat architecture.
\begin{figure}[tbp]
\centering
\includegraphics[width=0.9\linewidth]{figures/line_density_sex_bias_boxplots.png}
\caption{LINE enrichment on the \textit{O. bimaculoides} Z chromosome predicts sex-biased expression.
Left: total LINE density in 10~kb flanking regions of Z-linked genes stratified by sex-bias category.
Male-biased genes reside in LINE-dense regions (174.2 bp/kb)
while female-biased genes occupy LINE-sparse regions (122.0 bp/kb; Mann-Whitney $p = 0.029$).
Right: RTE-BovB density alone does not differ significantly among categories ($p = 0.251$),
indicating that the signal reflects the aggregate LINE landscape rather than a single family.}
\label{fig:line_density}
\end{figure}
\subsection*{Gene-body cis-elements distinguish compensated Z genes from escapees}
In \textit{Drosophila}, the MSL complex achieves dosage compensation
by recognizing $\sim$150 Chromatin Entry Sites (CES)
that overlap gene bodies rather than promoters.
CES contain a 21-bp GA-rich MSL Recognition Element (MRE),
which is modestly enriched on the X ($\sim$2-fold)
and whose presence rather than copy number determines MSL targeting
\citep{alekseyenko2008high}.
To search for analogous cis-elements in cephalopods,
we performed discriminative motif discovery (STREME)
comparing compensated and male-biased (``escapee'') Z genes in \textit{O. bimaculoides},
identifying 10 Z-exclusive motifs enriched in compensated gene sequences.
When scanned across gene bodies (gene start to end + 2~kb downstream) using FIMO,
these motifs showed a striking association with compensation status.
Among the 47 significantly sex-biased Z genes,
all 18 female-biased genes carried at least one motif in their gene body,
while all 9 motif-free sex-biased genes were male-biased
(Fisher exact $p = 0.007$; Figure~\ref{fig:motifs}).
The signal survived gene-size matching:
among genes below the Z median size (42~kb),
7 female-biased genes carried motifs
while all 9 motif-free genes were male-biased
(Fisher exact $p = 0.014$).
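The first of these $p$-values can be recovered from the stated counts,
assuming a one-sided Fisher's exact test and that the 47 sex-biased genes
comprise 18 female-biased and 29 male-biased genes,
of which 9 (all male-biased) lack a motif.
The probability that none of the 9 motif-free genes is female-biased
is the hypergeometric tail
\[
p = \frac{\binom{29}{9}}{\binom{47}{9}}
  = \frac{10{,}015{,}005}{1{,}362{,}649{,}145} \approx 0.0073 ,
\]
in agreement with the reported $p = 0.007$.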
Two motifs showed individually detectable effects.
Motif A (\texttt{AGTTTTCCAAGSMGAM}; 16~nt) was present in 44 of 290 Z genes (15\%)
and showed the only individually significant sex-bias shift
(Mann-Whitney $p = 0.015$).
Motif B (\texttt{AAACATACCAGCAAGAAATAA}; 21~nt) was present in 148 Z genes (51\%)
and showed significant enrichment for female-biased genes (Fisher $p = 0.023$).
These two motifs co-occurred in 36 Z genes---significantly more
than expected by chance (Fisher OR = 5.50, $p < 0.0001$)---suggesting
they form a composite element with $\sim$41~bp spacing.
Genes carrying both motifs had the strongest female shift
(median log$_2$FC = $-$0.304 vs.\ $+$0.327 for genes with neither;
Kruskal-Wallis $p = 0.036$; Figure~\ref{fig:motif_cooccurrence}).
Critically, the sex-bias association is Z-specific.
The same motifs occur abundantly on autosomes,
but autosomal genes carrying Motif A and B together
show no sex-bias shift (median log$_2$FC = $+$0.355, Mann-Whitney $p = 0.72$
vs.\ autosomal non-carriers).
This mirrors the \textit{Drosophila} MRE system,
where MREs are present genome-wide
but the MSL complex acts exclusively on the X because
\textit{roX} lncRNAs target the complex in \textit{cis}.
TOMTOM comparison against JASPAR 2024 yielded suggestive matches:
Motif A to the Rel-homology domain family (NFAT/NF-$\kappa$B; best $E = 0.245$)
and Motif B to the Forkhead domain family (best $E = 0.218$).
The Forkhead match for Motif B is notable given that \textit{FoxA2},
a Z-linked Forkhead pioneer transcription factor capable of opening closed chromatin,
is among the conserved Z-linked genes.
To test whether LINE density modulates cis-element function,
we modeled sex-bias ($\log_2\text{FC}$) of 298 expressed Z genes
as a function of 10~kb flanking RTE-BovB density,
gene-body motif presence (Motif~A and Motif~B separately),
and their interactions.
The full model is significant ($R^2 = 0.055$, $F = 3.37$, $p = 0.006$)
and reveals a specific interaction:
RTE-BovB density drives male bias
(coefficient $= +0.039$~per~bp/kb, $p = 0.004$),
Motif~B presence promotes compensation
(coefficient $= +2.39$, $p = 0.043$),
but the RTE-BovB $\times$ Motif~B interaction is significantly negative
(coefficient $= -0.037$, $p = 0.007$),
meaning that Motif~B's compensating effect is abolished
in RTE-BovB-dense flanking chromatin.
In the highest RTE-BovB quartile, 100\% of significantly
sex-biased genes are male-biased (5M, 0F),
consistent with complete failure of motif-mediated compensation
above a density threshold.
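Written out, and assuming the five terms listed above are the only predictors,
the model for gene $g$ is
\[
\log_2\mathrm{FC}_g = \beta_0 + \beta_1 D_g + \beta_2 A_g + \beta_3 B_g
  + \beta_4 D_g A_g + \beta_5 D_g B_g + \varepsilon_g ,
\]
where $D_g$ is flanking RTE-BovB density (bp/kb)
and $A_g, B_g \in \{0, 1\}$ indicate motif presence
(the symbols are introduced here for illustration).
With the reported estimates $\beta_1 = +0.039$, $\beta_3 = +2.39$, and $\beta_5 = -0.037$,
the contribution of Motif~B is $\beta_3 + \beta_5 D_g = 2.39 - 0.037\,D_g$,
which reaches zero at $D_g \approx 2.39/0.037 \approx 65$~bp/kb,
and the density slope among Motif~B carriers is $\beta_1 + \beta_5 = 0.002$.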
\begin{figure}[tbp]
\centering
\includegraphics[width=\linewidth]{figures/genebody_motif_contingency_panels.png}
\caption{Gene-body cis-elements predict compensation status on the Z chromosome.
Among 47 significantly sex-biased Z genes,
all 18 female-biased genes carry at least one Z-exclusive motif in their gene body,
while all 9 motif-free sex-biased genes are male-biased (Fisher exact $p = 0.007$).
The signal survives gene-size matching ($p = 0.014$ among genes $<$ 42~kb).
These motifs parallel \textit{Drosophila} MSL Recognition Elements
in their gene-body location, modest enrichment, and binary presence/absence effect.}
\label{fig:motifs}
\end{figure}
\begin{figure}[tbp]
\centering
\includegraphics[width=0.9\linewidth]{figures/motif_cooccurrence_lfc.png}
\caption{Co-occurrence of Motifs A and B predicts female-shifted expression on Z.
Genes carrying both Motif A (\texttt{AGTTTTCCAAGSMGAM})
and Motif B (\texttt{AAACATACCAGCAAGAAATAA})
have median log$_2$FC = $-$0.304 (female-shifted),
compared to $+$0.327 for genes with neither motif (Kruskal-Wallis $p = 0.036$).
The two motifs co-occur more than expected by chance (Fisher OR = 5.50, $p < 0.0001$),
suggesting a composite cis-regulatory element.
The sex-bias association is Z-specific:
autosomal genes carrying both motifs show no expression shift.}
\label{fig:motif_cooccurrence}
\end{figure}
\subsection*{Z-linked long non-coding RNAs: \textit{Zmast} and \textit{Zfest}}
Among the most extreme sex-biased Z-linked transcripts in every species examined
are two long non-coding RNAs with opposing sex specificities.
The male-biased \textit{Zmast} (Z-male-specific transcript)
shows extraordinary expression asymmetry in \textit{O. bimaculoides}:
the core exon (4,685~bp) has a male-to-female ratio of 5,063:1
(Figure~\ref{fig:zmast_zfest}),
far exceeding the 2-fold difference expected from gene dosage alone.
This extreme male bias is conserved across all eight cephalopod species examined.
\paragraph{Eight-species annotation and alignment.}
We annotated \textit{Zmast} in all eight species
using a unified pipeline (see Methods).
Spliced transcript lengths range from 5,709~bp (\textit{O. bimaculoides})
to 9,576~bp (\textit{D. pealeii}),
with six of eight species in the 7--10~kb range
(Table~\ref{tab:zmast_annotation}).
\textit{O. bimaculoides} is the shortest because its 4,685~bp core exon
is highly repetitive and cannot be fully assembled from short reads alone;
\textit{S. officinalis} is limited by sequencing depth.
MAFFT L-INS-i alignment of all eight sequences
produced 14,726 alignment columns,
of which 8.6\% are gapless---consistent with the rapid sequence turnover
expected of lncRNA over $\sim$300~MY.
\begin{table}[tbp]
\centering
\small
\caption{\textit{Zmast} annotation across eight cephalopod species.}
\label{tab:zmast_annotation}
\begin{tabular}{lrrrl}
\toprule
Species & Spliced (bp) & Exons & Genomic span & Z chromosome \\
\midrule
\textit{D. pealeii} & 9,576 & 2 & 9.6~kb & Dpe43 \\
\textit{O. vulgaris} & 9,067 & 2 & 18.0~kb & OX597829.1 \\
\textit{E. berryi} & 8,110 & 3 & 21.9~kb & Eber\_ch44 \\
\textit{S. oualaniensis} & 7,722 & 3 & 18.1~kb & HiC\_scaffold\_46 \\
\textit{E. scolopes} & 7,316 & 5 & 22.1~kb & CM044529.1\textsuperscript{a} \\
\textit{I. illecebrosus} & 7,168 & 2 & 15.3~kb & Z \\
\textit{S. officinalis} & 5,802 & 3 & 5.9~kb & OZ199189.1 \\
\textit{O. bimaculoides} & 5,709 & 6 & 169.1~kb & Z \\
\bottomrule
\end{tabular}
\smallskip
\noindent\textsuperscript{a}\textit{Zmast} resides on a translocated Z fragment (CM044496.1/LG10) in \textit{E.\ scolopes}; the main Z chromosome is CM044529.1 (LG43).
\end{table}
\paragraph{Conserved architecture.}
Sliding-window conservation analysis (see Methods)
identified 19 conserved blocks spanning 20--1,570 alignment columns.
Nine blocks are present in all eight species,
seven in seven species, and three in six.
The alignment defines six structural domains
(Figure~\ref{fig:zmast_arch}):
a 5$^\prime$ Head (blocks B1--B2, 6/8 species),
Core A (B3--B9, all 8 species),
Core B (B10--B12, 6--8 species),
a Variable region (B13--B14),
a Poly(U) Sponge (B15, 7/8 species, absent from \textit{O. bimaculoides}),
and a 3$^\prime$ Tail (B16--B19).
Core A has the highest species occupancy
but, as described below, the lowest sequence-level conservation;
Core B comprises three small islands of 20--30~columns each,
separated from Core A by a $\sim$1,000-column gap
with 35--68\% occupancy.
\paragraph{Cross-species validation of RBP binding sites.}
FIMO scanning of each species' \textit{Zmast} against 258 metazoan RBP
position weight matrices (ATtRACT database, $p < 10^{-4}$; see Methods)
identified hundreds of binding sites per species
and suggested dense, multi-valent RBP binding across \textit{Zmast}.
However, systematic cross-species validation---mapping
all eight species' FIMO hits to alignment coordinates
and testing whether the same motif occurs at homologous positions---dramatically
revised this picture.
Of 132 motifs appearing in $\geq$4 species within the same block,
only 8 (6\%) are truly conserved
(non-low-complexity, at homologous alignment positions);
57 (43\%) are positionally conserved but driven
by species-specific homopolymers or dinucleotide repeats;
and 67 (51\%) occur at non-homologous positions
within the same block (independent hits).
In total, only $\sim$99~bp of specifically conserved sequence exists
across the entire 7,168~bp \illex{} transcript (1.4\%).
The eight validated cross-species RBP binding sites
are concentrated in two domains.
Six reside in the 5$^\prime$ Head (blocks B1--B2):
RBM24 (6 species, identical \texttt{AGTGTGA}),
ROX8 (4 species, identical \texttt{CCATTTT}),
SNF (4 species), UAF-2 (4 species, identical \texttt{TTTTCAGG}),
MEX-5 (4 species, identical \texttt{TTAATAAT}),
and SRSF7 (5 species).
The remaining two are in the Poly(U) Sponge (block B15):
FBF-1 (5 species, identical \texttt{TGTGTATTTA})
and SYP (3 species).
Block~B1 of the 5$^\prime$ Head is the deepest conservation hotspot in \textit{Zmast},
with 40\% perfectly conserved columns
and six non-homopolymer conserved sequence runs
including a 16~bp element (\texttt{TCTCTCACCCCATCCC})
shared across six species.
Notably, \textit{ROX8} is the \textit{Drosophila} ortholog of mammalian TIA-1,
which promotes U1~snRNP recruitment at the \textit{msl-2} 5$^\prime$ splice site
and is directly antagonized by Sex-lethal to repress dosage compensation in females,
and FBF-1 is a Pumilio-family translational repressor
with a central role in germline sex determination
in \textit{C. elegans}.
By contrast, Core A---despite containing all eight species---is