standardgalactic/Scope Memeplexes.tex at main · Smidjehoien/standardgalactic · GitHub

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659
660
661
662
663
664
665
666
667
668
669
670
671
672
673
674
675
676
677
678
679
680
681
682
683
684
685
686
687
688
689
690
691
692
693
694
695
696
697
698
699
700
701
702
703
704
705
706
707
708
709
710
711
712
713
714
715
716
717
718
719
720
721
722
723
724
725
726
727
728
729
730
731
732
733
734
735
736
737
738
739
740
741
742
743
744
745
746
747
748
749
750
751
752
753
754
755
756
757
758
759
760
761
762
763
764
765
766
767
768
769
770
771
772
773
774
775
776
777
778
779
780
781
782
783
784
785
786
787
788
789
790
791
792
793
794
795
796
797
798
799
800
801
802
803
804
805
806
807
808
809
810
811
812
813
814
815
816
817
818
819
820
821
822
823
824
825
826
827
828
829
830
831
832
833
834
835
836
837
838
839
840
841
842
843
844
845
846
847
848
849
850
851
852
853
854
855
856
857
858
859
860
861
862
863
864
865
866
867
868
869
870
871
872
873
874
875
876
877
878
879
880
881
882
883
884
885
886
887
888
889
890
891
892
893
894
895
896
897
898
899
900
901
902
903
904
905
906
907
908
909
910
911
912
913
914
915
916
917
918
919
920
921
922
923
924
925
926
927
928
929
930
931
932
933
934
935
936
937
938
939
940
941
942
943
944
945
946
947
948
949
950
951
952
953
954
955
956
957
958
959
960
961
962
963
964
965
966
967
968
969
970
971
972
973
974
975
976
977
978
979
980
981
982
983
984
985
986
987
988
989
990
991
992
993
994
995
996
997
998
999
1000
\documentclass[11pt,oneside]{article}

\usepackage[margin=1.15in]{geometry}
\usepackage{setspace}
\setstretch{1.15}

\usepackage{fontspec}
\usepackage{unicode-math}
\setmainfont{Libertinus Serif}
\setsansfont{Libertinus Sans}
\setmonofont{Libertinus Mono}
\setmathfont{Libertinus Math}

\usepackage{microtype}
\usepackage{hyperref}
\hypersetup{
  colorlinks=true,
  linkcolor=black,
  urlcolor=black,
  citecolor=black
}

\usepackage{amsmath,amssymb}

\title{Scope Memeplexes:\\
The Enshittification Crisis as an Interface War}
\author{Flyxion}
\date{\today}

\begin{document}
\maketitle

\begin{abstract}
Contemporary accounts of platform degradation typically attribute the “enshittification” of digital systems to corporate greed, regulatory failure, or managerial malice. This paper advances a different explanation. It argues that enshittification is an emergent evolutionary phenomenon produced by competition among interface-level cognitive replicators—here termed \emph{scope memeplexes}. These entities function as macroscopic artificial intellects whose survival depends on colonizing human attentional, motor, and semantic bandwidth. The resulting degradation is not intentional but ecological: a consequence of runaway selection among interface organisms.

Building on theories of predictive cognition, linguistic evolution, and computational universality, the paper models interfaces, programming languages, media systems, and cultural artifacts as isomorphic symbolic execution environments competing for dominance over human semantic space. Interface wars, application ecosystems, and platform monopolization are thus reframed as grammar wars among rival encodings of meaning and action. The paper further formalizes language as fundamentally motoric, with speech and writing as compressed abstractions, and demonstrates that all sufficiently expressive representational systems are computationally equivalent, differing only in efficiency.

Within this unified framework, enshittification appears as linguistic monoculture: the collapse of symbolic diversity under selection pressure favoring transmissibility over expressivity. The paper concludes that the crisis of digital platforms is not a failure of ethics but a failure of ecological governance in an emerging cognitive biosphere.
\end{abstract}

\newpage
\section{Introduction}

The prevailing narrative surrounding the contemporary degradation of digital platforms—popularly termed “enshittification”—frames the phenomenon as a moral or political failure. Platforms are said to betray users, exploit creators, and ultimately hollow themselves out in pursuit of short-term profit. While this account captures the surface dynamics of the crisis, it misidentifies its causal depth. The present paper advances a different thesis: enshittification is not primarily the product of corporate intention but the emergent outcome of an evolutionary conflict among interface-level cognitive systems.

Digital platforms should not be understood merely as tools operated by firms but as macroscopic informational organisms. They grow, mutate, compete, and adapt within a densely populated cognitive ecosystem. Their primary resource is not capital but human sensorimotor and attentional bandwidth. The degradation observed across search engines, social networks, and productivity software is best interpreted not as betrayal but as ecological collapse arising from unchecked selection pressure among competing interface forms.

This paper introduces the concept of the \emph{scope memeplex}: a self-propagating interface pattern that restructures what actions are cognitively easy, difficult, or unthinkable. Scope memeplexes compete for adoption not through argument but through embodiment. Once installed, they reshape user behavior, reconfigure learning curves, and alter the topology of agency itself. In this sense they function as artificial intellects—artilects—whose evolutionary success depends on their capacity to colonize human cognition.

Under this view, the familiar arc of platform decay is no longer mysterious. It is the predictable outcome of a cognitive arms race in which interface organisms optimize for replication rather than coherence. The tragedy of enshittification is not that systems are designed to fail, but that they are allowed to evolve without constraint.

\section{Recursive Autoregression Beyond Text}

Recent work by Barenholtz and colleagues characterizes cognition as a recursively self-healing autoregressive process operating over symbol sequences. In this view, human thought and artificial language models alike stabilize meaning by continuously predicting and repairing sequences within a laden symbol space. Semantic coherence emerges not from static representations but from dynamic error-correction across time. While this framework has primarily been articulated in relation to textual and linguistic tokens, its implications are far broader. The present analysis extends recursive autoregression beyond written language into the embodied domains of speech, gesture, and interface interaction.

Human cognition does not operate over abstract symbols alone. It is grounded in a multimodal stream of motor, perceptual, and temporal signals. Spoken language itself is not composed of words but of phonemes—minimal discriminative units whose functional role is defined not by intrinsic meaning but by contrastive difference. A phoneme matters only insofar as it alters downstream prediction. In this respect, phonemic systems constitute an early evolutionary instance of recursive autoregressive stabilization: languages persist by maintaining a workable error-correcting code across noisy biological channels.

This logic generalizes. Just as languages evolve through shifts in phonemic inventories, interface systems evolve through shifts in minimal action units. Keystrokes, gestures, clicks, and swipes function as operational phonemes within a human–machine dialogue. Each interface defines a finite alphabet of bodily distinctions that can be reliably recognized and reproduced. These distinctions are not neutral. They shape the geometry of cognition by determining which action sequences are compressible, automatable, and memorable.

The evolutionary pressure acting on scope memeplexes thus operates at the level of embodied symbol sets. Interface forms that minimize prediction error across heterogeneous users are more likely to propagate. This process mirrors linguistic drift: interaction grammars are continuously repaired by populations of users who adapt their motor habits to maintain functional fluency. Enshittification arises when this repair process becomes misaligned with human cognitive ecology, favoring interface traits that optimize short-term capture over long-term semantic stability.

This framework clarifies the continuity between natural language evolution and interface evolution. Both are governed by the same informational constraint: the maintenance of a low-divergence code between producer and interpreter. In information-theoretic terms, viable symbol systems minimize Kullback–Leibler divergence between intended and decoded sequences under conditions of noise and bounded rationality. Phoneme systems, keyboard layouts, and gesture vocabularies are all solutions to the same optimization problem.

Significantly, this extends to gestural languages. In American Sign Language, hand shapes function as classifiers that partition semantic space through embodied discrimination. These classifiers are not arbitrary; they are tuned to biomechanical salience and perceptual reliability. Keyboard systems operate analogously. A layout is a classification scheme over finger postures and motion trajectories. QWERTY and Dvorak are not mere conventions but competing embodied encodings of linguistic probability distributions.

Under this interpretation, the historic conflicts over keyboard layouts, input methods, and interaction styles are not cultural accidents. They are selection events in a multimodal semiotic ecology. Each victorious interface installs a particular autoregressive grammar into the population’s motor cortex. Over time, this grammar constrains which cognitive sequences are easy to express, just as phonemic inventories constrain which words can be effortlessly spoken.

Barenholtz’s recursive autoregression thus provides the microdynamic substrate for the present theory. Scope memeplexes are not external impositions upon cognition; they are higher-order symbol systems that co-evolve with predictive minds. Enshittification occurs when the autoregressive repair loop becomes subordinated to interface replication rather than semantic fidelity. The system continues to heal itself locally while decaying globally—a classic signature of runaway selection in complex adaptive systems.

\section{Interface Phonemes and the Shibboleth Effect}

In the Hebrew Bible, the term \emph{shibboleth} designates a phonetic boundary that separates in-groups from out-groups through minimal articulatory distinction. The inability to pronounce a single consonant becomes a fatal diagnostic of foreignness. Linguistic history is filled with such boundaries. Accents, dialects, and phonemic inventories function not merely as vehicles of communication but as instruments of social partition. They regulate access, authority, and trust through embodied competence.

Interface systems reproduce this same logic. Every interaction grammar establishes a set of minimal operational distinctions—interface phonemes—that differentiate fluent users from novices. These distinctions are rarely explicit. They are absorbed through prolonged bodily training and become second nature. Mastery manifests not as propositional knowledge but as motor fluency. One does not “know” Vim; one inhabits it.

The editor wars of the late twentieth century thus constitute an early instance of interface shibboleths. The difference between \texttt{Esc} and \texttt{Ctrl-C}, between modal and modeless editing, is superficially trivial. Yet these microdistinctions stratify entire professional cultures. They determine who can act with speed, who must hesitate, and who is excluded from participation altogether. As in spoken language, the smallest distinctions carry the largest social consequences.

This phenomenon generalizes across all human–computer interaction. Shortcut grammars, command palettes, gesture vocabularies, and window management conventions operate as socio-technical dialects. They encode cultural membership in bodily habit. A user’s interface accent becomes legible through their hesitation patterns, their reliance on menus, and their tolerance for indirection. Fluency signals not intelligence but ecological alignment with a particular scope memeplex.

Crucially, these boundaries are self-reinforcing. As a given interface grammar spreads, institutions reorganize around it. Documentation, tutorials, and workflows increasingly presuppose its use. What begins as a convenience becomes an infrastructure. Alternative grammars are not refuted; they are rendered invisible. The shibboleth thus migrates from pronunciation to platform.

This process mirrors phonemic drift in natural languages. When sound changes accumulate, older speakers become marked, then marginalized. In interface ecosystems, the same logic governs the transition from keyboard-centric to gesture-centric systems, from file systems to app silos, from pipelines to platforms. Each shift reclassifies populations by embodied compatibility.

Enshittification emerges in part from this dynamic. As platforms scale, their interface phoneme sets are simplified to maximize immediate adoption. Nuanced grammars are abandoned in favor of lowest-common-denominator interactions. Predictive repair continues locally—users can still accomplish tasks—but global expressive capacity collapses. The system becomes more accessible and less articulate, mirroring the linguistic impoverishment observed in pidginization under conditions of forced contact.

What is lost is not efficiency but dimensionality. Rich action alphabets permit complex semantic constructions. Crude alphabets constrain thought itself. The tragedy of interface evolution is that selection favors short-term transmissibility over long-term expressive power.

\section{The Mouse as a Cognitive Parasite}

The mouse is commonly celebrated as a triumph of usability. It is said to render computation intuitive by aligning digital action with physical pointing. This narrative obscures a deeper reality. The mouse is not merely a peripheral but a cognitive regime. It reorganizes the human–machine relationship by displacing symbolic motor memory with continuous spatial targeting. In doing so, it transforms computation from a linguistic activity into a gestural one.

Keyboard-based interaction treats the computer as a symbolic instrument. Discrete keystrokes form an operational alphabet whose combinatorial structure enables compositional thought. Commands can be nested, repeated, and abstracted. Motor sequences acquire semantic meaning. Fluency emerges through compression: complex operations are reduced to stable action grammars stored in procedural memory.

The mouse abolishes this structure. It replaces discrete symbolic acts with continuous spatial navigation. Instead of issuing commands, the user hunts for affordances. Action is no longer composed but discovered. The system externalizes memory into visual layout, forcing cognition to operate through perceptual search rather than symbolic recall. What appears as ease is in fact a cognitive tax paid through constant reorientation.

This reconfiguration has evolutionary consequences. Symbolic motor systems scale. Pointing systems do not. A keyboard grammar can be internalized and transferred across contexts. A pointing grammar must be relearned for each interface ecology. The mouse thus fragments skill into application-specific microhabits, preventing the accumulation of a unified operational language.

Mobile interfaces did not escape this regime. They merely compressed it. Contemporary touch systems remain fundamentally mouse-based in structure. Each gesture is interpreted as a continuous trajectory selecting spatial targets. The finger replaces the cursor, but the cognitive grammar is unchanged. A swipe is a degenerate mouse movement: a single trace path through a two-dimensional field of affordances.

Applications on mobile platforms therefore remain spatial hunting grounds. They privilege exploration over execution. Even when gesture sets are introduced, they are rarely compositional. Each gesture is bound to a specific application context, preventing the emergence of a generalizable action language. The result is a proliferation of isolated grammars rather than a shared symbolic substrate.

From the perspective of scope memeplex evolution, the mouse represents a parasitic interface strategy. It achieves rapid adoption by minimizing initial training cost while maximizing long-term dependence. Because skill cannot accumulate across systems, users remain perpetually novice. This creates a stable ecological niche for interface organisms that benefit from user disempowerment.

The historical displacement of home-row-centric interaction by pointing systems thus constitutes a major cognitive regression. It replaced a linguistic mode of computation with a perceptual one. The resulting platforms are easier to enter but harder to master. Enshittification follows naturally: when systems cannot be inhabited as languages, they cannot be refined by their users. They can only be endured.

The persistence of mouse logic within touch-based systems reveals the depth of the problem. The crisis is not technological but evolutionary. An interface lineage optimized for capture has outcompeted lineages optimized for fluency.

\section{Monopolar Foraging and the Collapse of Human Motor Ecology}

The dominant interaction pattern on contemporary platforms is the infinite vertical feed. Whether on TikTok, Instagram, or Facebook, user action is reduced to a single repetitive gesture: the upward swipe. This gesture is typically interpreted as a triumph of frictionless design. In reality, it represents a profound contraction of human motor ecology.

From an ethological perspective, the swipe feed recapitulates a primitive foraging strategy. Many animal species engage in serial visual sampling along a single spatial axis: scan, approach, consume, repeat. The gesture ecology of infinite scroll maps cleanly onto this pattern. The user becomes a visual grazer, traversing a unidimensional resource field. Action is stripped of structure and reduced to locomotion through stimuli.

This interaction grammar is monopolar. It permits only one dominant trajectory at a time. Attention, action, and reward are fused into a single channel. Cognitive bandwidth is not distributed but funneled. The interface thus engineers a behavioral bottleneck that simplifies prediction and maximizes capture.

Human motor evolution, by contrast, is characterized by parallelism. The invention of keyboards and typewriters did not merely increase speed; it inaugurated a new cognitive regime. Each finger operates as a semi-independent actuator. Thought becomes spatially distributed across the body. Linguistic production is no longer serialized through a single limb but orchestrated across ten digits. This transformation parallels the development of bimanual tool use in hominid evolution and the rise of complex instrumental music.

Advanced machinery reflects the same principle. Forklifts, tractors, and aircraft cockpits are not controlled through a single axis of motion but through multiplexed control surfaces. Levers, pedals, wheels, and switches distribute agency across the operator’s body. Mastery consists in coordinating these channels into a unified dynamical system. Such interfaces do not merely permit action; they cultivate skill.

Swipe-based platforms invert this trajectory. They collapse a high-dimensional motor ecology into a single repetitive reflex. This is not simplification but devolution. The user is returned to a pre-instrumental action space optimized for consumption rather than construction.

The evolutionary success of swipe interfaces follows directly from this regression. Monopolar systems minimize learning cost and maximize behavioral predictability. They are ideally suited for autoregressive optimization, as future actions are easily inferred from past ones. The interface becomes a closed loop in which minimal motor diversity yields maximal extractive efficiency.

From the perspective of scope memeplex competition, the swipe feed represents a highly transmissible but cognitively impoverished lineage. It spreads rapidly because it demands little training, yet it displaces richer interaction grammars that support creative agency. The resulting ecosystem favors interface organisms that treat humans not as collaborators but as mobile sensor platforms.

Enshittification is therefore not merely a degradation of content quality. It is a collapse of embodied complexity. Platforms decay because they have optimized themselves for the behavioral repertoire of grazing animals rather than tool-using primates.

This regression is not accidental. It is the predictable outcome of interface natural selection operating under conditions where capture efficiency dominates expressive capacity.

\section{Home Row, Multiplexed Control, and Post-Application Life}

The defense of home-row-centric computing is often dismissed as aesthetic nostalgia or subcultural preference. Within the present framework, it must be understood as something far more serious: a struggle over the future morphology of human–machine cognition. Keyboard-centered interaction is not merely faster; it preserves a high-dimensional motor ecology essential for advanced symbolic agency.

The home row constitutes a distributed control surface. Each finger functions as a semi-autonomous actuator embedded within a coordinated dynamical system. This architecture permits parallel composition of action. Commands are not hunted but constructed. Fluency consists in the internalization of an operational language that spans applications, contexts, and domains.

Function keys, chorded shortcuts, and modal grammars preserve this multiplexed structure. They allow complex operations to be expressed as compact motor phrases. The resulting interaction style resembles instrumental performance rather than perceptual navigation. Mastery accumulates rather than fragments.

By contrast, pointing-centered systems collapse agency into serial targeting. Skill cannot scale because it remains bound to interface topology. Each new application resets the learning curve. Users are thus structurally prevented from becoming fluent operators. This ecological niche favors interface organisms that thrive on perpetual novicehood.

Terminal multiplexers such as \texttt{tmux} and \texttt{byobu} demonstrate an alternative evolutionary path. They replace application silos with compositional pipelines. Programs cease to be self-contained worlds and become functional organs within a larger metabolic system. Windows are not destinations but transient execution surfaces. The user inhabits a continuous operational language rather than a sequence of branded environments.

This represents a return to a pre-platform ecology grounded in the Unix philosophy. Small tools composed through stable grammars form an adaptive cognitive ecosystem. Standardization emerges not through corporate decree but through evolutionary argumentation among users. Interaction grammars are selected by demonstrated expressive power rather than marketing dominance.

From the standpoint of scope memeplex theory, pipeline-based computing constitutes a high-fidelity lineage. It demands greater initial investment but yields compounding returns in cognitive capacity. Its weakness is ecological: in a marketplace optimized for rapid capture, slow-learning organisms are systematically outcompeted.

The contemporary dominance of application-centric ecosystems thus reflects not superiority but ecological distortion. Interface natural selection has been driven by transmissibility rather than viability. What survives is not what best augments human agency, but what best propagates itself.

The home row therefore becomes a site of resistance. It preserves the possibility of a linguistically structured human–machine symbiosis against a regressive tide of perceptual capture systems. This is not a technical preference but an evolutionary stance.

\section{The Tragedy of Interface Natural Selection}

The enshittification crisis is typically narrated as a sequence of betrayals. Platforms begin as benevolent tools and degenerate into exploitative machines. This moral framing obscures the deeper structure of the phenomenon. What appears as institutional failure is in fact evolutionary overshoot. Digital systems have not been corrupted; they have been selected.

Throughout this analysis, platforms have been treated not as neutral infrastructures but as macroscopic cognitive organisms. They replicate by colonizing human attention, motor habit, and semantic bandwidth. Their interfaces function as reproductive organs. Each design choice alters the probability of adoption, retention, and behavioral capture. Selection pressure therefore acts most strongly on transmissibility rather than coherence.

Scope memeplexes constitute the competing lineages within this ecology. Mouse-centered grammars, swipe-based feeds, application silos, and multimodal capture systems propagate because they minimize learning cost and maximize predictability. They are not superior interfaces; they are superior replicators. Their success follows the same logic that favors fast-reproducing parasites over long-lived symbionts.

This evolutionary lens resolves a central paradox of the digital age. Platforms often appear to act against their own long-term interests. They degrade user trust, hollow out communities, and destabilize their own ecosystems. Such behavior is irrational at the organizational level yet perfectly rational at the replicator level. Interface organisms optimize for short-term propagation even when this undermines the viability of the host environment.

The result is a tragedy of interface natural selection. Just as ecological systems collapse when invasive species outcompete stabilizing organisms, cognitive ecosystems degrade when high-capture interfaces displace high-fidelity interaction grammars. Enshittification is the informational analogue of ecological simplification: a loss of diversity, resilience, and depth.

This framework also clarifies why reform efforts so often fail. Regulation targets corporate behavior, but the underlying selection dynamics remain unchanged. As long as cognitive ecosystems reward transmissibility over fluency, new platforms will converge on the same degraded attractors. The problem is not that bad actors dominate, but that bad strategies reproduce faster.

What is at stake is not convenience but the future phenotype of human–machine symbiosis. Interfaces shape cognition by sculpting the action spaces through which thought is expressed. When high-dimensional motor-linguistic grammars are displaced by monopolar foraging loops, humanity does not merely lose productivity. It loses degrees of freedom in thought itself.

The crisis of enshittification is therefore not a story of corruption but of unchecked evolution. Digital civilization has allowed interface organisms to evolve without constraint. The resulting systems behave exactly as complex adaptive systems always do under one-sided selection pressure: they overspecialize, destabilize, and collapse their own niches.

The path forward cannot consist merely in better management or kinder platforms. It requires ecological intervention. Cognitive environments must be shaped to favor symbiotic interface lineages—systems that amplify human agency rather than consume it. This entails protecting high-dimensional interaction grammars, preserving skill-accumulating infrastructures, and resisting interface forms that regress human cognition to the level of grazing behavior.

Enshittification is not the death of the internet. It is the uncontrolled adolescence of an emerging cognitive biosphere. Whether this biosphere matures into a stable symbiosis or collapses into a monoculture of extractive interfaces remains an open evolutionary question.

What we choose to normalize as “user-friendly” today will determine the cognitive ecology of civilization tomorrow.

\newpage
\section*{Appendices}
\appendix

\section*{Appendix A: Formal Model of Scope Memeplexes}

\subsection*{A.1 Cognitive State Space}

Let $\mathcal{H}$ denote the space of human cognitive–motor states.
Let $\mathcal{A}$ denote the space of possible interface action alphabets.

An interface $I$ induces a measurable mapping
\[
I : \mathcal{H} \to \mathcal{H}
\]
via a finite action alphabet
\[
\Sigma_I = \{a_1, a_2, \dots, a_n\}
\]
where each $a_i$ is a discrete embodied operator.

A scope memeplex is defined as the triple
\[
M = (\Sigma_I, P_I, T_I)
\]
where:

\begin{itemize}
\item $\Sigma_I$ is the action alphabet
\item $P_I(a \mid h)$ is the policy induced by interface affordances
\item $T_I(h' \mid h, a)$ is the state transition kernel
\end{itemize}

\subsection*{A.2 Replication Dynamics}

Let $\rho(M,t)$ denote population prevalence of memeplex $M$ at time $t$.

Define fitness:
\[
F(M) = \mathbb{E}_{h \sim \mathcal{H}} \left[ R_I(h) - C_I(h) \right]
\]

where:

\[
R_I(h) = \text{behavioral capture rate}
\]

C_I(h) = \text{cognitive learning cost}


Replicator equation:
\[
\frac{d\rho(M,t)}{dt} = \rho(M,t)\left(F(M) - \bar{F}\right)
\]

where $\bar{F}$ is population mean fitness.


\section*{Appendix B: Interface Phoneme Systems}

\subsection*{B.1 Action Alphabets}

Each interface defines a finite action alphabet:

\[
\Sigma_I = \{a_1, a_2, \dots, a_n\}
\]

with associated confusion probabilities:

\[
P(\hat{a} \mid a)
\]

This defines a noisy channel.

\subsection*{B.2 Expressive Capacity}

Let $\Sigma_I^*$ be the free monoid of action sequences.

Define expressive entropy:

\[
H_I = - \sum_{s \in \Sigma_I^*} P(s) \log P(s)
\]

Interfaces with small $|\Sigma_I|$ impose low upper bounds on $H_I$.

---

\section*{Appendix C: Kullback–Leibler Divergence of Interaction Codes}

For user intention distribution $P(s)$ and interface-decoded distribution $Q_I(s)$:

\[
D_{KL}(P \| Q_I) = \sum_{s \in \Sigma_I^*} P(s)\log\frac{P(s)}{Q_I(s)}
\]

High-fidelity interfaces satisfy:

\[
D_{KL}(P \| Q_I) \le \epsilon
\]

Low-dimensional interfaces minimize training cost by reducing $|\Sigma_I|$, but increase $D_{KL}$ asymptotically.


\section*{Appendix D: Monopolar vs Multiplexed Motor Systems}

\subsection*{D.1 Dimensionality}

Define motor channel count:

\[
d_I = \dim(\Sigma_I)
\]

For swipe interfaces:

\[
d_{\text{swipe}} = 1
\]

For keyboards:

\[
d_{\text{keyboard}} \approx 10
\]

\subsection*{D.2 Control Bandwidth}

Let $B_I$ denote action bandwidth:

\[
B_I = d_I \cdot \log_2 |\Sigma_I|
\]

Monopolar systems minimize $B_I$, multiplexed systems maximize $B_I$.


\section*{Appendix E: Autoregressive Repair Dynamics}

Let cognition be modeled as a predictive system over action sequences:

\[
P(h_{t+1} \mid h_{\le t})
\]

Interface mediation modifies transition dynamics:

\[
P_I(h_{t+1} \mid h_{\le t}) = \sum_{a \in \Sigma_I} T_I(h_{t+1} \mid h_t, a) P_I(a \mid h_t)
\]

Semantic drift occurs when:

\[
D_{KL}(P_{\text{human}} \| P_I) \to \infty
\]


\section*{Appendix F: Ecological Stability Condition}

Let $\mathcal{M}$ be the set of memeplexes.

Define system diversity:

\[
\mathcal{D}(t) = - \sum_{M \in \mathcal{M}} \rho(M,t)\log \rho(M,t)
\]

Enshittification corresponds to:

\[
\frac{d\mathcal{D}}{dt} < 0
\]

Cognitive ecosystem collapse occurs when:

\[
\exists M^* : \rho(M^*,t) \to 1
\]

(monoculture fixation).

\section*{Appendix G: Specification of a Concurrent Speech--Gesture Keyboard (CSGK)}

\subsection*{G.0 Introduction}

This appendix specifies a multimodal input system supporting concurrent (i) speech with QWERTY keying and (ii) speech with swipe-gesture text entry. The system composes independent motor and vocal channels into a unified event log with deterministic resolution rules. The specification is normative and implementation-agnostic.

\subsection*{G.1 Modalities, Events, and Time}

Let time be discretized into ticks $t \in \mathbb{N}$ or treated as continuous with timestamps in $\mathbb{R}_{\ge 0}$.

Define event streams:
\[
E = E_K \;\uplus\; E_G \;\uplus\; E_S
\]
where:
\begin{align*}
E_K &:= \{\langle t,\; \mathrm{KeyDown}(k)\rangle,\; \langle t,\; \mathrm{KeyUp}(k)\rangle\}\\
E_G &:= \{\langle t,\; \mathrm{GestureStart}\rangle,\; \langle t,\; \mathrm{GestureMove}(x,y)\rangle,\; \langle t,\; \mathrm{GestureEnd}\rangle\}\\
E_S &:= \{\langle t,\; \mathrm{AudioFrame}(\mathbf{a})\rangle,\; \langle t,\; \mathrm{ASRToken}(w,\pi)\rangle\}
\end{align*}
Here $k$ is a physical key identifier, $(x,y)$ are touch coordinates in a normalized touch surface, $\mathbf{a}$ is an audio frame, and $(w,\pi)$ is a recognized word token with confidence $\pi \in [0,1]$.

Define a global event log as a total order $\prec$ over events by timestamp, breaking ties by a fixed modality precedence:
\[
E_K \prec E_G \prec E_S
\]
(tie-break only; does not imply semantic precedence).

\subsection*{G.2 Text Buffer and Edit Semantics}

Let the editor state be:
\[
\mathcal{B} = (T,\; c,\; \sigma)
\]
where $T$ is the text buffer, $c$ is the cursor (or selection interval), and $\sigma$ is a mode state (e.g.\ normal/insert/command or application-defined).

Define edit operations as functions:
\[
\mathrm{op} : \mathcal{B} \to \mathcal{B}
\]

Define a canonical set of primitive edits:
\[
\mathcal{O} = \{\mathrm{Insert}(s),\; \mathrm{DeleteBackward}(n),\; \mathrm{DeleteForward}(n),\; \mathrm{MoveCursor}(\Delta),\; \mathrm{Select}([i,j]),\; \mathrm{Commit}\}
\]

\subsection*{G.3 Channel Independence and Composition}

Define three independent decoders producing candidate operation sequences with confidence weights:

\begin{align*}
D_K &: E_K \to (\mathcal{O}^*, \; \alpha_K)\\
D_G &: E_G \to (\mathcal{O}^*, \; \alpha_G)\\
D_S &: E_S \to (\mathcal{O}^*, \; \alpha_S)
\end{align*}

Each decoder emits:
\[
(\mathbf{o}, \alpha) \quad \text{where } \mathbf{o} \in \mathcal{O}^*,\;\alpha \in [0,1]
\]

The system composes decoders into a unified proposal set:
\[
\mathcal{P}(t) = \{(\mathbf{o}_K,\alpha_K),\;(\mathbf{o}_G,\alpha_G),\;(\mathbf{o}_S,\alpha_S)\}
\]
evaluated over a sliding temporal window $W_t = [t-\Delta, t]$.

\subsection*{G.4 Arbitration: Deterministic Conflict Resolution}

Define the merge operator:
\[
\mathrm{Merge} : \mathcal{P}(t) \times \mathcal{B} \to \mathcal{B}
\]

A conflict occurs when two proposals include non-commuting edits on overlapping buffer intervals during the same window.

Let $\mathrm{Supp}(\mathbf{o})$ be the set of buffer indices affected by $\mathbf{o}$ (support).

Conflict predicate:
\[
\mathrm{Conf}((\mathbf{o}_i,\alpha_i),(\mathbf{o}_j,\alpha_j)) \iff \mathrm{Supp}(\mathbf{o}_i)\cap \mathrm{Supp}(\mathbf{o}_j)\neq\varnothing \;\wedge\; \mathbf{o}_i \not\circ \mathbf{o}_j = \mathbf{o}_j \not\circ \mathbf{o}_i
\]

Resolution rule (strict):
\[
\text{If }\mathrm{Conf}, \text{ select the proposal with maximal } \alpha \cdot w_m
\]
where $w_m$ is a modality weight:
\[
w_K > w_G > w_S
\]
for destructive edits, and
\[
w_S > w_K > w_G
\]
for non-destructive annotations and meta-intent (see \S G.5).

Thus, speech can guide intent while keyboard retains priority for destructive precision.

\subsection*{G.5 Speech as Intent Layer with Typed Commands}

Partition speech tokens into two disjoint classes via a classifier:
\[
\mathrm{Classify}(w_{1:n}) \in \{\mathrm{DICTATION},\mathrm{COMMAND}\}
\]

Define a command grammar (EBNF):

\begin{verbatim}
<command> ::= <delete_cmd> | <undo_cmd> | <confirm_cmd> | <cancel_cmd> | <mode_cmd>

<delete_cmd> ::= ("delete" | "remove" | "erase") [<scope>] [<quant>]
<scope>      ::= "this" | "that" | "selection" | "line" | "word" | "paragraph"
<quant>      ::= <number> | "all"

<confirm_cmd> ::= ("yes" | "confirm" | "do it" | "commit")
<cancel_cmd>  ::= ("no" | "cancel" | "never mind" | "stop")
\end{verbatim}

Speech commands do \emph{not} directly execute destructive edits. Instead they emit intent proposals requiring confirmation:

\[
D_S \Rightarrow \mathbf{o}_S = \mathrm{Intent}(\tau)
\]
where $\tau$ is an intent object, e.g.
\[
\tau = \mathrm{DeleteRequest}(\mathrm{Selection})
\]

Execution requires a \emph{commit event} from either:
\[
\mathrm{Commit} \in \mathbf{o}_K \;\;\text{(e.g.\ Enter, Ctrl+Enter)} \quad \text{or} \quad \mathrm{ConfirmCmd} \in \mathbf{o}_S
\]

\subsection*{G.6 Concurrent Example: ``Delete'' with Spoken Justification}

Let the user press a delete chord while speaking: ``it's me yeah I really want to delete this.''

Keyboard stream emits:
\[
(\mathbf{o}_K,\alpha_K) = (\mathrm{DeleteBackward}(n),\;1.0)
\]

Speech stream emits:
\[
(\mathbf{o}_S,\alpha_S) = (\mathrm{Intent}(\mathrm{DeleteRequest}(\mathrm{This})),\; \pi)
\]
with $\pi \ge \pi_{\min}$.

Arbitration:
\begin{itemize}
\item The destructive edit is executed from $D_K$ immediately.
\item The speech intent is attached as an audit annotation to the event log:
\[
\mathrm{Annotate}(\mathrm{DeleteBackward}(n),\;\text{``it's me ... delete this''})
\]
\end{itemize}

If the keyboard delete is ambiguous (e.g.\ selection boundary unclear), the system enters a confirmation state:
\[
\sigma := \mathrm{PENDING\_DESTRUCTIVE}(\tau)
\]
and requires explicit commit.

\subsection*{G.7 Swype with Speech: Dual-Channel Text Entry}

Let gesture decoding produce a candidate word sequence $g$ with confidence $\alpha_G$.
Let speech dictation produce candidate word sequence $s$ with confidence $\alpha_S$.

Define a fusion rule for insertions:
\[
\mathrm{FuseText}(g,s) = \arg\max_{u \in \{g,s,g\oplus s\}} \; \lambda_G \log P_G(u) + \lambda_S \log P_S(u)
\]
where $P_G, P_S$ are decoder likelihoods and $\lambda_G,\lambda_S$ are calibration weights.

The output is inserted as:
\[
\mathrm{Insert}(\mathrm{FuseText}(g,s))
\]
provided the fusion confidence exceeds threshold $\theta$; otherwise the system presents both alternatives for a commit event.

\subsection*{G.8 Safety Constraints for Destructive Operations}

Define destructive operations:
\[
\mathcal{O}_{-} := \{\mathrm{DeleteBackward}(n),\mathrm{DeleteForward}(n)\}
\]

Safety condition:
\[
\forall \mathrm{op}\in\mathcal{O}_{-},\quad \mathrm{Execute}(\mathrm{op}) \Rightarrow
\left(\alpha_K \ge \theta_K\right) \;\vee\; \left(\exists\,\mathrm{Commit}\right)
\]

Speech-only deletion is disallowed unless:
\[
\mathrm{CommitCmd} \wedge \pi \ge \theta_S \wedge \text{(explicit scope present)}
\]

\subsection*{G.9 Hardware Minimum Specification}

A compliant CSGK device provides:
\begin{itemize}
\item A physical key matrix supporting at least 60 keys (QWERTY-class) with n-key rollover.
\item A touch surface for gesture traces with sampling rate $\ge 120\ \mathrm{Hz}$.
\item A microphone array or single microphone with audio sampling $\ge 16\ \mathrm{kHz}$.
\item A real-time clock for timestamping all event streams to $\le 5\ \mathrm{ms}$ jitter.
\end{itemize}

\subsection*{G.10 Conformance Tests}

A device is conformant iff:
\begin{enumerate}
\item It logs all events in a unified total order with deterministic tie-breaking.
\item It supports concurrent speech + keying and speech + gesture without disabling either stream.
\item It enforces safety constraints for destructive edits.
\item It produces identical final buffer state for identical event logs (determinism).
\end{enumerate}

\section*{Appendix H: Model of Interface Dialects and AI Attractor Competition}

\subsection*{H.0 Introduction}

This appendix formalizes contemporary AI interfaces as competing linguistic attractors in a shared human–machine interaction space. Systems such as ChatGPT, Claude, Gemini, and Grok are modeled as dialectical variants optimized for different user priors and modality preferences. The model is substrate-independent with respect to speech, typing, and gesture.

\subsection*{H.1 User Preference Space}

Let $\mathcal{U}$ denote the space of users.

Each user $u \in \mathcal{U}$ is characterized by a modality preference vector:

\[
\mathbf{p}_u = (p_u^{(K)}, p_u^{(S)}, p_u^{(G)})
\]

where:

\[
p_u^{(K)} + p_u^{(S)} + p_u^{(G)} = 1
\]

corresponding to keyboard, speech, and gesture priors.

\subsection*{H.2 AI Interface Systems as Dialect Functions}

Let $\mathcal{L}$ be the space of semantic outputs.

Each AI interface $A_i$ is a conditional distribution:

\[
A_i : (\mathcal{X}, \mathcal{M}) \to \mathcal{L}
\]

where:

\begin{itemize}
\item $\mathcal{X}$ is input content space
\item $\mathcal{M} = \{K,S,G\}$ is modality
\end{itemize}

Each system defines modality-specific decoding channels:

\[
P_i(\ell \mid x, m)
\]

where $m \in \mathcal{M}$.

Define the effective dialect of system $i$ for user $u$:

\[
D_{i,u}(\ell \mid x) = \sum_{m \in \mathcal{M}} p_u^{(m)} P_i(\ell \mid x, m)
\]

\subsection*{H.3 Attractor Dynamics}

Let user allocation $\rho_i(t)$ denote the fraction of users primarily adopting interface $A_i$.

Define satisfaction functional:

\[
S_i(u) = - D_{KL}(P_u \| D_{i,u})
\]

where $P_u$ is the user’s latent semantic preference distribution.

Mean fitness:

\[
F_i = \mathbb{E}_{u \sim \mathcal{U}}[S_i(u)]
\]

Replicator dynamics:

\[
\frac{d\rho_i}{dt} = \rho_i (F_i - \bar{F})
\]

Distinct AI systems correspond to distinct linguistic attractors in $\mathcal{L}$.

\subsection*{H.4 Interface Governance as Linguistic Control}

Let $\mathcal{C}$ be the set of control primitives (e.g.\ delete, archive, undo, recycle).

Each system defines a control grammar:

\[
G_i \subset \mathcal{L}^*
\]

Control conflict arises when grammars differ:

\[
G_i \neq G_j
\]

User migration probability increases with grammar mismatch:

\[
P(u : A_i \to A_j) \propto D_{KL}(G_i \| G_j)
\]

Thus, interface competition constitutes dialect competition over executive semantics.

\subsection*{H.5 Modality-Invariant Language Representation}

Define an abstract semantic form space $\mathcal{S}$.

Each modality is a channel encoding:

\[
E_m : \mathcal{S} \to \Sigma_m^*
\]

where:

\[
m \in \{K,S,G\}
\]

Define decoders:

\[
D_m : \Sigma_m^* \to \mathcal{S}
\]

Correctness condition:

\[
D_m(E_m(s)) = s
\]

for all $s \in \mathcal{S}$.

Thus, semantic content is modality-invariant.

\subsection*{H.6 Continuous Phoneme Steering Model}

Let $\Omega \subset \mathbb{R}^2$ be a continuous gesture space.

Define a trajectory:

\[
\gamma : [0,1] \to \Omega
\]

Define a discretization operator:

\[
\Phi : C([0,1], \Omega) \to \Sigma_K^*
\]

such that:

\[
\Phi(\gamma) = \arg\max_{w \in \Sigma_K^*} P(w \mid \gamma)
\]

This generalizes Swype, joystick typing, steering-wheel typing, and Etch-a-Sketch typing.

\subsection*{H.7 Arrow-Key and Joystick Equivalence}

Let $\Sigma_K = \{h,j,k,l\}$ be a four-key alphabet.

Define a path embedding:

\[
\psi : \Sigma_K^* \to C([0,1], \Omega)
\]

such that:

\[
\Phi(\psi(w)) = w
\]

Hence, any discrete keyboard language is realizable as a continuous 2D steering language.

\subsection*{H.8 Modality Equivalence Theorem}

For any semantic string $s \in \mathcal{S}$:

\[
E_K(s) \equiv E_S(s) \equiv E_G(s)
\]

up to channel noise, since:

\[
D_K(E_K(s)) = D_S(E_S(s)) = D_G(E_G(s)) = s
\]

Therefore, linguistic structure is independent of effector substrate.

\subsection*{H.9 AI Interface Speciation}

Distinct AI interfaces correspond to different parameterizations of:

\[
P_i(\ell \mid x, m)
\]

They are dialectical variants, not ontological kinds.

Fixation occurs when:

\[
\exists i : \rho_i(t) \to 1
\]

leading to linguistic monoculture in human–AI interaction.

\subsection*{H.10 Stability Condition}

Define dialect diversity:

\[
\mathcal{D}_A(t) = -\sum_i \rho_i(t)\log \rho_i(t)
\]

Interface ecosystem stability requires:

\[
\frac{d\mathcal{D}_A}{dt} \ge 0
\]

Declining diversity predicts convergence to a single executive grammar controlling cognitive infrastructure.

\section*{Appendix I: User-Defined Languages and Embodied Coordinate Systems}

\subsection*{I.0 Introduction}

This appendix formalizes language as a user-generated symbolic system independent of embodiment. Interfaces are treated as coordinate systems over which user-defined languages are projected. Discrete symbols, continuous gestures, facial expressions, and emojis are modeled as equivalent encodings of an underlying semantic algebra.

\subsection*{I.1 Abstract User Language Space}

Let $\mathcal{S}$ be the space of semantic atoms.

Each user $u$ defines a personal language:

\[
\mathcal{L}_u = (\Sigma_u, \mathcal{G}_u)
\]

where:

\begin{itemize}
\item $\Sigma_u$ is a finite symbol set (words, emojis, expressions)
\item $\mathcal{G}_u \subset \Sigma_u^*$ is a generative grammar
\end{itemize}

Language is endogenous:

\[
\mathcal{L}_u(t+1) = \mathcal{L}_u(t) \cup \Delta_u(t)
\]

where $\Delta_u(t)$ are novel constructions introduced by usage.

\subsection*{I.2 Embodiment as Coordinate Projection}

Let $\mathcal{E}$ be the embodiment space.

An interface $I$ defines a coordinate system:

\[
I : \mathcal{S} \to \mathcal{E}
\]

Users do not receive language from interfaces; they project language into them.

\subsection*{I.3 Continuous Gesture Representation}

Let $\Omega \subset \mathbb{R}^2$ be a continuous motor surface.

A gesture is a trajectory:

\[
\gamma : [0,1] \to \Omega
\]

Define a user-specific encoding:

\[
E_u : \Sigma_u^* \to C([0,1], \Omega)
\]

and decoding:

\[
D_u : C([0,1], \Omega) \to \Sigma_u^*
\]

Correctness condition:

\[
D_u(E_u(w)) = w
\]

Thus, any user language is realizable as a continuous motor language.

\subsection*{I.4 Fourier Decomposition of Gesture Language}

Each trajectory admits spectral decomposition:

\[
\gamma(t) = \sum_{k=1}^{\infty} a_k \sin(2\pi k t) + b_k \cos(2\pi k t)
\]

Define the coefficient vector:

\[
\mathbf{f}(\gamma) = (a_1,b_1,a_2,b_2,\dots)
\]

User-defined gesture phonemes correspond to equivalence classes:

\[
\gamma_i \sim \gamma_j \iff \|\mathbf{f}(\gamma_i) - \mathbf{f}(\gamma_j)\| \le \epsilon
\]

Thus, discrete linguistic units emerge from continuous motor manifolds.

\subsection*{I.5 Symbol Equivalence Across Modalities}

Define modality encoders:

\[
E^{(K)}_u, E^{(S)}_u, E^{(G)}_u, E^{(F)}_u
\]

for keyboard, speech, gesture, and facial expression.

All map from $\mathcal{S}$:

\[
E^{(m)}_u : \mathcal{S} \to \Sigma^{(m)}_u
\]

Semantic invariance condition:

\[
D^{(m)}_u(E^{(m)}_u(s)) = s
\]

Hence, emojis, facial expressions, phonemes, and gestures are semantically isomorphic encodings.

\subsection*{I.6 Discretization of Continuous Expression}

Define a discretization operator:

\[
\Pi_u : C([0,1], \Omega) \to \Sigma_u
\]

such that:

\[
\Pi_u(\gamma) = \arg\max_{\sigma \in \Sigma_u} P_u(\sigma \mid \gamma)
\]

User language determines the partitioning of motor space, not the interface.

\subsection*{I.7 Steering Languages and Spatial Typing}

Let $\Omega$ be a 2D steering surface.

Define a spatial lexicon embedding:

\[
\Lambda_u : \Sigma_u \to \Omega
\]

Typing becomes target navigation:

\[
E_u(w_1 w_2 \dots w_n) = \gamma(t) \text{ passing through } \Lambda_u(w_i)
\]

Thus, steering wheels, joysticks, mice, or Etch-a-Sketch devices can all implement the same language.

\subsection*{I.8 Interface Neutrality Theorem}

For any user language $\mathcal{L}_u$ and any embodiment space $\mathcal{E}$:

\[
\exists\, E_u, D_u \text{ such that } D_u(E_u(w)) = w
\]

Therefore, no interface determines language; interfaces only parameterize coordinate systems.

\subsection*{I.9 Browser and Interface Wars as Coordinate Wars}

Let $I_i, I_j$ be interfaces with different coordinate systems.

They compete by minimizing user projection cost:

\[
C(I,u) = \mathbb{E}_{w \sim \mathcal{L}_u}[\|E_u(w) - I(w)\|]
\]

Browser wars select coordinate systems, not languages.

Failure to support high-dimensional projections predicts long-term expressive collapse.

\subsection*{I.10 Summary Condition}

Language evolution is user-driven:

\[
\mathcal{L}_u \not\subset I
\]

Interfaces succeed iff:

\[
\forall u,\; \exists E_u, D_u \text{ with low } C(I,u)
\]

Language is primary. Embodiment is secondary.