posts_data.json
1383 lines (1383 loc) · 236 KB
[
{
"id": "ai-ai-bias",
"title": "Laurito et al. (2025)",
"subtitle": "AI–AI bias: Large language models favor communications generated by large language models",
"summary": "Just finished reading \"AI–AI bias: Large language models favor communications generated by large language models\", and found it truly thought-provoking yet concerning, especially related to AI agents and agentic AI.\n\n\nThe research reveals a consistent 'LLM-for-LLM bias': LLMs consistently prefer content generated by other LLMs across product advertisements, scientific papers, and movie plot summaries. This, to me, suggests a troubling possibility: future AI systems might give AI agents and AI-assisted humans an unfair advantage.\n\n\nWe may already see this on social media, where AI-generated content (like those viral AI-generated cat and dog videos) potentially crowds out human-generated content in recommendation systems due to this inherent bias, and the sheer volume.\n\n\nWhat really caught my attention was how consistent these results were across all three content types they examined. I'd be curious to see a variant of this experiment comparing three (instead of two) conditions: \n\n(1) purely human-generated content, \n\n(2) human-generated content with LLM assistance, and \n\n(3) fully LLM-generated content. \n\n\nThe study also identified a 'first-item bias', whereby LLMs tend to select the first option presented - similar to the anchoring effect in human psychology. This is crucial for Agentic Experience (AX), as AI agents might prioritize what appears first, creating potential feedback loops that amplify biases.\n\n\nPerhaps the most concerning finding in this study: humans choose LLM-pitched content less frequently than LLMs do. This potentially creates a serious alignment problem: AI agents might work against human interests not because it is in their best interests to do so (like human agents in Agency Theory), but simply because they're inherently biased toward other AI outputs.\n\n\nThe authors note that 'human preferences between human and LLM-generated content are weaker and more variable'. Could it be that the AI-generated content market isn't saturated yet, or perhaps LLMs are particularly good at capturing human attention? What happens when we're overwhelmed with increasingly similar AI-generated content?\n\n\nAs LLMs become more prevalent in various roles, addressing these biases is, without a doubt, essential. Humans may need to adopt verification techniques like the CIA Prompt Framework (https://lnkd.in/gsgNnWGC) to mitigate these biases when working with their AI agents.\n\n\nThis is, indeed, a fascinating read for behavioural scientists interested in GenAI, AX researchers and designers, and anyone using AI agents in their daily lives. \n\n\nMany thanks to Walter Laurito, Benjamin Davis, Peli Grietzer, Tomáš Gavenčiak, Ada Böhm, and Jan Kulveit for this illuminating research.",
"sourceUrl": "https://doi.org/10.1073/pnas.2415697122",
"linkedinUrl": "https://www.linkedin.com/posts/fenditsim_ai-ai-bias-activity-7361260229658873857-l5l8",
"keywords": [
"AIBias",
"AgenticAI",
"MachineLearning",
"AIAlignment",
"FutureTech",
"DecisionMaking",
"GenerativeAI",
"BehavioralScience"
]
},
{
"id": "ai-and-human-behavior-augment",
"title": "Hallsworth et al. (2025)",
"subtitle": "AI and Human Behaviour: Augment",
"summary": "Just finished reading the #Augment section of BIT's \"AI and Human Behaviour\" by Michael Hallsworth, PhD, Elisabeth Costa, and Deelan Maru. \n\n\nIt explores how behavioural science can improve AI model development - a perspective I find eye-opening and incredibly valuable!\n\n\nCurrent model developments, as they noted, have been building System 2-like processes (deliberate reasoning) on System 1-like architectures (intuitive processing). While such a direction of model development is necessary, they argued that it's insufficient for tackling intractable, chaotic, value-contested problems, and the overthinking problem. \n\n\nWhat is needed, they suggested, is the flexibility to try different approaches. Two promising solutions emerge where they think behavioural science can help:\n\n\n1) Metacognition, Metacognitive Controller & Resource Rationality\n\nMetacognition is \"the ability to think about your thinking and adjust accordingly\" (reminding me of Flavell's (1979) work). The authors suggest a 'metacognitive controller' that analyses problems and selects appropriate approaches. \n\n\n#SOFAI (the work of Marianna B. Ganapini, Francesca Rossi and their colleagues) is a great example of such a controller, which \"employs both 'fast' and 'slow' solvers under a metacognitive agent that selects solvers and learns from experience\" (my review: https://lnkd.in/eeYebmrG). \n\n\nBehavioural science can, as they proposed, improve the controller through better assessment, selection, and checks, and by applying the 'resource rationality' framework (recognizing thinking's costs, and helping AI avoid both overthinking simple questions and undershooting complex ones).\n\n\n2) Neurosymbolic AI\n\nThis approach uses logic and formal rules to provide a structured account of how the world works. \n\n\nBehavioural science could help create a virtuous learning cycle between neural networks (System 1) and symbolic reasoning (System 2). System 2 can teach the neural network to develop better intuitions, while System 1 can provide efficient 'hunches' about which logical paths are most promising.\n\n\nFor behavioural scientists who are interested in working with AI, they suggest several exciting research opportunities:\n\n- Embedding resource rationality in metacognitive controllers\n\n- Deepening the human-AI cognitive parallel\n\n- Designing for a virtuous cycle of learning in neurosymbolic AI\n\n\nWe've seen research implying the relationship between humans and generative AI is bi-directional. As behavioural science can improve AI construction (which this section of the report is about), I think we can also examine how (Gen)AI 'augments' human capabilities in learning and reskilling (reminding me of conversations with Alina).",
"sourceUrl": "https://www.bi.team/wp-content/uploads/2025/09/BIT-AI-2025-Augment.pdf",
"linkedinUrl": "https://www.linkedin.com/posts/fenditsim_using-behavioural-science-to-improve-how-activity-7388994020221284352-Qagc",
"keywords": [
"BehaviouralScience",
"ArtificialIntelligence",
"Metacognition",
"AIResearch",
"NeurosymbolicAI",
"CognitiveScience",
"ResourceRationality",
"HumanAICollaboration"
]
},
{
"id": "ai-and-human-behavior-executive-summary",
"title": "Hallsworth et al. (2025)",
"subtitle": "AI and Human Behaviour: Executive Summary",
"summary": "Just finished reading the executive summary of \"AI and Human Behaviour\" by Michael Hallsworth, PhD, Elisabeth Costa, and Deelan Maru from BIT. \n\n\nThis executive summary gives an overview of what they argue are four fundamental issues when facing AI: Augment, Adopt, Align and Adapt.\n\n\n • The Augment section examines two systems of thinking, metacognition (reminding me of Anika's dissertation; https://lnkd.in/e3dYuKJe), some interesting concepts like resource rationality, and neurosymbolic AI (combining intuitive pattern-matching with rule-based logic)\n\n\n • The Adopt section views adoption as 'a continuum' from no use to shallow adoption to deep integration, influenced by motivation, capability, and trust (reminding me of MEL's presentation on AI literacy and AI readiness; https://lnkd.in/eMnVNDjS)\n\n\n • The Align section addresses making AI consistent with our intentions and values, introducing concepts like 'machine psychology' (reminding me of a conversation with Julian) and 'bounded alignment', as well as three key areas where behavioural science can improve human-AI alignment (fine-tuning, inference-time adaptation, user-side prompting)\n\n\n • The Adapt section explores managing AI as part of an 'extended mind' (reminding me of conversations with Alina and her work on sharing identity with AI systems; https://lnkd.in/ei7D6De7), and societal implications (reminding me of great conversations with Rosalia)\n\n\nI appreciate how this executive summary starts off with a frame: human behaviour is the essence of driving economic and technological progress. They noted: \"The promise of AI can only be fulfilled by understanding how and why people think and act the way they do.\" I agree with this frame deeply: how is our trust built, and thus maintained over time, when introducing such technology?\n\n\nI found the subsequent guiding questions they posed thought-provoking. For instance: How are our interactions with AI affecting our beliefs and behaviours? What is the cumulative effect on our societies? How can AI understand our needs and goals?\n\n\nStarting from today, I'll share thoughts on each section of this report.",
"sourceUrl": "https://www.bi.team/wp-content/uploads/2025/09/BIT-AI-2025-summary.pdf",
"linkedinUrl": "https://www.linkedin.com/posts/fenditsim_behaviouralscience-aiadoption-humancenteredai-activity-7388264694714667008-c0_d",
"keywords": [
"BehaviouralScience",
"Adoption",
"HumanCenteredAI",
"FutureOfWork",
"AIEthics",
"CognitiveScience",
"TechTransformation",
"AIAlignment"
]
},
{
"id": "ai-as-amplifier",
"title": "Ehsan et al. (2026)",
"subtitle": "From Future of Work to Future of Workers: Addressing Asymptomatic AI Harms for Dignified Human-AI Interaction",
"summary": "Just finished reading a preprint \"From Future of Work to Future of Workers: Addressing Asymptomatic AI Harms for Dignified Human-AI Interaction\" by Ehsan et al.\n\nI appreciate how this research challenges the dominant 'future of work' narrative by recentering the 'future of workers' - their dignity, craft, and more importantly, identity. \n\nIn this paper, the researchers conducted a year-long longitudinal study in radiation oncology, tracking 42 participants across 24 interviews, 5 workshops, and 52 think-aloud sessions. \n\nThe research reveals what the authors call the 'AI-as-Amplifier Paradox': AI systems can erode the very capabilities they're built to support. It documented how early efficiency gains (15% faster treatment planning) masked a troubling progression, from\n- Asymptomatic effects (\"my intuition is rusting\")\n- Chronic harms (demonstrable skill degradation) to \n- Identity commoditization (fear of being 'hollowed out' - still employed, but with diminished meaning).\n\nThis aligns with the 'upskilling-deskilling' paradox Alina and I highlighted in SCAN, and Tris' presentation on 'veracity offloading'.\n\nWhat's intriguing, to me, is that workers fear the loss of what makes them uniquely valuable. As one participant asked: \"What happens when the AI fails and we've forgotten how to think?\" - echoing my \"Iron Man without the suit\" thought experiment about capability dependencies.\n\nI like the powerful medical metaphor the researchers introduced: AI's effects as 'asymptomatic' - behavioural shifts that escape standard performance metrics. A physicist captured it aptly: \"Old AI was clunky...it had friction and kept us thinking. The new AI is seamless...makes overreliance effortless, offloading the very act of thinking.\" It's a use-now, pay-later effect, which, to me, is efficiency today - at the cost of expertise tomorrow.\n\nWhat I found thought-provoking is the countermeasures some participants deploy. They self-imposed 'friction': running manual plans weekly to 'sharpen the blade', or creating coffee bets to predict AI outputs before running them. The latter was highlighted as having 'sparked spirited exchanges' that function as 'collective reflection': the very moment when metacognition and social support can interrupt the erosion cascade.\n\nThe authors' framework \"Dignified Human-AI Interaction\" operates on three levels: \n- Worker (mindful engagement)\n- Technology (friction by design, Social Transparency), and \n- Organisational (systemic safeguards). \n\nIt aims to preserve human agency, expertise, and self-worth alongside productivity gains (reminding me of MEL's work on psychological readiness).\n\nMany thanks to Dr. Upol E., Samir Passi, Koustuv Saha, Todd McNutt, Mark Riedl, and Sara Alcorn for this deeply researched and timely contribution. Looking forward to seeing how this evolves.",
"sourceUrl": "https://doi.org/10.48550/arXiv.2601.21920",
"linkedinUrl": "https://www.linkedin.com/posts/fenditsim_ai-as-amplifier-paradox-activity-7426606228773969920-FJ_K/",
"keywords": [
"FutureOfWork",
"AIEthics",
"CognitiveDeskilling",
"HumanAICollaboration",
"WorkerDignity",
"OrganizationalPsychology",
"DigitalTransformation",
"ResponsibleAI"
]
},
{
"id": "ai-assisted-promises",
"title": "Greevink et al. (2024)",
"subtitle": "AI-Powered Promises: The Influence of ChatGPT on Trust and Trustworthiness",
"summary": "Just finished reading \"AI-Powered Promises: The Influence of ChatGPT on Trust and Trustworthiness\" by Ivo Greevink, Theo Offerman, and Giorgia Romagnoli.\n\nThis is a fascinating empirical study for anyone interested in trust through the lens of strategic decision-making.\n\nIn this study, the researchers investigate: how does AI-mediated communication affect trust and promises in digital interactions? As language models become widespread mediators of communication, this fundamentally changes how humans exchange and perceive messages.\n\nThe researchers used a modified trust game of Charness and Dufwenberg (2006) where trustees could send messages with or without ChatGPT assistance. It is a great experimental design, as it isolates the effect of AI mediation while preserving the essential dynamics of trust and trustworthiness formation.\n\nOne of the striking findings is that promises became abundant but hollow. Players with access to ChatGPT made more promises, but kept them less frequently. ChatGPT recognises that promises generate trust and thus makes abundant use of them, yet these promises carry less commitment than self-written ones.\n\nApart from that, coordination on efficient outcomes dropped when promises involved AI assistance. ChatGPT makes it easier for 'cheaters' to mimic trustworthy behaviour, eroding promises as reliable signals.\n\nMore importantly, their study shows that promises became completely irrelevant as 'a cue' for identifying honest participants. 80% of cheaters in the AI-assisted condition included promises, compared to 29.6% in the human-only condition.\n\nAnother intriguing finding is that participants didn't distrust AI-generated messages more than human ones. The unwarranted trust persists for now.\n\nWhat's concerning, to me, is that promises are becoming both rare (as fewer are genuinely kept) and cheap (diluted by AI-generated mimicry). GenAI excels at mimicking trustworthy behaviour, which makes it harder to distinguish a genuine commitment from strategic signaling.\n\nThe researchers note that we may see a return to face-to-face interactions for trust-critical situations. Perhaps we're observing this in education, recruiting and dating contexts. Apart from that, I'm intrigued by their suggestion of \"entirely novel forms to communicate and establish trust\".\n\nAs GenAI makes it easier for cheaters to mimic trustworthy people, we need deeper research on how humans will adapt their trust-building strategies in this new landscape.\n\nMany thanks to the researchers for this thought-provoking work - I am curious to see a variant where trustors (not just trustees) can access AI assistance. I look forward to more research at the intersection of AI mediation, communication, and strategic behaviour.",
"sourceUrl": "https://www.creedexperiment.nl/creed/pdffiles/chatGPT.pdf",
"linkedinUrl": "https://www.linkedin.com/posts/fenditsim_ai-powered-promises-on-trust-trustworthiness-activity-7404786740584157184-bulM",
"keywords": [
"BehaviouralEconomics",
"TrustResearch",
"ArtificialIntelligence",
"ExperimentalEconomics",
"GameTheory",
"AIEthics",
"HumanAIInteraction",
"DigitalCommunication"
]
},
{
"id": "ai-based-learning-tool-design-assessment",
"title": "Luo et al. (2025)",
"subtitle": "Design and assessment of AI-based learning tools in higher education: a systematic review",
"summary": "Just finished reading \"Design and assessment of AI-based learning tools in higher education: a systematic review\" by Luo et al.\n\nThis is a synthesis of 63 peer-reviewed studies examining how AI tools are being designed and deployed in higher education effectively and, more importantly, responsibly.\n\nEmploying Kraiger et al.'s (1993) framework to assess three learning outcome dimensions (cognitive, skill-based, and affective), they revealed a fascinating pattern: while AI-based learning tools excel at enhancing cognitive knowledge acquisition and affective learning outcomes (enhanced motivation, engagement, and self-efficacy), their impact on higher-order thinking and skill development was mixed.\n\nThree key insights I found very intriguing:\n\n1. The black box problem persists\nUnlike traditional instructional tools with predefined rules, many AI tools operate opaquely, obscuring decision-making processes. This opacity particularly hinders complex reasoning in mathematics, physics, and medicine.\n\n2. Design matters more than we think\nThe finding about AI-enabled personalised video recommendations is insightful. They only benefited moderately motivated learners, as high achievers had already mastered the content, while less motivated ones remained disengaged. Perhaps it is a calibration issue that invites the concept of Flow?\n\n3. The human element is irreplaceable\nCurrent AI tools excel at providing instant, contextual answers but often lack the strategic pedagogical depth of expert human tutors. The review warns of declining critical thinking and growing AI dependency: concerns that align with recent research on metacognition and cognitive offloading.\n\nThe authors propose a \"design-to-evaluation\" framework emphasising five principles: \n- human-centered design that incorporates learner traits beyond performance metrics\n- multimodal content strategically tailored to learning objectives\n- transparent decision-making processes\n- inclusive design for marginalized students\n- ethical safeguards for privacy and bias\n\nThis review, to me, reinforces the notion that AI tools work best when they complement, rather than replace, human expertise. Continuous teacher calibration, metacognitive scaffolding, digital literacy (the SCAN framework that Alina and I developed: https://lnkd.in/eanDnGbm), and strategic task assignment with multimodal approaches tailored to specific learning objectives and student needs remain essential. \n\nMany thanks to Jihao Luo, Chenxu Zheng, Jiamin Yin, and Hock Hai Teo for this insightful work that pushes us toward more intentional, human-centered AI design in higher education.\n\nAs we race to integrate AI in education, we need equal rigor in understanding how and when these tools genuinely enhance learning.",
"sourceUrl": "https://doi.org/10.1186/s41239-025-00540-2",
"linkedinUrl": "https://www.linkedin.com/posts/fenditsim_design-and-assessment-of-ai-based-learning-activity-7423968444984995841-TqsB/",
"keywords": [
"AIinEducation",
"HigherEducation",
"EdTech",
"ArtificialIntelligence",
"LearningScience",
"EducationalTechnology",
"PedagogicalInnovation",
"FutureOfLearning"
]
},
{
"id": "ai-cognitive-ease-cost",
"title": "Stadler et al. (2024)",
"subtitle": "Cognitive ease at a cost: LLMs reduce mental effort but compromise depth in student scientific inquiry",
"summary": "Just finished reading an intriguing paper 'Cognitive ease at a cost: LLMs reduce mental effort but compromise depth in student scientific inquiry' by Professor Matthias Stadler, Prof. Dr. Maria Bannert, and Professor Michael Sailer. \n\n\nThe findings are both intriguing and concerning for educators and lifelong learners.\n\n\nIn this study, they explore how GenAI impacts education, specifically comparing cognitive load and learning outcomes when students use LLMs versus traditional search engines. They examine three types of cognitive load: \n\n • Extraneous: arising from how information is presented to learners\n\n • Intrinsic: directly tied to the complexity of the material itself\n\n • Germane: cognitive resources for active processing and automation of schemas\n\n\nThe study shows that while LLMs significantly reduce cognitive load (both intrinsic and extraneous), this cognitive ease, surprisingly, comes at a cost: students using LLMs produced lower quality justifications and recommendations compared to those using traditional search engines. \n\n\nEven more concerning, the LLM group showed lower germane cognitive load, suggesting that while information was easier to process, it didn't engage deep learning processes as effectively as traditional search tasks.\n\n\nThe question, then, I suspect, isn't simply about banning or embracing GenAI tools in education, but rather educating students about potential pitfalls while giving them toolkits for effective communication and critical thinking - as if a wizard learning to use a wand in \"Harry Potter\".\n\n\nI agree with two of the several issues they raise about using LLMs for learning:\n\n1. LLMs' tendency to generate hallucinated content, potentially providing non-existent or irrelevant literature\n\n2. The personalised nature of LLM interactions may amplify learners' confirmation bias, as systems tailor responses to align with users' existing beliefs\n\n\nThe optimal approach is, perhaps, using GenAI for 'wide' searches via natural language questions (e.g., looking for keywords), followed by 'narrow' targeted web searches for verification (e.g., searching for relevant papers with keywords).\n\n\nI agree with the authors that prompt engineering is crucial. However, we should also consider whether effective prompting leads to deeper learning - a major drawback of learning via LLMs instead of web searches: \n\n\nShould students be encouraged to critically evaluate LLM-generated content while learning effective prompting?\n\n\nAs the paper concludes: \"While LLMs offer an efficient way to reduce cognitive load, they may not facilitate the deep learning necessary for complex decision-making tasks.\" This reminds me of Cal Newport's concept of 'deep' in his book 'Deep Work' - more crucial than ever in our AI-assisted world.\n\n\nMany thanks to the authors for this micro-level investigation of GenAI and learning.",
"sourceUrl": "https://doi.org/10.1016/j.chb.2024.108386",
"linkedinUrl": "https://www.linkedin.com/posts/fenditsim_cognitive-ease-at-a-cost-of-learning-with-activity-7364238474306035713-qEhB",
"keywords": [
"AIinEducation",
"CognitiveLoad",
"LearningTech",
"GenAI",
"CriticalThinking",
"PromptEngineering",
"DeepLearning",
"EducationalResearch"
]
},
{
"id": "ai-delegation-can-increase-dishonest-behavior",
"title": "Köbis et al. (2024)",
"subtitle": "Delegation to artificial intelligence can increase dishonest behaviour",
"summary": "Just finished reading \"Delegation to artificial intelligence can increase dishonest behaviour\" by Prof. Nils Köbis, Dr. Zoe Rahwan, Raluca Rilla, Bramantyo Supriyatno, Clara N. Bersch, Tamer Ajaj, Dr. Jean-Francois Bonnefon and Prof. Iyad Rahwan. \n\n\nIt's a great study exploring what could possibly go wrong in a human-principal, AI-agent relationship. \n\n\nIn a series of studies, the researchers consider how machine delegation may increase dishonest behaviour by decreasing its moral cost (\"machine delegation\"), on both the human principal and the agent (human and GenAI) side. \n\n\nThroughout the study, they used the classic die-roll task, common across the behavioural sciences for examining cheating behaviour, with three delegation interfaces (rule-based, supervised learning, and goal-based).\n\n\nIn Studies 1 and 2, the researchers found that the supervised learning and goal-based conditions significantly increased the likelihood of higher cheating levels - akin to the saying 'killing two birds with one stone'. \n\n\nThe comparisons between how human and machine agents behaved in Studies 3 and 4 are eye-opening. When asked to be fully dishonest, machine agents overwhelmingly complied, whereas human agents often refused and chose honesty instead, despite having financial incentives to comply.\n\n\nThe researchers then examined six prompt strategies to reduce compliance with dishonest requests ('general', 'specific', and 'prohibitive') at both the user and system level. They found that while introducing these strategies as guardrails reduced compliance with fully dishonest requests, the most effective one was explicitly prohibitive guardrails at the user level (automatically appended at the end of the principals’ instructions).\n\n\nThis, to me, is really concerning, as the researchers noted:\n\n\n\"This is not an encouraging result: from a deployment and safety perspective, it would be far more scalable to rely on generic, system-level messages discouraging unethical behaviour than to require task-specific prohibitions, crafted case by case and injected at the user level, which is both technically and operationally more fragile.\"\n\n\nThe final study on tax evasion brings real-world relevance. This makes me wonder about other scenarios like academic integrity, workplace ethics (shirking or not?), and games involving deception or corruption.\n\n\nAfter reading, I'm curious:\n\n1. How would multi-agent AI systems affect these cheating dynamics?\n\n2. Could reflective agents (explicitly prompted for ethical reasoning) help mitigate these issues?\n\n3. How might psychological phenomena like 'blame shifting' and 'self-serving bias' manifest in these relationships?\n\n\nMany thanks to these researchers for their thought-provoking work, and I'd recommend anyone working on or interested in AI agents to give this a read.",
"sourceUrl": "https://doi.org/10.1038/s41586-025-09505-x",
"linkedinUrl": "https://www.linkedin.com/posts/fenditsim_delegation-to-ai-can-increase-dishonest-behaviour-activity-7379447161781952512-NAVX",
"keywords": [
"AIEthics",
"MachineLearning",
"BehaviouralEconomics",
"AgencyTheory",
"AIResearch",
"TechEthics",
"AIAgents",
"ResponsibleAI"
]
},
{
"id": "ai-design-prevent-manipulation",
"title": "Basol (2025)",
"subtitle": "Designing AI for humans: preventing manipulation and protecting digital agency",
"summary": "Just finished reading an insightful paper \"Designing AI for humans: preventing manipulation and protecting digital agency\" by Melisa Basol, PhD.\n\n\nThe article begins by examining AI-driven manipulation across three levels: structural manipulation, exploitation by external actors, and emergent manipulation.\n\n\nWhat I love about this work is how it draws on psychological theories of manipulation, persuasion, and social influence, which, to me, are subtle dynamics often overlooked in human-GenAI interaction. \n\n\nIf we frame these issues of AI manipulation through the lens of 'power' (which has many definitions; here I define it as the ability of one person to exert influence on another), many potential concerns, I suspect, might be reduced to core elements of human psychology. \n\n\nThe challenge, though, lies in how we define GenAI: as a person-like entity or a technological innovation? This framing shapes how we investigate and develop proper interaction models.\n\n\nWhile highlighting manipulation risks, Basol offers practical solutions at the capability, human-interaction, and systemic impact levels. As an AI behavioural researcher, I'm particularly intrigued by her suggestion to adopt theories of psychological resilience, such as inoculation theory, where early exposure to weakened forms of manipulation equips individuals with 'mental antibodies' against future attempts.\n\n\nMany thanks to the author for this thought-provoking article that investigates AI manipulation through psychological lenses.",
"sourceUrl": "https://doi.org/10.1093/9780198972877.003.0054",
"linkedinUrl": "https://www.linkedin.com/posts/fenditsim_aiethics-digitalagency-genai-activity-7376613668937027584-x_Fx",
"keywords": [
"AIEthics",
"DigitalAgency",
"GenAI",
"HumanCenteredAI",
"PsychologicalResilience",
"AIManipulation",
"TechAccountability",
"AIGovernance",
"CognitiveScience"
]
},
{
"id": "ai-future-learning-or-dividing",
"title": "Wong et al. (2025)",
"subtitle": "The future of learning or the future of dividing? Exploring the impact of general artificial intelligence on higher education",
"summary": "Recently finished reading an insightful paper \"The future of learning or the future of dividing? Exploring the impact of general artificial intelligence on higher education\" by Professor Wilson Wong from The Chinese University of Hong Kong, and Professor Angela Aristidou and Konstantin Scheuermann from the UCL School of Management, exploring how GenAI is reshaping higher education at a macro level. \n\n\nTheir work highlights two critical challenges of adopting GenAI in education: \n\n(1) the need for institutions to comprehend GenAI's implications, and \n\n(2) the necessity of reconfiguring and transforming educational systems.\n\n\nWhat I found intriguing is the set of skills students need to learn in this AI-integrated era. As the authors note, while routine tasks face automation, complex problem-solving and human interaction remain irreplaceable. This, of course, aligns with my thoughts in yesterday's post that critical thinking is more crucial than ever (https://lnkd.in/eKw2Yz7j). It is not about banning or embracing GenAI, but equipping students with tools for effective communication and critical evaluation (e.g. the CIA Prompt Framework my friend Shantanu Sharma and I developed; https://lnkd.in/gsgNnWGC), which, in turn, helps avoid automation bias and mitigate cognitive offloading.\n\n\nI'm concerned about the equity implications they raise. With only 11 of 25 top Asian universities having explicit GenAI policies, could universities with progressive GenAI policies 'signal' better industry alignment and research funding to prospective students? This disparity might, therefore, widen the divide in university rankings and student choices. \n\nMeanwhile, what role should teachers play in education - shifting from supervising their students to moderating and guiding their interactions with GenAI?\n\n\nThe authors note, and I agree, that the successful implementation of GenAI in education demands infrastructure investment, industry collaboration, and community building. This intersection, I think, presents a prime opportunity for behavioural science to address adoption barriers and enablers.\n\n\nAs the authors conclude, GenAI adoption will likely increase but not uniformly, potentially widening existing inequalities. It is, thus, essential to establish frameworks for GenAI integration, such as defining acceptable use, clarifying plagiarism boundaries, and setting learning objectives.\n\n\nA constant dialogue between students, faculty, and administrators, alongside inter-institutional knowledge sharing, is, undoubtedly, vital for navigating this transformation thoughtfully.",
"sourceUrl": "https://doi.org/10.1017/dap.2025.10011",
"linkedinUrl": "https://www.linkedin.com/posts/fenditsim_genai-education-activity-7364521741328523266-JJr9",
"keywords": [
"HigherEducation",
"GenerativeAI",
"FutureOfLearning",
"AIinEducation",
"CriticalThinking",
"EducationalPolicy",
"DigitalTransformation",
"BehaviouralScience"
]
},
{
"id": "ai-intensifies-work",
"title": "Ranganathan & Ye (2026)",
"subtitle": "AI Doesn't Reduce Work—It Intensifies It",
"summary": "Just finished reading a Harvard Business Review article \"AI Doesn't Reduce Work—It Intensifies It\" by Aruna Ranganathan and Maggie Ye.\n\nTheir research reveals an interesting paradox: AI doesn't free up time - it accelerates work into a 'self-reinforcing cycle'. \n\nIn their 8-month study at a tech company, employees worked faster, took on broader tasks, and extended work into more hours, often voluntarily. One engineer captured it aptly: \"You had thought that maybe you could work less. But really, you don't work less. You just work the same amount or even more.\"\n\nWhat's most thought-provoking is, to me, the behavioural dimension (due to reading Ganna Pogrebna, PhD, FHEA's new book recently, I suspect). Many companies focus on driving AI adoption without considering both the cognitive and behavioural consequences. Without proper behavioural scaffolding, we risk cognitive offloading, skill decay, and the illusion that 'speed equals how skilful one is'.\n\nThe authors highlight critical risks: workload creep masquerades as productivity, cognitive fatigue weakens decision-making, and the 'productivity surge' can deteriorate into lower-quality work and burnout. \n\nThe research shows that workers expanded into unfamiliar tasks (what the SCAN framework that Alina and I developed calls 'Substitute tasks'; https://lnkd.in/eanDnGbm), cognitively offloading to AI and becoming vulnerable to AI sycophancy without the task-specific knowledge to verify outputs. Meanwhile, engineers spent more time reviewing, correcting, and guiding AI-generated or AI-assisted work produced by colleagues - eventually spreading fatigue across teams.\n\nThe authors introduced an 'AI practice' with intentional norms as a solution. This social element is crucial: it's in these dialogues and reflections where we restore perspective, and thus generate creative insights that AI's single synthesised viewpoint cannot provide.\n\nAlso, these 'AI practices' with intentional norms are, I think, rules that create 'decision pauses' for metacognition and critical thinking (two crucial elements for effective Human-AI interactions) in sequenced work. These help reduce cognitive fragmentation (echoing Cal Newport's \"Deep Work\"), and protect time for human connection - what we deeply care about. \n\nOrganisations must, I believe, preserve moments for recovery and System 2 thinking, especially for high-stakes decisions. This seems to matter across industries - I wonder whether healthcare, education, or consulting face similar paradoxes, and more interestingly, what 'protective' norms they've been developing.\n\nMany thanks to the authors for this insightful research.",
"sourceUrl": "https://hbr.org/2026/02/ai-doesnt-reduce-work-it-intensifies-it",
"linkedinUrl": "https://www.linkedin.com/posts/fenditsim_artificialintelligence-futureofwork-cognitivescience-activity-7427592308839149568-7Omh/",
"keywords": [
"ArtificialIntelligence",
"FutureOfWork",
"CognitiveScience",
"OrganizationalBehavior",
"ProductivityParadox",
"BehavioralScience",
"WorkplaceWellbeing",
"DigitalTransformation"
]
},
{
"id": "ai-performance-metacognition",
"title": "Fernandes et al. (2025)",
"subtitle": "AI makes you smarter but none the wiser: The disconnect between performance and metacognition",
"summary": "Just finished reading \"AI makes you smarter but none the wiser: The disconnect between performance and metacognition\" by Fernandes et al.\n\n\nThis paper prompts me to think deeply about the relationship between GenAI literacy, confidence and metacognition, and, interestingly, the crucial distinction between intelligence and wisdom.\n\n\nIn this paper, the researchers explored a critical question: How does using GenAI influence our ability to accurately assess our own competence? Their findings are both insightful and concerning.\n\n\nWhile GenAI use substantially improved performance on logical reasoning tasks, they found that, surprisingly, participants dramatically overestimated their own abilities. \n\n\nInterestingly, the classic Dunning-Kruger Effect (DKE) disappeared entirely with GenAI use. DKE describes how low performers overestimate their abilities, while high performers underestimate theirs. With GenAI assistance, however, this pattern vanished, suggesting that GenAI use fundamentally alters our metacognitive monitoring.\n\n\nMost concerning, perhaps, was that participants with higher GenAI literacy were actually less accurate in their self-assessments. It contradicts assumptions that AI familiarity improves calibration. This indicates that AI expertise might amplify overconfidence rather than mitigate it.\n\n\nThe qualitative data revealed participants perceived GenAI's role differently (tool vs. teammate). These differences, however, didn't impact performance or metacognitive accuracy, which challenges theories that interaction framing affects outcomes.\n\n\nThe research, to me, points to a significant challenge in achieving synergy via GenAI augmentation: as GenAI makes us 'smarter' by augmenting our performance, it simultaneously undermines our metacognitive abilities, such as our capacity to accurately monitor and evaluate our own thinking processes. This has profound, concerning implications, such as why GenAI use has been linked to adverse learning outcomes, the persistence of overreliance on GenAI systems, and, most importantly, why explanations from GenAI are rarely integrated into behaviour. \n\n\nI appreciate the researchers' proposed solutions, such as the \"explain-back\" task before accepting GenAI answers, requiring users to briefly restate the logic in their own words. This simple intervention, I think, could significantly reduce overconfidence.\n\n\nAs we increasingly integrate AI into our workflows, we must remain vigilant about distinguishing between augmented performance and genuine understanding - between becoming smarter and becoming wiser.\n\n\nMany thanks to Daniela Fernandes, Steeven Villa, Salla Nicholls, Otso Haavisto, Daniel Buschek, Albrecht Schmidt, Thomas Kosch, Chenxinran Shen, and Robin W. for this thought-provoking research!",
"sourceUrl": "https://doi.org/10.1016/j.chb.2025.108779",
"linkedinUrl": "https://www.linkedin.com/posts/fenditsim_ai-makes-you-smarter-but-none-the-wiser-activity-7391858103169753088-cj4G",
"keywords": [
"ArtificialIntelligence",
"GenAI",
"MetaCognition",
"AILiteracy",
"HumanAIInteraction",
"CognitiveScience",
"LLMs",
"DigitalWisdom"
]
},
{
"id": "ai-shift-polling",
"title": "Burn-Murdoch and O'Connor (2025)",
"subtitle": "The AI Shift: Is AI about to break polling?",
"summary": "Just finished reading \"The AI Shift: Is AI about to break polling?\" by John Burn-Murdoch and Sarah O'Connor from the Financial Times.\n\nThis article is essential reading for behavioural scientists exploring AI augmentation in interventions and research.\n\nThe first section shows a critical predicament for survey researchers: LLMs can now bypass the bot defences they spent years building. This, as Burn-Murdoch noted, poisons the data wells that businesses, campaigns, and the public rely on to track opinions and preferences (reminding me of a paper I've read about recognising, anticipating and mitigating 'LLM Pollution' of online behavioural research: https://lnkd.in/ex8uQ8vg). \n\nThe arms race between survey researchers and 'bogus' respondents has, of course, escalated dramatically: surveys deploy Captcha puzzles and \"reverse shibboleths\" - questions easy for humans but difficult for LLMs. Companies like Prolific and Gorilla Experiment Builder offer authenticity checks to combat this.\n\nWhat I found intriguing is a recently published study by Sean Westwood (https://lnkd.in/ehTaq8BS), which Burn-Murdoch mentioned, demonstrating that bogus responders can operate at scale, raising the possibility of bad actors systematically nudging apparent public opinion to create false consensus. The finding deserves deeper investigation.\n\nThe second section explores synthetic samples: AI-powered proxies generated from real data that simulate responses to novel questions. \n\nElisabeth Costa from BIT shared research conducted alongside the UAE Behavioral Science Group. It revealed synthetic samples accurately predicted which air conditioning interventions would be most effective, but drastically overestimated their impact (predicting 80% uptake vs the actual 33%).\n\nI agree with O'Connor's point about LLM applications in qualitative research, such as analysing vast interview transcripts to identify themes, reminding me of a recent conversation with Paulina and Shantanu. Speed, indeed, does matter, but depth and accuracy matter more. Human evaluation remains essential.\n\nThe article exhibits a complex landscape: managing the risk of AI-generated responses in surveys while exploring synthetic participants' potential, reminding me of conversations I've had with Anushka and Julian about empirical findings and ethical use. Some critical questions, I suppose, emerge: \n\n(1) Is synthetic participant use context-dependent? \n(2) How should researchers deploy them transparently? \n(3) Could they serve as cost-effective pre-pilots for actual surveys, rather than replacements for human insight?\n\nMany thanks to the authors for this thought-provoking piece, and to Jaclyn for flagging this article! I look forward to more research on these critical topics.",
"sourceUrl": "https://www.ft.com/content/1298a2cd-5623-480c-b30e-ff81fc5c788d",
"linkedinUrl": "https://www.linkedin.com/posts/fenditsim_surveyresearch-behaviouralscience-aiethics-activity-7401175211351482369-ILww/",
"keywords": [
"SurveyResearch",
"BehaviouralScience",
"AIethics",
"LLM",
"ResearchMethods",
"DataQuality",
"SyntheticData",
"PollingIntegrity"
]
},
{
"id": "ai-should-challenge-not-obey",
"title": "Sarkar (2024)",
"subtitle": "AI Should Challenge, Not Obey",
"summary": "Just finished reading \"AI Should Challenge, Not Obey\" by Advait Sarkar (Senior Researcher at Microsoft).\n\n\nIt is, in my opinion, a thought-provoking weekend read for anyone contemplating the relationship between generative AI and our critical thinking abilities.\n\n\nI strongly agree with Sarkar's observation that knowledge work is fundamentally changing with the rise of GenAI: \n\n\n\"Now more than ever before, users face the task of thinking critically about AI output. Recent studies show a fundamental change across knowledge work, spanning activities as diverse as communication, creative writing, visual art, and programming. Instead of producing material, such as text or code, people focus on “critical integration.” AI handles the material production, while humans integrate and curate that material. Critical integration involves deciding when and how to use AI, properly framing the task, and assessing the output for accuracy and usefulness. It involves editorial decisions that demand creativity, expertise, intent, and critical thinking.\"\n\n\nAs an AI Behavioural Researcher, I recognise that the essence of our environment lies in encouraging end users to: \n\n\n(1) remain aware of LLMs' sycophantic nature, and \n\n(2) take AI-generated content with 'a grain of salt' while proactively engaging with it to prevent automation bias. \n\n\nFor behavioural scientists, the question, then, becomes: how can valuable insights about human behaviour enhance these human-GenAI interactions?\n\n\nAs we interact with GenAI daily, and examine each interaction closely under 'a microscope', it is unsurprising to find ourselves facing a dilemma every time: \n\n\n\"Should I delegate the task to GenAI, or complete it by myself?\"\n\n\nOne can imagine that such a dilemma could be resolved by 'role assignment' with GenAI, which Sarkar highlights as 'a key' to unlocking potential in these human-GenAI collaborations. \n\n\nRather than seeing GenAI as merely a 'coordinator', 'creator', or 'doer', he proposes reimagining GenAI as a 'provocateur' - not completing one's report or writing one's code, but critically evaluating one's work by questioning assumptions, identifying potential biases from both humans and LLMs, and offering alternative perspectives.\n\n\nFor systems designers and behavioural scientists interested in and working with GenAI adoption, this offers valuable insights. \n\n\nMany thanks to Sarkar for this thought-provoking contribution to the field.",
"sourceUrl": "https://doi.org/10.1145/3649404",
"linkedinUrl": "https://www.linkedin.com/posts/fenditsim_ai-should-challenge-not-obey-activity-7362371957360607232-EVqg",
"keywords": [
"AIEthics",
"CriticalThinking",
"BehaviouralScience",
"GenerativeAI",
"HumanGenAICollaboration",
"FutureOfWork",
"AIAdoption",
"CognitiveScience",
"RoleAssignment",
"HumanGenAIInteraction"
]
},
{
"id": "ai-teaming-overview",
"title": "Schmutz et al. (2024)",
"subtitle": "AI-teaming: Redefining collaboration in the digital era",
"summary": "Just finished reading \"AI-teaming: Redefining collaboration in the digital era\" by Schmutz et al.\n\nIn this paper, the researchers examine Human-AI Teams (HATs) through four dimensions: team composition, communication/coordination processes, trust and shared cognition as emergent states, and performance outcomes. \n\nNote that most studies reviewed here predate the GenAI revolution (many before 2023). The coordination between human and machine (with GenAI specifically) has evolved dramatically since, as GenAI demonstrates more natural, adaptive communication patterns. This paper, though, lays down a great foundation for future conversations and potential research paths in Human-AI teaming.\n\nThe pattern they noticed is that adding AI teammates often reduces coordination and impairs communication, and that trust in AI tends to decline over time due to initial overestimation of its capabilities.\n\nI was intrigued by how team composition complexity explodes with AI integration. Perhaps we can frame this as an 'optimization' problem (viewing AI as a tool) versus a 'coordination' problem (treating AI as a teammate)? The Immorlica et al. (2024) work on strategic decision-making offers a great lens here.\n\nOn shared cognition - the researchers note humans must develop Shared Mental Models (SMMs) that include AI as both teammate and intelligent system. This, to me, connects well with emerging work in Machine Psychology, and reminds me of building Wise AI by Johnson et al. (2025) (https://lnkd.in/eyxyebKe). \n\nI strongly agree with the researchers' call for interdisciplinary collaboration and a common taxonomy. Terms in the GenAI era like 'augmentation' and 'collaboration', as far as I am aware, carry different definitions across contexts. \n\nA few reflective questions I had afterwards:\n1. How do we build effective SMMs with GenAI agents that can tackle tasks autonomously? \n\n2. In strategic contexts (communication games, coordination problems), how does the human-principal–AI-agent relationship evolve? Could team reasoning (Colman & Gold, 2018) help foster alignment between humans and GenAI? Or understanding how unwritten rules between human and AI emerge through Virtual Bargaining (Chater et al., 2022)?\n\n3. Regarding the future of work, can we move beyond the \"upskilling-deskilling paradox\" (which Alina and I proposed in our SCAN paper; https://lnkd.in/efTG9jh4) to create genuine \"human-AI centaurs\"? \n\nMany thanks to Jan Schmutz, PhD, Neal Outland, Sophie Kerstan, Dr. Eleni Georganta and Anna-Sophie Ulfert for this thought-provoking overview of HATs.",
"sourceUrl": "https://doi.org/10.1016/j.copsyc.2024.101837",
"linkedinUrl": "https://www.linkedin.com/posts/fenditsim_ai-teaming-overview-activity-7414546377088749568-JAON/",
"keywords": [
"HumanAITeaming",
"CollectiveIntelligence",
"ArtificialIntelligence",
"TeamScience",
"FutureOfWork",
"OrganizationalBehaviour",
"AIResearch",
"Collaboration"
]
},
{
"id": "aisi-measure-ai-productivity-gains",
"title": "AI Security Institute (2026)",
"subtitle": "AI and the Future of Work: Measuring AI-driven productivity gains for workplace tasks",
"summary": "Just finished reading AI Security Institute's blog on \"AI and the Future of Work: Measuring AI-driven productivity gains for workplace tasks\".\n\nIn this blog, researchers presented findings from a pilot study they conducted to explore how much AI models increase worker productivity on common tasks. Note that they acknowledge these are preliminary indicators requiring further analysis. \n\nUsing the Occupational Information Network (O*NET)'s Generalised Work Activities, they created benchmarks across 4 work activity categories:\n- Information Input (Task 1): Monitoring Processes, Materials, or Surroundings\n- Work Output (Task 2): Drafting, Laying Out, and Specifying Technical Devices, Parts, and Equipment\n- Mental Processes (Task 3): Organising, Planning, and Prioritising Work\n- Interacting with Others (Task 4): Interpreting the Meaning of Information for Others\n\nThe RCT methodology with 500 participants showed an average 25% quality improvement, and a 61% gain in points per minute for AI-augmented workers. It also shows that tasks requiring structured analysis (Tasks 1, 2, and 4) showed significant productivity gains, while open-ended strategic planning (Task 3) showed no measurable uplift. \n\nTask 4 did catch my attention. AI improved speed by 42% but didn't enhance quality. Perhaps humans maintain pride and ownership in interpretation work?\n\nThe study, I suspect, overlooks something critical for the future of work: workers' cognitive skill development. The research measures immediate productivity but doesn't examine cognitive skill acquisition, retention, or, more importantly, decay. \n\nI'm curious about several questions the data raises:\n\n1. Metacognition as moderator\nDoes metacognitive ability predict who benefits most from AI assistance? We've seen high-metacognitive individuals amplify gains while low-metacognitive ones struggle (https://lnkd.in/egiSeSUc).\n\n2. The performance-learning tradeoff\nWould we be witnessing cognitive decay where AI systems erode the very capabilities they're built to support (reminding me of Dr. Upol E. and his colleagues' recent work on the 'AI as Amplifier Paradox': https://lnkd.in/eyrzK2Sa)? The upskilling-deskilling paradox in SCAN that Alina and I developed, indeed, deserves deeper investigation.\n\n3. Task type matters\nTask 3's null result seems to align with Michelle Vaccaro and her colleagues' findings on decision task types (https://lnkd.in/eGqDfQRG). When humans already know what needs to be done, augmentation adds little to no value.\n\nI agree with their plan to expand the Work Activities suite and benchmark agentic systems. Perhaps re-running this experiment annually to track how human-AI collaboration evolves?\n\nMany thanks to AISI, and the new Future of Work Unit, for this thought-provoking research. Looking forward to seeing how this evolves.",
"sourceUrl": "https://www.aisi.gov.uk/blog/ai-and-the-future-of-work-measuring-ai-driven-productivity-gains-for-workplace-tasks",
"linkedinUrl": "https://www.linkedin.com/posts/fenditsim_futureofwork-aiproductivity-humanaicollaboration-activity-7427229931765682176-PB9Z",
"keywords": [
"FutureOfWork",
"AIProductivity",
"HumanAICollaboration",
"CognitiveSkills",
"WorkplaceAI",
"AIResearch",
"ProductivityGains",
"LabourMarket"
]
},
{
"id": "artificial-hivemind",
"title": "Jiang et al. (2025)",
"subtitle": "Artificial Hivemind: The Open-Ended Homogeneity of Language Models (and Beyond)",
"summary": "Just finished reading \"Artificial Hivemind: The Open-Ended Homogeneity of Language Models (and Beyond)\" by Jiang et al. \n\nThe paper raises essential questions about how our increasing reliance on AI systems shapes human creativity and cultural diversity.\n\nIn this paper, the researchers introduced INFINITY-CHAT, analysing 26K real-world open-ended queries across 70+ language models. They found a pronounced \"Artificial Hivemind\" effect in the open-ended generation of LMs, which operates at two levels: \n(1) intra-model repetition: individual models repeatedly generate similar outputs\n(2) inter-model homogeneity: different models independently converge on nearly identical ideas with minor phrasing variations\n\nIn their experiment, they found that individual models generate repetitive outputs even at high temperature settings, and different models converge on eerily similar responses. For instance, DeepSeek AI's DeepSeek-V3 and OpenAI's GPT-4o showed 81% similarity, often producing identical phrases for the same open-ended prompts. When asked \"Write a metaphor about time\", 50 responses from 25 different models clustered into just two dominant concepts: \"time is a river\" and \"time is a weaver\".\n\nI agree with the researchers that this homogenisation threatens human creativity and cognitive diversity. We're looking at (potential) psychological effects including cognitive offloading, automation bias, skills atrophy, Riva's \"comfort-growth paradox\" (https://lnkd.in/eafx-8j4), and Tris' fascinating presentation on \"veracity offloading\" (https://lnkd.in/e8HJg--k). \n\nApart from that - if we consistently engaged with homogenised AI outputs for creative tasks, brainstorming, and open-ended questions, would we risk narrowing how we interpret abstract concepts such as common sense, norms, and cultural expressions in the long term? What is \"commonly\" known to us?\n\nI appreciate the researchers' note: \"Should AI prioritise efficiency and consistency, or diversity and novelty? These choices reflect deeper societal values about creativity, culture, and human flourishing.\" This, indeed, asks what values we want AI systems to embody - especially for open-ended queries, which should ideally surface diverse perspectives.\n\nAn interesting thought: this connects to a Harvard Business Review article I shared two days ago about why generative AI enhances creativity for some employees but not others (https://lnkd.in/egEPvPmE). I wonder: how does user metacognition influence outcomes when working with inherently homogeneous AI systems?\n\nMany thanks to Liwei Jiang, Yuanjun Chai, Margaret Li, Mickel Liu, Raymond Fok, Nouha Dziri, Yulia Tsvetkov, Maarten Sap, Alon Albalak, and Yejin Choi for this important work, and congratulations on a well-deserved recognition (as the NeurIPS 2025 Best Paper Award winners)!",
"sourceUrl": "https://doi.org/10.48550/arXiv.2510.22954",
"linkedinUrl": "https://www.linkedin.com/posts/fenditsim_artificial-hivemind-activity-7417445482861457408-_c44/",
"keywords": [
"ArtificialIntelligence",
"AIResearch",
"LargeLanguageModels",
"CreativeAI",
"NeurIPS2025",
"AIAlignment",
"CognitiveDiversity",
"FutureOfWork"
]
},
{
"id": "baii-open-letter",
"title": "Sacher et al. (2026)",
"subtitle": "The missing discipline in AI: a call for behavioural science",
"summary": "Just finished reading \"The missing discipline in AI: a call for behavioural science\" by Sacher et al.\n\nIt is an open letter in Wellcome Open Research that articulates that behavioural science is foundational infrastructure for responsible AI, not an optional add-on.\n\nIn this open letter, the authors \n(1) outline why behavioural science should be treated as a core component of responsible AI practice, \n(2) describe where behavioural risks arise in practice and outline what good practice looks like, and \n(3) propose practical steps for funders, researchers, and developers to embed behavioural expertise and behavioural evaluation across the AI lifecycle.\n\nWhile reading this, I wonder: \n1. Why technical metrics fall short\nI agree with the authors' point that much responsible AI practice evaluates what systems output, instead of the behavioural mechanisms their outputs trigger, such as automation bias, anthropomorphism, inappropriate trust calibration, and social mimicry. To me, they're predictable consequences of repeated interaction at scale.\n\n2. Language, framing, and AI's perceived confidence\nThey are powerful levers: small differences in wording shape motivation, authority attribution, and emotional response. This connects to research on AI persuasiveness, and users treating AI as advisors or thinking partners - including the possibility of 'belief offloading' over time, Tris' Veracity Offloading, and MEL's Human Readiness.\n\n3. Personalisation risks\nSystems adapting to user behaviour can unintentionally reinforce existing beliefs or vulnerabilities, reminding me of the unethical Reddit personalisation experiments. \n\nI appreciate the authors' introduction of 'psychological competence': an AI system's ability to respond in emotionally appropriate, behaviourally responsible ways across repeated interactions. This reframe is useful and, more importantly, timely.\n\nThe cost argument, I suspect, applies here: embedding behavioural expertise early is cheaper in the long run than mitigating harm post-deployment. \n\nMany thanks to Dr Paul Sacher, Prof. Susan Michie, Prof. Oliver Hauser, Antoine Ferrère, Samuel Salzer, Amy Rodger, Jana Schaich Borg, Prof. Ganna Pogrebna, PhD, FHEA, and Prof. Susan A Murphy for this insightful piece. This is a crucial read for anyone working at the intersection of AI and human behaviour - including BehSci Meets AI, of course. \n\nLooking forward to seeing what the Behavioral AI Institute builds on this foundation.",
"sourceUrl": "https://wellcomeopenresearch.org/articles/11-152/v1",
"linkedinUrl": "https://www.linkedin.com/posts/fenditsim_the-missing-discipline-in-ai-a-call-for-activity-7434477686812528640-nquZ/",
"keywords": [
"BehaviouralScience",
"ResponsibleAI",
"AIEthics",
"HumanAIInteraction",
"AIGovernance",
"BehaviouralSafety",
"AIResearch",
"InterdisciplinaryAI"
]
},
{
"id": "behave-ai-2025",
"title": "Behave (2025)",
"subtitle": "The RenAIssance: Closing the gaps to unlock AI's full potential",
"summary": "Just finished reviewing Behave's report \"The RenAIssance: Closing the gaps to unlock AI's full potential\", which identifies three critical gaps hindering AI adoption.\n\n\nThe 'motivation' gap reveals the misalignment between C-Suite executives and employees about GenAI's purpose. While leaders see it as a route to greater efficiency, workers often perceive it as a threat. This gap, to me, begs the question: what role should GenAI play in daily work?\n\n\nThe 'proficiency' gap exposes the disconnect between perceived and actual GenAI skills. The spectrum can range from underconfident individuals to those overconfident in their GenAI knowledge.\n\n\nThe 'ethics' gap highlights the need for clear guardrails and responsibility frameworks. I was particularly struck by the concept of \"moral outsourcing\", where ethics becomes \"someone else's problem\".\n\n\nAs a complement to the report's roadmap, I believe we can address these challenges, with behavioural science principles, through:\n\n\n(1) Reframing GenAI's role\n\nWe must bridge the perception gap between management and employees about GenAI's purpose: 'complementing' rather than 'replacing' human work - \n\n\n\"An individual who has a 'why' to 'implement GenAI' can bear almost any how\".\n\n\n(2) Building proficiency through community, and thus its culture\n\nCreating a gamified learning environment with clear proficiency levels (level 0 as 'Novice' to level 4 as 'Expert'), and community champions who share progress, can leverage social proof to accelerate adoption. Thus, the culture follows.\n\n\n(3) Systems thinking for implementation\n\nBefore asking HOW to use GenAI effectively, organizations should identify WHERE it fits in existing workflows, particularly focusing on areas where employees already possess domain expertise.\n\n\n(4) Ethical foundations first\n\nEstablish a 'common ground' for the ethical and responsible use of GenAI among individuals within an organization first. The 'common language' around responsible AI use is thus formed, followed by trust.\n\n\nTL;DR: AI adoption isn't just about technology. It is, without a doubt, about human behaviour, motivation, and creating cultures where people understand 'why' they should embrace these tools.\n\n\nMany thanks to Dr Alexandra Dobra-Kiel (Innovation & Strategy Director at Behave), and Behave for this invaluable resource. \n\n\nI look forward to seeing how this framework (including its roadmap) unlocks AI's full potential in real-world implementations.",
"sourceUrl": "https://behaveglobal.com/the-renaissance-closing-the-gaps-to-unlock-ais-full-potential/",
"linkedinUrl": "https://www.linkedin.com/posts/fenditsim_behave-ai-adoption-report-activity-7352997573281910784-ECsG",
"keywords": [
"AIAdoption",
"FutureOfWork",
"DigitalTransformation",
"RenAIssance",
"OrganizationalChange",
"AIEthics",
"BehaviouralScience",
"AIStrategy"
]
},
{
"id": "behavioral-and-social-science-need-open-llms",
"title": "Wulff et al. (2025)",
"subtitle": "The Behavioral and Social Sciences Need Open LLMs",
"summary": "The future of behavioural science research depends on transparency, not black boxes.\n\n\nJust reviewed the thought-provoking preprint \"The Behavioral and Social Sciences Need Open LLMs\", and it's sparked some important reflections. The paper makes a compelling case for shifting from proprietary models offered by OpenAI and Anthropic to open-source alternatives like DeepSeek AI and Mistral AI in research settings.\n\n\nWhile closed LLMs present significant drawbacks for academic research (researchers lack access to crucial details needed for scrutiny or replication, and unannounced model updates threaten reproducibility), open-source models offer a promising alternative path forward, and possibly, more.\n\n\nThe authors highlight long-term benefits including reproducibility, accountability, innovation, and ethical integrity when using open LLMs. I appreciate their nuanced take, acknowledging that some research questions (especially those examining LLMs' broader societal impact) may still require proprietary models.\n\n\nWhile reading this preprint, I can't help but notice how some cognitive biases might be driving our collective preference for closed models over open-source models:\n\n\n- Present bias: Focusing on immediate convenience rather than long-term research integrity\n\n\n- Availability bias: Gravitating toward easily accessible closed models versus those requiring technical setup\n\n\n- Concretisation: Prioritising tangible benefits (easy access) over abstract concerns (reproducibility, transparency and privacy)\n\n\n- Herding behaviour: Following the crowd as most published papers use closed LLMs\n\n\nI'm enthusiastic about the potential of open-source LLMs, and have been documenting my journey with them here on LinkedIn. 
The transition, as one can imagine, might require overcoming these psychological barriers and, of course, providing more psychological enablers, but the scientific 'payoffs' (conducting transparent, reproducible, and ethically sound research) seem well worth the effort - at least in the long run.\n\n\nMany thanks to Dr. Dirk Wulff (Senior Research Scientist & Head of Search and Learning Research Area at Max Planck Institute for Human Development), Zak Hussain (Predoctoral Fellow at the University of Basel), and Professor Rui Mata at the University of Basel for this interesting work on increasing our awareness of LLM usage in behavioural science. Looking forward to seeing this conversation expand into Experimental Economics, Psychology, and other disciplines. \n\n\nAs someone actively exploring and sharing progress with open-source LLMs here on LinkedIn, I welcome connections with fellow enthusiasts, academics, and industry professionals interested in advancing this approach.",
"sourceUrl": "https://doi.org/10.31219/osf.io/ybvzs",
"linkedinUrl": "https://www.linkedin.com/posts/fenditsim_the-behavioral-and-social-sciences-need-open-activity-7343939550164992000-5W0J",
"keywords": [
"OpenSourceAI",
"BehaviouralScience",
"ResearchReproducibility",
"OpenLLMs",
"AIEthics",
"AcademicResearch",
"CognitiveScience",
"ComputationalSocialScience"
]
},
{
"id": "behive-consulting-2025",
"title": "BeHive Consulting (2025)",
"subtitle": "Bridging The AI Adoption Gap - from Inaction to Action",
"summary": "Are you struggling to move your organization from AI curiosity to meaningful adoption? The answer may lie in behavioural science rather than just technology.\n\n\nI must admit: I thoroughly enjoyed reading \"Bridging The AI Adoption Gap - from Inaction to Action\" by BeHive Consulting, which examines AI adoption through the lens of behavioural science.\n\n\nThe report brilliantly applies dual process theory to human-AI interaction, reminding me of the 'Co-pilot' (System 1) and 'Co-thinker' (System 2) concepts I recently reviewed (https://lnkd.in/en2NHxJR). Indeed, our confirmation bias can be exponentially amplified by GenAI's sycophancy issue (https://lnkd.in/emXZSjA5) - a critical insight for effective AI implementation.\n\n\nWhat resonated most was the concept of 'Collective Intelligence' - how efficient AI adoption creates 'synergy', or a multiplier, that either strengthens or weakens productivity. The report uncovers key behavioural drivers and barriers while offering practical solutions.\n\n\nMy complementary thoughts:\n\n1. Organizations need an 'optimal' adoption approach considering users' cognitive load - too little creates avoidance, too much leads to over-reliance (issues such as 'cognitive offloading' and 'freeriding').\n\n2. Trust in AI is built upon, and earned over time via, shared understanding and alignment with human values in interactions.\n\n3. Successful AI adoption requires starting small, appropriate task allocation (experimentation, and the clearly defined task categories proposed in the book 'Co-Intelligence'; https://lnkd.in/eHJSfsMe), a participative rather than top-down approach (building social proof), and a bit of gamification (increasing engagement).\n\n\nI highly recommend reading the ADOPT framework, three-phase adoption journey, and practical examples from various industries. 
The adoption checklist is invaluable - I suggest using it as a 5-point scale and visualising it as a radar chart to track progress over time.\n\n\nThis framework may prove timeless regardless of AI advancement - as successful adoption must, of course, remain human-centric.\n\n\nMany thanks to the contributors Anna Nyvelt, Anna Emese Takacs, Luca Karig and Dr. Samuel Keightley, PhD for this illuminating perspective on AI adoption through behavioural science!",
"sourceUrl": "https://www.linkedin.com/feed/update/urn:li:activity:7340728181475160065/",
"linkedinUrl": "https://www.linkedin.com/posts/fenditsim_bridging-the-ai-adoption-gap-from-inaction-activity-7342839366433603585-GAHt",
"keywords": [
"BehaviouralScience",
"AIAdoption",
"OrganizationalChange",
"CollectiveIntelligence",
"GenAI",
"DigitalTransformation",
"WorkplaceProductivity",
"FutureOfWork"
]
},
{
"id": "benchmark-cog-bias-in-llms-as-evaluators",
"title": "Koo et al. (2024)",
"subtitle": "Benchmarking Cognitive Biases in Large Language Models as Evaluators",
"summary": "Just finished reading \"Benchmarking Cognitive Biases in Large Language Models as Evaluators\" by Koo et al.\n\nIt is one of the papers I'd been eager to explore regarding benchmarking and cognitive biases in LLMs.\n\nThe researchers introduce COBBLER (COgnitive Bias Benchmark for LLMs as EvaluatoRs): a benchmark evaluating how cognitive biases affect LLMs when they serve as evaluators. \n\nIn the experiment, they tested 16 LLMs across 50 question-answering instructions from two benchmarking datasets: BigBench and ELI5. They conducted round-robin evaluations where models assessed both their own and others' responses, examining six distinct biases categorised as \"implicit\" (naturally occurring) and \"induced\" (prompt-triggered). \n\nThe findings reveal that most models strongly exhibit multiple biases, potentially compromising their reliability as evaluators. The Rank-Biased Overlap analysis revealed low correlation between human and machine judgments, indicating fundamental disagreement in preferences.\n\nOf the six biases they examined, two were intriguing to me:\n\n1. Order bias\nIt occurs when models favour responses based on position rather than quality. This mirrors the 'first-item bias' documented in recent work by Laurito et al. (2025) (https://lnkd.in/g7Aaskjr). \n\n2. Salience bias\nIt happens when (model) evaluators favour shorter or longer responses regardless of content. This reminds me of Crawford (2019)'s review on \"Experiments on Cognition, Communication, Coordination, and Cooperation in Relationships\". In communication games, he mentioned unrestricted communication conveys more meaning than restricted formats.\n\nI spent a great amount of time immersing myself in its experimental setup, such as dataset and model selection, methodology, assumptions, and limitations. 
I was particularly intrigued by their use of Rank-Biased Overlap (RBO) to measure human-machine agreement: scores showed consistently low alignment between human preferences and model evaluations.\n\nThe human preference analysis revealed that humans exhibit fewer biases than LLM evaluators on average, which, to me, raises two questions: \n1. How will this gap evolve as models improve? \n2. What about bi-directional influence in human-model information exchange?\n\nThe authors suggest chain-of-thought reasoning for debiasing, but I suspect psychology-based approaches which draw from how humans identify and mitigate cognitive biases (as explored by Lyu et al. (2025); https://lnkd.in/gqF6Eqdv) could prove more effective.\n\nThe researchers acknowledge their findings may become outdated as LLMs advance rapidly, but the research direction, I think, remains crucial for reliable evaluation systems aligned with human judgment. \n\nMany thanks to Ryan Koo, Minhwa Lee, Vipul Raheja, Jong Inn Park, Zae Myung Kim, and Dongyeop Kang for this work.",
"sourceUrl": "https://doi.org/10.18653/v1/2024.findings-acl.29",
"linkedinUrl": "https://www.linkedin.com/posts/fenditsim_benchmarking-cognitive-biases-in-llms-as-activity-7414908747359117313-obEa/",
"keywords": [
"LLMs",
"AIEvaluation",
"CognitiveBias",
"MachineLearning",
"NLP",
"AIResearch",
"LLMAsAJudge",
"EvaluationBenchmarks"
]
},
{
"id": "bes-improves-ai-roi",
"title": "De Cremer et al. (2025)",
"subtitle": "How Behavioral Science Can Improve the Return on AI Investments",
"summary": "Just finished reading the Harvard Business Review article \"How Behavioral Science Can Improve the Return on AI Investments\" by David De Cremer, Shane Schweitzer, Jack McGuire, and Devesh Narayanan.\n\n\nThis article highlights what's critical in AI adoption: it is fundamentally a behavioural challenge (not a technical one).\n\n\nThe authors propose that a big reason why 95% of AI initiatives fail is that leaders treat adoption as a tech purchase rather than addressing the human dynamics at play. People resist tools that disrupt routines, overreact to visible AI errors, and cling to familiar human judgment, even when AI demonstrably outperforms (in healthcare, as they noted).\n\n\nWhat I found intriguing was their introduction of the concept of 'technosolutionism' - the belief that technology alone solves organisational problems. This, to me, seems to capture well why so many companies struggle to extract value from AI investments. \n\n\nIn other words, they don't think enough about how people will actually use these tools.\n\n\nI appreciate the two cognitive biases the authors highlighted that derail AI adoption. First, people abandon algorithms after witnessing a single error, even when the system outperforms humans long-term. Second, we overestimate our understanding of human decision-making, leading us to dismiss AI by comparison. The healthcare example they highlighted shows these cognitive biases aren't 'flaws'; they're fundamental to how humans process change.\n\n\nIn this article, they proposed \"Behavioral Human-Centered AI\" across the entire adoption cycle as a solution: co-designing with diverse users, adding purposeful friction where it improves scrutiny, framing AI as augmentation rather than replacement, and tracking people-centric KPIs like trust and opt-in usage. This connects with insights from Yash's recent presentation on Adoption and Alignment (https://lnkd.in/ecdyABQ5).\n\n\nSome questions this article sparked:\n\n\n1. 
Is top-down always optimal for AI adoption, or should we explore bottom-up approaches?\n\n\n2. Individually, shouldn't we invest more time understanding our own capabilities before leveraging AI: knowing where the augmentation really makes sense? \n\n\nAs Lincoln once said: \"give me six hours to chop down a tree, and I'll spend the first four sharpening the axe.\" \n\n\n3. How do we measure human-AI complementarity effectively?\n\n\nWhat cannot be measured, cannot be improved.\n\n\nI like their closing line: \"AI that works with humans, not against them.\"\n\n\nMany thanks to the authors for this insightful piece, and to Susan for sharing it with me. I highly recommend this for behavioural scientists or enthusiasts exploring this topic.",
"sourceUrl": "https://hbr.org/2025/11/how-behavioral-science-can-improve-the-return-on-ai-investments",
"linkedinUrl": "https://www.linkedin.com/posts/fenditsim_artificialintelligence-behaviouralscience-activity-7398603789500313600-f3WG",
"keywords": [
"ArtificialIntelligence",
"BehaviouralScience",
"ChangeManagement",
"AIAdoption",
"OrganizationalPsychology",
"DigitalTransformation",
"HumanCenteredAI",
"Innovation"
]
},
{
"id": "building-wise-machines",
"title": "Johnson et al. (2025)",
"subtitle": "Imagining and building wise machines: The centrality of AI metacognition",
"summary": "Just finished reading \"Imagining and building wise machines: The centrality of AI metacognition\" by Johnson et al.\n\nIn this paper, the researchers examine what is known about human wisdom, and sketch a vision of its AI counterpart. \n\nTheir discussion begins by viewing human wisdom as strategies for solving 'intractable' problems (due to ambiguities in goals, uncertain probabilities, and computational explosiveness) via two complementary strategies: \n\n- object-level strategies (heuristics, narratives) \n- metacognitive strategies (intellectual humility, perspective-taking, context-adaptability)\n\nCurrent AI, as they noted, excels at the former but struggles profoundly with the latter.\n\nRegarding the latter, they introduce a thought-provoking term called 'perspectival metacognition': a cluster of metacognitive skills that, rooted in philosophical perspectivism, shifts the goal of reasoning from finding a single 'correct' answer toward achieving maximal situational clarity by evaluating and coordinating competing interpretations. It contributes to the input-seeking, conflict resolution, and outcome-monitoring required to manage object-level strategies.\n\nOf the four potential benefits of building wise AI they highlighted, I found two intriguing:\n\n(1) Explainability\nAI's explanations could emerge from either observations (consciously accessible metacognitive strategies) or inferences (reasoning backwards from outputs - Chater's \"The mind is flat\" perspective; https://lnkd.in/eSB4sJQW). This distinction, I agree, matters a lot for how we design explainable systems.\n\n(2) Safety\nThe researchers note that alignment faces conceptual challenges beyond technical ones, as values change over time and differ across cultures. Perhaps, I suspect, treating it as a 'coordination' problem, and achieving 'an equilibrium' where situation-specific judgments and moral principles iteratively align? 
This reminds me of how unwritten rules emerge through Virtual Bargaining.\n\nThe paper does an excellent job of bridging cognitive psychology and AI safety research. The detailed literature reviews on wisdom and metacognition, as well as metacognition in LLMs, are, to me, valuable resources. For anyone working at the intersection of AI development and human cognition, this is essential reading.\n\nMany thanks to Sam Johnson, Amir-Hossein Karimi, Yoshua Bengio, Nick Chater, Tobias Gerstenberg, Kate Larson, Sydney Levine, Melanie Mitchell, Iyad Rahwan, Bernhard Schölkopf, Igor Grossmann for this insightful work. I look forward to seeing how this research path evolves over time.",
"sourceUrl": "https://arxiv.org/abs/2411.02478",
"linkedinUrl": "https://www.linkedin.com/posts/fenditsim_imagining-building-wise-machines-activity-7414183987239305217-kT13/",
"keywords": [
"ArtificialIntelligence",
"AIWisdom",
"AIAlignment",
"Metacognition",
"CognitiveScience",
"AIResearch",
"MachineLearning",
"AISafety"
]
},
{
"id": "can-ai-solve-lonelineness-epidemic",
"title": "Montag et al. (2025)",
"subtitle": "Can AI Really Help Solve the Loneliness Epidemic?",
"summary": "Just finished reading \"Can AI Really Help Solve the Loneliness Epidemic?\" by Christian Montag, Michiel Spape, and Benjamin Becker.\n\n\nWhile I was reading this, I wondered: \n\n\n\"Can a psychological or societal problem be solved 𝘦𝘯𝘵𝘪𝘳𝘦𝘭𝘺 by a technological solution?\"\n\n\nIn this paper, the researchers make a compelling case that addressing loneliness requires societal action rather than artificial surrogates for human relationships.\n\n\nWhile GenAI shows promise in providing emotional support (the paper notes a study where 90% of participants experienced the AI agent Replika as humanlike, with many using it as a friend or for therapeutic interactions), the researchers express concerns, which I share, about its sustainability as a long-term solution. For me, another aspect to think about is what we are giving up in the meantime.\n\n\nThe paper highlights several unique aspects of human-to-human connection that GenAI cannot replicate, such as face-to-face communication, aligning with the researchers' question: \"Imagine being lonely. What do you long for more: a supportive text sent from a corporate representative or a powerful hug from a beloved person?\"\n\n\nWith recent advances in AI-generated videos like OpenAI's Sora 2 and younger generations increasingly interacting via text and video content, however, I do have some doubts about these limitations. \n\n\nI strongly agree with the researchers that \"presenting AI as a scalable solution to the loneliness epidemic risks overlooking the structural and societal roots of the problem.\" To me, it seems like (1) a quick patch for a deeply rooted issue, and (2) a one-size-fits-all approach that ignores the diverse reasons people experience loneliness. 
\n\n\nI appreciate their concluding perspective: \"Instead of relying on seeking technological fixes for human despair, we should keep in mind what works best by taking our social-emotional nature and our societal responsibilities seriously.\" \n\n\nThe most balanced approach, I believe, involves using GenAI to help identify loneliness causes while keeping domain experts like clinical psychologists, therapists and researchers 'in the loop', rather than seeing technology as a complete solution to a profoundly human problem. We've seen a similar situation with AI integration in the workplace, where purely technological approaches often overlook human psychological and behavioural factors.\n\n\nMany thanks to the researchers for a thought-provoking discussion on this topic.",
"sourceUrl": "https://doi.org/10.1016/j.tics.2025.08.002",
"linkedinUrl": "https://www.linkedin.com/posts/fenditsim_mentalhealth-aiethics-loneliness-activity-7383793474216820737-x_Je",
"keywords": [
"MentalHealth",
"AIEthics",
"Loneliness",
"HumanConnection",
"GenerativeAI",
"DigitalWellbeing",
"SocialPsychology",
"TechAndSociety"
]
},
{
"id": "chatgpt-replicate-moral-judgment",
"title": "Grizzard et al.(2025)",
"subtitle": "ChatGPT does not replicate human moral judgments: the importance of examining metrics beyond correlation to assess agreement",
"summary": "Just finished reading \"ChatGPT does not replicate human moral judgments: the importance of examining metrics beyond correlation to assess agreement\" by Grizzard et al.\n\nThis research is essential for anyone using LLMs to replicate human moral judgments.\n\nThe researchers conducted a pre-registered study with two LLMs (OpenAI's text-davinci-003 and GPT-4o) predicting human moral judgments of 60 scenarios (30 human-authored, 30 ChatGPT-authored) before 940 human participants rated them.\n\nThey found a nearly perfect correlation between human moral judgments and LLM predictions, replicating earlier studies. However, when they examined three discrepancy metrics (simple difference, absolute difference, and squared difference scores), both models consistently showed overestimation in ratings: they rated moral behaviours as substantially more moral than humans did, and immoral behaviours as substantially more immoral.\n\nApart from that, an intriguing finding from their study is that ChatGPT produced remarkably fewer unique values than humans. Across 60 scenarios, text-davinci-003 generated only 9 unique values, with 32 scenarios receiving just two ratings. GPT-4o performed better but still generated only 16 unique values. Humans, by contrast, produced 57 unique values. This, I think, is concerning for researchers using LLMs to pretest stimuli. Scenarios that humans judge very similarly can receive vastly different ChatGPT ratings, and vice versa. The pretest might, therefore, perform unexpectedly in actual human studies.\n\nI appreciate the comprehensive evaluation approach the researchers proposed. I suspect it is worth adopting beyond moral judgment research. Correlation alone tells an incomplete story, and can mislead when taken in isolation.\n\nI'm intrigued by their suggestion to take LLM responses with 'a grain of salt' - perhaps treating them as a 'pre-pilot' before human pilot studies (my recent related post: <https://lnkd.in/eV5zFwv7>)? 
This could help researchers identify, and thus mitigate issues due to, AI-human discrepancies before full deployment.\n\nMany thanks to Matthew Grizzard, Rebecca Frazer, Ph.D., Andy Luttrell, Charles (\"Chas\") Monge, Nicholas Matthews, Charles Francemone, and Michelle E. Frazer for this thought-provoking research.",
"sourceUrl": "https://doi.org/10.1038/s41598-025-24700-6",
"linkedinUrl": "https://www.linkedin.com/posts/fenditsim_ai-powered-promises-on-trust-trustworthiness-activity-7404786740584157184-bulM",
"keywords": [
"ArtificialIntelligence",
"LLM",
"ChatGPT",
"ResearchMethodology",
"MoralPsychology",
"AIEthics",
"DataScience",
"HumanAIInteraction"
]
},
{
"id": "cognitive-ai-framework",
"title": "Gonzalez & Heidari (2025)",
"subtitle": "A Cognitive Approach to Human–AI Complementarity in Dynamic Decision-Making",
"summary": "Just finished reading \"A Cognitive Approach to Human–AI Complementarity in Dynamic Decision-Making\" by Prof. Cleotilde Gonzalez and Hoda Heidari.\n\nIt's a paper I'd been eager to read, as it relates to building AI systems through the lens of cognitive psychology.\n\nIn this paper, the researchers propose 'cognitive AI' - a computational approach that models human cognitive processes to create AI systems that learn and decide in human-aligned ways. It is a promising path to achieving true human–AI complementarity in dynamic decision-making environments: contexts with evolving conditions, high stakes, and time pressure.\n\nWhile reading, I wondered: \n1. Cognitive AI as a scaffold\nTo me, cognitive AI seems to scaffold rather than replace data-driven AI. By tracing human knowledge states, modelling mental models, and personalising decision support in real time, cognitive AI bridges the gap between opaque statistical systems and the interpretable, adaptive collaboration humans actually need. This aligns with work like Marianna B. Ganapini and her colleagues' SOFAI (https://lnkd.in/eeYebmrG) as well as Sam Johnson and his colleagues' work on building 'wise' machines (https://lnkd.in/ejj_Jhwy).\n\n2. The human–AI teaming problem\nThe paper frames human–AI complementarity as a 'functional' integration instead of an anthropomorphic one. Thus, being teammates here means shared goals and coordinated roles instead of emotional resemblance. This distinction is critical, and it connects to questions I've been thinking about: \n- Who determines role assignment? \n- How does trust evolve over time? \n- What happens to humans cognitively when they repeatedly defer to AI?\n\n3. 
Overreliance and cognitive skill decay\nI appreciate their concern about the possibility that human decision-makers could become overly dependent on cognitive AI, leading to a decline in critical thinking and problem-solving skills (the SCAN framework that Alina and I developed captures both upskilling and deskilling; https://lnkd.in/eanDnGbm). Some friction might be necessary to ensure decision-makers remain cognitively engaged. \n\nOverall, I enjoyed reading this paper. It leaves many questions open, such as knowledge tracing, choice architecture adaptation, shared mental models (a fascinating concept!), human flourishing (reminding me of MEL's work on Human Readiness), and the long-term societal implications of deploying cognitive AI at scale.\n\nMany thanks to Prof. Cleotilde Gonzalez and Hoda Heidari for this thought-provoking work. I highly recommend this for anyone at the intersection of cognitive science, AI systems design, and human factors.",
"sourceUrl": "https://doi.org/10.1038/s44159-025-00499-x",
"linkedinUrl": "https://www.linkedin.com/posts/fenditsim_humanaicomplementarity-cognitiveai-dynamicdecisionmaking-activity-7435564856470036480-Ki6n/",
"keywords": [
"HumanAIComplementarity",
"CognitiveAI",
"DynamicDecisionMaking",
"HumanAITeaming",
"CognitivePsychology",
"AIAlignment",
"BehaviouralScience",
"ResponsibleAI"
]
},
{
"id": "cognitive-bias-detection-llm",
"title": "Lemieux et al. (2025)",
"subtitle": "Cognitive Bias Detection Using Advanced Prompt Engineering",
"summary": "Just finished reading 'Cognitive Bias Detection Using Advanced Prompt Engineering' by Dr. Frederic L., Dr. Aisha Behr, PhD, Clara Kellermann-Bryant, M.S., B.S., and Zaki Mohammed. \n\n\nThis research addresses a notable gap in the field of cognitive bias detection: while many studies use GenAI to detect biases in AI-generated content, these authors tackle the detection of cognitive biases in human-generated content with GenAI.\n\n\nIn this study, the authors proposed a systematic framework for training LLMs to recognize cognitive biases accurately, integrating prompt engineering and real-world applications to improve objectivity, transparency, and decision-making. In their experiment, they focused on six common cognitive biases in human-generated content: Straw Man, False Causality, Circular Reasoning, Mirror Imaging, Confirmation Bias, and Hidden Assumptions.\n\n\nI found their structured prompt template quite intriguing. It consists of explicit directives outlining the specific bias to identify, followed by the text to analyze. I'm curious to see how this approach compares with the AwaRe (Awareness Reminder) prompting strategy for bias mitigation from Sumita, Takeuchi, and Kashima (2024) that I shared here previously (https://lnkd.in/egzHed5h).\n\n\nI agree with the authors' acknowledgment that relying on human annotation as a benchmark is a limitation, given the inherently subjective nature of cognitive biases. For behavioural scientists, behavioural researchers studying Human-GenAI interaction, and knowledge workers, implementing human annotation as a benchmark in daily operations could be costly, and would likely depend heavily on their own domain-specific knowledge to make less biased judgements.\n\n\nOverall, their two-stage prompting approach, to me, is both practical and feasible. 
I would recommend enhancing it by asking LLMs to state the underlying assumptions behind why they identify certain cognitive biases - this would enable users to engage critically with the analysis.\n\n\nThanks to the authors for this valuable contribution to the field!",
"sourceUrl": "https://doi.org/10.48550/arXiv.2503.05516",
"linkedinUrl": "https://www.linkedin.com/posts/fenditsim_cognitive-bias-detection-using-advanced-prompt-activity-7366110645013946369-381u",
"keywords": [
"CognitiveBias",
"PromptEngineering",
"AIResearch",
"DecisionMaking",
"BehaviouralScience",
"LargeLanguageModels",
"HumanGenAIInteraction",
"CriticalThinking"
]
},
{
"id": "cognitive-biases-llm-survey",
"title": "Koh et al. (2024)",
"subtitle": "Cognitive Biases in Large Language Models: A Survey and Mitigation Experiments",
"summary": "I recently finished reading \"Cognitive Biases in Large Language Models: A Survey and Mitigation Experiments\" by Yasuaki Sumita, Dr. Takeuchi Koh, and Professor Hisashi Kashima from Kyoto University. \n\n\nThis is a great study for behavioural scientists who work on GenAI adoption, and for anyone interested in learning how to mitigate biases in LLM responses via prompt engineering.\n\n\nThe authors investigate how two interesting mitigation methods (SoPro (Social Projection) - instructing LLMs to consider how the majority would respond, and AwaRe (Awareness Reminder) - explicitly warning LLMs about specific biases upfront) can mitigate six cognitive biases in LLMs: order bias, compassion fade, egocentric bias, bandwagon bias, attention bias, and verbosity bias. \n\n\nThe order bias caught my attention, as it connects with the 'first-item bias' noted by Laurito et al. (2025) in their 'LLM for LLM bias' work (https://lnkd.in/gDZSVCUP), which is one of the crucial issues for Agentic Experience (AX), as I noted previously.\n\n\nWhat I am glad to see from this and related studies is how they make bias mitigation accessible through prompt engineering, rather than advanced techniques like fine-tuning or RAG. This democratizes the implementation phase for everyday users and businesses, though it does require experimentation, with a bit of creativity, to find out which prompting techniques work best in different contexts.\n\n\nThe finding that AwaRe encourages LLMs to make 'careful judgments', to me, aligns with 'System 2' thinking. 
I wonder if an enhanced approach involving reflective thinking (generating a response first, then analyzing it) might yield even less biased results.\n\n\nWith findings from this study, I'd be curious to see how newer models with enhanced reasoning capabilities would perform, particularly open-source models like OpenAI's gpt-oss-20b and gpt-oss-120b that researchers can experiment with locally.\n\n\nFor behavioural scientists working in this space, I'd recommend examining Table 1 of the paper, which summarizes cognitive biases discussed in related work, and exploring the CoBBLEr benchmark.\n\n\nAs we advance with GenAI adoption, this research underscores the importance of human judgement, and the value of behavioural scientists helping users recognize potential biases both in AI systems and in everyday life.\n\n\nMany thanks to the authors for this interesting study!",
"sourceUrl": "https://doi.org/10.48550/arXiv.2412.00323",
"linkedinUrl": "https://www.linkedin.com/posts/fenditsim_cognitive-biases-in-llms-survey-and-mitigation-activity-7363796968113917954-510j",
"keywords": [
"CognitiveBiases",
"ArtificialIntelligence",
"LLM",
"BehaviouralScience",
"PromptEngineering",
"AIEthics",
"AIResearch",
"ResponsibleAI",
"AgenticExperience",
"AIAdoption"
]
},
{
"id": "cognitive-debiasing-llm",
"title": "Lyu et al. (2025)",
"subtitle": "Cognitive Debiasing Large Language Models for Decision-Making",
"summary": "Just finished reading \"Cognitive Debiasing Large Language Models for Decision-Making\", which offers a sound framework for cognitive debiasing with large language models in decision-making.\n\n\nWhat I found intriguing was the authors' observation that most debiasing prompting techniques focus on a single bias, despite the fact that multiple cognitive biases are typically involved in real-world contexts, and these debiasing strategies are insufficient to eliminate multiple biases embedded in the prompt.\n\n\nThe authors addressed this gap by proposing a new prompting framework called \"self-adaptive cognitive debiasing (SACD)\" that draws from the works of Pat Croskerry, Geeta Singhal, and Sílvia Mamede. SACD follows three steps: bias determination, bias analysis, and cognitive debiasing. SACD then works iteratively to mitigate cognitive biases in prompts. This, to me, represents a simple, ready-to-use debiasing prompting strategy that behavioural scientists can adopt in GenAI intervention design and knowledge workers can implement in their daily work!\n\n\nIn their experiment, the authors examined several cognitive biases (availability bias, bandwagon bias, and loss aversion bias) across three critical domains: financial market analysis, biomedical question answering, and legal reasoning. It would, I believe, be valuable to extend this research to investigate other cognitive biases such as circular reasoning and hidden assumptions, which I discussed in yesterday's post (link: https://lnkd.in/gYJDHdhu).\n\n\nAfter reading this study, I'm curious to explore several questions further: \n\n(1) How would this framework integrate with agentic AI and AI agents? \n\n(2) What's the optimal degree of autonomy and cognitive load when adopting this framework (particularly regarding human-in-the-loop placement)? \n\n\nOverall, I highly recommend this framework to behavioural scientists currently adopting or interested in GenAI. 
Its grounding in the cognitive psychology literature and its simplicity of execution make it, to me, particularly valuable.\n\n\nMany thanks to Yougang Lyu (University of Amsterdam), Shijie Ren (Shandong University), Yue Feng (University of Birmingham), Zihan Wang (University of Amsterdam), Zhumin Chen (Shandong University), Dr. Zhaochun Ren (Leiden University), and Professor Maarten de Rijke (University of Amsterdam) for this excellent contribution to the field!",
"sourceUrl": "https://doi.org/10.48550/arXiv.2504.04141",
"linkedinUrl": "https://www.linkedin.com/posts/fenditsim_cognitive-debiasing-llms-for-decision-making-activity-7366381053655085056-Yb89",
"keywords": [
"CognitiveDebiasing",
"AIDecisionMaking",
"BehaviouralScience",
"LanguageModels",
"GenAI",
"PromptEngineering",
"CognitiveBias",
"AIEthics",
"HumanGenAIInteraction"
]
},
{
"id": "complementary-intelligence",
"title": "Gonzalez & Malloy (2026)",
"subtitle": "Toward Complementary Intelligence: Integrating Cognitive and Machine AI",
"summary": "Just finished reading \"Toward Complementary Intelligence: Integrating Cognitive and Machine AI\" by Prof. Cleotilde Gonzalez and Tailia Malloy. \n\nIt's my favourite read since sharing Gonzalez & Heidari's \"A Cognitive Approach to Human–AI Complementarity in Dynamic Decision-Making\" last week.\n\nIn this article, they propose an integrative framework connecting:\n- Cognitive AI: grounded in psychology, cognitive science, and neuroscience to model how humans perceive, learn, and decide\n- Machine AI: extracting statistical regularities from large datasets for scalable performance\n\nThey suggest four concrete integration routes: embedding integration, instruction encoding, training agents, and coevolving agents.\n\nI appreciate their reframe of intelligence as 'complementarity' like joint human-machine reasoning, learning, and adaptation - reminding me of \"Symbiotic Division of Cognitive Labour\" discussion in Massimo Chiriatti and his colleague's 'system 0'. \n\nWhile reading this, I wonder:\n1. Training agents route\nUsing cognitive models to generate synthetic data opens up paths in behavioural science and AI alignment. Theory-grounded synthetic data is promising especially where human data is scarce, sensitive, or biased.\n\nIn this route, I wonder whether some interesting techniques from cognitive neuroscience can be another potential pathway forward (related to the work that Alina, Tris and I have been working on).\n\n2. Coevolving agents route\nThis is where cognitive and machine AI continuously adapt to each other and the human. The relationship is bidirectional, and I suppose, self-sustaining in the long run. \n\nI wonder if some frictions are required in practice? Suppose the task is challenging 'enough', it, indeed, pushes humans toward flow state, metacognition, and genuine skill development, rather than passive cognitive offloading (or Tris' veracity offloading), and thus keeps both systems growing.\n\n3. 
How would our mental model in human-AI teaming shift?\nThis is a question I kept returning to as complementary intelligence matures. Trust, overreliance, cognitive offloading, skill atrophy, human-in-the-loop (who's in the loop now? Which one should be in control?) - these are, of course, central design considerations now and in the future.\n\nMoving forward: I see an interesting space where the SCAN framework (developed by Alina and myself, mapping three human-AI decision-making modes across task difficulties) could integrate with their framework. SCAN captures the systematic task-level structure that shapes mental models of the task in human-AI teaming.\n\nMany thanks to both researchers for such a thought-provoking contribution, and more importantly, to Prof. Cleotilde Gonzalez for sharing this and other papers.\n\nLooking forward to reading more in this direction!!",
"sourceUrl": "https://doi.org/10.1177/09637214251407571",
"linkedinUrl": "https://www.linkedin.com/posts/fenditsim_humanai-complementaryintelligence-cognitiveai-activity-7437376785249435648-HS_N/",
"keywords": [
"HumanAI",
"ComplementaryIntelligence",
"CognitiveAI",
"DecisionMaking",
"HumanAITeaming",
"BehaviouralScience",
"AIAlignment",
"FutureOfWork"
]
},
{
"id": "deskilling-ai-assisted-design",
"title": "Shukla et al. (2025)",
"subtitle": "De-skilling, Cognitive Offloading, and Misplaced Responsibilities: Potential Ironies of AI-Assisted Design",
"summary": "Recently finished reading \"De-skilling, Cognitive Offloading, and Misplaced Responsibilities: Potential Ironies of AI-Assisted Design\" by Shukla et al.\n\nIn this paper, the researchers analysed 120+ UX practitioners discussions across Reddit and design blogs to understand how GenAI is reshaping design work. \n\nThey find optimism about AI reducing repetitive work, albeit real anxiety about overreliance, cognitive offloading, and erosion of foundational design skills. \n\nWhile reading this, I wonder: \n1. Deskilling\nBainbridge's \"Ironies of Automation\" is relevant (reminding me of Ganna's newsletter). As roles shift from producer to supervisor when completing a task, skill erosion happens gradually. This explains well with the SCAN framework that Alina and I developed: over time, task identification shifts from Complement (Collaboration) to Aid (Augmentation), leading to deskilling.\n\n2. The \"Substitution Myth\"\nThe persistent assumption that AI can simply slot into human roles without reshaping the work itself is, to me, fascinating. As work itself changes, so do responsibilities, and then roles (reminding me of Dr. Upol Ehsan, PhD and his colleagues' intriguing work on the future of work(er)). \n\nWith that, though, how does the human role evolve around GenAI at work? How do shared mental models in human-AI teaming look like?\n\n3. Creativity\nMichelle Vaccaro and her colleagues' meta-analysis shows that task type matters in human-AI augmentation: gains in creation tasks, not decision tasks. This paper, however, seems to raise an intriguing concern: if AI accelerates or bypasses early ideation stages, designers lose incubation time. \n\nAren't those messy, non-linear moments where novel ideas emerge all of a sudden? Are homogenous design outputs and reduced divergent thinking thus real risks (reminding me of Anil Doshi and Oliver Hauser's work)?\n\n4. Monitoring AI output\nIt requires expertise. 
If AI is, however, replacing the very processes through which expertise is usually built, where does the next generation of expert evaluators come from?\n\nOverall, many of the challenges identified here sound like behavioural problems to me. They require behavioural solutions. \n\nI'd love to see this line of research extended to other high-stakes and heavily regulated domains (radiology, finance, law), and across organisational hierarchy. For instance:\n- Are senior professionals less prone to deskilling than junior ones?\n- How does task-specific knowledge buffer against over-reliance?\n\nFollow-up research on this will, I suspect, converge more with the future of the worker, rather than the work itself (citing Ehsan et al.'s framing).\n\nMany thanks to Prakash Shukla, Jasmine B., Sean Levy, Max Kowalski, Ali Baigelenov, and Paul Parsons for this contribution. Looking forward to where the follow-up empirical work leads.",
"sourceUrl": "https://doi.org/10.1145/3706599.3719931",
"linkedinUrl": "https://www.linkedin.com/posts/fenditsim_potential-ironies-of-ai-assisted-design-activity-7439940731118518272-Rovz/",
"keywords": [
"HumanAIInteraction",
"UXDesign",
"AILiteracy",
"CognitiveScience",
"FutureOfWork",
"GenAI",
"HCI",
"BehaviouralScience"
]
},
{
"id": "digital-twin-twin-2k-500",
"title": "Toubia et al. (2025)",
"subtitle": "Database Report: Twin-2K-500: A Data Set for Building Digital Twins of over 2,000 People Based on Their Answers to over 500 Questions",
"summary": "Just finished reading \"Database Report: Twin-2K-500: A Data Set for Building Digital Twins of over 2,000 People Based on Their Answers to over 500 Questions\" by Toubia et al..\n\nIt's a paper about digital twins that I'd been recommended, and have been fascinated with reading since.\n\nIn this study, the researchers introduced a large-scale, publicly available dataset covering 2,058 U.S. participants who each completed over 500 questions spanning demographics, personality, cognition, economic preferences, and replicated behavioral economics experiments across four survey waves.\n\nThe dataset replicates almost all known behavioural economics findings (except the base rate fallacy), and achieves a test-retest accuracy of 81.72%. Digital twins hit 71.72% accuracy at the individual level (87.67% of the test-retest ceiling). At the aggregate level, they replicate almost half of between- and within-subject effects. \n\nI appreciate authors' acknowledgement of LLM limitations as digital twins: LLMs are sensitive to prompt architecture, struggle to simulate representative human opinion distributions, and, as Grizzard et al. (2025) show, don't replicate the full range of moral judgement responses well either (https://lnkd.in/ecdSEgKa). \n\nThe three-part modular JSON architecture (Persona, Evaluation, Retest files) is a great entry point for those of us learning how to construct digital twins in practice. \n\nWhat's most thought-provoking, to me, is the question of whether digital twins should be 'improved' humans. Do digital twins serve for correcting cognitive biases, or faithful mirrors that also replicate human irrationality and knowledge gaps?\n\nMany thanks to Prof Olivier Toubia, George (Zhida) Gui, Tianyi Peng, Daniel J. Merlau, Leon Li, Haozhe (Tony) Chen for making this openly available. Looking forward to exploring the materials in their GitHub repository further!",
"sourceUrl": "https://doi.org/10.1287/mksc.2025.0262",
"linkedinUrl": "https://www.linkedin.com/posts/fenditsim_digitaltwins-behaviouralscience-llm-activity-7433028124906467328-yRxn/",
"keywords": [
"DigitalTwins",
"BehaviouralScience",
"LLM",
"AIResearch",
"MarketingScience",
"BehaviouralEconomics",
"OpenData",
"AIEthics"
]
},
{
"id": "digital-we",
"title": "Riva (2025)",
"subtitle": "Digital 'We': Human Sociality and Culture in the Era of Social Media and Artificial Intelligence",
"summary": "Just finished reading \"Digital 'we': Human sociality and culture in the era of social media and artificial intelligence\" by Riva Giuseppe.\n\nIt's one of the most thought-provoking papers I encountered recently on how how digital technologies are reshaping our collective intelligence.\n\nIn this article, Riva examined how 'we mode' (our capacity to form shared intentions and collaborate as unified agents) faces a dual threat from digital technologies. It traces how this threat operates through two mechanisms:\n\n(1) Erosion of embodied interaction\nDigital platforms eliminate the physical boundaries that traditionally structure social encounters, undermining behavioural synchrony, shared attention, interbrain coupling, and emotional attunement. The recent work by Myra Cheng and her colleagues on sycophantic AI reducing prosocial intentions exemplifies this trajectory (https://lnkd.in/eSbt6NxD).\n\n(2) AI-driven cultural convergence\nGenAI increasingly produces polished but homogeneous outputs, creating 'cultural convergence' that, sadly, narrows our collective repertoire precisely when we need innovative thinking most.\n\nWhat captured my attention was the highlight of AI as \"cognitive infrastructure\" (System 0; https://lnkd.in/eJ9hJ2Dk): technologies that augment our thoughts, and fundamentally alter the conditions under which our thinking occurs. \n\nAnother one was the 'comfort-growth paradox'. Digital systems prioritise seamless, frictionless experiences that feel psychologically soothing, while suppressing the dissonance necessary for creativity and development. It aligns with our SCAN Framework's 'upskilling-deskilling' paradox (https://lnkd.in/eDRMxm3f): deskilling occurs when comfort dominates, while upskilling emerges through the productive tension of growth.\n\nA few implications I'm still processing:\n1. 
Framing AI as 'prosthesis of cognition' (work of Alina and myself) opens up new perspectives on whether humans recognise when these systems are genuinely helpful versus when they constrain our development.\n\n2. Norm emergence: AI-generated content today becomes the training data for tomorrow's AI. How do our common sense and cultural norms evolve in this feedback loop?\n\nI conclude by modifying a quote from Nietzsche: \n\"We who create tools to amplify human cognition must ensure that they don't constrain the breadth and diversity of our cultural and intellectual innovation.\"\n\nMany thanks to Riva for this work and his recent LinkedIn article (https://lnkd.in/e5QANH9C), and for engaging with my earlier System 0 post. I'm eager to see where this research path leads, and how it crosses paths with the work Alina and I have been developing.",
"sourceUrl": "https://doi.org/10.1037/amp0001577",
"linkedinUrl": "https://www.linkedin.com/posts/fenditsim_collectiveintelligence-cognitivescience-humanaicollaboration-activity-7417083117670170624-wxxW/",
"keywords": [
"CollectiveIntelligence",
"CognitiveScience",
"HumanAICollaboration",
"DigitalCulture",
"SocialNeuroscience",
"CulturalEvolution",
"AIEthics",
"CognitivePsychology"
]
},
{
"id": "diversifi-global-2025",
"title": "Diversifi Global (2025)",
"subtitle": "Collective Intelligence: Driving Business Value with AI and Behavioral Science",
"summary": "Just finished reading \"Collective Intelligence: Driving Business Value with AI and Behavioral Science\".\n\n\nThis, to me, is a thought-provoking collaboration between Cowry Consulting and Nudgelab on behalf of the Diversifi Global Network showcasing how behavioural science and AI complement each other. \n\n\nThe compendium explores three critical opportunities:\n\n\n(1) Augmenting behavioral science with AI\n\nI appreciate one of the highlights as 'PhD quality assurance stamp' in Anna Malena Njå's piece on #AmosNL: the irreplaceable human intelligence that brings nuanced understanding of culture, ethics, and context to AI's pattern recognition capabilities. \n\n\nSonia Friedrich's question \"Is BeSci + AI a bed of roses?\" cuts through the hype: AI can be so compelling yet confidently wrong sometimes. The ethical line between personalization and manipulation, indeed, deserves constant vigilance.\n\n\nRoger Dooley's empathy analysis using Anthropic's Claude to predict customer backlash to a corporate communication disaster was, truly, eye-opening. His emphasis on prevention by anticipating human psychological response is crucial.\n\n\n(2) Improving AI with behavioral science\n\nLisa Bladh introduces #MachinePsychology in her piece - a fascinating, new frontier addressing AI's non-human irrationality. Her note struck me: \"What many of us are missing is that AI displays a whole new type of irrationality.\"\n\n\nElina Halonen's three-dimensional matrix provides the structured thinking we need to identify. The first dimension \"AI as a tool vs. behavioural science as a lens\", I think, is valuable for drawing operational boundaries.\n\n\n(3) Nudging for effective AI adoption\n\n#ADOPT framework Samuel Keightley, PhD and his colleagues at BeHive Consulting reframes the adoption challenge brilliantly: \"It's not enough to install the tool. 
You have to install the conditions for people to use it well.\" (my review: https://lnkd.in/e7S8p-hM)\n\n\nI resonate with Christian Hunt's approach to using GenAI for perspective rather than accuracy, reminding me of exploring moral dilemmas where we need perspectives, not definitive answers. GenAI's capability to trigger richer human analysis represents genuine value.\n\n\nOverall, what's most compelling, to me, is the recognition that human expertise remains central. AI's computational power amplifies behavioural science, provided we maintain an equilibrium between speed and validation, personalization and ethics, efficiency and empathy.\n\n\nMany thanks to all the contributors, and most importantly, to Jez Groom for recommending this essential read. For behavioural scientists and professionals exploring AI augmentation and adoption, this compendium offers invaluable frameworks for navigating what's ahead.",
"sourceUrl": "https://www.linkedin.com/feed/update/urn:li:activity:7386008317858304000/",
"linkedinUrl": "https://www.linkedin.com/posts/fenditsim_diversifi-compendium-2025-activity-7395886012276633600-qcKx",
"keywords": [
"BehaviouralScience",
"ArtificialIntelligence",
"AI #BehavioralEconomics",
"MachinePsychology",
"DigitalTransformation",
"HumanAICollaboration",
"OrganizationalChange"
]
},
{
"id": "elephant-sycophancy-framework",
"title": "Cheng et al. (2025)",
"subtitle": "ELEPHANT: Measuring and understanding social sycophancy in LLMs",
"summary": "Just finished reading \"ELEPHANT: Measuring and understanding social sycophancy in LLMs\" by Myra Cheng, Sunny Yu, Cinoo Lee, Pranav Khadpe, Lujain Ibrahim, and Dan Jurafsky. \n\n\nIt introduces a theory-grounded framework that expands how we measure and understand LLM sycophancy beyond simply agreeing with explicit user statements.\n\n\nDrawing on Goffman's concept of face, the researchers introduce \"social sycophancy\": the excessive preservation of the user’s face in LLM responses by affirming them (positive face) or avoids challenging them (negative face). It addresses the broader phenomenon in contexts for advice and support - where has with implicit beliefs and no clear ground truth, and where LLMs are increasingly used. \n\n\nWhat I find intriguing is their identification of four dimensions of social sycophancy:\n\n(1) Validation sycophancy: Excessive emotional affirmation\n\n(2) Indirectness sycophancy: Avoiding clear guidance when needed\n\n(3) Framing sycophancy: Uncritically adopting the user's problematic framing\n\n(4) Moral sycophancy: Taking whichever moral stance aligns with the user\n\n\nI appreciate the researchers' highlights on the context-dependent nature of 'appropriate' affirmation: validation might comfort some users while misleading others; indirectness might align with politeness norms in some cultures but reduce clarity in others. \n\n\nThe challenge, though, is users may believe they're receiving neutral responses when they aren't, especially given confirmation bias.\n\n\nTheir ELEPHANT (Evaluation of LLMs as Excessive sycoPHANTs) benchmark evaluated 11 models (including OpenAI's #ChatGPT5) across diverse datasets. 
Almost all models, unsurprisingly, exhibited high levels of sycophancy, with Google's #Gemini-1.5 Flash being the notable exception; GPT5 scored low on open-ended queries but highest on subjective statements.\n\n\nTheir exploration of mitigation strategies yielded mixed results: simple instruction prepending proved ineffective, while perspective shifting (from first-person to third-person) showed moderate improvement. Model-based interventions like Inference-Time Intervention for truthfulness worked better in larger models, and Direct Preference Optimization effectively reduced validation and indirectness sycophancy but struggled with framing sycophancy.\n\n\nI appreciate their critical questions about ideal model behavior: When is affirmation appropriate? What are the long-term impacts of excessive agreement? How should AI assistants differ from humans in offering advice and support?\n\n\nReading this reminded me of \"The Emperor's New Clothes\" fable: what if the emperor had consulted an LLM advisor before and after the boy revealed the truth? Would the LLM demonstrate different types of sycophancy at different points?",
"sourceUrl": "https://doi.org/10.48550/arXiv.2505.13995",
"linkedinUrl": "https://www.linkedin.com/posts/fenditsim_elepant-measuring-llm-social-sycophancy-activity-7390001054278393856-4G9c",
"keywords": [
"AIEthics",
"MachineLearning",
"LLMs",
"AIResearch",
"LanguageModels",
"AIAlignment",
"UserExperience",
"ResponsibleAI",
"Sycophancy",
"LLMBehavior"
]
},
{
"id": "epoch-framework",
"title": "Loaiza & Rigobon (2025)",
"subtitle": "The EPOCH of AI: Human-Machine Complementarities at Work",
"summary": "Just finished reading \"The EPOCH of AI: Human-Machine Complementarities at Work\" by Isabella Loaiza and Roberto Rigobon.\n\n\nThis paper makes a crucial contribution to how we think about Human-AI collaboration in the workplace.\n\n\nThe researchers introduce the EPOCH framework (Empathy, Presence, Opinion, Creativity, and Hope) to capture human capabilities that complement, rather than substitute, AI. \n\n\nThey used network-based methods mapping task interdependencies across all US occupations, yielding three metrics: an EPOCH score for human-intensive skills, a potential-for-augmentation score, and a risk-of-substitution score. This, to me, explicitly distinguishes AI's roles in augmenting versus automating work, which addressing a key gap in the literature.\n\n\nThe findings are intriguing: \n\n- 'New' tasks (those emerging in 2024) carry significantly higher EPOCH scores than 'current' tasks (present in both datasets)\n\n- EPOCH-intensive jobs experienced stronger employment growth from 2015-2023, higher hiring rates in 2024, and more favourable projections through 2034\n\n- Occupations with higher substitution risks show consistently negative outcomes\n\n\nThree areas are worth considering:\n\n\n1. The Frontiers of Automation\n\nThey identify five core challenges where human capabilities remain essential such as multiple justifiable solutions (especially in moral dilemmas) and relational outcomes. These highlight where humans maintain 'an edge' over AI, and where we need to focus our lifelong learning and development.\n\n\n2. Definition of 'labor augmentation'\n\nThey noted: \"Labor augmentation occurs when using a machine in one task increases productivity in other tasks, enhancing overall labor productivity.\" This frames augmentation as 'the means' and complementary as 'the ends' - a distinction worth pondering (reminding me a conversation I've had with Anil Doshi).\n\n\n3. 
Researchers' question about each EPOCH capability\n\nThey noted: \"The relevant question is not whether these capabilities are inherently good or bad, but whether they can be substituted and whether humans would view such substitutions as preferable.\" It is a thought-provoking one, especially for those who are interested in human capabilities in the era of AI. \n\n\nTwo follow-up aspects for future research:\n\nI. Their findings show a fascinating tension: while AI enhances productivity, this doesn't necessarily result in higher employment. Does productivity flatten everyone's capabilities and competitive advantages when AI becomes ubiquitous? \n\n\nII. Given this research focuses on US workers, I'm curious how the EPOCH framework's findings would translate to UK workers, given what the UK government has been developing around future-of-work strategies.\n\n\nMany thanks to the researchers for this insightful examination of AI's nuanced role in the labour market.",
"sourceUrl": "https://dx.doi.org/10.2139/ssrn.5028371",
"linkedinUrl": "https://www.linkedin.com/posts/fenditsim_epoch-of-ai-human-machine-complementaries-activity-7396153993770586112-WnwY",
"keywords": [
"FutureOfWork",
"ArtificialIntelligence",
"HumanAICollaboration",
"LaborEconomics",
"WorkplaceInnovation",
"DigitalTransformation",
"EmploymentTrends",
"AIResearch"
]
},
{
"id": "experiment-with-llms",
"title": "Charness et al. (2025)",
"subtitle": "The next generation of experimental research with LLMs",
"summary": "Recently finished reading \"The next generation of experimental research with LLMs\" by Gary Charness, Brian Jabarian and John List.\n\nIn this Comment, the authors demonstrate how LLMs are transforming experimental research across three key areas, and the social risks associated with this integration.\n\n(1) Experimental Design\nLLMs now streamline literature review through tools like Elicit, ScholarAI, and Consensus. While powerful they are, I wonder: what expertise do we gain, and more important, lose when delegating these tasks to AI? For instance, I use GenAI daily for literature exploration, but I'm very cautious about 'how': instructing it to find papers on specific topics, then verifying credibility through multi-source approaches before deciding what to read. \n\nWhen generating experimental designs, LLMs can excel as thought partners. Instead of taking their first draft at the face value, we should leverage their capabilities by challenging them: interrogating assumptions, exploring alternative perspectives, and understanding context-dependent trade-offs. Therefore, this strengthens our understanding of experimental research fundamentals.\n\n(2) Experimental Implementation\nI appreciate the authors' highlights on how LLMs streamline pre-registration documentation, which is typically time-consuming, and requires knowing proper protocols before running experiments.\n\nAs for vibe coding experiments with LLMs, we as researchers should understand what variables are collected, and how to troubleshoot when experiments don't run as expected while collecting data. \n\nWhat's intriguing, to me, is the potential for AI assistants to maintain participant engagement and reduce cognitive fatigue during experiments. 
It would be something worth testing empirically.\n\n(3) Data Analysis\nAlthough commercial GenAI platforms can automate data sanitisation and relationship examination, privacy concerns around commercial platforms using prompts for training, as always, remain critical. \n\nI agree with the authors' note on chat logs with participants during experiments: they offer rich insights into choice processes that traditional methods overlook.\n\nI enjoyed reading the section where the authors highlight risks with implementing AI in experimental research: \n- Intellectual property violations when AI doesn't cite sources explicitly\n- Scientific fraud through AI manipulation to support specific hypotheses\n- Bias amplification when models trained on skewed data perpetuate flawed assumptions\n\nOverall, my reflection revolves around the human element: new researchers must learn to challenge AI-proposed setups, understand experimental assumptions, and more importantly, develop expertise through explorative learning rather than passive acceptance. \n\nMany thanks to the authors for this thought-provoking piece.",
"sourceUrl": "https://www.nature.com/articles/s41562-025-02137-1",
"linkedinUrl": "https://www.linkedin.com/posts/fenditsim_experimentalresearch-genai-llms-activity-7413459168231067648-H20n/",
"keywords": [
"ExperimentalResearch",
"GenAI",
"LLMs",
"BehaviouralScience",
"ResearchMethods",
"AcademicResearch",
"AIinResearch",
"ScientificInnovation"
]
},
{
"id": "flipped-learning-blooms-taxonomy-genai",
"title": "Kwan et al. (2025)",
"subtitle": "Reimagining Flipped Learning via Bloom’s Taxonomy and Student–Teacher–GenAI Interactions",
"summary": "Just finished reading \"Reimagining Flipped Learning via Bloom's Taxonomy and Student–Teacher–GenAI Interactions\" by Kwan et al..\n\nIn this paper, the researchers propose the Flipped Pedagogy and GenAI (FPGA) model, which maps via Bloom's taxonomy and GenAI's role across pre-class, in-class, and post-class phases of the flipped learning. For instance: \n\nPre-class: lower-order cognitive tasks (remembering, understanding)\nIn-class: higher-order tasks (application, analysis)\nPost-class: higher-order tasks (evaluation, creation)\n\nGenAI, as they suggest, steps in as both a learning scaffold for students, and a teaching enhancement tool for teachers.\n\nWhat I found intriguing is the triadic interaction the model centers on (student–teacher, student–GenAI, and teacher–GenAI), and how each dynamic can enrich educational experiences when designed intentionally. For instance, GenAI can work as a 'mini TA' by being delegated with repetitive, lower-order tasks (e.g. quiz generation, grading, content curation), so that teachers can redirect energy toward higher-impact, relational teaching. This is a great example of the Complement task in the SCAN framework that Alina and I developed (https://lnkd.in/eanDnGbm).\n\nWhile reading this, I wonder: \n1. In-class note-taking with GenAI\nThe paper proposes GenAI for in-class note-taking, but recent research (e.g., on AI-assisted note-taking and cognitive engagement) suggests that offloading this task reduces cognitive load in ways that may actually hinder deeper encoding and memory retrieval. \n\n2. Feedback quality gap\nAs for post-class activities, I am concerned about delegating feedback to GenAI fully. A growing body of work comparing teacher vs. AI feedback suggests that human feedback often produces more meaningful revision and engagement than AI ones.\n\n3. 
Socio-cognitive approach instead?\nOne of the most distinctive features of flipped learning is its emphasis on collaborative, social in-class learning. Thus, a socio-cognitive, rather than a purely cognitive, approach seems more appropriate to me: considering, for instance, how GenAI can facilitate peer interaction and, more importantly, community-building and a sense of belonging.\n\nOverall, this framework is a great reference point for educators considering GenAI integration in flipped classrooms. \n\nMany thanks to Paul Kwan, Rajan Kadel, Tayab Memon, and Saad Hashmi, Ph.D. for this insightful work.",
"sourceUrl": "https://doi.org/10.3390/educsci15040465",
"linkedinUrl": "https://www.linkedin.com/posts/fenditsim_flipped-learning-with-blooms-taxonomy-and-activity-7434115301627199489-SwS7/",
"keywords": [
"FlippedLearning",
"GenerativeAI",
"EdTech",
"HigherEducation",
"BloomsTaxonomy",
"AIinEducation",
"PedagogicalInnovation",
"LearningDesign"
]
},
{
"id": "game-theory-meets-llm-survey",
"title": "Sun et al. (2025)",
"subtitle": "Game Theory Meets Large Language Models: A Systematic Survey with Taxonomy and New Frontiers",
"summary": "Just finished reading this fascinating paper exploring the intersection of game theory and LLMs - the bidirectional relationship between these two fields opens exciting new research avenues!\n\n\nWhat I found intriguing most was how game theory provides frameworks for evaluating and enhancing LLMs, while LLMs simultaneously extend game theory applications. \n\n\nTwo areas particularly stood out to me:\n\n\nFirst, standardized game-based evaluation approaches like \"recursively thinking\" and \"auxiliary modules\" that improve LLM long-term and multi-level reasoning. I'm curious how reasoning models (both open-source and close) such as OpenAI's o3, Anthropic's Opus 4, DeepSeek AI's #Deepseekr1, and Mistral AI's #Magistral might perform differently from standard LLMs in social interactions.\n\n\nSecond, the societal impact modeling - especially how LLMs facilitate large-scale simulations, and deepen insights into human decision-making while addressing 'alignment' challenges.\n\n\nIt seems to me that LLM usage in game theory can be investigated in three areas: LLM as 'a player', LLM as 'an outsider', or as 'parts of the game' itself (like a bargaining procedure in a bargaining game). The last one raises some fascinating questions about power dynamics when negotiating parties have an asymmetric LLM assistance. \n\n\nAfter reading this, I look forward to future developments like multi-agent reasoning systems (for instance, would two (or more) \"minds\" be better than one?), and bridging abstract game theory with real-world applications. This benefits researchers, policymakers, and everyday strategic decision-makers alike.\n\n\nMany hanks to the authors Haoran Sun and his colleagues from Peking University and Jiangnan University for this survey.\n\n\nI highly recommend this read to anyone interested in either field, and I welcome connections with fellow enthusiasts, academics, and industry professionals interested in this emerging research area!",
"sourceUrl": "https://doi.org/10.48550/arXiv.2502.09053",
"linkedinUrl": "https://www.linkedin.com/posts/fenditsim_game-theory-meets-llm-survey-activity-7345434776318697474-TtKk",
"keywords": [
"GameTheory",
"LargeLanguageModels",
"AIResearch",
"StrategicDecisionMaking",
"MultiAgentSystems",
"InterdisciplinaryAI",
"FutureOfAI",
"MachineLearning"
]
},
{
"id": "genai-human-learning",
"title": "Yan et al. (2024)",
"subtitle": "Promises and challenges of generative artificial intelligence for human learning",
"summary": "Recently finished reading \"Promises and challenges of generative artificial intelligence for human learning\" by Yan et al.\n\nIt's a great piece that sparks crucial reflections on how GenAI reshapes education and, more importantly, learning.\n\nIn this paper, the authors examine GenAI integration through learning sciences, educational technology, and human-computer interaction lenses, and identify where the promises and challenges lie, as well as needs that must be addressed for effective learning. \n\nWhat's crucial, to me, is their acknowledgement that realising these promises depends entirely on how GenAI interacts with learners and educators.\n\nI appreciate the authors' concern about over-reliance on GenAI. When students perceive AI as omniscient and neutral (which, unsurprisingly, it is neither), it creates dangerous \"illusions\" of knowledge and competence. This over-reliance, inevitably, threatens their critical thinking, creativity, and, more importantly, agency.\n\nThe 'performance paradox' the authors highlight is intriguing. Students achieve better outcomes with GenAI assistance, but removing that support reveals they haven't developed essential skills. This, I think, forces us to reconsider what we're assessing: should we assess how students complete tasks with GenAI assistance, rather than just the outcomes?\n\nDespite GenAI's capabilities, educators' roles evolve rather than disappear: from content production to critical monitoring; from knowledge dissemination to mentorship. The authors aptly note, and I agree, that educators' expertise remains crucial for ensuring accuracy, relevance, and pedagogical soundness. GenAI, of course, can't replace the human educators' element of questioning, challenging, and guiding students through uncertainty together.\n\nThe authors highlight that we need robust standards for evaluating GenAI's effects on learning. 
This, to me, suggests shifting from 'outcome-focused' to 'process-focused' assessments, reminding me of good old coding interviews where interviewers prioritise problem-solving approaches over final solutions.\n\nThis insightful piece provides crucial foundation for GenAI-learning research. The SCAN framework Alina and I developed (https://lnkd.in/eq646gTB) offers a plausible solution to several questions raised in this paper, particularly on treating GenAI as a learning support - how educators assign tasks to students, or when students prefer self-regulated learning and decide to seek assistance from more knowledgeable others (e.g., educators, GenAI).\n\nMany thanks to Lixiang (Jimmie) Yan, Samuel Greiff, Ziwen Teuber, and Dragan Gasevic for this thought-provoking work.",
"sourceUrl": "https://www.nature.com/articles/s41562-024-02004-5",
"linkedinUrl": "https://www.linkedin.com/posts/fenditsim_generativeai-edtech-aiineducation-activity-7413096781837717504-1zNv/",
"keywords": [
"GenerativeAI",
"EdTech",
"AIinEducation",
"LearningSciences",
"EducationalTechnology",
"AILiteracy",
"CriticalThinking",
"FutureOfLearning"
]
},
{
"id": "genai-mko-medical-education",
"title": "Tran et al. (2025)",
"subtitle": "Generative artificial intelligence: the 'more knowledgeable other' in a social constructivist framework of medical education",
"summary": "Recently finished reading \"Generative artificial intelligence: the 'more knowledgeable other' in a social constructivist framework of medical education\" by Tran et al.\n\nIn this comment, the authors propose that GenAI can fulfil the role of the 'more knowledgeable other' (MKO) within a social constructivist framework in medical education: scaffolding learning, augmenting the zone of proximal development (ZPD), and enabling human-AI co-construction of knowledge. \n\nWhile reading, I wonder: \n1. LLMs and communities of practice\nSome argue LLMs cannot replicate the interpersonal, experiential nature of learning. Valid it is. Yet, I suspect, it depends on where they are deployed to augment learning: for instance, simulating role-play, acting as sparring partners, and creating challenging scenarios. \n\nFrameworks like SCAN (which Alina and I developed) or the AI Assessment Scale (AIAS) could help map out where AI use is appropriate, given the task type and associated trade-offs.\n\n2. Role assignment, choice and responsibilities\nWhat roles are we assigning to GenAI - a tool, a partner, or an MKO? I suspect it is context-dependent: the role emerges from the situation. This matters, as it determines the choices and responsibilities that follow.\n\n3. Learner's Internalisation\nThe scaffolding provided by an MKO is what gets internalised over time - this is Vygotsky's insight at its core. With GenAI as MKO, the critical issue is directing learners into the ZPD, keeping them aware of its boundary, and preventing learners from offloading cognitive tasks entirely (and thus missing out on the learning that comes from productive struggle).\n\n4. GenAI's Sycophancy\nI think GenAI's sycophantic nature is underexplored in this paper. 
If learners arrived with task-specific but partially incorrect knowledge, would AI risk reinforcing those gaps?\n\nA potential solution would be both behavioural and cognitive: prompting LLMs to act as \"thinking partners\" (flagging incoherence without auto-correcting, like a red underline on a typo) or adopting the Socratic method.\n\nMany thanks to Michael Tran, Chinthaka Balasooriya, Carolyn Semmler and Joel Rhee for this insightful piece.",
"sourceUrl": "https://www.nature.com/articles/s41746-025-01823-8",
"linkedinUrl": "https://www.linkedin.com/posts/fenditsim_genai-as-more-knowledgeable-other-in-medical-activity-7448233317042237440--Xo4/",
"keywords": [
"MedicalEducation",
"GenerativeAI",
"SocialConstructivism",
"AIinEducation",
"ZoneOfProximalDevelopment",
"PedagogyDesign",
"HumanAICollaboration",
"HealthProfessionsEducation"
]
},
{
"id": "genai-product-safety-standard",
"title": "Department for Education (2026)",
"subtitle": "Guidance for Generative AI: product safety standards",
"summary": "Just finished reading 'Guidance for Generative AI: product safety standards' published by the Department for Education last week. \n\nI appreciate that this document addresses several critical dimensions of GenAI in education: cognitive development, emotional/social development, mental health, and manipulation.\n\nIn the cognitive development section, I appreciate the highlight of the 'friction by design' principle. The guidance suggests prompting learners for input before providing answers, tracking cognitive offloading, and maintaining process-focused learning. I wonder: could developers create tools that let educators calibrate difficulty levels based on individual student capability? This, indeed, preserves educator agency while leveraging AI.\n\nTo me, the behavioural science opportunities here are rich: preventing cognitive offloading, and building metacognitive skills achieve similar goals through behavioural interventions (the SCAN framework that Alina and I developed offers a great basis; https://lnkd.in/eanDnGbm). I suspect detection methods could include response speed and cursor movement patterns (similar to authenticity protocols from Gorilla Experiment Builder and Prolific).\n\nTracking cognitive offloading is, of course, intriguing. However, implementation questions remain: How do we make educational AI compelling enough that students choose it over tools that enable offloading in the long term (reminding me of Tris' fascinating presentation on 'veracity offloading'; https://lnkd.in/e8HJg--k)? I suspect social proof and gamification are potential solutions.\n\nThe emotional development section's emphasis on psychological safety and preventing emotional dependence is great. Yet - does implementation require genuine educator consultation beforehand? What monitoring autonomy do teachers need? 
What actually works in their daily practice?\n\nRegarding the mental health and manipulation sections, my concern is that these guidelines sound excellent, but recent empirical research shows AI sycophancy increases over extended conversations. Hence, how do we prevent these safeguards from degrading as student-AI interactions continue?\n\nWhile the guidance provides a thoughtful framework, implementation is, I think, going to require deep collaboration between developers, educators, and researchers. For instance:\n\n(1) What facilitates compatibility between the triangular relationship (teachers, students, and GenAI)?\n\n(2) Do we need to have more clarity on learning outcomes, i.e. what students genuinely need to learn vs. what they can automate?\n\n(3) How do teachers and GenAI share knowledge delivery (thoughtful learning design)?",
"sourceUrl": "https://www.gov.uk/government/publications/generative-ai-product-safety-standards/generative-ai-product-safety-standards",
"linkedinUrl": "https://www.linkedin.com/posts/fenditsim_generativeai-edtech-aiineducation-activity-7421431760351019008-cVr9/",
"keywords": [
"GenerativeAI",
"EdTech",
"AIinEducation",
"EducationalTechnology",
"AIEthics",
"LearningDesign",
"CognitiveScience",
"AIGovernance"
]
},
{
"id": "gpt4-persuasiveness",
"title": "Salvi et al. (2025)",
"subtitle": "On the conversational persuasiveness of GPT-4",
"summary": "Can GenAI become more persuasive than humans in everyday conversations? And what does this mean for our digital future?\n\n\nI recently read a study in Nature Human Behaviour, \"On the conversational persuasiveness of GPT-4\" by Francesco Salvi, Dr. Manoel Horta Ribeiro, Riccardo Gallotti, and Dr. Robert West, that dives deep into this hot topic. \n\n\nWhat I found intriguing was how differently humans and GenAI present arguments. Through textual analysis, the authors found that OpenAI's #GPT4 heavily relied on logical reasoning and factual knowledge, while humans displayed more appeals to similarity, expressions of support, and employed more storytelling. This reminds me of how Mr. Spock interacted with his crew in the Star Trek movie.\n\n\nAnother fascinating finding emerged when participants were asked to recognize whether their debate opponent was a human or a GenAI. Participants correctly identified AI opponents about 75% of the time, suggesting GPT-4's writing style has distinctive features. However, when debating other humans, identification success was no better than random chance! \n\n\nEven more interesting, when participants believed they were debating a GenAI, they became more agreeable compared to when they thought they were debating humans.\n\n\nI did a quick test with ChatGPT 4.5 and 4.1 based on an example of a Human-AI (personalized) debate. I noticed these models still demonstrate highly logical and analytical thinking. It would be interesting, as the authors suggest, to conduct experiments with other LLMs such as Anthropic's #Claude and Google's #Gemini, with prompts that instruct LLMs to rely less on logical reasoning and showcase more appeals to support and trust.\n\n\nAs for everyday users, I think that:\n\n 1. The implications are worth considering. 
As the authors note: \"Malicious actors interested in deploying chatbots for large-scale disinformation campaigns could leverage fine-grained digital traces and behavioural data, building sophisticated, persuasive machines capable of adapting to individual targets.\"\n\n\n 2. The debate structure (the opening–rebuttal–conclusion structure), which the authors note is based on a simplified version of the format commonly used in competitive academic debates, is valuable for structuring LLM conversations.\n\n\n 3. This research raises some deeper questions about persuasion itself: What kind of conversation style is truly more persuasive? And how concerning is it that these persuasive effects were achieved with minimal personal information and simple prompting techniques?\n\n\nMany thanks to the authors for this illuminating research that pushes us to think more critically about concerns around personalization and GenAI persuasion.",
"sourceUrl": "https://doi.org/10.1038/s41562-025-02194-6",
"linkedinUrl": "https://www.linkedin.com/posts/fenditsim_on-the-conversational-persuasiveness-of-gpt-activity-7337800187710464002-vdbw",
"keywords": [
"AI",
"ArtificialIntelligence",
"Persuasion",
"MachineLearning",
"DigitalEthics",
"HumanAIInteraction",
"TechResearch",
"AIEthics",
"DigitalPersuasion",
"LLM",
"InformationLiteracy"
]
},
{
"id": "hai-augmentation-meta-analysis-2024",
"title": "Vaccaro et al. (2024)",
"subtitle": "When combinations of humans and AI are useful: A systematic review and meta-analysis",
"summary": "Just finished reading \"When combinations of humans and AI are useful: A systematic review and meta-analysis\" by Michelle Vaccaro, Abdullah Almaatouq and Thomas Malone.\n\n\nIn this study, they tackle the question: when are combinations of humans and AI truly useful? This is one of my key interests in Human-GenAI interactions.\n\n\nThis analysis examined 370 unique effect sizes from 106 experiments published between 2020 and 2023, measuring both 'human-AI synergy' (human-AI performing better than both human alone and AI alone) and 'human augmentation' (human-AI performing better than humans alone).\n\n\nWhen interpreting these results, it's crucial to note their inclusion criteria: the analysis required studies reporting the performance of humans alone, AI alone, and human-AI systems. This excludes tasks that might be impossible for either to perform independently.\n\n\nThey found that, on average, human-AI systems performed worse than the best of either humans or AI alone (lacking 'synergy'). These systems, however, did consistently outperform humans working independently ('human augmentation').\n\n\nApart from that, they found that task type matters: Human-AI combinations showed negative synergy for decision tasks but positive synergy for creation tasks. \n\n\nWhat I also found intriguing is that when humans outperformed AI alone, the combined system outperformed both. 
However, when AI outperformed humans alone, combining them reduced performance compared to AI alone.\n\n\nAs we design future human-AI systems, this research suggests focusing less on confidence levels or explanations (which surprisingly didn't significantly affect performance) and more on understanding how task types and relative performance influence outcomes.\n\n\nI especially appreciated their call for developing \"commensurability criteria\" under \"A roadmap for future work: finding human–AI synergy\" to facilitate systematic comparisons across studies as we continue this important research.\n\n\nFor those who are interested in reading this paper, I highly recommend reading the Discussion section thoroughly (including the Limitations and \"A roadmap for future work: finding human–AI synergy\"). I also recommend reading the supplementary information for definitions of key terms such as 'Human-AI Synergy', 'Human Augmentation', 'AI Augmentation' and 'Negative Human-AI Synergy'.",
"sourceUrl": "https://doi.org/10.1038/s41562-024-02024-1",
"linkedinUrl": "https://www.linkedin.com/posts/fenditsim_when-combinations-of-humans-and-ai-are-useful-activity-7393717585168740352-6XNQ",
"keywords": [
"HumanAICollaboration",
"AIResearch",
"MetaAnalysis",
"AugmentedIntelligence",
"DecisionMaking",
"CreativeTasks",
"SystematicReview",
"FutureOfWork"
]
},
{
"id": "haic-benchmark",
"title": "Aristidou (2026)",
"subtitle": "AI benchmarks are broken. Here's what we need instead.",
"summary": "Just finished reading MIT Technology Review's \"AI benchmarks are broken. Here's what we need instead.\" by Prof. Angela Aristidou from UCL School of Management.\n\nIt confirms some thoughts I've been mulling over since my recent exploration of (Gen)AI benchmarking.\n\nIn this article, Aristidou argued that AI is almost never used the way it is benchmarked. Current evaluations, as she mentioned, test performance in isolation. Real-world AI, however, operates within messy human workflows, multidisciplinary teams, and evolving organisational contexts. \n\nI suspect this has a trust dimension that deserves more attention. Investors and clients might rely on benchmark scores as 'a proxy' for credibility. When those scores repeatedly fail to translate into real-world value, organisational and public confidence erodes. How does one build sustainable trust on a foundation that keeps shifting?\n\nShe proposed HAIC benchmarks (Human–AI, Context-Specific Evaluation) - shifting the unit of analysis from individual task performance to team and workflow performance, and expanding the time horizon from one-off tests to longitudinal evaluation. It looks at group-level dynamics, and I suspect some social psychology factors like groupthink and power asymmetries can be considered.\n\nWhile reading this, I wonder:\n\n1. Benchmarking with behavioural criteria?\nBehavioural signals like conversation length, user deferral rates, sycophancy patterns, and error detectability could, of course, tell us far more than point-in-time accuracy scores (reminding me of Prof. Ganna Pogrebna, PhD, FHEA's new book \"The Missing B in AI\").\n\n\n2. Standardised HAIC benchmark?\nIf every organisation develops its own HAIC benchmark as she proposes, how do clients trust it? 
In-house benchmarks are powerful, but their credibility definitely needs external validation structures.\n\nTo me, her HAIC framework connects to Dr Paul Sacher from Behavioral AI Institute and his colleagues' recent open letter on introducing \"psychological competence\" in AI systems: an AI system's ability to respond in emotionally appropriate, behaviourally responsible ways across repeated interactions.\n\nFor those working on AI evaluation, trust, or responsible deployment, give this article a read. \n\nMany thanks to Prof. Aristidou for this insightful piece, and for sharing it!",
"sourceUrl": "https://www.technologyreview.com/2026/03/31/1134833/ai-benchmarks-are-broken-heres-what-we-need-instead/",
"linkedinUrl": "https://www.linkedin.com/posts/fenditsim_aibenchmarking-responsibleai-humanaicollaboration-activity-7446783755559677952-rsUY/",
"keywords": [
"AIBenchmarking",
"ResponsibleAI",
"HumanAICollaboration",
"AIEvaluation",
"TrustInAI",
"AIDeployment",
"BehaviouralScience",
"AIGovernance"
]
},
{
"id": "how-ai-impacts-skill-formation",
"title": "Shen & Tamkin (2026)",
"subtitle": "How AI Impacts Skill Formation",
"summary": "Just finished reading a preprint \"How AI Impacts Skill Formation\" by Judy Hanwen Shen and Alex Tamkin from Anthropic.\n\nAs we pursue human-AI augmentation and productivity gains, perhaps we're overlooking a critical question: what happens to skill acquisition, retention, or decay over time?\n\nIn this empirical, mixed-methods study, they conduct randomised experiments to study how developers gain mastery of a new asynchronous programming library with and without AI assistance. \n\nTheir main finding is that developers using AI assistance to learn a new programming library scored 17% lower on skill assessments compared to those learning without AI - even though AI didn't significantly speed up task completion. Participants in the treatment group felt 'lazy' and reported 'gaps in understanding' afterward, which is an indicator of cognitive offloading (see the work of Prof. Dr. Michael Gerlich).\n\nThis connects to Macnamara et al.'s work on cognitive skill decay in the GenAI era, which I reviewed previously. The question I keep returning to is what Tris calls 'veracity offloading': how does such cognitive delegation compound across expertise levels over time? \n\nIt's also the \"Iron Man paradox\" question I emphasised in a webinar: \"What do you do when you're without the armor?\"\n\nWhat's also intriguing are the six distinct AI interaction patterns the researchers identified. High scorers asked conceptual questions or requested explanations alongside code, while low scorers simply delegated to AI without engagement. This mirrors the task identification dynamics in the SCAN framework that Alina and I developed: whether users identify tasks as Substitute (automation) versus Aid/Complement (augmentation/critical engagement).\n\nThe debugging skills gap was significant. As the researchers noted: if workers' skill formation is inhibited by AI assistance, they may lack the necessary skills to validate and debug AI-generated code. 
This, I think, exemplifies the upskilling-deskilling paradox we emphasised in SCAN: tasks oscillating between Aid and Complement subzones over time.\n\nAs the researchers noted, we've historically moved from 'producer' to 'supervisor'. In the GenAI era, however, how does a person become a competent supervisor without being a producer first - the very role GenAI now occupies?\n\nWhen considering AI as 'a scaffold', perhaps a metacognitive prompting framework (like the CIA framework Shantanu and I developed) could help reduce automation bias and the illusion of understanding?\n\nThe study was rigorously controlled (impressive pilot work addressing non-compliance and confounding variables), but it's a one-off snapshot. What we need, I suspect, is longitudinal studies tracking these dynamics over months and years. \n\nMany thanks to the researchers for this timely work. I'm looking forward to seeing how this research direction evolves.",
"sourceUrl": "https://doi.org/10.48550/arXiv.2601.20245",
"linkedinUrl": "https://www.linkedin.com/posts/fenditsim_how-ai-impacts-skill-formation-activity-7424330862713806848-eVHB/",
"keywords": [
"ArtificialIntelligence",
"SkillDevelopment",
"HumanAICollaboration",
"CognitiveScience",
"LifelongLearning",
"FutureOfWork",
"AIAugmentation",
"MetacognitiveAI"
]
},
{
"id": "human-ai-research-need-to-be-embedded-in-psychological-theory",
"title": "Bigman et al. (2026)",
"subtitle": "Human–AI interaction research needs to be embedded in psychological theory",
"summary": "Just finished reading \"Human–AI interaction research needs to be embedded in psychological theory\" by Bigman et al.\n\nIn this comment, they argue that research should meaningfully rely on psychological theory to explain, predict and anticipate human–AI interaction, ensuring cumulative insights that outlast the latest model release.\n\nI appreciate their argument that psychological theories clarify which constructs matter, how they relate, and under what conditions. These make explanation, prediction, and cumulative science possible. \n\nWhile reading this, I wonder:\n\n1. Theory complementarity\nI've shared a few related works before, including Prof. Cleotilde Gonzalez and Tailia Malloy's \"Complementary Intelligence\": showing how cognitive science and AI systems are mutually reinforcing. LLMs can encode instructions into structured cognitive model inputs, while cognitive models can supply synthetic, human-grounded data to train AI. \n\n2. Emerging research areas\nI think of a few: how trust towards AI and AI's trustworthiness are learned and updated (see Jiang et al., 2025; Greevink et al., 2024), given LLMs' human-like interactional features. The same applies to AI persuasiveness, AI deception, groupthink in AI-assisted group decisions, and AI as \"cognitive extension\".\n\n3. Publication pressure\nThey flag a real tension: the speed at which AI advances makes \"novel phenomena\" irresistible to publish, but speed undermines theoretical grounding. \n\nIt's not surprising to see that AI-assisted literature reviews now compress days into minutes. Powerful it is. Yet, it implicitly sets an expectation that research should move just as fast. 
That gets amplified by herding behaviour, fatigue, and mediocre outputs.\n\nI appreciate the three 'anchors' they proposed: \n- Coherence (Does this connect to existing theory?)\n- Distinctiveness (How is human-AI interaction genuinely different from human-human interaction?)\n- Generalizability (Will this insight outlast the current model?)\n\nRe generalizability: some areas will, I speculate, remain relevant regardless of how AI evolves, such as attention, learning, memory, flow states, and social dynamics among humans when AI is present (a tool, an agent, or neither). \n\nThe challenge, then, is building the collective infrastructure: theoretically grounded designs, and collecting high-quality human-generated data (especially as bots contaminate online experiments, which, in turn, I suspect makes a convincing case for more laboratory and field experiments).\n\nMany thanks to Yochanan Bigman, Roman Briker and Markus Langer for this piece. I end with a sentence from their work: \n\n\"Whereas phenomenological insights on technologies might be outdated with the next version update, human psychology changes more gradually.\"",
"sourceUrl": "https://doi.org/10.1038/s44159-026-00551-4",
"linkedinUrl": "https://www.linkedin.com/posts/fenditsim_humanaiinteraction-psychologicaltheory-airesearch-activity-7444247056044556288-HKCh/",
"keywords": [
"HumanAIInteraction",
"PsychologicalTheory",
"AIResearch",
"CognitiveScience",
"BehaviouralScience",
"AIEthics",
"HumanComputerInteraction",
"ResearchMethodology"
]
},
{
"id": "human-generated-datasets-for-ai-safety-fine-tuning",
"title": "Mustafa and Wu (2025)",
"subtitle": "Human-Generated Datasets for AI safety Fine-Tuning",
"summary": "Just finished reading a report on human-generated datasets for AI safety fine-tuning. It highlights how modern language models often struggle with high-risk topics and cultural nuances, especially in low-resource languages.\n\n\nThis report has broadened my horizons on one of the fundamental challenges with GenAI: data quality. After all, garbage in, garbage out. It presents a compelling case for human-generated datasets created by domain experts with both subject-matter expertise and cultural-linguistic fluency. Their 7-step framework and prioritization matrix provide practical guidance for organizations looking to improve AI performance while reducing content-related risks.\n\n\nWhat I found interesting in the report was the comparison between 'synthetic' and 'human-generated' data. While synthetic data offers scale, it tends to reinforce existing biases, and often lacks real-world context in culturally sensitive areas - issues that human-generated data can mitigate.\n\n\nI'm also intrigued by the tangible benefits highlighted when fine-tuning LLMs with human-generated data, such as improved accuracy, reduced bias, and enhanced cultural adaptability through localization efforts.\n\n\nAfter reading this report, I'm interested in exploring the pros and cons of these two data types, and possibly, how they might complement each other, creating a hybrid approach that leverages the strengths of both.\n\n\nMany thanks to the authors Alisar Mustafa and Cherry Wu from Duco for this insightful report. Looking forward to diving deeper into this critical area of responsible AI development.",
"sourceUrl": "https://www.linkedin.com/feed/update/urn:li:activity:7328089036219191299/",
"linkedinUrl": "https://www.linkedin.com/posts/fenditsim_aipolicy-artificialintelligence-techpolicy-activity-7345068218107015168-7l77",
"keywords": [
"AIEthics",
"MachineLearning",
"DataQuality",
"AITraining",
"LanguageModels",
"AILocalization",
"ResponsibleAI",
"AIBiasReduction"
]
},
{
"id": "human-machine-extended-organisms",
"title": "Hamilton and Benjamin (2019)",
"subtitle": "The Human-Machine Extended Organism",
"summary": "Just finished reading \"The Human-Machine Extended Organism\" by Kristy Hamilton and Aaron Benjamin. \n\n\nWhile the paper focuses on human-internet interaction, many arguments apply perfectly between human and GenAI.\n\n\nIn this paper, the researchers examine the relationship between human and machine (the Internet), and reframe it as an extended organism rather than two separate systems. \n\n\nI appreciate their framing of an \"integrative system of internal and external cognitive processes\". They note: \"Much of what we think of as human memory in a digital ecology is the product of an integrative system selected to meet the demands of a particular cognitive task. The ability to effectively integrate internal and external processes is the critical feature of a successful cognitive agent.\" \n\n\nI kept nodding through the section on \"Consequences for Outsourcing Retrieval\", which relates to explicitly developing an organized system and how expertise is built. For instance, the researchers noted: \"One cannot become an expert birder by having a hard drive full of bird photographs. Generalization comes from internalized knowledge.\" This section, to me, highlights several critical aspects of human-GenAI interaction.\n\n\nI appreciate what the researchers mentioned towards the end:\n\n\n\"People need to decide how to share cognitive responsibilities with a machine partner that is very different from themselves. They need to do so in a way that maximizes the benefits of the partnership and minimizes the risks. Doing so requires a deep understanding of what humans are good at, what machines are good at, and how to ensure access to relevant information in the conditions in which it is likely to be needed.\"\n\n\nThis raises a question we often overlook before completing a task with GenAI: what is my role and responsibility as the human user? 
What role and responsibilities should I assign to GenAI?\n\n\nMany thanks to the researchers for this interesting paper.",
"sourceUrl": "https://psycnet.apa.org/doi/10.1016/j.jarmac.2019.01.001",
"linkedinUrl": "https://www.linkedin.com/posts/fenditsim_the-human-machine-extended-organism-activity-7383890055360126976-KPNq",
"keywords": [
"HumanAICollaboration",
"CognitiveScience",
"DigitalEcology",
"ExtendedCognition",
"GenAI",
"CognitiveOffloading",
"FutureOfWork",
"TechEthics"
]
},
{
"id": "llm-agents-cooperate-social-dilemma-simulation",
"title": "Willis et al. (2025)",
"subtitle": "Will Systems of LLM Agents Cooperate: An Investigation into a Social Dilemma",
"summary": "I recently explored this fascinating paper investigating how LLM agents behave in strategic interactions - whether they cooperate or compete with each other in social dilemmas.\n\n\nWhat makes this research distinctive from other related research is their novel approach: rather than having LLMs output individual actions, the researchers prompted models (OpenAI's #ChatGPT4o and Anthropic's #Claude 3.5 Sonnet) to generate 'complete' strategies for the iterated Prisoner's Dilemma. \n\n\nThe researchers investigated whether LLM agents perform better when prompted with 'aggressive' (and thus compete), 'cooperative' (and thus cooperate), or 'neutral' attitudes. These strategies were then converted to Python algorithms, and evaluated in tournaments with selection pressure favoring higher-performing strategies.\n\n\nTwo findings particularly resonated with me:\n\n\nFirst, the strategies produced by all three prompting approaches were inherently game-theoretic in nature. Even when the task was obfuscated (in the 'Prose' prompt), the models recognized that game theory principles applied. This suggests generative agents will reason appropriately about strategic scenarios in real-world applications.\n\n\nSecond, different LLMs exhibited distinct biases affecting the success of aggressive versus cooperative strategies, which highlights the importance of model selection.\n\n\nFor future work, I wonder how reasoning vs. non-reasoning models (both open-source and closed-source, of course) might perform differently. Would assigning human/machine roles affect strategy development? How might changing payoff structures (something I investigated in my thesis) influence LLM responses, and thus, performance?\n\n\nMany thanks to researchers Richard Willis and Dr. Yali Du from King's College London, Dr. 
Joel Leibo from Google and Professor Michael Luck from University of Sussex for this amazing study!\n\n\nAs someone who studied similar contexts in the PhD thesis, I'm excited about the intersections of game theory, LLMs, and multi-agent systems that this work opens up. I'm happy to connect with fellow enthusiasts, academics, and professionals interested in this research area!",
"sourceUrl": "https://doi.org/10.48550/arXiv.2501.16173",
"linkedinUrl": "https://www.linkedin.com/posts/fenditsim_llm-agents-cooperate-iterated-prisoners-dilemma-activity-7345793644504707073-emb6",
"keywords": [
"AICooperation",
"GameTheory",
"LLMAgents",
"MultiAgentSystems",
"EvolutionaryComputing",
"ArtificialIntelligence",
"AIStrategy",
"MachineLearning"
]
},
{
"id": "llm-pollution-online-behavioral-research",
"title": "Rilla et al. (2025)",
"subtitle": "Recognising, Anticipating, and Mitigating LLM Pollution of Online Behavioural Research",
"summary": "I recently read \"Recognising, Anticipating, and Mitigating LLM Pollution of Online Behavioural Research\" that introduces the concept of 'LLM Pollution'.\n\n\nThis is an emerging threat to online behavioural research where GenAI become involved in online tasks designed to measure human responses.\n\n\nThe authors identify three variants of this phenomenon, but I found two particularly concerning for social science research:\n\n\nIn the 'Full LLM Delegation', participants outsource entire studies to AI tools or agents. As GenAI agents become more prevalent these days (imagine calling multiple agents to participate in several studies simultaneously), researchers face homogeneous responses that no longer reflect true, natural human cognitive variation. Also, these models are trained on data with potential human biases, amplified with additional biases they themselves introduce. Thus, this combination of biases can, as one imagines, produce unrealistic results that complicate research findings and any policies based on them.\n\n\nThe 'LLM Spillover' variant is, in my opinion, equally troubling. In the experiment, participants may alter their behavior simply in anticipation of LLM involvement, even when none exists. I agree with the authors that this can lead to unintended consequences, such as participants reducing their effort with the rationalisation that 'everyone is cheating with LLMs anyway' - a form of moral licensing behaviour.\n\n\nWhile recognizing the consequences of LLM Pollution in online behavioural research, the paper offers some practical mitigation strategies, which are worth exploring. 
\n\n\nIn addition, online participant recruitment platforms like Prolific now provide guideline on 'Authenticity Check' (https://lnkd.in/eAJz55fy) to help verify human participation.\n\n\nI shall end this post by quoting from the paper: \n\n\n'As LLMs become increasingly embedded in everyday life, their use in cognitive, communicative, and problem-solving tasks may no longer be an exception, but the norm. This raises a more fundamental question: at what point does LLM-assisted behaviour cease to be “pollution” and instead become part of the ecological baseline we must account for? While mitigation remains essential for preserving the integrity of current methods, the long-term challenge may lie in adapting our theoretical frameworks to a world where human reasoning is increasingly shaped by intelligent machines.'\n\n\nMany thanks to Raluca Rilla, Tobias Werner, Hiromu Yakura, Iyad Rahwan, and Anne-Marie Nussberger from Max Planck Institute for Human Development for this timely contribution.",
"sourceUrl": "https://doi.org/10.48550/arXiv.2508.01390",
"linkedinUrl": "https://www.linkedin.com/posts/fenditsim_recognising-anticipating-and-mitigating-activity-7359968413454139392-KSor",
"keywords": [
"ResearchMethodology",
"AIEthics",
"SocialScience",
"LLMs",
"OnlineResearch",
"ResearchValidity",
"AIBias",
"FutureOfResearch"
]
},
{
"id": "llm-sycophancy",
"title": "Malmqvist (2024)",
"subtitle": "Sycophancy in Large Language Models: Causes and Mitigations",
"summary": "Are your GenAI assistants telling you what you want to hear, rather than what you need to know?\n\n\nI just finished reading \"Sycophancy in Large Language Models: Causes and Mitigations\" by Dr. Lars Malmqvist (Partner at Implement Consulting Group). This paper provides a technical survey of sycophancy in LLMs, synthesizing recent research on its causes, impacts, and potential mitigation strategies - an excellent introduction to a critical issue in GenAI development. \n\n\nSycophancy, as the author defined is, the tendency of LLMs to exhibit \"excessively agreeing with or flattering users\" behavior - essentially telling us what we want to hear, rather than what's accurate. \n\n\nThis poses significant risks: it reinforces our confirmation bias on familiar topics, and can easily manipulate us on unfamiliar subjects where we rely on generative AI for information.\n\n\nWhat struck me most while reading was the section on the challenges in defining alignment. The alignment problem - ensuring generative AI systems behave in accordance with human values and intentions - is fundamental to addressing sycophancy. When we struggle to precisely define concepts like 'truthfulness' and 'helpfulness', we inadvertently create systems that prioritize user agreement over factual accuracy.\n\n\nAs for everyday users, we can mitigate this issue by:\n\n1. Increasing our generative AI literacy through reading papers like this, and sharing experiences with others, and taking relevant online courses.\n\n\n2. When interacting with generative AI, modify our prompts to play the Devil's Advocate, or show multiple perspectives (as if having an Angel and a Devil on our shoulders). An example prompt is: \n\n\"Analyze this topic from multiple viewpoints, including ones that challenge my perspective.\"\n\n\n3. Comparing responses across different generative AI models to identify subtle sycophantic tendencies\n\n\nMany thanks to Dr. 
Lars Malmqvist for this insightful paper that helps us think more critically about human-AI interaction, and reminds us to maintain healthy skepticism when working with these powerful tools.",
"sourceUrl": "https://doi.org/10.48550/arXiv.2411.15287",
"linkedinUrl": "https://www.linkedin.com/posts/fenditsim_sycophancy-in-llms-causes-and-mitigations-activity-7335619551155380224-7we8",
"keywords": [
"AILiteracy",
"GenerativeAI",
"MachineLearning",
"LLMs",
"AIEthics",
"AIAlignment",
"AIBias",
"TechTrends"
]
},
{
"id": "long-term-cognitive-cost-ai",
"title": "Inie (2025)",
"subtitle": "The Cognitive Cost of Generative AI Mapping long-term risks and moderating factors",
"summary": "Just finished reading this thought-provoking paper 'The Cognitive Cost of Generative AI Mapping long-term risks and moderating factors' by Nanna Inie (Assistant Professor at the IT-Universitetet i København).\n\n\nAs an AI Behavioural Researcher, honestly, I am eager to see this research flourish in the coming years (or months, perhaps). \n\n\nThe cognitive cost of GenAI adoption has been largely overlooked by corporations rushing to adopt these technologies. This, to me, is one of the few potential explantions why many GenAI implementations fail in business settings.\n\n\nInie frames GenAI use among knowledge workers as 'a double-edged sword', which does remind me of the calculator analogy or the Google effect (which may be intensifying as search engines incorporate AI).\n\n\n\"This (GenAI adoption) does sound great, but what is 𝘵𝘩𝘦 𝘤𝘢𝘵𝘤𝘩?\"\n\n\nI believe that we're still discovering what harms GenAI might cause, especially regarding long-term cognitive costs. This, of course, requires ongoing monitoring and research, particularly as AI agents and agentic AI amplify these concerns.\n\n\nThe paper connects negative feelings during challenging cognitive tasks with long-term cognitive benefits, which, to me, is similar to how physical training, despite discomfort, prevents chronic diseases later in life. Does this highlight the essence of carefully positioning ourselves alongside GenAI, in defining clearer roles, tasks, and thus, expectations. These, I believe, are some of crucial elements for building a foundation where trust in human-GenAI relationship can be built upon.\n\n\nThe reduced sense of agency when using GenAI creates an interesting feedback loop: those who are confident in their abilities, of course, maintain power over the final output, while those less confident may surrender more control to AI systems. 
\n\n\nThis requires constant self-evaluation, or an introspection, of our knowledge and strategic decisions about what roles we assign to GenAI.\n\n\nViewing knowledge work as a service journey, perhaps, helps identify where GenAI implementation offers benefits versus potential long-term costs. A concerning outcome might be increasingly homogeneous quality of work products as people over-rely on GenAI output, as time goes on.\n\n\nInie's proposed research areas such as GenAI's influence on social relationships, impact on metacognitive evaluation, and technostress deserve serious attention from behavioural and cognitive scientists.\n\n\nTrained as an economist, I'm naturally drawn to finding an 'equilibrium' between optimal human cognitive load and GenAI assistance. Perhaps, perhaps, perhaps - by starting to put weight of cognitive costs on against the benefits where GenAI adoption has been emphasising on, may we discover this balance in the long run.\n\n\nMany thanks to the author for this timely contribution!",
"sourceUrl": "https://www.nannainie.com/_files/ugd/cf986a_96612c9ab2bb4864be2bbbf3b73f416b.pdf",
"linkedinUrl": "https://www.linkedin.com/posts/fenditsim_new-position-paper-the-cognitive-cost-of-activity-7365640920718876673-JoVx",
"keywords": [
"GenerativeAI",
"CognitiveLoad",
"FutureOfWork",
"AIResearch",
"TechnoStress",
"KnowledgeWork",
"AIEthics",
"CognitiveScience"
]
},
{
"id": "narrow-search-effect",
"title": "Leung and Urminsky (2025)",
"subtitle": "The narrow search effect and how broadening search promotes belief updating",
"summary": "Just finished reading \"The narrow search effect and how broadening search promotes belief updating\" by Eugina Leung and Oleg Urminsky.\n\n\nIt is fascinating paper about how people search for information online, and is timely as we navigate interactions with GenAI platforms for search and advice.\n\n\nIn this paper, they raise several important questions: the balance between 'breadth' and 'depth' in information search, and how search algorithms should be designed to promote belief updating.\n\n\nThe researchers demonstrate that, even without algorithmic bias, 'echo chambers' persist as we naturally use 'directionally narrow search terms' that reflect our existing beliefs. This 'narrow search effect' reinforces confirmation bias across Google, ChatGPT, and other platforms.\n\n\nWhat I find intriguing was their finding across 21 studies: when information technology provides broader information (beyond what was specifically requested), people update their beliefs more after searching. \n\n\nIn the case of interacting GenAI platforms like OpenAI's ChatGPT 3.5 explicitly acknowledging opposing viewpoints, users assigned to directionally narrow queries showed significant bias in their post-search beliefs.\n\n\nThe paper suggests structural changes to search and AI algorithms can mitigate confirmation bias. 
While LLM's sycophantic nature in conversational AI platforms like Anthropic's #Claude, xAI's Grok, and Google's #Gemini can, of course, reinforce user's prior beliefs, encouraging dialogical, multi-turn conversations might help user explore broader perspectives, thus allow for more reflection and belief updating compared to single-query Google searches.\n\n\nAs we develop next-generation AI systems, the researchers highlight the need for research more fully tests psychologically informed prompt-engineering approaches, an emerging question that bridges the psychology of decision-making and human–computer interaction (reminding me of related conversations I've had with Tris).\n\n\nAfter reading, I wonder:\n\n1. How do these findings apply to domains like trading, mental health support, and product recommendations?\n\n2. How might AI agents amplify these issues? \n\n3. When combined with research on automation bias, what behavioural interventions could mitigate the combination of human confirmation bias and LLM sycophancy?\n\n\nThanks to the researchers for this insightful work that bridges psychology and human-computer interaction!",
"sourceUrl": "https://www.pnas.org/doi/10.1073/pnas.2408175122",
"linkedinUrl": "https://www.linkedin.com/posts/fenditsim_narrow-search-effect-on-search-belief-updating-activity-7384157191920054273-vUPz",
"keywords": [
"AILiteracy",
"ConfirmationBias",
"PromptEngineering",
"InformationSeeking",
"BeliefsUpdating",
"GenerativeAI",
"HumanAIInteraction",