TechHub-CodeSprint-Challenge-2026/answer.json at main · Ratul-byte/TechHub-CodeSprint-Challenge-2026 · GitHub

[
  {
    "question_id": 1,
    "question": "In transformer-based text classification papers, what specific fine-tuning methods are used for sentiment or topic classification?",
    "answer": "STF: Sentence Transformer Fine-Tuning For Topic Categorization With\n  Limited Data: Nowadays, topic classification from tweets attracts considerable research attention. Different classification systems have been suggested thanks to these research efforts. Nevertheless, they face major challenges owing to low performance metrics due to the limited amount of la... Sentiment Analysis of Lithuanian Online Reviews Using Large Language\n  Models: Sentiment analysis is a widely researched area within Natural Language Processing (NLP), attracting significant interest due to the advent of automated solutions. Despite this, the task remains challenging because of the inherent complexity of languages and the subjective natu... Optimizing Multi-Class Text Classification: A Diverse Stacking Ensemble\n  Framework Utilizing Transformers: Customer reviews play a crucial role in assessing customer satisfaction, gathering feedback, and driving improvements for businesses. Analyzing these reviews provides valuable insights into customer sentiments, including compliments, comments, and suggestions. Text classificat...",
    "sources": [
      {
        "arxiv_id": "2407.03253",
        "title": "STF: Sentence Transformer Fine-Tuning For Topic Categorization With\n  Limited Data",
        "category": "cs.CL",
        "year": 2024,
        "pub_status": "Preprint",
        "distance": 0.33589696884155273
      },
      {
        "arxiv_id": "2407.19914",
        "title": "Sentiment Analysis of Lithuanian Online Reviews Using Large Language\n  Models",
        "category": "cs.CL",
        "year": 2024,
        "pub_status": "Preprint",
        "distance": 0.3560217618942261
      },
      {
        "arxiv_id": "2308.11519",
        "title": "Optimizing Multi-Class Text Classification: A Diverse Stacking Ensemble\n  Framework Utilizing Transformers",
        "category": "cs.CL",
        "year": 2023,
        "pub_status": "Preprint",
        "distance": 0.3605504631996155
      },
      {
        "arxiv_id": "2307.10234",
        "title": "SentimentGPT: Exploiting GPT for Advanced Sentiment Analysis and its\n  Departure from Current Machine Learning",
        "category": "cs.CL",
        "year": 2023,
        "pub_status": "Preprint",
        "distance": 0.3669431209564209
      },
      {
        "arxiv_id": "2106.02009",
        "title": "A Case Study of Spanish Text Transformations for Twitter Sentiment\n  Analysis",
        "category": "cs.CL",
        "year": 2021,
        "pub_status": "Preprint",
        "distance": 0.37092387676239014
      }
    ],
    "model_used": "sentence-transformers/all-minilm-l12-v2",
    "category_filter": "cs.CL",
    "year_filter": null
  },
  {
    "question_id": 2,
    "question": "What specific regularization methods are used to reduce overfitting in deep learning classification papers?",
    "answer": "Regularized deep learning with nonconvex penalties: Regularization methods are often employed in deep learning neural networks (DNNs) to prevent overfitting. For penalty based DNN regularization methods, convex penalties are typically considered because of their optimization guarantees. Recent theoretical work have shown that n... Regularizing Deep Neural Networks by Noise: Its Interpretation and\n  Optimization: Overfitting is one of the most critical challenges in deep neural networks, and there are various types of regularization methods to improve generalization performance. Injecting noises to hidden units during training, e.g., dropout, is known as a successful regularizer, but i... Overfitting in adversarially robust deep learning: we study several classical and modern deep learning remedies for overfitting, including regularization and data augmentation, and find that no approach in isolation improves significantly upon the gains achieved by early stopping. All code for reproducing the experiments as we...",
    "sources": [
      {
        "arxiv_id": "1909.05142",
        "title": "Regularized deep learning with nonconvex penalties",
        "category": "stat.ML",
        "year": 2019,
        "pub_status": "Published",
        "distance": 0.32158225774765015
      },
      {
        "arxiv_id": "1710.05179",
        "title": "Regularizing Deep Neural Networks by Noise: Its Interpretation and\n  Optimization",
        "category": "cs.LG",
        "year": 2017,
        "pub_status": "Preprint",
        "distance": 0.32577669620513916
      },
      {
        "arxiv_id": "2002.11569",
        "title": "Overfitting in adversarially robust deep learning",
        "category": "cs.LG",
        "year": 2020,
        "pub_status": "Preprint",
        "distance": 0.3404098153114319
      },
      {
        "arxiv_id": "2006.08643",
        "title": "On the training dynamics of deep networks with $L_2$ regularization",
        "category": "stat.ML",
        "year": 2020,
        "pub_status": "Preprint",
        "distance": 0.35840266942977905
      },
      {
        "arxiv_id": "2302.09433",
        "title": "The Generalization Error of Stochastic Mirror Descent on\n  Over-Parametrized Linear Models",
        "category": "cs.LG",
        "year": 2023,
        "pub_status": "Preprint",
        "distance": 0.365922749042511
      }
    ],
    "model_used": "sentence-transformers/all-minilm-l12-v2",
    "category_filter": null,
    "year_filter": null
  },
  {
    "question_id": 3,
    "question": "In reinforcement learning papers, what tasks are discussed for robotics or control problems?",
    "answer": "Assessing Policy, Loss and Planning Combinations in Reinforcement\n  Learning using a New Modular Architecture: that the best combination of planning algorithm, policy, and loss function is heavily problem dependent. This result provides evidence that the proposed architecture, which is modular and reusable, is useful for reinforcement learning researchers who want to study new environm... Autotuning PID control using Actor-Critic Deep Reinforcement Learning: This work is an exploratory research concerned with determining in what way reinforcement learning can be used to predict optimal PID parameters for a robot designed for apple harvest. To study this, an algorithm called Advantage Actor Critic (A2C) is implemented on a simulate... Test-driven Reinforcement Learning in Continuous Control: Reinforcement learning (RL) has been recognized as a powerful tool for robot control tasks. RL typically employs reward functions to define task objectives and guide agent learning. However, since the reward function serves the dual purpose of defining the optimal goal and gui...",
    "sources": [
      {
        "arxiv_id": "2201.02874",
        "title": "Assessing Policy, Loss and Planning Combinations in Reinforcement\n  Learning using a New Modular Architecture",
        "category": "cs.LG",
        "year": 2022,
        "pub_status": "Preprint",
        "distance": 0.4407162070274353
      },
      {
        "arxiv_id": "2212.00013",
        "title": "Autotuning PID control using Actor-Critic Deep Reinforcement Learning",
        "category": "cs.LG",
        "year": 2022,
        "pub_status": "Preprint",
        "distance": 0.4514155387878418
      },
      {
        "arxiv_id": "2511.07904",
        "title": "Test-driven Reinforcement Learning in Continuous Control",
        "category": "cs.LG",
        "year": 2025,
        "pub_status": "Preprint",
        "distance": 0.4522496461868286
      },
      {
        "arxiv_id": "2309.01909",
        "title": "A Survey on Physics Informed Reinforcement Learning: Review and Open Problems",
        "category": "cs.LG",
        "year": 2023,
        "pub_status": "Published",
        "distance": 0.4606286287307739
      }
    ],
    "model_used": "sentence-transformers/all-minilm-l12-v2",
    "category_filter": "cs.LG",
    "year_filter": null
  },
  {
    "question_id": 4,
    "question": "Which benchmark datasets are most often used in computer vision papers on detection or segmentation?",
    "answer": "Detecting Figures and Part Labels in Patents: Competition-Based\n  Development of Image Processing Algorithms: detection, 78.81% for figure regions with correctly recognized figure titles, and 70.98% for part label detection and character recognition. Data and software from the competition are available through the online UCI Machine Learning repository to inspire follow-on work by the... MISeval: a Metric Library for Medical Image Segmentation Evaluation: Correct performance assessment is crucial for evaluating modern artificial intelligence algorithms in medicine like deep-learning based medical image segmentation models. However, there is no universal metric library in Python for standardized and reproducible evaluation. Thus... Fully Convolutional Networks and Generative Adversarial Networks Applied\n  to Sclera Segmentation: objective evaluations of the proposed approaches, we provide to the scientific community new 1,300 manually segmented images from two databases. The experiments are performed on the UBIRIS.v2 and MICHE databases and the best performing configurations of our propositions achiev...",
    "sources": [
      {
        "arxiv_id": "1410.6751",
        "title": "Detecting Figures and Part Labels in Patents: Competition-Based\n  Development of Image Processing Algorithms",
        "category": "cs.CV",
        "year": 2014,
        "pub_status": "Preprint",
        "distance": 0.4041897654533386
      },
      {
        "arxiv_id": "2201.09395",
        "title": "MISeval: a Metric Library for Medical Image Segmentation Evaluation",
        "category": "cs.CV",
        "year": 2022,
        "pub_status": "Preprint",
        "distance": 0.41237109899520874
      },
      {
        "arxiv_id": "1806.08722",
        "title": "Fully Convolutional Networks and Generative Adversarial Networks Applied\n  to Sclera Segmentation",
        "category": "cs.CV",
        "year": 2018,
        "pub_status": "Preprint",
        "distance": 0.41555923223495483
      },
      {
        "arxiv_id": "2003.07557",
        "title": "1st Place Solutions for OpenImage2019 -- Object Detection and Instance\n  Segmentation",
        "category": "cs.CV",
        "year": 2020,
        "pub_status": "Preprint",
        "distance": 0.43166792392730713
      },
      {
        "arxiv_id": "2501.16182",
        "title": "The Linear Attention Resurrection in Vision Transformer",
        "category": "cs.CV",
        "year": 2025,
        "pub_status": "Preprint",
        "distance": 0.43564367294311523
      }
    ],
    "model_used": "sentence-transformers/all-minilm-l12-v2",
    "category_filter": "cs.CV",
    "year_filter": null
  },
  {
    "question_id": 5,
    "question": "What uncertainty quantification methods are discussed in probabilistic machine learning papers?",
    "answer": "Uncertainty Quantification in Probabilistic Machine Learning Models: Theory, Methods, and Insights: Uncertainty Quantification (UQ) is essential in probabilistic machine learning models, particularly for assessing the reliability of predictions. In this paper, we present a systematic framework for estimating both epistemic and aleatoric uncertainty in probabilistic models. W... Nonparametric Uncertainty Quantification for Single Deterministic Neural\n  Network: This paper proposes a fast and scalable method for uncertainty quantification of machine learning models' predictions. First, we show the principled way to measure the uncertainty of predictions for a classifier based on Nadaraya-Watson's nonparametric estimate of the conditio... On the Calibration of Probabilistic Classifier Sets: Multi-class classification methods that produce sets of probabilistic classifiers, such as ensemble learning methods, are able to model aleatoric and epistemic uncertainty. Aleatoric uncertainty is then typically quantified via the Bayes error, and epistemic uncertainty via th...",
    "sources": [
      {
        "arxiv_id": "2509.05877",
        "title": "Uncertainty Quantification in Probabilistic Machine Learning Models: Theory, Methods, and Insights",
        "category": "stat.ML",
        "year": 2025,
        "pub_status": "Preprint",
        "distance": 0.3902752995491028
      },
      {
        "arxiv_id": "2202.03101",
        "title": "Nonparametric Uncertainty Quantification for Single Deterministic Neural\n  Network",
        "category": "stat.ML",
        "year": 2022,
        "pub_status": "Preprint",
        "distance": 0.3948649764060974
      },
      {
        "arxiv_id": "2205.10082",
        "title": "On the Calibration of Probabilistic Classifier Sets",
        "category": "stat.ML",
        "year": 2022,
        "pub_status": "Preprint",
        "distance": 0.4057350158691406
      },
      {
        "arxiv_id": "2303.14568",
        "title": "Measuring Classification Decision Certainty and Doubt",
        "category": "stat.ML",
        "year": 2023,
        "pub_status": "Preprint",
        "distance": 0.40716952085494995
      }
    ],
    "model_used": "sentence-transformers/all-minilm-l12-v2",
    "category_filter": "stat.ML",
    "year_filter": null
  },
  {
    "question_id": 6,
    "question": "In cs.AI papers from 2023, what themes appear around planning, agents, and tool use?",
    "answer": "Timeline-based Planning and Execution with Uncertainty: Theory, Modeling\n  Methodologies and Practice: Automated Planning is one of the main research field of Artificial Intelligence since its beginnings. Research in Automated Planning aims at developing general reasoners (i.e., planners) capable of automatically solve complex problems. Broadly speaking, planners rely on a gene... Report from the NSF Future Directions Workshop, Toward User-Oriented\n  Agents: Research Directions and Challenges: This USER Workshop was convened with the goal of defining future research directions for the burgeoning intelligent agent research community and to communicate them to the National Science Foundation. It took place in Pittsburgh Pennsylvania on October 24 and 25, 2019 and was... Position Paper: Online Modeling for Offline Planning: The definition and representation of planning problems is at the heart of AI planning research. A key part is the representation of action models. Decades of advances improving declarative action model representations resulted in numerous theoretical advances, and capable, wor...",
    "sources": [
      {
        "arxiv_id": "1905.05713",
        "title": "Timeline-based Planning and Execution with Uncertainty: Theory, Modeling\n  Methodologies and Practice",
        "category": "cs.AI",
        "year": 2019,
        "pub_status": "Preprint",
        "distance": 0.44589293003082275
      },
      {
        "arxiv_id": "2006.06026",
        "title": "Report from the NSF Future Directions Workshop, Toward User-Oriented\n  Agents: Research Directions and Challenges",
        "category": "cs.CL",
        "year": 2020,
        "pub_status": "Preprint",
        "distance": 0.45725929737091064
      },
      {
        "arxiv_id": "2206.03356",
        "title": "Position Paper: Online Modeling for Offline Planning",
        "category": "cs.AI",
        "year": 2022,
        "pub_status": "Preprint",
        "distance": 0.45929956436157227
      },
      {
        "arxiv_id": "2601.03555",
        "title": "SCRIBE: Structured Mid-Level Supervision for Tool-Using Language Models",
        "category": "cs.AI",
        "year": 2026,
        "pub_status": "Preprint",
        "distance": 0.4659184217453003
      },
      {
        "arxiv_id": "2011.10707",
        "title": "Explainable Composition of Aggregated Assistants",
        "category": "cs.AI",
        "year": 2020,
        "pub_status": "Preprint",
        "distance": 0.46719419956207275
      }
    ],
    "model_used": "sentence-transformers/all-minilm-l12-v2",
    "category_filter": "cs.AI",
    "year_filter": "2023"
  },
  {
    "question_id": 7,
    "question": "Which evaluation metrics are used in NLP papers for classification and text generation tasks?",
    "answer": "Perturbation CheckLists for Evaluating NLG Evaluation Metrics: Natural Language Generation (NLG) evaluation is a multifaceted task requiring assessment of multiple desirable criteria, e.g., fluency, coherency, coverage, relevance, adequacy, overall quality, etc. Across existing datasets for 6 NLG tasks, we observe that the human evaluatio... Automatic Metrics in Natural Language Generation: A Survey of Current\n  Evaluation Practices: Automatic metrics are extensively used to evaluate natural language processing systems. However, there has been increasing focus on how they are used and reported by practitioners within the field. In this paper, we have conducted a survey on the use of automatic metrics, focu... On the Effectiveness of Automated Metrics for Text Generation Systems: A major challenge in the field of Text Generation is evaluation because we lack a sound theory that can be leveraged to extract guidelines for evaluation campaigns. In this work, we propose a first step towards such a theory that incorporates different sources of uncertainty,...",
    "sources": [
      {
        "arxiv_id": "2109.05771",
        "title": "Perturbation CheckLists for Evaluating NLG Evaluation Metrics",
        "category": "cs.CL",
        "year": 2021,
        "pub_status": "Preprint",
        "distance": 0.3057273030281067
      },
      {
        "arxiv_id": "2408.09169",
        "title": "Automatic Metrics in Natural Language Generation: A Survey of Current\n  Evaluation Practices",
        "category": "cs.CL",
        "year": 2024,
        "pub_status": "Preprint",
        "distance": 0.3331500291824341
      },
      {
        "arxiv_id": "2210.13025",
        "title": "On the Effectiveness of Automated Metrics for Text Generation Systems",
        "category": "cs.CL",
        "year": 2022,
        "pub_status": "Preprint",
        "distance": 0.3403450846672058
      },
      {
        "arxiv_id": "2310.00752",
        "title": "TIGERScore: Towards Building Explainable Metric for All Text Generation\n  Tasks",
        "category": "cs.CL",
        "year": 2023,
        "pub_status": "Preprint",
        "distance": 0.34491825103759766
      },
      {
        "arxiv_id": "2110.08559",
        "title": "FrugalScore: Learning Cheaper, Lighter and Faster Evaluation Metrics for\n  Automatic Text Generation",
        "category": "cs.CL",
        "year": 2021,
        "pub_status": "Preprint",
        "distance": 0.3450550436973572
      }
    ],
    "model_used": "sentence-transformers/all-minilm-l12-v2",
    "category_filter": "cs.CL",
    "year_filter": null
  },
  {
    "question_id": 8,
    "question": "What model families dominate cs.CV papers from 2022 on vision transformers and diffusion models?",
    "answer": "Diffusion Models in Low-Level Vision: A Survey: evaluation metrics. We conduct a thorough evaluation, encompassing both performance and efficiency, of diffusion model-based techniques in three prominent tasks. Finally, we elucidate the limitations of current diffusion models and propose seven intriguing directions for futur... PAIR-Diffusion: A Comprehensive Multimodal Object-Level Image Editor: objects, and variations. Thanks to our design, we do not require any inversion step. Additionally, we propose multimodal classifier-free guidance which enables editing images using both reference images and text when using our approach with foundational diffusion models. We va... Class-Balancing Diffusion Models: Diffusion-based models have shown the merits of generating high-quality visual data while preserving better diversity in recent studies. However, such observation is only justified with curated data distribution, where the data samples are nicely pre-processed to be uniformly...",
    "sources": [
      {
        "arxiv_id": "2406.11138",
        "title": "Diffusion Models in Low-Level Vision: A Survey",
        "category": "cs.CV",
        "year": 2024,
        "pub_status": "Preprint",
        "distance": 0.4237717390060425
      },
      {
        "arxiv_id": "2303.17546",
        "title": "PAIR-Diffusion: A Comprehensive Multimodal Object-Level Image Editor",
        "category": "cs.CV",
        "year": 2023,
        "pub_status": "Preprint",
        "distance": 0.4662579298019409
      },
      {
        "arxiv_id": "2305.00562",
        "title": "Class-Balancing Diffusion Models",
        "category": "cs.CV",
        "year": 2023,
        "pub_status": "Preprint",
        "distance": 0.5021175146102905
      },
      {
        "arxiv_id": "2506.21757",
        "title": "TADA: Improved Diffusion Sampling with Training-free Augmented Dynamics",
        "category": "stat.ML",
        "year": 2025,
        "pub_status": "Preprint",
        "distance": 0.5055375695228577
      },
      {
        "arxiv_id": "2408.10207",
        "title": "A Comprehensive Survey on Diffusion Models and Their Applications",
        "category": "cs.CV",
        "year": 2024,
        "pub_status": "Preprint",
        "distance": 0.5188521146774292
      }
    ],
    "model_used": "sentence-transformers/all-minilm-l12-v2",
    "category_filter": "cs.CV",
    "year_filter": "2022"
  },
  {
    "question_id": 9,
    "question": "What train-validation-test splitting and cross-validation practices are described in machine learning methodology papers?",
    "answer": "Theoretical Analyses of Cross-Validation Error and Voting in\n  Instance-Based Learning: This paper begins with a general theory of error in cross-validation testing of algorithms for supervised learning from examples. It is assumed that the examples are described by attribute-value pairs, where the values are symbolic. Cross-validation requires a set of training... On Tail Decay Rate Estimation of Loss Function Distributions: The study of loss function distributions is critical to characterize a model's behaviour on a given machine learning problem. For example, while the quality of a model is commonly determined by the average loss assessed on a testing set, this quantity does not reflect the exis...",
    "sources": [
      {
        "arxiv_id": "cs/0212030",
        "title": "Theoretical Analyses of Cross-Validation Error and Voting in\n  Instance-Based Learning",
        "category": "cs.LG",
        "year": 2002,
        "pub_status": "Published",
        "distance": 0.5046436190605164
      },
      {
        "arxiv_id": "2306.02807",
        "title": "On Tail Decay Rate Estimation of Loss Function Distributions",
        "category": "cs.LG",
        "year": 2023,
        "pub_status": "Preprint",
        "distance": 0.548582911491394
      }
    ],
    "model_used": "sentence-transformers/all-minilm-l12-v2",
    "category_filter": "cs.LG",
    "year_filter": null
  },
  {
    "question_id": 10,
    "question": "Which Bayesian and ensemble uncertainty methods appear in stat.ML papers from 2021?",
    "answer": "Bayesian System ID: Optimal management of parameter, model, and\n  measurement uncertainty: Carlo scheme that we use to obtain the Bayesian posterior for both linear and nonlinear problems. We then empirically demonstrate that obtaining the marginal posterior of the parameter dynamics and making predictions by extracting optimal estimators (e.g., mean, median, mode)... Calibrated and uncertain? Evaluating uncertainty estimates in binary classification models: Rigorous statistical methods, including parameter estimation with accompanying uncertainties, underpin the validity of scientific discovery, especially in the natural sciences. With increasingly complex data models such as deep learning techniques, uncertainty quantification h... Time-series Scenario Forecasting: Many applications require the ability to judge uncertainty of time-series forecasts. Uncertainty is often specified as point-wise error bars around a mean or median forecast. Due to temporal dependencies, such a method obscures some information. We would ideally have a way to...",
    "sources": [
      {
        "arxiv_id": "2003.02359",
        "title": "Bayesian System ID: Optimal management of parameter, model, and\n  measurement uncertainty",
        "category": "stat.ML",
        "year": 2020,
        "pub_status": "Preprint",
        "distance": 0.4896275997161865
      },
      {
        "arxiv_id": "2508.11460",
        "title": "Calibrated and uncertain? Evaluating uncertainty estimates in binary classification models",
        "category": "cs.LG",
        "year": 2025,
        "pub_status": "Preprint",
        "distance": 0.494335412979126
      },
      {
        "arxiv_id": "1211.3010",
        "title": "Time-series Scenario Forecasting",
        "category": "stat.ML",
        "year": 2012,
        "pub_status": "Preprint",
        "distance": 0.4952050447463989
      },
      {
        "arxiv_id": "cs/0504043",
        "title": "Experimental Comparison of Classification Uncertainty for Randomised and\n  Bayesian Decision Tree Ensembles",
        "category": "cs.AI",
        "year": 2005,
        "pub_status": "Preprint",
        "distance": 0.5018309354782104
      },
      {
        "arxiv_id": "2112.13776",
        "title": "Transformer Uncertainty Estimation with Hierarchical Stochastic\n  Attention",
        "category": "cs.CL",
        "year": 2021,
        "pub_status": "Preprint",
        "distance": 0.5123996734619141
      }
    ],
    "model_used": "sentence-transformers/all-minilm-l12-v2",
    "category_filter": "stat.ML",
    "year_filter": "2021"
  },
  {
    "question_id": 11,
    "question": "What optimization algorithms are most common in recent deep learning training papers?",
    "answer": "Newton Methods for Convolutional Neural Networks: Deep learning involves a difficult non-convex optimization problem, which is often solved by stochastic gradient (SG) methods. While SG is usually effective, it may not be robust in some situations. Recently, Newton methods have been investigated as an alternative optimization... Optimizer Benchmarking Needs to Account for Hyperparameter Tuning: The performance of optimizers, particularly in deep learning, depends considerably on their chosen hyperparameter configuration. The efficacy of optimizers is often studied under near-optimal problem-specific hyperparameters, and finding these settings may be prohibitively cos... Derivative-Free Global Optimization Algorithms: Bayesian Method and\n  Lipschitzian Approaches: In this paper, we will provide an introduction to the derivative-free optimization algorithms which can be potentially applied to train deep learning models. Existing deep learning model training is mostly based on the back propagation algorithm, which updates the model variab...",
    "sources": [
      {
        "arxiv_id": "1811.06100",
        "title": "Newton Methods for Convolutional Neural Networks",
        "category": "stat.ML",
        "year": 2018,
        "pub_status": "Preprint",
        "distance": 0.4447805881500244
      },
      {
        "arxiv_id": "1910.11758",
        "title": "Optimizer Benchmarking Needs to Account for Hyperparameter Tuning",
        "category": "cs.LG",
        "year": 2019,
        "pub_status": "Preprint",
        "distance": 0.44710588455200195
      },
      {
        "arxiv_id": "1904.09365",
        "title": "Derivative-Free Global Optimization Algorithms: Bayesian Method and\n  Lipschitzian Approaches",
        "category": "cs.LG",
        "year": 2019,
        "pub_status": "Preprint",
        "distance": 0.44846951961517334
      },
      {
        "arxiv_id": "1805.06753",
        "title": "Interpolatron: Interpolation or Extrapolation Schemes to Accelerate\n  Optimization for Deep Neural Networks",
        "category": "stat.ML",
        "year": 2018,
        "pub_status": "Preprint",
        "distance": 0.45437830686569214
      },
      {
        "arxiv_id": "2011.08042",
        "title": "Mixing ADAM and SGD: a Combined Optimization Method",
        "category": "cs.LG",
        "year": 2020,
        "pub_status": "Preprint",
        "distance": 0.45822352170944214
      }
    ],
    "model_used": "sentence-transformers/all-minilm-l12-v2",
    "category_filter": null,
    "year_filter": null
  },
  {
    "question_id": 12,
    "question": "What NLP tasks are most discussed in cs.CL papers on translation, summarization, and question answering?",
    "answer": "A Neural-Symbolic Approach Towards Identifying Grammatically Correct\n  Sentences: Textual content around us is growing on a daily basis. Numerous articles are being written as we speak on online newspapers, blogs, or social media. Similarly, recent advances in the AI field, like language models or traditional classic AI approaches, are utilizing all the abo... On Context Utilization in Summarization with Large Language Models: Large language models (LLMs) excel in abstractive summarization tasks, delivering fluent and pertinent summaries. Recent advancements have extended their capabilities to handle long-input contexts, exceeding 100k tokens. However, in question answering, language models exhibit... Benchmarking Generation and Evaluation Capabilities of Large Language\n  Models for Instruction Controllable Summarization: While large language models (LLMs) can already achieve strong performance on standard generic summarization benchmarks, their performance on more complex summarization task settings is less studied. Therefore, we benchmark LLMs on instruction controllable text summarization, w...",
    "sources": [
      {
        "arxiv_id": "2307.08036",
        "title": "A Neural-Symbolic Approach Towards Identifying Grammatically Correct\n  Sentences",
        "category": "cs.CL",
        "year": 2023,
        "pub_status": "Preprint",
        "distance": 0.4051550030708313
      },
      {
        "arxiv_id": "2310.1057",
        "title": "On Context Utilization in Summarization with Large Language Models",
        "category": "cs.CL",
        "year": 2023,
        "pub_status": "Preprint",
        "distance": 0.41583251953125
      },
      {
        "arxiv_id": "2311.09184",
        "title": "Benchmarking Generation and Evaluation Capabilities of Large Language\n  Models for Instruction Controllable Summarization",
        "category": "cs.CL",
        "year": 2023,
        "pub_status": "Preprint",
        "distance": 0.4398207664489746
      },
      {
        "arxiv_id": "2404.01701",
        "title": "On the Role of Summary Content Units in Text Summarization Evaluation",
        "category": "cs.CL",
        "year": 2024,
        "pub_status": "Preprint",
        "distance": 0.44099438190460205
      },
      {
        "arxiv_id": "2203.0382",
        "title": "A Variational Hierarchical Model for Neural Cross-Lingual Summarization",
        "category": "cs.CL",
        "year": 2022,
        "pub_status": "Preprint",
        "distance": 0.445609450340271
      }
    ],
    "model_used": "sentence-transformers/all-minilm-l12-v2",
    "category_filter": "cs.CL",
    "year_filter": null
  },
  {
    "question_id": 13,
    "question": "What dataset and benchmark trends appear in cs.CV papers about recognition and detection?",
    "answer": "Detecting Figures and Part Labels in Patents: Competition-Based\n  Development of Image Processing Algorithms: detection, 78.81% for figure regions with correctly recognized figure titles, and 70.98% for part label detection and character recognition. Data and software from the competition are available through the online UCI Machine Learning repository to inspire follow-on work by the... Pattern Generation Strategies for Improving Recognition of Handwritten\n  Mathematical Expressions: 2016 databases demonstrate the superiority and effectiveness of our strategies: our hybrid strategy achieved classification rates of 48.78% and 45.60%, respectively, on these databases. These results are competitive compared to others reported in recent literature. Our generat... Generalized Category Discovery with Clustering Assignment Consistency: results and the number of categories simultaneously. Extensive experiments show that our method achieves state-of-the-art performance on three generic benchmarks and three fine-grained visual recognition datasets. Especially in the ImageNet-100 data set, our method significant...",
    "sources": [
      {
        "arxiv_id": "1410.6751",
        "title": "Detecting Figures and Part Labels in Patents: Competition-Based\n  Development of Image Processing Algorithms",
        "category": "cs.CV",
        "year": 2014,
        "pub_status": "Preprint",
        "distance": 0.42422711849212646
      },
      {
        "arxiv_id": "1901.06763",
        "title": "Pattern Generation Strategies for Improving Recognition of Handwritten\n  Mathematical Expressions",
        "category": "cs.CV",
        "year": 2019,
        "pub_status": "Preprint",
        "distance": 0.44221460819244385
      },
      {
        "arxiv_id": "2310.1921",
        "title": "Generalized Category Discovery with Clustering Assignment Consistency",
        "category": "cs.CV",
        "year": 2023,
        "pub_status": "Preprint",
        "distance": 0.45256930589675903
      },
      {
        "arxiv_id": "2506.04807",
        "title": "MegaHan97K: A Large-Scale Dataset for Mega-Category Chinese Character Recognition with over 97K Categories",
        "category": "cs.CV",
        "year": 2025,
        "pub_status": "Published",
        "distance": 0.4531882405281067
      },
      {
        "arxiv_id": "2408.13031",
        "title": "VFM-Det: Towards High-Performance Vehicle Detection via Large Foundation\n  Models",
        "category": "cs.CV",
        "year": 2024,
        "pub_status": "Preprint",
        "distance": 0.4540681838989258
      }
    ],
    "model_used": "sentence-transformers/all-minilm-l12-v2",
    "category_filter": "cs.CV",
    "year_filter": null
  },
  {
    "question_id": 14,
    "question": "What robustness and distribution-shift issues are discussed in machine learning generalization papers?",
    "answer": "Confidence-Based Model Selection: When to Take Shortcuts for\n  Subpopulation Shifts: Effective machine learning models learn both robust features that directly determine the outcome of interest (e.g., an object with wheels is more likely to be a car), and shortcut features (e.g., an object on a road is more likely to be a car). The latter can be a source of er... A Flat Minima Perspective on Understanding Augmentations and Model Robustness: Model robustness indicates a model's capability to generalize well on unforeseen distributional shifts, including data corruptions and adversarial attacks. Data augmentation is one of the most prevalent and effective ways to enhance robustness. Despite the great success of the... CFL: On the Use of Characteristic Function Loss for Domain Alignment in Machine Learning: Machine Learning (ML) models are extensively used in various applications due to their significant advantages over traditional learning methods. However, the developed ML models often underperform when deployed in the real world due to the well-known distribution shift problem...",
    "sources": [
      {
        "arxiv_id": "2306.1112",
        "title": "Confidence-Based Model Selection: When to Take Shortcuts for\n  Subpopulation Shifts",
        "category": "cs.LG",
        "year": 2023,
        "pub_status": "Preprint",
        "distance": 0.3770197629928589
      },
      {
        "arxiv_id": "2505.24592",
        "title": "A Flat Minima Perspective on Understanding Augmentations and Model Robustness",
        "category": "cs.LG",
        "year": 2025,
        "pub_status": "Preprint",
        "distance": 0.3794170022010803
      },
      {
        "arxiv_id": "2511.02148",
        "title": "CFL: On the Use of Characteristic Function Loss for Domain Alignment in Machine Learning",
        "category": "cs.LG",
        "year": 2025,
        "pub_status": "Preprint",
        "distance": 0.38570713996887207
      },
      {
        "arxiv_id": "2202.08944",
        "title": "Rethinking Machine Learning Robustness via its Link with the\n  Out-of-Distribution Problem",
        "category": "cs.LG",
        "year": 2022,
        "pub_status": "Preprint",
        "distance": 0.39844274520874023
      }
    ],
    "model_used": "sentence-transformers/all-minilm-l12-v2",
    "category_filter": "cs.LG",
    "year_filter": null
  },
  {
    "question_id": 15,
    "question": "What techniques are used for model efficiency, compression, and parameter reduction?",
    "answer": "Prune-then-Quantize or Quantize-then-Prune? Understanding the Impact of Compression Order in Joint Model Compression: What happens when multiple compression methods are combined-does the order in which they are applied matter? Joint model compression has emerged as a powerful strategy to achieve higher efficiency by combining multiple methods such as pruning and quantization. A central but un... Filter Pruning for Efficient CNNs via Knowledge-driven Differential\n  Filter Sampler: KDFS's effectiveness in compressing the base models on various datasets. For instance, the pruned ResNet-50 on ImageNet achieves $55.36\\%$ computation reduction, and $42.86\\%$ parameter reduction, while only dropping $0.35\\%$ Top-1 accuracy, significantly outperforming the sta... Pea-KD: Parameter-efficient and Accurate Knowledge Distillation on BERT: How can we efficiently compress a model while maintaining its performance? Knowledge Distillation (KD) is one of the widely known methods for model compression. In essence, KD trains a smaller student model based on a larger teacher model and tries to retain the teacher model'...",
    "sources": [
      {
        "arxiv_id": "2603.18426",
        "title": "Prune-then-Quantize or Quantize-then-Prune? Understanding the Impact of Compression Order in Joint Model Compression",
        "category": "cs.AI",
        "year": 2026,
        "pub_status": "Preprint",
        "distance": 0.3610309958457947
      },
      {
        "arxiv_id": "2307.00198",
        "title": "Filter Pruning for Efficient CNNs via Knowledge-driven Differential\n  Filter Sampler",
        "category": "cs.CV",
        "year": 2023,
        "pub_status": "Preprint",
        "distance": 0.37730544805526733
      },
      {
        "arxiv_id": "2009.14822",
        "title": "Pea-KD: Parameter-efficient and Accurate Knowledge Distillation on BERT",
        "category": "cs.LG",
        "year": 2020,
        "pub_status": "Preprint",
        "distance": 0.4071769118309021
      },
      {
        "arxiv_id": "2207.00112",
        "title": "Language model compression with weighted low-rank factorization",
        "category": "cs.LG",
        "year": 2022,
        "pub_status": "Preprint",
        "distance": 0.426449179649353
      },
      {
        "arxiv_id": "1810.00597",
        "title": "Taming VAEs",
        "category": "stat.ML",
        "year": 2018,
        "pub_status": "Preprint",
        "distance": 0.43006306886672974
      }
    ],
    "model_used": "sentence-transformers/all-minilm-l12-v2",
    "category_filter": null,
    "year_filter": null
  },
  {
    "question_id": 16,
    "question": "Which reinforcement learning components are emphasized in cs.LG papers from 2022?",
    "answer": "Assessing Policy, Loss and Planning Combinations in Reinforcement\n  Learning using a New Modular Architecture: that the best combination of planning algorithm, policy, and loss function is heavily problem dependent. This result provides evidence that the proposed architecture, which is modular and reusable, is useful for reinforcement learning researchers who want to study new environm... Provable General Function Class Representation Learning in Multitask\n  Bandits and MDPs: While multitask representation learning has become a popular approach in reinforcement learning (RL) to boost the sample efficiency, the theoretical understanding of why and how it works is still limited. Most previous analytical works could only assume that the representation...",
    "sources": [
      {
        "arxiv_id": "2201.02874",
        "title": "Assessing Policy, Loss and Planning Combinations in Reinforcement\n  Learning using a New Modular Architecture",
        "category": "cs.LG",
        "year": 2022,
        "pub_status": "Preprint",
        "distance": 0.4928462505340576
      },
      {
        "arxiv_id": "2205.15701",
        "title": "Provable General Function Class Representation Learning in Multitask\n  Bandits and MDPs",
        "category": "cs.LG",
        "year": 2022,
        "pub_status": "Preprint",
        "distance": 0.5299596786499023
      }
    ],
    "model_used": "sentence-transformers/all-minilm-l12-v2",
    "category_filter": "cs.LG",
    "year_filter": "2022"
  },
  {
    "question_id": 17,
    "question": "What publication metadata signals indicate whether a paper is published or still a preprint?",
    "answer": "Standardization of Post-Publication Code Verification by Journals is Possible with the Support of the Community: Reproducibility remains a challenge in machine learning research. While code and data availability requirements have become increasingly common, post-publication verification in journals is still limited and unformalized. This position paper argues that it is plausible for jou... Estimating the Causal Effect of Early ArXiving on Paper Acceptance: What is the effect of releasing a preprint of a paper before it is submitted for peer review? No randomized controlled trial has been conducted, so we turn to observational data to answer this question. We use data from the ICLR conference (2018--2022) and apply methods from c... The Pitfalls of Publishing in the Age of LLMs: Strange and Surprising\n  Adventures with a High-Impact NLP Journal: We show the fraught side of the academic publishing realm and illustrate it through a recent case study with an NLP journal.",
    "sources": [
      {
        "arxiv_id": "2601.07189",
        "title": "Standardization of Post-Publication Code Verification by Journals is Possible with the Support of the Community",
        "category": "cs.LG",
        "year": 2026,
        "pub_status": "Preprint",
        "distance": 0.543578028678894
      },
      {
        "arxiv_id": "2306.13891",
        "title": "Estimating the Causal Effect of Early ArXiving on Paper Acceptance",
        "category": "cs.CL",
        "year": 2023,
        "pub_status": "Preprint",
        "distance": 0.5505154132843018
      },
      {
        "arxiv_id": "2407.12026",
        "title": "The Pitfalls of Publishing in the Age of LLMs: Strange and Surprising\n  Adventures with a High-Impact NLP Journal",
        "category": "cs.CL",
        "year": 2024,
        "pub_status": "Preprint",
        "distance": 0.5577199459075928
      },
      {
        "arxiv_id": "1709.09119",
        "title": "Integration of Japanese Papers Into the DBLP Data Set",
        "category": "cs.CL",
        "year": 2017,
        "pub_status": "Preprint",
        "distance": 0.5627934336662292
      },
      {
        "arxiv_id": "2508.19780",
        "title": "Interestingness First Classifiers",
        "category": "cs.LG",
        "year": 2025,
        "pub_status": "Preprint",
        "distance": 0.5674616098403931
      }
    ],
    "model_used": "sentence-transformers/all-minilm-l12-v2",
    "category_filter": null,
    "year_filter": null
  },
  {
    "question_id": 18,
    "question": "For cs.AI papers, what application domains are frequently mentioned in decision support, dialogue, or robotics?",
    "answer": "IDs for AI Systems: AI systems are increasingly pervasive, yet information needed to decide whether and how to engage with them may not exist or be accessible. A user may not be able to verify whether a system has certain safety certifications. An investigator may not know whom to investigate whe... Responsible AI for General-Purpose Systems: Overview, Challenges, and A Path Forward: the RAI requirements for future general-purpose AI systems, and discuss how recent efforts in AI alignment, retrieval-augmented generation, reasoning enhancements, etc. fare along one or more of the desiderata. We believe that the goal of developing responsible general-purpose... Where can AI be used? Insights from a deep ontology of work activities: value. Most of the market value is used in information-based activities (72%), especially creating information (36%), and only 12% is used in physical activities. Interactive activities include both information-based and physical activities and account for 48% of AI market val...",
    "sources": [
      {
        "arxiv_id": "2406.12137",
        "title": "IDs for AI Systems",
        "category": "cs.AI",
        "year": 2024,
        "pub_status": "Preprint",
        "distance": 0.4572352170944214
      },
      {
        "arxiv_id": "2601.13122",
        "title": "Responsible AI for General-Purpose Systems: Overview, Challenges, and A Path Forward",
        "category": "cs.AI",
        "year": 2026,
        "pub_status": "Preprint",
        "distance": 0.4669231176376343
      },
      {
        "arxiv_id": "2603.20619",
        "title": "Where can AI be used? Insights from a deep ontology of work activities",
        "category": "cs.AI",
        "year": 2026,
        "pub_status": "Preprint",
        "distance": 0.4782264232635498
      },
      {
        "arxiv_id": "2210.00608",
        "title": "Establishing Meta-Decision-Making for AI: An Ontology of Relevance,\n  Representation and Reasoning",
        "category": "cs.AI",
        "year": 2022,
        "pub_status": "Preprint",
        "distance": 0.4830304980278015
      },
      {
        "arxiv_id": "2411.0804",
        "title": "The Universal PDDL Domain",
        "category": "cs.AI",
        "year": 2024,
        "pub_status": "Preprint",
        "distance": 0.4857569932937622
      }
    ],
    "model_used": "sentence-transformers/all-minilm-l12-v2",
    "category_filter": "cs.AI",
    "year_filter": null
  },
  {
    "question_id": 19,
    "question": "What multimodal research themes appear in 2023 papers on vision-language and video modeling?",
    "answer": "Multimodal Large Language Models: A Survey: The exploration of multimodal language models integrates multiple data types, such as images, text, language, audio, and other heterogeneity. While the latest large language models excel in text-based tasks, they often struggle to understand and process other data types. Multi...",
    "sources": [
      {
        "arxiv_id": "2311.13165",
        "title": "Multimodal Large Language Models: A Survey",
        "category": "cs.AI",
        "year": 2023,
        "pub_status": "Preprint",
        "distance": 0.4243432879447937
      }
    ],
    "model_used": "sentence-transformers/all-minilm-l12-v2",
    "category_filter": null,
    "year_filter": "2023"
  },
  {
    "question_id": 20,
    "question": "Which techniques are used to improve factuality, retrieval grounding, and hallucination control in language model papers?",
    "answer": "PFME: A Modular Approach for Fine-grained Hallucination Detection and\n  Editing of Large Language Models: Large Language Models (LLMs) excel in fluency but risk producing inaccurate content, called \"hallucinations.\" This paper outlines a standardized process for categorizing fine-grained hallucination types and proposes an innovative framework--the Progressive Fine-grained Model E... Investigating Symbolic Triggers of Hallucination in Gemma Models Across HaluEval and TruthfulQA: Hallucination in Large Language Models (LLMs) is a well studied problem. However, the properties that make LLM intrinsically vulnerable to hallucinations have not been identified and studied. This research identifies and characterizes the key properties, allowing us to pinpoin... UFO: a Unified and Flexible Framework for Evaluating Factuality of Large\n  Language Models: Large language models (LLMs) may generate text that lacks consistency with human knowledge, leading to factual inaccuracies or \\textit{hallucination}. Existing research for evaluating the factuality of LLMs involves extracting fact claims using an LLM and verifying them agains...",
    "sources": [
      {
        "arxiv_id": "2407.00488",
        "title": "PFME: A Modular Approach for Fine-grained Hallucination Detection and\n  Editing of Large Language Models",
        "category": "cs.CL",
        "year": 2024,
        "pub_status": "Preprint",
        "distance": 0.3558080792427063
      },
      {
        "arxiv_id": "2509.09715",
        "title": "Investigating Symbolic Triggers of Hallucination in Gemma Models Across HaluEval and TruthfulQA",
        "category": "cs.CL",
        "year": 2025,
        "pub_status": "Preprint",
        "distance": 0.3603476285934448
      },
      {
        "arxiv_id": "2402.1469",
        "title": "UFO: a Unified and Flexible Framework for Evaluating Factuality of Large\n  Language Models",
        "category": "cs.CL",
        "year": 2024,
        "pub_status": "Preprint",
        "distance": 0.36526739597320557
      },
      {
        "arxiv_id": "2403.19113",
        "title": "FACTOID: FACtual enTailment fOr hallucInation Detection",
        "category": "cs.CL",
        "year": 2024,
        "pub_status": "Preprint",
        "distance": 0.37152785062789917
      },
      {
        "arxiv_id": "2502.13622",
        "title": "REFIND at SemEval-2025 Task 3: Retrieval-Augmented Factuality\n  Hallucination Detection in Large Language Models",
        "category": "cs.CL",
        "year": 2025,
        "pub_status": "Preprint",
        "distance": 0.38271021842956543
      }
    ],
    "model_used": "sentence-transformers/all-minilm-l12-v2",
    "category_filter": "cs.CL",
    "year_filter": null
  },
  {
    "question_id": 21,
    "question": "In one or two sentences, what fairness and bias mitigation techniques appear in AI decision-support papers?",
    "answer": "Defining bias in AI-systems: Biased models are fair models: The debate around bias in AI systems is central to discussions on algorithmic fairness. However, the term bias often lacks a clear definition, despite frequently being contrasted with fairness, implying that an unbiased model is inherently fair. In this paper, we challenge thi... Navigating Fairness Measures and Trade-Offs: In order to monitor and prevent bias in AI systems we can use a wide range of (statistical) fairness measures. However, it is mathematically impossible to optimize for all of these measures at the same time. In addition, optimizing a fairness measure often greatly reduces the... Getting Fairness Right: Towards a Toolbox for Practitioners: The potential risk of AI systems unintentionally embedding and reproducing bias has attracted the attention of machine learning practitioners and society at large. As policy makers are willing to set the standards of algorithms and AI techniques, the issue on how to refine exi...",
    "sources": [
      {
        "arxiv_id": "2502.1806",
        "title": "Defining bias in AI-systems: Biased models are fair models",
        "category": "cs.AI",
        "year": 2025,
        "pub_status": "Preprint",
        "distance": 0.2659844160079956
      },
      {
        "arxiv_id": "2307.08484",
        "title": "Navigating Fairness Measures and Trade-Offs",
        "category": "cs.AI",
        "year": 2023,
        "pub_status": "Preprint",
        "distance": 0.2737894654273987
      },
      {
        "arxiv_id": "2003.0692",
        "title": "Getting Fairness Right: Towards a Toolbox for Practitioners",
        "category": "cs.AI",
        "year": 2020,
        "pub_status": "Preprint",
        "distance": 0.2849392890930176
      },
      {
        "arxiv_id": "2411.06624",
        "title": "A Review of Fairness and A Practical Guide to Selecting Context-Appropriate Fairness Metrics in Machine Learning",
        "category": "cs.AI",
        "year": 2024,
        "pub_status": "Preprint",
        "distance": 0.2996431589126587
      },
      {
        "arxiv_id": "1909.00982",
        "title": "Quantifying Infra-Marginality and Its Trade-off with Group Fairness",
        "category": "cs.AI",
        "year": 2019,
        "pub_status": "Preprint",
        "distance": 0.3014613389968872
      }
    ],
    "model_used": "sentence-transformers/all-minilm-l12-v2",
    "category_filter": "cs.AI",
    "year_filter": null
  },
  {
    "question_id": 22,
    "question": "What reproducibility details are frequently reported in machine learning experiments?",
    "answer": "Reproducibility in machine learning for medical imaging: Reproducibility is a cornerstone of science, as the replication of findings is the process through which they become knowledge. It is widely considered that many fields of science are undergoing a reproducibility crisis. This has led to the publications of various guidelines i... The Fundamental Principles of Reproducibility: Reproducibility is a confused terminology. In this paper, I take a fundamental view on reproducibility rooted in the scientific method. The scientific method is analysed and characterised in order to develop the terminology required to define reproducibility. Further, the lite... Research Reproducibility as a Survival Analysis: There has been increasing concern within the machine learning community that we are in a reproducibility crisis. As many have begun to work on this problem, all work we are aware of treat the issue of reproducibility as an intrinsic binary property: a paper is or is not reprod...",
    "sources": [
      {
        "arxiv_id": "2209.05097",
        "title": "Reproducibility in machine learning for medical imaging",
        "category": "cs.CV",
        "year": 2022,
        "pub_status": "Preprint",
        "distance": 0.3086525797843933
      },
      {
        "arxiv_id": "2011.10098",
        "title": "The Fundamental Principles of Reproducibility",
        "category": "cs.LG",
        "year": 2020,
        "pub_status": "Preprint",
        "distance": 0.35138338804244995
      },
      {
        "arxiv_id": "2012.09932",
        "title": "Research Reproducibility as a Survival Analysis",
        "category": "stat.ML",
        "year": 2020,
        "pub_status": "Preprint",
        "distance": 0.3535992503166199
      },
      {
        "arxiv_id": "2403.08438",
        "title": "Reproducibility and Geometric Intrinsic Dimensionality: An Investigation\n  on Graph Neural Network Research",
        "category": "cs.LG",
        "year": 2024,
        "pub_status": "Preprint",
        "distance": 0.37553656101226807
      },
      {
        "arxiv_id": "1810.0457",
        "title": "Building a Reproducible Machine Learning Pipeline",
        "category": "cs.LG",
        "year": 2018,
        "pub_status": "Preprint",
        "distance": 0.3823944926261902
      }
    ],
    "model_used": "sentence-transformers/all-minilm-l12-v2",
    "category_filter": null,
    "year_filter": null
  }
]