Maryam Bala*, Johannes Heim†, Elspeth Edelstein†, Arabella Sinclair‡
* University of Southampton
† University of Aberdeen
‡ University College London
The paper will be presented at LREC 2026, Palma de Mallorca, May 2026!
In language development, children learn to form Question–Answer (QA) sequences through caregiver feedback that adapts dynamically to their evolving linguistic abilities. Using expert-annotated child–caregiver interactions, we examine four feedback types that guide children's acquisition of adult-like QA behaviour: caregiver instructions, through reformulating and affirming a child's output, and caregiver demonstrations, through exemplifying and modelling adult-like behaviour. Our analysis reveals that feedback incidence, frequency and complexity progress and adapt over the course of development, akin to a tailored curriculum for pragmatic development. We release our annotated dataset, which offers a rich resource for studying pragmatic feedback and provides the first large-scale empirical evidence of adaptive, tailored caregiver feedback on QA behaviour.
Please use the following to cite this work:
@inproceedings{bala-etal-2026-pragmatic,
title = {Pragmatic Modelling in Language Learning: Caregiver Question-Answer Feedback in Child-Directed Dialogue},
author = {Bala, Maryam and Heim, Johannes and Edelstein, Elspeth and Sinclair, Arabella},
booktitle = {Proceedings of the Fifteenth Language Resources and Evaluation Conference (LREC 2026)},
month = {May},
year = {2026},
pages = {11461--11478},
address = {Palma, Mallorca, Spain},
publisher = {European Language Resources Association (ELRA)},
editor = {Piperidis, Stelios and Bel, Núria and van den Heuvel, Henk and Ide, Nancy and Krek, Simon and Toral, Antonio},
doi = {10.63317/4g9ggjcrgfax},
}
🔬 Find out about other work going on at The Context Lab.
- Candidate Extraction: Rule-based scripts extract candidate QA feedback pairs from the raw CHILDES transcripts using speaker sequence, punctuation patterns and lexical overlap heuristics (see rule_based/).
- Annotation: A representative sample of candidates was manually annotated by expert linguists. The resulting gold-annotated dataset of ~3,639 samples (~1,000 positive examples) is available in data/annotated_feedback/full-childes-annotation.xlsx.
- Classification: BERT and ModernBERT binary classifiers are fine-tuned and evaluated on the annotated data using train_eval.py. Train/val/test splits per category are in data/train_data/, organised into context/ and no_context/ subfolders. LLM-as-judge baselines are evaluated in prompting.ipynb.
- Scaling: The best-performing classifier per category is applied to the full CHILDES candidate pool using label.py, producing scaled annotations in data/automatic/.
- Feature Extraction: Linguistic features (vocabulary overlap, Levenshtein distances, POS proportions, lexical sophistication) are computed for all four feedback categories of the gold-annotated dataset using analysis_features.py.
- Analysis: Once all features have been extracted, the developmental analysis and figures reported in the paper are produced in analysis.ipynb, examining how feedback incidence, complexity and repetition vary across child development.
git clone https://github.com/the-context-lab/childQAfeedback.git
cd childQAfeedback
Create a conda environment and install dependencies:
conda create -n childes python=3.10
conda activate childes
pip install -r requirements.txt
For NLTK resources:
python -m nltk.downloader wordnet punkt punkt_tab
Step 1: Extract candidates from CHILDES
# example for reformulating; repeat for modelling, exemplifying and affirming
python rule_based/extract_reformulating.py data/childes_original/childes.csv data/raw/reformulating.csv
Step 2: Train and evaluate classifiers
Open classifiers/train_eval.py and set the following in the CONFIG block:
CATEGORY = "reformulating"
CONTEXT = False
Then run:
python classifiers/train_eval.py
Trained model checkpoints and evaluation results are saved to classifiers/no_context/reformulating/. Repeat Steps 1 and 2 for each remaining category (modelling, exemplifying, affirming).
Step 3: Scale annotation to full CHILDES
Once all four category models are trained, run:
python classifiers/label.py
This automatically applies the best-performing classifier per category to the full candidate pool and saves labelled CSVs to data/automatic/.
Step 4: Compute analysis features
python analysis/analysis_features.py
Step 5: LLM-as-judge evaluation
Run llm_judge/prompting.ipynb on a GPU runtime.
Step 6: Analysis and plots
Run analysis/analysis.ipynb to reproduce the figures and developmental analysis reported in the paper.
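Because Steps 1 and 2 are repeated once per category, a small dry-run helper can make the per-category commands explicit. This is a minimal sketch that is not part of the repository: it assumes every category has a rule_based/extract_<category>.py script named like the Step 1 example, and that train_eval.py writes to classifiers/context/<category>/ when CONTEXT = True (only the no_context/ path is documented above). The names extraction_command and output_dir are hypothetical.

```python
# Hypothetical dry-run helper: it only prints commands, it runs nothing.
from pathlib import PurePosixPath

CATEGORIES = ["reformulating", "modelling", "exemplifying", "affirming"]


def extraction_command(category: str) -> str:
    """Step 1 command for one category (the script naming pattern is assumed)."""
    script = PurePosixPath("rule_based") / f"extract_{category}.py"
    return f"python {script} data/childes_original/childes.csv data/raw/{category}.csv"


def output_dir(category: str, context: bool) -> PurePosixPath:
    """Step 2 results folder; CONTEXT selects context/ (assumed) or no_context/."""
    setting = "context" if context else "no_context"
    return PurePosixPath("classifiers") / setting / category


if __name__ == "__main__":
    for category in CATEGORIES:
        print(extraction_command(category))
        print(f"# then set CATEGORY = {category!r}, CONTEXT = False in train_eval.py")
        print(f"# results: {output_dir(category, context=False)}/")
```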
This repository is structured as follows.
Data
The data/ folder contains all annotated, processed and scaled data used across the pipeline.
- The childes_original/ folder contains the complete CHILDES subset used in this work, covering 46 children aged 12-48 months from 15 corpora.
- The annotated_feedback/ folder contains the expert-annotated dataset of ~1,000 positive examples across four feedback categories, stored as an Excel file with one sheet per category. It also contains the linguistic feature CSVs for each category produced by analysis_features.py.
- The train_data/ folder contains train/val/test splits per category, organised into context/ and no_context/ subfolders.
- The automatic/ folder contains the full-corpus scaled annotation outputs produced by label.py. It also contains utterance_count.csv, which gives the total number of utterances per dialogue across the full dataset.
- The echoes/ folder contains the extracted adult echo sequences produced by echoes.py, used for the repetition analysis.
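Since utterance_count.csv records the total number of utterances per dialogue, raw feedback counts can be normalised into a rate before comparing dialogues of different lengths. The sketch below assumes hypothetical column names dialogue_id and n_utterances (check the actual CSV header) and expresses incidence per 1,000 utterances; the repository's own normalisation may differ.

```python
import csv
import io
from typing import Dict, Iterable


def incidence_per_1000(feedback_counts: Dict[str, int],
                       utterance_rows: Iterable[str]) -> Dict[str, float]:
    """Feedback instances per 1,000 utterances, per dialogue.

    utterance_rows is any iterable of CSV lines (e.g. an open file) with the
    assumed columns dialogue_id and n_utterances.
    """
    totals = {row["dialogue_id"]: int(row["n_utterances"])
              for row in csv.DictReader(utterance_rows)}
    rates = {}
    for dialogue_id, n_feedback in feedback_counts.items():
        n_utterances = totals.get(dialogue_id, 0)
        if n_utterances > 0:  # skip dialogues missing from the count file
            rates[dialogue_id] = 1000 * n_feedback / n_utterances
    return rates


# Toy input, not data from the repository.
toy_counts = io.StringIO("dialogue_id,n_utterances\nd1,500\nd2,250\n")
print(incidence_per_1000({"d1": 5, "d2": 5, "d3": 1}, toy_counts))
# {'d1': 10.0, 'd2': 20.0}
```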
Rule-based
The rule_based/ folder contains rule-based scripts that extract candidate QA feedback pairs from the raw CHILDES transcripts.
- modelling.py extracts Modelling candidate pairs (Adult - Adult).
- exemplifying.py extracts Exemplifying candidate triples (Child - Adult - Adult).
- reformulating.py extracts Reformulating candidate pairs (Child - Adult).
- affirming.py extracts Affirming candidate triples (Child - Child - Adult).
- echoes.py extracts adult echo sequences for the repetition analysis.
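The candidate scripts combine speaker sequence, punctuation and lexical overlap heuristics. As an illustration only, the sketch below pairs a child turn with the adult turn that immediately follows it, keeps the pair if either utterance contains a question mark, and requires a minimum word-level Jaccard overlap. The 0.3 threshold, the tokeniser and the treatment of every non-CHI speaker as an adult are assumptions, not the rules actually used in reformulating.py.

```python
import re
from typing import List, Sequence, Set, Tuple

CHILD = "CHI"  # CHILDES speaker code for the target child; all other codes treated as adults here


def tokens(utterance: str) -> Set[str]:
    """Lower-cased word types (a deliberately simple tokeniser)."""
    return set(re.findall(r"[a-z']+", utterance.lower()))


def overlap(a: str, b: str) -> float:
    """Jaccard overlap between the word types of two utterances."""
    ta, tb = tokens(a), tokens(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def reformulating_candidates(turns: Sequence[Tuple[str, str]],
                             min_overlap: float = 0.3) -> List[Tuple[str, str]]:
    """Child -> Adult pairs that involve a question and share enough vocabulary."""
    pairs = []
    for (spk1, utt1), (spk2, utt2) in zip(turns, turns[1:]):
        is_child_adult = spk1 == CHILD and spk2 != CHILD   # speaker sequence
        has_question = "?" in utt1 or "?" in utt2          # punctuation pattern
        if is_child_adult and has_question and overlap(utt1, utt2) >= min_overlap:  # lexical overlap
            pairs.append((utt1, utt2))
    return pairs


turns = [("CHI", "where doggy go?"), ("MOT", "where did the doggy go?"),
         ("CHI", "outside"), ("MOT", "yes")]
print(reformulating_candidates(turns))
# [('where doggy go?', 'where did the doggy go?')]
```

Jaccard overlap is used here because it is symmetric and bounded in [0, 1], which makes a single threshold easy to reason about; a real extractor might instead measure how much of the child's utterance the adult repeats.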
Classifier
The classifiers/ folder contains scripts for training, evaluating and applying the automatic annotation models.
- train_eval.py fine-tunes and evaluates BERT and ModernBERT as binary classifiers for a single feedback category, using transformers.Trainer.
- label.py applies the best-performing classifier per category to the full CHILDES candidate pool to produce scaled annotations for the developmental incidence analysis.
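label.py applies the best-performing classifier per category. One way to express that selection, assuming each evaluation run is summarised as a record with category, model, setting (context or no_context) and a test F1 score, is a per-category arg-max. The record fields, the function name and the choice of F1 as the ranking criterion are assumptions; the repository may rank models differently.

```python
from typing import Dict, Iterable, Mapping, Tuple


def best_per_category(runs: Iterable[Mapping]) -> Dict[str, Tuple[str, str]]:
    """Return {category: (model, setting)} for the highest-F1 run per category."""
    best: Dict[str, Mapping] = {}
    for run in runs:
        current = best.get(run["category"])
        # Strict ">" keeps the first run seen when two runs tie.
        if current is None or run["f1"] > current["f1"]:
            best[run["category"]] = run
    return {category: (run["model"], run["setting"]) for category, run in best.items()}


# Toy scores for illustration only; these are not results from the paper.
toy_runs = [
    {"category": "affirming", "model": "bert", "setting": "no_context", "f1": 0.5},
    {"category": "affirming", "model": "modernbert", "setting": "context", "f1": 0.7},
    {"category": "modelling", "model": "bert", "setting": "context", "f1": 0.6},
]
print(best_per_category(toy_runs))
# {'affirming': ('modernbert', 'context'), 'modelling': ('bert', 'context')}
```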
LLM Judge
The llm_judge/ folder contains the prompting experiments comparing LLM-as-judge performance against the fine-tuned classifiers.
- prompting.ipynb runs zero-shot and few-shot prompting experiments with Llama 3 8B, Gemma 2B and Falcon 7B across all four feedback categories and evaluates the results.
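A minimal, hypothetical sketch of how zero-shot and few-shot prompts could be assembled for one feedback category, and how a model's free-text reply could be mapped back to a binary label. Zero-shot prompts carry only a definition and the candidate exchange; few-shot prompts add labelled demonstrations. The prompt wording, the yes/no answer format and both function names are assumptions rather than the prompts used in prompting.ipynb, and model loading and generation are omitted.

```python
import re
from typing import Optional, Sequence, Tuple


def build_prompt(category: str, definition: str, candidate: str,
                 examples: Sequence[Tuple[str, bool]] = ()) -> str:
    """Zero-shot when examples is empty, few-shot otherwise."""
    parts = [
        f"Definition of {category}: {definition}",
        "Answer 'yes' if the exchange is an instance of this feedback type, otherwise 'no'.",
    ]
    for exchange, is_positive in examples:  # labelled demonstrations
        parts.append(f"Exchange: {exchange}\nAnswer: {'yes' if is_positive else 'no'}")
    parts.append(f"Exchange: {candidate}\nAnswer:")
    return "\n\n".join(parts)


def parse_answer(generation: str) -> Optional[bool]:
    """Map the first standalone 'yes'/'no' in a reply to True/False; None if absent."""
    match = re.search(r"\b(yes|no)\b", generation.lower())
    if match is None:
        return None  # unparseable replies can be counted separately
    return match.group(1) == "yes"
```

Returning None for unparseable replies keeps them distinguishable from genuine negative answers when scoring the LLM judges against the gold labels.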
Analysis
The analysis/ folder contains the scripts that compute the linguistic features and produce the developmental analysis plots.
- analysis_features.py computes linguistic features for all four feedback categories from the expert-annotated dataset. Features include vocabulary overlap, POS tag overlap, character-level, word-level and POS-level Levenshtein distances, and POS proportions. Each category is processed separately, and outputs are saved as one CSV per category to data/annotated_feedback/.
- analysis_gold.ipynb and analysis.ipynb produce all figures and plots reported in the paper for the expert-annotated and automatically labelled datasets, respectively.
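The edit-distance features can be computed over any sequence: characters for the character-level distance, word tokens for the word-level distance and POS tags for the POS-level distance. The sketch below is a from-scratch dynamic-programming Levenshtein distance plus a Jaccard vocabulary overlap. It is an illustration under assumptions: analysis_features.py may rely on a library implementation, tokenise differently or normalise the scores, and the POS tags in the usage example are hand-written rather than produced by a tagger.

```python
from typing import Hashable, Iterable, Sequence


def levenshtein(a: Sequence[Hashable], b: Sequence[Hashable]) -> int:
    """Minimum number of insertions, deletions and substitutions turning a into b."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,              # deletion
                            curr[j - 1] + 1,          # insertion
                            prev[j - 1] + (x != y)))  # substitution (0 if equal)
        prev = curr
    return prev[-1]


def vocabulary_overlap(a_tokens: Iterable[str], b_tokens: Iterable[str]) -> float:
    """Jaccard overlap of the two vocabularies (0.0 when both are empty)."""
    a, b = set(a_tokens), set(b_tokens)
    return len(a & b) / len(a | b) if a | b else 0.0


child = "where doggy go".split()
adult = "where did the doggy go".split()
print(levenshtein("doggy", "dog"))                         # character level: 2
print(levenshtein(child, adult))                           # word level: 2
print(levenshtein(["ADV", "NOUN", "VERB"],
                  ["ADV", "AUX", "DET", "NOUN", "VERB"]))  # POS level: 2
print(vocabulary_overlap(child, adult))                    # 0.6
```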
This repository is released under a Creative Commons licence.
CHILDES data is subject to the TalkBank Terms of Use.