Pragmatic Modelling in Language Learning

Caregiver Question-Answer Feedback in
Child-Directed Dialogue

Maryam Bala*, Johannes Heim†, Elspeth Edelstein†, Arabella Sinclair‡

* University of Southampton
† University of Aberdeen
‡ University College London

The paper will be presented at LREC 2026, Palma de Mallorca, May 2026!

Abstract

In language development, children learn to form Question–Answer (QA) sequences through caregiver feedback that adapts dynamically to their evolving linguistic abilities. Using expert-annotated child-caregiver interactions, we examine four feedback types that guide children's acquisition of adult-like QA behaviour: caregiver instructions through reformulating and affirming a child's output, as well as caregiver demonstrations through exemplifying and modelling adult-like behaviour. Our analysis reveals that feedback incidence, frequency and complexity progress and adapt over the course of development, akin to a tailored curriculum for pragmatic development. We release our annotated dataset, which offers a rich resource for studying pragmatic feedback and provides the first large-scale empirical evidence of adaptive, tailored caregiver feedback on QA behaviour.

Table of Contents

Citing
Contact
Usage
Repository Structure
License

Citing

Please use the following to cite this work:

@inproceedings{bala-etal-2026-pragmatic,
  title = {Pragmatic Modelling in Language Learning: Caregiver Question-Answer Feedback in Child-Directed Dialogue},
  author = {Bala, Maryam and Heim, Johannes and Edelstein, Elspeth and Sinclair, Arabella},
  booktitle = {Proceedings of the Fifteenth Language Resources and Evaluation Conference (LREC 2026)},
  month = {May},
  year = {2026},
  pages = {11461--11478},
  address = {Palma, Mallorca, Spain},
  publisher = {European Language Resources Association (ELRA)},
  editor = {Piperidis, Stelios and Bel, Núria and van den Heuvel, Henk and Ide, Nancy and Krek, Simon and Toral, Antonio},
  doi = {10.63317/4g9ggjcrgfax},
}

Contact

Maryam Bala - University of Southampton
Johannes Heim - University of Aberdeen
Elspeth Edelstein - University of Aberdeen
Arabella Sinclair - University College London

🔬 Find out about other work going on at The Context Lab.

Usage

Experiment Pipeline

Candidate Extraction: Rule-based scripts extract candidate QA feedback pairs from the raw CHILDES transcripts using speaker sequence, punctuation patterns and lexical overlap heuristics (see rule_based/).
Annotation: A representative sample of candidates was manually annotated by expert linguists. The resulting gold-annotated dataset of ~3,639 samples (~1,000 positive examples) is available in data/annotated_feedback/full-childes-annotation.xlsx.
Classification: BERT and ModernBERT binary classifiers are fine-tuned and evaluated on the annotated data using train_eval.py. Train/val/test splits per category are in data/train_data/, organised into context/ and no_context/ subfolders. LLM-as-judge baselines are evaluated in prompting.ipynb.
Scaling: The best-performing classifier per category is applied to the full CHILDES candidate pool using label.py, producing scaled annotations in data/automatic/.
Feature Extraction: Linguistic features (vocabulary overlap, Levenshtein distances, POS proportions, lexical sophistication) are computed for all four feedback categories of the gold-annotated dataset using analysis_features.py.
Analysis: Once all features have been extracted, the developmental analysis and figures reported in the paper are produced in analysis.ipynb, examining how feedback incidence, complexity and repetition vary across child development.

Installation

git clone https://github.com/the-context-lab/childQAfeedback.git
cd childQAfeedback

Create a conda environment and install dependencies:

conda create -n childes python=3.10
conda activate childes
pip install -r requirements.txt

For NLTK resources:

python -m nltk.downloader wordnet punkt punkt_tab

Running the Pipeline

Step 1: Extract candidates from CHILDES

# example for reformulating - repeat for modelling, exemplifying and affirming
python rule_based/extract_reformulating.py data/childes_original/childes.csv data/raw/reformulating.csv
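
Repeating the command for all four categories can be scripted. The sketch below only builds the four command lines and prints them. It assumes the other scripts follow the same extract_<category>.py naming and that outputs go to data/raw/<category>.csv, as in the reformulating example; neither is confirmed for the other categories, so check the contents of rule_based/ before running anything. The helper name build_extraction_commands is hypothetical.

```python
from pathlib import Path

CATEGORIES = ["reformulating", "modelling", "exemplifying", "affirming"]


def build_extraction_commands(
    source_csv="data/childes_original/childes.csv",
    script_dir="rule_based",
    output_dir="data/raw",
):
    """Return one argv list per feedback category (nothing is executed here)."""
    commands = []
    for category in CATEGORIES:
        # ASSUMPTION: every script is named extract_<category>.py, mirroring
        # the reformulating command shown above.
        script = Path(script_dir) / f"extract_{category}.py"
        output = Path(output_dir) / f"{category}.csv"
        commands.append(
            ["python", script.as_posix(), source_csv, output.as_posix()]
        )
    return commands


if __name__ == "__main__":
    for argv in build_extraction_commands():
        print(" ".join(argv))
        # To actually run a step: subprocess.run(argv, check=True)
```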

Step 2: Train and evaluate classifiers

Open classifier/train_eval.py and set the following in the CONFIG block:
```
CATEGORY = "reformulating"
CONTEXT  = False
```
Then run:
```
python classifier/train_eval.py
```
Trained model checkpoints and evaluation results are saved to classifiers/no_context/reformulating/. Repeat steps 1 and 2 for each remaining category (modelling, exemplifying, affirming).

Step 3: Scale annotation to full CHILDES

Once all four category models are trained, run:
```
python classifier/label.py
```
This automatically applies the best-performing classifier per category to the full candidate pool and saves labelled CSVs to data/automatic/.

Step 4: Compute analysis features
```
python analysis/analysis_features.py
```

Step 5: LLM-as-judge evaluation

Run llm_judge/prompting.ipynb on a GPU runtime.

Step 6: Analysis and plots

Run analysis/analysis.ipynb to reproduce the figures and developmental analysis reported in the paper.

Repository Structure

This repository is structured as follows.

Data

The data/ folder contains all annotated, processed and scaled data used across the pipeline.

The childes_original/ folder contains the subset of the CHILDES corpus used in this work: 46 children aged 12-48 months, drawn from 15 corpora.
The annotated_feedback/ folder contains the expert-annotated dataset (~3,639 samples, of which ~1,000 are positive examples) across four feedback categories, stored as an Excel file (full-childes-annotation.xlsx) with one sheet per category. It also contains the per-category linguistic feature CSVs produced by analysis_features.py.
The train_data/ folder contains train/val/test splits per category, organised into context/ and no_context/ subfolders.
The automatic/ folder contains the full-corpus scaled annotation outputs produced by label.py. It also contains utterance_count.csv, which records the total number of utterances per dialogue across the full dataset.
The echoes/ folder contains the extracted adult echo sequences produced by echoes.py, used for the repetition analysis.
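
One plausible use of the per-dialogue totals in utterance_count.csv is to normalise feedback counts into an incidence rate that can be compared across ages. The sketch below is an illustration under stated assumptions, not the analysis code from this repository: the function name, the three input structures, the 6-month bins and the "per 100 utterances" scaling are all hypothetical, and the real CSV column names may differ.

```python
from collections import defaultdict


def incidence_by_age_bin(labelled_rows, utterance_counts, dialogue_ages,
                         bin_months=6):
    """Feedback instances per 100 utterances, grouped into age bins.

    labelled_rows: iterable of (dialogue_id, label), label 1 = feedback.
    utterance_counts: dict dialogue_id -> total utterances in the dialogue.
    dialogue_ages: dict dialogue_id -> child age in months.
    All three are hypothetical stand-ins for the CSV contents.
    """
    feedback = defaultdict(int)
    total = defaultdict(int)

    def age_bin(dialogue_id):
        # Lower edge of the bin, e.g. 14 months -> 12 with 6-month bins.
        return (dialogue_ages[dialogue_id] // bin_months) * bin_months

    # Denominator: all utterances produced in dialogues of each age bin.
    for dialogue_id, n_utterances in utterance_counts.items():
        total[age_bin(dialogue_id)] += n_utterances

    # Numerator: positively labelled feedback instances in each age bin.
    for dialogue_id, label in labelled_rows:
        if label == 1:
            feedback[age_bin(dialogue_id)] += 1

    return {
        b: 100.0 * feedback[b] / total[b]
        for b in sorted(total)
        if total[b] > 0
    }
```

Normalising by utterance count rather than by dialogue count keeps long and short recording sessions comparable.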

Rule-based

The rule_based/ folder contains rule-based scripts that extract candidate QA feedback pairs from the raw CHILDES transcripts.

modelling.py extracts Modelling candidate pairs (Adult - Adult).
exemplifying.py extracts Exemplifying candidate triples (Child - Adult - Adult).
reformulating.py extracts Reformulating candidate pairs (Child - Adult).
affirming.py extracts Affirming candidate triples (Child - Child - Adult).
echoes.py extracts adult echo sequences for the repetition analysis.
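
The speaker-sequence patterns listed above (for example Adult - Adult or Child - Adult) can be pictured as a sliding window over the turns of a dialogue. The following is a minimal sketch of that idea, not the repository's implementation: the function find_candidates, the "child"/"adult" role labels and the question_index parameter are hypothetical, and which turn must be a question in each category is an assumption left to the caller. The example utterances are invented, not taken from CHILDES.

```python
def find_candidates(turns, pattern, question_index):
    """Return windows of consecutive turns whose speaker roles match `pattern`.

    turns: list of (role, utterance), role being "child" or "adult".
    pattern: tuple of roles, e.g. ("adult", "adult").
    question_index: position in the window that must end with "?"
                    (an assumed punctuation heuristic).
    """
    n = len(pattern)
    candidates = []
    for start in range(len(turns) - n + 1):
        window = turns[start:start + n]
        roles = tuple(role for role, _ in window)
        if roles != pattern:
            continue
        # Keep the window only if the designated turn looks like a question.
        if not window[question_index][1].rstrip().endswith("?"):
            continue
        candidates.append(window)
    return candidates


if __name__ == "__main__":
    # Invented toy dialogue for illustration only.
    turns = [
        ("adult", "what's that ?"),
        ("adult", "it's a dog ."),
        ("child", "dog"),
        ("adult", "yes , a dog !"),
    ]
    for window in find_candidates(turns, ("adult", "adult"), question_index=0):
        print(window)
```

The actual scripts also use lexical overlap heuristics, which this sketch omits.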

Classifier

The classifier/ folder contains scripts for training, evaluating and applying the automatic annotation models.

train_eval.py fine-tunes and evaluates BERT and ModernBERT as binary classifiers for a single feedback category, using transformers.Trainer.
label.py applies the best-performing classifier per category to the full CHILDES candidate pool to produce scaled annotations for the developmental incidence analysis.
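
Selecting "the best-performing classifier per category" amounts to an argmax over evaluation scores. The sketch below shows that selection step in isolation. It assumes evaluation results are available as a nested dictionary and that F1 is the selection metric; both are assumptions, as are the function name and the placeholder scores, which are not results from the paper.

```python
def select_best_models(results, metric="f1"):
    """Pick, for each category, the model with the highest score on `metric`.

    results: {category: {model_name: {metric_name: score}}}
    ASSUMPTION: this layout and the default metric are illustrative only.
    """
    best = {}
    for category, per_model in results.items():
        best[category] = max(
            per_model, key=lambda name: per_model[name][metric]
        )
    return best


if __name__ == "__main__":
    # Placeholder scores for illustration; NOT results from the paper.
    placeholder_results = {
        "reformulating": {"bert": {"f1": 0.5}, "modernbert": {"f1": 0.6}},
        "modelling": {"bert": {"f1": 0.7}, "modernbert": {"f1": 0.4}},
    }
    print(select_best_models(placeholder_results))
```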

LLM Judge

The llm_judge/ folder contains the prompting experiments comparing LLM-as-judge performance against the fine-tuned classifiers.

prompting.ipynb runs zero-shot and few-shot prompting experiments with Llama 3 8b, Gemma 2b, and Falcon 7b across all four feedback categories and evaluates the results.
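
The difference between zero-shot and few-shot prompting is whether labelled examples precede the exchange to be judged. The sketch below illustrates that difference with a single prompt builder. The prompt wording, the function name build_prompt and the example utterances are invented for illustration and are not the prompts used in prompting.ipynb.

```python
def build_prompt(category, utterances, examples=()):
    """Assemble a yes/no judging prompt.

    With no `examples` the prompt is zero-shot; each (utterances, answer)
    pair in `examples` adds one solved demonstration (few-shot).
    ASSUMPTION: the instruction wording below is hypothetical.
    """
    lines = [
        f"Does the following exchange contain {category} feedback? "
        "Answer yes or no.",
        "",
    ]
    for example_utterances, example_answer in examples:
        lines.extend(example_utterances)
        lines.append(f"Answer: {example_answer}")
        lines.append("")
    lines.extend(utterances)
    lines.append("Answer:")
    return "\n".join(lines)


if __name__ == "__main__":
    # Invented utterances, not taken from CHILDES.
    target = ["CHI: doggie go", "MOT: the doggie is going ?"]
    demo = [(["CHI: want juice", "MOT: you want some juice ?"], "yes")]
    print(build_prompt("reformulating", target))             # zero-shot
    print(build_prompt("reformulating", target, demo))       # few-shot
```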

Analysis

The analysis/ folder contains the scripts that compute the linguistic features and produce the developmental analysis plots.

analysis_features.py computes linguistic features for all four feedback categories from the expert-annotated dataset. Features include vocabulary overlap, POS tag overlap, Levenshtein distances at the character, word and POS level, and POS proportions. Each category is processed separately. Outputs are saved as one CSV per category to data/annotated_feedback/.
analysis_gold.ipynb and analysis.ipynb produce the figures reported in the paper for the expert-annotated and automatically labelled datasets, respectively.
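
Two of the features named above can be illustrated without any dependencies. The sketch below implements a generic Levenshtein distance that works on strings (character level) or on token lists (word or POS level), plus a vocabulary overlap measure. It is not the code in analysis_features.py: the function names are hypothetical, and defining overlap as Jaccard similarity over token sets is an assumption.

```python
def levenshtein(a, b):
    """Edit distance between two sequences (strings or lists of tokens)."""
    previous = list(range(len(b) + 1))
    for i, item_a in enumerate(a, start=1):
        current = [i]
        for j, item_b in enumerate(b, start=1):
            cost = 0 if item_a == item_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution or match
            ))
        previous = current
    return previous[-1]


def vocabulary_overlap(tokens_a, tokens_b):
    """Jaccard overlap of the two token sets (an ASSUMED definition)."""
    set_a, set_b = set(tokens_a), set(tokens_b)
    union = set_a | set_b
    return len(set_a & set_b) / len(union) if union else 0.0


if __name__ == "__main__":
    # Invented utterances, not taken from the dataset.
    child = "doggie go".split()
    adult = "the doggie goes".split()
    print(levenshtein("doggie go", "the doggie goes"))  # character level
    print(levenshtein(child, adult))                    # word level
    print(vocabulary_overlap(child, adult))
```

Passing lists of POS tags instead of words gives the POS-level distance with the same function.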

License

Creative Commons.

CHILDES data is subject to the TalkBank Terms of Use.
