This is the official repository for our paper, "Automated main concept generation for narrative discourse assessment in aphasia." It contains the code to reproduce the modeling experiments discussed in the paper.
An earlier version of this work was presented at the Clinical Aphasiology Conference 2025. The abstract is available in the CAC2025 directory.
Follow these instructions to set up the repository.
git clone https://github.com/gnkitaa/aphasia-narrative.git
cd aphasia-narrative
conda create -y --name aphasia python=3.9
conda activate aphasia
pip install -r requirements.txt
git clone https://github.com/openai/openai-cookbook.git
- We release BATS, a novel dataset of narratives with human-annotated main concepts. The main concepts were empirically derived through extensive analysis of hundreds of story retellings from healthy participants (Kurland et al., 2021; Richardson and Dalton, 2016, 2020) and have been used to assess patients with aphasia (Kurland et al., 2024b). The dataset is provided under the data/BATS directory.
- We also evaluate our method on an existing narrative summarization dataset (Zhao et al., 2022). Please refer to NarraSum for more details.
To generate main concepts, run MCGenerator/generate_mcs_bats.ipynb for the BATS dataset and MCGenerator/generate_mcs_narrasum.ipynb for the NarraSum dataset.
The prompts used for main concept (MC) generation are provided in the MCGenerator/Prompts directory.
To cluster main concepts that are similar in meaning, run MCGenerator/clustering_bats.ipynb for the BATS dataset and MCGenerator/clustering_narrasum.ipynb for the NarraSum dataset.
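For readers who want an intuition for the clustering step before opening the notebooks, here is a minimal, self-contained sketch of grouping similar main concepts. It is not the method implemented in the clustering notebooks: the bag-of-words similarity, the small stopword list, the greedy assignment, the 0.5 threshold, the function names, and the example sentences are all assumptions made for illustration.

```python
import math
import re
from collections import Counter

# Tiny stopword list (an assumption for this sketch, not taken from the repository).
STOPWORDS = {"the", "a", "an"}


def bag_of_words(text):
    """Stand-in for a sentence embedding: lowercase token counts without stopwords."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return Counter(t for t in tokens if t not in STOPWORDS)


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def cluster_concepts(concepts, threshold=0.5):
    """Greedy single-pass clustering.

    Each concept joins the first cluster whose seed (its first member)
    is at least `threshold` similar; otherwise it starts a new cluster.
    """
    clusters = []
    for concept in concepts:
        vec = bag_of_words(concept)
        for cluster in clusters:
            if cosine(vec, bag_of_words(cluster[0])) >= threshold:
                cluster.append(concept)
                break
        else:
            clusters.append([concept])
    return clusters


# Invented example sentences, not drawn from BATS or NarraSum.
concepts = [
    "The boy kicked the ball",
    "The boy kicked the soccer ball",
    "The window broke",
    "The ball broke the window",
]
clusters = cluster_concepts(concepts)
for i, cluster in enumerate(clusters):
    print(i, cluster)
```

A greedy pass like this depends on input order and on the chosen threshold, and word overlap misses paraphrases that share no words. Please treat the notebooks as the reference for the procedure used in the paper.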
To evaluate the generated main concepts, run MCEvaluator/evaluate_bats.ipynb for the BATS dataset and MCEvaluator/evaluate_narrasum.ipynb for the NarraSum dataset. The notebooks also plot the recall-versus-yield tradeoff curves discussed in the paper.