Code for replicating D3ST (description-driven dialogue state tracking).
In order to set up the necessary environment:

- review and uncomment what you need in `environment.yml` and create an environment `robust-dst` with the help of conda:

  ```bash
  conda env create -f environment.yml
  ```

- activate the new environment with:

  ```bash
  conda activate robust-dst
  ```

- install development dependencies:

  ```bash
  conda env update -f dev_environment.yaml
  ```

- install PyTorch:

  ```bash
  pip install torch==1.13.1+cu117 torchvision==0.14.1+cu117 torchaudio==0.13.1 --extra-index-url https://download.pytorch.org/whl/cu117
  ```

NOTE: The conda environment will have robust-dst installed in editable mode. Some changes, e.g. in `setup.cfg`, might require you to run `pip install -e .` again.
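As a quick sanity check (a minimal sketch; it only assumes the editable install succeeded and that the import name `robust_dst` matches the package under `src/` shown in the project layout below):

```bash
# Verify the environment and the editable install of the package.
conda activate robust-dst
python -c "import robust_dst; print(robust_dst.__name__)"  # should print: robust_dst
```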
Optional and needed only once after `git clone`:

- install several pre-commit git hooks with:

  ```bash
  pre-commit install
  # You might also want to run `pre-commit autoupdate`
  ```

  and checkout the configuration under `.pre-commit-config.yaml`. The `-n, --no-verify` flag of `git commit` can be used to deactivate pre-commit hooks temporarily.
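To run all configured hooks against the existing codebase once (standard `pre-commit` usage, not specific to this repository):

```bash
# Run every configured hook over all files, not just staged changes.
pre-commit run --all-files
```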
Then take a look into the `scripts` and `src` folders.
Download the SGD dataset by running:

```bash
chmod +x scripts/prepare_datasets.sh
./scripts/prepare_datasets.sh
```
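After the download, the raw data should be laid out roughly as follows (an assumption inferred from the `-d` paths used by the preprocessing commands below; adjust if the script places SGD/SGD-X elsewhere):

```bash
# Expected raw data layout: original SGD plus the five SGD-X variants.
ls data/raw/
# original  v1  v2  v3  v4  v5
```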
The SGD dataset should first be preprocessed into a dataset-agnostic format, which follows the baseline D3ST format and contains the necessary metadata.
If you intend to evaluate on SGD-X, preprocess the SGD dataset with:

```bash
python -m scripts.preprocess_d3st_sgd \
    -d data/raw/original/ \
    -d data/raw/v1/ \
    -d data/raw/v2/ \
    -d data/raw/v3/ \
    -d data/raw/v4/ \
    -d data/raw/v5/ \
    -o data/processed/ \
    -c configs/data_processing_d3st_sgd.yaml \
    --all \
    -vv
```

Otherwise, run:

```bash
python -m scripts.preprocess_d3st_sgd \
    -d data/raw/original/ \
    -o data/processed/ \
    -c configs/data_processing_d3st_sgd.yaml \
    --all \
    -vv
```
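To confirm the preprocessing produced output, a quick listing helps (a sketch; the exact file names under `data/processed/` depend on the script and config, which this README does not spell out):

```bash
# Inspect the preprocessed output; file naming depends on the config.
ls -R data/processed/ | head
```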
To preprocess the SGD dataset in the T5DST format, run:

```bash
declare -a versions=("original" "v1" "v2" "v3" "v4" "v5")
mkdir -p data/preprocessed/
for i in "${versions[@]}"; do
    python -m scripts.preprocess_t5dst -d data/raw/"$i"/ -o data/preprocessed/ -c configs/data_processing_t5dst.yaml --train
    python -m scripts.preprocess_t5dst -d data/raw/"$i"/ -o data/preprocessed/ -c configs/data_processing_t5dst.yaml --dev
    python -m scripts.preprocess_t5dst -d data/raw/"$i"/ -o data/preprocessed/ -c configs/data_processing_t5dst.yaml --test
done
```
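Each pass writes one split per schema version; a quick count check (a sketch, assuming all outputs land directly under `data/preprocessed/`) is:

```bash
# Expect train/dev/test outputs for each of the six schema versions.
ls data/preprocessed/ | wc -l
```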
- Install the dependencies, then prepare and preprocess the datasets as in the previous sections.
- Complete the relevant configuration file with:
  - paths to the processed dataset
- Check out `README_EXP.md` for details on running experiments; a quick way to locate the config fields to complete is shown below.
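Which fields to fill depends on the config schema under `configs/`, which this README does not document; a search such as the following can help locate the dataset-path entries:

```bash
# Locate the data-path fields to complete in the experiment configs.
grep -rn "data" configs/ --include="*.yaml" | head
```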
PLACEHOLDER FOR SIGDIAL PAPER
@article{cocaGroundingDescriptionDrivenDialogue2023,
title={Grounding Description-Driven Dialogue State Trackers with Knowledge-Seeking Turns},
author={Coca, Alexandru},
year={2023}
}
@mastersthesis{zhangGroundingDescriptionDrivenDialogue2023,
type={Master’s thesis},
title={Grounding Description-Driven Dialogue State Tracking},
school={University of Cambridge},
author={Zhang, Weixuan},
year={2023},
month={May}
}
- Always keep your abstract (unpinned) dependencies updated in `environment.yml` and eventually in `setup.cfg` if you want to ship and install your package via `pip` later on.
- Create concrete dependencies as `environment.lock.yml` for the exact reproduction of your environment with:

  ```bash
  conda env export -n robust-dst -f environment.lock.yml
  ```

  For multi-OS development, consider using `--no-builds` during the export.
- Update your current environment with respect to a new `environment.lock.yml` using:

  ```bash
  conda env update -f environment.lock.yml --prune
  ```
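For example, a portable export with `--no-builds` (as mentioned above) looks like:

```bash
# Export without platform-specific build strings, for multi-OS reproducibility.
conda env export -n robust-dst --no-builds -f environment.lock.yml
```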
Project organization:

```
├── AUTHORS.md              <- List of developers and maintainers.
├── CHANGELOG.md            <- Changelog to keep track of new features and fixes.
├── CONTRIBUTING.md         <- Guidelines for contributing to this project.
├── Dockerfile              <- Build a docker container with `docker build .`.
├── LICENSE.txt             <- License as chosen on the command-line.
├── README.md               <- The top-level README for developers.
├── configs                 <- Directory for configurations of model & application.
├── data
│   ├── external            <- Data from third party sources.
│   ├── interim             <- Intermediate data that has been transformed.
│   ├── processed           <- The final, canonical data sets for modeling.
│   └── raw                 <- The original, immutable data dump.
├── docs                    <- Directory for Sphinx documentation in rst or md.
├── environment.yml         <- The conda environment file for reproducibility.
├── models                  <- Trained and serialized models, model predictions,
│                              or model summaries.
├── notebooks               <- Jupyter notebooks. Naming convention is a number (for
│                              ordering), the creator's initials and a description,
│                              e.g. `1.0-fw-initial-data-exploration`.
├── pyproject.toml          <- Build configuration. Don't change! Use `pip install -e .`
│                              to install for development or to build `tox -e build`.
├── references              <- Data dictionaries, manuals, and all other materials.
├── reports                 <- Generated analysis as HTML, PDF, LaTeX, etc.
│   └── figures             <- Generated plots and figures for reports.
├── scripts                 <- Analysis and production scripts which import the
│                              actual `robust_dst` package, e.g. train_model.
├── setup.cfg               <- Declarative configuration of your project.
├── setup.py                <- [DEPRECATED] Use `python setup.py develop` to install for
│                              development or `python setup.py bdist_wheel` to build.
├── src
│   └── robust_dst          <- Actual Python package where the main functionality goes.
├── tests                   <- Unit tests which can be run with `pytest`.
├── .coveragerc             <- Configuration for coverage reports of unit tests.
├── .isort.cfg              <- Configuration for git hook that sorts imports.
└── .pre-commit-config.yaml <- Configuration of pre-commit git hooks.
```
This project has been set up using PyScaffold 4.3.1 and the dsproject extension 0.7.2.