Implementation of Using Drug Description and Molecular Structures for Drug-Drug Interaction Extraction from Literature
torch >= 1.2
transformers == 2.1
rdkit (please install via conda conda install -c conda-forge rdkit )
lxml
apex (for argument --fp16, optional)
see corpus/README.md
pre-trained SciBERT model is available here
When you use the sample data set created by splitting the official training data set, you can skip the preparation of the corpus and the database.
export NEW_TSV_DIR=sample/tsv
export FINGERPRINT_DIR=sample/radius1
export RADIUS=1
python3 fingerprint/preprocessor.py $NEW_TSV_DIR none $RADIUS $FINGERPRINT_DIR
change these paths to absolute paths before running run_ddie.py
cd main
python run_ddie.py \
--task_name MRPC \
--model_type bert \
--data_dir $NEW_TSV_DIR \
--model_name_or_path $SCIBERT_MODEL \
--per_gpu_train_batch_size 32 \
--num_train_epochs 3. \
--dropout_prob .1 \
--weight_decay .01 \
--fp16 \
--do_train \
--do_eval \
--do_lower_case \
--max_seq_length 128 \
--use_cnn \
--conv_window_size 5 \
--pos_emb_dim 10 \
--activation gelu \
--desc_conv_window_size 3 \
--desc_conv_output_size 20 \
--molecular_vector_size 50 \
--gnn_layer_hidden 5 \
--gnn_layer_output 1 \
--gnn_mode sum \
--gnn_activation gelu \
--fingerprint_dir $FINGERPRINT_DIR \
--output_dir $OUTPUT_DIR
when you use description and molecular structure information, please add --use_desc and --use_mol arguments respectively.
This work was supported by JSPS KAKENHI Grant Numbers 17K12741 and 20k11962