DESC_MOL-DDIE

Implementation of "Using Drug Descriptions and Molecular Structures for Drug-Drug Interaction Extraction from Literature".

Requirements

torch >= 1.2
transformers == 2.1
rdkit (install via conda: conda install -c conda-forge rdkit)
lxml
apex (optional; needed only for the --fp16 argument)
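
The minimum-version requirement for torch can be checked from the shell before training. The sketch below is illustrative only: version_ge is a hypothetical helper that is not part of this repository, and it assumes a sort that supports the -V (version sort) option, as GNU coreutils does.

```shell
# Hypothetical helper (not part of this repository).
# Succeeds when version $1 is greater than or equal to version $2.
# Assumes `sort -V` (version sort) is available, as in GNU coreutils.
version_ge() {
    # After a version sort, the smaller version comes first;
    # $1 >= $2 exactly when that first line is $2.
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Example use (commented out because it needs torch to be installed):
# version_ge "$(python3 -c 'import torch; print(torch.__version__)')" 1.2 \
#     || echo "torch is older than 1.2" >&2
```

The transformers requirement is an exact pin (== 2.1), so it calls for an equality check rather than this helper.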

Usage

Preparation of the corpus sets

See corpus/README.md.

Preparation of the DrugBank data

See database/README.md.

Preparation of the molecular fingerprints data

See fingerprint/README.md.

Preparation of the SciBERT model

A pre-trained SciBERT model is available here. The DDI Extraction command reads its location from $SCIBERT_MODEL.

Sample data set

If you use the sample data set, which was created by splitting the official training data set, you can skip the preparation of the corpus and the database.

export NEW_TSV_DIR=sample/tsv
export FINGERPRINT_DIR=sample/radius1
export RADIUS=1
python3 fingerprint/preprocessor.py $NEW_TSV_DIR none $RADIUS $FINGERPRINT_DIR

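Change these paths to absolute paths before running run_ddie.py, because the DDI Extraction step runs from the main directory and relative paths would no longer resolve.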
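
One way to make the two paths absolute is sketched below. abs_dir is a hypothetical helper, not part of this repository, and the commented usage lines assume they are run from the repository root after the preprocessing step has created the directories.

```shell
# Hypothetical helper (not part of this repository).
# Prints the absolute path of an existing directory, or fails if it
# does not exist. The subshell keeps the caller's working directory
# unchanged.
abs_dir() {
    (cd "$1" && pwd)
}

# Example use, run from the repository root before `cd main`:
# export NEW_TSV_DIR=$(abs_dir sample/tsv)
# export FINGERPRINT_DIR=$(abs_dir sample/radius1)
```

Because the helper fails on a missing directory, a mistyped path shows up here rather than later inside run_ddie.py.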

DDI Extraction

cd main
python run_ddie.py \
    --task_name MRPC \
    --model_type bert \
    --data_dir $NEW_TSV_DIR \
    --model_name_or_path $SCIBERT_MODEL \
    --per_gpu_train_batch_size 32 \
    --num_train_epochs 3. \
    --dropout_prob .1 \
    --weight_decay .01 \
    --fp16 \
    --do_train \
    --do_eval \
    --do_lower_case \
    --max_seq_length 128 \
    --use_cnn \
    --conv_window_size 5 \
    --pos_emb_dim 10 \
    --activation gelu \
    --desc_conv_window_size 3 \
    --desc_conv_output_size 20 \
    --molecular_vector_size 50 \
    --gnn_layer_hidden 5 \
    --gnn_layer_output 1 \
    --gnn_mode sum \
    --gnn_activation gelu \
    --fingerprint_dir $FINGERPRINT_DIR \
    --output_dir $OUTPUT_DIR
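
The command above reads four environment variables: NEW_TSV_DIR, SCIBERT_MODEL, FINGERPRINT_DIR, and OUTPUT_DIR. The earlier steps in this README do not set SCIBERT_MODEL or OUTPUT_DIR, so a small preflight check can catch a missing value before training starts. This is a minimal sketch; require_var is a hypothetical helper and not part of this repository.

```shell
# Hypothetical helper (not part of this repository).
# Fails with a message when the named variable is unset or empty.
require_var() {
    # Look up the variable whose name is in $1 (POSIX-compatible
    # indirection). Pass only trusted variable names to this helper.
    eval "_rv_value=\${$1:-}"
    if [ -z "$_rv_value" ]; then
        echo "error: $1 is not set" >&2
        return 1
    fi
}

# Example use before calling run_ddie.py:
# for name in NEW_TSV_DIR SCIBERT_MODEL FINGERPRINT_DIR OUTPUT_DIR; do
#     require_var "$name" || exit 1
# done
```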

To use drug description and molecular structure information, add the --use_desc and --use_mol arguments, respectively.
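
As an illustration, the two optional flags can be toggled from the environment when scripting several runs. In the sketch below, extra_flags and the variables USE_DESC and USE_MOL are hypothetical names introduced for this example. Only --use_desc and --use_mol themselves come from this README.

```shell
# Hypothetical helper (not part of this repository).
# Prints the optional run_ddie.py flags selected by USE_DESC and USE_MOL
# (illustrative variables; set either to 1 to enable the matching flag).
extra_flags() {
    flags=""
    if [ "${USE_DESC:-0}" = "1" ]; then flags="$flags --use_desc"; fi
    if [ "${USE_MOL:-0}" = "1" ]; then flags="$flags --use_mol"; fi
    # Drop the leading space left by the first appended flag.
    printf '%s\n' "${flags# }"
}

# Example use: append the result to the command shown above, e.g.
# USE_DESC=1; USE_MOL=1
# python run_ddie.py ... --output_dir $OUTPUT_DIR $(extra_flags)
```

The substitution is left unquoted in the example so that the shell splits the result into separate arguments.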

Acknowledgement

This work was supported by JSPS KAKENHI Grant Numbers 17K12741 and 20K11962.
