Download the py150 dataset, Version 1.0 [526.6MB], from https://www.sri.inf.ethz.ch/py150
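Before running the steps that follow, it can help to sanity-check the downloaded data. The sketch below is not part of this repository. It assumes the archive has been extracted and that each line of python100k_train.json is a JSON list of node objects with a "type" key and optional "value" and "children" keys, which is how the py150 files are commonly described. The function name summarize_ast and the sample line are hypothetical, for illustration only.

```python
import json
from collections import Counter


def summarize_ast(line):
    """Summarize one line of a py150-style JSON file (one AST per line).

    Assumption: the line is a JSON list whose dict entries are AST nodes.
    Non-dict entries, if any, are skipped defensively.
    """
    nodes = [n for n in json.loads(line) if isinstance(n, dict)]
    return {
        "num_nodes": len(nodes),
        # Nodes without a "children" key are treated as leaves.
        "num_leaves": sum(1 for n in nodes if "children" not in n),
        "type_counts": Counter(n["type"] for n in nodes),
    }


# Hypothetical sample line, not taken from the dataset.
sample = (
    '[{"type": "Module", "children": [1]}, '
    '{"type": "Expr", "children": [2]}, '
    '{"type": "NameLoad", "value": "x"}]'
)
print(summarize_ast(sample))
```

To check the real file, read its first line with open(...).readline() and pass it to summarize_ast in place of the sample.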
To install the requirements, run:
pip install -r requirements.txt
To generate new trees according to the paper, run:
python generate_new_trees.py -i /PATH/TO/py150/DATASET/python100k_train.json -o /PATH/TO/OUTPUT/FILE/new_python100k_train.json
To generate the vocabulary according to the paper, run:
python generate_vocab.py -i /PATH/TO/FILE/new_python100k_train.json -o /PATH/TO/OUTPUT/FILE/new_python100k_train.pkl -t ast
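To confirm that the vocabulary file was written, it can be loaded with Python's standard pickle module. This is only a sketch: the structure of the object stored by generate_vocab.py is not documented here, so the code reports just its type and length. The helper name describe_pickle is hypothetical. Only unpickle files you created yourself, since loading a pickle can execute arbitrary code. If the file stores instances of classes defined in this repository, the script may need to be run from the repository root.

```python
import pickle
import sys


def describe_pickle(path):
    """Load a pickle file and return (type name, length or None)."""
    with open(path, "rb") as f:
        obj = pickle.load(f)
    # Not every object has a length; report None when it does not.
    size = len(obj) if hasattr(obj, "__len__") else None
    return type(obj).__name__, size


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python describe_vocab.py /PATH/TO/OUTPUT/FILE/new_python100k_train.pkl
    print(describe_pickle(sys.argv[1]))
```

The script name describe_vocab.py in the usage comment is a placeholder; save the sketch under any name.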
To generate data according to the README.md, run:
python -m models.dfs_ud.dataset -a /PATH/TO/FILE/new_python100k_train.json -o /PATH/TO/OUTPUT/FILE/new_new_python100k_train.txt
To generate AST ids according to the README.md, run:
python -m models.dfs.generate_ast_ids -a /PATH/TO/FILE/new_python100k_train.json -o /PATH/TO/OUTPUT/FILE/generated_ids.txt all