TaxoAdapt: Aligning LLM-Based Multidimensional Taxonomy Construction to Evolving Research Corpora

Priyanka Kargupta, Nan Zhang, Yunyi Zhang, Rui Zhang, Prasenjit Mitra, Jiawei Han

Official implementation of the ACL 2025 main-track paper: TaxoAdapt: Aligning LLM-Based Multidimensional Taxonomy Construction to Evolving Research Corpora.

TaxoAdapt is a framework that dynamically adapts an LLM-generated taxonomy to a given corpus across multiple dimensions. TaxoAdapt performs iterative hierarchical classification, expanding both the width and the depth of the taxonomy based on the corpus's topical distribution. We demonstrate its state-of-the-art performance on a diverse set of computer science conferences across multiple years, showcasing its ability to structure and capture the evolution of scientific fields. As a multidimensional method, TaxoAdapt generates taxonomies that LLM judges rate as 26.51% more granularity-preserving and 50.41% more coherent than those of the most competitive baselines.

Setup

We use python=3.8, torch=2.4.0, and two NVIDIA RTX A6000 GPUs. Other packages can be installed using:

pip install -r requirements.txt

To run the code with the default parameters, you can run the following command in the terminal:

python main.py

Running the code requires a valid OpenAI API key, set as the environment variable OPENAI_API_KEY. You can set it in your terminal as follows:

export OPENAI_API_KEY='your_openai_api_key'
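As a convenience, the key can be checked before any API call is made. The following is a minimal sketch, not part of the repository: the helper name get_openai_api_key is hypothetical, and only the OPENAI_API_KEY variable name comes from the instructions above.

```python
import os


def get_openai_api_key(env=None):
    """Return the OpenAI API key from the environment, or raise a clear error.

    Hypothetical helper for illustration; `env` defaults to os.environ and
    can be replaced with a plain dict when testing.
    """
    env = os.environ if env is None else env
    key = env.get("OPENAI_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "OPENAI_API_KEY is not set; export it before running main.py"
        )
    return key
```

Calling this at startup fails fast with a readable message instead of surfacing an authentication error in the middle of taxonomy construction.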

Arguments

The following are the primary arguments for TaxoAdapt (defined in main.py; modify as needed):

topic $\rightarrow$ the topic of the corpus, e.g., "natural language processing", "robotics", etc.
dataset $\rightarrow$ the name of the dataset, e.g., "llm_graph", "icra_2020", etc. The Hugging Face dataset should be added to the construct_dataset function in main.py (see below).
llm $\rightarrow$ the LLM used for initial taxonomy construction, e.g., "gpt", "vllm", etc. You can replace the vLLM model in the initializeLLM function and the GPT model version in the promptGPT function of model_definitions.py.
max_depth $\rightarrow$ the maximum depth of each constructed taxonomy.
init_levels $\rightarrow$ the number of levels to construct in the initial taxonomy.
max_density $\rightarrow$ the maximum number of papers that may be mapped to a leaf node, or left unmapped at a parent node, before expansion is triggered. If a leaf node has more than max_density papers, depth expansion is triggered at that node. If a parent node has more than max_density papers that are not mapped to any of its children, width expansion is triggered at that node.
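The max_density rule above can be sketched as a small decision function. This is an illustration under stated assumptions, not the repository's implementation: the Node class and the expansion_needed function are hypothetical names, papers are represented by integer ids, and the interaction with max_depth is left out.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class Node:
    """Hypothetical taxonomy node used only for this sketch."""
    label: str
    papers: Set[int] = field(default_factory=set)  # ids of papers mapped to this node
    children: List["Node"] = field(default_factory=list)


def expansion_needed(node: Node, max_density: int) -> Optional[str]:
    """Return 'depth', 'width', or None for a single node."""
    if not node.children:
        # Leaf node: too many papers triggers depth expansion.
        return "depth" if len(node.papers) > max_density else None
    # Parent node: count the papers not mapped to any child.
    mapped = set().union(*(child.papers for child in node.children))
    unmapped = node.papers - mapped
    return "width" if len(unmapped) > max_density else None
```

For example, a leaf holding 50 papers with max_density=40 yields "depth", while a parent whose children jointly cover all but a handful of its papers yields None.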

In main.py, we define the different dimensions of research for a specific topic, each of which will be constructed as a separate taxonomy. You can modify the dimensions in the args.dimensions list.
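To make the arguments and the dimensions list concrete, here is a rough sketch of how such a configuration could be assembled with argparse. Only the argument names (topic, dataset, llm, max_depth, init_levels, max_density, dimensions) come from this README; the flag spellings, every default value, and the dimension names are assumptions for illustration, so check main.py for the actual definitions.

```python
import argparse


def build_args(argv=None):
    """Hypothetical argument setup; defaults are placeholders, not the repo's."""
    parser = argparse.ArgumentParser(description="TaxoAdapt configuration sketch")
    parser.add_argument("--topic", type=str, default="natural language processing")
    parser.add_argument("--dataset", type=str, default="llm_graph")
    parser.add_argument("--llm", type=str, default="gpt")
    parser.add_argument("--max_depth", type=int, default=2)      # placeholder value
    parser.add_argument("--init_levels", type=int, default=1)    # placeholder value
    parser.add_argument("--max_density", type=int, default=40)   # placeholder value
    args = parser.parse_args(argv)
    # One taxonomy is built per dimension; these names are illustrative only.
    args.dimensions = ["tasks", "datasets", "methodologies"]
    return args
```

Passing a list such as ["--topic", "robotics", "--max_density", "10"] to build_args overrides the corresponding placeholders while leaving the rest untouched.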

Custom Dataset

To use a custom dataset, you need to add it to the construct_dataset function in main.py. You may add it as follows:

elif args.dataset == 'dataset_name':
    ds = load_dataset("huggingface_dataset_name")

We assume that each paper in the dataset has a title field and an abstract field. If not, you can modify the function to extract the relevant fields from your dataset.
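If your dataset uses different field names, one option is to normalize each record to the expected title and abstract fields before it reaches the rest of the pipeline. The sketch below assumes records behave like dictionaries; the function name to_paper_record and the example keys paper_title and summary are hypothetical.

```python
def to_paper_record(example, title_key="title", abstract_key="abstract"):
    """Map one raw record to {'title', 'abstract'}, or None if either is missing.

    Hypothetical helper: `title_key` and `abstract_key` name the fields that
    hold the title and abstract in your own dataset.
    """
    title = str(example.get(title_key) or "").strip()
    abstract = str(example.get(abstract_key) or "").strip()
    if not title or not abstract:
        return None  # skip papers without both fields
    return {"title": title, "abstract": abstract}


# Usage with made-up rows whose fields are named differently.
rows = [
    {"paper_title": "A Study of Taxonomies", "summary": "We study taxonomies."},
    {"paper_title": "No Abstract Here", "summary": ""},
]
papers = [
    rec
    for rec in (to_paper_record(r, "paper_title", "summary") for r in rows)
    if rec is not None
]
```

Dropping incomplete records up front keeps the classification prompts from receiving empty titles or abstracts.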

Video

You can find a video explanation of the TaxoAdapt framework and its results on YouTube: TaxoAdapt Video.

📖 Citation

Please cite the paper and star this repo if you use TaxoAdapt and find it interesting/useful, thanks! Feel free to open an issue if you have any questions.

@article{kargupta2025taxoadapt,
  title={TaxoAdapt: Aligning LLM-Based Multidimensional Taxonomy Construction to Evolving Research Corpora},
  author={Kargupta, Priyanka and Zhang, Nan and Zhang, Yunyi and Zhang, Rui and Mitra, Prasenjit and Han, Jiawei},
  journal={arXiv preprint arXiv:2506.10737},
  year={2025}
}
