
HYDRA: A Multi-Head Encoder-only Architecture for Hierarchical Text Classification

This repository contains the implementation of our paper "HYDRA: A Multi-Head Encoder-only Architecture for Hierarchical Text Classification".

Overview

HYDRA is a simple yet effective multi-head encoder-only architecture for hierarchical text classification that treats each level in the hierarchy as a separate classification task with its own label space. Through parameter sharing and level-specific parameterization, HYDRA enables flat models to incorporate hierarchical awareness without architectural complexity.

Our approach demonstrates that complex components like graph encoders, label semantics, and autoregressive decoders are often unnecessary for achieving state-of-the-art performance in hierarchical text classification.
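The core idea — a shared encoder feeding one classification head per hierarchy level, each with its own label space — can be illustrated with a minimal PyTorch sketch. This is not the repository's actual implementation (see modeling/ for that); the class name, the toy linear "encoder" standing in for a pretrained Transformer, and the level sizes are all illustrative assumptions.

```python
# Hedged sketch of a multi-head, level-per-head classifier.
# A real HYDRA model would replace the linear layer with a
# pretrained Transformer encoder; dimensions here are toy values.
import torch
import torch.nn as nn

class MultiHeadClassifier(nn.Module):
    def __init__(self, hidden_size, level_sizes):
        super().__init__()
        # Shared parameters across all hierarchy levels.
        self.encoder = nn.Linear(hidden_size, hidden_size)
        # Level-specific parameterization: one head per level,
        # each with its own label space.
        self.heads = nn.ModuleList(nn.Linear(hidden_size, n) for n in level_sizes)

    def forward(self, x):
        h = torch.relu(self.encoder(x))
        # One set of logits per hierarchy level.
        return [head(h) for head in self.heads]

# Example: a 3-level hierarchy with 4, 9, and 27 labels per level.
model = MultiHeadClassifier(hidden_size=16, level_sizes=[4, 9, 27])
logits = model(torch.randn(2, 16))
```

Each level is treated as a separate classification task, so the heads can be trained jointly with per-level losses while the encoder parameters remain shared.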

Repository Structure

  • datasets/: Contains scripts and instructions for the four benchmark datasets (NYT, RCV1-V2, BGC, WOS)
  • modeling/: Contains the implementation of the HYDRA model architecture and its variants
  • scripts/: Contains shell scripts to reproduce our experiments
  • hydra_experiments.py: Main script for training and evaluation

Installation

pip install -r requirements.txt

Main Requirements

  • PyTorch
  • Transformers (v4.51.3; for UnLlama, v4.41.1 is required)
  • Datasets

Quick Start

  1. Prepare the datasets following the instructions in datasets/README.md.
  2. To reproduce our main results:
# Run flat baseline models
bash scripts/run_flat.sh

# Run HYDRA with local heads only
bash scripts/run_hydra.sh

# Run HYDRA with local+global heads
bash scripts/run_hydra_global.sh

# Run HYDRA with local+nested heads
bash scripts/run_hydra_nested.sh

Experimental Setup

Our experiments were conducted on four standard hierarchical text classification benchmarks:

  • NYT (New York Times Annotated Corpus)
  • RCV1-V2
  • BGC (Blurb Genre Collection)
  • WOS (Web of Science-46985)

All experiments were run five times with different random seeds (42, 1, 2, 3, 4) to ensure reproducibility.
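Looping over the five seeds can be sketched as follows; `set_seed` is a hypothetical helper, not a function from this repository, and the training call is a placeholder.

```python
# Hedged sketch: seeding each of the five runs.
# set_seed is an assumed helper; the repository's actual seeding
# logic lives in its training code.
import random
import torch

def set_seed(seed):
    random.seed(seed)
    torch.manual_seed(seed)

for seed in (42, 1, 2, 3, 4):
    set_seed(seed)
    # train and evaluate one run here
```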

License

Our source code is released under the MIT License. See the LICENSE file for details.

Development Tools

This repository was developed with the assistance of GitHub Copilot. The authors have reviewed and edited the generated content to ensure accuracy and clarity.