
Research Paper

Title: An Implementation of Hierarchical Multi-Label Classification System
Authors: Thanawut Ananpiriyakul, Piyapan Poomsirivilai, Peerapon Vateekul
Institution: Department of Computer Engineering, Faculty of Engineering, Chulalongkorn University, Bangkok, Thailand
Conference: 2014 4th IEEE Thailand Student Conference on Senior Capstone Project (IEEE Thailand SCAP 2014)

Abstract
Hierarchical Multi-Label Classification (HMC) is a complex classification problem in which each example can be annotated with more than one label and the labels are organized hierarchically. It has received considerable attention because it is needed in a broad range of application domains. Many HMC algorithms based on the Support Vector Machine (SVM) have been proposed; surprisingly, however, none of them has been implemented and released to the community. In this paper, we present open-source HMC software built around our earlier algorithm, “HR-SVM”. This version also includes several modifications that improve both accuracy and induction time. The software was tested and showed promising results not only in the HMC domain but also in the multi-label domain.

Keywords: Hierarchical Multi-Label Classification, Multi-Label Classification, Support Vector Machine

Introduction
Classification is the task of assigning each example to one or more labels (classes). The traditional approach assumes that an example belongs to one and only one class. Over the last decade, however, a new classification scheme called “Hierarchical Multi-Label Classification” (HMC) has emerged and attracted considerable attention from real-world applications such as text categorization and gene function prediction. It differs from the traditional scheme in two respects. First, each example may belong to more than one class at the same time (multi-label classification). Second, the classes are organized in a hierarchical structure, either a tree or a directed acyclic graph (hierarchical classification).
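To make the two aspects concrete, the following is a minimal Python sketch of a tree-shaped label hierarchy and a multi-label annotation. The class names and the helper `with_ancestors` are hypothetical and do not come from the paper. The sketch also assumes the common HMC convention that an example belonging to a class belongs to all of its ancestors; a directed acyclic graph would need a list of parents per class instead of a single parent.

```python
# Hypothetical toy hierarchy (a tree): child -> parent, None marks a top-level class.
PARENT = {
    "science": None,
    "sports": None,
    "physics": "science",
    "biology": "science",
    "genetics": "biology",
    "football": "sports",
}


def with_ancestors(labels):
    """Return the label set extended with every ancestor of each label.

    Assumes the usual hierarchy constraint: membership in a class implies
    membership in all of its superclasses.
    """
    closed = set()
    for label in labels:
        node = label
        # Walk up toward the root; stop early if this chain was already added.
        while node is not None and node not in closed:
            closed.add(node)
            node = PARENT[node]
    return closed


# One example annotated with two labels from different branches (multi-label).
example_labels = with_ancestors({"genetics", "football"})
```

Here `example_labels` contains the two annotated classes together with their superclasses, which is the form of target a hierarchical classifier is typically trained and evaluated against.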
Support Vector Machine (SVM) [1] is one of the best-known classification techniques and is recognized for its remarkable prediction performance. Several SVM classification tools are freely available on the internet, including LIBSVM [2], SVMLight [3], SVMPerf [4], and LIBLINEAR [5]. However, all of them are designed only for single-label classification and support neither HMC nor multi-label classification. According to our survey, no SVM-based tool for the HMC domain is publicly available on the internet, even though the demand for applications in this domain has been increasing.
HR-SVM [6] is our own HMC algorithm, a hierarchical extension of R-SVM [6], our earlier enhanced SVM for domains with imbalanced classes. It builds an R-SVM as a local classifier for each class node in the class hierarchy and applies these classifiers in a top-down fashion. There is also a mechanism for correcting false positive errors (FPC) that propagate down the hierarchy. Experiments show that HR-SVM outperforms other HMC algorithms on several benchmark datasets. However, the overall induction time is somewhat high, since the FPC module does not allow a classifier to be induced until the classifiers of its superclasses have been constructed.
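The top-down use of local classifiers can be sketched as follows. This is only an illustration of the general top-down scheme, not the HR-SVM implementation: the hierarchy, the names `CHILDREN`, `LOCAL`, and `predict_top_down`, and the threshold-based stand-in classifiers are all assumptions made for the example, and no R-SVM training or false positive correction is shown.

```python
from typing import Callable, Dict, List, Sequence, Set

# Hypothetical toy hierarchy: parent -> children; "root" is a virtual root node.
CHILDREN: Dict[str, List[str]] = {
    "root": ["science", "sports"],
    "science": ["physics", "biology"],
    "sports": [],
    "physics": [],
    "biology": [],
}

Classifier = Callable[[Sequence[float]], bool]


def feature_above(index: int, threshold: float) -> Classifier:
    """Stand-in local classifier (NOT R-SVM): thresholds one input feature."""
    return lambda x: x[index] > threshold


# One local classifier per class node, as in the top-down scheme.
LOCAL: Dict[str, Classifier] = {
    "science": feature_above(0, 0.5),
    "sports": feature_above(1, 0.5),
    "physics": feature_above(2, 0.5),
    "biology": feature_above(3, 0.5),
}


def predict_top_down(x: Sequence[float]) -> Set[str]:
    """Visit a class only if its parent class was predicted positive."""
    predicted: Set[str] = set()
    frontier = list(CHILDREN["root"])
    while frontier:
        node = frontier.pop()
        if LOCAL[node](x):
            predicted.add(node)
            # Descend only below nodes that were accepted.
            frontier.extend(CHILDREN[node])
    return predicted
```

For instance, `predict_top_down([0.9, 0.1, 0.8, 0.2])` accepts "science" and then "physics", while `predict_top_down([0.1, 0.9, 0.9, 0.9])` never consults the "physics" or "biology" classifiers because "science" was rejected. The same structure shows why a decision made at a parent node affects everything below it: a wrongly accepted parent exposes its whole subtree to further predictions.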
In this paper, we aim to improve HR-SVM in terms of both accuracy and induction time. The outcome is open-source software that everyone in the community is free to use. Compared to the original HR-SVM, there are three enhancements. First, a resampling technique is applied to reduce the induction time of each classifier. Second, one of HR-SVM’s modules is removed thanks to a new training mechanism, which greatly reduces the overall induction cost. Third, the F-score algorithm [7] is adopted for feature selection. The software is extensively evaluated on several benchmark datasets: 7 multi-label datasets and 6 HMC datasets. The results show that it outperforms other HMC algorithms on almost all datasets.
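As a rough illustration of F-score feature selection, the sketch below ranks features for a single binary class. It assumes that the F-score meant here is the widely used per-feature criterion that divides the between-class separation of the feature means by the sum of the within-class sample variances; the excerpt does not spell out the formula, so this is an assumption. The function names `f_score` and `rank_features` and the toy data are hypothetical, and how the software combines scores across multiple labels is not shown.

```python
def f_score(pos, neg):
    """F-score of one feature from its values in positive and negative examples.

    Assumed definition: squared distances of the class means from the overall
    mean, divided by the sum of the two within-class sample variances.
    Requires at least two values per class and a non-zero total variance.
    """
    n_pos, n_neg = len(pos), len(neg)
    mean_pos = sum(pos) / n_pos
    mean_neg = sum(neg) / n_neg
    mean_all = (sum(pos) + sum(neg)) / (n_pos + n_neg)
    between = (mean_pos - mean_all) ** 2 + (mean_neg - mean_all) ** 2
    var_pos = sum((v - mean_pos) ** 2 for v in pos) / (n_pos - 1)
    var_neg = sum((v - mean_neg) ** 2 for v in neg) / (n_neg - 1)
    return between / (var_pos + var_neg)


def rank_features(X, y):
    """Return feature indices sorted from most to least discriminative."""
    n_features = len(X[0])
    scores = []
    for j in range(n_features):
        pos = [row[j] for row, label in zip(X, y) if label == 1]
        neg = [row[j] for row, label in zip(X, y) if label != 1]
        scores.append(f_score(pos, neg))
    return sorted(range(n_features), key=lambda j: scores[j], reverse=True)


# Toy data: feature 0 separates the two classes well, feature 1 does not.
X = [[1.0, 5.0], [2.0, 4.0], [8.0, 5.5], [9.0, 4.5]]
y = [1, 1, 0, 0]
ranking = rank_features(X, y)
```

A feature selection step built on such a ranking would keep only the top-scoring features before training each classifier; the cut-off is a tuning choice that this sketch leaves open.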
The rest of the paper is organized as follows. Section 2 introduces the classification problem and existing strategies. Section 3 defines the relevant performance criteria. Section 4 describes the proposed system in detail. Section 5 presents the experimental results and discussion. Section 6 concludes the paper.

*If you would like to read the full research paper, feel free to send me a message and introduce yourself.