Multi-Label Learning by Exploiting Label Dependency
A review of "Multi-Label Learning by Exploiting Label Dependency" by Min-Ling Zhang and Kun Zhang.
The paper addresses the classification of multi-label datasets. Because the number of possible label combinations grows exponentially with the number of labels, modelling all correlations directly is intractable. To avoid this, the authors propose to capture the correlations among labels with a Bayesian network and use it to build an efficient algorithm. The resulting classifier transforms the multi-label learning problem into a series of single-label (binary) problems.
The contribution of the new algorithm is the following: with the proposed algorithm there is no need to address the problem by dividing it up in different ways and applying a separate algorithm to each part. The authors build a single algorithm that improves the performance of multi-label learning by exploiting label dependencies captured through Bayesian networks.
The idea is based on the assumption that the effects of the features can be separated out, so that the conditional dependencies among labels can be identified by looking at the prediction errors of classifiers trained for each label independently.
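As a rough illustration of this idea (not the authors' exact procedure), the sketch below trains one classifier per label, estimates dependencies from the correlations of their out-of-fold prediction errors, and then retrains each label's classifier with its strongly correlated "parent" labels appended as extra features. The correlation threshold and the index-order restriction on parents are simplifying assumptions standing in for the full Bayesian-network structure learning described in the paper.

```python
# Minimal sketch of error-driven label-dependency learning (not the paper's
# exact procedure): a correlation threshold on per-label prediction errors
# stands in for Bayesian-network structure learning.
import numpy as np
from sklearn.datasets import make_multilabel_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict

X, Y = make_multilabel_classification(n_samples=500, n_features=20,
                                      n_classes=5, random_state=0)
n_labels = Y.shape[1]

# Step 1: independent binary classifiers, one per label (first-order baseline).
base = [LogisticRegression(max_iter=1000) for _ in range(n_labels)]
errors = np.zeros_like(Y, dtype=float)
for j, clf in enumerate(base):
    pred = cross_val_predict(clf, X, Y[:, j], cv=5)   # out-of-fold predictions
    errors[:, j] = (pred != Y[:, j]).astype(float)    # per-instance error signal
    clf.fit(X, Y[:, j])

# Step 2: identify dependent label pairs from correlated errors.
corr = np.nan_to_num(np.corrcoef(errors.T))
threshold = 0.2                       # assumed value, for illustration only
parents = {j: [k for k in range(j) if abs(corr[j, k]) > threshold]
           for j in range(n_labels)}  # restrict parents to k < j to keep a DAG

# Step 3: retrain each label with its parent labels appended as features.
final = {}
for j in range(n_labels):
    X_aug = np.hstack([X, Y[:, parents[j]]]) if parents[j] else X
    final[j] = LogisticRegression(max_iter=1000).fit(X_aug, Y[:, j])

# Prediction: labels are filled in following the (here, index) order,
# so each classifier can consume its parents' predicted values.
def predict(x):
    y_hat = np.zeros(n_labels)
    for j in range(n_labels):
        x_aug = np.hstack([x, y_hat[parents[j]]]) if parents[j] else x
        y_hat[j] = final[j].predict(x_aug.reshape(1, -1))[0]
    return y_hat

print(predict(X[0]))
```

In the paper the dependency structure is learned from the error signals rather than fixed by a hand-picked threshold, but the overall flow (independent classifiers, error analysis, dependency-augmented retraining) follows the same outline.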
The authors identify three main differences from previous algorithms.
1. The underlying structure of the label space is expressed explicitly and in a simple way, which provides additional information about the problem;
2. The new algorithm handles correlations among labels of arbitrary order, and
3. The algorithm keeps the complexity linear in the number of labels in the dataset, so the implementation is efficient and prediction for a new instance is straightforward (as sketched below).
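The linear cost at prediction time can be seen by walking the label-dependency structure in topological order, so that each label's classifier is queried exactly once. The toy parent structure below is hypothetical; in the paper it would come from the learned Bayesian network.

```python
# Toy illustration of linear-time prediction over a label-dependency DAG.
# The parent structure is hypothetical, used only to show the ordering.
from graphlib import TopologicalSorter

parents = {             # label -> labels it depends on
    "beach": [],
    "sea": ["beach"],
    "sunset": [],
    "people": ["beach", "sunset"],
}

order = list(TopologicalSorter(parents).static_order())
print(order)  # e.g. ['beach', 'sunset', 'sea', 'people']

# At prediction time each label's classifier is invoked exactly once, in this
# order, consuming its parents' already-predicted values, so the total cost
# grows linearly with the number of labels.
```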
The data used to evaluate the algorithm were selected to meet the criteria that state-of-the-art algorithms have defined for analysing multi-label datasets. These algorithms fall into three types: first-order, second-order, and high-order. First-order algorithms decompose the problem into a number of independent tasks; second-order algorithms consider interactions between pairs of labels; and high-order algorithms include correlations among subsets of labels.
In total, 13 datasets are gathered. The datasets vary in the number of examples, number of features, number of possible labels, and feature type (continuous or discrete). Additional multi-label statistics, such as the average number of labels per example and the density and diversity of the label sets, are reported for each dataset. The domains of the datasets range over music, text, biology, and media.
Performance evaluation for multi-label datasets does not follow standard single-label metrics such as precision or recall. The paper uses five metrics designed specifically for the multi-label setting (illustrated in the sketch after the list):
a. Hamming loss. Accounts for both prediction errors (an incorrect label is predicted) and missing errors (a relevant label is not predicted), and computes the fraction of instance-label pairs that are misclassified.
b. One-error. Evaluates how often the top-ranked label is not in the set of true labels. A low one-error usually goes together with a low Hamming loss, and vice versa.
c. Coverage. Evaluates how far, on average, one needs to go down the ranked label list to cover all the relevant labels of an example. It is useful for measuring performance across all labels, not just the top-ranked one.
d. Ranking loss. Evaluates the average fraction of label pairs that are ordered incorrectly for an instance, i.e., pairs in which an irrelevant label is ranked above a relevant one.
e. Average precision. Evaluates, for each relevant label, the fraction of labels ranked above it that are also relevant, averaged over all relevant labels and instances.
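A minimal sketch of these five metrics for score-based predictions is given below; the toy data, the 0.5 decision threshold used for Hamming loss, and the function names are illustrative assumptions rather than the paper's implementation.

```python
# Sketch of the five multi-label evaluation metrics; the toy data, the 0.5
# threshold, and the function names are assumptions for illustration only.
import numpy as np

def hamming_loss(Y_true, Y_pred):
    """Fraction of instance-label pairs that are misclassified."""
    return np.mean(Y_true != Y_pred)

def one_error(Y_true, scores):
    """Fraction of instances whose top-ranked label is not relevant."""
    top = np.argmax(scores, axis=1)
    return np.mean(Y_true[np.arange(len(Y_true)), top] == 0)

def coverage(Y_true, scores):
    """Average depth in the ranking needed to cover all relevant labels."""
    ranks = (-scores).argsort(axis=1).argsort(axis=1)  # 0 = top-ranked
    return np.mean([ranks[i][Y_true[i] == 1].max() for i in range(len(Y_true))])

def ranking_loss(Y_true, scores):
    """Average fraction of (relevant, irrelevant) pairs ordered incorrectly."""
    losses = []
    for y, s in zip(Y_true, scores):
        rel, irr = s[y == 1], s[y == 0]
        if len(rel) == 0 or len(irr) == 0:
            continue
        losses.append(np.mean(rel[:, None] <= irr[None, :]))
    return np.mean(losses)

def average_precision(Y_true, scores):
    """For each relevant label, precision of the labels ranked above it."""
    aps = []
    for y, s in zip(Y_true, scores):
        order = np.argsort(-s)          # labels from best to worst score
        y_sorted = y[order]
        rel_positions = np.where(y_sorted == 1)[0]
        if len(rel_positions) == 0:
            continue
        prec = [y_sorted[:k + 1].sum() / (k + 1) for k in rel_positions]
        aps.append(np.mean(prec))
    return np.mean(aps)

# Tiny example: 3 instances, 4 labels.
Y_true = np.array([[1, 0, 1, 0],
                   [0, 1, 0, 0],
                   [1, 1, 0, 1]])
scores = np.array([[0.9, 0.2, 0.6, 0.1],
                   [0.3, 0.4, 0.8, 0.2],
                   [0.7, 0.6, 0.5, 0.4]])
Y_pred = (scores >= 0.5).astype(int)

print("Hamming loss:     ", hamming_loss(Y_true, Y_pred))
print("One-error:        ", one_error(Y_true, scores))
print("Coverage:         ", coverage(Y_true, scores))
print("Ranking loss:     ", ranking_loss(Y_true, scores))
print("Average precision:", average_precision(Y_true, scores))
```

Hamming loss works on thresholded predictions, while the other four metrics operate on the ranking induced by the real-valued scores, which matches the distinction drawn in the list above.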