Incremental Construction of Cost-Conscious Ensembles Using Multiple Learners and Representations in Machine Learning
The main purpose of this thesis is to combine multiple models to increase accuracy while keeping complexity in check. Toward this aim, we propose two methods, which we test by simulations using well-known classification algorithms on standard uni- and multi-representation data sets.
In the literature, methods have been proposed to create diverse classifiers. These methods vary: (i) Algorithms used for training, (ii) Hyperparameters of the algorithms, (iii) Training set samples, (iv) Input feature subsets, and (v) Input representations. In this thesis, we show that these methods are not enough to decrease the correlations among base classifiers. Furthermore, we present the relation between error and correlation for fixed combination rules and a linear combiner in three cases: (i) Independence, (ii) Equicorrelation, and (iii) Groups. We see that the sum rule and the trained combiner are the most robust to changes in correlation. Previous studies in the literature assume that the base classifiers are independent; the analysis in the presence of correlation, as presented in this thesis, is novel.
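The equicorrelation case can be illustrated with a short simulation: for L estimators whose errors have variance σ² and pairwise correlation ρ, the sum (averaging) rule yields a combined error of variance σ²(1/L + (L−1)ρ/L), so high correlation erases most of the benefit of combining. The following numpy sketch checks this; the generating model and constants are illustrative assumptions, not taken from the thesis.

```python
import numpy as np

rng = np.random.default_rng(0)
L, n, sigma, rho = 10, 200_000, 1.0, 0.5

# Equicorrelated errors: e_i = sigma * (sqrt(rho)*c + sqrt(1-rho)*z_i)
# share a common component c, giving pairwise correlation rho.
c = rng.standard_normal(n)
z = rng.standard_normal((L, n))
errors = sigma * (np.sqrt(rho) * c + np.sqrt(1 - rho) * z)

# Sum (averaging) rule: variance of the mean error, empirically and in theory.
empirical = errors.mean(axis=0).var()
theoretical = sigma**2 * (1 / L + (L - 1) / L * rho)

print(round(empirical, 3), round(theoretical, 3))
```

With ρ = 0.5 and L = 10, averaging retains 55% of a single model's error variance, whereas independent errors (ρ = 0) would retain only 10%, which is why decorrelating the base classifiers matters.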
To remove the correlation between classifiers, we propose two algorithms to construct ensembles of multiple classifiers: (i) An incremental algorithm, named Icon, which generates an ensemble of multiple models (representation/classifier pairs) to improve performance, taking into account both accuracy and the concomitant increase in cost, i.e., time and space complexity, and (ii) An algorithm which post-processes the base classifier outputs before fusing, using principal component analysis (Pca) and linear discriminant analysis (Lda) to form uncorrelated metaclassifiers from a set of correlated experts.
Icon chooses a subset among correlated base classifiers. The algorithm has three dimensions: (i) Search direction (forward, backward, floating), (ii) Model evaluation criterion (accuracy, diversity, and complexity), and (iii) Combination rule (fixed rules or a trained combiner). Our simulations using fourteen classifiers on thirty-eight data sets show that accuracy is the best model selection criterion and the sum rule is the best combination rule; the other choices yield less favorable results. There have been studies of subset selection in the literature, but the work in this thesis uses a larger number of classifiers and data sets and is wider in scope. Using this method, we create ensembles which are more accurate than both the single best algorithm and the combination of all algorithms, and which are no worse than the optimal subset while using a smaller number of base classifiers. When applied to multi-representation data sets, we see that Icon automatically chooses classifiers over different representations and generates a set of complementary classifiers.
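The forward direction of the search can be sketched as a greedy loop that, at each step, adds the model whose inclusion under the sum rule most improves validation accuracy, and stops when no candidate helps. The toy below uses simulated class-1 posteriors as stand-ins for trained representation/classifier pairs; these stand-ins and the accuracy-only acceptance test are illustrative assumptions, not the thesis's actual Icon implementation, which also supports diversity and complexity criteria and backward/floating directions.

```python
import numpy as np

rng = np.random.default_rng(1)
n_val, n_models = 500, 6
y = rng.integers(0, 2, n_val)  # validation labels

# Hypothetical base-classifier posteriors for class 1; later models are noisier.
noise = np.linspace(0.3, 1.2, n_models)
posts = np.clip(y + noise[:, None] * rng.standard_normal((n_models, n_val)), 0, 1)

def acc(subset):
    """Sum-rule (average of posteriors) accuracy on the validation set."""
    fused = posts[list(subset)].mean(axis=0)
    return ((fused > 0.5) == y).mean()

# Forward search in the spirit of Icon: greedily add the model that most
# improves validation accuracy; stop when no remaining model helps.
chosen, best = [], 0.0
while True:
    cands = [m for m in range(n_models) if m not in chosen]
    if not cands:
        break
    top_acc, top_m = max((acc(chosen + [m]), m) for m in cands)
    if top_acc <= best:
        break
    chosen.append(top_m)
    best = top_acc

print(chosen, round(best, 3))
```

Because acceptance requires a strict improvement, the loop naturally trades off accuracy against ensemble size: noisy models that add no validation accuracy are never included.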
Pca, which uses principal component analysis, and Lda, which uses linear discriminant analysis, create uncorrelated metaclassifiers from correlated base classifiers; these metaclassifiers are then combined using a linear classifier. This method is successful with a small number of components and attains the same accuracy as combining all classifiers. The work in this thesis generalizes to multiple classifiers, combines multiple representations, allows knowledge extraction, and is novel in these respects. In this method, principal component analysis is more successful than linear discriminant analysis.
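A minimal sketch of the Pca variant, assuming the metaclassifiers are the top principal components of the centered base-classifier outputs and the combiner is an ordinary least-squares linear classifier; the data generation, component count, and 0.5 decision threshold are illustrative assumptions, not details from the thesis.

```python
import numpy as np

rng = np.random.default_rng(2)
n, n_cls, k = 400, 8, 2
y = rng.integers(0, 2, n)

# Correlated base-classifier posteriors: all experts share the class signal
# plus a common noise component, so their outputs are strongly correlated.
common = rng.standard_normal(n)
X = np.clip(y[:, None] + 0.4 * common[:, None]
            + 0.3 * rng.standard_normal((n, n_cls)), 0, 1)

# PCA on the expert outputs: the top-k principal-component scores act as
# uncorrelated metaclassifiers.
Xc = X - X.mean(axis=0)
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
meta = Xc @ Vt[:k].T

# Combine the metaclassifiers with a linear classifier (least squares here).
A = np.hstack([meta, np.ones((n, 1))])
w, *_ = np.linalg.lstsq(A, y, rcond=None)
pred = (A @ w > 0.5).astype(int)
print(round((pred == y).mean(), 3))
```

By construction the principal-component scores are mutually uncorrelated on the training data, so the linear combiner works with a few decorrelated inputs instead of eight redundant experts.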
As the overall result of comparing these two methods for removing correlation, we see that if the aim is to decrease complexity, then subset selection is better; if the aim is higher accuracy, we should prefer metaclassifiers, which extract knowledge and have redundancy.