Localized Multiple Kernel Algorithms for Machine Learning
In recent years, several multiple kernel learning methods have been proposed in the machine learning literature. Different kernels correspond to different notions of similarity, and multiple kernel learning can be used to combine them. It can also be used to integrate inputs coming from different representations, possibly from different sources or modalities, by combining kernels calculated on these representations. This thesis contains a number of extensions to the original multiple kernel learning framework, together with experimental results that support their utility on benchmark data sets from the UCI Machine Learning Repository as well as several image recognition and bioinformatics data sets.
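The simplest form of this idea is a fixed convex combination of precomputed kernel matrices. The following is a minimal numpy sketch of such a global combination, not any specific algorithm from the thesis; the function names and the choice of linear and Gaussian kernels are illustrative.

```python
import numpy as np

def linear_kernel(X1, X2):
    """Linear kernel matrix between rows of X1 and X2."""
    return X1 @ X2.T

def rbf_kernel(X1, X2, gamma):
    """Gaussian (RBF) kernel matrix between rows of X1 and X2."""
    sq = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-gamma * sq)

def combine_kernels(kernels, weights):
    """Fixed (global) convex combination of kernel matrices."""
    weights = np.asarray(weights, dtype=float)
    assert np.all(weights >= 0) and np.isclose(weights.sum(), 1.0)
    return sum(w * K for w, K in zip(weights, kernels))

# Two kernels on the same representation, combined with global weights.
X = np.random.RandomState(0).randn(5, 3)
K = combine_kernels([linear_kernel(X, X), rbf_kernel(X, X, gamma=0.5)],
                    weights=[0.3, 0.7])
```

Because each summand is positive semidefinite and the weights are nonnegative, the combined matrix is again a valid kernel.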
This thesis introduces a regularized multiple kernel learning framework and proposes to use the response surface methodology to search for the best regularization parameter set using validation data. Optimizing such regularization parameters allows us to obtain more robust decision functions for the classification task at hand. Kernels that do not help increase the classification accuracy are pruned by selecting their regularization parameters accordingly, obtaining smoother discriminants. Eliminating some of the kernels directly or decreasing the number of stored support vectors reduces the testing time for new instances.
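To illustrate the role of validation data in choosing regularization parameters, the sketch below uses a plain grid search with kernel ridge regression as a stand-in for the response surface methodology described above; `pick_lambda` and the synthetic data are illustrative, not the thesis's procedure.

```python
import numpy as np

def kernel_ridge_fit(K, y, lam):
    """Dual solution of kernel ridge regression on a precomputed kernel."""
    return np.linalg.solve(K + lam * np.eye(K.shape[0]), y)

def pick_lambda(K_tr, y_tr, K_va, y_va, grid):
    """Return the grid value with the lowest validation squared error."""
    errors = {}
    for lam in grid:
        alpha = kernel_ridge_fit(K_tr, y_tr, lam)
        errors[lam] = np.mean((K_va @ alpha - y_va) ** 2)
    return min(errors, key=errors.get)

# Synthetic linear data; K_va holds kernel values between validation
# and training instances, as needed to evaluate the learned function.
rng = np.random.RandomState(0)
X_tr, X_va = rng.randn(40, 3), rng.randn(20, 3)
w = np.array([1.0, -2.0, 0.5])
y_tr = X_tr @ w + 0.1 * rng.randn(40)
y_va = X_va @ w + 0.1 * rng.randn(20)
K_tr, K_va = X_tr @ X_tr.T, X_va @ X_tr.T
best = pick_lambda(K_tr, y_tr, K_va, y_va, [1e-3, 1e-1, 1e1, 1e3])
```

With one regularization parameter per kernel, the same validation-driven search is what allows unhelpful kernels to be pruned by driving their parameters toward heavy regularization.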
This thesis also proposes a cost-conscious strategy to include the cost of kernel computations and data acquisition/generation into the multiple kernel learning framework. The results show that incorporating a cost factor into the model enables us to use only the necessary kernels, avoiding costly kernel computations and input generation for some data representations in the testing phase, when possible.
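The practical payoff can be pictured with a toy post-hoc pruning step: once a sparse set of kernel weights has been learned, kernels with negligible weight need not be computed at test time, and their acquisition/computation cost is saved. The function below is only an illustration of that accounting, not the cost-conscious training formulation itself.

```python
def prune_kernels(weights, costs, threshold=1e-3):
    """Indices of kernels worth keeping, plus the test-time cost avoided."""
    keep = [m for m, w in enumerate(weights) if w > threshold]
    dropped = [m for m in range(len(weights)) if m not in keep]
    saved = sum(costs[m] for m in dropped)
    return keep, saved

# Kernel 1 received zero weight, so its (expensive) computation is skipped.
keep, saved = prune_kernels([0.6, 0.0, 0.4], costs=[1.0, 5.0, 2.0])
```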
The main contribution of this thesis is the formulation of a localized multiple kernel learning framework that is composed of a kernel-based learning algorithm and a gating model to assign data-dependent weights to kernel functions. We derive the learning algorithm for three different gating models and apply localized multiple kernel learning to binary classification, regression, multiclass classification, and one-class classification problems. For classification problems that use different feature representations, our proposed method is able to construct better classifiers by combining the kernels on these representations locally. This localized formulation achieves higher average test accuracies and stores fewer support vectors compared to the canonical multiple kernel combination with global weights. We also see that, as expected, combining heterogeneous feature representations is more advantageous than combining multiple copies of the same representation. For image recognition problems, our proposed method identifies the relevant parts of each input image separately by using the gating model as a saliency detector on the kernels calculated on the image patches. Unlike canonical multiple kernel learning methods, our proposed method can combine multiple copies of the same kernel. We show that even if we provide more kernels than needed, our proposed approach uses only as many support vectors as required and does not overfit.
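The key structural difference from a global combination is that the kernel weights become functions of the input. A minimal numpy sketch of one common choice, a softmax gating model that yields the locally combined kernel K(i, j) = Σ_m η_m(x_i) K_m(i, j) η_m(x_j), is given below; the parameterization and names (`gating_weights`, `V`, `v0`) are illustrative, and the actual joint training of the gating model with the kernel machine is omitted.

```python
import numpy as np

def softmax(Z):
    """Row-wise softmax with the usual max-subtraction for stability."""
    Z = Z - Z.max(axis=1, keepdims=True)
    E = np.exp(Z)
    return E / E.sum(axis=1, keepdims=True)

def gating_weights(X, V, v0):
    """eta[i, m]: data-dependent weight of kernel m at instance i."""
    return softmax(X @ V + v0)

def localized_kernel(kernels, eta):
    """K(i, j) = sum_m eta[i, m] * K_m[i, j] * eta[j, m]."""
    n = kernels[0].shape[0]
    K = np.zeros((n, n))
    for m, Km in enumerate(kernels):
        K += np.outer(eta[:, m], eta[:, m]) * Km
    return K

rng = np.random.RandomState(1)
X = rng.randn(6, 4)
V, v0 = rng.randn(4, 2), rng.randn(2)
eta = gating_weights(X, V, v0)
K1 = X @ X.T                                                  # linear kernel
K2 = np.exp(-0.1 * ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1))  # RBF
K = localized_kernel([K1, K2], eta)
```

Each summand is a Hadamard product of two positive semidefinite matrices, so by the Schur product theorem the locally combined matrix remains a valid kernel.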
We also introduce a supervised and localized dimensionality reduction method that trains local projection kernels coupled with a kernel-based learning algorithm. On visualization tasks, our proposed method is able to maintain the multimodality of a class by placing clusters of the same class on the same side of the hyperplane while preserving a separation between them. On classification tasks, it outperforms other methods, attaining higher test accuracy and storing fewer support vectors due to the coupled optimization of the discriminant and the local projection matrices used in dimensionality reduction.
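The core operation can be sketched as projecting each instance with a gated mixture of local projection matrices before the kernel is computed. The numpy snippet below shows only this forward mapping under an assumed softmax gating; in the thesis the projection matrices are optimized jointly with the discriminant, which is not shown here, and `locally_project` is an illustrative name.

```python
import numpy as np

def softmax(Z):
    """Row-wise softmax with max-subtraction for numerical stability."""
    Z = Z - Z.max(axis=1, keepdims=True)
    E = np.exp(Z)
    return E / E.sum(axis=1, keepdims=True)

def locally_project(X, projections, V, v0):
    """Gated mixture of local projections: z_i = sum_m eta[i, m] (x_i W_m)."""
    eta = softmax(X @ V + v0)
    Z = np.zeros((X.shape[0], projections[0].shape[1]))
    for m, W in enumerate(projections):
        Z += eta[:, [m]] * (X @ W)
    return Z

# Three local 5-d -> 2-d projections, selected softly by the gating model.
rng = np.random.RandomState(2)
X = rng.randn(8, 5)
Ws = [rng.randn(5, 2) for _ in range(3)]
V, v0 = rng.randn(5, 3), rng.randn(3)
Z = locally_project(X, Ws, V, v0)
```

A kernel computed on the projected points Z then plays the role of the local projection kernel used by the coupled learning algorithm.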