##### – E. Morvant (LIF): A Well-founded PAC-Bayesian Majority Vote applied to the Nearest Neighbor Rule

Date/time
Date(s) - 15/11/2012
14:00 - 15:00


A Well-founded PAC-Bayesian Majority Vote applied to the Nearest Neighbor Rule

By Emilie Morvant, LIF.

The Nearest Neighbor (NN) [1] rule is probably the best-known classification method. Its widespread use in machine learning and pattern recognition is due to its simplicity, its theoretical properties, and its good practical performance. In this work, we focus on the k-NN classifier, where the predicted class of an instance is the majority class over its k nearest neighbors in the learning sample. However, the k-NN rule suffers from limitations, among which the choice of a suitable k and the impossibility of deriving generalization guarantees for a standard k-NN algorithm in finite-sample situations.

To tackle these drawbacks, we investigate a well-founded quadratic algorithm called MinCq [2], which takes advantage of the PAC-Bayesian setting [3] by learning a probability distribution over a set of voters H, i.e. suitable weights to give each voter in order to build a majority vote. In particular, MinCq minimizes a bound involving the first two statistical moments of the margin realized on the learning data. This framework offers strong and elegant theoretical guarantees on the learned weighted majority vote.

In the context of the k-NN rule, if H consists of the k-NN classifiers themselves (k = 1, 2, ...), MinCq may spare us from tuning k and can "easily" provide generalization guarantees. However, in such a situation we point out two limitations of MinCq. First, it focuses on quasi-uniform distributions (i.e. close to the uniform distribution), which is not appropriate in settings where one has an a priori belief on the relevance of the voters: we would like to give higher weights to nearer neighbors. Second, the theoretical guarantees do not hold when the voters are built from learning examples (which is the case of a k-NN classifier).

In this work, we therefore generalize MinCq by allowing the incorporation of an a priori belief P, constraining the learned distribution to be P-aligned, and we extend the generalization guarantees to the PAC-Bayes sample-compression setting, with voters built from learning examples. We set a suitable P-aligned distribution and conduct a large comparative experimental study that shows practical evidence of the efficiency of our method, called P-MinCq.

[1] T. Cover and P. Hart, "Nearest neighbor pattern classification", IEEE Transactions on Information Theory, vol. 13, no. 1, pp. 21-27, 1967.
[2] F. Laviolette, M. Marchand and J.-F. Roy, "From PAC-Bayes bounds to quadratic programs for majority votes", in Proceedings of ICML 2011.
[3] D. A. McAllester, "PAC-Bayesian model averaging", in Proceedings of COLT 1999.
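As a minimal illustration (not the authors' implementation), the weighted majority vote over the set H of k-NN classifiers described in the abstract can be sketched in a few lines of Python. The function names `knn_voter` and `weighted_majority_vote`, the Euclidean metric, and the particular decreasing weight profile are all assumptions made for this sketch; labels are taken in {-1, +1}, and the weight vector plays the role of a distribution over the voters h_1, ..., h_K that favors nearer neighbors, in the spirit of the P-aligned belief described above.

```python
import math
from collections import Counter

def knn_voter(train, x, k):
    """Prediction of the single voter h_k: the majority class among the
    k nearest training points of x (Euclidean distance).
    `train` is a list of ((features...), label) pairs, labels in {-1, +1}."""
    neighbors = sorted(train, key=lambda p: math.dist(p[0], x))[:k]
    return Counter(label for _, label in neighbors).most_common(1)[0][0]

def weighted_majority_vote(train, x, weights):
    """Weighted majority vote over the voters {h_1, ..., h_K}, where
    weights[k-1] is the (nonnegative) weight given to the k-NN classifier
    h_k.  With labels in {-1, +1}, the vote is the sign of the weighted
    sum of the voters' predictions."""
    score = sum(w * knn_voter(train, x, k)
                for k, w in enumerate(weights, start=1))
    return 1 if score >= 0 else -1

# Hypothetical toy data: two classes in the plane.
train = [((0.0, 0.0), -1), ((0.0, 1.0), -1),
         ((5.0, 5.0), 1), ((5.0, 6.0), 1)]

# Decreasing weights over h_1, h_2, h_3: smaller neighborhoods weigh more,
# mimicking the a priori belief that nearer neighbors are more relevant.
weights = [0.5, 0.3, 0.2]
prediction = weighted_majority_vote(train, (0.0, 0.5), weights)
```

A quasi-uniform distribution would instead use (nearly) equal weights for every h_k; the point of the P-aligned generalization is precisely to allow profiles like the decreasing one above while keeping the PAC-Bayesian guarantees.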