– F. Denis (LIF) : Dimension-free Concentration Bounds on Hankel Matrices for Spectral Learning


Date/time
Date(s): 17 January 2014



Dimension-free Concentration Bounds on Hankel Matrices for Spectral Learning

By François Denis, LIF.

Joint work with Mattias Gybels and Amaury Habrard.

Learning probabilistic models over strings is an important issue for many applications. Spectral methods offer elegant solutions to the problem of inferring weighted automata from finite samples of variable-length strings drawn from an unknown target distribution. These methods rely on a singular value decomposition of a matrix $H_S$, called the Hankel matrix, which records the frequencies of (some of) the observed strings. The accuracy of the learned distribution depends both on the quantity of information embedded in $H_S$ and on the distance between $H_S$ and its mean $H_r$. Existing concentration bounds seem to indicate that the concentration around $H_r$ gets looser as the size of $H_r$ grows, suggesting a trade-off between the quantity of information used and the size of $H_r$. We propose new dimension-free concentration bounds for several variants of Hankel matrices. Experiments demonstrate that these bounds are tight and that they significantly improve on existing bounds. These results suggest that the concentration rate of the Hankel matrix around its mean is not an argument for limiting its size.
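To make the object of the talk concrete, here is a minimal sketch of the empirical Hankel matrix $H_S$ the abstract refers to: each entry indexed by a prefix $u$ and a suffix $v$ holds the empirical frequency of the string $uv$ in the sample, and spectral methods then take its SVD. The toy sample and the choice of prefix/suffix sets below are illustrative assumptions, not taken from the talk.

```python
import numpy as np
from collections import Counter

def empirical_hankel(sample, prefixes, suffixes):
    """Empirical Hankel matrix H_S: entry (u, v) holds the empirical
    frequency in `sample` of the concatenated string u + v."""
    counts = Counter(sample)
    n = len(sample)
    H = np.zeros((len(prefixes), len(suffixes)))
    for i, u in enumerate(prefixes):
        for j, v in enumerate(suffixes):
            H[i, j] = counts[u + v] / n
    return H

# Toy sample of variable-length strings over the alphabet {a, b}
# (an illustrative assumption, not data from the talk).
sample = ["a", "ab", "a", "b", "ab", "aab", "a", "ab"]
prefixes = ["", "a", "b", "aa", "ab"]
suffixes = ["", "a", "b", "ab"]

H_S = empirical_hankel(sample, prefixes, suffixes)

# Spectral methods extract a low-rank factorization of H_S via SVD;
# the accuracy analysis hinges on how close H_S is to its mean H_r
# in spectral norm, which is what the concentration bounds control.
U, s, Vt = np.linalg.svd(H_S)
```

Note that enlarging the prefix and suffix sets enlarges $H_S$; the dimension-free bounds discussed in the talk say this growth by itself need not hurt concentration.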