Information Geometry
Information geometry is a relatively new field of mathematical science that studies the geometry of the space of probability distributions, making use of a characteristic structure called the dual connection. By reconsidering, within this space, the relationship between examples (data) drawn from a probability distribution and a learning machine that models the structure of the data, we can discuss the properties of machine learning algorithms.
Boosting by Well-Designed Ensemble
Boosting is a machine learning algorithm that efficiently combines weak learners into a single strong learner.
Each step of boosting estimates a weak learner from weighted data. Reconsidered in the space of probability distributions, however, it can be understood as a special form of coordinate gradient descent that sequentially updates a particular learning model, built from the previously acquired learners, using the original data.
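To make the "weak learner on weighted data" step concrete, here is a minimal sketch using AdaBoost with one-dimensional decision stumps. AdaBoost is chosen only because it is the most familiar boosting algorithm; this is not the U-Boost procedure of the paper cited below. The function names (`fit_stump`, `adaboost`, `predict`) and the toy data are assumptions made for illustration.

```python
import math

def stump_predict(x, threshold, sign):
    """Decision stump: predict `sign` when x >= threshold, otherwise -sign."""
    return sign if x >= threshold else -sign

def fit_stump(xs, ys, weights):
    """Weak learner: choose the stump with the smallest weighted error."""
    best = None
    for threshold in sorted(set(xs)):
        for sign in (1, -1):
            err = sum(w for w, x, y in zip(weights, xs, ys)
                      if stump_predict(x, threshold, sign) != y)
            if best is None or err < best[0]:
                best = (err, threshold, sign)
    return best

def adaboost(xs, ys, rounds=10):
    """Return a list of (alpha, threshold, sign) triples; labels must be +1/-1."""
    n = len(xs)
    weights = [1.0 / n] * n          # start from uniform weights on the data
    ensemble = []
    for _ in range(rounds):
        err, threshold, sign = fit_stump(xs, ys, weights)
        err = min(max(err, 1e-12), 1 - 1e-12)   # guard against log(0)
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, threshold, sign))
        # Reweight: misclassified examples gain weight, correct ones lose it.
        weights = [w * math.exp(-alpha * y * stump_predict(x, threshold, sign))
                   for w, x, y in zip(weights, xs, ys)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble

def predict(ensemble, x):
    """Sign of the alpha-weighted vote of all stumps."""
    score = sum(a * stump_predict(x, t, s) for a, t, s in ensemble)
    return 1 if score >= 0 else -1

# Toy data (hypothetical): no single stump separates these labels.
xs = list(range(10))
ys = [1, 1, 1, -1, -1, -1, 1, 1, 1, -1]
ensemble = adaboost(xs, ys, rounds=3)
predictions = [predict(ensemble, x) for x in xs]
```

Each stump alone misclassifies at least three of the ten points, but the weighted vote of three stumps fits this toy data. The reweighting line is where the "weighted data" of each step comes from; the coefficient `alpha` is the step size along the coordinate of the newly chosen weak learner.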
Murata, N., Takenouchi, T., Kanamori, T., Eguchi, S.: “Information geometry of U-Boost and Bregman divergence”, Neural Computation, Volume 16, Issue 7, July 2004, Pages 1437-1481. https://doi.org/10.1162/089976604323057452
A Geometrical Extension of the Bradley-Terry Model
The Bradley-Terry model is a basic and important probability model for rating. Because the parameter assigned to each player is estimated from the numbers of wins and losses in matches between pairs of players, estimation requires carrying out maximum likelihood estimation for many binomial distributions at once. For this reason, several likelihood designs that account for weighting by the number of matches have been proposed.
We consider the geometry of the space of probability distributions and propose a parameter estimation algorithm within the framework of the Expectation-Maximization (exponential-mixture) algorithm, in which the problem is regarded as maximum likelihood estimation of multinomial distributions from incompletely observed data.
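For orientation, the following is a minimal sketch of maximum likelihood estimation in the standard Bradley-Terry model, where player i beats player j with probability s_i / (s_i + s_j). It uses the classical fixed-point iteration for this model, not the generalized, em-based estimator of the paper cited below. The names `fit_bradley_terry` and `win_probability` and the win counts are assumptions made for illustration, and the sketch assumes every player has at least one win and one loss.

```python
def fit_bradley_terry(wins, iterations=200):
    """Estimate strengths from wins[i][j] = number of times i beat j.

    Classical fixed-point update: s_i <- W_i / sum_j n_ij / (s_i + s_j),
    where W_i is i's total wins and n_ij the number of matches between i and j.
    """
    n = len(wins)
    strength = [1.0] * n
    for _ in range(iterations):
        updated = []
        for i in range(n):
            total_wins = sum(wins[i][j] for j in range(n) if j != i)
            denom = sum((wins[i][j] + wins[j][i]) / (strength[i] + strength[j])
                        for j in range(n) if j != i)
            updated.append(total_wins / denom)
        total = sum(updated)
        strength = [s / total for s in updated]   # fix the scale: sum to 1
    return strength

def win_probability(strength, i, j):
    """Model probability that player i beats player j."""
    return strength[i] / (strength[i] + strength[j])

# Hypothetical results: every pair of players met three times.
wins = [
    [0, 2, 3],
    [1, 0, 2],
    [0, 1, 0],
]
strength = fit_bradley_terry(wins)
```

Each pair of players contributes one binomial likelihood (wins out of matches played), and the denominator shows how pairs with more matches weigh more heavily in the estimate of each strength.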
Fujimoto, Y., Hino, H., Murata, N.: “An estimation of generalized Bradley-Terry models based on the em algorithm”, Neural Computation, Volume 23, Issue 6, June 2011, Pages 1623-1659. https://doi.org/10.1162/NECO_a_00129