Machine Learning

Optimal Transport: Machine Learning

Introduction

The application of optimal transport concepts to machine learning is also referred to as computational Optimal Transport (OT). At its core, machine learning focuses on comparing complex objects. Measuring these similarities properly requires a metric, that is, a distance function.

Optimal transport respects the underlying structure and geometry of a problem while providing a framework for comparing probability distributions. Optimal transport methods have received attention from researchers in fields as varied as economics, statistics, and quantum mechanics. OT methods can be divided into categories including learning, domain adaptation, Bayesian inference, and hypothesis testing.

Learning Methods

Transport-based distances have been used in the following research contexts:

Graph-based semi-supervised learning: An effective approach to classification across a wide variety of domains, including image and text classification. Graph-based algorithms are particularly useful when most of the available data is unlabeled.
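
A rough sketch of the graph-based semi-supervised setting is given below. It uses scikit-learn's LabelPropagation to spread a handful of known labels over a similarity graph built from toy data; the dataset, kernel, and parameter choices are illustrative assumptions, and the example does not itself use a transport-based distance.

```python
import numpy as np
from sklearn.datasets import make_moons
from sklearn.semi_supervised import LabelPropagation

# Toy dataset: two interleaving half-moons, with most labels hidden.
X, y = make_moons(n_samples=300, noise=0.1, random_state=0)
y_partial = np.full_like(y, -1)                     # -1 marks unlabeled points
labeled = np.random.RandomState(0).choice(len(y), size=10, replace=False)
y_partial[labeled] = y[labeled]

# Propagate the few known labels over a similarity graph built from the data.
model = LabelPropagation(kernel="rbf", gamma=20)
model.fit(X, y_partial)
print("transductive accuracy:", (model.transduction_ == y).mean())
```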

Generative Adversarial Networks (GAN): Machine learning frameworks in which two neural networks compete against each other in a game-theoretic sense. These techniques have also been used in semi-supervised learning.

Restricted Boltzmann Machines (RBM): Probabilistic graphical models that can learn hierarchical features at multiple levels. An RBM can learn a probability distribution over a given set of inputs; they were originally introduced under the name Harmonium by Paul Smolensky in 1986.
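
The following minimal sketch, assuming a binary RBM and toy binary data, shows how such a model can be fit with one-step contrastive divergence (CD-1); the class, parameter names, and data are illustrative assumptions, not a reference implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

class RBM:
    """Minimal binary restricted Boltzmann machine trained with CD-1."""

    def __init__(self, n_visible, n_hidden, lr=0.1):
        self.W = 0.01 * rng.standard_normal((n_visible, n_hidden))
        self.b = np.zeros(n_visible)   # visible bias
        self.c = np.zeros(n_hidden)    # hidden bias
        self.lr = lr

    @staticmethod
    def _sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    def cd1_step(self, v0):
        # Positive phase: hidden activations given the data.
        ph0 = self._sigmoid(v0 @ self.W + self.c)
        h0 = (rng.random(ph0.shape) < ph0).astype(float)
        # Negative phase: one Gibbs step gives a "reconstruction" of the data.
        pv1 = self._sigmoid(h0 @ self.W.T + self.b)
        v1 = (rng.random(pv1.shape) < pv1).astype(float)
        ph1 = self._sigmoid(v1 @ self.W + self.c)
        # Approximate gradient: data statistics minus model statistics.
        self.W += self.lr * (v0.T @ ph0 - v1.T @ ph1) / len(v0)
        self.b += self.lr * (v0 - v1).mean(axis=0)
        self.c += self.lr * (ph0 - ph1).mean(axis=0)

# Toy usage: fit the RBM to random binary vectors.
data = (rng.random((256, 6)) < 0.5).astype(float)
rbm = RBM(n_visible=6, n_hidden=4)
for _ in range(100):
    rbm.cd1_step(data)
```

Stacking several RBMs, each trained on the hidden activations of the previous one, is one way the hierarchical features mentioned above are obtained.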

Entropy-regularized Wasserstein loss: This has been used for multi-label classification. It is characterized by a relaxation of the transport problem that handles unnormalized measures by replacing the equality constraints on the marginals with soft penalties measured in KL divergence.

Sliced-Wasserstein metric: Approximates the Wasserstein distance by averaging one-dimensional Wasserstein distances computed along random projections of the distributions.
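
As a rough sketch of the two ideas above, and assuming discrete histograms and point-cloud samples respectively, the code below computes entropy-regularized optimal transport with standard Sinkhorn iterations (the unbalanced variant with soft KL penalties modifies these scaling updates) and a Monte Carlo estimate of the sliced-Wasserstein distance; the function names, cost choices, and parameters are illustrative assumptions.

```python
import numpy as np

def sinkhorn(a, b, C, eps=0.05, n_iters=500):
    """Entropy-regularized OT between histograms a, b with cost matrix C."""
    K = np.exp(-C / eps)                  # Gibbs kernel
    u, v = np.ones_like(a), np.ones_like(b)
    for _ in range(n_iters):
        u = a / (K @ v)                   # scale rows to match marginal a
        v = b / (K.T @ u)                 # scale columns to match marginal b
    P = u[:, None] * K * v[None, :]       # regularized transport plan
    return P, np.sum(P * C)               # plan and transport cost <P, C>

def sliced_wasserstein(X, Y, n_projections=200, seed=0):
    """Average squared 1-D Wasserstein distance over random projections.

    X, Y: (n, d) sample arrays with equal numbers of samples.
    """
    rng = np.random.default_rng(seed)
    total = 0.0
    for _ in range(n_projections):
        theta = rng.standard_normal(X.shape[1])
        theta /= np.linalg.norm(theta)
        # In one dimension, optimal transport simply matches sorted samples.
        total += np.mean((np.sort(X @ theta) - np.sort(Y @ theta)) ** 2)
    return total / n_projections

# Toy example: two Gaussian-like histograms on a 1-D grid.
x = np.linspace(0, 1, 50)
a = np.exp(-(x - 0.2) ** 2 / 0.01); a /= a.sum()
b = np.exp(-(x - 0.7) ** 2 / 0.02); b /= b.sum()
C = (x[:, None] - x[None, :]) ** 2        # squared-distance cost
P, cost = sinkhorn(a, b, C)
print("entropic OT cost:", cost)

# Toy example: two 2-D point clouds.
rng = np.random.default_rng(1)
print("sliced W2^2:", sliced_wasserstein(rng.normal(0, 1, (400, 2)),
                                         rng.normal(1, 1, (400, 2))))
```

Smaller values of eps yield plans closer to the unregularized optimum but require more iterations and are numerically less stable.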

Wasserstein GAN (WGAN): Trains the generator by minimizing an approximation of the Wasserstein distance between the distribution of the training data and the distribution of the generated samples. In certain cases this produces a more stable training process than the standard GAN formulation.
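
A minimal sketch of a WGAN training loop on toy one-dimensional data, assuming PyTorch and the original weight-clipping scheme to keep the critic approximately 1-Lipschitz; the network sizes, learning rates, and clipping threshold are illustrative assumptions rather than recommended settings.

```python
import torch
import torch.nn as nn

# Small critic f and generator g for 1-D samples (illustrative architectures).
critic = nn.Sequential(nn.Linear(1, 64), nn.ReLU(), nn.Linear(64, 1))
gen = nn.Sequential(nn.Linear(8, 64), nn.ReLU(), nn.Linear(64, 1))
opt_c = torch.optim.RMSprop(critic.parameters(), lr=5e-5)
opt_g = torch.optim.RMSprop(gen.parameters(), lr=5e-5)

real_data = 0.5 * torch.randn(10000, 1) + 2.0     # stand-in "training set"

for step in range(300):
    # Critic: maximize E[f(real)] - E[f(fake)], an estimate of the W1 distance.
    for _ in range(5):
        real = real_data[torch.randint(0, len(real_data), (64,))]
        fake = gen(torch.randn(64, 8)).detach()
        loss_c = -(critic(real).mean() - critic(fake).mean())
        opt_c.zero_grad(); loss_c.backward(); opt_c.step()
        for p in critic.parameters():             # clipping enforces the Lipschitz bound
            p.data.clamp_(-0.01, 0.01)
    # Generator: move the generated distribution toward the data distribution.
    fake = gen(torch.randn(64, 8))
    loss_g = -critic(fake).mean()
    opt_g.zero_grad(); loss_g.backward(); opt_g.step()
```

Here the critic plays the role of the test function in the Kantorovich-Rubinstein dual formulation of the Wasserstein-1 distance, which is what makes this a transport-based objective.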