Optimal Transport: Machine Learning
Introduction
The application of optimal transport concepts to machine learning is often referred to as computational Optimal Transport (OT). At its core, machine learning focuses on making comparisons between complex objects, and properly measuring these similarities requires a metric, that is, a distance function.
Optimal transport respects the underlying structure and geometry of a problem while providing a framework for comparing probability distributions. Optimal transport methods have received attention from researchers in fields as varied as economics, statistics, and quantum mechanics. OT methods in machine learning can be grouped into categories such as learning, domain adaptation, Bayesian inference, and hypothesis testing.
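As a small illustration of such a comparison (not part of the original article; the data and library call are only an assumed example), the sketch below computes the 1-Wasserstein distance between two one-dimensional empirical distributions using SciPy.

<syntaxhighlight lang="python">
# Minimal sketch: 1-Wasserstein distance between two 1-D empirical distributions.
# The samples are made-up illustrative data, not data from the article.
import numpy as np
from scipy.stats import wasserstein_distance

rng = np.random.default_rng(0)
x = rng.normal(loc=0.0, scale=1.0, size=1000)  # samples from N(0, 1)
y = rng.normal(loc=2.0, scale=1.0, size=1000)  # samples from N(2, 1)

# In one dimension the optimal transport cost reduces to the area between
# the two empirical CDFs, which SciPy computes directly.
print(wasserstein_distance(x, y))  # close to 2.0, the shift between the means
</syntaxhighlight>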
Learning Methods
Transport-based distances have been used in the following research contexts:
Graph-based semi-supervised learning: An effective approach for classification across a wide variety of domains, including image and text classification. Graph-based algorithms are particularly useful when only a small portion of the data is labelled.
Generative Adversarial Networks (GAN): Machine learning frameworks in which two neural networks compete with each other in a game-theoretic sense. These techniques have been used in semi-supervised learning.
Restricted Boltzmann Machines (RBM): These are probabilistic graphical models that can obtain hierarchical features at multiple levels. An RBM can learn a probability distribution over a given set of inputs; they were originally created under the name Harmonium by Paul Smolensky in 1986.
Entropy-regularized Wasserstein loss: This has been used for multi-label classification. It is characterized by a relaxation of the transport problem that handles unnormalized measures, replacing the hard equality (marginal) constraints with soft penalties based on the KL-divergence.
Sliced-Wasserstein metric: Approximates the Wasserstein distance between high-dimensional distributions by averaging one-dimensional Wasserstein distances computed along random projections.
Wasserstein GAN (WGAN): Trains the generator by minimizing an approximation of the Wasserstein distance between the data distribution of the training set and the distribution of the generated samples. In certain cases this produces a more stable training process than standard GAN training.
WGAN Pseudocode:
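The pseudocode referenced here is given below as a minimal runnable sketch of the standard WGAN training loop (the weight-clipping variant described by Arjovsky et al., 2017). The toy generator, critic, data, and hyper-parameter values are illustrative placeholders, not definitions from this article.

<syntaxhighlight lang="python">
import torch
import torch.nn as nn

# Hypothetical toy setup: 2-D "real" data and tiny MLPs stand in for the
# generator and critic so that the loop runs end to end.
latent_dim, data_dim, batch_size = 8, 2, 64
generator = nn.Sequential(nn.Linear(latent_dim, 32), nn.ReLU(), nn.Linear(32, data_dim))
critic = nn.Sequential(nn.Linear(data_dim, 32), nn.ReLU(), nn.Linear(32, 1))

n_critic, clip_value, lr = 5, 0.01, 5e-5
opt_c = torch.optim.RMSprop(critic.parameters(), lr=lr)
opt_g = torch.optim.RMSprop(generator.parameters(), lr=lr)

for step in range(100):
    real = torch.randn(batch_size, data_dim) + 3.0  # placeholder "real" samples

    # 1) Critic updates: maximize E[f(x_real)] - E[f(G(z))] (minimize the negative).
    for _ in range(n_critic):
        z = torch.randn(batch_size, latent_dim)
        loss_c = -(critic(real).mean() - critic(generator(z).detach()).mean())
        opt_c.zero_grad(); loss_c.backward(); opt_c.step()
        # Weight clipping keeps the critic approximately 1-Lipschitz.
        with torch.no_grad():
            for p in critic.parameters():
                p.clamp_(-clip_value, clip_value)

    # 2) Generator update: minimize -E[f(G(z))].
    z = torch.randn(batch_size, latent_dim)
    loss_g = -critic(generator(z)).mean()
    opt_g.zero_grad(); loss_g.backward(); opt_g.step()
</syntaxhighlight>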
Domain Adaptation: Here the goal is to learn about, or extrapolate to, one domain from another, often by finding domain-invariant representations (Courty et al., https://arxiv.org/pdf/1507.00504.pdf). This technique is often used to transfer information from labelled data to unlabelled data.
By obtaining the optimal transportation plan connecting the probability distributions of the source and target domains, a transformation of the learning samples can be estimated. The transformation is non-linear and invertible, which allows a variety of machine learning methods to be applied to the transformed dataset. Regularized, unsupervised models have been used, as well as Joint Class Proportion and Optimal Transport (JCPOT) to address multi-source domain adaptation (Algorithm 1 below).
Algorithm 1: Joint Class Proportion and Optimal Transport (JCPOT)

<pre>
 1: Input: ε, maxIter, ∀k: C^(k) and λ^(k)
 2: cpt ← 0
 3: err ← ∞
 4: for all k = 1, …, K do
 5:     ζ^(k) ← exp(−C^(k) / ε)
 6: while cpt < maxIter and err > threshold do
 7:     for all k = 1, …, K do
 8:         ζ^(k) ← diag(m / (ζ^(k) 1)) ζ^(k)
 9:     h^(cpt) ← exp( Σ_{k=1}^{K} λ^(k) log( D_1^(k) ζ^(k) 1 ) )
10:     for all k = 1, …, K do
11:         ζ^(k) ← ζ^(k) diag( (D_2^(k) h^(cpt)) / ((ζ^(k))ᵀ 1) )
12:     err ← ‖ h^(cpt) − h^(cpt−1) ‖₂
13:     cpt ← cpt + 1
14: return h and, for all k, ζ^(k)
</pre>
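For intuition about the scaling structure used in Algorithm 1, the sketch below implements plain entropy-regularized OT (Sinkhorn-style iterative Bregman projections) for a single source-target pair in NumPy. It is a simplified illustration with assumed toy data, not the full multi-source JCPOT procedure, which additionally estimates the class-proportion vector h.

<syntaxhighlight lang="python">
import numpy as np

# Single-source sketch of the entropic-OT machinery underlying Algorithm 1:
# the kernel zeta = exp(-C/eps) and alternating diagonal scalings that
# enforce the source and target marginals.
rng = np.random.default_rng(0)
n, m, eps = 5, 7, 0.1
mu = np.full(n, 1.0 / n)        # source marginal
nu = np.full(m, 1.0 / m)        # target marginal
C = rng.random((n, m))          # illustrative ground-cost matrix

zeta = np.exp(-C / eps)         # kernel, cf. step 5 of Algorithm 1
u, v = np.ones(n), np.ones(m)
for _ in range(1000):           # Sinkhorn / Bregman-projection iterations
    u = mu / (zeta @ v)         # scale rows toward mu, cf. step 8
    v = nu / (zeta.T @ u)       # scale columns toward nu, cf. step 11

plan = np.diag(u) @ zeta @ np.diag(v)  # entropy-regularized transport plan
print(plan.sum(axis=1))         # approximately mu
print(plan.sum(axis=0))         # approximately nu
</syntaxhighlight>

The kernel ζ = exp(−C/ε) and the alternating diagonal scalings correspond to the row and column updates in Algorithm 1; JCPOT couples K such problems through the shared class-proportion estimate h computed in step 9.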