This paper presents the first tractable computational method for training large-scale generative models with an optimal transport (OT) loss. It tackles both of these issues with two key ideas: (a) entropic smoothing, which turns the original OT loss into one that can be computed with Sinkhorn fixed-point iterations, and (b) algorithmic (automatic) differentiation of these iterations. Together, these two approximations yield a robust, differentiable approximation of the OT loss with streamlined GPU execution. Entropic smoothing generates a family of losses interpolating between Wasserstein (OT) and Maximum Mean Discrepancy (MMD), allowing one to find a sweet spot that leverages the geometry of OT together with the favorable high-dimensional sample complexity of MMD, which comes with unbiased gradient estimates. The resulting computational architecture complements standard deep generative networks with a stack of extra layers implementing the loss function.
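
As a minimal sketch of these two ideas (not the paper's reference implementation), the snippet below computes an entropy-regularized OT cost between two minibatches using log-domain Sinkhorn fixed-point iterations in PyTorch, and lets autograd differentiate through the unrolled loop. The function name, the squared-Euclidean ground cost, and the defaults for eps and n_iters are illustrative assumptions, not taken from the paper.

<code python>
# Hypothetical sketch: entropy-regularized OT ("Sinkhorn") loss between two
# minibatches, made differentiable by backpropagating through a fixed number
# of unrolled Sinkhorn fixed-point iterations.
import math
import torch

def sinkhorn_loss(x, y, eps=0.1, n_iters=50):
    """Entropic OT cost between the empirical measures on x (n,d) and y (m,d)."""
    n, m = x.shape[0], y.shape[0]
    C = torch.cdist(x, y, p=2) ** 2                 # ground cost C_ij = ||x_i - y_j||^2
    log_a = torch.full((n,), -math.log(n), dtype=x.dtype, device=x.device)  # uniform weights (log)
    log_b = torch.full((m,), -math.log(m), dtype=x.dtype, device=x.device)
    f = torch.zeros(n, dtype=x.dtype, device=x.device)   # dual potentials
    g = torch.zeros(m, dtype=x.dtype, device=x.device)
    # Log-domain Sinkhorn fixed-point iterations; autograd simply unrolls this loop.
    for _ in range(n_iters):
        f = -eps * torch.logsumexp((g[None, :] - C) / eps + log_b[None, :], dim=1)
        g = -eps * torch.logsumexp((f[:, None] - C) / eps + log_a[:, None], dim=0)
    # Transport plan P_ij = a_i * b_j * exp((f_i + g_j - C_ij) / eps); loss = <P, C>
    log_P = log_a[:, None] + log_b[None, :] + (f[:, None] + g[None, :] - C) / eps
    return (torch.exp(log_P) * C).sum()
</code>

A large eps pushes this loss toward an MMD-like quantity, while a small eps approaches the unregularized OT cost; the debiased Sinkhorn divergence used in practice would additionally subtract the x-to-x and y-to-y terms.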

https://arxiv.org/pdf/1711.01558v1.pdf Wasserstein Auto-Encoders

https://openreview.net/pdf?id=SJyEH91A- Learning Wasserstein Embeddings

https://arxiv.org/pdf/1804.04268.pdf Incomplete Contracting and AI Alignment

https://arxiv.org/abs/1803.04585v3 Categorizing Variants of Goodhart's Law

http://bmax.im/LovaszSoftmax The Lovász-Softmax loss: A tractable surrogate for the optimization of the intersection-over-union measure in neural networks

https://danilorezende.com/2018/07/12/short-notes-on-divergence-measures/ Short Notes on Divergence Measures