Domain Adaptation




How can we ensure that a network trained in one domain will predict accurately in another domain?


This section provides alternative descriptions of the pattern in the form of an illustration or an alternative formal expression. By looking at the sketch, a reader may quickly understand the essence of the pattern.

Discussion

This is the main section of the pattern, which explains the pattern in greater detail using the vocabulary we describe in the theory section of this book. We do not provide detailed proofs, but rather reference their sources. This section expounds on how the motivation is addressed. We also include additional questions that may be interesting topics for future research.

Known Uses

Here we review several projects or papers that have used this pattern.

Related Patterns

In this section we describe in a diagram how this pattern is conceptually related to other patterns. The relationships may be precise or fuzzy, so we provide further explanation of the nature of each relationship. We also describe other patterns that may not be conceptually related but that work well in combination with this pattern.

Relationship to Canonical Patterns

Relationship to Other Patterns

Further Reading

We provide here some additional external material that will help in exploring this pattern in more detail.


To aid the reader, we include the sources that are referenced in the text of the pattern.

References

Domain-Adversarial Training of Neural Networks

The proposed architecture includes a deep feature extractor (green) and a deep label predictor (blue), which together form a standard feed-forward architecture. Unsupervised domain adaptation is achieved by adding a domain classifier (red) connected to the feature extractor via a gradient reversal layer that multiplies the gradient by a certain negative constant during the backpropagation-based training. Otherwise, the training proceeds standardly and minimizes the label prediction loss (for source examples) and the domain classification loss (for all samples). Gradient reversal ensures that the feature distributions over the two domains are made similar (as indistinguishable as possible for the domain classifier), thus resulting in the domain-invariant features.
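The gradient reversal layer itself is simple enough to sketch. The following is an illustrative NumPy version (not the authors' code): it is the identity in the forward pass and multiplies the incoming gradient by a negative constant in the backward pass, so the feature extractor is pushed to make the two domains indistinguishable to the domain classifier.

```python
import numpy as np

class GradientReversal:
    """Minimal sketch of a gradient reversal layer (GRL).

    Forward pass: identity. Backward pass: multiply the incoming
    gradient by -lambda_, so the feature extractor is trained to
    *maximize* the domain classifier's loss.
    """

    def __init__(self, lambda_=1.0):
        self.lambda_ = lambda_

    def forward(self, x):
        # Identity: features flow unchanged to the domain classifier.
        return x

    def backward(self, grad_output):
        # Reverse and scale the gradient during backpropagation.
        return -self.lambda_ * grad_output


# Toy usage: the gradient the feature extractor receives from the
# domain classifier is flipped in sign.
grl = GradientReversal(lambda_=0.5)
features = np.array([1.0, 2.0, 3.0])
out = grl.forward(features)
grad_from_domain_classifier = np.array([0.2, -0.4, 0.6])
grad_to_feature_extractor = grl.backward(grad_from_domain_classifier)
```

In a deep learning framework this is typically implemented as a custom autograd operation inserted between the feature extractor and the domain classifier.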

Shai Ben-David, John Blitzer, Koby Crammer, and Fernando Pereira. Analysis of representations for domain adaptation. In NIPS, pages 137–144, 2006.

Shai Ben-David, John Blitzer, Koby Crammer, Alex Kulesza, Fernando Pereira, and Jennifer Wortman Vaughan. A theory of learning from different domains. Machine Learning, 79(1-2):151–175, 2010.

DLID: Deep Learning for Domain Adaptation by Interpolating between Domains

Overview of DLID model. (a) Schematic showing the interpolating path between the source domain S and target domain T, consisting of the intermediate domains that we create. The figure also shows the unsupervised pretrainer. (b) Once the feature extractors are trained, any input Xi is represented by concatenating the outputs of all the feature extractors. This representation is then passed through a classifier/regressor.

Scatter Component Analysis: A Unified Framework for Domain Adaptation and Domain Generalization

Flexible Image Tagging with Fast0Tag

Revisiting Batch Normalization for Practical Domain Adaptation

Adaptive Batch Normalization (AdaBN) increases the generalization ability of a DNN. By modulating the statistics in all Batch Normalization layers across the network, our approach achieves deep adaptation effect for domain adaptation tasks.

In this paper, we have introduced a simple yet effective approach for domain adaptation on batch normalized neural networks. Besides its original uses, we have exploited another functionality of Batch Normalization (BN) layer: domain adaptation. The main idea is to replace the statistics of each BN layer in source domain with those in target domain. The proposed method is easy to implement and parameter-free, and it takes almost no effort to extend to multiple source domains and semi-supervised settings. Our method established new state-of-the-art results on both single and multiple source(s) domain adaptation settings on standard benchmarks.
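The statistics-replacement step at the heart of AdaBN can be sketched in a few lines. This is an illustrative NumPy reimplementation, not the authors' code; `gamma` and `beta` stand for the affine batch-normalization parameters learned during source-domain training.

```python
import numpy as np

def adabn_forward(x_target, gamma, beta, eps=1e-5):
    """Sketch of Adaptive Batch Normalization (AdaBN).

    Instead of normalizing target-domain activations with the
    source-domain running mean/variance, recompute the statistics
    from the target-domain data itself. The learned affine
    parameters gamma and beta are kept from source training.
    """
    mu = x_target.mean(axis=0)    # target-domain mean per feature
    var = x_target.var(axis=0)    # target-domain variance per feature
    x_hat = (x_target - mu) / np.sqrt(var + eps)
    return gamma * x_hat + beta


rng = np.random.default_rng(0)
# Target-domain activations with a shifted, rescaled distribution.
x_target = 3.0 + 2.0 * rng.standard_normal((256, 4))
out = adabn_forward(x_target, gamma=np.ones(4), beta=np.zeros(4))
```

In a real network this recomputation would be applied to every BN layer, e.g., by running a forward pass over target-domain data and recording the per-layer moments.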

Illustration of the proposed method. For each convolutional or fully connected layer, we use different bias/variance terms to perform batch normalization for the training domain and the test domain. The domain-specific normalization mitigates the domain shift issue.

Variational Recurrent Adversarial Deep Domain Adaptation

Our model, termed Variational Recurrent Adversarial Deep Domain Adaptation (VRADA), is built atop a variational recurrent neural network (VRNN) and trains adversarially to capture complex temporal relationships that are domain-invariant. This is (as far as we know) the first model to capture and transfer temporal latent dependencies in multivariate time-series data.

Unifying Multi-Domain Multi-Task Learning: Tensor and Neural Network Perspectives

Multi-domain learning aims to benefit from simultaneously learning across several different but related domains. In this chapter, we propose a single framework that unifies multi-domain learning (MDL) and the related but better studied area of multi-task learning (MTL). By exploiting the concept of a semantic descriptor we show how our framework encompasses various classic and recent MDL/MTL algorithms as special cases with different semantic descriptor encodings. As a second contribution, we present a higher order generalisation of this framework, capable of simultaneous multi-task-multi-domain learning. This generalisation has two mathematically equivalent views in multi-linear algebra and gated neural networks respectively. Moreover, by exploiting the semantic descriptor, it provides neural networks the capability of zero-shot learning (ZSL), where a classifier is generated for an unseen class without any training data, as well as zero-shot domain adaptation (ZSDA), where a model is generated for an unseen domain without any training data. In practice, this framework provides a powerful yet easy to implement method that can be flexibly applied to MTL, MDL, ZSL and ZSDA.

Generalized Discrepancy for Domain Adaptation

We present a new algorithm for domain adaptation improving upon the discrepancy minimization algorithm (DM), which was previously shown to outperform a number of popular algorithms designed for this task. Unlike most previous approaches adopted for domain adaptation, our algorithm does not consist of a fixed reweighting of the losses over the training sample. Instead, it uses a reweighting that depends on the hypothesis considered and is based on the minimization of a new measure of generalized discrepancy.

Unsupervised Cross-Domain Image Generation

We study the problem of transferring a sample in one domain to an analog sample in another domain. Given two related domains, S and T, we would like to learn a generative function G that maps an input sample from S to the domain T, such that the output of a given function f, which accepts inputs in either domain, would remain unchanged. Other than the function f, the training data is unsupervised and consists of a set of samples from each domain. The Domain Transfer Network (DTN) we present employs a compound loss function that includes a multiclass GAN loss, an f-constancy component, and a regularizing component that encourages G to map samples from T to themselves. We apply our method to visual domains including digits and face images and demonstrate its ability to generate convincing novel images of previously unseen entities, while preserving their identity.

Just DIAL: DomaIn Alignment Layers for Unsupervised Domain Adaptation

Rather than introducing regularization terms aiming to promote the alignment of the two representations, we act at the distribution level through the introduction of DomaIn Alignment Layers (DIAL), able to match the observed source and target data distributions to a reference one.

Unsupervised Domain Adaptation by Backpropagation

As the training progresses, the approach promotes the emergence of “deep” features that are (i) discriminative for the main learning task on the source domain and (ii) invariant with respect to the shift between the domains. We show that this adaptation behaviour can be achieved in almost any feed-forward model by augmenting it with a few standard layers and a simple new gradient reversal layer. The resulting augmented architecture can be trained using standard backpropagation.
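Concretely, training with gradient reversal can be viewed as seeking a saddle point of a single objective over the feature extractor, label predictor, and domain classifier. A compact sketch, with notation adapted from the paper and normalization details simplified ($G_f$, $G_y$, $G_d$ denote the three components, $d_i$ the domain labels, and $\lambda$ the gradient reversal constant):

```latex
E(\theta_f, \theta_y, \theta_d)
  = \frac{1}{n}\sum_{i=1}^{n} L_y\big(G_y(G_f(x_i;\theta_f);\theta_y),\, y_i\big)
  \;-\; \lambda \sum_{i} L_d\big(G_d(G_f(x_i;\theta_f);\theta_d),\, d_i\big)

(\hat\theta_f, \hat\theta_y) = \arg\min_{\theta_f,\theta_y} E(\theta_f, \theta_y, \hat\theta_d),
\qquad
\hat\theta_d = \arg\max_{\theta_d} E(\hat\theta_f, \hat\theta_y, \theta_d)
```

Here $L_y$ is evaluated on the $n$ labeled source examples, while $L_d$ is evaluated on samples from both domains; the gradient reversal layer lets standard backpropagation realize the min over $(\theta_f, \theta_y)$ and the max over $\theta_d$ simultaneously.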

A Theory of Output-Side Unsupervised Domain Adaptation

When learning a mapping from an input space to an output space, the assumption that the sample distribution of the training data is the same as that of the test data is often violated. Unsupervised domain shift methods adapt the learned function in order to correct for this shift. Previous work has focused on utilizing unlabeled samples from the target distribution. We consider the complementary problem in which the unlabeled samples are given post mapping, i.e., we are given the outputs of the mapping of unknown samples from the shifted domain. Two other variants are also studied: the two-sided version, in which unlabeled samples are given from both the input and the output spaces, and the Domain Transfer problem, which was recently formalized. In all cases, we derive generalization bounds that employ discrepancy terms.

A Compact DNN: Approaching GoogLeNet-Level Accuracy of Classification and Domain Adaptation

The DNN is built on a new basic module Conv-M, which provides more diverse feature extractors without significantly increasing parameters. The unified framework of our DA method will simultaneously learn invariance across domains, reduce divergence of feature representations, and adapt label prediction.

Task-based End-to-end Model Learning

As machine learning techniques have become more ubiquitous, it has become common to see machine learning prediction algorithms operating within some larger process. However, the criteria by which we train machine learning algorithms often differ from the ultimate criteria on which we evaluate them. This paper proposes an end-to-end approach for learning probabilistic machine learning models within the context of stochastic programming, in a manner that directly captures the ultimate task-based objective for which they will be used. We then present two experimental evaluations of the proposed approach, one as applied to a generic inventory stock problem and the second to a real-world electrical grid scheduling task. In both cases, we show that the proposed approach can outperform both a traditional modeling approach and a purely black-box policy optimization approach.

Asymmetric Tri-training for Unsupervised Domain Adaptation

Tri-training leverages three classifiers equally to give pseudo-labels to unlabeled samples, but the method does not assume that the samples to be labeled are generated from a different domain. In this paper, we propose an asymmetric tri-training method for unsupervised domain adaptation, where we assign pseudo-labels to unlabeled samples and train neural networks as if they were true labels. In our work, we use three networks asymmetrically. By asymmetric, we mean that two networks are used to label unlabeled target samples and one network is trained on those samples to obtain target-discriminative representations.

Joint auto-encoders: a flexible multi-task learning framework

We develop a framework for learning multiple tasks simultaneously, based on sharing features that are common to all tasks. This is achieved through a modular deep feedforward neural network consisting of shared branches, which deal with the features common to all tasks, and private branches, which learn the aspects unique to each task. Once an appropriate weight-sharing architecture has been established, learning takes place through standard algorithms for feedforward networks, e.g., stochastic gradient descent and its variations. The method deals with domain adaptation and multi-task learning in a unified fashion, and can easily deal with data arising from different types of sources.

Multi-task Domain Adaptation for Sequence Tagging

Decoupling “when to update” from “how to update”

In this paper, we propose a meta algorithm for tackling the noisy labels problem. The key idea is to decouple “when to update” from “how to update”. We demonstrate the effectiveness of our algorithm by mining data for gender classification by combining the Labeled Faces in the Wild (LFW) face recognition dataset with a textual genderizing service, which leads to a noisy dataset. While our approach is very simple to implement, it leads to state-of-the-art results. We analyze some convergence properties of the proposed algorithm.

Self-ensembling for domain adaptation

This paper explores the use of self-ensembling with random image augmentation – a technique that has achieved impressive results in the area of semi-supervised learning – for visual domain adaptation problems. We modify the approach of Laine et al. to improve stability and ease of use. Our approach demonstrates state-of-the-art results when performing adaptation between the following pairs of datasets: MNIST and USPS, CIFAR-10 and STL, SVHN and MNIST, Syn-Digits to SVHN and Syn-Signs to GTSRB. We also explore the use of richer data augmentation to solve the challenging MNIST to SVHN adaptation path.

Distributional Adversarial Networks

We propose a framework for adversarial training that relies on a sample rather than a single sample point as the fundamental unit of discrimination. Inspired by discrepancy measures and two-sample tests between probability distributions, we propose two such distributional adversaries that operate and predict on samples, and show how they can be easily implemented on top of existing models. Various experimental results show that generators trained with our distributional adversaries are much more stable and are remarkably less prone to mode collapse than traditional models trained with pointwise prediction discriminators. The application of our framework to domain adaptation also results in considerable improvement over recent state-of-the-art.

Learning Representations for Counterfactual Inference

We propose a new algorithmic framework for counterfactual inference which brings together ideas from domain adaptation and representation learning. In addition to a theoretical justification, we perform an empirical comparison with previous approaches to causal inference from observational data. Our deep learning algorithm significantly outperforms the previous state-of-the-art.

DARLA: Improving Zero-Shot Transfer in Reinforcement Learning

Domain adaptation is an important open problem in deep reinforcement learning (RL). In many scenarios of interest data is hard to obtain, so agents may learn a source policy in a setting where data is readily available, with the hope that it generalises well to the target domain. We propose a new multi-stage RL agent, DARLA (DisentAngled Representation Learning Agent), which learns to see before learning to act. DARLA's vision is based on learning a disentangled representation of the observed environment. Once DARLA can see, it is able to acquire source policies that are robust to many domain shifts - even with no access to the target domain. DARLA significantly outperforms conventional baselines in zero-shot domain adaptation scenarios, an effect that holds across a variety of RL environments (Jaco arm, DeepMind Lab) and base RL algorithms (DQN, A3C and EC).

Learning by Association - A versatile semi-supervised training method for neural networks

We propose a new framework for semi-supervised training of deep neural networks inspired by learning in humans. “Associations” are made from embeddings of labeled samples to those of unlabeled ones and back. The optimization schedule encourages correct association cycles that end up at the same class from which the association was started and penalizes wrong associations ending at a different class. The implementation is easy to use and can be added to any existing end-to-end training setup.

Associative Domain Adaptation

We propose associative domain adaptation, a novel technique for end-to-end domain adaptation with neural networks, the task of inferring class labels for an unlabeled target domain based on the statistical properties of a labeled source domain. Our training scheme follows the paradigm that in order to effectively derive class labels for the target domain, a network should produce statistically domain-invariant embeddings, while minimizing the classification error on the labeled source domain. We accomplish this by reinforcing associations between source and target data directly in embedding space. Our method can easily be added to any existing classification network with no structural and almost no computational overhead.

Using Simulation and Domain Adaptation to Improve Efficiency of Deep Robotic Grasping

Guarding Against Adversarial Domain Shifts with Counterfactual Regularization

We provide a causal framework for the problem and treat groups of instances of the same object as counterfactuals under different interventions on the mutable style features. We show links to questions of fairness, transfer learning and adversarial examples.

Neural Lattice Search for Domain Adaptation in Machine Translation

We have presented a fully-convolutional neural network architecture for interest point detection and description trained using a self-supervised domain adaptation framework called Homographic Adaptation.

Our experiments demonstrate that (1) it is possible to transfer knowledge from a synthetic dataset onto real-world images, (2) sparse interest point detection and description can be cast as a single, efficient convolutional neural network, and (3) the resulting system works well for geometric computer vision matching tasks such as Homography Estimation.

Bi-shifting Auto-Encoder for Unsupervised Domain Adaptation

Augmented Cyclic Adversarial Learning for Domain Adaptation