http://nerds.airbnb.com/overcoming-missing-values-in-a-rfc/

https://arxiv.org/pdf/1607.00136.pdf Missing Data Estimation in High-Dimensional Datasets: A Swarm Intelligence-Deep Neural Network Approach

https://arxiv.org/pdf/1512.01362.pdf Proposition of a Theoretical Model for Missing Data Imputation using Deep Learning and Evolutionary Algorithms

https://arxiv.org/abs/1701.05305v1 Random Forest Missing Data Algorithms

Currently there are many different RF imputation algorithms but relatively little guidance about their efficacy, which motivated us to study their performance. Using a large, diverse collection of data sets, performance of various RF algorithms was assessed under different missing data mechanisms. Algorithms included proximity imputation, on the fly imputation, and imputation utilizing multivariate unsupervised and supervised splitting—the latter class representing a generalization of a new promising imputation algorithm called missForest. Performance of algorithms was assessed by ability to impute data accurately. Our findings reveal RF imputation to be generally robust with performance improving with increasing correlation. Performance was good under moderate to high missingness, and even (in certain cases) when data was missing not at random.

https://arxiv.org/abs/1708.06724 VIGAN: Missing View Imputation with Generative Adversarial Networks

In this paper, we propose a novel approach for view imputation via generative adversarial networks (GANs), which we name as VIGAN. This approach first treats each view as a separate domain and identifies domain-to-domain mappings through a GAN using randomly-sampled data from each view, and then employs a multi-modal denoising autoencoder (DAE) to reconstruct the missing view from the GAN outputs based on paired data from each view. Then, by optimizing the GANs and DAE jointly, our model enables the knowledge integration learned for domain mappings and view correspondences to effectively recover the missing view. Empirical results on benchmark datasets validate the VIGAN approach by comparing against the state of the art, and an evaluation of VIGAN in a genetic study of substance use disorders further proves the effectiveness and utility of this approach in life science.

https://arxiv.org/abs/1811.04752v1 Learning Representations of Missing Data for Predicting Patient Outcomes