Of course, there is much more we could do here - data augmentation, collecting more data, fine-tuning regularization parameters, maybe even trying character-based CNNs or some kind of recurrent neural network.
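To make the first of those ideas concrete: simple label-preserving transforms multiply the effective size of a small training set. Here is a minimal NumPy sketch - the specific transforms, image shape, and dataset size are illustrative assumptions, not any particular post's pipeline:

```python
import numpy as np

rng = np.random.default_rng(0)

def augment(image):
    """Cheap label-preserving transforms that stretch a tiny dataset."""
    yield image                                   # the original
    yield image[:, ::-1]                          # horizontal flip
    dy, dx = rng.integers(-2, 3, size=2)
    yield np.roll(image, (dy, dx), axis=(0, 1))   # small random shift

# Hypothetical tiny dataset: 4 grayscale 32x32 images.
images = rng.standard_normal((4, 32, 32))
augmented = [a for img in images for a in augment(img)]
# 4 originals become 12 training examples.
```

In a real pipeline you would apply such transforms on the fly each epoch rather than materializing a fixed augmented set, so the model sees a fresh variation every pass.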

The take-away here, though, is that you can do deep learning with very few training examples and still get tangible gains in model performance and representational efficiency over manual feature engineering. At test time, deep learning can also be cheaper - it's often computationally faster to run a handful of matrix multiplies than to compute hand-crafted features from scratch for each example.

Convolutional Neural Networks for Sentence Classification

We report on a series of experiments with convolutional neural networks (CNNs) trained on top of pre-trained word vectors for sentence-level classification tasks. We show that a simple CNN with little hyperparameter tuning and static vectors achieves excellent results on multiple benchmarks. Learning task-specific vectors through fine-tuning offers further gains in performance. We additionally propose a simple modification to the architecture to allow for the use of both task-specific and static vectors. The CNN models discussed herein improve upon the state of the art on 4 out of 7 tasks, which include sentiment analysis and question classification.

Low Data Drug Discovery with One-shot Learning

We introduce a new architecture, the residual LSTM embedding, that, when combined with graph convolutional neural networks, significantly improves the ability to learn meaningful distance metrics over small molecules. We open source all models introduced in this work as part of DeepChem, an open-source framework for deep learning in drug discovery.
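The abstract does not spell out the mechanics, but metric-based one-shot learning generally works like this: embed a query and a small labeled support set, then classify the query by similarity to the support examples. The sketch below is not the paper's residual LSTM architecture - it is a generic NumPy illustration of that idea, with random vectors standing in for learned molecular embeddings:

```python
import numpy as np

rng = np.random.default_rng(0)

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hypothetical setup: 4 labeled support molecules, one unlabeled query.
support = rng.standard_normal((4, 16))        # stand-ins for learned embeddings
labels = np.array([0, 0, 1, 1])               # e.g. inactive / active
query = support[2] + 0.05 * rng.standard_normal(16)  # near a label-1 example

# Softmax attention over similarities gives a label distribution for the query.
sims = np.array([cosine(query, s) for s in support])
attn = np.exp(sims) / np.exp(sims).sum()
p_label1 = attn[labels == 1].sum()            # probability the query is active
```

The point of learning the distance metric, rather than fixing it, is that training shapes the embedding space so that "close" means "pharmacologically similar" - which is what makes classification from a handful of labeled molecules feasible.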

A message that I hear often is that “deep learning is only relevant when you have a huge amount of data”. While not entirely incorrect, this is somewhat misleading. Certainly, deep learning's defining ability to learn features automatically from the data generally only pays off when lots of training data is available, especially for problems where the input samples are very high-dimensional, like images. However, convolutional neural networks, a pillar algorithm of deep learning, are by design one of the best models available for most “perceptual” problems (such as image classification), even with very little data to learn from. Training a convnet from scratch on a small image dataset will still yield reasonable results, without the need for any custom feature engineering. Convnets are just plain good. They are the right tool for the job.
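Part of why convnets need no custom feature engineering is visible in what a single layer computes: a learned filter slid over the image, a nonlinearity, and pooling. A minimal NumPy sketch of one such stage, with illustrative shapes and a random filter standing in for a learned one:

```python
import numpy as np

rng = np.random.default_rng(0)

def conv2d(image, kernel):
    """Valid-mode 2D convolution: the core op of a convnet layer."""
    kh, kw = kernel.shape
    h = image.shape[0] - kh + 1
    w = image.shape[1] - kw + 1
    out = np.empty((h, w))
    for i in range(h):
        for j in range(w):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

image = rng.standard_normal((8, 8))           # toy single-channel image
kernel = rng.standard_normal((3, 3))          # a learned filter, here random
fmap = np.maximum(conv2d(image, kernel), 0)   # 6x6 feature map after ReLU
# 2x2 max pooling halves the spatial resolution.
pooled = fmap.reshape(3, 2, 3, 2).max(axis=(1, 3))
```

Because the same small filter is reused at every position, the layer has very few parameters relative to its input size - one reason convnets can generalize from modest datasets.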

I expected the CNN version of MMOD to inherit the low training-data requirements of the HOG version of MMOD, but that it works with only 4 training images is still very surprising, considering that other deep learning methods typically require many thousands of images to produce any kind of sensible results.