Serving Patterns

To round out our coverage, we explore practices that may matter little to researchers but can be critical to the practical deployment of these kinds of systems into the wild. This chapter covers the patterns involved in deploying a neural network into production.

Pruning

Compression

Hierarchical Softmax

Binarization

Net2Net

Parameter Server

1-Bit SGD

Hardware Acceleration

Crowd Sourcing

Hybrid Cognition

Testing

Layer Reduction

Operational Monitoring

References

http://arxiv.org/abs/1605.07678 An Analysis of Deep Neural Network Models for Practical Applications

Since the emergence of Deep Neural Networks (DNNs) as a prominent technique in the field of computer vision, the ImageNet classification challenge has played a major role in advancing the state-of-the-art. While accuracy figures have steadily increased, the resource utilisation of winning models has not been properly taken into account. In this work, we present a comprehensive analysis of important metrics in practical applications: accuracy, memory footprint, parameters, operations count, inference time and power consumption. Key findings are: (1) fully connected layers are largely inefficient for smaller batches of images; (2) accuracy and inference time are in a hyperbolic relationship; (3) energy constraints are an upper bound on the maximum achievable accuracy and model complexity; (4) the number of operations is a reliable estimate of the inference time. We believe our analysis provides a compelling set of information that helps design and engineer efficient DNNs.
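Finding (4) suggests that a simple operation count can stand in for inference time when comparing architectures. As a rough illustration, the sketch below counts multiply-accumulate operations (MACs) for a convolutional layer and a fully connected layer, and reports how many MACs each stored weight serves per batch. The layer shapes and function names are hypothetical and not taken from the paper. The low weight reuse of the fully connected layer at batch size 1 is one possible reading of finding (1), not the authors' stated explanation.

```python
def conv2d_macs(h_out, w_out, c_in, c_out, k):
    """MACs for one k x k convolution applied to a single image (bias ignored)."""
    return h_out * w_out * c_in * c_out * k * k


def conv2d_weights(c_in, c_out, k):
    """Number of weights in a k x k convolution (bias ignored)."""
    return c_in * c_out * k * k


def dense_macs(n_in, n_out):
    """MACs for a fully connected layer applied to a single image (bias ignored)."""
    return n_in * n_out


def weight_reuse(macs_per_image, n_weights, batch_size):
    """Average number of MACs that each stored weight takes part in per batch."""
    return macs_per_image * batch_size / n_weights


if __name__ == "__main__":
    # Hypothetical layer shapes, chosen only for illustration.
    conv = conv2d_macs(h_out=56, w_out=56, c_in=64, c_out=64, k=3)
    conv_w = conv2d_weights(c_in=64, c_out=64, k=3)
    fc = dense_macs(n_in=4096, n_out=4096)
    fc_w = fc  # a dense layer has exactly one weight per MAC per image

    for batch in (1, 16):
        print(
            f"batch={batch}: "
            f"conv reuse={weight_reuse(conv, conv_w, batch):.0f}, "
            f"fc reuse={weight_reuse(fc, fc_w, batch):.0f}"
        )
```

Under these assumptions the convolution reuses each weight once per output position, while the fully connected layer reuses each weight only once per image, so its reuse grows only with the batch size.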

https://research.googleblog.com/2016/02/running-your-models-in-production-with.html

http://static.googleusercontent.com/media/research.google.com/en//pubs/archive/43146.pdf Machine Learning: The High-Interest Credit Card of Technical Debt

Machine learning offers a fantastically powerful toolkit for building complex systems quickly. This paper argues that it is dangerous to think of these quick wins as coming for free. Using the framework of technical debt, we note that it is remarkably easy to incur massive ongoing maintenance costs at the system level when applying machine learning. The goal of this paper is to highlight several machine learning specific risk factors and design patterns to be avoided or refactored where possible. These include boundary erosion, entanglement, hidden feedback loops, undeclared consumers, data dependencies, changes in the external world, and a variety of system-level anti-patterns.

https://arxiv.org/pdf/1605.09774.pdf Asynchrony begets Momentum, with an Application to Deep Learning

http://martinfowler.com/articles/microservices.html

  • Componentization via Services
  • Organized around Business Capabilities
  • Products not Projects
  • Smart endpoints and dumb pipes
  • Decentralized Governance
  • Decentralized Data Management
  • Infrastructure Automation
  • Design for failure
  • Evolutionary Design

https://www.nervanasys.com/faster-training-in-neon-with-multiple-gpus-on-the-nervana-cloud

http://arxiv.org/abs/1609.02943 Stealing Machine Learning Models via Prediction APIs

https://arxiv.org/pdf/1609.06870v2.pdf Distributed Training of Deep Neural Networks: Theoretical and Practical Limits of Parallel Scalability

Currently, effective scaling is not possible beyond 16 nodes.

https://drive.google.com/file/d/0B-wQVEjH9yuhanpyQjUwQS1JOTQ/view Equality of Opportunity in Supervised Learning

We propose a simple, interpretable, and actionable framework for measuring and removing discrimination based on protected attributes.
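As a minimal sketch of the measuring half of such a framework, the code below assumes the usual statement of the equal opportunity criterion: among examples whose true label is positive, the classifier should predict positive at the same rate for every protected group. The function names and the toy data are hypothetical, and the removal step (for example, adjusting decision thresholds per group) is not shown.

```python
def true_positive_rate(y_true, y_pred):
    """Fraction of truly positive examples that were predicted positive."""
    preds_on_positives = [p for t, p in zip(y_true, y_pred) if t == 1]
    if not preds_on_positives:
        raise ValueError("no positive examples in this group")
    return sum(preds_on_positives) / len(preds_on_positives)


def equal_opportunity_gap(y_true, y_pred, group):
    """Return (gap, per-group TPR), where gap is max TPR minus min TPR."""
    rates = {}
    for g in sorted(set(group)):
        idx = [i for i, gi in enumerate(group) if gi == g]
        rates[g] = true_positive_rate(
            [y_true[i] for i in idx], [y_pred[i] for i in idx]
        )
    return max(rates.values()) - min(rates.values()), rates


if __name__ == "__main__":
    # Toy labels, predictions, and group membership (illustrative only).
    y_true = [1, 1, 0, 1, 1, 0]
    y_pred = [1, 0, 0, 1, 1, 1]
    group = ["a", "a", "a", "b", "b", "b"]
    gap, rates = equal_opportunity_gap(y_true, y_pred, group)
    print(rates, gap)
```

A gap of zero would mean the criterion holds exactly on this sample; in practice one would likely tolerate a small gap and monitor it over time.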

https://medium.com/@neal_lathia/reading-mining-large-streams-of-user-data-for-personalized-recommendations-a8010daf17ab#.o1bhk9sty

https://medium.com/zendesk-engineering/how-zendesk-serves-tensorflow-models-in-production-751ee22f0f4b#.7wiwr1ytv

https://arxiv.org/pdf/1611.06224.pdf Towards Unified Data and Lifecycle Management for Deep Learning

https://arxiv.org/abs/1703.03924v1 Real-Time Machine Learning: The Missing Pieces

We assert that a new distributed execution framework is needed for such ML applications and propose a candidate approach with a proof-of-concept architecture that achieves a 63x performance improvement over a state-of-the-art execution framework for a representative application.

https://arxiv.org/pdf/1708.02637v1.pdf TensorFlow Estimators: Managing Simplicity vs. Flexibility in High-Level Machine Learning Frameworks

We present a framework for specifying, training, evaluating, and deploying machine learning models. Our focus is on simplifying cutting edge machine learning for practitioners in order to bring such technologies into production. Recognizing the fast evolution of the field of deep learning, we make no attempt to capture the design space of all possible model architectures in a domain-specific language (DSL) or similar configuration language. We allow users to write code to define their models, but provide abstractions that guide developers to write models in ways conducive to productionization. We also provide a unifying Estimator interface, making it possible to write downstream infrastructure.
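The idea of a unifying interface can be sketched in plain Python. This is not the TensorFlow Estimator API: the `Estimator` base class, the `MeanRegressor` model, and the `train_and_evaluate` helper below are all hypothetical, and only illustrate how downstream code can be written once against an interface rather than against a particular model.

```python
from abc import ABC, abstractmethod


class Estimator(ABC):
    """Hypothetical common interface that every model implements."""

    @abstractmethod
    def train(self, features, labels):
        """Fit the model and return self."""

    @abstractmethod
    def evaluate(self, features, labels):
        """Return a dict of metric name to value."""

    @abstractmethod
    def predict(self, features):
        """Return one prediction per input row."""


class MeanRegressor(Estimator):
    """Toy model: always predicts the mean of the training labels."""

    def __init__(self):
        self.mean_ = 0.0

    def train(self, features, labels):
        self.mean_ = sum(labels) / len(labels)
        return self

    def predict(self, features):
        return [self.mean_ for _ in features]

    def evaluate(self, features, labels):
        preds = self.predict(features)
        mse = sum((p - y) ** 2 for p, y in zip(preds, labels)) / len(labels)
        return {"mse": mse}


def train_and_evaluate(estimator, train_data, eval_data):
    """Downstream helper that relies only on the Estimator interface."""
    estimator.train(*train_data)
    return estimator.evaluate(*eval_data)


if __name__ == "__main__":
    metrics = train_and_evaluate(
        MeanRegressor(),
        train_data=([[0], [1]], [1.0, 3.0]),
        eval_data=([[2]], [2.0]),
    )
    print(metrics)
```

Because `train_and_evaluate` never inspects the model, any other `Estimator` subclass could be swapped in without changing the helper.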