Jude W. Shavlik
[introductory/intermediate] Advising, Explaining, Distilling, and Quantizing Deep Neural Networks
This series of lectures addresses some of the challenges of dealing with the large number of complex floating-point calculations in neural networks. The global implications of individual neural-network nodes’ calculations are not easily grasped, making it hard to (a) understand what was learned, (b) explain how future predictions are made, and (c) detect any implicit biases in these increasingly widely deployed intelligent systems. In addition, it is hard to ‘steer’ neural networks by giving them helpful hints in a manner analogous to how one might communicate with a human assistant. With network sizes rapidly nearing a trillion weights, their energy demands (especially during training) and inference speeds have become major concerns.
Discussed are methods for inserting ‘domain knowledge’ in the form of IF-THEN rules (advising), viewing the inverse task of ‘extracting’ IF-THEN rules from trained neural networks as one that can itself be machine learned (explaining), reducing network complexity by training a smaller and hence more efficient network to mimic a large one without losing much accuracy (distilling), and converting from 32-bit floating-point calculations to more energy-efficient 8-bit integer ones while maintaining most of the predictive accuracy (quantizing). Concrete case studies will be briefly covered to provide a sense of the capabilities of such methods. The focus is historical, centered on the speaker’s research spanning 30 years, but there will be some discussion of recent directions and key challenges. Some related work from other parts of machine learning (support vector machines, statistical-relational learners, etc.) will be mentioned.
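To make the distilling idea concrete, below is a minimal sketch in the spirit of Hinton, Vinyals, and Dean (2015), listed in the references: the student network is trained on a mix of the usual hard-label loss and the KL divergence from the teacher’s temperature-softened outputs. The batch size, temperature, and mixing weight here are illustrative assumptions, not values from the case studies.

```python
import numpy as np

def softmax(logits, temperature=1.0):
    """Temperature-softened softmax over the last axis."""
    z = logits / temperature
    z = z - z.max(axis=-1, keepdims=True)   # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, labels,
                      temperature=4.0, alpha=0.5):
    """Mix of (a) cross-entropy with the hard labels and (b) KL divergence
    from the teacher's softened distribution.  The T^2 factor keeps the
    soft-target gradients on the same scale as the hard-label gradients."""
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    # KL(teacher || student), averaged over the batch
    kl = np.mean(np.sum(p_teacher * (np.log(p_teacher + 1e-12)
                                     - np.log(p_student + 1e-12)), axis=-1))
    # Standard cross-entropy with the true (hard) labels
    p_hard = softmax(student_logits, 1.0)
    ce = np.mean(-np.log(p_hard[np.arange(len(labels)), labels] + 1e-12))
    return alpha * ce + (1.0 - alpha) * (temperature ** 2) * kl

# Toy batch: 3 examples, 5 classes; the logits are made-up placeholders.
rng = np.random.default_rng(0)
teacher_logits = rng.normal(size=(3, 5)) * 3.0   # stand-in for the large network
student_logits = rng.normal(size=(3, 5))         # stand-in for the small network
labels = np.array([0, 2, 4])
print(distillation_loss(student_logits, teacher_logits, labels))
```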
1. Providing symbolic knowledge to neural networks. Case studies in molecular-biology tasks.
2. Learning what a neural network has learned. Case study with elevator dispatching.
3. Training a simpler neural network using a complex one as its teacher. Case study with gesture recognition.
4. Converting a neural network from one using 32-bit floating-point arithmetic to an equivalent one using only 8-bit integer arithmetic (a brief sketch follows this outline). Case studies with an ‘AI chip’ company.
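As a rough illustration of the conversion in item 4, the sketch below applies a common asymmetric affine scheme, real value ≈ scale × (int8 code − zero_point), as described in the TensorFlow Lite 8-bit quantization specification listed in the references. The tensor values and helper names are illustrative assumptions, and the full specification adds constraints (for example, symmetric ranges for weights) not shown here.

```python
import numpy as np

def quantize_int8(x):
    """Asymmetric affine quantization of a float32 tensor to int8:
       real_value ~= scale * (q - zero_point)."""
    qmin, qmax = -128, 127
    x_min, x_max = float(x.min()), float(x.max())
    x_min, x_max = min(x_min, 0.0), max(x_max, 0.0)   # range must contain 0.0
    scale = (x_max - x_min) / (qmax - qmin)
    zero_point = int(round(qmin - x_min / scale))      # int8 code that represents 0.0
    q = np.clip(np.round(x / scale) + zero_point, qmin, qmax).astype(np.int8)
    return q, scale, zero_point

def dequantize(q, scale, zero_point):
    """Map int8 codes back to approximate float32 values."""
    return scale * (q.astype(np.float32) - zero_point)

# Toy weight tensor (made-up values, just to show the round trip and its error).
w = np.array([-0.8, -0.1, 0.0, 0.3, 1.2], dtype=np.float32)
q, scale, zp = quantize_int8(w)
w_hat = dequantize(q, scale, zp)
print("int8 codes:", q)
print("max round-trip error:", np.abs(w - w_hat).max())
```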
- G. Benitez-Garcia, J. Olivares-Mercado, G. Sanchez-Perez, and K. Yanai (2020).
IPN Hand: A Video Dataset and Benchmark for Real-Time Continuous Hand Gesture Recognition.
Proceedings of ICPR 2020.
(Basis of a case study, not in the paper, that relates to distillation.)
- C. Buciluǎ, R. Caruana, and A. Niculescu-Mizil (2006).
Model Compression.
Proceedings of KDD ’06.
- M. Craven and J. Shavlik (1995).
Extracting Tree-Structured Representations of Trained Networks.
Advances in Neural Information Processing Systems (NIPS-8).
- G. Hinton, O. Vinyals, and J. Dean (2015).
Distilling the Knowledge in a Neural Network.
NIPS Deep Learning and Representation Learning Workshop.
- R. Krishnamoorthi (2018).
Quantizing Deep Convolutional Networks for Efficient Inference: A Whitepaper.
- R. Maclin and J. Shavlik (1993).
Using Knowledge-Based Neural Networks to Improve Algorithms: Refining the Chou-Fasman Algorithm for Protein Folding.
(Shows how a neural network can refine a pre-existing, non-learning algorithm for an important task.)
- G. Towell and J. Shavlik (1994).
Knowledge-Based Artificial Neural Networks.
- TensorFlow Lite 8-bit quantization specification.
Jude Shavlik is an Emeritus Professor of Computer Sciences and of Biostatistics and Medical Informatics at the University of Wisconsin – Madison, and is a Fellow of the Association for the Advancement of Artificial Intelligence (AAAI).
He was at Wisconsin from 1988 to 2017, following the receipt of his PhD from the University of Illinois for his work on Explanation-Based Learning. His research interests include machine learning and computational biology, with an emphasis on using rich sources of training information, such as human-provided advice.
He served for three years as editor-in-chief of the AI Magazine and serves on the editorial boards of about a dozen journals. He chaired the 1998 International Conference on Machine Learning, co-chaired the First International Conference on Intelligent Systems for Molecular Biology in 1993, co-chaired the First International Conference on Knowledge Capture in 2001, was conference chair of the 2003 IEEE Conference on Data Mining, and co-chaired the 2007 International Conference on Inductive Logic Programming.
He was a founding member of both the board of the International Machine Learning Society and the board of the International Society for Computational Biology. He co-edited, with Tom Dietterich, “Readings in Machine Learning.” His research was supported by DARPA, NSF, NIH (NLM and NCI), ONR, DOE, AT&T, IBM, Yahoo, and NYNEX. Currently he lives in New York City, where he does machine-learning consulting.