[intermediate/advanced] Information Theory for Deep Learning
Information theory is central to deep learning. Most fundamental is the cross-entropy loss used in training classifiers, but generative adversarial networks (GANs) and variational autoencoders (VAEs) are also defined by information-theoretic loss functions. Recently, methods based on contrastive predictive coding (CPC), motivated by the maximization of mutual information, have proved extremely effective in self-supervised training of image features, and more direct maximization of mutual information has since proved more effective still. This course will explore information theory from the perspective of current deep learning architectures, with an emphasis on recent empirical results.
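To make the central quantity concrete, here is a minimal sketch of the cross-entropy loss for a single classification example; the function and variable names are illustrative, not course material:

```python
import math

def cross_entropy(p, q):
    """Cross entropy H(p, q) = -sum_x p(x) log q(x), in nats.

    p is the target distribution, q the model's predicted distribution.
    Terms with p(x) = 0 contribute nothing, so they are skipped.
    """
    return -sum(pi * math.log(qi) for pi, qi in zip(p, q) if pi > 0)

# A one-hot target (true class is index 0) against a softmax-style prediction:
target = [1.0, 0.0, 0.0]
predicted = [0.7, 0.2, 0.1]
loss = cross_entropy(target, predicted)  # reduces to -log q(true class)
```

For a one-hot target the sum collapses to the negative log-probability of the correct class, which is exactly the per-example loss minimized when training a classifier.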
- The definition of information and Shannon's source coding theorem.
- Generative adversarial networks (GANs) and variational autoencoders (VAEs) from an information-theoretic perspective.
- Perils of differential entropy and the rise of discrete VAEs.
- Mutual information and self-supervised learning.
- Formal limitations on the measurement of mutual information.
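Several of the topics above revolve around mutual information, which for discrete variables can be computed directly from a joint distribution; a minimal sketch with illustrative names:

```python
import math

def mutual_information(joint):
    """I(X;Y) = sum_{x,y} p(x,y) log[ p(x,y) / (p(x) p(y)) ], in nats.

    joint[i][j] holds p(X = i, Y = j); marginals are obtained by summing
    rows and columns.
    """
    px = [sum(row) for row in joint]
    py = [sum(col) for col in zip(*joint)]
    mi = 0.0
    for i, row in enumerate(joint):
        for j, pxy in enumerate(row):
            if pxy > 0:
                mi += pxy * math.log(pxy / (px[i] * py[j]))
    return mi

# Perfectly correlated fair bits carry one bit (log 2 nats) of mutual information:
correlated = [[0.5, 0.0], [0.0, 0.5]]
# Independent fair bits carry none:
independent = [[0.25, 0.25], [0.25, 0.25]]
```

The difficulty the course addresses is that in deep learning the distributions are high-dimensional and unknown, so mutual information must be bounded or estimated from samples rather than computed exactly as above.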
- Cover and Thomas, Elements of Information Theory, Wiley
- Goodfellow et al., Generative Adversarial Nets, NeurIPS, 2014
- Kingma and Welling, Auto-encoding Variational Bayes, arXiv:1312.6114, 2013
- van den Oord et al., Representation Learning with Contrastive Predictive Coding, arXiv:1807.03748, 2018
- Poole et al., On Variational Bounds of Mutual Information, arXiv:1905.06922, 2019
- McAllester and Stratos, Formal Limitations on the Measurement of Mutual Information, AISTATS 2020
Vector calculus; familiarity with convex functions and Jensen's inequality.
David A. McAllester is Professor and former chief academic officer at TTIC (the Toyota Technological Institute at Chicago). He received his B.S., M.S., and Ph.D. degrees from the Massachusetts Institute of Technology and has served on the faculties of Cornell and MIT. He was a member of technical staff at AT&T Labs-Research from 1995 to 2002 and has been a fellow of the American Association for Artificial Intelligence since 1997. He has written over 100 refereed publications. McAllester's research areas include machine learning theory, the theory of programming languages, automated reasoning, AI planning, computer game playing (computer chess), and computational linguistics. A 1991 paper on AI planning proved to be one of the most influential papers of the decade in that area. A 1993 paper on computer game algorithms influenced the design of the algorithms used in the Deep Blue chess system that defeated Garry Kasparov. A 1998 paper on machine learning theory introduced PAC-Bayesian theorems, which combine Bayesian and non-Bayesian methods. He was a co-author of the deformable part model that dominated object detection in computer vision from 2008 to 2012. He has been teaching fundamentals of deep learning since 2016.