2nd International Summer School on Deep Learning
 23rd – 27th July 2018, Genova, Italy

Course Description



Paolo Frasconi University of Florence

Bilevel Programming for Hyperparameter Optimization and Meta-Learning

In this talk I will discuss some gradient-based approaches to hyperparameter optimization and meta-learning, focusing on algorithmic aspects and potential applications. Both problems can be formulated within a class of bilevel optimization problems which can be approximately solved by taking into account the optimization dynamics for the inner objective. The resulting framework unifies gradient-based hyperparameter optimization and meta-learning. Depending on the specific setting, the variables of the outer objective take either the meaning of hyperparameters in a supervised learning problem or parameters of a meta-learner. Under simple sufficient conditions, solutions of the approximate problem converge to those of the exact problem. The framework can be instantiated for meta-learning by treating the weights of the representation layers as parameters shared across a set of training episodes.
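The core gradient-based idea can be sketched in a few lines. The example below is an illustrative assumption, not the speaker's exact algorithm: the hyperparameter is an L2 regularization strength in ridge regression, the inner dynamics are plain gradient descent, and the hypergradient of the validation (outer) loss is obtained by forward-mode differentiation of those dynamics.

```python
import numpy as np

rng = np.random.default_rng(0)
X_tr, y_tr = rng.normal(size=(50, 5)), rng.normal(size=50)
X_val, y_val = rng.normal(size=(30, 5)), rng.normal(size=30)

def inner_grad(w, lam):
    # gradient of the inner (training) objective: ridge regression
    return X_tr.T @ (X_tr @ w - y_tr) / len(y_tr) + lam * w

def hypergradient(lam, T=100, eta=0.1):
    """Forward-mode differentiation of the inner dynamics w.r.t. lam."""
    w = np.zeros(X_tr.shape[1])
    Z = np.zeros_like(w)           # Z_t = dw_t / dlam
    H = X_tr.T @ X_tr / len(y_tr)  # Hessian of the data-fit term
    for _ in range(T):
        # differentiate the update w <- w - eta * inner_grad(w, lam):
        Z = Z - eta * ((H + lam * np.eye(len(w))) @ Z + w)
        w = w - eta * inner_grad(w, lam)
    # chain the outer (validation) loss gradient through w_T
    g_val = X_val.T @ (X_val @ w - y_val) / len(y_val)
    return Z @ g_val, w

def val_loss(lam, T=100, eta=0.1):
    # outer objective evaluated at the approximate inner solution
    w = np.zeros(X_tr.shape[1])
    for _ in range(T):
        w = w - eta * inner_grad(w, lam)
    r = X_val @ w - y_val
    return 0.5 * r @ r / len(y_val)
```

The hypergradient can then drive an outer gradient-descent loop over the hyperparameter; in the meta-learning instantiation the same machinery differentiates through episodes with respect to shared representation weights.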

Paolo Frasconi received a PhD in Computer Engineering in 1994 and an MSc degree in Electronic Engineering in 1990. He is a professor of Computer Engineering at the University of Florence. He previously held positions at K.U. Leuven (Belgium), University of Cagliari (Italy), University of Wollongong (Australia), and MIT (USA).
His research interests are in the area of machine learning, with particular emphasis on algorithms for structured and relational data, learning with logical representations, and applications to biology, chemistry, transportation systems, brain imaging, and natural language processing. In these areas he has co-authored over 140 refereed publications and one monograph, and edited 5 books.
He is an Action Editor of Machine Learning Journal and he has been an Associate Editor for Artificial Intelligence Journal, IEEE Transactions on Knowledge and Data Engineering, IEEE Transactions on Neural Networks, ACM Transactions on Internet Technology. He was the Program Co-Chair of the European Conference on Machine Learning and Principles and Practice of Knowledge Discovery from Databases (ECML-PKDD 2016), the 7th Conference on Prestigious Applications of Intelligent Systems (PAIS 2012), the 20th International Conference on Inductive Logic Programming (ILP 2010), the Special Track on AI and Bioinformatics (AAAI 2010), the 5th International Workshop on Mining and Learning with Graphs (MLG 2007), and director of the NATO Advanced Studies Institute on Artificial Intelligence and Heuristic Methods for Bioinformatics (2001).

Marco Gori University of Siena

Motion Supervision in Visual Environments

In this talk I begin by noting that, by ignoring the crucial role of temporal coherence, the formulation of most current computer vision recognition tasks tackles a problem remarkably more difficult than the one nature prepared for humans. I claim that exploiting frame temporal coherence leads to the motion invariance principle, which enables the construction of a new theory of learning with convolutional networks. The theory is applied to the construction of an unsupervised learning scheme for visual features in deep convolutional neural networks, in the framework of the principle of 'Least Cognitive Action', where an appropriate Lagrangian term is used to enforce a solution in which the features are developed under motion invariance. The causal optimization of the cognitive action yields a solution where learning is carried out by an appropriate blurring of the video, along with the interleaving of segments of null signal. Interestingly, the theory also sheds light on the video blurring process in newborns and on the mechanisms that drive eye movements.

Marco Gori received the Ph.D. degree in 1990 from Università di Bologna, Italy, working partly at the School of Computer Science (McGill University, Montreal). In 1992, he became an Associate Professor of Computer Science at Università di Firenze and, in November 1995, he joined the Università di Siena, where he is currently a full professor of computer science.

His main interests are in machine learning with applications to pattern recognition, Web mining, and game playing. He is especially interested in bridging logic and learning and in the connections between symbolic and sub-symbolic representation of information. He was the leader of the WebCrow project for automatic solving of crosswords, which outperformed human competitors in an official competition that took place during the ECAI-06 conference. As a follow-up to this grand challenge he founded QuestIt, a spin-off company of the University of Siena, working in the field of question answering. He is co-author of 'Web Dragons: Inside the Myths of Search Engine Technology,' Morgan Kaufmann (Elsevier), 2006, and 'Machine Learning: A Constraint-Based Approach,' Morgan Kaufmann (Elsevier), 2018.

Dr. Gori serves (or has served) as an Associate Editor of a number of technical journals related to his areas of expertise, has been the recipient of best paper awards, and has been a keynote speaker at a number of international conferences. He was the Chairman of the Italian Chapter of the IEEE Computational Intelligence Society and the President of the Italian Association for Artificial Intelligence. He is a fellow of the IEEE, ECCAI, and IAPR. He is in the list of top Italian scientists kept by the VIA-Academy (http://www.topitalianscientists.org/top_italian_scientists.aspx).

24 Courses

Tülay Adalı University of Maryland, Baltimore County

Data Fusion through Matrix and Tensor Decompositions: Linear, Multilinear, and Nonlinear Models and their Applications


In many fields today, multiple sets of data are readily available. These might either refer to multimodal data, where information about a given phenomenon is obtained through different types of acquisition techniques, or multiset data, where the datasets are all of the same type but might be acquired from different subjects, at different time points, or under different conditions. Joint analysis of this data, its fusion, promises a more comprehensive and informative view of the task at hand than performing separate analyses, and, if probed carefully, may open new avenues and answer questions we might not have even thought of asking when working with a single modality or dataset. Models based on matrix or tensor decompositions provide attractive solutions for fusion of both multimodal and multiset data. These models minimize the assumptions (which is attractive, as very little can be assumed about the relationship among multiple datasets) and, at the same time, can maximally exploit the interactions within and across the datasets.
This class will provide an overview of the main methods for matrix and tensor decompositions and the models that have been successfully applied for fusion of multiple datasets. An important focus is on the interrelated concepts of uniqueness, diversity, and interpretability. Diversity refers to any structural, numerical, or statistical property or assumption on the data that contributes to the identifiability of the model, which is key for interpretability, the ability to attach a physical meaning to the final decomposition. The relevance of these concepts as well as the challenges that remain are highlighted through a number of numerical and practical examples in various fields.


I. Basic matrix/tensor decompositions, identifiability & uniqueness
* CP decomposition
* CCA and MCCA
* ICA and IVA
* Other relevant matrix/tensor decompositions
II. Models for fusion of multiset and multimodal data
* IVA, transposed IVA
* Joint ICA and Parallel ICA
* Coupled matrix and tensor decompositions
* Nonlinear extensions
III. Performance evaluation, model comparison/selection
Examples in medical imaging, video processing, and recommender systems among others
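As a minimal concrete instance of one building block listed above, classical two-view CCA can be computed in closed form from the whitened cross-covariance; the toy two-view data below, generated from a shared latent source, is an illustrative assumption.

```python
import numpy as np

def cca(X, Y, k=2, reg=1e-8):
    """Classical two-view CCA: canonical directions and correlations."""
    X = X - X.mean(0)
    Y = Y - Y.mean(0)
    Cxx = X.T @ X / len(X) + reg * np.eye(X.shape[1])
    Cyy = Y.T @ Y / len(Y) + reg * np.eye(Y.shape[1])
    Cxy = X.T @ Y / len(X)
    # whiten each view; the SVD of the whitened cross-covariance
    # yields the canonical correlations as singular values
    Wx = np.linalg.inv(np.linalg.cholesky(Cxx)).T
    Wy = np.linalg.inv(np.linalg.cholesky(Cyy)).T
    U, s, Vt = np.linalg.svd(Wx.T @ Cxy @ Wy)
    return Wx @ U[:, :k], Wy @ Vt.T[:, :k], s[:k]

# two views driven by the same 2-dimensional latent source
rng = np.random.default_rng(0)
z = rng.normal(size=(2000, 2))
X = z @ rng.normal(size=(2, 4)) + 0.1 * rng.normal(size=(2000, 4))
Y = z @ rng.normal(size=(2, 3)) + 0.1 * rng.normal(size=(2000, 3))
A, B, s = cca(X, Y, k=2)
```

Here the high leading canonical correlations recover the shared source; MCCA, IVA, and the coupled decompositions in the syllabus generalize this pairwise idea to many datasets.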

D. Lahat, T. Adali and C. Jutten, 'Multimodal data fusion: An overview of methods, challenges, and prospects,' Proc. IEEE, vol. 103, no. 9, pp. 1449-1477, Sep. 2015. https://hal.archives-ouvertes.fr/hal-01179853

T. Adali, Y. Levin-Schwartz, and V. D. Calhoun, 'Multimodal data fusion using source separation: Two effective models based on ICA and IVA and their properties,' Proc. IEEE, vol. 103, no. 9, pp. 1478-1493, Sep. 2015.

T. Adali, M. Anderson, and G.-S. Fu, 'Diversity in independent component and vector analyses: Identifiability, algorithms, and applications in medical imaging,' IEEE Signal Processing Magazine, vol. 31, no. 3, pp. 18-33, May 2014.

Basic matrix algebra, probability and statistics, and a background in estimation theory are desirable but not required.

Tülay Adali received the Ph.D. degree in Electrical Engineering from North Carolina State University, Raleigh, NC, USA, in 1992 and joined the faculty at the University of Maryland, Baltimore County (UMBC), Baltimore, MD, the same year. She is currently a Distinguished University Professor in the Department of Computer Science and Electrical Engineering at UMBC.
She has been active in conference and workshop organizations. She was the general or technical co-chair of the IEEE Machine Learning for Signal Processing (MLSP) and Neural Networks for Signal Processing Workshops 2001−2008, and helped organize a number of conferences including the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP). She has served or is currently serving on numerous editorial boards and technical committees of the IEEE Signal Processing Society. She was the chair of the technical committee on MLSP, 2003−2005 and 2011−2013, and the Technical Program Co-Chair for ICASSP 2017. She is the Special Sessions Chair for ICASSP 2018.
Prof. Adali is a Fellow of the IEEE and the AIMBE, a Fulbright Scholar, and an IEEE Signal Processing Society Distinguished Lecturer. She was the recipient of a 2010 IEEE Signal Processing Society Best Paper Award, 2013 University System of Maryland Regents' Award for Research, and an NSF CAREER Award. Her current research interests are in the areas of statistical signal processing, machine learning, and applications in medical image analysis and fusion.

Pierre Baldi University of California, Irvine

Deep Learning: Theory, Algorithms, and Applications to the Natural Sciences


The process of learning is essential for building natural or artificial intelligent systems. Thus, not surprisingly, machine learning is at the center of artificial intelligence today. Deep learning, essentially learning in complex systems composed of multiple processing stages, is at the forefront of machine learning. The lectures will provide an overview of neural networks and deep learning with an emphasis on first principles and theoretical foundations. The lectures will also provide a brief historical perspective of the field. Applications will be focused on difficult problems in the natural sciences, from physics to chemistry to biology.

1: Introduction and Historical Background. Building Blocks. Architectures. Shallow Networks. Design and Learning.
2: Deep Networks. Backpropagation. Underfitting, Overfitting, and Tricks of the Trade.
3: Two-Layer Networks. Universal Approximation Properties. Compressive and Expansive Autoencoders. Network capacity.
4: Learning in the Machine. Local Learning and the Learning Channel. Hebbian Learning. Dropout. Optimality of BP and Random BP.
5: Architectures (Convolutional, Siamese, GANs, etc). Applications.
6: Recurrent Networks. Hopfield model. Boltzmann machines.
7: Recursive and Recurrent Networks. Design and Learning. Inner and Outer Approaches.
8: Applications to Physics (High Energy, Neutrino, Antimatter, Dark Matter, etc.)
9: Applications to Chemistry (Molecules, Reactions, etc).
10: Applications to Biology (Proteins, DNA, Biomedical Imaging, etc).


Basic algebra, calculus, and probability at the introductory college level. Some previous knowledge of machine learning could be useful but not required.

Pierre Baldi earned MS degrees in Mathematics and Psychology from the University of Paris, and a PhD in Mathematics from the California Institute of Technology. He is currently Chancellor's Professor in the Department of Computer Science, Director of the Institute for Genomics and Bioinformatics, and Associate Director of the Center for Machine Learning and Intelligent Systems at the University of California, Irvine. The long-term focus of his research is on understanding intelligence in brains and machines. He has made several contributions to the theory of deep learning, and developed and applied deep learning methods for problems in the natural sciences such as the detection of exotic particles in physics, the prediction of reactions in chemistry, and the prediction of protein secondary and tertiary structure in biology. He has written four books and over 300 peer-reviewed articles. He is the recipient of the 1993 Lew Allen Award at JPL, the 2010 E. R. Caianiello Prize for research in machine learning, and a 2014 Google Faculty Research Award. He is an elected Fellow of the AAAS, AAAI, IEEE, ACM, and ISCB.

Thomas Breuel NVIDIA Corporation

Rational Design of Robust Large Scale Deep Learning Systems


Applications of deep learning in areas such as autonomous driving, text and speech recognition, tracking, and forensics involve (1) risk analysis and testing, (2) training on very large datasets (many petabytes), and (3) integration with classical methods of computer vision and image processing. These lectures will introduce students to these three areas and lay the groundwork for being able to develop, train, and deploy deep learning systems that are reliable and scalable.

Risk and error analysis: basics of decision theory; decision theory and deep learning; combination of statistical models with deep learning models; empirical evaluation of DL systems; testing and verification methods; data augmentation
Large scale training: working with petascale datasets; core infrastructure (Kubernetes, sharding, object stores); getting to full GPU utilization; multi-node training; batch sizes; profiling
Classical vs Deep Learning Methods: classical machine learning models (PCA, ICA, K-means, etc.) and their deep learning equivalents; classical image processing models (FIR, IIR, LPC, separability, morphology, Wiener filter) and their deep learning equivalents; distillation and adaptation of classical models; rational network design and debugging
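One correspondence of the kind listed above can be made concrete: PCA is exactly the optimal linear autoencoder, so any other linear encoder/decoder pair with the same bottleneck width reconstructs no better. The synthetic data below is an illustrative assumption, not course material.

```python
import numpy as np

rng = np.random.default_rng(1)
# synthetic data with correlated features (illustrative)
X = rng.normal(size=(200, 10)) @ rng.normal(size=(10, 10))
X = X - X.mean(0)

def pca_reconstruct(X, k):
    # PCA via SVD: by Eckart-Young, the best rank-k linear reconstruction
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    V = Vt[:k].T            # top-k principal directions
    return X @ V @ V.T      # encode, then decode

def linear_ae_reconstruct(X, W_enc, W_dec):
    # an arbitrary linear "autoencoder" with the same bottleneck width
    return X @ W_enc @ W_dec
```

Replacing the linear maps with nonlinear layers trained by backpropagation gives the deep-learning analogue; the classical solution then serves as a sanity-check baseline when debugging the network.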

The following are recommended 'classical' textbooks, useful for background reading.
Duda and Hart: Pattern Classification
Jae Lim: Two-Dimensional Signal and Image Processing
Gonzalez and Woods: Digital Image Processing
Berger: Statistical Decision Theory
Strang: Introduction to Applied Mathematics

A basic hands-on understanding of deep learning models at the level of an introductory course is helpful. Statistics, machine learning, and image processing will be covered in the course and course materials.

Thomas Breuel works on deep learning and computer vision at NVIDIA Research. Prior to NVIDIA, he was a full professor of computer science at the University of Kaiserslautern (Germany) and worked as a researcher at Google, Xerox PARC, the IBM Almaden Research Center, IDIAP, Switzerland, as well as a consultant to the US Bureau of the Census. He is an alumnus of the Massachusetts Institute of Technology and Harvard University. He is an author on over 200 publications on computer vision, machine learning, and deep learning.

Joachim M. Buhmann Swiss Federal Institute of Technology Zurich

Model Selection by Algorithm Validation


One of the central challenges in machine learning today relates to the high parameter complexity of models and their relation to large amounts of heterogeneous, noisy data. What is the relevant information in the data for selecting a model / hypothesis from a hypothesis class? Machine learning research has answered this question to a scientifically satisfactory degree in supervised learning, i.e., classification and regression. Without teacher-provided guidance, however, model selection and validation still appear to be a magic engineering art, mostly dominated by heuristics. The course will cover traditional model selection methods like AIC and BIC, but also the stability method and a novel approach based on information theory. The resulting selection score, called the posterior agreement criterion, requires hypotheses to agree on two different instances drawn from the same data source. Such a robustness criterion captures the spirit of cross-validation and ensures that hypotheses of models are selected according to the signal in the data and are not significantly affected by noise.

Algorithms and Gibbs distributions, Maximum Entropy method, AIC, BIC, stability selection,
information-theoretic model validation, algorithms as time-evolving posterior distributions; examples in approximate sorting, approximate spanning trees, pipeline tuning in biomedical applications.

J.M. Buhmann et al., 'Robust optimization in the presence of uncertainty: A generic approach,' Journal of Computer and System Sciences 94, pp. 135-166, 2018
J.M. Buhmann, 'Information theoretic model validation for clustering,' ISIT 2010, Austin, pp. 1398-1402, 2010
J.M. Buhmann, 'SIMBAD: Emergence of Pattern Similarity,' in Advances in Vision and Pattern Recognition, Ed. Marcello Pelillo, Springer, 2013, ISBN 978-1-4471-5627-7

Introductory course in Machine Learning and/or Statistics.

Joachim M. Buhmann has been a full Professor of Computer Science at ETH Zurich since October 2003. He heads the Institute for Machine Learning at the Department of Computer Science. Joachim Buhmann studied physics at the Technical University of Munich and was awarded a PhD for his work on artificial neural networks in 1988. After research appointments at the University of Southern California and at the Lawrence Livermore National Laboratory, he joined the University of Bonn as professor for practical computer science (1992 – 2003). Buhmann's research interests cover theory and applications of machine learning and artificial intelligence, as well as a wide range of subjects related to information processing in the life sciences. His conceptual and theoretical work on machine learning investigates the central question of how complex models and algorithms in data analysis (Big Data) can be validated, if they are estimated from empirical observations. Particularly, the concepts of statistical and algorithmic complexity and their mutual dependency need to be understood in this context. Joachim Buhmann served as Director of Studies for Computer Science (2008 – 2013) and as Vice-Rector for Study Programmes (2014 – 2017) of ETH Zurich. The German Pattern Recognition Society (DAGM) awarded him an honorary membership in 2017. He was elected as an individual member of the Swiss Academy of Engineering Sciences (SATW) in the same year.

Sergei V. Gleyzer University of Florida

Feature Extraction, End-to-end Deep Learning and Applications to Very Large Scientific Data: Rare Signal Extraction, Uncertainty Estimation and Realtime Machine Learning Applications in Software and Hardware


Deep learning, and machine learning in general, has become one of the most widely used tools in modern science and engineering, leading to breakthroughs in a number of areas and disciplines ranging from computer vision to natural language processing and medical outcome analysis. This course will introduce the basics of machine learning grounded in theoretical foundations of statistical learning and describe two classes of popular algorithms in depth: decision-based methods (decision trees, decision rules, bagging and boosting, random forests) and deep neural network-based models of various types. The course will focus on practical applications in analysis of large scientific data, interpretability, uncertainty estimation and end-to-end deep learning and how to best extract meaningful features with autonomous feature extraction and feature engineering as well as implementation of realtime deep learning in software and hardware. No previous machine learning background is required.

Introduction to Machine Learning: Theoretical Foundations
Practical Applications and Examples in Sciences and Engineering with Large Scientific Data
Decision-based Methods: decision trees, rules, bagging, boosting, random forests
Deep Learning Methods: fully-connected, convolutional, recurrent
Fundamentals of Feature Extraction and End-to-end Deep Learning
Uncertainty Estimation and Interpreting Machine Learning Models
Realtime Implementation of Deep Learning in Software and Hardware

G. James et al., "An Introduction to Statistical Learning," Springer, 2013
C.M. Bishop, "Pattern Recognition and Machine Learning," Springer, 2006
J.R. Quinlan, "C4.5: Programs for Machine Learning," Morgan Kaufmann, 1992


Sergei Gleyzer is a particle physicist and a co-discoverer of the Higgs boson, working at the interface of particle physics and machine learning towards more intelligent systems capable of combining physical constraints and powerful learning algorithms to extract meaningful information from the data collected by the Large Hadron Collider (LHC), the world's highest-energy particle physics experiment, located at the CERN laboratory near Geneva, Switzerland. He is the founder of several major initiatives, such as the Inter-experimental Machine Learning Working Group and the Compact Muon Solenoid experiment's Machine Learning Forum, and is one of the leading particle physicists working on searches for new physics, such as dark matter. He is a recipient of several awards and is passionate about scientific communication and public outreach.

Marco Gori University of Siena

Constrained Learning and Reasoning with Constraints


In spite of the amazing results obtained by deep learning in many applications, intelligent agents acting in a complex environment can strongly benefit from prior knowledge on the environment, especially when it is expressed by logic formalisms. In this brief course, we introduce a theory for modeling the agent's interactions with the environment by means of the unified notion of constraint, which is shown to embrace machine learning and logic inferential processes within the same mathematical framework. Then, we present CLARE (Constrained Logic and Reasoning Environment), which can be regarded as a tool to assist the design of intelligent agents in a rich variety of application domains. CLARE is implemented in TensorFlow (TF) and provides an input language to define arbitrary First Order Logic background knowledge, including clauses, groundings, and constants. The predicates and the functions can be bound to any TF computational graph, while the formulas are converted into a set of real-valued constraints by means of t-norms, which can be defined during the design. As a result, we end up with a unified framework for performing learning and inference that is especially useful when both data and structured knowledge are jointly available. A number of case studies are illustrated to facilitate the acquisition of CLARE.
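The conversion of a logic formula into a real-valued constraint can be sketched as follows. The Łukasiewicz t-norm and the toy truth degrees below are illustrative assumptions, not CLARE's actual input language or API.

```python
import numpy as np

def lukasiewicz_implies(a, b):
    # truth degree of A(x) -> B(x) under the Lukasiewicz residuum
    return np.minimum(1.0, 1.0 - a + b)

def forall_penalty(truth):
    # a universally quantified formula averages truth over groundings;
    # 1 - truth becomes the real-valued penalty added to the loss
    return 1.0 - truth.mean()

# truth degrees of predicates A and B on three groundings
# (in CLARE these would come from TF computational graphs)
a = np.array([0.9, 0.2, 0.8])
b = np.array([0.8, 0.9, 0.1])
penalty = forall_penalty(lukasiewicz_implies(a, b))
```

A learning objective then sums such penalties with the usual supervised loss, so gradients flow through both the data fit and the logical background knowledge.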

1. Modeling environmental interactions by constraints, notion of individual;
2. Supervised, unsupervised, and semi-supervised learning as special cases of learning with constraints;
3. Functional representations and constraint reactions, learning in the primal and dual space;
4. Logic constraints by t-norms;
5. Case studies, https://github.com/GiuseppeMarra/CLARE

• Marco Gori, 'Machine Learning: A Constraint-Based Approach,' Morgan Kaufmann (Elsevier), 2018 (560 pp.);
• Michelangelo Diligenti, Marco Gori, Claudio Saccà: Semantic-Based Regularization for Learning and Inference, Artificial Intelligence 244: 143-165 (2017);
• Francesco Giannini, Michelangelo Diligenti, Marco Gori, and Marco Maggini, 'Learning Lukasiewicz Logic Fragments by Quadratic Programming,' Proc. of ECML-PKDD 2017: Machine Learning and Knowledge Discovery in Databases pp 410-426;
• Michelangelo Diligenti, Marco Gori, Marco Maggini, and Leonardo Rigutini, 'Bridging Logic and Kernel Machines,' Machine Learning, January 2012, Volume 86, Issue 1, pp 57–88.

Participants are expected to have a background in deep learning and preliminary notions on knowledge representation and logic formalisms. These notions, however, will be briefly reviewed along with the notion of t-norm.

Marco Gori received the Ph.D. degree in 1990 from Università di Bologna, Italy, working partly at the School of Computer Science (McGill University, Montreal). In 1992, he became an Associate Professor of Computer Science at Università di Firenze and, in November 1995, he joined the Università di Siena, where he is currently a full professor of computer science.

His main interests are in machine learning with applications to pattern recognition, Web mining, and game playing. He is especially interested in bridging logic and learning and in the connections between symbolic and sub-symbolic representation of information. He was the leader of the WebCrow project for automatic solving of crosswords, which outperformed human competitors in an official competition that took place during the ECAI-06 conference. As a follow-up to this grand challenge he founded QuestIt, a spin-off company of the University of Siena, working in the field of question answering. He is co-author of 'Web Dragons: Inside the Myths of Search Engine Technology,' Morgan Kaufmann (Elsevier), 2006, and 'Machine Learning: A Constraint-Based Approach,' Morgan Kaufmann (Elsevier), 2018.

Dr. Gori serves (or has served) as an Associate Editor of a number of technical journals related to his areas of expertise, has been the recipient of best paper awards, and has been a keynote speaker at a number of international conferences. He was the Chairman of the Italian Chapter of the IEEE Computational Intelligence Society and the President of the Italian Association for Artificial Intelligence. He is a fellow of the IEEE, ECCAI, and IAPR. He is in the list of top Italian scientists kept by the VIA-Academy (http://www.topitalianscientists.org/top_italian_scientists.aspx).

Michael Gschwind IBM Global Chief Data Office

Deploying Deep Learning at Enterprise Scale


A confluence of new artificial neural network architectures and unprecedented compute capabilities based on numeric accelerators has reinvigorated interest in Artificial Intelligence based on neural processing. Initial successful deployments in hyperscale internet services are now driving broader commercial interest in adopting deep learning as a design principle for cognitive applications in the enterprise. In this class, we will review hardware acceleration and co-optimized software frameworks for deep learning, and discuss model development and deployment to accelerate adoption of deep learning based solutions for enterprise deployments.

Session 1:
1a. Hardware Foundations of the Great AI Re-Awakening
1b. Deployment models for DNN Training and Inference
Session 2:
Optimized High Performance Training Frameworks
Session 3:
Parallel Training Environments

M. Gschwind, Need for Speed: Accelerated Deep Learning on Power, GPU Technology Conference, Washington DC, October 2016.


Dr. Michael Gschwind is Chief Engineer for Machine Learning and Deep Learning for IBM Systems, where he leads the development of hardware/software integrated products for cognitive computing. Over the past several years, he led the creation of the OpenPOWER Linux environment supporting GPU accelerators, created and brought to market several generations of PowerAI, led the optimization of PowerAI for Watson workloads, and currently leads the development of the Deep Learning at Scale (DL@S) high performance cloud environment for deep learning at IBM. During his career, Dr. Gschwind has been a technical leader for IBM's key transformational initiatives, leading the development of the OpenPOWER Hardware Architecture as well as the software interfaces of the OpenPOWER Software Ecosystem. In previous assignments, he was a chief architect for Blue Gene, POWER8, and POWER7. As chief architect for the Cell BE, Dr. Gschwind created the first programmable numeric accelerator, serving as chief architect for both hardware and software architecture. In addition to his industry career, Dr. Gschwind has held faculty appointments at Princeton University and Technische Universität Wien. While at Technische Universität Wien, Dr. Gschwind invented the concept of neural network training and inference accelerators. Dr. Gschwind is a Fellow of the IEEE, an ACM Distinguished Speaker, Chair of the ACM SIGMICRO Executive Committee, an IBM Master Inventor, and a Member of the IBM Academy of Technology.

Namkug Kim Asan Medical Center

Deep Learning for Computer Aided Detection/Diagnosis in Radiology and Pathology


Medical imaging is becoming more important in modern medicine, including radiology, pathology, surgery, neuroscience, etc. In radiology, typical diagnostic reading has several shortcomings due to the qualitative interpretation of a human observer. In addition, the rapid development of recent medical imaging equipment, which produces a tremendous amount of image data, makes typical medical image reading nearly impractical. Recently, deep learning has shown better accuracy for detection and classification in computer vision, and it can be rapidly applied to medical imaging areas. I'll introduce the methodology of data science, including machine learning and deep learning, and deep learning based applications in computer vision and computer aided diagnosis in radiology and pathology. In addition, I'll suggest some practical considerations on the application of these technologies to clinical workflow, including efficient labeling technology, interpretability and visualization (no black box), uncertainty (data level, decision level), reproducibility of deep learning, novelty in supervised learning, one-shot or multi-shot learning for imbalanced datasets or rare diseases, deep survival, and physics-induced machine learning.

1. Introduction to data science, machine learning, and deep learning
2. Deep learning in computer vision and applications
3. Deep learning for computer aided detection/diagnosis in radiology
4. Deep learning for computer aided detection/diagnosis in pathology
5. Practical consideration for deep learning application in medicine
- efficient labeling technology
- Interpretability and visualization (No blackbox)
- Uncertainty (Data level, Decision level)
- Reproducibility of deep learning
- Novelty in supervised learning
- One-shot or multi-shot learning for imbalanced datasets or rare diseases
- Deep survival
- Physics induced machine learning

Deep into the Brain: Artificial Intelligence in Stroke Imaging., Lee EJ, Kim YH, Kim N, Kang DW., J Stroke. 2017 Sep;19(3):277-285. doi: 10.5853/jos.2017.02054. Epub 2017 Sep 29. Review.
Comparison of Shallow and Deep Learning Methods on Classifying the Regional Pattern of Diffuse Lung Disease, Guk Bae Kim, Kyu-Hwan Jung, Yeha Lee, Hyun-Jun Kim, Namkug Kim*, Sanghoon Jun, Joon Beom Seo, David A. Lynch, Journal of Digital Imaging, 17 October 2017 (co-CA)
Development of a Computer-Aided Differential Diagnosis System to Distinguish Between Usual Interstitial Pneumonia and Non-specific Interstitial Pneumonia Using Texture- and Shape-Based Hierarchical Classifiers on HRCT Images. Jun S, Park B, Seo JB, Lee S, Kim N*. J Digit Imaging. 2017 Sep 7. doi: 10.1007/s10278-017-0018-y. PMID: 28884381 [PubMed – as supplied by publisher] (co-CA)
Deep Learning in Medical Imaging: General Overview. Lee JG, Jun S, Cho YW, Lee H, Kim GB, Seo JB, Kim N*. Korean J Radiol. 2017 Jul-Aug;18(4):570-584. doi: 10.3348/kjr.2017.18.4.570. Epub 2017 May 19. Review PMID: 28670152 [PubMed – in process]
Deep Learning: A Primer for Radiologists, Gabriel Chartrand, et al, Radiographics, Volume 37, Issue 7, 2017

Basic knowledge of computer algorithms and software; knowledge of machine learning and deep learning is recommended.

Namkug Kim is a professor at the University of Ulsan College of Medicine and also holds an appointment at Asan Medical Center (http://eng.amc.seoul.kr/), one of the leading hospitals in South Korea, with a dual appointment in the Departments of Convergence Medicine and Radiology. He received his BS, MS, and PhD degrees from the Department of Industrial Engineering at Seoul National University and is the author of about 160 peer-reviewed original articles and 90 patents (https://scholar.google.com/citations?user=namkugkim). His research interests are in image-based clinical applications, including artificial intelligence in medicine, 3D printing in medicine, computer-aided diagnosis, computer-aided surgery and robotic interventions, and medical image processing.

Sun-Yuan Kung Princeton University

A Methodical and Cost-effective Approach to Optimization/Generalization of Deep Learning Networks


This course will start with an introduction of two basic machine learning subsystems: Feature Engineering (e.g. CNN for image/speech feature extraction) and Label Engineering, e.g. the Multi-layer Perceptron (MLP). The great success of DNNs across broad applications of deep learning hinges upon the rich nonlinear space embedded in their nonlinear hidden (neuron) layers. However, we face two major challenges: (1) the curse of depth and (2) the ad hoc nature of deep learning. Fortunately, many solutions have been proposed to effectively overcome the 'vanishing gradient' problem caused by the curse of depth. In particular, we shall elaborate on (a) cross-entropy (with amplified gradients) as an effective surrogate for the 0-1 loss; (b) the merit of ReLU neurons; and (c) the vital roles of bagging, mini-batch training, and dropout.
It is widely recognized that the ad hoc nature of deep learning leaves its success at the mercy of trial and error. To combat this problem, we advocate a methodical and cost-effective learning paradigm (MINDnet) for training multi-layer networks. In particular, MINDnet elegantly circumvents the curse of depth by harnessing a new notion of omnipresent supervision, i.e. teachers hidden within a sort of 'Trojan horse' traveling along with the forward-propagating signals from the input to the hidden layers. Therefore, one can directly harvest the teacher's information at any hidden layer of the MLP, i.e. no propagation (NP) is required. This leads to a new and slender 'inheritance layer' that summarizes (inherits) all the discriminant information embedded in the previous layer. Moreover, by augmenting the inheritance layer with additional randomized nodes and applying back-propagation (BP) learning again, the discriminant power of the network can be further enhanced. Finally, we have compared MINDnet with several popular learning models on real-world datasets, including the CIFAR, MNIST, mHealth, HAR, Yale, Olivetti, and Essex datasets. Our preliminary simulations suggest some superiority for MINDnet. For example, on the CIFAR-10 dataset: 97.9%+/-0.16% (MINDnet) > 97.4% (CutNet) > 96.0% (DenseNet) > 93.6% (ResNet).
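As a minimal sketch of item (a) above, assuming nothing beyond numpy, the following compares gradients at a single sigmoid output unit: with squared error the gradient carries a factor sigmoid'(z) and vanishes for a confidently wrong unit, whereas cross-entropy cancels that factor and keeps the gradient amplified.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def grad_mse(z, y):
    # d/dz of 0.5*(sigmoid(z) - y)^2: carries a sigmoid'(z) = p(1-p) factor
    p = sigmoid(z)
    return (p - y) * p * (1 - p)

def grad_cross_entropy(z, y):
    # d/dz of -[y*log p + (1-y)*log(1-p)]: the sigmoid'(z) factor cancels
    return sigmoid(z) - y

# A confidently wrong unit: true label y=1, large negative pre-activation z
z, y = -8.0, 1.0
g_mse = grad_mse(z, y)            # vanishingly small
g_ce = grad_cross_entropy(z, y)   # magnitude close to 1
```

With squared error the learning signal nearly disappears exactly where a correction is most needed, which is one reason cross-entropy is the standard surrogate for the 0-1 loss in deep classifiers.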

Session 1:
Introduction of two basic machine learning subsystems:
- Feature Engineering: CNN for Image/Speech Feature Extraction
- Label Engineering: multi-layer deep learning networks
Introduce supplementary (SVM-based) subsystems for validation and prediction and highlight their vital roles in optimization and generalization.
Introduce an effective surrogate function (to surrogate the 0-1 loss) in the training phase:
- How/why cross-entropy offers amplified gradients.

Introduce network friendly training metrics:
• equivalent optimization metrics: LSE (Gauss), FDR (Fisher) and Mutual Information (Shannon).

Session 2:
Derive Back-propagation (BP) Algorithm for
- back-propagation of 1st-order (gradient) and 2nd-order (Hessian) functions
Discuss effective remedies for tackling the vanishing gradient problem in deep networks:
- ReLU neurons
- bagging, mini-batch, and dropout
Introduce MINDnet learning paradigm:
- Why the acronym MIND: Monotonically INcreasing Discriminant (MIND).
- A simple solution to overcome the curse of depth: the no-propagation (NP) learning algorithm
  o How to harness the teacher information "hidden" in the hidden layer?
- How to use a small number of nodes (the inheritance layer) to fully summarize (inherit) all the useful information embedded in the entire previous layer?
- Highlight the vital role of BP/NP hybrid learning.
Session 3:
Elaborate on the detailed procedure to successively construct MINDnets with gradually growing depth:

• Vertical expansion: full inheritance with a small number of nodes
• Horizontal expansion: inheritance theorem + random nodes

Demonstrate that the prediction accuracy indeed improves as the MINDnet grows deeper:
• Via a synthetic dataset, we shall conduct an extensive comparative study of various machine learning tools in the literature.
• Compare MINDnet with other existing networks on real-world datasets such as CIFAR, MNIST, Yale, Olivetti, Essex, mHealth, HAR, etc.

1. I. Goodfellow, Y. Bengio, and A. Courville, Deep Learning. MIT Press, Cambridge, USA, 2016.
2. C.M. Bishop, Pattern Recognition and Machine Learning. Berlin: Springer.
3. S.Y. Kung, Digital Neural Networks. Prentice Hall, 1993.
4. S.Y. Kung, Kernel Methods and Machine Learning. Cambridge University Press, 2014.
5. Zhang, C., Bengio, S., Hardt, M., Recht, B., & Vinyals, O. (2016). Understanding deep learning requires rethinking generalization. arXiv preprint arXiv:1611.03530.
6. Koh, P. W., & Liang, P. (2017). Understanding black-box predictions via influence functions. arXiv preprint arXiv:1703.04730.

Linear algebra; understanding of the design and analysis of algorithms.

S.Y. Kung, Life Fellow of IEEE, is a Professor in the Department of Electrical Engineering at Princeton University. His research areas include machine learning, data mining, systematic design of (deep-learning) neural networks, statistical estimation, VLSI array processors, signal and multimedia information processing, and most recently compressive privacy. He was a founding member of several Technical Committees (TC) of the IEEE Signal Processing Society. He was elected an IEEE Fellow in 1988 and served as a Member of the Board of Governors of the IEEE Signal Processing Society (1989-1991). He was a recipient of the IEEE Signal Processing Society's Technical Achievement Award for contributions to "parallel processing and neural network algorithms for signal processing" (1992); a Distinguished Lecturer of the IEEE Signal Processing Society (1994); a recipient of the IEEE Signal Processing Society's Best Paper Award for his publication on principal component neural networks (1996); and a recipient of the IEEE Third Millennium Medal (2000). Since 1990, he has been the Editor-in-Chief of the Journal of VLSI Signal Processing Systems. He served as the first Associate Editor in the VLSI area (1984) and the first Associate Editor in neural networks (1991) for the IEEE Transactions on Signal Processing. He has authored and co-authored more than 500 technical publications and numerous textbooks, including "VLSI Array Processors", Prentice-Hall (1988); "Digital Neural Networks", Prentice-Hall (1993); "Principal Component Neural Networks", John Wiley (1996); "Biometric Authentication: A Machine Learning Approach", Prentice-Hall (2004); and "Kernel Methods and Machine Learning", Cambridge University Press (2014).

Li Erran Li Uber ATG

Deep Reinforcement Learning: Foundations, Recent Advances and Frontiers


Deep reinforcement learning has enabled artificial agents to achieve human-level performance across many challenging domains, e.g. playing Atari games and Go. I will cover the foundations of reinforcement learning and present several important algorithms, including deep Q-networks, asynchronous advantage actor-critic (A3C), DDPG, SVG, guided policy search, and temporal difference models (TDM). I will then discuss major challenges and promising results towards making deep reinforcement learning applicable to real-world problems in robotics and natural language processing.
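As a concrete anchor for the value-based methods covered in the course, here is the tabular Q-learning update that deep Q-networks scale up by replacing the table with a neural network. This is a sketch: the two-state toy MDP and the function name are invented for illustration.

```python
import numpy as np

def q_learning_update(Q, s, a, r, s_next, alpha=0.5, gamma=0.9):
    """One temporal-difference backup:
    Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))."""
    td_target = r + gamma * np.max(Q[s_next])
    Q[s, a] += alpha * (td_target - Q[s, a])
    return Q

# Toy table over 2 states x 2 actions, initialized to zero
Q = np.zeros((2, 2))
# Taking action 1 in state 0 yields reward 1.0 and lands in state 1;
# Q[0, 1] moves halfway (alpha = 0.5) toward the TD target
Q = q_learning_update(Q, s=0, a=1, r=1.0, s_next=1)
```

A deep Q-network keeps exactly this update structure but regresses a parameterized Q(s, a; theta) toward the same TD target, with tricks such as experience replay and a target network to stabilize training.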

1. Introduction to reinforcement learning (RL)
2. Value-based deep RL
Deep Q-learning (deep Q-Networks)
Temporal-difference model (TDM)
3. Policy-based deep RL
Policy gradients
Asynchronous advantage actor-critic (A3C)
Natural gradients and trust region optimization (TRPO)
Deep deterministic policy gradients (DDPG), SVG
4. Model-based deep RL: guided policy search
5. Deep learning in multi-agent environment: fictitious self-play
6. Imitation learning: GAIL and InfoGAIL
7. Exploration
8. Inverse RL
9. Transfer learning, multitask learning and meta learning in RL
10. Frontiers
Application to robotics
Application to natural language understanding

V. Pong, S. Gu, M. Dalal, S. Levine, Temporal Difference Models: Model-Free Deep RL for Model Based Control, ICLR 2018

Andrychowicz, M., Denil, M., Gomez, S., Hoffman, M. W., Pfau, D., Schaul, T., Shillingford, B., and de Freitas, N. (2016). Learning to learn by gradient descent by gradient descent. In the Annual Conference on Neural Information Processing Systems (NIPS).

Asri, L. E., He, J., and Suleman, K. (2016). A sequence-to-sequence model for user simulation in spoken dialogue systems. In Annual Meeting of the International Speech Communication Association (INTERSPEECH).

Babaeizadeh, M., Frosio, I., Tyree, S., Clemons, J., and Kautz, J. (2017). Reinforcement learning through asynchronous advantage actor-critic on a gpu. Submitted to Int’l Conference on Learning Representations.

Bahdanau, D., Brakel, P., Xu, K., Goyal, A., Lowe, R., Pineau, J., Courville, A., and Bengio, Y. (2017). An actor-critic algorithm for sequence prediction. Submitted to Int’l Conference on Learning Representations.

Chebotar, Y., Kalakrishnan, M., Yahya, A., Li, A., Schaal, S., and Levine, S. (2016). Path integral guided policy search. ArXiv e-prints.

Deng, L. and Liu, Y. (2017). Deep Learning in Natural Language Processing (edited book, scheduled August 2017). Springer.

Dhingra, B., Li, L., Li, X., Gao, J., Chen, Y.-N., Ahmed, F., and Deng, L. (2016). End-to-End Reinforcement Learning of Dialogue Agents for Information Access. ArXiv e-prints.

Domingos, P. (2012). A few useful things to know about machine learning. Communications of the ACM, 55(10):78–87.

Dulac-Arnold, G., Evans, R., van Hasselt, H., Sunehag, P., Lillicrap, T., Hunt, J., Mann, T., Weber, T., Degris, T., and Coppin, B. (2016). Deep reinforcement learning in large discrete action spaces. In the International Conference on Machine Learning (ICML).

Finn, C., Christiano, P., Abbeel, P., and Levine, S. (2016). A connection between GANs, inverse reinforcement learning, and energy-based models. In NIPS 2016 Workshop on Adversarial Training.

Finn, C. and Levine, S. (2016). Deep visual foresight for planning robot motion. ArXiv e-prints.

Finn, C., Yu, T., Fu, J., Abbeel, P., and Levine, S. (2017). Generalizing skills with semi-supervised reinforcement learning. Submitted to Int’l Conference on Learning Representations.

Florensa, C., Duan, Y., and Abbeel, P. (2017). Stochastic neural networks for hierarchical reinforcement learning. Submitted to Int’l Conference on Learning Representations.

García, J. and Fernández, F. (2015). A comprehensive survey on safe reinforcement learning. The Journal of Machine Learning Research, 16:1437–1480.

Basic knowledge of reinforcement learning, deep learning and Markov decision processes

Li Erran Li received his Ph.D. in Computer Science from Cornell University advised by Joseph Halpern. He is currently with Uber ATG and an adjunct professor in the Computer Science Department of Columbia University. His research interests are AI, deep learning, machine learning algorithms and systems. He is an IEEE Fellow and an ACM Fellow.

Dimitris N. Metaxas Rutgers University

Adversarial, Discriminative, Recurrent, and Scalable Deep Learning Methods for Human Motion Analytics, Medical Image Analysis, Scene Understanding and Image Generation


A major challenge of modern machine learning and artificial intelligence is to offer understanding and reasoning for domains such as complex real-world environments, humans and their activities, medical imaging analytics, and real-world image generation. Addressing such problems for meta-knowledge creation requires methods that combine deep neural methods, sparse methods, mixed norms, AI, and deformable modeling methods. This course will introduce these new concepts and methodologies and will focus on three main topics: a) deriving high-order information from complex scenes and human movement for event understanding, b) Generative Adversarial Networks (GANs) and deep learning for real-world image and video generation and storytelling, and c) cardiac and cancer medical image analytics.
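For reference, the GAN framework underlying topic b) trains a generator G against a discriminator D via the minimax objective of Goodfellow et al. (2014):

```latex
\min_G \max_D \; V(D, G) =
  \mathbb{E}_{x \sim p_{\mathrm{data}}(x)}\big[\log D(x)\big]
  + \mathbb{E}_{z \sim p_z(z)}\big[\log\big(1 - D(G(z))\big)\big]
```

StackGAN and its successors listed in the references condition this game on text embeddings to synthesize images from sentences.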

1. Scene and Human Motion Understanding
Neural Nets and Nonnegative Matrix Factorization Concepts
Using NNs for Scene Understanding
Human Motion Understanding and Sign Language Understanding
2. GANs and Other Deep Learning Methods for Scene Generation and Storytelling
Introduction to GANs
Modifications to develop Stack GANs for scene generation from text
Video Generation from Sentences
3. Medical Image Analytics
Deformable Models and Deep Learning
Cardiac Analytics
Cancer Diagnosis from Clinical and Preclinical Data

RED-Net: A recurrent encoder-decoder network for video-based face alignment. Xi Peng, Rogerio Feris, Xiaoyu Wang, Dimitris Metaxas. International Journal of Computer Vision (IJCV), 2018.

CR-GAN: Learning Complete Representations for Multi-view Generation. Yu Tian, Xi Peng, Long Zhao, Shaoting Zhang, Dimitris Metaxas. International Joint Conference on Artificial Intelligence (IJCAI), 2018.

Jointly optimize data augmentation and network training: Adversarial data augmentation. Xi Peng, Zhiqiang Tang, Fei Yang, Rogerio S Feris, Dimitris Metaxas. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2018.

3D Motion Modeling and Reconstruction of Left Ventricle Wall in Cardiac MRI. Dong Yang, Pengxiang Wu, Chaowei Tan, Kilian M Pohl, Leon Axel, Dimitris Metaxas. Functional Imaging and Modeling of the Heart, FIMH 2017.

Deep Image-to-Image Recurrent Network with Shape Basis Learning for Automatic Vertebra Labeling in Large-Scale 3D CT Volumes. Dong Yang, Tao Xiong, Daguang Xu, S Kevin Zhou, Zhoubing Xu, Mingqing Chen, JinHyeong Park, Sasa Grbic, Trac D Tran, Sang Peter Chin, Dimitris Metaxas, Dorin Comaniciu. International Conference on Medical Image Computing and Computer-Assisted Intervention (MICCAI), pp. 498-506, 2017.

Automatic liver segmentation using an adversarial image-to-image network. Dong Yang, Daguang Xu, S Kevin Zhou, Bogdan Georgescu, Mingqing Chen, Sasa Grbic, Dimitris Metaxas, Dorin Comaniciu. International Conference on Medical Image Computing and Computer-Assisted Intervention (MICCAI), 2017.

Pixel-Wise Neural Cell Instance Segmentation. Jingru Yi, Pengxiang Wu, Daniel J Hoeppner, Dimitris Metaxas. Proceedings of the IEEE ISBI, 2018

Multi-Component Deformable Models Coupled with 2D-3D U-Net for Automated Probabilistic Segmentation of Cardiac Walls and Blood. Dong Yang, Qiaoying Huang, Leon Axel, Dimitris Metaxas. Proceedings of IEEE ISBI, 2018.

Jointly Optimize Data Augmentation and Network Training: Adversarial Data Augmentation in Human Pose Estimation. Xi Peng; Zhiqiang Tang, Fei Yang, Rogerio S. Feris, Dimitris Metaxas. Procs. CVPR 2018

Show Me a Story: Towards Coherent Neural Story Illustration. Hareesh Ravi, Lezi Wang, Carlos Muniz, Leonid Sigal, Dimitris Metaxas, Mubbasir Kapadia. Procs CVPR 2018.

Improving GANs Using Optimal Transport. Tim Salimans · Han Zhang · Alec Radford · Dimitris Metaxas, ICLR 2018

A recurrent encoder-decoder network for sequential face alignment. Xi Peng, Rogerio Feris, Xiaoyu Wang, Dimitris Metaxas. European Conference on Computer Vision (ECCV), 2016.

Parallel sparse subspace clustering via joint sample and parameter blockwise partition. B Liu, XT Yuan, Y Yu, Q Liu, DN Metaxas. ACM Transactions on Embedded Computing Systems (TECS) 16(3), 75, 2017.

StackGAN++: Realistic Image Synthesis with Stacked Generative Adversarial Networks. H Zhang, T Xu, H Li, S Zhang, X Wang, X Huang, D Metaxas. arXiv preprint arXiv:1710.10916, 2017.

StackGAN: Text to photo-realistic image synthesis with stacked generative adversarial networks. H Zhang, T Xu, H Li, S Zhang, X Wang, X Huang, D Metaxas. IEEE Int. Conf. Comput. Vision (ICCV), 5907-5915, 2017.

Calculus and PDEs, Basic Optimization Methods, Deep Neural Nets, Numerical analysis.

Dr. Dimitris Metaxas is a Distinguished Professor and Chair of the Computer Science Department at Rutgers University. He is director of the Center for Computational Biomedicine, Imaging and Modeling (CBIM). He has also been a tenured faculty member in the Computer and Information Science Department of the University of Pennsylvania. Prof. Metaxas received a Diploma with highest honors in Electrical Engineering and Computer Science from the National Technical University of Athens, Greece, an M.Sc. in Computer Science from the University of Maryland, College Park, and a Ph.D. in Computer Science from the University of Toronto. Dr. Metaxas has been conducting research towards the development of formal methods to advance the understanding of complex scenes and human movement, multimodal aspects of human language and ASL, medical imaging, computer vision, and computer graphics. His research emphasizes the development of formal models for shape and motion representation and understanding, deterministic and statistical object modeling and tracking, deformable models, sparse learning methods for segmentation, generative adversarial networks, and augmenting neural net methods for understanding. Dr. Metaxas has published over 500 research articles in these areas and has graduated 46 PhD students. The above research has been funded by NSF, NIH, ONR, AFOSR, DARPA, HSARPA and the ARO. Dr. Metaxas has received several best paper awards and holds 7 patents. He was awarded a Fulbright Fellowship in 1986, is a recipient of NSF Research Initiation and CAREER awards and an ONR YIP, and is a Fellow of the MICCAI Society, a Fellow of the American Institute for Medical and Biological Engineering, and a Fellow of IEEE. He has been involved in the organization of several major conferences in vision and medical image analysis, including ICCV 2007, ICCV 2011, MICCAI 2008 and CVPR 2014.

Hermann Ney RWTH Aachen University

Speech Recognition and Machine Translation: From Statistical Decision Theory to Machine Learning and Deep Neural Networks


The last 40 years have seen dramatic progress in machine learning and statistical methods for speech and language processing tasks like speech recognition, handwriting recognition and machine translation. Many of the key statistical concepts were originally developed for speech recognition and language translation. Examples of such key concepts are the Bayes decision rule for minimum error rate and sequence-to-sequence processing using approaches like the alignment mechanism based on hidden Markov models and the attention mechanism based on neural networks. Recently the accuracy of speech recognition and machine translation has been improved significantly by the use of artificial neural networks, such as deep feedforward multi-layer perceptrons and recurrent neural networks (including the long short-term memory extension). We will discuss these approaches in detail and show how they form part of the probabilistic approach.
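The Bayes decision rule for minimum error rate mentioned above can be stated compactly: given an observation sequence x_1^T (acoustic vectors or source words), choose the target word sequence w_1^N maximizing the product of the language model probability and the acoustic/translation model probability:

```latex
\hat{w}_1^N \;=\; \operatorname*{arg\,max}_{w_1^N} \;
  \Pr(w_1^N) \cdot \Pr(x_1^T \mid w_1^N)
```

Both the HMM alignment mechanism and the neural attention mechanism discussed in the course can be viewed as different ways of modeling the second factor.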

Part 1: Statistical Decision Theory, Machine Learning and Neural Networks.
Part 2: Speech Recognition (time alignment, hidden Markov models, sequence-to-sequence processing, neural nets, attention models).
Part 3: Machine Translation (word alignment, hidden Markov models, sequence-to-sequence processing, neural nets, attention models).

* H. Bourlard and N. Morgan: Connectionist Speech Recognition - A Hybrid Approach. Kluwer Academic Publishers, ISBN 0-7923-9396-1, 1994.
* L. Deng, D. Yu: Deep Learning: Methods and Applications. Foundations and Trends in Signal Processing, Vol. 7, No. 3-4, pp. 197-387, 2014.
* D. Jurafsky, J. H. Martin: Speech and Language Processing. Third edition draft, August 28, 2017.
* Y. Goldberg: Neural Network Methods in Natural Language Processing. Morgan & Claypool Publishers, draft, August 2016.
* P. Koehn: Statistical Machine Translation. Cambridge University Press, 2010. In addition: draft of Chapter 13, Neural Machine Translation, September 22, 2017.

Familiarity with linear algebra, numerical mathematics, probability and statistics, elementary machine learning.

Hermann Ney is a full professor of computer science at RWTH Aachen University, Germany. His main research interests lie in the area of statistical classification, machine learning, neural networks and human language technology and specific applications to speech recognition, machine translation and handwriting recognition.

In particular, he has worked on dynamic programming and discriminative training for speech recognition, on language modelling and on machine translation. His work has resulted in more than 700 conference and journal papers (h-index 90, 45000+ citations; estimated using Google scholar). He and his team contributed to a large number of European (e.g. TC-STAR, QUAERO, TRANSLECTURES, EU-BRIDGE) and American (e.g. GALE, BOLT, BABEL) large-scale joint projects.

Hermann Ney is a fellow of both IEEE and ISCA (Int. Speech Communication Association). In 2005, he was the recipient of the Technical Achievement Award of the IEEE Signal Processing Society. In 2010, he was awarded a senior DIGITEO chair at LIMSI-CNRS in Paris, France. In 2013, he received the award of honour of the International Association for Machine Translation. In 2016, he was awarded an advanced grant of the European Research Council (ERC).

Jose C. Principe University of Florida

Cognitive Architectures for Object Recognition in Video



I-Requisites for a Cognitive Architecture
• Processing in space
• Processing in time and memory
• Top-down and bottom-up processing
• Extraction of information from data with generative models
• Attention
II- Putting it all together
• Empirical Bayes with generative models
• Clustering of time series with linear state models
• Information Theoretic Autoencoders
III- Current work
• Extraction of time signatures with kernel ARMA
• Attention Based video recognition
• Augmenting Deep Learning with memory



Jose C. Principe is a Distinguished Professor of Electrical and Computer Engineering at the University of Florida, where he teaches advanced signal processing, machine learning and artificial neural networks (ANNs). He is Eckis Professor and the Founder and Director of the University of Florida Computational NeuroEngineering Laboratory (CNEL), www.cnel.ufl.edu. The CNEL Lab has innovated signal and pattern recognition principles based on information theoretic criteria, as well as filtering in functional spaces. His secondary area of interest has focused on applications to computational neuroscience, brain-machine interfaces and brain dynamics. Dr. Principe is a Fellow of the IEEE, AIMBE, and IAMBE. He received the Gabor Award from the INNS, the Career Achievement Award from the IEEE EMBS, and the Neural Network Pioneer Award of the IEEE CIS. He has more than 33 patents awarded and over 800 publications in the areas of adaptive signal processing, control of nonlinear dynamical systems, machine learning and neural networks, and information theoretic learning, with applications to neurotechnology and brain-computer interfaces. He has directed 93 Ph.D. dissertations and 65 Master's theses. In 2000 he wrote an interactive electronic book entitled 'Neural and Adaptive Systems', published by John Wiley and Sons, and more recently co-authored several books: 'Brain Machine Interface Engineering' (Morgan and Claypool), 'Information Theoretic Learning' (Springer), 'Kernel Adaptive Filtering' (Wiley) and 'System Parameter Adaption: Information Theoretic Criteria and Algorithms' (Elsevier). He has received four Honorary Doctor degrees, from Finland, Italy, Brazil and Colombia, and routinely serves on international scientific advisory boards of universities and companies. He has received extensive funding from NSF, NIH and DOD (ONR, DARPA, AFOSR).

Douglas A. Reynolds & Najim Dehak Massachusetts Institute of Technology & Johns Hopkins University

More than Words can Say: Machine and Deep Learning for Speaker, Language, and Emotion Recognition from Speech


Speech conveys many types of information to the listener. Beyond just the words, the speech signal provides information about what language is being spoken, who is speaking, the emotional state of the speaker, and the acoustic environment in which the speech is occurring. Such extra-word information can be useful for many areas such as secure access, device personalization, audio searching, and medical interactions. Powerful machine learning techniques, including statistical, geometric, and neural pattern recognition, have been applied successfully over several decades to build effective systems for automatically recognizing these types of characteristics from challenging, real-world speech recordings. In this tutorial we will introduce the audience to the fundamentals of speaker, language, and emotion recognition, going from the science behind speech production to the machine learning building blocks underpinning modern recognition systems. We will describe the details of implementing these recognition systems, covering the critical role of data in the training and testing of systems. The important areas of domain adaptation, channel compensation, diarization, and effective evaluation design and interpretation will also be covered.
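As a small illustration of one building block of modern systems, the sketch below scores a speaker-verification trial by cosine similarity between an enrollment embedding and a test embedding. The three-dimensional vectors and the threshold are invented for illustration; in practice the embeddings would be i-vectors or x-vectors extracted from speech, and the threshold would be tuned on development data.

```python
import numpy as np

def cosine_score(enroll, test):
    """Cosine similarity between two speaker embeddings."""
    return float(np.dot(enroll, test) /
                 (np.linalg.norm(enroll) * np.linalg.norm(test)))

def verify(enroll, test, threshold=0.5):
    """Accept the claimed identity if the score exceeds the threshold."""
    return cosine_score(enroll, test) >= threshold

# Hypothetical embeddings: 'same' is a noisy copy of the enrollment,
# 'diff' points in a very different direction
e = np.array([1.0, 0.0, 1.0])
same = np.array([0.9, 0.1, 1.1])
diff = np.array([-1.0, 1.0, 0.0])
```

The same score can be sharpened with channel compensation (e.g. length normalization and PLDA) before thresholding, which is part of what the tutorial covers.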


D. A. Reynolds, T. F. Quatieri, R. B. Dunn, 'Speaker verification using adapted Gaussian mixture models', Digital Signal Processing 10 (1-3), 19-41, 2000
F. Bimbot, et al., 'A tutorial on text-independent speaker verification', EURASIP Journal on Advances in Signal Processing, 2004
T. Kinnunen, H. Li, 'An overview of text-independent speaker recognition: From features to supervectors', Speech Communication, Volume 52, Issue 1, 2010, Pages 12-40
N. Dehak, P. J. Kenny, R. Dehak, P. Dumouchel and P. Ouellet, 'Front-End Factor Analysis for Speaker Verification,' in IEEE Transactions on Audio, Speech, and Language Processing, vol. 19, no. 4, pp. 788-798, May 2011
N Dehak, PA Torres-Carrasquillo, D Reynolds, R Dehak, 'Language recognition via i-vectors and dimensionality reduction', Interspeech 2012
F. Richardson, D. Reynolds and N. Dehak, 'Deep Neural Network Approaches to Speaker and Language Recognition,' in IEEE Signal Processing Letters, vol. 22, no. 10, pp. 1671-1675, Oct. 2015
D Snyder, D Garcia-Romero, G Sell, D Povey, 'X-vectors: Robust DNN embeddings for speaker recognition', ICASSP 2018

Some knowledge of digital signal processing, probability and statistics, and linear algebra

Douglas Reynolds is a senior member of the technical staff at MIT Lincoln Laboratory, where he provides technical oversight of the speech projects in speaker and language recognition and speech-content based information retrieval. Dr. Reynolds joined the Human Language Technology Group as a member of technical staff in 1992 conducting research in the areas of robust speaker recognition (identification and verification), transient classification and robust speech representations for recognition. During this period, he invented and developed several widely used techniques in the area of speaker recognition, such as robust modeling with GMMs, application of a universal background model to text-independent recognition tasks, the use of Bayesian adaptation to train and update speaker models, fast scoring techniques for GMM based systems, the development and use of a handset/channel-type detector, and several normalization techniques based on the handset/channel-type detector. In 2002, Dr. Reynolds led the SuperSID project at the JHU Summer Workshop where new approaches to exploiting high-level information for improved speaker recognition were explored. These and other ideas have been implemented in the Lincoln speaker recognition system which has won several annual international speaker recognition evaluations conducted by the National Institute of Standards and Technology.

Dr. Reynolds is a Fellow of the IEEE, a member of the IEEE Signal Processing Society's Speech Technical Committee, and has worked to launch the Odyssey Speaker Recognition Workshop series.

Najim Dehak received his PhD from the École de technologie supérieure (ETS), Montreal, in 2009. During his PhD studies he worked with the Computer Research Institute of Montreal, Canada. He is well known as a leading developer of the i-vector representation for speaker recognition. He first introduced this method, which has become the state of the art in this field, during the 2008 summer Center for Language and Speech Processing workshop at Johns Hopkins University. This approach has become one of the best-known speech representations in the entire speech community.

Dr. Dehak is currently a faculty member of the Department of Electrical & Computer Engineering at Johns Hopkins University. Prior to joining Johns Hopkins, he was a research scientist in the Spoken Language Systems Group at the MIT Computer Science and Artificial Intelligence Laboratory. His research interests are in machine learning approaches applied to speech processing, audio classification, and health applications. He is a senior member of IEEE and member of the IEEE Speech and Language Technical Committee.

Björn Schuller Imperial College London

Deep Learning for Signal Analysis


This course will deal with the injection of deep learning algorithms into multimodal and multisensorial signal analysis, such as from audio, video, or physiological signals. The methods shown will, however, be applicable to a broad range of further signals. We will first deal with pre-processing, such as by autoencoders, and feature representation learning, such as by convolutional neural networks, as the basis for end-to-end learning from raw signals. Then, we shall discuss modelling for decision making, such as by recurrent neural networks with long short-term memory or gated recurrent units. We will also elaborate on the impact of topologies, including multiple targets with shared layers and bottlenecks, and how to move towards self-shaping networks in the sense of Automatic Machine Learning. In a last part, we will deal with data efficiency, such as by weak supervision with the human in the loop based on active and semi-supervised learning, transfer learning, or generative adversarial networks. The content shown will be accompanied by open-source implementations of the according toolkits available on GitHub. Application examples will come from the domains of Affective Computing, Multimedia Retrieval, and mHealth.
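As a toy illustration of representation learning by reconstruction (a self-contained numpy sketch with synthetic data, not one of the toolkits used in the course), a linear autoencoder with a 1-D bottleneck trained by gradient descent discovers the single direction along which 3-D "signals" actually vary:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic signals: 3-D observations that really vary along one direction
t = rng.normal(size=(200, 1))
X = t @ np.array([[1.0, 2.0, -1.0]])

W_enc = rng.normal(scale=0.1, size=(3, 1))  # encoder: 3-D -> 1-D bottleneck
W_dec = rng.normal(scale=0.1, size=(1, 3))  # decoder: 1-D -> 3-D reconstruction

mse_init = float(np.mean((X @ W_enc @ W_dec - X) ** 2))

lr = 0.01
for _ in range(2000):
    H = X @ W_enc                      # bottleneck codes
    err = H @ W_dec - X                # reconstruction error
    g_dec = H.T @ err / len(X)         # gradient of mean squared error w.r.t. W_dec
    g_enc = X.T @ (err @ W_dec.T) / len(X)
    W_dec -= lr * g_dec
    W_enc -= lr * g_enc

mse_final = float(np.mean((X @ W_enc @ W_dec - X) ** 2))
# Reconstruction error drops sharply once the bottleneck aligns with the data
```

The nonlinear autoencoders discussed in the course generalize this idea with nonlinear activations and deeper stacks, and the learned codes then serve as features for downstream decision making.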

1) Pre-Processing and Feature Representation Learning (AEs, CNNs, end-to-end)
2) Modelling for Decision Making (Feature Space Optimisation, Topologies, AutoML)
3) Data Efficiency (GANs, Transfer Learning, Weak Supervision)

The Handbook of Multimodal-Multisensor Interfaces, Vol. 2, S. Oviatt, B. Schuller, P.R. Cohen, D. Sonntag, G. Potamianos, A. Krüger (eds.), 2018 (forthcoming).
https://github.com/end2you/end2you

Attendees should be familiar with Machine Learning and Neural Networks in general. They should further have basic knowledge of Signal Processing.

Björn W. Schuller received his diploma, doctoral degree, habilitation, and Adjunct Teaching Professorship, all in EE/IT, from TUM in Munich, Germany. He is the Head of GLAM - the Group on Language, Audio & Music - at Imperial College London, UK, Full Professor and ZD.B Chair of Embedded Intelligence for Health Care and Wellbeing at the University of Augsburg, Germany, co-founding CEO of audEERING, and permanent Visiting Professor at HIT, China. Before that, he was Full Professor at the University of Passau, Germany, and held positions with Joanneum Research in Graz, Austria, and the CNRS-LIMSI in Orsay, France. He is a Fellow of the IEEE, President Emeritus of the AAAC, and a Senior Member of the ACM. He (co-)authored 700+ publications (18000+ citations, h-index = 66), and is the Editor-in-Chief of the IEEE Transactions on Affective Computing, General Chair of ACII 2019, ACII Asia 2018, and ACM ICMI 2014, and a Program Chair of Interspeech 2019, ACM ICMI 2019/2013, ACII 2015/2011, and IEEE SocialCom 2012, amongst manifold further commitments and service to the community. His 20+ awards include having been honoured as one of 40 extraordinary scientists under the age of 40 by the WEF in 2015. He has served as Coordinator/PI in 10+ European projects, is an ERC Starting Grantee, and a consultant of companies such as GN, Huawei and Samsung.

Michèle Sebag French National Center for Scientific Research, Gif-sur-Yvette

Representation Learning, Domain Adaptation and Generative Models with Deep Learning


Within Deep Learning, representation learning is seamlessly integrated into the whole machine learning process, with utmost benefits when facing domains with raw high-dimensional descriptions such as computer vision or natural language processing. Domain adaptation is concerned with the ability to transfer models learned on a so-called source domain (typically with many labelled examples) to a target domain with few or no labels, making learning faster and/or more effective. Generative learning is concerned with the ability to learn a distribution representative of the available data sample. The course will be followed by an (optional) hands-on session, as a challenge on Codalab (URL available on July 15th).

Lecture 1: Introduction to Transfer Learning and Domain Adaptation
* Motivations
* Main approaches: mapping the target onto the source; mapping both source and target onto a same latent representation.
* Criteria and algorithms; distances between distributions, strengths and weaknesses
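One distance between distributions commonly used as an adaptation criterion is the (squared) Maximum Mean Discrepancy (MMD). The following is a minimal numpy sketch, not course material; the RBF kernel, the bandwidth `gamma`, and the toy Gaussian samples are illustrative assumptions.

```python
import numpy as np

def mmd2_rbf(X, Y, gamma=1.0):
    """Squared Maximum Mean Discrepancy with an RBF kernel: a kernel-based
    criterion for how far apart two samples' distributions are."""
    def k(A, B):
        sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
        return np.exp(-gamma * sq)
    return k(X, X).mean() + k(Y, Y).mean() - 2 * k(X, Y).mean()

rng = np.random.default_rng(0)
src = rng.normal(0.0, 1.0, size=(200, 2))       # "source" sample
tgt_near = rng.normal(0.0, 1.0, size=(200, 2))  # same distribution
tgt_far = rng.normal(3.0, 1.0, size=(200, 2))   # shifted distribution
print(mmd2_rbf(src, tgt_near) < mmd2_rbf(src, tgt_far))  # True
```

In adaptation methods, a term like this is minimized over a learned representation so that source and target features become indistinguishable.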

Lecture 2: Generative learning
* Auto-Encoders
* Variational Auto-Encoders
* Generative Adversarial Networks
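To make the auto-encoder bullet concrete, here is a hedged numpy sketch of a linear autoencoder trained by gradient descent on toy data lying near a low-dimensional subspace. The dimensions, learning rate, and iteration count are illustrative choices, not from the lecture.

```python
import numpy as np

rng = np.random.default_rng(0)
# Toy data lying near a 2-D subspace of R^5
Z = rng.normal(size=(500, 2))
X = Z @ rng.normal(size=(2, 5)) + 0.01 * rng.normal(size=(500, 5))

W_enc = rng.normal(scale=0.1, size=(5, 2))   # encoder weights
W_dec = rng.normal(scale=0.1, size=(2, 5))   # decoder weights
lr = 0.01
for _ in range(2000):
    H = X @ W_enc                  # code (bottleneck representation)
    X_hat = H @ W_dec              # reconstruction
    err = X_hat - X
    # Gradients of the mean squared reconstruction error
    g_dec = H.T @ err / len(X)
    g_enc = X.T @ (err @ W_dec.T) / len(X)
    W_dec -= lr * g_dec
    W_enc -= lr * g_enc

mse = ((X @ W_enc @ W_dec - X) ** 2).mean()
print(mse)  # final reconstruction error
```

Replacing the linear maps with nonlinear layers, and the deterministic code with a sampled latent variable plus a KL term, yields the variational auto-encoder discussed next.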

Lecture 3: DA Algorithms
* Domain adaptation with optimal transport
* Domain Adversarial Training of Neural Networks
* Recent algorithms: CycleGAN, UNIT.

* Shai Ben-David, John Blitzer, Koby Crammer, Alex Kulesza, Fernando Pereira, and Jennifer Wortman Vaughan. A theory of learning from different domains. Machine Learning, 79(1-2):151–175, 2010
* Ian Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio. Generative adversarial nets. In NIPS, 2014.
* Yaroslav Ganin, Evgeniya Ustinova, Hana Ajakan, Pascal Germain, Hugo Larochelle, François Laviolette, Mario Marchand, Victor Lempitsky. Domain-Adversarial Training of Neural Networks, Journal of Machine Learning Research 17 (2016) 1-35.
* Jun-Yan Zhu, Taesung Park, Phillip Isola, Alexei A. Efros: Unpaired Image-to-Image Translation Using Cycle-Consistent Adversarial Networks. ICCV 2017: 2242-2251
* Ming-Yu Liu, Thomas Breuel, Jan Kautz: Unsupervised Image-to-Image Translation Networks. NIPS 2017: 700-708
* Nicolas Courty, Rémi Flamary, Amaury Habrard, Alain Rakotomamonjy: Joint distribution optimal transportation for domain adaptation. NIPS 2017: 3733-3742

Level : intermediate/advanced
Requirement : basic knowledge of neural networks and probability theory/statistics

Lecture: Michèle Sebag, alumna of the Ecole Normale Supérieure (Maths), received her PhD in computer science from Université Paris-Dauphine and her Habilitation from Université Paris-Sud. She has held a Senior Researcher position at CNRS since 2002, has been co-head of the Machine Learning and Optimization group (CNRS, INRIA, Univ. Paris-Sud) since 2001, an EurAI Fellow since 2011, and head of the European steering committee of Machine Learning and Data Mining since 2015. She is an elected member of the French Academy of Technology. Her research interests include Statistical Machine Learning, Deep Learning, Causal Modeling and Stochastic Optimization.
Challenge: Arthur Pesah is a master student in theoretical physics and mathematics, currently enrolled in a double degree between ENSTA ParisTech (Saclay, France) and KTH Royal Institute of Technology (Stockholm, Sweden). He was a research assistant for Michèle Sebag and Isabelle Guyon at Univ. Paris-Sud and for Hossein Azizpour at KTH. His research interests include domain adaptation, inference methods in physics and quantum computing.

Ponnuthurai N Suganthan Nanyang Technological University

Learning Algorithms for Classification, Forecasting and Visual Tracking


This presentation will primarily focus on learning algorithms with reduced iterations or no iterations at all. Some of the algorithms have closed-form solutions, and some do not adjust their structures once constructed. The main algorithms considered in this talk are randomized neural networks, kernel ridge regression, and random forests. These non-iterative methods have attracted the attention of researchers due to their high accuracy as well as their fast training, which their non-iterative properties or closed-form training solutions make possible. For example, random forests deliver top classification performance. The presentation will cover the basic methods as well as their state-of-the-art realizations. These algorithms will be benchmarked on classification, time series forecasting and visual tracking datasets. Future research directions will also be suggested.

Non-iterative algorithms or algorithms with closed-form training solutions
Randomization based neural networks and their variants
Kernel Ridge Regression and their variants
Random Forest and their variants
Applications of the above methods in classification, time series and visual tracking
Benchmarking of these methods
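As an example of the closed-form training solutions listed above, kernel ridge regression fits by solving one linear system, with no iterative weight updates. The following is a minimal numpy sketch, not the lecturer's implementation; the RBF kernel, `lam`, `gamma`, and the toy sine data are illustrative assumptions.

```python
import numpy as np

def kernel_ridge_fit(X, y, lam=1e-2, gamma=1.0):
    """Closed-form kernel ridge regression: alpha = (K + lam*I)^{-1} y."""
    K = np.exp(-gamma * ((X[:, None] - X[None, :]) ** 2).sum(-1))
    return np.linalg.solve(K + lam * np.eye(len(X)), y)

def kernel_ridge_predict(X_train, alpha, X_new, gamma=1.0):
    K = np.exp(-gamma * ((X_new[:, None] - X_train[None, :]) ** 2).sum(-1))
    return K @ alpha

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(100, 1))
y = np.sin(X[:, 0]) + 0.05 * rng.normal(size=100)   # noisy sine targets
alpha = kernel_ridge_fit(X, y)
X_test = np.linspace(-3, 3, 50)[:, None]
pred = kernel_ridge_predict(X, alpha, X_test)
print(np.abs(pred - np.sin(X_test[:, 0])).max())    # small fit error
```

The entire "training" is one call to a linear solver, which is exactly the non-iterative property the talk highlights.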

(Additional References will be included in the lecture materials)
X Qiu, PN Suganthan, GAJ Amaratunga, Ensemble incremental learning Random Vector Functional Link network for short-term electric load forecasting, Knowledge-Based Systems 145, 182-196, 2018.
L Zhang, PN Suganthan, Benchmarking Ensemble Classifiers with Novel Co-Trained Kernel Ridge Regression and Random Vector Functional Link Ensembles [Research Frontier], IEEE Computational Intelligence Magazine 12 (4), 61-72, 2017.
L Zhang, PN Suganthan, Visual tracking with convolutional random vector functional link network, IEEE Transactions on Cybernetics 47 (10), 3243-3253, 2017.
L Zhang, PN Suganthan, Robust visual tracking via co-trained Kernelized correlation filters, Pattern Recognition 69, 82-93, 2017.
L Zhang, PN Suganthan, A survey of randomized algorithms for training neural networks, Information Sciences 364, 146-155, 2016.
L Zhang, PN Suganthan, Oblique decision tree ensemble via multisurface proximal support vector machine, IEEE Transactions on Cybernetics 45 (10), 2165-2176, 2015.

Basic knowledge of neural networks, pattern classification, decision trees will be advantageous.

Ponnuthurai Nagaratnam Suganthan (or P N Suganthan) received the B.A. degree, Postgraduate Certificate and M.A. degree in Electrical and Information Engineering from the University of Cambridge, UK in 1990, 1992 and 1994, respectively. After completing his PhD research in 1995, he served as a pre-doctoral Research Assistant in the Dept of Electrical Engineering, University of Sydney in 1995–96 and a lecturer in the Dept of Computer Science and Electrical Engineering, University of Queensland in 1996–99. He moved to NTU in 1999. He is an Editorial Board Member of the Evolutionary Computation Journal, MIT Press. He is an associate editor of the IEEE Trans on Cybernetics (2012 - ), IEEE Trans on Evolutionary Computation (2005 - ), Information Sciences (Elsevier) (2009 - ), Pattern Recognition (Elsevier) (2001 - ) and Int. J. of Swarm Intelligence Research (2009 - ) journals. He is a founding co-editor-in-chief of Swarm and Evolutionary Computation (2010 - ), an SCI-indexed Elsevier journal. His co-authored SaDE paper (published in April 2009) won the 'IEEE Trans. on Evolutionary Computation outstanding paper award' in 2012. His former PhD student, Dr Jane Jing Liang, won the IEEE CIS Outstanding PhD dissertation award in 2014. His research interests include swarm and evolutionary algorithms, pattern recognition, big data, deep learning and applications of swarm, evolutionary & machine learning algorithms. He was selected as one of the highly cited researchers by Thomson Reuters in 2015, 2016, and 2017 in computer science. He served as the General Chair of the IEEE SSCI 2013. He has been a member of the IEEE since 1990 and a Fellow since 2015. He was an elected AdCom member of the IEEE Computational Intelligence Society (CIS) in 2014-2016. Google Scholar: http://scholar.google.com.sg/citations?hl=en&user=yZNzBU0AAAAJ&view_op=list_works&pagesize=100

Johan Suykens KU Leuven

Deep Learning and Kernel Machines


Neural networks & deep learning and support vector machines & kernel methods have been among the most powerful and successful techniques in machine learning and data-driven modelling. Initially, in artificial neural networks, the use of one-hidden-layer feedforward networks was common because of their universal approximation property. However, the existence of many local minima in the training process was a drawback. Therefore, support vector machines and kernel methods became widely used, relying on solving convex optimization problems in classification and regression. In the meantime, computing power has increased and data have become abundantly available in many applications. As a result, one can currently afford to train deep models consisting of (many) more layers and interconnection weights. Examples of successful deep learning models are convolutional neural networks, stacked autoencoders, deep Boltzmann machines, deep generative models and generative adversarial networks. In this course we will explain several synergies between neural networks, deep learning, least squares support vector machines and kernel methods. A key role at this point is played by primal and dual model representations and different duality principles. In this way the bigger picture will be revealed for neural networks, deep learning and kernel machines, and future perspectives will be outlined.

The material is organized into 3 parts:
- Part I Neural networks, support vector machines and kernel methods

- Part II Restricted Boltzmann machines, kernel machines and deep learning

- Part III Deep restricted kernel machines and future perspectives.

In Part I a basic introduction is given to support vector machines (SVM) and kernel methods with emphasis on their artificial neural networks (ANN) interpretations. The latter can be understood in view of primal and dual model representations, expressed in terms of the feature map and the kernel function, respectively. Related to least squares support vector machines (LS-SVM), such characterizations exist for supervised and unsupervised learning, including classification, regression, kernel principal component analysis (KPCA), kernel spectral clustering (KSC), kernel canonical correlation analysis (KCCA), and others. Primal and dual representations are also relevant for obtaining efficient training algorithms, tailored to the nature of the given application (high-dimensional input spaces versus large data sizes). Application examples are given e.g. in black-box weather forecasting, pollution modelling, prediction of energy consumption, and community detection in networks.
In Part II we explain how to obtain a so-called restricted kernel machine (RKM) representation for least squares support vector machine related models. By using a principle of conjugate feature duality it is possible to obtain a similar representation as in restricted Boltzmann machines (RBM) (with visible and hidden units), which are used in deep belief networks (DBN) and deep Boltzmann machines (DBM). The principle is explained both for supervised and unsupervised learning. Related to kernel principal component analysis a generative model is obtained within the restricted kernel machine framework. In such a generative model the trained model is able to generate new data examples.
In Part III deep restricted kernel machines (Deep RKM) are explained, which consist of restricted kernel machines taken in a deep architecture. In these models a distinction is made between depth in a layer sense and depth in a level sense. Links and differences with stacked autoencoders and deep Boltzmann machines are given. The framework makes it possible to conceive both deep feedforward neural networks (DNN) and deep kernel machines, through primal and dual model representations. In this case one has multiple feature maps over the different levels, together with multiple kernel functions. By fusing the objectives of the different levels (e.g. several KPCA levels followed by an LS-SVM classifier) in the deep architecture, the training process becomes faster and gives improved solutions. Different training algorithms and methods for large data sets will be discussed.
Finally, based on the newly obtained insights, future perspectives and challenges will be outlined.
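To illustrate the convex training that distinguishes LS-SVMs, the following hedged numpy sketch solves the standard LS-SVM classifier dual system in the variables alpha and b. The RBF kernel, `gamma`, `sigma`, and the toy two-Gaussian data are illustrative choices, not from the course materials.

```python
import numpy as np

def lssvm_train(X, y, gamma=10.0, sigma=1.0):
    """LS-SVM classifier: training reduces to one linear system in the dual.
    Unknowns are the bias b and the support values alpha."""
    n = len(X)
    sq = ((X[:, None] - X[None, :]) ** 2).sum(-1)
    K = np.exp(-sq / (2 * sigma ** 2))          # RBF kernel matrix
    Omega = np.outer(y, y) * K
    A = np.zeros((n + 1, n + 1))
    A[0, 1:], A[1:, 0] = y, y                   # label constraints
    A[1:, 1:] = Omega + np.eye(n) / gamma       # regularized kernel block
    rhs = np.concatenate([[0.0], np.ones(n)])
    sol = np.linalg.solve(A, rhs)
    return sol[0], sol[1:]                      # b, alpha

def lssvm_predict(X_train, y_train, b, alpha, X_new, sigma=1.0):
    sq = ((X_new[:, None] - X_train[None, :]) ** 2).sum(-1)
    K = np.exp(-sq / (2 * sigma ** 2))
    return np.sign(K @ (alpha * y_train) + b)   # dual model representation

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-2, 1, (50, 2)), rng.normal(2, 1, (50, 2))])
y = np.concatenate([-np.ones(50), np.ones(50)])
b, alpha = lssvm_train(X, y)
acc = (lssvm_predict(X, y, b, alpha, X) == y).mean()
print(acc)  # training accuracy on this separable toy set, near 1.0
```

The prediction line is the dual model representation referred to in Part I: a kernel expansion over the training points rather than an explicit feature map.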

Bengio Y., Learning deep architectures for AI, Boston: Now, 2009.

Fischer A., Igel C., Training restricted Boltzmann machines: An introduction. Pattern Recognition, 47, 25-39, 2014.

Goodfellow I., Bengio Y., Courville A., Deep learning, Cambridge, MA: MIT Press, 2016.

Hinton G.E., What kind of graphical model is the brain?, In Proc. 19th International Joint Conference on Artificial Intelligence, pp. 1765-1775, 2005.

Hinton G.E., Osindero S., Teh Y.-W., A fast learning algorithm for deep belief nets, Neural Computation, 18, 1527-1554, 2006.

LeCun Y., Bengio Y., Hinton G., Deep learning, Nature, 521, 436-444, 2015.

Lin H.W., Tegmark M., Rolnick D., Why does deep and cheap learning work so well?, Journal of Statistical Physics 168 (6), 1223-1247, 2017.

Mall R., Langone R., Suykens J.A.K., Multilevel Hierarchical Kernel Spectral Clustering for Real-Life Large Scale Complex Networks, PLOS ONE, e99966, 9(6), 1-18, 2014.

Mehrkanoon S., Suykens J.A.K., Deep hybrid neural-kernel networks using random Fourier features, Neurocomputing, Vol. 298, pp. 46-54, July 2018.

Salakhutdinov R., Learning deep generative models, Annu. Rev. Stat. Appl., 2, 361-385, 2015.

Schölkopf B., Smola A., Learning with kernels, Cambridge, MA: MIT Press, 2002.

Schreurs J., Suykens J.A.K., Generative Kernel PCA, ESANN 2018.

Suykens J.A.K., Van Gestel T., De Brabanter J., De Moor B., Vandewalle J., Least squares support vector machines, Singapore: World Scientific, 2002.

Suykens J.A.K., Alzate C., Pelckmans K., Primal and dual model representations in kernel-based learning, Statistics Surveys, vol. 4, pp. 148-183, Aug. 2010.

Suykens J.A.K., Deep Restricted Kernel Machines using Conjugate Feature Duality, Neural Computation, vol. 29, no. 8, pp. 2123-2163, Aug. 2017.

Vapnik V., Statistical learning theory, New York: Wiley, 1998.

Basics of linear algebra

Johan A.K. Suykens was born in Willebroek, Belgium, on May 18, 1966. He received the master's degree in Electro-Mechanical Engineering and the PhD degree in Applied Sciences from the Katholieke Universiteit Leuven, in 1989 and 1995, respectively. In 1996 he was a Visiting Postdoctoral Researcher at the University of California, Berkeley. He has been a Postdoctoral Researcher with the Fund for Scientific Research FWO Flanders and is currently a full Professor with KU Leuven. He is author of the books 'Artificial Neural Networks for Modelling and Control of Non-linear Systems' (Kluwer Academic Publishers) and 'Least Squares Support Vector Machines' (World Scientific), co-author of the book 'Cellular Neural Networks, Multi-Scroll Chaos and Synchronization' (World Scientific) and editor of the books 'Nonlinear Modeling: Advanced Black-Box Techniques' (Kluwer Academic Publishers), 'Advances in Learning Theory: Methods, Models and Applications' (IOS Press) and 'Regularization, Optimization, Kernels, and Support Vector Machines' (Chapman & Hall/CRC). In 1998 he organized an International Workshop on Nonlinear Modelling with Time-series Prediction Competition. He has served as associate editor for the IEEE Transactions on Circuits and Systems (1997-1999 and 2004-2007), the IEEE Transactions on Neural Networks (1998-2009) and the IEEE Transactions on Neural Networks and Learning Systems (from 2017). He received an IEEE Signal Processing Society 1999 Best Paper Award and several Best Paper Awards at international conferences. He is a recipient of the International Neural Networks Society INNS 2000 Young Investigator Award for significant contributions in the field of neural networks.
He has served as a Director and Organizer of the NATO Advanced Study Institute on Learning Theory and Practice (Leuven 2002), as a program co-chair for the International Joint Conference on Neural Networks 2004 and the International Symposium on Nonlinear Theory and its Applications 2005, as an organizer of the International Symposium on Synchronization in Complex Networks 2007, a co-organizer of the NIPS 2010 workshop on Tensors, Kernels and Machine Learning, and chair of ROKS 2013. He has been awarded ERC Advanced Grants in 2011 and 2017, and was elevated to IEEE Fellow in 2015 for developing least squares support vector machines.

Kenji Suzuki Tokyo Institute of Technology

Deep Learning in Medical Image Processing, Analysis and Diagnosis


It is said that artificial intelligence driven by deep learning will drive the 4th Industrial Revolution. As is true in many other fields, deep learning has become one of the most active areas of research in medical image analysis and computer-aided diagnosis, because "learning from examples or data" is crucial to handling the large amounts of data ("big data") coming from medical imaging systems. Deep learning is a versatile, powerful framework that can acquire image-processing and analysis functions through training with image examples; it is an end-to-end machine-learning model that enables a direct mapping from raw input data to desired outputs, eliminating the need for the handcrafted features of conventional feature-based machine learning. I invented some of the earliest deep-learning models for image processing, semantic segmentation, lesion enhancement, and removal of specific patterns in medical imaging, making possible tasks that conventional methods had not been able to do, and I have been actively studying deep learning in medical imaging for the past 20 years or so.

In my tutorials, machine learning and deep learning are described together with their applications in the biomedical field. First, the fundamentals of machine learning are reviewed briefly before entering into the topic of deep learning, because deep learning is advanced machine learning. Then, the history of deep learning is overviewed. The fundamentals, architectures, training, and practical issues of deep learning are described to make clear a) what has changed in machine learning after the introduction of deep learning and b) the differences and advantages over conventional feature-based machine learning, and to present c) deep learning applications to 1) computer-aided diagnosis for lung nodule detection in chest radiography and thoracic CT, 2) distinction between benign and malignant nodules in CT, 3) polyp detection and classification in CT colonography, 4) separation of bones from soft tissue in chest radiographs, 5) semantic segmentation of organs and lesions in medical images, and 6) radiation dose reduction by improving the image quality of low-dose CT and mammography. My tutorials include:
1. Fundamentals of machine learning
2. Neural networks (biological analogy, multilayer perceptrons, back-propagation learning algorithm, practical design and training issues)
3. History of deep learning
4. Mathematical and biological preliminaries of deep learning
5. Relations to human visual systems (multiple-channel models, visual learning models, hierarchical structure of the human visual systems, model and knowledge acquisitions, data representations in the brain)
6. Representative deep learning models including deep convolutional neural networks (DCNN), deep residual neural networks, generative adversarial networks (GAN), and massive-training artificial neural networks (MTANN)
7. Foundations, architectures, and practical issues of DCNN
8. Comparisons between DCNN and MTANN
9. Applications of deep learning (object recognition, lesion detection and classification, computer-aided diagnosis, denoising, image processing, image restoration, semantic segmentation, removal of specific patterns, and enhancement of lesions)

1. LeCun Y, Bengio Y, Hinton G, Deep learning, Nature: 521, pp. 436–444, 2015.
2. Suzuki K.: Overview of Deep Learning in Medical Imaging. Radiological Physics and Technology 10(3): 257-273, 2017.
3. A. Krizhevsky, I. Sutskever, G. E. Hinton, ImageNet classification with deep convolutional neural networks, Advances in Neural Information Processing Systems, 25, pp. 1097–1105, 2012.
4. Tajbakhsh N. and Suzuki K.: Comparing Two Classes of End-to-End Learning Machines for Lung Nodule Detection and Classification: MTANNs vs. CNNs. Pattern Recognition 63: 476–486, 2017.
5. Suzuki K.: Editor. Machine Learning in Computer-Aided Diagnosis: Medical Imaging Intelligence and Analysis, IGI Global (Hershey, PA), 524 pp., 2012. (ISBN 9781466600591)
6. Suzuki K.: Editor. Computational Intelligence in Biomedical Imaging, Springer (New York, NY), 411 pp., 2014. (ISBN 978-1-4614-7244-5)
7. Suzuki K., Chen Y.: Editors. Artificial Intelligence in Decision Support Systems for Diagnosis in Medical Imaging, Springer-Nature, (Switzerland), 387 pp., 2018. (ISBN 978-3-319-68842-8)
8. Suzuki K., Horiba I., Sugie N., and Ikeda S.: Improvement of image quality of x-ray fluoroscopy using spatiotemporal neural filter which learns noise reduction, edge enhancement and motion compensation. Proc. Int. Conf. Signal Processing Applications and Technology (ICSPAT) 2: 1382-1386, 1996.
9. Suzuki K., Horiba I., and Sugie N.: Efficient approximation of neural filters for removing quantum noise from images. IEEE Transactions on Signal Processing 50: 1787-1799, 2002.
10. Suzuki K., Armato III S. G., Li F., Sone S., and Doi K.: Massive training artificial neural network (MTANN) for reduction of false positives in computerized detection of lung nodules in low-dose CT. Medical Physics 30: 1602-1617, 2003.
11. Suzuki K., Horiba I., and Sugie N.: Neural edge enhancer for supervised edge enhancement from noisy images. IEEE Transactions on Pattern Analysis and Machine Intelligence 25: 1582-1596, 2003.
12. Suzuki K., Horiba I., Sugie N., and Nanki M.: Extraction of left ventricular contours from left ventriculograms by means of a neural edge detector. IEEE Transactions on Medical Imaging 23: 330-339, 2004.
13. Suzuki K.: Determining the receptive field of a neural filter. Journal of Neural Engineering 1: 228-237, 2004.
14. Suzuki K., Li F., Sone S., and Doi K.: Computer-aided diagnostic scheme for distinction between benign and malignant nodules in thoracic low-dose CT by use of massive training artificial neural network. IEEE Transactions on Medical Imaging 24: 1138-1150, 2005.
15. Suzuki K., and Doi K.: How can a massive training artificial neural network (MTANN) be trained with a small number of cases in the distinction between nodules and vessels in thoracic CT? Academic Radiology 12: 1333-1341, 2005.
16. Suzuki K., Abe H., MacMahon H., and Doi K.: Image-processing technique for suppressing ribs in chest radiographs by means of massive training artificial neural network (MTANN). IEEE Transactions on Medical Imaging 25: 406-416, 2006.
17. Suzuki K.: Supervised “lesion-enhancement” filter by use of a massive-training artificial neural network (MTANN) in computer-aided diagnosis (CAD). Physics in Medicine and Biology 54: S31-S45, 2009.
18. Suzuki K.: Pixel-Based Machine Learning in Medical Imaging. International Journal of Biomedical Imaging 2012: Article ID 792079, 18 pages, 2012.
19. Chen S. and Suzuki K.: Separation of Bones from Chest Radiographs by Means of Anatomically Specific Multiple Massive-Training ANNs Combined with Total Variation Minimization Smoothing. IEEE Transactions on Medical Imaging 33: 246-257, 2014.

There is no pre-requisite for the tutorials, but enthusiasm for learning about deep learning in medicine and healthcare is 'required.'

Kenji Suzuki, Ph.D. (by Published Work; Nagoya University, Japan) worked at Hitachi Medical Corp., Japan, at Aichi Prefectural University, Japan, as a faculty member, and in the Department of Radiology, University of Chicago, as Assistant Professor. In 2014, he joined the Department of Electrical and Computer Engineering and the Medical Imaging Research Center, Illinois Institute of Technology, as Associate Professor (Tenured). Since 2017, he has been jointly appointed in the World Research Hub Initiative, Tokyo Institute of Technology, Japan, as Full Professor. He has published more than 320 papers (including 110 peer-reviewed journal papers). He has been actively studying deep learning in medical imaging and computer-aided diagnosis for the past 20 years. He is an inventor on 30 patents (including some of the earliest deep-learning patents), which were licensed to several companies and commercialized. He has published 11 books and 22 book chapters, and edited 13 journal special issues. He has been awarded more than 25 grants as PI, including NIH R01 and ACS grants. He has served as Editor of a number of leading international journals, including Pattern Recognition and Medical Physics. He has served as a referee for 91 international journals, an organizer of 62 international conferences, and a program committee member of 170 international conferences. He has received 26 awards, including the Springer-Nature EANM Most Cited Journal Paper Award in 2016 and the 2017 Albert Nelson Marquis Lifetime Achievement Award.

René Vidal Johns Hopkins University

Mathematics of Deep Learning


The past few years have seen a dramatic increase in the performance of recognition systems thanks to the introduction of deep networks for representation learning. However, the mathematical reasons for this success remain elusive. For example, a key issue is that the neural network training problem is nonconvex, hence optimization algorithms are not guaranteed to return a global minimum. The first part of this tutorial will overview recent work on the theory of deep learning that aims to understand how to design the network architecture, how to regularize the network weights, and how to guarantee global optimality. The second part of this tutorial will present sufficient conditions to guarantee that local minima are globally optimal and that a local descent strategy can reach a global minimum from any initialization. Such conditions apply to problems in matrix factorization, tensor factorization and deep learning. The third part of this tutorial will present an analysis of dropout for matrix factorization, and establish connections between dropout and low-rank regularization.

1. Introduction to Deep Learning Theory: Optimization, Regularization and Architecture Design
2. Global Optimality in Matrix Factorization
3. Global Optimality in Tensor Factorization and Deep Learning
4. Dropout as a Low-Rank Regularizer for Matrix Factorization
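Item 2 above can be previewed with a toy experiment: although factorizing a matrix as U V^T is a nonconvex problem, plain gradient descent from a small random initialization typically reaches a global minimum when the target matrix is exactly low-rank. The following is a hedged numpy sketch; the sizes, learning rate, and iteration count are illustrative choices, not from the tutorial.

```python
import numpy as np

rng = np.random.default_rng(0)
M = rng.normal(size=(20, 2)) @ rng.normal(size=(2, 15))  # exactly rank 2
U = 0.1 * rng.normal(size=(20, 2))   # small random initialization
V = 0.1 * rng.normal(size=(15, 2))
lr = 0.01
for _ in range(5000):
    R = U @ V.T - M                  # residual of the factorization
    # Gradient descent on 0.5 * ||U V^T - M||_F^2
    U, V = U - lr * (R @ V), V - lr * (R.T @ U)
print(np.linalg.norm(U @ V.T - M))   # near zero: descent found a global minimum
```

The objective has saddle points (e.g. U = V = 0) but, in this benign low-rank setting, no spurious local minima, which is the kind of landscape result the tutorial analyzes.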


Basic understanding of sparse and low-rank representation and non-convex optimization.

Rene Vidal is a Professor of Biomedical Engineering and the Inaugural Director of the Mathematical Institute for Data Science at The Johns Hopkins University. His research focuses on the development of theory and algorithms for the analysis of complex high-dimensional datasets such as images, videos, time-series and biomedical data. Dr. Vidal has been Associate Editor of TPAMI and CVIU, Program Chair of ICCV and CVPR, co-author of the book 'Generalized Principal Component Analysis' (2016), and co-author of more than 200 articles in machine learning, computer vision, biomedical image analysis, hybrid systems, robotics and signal processing. He is a fellow of the IEEE, IAPR and Sloan Foundation, an ONR Young Investigator, and has received numerous awards for his work, including the 2012 J.K. Aggarwal Prize for 'outstanding contributions to generalized principal component analysis (GPCA) and subspace clustering in computer vision and pattern recognition', as well as best paper awards in machine learning, computer vision, controls, and medical robotics.

Eric P. Xing Carnegie Mellon University

A Statistical Machine Learning Perspective of Deep Learning: Algorithm, Theory, Scalable Computing


In this tutorial I am going to explain the connections between the 'new wave' of multi-layer neural-network-inspired models in 'Deep Learning' and the well-founded probabilistic graphical models, Bayesian inference, kernel methods, and many other long-standing statistical learning methodologies studied in the broader machine learning community, and discuss the principles behind the inference, learning, evaluation, and argumentation of these techniques. Then I will focus on stratifying various deep generative models with a unified statistical framework to better understand their behaviors, relationships, and new opportunities. Finally I will discuss the computational challenges in large-scale deep learning, covering algorithm design, system design, and standardized universal platforms for computing support in deep learning.





Ming-Hsuan Yang University of California, Merced

Learning to Track Objects


The goal is to introduce recent advances in object tracking based on deep learning and related approaches. Performance evaluation and challenging factors in this field will be discussed.

Brief history of visual tracking
Generative approach
Discriminative approach
Deep learning methods
Performance evaluation
Challenges and future research directions

Y. Wu, J. Lim, and M.-H. Yang, Object Tracking Benchmark, IEEE Transactions on Pattern Analysis and Machine Intelligence, 2015.
H. Nam and B. Han, Learning Multi-domain Convolutional Neural Networks for Visual Tracking, CVPR, 2016.
M. Danelljan, G. Bhat, F. Khan, M. Felsberg, ECO: Efficient Convolution Operators for Tracking. CVPR, 2017.

Basic knowledge in computer vision and intermediate knowledge in deep learning.

Ming-Hsuan Yang is a Professor of Electrical Engineering and Computer Science at University of California, Merced, and a visiting researcher at Google Cloud. He serves as a program co-chair of the IEEE International Conference on Computer Vision (ICCV) 2019, and served as program co-chair of the Asian Conference on Computer Vision (ACCV) 2014 and general co-chair of ACCV 2016. He served as an associate editor of the IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI) from 2007 to 2011, and currently serves as an associate editor of the International Journal of Computer Vision (IJCV), Computer Vision and Image Understanding (CVIU), Image and Vision Computing (IVC) and Journal of Artificial Intelligence Research (JAIR). Yang received the Google Faculty Award in 2009, and the Distinguished Early Career Research Award from the UC Merced Senate in 2011. Yang is a recipient of the Faculty Early Career Development (CAREER) award from the National Science Foundation in 2012. In 2015, Yang received the Distinguished Research Award from the UC Merced Senate. He is a senior member of the IEEE and the ACM.

Mohammed J. Zaki Rensselaer Polytechnic Institute

Introductory Tutorial on Regression and Deep Learning


In this tutorial, we will be going over the basics of regression and deep learning. We will start with linear regression, and then consider logistic regression. We will move on to artificial neural networks and deep learning. The focus will be on the underlying concepts, mathematics, and algorithms.

1. Linear Regression: Ordinary least squares, multiple regression, kernel regression, L1 regression
2. Logistic Regression: binary and multi-class regression
3. Neural networks: Multilayer perceptrons (MLPs), backpropagation
4. Recurrent Neural Networks (RNNs): RNNs, backpropagation in time, bidirectional RNNs
5. Gated RNNs: Long short-term memory (LSTM), gated recurrent units (GRU)
6. Convolutional Neural Networks (CNNs): convolutions, activations, deep CNNs
7. Evaluation: regression modelling, assessment
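For the first topic in the outline, ordinary least squares has a closed-form solution via the normal equations, w = (X^T X)^{-1} X^T y. Here is a minimal numpy sketch on synthetic data; the coefficients and noise level are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
X = rng.normal(size=(n, 3))
true_w = np.array([2.0, -1.0, 0.5])
y = X @ true_w + 1.5 + 0.1 * rng.normal(size=n)   # intercept 1.5, small noise

Xb = np.hstack([np.ones((n, 1)), X])              # prepend a bias column
# Normal equations: solve (X^T X) w = X^T y
w = np.linalg.solve(Xb.T @ Xb, Xb.T @ y)
print(np.round(w, 1))  # approximately [1.5, 2.0, -1.0, 0.5]
```

Solving the linear system directly (rather than inverting X^T X) is the numerically preferred route; logistic regression, covered next in the outline, has no such closed form and is fit iteratively.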

* Ian Goodfellow, Yoshua Bengio, Aaron Courville, Deep learning, MIT Press, 2016.
* Mohammed J. Zaki, Wagner Meira, Jr., Data Mining and Analysis: Fundamental Concepts and Algorithms, Cambridge University Press, 2014.

This is an introductory tutorial, but it assumes some familiarity with linear algebra, and probability and statistics.

Mohammed J. Zaki is a Professor of Computer Science at RPI. He received his Ph.D. degree in computer science from the University of Rochester in 1998. His research interests focus on developing novel data mining and machine learning techniques, especially for applications in text mining, social networks, and bioinformatics. He has over 250 publications, including the Data Mining and Analysis textbook published by Cambridge University Press in 2014. He is the founding co-chair of the BIOKDD series of workshops. He is currently an associate editor for Data Mining and Knowledge Discovery, and he has also served as an area editor for Statistical Analysis and Data Mining and as an associate editor for ACM Transactions on Knowledge Discovery from Data and Social Network Analysis and Mining. He was the program co-chair for SDM'08, SIGKDD'09, PAKDD'10, BIBM'11, CIKM'12, ICDM'12, IEEE BigData'15, and CIKM'18. He is currently serving on the Board of Directors for ACM SIGKDD. He received the National Science Foundation CAREER Award in 2001 and the Department of Energy Early Career Principal Investigator Award in 2002. He received an HP Innovation Research Award in 2010, 2011, and 2012, and a Google Faculty Research Award in 2011. He is an ACM Distinguished Scientist and a Fellow of the IEEE. His research is supported in part by NSF, NIH, DOE, IBM, Google, HP, and Nvidia.

Yudong Zhang University of Leicester

Convolutional Neural Network and Its Variants


This lecture gives a brief introduction to convolutional neural networks (CNNs). It introduces the convolution, pooling, and fully-connected layers, discusses the neuroscientific basis of CNNs, and presents hyperparameter optimization for CNNs. Several landmark architectures are analyzed and compared, including LeNet, AlexNet, VGG, NiN, GoogLeNet, and ResNet. CNNs for segmentation are briefly discussed, and state-of-the-art examples are used to illustrate CNN approaches.

(i) ImageNet and ILSVRC
(ii) Convolutional neural networks, convolution layers, pooling layers
(iii) Dropout, batch normalization, data augmentation
(iv) Neuroscientific basis, Random search, LeNet
(v) Transfer learning, AlexNet, 1x1 convolution, VGG
(vi) Network in Network (NiN), GoogLeNet, ResNet
(vii) R-CNN, Fast(er) R-CNN, Mask R-CNN
(viii) Applications to cerebral microbleed detection, radar imaging, etc.
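The convolution and pooling operations in item (ii) can be sketched in a few lines of NumPy. This is a minimal illustrative implementation (not code from the lecture): a 1x2 edge filter, the cross-correlation that CNN libraries call "convolution", a ReLU activation, and non-overlapping max pooling.

```python
import numpy as np

def conv2d(image, kernel):
    """Valid-mode 2-D cross-correlation (the 'convolution' used in CNNs)."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.empty((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def max_pool(x, size=2):
    """Non-overlapping max pooling with a size x size window."""
    h, w = x.shape[0] // size, x.shape[1] // size
    return x[:h * size, :w * size].reshape(h, size, w, size).max(axis=(1, 3))

# A toy 6x6 image whose right half is bright; the 1x2 filter below
# responds to a dark-to-bright vertical edge.
image = np.zeros((6, 6))
image[:, 3:] = 1.0
kernel = np.array([[-1.0, 1.0]])

feature_map = np.maximum(conv2d(image, kernel), 0.0)  # ReLU activation
pooled = max_pool(feature_map)
print(pooled.shape)  # (3, 2): pooling halves each spatial dimension
```

Real CNN layers stack many such filters over multiple input channels and learn the kernel weights by backpropagation; frameworks also vectorize the loops above for speed.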

1. Deng, J., W. Dong, R. Socher, L.-J. Li, K. Li and L. Fei-Fei (2009). ImageNet: A large-scale hierarchical image database. 2009 IEEE Conference on Computer Vision and Pattern Recognition. 248-255.
2. Ioffe, S. and C. Szegedy (2015). Batch normalization: Accelerating deep network training by reducing internal covariate shift. arXiv preprint arXiv:1502.03167.
3. Bergstra, J. and Y. Bengio (2012). Random search for hyper-parameter optimization. Journal of Machine Learning Research 13(Feb): 281-305.
4. Krizhevsky, A., I. Sutskever and G. E. Hinton (2012). ImageNet classification with deep convolutional neural networks. Advances in neural information processing systems. 1097-1105.
5. Simonyan, K. and A. Zisserman (2014). Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556.
6. Szegedy, C., W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke and A. Rabinovich (2015). Going deeper with convolutions. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Boston, MA, USA.
7. He, K., X. Zhang, S. Ren and J. Sun (2016). Deep Residual Learning for Image Recognition. 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). 770-778.
8. Goodfellow, I., Y. Bengio and A. Courville (2016). Deep Learning, MIT Press, Cambridge, MA.

Linear Algebra and Calculus, Probability and Statistics, Basics of Image Processing, Pattern Recognition and Computer Vision

Dr. Yu-Dong Zhang is a Professor (Permanent) in the Department of Informatics, University of Leicester, UK, and a guest professor at Henan Polytechnic University, China. His research interests are deep learning for signal processing and medical image processing.