Course Description



  • Keynotes

    Nello Cristianini
    (University of Bristol) [-]
    Data, Intelligence and Shortcuts

    Petia Radeva
    (University of Barcelona) [-]
    Uncertainty Modeling and Deep Learning in Food Analysis

    Indrė Žliobaitė
    (University of Helsinki) [-]
    Any Hope for Deep Learning in Deep Time?

  • Courses

    Ignacio Arganda-Carreras
    (University of the Basque Country) [introductory/intermediate]
    Deep Learning for Bioimage Analysis

    Mikhail Belkin
    (Ohio State University) [intermediate/advanced]
    Understanding Deep Learning through the Lens of Over-parameterization

    Thomas G. Dietterich
    (Oregon State University) [introductory/intermediate]
    Safe and Robust Artificial Intelligence: Robustness, Calibration, Rejection, and Anomaly

    Georgios Giannakis
    (University of Minnesota) [advanced]
    Ensembles for Interactive and Deep Learning Machines with Scalability, Expressivity, and Adaptivity

    Sergei V. Gleyzer
    (University of Alabama) [introductory/intermediate]
    Machine Learning Fundamentals and Their Applications to Very Large Scientific Data: Rare Signal and Feature Extraction, End-to-end Deep Learning, Uncertainty Estimation and Realtime Machine Learning Applications in Software and Hardware

    Çağlar Gülçehre
    (DeepMind) [intermediate/advanced]
    Deep Reinforcement Learning

    Balázs Kégl
    (Huawei Technologies) [introductory]
    Deep Model-based Reinforcement Learning

    Ludmila Kuncheva
    (Bangor University) [intermediate]
    Classifier Ensembles in the Era of Deep Learning


    Do the recent advances in deep learning spell the demise of classifier ensembles? I would say no. While the philosophies of the two approaches are different, there is still a way to harness the benefits of both. Ensemble learning sprang from the idea of combining simple classifiers to solve complex classification problems. Deep learning, on the other hand, throws one monolithic, complex model at the problem. There is room, however, for combining deep learners in a quest to boost the deep learner’s accuracy. Ensembles of deep learners do not follow the standard design methodologies. For example, we cannot afford to build a large ensemble because of computational limitations. Such ensembles require bespoke design strategies and flexible combination rules. This tutorial will introduce the fundamentals of classifier ensembles and will present examples of successful applications of deep learning ensembles.
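
    As a rough illustration of what a combination rule does, the sketch below combines the outputs of a small ensemble in two common ways: averaging the class probabilities and majority voting. It is a minimal example under stated assumptions, not the method taught in the tutorial: the function names and the three probability vectors are hypothetical, and the tie-breaking choice in the vote is arbitrary.

```python
from collections import Counter


def argmax(values):
    """Index of the largest entry (first one wins on ties)."""
    return max(range(len(values)), key=values.__getitem__)


def average_probabilities(member_probs):
    """Average the class-probability vectors of all ensemble members."""
    n_members = len(member_probs)
    n_classes = len(member_probs[0])
    return [sum(p[c] for p in member_probs) / n_members for c in range(n_classes)]


def majority_vote(member_probs):
    """Each member votes for its most probable class; ties go to the lowest class index."""
    votes = Counter(argmax(p) for p in member_probs)
    top = max(votes.values())
    return min(c for c, v in votes.items() if v == top)


# Hypothetical softmax outputs of three deep classifiers for one input (two classes).
member_probs = [
    [0.51, 0.49],  # weakly prefers class 0
    [0.51, 0.49],  # weakly prefers class 0
    [0.05, 0.95],  # strongly prefers class 1
]

avg = average_probabilities(member_probs)
avg_label = argmax(avg)                    # decision of the averaging rule
vote_label = majority_vote(member_probs)   # decision of the voting rule
print(avg_label, vote_label)
```

    In this made-up case the two rules disagree: voting follows the two weakly confident members, while averaging lets the single confident member dominate. With only a handful of expensive members, the choice of rule can therefore change the ensemble decision.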


      1. Introduction to classifier ensembles.
      2. Deep learning classifiers.
      3. Deep learning ensembles and their applications.



    Basic knowledge of machine learning and pattern recognition.


    Ludmila (Lucy) I. Kuncheva is a Professor of Computer Science at Bangor University, UK. Her interests include pattern recognition, and specifically classifier ensembles. She has published two monographs and over 200 research papers. Lucy has won two Best Paper Awards (2006 IEEE TFS and 2003 IEEE TSMC). She is a Fellow of the International Association for Pattern Recognition (IAPR).

    Vincent Lepetit
    (ENPC ParisTech) [intermediate]
    Deep Learning and 3D Geometry

    Geert Leus
    (Delft University of Technology) [introductory/intermediate]
    Graph Signal Processing: Introduction and Connections to Distributed Optimization and Deep Learning


    The field of graph signal processing extends classical signal processing tools to signals (data) with an irregular structure that can be characterized by means of a graph (e.g., network data). Graph filters, direct analogues of time-domain filters intended for signals defined on graphs, are one of the cornerstones of this field. In this course, we introduce the field of graph signal processing and specifically give an overview of the graph filtering problem. We look at the family of finite impulse response (FIR) and infinite impulse response (IIR) graph filters and show how they can be implemented in a distributed manner. To further limit the communication and computational complexity of such a distributed implementation, we also generalize the state-of-the-art distributed graph filters to filters whose weights depend on the nodes sharing information. These so-called edge-variant graph filters yield significant benefits in terms of filter order reduction and can be used for solving specific distributed optimization problems with extremely fast convergence. Finally, we give an overview of how graph filters can be used in deep learning applications involving data sets with an irregular structure. Different types of graph filters can be used in the convolution step of graph convolutional networks, leading to different trade-offs in performance and complexity. The numerical results presented in this course illustrate the potential of graph filters in distributed optimization and deep learning.
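
    To make the FIR graph filter concrete, the following sketch computes y = h_0 x + h_1 S x + ... + h_K S^K x by repeated application of a graph shift operator S. It is a minimal illustration, assuming the adjacency matrix is used as S; the path graph, the filter taps and the function names are hypothetical choices. Each shift only combines values of direct neighbors, which is why a filter of order K can be run with K rounds of local exchanges.

```python
def graph_shift(S, x):
    """One local exchange: node i combines its neighbors' values, weighted by row i of S."""
    n = len(x)
    return [sum(S[i][j] * x[j] for j in range(n) if S[i][j] != 0) for i in range(n)]


def fir_graph_filter(S, h, x):
    """Compute y = sum_k h[k] * S^k x using len(h) - 1 successive shifts."""
    z = list(x)                    # z holds S^k x, starting with k = 0
    y = [h[0] * v for v in z]
    for hk in h[1:]:
        z = graph_shift(S, z)      # nodes only talk to their direct neighbors
        y = [yi + hk * zi for yi, zi in zip(y, z)]
    return y


# Hypothetical shift operator: adjacency matrix of the path graph 0 - 1 - 2 - 3.
S = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
x = [1.0, 0.0, 0.0, 0.0]   # unit impulse at node 0
h = [1.0, 0.5, 0.25]       # filter taps h_0, h_1, h_2 (order K = 2)

y = fir_graph_filter(S, h, x)
print(y)
```

    Node 3 is three hops away from the impulse, so an order-2 filter leaves its output at zero: the filter order bounds how far information travels.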


    • Introduction to graph signal processing
    • Graph filters and their extensions
    • Connections to distributed optimization as well as related applications
    • Connections to deep learning as well as related applications


    • D. I. Shuman, P. Vandergheynst, and P. Frossard, “Chebyshev polynomial approximation for distributed signal processing,” in IEEE International Conference on Distributed Computing in Sensor Systems and Workshops (DCOSS), 2011, pp. 1–8.
    • D. I. Shuman, S. K. Narang, P. Frossard, A. Ortega, and P. Vandergheynst, “The emerging field of signal processing on graphs: Extending high-dimensional data analysis to networks and other irregular domains,” IEEE Signal Processing Magazine, vol. 30, no. 3, pp. 83–98, 2013.
    • A. Sandryhaila and J. M. F. Moura, “Discrete signal processing on graphs,” IEEE Trans. on Signal Processing, vol. 61, no. 7, pp. 1644–1656, 2013.
    • M. Defferrard, X. Bresson, and P. Vandergheynst, “Convolutional neural networks on graphs with fast localized spectral filtering,” in 30th Conf. Neural Inform. Process. Syst. Barcelona, Spain: Neural Inform. Process. Foundation, 5–10 Dec. 2016, pp. 3844–3858.
    • E. Isufi, A. Loukas, A. Simonetto, and G. Leus, “Autoregressive moving average graph filtering,” IEEE Trans. on Signal Processing, vol. 65, no. 2, pp. 274–288, Jan. 2017.
    • S. Segarra, A. Marques, and A. Ribeiro, “Optimal graph-filter design and applications to distributed linear network operators,” IEEE Trans. on Signal Processing, vol. 65, no. 15, pp. 4117–4131, Aug. 2017.
    • E. Isufi, A. Loukas, A. Simonetto, and G. Leus, “Filtering random graph processes over random time-varying graphs,” IEEE Trans. on Signal Processing, vol. 65, no. 16, pp. 4406–4421, Aug. 2017.
    • A. Ortega, P. Frossard, J. Kovacevic, J. M. F. Moura, and P. Vandergheynst, “Graph signal processing: Overview, challenges and applications,” Proc. IEEE, vol. 106, no. 5, pp. 808–828, May 2018.
    • F. Gama, A. G. Marques, G. Leus, and A. Ribeiro, “Convolutional neural network architectures for signals supported on graphs,” IEEE Trans. on Signal Processing, vol. 67, no. 4, pp. 1034–1049, Feb. 2019.
    • J. Liu, E. Isufi, and G. Leus, “Filter design for autoregressive moving average graph filters,” IEEE Trans. on Signal Information Processing and Networking, vol. 5, no. 1, pp. 47–60, Mar. 2019.
    • M. Coutino, E. Isufi, and G. Leus, “Advances in distributed graph filtering,” IEEE Trans. on Signal Processing, vol. 67, no. 9, pp. 2320–2333, May 2019.
    • E. Isufi, F. Gama, and A. Ribeiro, “EdgeNets: edge varying graph neural networks,” arXiv:2001.07620v1 [cs.LG], 21 Jan. 2020. [Online]. Available:


    Basics in digital signal processing, linear algebra, optimization and machine learning.

    Short Bio

    Geert Leus received the M.Sc. and Ph.D. degrees in Electrical Engineering from KU Leuven, Belgium, in June 1996 and May 2000, respectively. He is now an "Antoni van Leeuwenhoek" Full Professor at the Faculty of Electrical Engineering, Mathematics and Computer Science of the Delft University of Technology, The Netherlands. His research interests are in the broad area of signal processing, with a specific focus on wireless communications, array processing, sensor networks, and graph signal processing. Geert Leus received a 2002 IEEE Signal Processing Society Young Author Best Paper Award and a 2005 IEEE Signal Processing Society Best Paper Award. He is a Fellow of the IEEE and a Fellow of EURASIP. Geert Leus was a Member-at-Large of the Board of Governors of the IEEE Signal Processing Society, the Chair of the IEEE Signal Processing for Communications and Networking Technical Committee, a Member of the IEEE Sensor Array and Multichannel Technical Committee, and the Editor in Chief of the EURASIP Journal on Advances in Signal Processing. He was also on the Editorial Boards of the IEEE Transactions on Signal Processing, the IEEE Transactions on Wireless Communications, the IEEE Signal Processing Letters, and the EURASIP Journal on Advances in Signal Processing. Currently, he is the Chair of the EURASIP Technical Area Committee on Signal Processing for Multisensor Systems, a Member of the IEEE Signal Processing Theory and Methods Technical Committee, a Member of the IEEE Big Data Special Interest Group, an Associate Editor of Foundations and Trends in Signal Processing, and the Editor in Chief of EURASIP Signal Processing.

    Andy Liaw
    (Merck Research Labs) [introductory]
    Deep Learning and Statistics: Better Together

    Debora Marks
    (Harvard Medical School) [intermediate]
    Protein Design Using Deep Learning

    Abdelrahman Mohamed
    (Facebook AI Research) [introductory/advanced]
    Recent Advances in Automatic Speech Recognition

    Sayan Mukherjee
    (Duke University) [introductory/intermediate]
    Integrating Deep Learning with Statistical Modeling

    Hermann Ney
    (RWTH Aachen University) [intermediate/advanced]
    Speech Recognition and Machine Translation: From Statistical Decision Theory to Machine Learning and Deep Neural Networks

    Lyle John Palmer
    (University of Adelaide) [introductory/advanced]
    Epidemiology for Machine Learning Investigators

    Razvan Pascanu
    (DeepMind) [intermediate/advanced]
    Understanding Learning Dynamics in Deep Learning and Deep Reinforcement Learning

    Jan Peters
    (Technical University of Darmstadt) [intermediate]
    Robot Learning

    José C. Príncipe
    (University of Florida) [intermediate/advanced]
    Cognitive Architectures for Object Recognition in Video

    Björn W. Schuller
    (Imperial College London) [introductory/intermediate]
    Deep Signal Processing

    Sargur N. Srihari
    (University at Buffalo) [introductory]
    Generative Models in Deep Learning

    Gaël Varoquaux
    (INRIA) [intermediate]
    Representation Learning in Limited Data Settings

    René Vidal
    (Johns Hopkins University) [intermediate/advanced]
    Mathematics of Deep Learning


    The past few years have seen a dramatic increase in the performance of recognition systems thanks to the introduction of deep networks for representation learning. However, the mathematical reasons for this success remain elusive. For example, a key issue is that the neural network training problem is nonconvex, hence optimization algorithms are not guaranteed to return a global minimum. The first part of this tutorial will overview recent work on the theory of deep learning that aims to understand how to design the network architecture, how to regularize the network weights, and how to guarantee global optimality. The second part of this tutorial will present sufficient conditions to guarantee that local minima are globally optimal and that a local descent strategy can reach a global minimum from any initialization. Such conditions apply to problems in matrix factorization, tensor factorization and deep learning. The third part of this tutorial will present an analysis of dropout for matrix factorization, and establish connections


      1. Introduction to Deep Learning Theory: Optimization, Regularization and Architecture Design
      2. Global Optimality in Matrix Factorization
      3. Global Optimality in Tensor Factorization and Deep Learning
      4. Dropout as a Low-Rank Regularizer for Matrix Factorization



    Basic understanding of sparse and low-rank representation and non-convex optimization.

    Short Bio

    René Vidal is a Professor of Biomedical Engineering and the Inaugural Director of the Mathematical Institute for Data Science at The Johns Hopkins University. His research focuses on the development of theory and algorithms for the analysis of complex high-dimensional datasets such as images, videos, time-series and biomedical data. Dr. Vidal has been Associate Editor of TPAMI and CVIU, Program Chair of ICCV and CVPR, co-author of the book 'Generalized Principal Component Analysis' (2016), and co-author of more than 200 articles in machine learning, computer vision, biomedical image analysis, hybrid systems, robotics and signal processing. He is a Fellow of the IEEE, IAPR and Sloan Foundation, an ONR Young Investigator, and has received numerous awards for his work, including the 2012 J.K. Aggarwal Prize for “outstanding contributions to generalized principal component analysis (GPCA) and subspace clustering in computer vision and pattern recognition” as well as best paper awards in machine learning, computer vision, controls, and medical robotics.