Course Description

 

Keynotes


    Nello Cristianini
    (University of Bristol)
    Data, Intelligence and Shortcuts

    Summary

    Many familiar dilemmas that we find in the application of data-driven AI have their origins in technical-mathematical choices that we have made along the way to this version of AI. Several of them might need to be reconsidered in order for the field to move forward. After reviewing some of the current problems related to AI, we trace their cultural, technical and economic origins, then we discuss possible solutions.

    Short Bio

    Nello Cristianini is Professor of Artificial Intelligence at the University of Bristol. His research covers machine learning methods, applications of AI to the analysis of media content, and the social and ethical implications of AI. Cristianini is the co-author of two widely known books in machine learning, as well as a book in bioinformatics. He is a recipient of the Royal Society Wolfson Research Merit Award and of a European Research Council Advanced Grant. Before joining the University of Bristol, he was a professor of statistics at the University of California, Davis. Currently he is working on the social and ethical implications of AI. His animated videos dealing with the social aspects of AI can be found here: https://www.youtube.com/seeapattern



    Petia Radeva
    (University of Barcelona)
    Uncertainty Modeling and Deep Learning in Food Analysis

    Summary

    Recently, computer vision approaches, especially those assisted by deep learning techniques, have shown unexpected advances, practically solving problems that were never imagined could be automated, such as face recognition or automated driving. However, food image recognition, due to its high complexity and ambiguity, still remains far from solved. In this talk, we focus on how to combine two challenging research lines: deep learning and uncertainty modeling (epistemic and aleatoric uncertainty). After discussing our methodology to advance in this direction, we comment on potential applications, as well as the social and economic impact of research on food image analysis.

    Short Bio

    Prof. Petia Radeva is a Full Professor at the Universitat de Barcelona (UB), PI of the Consolidated Research Group "Computer Vision and Machine Learning" at UB (CVUB, www.ub.edu/cvub) and a senior researcher at the Computer Vision Center (www.cvc.uab.es). She has been PI for UB in 4 European, 3 international and more than 20 national projects devoted to applying computer vision and machine learning to real problems such as food intake monitoring (e.g. for patients with kidney transplants and for older people). Petia Radeva has been a REA-FET-OPEN vice-chair since 2015, and an international mentor in the Wild Cards EIT program since 2017.

    She is an Associate editor of Pattern Recognition journal (Q1) and International Journal of Visual Communication and Image Representation (Q2).

    Petia Radeva has been an IAPR Fellow since 2015 and has held an ICREA Academia award, given to the 30 best scientists in Catalonia for their scientific merits, since 2014. She has received several international awards (the "Aurora Pons Porrata" award of CIARP, the "Antonio Caparrós" prize for the best technology transfer of UB, etc.).

    She has supervised 18 PhD students and published more than 100 SCI journal papers and 250 international book chapters and proceedings; her Google Scholar h-index is 44, with more than 7,600 citations.



    Indrė Žliobaitė
    (University of Helsinki)
    Any Hope for Deep Learning in Deep Time?

    Summary

    This talk will be about machine learning for science. Deep time refers to many millions of years of world history recorded in sedimentary rocks. The world today captures only a snapshot of the ecosystems, environments and climates that can possibly exist. Looking at the past is essential for understanding ongoing changes in the natural world, resource use and possible futures. I will discuss computational approaches to reconstructing past worlds and ecosystems, as well as analyzing environmental contexts of human evolution. Very little deep learning has been used so far, although the will is strong. I will highlight the main challenges and speculate about opportunities.

    Short Bio

    Indre Zliobaite is a tenure-track professor at the University of Helsinki, Finland, where she leads a research group on data science and evolution. She is also in charge of the global database of fossil mammals, called NOW. Zliobaite's research has contributed to the foundations of fairness-aware machine learning, machine learning with evolving data, as well as evolutionary theory.


    Courses


    Ignacio Arganda-Carreras
    (University of the Basque Country) [introductory/intermediate]
    Deep Learning for Bioimage Analysis

    Summary

    Deep learning, the latest extension of machine learning, has pushed the accuracy of algorithms to unseen limits, especially for perceptual problems such as the ones tackled by computer vision and image analysis. This workshop will cover the foundations of the field, the communities organized around it, some important tools and resources to get started with these techniques, and the latest applications of deep learning in the field of bioimage analysis. In particular, we will focus on the problems of semantic and instance segmentation of biological images, unsupervised image denoising and deep learning-based super-resolution. All classes will have a theoretical part followed by a hands-on practical session.
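
    To make the segmentation problems concrete: semantic segmentation amounts to classifying every pixel of an image. Below is a minimal, illustrative PyTorch sketch (my choice of framework, not one prescribed by the course) that trains a tiny convolutional network on synthetic stand-ins for microscopy data:

        # Minimal semantic-segmentation sketch: a tiny convolutional network
        # trained to classify every pixel (here on synthetic data).
        import torch
        import torch.nn as nn

        num_classes = 3
        net = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(16, num_classes, kernel_size=1),  # per-pixel class logits
        )

        images = torch.randn(8, 1, 64, 64)                   # stand-in for micrographs
        labels = torch.randint(0, num_classes, (8, 64, 64))  # per-pixel ground truth

        optimizer = torch.optim.Adam(net.parameters(), lr=1e-3)
        loss_fn = nn.CrossEntropyLoss()                      # applied pixel-wise

        for step in range(100):
            optimizer.zero_grad()
            logits = net(images)            # (batch, classes, H, W)
            loss = loss_fn(logits, labels)  # averages over all pixels
            loss.backward()
            optimizer.step()

        prediction = net(images).argmax(dim=1)  # (batch, H, W) label map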

    Syllabus

    • Introduction to deep learning in bioimage analysis.
    • Deep learning-based super-resolution of biomedical images.
    • Semantic and instance segmentation of microscopy image data.
    • Unsupervised image denoising of biological image data.

    References

    • Weigert, M., Schmidt, U., Boothe, T., Müller, A., Dibrov, A., Jain, A., Wilhelm, B., Schmidt, D., Broaddus, C., Culley, S. and Rocha-Martins, M., 2018. Content-aware image restoration: pushing the limits of fluorescence microscopy. Nature Methods, 15(12), pp. 1090-1097.
    • Schmidt, U., Weigert, M., Broaddus, C. and Myers, G., 2018, September. Cell detection with star-convex polygons. In International Conference on Medical Image Computing and Computer-Assisted Intervention (pp. 265-273). Springer, Cham.
    • Lehtinen, J., Munkberg, J., Hasselgren, J., Laine, S., Karras, T., Aittala, M. and Aila, T., 2018. Noise2Noise: Learning image restoration without clean data. arXiv preprint arXiv:1803.04189.
    • Krull, A., Buchholz, T.O. and Jug, F., 2019. Noise2Void: learning denoising from single noisy images. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (pp. 2129-2137).
    • Gómez-de-Mariscal, E., García-López-de-Haro, C., Donati, L., Unser, M., Muñoz-Barrutia, A. and Sage, D., 2019. DeepImageJ: A user-friendly plugin to run deep learning models in ImageJ. bioRxiv, p. 799270.
    • Von Chamier, L., Jukkala, J., Spahn, C., Lerche, M., Hernández-Pérez, S., Mattila, P., Karinou, E., Holden, S., Solak, A.C., Krull, A. and Buchholz, T.O., 2020. ZeroCostDL4Mic: an open platform to simplify access and use of Deep-Learning in Microscopy. bioRxiv.

    Pre-requisites

    Mathematics at the level of an undergraduate degree in computer science: basic multivariate calculus, probability theory, and linear algebra.

    Short Bio

    Ignacio Arganda-Carreras is an Ikerbasque Research Fellow at the Department of Computer Science and Artificial Intelligence of the UPV/EHU, also associated with the Donostia International Physics Center (DIPC). He is one of the founders of Fiji, one of the most popular open-source image processing packages in the world, widely used by the bio-image analysis community. His lab focuses on image processing and machine learning, especially the development of open-source computer vision methods for biomedical images. For publications, see https://scholar.google.com/citations?user=02VpQlGwa_kC&hl=en



    Thomas G. Dietterich
    (Oregon State University) [introductory]
    Machine Learning Methods for Robust Artificial Intelligence

    Summary

    How can we develop machine learning methods that we can trust in high-risk applications? We need ML methods that know their own range of competence, so that they can detect when input queries lie outside that range. This class will present several related areas of ML research that seek to achieve this goal, including (a) classification with a "reject" option, (b) calibrated confidence estimation, (c) out-of-distribution detection, and (d) open category detection. The first two topics focus on problems where the training set and test set come from the same distribution, and the classifier must assess its own competence on each test instance. The latter two topics can be viewed as applications of anomaly detection, so we will study anomaly detection methods both for featurized data and for signal data (e.g., images), where a good feature space must be learned. Our discussion of anomaly detection will be complementary to Peter Rousseeuw's course (which makes a good companion).
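
    To illustrate topic (a): a classifier with a "reject" option can be built by abstaining whenever the maximum predicted class probability falls below a threshold, trading coverage for accuracy. A minimal sketch using scikit-learn (my choice of library, purely for illustration):

        # Classification with a "reject" option: abstain when the classifier's
        # maximum predicted probability is below a confidence threshold.
        import numpy as np
        from sklearn.datasets import make_classification
        from sklearn.linear_model import LogisticRegression
        from sklearn.model_selection import train_test_split

        X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
        X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

        clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
        proba = clf.predict_proba(X_test)

        threshold = 0.8
        confident = proba.max(axis=1) >= threshold   # accept only confident queries
        accepted_preds = proba.argmax(axis=1)[confident]

        coverage = confident.mean()
        accuracy = (accepted_preds == y_test[confident]).mean()
        print(f"coverage={coverage:.2f}, accuracy on accepted={accuracy:.2f}")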

    Syllabus

      • Obtaining calibrated probabilities from supervised classifiers
      • Achieving high accuracy classification by abstaining on test queries
      • Obtaining calibrated prediction intervals for regression problems
      • Anomaly detection methods for feature-vector data
      • Anomaly detection methods for images
      • Open category detection: detecting when a test instance belongs to a class not seen during training.

    References

    • Niculescu-Mizil, A., & Caruana, R. (2005). Predicting good probabilities with supervised learning. Proceedings of the 22nd International Conference on Machine Learning ICML ’05, (2005), 625–632. http://doi.org/10.1145/1102351.1102430

    • Guo, C., Pleiss, G., Sun, Y., & Weinberger, K. Q. (2017). On Calibration of Modern Neural Networks. http://arxiv.org/abs/1706.04599

    • Romano, Y., Patterson, E., & Candès, E. J. (2019). Conformalized Quantile Regression. http://arxiv.org/abs/1905.03222

    • Shafer, G., & Vovk, V. (2008). A tutorial on conformal prediction. Journal of Machine Learning Research, 9, 371–421. Retrieved from http://arxiv.org/abs/0706.3188

    • Cortes, C., DeSalvo, G., & Mohri, M. (2016). Learning with rejection. Lecture Notes in Artificial Intelligence, 9925 LNAI, 67–82. http://doi.org/10.1007/978-3-319-46379-7_5

    • Liu, F. T., Ting, K. M., & Zhou, Z.-H. (2012). Isolation-Based Anomaly Detection. ACM Transactions on Knowledge Discovery from Data, 6(1), 1–39. http://doi.org/10.1145/2133360.2133363

    • Emmott, A., Das, S., Dietterich, T., Fern, A., & Wong, W.-K. (2015). Systematic construction of anomaly detection benchmarks from real data. https://arxiv.org/abs/1503.01158

    • Siddiqui, A., Fern, A., Dietterich, T. G., & Das, S. (2016). Finite Sample Complexity of Rare Pattern Anomaly Detection. In Proceedings of UAI-2016 (p. 10). http://auai.org/uai2016/proceedings/papers/226.pdf

    • Bulusu, S., Kailkhura, B., Li, B., Varshney, P. K., & Song, D. (2020). Anomalous Instance Detection in Deep Learning: A Survey. ArXiv, 2003.06979(v1). http://arxiv.org/abs/2003.06979

    • Bendale, A., & Boult, T. (2016). Towards Open Set Deep Networks. In CVPR 2016 (pp. 1563–1572). http://doi.org/10.1109/CVPR.2016.173

    • Liu, S., Garrepalli, R., Dietterich, T. G., Fern, A., & Hendrycks, D. (2018). Open Category Detection with PAC Guarantees. Proceedings of the 35th International Conference on Machine Learning, PMLR, 80, 3169–3178. http://proceedings.mlr.press/v80/liu18e.html

    • Boult, T. E., Cruz, S., Dhamija, A., Gunther, M., Henrydoss, J., & Scheirer, W. (2019). Learning and the Unknown: Surveying Steps Toward Open World Recognition. AAAI 2019.

    Pre-requisites

    Familiarity with standard machine learning methods such as decision trees, random forests, and support vector machines. Basic knowledge of deep learning for images. Basic knowledge of probability and principal component analysis.

    Short Bio

    Thomas Dietterich (PhD Stanford, 1985) is Distinguished Professor (Emeritus) of Computer Science at Oregon State University. Dietterich is one of the pioneers of the field of Machine Learning and has authored more than 200 refereed publications and two books. His current research topics include robust artificial intelligence (calibration and anomaly detection), robust human-AI systems, and applications in sustainability. He is a former president of the International Machine Learning Society (the parent organization of ICML) and the Association for the Advancement of Artificial Intelligence. He is one of the moderators of the cs.LG category on arXiv.



    Georgios Giannakis
    (University of Minnesota) [advanced]
    Ensembles for Online, Interactive and Deep Learning Machines with Scalability and Adaptivity

    Summary

    Inference of functions from data is ubiquitous in statistical learning. This course deals with Gaussian process (GP) based approaches that not only learn over a class of nonlinear functions, but also quantify the associated uncertainty. To cope with the curse of dimensionality, random Fourier feature (RF) vectors lead to parametric GP-RF function models that offer scalable estimators. The course will next focus on online learning with ensembles (E) of GP-RF learners, each with a distinct kernel belonging to a prescribed dictionary, that jointly learn a much richer class of functions. Whether in batch or online form, EGPs remain robust to dynamics captured by adaptive Kalman filters. The ability to cope with unknown dynamics and to quantify uncertainty is critical, especially in adversarial settings. EGP performance can be refined online, and it is benchmarked using regret analysis. Further, the course will cross-fertilize ideas from deep Gaussian processes and EGPs in order to gain degrees of freedom. The broader applicability of EGPs will also be demonstrated for interactive optimization and policy evaluation in reinforcement learning.
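
    To sketch the random-feature idea: for a Gaussian kernel, random Fourier features z(x) = sqrt(2/D) cos(Wx + b) turn the nonparametric GP into Bayesian linear regression in a D-dimensional feature space, yielding scalable predictive means and variances. A minimal numerical illustration (the notation and values are mine, not the course's):

        # Random Fourier features: approximate an RBF-kernel GP by Bayesian
        # linear regression on z(x) = sqrt(2/D) * cos(W x + b).
        import numpy as np

        rng = np.random.default_rng(0)
        n, D, sigma, noise = 200, 100, 1.0, 0.1

        X = rng.uniform(-3, 3, size=(n, 1))
        y = np.sin(2 * X[:, 0]) + noise * rng.standard_normal(n)

        W = rng.standard_normal((1, D)) / sigma       # spectral samples of the RBF kernel
        b = rng.uniform(0, 2 * np.pi, D)
        phi = lambda X: np.sqrt(2.0 / D) * np.cos(X @ W + b)

        Z = phi(X)                                      # (n, D) feature matrix
        A = Z.T @ Z / noise**2 + np.eye(D)              # posterior precision (unit prior)
        mean_w = np.linalg.solve(A, Z.T @ y) / noise**2 # posterior mean of the weights

        X_test = np.linspace(-3, 3, 50)[:, None]
        Z_test = phi(X_test)
        pred_mean = Z_test @ mean_w                     # scalable GP predictive mean
        pred_var = noise**2 + np.sum(Z_test * np.linalg.solve(A, Z_test.T).T, axis=1)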

    Syllabus

     — Day 1: Online Scalable Learning Adaptive to Unknown Dynamics and Graphs – Part I: Multi-kernel Approaches
         — Kernel based methods exhibit well-documented performance in various nonlinear learning tasks. Most of them rely on a preselected kernel, whose prudent choice presumes task-specific prior information. Especially when the latter is not available, multi-kernel learning has gained popularity thanks to its flexibility in choosing kernels from a prescribed kernel dictionary. Leveraging the random feature approximation, this lecture will first introduce, for static setups, a scalable multi-kernel learning approach (termed Raker) to obtain the sought nonlinear learning function 'on the fly,' bypassing the curse of dimensionality associated with kernel methods. We will also present an adaptive multi-kernel learning scheme (termed AdaRaker) that relies on weighted combinations of advice from hierarchical ensembles of experts to boost performance in dynamic environments. The weights account not only for each kernel's contribution to the learning process, but also for the unknown dynamics. Performance is analyzed in terms of both static and dynamic regrets. AdaRaker is uniquely capable of tracking nonlinear learning functions in environments with unknown dynamics, with analytic performance guarantees. The approach is further tailored for online graph-adaptive learning with scalability and privacy. Tests with synthetic and real datasets will showcase the effectiveness of the novel algorithms.
    
     — Day 2: Online Scalable Learning with Adaptivity and Robustness – Part II: Deep and Ensemble GPs
         — Approximation and inference of functions from data are ubiquitous tasks in statistical learning theory and applications. Among relevant approaches with growing popularity, this lecture deals with Gaussian process (GP) based approaches that not only learn over a class of nonlinear functions, but also quantify the associated uncertainty. To cope with the curse of dimensionality in this context, random Fourier feature (RF) vectors lead to parametric GP-RF function models that offer scalable forms of Wiener's minimum mean-square error approach. The lecture will next touch upon deep GP architectures, and will further focus on a weighted ensemble (E) of GP-RF learners, each with a distinct covariance (kernel) belonging to a prescribed dictionary, that jointly learn a much richer class of functions. In addition to robustness, these ensembles can operate in either batch or online form interactively, even for dynamic functions, along the lines of adaptive Kalman filters. The performance of EGP-based learning will be benchmarked using regret analysis. The broader applicability of EGPs will also be demonstrated for policy evaluation in reinforcement learning, with the kernel(s) selected interactively on the fly. Case studies will highlight the merits of deep and ensemble GPs.

    References

     — G. B. Giannakis, Y. Shen, and G. V. Karanikolas, "Topology Identification and Learning over Graphs: Accounting for Nonlinearities and Dynamics," Proceedings of the IEEE, vol. 106, no. 5, pp. 787-807, May 2018.
     — Q. Lu, G. V. Karanikolas, Y. Shen, and G. B. Giannakis, "Ensemble Gaussian Processes with Spectral Features for Online Interactive Learning with Scalability," Proc. of 23rd Intl. Conf. on Artificial Intelligence and Statistics, Palermo, Italy, June 3-5, 2020.
     — A. Rahimi and B. Recht, “Random features for large scale kernel machines,” Proc. Advances in Neural Info. Process. Syst., pp. 1177-1184, Canada, Dec. 2008.
     — C. Rasmussen, C. Williams, “Gaussian processes for machine learning,” MIT Press, Cambridge, 2006.
     — S. Shalev-Shwartz, “Online learning and online convex optimization,” Foundations and Trends in Machine Learning, vol. 4, no. 2, pp. 107–194, 2011.
     — Y. Shen, T. Chen and G. B. Giannakis, “Random Feature-based Online Multi-kernel Learning in Environments with Unknown Dynamics,” Journal of Machine Learning Research, vol. 20, no. 22, pp. 1-36, February 2019.
     — Y. Shen, G. Leus, and G. B. Giannakis, “Online Graph-Adaptive Learning with Scalability and Privacy,” IEEE Transactions on Signal Processing, vol. 67, no. 9, pp. 2471-2483, May 2019.

    Pre-requisites

     — Graduate-level courses in Random Processes, Linear Algebra, and Machine Learning


    Sergei V. Gleyzer
    (University of Alabama) [introductory/intermediate]
    Machine Learning Fundamentals and Their Applications to Very Large Scientific Data: Rare Signal and Feature Extraction, End-to-end Deep Learning, Uncertainty Estimation and Realtime Machine Learning Applications in Software and Hardware

    Summary

    Deep learning has become one of the most widely used tools in modern science and engineering, leading to breakthroughs in many areas and disciplines ranging from computer vision to natural language processing to physics and medicine. This mini-course will introduce the basics of machine learning and classification theory based on statistical learning and describe two classes of popular algorithms in depth: decision and rule-based methods (decision trees, decision rules, bagging and boosting, random forests) and deep neural network-based models of various types (fully-connected, convolutional, recurrent, recursive and graph neural networks). The course will focus on practical applications in analysis of large scientific data, interpretability, uncertainty estimation and how to best extract meaningful features, while implementing realtime deep learning in software and hardware. No previous machine learning background is required.
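
    As a small taste of the tree-based methods covered, here is a minimal random-forest classifier on an imbalanced synthetic dataset, standing in for a rare-signal search (scikit-learn is my choice of tool here, not necessarily the course's):

        # Random forest for a toy signal-vs-background classification task.
        from sklearn.datasets import make_classification
        from sklearn.ensemble import RandomForestClassifier
        from sklearn.model_selection import train_test_split

        # Imbalanced toy data: 5% "signal", 95% "background".
        X, y = make_classification(n_samples=10000, n_features=10,
                                   weights=[0.95, 0.05], random_state=0)
        X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

        forest = RandomForestClassifier(n_estimators=200, random_state=0)
        forest.fit(X_train, y_train)

        # Feature importances hint at which variables carry the signal.
        print(forest.feature_importances_)
        print("test accuracy:", forest.score(X_test, y_test))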

    Syllabus

    • Introduction to Machine Learning: Theoretical Foundation, Classification Theory
    • Practical Applications and Examples in Sciences and Engineering with Large Scientific Data (LHC/VRO)
    • Tree-based Algorithms: decision trees, rules, bagging, boosting, random forests
    • Deep Learning Methods: theory, fully-connected networks, convolutional, recurrent and recursive networks, graph networks and geometric deep learning
    • Fundamentals of Feature Extraction and End-to-end Deep Learning
    • Uncertainty Estimation and Machine Learning Model Interpretations
    • Realtime Implementation of Deep Learning in Software and Hardware

    References

    • I. Goodfellow, Y. Bengio and A. Courville, “Deep Learning” MIT Press 2016
    • G. James et al., “Introduction to Statistical Learning” Springer 2013
    • C.M. Bishop “Pattern Recognition and Machine Learning” Springer 2006
    • J. R. Quinlan “C4.5: Programs for Machine Learning” Morgan Kaufmann 1992

    Pre-requisites

    None

    Short Bio

    Sergei Gleyzer is a particle physicist and university professor working at the interface of particle physics and machine learning, building more intelligent systems to extract meaningful information from the data collected by the Large Hadron Collider (LHC), the world's highest-energy particle physics experiment, located at the CERN laboratory near Geneva, Switzerland. He is a co-discoverer of the Higgs boson and founder of several major machine learning initiatives, such as the Inter-experimental Machine Learning Working Group and the Compact Muon Solenoid experiment's Machine Learning Forum. Professor Gleyzer works on applying advanced machine learning methods to searches for new physics, such as dark matter.



    Çağlar Gülçehre
    (DeepMind) [intermediate/advanced]
    Deep Reinforcement Learning


    Balázs Kégl
    (Huawei Technologies) [introductory]
    Deep Model-based Reinforcement Learning


    Ludmila Kuncheva
    (Bangor University) [intermediate]
    Classifier Ensembles in the Era of Deep Learning

    Summary

    Do the recent advances in deep learning spell the demise of classifier ensembles? I would say no. While the philosophies of the two approaches are different, there is still a way to harness the benefits of both. Ensemble learning sprang from the idea of combining simple classifiers to solve complex classification problems. Deep learning, on the other hand, throws one monolithic, complex model at the problem. There is room, however, for combining deep learners in a quest to boost the deep learner's accuracy. Ensembles of deep learners do not follow the standard design methodologies; for example, we cannot afford to build a large ensemble due to computational limitations. Such ensembles require bespoke design strategies and flexible combination rules. This tutorial will introduce the fundamentals of classifier ensembles and present examples of successful applications of deep learning ensembles.
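
    The simplest combination rule for a handful of deep learners is soft voting: average their class-probability outputs and take the arg max. A minimal sketch (the member networks are left abstract):

        # Soft-voting ensemble: average the class-probability outputs of a few
        # (expensively trained) deep models instead of building a large ensemble.
        import numpy as np

        def soft_vote(prob_list):
            """prob_list: list of (n_samples, n_classes) probability arrays,
            one per ensemble member. Returns the combined label predictions."""
            avg = np.mean(prob_list, axis=0)
            return avg.argmax(axis=1)

        # Stand-ins for the softmax outputs of three trained deep networks.
        rng = np.random.default_rng(0)
        members = [rng.dirichlet(np.ones(5), size=10) for _ in range(3)]
        print(soft_vote(members))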

    Syllabus

      1. Introduction to classifier ensembles.
      2. Deep learning classifiers.
      3. Deep learning ensembles and their applications.

    Pre-requisites

    Basic knowledge of machine learning and pattern recognition.

    Short-Bio

    Ludmila (Lucy) I. Kuncheva is a Professor of Computer Science at Bangor University, UK. Her interests include pattern recognition, and specifically classifier ensembles. She has published two monographs and over 200 research papers. Lucy has won two Best Paper Awards (2006 IEEE TFS and 2003 IEEE TSMC). She is a Fellow of the International Association for Pattern Recognition (IAPR). https://scholar.google.co.uk/citations?user=WIc3assAAAAJ&hl=en



    Vincent Lepetit
    (ENPC ParisTech) [intermediate]
    Deep Learning and 3D Geometry

    Summary

    While Deep Learning in computer vision has long focused on the 2D analysis of images, such as 2D object detection or image segmentation, recent years have seen the development of many approaches applying the power of Deep Learning to 3D perception from color images, to solve problems that were very challenging or even impossible a few years ago. Because Deep Learning and 3D geometry come from very different mathematical worlds, one has to find smart ways to connect them and benefit from both: these approaches often rely on combinations of Deep Learning applied to 2D images and of 3D geometry techniques. In this course, we will review and explain recent approaches to Deep Learning and 3D geometry problems, including 3D object pose estimation, 3D hand pose estimation, feature point detection, self-learning for depth prediction, and 3D scene understanding.
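
    One recurring pattern for connecting the two worlds is to let a deep network predict 2D-3D correspondences and recover the object pose with a classical geometric solver. A minimal sketch using OpenCV's PnP solver, with simulated keypoints standing in for network predictions (the numbers are illustrative):

        # Recover a 6D object pose from 2D-3D correspondences with PnP.
        import numpy as np
        import cv2

        # Known 3D model points (e.g., object keypoints a network was trained on).
        object_points = np.array([[0, 0, 0], [1, 0, 0], [0, 1, 0], [0, 0, 1],
                                  [1, 1, 0], [1, 0, 1]], dtype=np.float64)
        K = np.array([[500.0, 0, 320], [0, 500.0, 240], [0, 0, 1]])  # toy intrinsics

        # Simulate the network's 2D keypoint predictions by projecting with a
        # ground-truth pose and adding noise.
        rvec_true = np.array([[0.1], [0.2], [0.3]])
        tvec_true = np.array([[0.5], [-0.2], [5.0]])
        projected, _ = cv2.projectPoints(object_points, rvec_true, tvec_true, K, None)
        image_points = projected.reshape(-1, 2) \
            + np.random.default_rng(0).normal(0, 0.5, (6, 2))

        # Recover the 6D pose from the 2D-3D correspondences.
        ok, rvec, tvec = cv2.solvePnP(object_points, image_points, K, None)
        print(ok, tvec.ravel())   # close to the true translation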

    Syllabus

    • 3D object pose estimation
    • 3D hand pose estimation
    • Feature point detection
    • Self-learning for depth prediction
    • Differentiable rendering
    • 3D scene understanding

    References

    • B. Tekin, F. Bogo, and M. Pollefeys. H+O: Unified Egocentric Recognition of 3D Hand-Object Poses and Interactions. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2019.
    • A. Grabner, P. M. Roth, and V. Lepetit. 3D Pose Estimation and 3D Model Retrieval for Objects in the Wild. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2018.
    • K. M. Yi, E. Trulls, V. Lepetit, and P. Fua. LIFT: Learned Invariant Feature Transform. In Proceedings of the European Conference on Computer Vision (ECCV), 2016.
    • C. Godard, O. Mac Aodha, and G. J. Brostow. Unsupervised Monocular Depth Estimation with Left-Right Consistency. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017.

    Pre-requisites

    Basic knowledge of Deep Learning applied to computer vision, and of 3D geometry.

    Short Bio

    Vincent Lepetit has been a director of research at ENPC ParisTech since 2019. Prior to joining ENPC, he was a full professor at the Institute for Computer Graphics and Vision, Graz University of Technology, Austria, and before that, a senior researcher at the Computer Vision Laboratory (CVLab) of EPFL, Switzerland. His research interests are at the interface between Machine Learning and 3D Computer Vision, and currently focus on 3D scene understanding from images. He often serves as an area chair for the major computer vision conferences (CVPR, ICCV, ECCV) and is an associate editor for PAMI, IJCV, and CVIU.



    Geert Leus
    (Delft University of Technology) [introductory/intermediate]
    Graph Signal Processing: Introduction and Connections to Distributed Optimization and Deep Learning

    Summary

    The field of graph signal processing extends classical signal processing tools to signals (data) with an irregular structure that can be characterized by means of a graph (e.g., network data). One of the cornerstones of this field is graph filters, direct analogues of time-domain filters, but intended for signals defined on graphs. In this course, we introduce the field of graph signal processing and specifically give an overview of the graph filtering problem. We look at the family of finite impulse response (FIR) and infinite impulse response (IIR) graph filters and show how they can be implemented in a distributed manner. To further limit the communication and computational complexity of such a distributed implementation, we also generalize the state-of-the-art distributed graph filters to filters whose weights show a dependency on the nodes sharing information. These so-called edge-variant graph filters yield significant benefits in terms of filter order reduction and can be used for solving specific distributed optimization problems with an extremely fast convergence. Finally, we will overview how graph filters can be used in deep learning applications involving data sets with an irregular structure. Different types of graph filters can be used in the convolution step of graph convolutional networks, leading to different trade-offs in performance and complexity. The numerical results presented in this course illustrate the potential of graph filters in distributed optimization and deep learning.
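
    Concretely, an FIR graph filter of order K computes y = h_0 x + h_1 Sx + ... + h_K S^K x, where S is a graph shift operator such as the adjacency matrix; each multiplication by S only exchanges values between neighboring nodes, which is what enables a distributed implementation. A minimal sketch (graph and coefficients invented for illustration):

        # FIR graph filter: y = sum_k h_k * S^k x, computed by repeated
        # one-hop exchanges (each S @ x touches only graph neighbors).
        import numpy as np

        # Adjacency matrix of a small undirected graph (the shift operator S).
        S = np.array([[0, 1, 0, 1],
                      [1, 0, 1, 0],
                      [0, 1, 0, 1],
                      [1, 0, 1, 0]], dtype=float)

        h = [0.5, 0.3, 0.2]                  # filter taps h_0, h_1, h_2 (order K = 2)
        x = np.array([1.0, -1.0, 2.0, 0.5])  # graph signal, one value per node

        y = np.zeros_like(x)
        shifted = x.copy()
        for h_k in h:
            y += h_k * shifted               # accumulate h_k * S^k x
            shifted = S @ shifted            # one more hop of local exchanges
        print(y)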

    Syllabus

    • Introduction to graph signal processing
    • Graph filters and their extensions
    • Connections to distributed optimization as well as related applications
    • Connections to deep learning as well as related applications

    References

    • D. I. Shuman, P. Vandergheynst, and P. Frossard, “Chebyshev polynomial approximation for distributed signal processing,” in IEEE International Conference on Distributed Computing in Sensor Systems and Workshops (DCOSS), 2011, pp. 1–8.
    • D. I. Shuman, S. K. Narang, P. Frossard, A. Ortega, and P. Vandergheynst, “The emerging field of signal processing on graphs: Extending high-dimensional data analysis to networks and other irregular domains,” IEEE Signal Processing Magazine, vol. 30, no. 3, pp. 83–98, 2013.
    • A. Sandryhaila and J. M. Moura, “Discrete signal processing on graphs,” IEEE Trans. on Signal Processing, vol. 61, no. 7, pp. 1644–1656, 2013.
    • M. Defferrard, X. Bresson, and P. Vandergheynst, “Convolutional neural networks on graphs with fast localized spectral filtering,” in 30th Conf. Neural Inform. Process. Syst., Barcelona, Spain, 5-10 Dec. 2016, pp. 3844–3858.
    • E. Isufi, A. Loukas, A. Simonetto, and G. Leus, “Autoregressive moving average graph filtering,” IEEE Trans. on Signal Processing, vol. 65, no. 2, pp. 274–288, Jan. 2017.
    • S. Segarra, A. Marques, and A. Ribeiro, “Optimal graph-filter design and applications to distributed linear network operators,” IEEE Trans. on Signal Processing, vol. 65, no. 15, pp. 4117–4131, 1 Aug. 2017.
    • E. Isufi, A. Loukas, A. Simonetto, and G. Leus, “Filtering random graph processes over random time-varying graphs,” IEEE Trans. on Signal Processing, vol. 65, no. 16, pp. 4406–4421, Aug. 2017.
    • A. Ortega, P. Frossard, J. Kovacevic, J. M. F. Moura, and P. Vandergheynst, “Graph signal processing: Overview, challenges and applications,” Proc. IEEE, vol. 106, no. 5, pp. 808–828, May 2018.
    • F. Gama, A. G. Marques, G. Leus, and A. Ribeiro, “Convolutional neural network architectures for signals supported on graphs,” IEEE Trans. on Signal Processing, vol. 67, no. 4, pp. 1034–1049, Feb. 2019.
    • J. Liu, E. Isufi, and G. Leus, “Filter design for autoregressive moving average graph filters,” IEEE Trans. on Signal and Information Processing over Networks, vol. 5, no. 1, pp. 47–60, Mar. 2019.
    • M. Coutino, E. Isufi, and G. Leus, “Advances in distributed graph filtering,” IEEE Trans. on Signal Processing, vol. 67, no. 9, pp. 2320–2333, May 2019.
    • E. Isufi, F. Gama, and A. Ribeiro, “EdgeNets: edge varying graph neural networks,” arXiv:2001.07620v1 [cs.LG], 21 Jan. 2020. [Online]. Available: http://arxiv.org/abs/2001.07620

    Pre-requisites

    Basics in digital signal processing, linear algebra, optimization and machine learning.

    Short Bio

    Geert Leus received the M.Sc. and Ph.D. degrees in Electrical Engineering from the KU Leuven, Belgium, in June 1996 and May 2000, respectively. Geert Leus is now an "Antoni van Leeuwenhoek" Full Professor at the Faculty of Electrical Engineering, Mathematics and Computer Science of the Delft University of Technology, The Netherlands. His research interests are in the broad area of signal processing, with a specific focus on wireless communications, array processing, sensor networks, and graph signal processing. Geert Leus received a 2002 IEEE Signal Processing Society Young Author Best Paper Award and a 2005 IEEE Signal Processing Society Best Paper Award. He is a Fellow of the IEEE and a Fellow of EURASIP. Geert Leus was a Member-at-Large of the Board of Governors of the IEEE Signal Processing Society, the Chair of the IEEE Signal Processing for Communications and Networking Technical Committee, a Member of the IEEE Sensor Array and Multichannel Technical Committee, and the Editor in Chief of the EURASIP Journal on Advances in Signal Processing. He was also on the Editorial Boards of the IEEE Transactions on Signal Processing, the IEEE Transactions on Wireless Communications, the IEEE Signal Processing Letters, and the EURASIP Journal on Advances in Signal Processing. Currently, he is the Chair of the EURASIP Technical Area Committee on Signal Processing for Multisensor Systems, a Member of the IEEE Signal Processing Theory and Methods Technical Committee, a Member of the IEEE Big Data Special Interest Group, an Associate Editor of Foundations and Trends in Signal Processing, and the Editor in Chief of EURASIP Signal Processing.



    Andy Liaw
    (Merck Research Labs) [introductory]
    Machine Learning and Statistics: Better Together

    Summary

    Machine Learning and Statistics have many intersections, yet there are many distinct differences. In this course, we will examine the differences and similarities to better understand where each side is coming from and where it is going. Based on this understanding, we will look at ways that machine learning tasks can be enhanced with statistical thinking as well as statistical methods. Finally, we will learn how these methods and tools are used in real life, with examples drawn from pharmaceutical research and development.

    Syllabus

    • ML and Statistics: similarities and differences
    • Beyond prediction: estimating uncertainty
    • Beyond prediction: interpreting models and predictions
    • Example applications in pharmaceutical research and development

    Pre-requisites

    Introductory-level Machine Learning, Basic Statistics

    Short Bio

    Andy Liaw has been doing research and applying statistics and machine learning methods to drug discovery areas such as high-throughput screening, pharmacology, cheminformatics, proteomics, and biomarkers for the past 20 years. He is the author of the R package randomForest and has made several contributions to the open source R software for statistics and data science. He is currently a Senior Principal Scientist at Merck Research Laboratories. He received his Ph.D. in Statistics from Texas A&M University.



    Debora Marks
    (Harvard Medical School) [intermediate]
    Protein Design Using Deep Learning


    Abdelrahman Mohamed
    (Facebook AI Research) [introductory/advanced]
    Recent Advances in Automatic Speech Recognition


    Sayan Mukherjee
    (Duke University) [introductory/intermediate]
    Integrating Deep Learning with Statistical Modeling


    Hermann Ney
    (RWTH Aachen University) [intermediate/advanced]
    Speech Recognition and Machine Translation: From Statistical Decision Theory to Machine Learning and Deep Neural Networks

    Summary

    The last 40 years have seen dramatic progress in machine learning and statistical methods for speech and language processing tasks like speech recognition, handwriting recognition and machine translation. Many of the key statistical concepts were originally developed for speech recognition and language translation. Examples of such key concepts are the Bayes decision rule for minimum error rate and sequence-to-sequence processing using approaches like the alignment mechanism based on hidden Markov models and the attention mechanism based on neural networks. Recently, the accuracy of speech recognition and machine translation has been improved significantly by the use of artificial neural networks and specific architectures, such as deep feedforward multi-layer perceptrons, recurrent neural networks, and attention and transformer architectures. We will discuss these approaches in detail and how they form part of the probabilistic approach.
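
    To make the attention mechanism concrete: given a decoder state (query) and encoder states (keys with associated values), attention computes a normalized similarity over source positions and returns the correspondingly weighted average of the values. A minimal dot-product sketch (dimensions are illustrative):

        # Dot-product attention: weights = softmax(q . k_i), output = weighted
        # average of the encoder values, as used in sequence-to-sequence models.
        import numpy as np

        def attention(q, K, V):
            scores = K @ q                        # similarity of q to each source position
            weights = np.exp(scores - scores.max())
            weights /= weights.sum()              # softmax over source positions
            return weights @ V, weights

        rng = np.random.default_rng(0)
        q = rng.standard_normal(8)                # current decoder state
        K = rng.standard_normal((5, 8))           # encoder states (keys), 5 source positions
        V = rng.standard_normal((5, 16))          # encoder values
        context, weights = attention(q, K, V)
        print(weights)                            # where the model "looks" in the source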

    Syllabus

    • Part 1: Statistical Decision Theory, Machine Learning and Neural Networks.
    • Part 2: Speech Recognition (Time Alignment, Hidden Markov models, sequence-to-sequence processing, neural nets, attention models).
    • Part 3: Machine Translation (Word Alignment, Hidden Markov models, sequence-to-sequence processing, neural nets, attention models).

    References

      • Bourlard, H. and Morgan, N., Connectionist Speech Recognition - A Hybrid Approach, Kluwer Academic Publishers, ISBN 0-7923-9396-1, 1994.
      • L. Deng, D. Yu: Deep learning: methods and applications. Foundations and Trends in Signal Processing, Vol. 7, No. 3–4, pp. 197-387, 2014.
      • D. Jurafsky, J. H. Martin: Speech and Language Processing. Third edition draft, pdf; August 28, 2017.
      • Y. Goldberg: Neural Network Methods in Natural Language Processing. Morgan & Claypool Publishers, Draft, pdf; August 2016.
      • P. Koehn: Statistical Machine Translation, Cambridge University Press, 2010. In addition: Draft of Chapter 13: Neural Machine Translation, pdf, September 22, 2017.

    Pre-requisites

    Familiarity with linear algebra, numerical mathematics, probability and statistics, elementary machine learning.

    Short Bio

    Hermann Ney is a full professor of computer science at RWTH Aachen University, Germany. His main research interests lie in the area of statistical classification, machine learning, neural networks and human language technology and specific applications to speech recognition, machine translation and handwriting recognition.

    In particular, he has worked on dynamic programming and discriminative training for speech recognition, on language modelling and on machine translation. His work has resulted in more than 700 conference and journal papers (h-index 102, 60000+ citations; estimated using Google scholar). He and his team contributed to a large number of European (e.g. TC-STAR, QUAERO, TRANSLECTURES, EU-BRIDGE) and American (e.g. GALE, BOLT, BABEL) large-scale joint projects.

    Hermann Ney is a fellow of both IEEE and ISCA (Int. Speech Communication Association). In 2005, he was the recipient of the Technical Achievement Award of the IEEE Signal Processing Society. In 2010, he was awarded a senior DIGITEO chair at LIMSI/CNRS in Paris, France. In 2013, he received the award of honour of the International Association for Machine Translation. In 2016, he was awarded an advanced grant of the European Research Council (ERC).



    Lyle John Palmer
    (University of Adelaide) [introductory/advanced]
    Epidemiology for Machine Learning Investigators


    Razvan Pascanu
    (DeepMind) [intermediate/advanced]
    Understanding Learning Dynamics in Deep Learning and Deep Reinforcement Learning

    Summary

    One of the main challenges of deep learning is improving data efficiency and ensuring the scalability of the systems used. These issues are rooted in how these systems learn, and in particular in the gradient-based dynamics they rely on. Learning efficiency becomes considerably more problematic when considering the use of neural networks in a reinforcement learning setting. Through this course we will introduce basic concepts behind optimization and gradient-based learning, and discuss the interplay between expressivity, learnability and the role of inductive biases in the success of neural networks. Towards the end of the course we will introduce some open questions on the topic and some recent research directions explored to improve learning efficiency.
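
    A toy example of how gradient-based dynamics depend on curvature: on the quadratic loss f(w) = (lam/2) w^2, gradient descent follows w <- (1 - lr*lam) w and converges only when lr < 2/lam. A minimal numeric sketch (the values are mine):

        # Gradient descent on f(w) = 0.5 * lam * w**2: the iterates follow
        # w <- (1 - lr * lam) * w, so they converge iff |1 - lr * lam| < 1,
        # i.e. lr < 2 / lam. Curvature dictates the usable learning rates.
        lam = 10.0
        for lr in (0.05, 0.15, 0.25):      # below, near, and above 2/lam = 0.2
            w = 1.0
            for _ in range(50):
                w -= lr * lam * w          # gradient step on f'(w) = lam * w
            print(f"lr={lr}: w after 50 steps = {w:.3g}")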

    Syllabus

    • Intro to deep learning basics (covering gradient descent, different families of architectures) 2h
    • Use of neural networks in reinforcement learning settings 1h
    • Recent understanding of learning dynamics and approaches to improve data efficiency 1.5h

    References

    • Deep Learning, Ian Goodfellow, Yoshua Bengio and Aaron Courville
    • On the difficulty of training recurrent neural networks, Razvan Pascanu, Tomas Mikolov, Yoshua Bengio, ICML 2013
    • On the number of linear regions of Deep Neural Networks, Guido Montufar, Razvan Pascanu, Kyunghyun Cho, Yoshua Bengio, NeurIPS 2014
    • Ray Interference: a source of plateaus in deep reinforcement learning, Tom Schaul, Diana Borsa, Joseph Modayil, Razvan Pascanu, arxiv 2019
    • Batch Normalization Biases Residual Blocks Towards the Identity Function in Deep Networks, Soham De, Sam Smith, arxiv 2020

    Pre-requisites

    No strict prior knowledge is required. Familiarity with deep learning and deep reinforcement learning, or interest in these topics, would be beneficial, as the course will cover a lot of ground quickly.

    Short Bio

    Razvan Pascanu is a Research Scientist at DeepMind, London. He obtained a Ph.D. from the University of Montreal under the supervision of Yoshua Bengio, working on different aspects of deep learning, particularly optimization, memory in recurrent models and understanding efficiency of neural networks. While in Montreal he was also a core developer of Theano. Razvan is also one of the organizers of the Eastern European Summer School (www.eeml.eu) and co-organized two NeurIPS workshops on continual learning, and an ICLR workshop on graph nets. He has a wide range of interests and works on topics around deep learning and deep reinforcement learning including optimization, RNNs, meta-learning, continual learning and graph nets.



    Jan Peters
    (Technical University of Darmstadt) [intermediate]
    Robot Learning


    José C. Príncipe
    (University of Florida) [intermediate/advanced]
    Cognitive Architectures for Object Recognition in Video

    Summary

    • I - Requisites for a Cognitive Architecture (intermediate)

      • Processing in space
      • Processing in time with memory
      • Top-down and bottom-up processing
      • Extraction of information from data with generative models
      • Attention

    • II - Putting it all together (intermediate)

      • Empirical Bayes with generative models
      • Clustering of time series with linear state models

    • III - Current work (advanced)

      • Information Theoretic Autoencoders
      • Attention-based video recognition
      • Augmenting Deep Learning with memory

    Short Bio

    Jose C. Principe is a Distinguished Professor of Electrical and Computer Engineering at the University of Florida, where he teaches advanced signal processing, machine learning and artificial neural networks (ANNs). He is Eckis Professor and the Founder and Director of the University of Florida Computational NeuroEngineering Laboratory (CNEL), www.cnel.ufl.edu. The CNEL Lab innovated signal and pattern recognition principles based on information theoretic criteria, as well as filtering in functional spaces. His secondary area of interest is applications to computational neuroscience, brain-machine interfaces and brain dynamics. Dr. Principe is a Fellow of the IEEE, AIMBE, and IAMBE. He received the Gabor Award from the INNS, the Career Achievement Award from the IEEE EMBS, and the Neural Network Pioneer Award of the IEEE CIS. He has been awarded more than 38 patents and has over 800 publications in the areas of adaptive signal processing, control of nonlinear dynamical systems, machine learning and neural networks, and information theoretic learning, with applications to neurotechnology and brain-computer interfaces. He has directed 97 Ph.D. dissertations and 65 Master's theses. In 2000 he wrote an interactive electronic book entitled "Neural and Adaptive Systems", published by John Wiley and Sons, and more recently co-authored several books: "Brain Machine Interface Engineering" (Morgan and Claypool), "Information Theoretic Learning" (Springer), "Kernel Adaptive Filtering" (Wiley) and "System Parameter Adaption: Information Theoretic Criteria and Algorithms" (Elsevier). He has received four honorary doctorates, from Finland, Italy, Brazil and Colombia, and routinely serves on international scientific advisory boards of universities and companies. He has received extensive funding from NSF, NIH and DOD (ONR, DARPA, AFOSR).



    Björn W. Schuller
    (Imperial College London) [introductory/intermediate]
    Deep Signal Processing

    Summary

    This course will deal with deep learning for unimodal, multimodal, and multisensorial signal analysis and synthesis. Modalities mainly include audio, video, text, or physiological signals. Methods shown will, however, be applicable to a broad range of further signal types. We will first deal with pre-processing for denoising, dereverberation, or packet loss concealment. This will be followed by representation learning, such as by convolutional neural networks or sequence-to-sequence encoder-decoder architectures, as a basis for end-to-end learning from raw signals or symbolic representations. Then, we shall discuss modelling for decision making, such as by recurrent neural networks with long short-term memory or gated recurrent units, including handling dynamics by connectionist temporal classification. This will also include discussion of the usage of attention on different levels. We will further elaborate on the impact of topologies, including multiple targets with shared layers, and how to move towards self-shaping networks in the sense of Automatic Machine Learning. In a last part, we will deal with some practical questions. These include data efficiency, such as by weak supervision with the human in the loop, data augmentation, active and semi-supervised learning, transfer learning, self-learning, or generative adversarial networks. Further, we will have a glance at modelling efficiency, such as by squeezing networks. Privacy, trustability, and explainability enhancing solutions will include federated learning, confidence measurement, and diverse means of visualization. The content shown will be accompanied by open-source implementations of according toolkits available on GitHub. Application examples will mainly come from the domains of Affective Computing and mHealth.
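
    As a small illustration of the recurrent modelling stage, the sketch below uses a GRU to consume a sequence of frame-wise features (e.g., audio descriptors) and emit a clip-level decision; it is a generic sketch of mine, not one of the course's released toolkits:

        # Sequence-level decision making with a GRU: consume a sequence of
        # frame-wise features (e.g., audio descriptors) and classify the clip.
        import torch
        import torch.nn as nn

        class SequenceClassifier(nn.Module):
            def __init__(self, n_features=40, hidden=64, n_classes=2):
                super().__init__()
                self.gru = nn.GRU(n_features, hidden, batch_first=True)
                self.head = nn.Linear(hidden, n_classes)

            def forward(self, x):              # x: (batch, time, features)
                _, h = self.gru(x)             # h: (1, batch, hidden), last state
                return self.head(h[-1])        # class logits per clip

        model = SequenceClassifier()
        clips = torch.randn(4, 100, 40)        # 4 clips, 100 frames, 40 features
        print(model(clips).shape)              # -> torch.Size([4, 2])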

    Syllabus

      1. Pre-Processing and Representation Learning (Signal Enhancement, Packet Loss Concealment, CNNs, S2S, end-to-end)
      2. Modelling for Decision Making (Attention, Feature Space Optimisation, RNNs, LSTM, GRUs, CTC, AutoML)
      3. Data and Model Efficiency (GANs, Transfer Learning, Data Augmentation, Weak Supervision, Cooperative Learning, Self-Learning, Squeezing)
      4. Privacy, Trustability, Explainability (e.g., Federated Learning, Confidence Measurement, Visualization)

    References

    The Handbook of Multimodal-Multisensor Interfaces. Vol. 2, S. Oviatt, B. Schuller, P.R. Cohen, D. Sonntag, G. Potamianos, A. Krüger (eds.), 2018

    Pre-requisites

    Basic Machine Learning and Signal Processing knowledge.

    Short Bio

    Björn W. Schuller received his diploma, doctoral degree, habilitation, and Adjunct Teaching Professorship in Machine Intelligence and Signal Processing, all in EE/IT, from TUM in Munich, Germany. He is Full Professor of Artificial Intelligence and the Head of GLAM at Imperial College London, UK, Full Professor and Chair of Embedded Intelligence for Health Care and Wellbeing at the University of Augsburg, Germany, co-founding CEO and current CSO of audEERING, an Audio Intelligence company based near Munich and in Berlin, Germany, and permanent Visiting Professor at HIT, China, amongst other professorships and affiliations. Previous stays include Full Professor at the University of Passau, Germany, Researcher at Joanneum Research in Graz, Austria, and Researcher at the CNRS-LIMSI in Orsay, France. He is a Fellow of the IEEE and Golden Core Awardee of the IEEE Computer Society, Fellow of the ISCA, Fellow of the BCS, President Emeritus of the AAAC, and Senior Member of the ACM. He (co-)authored 900+ publications (30k+ citations, h-index = 83), is Field Chief Editor of Frontiers in Digital Health, and was Editor in Chief of the IEEE Transactions on Affective Computing, amongst manifold further commitments and service to the community. His 30+ awards include having been honoured as one of 40 extraordinary scientists under the age of 40 by the WEF in 2015. He served as Coordinator/PI in 15+ European projects, is an ERC Starting Grantee, and is a consultant of companies such as Barclays, GN, Huawei, and Samsung.



    Sargur N. Srihari
    (University at Buffalo) [introductory]
    Generative Models in Deep Learning


    Gaël Varoquaux
    (INRIA) [intermediate]
    Representation Learning in Limited Data Settings

    Summary

    The success of deep learning hinges on intermediate representations: transformations of the data on which statistical learning is easier. Deep architectures can extract very rich and powerful representations, but they need huge volumes of data. In this course, we will study the fundamentals of simple representations, which are interesting because they can be learned in limited data settings. We will also use them as didactic cases to understand how to build statistical models from data. The goal of the course is to provide the basic mathematical concepts that underlie representations extracted successfully in limited data settings.
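
    As one concrete simple representation: sparse dictionary learning factorizes the data into a dictionary and sparse codes, and the codes then serve as features. A minimal scikit-learn sketch (dataset and parameters are my choices):

        # Sparse dictionary learning on a small dataset: the sparse codes form
        # a simple representation learned without huge data volumes.
        from sklearn.datasets import load_digits
        from sklearn.decomposition import MiniBatchDictionaryLearning

        X = load_digits().data                     # 1797 images of 8x8 digits
        dico = MiniBatchDictionaryLearning(n_components=36, alpha=1.0,
                                           random_state=0)
        codes = dico.fit_transform(X)              # sparse codes: the representation
        print(codes.shape, (codes != 0).mean())    # dimensionality and sparsity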

    Syllabus

    — Shallow representations: what and why?
    — Matrix factorizations and their variants:
        — From PCA to ICA
        — Sparse dictionary learning: formulation and efficient solvers
        — Word vectors demystified
    — Fisher kernels: vector representations from a data model
    — Theory: from likelihood to representation
    — Encoding strings and text
    — Encoding covariances

    References

    • [1] Hyvärinen, A., & Oja, E. (2000). Independent component analysis: algorithms and applications. Neural networks, 13(4-5), 411-430.
    • [2] Mairal, J., Bach, F., Ponce, J., & Sapiro, G. (2010). Online learning for matrix factorization and sparse coding. Journal of Machine Learning Research, 11(Jan), 19-60.
    • [3] Levy, O., & Goldberg, Y. (2014). Neural word embedding as implicit matrix factorization. In Advances in neural information processing systems (pp. 2177-2185).
    • [4] Jaakkola, T., & Haussler, D. (1999). Exploiting generative models in discriminative classifiers. In Advances in neural information processing systems (pp. 487-493).

    Pre-requisites

    — General knowledge of statistical learning
    — Basic knowledge of probability
    — Basic knowledge of linear algebra

    Short Bio

    Gaël Varoquaux is a computer-science researcher at Inria. His research focuses on statistical learning tools for data science and scientific inference. He has pioneered the use of machine learning on brain images to map cognition and pathologies. More generally, he develops tools to make machine learning easier, with statistical models suited for real-life, uncurated data, and software for data science. He co-founded scikit-learn, one of the reference machine-learning toolboxes, and helped build various central tools for data analysis in Python. Varoquaux has contributed key methods for learning on spatial data, matrix factorizations, and modeling covariance matrices. He has a PhD in quantum physics and is a graduate of Ecole Normale Superieure, Paris.



    René Vidal
    (Johns Hopkins University) [intermediate/advanced]
    Mathematics of Deep Learning

    Summary

    The past few years have seen a dramatic increase in the performance of recognition systems thanks to the introduction of deep networks for representation learning. However, the mathematical reasons for this success remain elusive. For example, a key issue is that the neural network training problem is nonconvex, hence optimization algorithms are not guaranteed to return a global minimum. The first part of this tutorial will overview recent work on the theory of deep learning that aims to understand how to design the network architecture, how to regularize the network weights, and how to guarantee global optimality. The second part will present sufficient conditions to guarantee that local minima are globally optimal and that a local descent strategy can reach a global minimum from any initialization. Such conditions apply to problems in matrix factorization, tensor factorization and deep learning. The third part will present an analysis of dropout for matrix factorization, and establish connections with low-rank regularization.
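
    A toy version of the nonconvexity question: the low-rank factorization loss ||X - U V^T||^2 is nonconvex in (U, V), yet plain gradient descent from a small random initialization routinely reaches a global minimum, in line with the "no spurious local minima" results discussed in the tutorial. A minimal numeric sketch (sizes and step size are mine):

        # Gradient descent on the nonconvex factorization loss ||X - U V^T||_F^2.
        # For a rank-r target, random initializations typically reach a global
        # minimum (loss ~ 0), illustrating "no bad local minima" results.
        import numpy as np

        rng = np.random.default_rng(0)
        n, m, r = 20, 15, 3
        X = rng.standard_normal((n, r)) @ rng.standard_normal((r, m))  # rank-r target

        U = 0.1 * rng.standard_normal((n, r))
        V = 0.1 * rng.standard_normal((m, r))
        lr = 0.01
        for _ in range(5000):
            R = U @ V.T - X                 # residual
            U, V = U - lr * R @ V, V - lr * R.T @ U   # simultaneous gradient steps
        loss = np.linalg.norm(U @ V.T - X) ** 2
        print(f"final loss: {loss:.2e}")    # close to zero at a global minimum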

    Syllabus

      1. Introduction to Deep Learning Theory: Optimization, Regularization and Architecture Design
      2. Global Optimality in Matrix Factorization
      3. Global Optimality in Tensor Factorization and Deep Learning
      4. Dropout as a Low-Rank Regularizer for Matrix Factorization

    Pre-requisites

    Basic understanding of sparse and low-rank representation and non-convex optimization.

    Short Bio

    Rene Vidal is a Professor of Biomedical Engineering and the Inaugural Director of the Mathematical Institute for Data Science at The Johns Hopkins University. His research focuses on the development of theory and algorithms for the analysis of complex high-dimensional datasets such as images, videos, time-series and biomedical data. Dr. Vidal has been Associate Editor of TPAMI and CVIU, Program Chair of ICCV and CVPR, co-author of the book 'Generalized Principal Component Analysis' (2016), and co-author of more than 200 articles in machine learning, computer vision, biomedical image analysis, hybrid systems, robotics and signal processing. He is a fellow of the IEEE, the IAPR and the Sloan Foundation, an ONR Young Investigator, and has received numerous awards for his work, including the 2012 J.K. Aggarwal Prize for "outstanding contributions to generalized principal component analysis (GPCA) and subspace clustering in computer vision and pattern recognition", as well as best paper awards in machine learning, computer vision, controls, and medical robotics.



    Ming-Hsuan Yang
    (University of California, Merced) [intermediate/advanced]
    Learning to Track Objects

    Summary

    The goal is to introduce recent advances in object tracking based on deep learning and related approaches. Performance evaluation and challenging factors in this field will also be discussed.

    Syllabus

    • Brief history of visual tracking
    • Generative approach
    • Discriminative approach
    • Deep learning methods
    • Performance evaluation
    • Challenges and future research directions

    References

    • Y. Wu, J. Lim, and M.-H. Yang, Object Tracking Benchmark, IEEE Transactions on Pattern Analysis and Machine Intelligence, 2015.
    • H. Nam and B. Han, Learning Multi-domain Convolutional Neural Networks for Visual Tracking, CVPR, 2016.
    • M. Danelljan, G. Bhat, F. Khan, M. Felsberg, ECO: Efficient Convolution Operators for Tracking. CVPR, 2017.

    Pre-requisites

    Basic knowledge in computer vision and intermediate knowledge in deep learning

    Short Bio

    Ming-Hsuan Yang is a Professor of Electrical Engineering and Computer Science at the University of California, Merced, and a Research Scientist at Google Cloud. He served as a program co-chair of the IEEE International Conference on Computer Vision (ICCV) in 2019, program co-chair of the Asian Conference on Computer Vision (ACCV) in 2014, and general co-chair of ACCV 2016. He served as an associate editor of the IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI) from 2007 to 2011, and currently serves as an associate editor of the International Journal of Computer Vision (IJCV), Computer Vision and Image Understanding (CVIU), Image and Vision Computing (IVC) and the Journal of Artificial Intelligence Research (JAIR). Yang received the Google Faculty Award in 2009 and the Faculty Early Career Development (CAREER) award from the National Science Foundation in 2012. He received paper awards from UIST 2017, CVPR 2018 and ACCV 2018. He is an IEEE Fellow.