Course Description

 

Courses



Rick S. Blum
    (Lehigh University) [introductory/intermediate]
    Deep Learning and Cybersecurity


    Ben Brown
    (Lawrence Berkeley National Laboratory) [introductory/advanced]
    Explainable AI (XAI) Techniques for Science and Engineering - Toward Statistical Inference for the 21st Century


    Georgios Giannakis
    (University of Minnesota) [advanced]
    Ensembles for Interactive and Deep Learning Machines with Scalability, Expressivity, and Adaptivity

    Syllabus

    • — Day 1: Online Scalable Learning Adaptive to Unknown Dynamics and Graphs – Part I: Multi-kernel Approaches

      Kernel-based methods exhibit well-documented performance in various nonlinear learning tasks. Most of them rely on a preselected kernel, whose prudent choice presumes task-specific prior information. Especially when the latter is not available, multi-kernel learning has gained popularity thanks to its flexibility in choosing kernels from a prescribed kernel dictionary. Leveraging the random feature approximation (a minimal sketch follows this syllabus), this talk will first introduce, for static setups, a scalable multi-kernel learning approach (termed Raker) that obtains the sought nonlinear learning function ‘on the fly,’ bypassing the curse of dimensionality associated with kernel methods. We will also present an adaptive multi-kernel learning scheme (termed AdaRaker) that relies on weighted combinations of advice from hierarchical ensembles of experts to boost performance in dynamic environments. The weights account not only for each kernel’s contribution to the learning process, but also for the unknown dynamics. Performance is analyzed in terms of both static and dynamic regrets. AdaRaker is uniquely capable of tracking nonlinear learning functions in environments with unknown dynamics, with analytic performance guarantees. The approach is further tailored for online graph-adaptive learning with scalability and privacy. Tests with synthetic and real datasets will showcase the effectiveness of the novel algorithms.
    • — Day 2: Online Scalable Learning with Adaptivity and Robustness – Part II: Deep and Ensemble GPs

      Approximation and inference of functions from data are ubiquitous tasks in statistical learning theory and applications. Among relevant approaches with growing popularity, this talk deals with Gaussian process (GP) based approaches that not only learn over a class of nonlinear functions, but also quantify the associated uncertainty. To cope with the curse of dimensionality in this context, random Fourier feature (RF) vectors lead to parametric GP-RF function models that offer scalable forms of Wiener’s minimum mean-square error approach. The talk will next touch upon deep GP architectures, and will further focus on a weighted ensemble (E) of GP-RF learners, each with a distinct covariance (kernel) belonging to a prescribed dictionary, jointly learning a much richer class of functions. In addition to robustness, these ensembles can operate in either batch or online form interactively, even for dynamic functions along the lines of adaptive Kalman filters. The performance of EGP-based learning will be benchmarked using regret analysis. The broader applicability of EGPs will also be demonstrated for policy evaluation in reinforcement learning, with the kernel(s) selected interactively on the fly. Case studies will highlight the merits of deep and ensemble GPs.
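
    Both lectures build on the random Fourier feature (RF) approximation of Rahimi and Recht (listed in the references below). As a taste, here is a minimal, self-contained sketch in which the dimensions and the kernel bandwidth are illustrative choices for this example only; it shows how an RBF kernel evaluation is approximated by an inner product of finite-dimensional random feature maps:

    ```python
    import numpy as np

    # Sketch of the random Fourier feature (RF) approximation: the RBF kernel
    # k(x, y) = exp(-||x - y||^2 / (2 s^2)) is approximated by the inner product
    # of D-dimensional random feature maps. Dimensions and bandwidth below are
    # illustrative choices, not values from the lectures.

    rng = np.random.default_rng(0)
    d, D, s = 5, 2000, 1.0                     # input dim, number of features, bandwidth

    W = rng.normal(0.0, 1.0 / s, size=(D, d))  # spectral samples w ~ N(0, s^{-2} I)
    b = rng.uniform(0.0, 2.0 * np.pi, size=D)  # random phases

    def z(x):
        """Random feature map with E[z(x) @ z(y)] = k(x, y)."""
        return np.sqrt(2.0 / D) * np.cos(W @ x + b)

    x, y = rng.normal(size=d), rng.normal(size=d)
    exact = np.exp(-np.sum((x - y) ** 2) / (2.0 * s ** 2))
    print(f"exact kernel: {exact:.4f}, RF approximation: {z(x) @ z(y):.4f}")
    ```

    As described in the abstracts above, Raker-type and GP-RF learners keep one such map per dictionary kernel and train weights on top of the features, which is what sidesteps the curse of dimensionality of exact kernel methods.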

    References

    • — G. B. Giannakis, Y. Shen, and G. V. Karanikolas, "Topology Identification and Learning over Graphs: Accounting for Nonlinearities and Dynamics," Proceedings of the IEEE, vol. 106, no. 5, pp. 787-807, May 2018.
    • — Q. Lu, G. V. Karanikolas, Y. Shen, and G. B. Giannakis, "Ensemble Gaussian Processes with Spectral Features for Online Interactive Learning with Scalability," Proc. of 23rd Intl. Conf. on Artificial Intelligence and Statistics, Palermo, Italy, June 3-5, 2020.
    • — A. Rahimi and B. Recht, “Random features for large scale kernel machines,” Proc. Advances in Neural Info. Process. Syst., pp. 1177-1184, Canada, Dec. 2008.
    • — C. Rasmussen and C. Williams, “Gaussian Processes for Machine Learning,” MIT Press, Cambridge, 2006.
    • — S. Shalev-Shwartz, “Online learning and online convex optimization,” Foundations and Trends in Machine Learning, vol. 4, no. 2, pp. 107–194, 2011.
    • — Y. Shen, T. Chen, and G. B. Giannakis, “Random Feature-based Online Multi-kernel Learning in Environments with Unknown Dynamics,” Journal of Machine Learning Research, vol. 20, no. 22, pp. 1-36, February 2019.
    • — Y. Shen, G. Leus, and G. B. Giannakis, “Online Graph-Adaptive Learning with Scalability and Privacy,” IEEE Transactions on Signal Processing, vol. 67, no. 9, pp. 2471-2483, May 2019.

    Pre-requisites

    • — Graduate-level courses in Random Processes, Linear Algebra, and Machine Learning

    Short Bio

    Prof. Georgios B. Giannakis, ADC Chair in Wireless Telecommunications and McKnight Presidential Chair in ECE, University of Minnesota. Georgios B. Giannakis (Fellow’97) received his Diploma in Electrical Engr. (EE) from the Ntl. Tech. Univ. of Athens, Greece, 1981. From 1982 to 1986 he was with the U. of Southern California (USC), where he received his MSc. in EE, 1983, MSc. in Mathematics, 1986, and Ph.D. in EE, 1986. He was with the U. of Virginia from 1987 to 1998, and since 1999 he has been with the U. of Minnesota, where he holds a Chair in Wireless Communications, a U. of Minnesota McKnight Presidential Chair in ECE, and serves as director of the Digital Technology Center. His general interests span the areas of statistical learning, communications, and networking - subjects on which he has published more than 460 journal papers, 760 conference papers, 26 book chapters, two edited books, and two research monographs. His current research focuses on data science with applications to brain and power networks with renewables. He is the (co-)inventor of 33 issued patents, and the (co-)recipient of 9 best journal paper awards from the IEEE Signal Processing (SP) and Communications Societies, including the G. Marconi Prize Paper Award in Wireless Communications. He also received the IEEE-SPS Norbert Wiener Society Award (2019); Technical Achievement Awards from the SP Society (2000) and from EURASIP (2005); the IEEE ComSoc Education Award (2019); the G. W. Taylor Award for Distinguished Research from the University of Minnesota; and the IEEE Fourier Technical Field Award (inaugural recipient in 2015). He is a Fellow of the National Academy of Inventors, the IEEE, and EURASIP, and has served the IEEE in a number of posts, including that of a Distinguished Lecturer for the IEEE-SPS.



    Vincent Lepetit
    (ENPC ParisTech) [intermediate]
    Deep Learning and 3D Geometry

    Summary

    While Deep Learning in computer vision has long focused on 2D analysis of images, such as 2D object detection or image segmentation, recent years have seen the development of many approaches applying the power of Deep Learning to 3D perception from color images, solving problems that were very challenging or even impossible a few years ago. Because Deep Learning and 3D geometry come from very different mathematical worlds, one has to find smart ways to connect them and benefit from both: these approaches often rely on combinations of Deep Learning applied to 2D images and 3D geometry techniques. In this course, we will review and explain recent approaches to Deep Learning and 3D geometry problems, including 3D object pose estimation, 3D hand pose estimation, feature point detection, self-learning for depth prediction, and 3D scene understanding.
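
    To make the 2D-to-3D connection concrete, here is a generic sketch of a pattern common to several of these approaches: a network predicts the 2D image projections of known 3D object keypoints, and a classical PnP solver recovers the 6D object pose. The keypoint coordinates and camera intrinsics below are hypothetical values standing in for a network's output and a calibrated camera; this is an illustration, not the method of any specific paper in the course:

    ```python
    import numpy as np
    import cv2  # OpenCV

    # 3D keypoints of the object in its own coordinate frame
    # (e.g., the corners of its 3D bounding box).
    object_points = np.array([
        [-1, -1, -1], [1, -1, -1], [1, 1, -1], [-1, 1, -1],
        [-1, -1,  1], [1, -1,  1], [1, 1,  1], [-1, 1,  1],
    ], dtype=np.float64)

    # 2D detections that a CNN would output; hypothetical numbers for illustration.
    image_points = np.array([
        [320, 240], [400, 245], [405, 320], [318, 315],
        [330, 230], [410, 236], [415, 310], [328, 306],
    ], dtype=np.float64)

    # Pinhole camera intrinsics (assumed known from calibration).
    K = np.array([[800.0,   0.0, 320.0],
                  [  0.0, 800.0, 240.0],
                  [  0.0,   0.0,   1.0]])

    # Classical geometry closes the loop: PnP recovers rotation and translation
    # from the 2D-3D correspondences (no lens distortion assumed here).
    ok, rvec, tvec = cv2.solvePnP(object_points, image_points, K, None)
    print("rotation (Rodrigues):", rvec.ravel())
    print("translation:", tvec.ravel())
    ```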

     Syllabus

    • 3D object pose estimation,
    • 3D hand pose estimation,
    • feature point detection,
    • self-learning for depth prediction,
    • differentiable rendering,
    • 3D scene understanding.

    References

    • Bugra Tekin, Federica Bogo, and Marc Pollefeys. H+O: Unified Egocentric Recognition of 3D Hand-Object Poses and Interactions. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2019.
    • Alexander Grabner, Peter M. Roth, and Vincent Lepetit. 3D Pose Estimation and 3D Model Retrieval for Objects in the Wild. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2018.
    • Kwang Moo Yi, Eduard Trulls, Vincent Lepetit, and Pascal Fua. LIFT: Learned Invariant Feature Transform. In Proceedings of the European Conference on Computer Vision (ECCV), 2016.
    • C. Godard, O. Mac Aodha, and G. J. Brostow. Unsupervised Monocular Depth Estimation with Left-Right Consistency. In Conference on Computer Vision and Pattern Recognition, 2017.

    Pre-requisites

    Basic knowledge of Deep Learning applied to computer vision and 3D Geometry

    Short Bio

    Vincent Lepetit has been a director of research at ENPC ParisTech since 2019. Prior to ENPC, he was a full professor at the Institute for Computer Graphics and Vision, Graz University of Technology, Austria, and before that, a senior researcher at the Computer Vision Laboratory (CVLab) of EPFL, Switzerland. His research interests lie at the interface between Machine Learning and 3D Computer Vision, and currently focus on 3D scene understanding from images. He often serves as an area chair for the major computer vision conferences (CVPR, ICCV, ECCV) and is an associate editor for PAMI, IJCV, and CVIU.



    Geert Leus
    (Delft University of Technology) [introductory/intermediate]
    Graph Signal Processing: Introduction and Connections to Distributed Optimization and Deep Learning

    Summary

    The field of graph signal processing extends classical signal processing tools to signals (data) with an irregular structure that can be characterized by means of a graph (e.g., network data). One of the cornerstones of this field is the graph filter, a direct analogue of time-domain filters, but intended for signals defined on graphs. In this course, we introduce the field of graph signal processing and specifically give an overview of the graph filtering problem. We look at the family of finite impulse response (FIR) and infinite impulse response (IIR) graph filters and show how they can be implemented in a distributed manner. To further limit the communication and computational complexity of such a distributed implementation, we also generalize the state-of-the-art distributed graph filters to filters whose weights show a dependency on the nodes sharing information. These so-called edge-variant graph filters yield significant benefits in terms of filter order reduction and can be used for solving specific distributed optimization problems with extremely fast convergence. Finally, we will overview how graph filters can be used in deep learning applications involving data sets with an irregular structure. Different types of graph filters can be used in the convolution step of graph convolutional networks, leading to different trade-offs in performance and complexity. The numerical results presented in this course illustrate the potential of graph filters in distributed optimization and deep learning.
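
    As a minimal illustration of the cornerstone object, the sketch below implements an order-K FIR graph filter y = Σ_k h_k S^k x on a toy random graph with arbitrary filter taps (all values are invented for this example). Because each application of the shift operator S only exchanges values between neighboring nodes, the same recursion can be run in a distributed manner:

    ```python
    import numpy as np

    rng = np.random.default_rng(1)
    N, K = 6, 3

    # Toy undirected graph: random 0/1 adjacency matrix, symmetrized.
    A = rng.integers(0, 2, size=(N, N))
    A = np.triu(A, 1)
    A = A + A.T

    # Graph shift operator: adjacency scaled by its spectral radius for stability.
    S = A / max(1.0, np.abs(np.linalg.eigvals(A)).max())

    h = np.array([1.0, 0.5, 0.25, 0.125])  # filter taps h_0 .. h_K (arbitrary)
    x = rng.normal(size=N)                 # a graph signal, one value per node

    # FIR graph filtering: y = sum_k h_k S^k x. Each multiplication by S is one
    # round of local exchanges between neighboring nodes.
    y, Skx = np.zeros(N), x.copy()
    for k in range(K + 1):
        y += h[k] * Skx   # accumulate h_k S^k x
        Skx = S @ Skx     # one more neighbor exchange
    print(y)
    ```

    The same polynomial-in-S structure is what graph convolutional networks use in their convolution step, with the taps learned from data.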

    Syllabus

    • — Introduction to graph signal processing
    • — Graph filters and their extensions
    • — Connections to distributed optimization as well as related applications
    • — Connections to deep learning as well as related applications

    References

    • — D. I. Shuman, P. Vandergheynst, and P. Frossard, “Chebyshev polynomial approximation for distributed signal processing,” in IEEE International Conference on Distributed Computing in Sensor Systems and Workshops (DCOSS), 2011, pp. 1–8.
    • — D. I. Shuman, S. K. Narang, P. Frossard, A. Ortega, and P. Vandergheynst, “The emerging field of signal processing on graphs: Extending high-dimensional data analysis to networks and other irregular domains,” IEEE Signal Processing Magazine, vol. 30, no. 3, pp. 83–98, 2013.
    • — A. Sandryhaila and J. M. Moura, “Discrete signal processing on graphs,” IEEE Trans. on Signal Processing, vol. 61, no. 7, pp. 1644–1656, 2013.
    • — M. Defferrard, X. Bresson, and P. Vandergheynst, “Convolutional neural networks on graphs with fast localized spectral filtering,” in 30th Conf. Neural Inform. Process. Syst., Barcelona, Spain: Neural Inform. Process. Foundation, 5-10 Dec. 2016, pp. 3844–3858.
    • — E. Isufi, A. Loukas, A. Simonetto, and G. Leus, “Autoregressive moving average graph filtering,” IEEE Trans. on Signal Processing, vol. 65, no. 2, pp. 274–288, Jan. 2017.
    • — S. Segarra, A. Marques, and A. Ribeiro, “Optimal graph-filter design and applications to distributed linear network operators,” IEEE Trans. on Signal Processing, vol. 65, no. 15, pp. 4117–4131, 1 Aug. 2017.
    • — E. Isufi, A. Loukas, A. Simonetto, and G. Leus, “Filtering random graph processes over random time-varying graphs,” IEEE Trans. on Signal Processing, vol. 65, no. 16, pp. 4406–4421, Aug. 2017.
    • — A. Ortega, P. Frossard, J. Kovacevic, J. M. F. Moura, and P. Vandergheynst, “Graph signal processing: Overview, challenges and applications,” Proc. IEEE, vol. 106, no. 5, pp. 808–828, May 2018.
    • — F. Gama, A. G. Marques, G. Leus, and A. Ribeiro, “Convolutional neural network architectures for signals supported on graphs,” IEEE Trans. on Signal Processing, vol. 67, no. 4, pp. 1034–1049, Feb. 2019.
    • — J. Liu, E. Isufi, and G. Leus, “Filter design for autoregressive moving average graph filters,” IEEE Trans. on Signal Information Processing and Networking, vol. 5, no. 1, pp. 47–60, Mar. 2019.
    • — M. Coutino, E. Isufi, and G. Leus, “Advances in distributed graph filtering,” IEEE Trans. on Signal Processing, vol. 67, no. 9, pp. 2320–2333, May 2019.
    • — E. Isufi, F. Gama, and A. Ribeiro, “EdgeNets: edge varying graph neural networks,” arXiv:2001.07620v1 [cs.LG], 21 Jan. 2020. [Online]. Available: http://arxiv.org/abs/2001.07620

    Pre-requisites

    Basics in digital signal processing, linear algebra, optimization and machine learning.

    Short Bio

    Geert Leus received the M.Sc. and Ph.D. degrees in Electrical Engineering from the KU Leuven, Belgium, in June 1996 and May 2000, respectively. He is now an "Antoni van Leeuwenhoek" Full Professor at the Faculty of Electrical Engineering, Mathematics and Computer Science of the Delft University of Technology, The Netherlands. His research interests are in the broad area of signal processing, with a specific focus on wireless communications, array processing, sensor networks, and graph signal processing. He received a 2002 IEEE Signal Processing Society Young Author Best Paper Award and a 2005 IEEE Signal Processing Society Best Paper Award. He is a Fellow of the IEEE and a Fellow of EURASIP. He was a Member-at-Large of the Board of Governors of the IEEE Signal Processing Society, the Chair of the IEEE Signal Processing for Communications and Networking Technical Committee, a Member of the IEEE Sensor Array and Multichannel Technical Committee, and the Editor in Chief of the EURASIP Journal on Advances in Signal Processing. He was also on the Editorial Boards of the IEEE Transactions on Signal Processing, the IEEE Transactions on Wireless Communications, the IEEE Signal Processing Letters, and the EURASIP Journal on Advances in Signal Processing. Currently, he is the Chair of the EURASIP Technical Area Committee on Signal Processing for Multisensor Systems, a Member of the IEEE Signal Processing Theory and Methods Technical Committee, a Member of the IEEE Big Data Special Interest Group, an Associate Editor of Foundations and Trends in Signal Processing, and the Editor in Chief of EURASIP Signal Processing.



    Andy Liaw
    (Merck Research Labs) [introductory]
    Deep Learning and Statistics: Better Together


    Abdelrahman Mohamed
    (Facebook AI Research) [introductory/advanced]
    Recent Advances in Automatic Speech Recognition


    Jan Peters
    (Technical University of Darmstadt) [intermediate]
    Robot Learning


    Massimiliano Pontil
    (Italian Institute of Technology) [intermediate/advanced]
    Statistical Learning Theory


    Jose Principe
    (University of Florida) [intermediate/advanced]
    Cognitive Architectures for Object Recognition in Video


    Fedor Ratnikov
    (National Research University Higher School of Economics) [introductory]
    Specifics of Applying Machine Learning to Problems in Natural Science


    Salim Roukos
    (IBM Research AI) [intermediate/advanced]
    Deep Learning Methods for Natural Language Processing

    Summary

    These three lectures will cover several neural models behind high-performance NLP algorithms, with applications such as machine translation, reading comprehension, natural language generation, and semantic parsing.

    Syllabus

    • — 1. Overview of neural networks (back propagation).
    • — 2. NN model architectures: CNN, RNN, LSTM, Attention, Transformer (a minimal attention sketch follows this list).
    • — 3. Multi-task learning and transfer learning. Word and sentence embeddings.
    • — 4. NLP models: language modeling, machine translation, reading comprehension, and natural language generation.
    • — 5. Stack transformers for semantic parsing (Abstract Meaning Representation).
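
    For orientation, the sketch below implements the scaled dot-product attention at the heart of the Transformer (Vaswani et al., in the references): Attention(Q, K, V) = softmax(QKᵀ/√d_k)V. The shapes are toy choices; real transformer layers add multiple heads, learned projections, residual connections, and layer normalization (Ba et al., also in the references):

    ```python
    import numpy as np

    def softmax(x, axis=-1):
        e = np.exp(x - x.max(axis=axis, keepdims=True))  # stabilized softmax
        return e / e.sum(axis=axis, keepdims=True)

    def attention(Q, K, V):
        """Scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V."""
        d_k = Q.shape[-1]
        scores = Q @ K.T / np.sqrt(d_k)  # similarity of each query with each key
        return softmax(scores) @ V       # weighted average of the values

    rng = np.random.default_rng(0)
    T, d_k, d_v = 4, 8, 8  # toy sequence length, key/query dim, value dim
    Q = rng.normal(size=(T, d_k))
    K = rng.normal(size=(T, d_k))
    V = rng.normal(size=(T, d_v))
    print(attention(Q, K, V).shape)  # (T, d_v): one attended vector per position
    ```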

    References

    • — Jimmy Lei Ba, Jamie Ryan Kiros, and Geoffrey E. Hinton. Layer normalization. arXiv preprint arXiv:1607.06450, 2016.
    • — Dzmitry Bahdanau, Kyunghyun Cho, and Yoshua Bengio. Neural machine translation by jointly learning to align and translate. arXiv:1409.0473, 2014.
    • — Denny Britz, Anna Goldie, Minh-Thang Luong, and Quoc V. Le. Massive exploration of neural machine translation architectures. arXiv:1703.03906, 2017.
    • — Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. NAACL-HLT 2019.
    • — Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 770–778, 2016.
    • — Tom Kwiatkowski, Jennimaria Palomaki, Olivia Rhinehart, Michael Collins, Ankur Parikh, Chris Alberti, Danielle Epstein, Illia Polosukhin, Matthew Kelcey, Jacob Devlin, Kenton Lee, Kristina N. Toutanova, Llion Jones, Ming-Wei Chang, Andrew Dai, Jakob Uszkoreit, Quoc Le, and Slav Petrov. Natural questions: a benchmark for question answering research. Transactions of the Association for Computational Linguistics, 2019.
    • — Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, Mike Lewis, Luke Zettlemoyer, and Veselin Stoyanov. RoBERTa: A Robustly Optimized BERT Pretraining Approach. arXiv:1907.11692, 2019.
    • — Tahira Naseem, A. Shah, Hui Wan, Radu Florian, Salim Roukos, and Miguel Ballesteros. Rewarding Smatch: Transition-Based AMR Parsing with Reinforcement Learning. arXiv:1905.13370, 2019.
    • — Lin Pan, Rishav Chakravarti, Anthony Ferritto, Michael Glass, Alfio Gliozzo, Salim Roukos, Radu Florian, and Avirup Sil. Frustratingly easy natural question answering. arXiv preprint arXiv:1909.05286, 2019.
    • — Alec Radford, Jeffrey Wu, Rewon Child, David Luan, Dario Amodei, and Ilya Sutskever. Language models are unsupervised multitask learners. OpenAI Blog, 1(8):9, 2019.
    • — Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Łukasz Kaiser, and Illia Polosukhin. Attention is all you need. In Advances in Neural Information Processing Systems, pp. 5998–6008, 2017.

    Pre-requisites

    Mathematics at the level of an undergraduate degree in engineering, computer science, math, or physics: basic multivariate calculus, probability theory, and linear algebra.

    Short Bio

    Salim Roukos is an IBM Fellow and CTO for Translation Technologies at the IBM T. J. Watson Research Center. He was with Bolt Beranek and Newman from 1980 through 1989, where he was a Senior Scientist in charge of projects in speech compression, time scale modification, speaker identification, word spotting, and spoken language understanding. He was an Adjunct Professor at Boston University in 1988 before joining IBM in 1989. Dr. Roukos served as Chair of the IEEE Digital Signal Processing Committee in 1988. At the IBM T. J. Watson Research Center, he has led teams that focused on various problems using machine learning techniques for natural language processing. The group pioneered many of the statistical methods for NLP, from statistical parsing, to natural language understanding, to statistical machine translation and machine translation evaluation metrics (the BLEU metric). Roukos has over 150 publications in the speech and language areas and over two dozen patents. He led the group that introduced the first commercial statistical language understanding system for conversational telephony systems (IBM ViaVoice Telephony) in 2000 and the first statistical machine translation product for Arabic-English translation in 2003. More recently, his team created the IBM Watson Language Translator and the custom models for IBM Natural Language Understanding. Roukos is also a fellow of the Association for Computational Linguistics.



    Björn Schuller
    (Imperial College London) [introductory/intermediate]
    Deep Signal Processing


    Alex Smola
    (Amazon) [introductory/advanced]
    Dive into Deep Learning


    Sargur N. Srihari
    (University at Buffalo) [introductory]
    Generative Models in Deep Learning


    Kunal Talwar
    (Google Brain) [intermediate]
    Differentially Private Machine Learning


    René Vidal
    (Johns Hopkins University) [intermediate/advanced]
    Mathematics of Deep Learning

    Summary

    The past few years have seen a dramatic increase in the performance of recognition systems thanks to the introduction of deep networks for representation learning. However, the mathematical reasons for this success remain elusive. For example, a key issue is that the neural network training problem is nonconvex, hence optimization algorithms are not guaranteed to return a global minimum. The first part of this tutorial will overview recent work on the theory of deep learning that aims to understand how to design the network architecture, how to regularize the network weights, and how to guarantee global optimality. The second part of this tutorial will present sufficient conditions to guarantee that local minima are globally optimal and that a local descent strategy can reach a global minimum from any initialization. Such conditions apply to problems in matrix factorization, tensor factorization, and deep learning. The third part of this tutorial will present an analysis of dropout for matrix factorization, and establish connections to low-rank regularization.
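
    As a toy illustration of the phenomenon analyzed in the tutorial (the example itself is invented for this description): the low-rank factorization objective ||X − UVᵀ||_F² is nonconvex in (U, V), yet plain gradient descent from a random initialization typically drives the loss to the global optimum. All sizes and the step size below are arbitrary choices:

    ```python
    import numpy as np

    rng = np.random.default_rng(0)
    m, n, r = 20, 15, 3
    X = rng.normal(size=(m, r)) @ rng.normal(size=(r, n))  # exactly rank-r target

    # Nonconvex objective f(U, V) = ||U V^T - X||_F^2, minimized by
    # alternating-free plain gradient descent on both factors.
    U, V = rng.normal(size=(m, r)), rng.normal(size=(n, r))
    lr = 0.01
    for _ in range(2000):
        R = U @ V.T - X                  # residual
        U, V = U - lr * (R @ V), V - lr * (R.T @ U)  # gradient steps on U and V

    # Loss close to 0: descent reached a (near-)global minimum despite nonconvexity.
    print("final loss:", np.linalg.norm(U @ V.T - X) ** 2)
    ```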

    Syllabus

      1. Introduction to Deep Learning Theory: Optimization, Regularization and Architecture Design
      2. Global Optimality in Matrix Factorization
      3. Global Optimality in Tensor Factorization and Deep Learning
      4. Dropout as a Low-Rank Regularizer for Matrix Factorization

    Pre-requisites

    Basic understanding of sparse and low-rank representation and non-convex optimization.

    Short Bio

    Rene Vidal is a Professor of Biomedical Engineering and the Inaugural Director of the Mathematical Institute for Data Science at The Johns Hopkins University. His research focuses on the development of theory and algorithms for the analysis of complex high-dimensional datasets such as images, videos, time-series, and biomedical data. Dr. Vidal has been Associate Editor of TPAMI and CVIU, Program Chair of ICCV and CVPR, co-author of the book 'Generalized Principal Component Analysis' (2016), and co-author of more than 200 articles in machine learning, computer vision, biomedical image analysis, hybrid systems, robotics, and signal processing. He is a fellow of the IEEE, IAPR, and Sloan Foundation, an ONR Young Investigator, and has received numerous awards for his work, including the 2012 J.K. Aggarwal Prize for "outstanding contributions to generalized principal component analysis (GPCA) and subspace clustering in computer vision and pattern recognition" as well as best paper awards in machine learning, computer vision, controls, and medical robotics.



    Haixun Wang
    (WeWork) [introductory/intermediate]
    Conceptual Understanding and Machine Learning

     Summary

    Big data holds the potential to solve many challenging problems, and one of them is understanding natural languages, which still faces tremendous challenges. It has been shown that in areas such as question answering and conversation, domain knowledge is indispensable. Thus, how to acquire, represent, and apply domain knowledge for text understanding is of critical importance. In this short course, I will use understanding short text (search queries, tweets, captions, titles, etc.) as an example to demonstrate the challenges in this domain. Short text understanding is crucial to many applications. First, in addition to the known difficulties of natural language understanding, short texts do not always observe the syntax of a written language; as a result, traditional natural language processing methods cannot be easily applied. Second, short texts usually do not contain sufficient statistical signals to support many state-of-the-art approaches for text processing, such as topic modeling. Third, short texts are usually more ambiguous. I will go over various techniques in knowledge acquisition, representation, and inference that have been proposed for text understanding, and will describe the massive structured and semi-structured data made available in the recent decade that directly or indirectly encode human knowledge, turning knowledge representation into a computational grand challenge with feasible solutions in sight.
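
    As a toy illustration of the conceptualization idea behind knowledge-based short text understanding, here is a hypothetical sketch in which an ambiguous term is disambiguated by scoring candidate concepts from a miniature isA knowledge base with naive Bayes scoring P(c | terms) ∝ P(c) Π P(t | c). The miniature knowledge base and its counts are invented for this sketch; real resources such as Probase contain millions of (instance, concept) pairs:

    ```python
    # Toy isA knowledge base: concept -> {instance: co-occurrence count}.
    kb = {
        "fruit":   {"apple": 50, "pie": 20, "banana": 40},
        "company": {"apple": 30, "ipad": 60, "microsoft": 50},
    }

    def conceptualize(terms):
        """Pick the concept that best explains all terms (naive Bayes)."""
        total = sum(sum(insts.values()) for insts in kb.values())
        scores = {}
        for concept, insts in kb.items():
            size = sum(insts.values())
            score = size / total  # concept prior P(c)
            for t in terms:
                # Likelihood P(t | c) with Laplace smoothing for unseen terms.
                score *= (insts.get(t, 0) + 1) / (size + len(insts))
            scores[concept] = score
        return max(scores, key=scores.get)

    print(conceptualize(["apple", "ipad"]))  # -> "company"
    print(conceptualize(["apple", "pie"]))   # -> "fruit"
    ```

    The same ambiguous term "apple" resolves differently depending on its context terms, which is exactly the signal that sparse, syntax-poor short texts still provide.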

    Syllabus

      1. Big data and statistical inference
      2. The rise and fall of the semantic network
      3. Knowledge of language
      4. Conceptual knowledge for text understanding
      5. Knowledge Extraction / Acquisition
      6. Knowledge Reasoning / Modeling
      7. Conclusion and Future work

    References

    • — Kenneth Church, A Pendulum Swung Too Far, Linguistic Issues in Language Technology (LiLT), vol. 2, issue 4, May 2007.
    • — Gregory Murphy, The Big Book of Concepts, MIT Press.
    • — George Lakoff, Women, Fire, and Dangerous Things: What Categories Reveal About the Mind, University of Chicago Press (1990).

    Pre-requisites

    None.

    Short Bio

    Haixun Wang is an IEEE Fellow, Editor-in-Chief of the IEEE Data Engineering Bulletin, and a VP of Engineering and Distinguished Scientist at WeWork, where he leads the engineering team as well as the Research and Applied Science division. He was Director of Natural Language Processing at Amazon. Before Amazon, he led the NLP Infra team at Facebook, working on query and document understanding. From 2013 to 2015, he was with Google Research, working on natural language processing. From 2009 to 2013, he led research in semantic search, graph data processing systems, and distributed query processing at Microsoft Research Asia. His knowledge base project Probase has created significant impact in industry and academia. He was a research staff member at the IBM T. J. Watson Research Center from 2000 to 2009, serving as Technical Assistant to Stuart Feldman (Vice President of Computer Science of IBM Research) from 2006 to 2007, and as Technical Assistant to Mark Wegman (Head of Computer Science of IBM Research) from 2007 to 2009. He received the Ph.D. degree in Computer Science from the University of California, Los Angeles in 2000. He has published more than 150 research papers in refereed international journals and conference proceedings. He served as PC Chair of conferences such as CIKM'12, and is on the editorial boards of journals such as the IEEE Transactions on Knowledge and Data Engineering (TKDE) and the Journal of Computer Science and Technology (JCST). He won the best paper award at ICDE 2015, the 10-year best paper award at ICDM 2013, and the best paper award of ER 2009.