[intermediate/advanced] Deep Learning for 3D Vision
Our physical environment is three-dimensional (3D), and we move around in 3D every day. Low-cost consumer depth cameras (e.g., Microsoft Kinect v2, Intel RealSense, Orbbec Astra) have enabled a number of real-time applications thanks to their high acquisition frame rate. Depth estimation from RGB images (including depth from stereo), on the other hand, has been a focus of attention for many years due to its strong connection to the human binocular system. With advances in deep learning, a number of approaches to estimate depth from RGB images (including stereo vision) have recently emerged. This course will cover the use of deep learning for (1) the estimation of depth from stereo vision, and (2) the processing of depth images for a number of 3D vision applications, including 3D scene classification, 3D object detection and tracking, and 3D segmentation.
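The link between stereo disparity and depth underpins all stereo-based approaches: once a network predicts a disparity map, metric depth follows directly from the camera geometry via Z = f·B/d. A minimal sketch of this conversion (the focal length and baseline values below are illustrative assumptions, not taken from the course):

```python
import numpy as np

def disparity_to_depth(disparity, focal_px, baseline_m, eps=1e-6):
    """Convert a disparity map (in pixels) to metric depth: Z = f * B / d.

    disparity : array of per-pixel disparities (pixels)
    focal_px  : focal length in pixels
    baseline_m: distance between the two cameras in metres
    """
    # Clamp disparity to avoid division by zero for invalid pixels.
    return (focal_px * baseline_m) / np.maximum(disparity, eps)

# Illustrative camera: 700 px focal length, 12 cm baseline (assumed values).
disp = np.array([[35.0, 70.0]])
depth = disparity_to_depth(disp, focal_px=700.0, baseline_m=0.12)
# Larger disparity means the point is closer to the cameras.
```

The inverse relationship is why stereo networks are usually evaluated on disparity error: a small disparity error at long range translates into a large depth error.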
The first part of the course will focus on stereo-based depth estimation using deep learning, and the second part will focus on deep learning on point clouds, covering the unique challenges of processing point clouds with deep neural networks. I will focus on three major tasks: 3D shape classification, 3D object detection and tracking, and 3D point cloud segmentation.
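One of the unique challenges mentioned above is that a point cloud is an unordered set: a network must produce the same output however the points are listed. PointNet-style architectures achieve this by applying a shared per-point MLP and then aggregating with a symmetric function such as max pooling. A minimal NumPy sketch of that idea (the random weights stand in for learned parameters; this is an illustration, not a trained model):

```python
import numpy as np

rng = np.random.default_rng(0)

def pointnet_global_feature(points, W1, W2):
    """Shared two-layer per-point MLP (ReLU) followed by max pooling.

    points: (N, 3) array of xyz coordinates.
    Returns a global feature vector that is invariant to point order.
    """
    h = np.maximum(points @ W1, 0.0)  # same weights applied to every point
    h = np.maximum(h @ W2, 0.0)
    return h.max(axis=0)              # symmetric pooling -> order invariance

# Placeholder weights (illustrative, not learned).
W1 = rng.normal(size=(3, 16))
W2 = rng.normal(size=(16, 32))

cloud = rng.normal(size=(100, 3))
shuffled = rng.permutation(cloud, axis=0)  # same points, different order
f1 = pointnet_global_feature(cloud, W1, W2)
f2 = pointnet_global_feature(shuffled, W1, W2)
# f1 and f2 are identical, since max pooling ignores point order.
```

The max-pooled global feature feeds a classifier for 3D shape classification; for segmentation, it is typically concatenated back onto each per-point feature.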
 Y. Guo, M. Bennamoun, F. Sohel, M. Lu and J. Wan, "3D Object Recognition in Cluttered Scenes with Local Surface Features: A Survey," IEEE Transactions on Pattern Analysis and Machine Intelligence (IEEE PAMI), vol. 36, no. 11, Nov. 2014.
 H. Laga, L. V. Jospin, F. Boussaid and M. Bennamoun, "A Survey on Deep Learning Techniques for Stereo-based Depth Estimation," IEEE Transactions on Pattern Analysis and Machine Intelligence, doi: 10.1109/TPAMI.2020.3032602.
 Y. Guo, H. Wang, Q. Hu, H. Liu, L. Liu and M. Bennamoun, "Deep Learning for 3D Point Clouds: A Survey," IEEE Transactions on Pattern Analysis and Machine Intelligence, doi: 10.1109/TPAMI.2020.3005434.
 S. H. Khan, H. Rahmani, S. A. A. Khan and M. Bennamoun, "A Guide to Convolutional Neural Networks for Computer Vision," Morgan & Claypool Publishers, February 2018. Available on Amazon.
 H. Tabia, H. Laga, Y. Guo, R. Fisher and M. Bennamoun, "3D Shape Analysis: Fundamentals, Theory and Applications," John Wiley, January 2019. Available on Amazon.
 M. Bennamoun, Y. Guo, F. Tombari, K. Youcef-Toumi and K. Nishino, "RGB-D Vision: Methods and Applications," special issue, IEEE Transactions on Pattern Analysis and Machine Intelligence, October 2020.
Basic knowledge of deep learning applied to computer vision and 3D point clouds.
Mohammed Bennamoun is Winthrop Professor in the Department of Computer Science and Software Engineering at The University of Western Australia (UWA). He is a researcher in computer vision, machine/deep learning, robotics, and signal/speech processing. He has published 4 books (available on Amazon), 1 edited book, 1 encyclopedia article, 14 book chapters, 150+ journal papers, 260+ conference publications, and 16 invited and keynote publications. His h-index is 61 and he has 16,000+ citations (Google Scholar). He has been awarded 70+ competitive research grants from the Australian Research Council and numerous other government, UWA, and industry sources. He has successfully supervised 28+ PhD students to completion. He won the Best Supervisor of the Year Award at QUT (1998), received awards for research supervision at UWA (2008 & 2016), and received a Vice-Chancellor's Award for mentorship (2016). He has delivered tutorials at major conferences, including IEEE Computer Vision and Pattern Recognition (CVPR 2016), Interspeech 2014, the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), and the European Conference on Computer Vision (ECCV). He was also invited to give a tutorial at the International Summer School on Deep Learning (DeepLearn 2017).