4th International Winter School on Big Data

Timişoara, Romania, January 22-26, 2018

Course Description


Keynotes

Courses







Keynotes


Bing Liu   
Distinguished Professor, Department of Computer Science, University of Illinois at Chicago (UIC)
Towards Machines that Learn like Humans

Abstract

The classic machine learning (ML) paradigm works by learning a model from a large set of labeled examples. However, humans probably don't learn this way. In my life, I have never been given a set of labeled documents and asked to build a text classifier. Even if I am given 10,000 labeled documents, I will not be able to do it if I do not understand the language. Moreover, after learning to recognize a set of objects, we can not only recognize these objects but also identify objects that do not belong to the learned set. We can also learn new knowledge continuously without forgetting what we have learned in the past. We can also easily interact with a dynamic environment and learn from it efficiently, whereas current reinforcement learning algorithms are too inefficient and hard to use in practice, although they do extremely well in games. In this talk, I will describe some of these shortcomings of the classic learning paradigm based on my practical experiences in sentiment analysis and self-driving cars, and discuss how we are trying to pursue a paradigm shift and build machines that learn like humans.

Short Bio

Bing Liu is a distinguished professor of Computer Science at the University of Illinois at Chicago. He received his Ph.D. in Artificial Intelligence from the University of Edinburgh. His research interests include lifelong learning, sentiment analysis, data mining, machine learning, and natural language processing. He has published extensively in top conferences and journals. Two of his papers have received 10-year Test-of-Time awards from KDD. He also authored four books: one on lifelong learning, two on sentiment analysis, and one on Web mining. Some of his work has been widely reported in the press, including a front-page article in the New York Times. On professional services, he served as the Chair of ACM SIGKDD (ACM Special Interest Group on Knowledge Discovery and Data Mining) from 2013-2017. He has also served as program chair of many leading data mining conferences, including KDD, ICDM, CIKM, WSDM, SDM, and PAKDD, as associate editor of leading journals such as TKDE, TWEB, and DMKD, and as area chair or senior PC member of numerous natural language processing, AI, Web, and data mining conferences. He is a Fellow of ACM, AAAI and IEEE.








Jeffrey Ullman   
Stanford W. Ascherman Professor of Computer Science (Emeritus)
Data Science: Is it Real?

Abstract

We shall discuss the various ways in which data science is approached by different communities, including the Statistics, Machine Learning, and Database communities. Each presents a different viewpoint and values different outcomes. Some consequences of these approaches will be discussed. We also contrast approaches to educating the large number of data scientists expected to be needed in the near future.

Short Bio

Link to the bio







Courses


Paul Bliese   
Associate Professor of Business Administration in the Management Department of the Darla Moore School of Business at the University of South Carolina.
Using R for Mixed-effects (Multilevel) Models [introductory/intermediate]

Summary:

Mixed-effects or multilevel models are commonly used when data have some form of nested structure. For instance, individuals may be nested within workgroups, or repeated measures may be nested within individuals. Nested structures in data are often accompanied by some form of non-independence. For example, in work settings, individuals in the same workgroup typically display some degree of similarity with respect to performance, or they provide similar responses to questions about aspects of the work environment. Likewise, in repeated measures data, individuals usually display a high degree of similarity in responses over time. Non-independence may be considered either a nuisance variable or something to be substantively modeled, but the prevalence of nested data requires that analysts have a variety of tools for handling it. This course provides an introduction to (1) the theoretical foundation, and (2) resources necessary to conduct a wide range of multilevel analyses. All practical exercises are conducted in R. Participants are encouraged to bring datasets to the course and apply the principles to their specific areas of research.
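As a concrete illustration of how such non-independence is quantified, the intraclass correlation ICC(1) compares between-group to within-group variance. The sketch below is in Python with made-up workgroup scores, purely for illustration; the course itself works in R with the nlme and multilevel packages.

```python
from statistics import mean

# Toy nested data: individual scores nested within three workgroups.
# Scores within a group are more alike than scores across groups,
# i.e., the observations are non-independent.
groups = {
    "A": [4.1, 4.3, 4.0, 4.2],
    "B": [2.9, 3.1, 3.0, 3.2],
    "C": [5.0, 5.2, 4.9, 5.1],
}

k = 4                       # common group size
n_groups = len(groups)
grand_mean = mean(x for g in groups.values() for x in g)

# One-way ANOVA mean squares.
ss_between = sum(k * (mean(g) - grand_mean) ** 2 for g in groups.values())
ms_between = ss_between / (n_groups - 1)
ss_within = sum((x - mean(g)) ** 2 for g in groups.values() for x in g)
ms_within = ss_within / (n_groups * (k - 1))

# ICC(1): proportion of variance attributable to group membership.
icc1 = (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)
print(round(icc1, 3))   # ≈ 0.984
```

A value near 1, as here, means group membership accounts for most of the variance — exactly the situation in which ignoring the nesting would be misleading.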

Syllabus:

Session 1:
1. Introduction and overview of Multilevel Models
2. Introduction to R and the nlme and multilevel packages
Session 2:
3. Nested Data and Mixed-Effects Models in nlme
4. R Code for Models and Introduction to Functions Commonly used in Data Manipulation
Session 3:
5. Repeated Measures data and Growth Models in nlme
6. R Code for Models and Introduction to Functions Commonly used in Data Manipulation

Pre-requisites:

Basic understanding of regression. An installed version of R (https://cran.r-project.org/) on a laptop for completing exercises. Users are also encouraged to install RStudio (https://www.rstudio.com/).

References:

Bliese, P. D. (2016). Multilevel Modeling in R (v. 2.6). https://cran.r-project.org/doc/contrib/Bliese_Multilevel.pdf

Short Bio

Paul D. Bliese, Ph.D. joined the Management Department at the Darla Moore School of Business, University of South Carolina in 2014. Prior to joining South Carolina, he spent 22 years as a research psychologist at the Walter Reed Army Institute of Research, where he conducted research on stress, adaptation, leadership, well-being, and performance. Professor Bliese has long-term interests in understanding how analytics contribute to theory development and in applying analytics to complex organizational problems. He built and maintains the multilevel package for R. Professor Bliese has served on numerous editorial boards, and has been an associate editor at the Journal of Applied Psychology since 2010. In July of 2017 he took over as editor-in-chief of Organizational Research Methods.








Hendrik Blockeel   
Katholieke Universiteit Leuven
Decision Trees for Big Data Analytics [intermediate]

Summary:

Decision trees, and derived methods such as Random Forests, are among the most popular methods for learning predictive models from data. This is to a large extent due to the versatility and efficiency of these algorithms. This course will introduce students to the basic methods for learning decision trees, as well as to variations and more sophisticated versions of decision tree learners, with a particular focus on those methods that make decision trees work in the context of big data.
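All the tree variants mentioned above build on the same greedy split search. The following is a minimal, illustrative Python version of impurity-based split selection for a single numeric feature (the data, names, and exhaustive threshold scan are our toy assumptions; the course covers the full algorithms):

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(xs, ys):
    """Pick the threshold on one numeric feature that minimizes the
    weighted Gini impurity of the two resulting children."""
    best = (float("inf"), None)
    for t in sorted(set(xs))[:-1]:          # candidate thresholds
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(ys)
        if score < best[0]:
            best = (score, t)
    return best[1]

xs = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
ys = ["a", "a", "a", "b", "b", "b"]
print(best_split(xs, ys))   # the threshold that separates the two classes
```

A full tree learner applies this search recursively over all features; the big-data variants in the course replace the exhaustive scan with sampled or approximate statistics.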

Syllabus:

Classification and regression trees, multi-output trees, clustering trees, model trees, ensembles of trees, incremental learning of trees, learning decision trees from large datasets, learning from data streams.

Pre-requisites:

Familiarity with mathematics, probability theory, statistics, and algorithms is expected, at the level at which these subjects are typically introduced in bachelor-level computer science or engineering programs.

References:

Relevant references will be provided as part of the course.

Short Bio

Hendrik Blockeel is a full professor at KU Leuven, Belgium. He received his Ph.D. degree in 1998 from KU Leuven. His research interests lie mostly within artificial intelligence, with a focus on machine learning and data mining, and in the use of AI-based modeling in other sciences. Prof. Blockeel has made a variety of contributions on topics such as inductive logic programming, probabilistic-logical learning, and decision trees. He is an action editor for the journals Machine Learning and Data Mining and Knowledge Discovery, and a member of the editorial board of several other journals. He has chaired or organized several conferences, including ECMLPKDD 2013, and organized the ACAI summer school in 2007. He has served on the board of the European Coordinating Committee for Artificial Intelligence (now EurAI), and currently serves on the ECMLPKDD steering committee as publications chair. He has been a EurAI Fellow since 2015.








Saso Dzeroski   
Jozef Stefan Institute, Dept. of Knowledge Technologies
Multi-target Prediction: Techniques and Applications [introductory/intermediate]

Summary:

Increasingly often, data mining has to learn predictive models from big data, which may have many examples and many input/output dimensions, or may be streaming at very high rates. When more than one target variable has to be predicted, we talk about multi-target prediction. Predictive modeling problems may also be complex in other ways, e.g., they may involve incompletely/partially labelled data or data placed in a network context.

The course will first give an introduction to the different tasks of multi-target prediction, such as multi-target classification and regression, hierarchical versions thereof, and versions of the tasks that involve additional complexity (such as semi-supervised multi-target regression). It will then present methods, first basic and then advanced, for solving such tasks. Finally, it will review different applications of multi-target prediction, ranging from gene function prediction, through image annotation, to space exploration.
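To make the task itself concrete: in multi-target prediction a single model outputs a whole vector of targets at once. The toy Python sketch below uses a 1-nearest-neighbour predictor purely for illustration (the data and method are made up; the course covers tree- and rule-based methods and their ensembles):

```python
# Toy multi-target regression: each training example pairs one feature
# vector with *several* target values that are predicted jointly.

def predict_multi(x, train):
    """Return the full target vector of the training example closest to x."""
    def dist2(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return min(train, key=lambda ex: dist2(ex[0], x))[1]

train = [
    ([0.0, 0.0], (1.0, 10.0)),   # (features, (target1, target2))
    ([1.0, 1.0], (2.0, 20.0)),
    ([5.0, 5.0], (9.0, 90.0)),
]
print(predict_multi([4.5, 5.2], train))   # nearest neighbour's target vector
```

The point of dedicated multi-target methods is that one model predicting all targets jointly can exploit correlations between them, rather than fitting an independent model per target.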

Syllabus:

Session I: Tasks and basic methods
- The different tasks of multi-target prediction: multi-target regression, multi-target classification, multi-label classification, hierarchical versions of the tasks
- Additional complexity aspects: Incomplete annotations, Massive/streaming data, Network context
- Trees and rules (and ensembles thereof) for multi-target prediction
Session II: Advanced methods
- Feature ranking for multi-target prediction (Extensions of ReliefF, Ranking based on tree ensembles)
- Semi-supervised multi-target prediction
- Multi-target prediction on data streams
Session III: Applications
- Applications of multi-target prediction in ecology and environmental sciences
- Applications of multi-target prediction in medicine and life sciences
- Miscellaneous applications (e.g., image annotation and retrieval)

Pre-requisites:

Familiarity with algorithms/computer programming and mathematics/statistics will be assumed, at the level typically taught in undergraduate engineering/computer science programs. A basic understanding of machine learning and data mining would be helpful, but is not strictly necessary.

References:

Appropriate references will be provided in the lecture notes (slides) for the course.

Short Bio

Saso Dzeroski (Sašo Džeroski) is a scientific councillor at the Jozef Stefan Institute and the Centre of Excellence for Integrated Approaches in Chemistry and Biology of Proteins, both in Ljubljana, Slovenia. He is also a full professor at the Jozef Stefan International Postgraduate School and the University of Ljubljana, Faculty of Computer and Information Sciences. He leads a twenty-strong research group that investigates machine learning and data mining methods (including structured output prediction and automated modeling of dynamic systems), as well as their applications (in environmental sciences, incl. ecology/ecological modelling, and life sciences, incl. systems biology/systems medicine).
He has participated in many international research projects and has coordinated three of them, including the FET XTrack project MAESTRA (Learning from Massive, Incompletely annotated, and Structured Data). He is currently one of the principal investigators in the FET Flagship Human Brain Project. He has been scientific and/or organizational chair of numerous international conferences, including ECML PKDD 2017, DS-2014, MLSB-2009/10, ECEM and EAML-2004, ICML-1999 and ILP-1997/99. He has been a fellow of EurAI, the European Association for Artificial Intelligence (formerly ECCAI), since 2008, a foreign member of the Macedonian Academy of Sciences and Arts since 2015, and a member of Academia Europaea (the European Academy) since 2016.








Geoffrey C. Fox   
Chair, Intelligent Systems Engineering, School of Informatics and Computing; Distinguished Professor of Computing, Engineering and Physics; Director of the Digital Science Center, Indiana University – Bloomington
Integration of HPC, Big Data Analytics and Software Ecosystem [Intermediate]

Summary:

Two major trends in computing systems are the growth in high performance computing (HPC) with an international exascale initiative, and the big data phenomenon with an accompanying cloud infrastructure of well-publicized, dramatic and increasing size and sophistication. This tutorial weaves these trends together using some key building blocks. The first is HPC-ABDS, the High Performance Computing (HPC) enhanced Apache Big Data Stack (ABDS). Here we aim at using the major open-source Big Data software environment while developing the principles that allow the use of HPC software and hardware to achieve good performance. We give several examples of software (for example, Hadoop and Heron) and of algorithms implemented in this software. The second building block is the SPIDAL library (Scalable Parallel Interoperable Data Analytics Library) of scalable machine learning and data analysis software. We give examples including clustering, topic modelling and dimension reduction, and their visualization with a framework called Harp. The third building block is an analysis of simulation and big data use cases in terms of 64 separate features (varying from data volume to “suitable for MapReduce” to kernel algorithm used). This allows an understanding of what type of hardware and software is needed for what type of exhibited features; it allows one to either unify or distinguish applications across the simulation and Big Data regimes. We show that supporting a broad range of applications requires a variety of capabilities that seem best packaged as a reconfigurable toolkit, Twister2.

Syllabus:

Session 1: HPC-ABDS and the Ogres
-Rationale for using ABDS (Apache Big Data Stack)
-Architecture of ABDS
-Reasons to enhance ABDS with HPC
-Motivating Applications and Big Data Ogres
-Examples including Harp (for Hadoop), HPC-Heron; rationale for Twister2

Session 2: Twister2 and Harp
-Design of Twister2 -- a toolkit of the parts in Heron, Spark, Flink, Hadoop, MPI, Harp
-Design of Harp -- a High Performance Machine Learning Framework
-Using Harp and Twister2

Session 3: SPIDAL Scalable Parallel Interoperable Data Analytics Library
-Some important issues in getting high performance in parallel applications
-A few short discussions of individual machine learning cases and their use in applications
-These are intermixed with performance results, including accelerators
-'SPIDAL Java' -- principles to make Java run fast on parallel applications

Pre-requisites:

Some familiarity with ABDS software such as Hadoop, Spark, Flink, Storm, and Heron, and with HPC technologies such as MPI, would be helpful, as would some familiarity with parallel computing (algorithms and software) and with data analytics.

References:

Geoffrey Fox, David Crandall, Judy Qiu, Gregor Von Laszewski, Shantenu Jha, John Paden, Oliver Beckstein, Tom Cheatham, Madhav Marathe, Fusheng Wang, 'Tutorial Program', BigDat 2017 MIDAS and SPIDAL Tutorial, Bari, Italy, February 13-14, 2017
http://dsc.soic.indiana.edu/publications/SPIDAL-DIBBSreport_July2016.pdf 21 month report of SPIDAL(Scalable Parallel Interoperable Data Analytics Library) project.
Supun Kamburugamuve, Kannan Govindarajan, Pulasthi Wickramasinghe, Vibhatha Abeykoon, Geoffrey Fox, 'Twister2: Design of a Big Data Toolkit'
http://hpc-abds.org/kaleidoscope/ HPC-ABDS and Big Data Ogres Analysis
Geoffrey C. Fox, Vatche Ishakian, Vinod Muthusamy, Aleksander Slominski, 'Status of Serverless Computing and Function-as-a-Service(FaaS) in Industry and Research', Report from workshop and panel at the First International Workshop on Serverless Computing (WoSC) Atlanta, June 5 2017
B. Peng, B. Zhang, L. Chen, M. Avram, R. Henschel, C. Stewart, S. Zhu, E. Mccallum, L. Smith, T. Zahniser, J. Omer, J. Qiu. 'HarpLDA+: Optimizing Latent Dirichlet Allocation for Parallel Efficiency' Technical Report (August 2017)
Supun Kamburugamuve, Pulasthi Wickramasinghe, Saliya Ekanayake, Geoffrey C. Fox, 'Anatomy of machine learning algorithm implementations in MPI, Spark, and Flink', International Journal of High Performance Computing Applications, to be published.
Also see Projects (with updates) at https://www.researchgate.net/profile/Geoffrey_Fox and presentations at https://www.dsc.soic.indiana.edu/presentations

Short Bio

Geoffrey Fox received a Ph.D. in Theoretical Physics from Cambridge University where he was Senior Wrangler. He is now a distinguished professor of Engineering, Computing, and Physics at Indiana University where he is director of the Digital Science Center, and both Department Chair and Associate Dean for Intelligent Systems Engineering at the School of Informatics, Computing, and Engineering. He previously held positions at Caltech, Syracuse University, and Florida State University after being a postdoc at the Institute for Advanced Study at Princeton, Lawrence Berkeley Laboratory, and Peterhouse College Cambridge. He has supervised the Ph.D. theses of 70 students and published around 1300 papers (over 450 with at least 10 citations) in physics and computing, with an h-index of 75 and over 31,500 citations. He is a Fellow of APS (Physics) and ACM (Computing) and works on the interdisciplinary interface between computing and applications. He currently researches the application of computer science from infrastructure to analytics in Biology, Pathology, Sensor Clouds, Earthquake and Ice-sheet Science, Image processing, Deep Learning, Network Science, Financial Systems and Particle Physics. The infrastructure work is built around Software Defined Systems on Clouds and Clusters. The analytics focuses on scalable parallelism. He is an expert on streaming data and robot-cloud interactions. He is involved in several projects to enhance the capabilities of Minority Serving Institutions. He has experience in online education and its use in MOOCs for areas like Data and Computational Science.








Minos Garofalakis   
Professor, School of Electrical and Computer Engineering, Technical University of Crete, Chania, Greece
Data Streaming Analytics [intermediate/advanced]

Summary:

Effective Big Data analytics need to rely on algorithms for querying and analyzing massive, continuous data streams (that is, data that is seen only once and in a fixed order) with limited memory and CPU-time resources. Such streams arise naturally in emerging large-scale event monitoring applications; for instance, network-operations monitoring in large ISPs, where usage information from numerous network devices needs to be continuously collected and analyzed for interesting trends and real-time reaction to different scenarios (e.g., hotspots or DDoS attacks). In addition to memory- and time-efficiency concerns, the inherently distributed nature of such applications also raises important communication-efficiency issues, making it critical to carefully optimize the use of the underlying communication infrastructure. This course will provide an overview of some key algorithmic tools for supporting effective, real-time analytics over streaming data. Our primary focus will be on small-space sketch synopses for approximating continuous data streams, and their applicability in both centralized and distributed settings.
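To give a feel for what a small-space sketch synopsis looks like, here is a minimal Python rendering of the Count-Min idea; the width, depth, and salted use of Python's built-in hash are illustrative simplifications of the pairwise-independent hash families used in the literature.

```python
import random

class CountMin:
    """Count-Min sketch: d rows of w counters. Point queries return
    overestimates whose error is bounded by the stream length / w."""
    def __init__(self, w=256, d=4, seed=42):
        rnd = random.Random(seed)
        self.w, self.d = w, d
        self.salts = [rnd.random() for _ in range(d)]   # one "hash" per row
        self.table = [[0] * w for _ in range(d)]

    def _cells(self, item):
        for row, salt in enumerate(self.salts):
            yield row, hash((salt, item)) % self.w

    def add(self, item, count=1):
        for row, col in self._cells(item):
            self.table[row][col] += count

    def query(self, item):
        # Minimum over rows: never underestimates the true count.
        return min(self.table[row][col] for row, col in self._cells(item))

cm = CountMin()
for _ in range(1000):
    cm.add("10.0.0.1")     # heavy hitter, e.g. a flooding source IP
cm.add("10.0.0.2")
print(cm.query("10.0.0.1"))   # ≥ 1000; with so few items, typically exact
```

The sketch uses d*w counters regardless of how many distinct items the stream contains, which is precisely the memory guarantee that makes it usable for monitoring high-speed streams.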

Syllabus:

1. Introduction and Motivation
2. Data Streaming Models and Mathematical Tools
3. Basic Algorithmic Tools for Data Streams
   * Reservoir Sampling
   * Bag Synopses: AMS and CountMin Sketches
   * Set Synopses: FM Sketches and Distinct Sampling
4. The Sliding Window Model and Exponential Histograms
5. Distributed Data Streaming
   * Basic Models and Techniques
   * The Geometric Method and Convex Safe Zones
6. Conclusions and Looking Forward
7. (Time-permitting) Hands-on Experience with Streaming Tools
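Of the sampling tools above, reservoir sampling is the simplest to state completely. A short Python sketch of the classic Algorithm R (variable names are ours):

```python
import random

def reservoir_sample(stream, k, rng=random):
    """Algorithm R: maintain a uniform random sample of k items from a
    stream of unknown length, in one pass and O(k) memory."""
    sample = []
    for i, item in enumerate(stream):
        if i < k:
            sample.append(item)              # fill the reservoir first
        else:
            j = rng.randrange(i + 1)          # keep item with prob k/(i+1)
            if j < k:
                sample[j] = item
    return sample

print(reservoir_sample(range(10_000), 5))
```

Each item seen so far ends up in the reservoir with equal probability k/n, which is what makes the synopsis a valid uniform sample no matter when the stream is queried.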

Pre-requisites:

Database management systems, design and analysis of algorithms, randomized algorithms

References:

Surveys/Monographs:
1. Graham Cormode, Minos Garofalakis, Peter J. Haas, and Chris Jermaine. “Synopses for Massive Data: Samples, Histograms, Wavelets, Sketches”, Foundations and Trends in Databases 4(1-3): 1-294 (2012)
2. Minos Garofalakis, Johannes Gehrke, and Rajeev Rastogi (Eds.). “Data-Stream Management — Processing High-Speed Data Streams”, Springer-Verlag, New York (Data-Centric Systems and Applications Series), July 2016 (ISBN 978-3-540-28607-3).
Papers:
1. Noga Alon, Yossi Matias, Mario Szegedy: The Space Complexity of Approximating the Frequency Moments. ACM STOC 1996.
2. Noga Alon, Phillip B. Gibbons, Yossi Matias, Mario Szegedy: Tracking Join and Self-Join Sizes in Limited Storage. ACM PODS 1999.
3. Graham Cormode, S. Muthukrishnan: An Improved Data Stream Summary: The Count-Min Sketch and Its Applications. LATIN 2004.
4. Phillip B. Gibbons: Distinct Sampling for Highly-Accurate Answers to Distinct Values Queries and Event Reports. VLDB 2001.
5. Mayur Datar, Aristides Gionis, Piotr Indyk, Rajeev Motwani: Maintaining Stream Statistics over Sliding Windows. SIAM J. on Computing 31(6), 2002.
6. Graham Cormode, Minos N. Garofalakis: Approximate continuous querying over distributed streams. ACM Trans. Database Syst. 33(2), 2008.
7. Izchak Sharfman, Assaf Schuster, Daniel Keren: A geometric approach to monitoring threshold functions over distributed data streams. ACM SIGMOD Conference 2006.
8. Minos N. Garofalakis, Daniel Keren, Vasilis Samoladas: Sketch-based Geometric Monitoring of Distributed Stream Queries. PVLDB 6(10), 2013.
9. Arnon Lazerson, Izchak Sharfman, Daniel Keren, Assaf Schuster, Minos N. Garofalakis, Vasilis Samoladas: Monitoring Distributed Streams using Convex Decompositions. PVLDB 8(5), 2015.

Short Bio

Minos Garofalakis is the Director of the Institute for the Management of Information Systems (IMIS) at the Athena Research and Innovation Centre in Athens, Greece, and a Professor of Computer Science at the School of Electrical and Computer Engineering of the Technical University of Crete (TUC), where he also directs the Software Technology and Network Applications Laboratory (SoftNet). He received his PhD in Computer Science from the University of Wisconsin-Madison in 1998, and has held positions as a Member of Technical Staff at Bell Labs, Lucent Technologies in Murray Hill, NJ (1998-2005), as a Senior Researcher at Intel Research Berkeley in Berkeley, CA (2005-2007), and as a Principal Research Scientist at Yahoo! Research in Santa Clara, CA (2007-2008). In parallel, he also held an Adjunct Associate Professor position at the EECS Department of the University of California, Berkeley (2006-2008). Prof. Garofalakis's research interests are in the broad areas of Big Data analytics and large-scale machine learning, including database systems, centralized/distributed data streams, data synopses and approximate query processing, uncertain databases, and data mining and knowledge discovery. He has published over 150 scientific papers in top-tier international conferences and journals in these areas. His work has resulted in 36 US Patent filings (29 patents issued) for companies such as Lucent, Yahoo!, and AT&T. Google Scholar shows over 12,000 citations to Prof. Garofalakis's work, and an h-index of 60. He is an IEEE Fellow (Class of 2017, 'for contributions to data streaming analytics'), an ACM Distinguished Scientist (2011), and a recipient of the TUC 'Excellence in Research' Award (2015), the Bell Labs President's Gold Award (2004), and the Bell Labs Teamwork Award (2003).








David Gerbing   
Professor of Quantitative Methods. Portland State University
Data Visualization with R [introductory]

Summary:

This seminar introduces the R language via data visualization (computer graphics), in the context of a discussion of best practices and considerations for the analysis of big data. Code to generate the graphs is presented in terms of base R graphics, Hadley Wickham's ggplot2 package, and the author's lessR package. The content of the seminar is summarized in R Markdown files, available to all participants, that include commentary and implementations of all the code presented in the seminar. These explanatory examples serve as templates for applications to new data sets.

Syllabus:

Day 1
-----
Introduction to R
R functions and syntax
R variable types
Read data into R

Specialized Graphic Functions
Functions from the lessR package
The ggplot function from the ggplot2 package
Base R graphics

Themes

Day 2
-----
Bar Charts for Distributions of Categorical Variables
R factor variables
Counts of one variable
Joint frequencies of two variables
Statistics of a second variable plotted against one variable

Graphs for Distributions of a Continuous Variable
Histograms and binning
Densities
Boxplot
Scatterplot, 1-dimensional
Introduction to the integrated Violin/Box/Scatterplot, the VBS plot
Scatterplots, 2-dimensional
With two or more continuous variables
A categorical variable with a continuous variable
Bubble plots with categorical variables
Two variable plot with a third variable, categorical or continuous

Day 3
-----
Scatterplots, 2-dimensional (continued)
Visualization of relationships for big data sets
Time Series Plots
One-variable plot
Stacked time-series plot
Area plots
Forecasts
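The binning behind the histograms of Day 2 reduces to a small counting computation. A sketch in Python purely for illustration (the seminar itself works in R; the bin width and data below are made up):

```python
from collections import Counter

def bin_counts(values, bin_width, origin=0.0):
    """Equal-width binning: map each value to its bin's left edge and
    count occurrences — the computation underlying a histogram's bars."""
    counts = Counter(origin + bin_width * ((v - origin) // bin_width)
                     for v in values)
    return dict(sorted(counts.items()))

data = [1.2, 1.9, 2.5, 2.7, 3.1, 3.3, 3.8, 7.9]
print(bin_counts(data, bin_width=2.0))   # {0.0: 2, 2.0: 5, 6.0: 1}
```

The choice of bin width (and origin) is exactly what the seminar's discussion of binning is about: the same data can look unimodal or multimodal depending on these parameters.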

Pre-requisites:

Basic understanding of data analysis

References:

Gerbing, D. W. (2013). R Data Analysis without Programming, NY: Routledge.
Wickham, H. (2009). ggplot2: Elegant Graphics for Data Analysis. Springer-Verlag New York.

Short Bio

David Gerbing, Ph.D., since 1987 Professor of Quantitative Methods, School of Business Administration, Portland State University. He received his Ph.D. in quantitative psychology from Michigan State University in 1979. From 1979 until 1987 he was an Assistant and then Associate Professor of Psychology and Statistics at Baylor University. He has authored R Data Analysis without Programming, which describes his lessR package, and many articles on statistical techniques and their application in a variety of journals that span several academic disciplines.








Maurizio Lenzerini   
Full professor in Computer Science. Sapienza Università di Roma.
Semantic technologies for open data publishing [intermediate/advanced]

Summary:

Semantic technologies may promote new ways of managing data within an organization. In particular, the paradigm of ontology-based data management provides techniques for accessing, using, and maintaining data by means of an ontology, i.e., a conceptual representation of the domain of interest in the underlying information system. This paradigm aims at addressing one important challenge of modern information systems, namely managing the autonomous, distributed, and heterogeneous data sources of an organization, and devising tools for deriving useful information and knowledge from them. On the other hand, many of today's organizations face, among other challenges, the problem of publishing Open Data. Despite the current interest in this subject, a formal and comprehensive methodology that supports an organization in deciding which data to publish and that provides precise procedures for publishing high-quality data is still missing. In the course, we first provide an introduction to ontology-based data management, then we discuss the main techniques for using an ontology to access the data layer of an information system, and finally we illustrate the basic elements of a methodology for ontology-based Open Data publishing.

Syllabus:

Introduction to ontology-based data management (OBDM); languages for OBDM; query answering in OBDM; meta-modeling and higher-order ontology languages; the problem of open data publishing; ontology-based open data publishing.

Pre-requisites:

Basic notions of databases, logic, computational complexity.

References:

Diego Calvanese, Giuseppe De Giacomo, Domenico Lembo, Maurizio Lenzerini, Riccardo Rosati: Tractable Reasoning and Efficient Query Answering in Description Logics: The DL-Lite Family. J. Autom. Reasoning 39(3): 385-429 (2007)
Antonella Poggi, Domenico Lembo, Diego Calvanese, Giuseppe De Giacomo, Maurizio Lenzerini, Riccardo Rosati: Linking Data to Ontologies. J. Data Semantics 10: 133-173 (2008)
Diego Calvanese, Giuseppe De Giacomo, Domenico Lembo, Maurizio Lenzerini, Antonella Poggi, Mariano Rodriguez-Muro, Riccardo Rosati: Ontologies and Databases: The DL-Lite Approach. Reasoning Web 2009: 255-356
Roman Kontchakov, Mariano Rodriguez-Muro, Michael Zakharyaschev: Ontology-Based Data Access with Databases: A Short Course. Reasoning Web 2013: 194-229
Domenico Lembo, Maurizio Lenzerini, Riccardo Rosati, Marco Ruzzi, Domenico Fabio Savo: Inconsistency-tolerant query answering in ontology-based data access. J. Web Sem. 33: 3-29 (2015)

Short Bio

Maurizio Lenzerini (http://www.dis.uniroma1.it/~lenzerini) is a Professor of Data Management at the Dipartimento di Ingegneria Informatica Automatica e Gestionale Antonio Ruberti of Sapienza Università di Roma, where he leads a research group working on Database Theory, Data Management, Knowledge Representation and Automated Reasoning, and Ontology-based Data Management and Integration. He is the author of more than 300 publications on the above topics, which have received about 24,000 citations. According to Google Scholar, his h-index is currently 75. He has been an invited keynote speaker at many international conferences. He is the recipient of two IBM Faculty Awards, a Fellow of EurAI (formerly the European Coordinating Committee for Artificial Intelligence, ECCAI) since 2008, a Fellow of the ACM (Association for Computing Machinery) since 2009, a Fellow of the AAAI (Association for the Advancement of Artificial Intelligence) since 2017, and a member of Academia Europaea (the European Academy) since 2011.








Bing Liu   
Distinguished Professor, Department of Computer Science, University of Illinois at Chicago (UIC)
Lifelong Learning and its Applications in NLP [intermediate/advanced]

Summary:

Lifelong Learning is an advanced machine learning (ML) paradigm that learns continuously, accumulates the knowledge learned in the past, and uses it to help future learning. In the process, the learner becomes more and more knowledgeable and effective at learning. This learning ability is one of the hallmarks of human intelligence. However, the current dominant ML paradigm learns in isolation: given a training dataset, it runs an ML algorithm on the dataset to produce a model. It does not retain the learned knowledge and use it in future learning. Although this isolated learning paradigm has been very successful, it requires a large amount of training data and is only suitable for well-defined and narrow tasks. In comparison, we humans can learn effectively from a few examples because we have accumulated so much knowledge in the past, which enables us to learn with little data or effort. Lifelong learning aims to achieve this capability. Applications such as chatbots and physical robots that interact with real-life environments all call for such learning capabilities. Without this ability, a system will probably never be truly intelligent. In this lecture, I will introduce lifelong learning and discuss some of its applications in natural language processing (NLP).

Syllabus:

1. Introduction and motivations
2. Definition of lifelong learning
3. Related learning paradigms
4. Lifelong supervised learning
5. Open world learning
6. Learning during model application
7. Lifelong topic modeling
8. Lifelong Learning in Information Extraction
9. Lifelong learning in belief propagation
10. Summary
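The core idea of knowledge accumulation can be caricatured in a few lines of Python. The toy sentiment example below is not any specific algorithm from the reference list; it merely illustrates retaining simple statistics across tasks and reusing them on a new task:

```python
from collections import Counter

# Toy lifelong-learning illustration: a sentiment classifier keeps a
# shared knowledge store of word polarity counts accumulated across
# past tasks (domains) and reuses it when a new task arrives.

knowledge = {"pos": Counter(), "neg": Counter()}

def learn_task(labelled_docs):
    """Accumulate word counts from one task into the shared store."""
    for words, label in labelled_docs:
        knowledge[label].update(words)

def classify(words):
    """Score a new document using knowledge from *all* past tasks."""
    score = sum(knowledge["pos"][w] - knowledge["neg"][w] for w in words)
    return "pos" if score >= 0 else "neg"

# Task 1: camera reviews; Task 2: hotel reviews.
learn_task([(["great", "picture"], "pos"), (["blurry", "bad"], "neg")])
learn_task([(["great", "view"], "pos"), (["dirty", "bad"], "neg")])

# A new (phone) task benefits from words seen in earlier tasks.
print(classify(["great", "battery"]))   # → pos
print(classify(["bad", "screen"]))      # → neg
```

Even this caricature shows the key contrast with isolated learning: nothing is relearned from scratch, and each new task both benefits from and adds to the accumulated knowledge.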

Pre-requisites:

Basic knowledge of machine learning

References:

1. Zhiyuan Chen and Bing Liu. Lifelong Machine Learning. Morgan & Claypool Publishers, Nov 2016
2. Zhiyuan Chen and Bing Liu. Mining Topics in Documents: Standing on the Shoulders of Big Data. KDD-2014.
3. Zhiyuan Chen and Bing Liu. Topic Modeling using Topics from Many Domains, Lifelong Learning and Big Data. ICML-2014.
4. Zhiyuan Chen, Nianzu Ma and Bing Liu. Lifelong Learning for Sentiment Classification. ACL-2015, (short paper).
5. Geli Fei, Shuai Wang, and Bing Liu. 2016. Learning Cumulatively to Become More Knowledgeable. KDD-2016.
6. T. Mitchell, W. Cohen, E. Hruschka, P. Talukdar, J. Betteridge, A. Carlson, B. Dalvi, M. Gardner, B. Kisiel, J. Krishnamurthy, N. Lao, K. Mazaitis, T. Mohamed, N. Nakashole, E. Platanios, A. Ritter, M. Samadi, B. Settles, R. Wang, D. Wijaya, A. Gupta, X. Chen, A. Saparov, M. Greaves, and J. Welling. Never-ending learning. AAAI, 2015.
7. Paul Ruvolo and Eric Eaton. ELLA: An efficient lifelong learning algorithm. ICML-2013.
8. Lei Shu, Hu Xu, and Bing Liu. Lifelong Learning CRF for Supervised Aspect Extraction. ACL-2017, (short paper).
9. Lei Shu, Bing Liu, Hu Xu, and Annice Kim. Lifelong-RL: Lifelong Relaxation Labeling for Separating Entities and Aspects in Opinion Targets. EMNLP 2016.
10. Daniel L. Silver, Qiang Yang, and Lianghao Li. Lifelong machine learning systems: Beyond learning algorithms. In AAAI Spring Symposium: Lifelong Machine Learning, pages 49–55, 2013.
11. Sebastian Thrun. Is Learning the N-th Thing Any Easier than Learning the First? In Advances in neural information processing systems, pp. 640–646. Morgan Kaufmann Publishers, 1996.
12. Sebastian Thrun and Tom Mitchell. Lifelong robot learning. Springer, 1995.

Short Bio

Bing Liu is a distinguished professor of Computer Science at the University of Illinois at Chicago. He received his Ph.D. in Artificial Intelligence from the University of Edinburgh. His research interests include lifelong learning, sentiment analysis, data mining, machine learning, and natural language processing. He has published extensively in top conferences and journals. Two of his papers have received 10-year Test-of-Time awards from KDD. He also authored four books: one on lifelong learning, two on sentiment analysis, and one on Web mining. Some of his work has been widely reported in the press, including a front-page article in the New York Times. On professional services, he served as the Chair of ACM SIGKDD (ACM Special Interest Group on Knowledge Discovery and Data Mining) from 2013-2017. He has also served as program chair of many leading data mining conferences, including KDD, ICDM, CIKM, WSDM, SDM, and PAKDD, as associate editor of leading journals such as TKDE, TWEB, and DMKD, and as area chair or senior PC member of numerous natural language processing, AI, Web, and data mining conferences. He is a Fellow of ACM, AAAI and IEEE.








B.S. Manjunath   
Distinguished Professor, Electrical and Computer Engineering, University of California, Santa Barbara
Unstructured (Big) Data [introductory]

Summary:

Multimodal, unstructured data is ubiquitous: from consumer devices such as smart phones to scientific imaging, we encounter this data constantly, everywhere. This data is voluminous, accounting for a significant part of the digital data (one could speculate this to be >90%) generated around the world, daily. This data is complex and unstructured. In many applications, this data varies over time, and these time scales differ depending on the application. However, much of this multi-scale, multi-modal, unstructured and dynamic data remains under-exploited and un-interrogated. This lecture explores the challenges associated with such data analytics and how this differs from the more traditional big-data problems. Some interesting case studies in life sciences and medicine will be presented, with a focus on imaging data. The lecture will conclude with an overview of the BisQue software platform that is being developed at UCSB towards addressing the challenges associated with managing such data and creating reproducible workflows to analyze imaging data.

Syllabus:

Unstructured big-data challenges and examples.
Feature extraction in images/video: traditional methods to recent advances in deep learning methods.
Towards reproducible image informatics: BisQue open source project.

Pre-requisites:

Undergraduate-level exposure to linear algebra and calculus. A course in image processing/computer vision will help but is not required.

References:

Recent publications (conference/journal articles) on the above topics (to be added). For Bisque, see http://bioimage.ucsb.edu and http://cyverse.org

Short Bio

Manjunath is a Distinguished Professor of Electrical and Computer Engineering at the University of California, Santa Barbara. He received his Ph.D. in Electrical Engineering from the University of Southern California and his M.E. in Systems Science and Automation from the Indian Institute of Science. His research interests are in image informatics, and in recent years he has focused on applications to the life and health sciences. He has published over 300 peer-reviewed articles, is an inventor on 24 patents, and co-edited a book on MPEG-7.








Folker Meyer   
Argonne National Laboratory
Efficient Multi Cloud Execution of Reproducible Data Analytics using Common Workflow Language, AWE and SHOCK [introductory/intermediate]

Summary:

Executing scientific workflows at scale poses a significant challenge to many teams and institutions. We present a unified system for portable, reproducible execution on local and remote resources. The Skyport system [1] provides containerized workflow execution with Docker [2] across system boundaries, allowing researchers to execute scientific workflows using the AWE [1, 3, 4] workflow engine together with SHOCK [5] as an active object store. AWE and SHOCK are implemented as RESTful services for managing and executing workflows; workflows are specified in Common Workflow Language (CWL) format [6]. CWL is a single, multi-vendor language for describing scientific workflows, created by a community of practitioners. In addition to being multi-vendor (and thus supporting multiple execution engines), a critical feature of CWL is the separation of scientific content from computational implementation, which allows experts in each domain to focus on their own area (CWL, http://commonwl.org).
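To make the separation of workflow description from execution engine concrete, here is a minimal, hypothetical sketch in plain Python (not actual CWL, AWE, or SHOCK code): the workflow is pure data that names steps and their dependencies, and a tiny engine runs each step once its inputs are ready. In Skyport the steps would instead be containerized tools and the intermediate files would live in SHOCK.

```python
# A workflow as pure data: each step names its tool and the step it reads from.
# For simplicity each step here takes at most one input step.
workflow = {
    "trim":  {"run": lambda x: x.strip(), "inputs": []},
    "upper": {"run": lambda x: x.upper(), "inputs": ["trim"]},
    "count": {"run": lambda x: len(x),    "inputs": ["upper"]},
}

def execute(workflow, initial):
    """Tiny engine: run each step once all of its input steps are done."""
    done = {}
    pending = dict(workflow)
    while pending:
        for name, step in list(pending.items()):
            if all(i in done for i in step["inputs"]):
                arg = done[step["inputs"][0]] if step["inputs"] else initial
                done[name] = step["run"](arg)
                del pending[name]
    return done

results = execute(workflow, "  hello world  ")
print(results["count"])  # prints 11
```

Because the workflow itself is only data, the same description could in principle be handed to a different engine, which is exactly the multi-vendor property CWL aims for.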

Syllabus:

Session 1: Initial system setup and execution of demo workflow

-- System overview

  -Scientific computing & workflows

  -Distributed computing

  -Example use case MG-RAST

  -CWL, Docker (why did we use those)

-- Install skyport2 services (using single docker compose image)

  -Install authentication services, AWE-server and SHOCK-server

  -Load demo data into SHOCK

  -Load demo workflow

  -Install awe-worker node

  -Download results from SHOCK



Session 2: Customizing system setup and monitoring execution

-- Setup of expanded system

  -Basis for customization

  -Creation of custom data types

  -Monitoring execution via the web interface or cmd-line

  -Adding tools to workflows

  -Adding workflow steps

-- Hands on exercise creating and executing a customized workflow



Session 3: Advanced topics

-- Combining AWE with other execution engines

  -Using Singularity [7]

-- Adding data types to SHOCK

Pre-requisites:

The assumption is that participants will bring a laptop with the ability to execute multiple Docker containers. Participants should test their available memory and hard drive space. Software systems required:
- Docker
- Ansible
- ASCII editor, e.g. Emacs, vi, Textmate, ...
The participants will be asked to install software (via Docker), modify configuration files and perform other Unix command line style activities.

References:

1. Wolfgang Gerlach, Wei Tang, Andreas Wilke, Dan Olson, Folker Meyer: Container orchestration for scientific workflows. In: Proceedings of the 2015 IEEE International Conference on Cloud Engineering (IC2E), 2015: 377-378.

2. Merkel D: Docker: lightweight Linux containers for consistent development and deployment. Linux Journal 2014, 2014(239):2.

3. Tang W, Bischof J, Desai N, Mahadik K, Gerlach W, Harrison T, Wilke A, Meyer F: Workload Characterization for MG-RAST Metagenomic Data Analytics Service in the Cloud. In: Proc of IEEE Int’l Conf on Big Data. 2014.

4. Tang W, Wilkening J, Bischof J, Gerlach W, Wilke A, Desai N, Meyer F: Building Scalable Data Management and Analysis Infrastructure for Metagenomics. In: 5th International Workshop on Data-Intensive Computing in the Clouds. IEEE 2013.

5. Bischof J, Wilke A, Gerlach W, Harrison T, Paczian T, Tang W, Trimble W, Wilkening J, Desai N, Meyer F: Shock: Active Storage for Multicloud Streaming Data Analysis. In: 2nd IEEE/ACM International Symposium on Big Data Computing: 2015; Limassol, Cyprus.

6. Common Workflow Language, v1.0.

7. Kurtzer GM, Sochat V, Bauer MW: Singularity: Scientific containers for mobility of compute. PLoS One 2017, 12(5):e0177459.

Short Bio

Folker Meyer is a computational biologist at Argonne National Laboratory and a senior fellow at the Computation Institute at the University of Chicago. He is also the associate division director of the Institute for Genomics and Systems Biology at Argonne National Laboratory. Meyer was trained as a computer scientist and with that came his interest in building software systems to answer complex biological questions. He is the driving force behind the MG-RAST project.








Wladek Minor   
Professor of Molecular Physiology and Biological Physics. University of Virginia, Charlottesville, USA
Big Data in Biomedical Sciences [introductory/advanced]

Summary:

Syllabus:

-Big Data and Big Data in Biomedical Sciences
-Why big data is perceived as a big problem - technological consideration
-Data reduction - should we preserve unreduced (raw) data?
-Databases and databanks
-Data mining with the use of raw data, databanks and databases
-Data Integration
-Automatic and semi-automatic curation of large amounts of data
-Conversion of databanks into databases
-Database priorities – content and design
-Interaction between databases
-Modern data management in biomedical sciences – necessity or luxury
-Automatic data harvesting – close reality or still on the horizon
-Reproducibility of the biomedical experiments - drug discovery considerations
-Big data in medicine - new possibilities
-Future considerations

Pre-requisites:

References:

Porebski PJ., Sroka P., Zheng H., Cooper DR., Minor W. (2017) Molstack-Interactive visualization tool for presentation, interpretation, and validation of macromolecules and electron density maps. Protein Sci. [Pub Med ID: 28815771]

Zheng H., Langner KM., Shields GP., Hou J., Kowiel M., Allen FH., Murshudov G., Minor W. (2017) Data mining of iron(II) and iron(III) bond-valence parameters, and their relevance for macromolecular crystallography. Acta Crystallogr D Struct Biol 73(Pt 4):316-325. Times cited: 1. [Pub Med ID: 28375143] [Pub Med Central ID: PMC5503122]

Zheng H., Cooper DR., Porebski PJ., Shabalin IG., Handing KB., Minor W. (2017) CheckMyMetal: a macromolecular metal-binding validation tool. Acta Crystallogr D Struct Biol 73(Pt 3):223-233. Times cited: 1. [Pub Med ID: 28291757] [Pub Med Central ID: PMC5349434]

Grabowski M., Minor W. (2017) Sharing Big Data. IUCrJ 4(Pt 1):3-4. [Pub Med ID: 28250936] [Pub Med Central ID: PMC5331460]

Zheng H., Porebski PJ., Grabowski M., Cooper DR., Minor W. (2017) Databases, Repositories, and Other Data Resources in Structural Biology. Methods Mol. Biol. 1607:643-665. [Pub Med ID: 28573593] [Pub Med Central ID: PMC5587190]

Rupp B., Wlodawer A., Minor W., Helliwell JR., Jaskolski M. (2016) Correcting the record of structural publications requires joint effort of the community and journal editors. FEBS J. 283(24):4452-4457. Times cited: 6. [Pub Med ID: 27229767] [Pub Med Central ID: PMC5124416]

Grabowski M., Langner KM., Cymborowski M., Porebski PJ., Sroka P., Zheng H., Cooper DR., Zimmerman MD., Elsliger MA., Burley SK., Minor W. (2016) A public database of macromolecular diffraction experiments. Acta Crystallogr D Struct Biol 72(Pt 11):1181-1193. Times cited: 4. [Pub Med ID: 27841751] [Pub Med Central ID: PMC5108346]

Grabowski M., Niedzialkowska E., Zimmerman MD., Minor W. (2016) The impact of structural genomics: the first quindecennial. J. Struct. Funct. Genomics 17(1):1-16. Times cited: 2. [Pub Med ID: 26935210] [Pub Med Central ID: PMC4834271]

Niedzialkowska E., Gasiorowska O., Handing KB., Majorek KA., Porebski PJ., Shabalin IG., Zasadzinska E., Cymborowski M., Minor W. (2016) Protein purification and crystallization artifacts: The tale usually not told. Protein Sci. 25(3):720-33. Times cited: 4. [Pub Med ID: 26660914] [Pub Med Central ID: PMC4815408]

Minor W., Dauter Z., Helliwell JR., Jaskolski M., Wlodawer A. (2016) Safeguarding Structural Data Repositories against Bad Apples. Structure 24(2):216-20. Times cited: 9. [Pub Med ID: 26840827] [Pub Med Central ID: PMC4743038]

Porebski PJ., Cymborowski M., Pasenkiewicz-Gierula M., Minor W. (2016) Fitmunk: improving protein structures by accurate, automatic modeling of side-chain conformations. Acta Crystallogr D Struct Biol 72(Pt 2):266-80. Times cited: 5. [Pub Med ID: 26894674] [Pub Med Central ID: PMC4756610]

Minor W., Dauter Z., Jaskolski M. (2016) The young person’s guide to the PDB. Postepy Biochemii 62(3):242-249. [Pub Med ID: 28132477] [Pub Med Central ID: PMC5610703]

Shabalin I., Dauter Z., Jaskolski M., Minor W., Wlodawer A. (2015) Crystallography and chemistry should always go together: a cautionary tale of protein complexes with cisplatin and carboplatin. Acta Crystallogr D Biol Crystallogr. 71(Pt 9):1965-79. Times cited: 18. [Pub Med ID: 26327386] [Pub Med Central ID: PMC4556316]

Zheng H., Handing KB., Zimmerman MD., Shabalin IG., Almo SC., Minor W. (2015) X-ray crystallography over the past decade for novel drug discovery - where are we heading next? Expert Opin Drug Discov 10(9):975-89. Times cited: 8. [Pub Med ID: 26177814] [Pub Med Central ID: PMC4655606]

Berman HM., Gabanyi MJ., Groom CR., Johnson JE., Murshudov GN., Nicholls RA., Reddy V., Schwede T., Zimmerman MD., Westbrook J., Minor W. (2015) Data to knowledge: how to get meaning from your result. IUCrJ 2(Pt 1):45-58. Times cited: 5. [Pub Med ID: 25610627] [Pub Med Central ID: PMC4285880]

Dauter Z., Wlodawer A., Minor W., Jaskolski M., Rupp B. (2014) Avoidable errors in deposited macromolecular structures: an impediment to efficient data mining. IUCrJ 1(Pt 3):179-93. Times cited: 30. [Pub Med ID: 25075337] [Pub Med Central ID: PMC4086436]

Short Bio

Link








Fionn Murtagh   
Professor of Data Science, University of Huddersfield.
The New Science of Big Data Analytics, Based on the Geometry and the Topology of Complex, Hierarchic Systems [introductory/advanced]

Summary:

These foundations of Data Science are solidly based on mathematics and computational science. The hierarchical nature of complex reality is part and parcel of this new, mathematically well-founded way of observing and interacting with (physical, social, and all other) realities.
These lectures cover pattern recognition and knowledge discovery, machine learning, and statistics, and address how geometry and topology can uncover and empower the semantics of data. Key themes include: text mining; computational, linear-time hierarchical clustering, search, and retrieval; and the Correspondence Analysis platform, which performs latent semantic factor-space mapping with accompanying hierarchical clustering.
Various application domains are covered in the case studies, including text mining of literary texts and social media (Twitter), and clustering in astronomy, chemistry, and psychoanalysis. The final discussion addresses the increasingly important domains of smart environments, the Internet of Things, health analytics, and the further general scope of Big Data.

Syllabus:

Topics
- General Introduction. The Visualization and the Verbalization of Data.
- Analytics through the Geometry and Topology of Complex Systems. Metric, Ultrametric Frameworks. Hierarchy and Symmetry.
- Search and Discovery, Clustering and Regression: Pattern Recognition in Very High Dimensions.
- Text and Related Analytics. Between Lives of Narratives and Narratives of Lives.
Applications include:
- Social science, following Pierre Bourdieu.
- A few issues of cosmology.
- Literary work, between style and semantics.
- Large data analytics in astronomy, chemistry, finance.
- Social media analytics: Letting the data speak.
- Computational psychoanalysis.

The case studies are implemented in R. Although the general discussion uses R, it can also benefit users of other software environments. The presentation encompasses general background and introduction, as well as potentially innovative developments.

Session 1: Semantic mapping, both metric and ultrametric.
Session 2: Application of textual narrative.
Session 3: Applications in search and discovery; new perspectives and new approaches.

Pre-requisites:

Engagement in, or current plans for, data analytics, and perspectives or plans in application domains.

References:

F. Murtagh, Data Science Foundations: Geometry and Topology of Complex Hierarchic Systems and Big Data Analytics, Chapman & Hall, CRC Press, 2017. In the course material, relevant references will be included.

Short Bio

Fionn Murtagh is Professor of Data Science and has been Professor of Computer Science, including Department Head, at many universities. Following his primary degrees in Mathematics and Engineering Science and an MSc in Computer Science (in Information Retrieval) at Trinity College Dublin, his first position, as Statistician/Programmer, was in national-level (first- and second-level) education research. His PhD at Université P&M Curie, Paris 6, with Prof. Jean-Paul Benzécri, was carried out in conjunction with the national geological research centre, BRGM. After an initial four years as a lecturer in computer science, there was a period working on atomic reactor safety at the European Joint Research Centre in Ispra (VA), Italy. As a European Space Agency Senior Scientist working on the Hubble Space Telescope, Fionn was based at the European Southern Observatory in Garching, Munich, for 12 years. For five years, Fionn was a Director in Science Foundation Ireland, managing mathematics and computing, and nanotechnology, and introducing and growing all that is related to environmental science and renewable energy.
Fionn was Editor-in-Chief of the Computer Journal (British Computer Society) for more than 10 years, and is an Editorial Board member of many journals. He has over 300 refereed articles and 30 books authored or edited. His fellowships and scholarly academies include: Fellow of the British Computer Society (FBCS), the Institute of Mathematics and Its Applications (FIMA), the International Association for Pattern Recognition (FIAPR), the Royal Statistical Society (FRSS), and the Royal Society of Arts (FRSA); Elected Member of the Royal Irish Academy (MRIA) and Academia Europaea (MAE); and Senior Member of the IEEE.
Website: http://www.fmurtagh.info








Raymond Ng   
Professor of Computer Science at the University of British Columbia
Mining and Summarizing Text Conversations [introductory]

Summary:

With the ever-increasing popularity of Internet technologies and communication devices such as smartphones and tablets, and with huge amounts of such conversational data generated on an hourly basis, intelligent text analytic approaches can greatly benefit organizations and individuals. For example, managers can find the information exchanged in forum discussions crucial for decision making; clinicians can use patients’ discussions to assist in chronic disease management.
In this lecture, we first give an overview of important applications of mining text conversations, using clinical applications and sentiment summarization of product reviews as case studies. Then we examine three topics in this area: (i) topic modeling; (ii) natural language summarization; and (iii) extraction of rhetorical structure and relationships in text.
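As a tiny, self-contained illustration of topic (ii), here is a classic frequency-based extractive summarizer, simulated in plain Python. This is a standard baseline, not one of the specific methods covered in the lecture, and the sample conversation is invented.

```python
import re
from collections import Counter

def summarize(text, n=1):
    """Toy extractive summarizer: return the n sentences whose words are,
    on average, most frequent across the whole conversation."""
    sentences = [s.strip() for s in re.split(r"[.!?]", text) if s.strip()]
    freq = Counter(w for s in sentences for w in s.lower().split())

    def score(sentence):
        words = sentence.lower().split()
        return sum(freq[w] for w in words) / len(words)

    return sorted(sentences, key=score, reverse=True)[:n]

# A hypothetical product-review conversation:
convo = ("The battery drains fast. Battery life is the main problem. "
         "I like the screen. Shipping was quick.")
print(summarize(convo))  # prints ['The battery drains fast']
```

Even this crude baseline surfaces the conversation's dominant concern; the lecture's methods additionally exploit conversational structure, which flat word counts ignore.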

Syllabus:

Pre-requisites:

Basic knowledge of machine learning and natural language processing is preferred but not required.

References:

Short Bio

Raymond Ng is a Professor of Computer Science, Canada Research Chair in Data Science and Analytics, and Chief Informatics Officer of the PROOF Centre. His main research area for the past two decades has been data mining, with a specific focus on health informatics and text mining. He has published over 200 peer-reviewed publications on data clustering, outlier detection, OLAP processing, health informatics and text mining. He is the recipient of two best paper awards – from the 2001 ACM SIGKDD conference, the premier data mining conference in the world, and the 2005 ACM SIGMOD conference, one of the top database conferences worldwide. For the past decade, he has co-led several large-scale genomic projects funded by Genome Canada, Genome BC and industrial collaborators. Since the inception of the PROOF Centre of Excellence, which focuses on biomarker development for end-stage organ failures, he has held the position of Chief Informatics Officer of the Centre. From 2009 to 2014, Dr. Ng was the associate director of the NSERC-funded strategic network on business intelligence.








Srinivasan Parthasarathy   
Ohio State University
Network Science Fundamentals [introductory/intermediate]

Summary:

We shall cover basic and intermediate concepts in Network Science as outlined below.

Syllabus:

Lecture 1: Introductory Concepts: Networks in the real world; Basic graph theory concepts and algorithms; Paths, cycles and components; Modeling directionality and weights; Basic network measures.
Lecture 2: Intermediate Concepts I: Advanced network measures; Modularity: theory and applications; Tie-strength, link analysis and prediction; Social models of group formation and basic community discovery; Signed networks and structural balance theory.
Lecture 3: Intermediate Concepts II: Recent advances in community discovery; Graph sparsification and sampling strategies; Deep-dive into recent advances in stochastic flow clustering of networks.
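As a small worked example of one measure from Lecture 2, the sketch below computes Newman's modularity Q = Σ_c (l_c/m − (d_c/2m)²) for a hypothetical partition of a toy undirected graph (l_c: edges inside community c, d_c: total degree of c, m: total edges). The graph and partition are invented for illustration.

```python
def modularity(edges, community):
    """Newman modularity of a partition: sum over communities c of
    l_c/m - (d_c/(2m))**2, for an undirected edge list."""
    m = len(edges)
    intra = {}   # l_c: number of edges with both endpoints in c
    degree = {}  # d_c: total degree of nodes in c
    for u, v in edges:
        degree[community[u]] = degree.get(community[u], 0) + 1
        degree[community[v]] = degree.get(community[v], 0) + 1
        if community[u] == community[v]:
            intra[community[u]] = intra.get(community[u], 0) + 1
    return sum(intra.get(c, 0) / m - (d / (2 * m)) ** 2
               for c, d in degree.items())

# Two triangles joined by a single bridge edge (2, 3):
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
community = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
print(round(modularity(edges, community), 4))  # prints 0.3571
```

A clearly positive Q confirms that the two-triangle partition has more intra-community edges than a random graph with the same degrees would.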

Pre-requisites:

Prior background in algorithms and linear algebra will be helpful albeit not essential.

References:

1. Networks, Crowds and Markets: Reasoning about a highly connected world, D. Easley, J. Kleinberg https://www.cs.cornell.edu/home/kleinber/networks-book/
2. Networks an Introduction: M. Newman. http://www.oxfordscholarship.com/view/10.1093/acprof:oso/9780199206650.001.0001/acprof-9780199206650
3 https://sites.google.com/site/stochasticflowclustering/

Short Bio

Dr. Srinivasan Parthasarathy is a Professor in the Computer Science and Engineering Department at the Ohio State University (OSU). He directs the data mining research laboratory at OSU and co-directs the university-wide undergraduate program in Data Analytics that he helped co-found. His research interests are broadly in the areas of Data Analytics, Machine Learning and High Performance Computing. He has an h-index of over 50 and his work has been cited over 12,000 times. He is a recipient of an Ameritech Faculty fellowship in 2001, an NSF CAREER award in 2003, a DOE Early Career Award in 2004, and multiple grants or fellowships from IBM, Google and Microsoft. His papers have received eight best paper awards or similar honors from leading conferences in the field, including ones at the SIAM International Conference on Data Mining (SDM), the IEEE International Conference on Data Mining (ICDM), the Very Large Databases Conference (VLDB), ACM Knowledge Discovery and Data Mining (SIGKDD), ACM Bioinformatics, and ISMB. He has served on the program and editorial board committees of leading conferences and journals in the fields of data mining, databases, and high performance computing. He currently serves as the chair of the steering committee for the SIAM data mining conference series.








Hanan Samet   
Center for Automation Research. Institute for Advanced Computer Studies. University of Maryland
Sorting in Space: Multidimensional, Spatial, and Metric Data Structures for Applications in Spatial Databases, Geographic Information Systems (GIS), and Location-based Services [introductory/intermediate]

Summary:

The representation of multidimensional, spatial, and metric data is an important issue in applications of spatial databases, geographic information systems (GIS), and location-based services. Recently, there has been much interest in hierarchical data structures such as quadtrees, octrees, and pyramids, which are based on image hierarchies, as well as methods that make use of bounding boxes, which are based on object hierarchies. Their key advantage is that they provide a way to index into space; in fact, they are little more than multidimensional sorts. They are compact and, depending on the nature of the spatial data, they save space as well as time and also facilitate operations such as search. We describe hierarchical representations of points, lines, collections of small rectangles, regions, surfaces, and volumes. For region data, we point out the dimension-reduction property of the region quadtree and octree. We also demonstrate how to use them for both raster and vector data. For metric data that does not lie in a vector space, so that indexing is based simply on the distance between objects, we review various representations such as the vp-tree, gh-tree, and mb-tree. In particular, we demonstrate the close relationship between these representations and those designed for a vector space. For all of the representations, we show how they can be used to compute nearest objects in an incremental fashion, so that the number of objects need not be known in advance. The VASCO JAVA applet that illustrates these methods is presented (found at http://www.cs.umd.edu/~hjs/quadtree/index.html). They are also used in applications such as the SAND Internet Browser (found at http://www.cs.umd.edu/~brabec/sandjava). The above is in the context of the traditional geometric representation of spatial data; in the final part we review the more recent textual representation, which is used in location-based services where the key issue is that of resolving ambiguities.
For example, does "London" correspond to the name of a person or a location, and if it corresponds to a location, which of the over 700 different instances of "London" is it? The NewsStand system at newsstand.umiacs.umd.edu and the TwitterStand system at TwitterStand.umiacs.umd.edu are examples. See also the cover article of the October 2014 issue of Communications of the ACM at http://tinyurl.com/newsstand-cacm or a cached version at http://www.cs.umd.edu/~hjs/pubs/cacm-newsstand.pdf and the accompanying video at https://vimeo.com/106352925
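The remark that these structures are "little more than multidimensional sorts" can be made concrete with a Morton (Z-order) code, one of the space ordering methods in the syllabus: interleaving the bits of the coordinates yields a one-dimensional key whose ordinary sort tends to keep nearby points together. A minimal sketch (the point set is invented for illustration):

```python
def morton2(x, y, bits=16):
    """Interleave the bits of non-negative integers x and y
    (x bits in even positions, y bits in odd positions)."""
    key = 0
    for i in range(bits):
        key |= ((x >> i) & 1) << (2 * i)      # bit i of x -> position 2i
        key |= ((y >> i) & 1) << (2 * i + 1)  # bit i of y -> position 2i+1
    return key

points = [(5, 1), (0, 0), (1, 1), (2, 2), (7, 7)]
# Sorting by Morton key is a plain one-dimensional sort that
# nevertheless respects two-dimensional locality:
print(sorted(points, key=lambda p: morton2(*p)))
```

This ordering is exactly what lets structures like the region quadtree be linearized and stored in an ordinary one-dimensional index such as a B-tree.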

Syllabus:

1. Introduction
a. Sample queries
b. Spatial Indexing
c. Sorting approach
d. Minimum bounding rectangles (e.g., R-tree)
e. Disjoint cells (e.g., R+-tree, k-d-B-tree)
f. Uniform grid
g. Location-based queries vs: feature-based queries
h. Region quadtree
i. Dimension reduction
j. Pyramid
k. Region quadtrees vs: pyramids
l. Space ordering methods

2. Points
a. point quadtree
b. MX quadtree
c. PR quadtree
d. k-d tree
e. Bintree
f. BSP tree

3. Lines
a. Strip tree
b. PM1 quadtree
c. PM2 quadtree
d. PM3 quadtree
e. PMR quadtree

4. Rectangles and arbitrary objects
a. MX-CIF quadtree
b. Loose quadtree
c. Partition fieldtree
d. R-tree

5. Surfaces and Volumes
a. Restricted quadtree
b. Region octree
c. PM octree

6. Metric Data
a. vp-tree
b. gh-tree
c. mb-tree

7. Operations
a. Incremental nearest object location
b. Boolean set operations

8. Spatial Database Issues
a. General issues
b. Specific issues

9. Indexing spatiotextual data for location-based services delivered on platforms such as smart phones and tablets
a. Incorporation of spatial synonyms in search engines
b. Toponym recognition
c. Toponym resolution
d. Spatial reader scope
e. Incorporation of spatiotemporal data
f. System integration issues
g. Demos of live systems on smart phones

10. Example systems
a. SAND internet browser
b. JAVA spatial data applets
c. STEWARD
d. NewsStand
e. TwitterStand

Pre-requisites:

Practitioners working in the areas of big spatial data and spatial data science that involve spatial databases, geographic information systems, and location-based services will be given a different perspective on data structures found to be useful in most applications. Familiarity with computer terminology and some programming experience is needed to follow this course.

References:

1. H. Samet. ``Foundations of Multidimensional and Metric Data Structures.'' Morgan-Kaufmann, San Francisco, 2006.
2. H. Samet. ``A sorting approach to indexing spatial data.'' International Journal of Shape Modeling, 14(1):15--37, June 2008.
3. G. R. Hjaltason and H. Samet. ``Index-driven similarity search in metric spaces.'' ACM Transactions on Database Systems, 28(4):517--580, December 2003.
4. G. R. Hjaltason and H. Samet. ``Distance browsing in spatial databases.'' ACM Transactions on Database Systems, 24(2):265--318, June 1999. Also Computer Science TR-3919, University of Maryland, College Park, MD.
5. G. R. Hjaltason and H. Samet. ``Ranking in spatial databases.'' In Advances in Spatial Databases --- 4th International Symposium, SSD'95, M. J. Egenhofer and J. R. Herring, eds., Portland, ME, August 1995, 83--95. Also Springer-Verlag Lecture Notes in Computer Science 951.
6. H. Samet. ``Applications of Spatial Data Structures: Computer Graphics, Image Processing, and GIS.'' Addison-Wesley, Reading, MA, 1990.
7. H. Samet. ``The Design and Analysis of Spatial Data Structures.'' Addison-Wesley, Reading, MA, 1990.
8. C. Esperanca and H. Samet. ``Experience with SAND/Tcl: a scripting tool for spatial databases.'' Journal of Visual Languages and Computing, 13(2):229--255, April 2002.
9. H. Samet, H. Alborzi, F. Brabec, C. Esperanca, G. R. Hjaltason, F. Morgan, and E. Tanin. ``Use of the SAND spatial browser for digital government applications.'' Communications of the ACM, 46(1):63--66, January 2003.
10. B. Teitler, M. D. Lieberman, D. Panozzo, J. Sankaranarayanan, H. Samet, and J. Sperling. ``NewsStand: A new view on news.'' Proceedings of the 16th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems, Irvine, CA, November 2008, 144--153.
11. H. Samet, J. Sankaranarayanan, M. D. Lieberman, M. D. Adelfio, B. C. Fruin, J. M. Lotkowski, D. Panozzo, J. Sperling, and B. E. Teitler. ``Reading news with maps by exploiting spatial synonyms.'' Communications of the ACM, 57(10):64--77, October 2014.
12. J. Sankaranarayanan, H. Samet, B. Teitler, M. D. Lieberman, and J. Sperling. ``TwitterStand: News in tweets.'' Proceedings of the 17th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems, Seattle, WA, November 2009, 42--51.
13. M. D. Lieberman, H. Samet, and J. Sankaranarayanan. ``Geotagging with local lexicons to build indexes for textually-specified spatial data.'' Proceedings of the 26th IEEE International Conference on Data Engineering, Long Beach, CA, March 2010, 201--212.
14. M. D. Lieberman and H. Samet. ``Multifaceted Toponym Recognition for Streaming News.'' Proceedings of the ACM SIGIR Conference. Beijing, July 2011, 843--852.
15. M. D. Lieberman and H. Samet. ``Adaptive Context Features for Toponym Resolution in Streaming News.'' Proceedings of the ACM SIGIR Conference. Portland, OR, August 2012, 731--740.
16. M. D. Lieberman and H. Samet. ``Supporting Rapid Processing and Interactive Map-Based Exploration of Streaming News.'' Proceedings of the 20th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems. Redondo Beach, CA, November 2012, 179--188.
17. Spatial Data Structure applets at: http://www.cs.umd.edu/~hjs/quadtree/index.html.

Short Bio

Hanan Samet (http://www.cs.umd.edu/~hjs/) is a Distinguished University Professor of Computer Science at the University of Maryland, College Park and is a member of the Institute for Advanced Computer Studies. He is also a member of the Computer Vision Laboratory at the Center for Automation Research where he leads a number of research projects on the use of hierarchical data structures for database applications, geographic information systems, computer graphics, computer vision, image processing, games, robotics, and search. He received the B.S. degree in engineering from UCLA, and the M.S. degree in operations research and the M.S. and Ph.D. degrees in computer science from Stanford University. His doctoral dissertation dealt with proving the correctness of translations of LISP programs, which was the first work in translation validation and the related concept of proof-carrying code. He is the author of the book ``Foundations of Multidimensional and Metric Data Structures'' (http://www.cs.umd.edu/~hjs/multidimensional-book-flyer.pdf), published by Morgan-Kaufmann, an imprint of Elsevier, in 2006, an award winner in the 2006 best book in Computer and Information Science competition of the Professional and Scholarly Publishers (PSP) Group of the Association of American Publishers (AAP), and of the first two books on spatial data structures, ``The Design and Analysis of Spatial Data Structures'' and ``Applications of Spatial Data Structures: Computer Graphics, Image Processing, and GIS,'' both published by Addison-Wesley in 1990.
He is the Founding Editor-in-Chief of the ACM Transactions on Spatial Algorithms and Systems (TSAS), the founding chair of ACM SIGSPATIAL, and a recipient of a Science Foundation of Ireland (SFI) Walton Visitor Award at the Centre for Geocomputation at the National University of Ireland at Maynooth (NUIM), the 2009 UCGIS Research Award, the 2010 CMPS Board of Visitors Award at the University of Maryland, the 2011 ACM Paris Kanellakis Theory and Practice Award, and the 2014 IEEE Computer Society Wallace McDowell Award. He is a Fellow of the ACM, IEEE, AAAS, IAPR (International Association for Pattern Recognition), and UCGIS (University Consortium for Geographic Information Science). He received best paper awards from the 2007 Computers & Graphics Journal, the 2008 ACM SIGMOD and SIGSPATIAL ACMGIS Conferences, the 2012 SIGSPATIAL MobiGIS Workshop, and the 2013 SIGSPATIAL GIR Workshop, as well as a best demo award at the 2011 SIGSPATIAL ACMGIS Conference. His paper at the 2009 IEEE International Conference on Data Engineering (ICDE) was selected as one of the best papers for publication in the IEEE Transactions on Knowledge and Data Engineering. He was elected to the ACM Council as the Capitol Region Representative for the term 1989-1991, and is an ACM Distinguished Speaker.








Kyuseok Shim   
Professor of Electrical and Computer Engineering Department, Seoul National University, Korea
MapReduce Algorithms for Big Data Analysis [introductory/intermediate]

Summary:

A growing number of applications must handle big data, yet analyzing big data remains very challenging. For such applications, the MapReduce framework has attracted a lot of attention. MapReduce is a programming model that allows easy development of scalable parallel applications to process big data on large clusters of commodity machines. Google’s MapReduce and its open-source equivalent Hadoop are powerful tools for building such applications. In this tutorial, I will first introduce the MapReduce framework based on the Hadoop system, which is available to everyone for running distributed computing algorithms with MapReduce. I will next discuss how to design efficient MapReduce algorithms and present the state-of-the-art in MapReduce algorithms for big data analysis. Since Spark was developed to overcome the shortcomings of MapReduce, which is not optimized for iterative algorithms or interactive data analysis, I will also present an outline of Spark as well as the differences between MapReduce and Spark. The intended audience of this tutorial is professionals who plan to develop efficient MapReduce algorithms and researchers who should be aware of the state-of-the-art in MapReduce algorithms available today for big data analysis.

Syllabus:

Introduction to Hadoop and MapReduce
- Why parallel computing for big data analysis?
- Introduction to Map/Reduce
- Hadoop distributed file systems
- Word counting, inverted index building, and PageRank algorithms
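The word-counting example listed above is the canonical illustration of the MapReduce programming model. The following is a minimal pure-Python simulation of the map, shuffle, and reduce phases (an illustration of the model only, not Hadoop itself; the function names are our own):

```python
from collections import defaultdict
from itertools import chain

def map_phase(doc_id, text):
    # Mapper: emit an intermediate (word, 1) pair for every word.
    for word in text.lower().split():
        yield (word, 1)

def shuffle(pairs):
    # Shuffle: group intermediate values by key, as the framework
    # does between the map and reduce phases.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(word, counts):
    # Reducer: sum the partial counts emitted for each word.
    return (word, sum(counts))

docs = {1: "big data is big", 2: "data analysis on big data"}
mapped = chain.from_iterable(map_phase(d, t) for d, t in docs.items())
result = dict(reduce_phase(w, c) for w, c in shuffle(mapped).items())
print(result)  # {'big': 3, 'data': 3, 'is': 1, 'analysis': 1, 'on': 1}
```

In Hadoop, the same mapper and reducer would run in parallel across machines, with the framework handling the shuffle, fault tolerance, and I/O.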

MapReduce Algorithms for Database Systems
- Theta joins
- Similarity joins
- K-nearest neighbor joins
- Skyline computations
- Interval joins
- Subgraph enumeration
- Triangle counting
- Wavelet computation

MapReduce Algorithms for Data Mining
- K-means clustering
- EM, PLSI and LDA clustering
- Density-based clustering
- Association rule mining
- Sequential pattern mining
Introduction to Spark
Summary

Pre-requisites:

References:

Short Bio

Kyuseok Shim is currently a professor in the Department of Electrical and Computer Engineering at Seoul National University, Korea. Before that, he was an assistant professor in the computer science department at KAIST and a member of technical staff for the Serendip Data Mining Project at Bell Laboratories. He was also a member of the Quest Data Mining Project at the IBM Almaden Research Center and visited Microsoft Research in Redmond several times as a visiting scientist. Kyuseok was named an ACM Fellow in 2013 for his contributions to scalable data mining and query processing research. He has been working in the area of databases, focusing on data mining, search engines, recommendation systems, MapReduce algorithms, privacy preservation, query processing and query optimization. His writings have appeared in a number of professional conferences and journals including ACM, VLDB and IEEE publications. He served as a Program Committee member for the SIGKDD, SIGMOD, ICDE, ICDM, ICDT, EDBT, PAKDD, VLDB and WWW conferences. He also served as a Program Committee Co-Chair for PAKDD 2003, WWW 2014, ICDE 2015 and APWeb 2016. Kyuseok was previously on the editorial boards of the VLDB and IEEE TKDE journals and is currently a member of the VLDB Endowment Board of Trustees. He received the BS degree in electrical engineering from Seoul National University in 1986, and the MS and PhD degrees in computer science from the University of Maryland, College Park, in 1988 and 1993, respectively.








Jaideep Srivastava   
University of Minnesota
Social Computing: Computing as an Integral Tool to Understanding Human Behavior and Solving Problems of Social Relevance [introductory/intermediate]

Summary:

Social Computing is an emerging discipline, and like any discipline at a nascent stage it can mean different things to different people. However, three distinct threads are emerging. The first thread is often called Socio-Technical Systems, which focuses on building systems that allow large-scale interactions of people, whether for a specific purpose or in general. Examples include social networks like Facebook and Google Plus, and Multiplayer Online Games like World of Warcraft and Farmville. The second thread is often called Computational Social Science, whose goal is to use computing as an integral tool to push the research boundaries of various social and behavioral science disciplines, primarily Sociology, Economics, and Psychology. The third is the idea of solving problems of societal relevance using a combination of computing and humans. The three modules of this course are structured according to this description. The goal of this course is to discuss, in a tutorial manner, through case studies and discussion, what Social Computing is, where it is headed, and where it is taking us.

Syllabus:

-Module 1: Socio-technical systems
• Introduction to Social Computing
• Socio-technical systems
• Examples of a number of social computing systems, e.g. Twitter, Facebook, MMO games, etc.
• Applying data mining to social computing systems
-Module 2: Computational Social Science
• Online trust
• Social influence
• Individual and group/team performance
• Identifying and preventing bad behavior
-Module 3: Solving Problems of Societal Relevance
• Social computing for humanitarian assistance
• Wrap-up discussion
• Privacy and ethics
• Where are we headed?

Pre-requisites:

This course is intended primarily for graduate students. Potential audiences:
- Computer Science graduate students: all that is needed is interest in one of the themes of social computing.
- Social Science graduate students: some exposure to building models from data, or at least an understanding of what these techniques are and what they can do.
- Management graduate students: those with an MIS focus.

References:

Provided with slides.

Short Bio

Jaideep Srivastava (https://www.linkedin.com/in/jaideep-srivastava-50230/) is Professor of Computer Science at the University of Minnesota, where he directs a laboratory focusing on research in Web Mining, Social Analytics, and Health Analytics. He is a Fellow of the Institute of Electrical and Electronics Engineers (IEEE), and has been an IEEE Distinguished Visitor and a Distinguished Fellow of Allina’s Center for Healthcare Innovation. He has been awarded the Distinguished Research Contributions Award of the PAKDD for his lifetime contributions to the field of machine learning and data mining. Dr. Srivastava has significant industry experience, in both consulting and executive roles. Most recently he was the Chief Scientist for the Qatar Computing Research Institute (QCRI), which is part of Qatar Foundation. Earlier, he was the data mining architect for Amazon.com (www.amazon.com), built a data analytics department at Yodlee (www.yodlee.com), and served as the Chief Technology Officer for Persistent Systems (www.persistentsys.com). He has provided technology and strategy advice to Cargill, United Technologies, IBM, Honeywell, KPMG, 3M, TCS, and Eaton. Dr. Srivastava co-founded Ninja Metrics (www.ninjametrics.com), based on his research in behavioral analytics. He was advisor and Chief Scientist for CogCubed (www.cogcubed.com), an innovative company with the goal of revolutionizing the diagnosis and therapy of cognitive disorders through the use of online games, which was subsequently acquired by Teladoc (https://www.teladoc.com/), a public company. He has been a technology advisor to a number of startups at various stages, including Jornaya (https://www.jornaya.com/), a leader in cross-industry lead management, and Kipsu (http://kipsu.com/), which provides an innovative approach to improving service quality in the hospitality industry. Dr. Srivastava has held distinguished professorships at Heilongjiang University and Wuhan University, China. 
He has held advisory positions with the State of Minnesota, and the State of Maharashtra, India. He is a technology advisor to the Unique ID (UID) project of the Government of India, whose goal is to provide biometrics-based social security numbers to the 1.3 Billion citizens of India. Dr. Srivastava has a Bachelors of Technology from the Indian Institute of Technology (IIT), Kanpur, India, and MS and PhD from the University of California, Berkeley.








Jeffrey Ullman   
Stanford W. Ascherman Professor of Computer Science (Emeritus)
Big-data Algorithms That Aren't Machine Learning [introductory]

Summary:

We shall study algorithms that have been found useful in querying large data volumes. The emphasis is on algorithms that cannot be considered 'machine learning'.

Syllabus:

Pre-requisites:

A course in algorithms at the advanced-undergraduate level is important. A course in database systems is helpful, but not required.

References:

We will be covering (parts of) Chapters 3, 4, 5, and 10 of the free text Mining of Massive Datasets, by Jure Leskovec, Anand Rajaraman, and Jeff Ullman, available at www.mmds.org.

Short Bio

Link to the bio








Sebastián Ventura   
Professor of Computer Science and Artificial Intelligence at the University of Córdoba
Pattern Mining on Big Data [intermediate/advanced]

Summary:

Data analysis is of growing interest in many fields; it is concerned with the development of methods and techniques for making sense of data. Hence, there is a real incentive to collect, manage, and transform raw data into significant and meaningful information that may be used for subsequent analysis leading to better decision making. When talking about data analysis, the key element is the pattern, which is used to represent any type of homogeneity and regularity in data, serving as a way of describing intrinsic and important properties of data. Pattern mining, however, is a really challenging task that requires deep study, especially on massive and complex data where the computational and memory requirements are too high. Early exhaustive search approaches in this field were improved by adding constraints to the mining process so the search space could be heavily reduced. These constraints aided the user’s exploration and control, confining the space of solutions to those of interest. In spite of everything, the extraction of patterns from huge datasets still required large amounts of memory, since the number of feasible patterns increases exponentially with the number of items in the data. Hence, different ways of solving this arduous task were proposed, with metaheuristics being a good option to avoid analyzing the whole search space. Nevertheless, approaches based on metaheuristics are still time-consuming for extremely large datasets, since every pattern is evaluated on every transaction. In this sense, novel data structures as well as parallel pattern mining methods have recently emerged as really interesting and promising research areas. Parallel processing is, perhaps, the principal research topic (in connection with runtime) considered by the pattern mining community. In this regard, two main directions are being studied: (1) clusters of computers and (2) graphics processing units (GPUs). 
GPUs, for example, have been successfully applied by analyzing each transaction in parallel so the runtime is reduced. MapReduce, by contrast, decomposes the problem into two phases: map and reduce. The input dataset is split into subsets, and the map phase produces all the patterns within each of these subsets, assigning as a value the frequency of each pattern. Then, similar patterns are merged so the reduce phase can work on these sets to produce the final frequencies. MapReduce is one of the most widely studied emerging paradigms for intensive computing, achieving excellent results in a simple and robust way. However, recent research studies have demonstrated that these approaches are only recommended for really big data, since the time required to load the parallel structure can exceed that of the mining process itself.
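The map/reduce decomposition just described can be sketched in a few lines of Python (a toy simulation of pattern counting over data splits, not a Hadoop implementation; the item names and `max_len` cap are hypothetical choices for illustration):

```python
from collections import Counter
from itertools import combinations

def map_patterns(partition, max_len=2):
    # Map: for each transaction in this data split, emit every
    # itemset of size up to max_len with a partial count of 1.
    for transaction in partition:
        items = sorted(set(transaction))
        for k in range(1, max_len + 1):
            for pattern in combinations(items, k):
                yield (pattern, 1)

def reduce_patterns(mapped_streams):
    # Reduce: merge the partial counts emitted by all mappers
    # into the final frequency of each pattern.
    totals = Counter()
    for stream in mapped_streams:
        for pattern, count in stream:
            totals[pattern] += count
    return totals

# Two partitions of a toy transaction dataset.
part1 = [["bread", "milk"], ["bread", "butter"]]
part2 = [["milk", "butter"], ["bread", "milk", "butter"]]
freqs = reduce_patterns([map_patterns(part1), map_patterns(part2)])
print(freqs[("bread", "milk")])  # 2
```

A real MapReduce job would run the mappers on separate machines and let the framework shuffle patterns to reducers; enumerating all itemsets per transaction is exactly the exponential blow-up that motivates the constrained and heuristic approaches discussed above.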

Syllabus:

-Pattern mining: foundations and algorithms (time and memory requirements)
-Evolutionary algorithms for mining patterns (reducing the requirements)
-Data structure to reduce the evaluation process
-Parallel solutions:
a) based on GPUs
b) based on MapReduce

Pre-requisites:

Foundations of Pattern Mining (classical exhaustive approaches); Foundations of Evolutionary Computation

References:

Basic References:

Charu C. Aggarwal. Data Mining: The Textbook. 1st Edition. Springer (2015). ISBN 978-3-319-14141-1
Charu C. Aggarwal and Jiawei Han. Frequent Pattern Mining. 1st Edition. Springer (2014). ISBN 978-3-319-07820-5.
Sebastián Ventura, José María Luna: Pattern Mining with Evolutionary Algorithms. 1st Edition, Springer (2016), ISBN 978-3-319-33857-6.

Supplementary references

José María Luna, José Raúl Romero, Sebastián Ventura: Design and behavior study of a grammar-guided genetic programming algorithm for mining association rules. Knowl. Inf. Syst. 32(1): 53-76 (2012)
Alberto Cano, José María Luna, Sebastián Ventura: High performance evaluation of evolutionary-mined association rules on GPUs. The Journal of Supercomputing 66(3): 1438-1461 (2013)
José María Luna, Alberto Cano, Mykola Pechenizkiy, Sebastián Ventura: Speeding-Up Association Rule Mining With Inverted Index Compression. IEEE Trans. Cybernetics 46(12): 3059-3072 (2016)
José María Luna, Francisco Padillo, Mykola Pechenizkiy, Sebastián Ventura: Apriori Versions Based on MapReduce for Mining Frequent Patterns on Big Data. IEEE Trans. Cybernetics (2017). DOI: 10.1109/TCYB.2017.2751081

Short Bio








Xiaowei Xu   
Professor, Department of Information Science, University of Arkansas at Little Rock
Mining Big Networked Data [introductory/advanced]

Summary:

The recent explosive growth of online social networks such as Facebook and Twitter provides a unique opportunity for many data mining applications, including real-time event detection, community structure detection, and viral marketing. The course covers big data analytics for social networks. The emphasis will be on scalable algorithms for community structure detection, social tie modeling, and structural pattern mining for big networks.

Syllabus:

Modularity-based community structure detection algorithms [1]
Structural clustering algorithms [2]
Label propagation algorithms [3]
Social tie modeling [4]
Parallel network clustering algorithm [5]
Discovering multiple social ties for characterization of individuals in online social networks [6]
Anytime network clustering algorithm for very big networks [7]
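The label propagation approach [3] in the syllabus is perhaps the simplest of these algorithms to sketch. Below is a minimal illustrative Python version on a hand-built adjacency list (a sketch of the general idea with synchronous-per-node random updates, not the course's implementation):

```python
import random
from collections import Counter

def label_propagation(adj, max_iters=100, seed=42):
    # Each node starts with its own label; nodes then repeatedly
    # adopt the most frequent label among their neighbors until
    # no label changes (near-linear time per sweep).
    rng = random.Random(seed)
    labels = {node: node for node in adj}
    for _ in range(max_iters):
        changed = False
        nodes = list(adj)
        rng.shuffle(nodes)  # update nodes in random order
        for node in nodes:
            if not adj[node]:
                continue
            freq = Counter(labels[nbr] for nbr in adj[node])
            best = max(freq.values())
            # Break ties randomly among the most frequent labels.
            new_label = rng.choice(
                [lab for lab, c in freq.items() if c == best])
            if new_label != labels[node]:
                labels[node] = new_label
                changed = True
        if not changed:
            break
    return labels

# Toy graph: two triangles joined by a single bridge edge (2-3).
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3],
       3: [2, 4, 5], 4: [3, 5], 5: [3, 4]}
labels = label_propagation(adj)
print(labels)
```

Nodes sharing a final label form a community; because of the random tie-breaking, different runs (or seeds) can split the toy graph differently around the bridge edge.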

Pre-requisites:

Basic knowledge in computer algorithms and graph theory.

References:

1. Finding community structure in very large networks, Aaron Clauset, M. E. J. Newman, and Cristopher Moore, Phys. Rev. E 70, 066111 (2004).
2. X. Xu, N. Yuruk, Z. Feng, and T. A. Schweiger. Scan: a structural clustering algorithm for networks. In Proceedings of the 13th ACM SIGKDD international conference on Knowledge discovery and data mining, pages 824–833. ACM, 2007. 

3. Near linear time algorithm to detect community structures in large-scale networks, Raghavan, Usha Nandini and Albert, Reka and Kumara, Soundar, Phys. Rev. E 76, 036106 (2007)
4. S. Sintos and P. Tsaparas. Using strong triadic closure to characterize ties in social networks. In Proceedings of the 20th ACM SIGKDD international conference on Knowledge discovery and data mining, pages 1466–1475. ACM, 2014. 

5. Weizhong Zhao, Venkata Swamy Martha, and Xiaowei Xu. PSCAN: A Parallel Structural Clustering Algorithm for Big Networks in MapReduce. Proceedings of AINA 2013, 862-869.
6. Ming-Hua Chung, Gang Chen, Weizhong Zhao, Guohua Hao, Julian Pan, and Xiaowei Xu. Discovering Multiple Social Ties for Characterization of Individuals in Online Social Networks. The Third European Network Intelligence Conference (ENIC 2016), Wrocław, Poland, September 5-7, 2016, 1-8.
7. Weizhong Zhao, Gang Chen, Xiaowei Xu. AnySCAN: An Efficient Anytime Framework with Active Learning for Large-scale Network Clustering. Proceedings of IEEE International Conference on Data Mining (ICDM 2017), New Orleans, November 18-21, 2017.

Short Bio

Xiaowei Xu is a professor in the Department of Information Science at the University of Arkansas at Little Rock (UALR). He received his Ph.D. in computer science from the University of Munich in 1998. Prior to his appointment at UALR, Dr. Xu was a senior research scientist at Siemens Corporate Technology. He is an adjunct professor in the Department of Mathematics at the University of Arkansas. Dr. Xu was an Oak Ridge Institute for Science and Education (ORISE) Faculty Research Program Member in the National Center for Toxicological Research's (NCTR) Center for Bioinformatics in the Division of Systems Biology from 2010 to 2014. He is also a consultant for companies including Siemens, Acxiom, Dataminr and L’Oreal. Dr. Xu’s research focuses on algorithms for data mining and machine learning. He is a recipient of the 2014 ACM SIGKDD Test of Time Award for his work on the density-based clustering algorithm DBSCAN, which has received over 10,000 citations according to Google Scholar. Dr. Xu has served as a program committee member and session chair for premier forums including the ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD) and the IEEE International Conference on Data Mining (ICDM).








Zhongfei Zhang   
Professor, Department of Computer Science, Watson School of Engineering and Applied Science, Binghamton University
Relational and Media Data Learning and Knowledge Discovery [introductory/advanced]

Summary:

This course aims at giving the audience a complete introduction to knowledge discovery and machine learning theories, together with case studies of real-world applications, for relational and media data. The course begins with an extensive introduction to the fundamental concepts and theories of knowledge discovery and machine learning for relational and media data, and then showcases several important real-world applications as case studies of big data knowledge discovery and learning.

Syllabus:

The course consists of three two-hour sessions. The syllabus is as follows:
First session: Introduction to the fundamental concepts and theories for relational and media data, with specific foci on an overview of the wide spectrum of techniques and technologies available as well as their relationships and applications to big data scenarios through real-world case studies;
Second session: Specific discussions on the classic and state-of-the-art methods for relational data knowledge discovery and learning;
Third session: Specific discussions on the state-of-the-art methods for media data knowledge discovery.

Pre-requisites:

College math, fundamentals about computer science

References:

1. Bo Long, Zhongfei (Mark) Zhang, and Philip S. Yu, Relational Data Clustering: Models, Algorithms, and Applications, Taylor & Francis/CRC Press, 2010, ISBN: 9781420072617
2. Zhongfei (Mark) Zhang and Ruofei Zhang, Multimedia Data Mining -- A Systematic Introduction to Concepts and Theory, Taylor & Francis Group/CRC Press, 2008, ISBN: 9781584889663
3. Zhongfei (Mark) Zhang, Bo Long, Zhen Guo, Tianbing Xu, and Philip S. Yu, Machine Learning Approaches to Link-Based Clustering, in Link Mining: Models, Algorithms and Applications, Edited by Philip S. Yu, Christos Faloutsos, and Jiawei Han, Springer, 2010
4. Zhen Guo, Zhongfei Zhang, Eric P. Xing, and Christos Faloutsos, Multimodal Data Mining in a Multimedia Database Based on Structured Max Margin Learning, ACM Transactions on Knowledge Discovery and Data Mining, ACM Press, 2015
5. http://www.cs.binghamton.edu/~forweb/publicationsactive.html

Short Bio

Zhongfei (Mark) Zhang is a full professor of Computer Science at the State University of New York (SUNY) at Binghamton, where he directs the Multimedia Research Computing Laboratory. He has also served as a QiuShi Chair Professor at Zhejiang University, China, and as the Director of the Data Science and Engineering Research Center at that university while on leave from SUNY Binghamton. He received a B.S. in Electronics Engineering (with Honors) and an M.S. in Information Sciences, both from Zhejiang University, China, and a Ph.D. in Computer Science from the University of Massachusetts at Amherst, USA. His research interests include knowledge discovery and machine learning for media and relational data, multimedia information indexing and retrieval, artificial intelligence, computer vision, and pattern recognition. He is the author or co-author of the first monograph on multimedia data mining and the first monograph on relational data clustering. His research is sponsored by a wide spectrum of government funding agencies, industrial labs, and private agencies, notably including the US NSF, US AFRL, CNRS in France, JSPS in Japan, MOST and NSFC in China, the New York State Government in the US, and the Zhejiang Provincial Government in China, as well as Kodak Research and Microsoft Research in the US, Alibaba Group in China, and the Huang Kuancheng Foundation in Hong Kong, China. He has published over 200 papers in premier venues in his areas and is an inventor on more than 30 patents. He has served on several journal editorial boards and received several professional awards.