Foundations of Big Data Systems
The course addresses the foundations of modern big data systems, with a focus on data management infrastructure. This infrastructure is typically built on top of modern distributed/parallel computing platforms (e.g., MapReduce, Spark), runs a distributed/parallel data management platform, employs main memory systems (both row stores for OLTP and column stores for analytics), and includes multi-modal systems to handle different types of data coming from different data sources.
The topics that will be covered are the following:
- Fundamentals of distributed and parallel data management, focusing on data fragmentation, distributed query processing, distributed transactions, replication, and data integration;
- Main memory systems and column-based data representation;
- Big data analytics platforms (distributed storage systems, MapReduce, Spark, graph analytics, stream data management);
- NoSQL, NewSQL and Polystore Systems (Key-Value Stores, Document Stores, Graph Databases).
Principles of Distributed Database Systems, Tamer Ozsu, Springer.
The course will address the fundamental challenges and components of big data systems and the approaches that have been developed to address them. By the end of the course, students should have a good understanding of the foundations of these systems.
Session 1: Fundamentals of distributed and parallel data management
Description of the Session: This session will cover the classical distributed/parallel data management topics such as data partitioning and distribution, distributed query processing, and distributed transaction processing.
Session 2: Main memory systems and column-based data representation
Description of the Session: An important aspect of big data systems is main memory processing and column-based storage for data analytics. This session will cover these issues.
Session 3: Big data analytics platforms
Description of the Session: This session will focus on the platforms that have been developed for big data analytics. The specific topics that will be considered are MapReduce, Spark, and stream processing.
Session 4: Graph analytics & graph databases
Description of the Session: Graphs have emerged as an important representation in big data systems. In this session we consider both graph analytics and graph database issues.
Session 5: NoSQL Systems
Description of the Session: We give an overview of NoSQL systems, focusing on Key-Value Stores and Document Stores.