Foundations of Big Data Systems: Course Details

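Course No. 04833660 Credits 2
English Title Foundations of Big Data Systems
Prerequisites
Chinese Description Course Introduction:
This course covers the foundations of big data systems. The focus is on data management architecture. This architecture is typically built on distributed/parallel computing platforms (e.g., MapReduce and Spark), runs a parallel distributed data management platform, employs main memory systems (including row stores for OLTP and column stores for analytics), and includes multi-modal systems for handling heterogeneous data from multiple sources. The course covers the foundational parts of these big data systems.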

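Course Content:
The course covers the foundations of modern big data systems. Specific topics include:
1. Fundamentals of distributed and parallel data management, including data partitioning, distributed data processing, distributed transaction processing, replication, and data integration.
2. Main memory systems and column-based data representation.
3. Big data analytics systems (distributed data storage systems, MapReduce, Spark, graph analytics, and stream data processing).
4. NoSQL, NewSQL, and Polystore systems (key-value stores, document stores, graph databases).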
English Description Course Introduction:
The course addresses the foundations of modern big data systems. The focus is on data management infrastructure. This infrastructure is typically built on top of modern distributed/parallel computing platforms (e.g., MapReduce, Spark), runs a distributed/parallel data management platform, employs main memory systems (both row stores for OLTP and column stores for analytics), and includes multi-modal systems that handle different types of data coming from different data sources. This course covers these foundational issues.

Course Content:
The course will review the foundational issues in modern big data systems. The topics that will be covered are the following:
1. Fundamentals of distributed and parallel data management, focusing on data fragmentation, distributed query processing, distributed transactions, replication, and data integration;
2. Main memory systems and column-based data representation;
3. Big data analytics platforms (distributed storage systems, MapReduce, Spark, graph analytics, stream data management);
4. NoSQL, NewSQL, and Polystore systems (key-value stores, document stores, graph databases).
Offering Department School of Information Science and Technology (信息科学技术学院)
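General Elective Area
Counts toward Arts and Aesthetic Education
Platform Course Status
Platform Course Type
Language of Instruction English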
Textbook Principles of Distributed Database Systems, Tamer Ozsu, Springer
Reference Books
Syllabus The course addresses the foundations of modern big data systems. The focus is on data management infrastructure. The course covers the fundamental challenges and components of big data systems and the approaches that have been developed to address them. By the end of the course, students should have a good understanding of the foundations of these systems.
Session 1: Fundamentals of distributed and parallel data management
【Description of the Session】 This session will cover classical distributed/parallel data management topics such as data partitioning and distribution, distributed query processing, and distributed transaction processing.

Session 2: Main memory systems and column-based data representation
【Description of the Session】 An important aspect of big data systems is main memory processing and column-based storage for data analytics. This session will cover these issues.

Session 3: Big data analytics platforms
【Description of the Session】 This session will focus on the platforms that have been developed for big data analytics. The specific topics considered are MapReduce, Spark, and stream processing.

Session 4: Graph analytics & graph databases
【Description of the Session】 Graphs have emerged as an important representation in big data systems. In this session we consider both graph analytics and graph database issues.

Session 5: NoSQL Systems
【Description of the Session】 We give an overview of NoSQL systems, focusing on key-value stores and document stores.
Teaching is primarily through instructor lectures.
Student in-class presentations and written course reports.
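Teaching Evaluation Zou Lei (邹磊):
Academic year and term: 17-18-3; course section: Foundations of Big Data Systems 1; course recommendation score: 4.17; instructor recommendation score: 3.96; course grade band: 85-90.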