紧凑数据结构与大数据课程详细信息

课程号 04833350 学分 2
英文名称 Compact Data Structures for Big Data
先修课程 数据结构与算法
中文简介 本课程教授用于处理大数据(尤其是网络大数据)的紧凑数据结构及相应的算法、概率方法和统计工具。目前最大的大数据或许就是因特网上的数据流了。对这个大数据的分析能为提高网络性能、网络应用的用户体验及网络安全提供理论基础。然而网络大数据无法保留。在这个背景下,本课程讲授一系列在过去30年里逐渐发展起来的紧凑数据结构和它们的理论分析,这些数据结构可以用来把大数据变小,以便于存储和应用。本课程的数据结构和算法可以分为两类:(1)计数与计模,包括probabilistic counting, bitmap algorithms, FM sketch, hyperloglog sketch, virtual bitmap, virutal FM sketch, virtual hyperloglog, countMin, counter braids, randomized counter sharing, and virtual counters;(2)成员查找与分类,包括Bloom filters, counting Bloom filters, Bloomier filters, blocked Bloom filters, multi-set filters, and multi-hashing tables。学生不但将掌握这些数据结构和算法,而且要学习它们在网络数据测量、安全、及其它领域中的应用。不但将掌握相关理论知识,而且要实现一部分算法,并在实际网络数据上应用。
英文简介 This course covers compact data structures, algorithms, probabilistic methods, and statistical tools for handling big data, particularly in the setting of high-speed networks. There is hardly any other data set whose size can rival the big network data that flows on the Internet. Analyzing this big data is extremely useful in improving network performance, user experience and cybersecurity. However, storing such data for analysis is impossible. With this background, the course offers a series of compact data structures and their theoretical analysis that are developed over the last three decades, with increasing capabilities of making big data small for storage and quick access of data properties. The offered data structures and their associated algorithms are broadly classified into two categories: (1) counting and cardinality algorithms, including probabilistic counting, bitmap algorithms, FM sketch, hyperloglog sketch, virtual bitmap, virutal FM sketch, virtual hyperloglog, countMin, counter braids, randomized counter sharing, and virtual counters, and (2) membership lookup and classification, including Bloom filters, counting Bloom filters, Bloomier filters, blocked Bloom filters, multi-set filters, and multi-hashing tables. The students will be exposed to not only these data structures and algorithms, but also their applications in Internet traffic measurement, cybersecurity, as well as applications beyond the network context. The students are expected to learn the compact data structures and algorithms through lectures, and implement a selected subset with experiments over real network data.
开课院系 信息科学技术学院
通选课领域  
是否属于艺术与美育
平台课性质  
平台课类型  
授课语言 英文
教材 Network Applications of Bloom Filters: A Survey,Andrei Brodery, Michael Mitzenmacherz,Internet mathematics,2004,Fast Dynamic Multiple-Set Membership Testing Using Combinatorial Bloom Filters,Fang Hao, Murali Kodialam, T. V. Lakshman, and Haoyu Song,IEEE/ACM TRANSACTIONS ON NETWORKING, VOL. 20, NO. 1,2012,A Linear-Time Probabilistic Counting Algorithm for Database Applications,Kyu-young Whang, Brad T. Vander-Zanden, Howard M. Taylor,ACM Transactions on Database Systems, Vol. 15, No. 2,1990,Bitmap Algorithms for Counting Active Flows on High Speed Links,Cristian Estan, George Varghese, Mike Fisk,ACM Internet Measurement Conference,2003,High-Speed Per-Flow Traffic Measurement with,Peter Lieven, Bj¨orn Scheuermann,IEEE INFOCOM,2010,HyperLogLog: the analysis of a near-optimal cardinality estimation algorithm,Philippe Flajolet,  éric Fusy, Olivier Gandouet and Frédéric Meunier,Conference on Analysis of Algorithms,2007,Counter Braids: A Novel Counter Architecture for Per-Flow Measurement,Yi Lu, Andrea Montanari, Balaji Probhakar, Sarang Dharmapurikar, and Abdul Kabbani,ACM SIGMETRICS,2008,Traffic Measurement for Big Network Data,Shigang Chen, Min Chen, Qingjun Xiao,Springer,2016,Traffic Measurement on the Internet,Tao Li, Shigang Chen,Springer,2012,Hyper-Compact Virtual Estimators for Big Network Data,Qingjun Xiao, Shigang Chen, Min Chen, Yibei Ling,ACM SIGMETRICS,2015,
参考书
教学大纲 1) 学生将在课程结束时掌握一系列(尤其是网络大数据)的紧凑数据结构及相应的算法、概率方法和统计工具。
2) 学生不但将掌握相关理论知识,而且要学习它们在网络数据测量、安全、及其及其它领域中的应用,并且要实现一部分算法。
3) 帮助学生理解和分析大数据,培养和训练学生的研究能力。
Introduction to Big Data and Compact Data Structures 2学时
Counting and Cardinality Algorithms
Probabilistic Counting  3学时
Bitmap Algorithms  2学时
FM Sketch  1学时
Hyperloglog Sketch  1学时
Virtual Bitmap  3学时
Virtual FM Sketch  3学时
Virtual Hyperloglog  1学时
CounterMin  1学时
Counter Braids  2学时
Randomized Counter Sharing  2学时
Virtual Counters  1学时

Membership Lookup and Classification
Bloom Filters  2学时
Counting Bloom Filters  1学时
Bloomier Fiters  2学时
Blocked Bloom Filters  1学时
Multiset Bloom Filters  2学时
Multi-hash Tables  2学时
课堂授课为主,课后练习、编程为辅。
平时(30%),编程(40%),考试(30%)
教学评估 杨仝:
学年度学期:16-17-3,课程班:紧凑数据结构与大数据1,课程推荐得分:4.75,教师推荐得分:4.75,课程得分分数段:90-95;
学年度学期:17-18-3,课程班:紧凑数据结构与大数据1,课程推荐得分:4.38,教师推荐得分:4.06,课程得分分数段:95-100;