This schedule is tentative and will be updated to reflect actual progress.
Event | Date | Description | Course Material
Lecture | 01/12/2022 (Wed), 01/19/2022 (Wed) | Course Admin; Era of Big Data; Data-center Architecture [slides] | Readings: Ch.1 of [JLin]; [DataCenter]
Additional references: [Borg]; [Omega]; [Sparrow]; [Apollo]; [Mercury]; [MapReduceFamilySurvey2013]; [Kubernetes2]
Due | 01/24/2022 (Mon) 11:59 AM | Assignment #0 - Hadoop Cluster Setup is due! | -
Lecture | 01/26/2022 (Wed) | Programming Models for Big Data Computing: MapReduce/Hadoop, GFS/HDFS [slides] | Readings: [MapReduce]; [GoogleFileSystem]; Ch.2.1-2.4 of [MMDS]; Ch.2 and Ch.3.1-3.4 of [JLin]
Holiday | 02/02/2022 (Wed), 02/04/2022 (Fri) | Chinese Lunar New Year | -
Lecture | 02/09/2022 (Wed) | Big Data Processing Stack | Readings: [Hadoop]
Additional references: [Kubernetes1]; [Kubernetes2]
Due | 02/13/2022 (Sun) 11:59 PM | Assignment #1 - Hadoop over Kubernetes is due! | -
Lecture | 02/16/2022 (Wed), 02/18/2022 (Fri) | High-level Big Data Query Languages: Pig and Hive [slides] | Readings: [PigLatin]; [Hive1]; Ch.16-17 of [Hadoop]
Additional references: [Hive2]; [Hive3]; [HiveAdvances]; [Pig]; [Hive]
Due | 02/16/2022 (Wed) 11:59 PM | Assignment #1 - Similar Users Detection via MapReduce (updated on 30 Jan) is due! | -
Lecture | 02/23/2022 (Wed), 02/25/2022 (Fri) | BDAS and Spark [slides] | Readings: [Spark2018]
Additional references: [SparkScaling]; [MapReduceVsSpark]; Ch.1 and Ch.10 of [LearnSpark]; Appendix A of [SparkAnalytics]
Lecture | 03/02/2022 (Wed), 03/04/2022 (Fri) | Spark SQL [slides] | Readings: [SparkSQL]; [LearnSpark2]
Additional references: [SharkSQL]; [SparkMBase]; Ch.3-6 of [LearnSpark2ndEd]
Due | 03/08/2022 (Tue) 11:59 PM | Assignment #2 - Pig and Hive is due! | -
Lecture | 03/09/2022 (Wed), 03/11/2022 (Fri), 03/16/2022 (Wed), 03/18/2022 (Fri) | Big Stream Processing frameworks: Unified Log via Apache Kafka; Storm; Spark Streaming; Spark Structured Streaming; Lambda & Kappa Architecture | Readings: [Storm@Twitter]; [Heron]; [SparkStreaming]; Ch.8 of [LearnSpark2ndEd]
Additional references: [KafkaBook]; [KleppmannMSSS]; [StormApplied]
Lecture | 03/23/2022 (Wed), 03/25/2022 (Fri), 03/30/2022 (Wed), 04/01/2022 (Fri) | Big Graph Processing frameworks: Pregel/Giraph and GraphLab; GraphX, GraphFrames | Readings: [GraphLab1]; [PowerGraph]; [GraphX]
Additional references: [GraphChi]
Due | 03/28/2022 (Mon) 11:59 PM | Assignment #3 - Spark (updated on 14 Mar) is due! | -
Holiday | 04/06/2022 (Wed), 04/08/2022 (Fri) | Reading Week. Optional seminars during Reading Week (04/06/2022 and 04/08/2022): Generalized Streaming Model and Apache Beam; GraphLab 2.0: Challenges and Solutions for Processing Power-Law Graphs in Practice | Readings: [StreamingSys]; [FlinkBook1]; [FlinkBook2]; [Flink]
Lecture | 04/13/2022 (Wed) | Big Data Stores (aka NoSQL Databases) [slides] | Readings: [Dynamo]; [BigTable]; [RealtimeHadoopFacebook]; Ch.20 of [Hadoop]; [Cassandra]
Additional references: [HBase]; [CassandraBook]
Holiday | 04/15/2022 (Fri) | Easter | -
Due | 04/19/2022 (Tue) 11:59 PM | Assignment #4 - Kafka is due! | -
Lecture | 04/20/2022 (Wed), 04/22/2022 (Fri) | (Cont'd) Big Data Stores (aka NoSQL Databases) [slides] | Readings: [Dynamo]; [BigTable]; [RealtimeHadoopFacebook]; Ch.20 of [Hadoop]; [Cassandra]
Additional references: [HBase]; [CassandraBook]
Lecture | - | Machine Learning Support and Beyond [slides] | Readings: [SparkMLlib]; Ch.11 of [LearnSpark]; Ch.9 of [LearnSpark2ndEd]
Due | 05/10/2022 (Tue) 11:59 PM | Assignment #5 - GraphFrames, GraphX, HBase and SparkML is due! | -
Due | 05/17/2022 (Tue) 12:00 PM | Q&A Assignment is due! | -
Due | 05/23/2022 (Mon) 12:00 PM | Project is due! | [Project]