Schedule
This schedule is tentative and will be updated as the course progresses.
-
Event | Date | Description | Course Material
-
Lecture 01/11/2023 (Wed)
01/18/2023 (Wed)
Course Admin; Era of Big Data; Data-center Architecture [slides]
Readings: Ch.1 of [JLin]; [DataCenter];
Additional references: [Borg]; [Omega]; [Sparrow]; [Apollo]; [Mercury]; [MapReduceFamilySurvey2013]; [Kubernetes2];
-
Due 01/21/2023 11:59AM (Saturday)
Assignment #0 - Hadoop Cluster Setup is due!
-
Holiday 01/25/2023 (Wed)
Chinese Lunar New Year
-
Lecture 02/01/2023 (Wed)
Programming Models for Big Data Computing: MapReduce/Hadoop, GFS/HDFS [slides]
Readings: [MapReduce]; [GoogleFileSystem]; Ch2.1-2.4 of [MMDS]; Ch2, Ch3.1-3.4 of [JLin];
-
Lecture 02/08/2023 (Wed)
Resource Management Platforms for Big Data Processing Systems [slides]
Readings: [Hadoop];
-
Lecture 02/15/2023 (Wed)
High-level Big Data Query Languages: Pig and Hive [slides]
Readings: [PigLatin]; [Hive1]; Ch.16-17 of [Hadoop];
Additional references: [Hive2]; [Hive3]; [HiveAdvances]; [Pig]; [Hive];
-
Due 02/21/2023 23:59 (Tuesday)
Assignment #1 - Community Detection is due!
-
Lecture 02/22/2023 (Wed)
BDAS and Spark [slides]
Readings: [Spark2018]
Additional references: [SparkScaling] [MapReduceVsSpark] Ch.1, Ch.10 of [LearnSpark] Appendix A of [SparkAnalytics]
-
Lecture 03/01/2023 (Wed)
Spark SQL [slides]
Readings: [SparkSQL] [LearnSpark2]
Additional references: [SharkSQL] [SparkMBase] Ch.3-6 of [LearnSpark2ndEd]
-
Lecture 03/08/2023 (Wed)
03/15/2023 (Wed)
Big Stream Processing frameworks: Unified Log via Apache Kafka; Storm; Spark Streaming; Spark Structured Streaming; Lambda & Kappa Architecture
Readings: [Storm@Twitter]; [Heron]; [SparkStreaming]; Ch.8 of [LearnSpark2ndEd];
Additional references: [KafkaBook]; [KleppmannMSSS]; [StormApplied];
-
Due 03/13/2023 23:59 (Monday)
Assignment #2 - Pig, Hive and SparkRDD is due!
-
Lecture 03/22/2023 (Wed)
03/29/2023 (Wed)
Big Graph Processing frameworks: Pregel/Giraph and GraphLab; GraphX, GraphFrames
Readings: [GraphLab1]; [PowerGraph]; [GraphX];
Additional references: [GraphChi];
-
Holiday 04/05/2023 (Wed)
Ching Ming Festival
-
Due 04/05/2023 23:59 (Wednesday)
Assignment #3 - Kafka is due!
-
Lecture 04/12/2023 (Wed)
04/19/2023 (Wed)
Big Data Stores (aka NoSQL Databases) [slides]
Readings: [Dynamo] [BigTable] [RealtimeHadoopFacebook] Ch.20 of [Hadoop] [Cassandra]
Additional references: [HBase] [CassandraBook]
-
Lecture 04/26/2023 (Wed)
Spark Machine Learning Support and Beyond (time permitting)
Readings: [SparkMLlib] Ch.11 of [LearnSpark] Ch.9 of [LearnSpark2ndEd]
-
Due 04/27/2023 23:59 (Thursday)
Assignment #4 - GraphFrames, GraphX, HBase is due!
-
Exam 05/03/2023 19:00 (Wednesday)
Final Exam
-
Due 05/12/2023 23:59 (Friday)
Project is due! [Project]