Schedule
This schedule is tentative and will be updated as the course progresses.
-
Event | Date | Description | Course Material
-
Lecture | 01/10/2024 (Wed), 01/18/2024 (Thu) | Resource Management and Infrastructure for Big Data Systems and Cloud-Native Applications | [slides]
Readings: [YARN]; [Mesos]; Ch.2-3 of [Hadoop]; [CloudData]; [Kubernetes1];
Additional references: [Borg]; [Omega]; [Sparrow]; [Apollo]; [Mercury]; [MapReduceFamilySurvey2013]; [Kubernetes2];
-
Lecture | 01/25/2024 (Thu) | An Introduction to ZooKeeper (ZK) | [slides]
Readings: [ZooKeeper1];
Additional references: [ZooKeeper]; [ZAB1]; [ZAB2];
-
Lecture | 02/01/2024 (Thu) | DAG-based Dataflow Systems: Dryad, DryadLINQ, Tez and Beyond | [slides]
-
Lecture | 02/07/2024 (Wed) | High-level Big Data Query Languages: Pig and Hive | [slides]
Readings: [PigLatin]; [Hive1]; Ch.16-17 of [Hadoop];
Additional references: [Hive2]; [Hive3]; [HiveAdvances]; [Pig]; [Hive];
-
Holiday | 02/10/2024 (Sat) | Chinese Lunar New Year
-
Due | 02/18/2024 (Sun), 23:59 | Assignment #1 - Hadoop over Kubernetes is due!
-
Lecture | 02/21/2024 (Wed), 02/28/2024 (Wed) | BDAS and Spark | [slides]
Readings: [Spark2018];
Additional references: [SparkScaling]; [MapReduceVsSpark]; Ch.1, Ch.10 of [LearnSpark]; Appendix A of [SparkAnalytics];
-
Holiday | 03/06/2024 (Wed) | Reading Week
-
Due | 03/11/2024 (Mon), 23:59 | Assignment #2 - Pig, Hive and SparkRDD is due!
-
Lecture | 03/13/2024 (Wed) | Spark SQL | [slides]
Readings: [SparkSQL]; [LearnSpark2];
Additional references: [SharkSQL]; [SparkMBase]; Ch.3-6 of [LearnSpark2ndEd];
-
Lecture | 03/20/2024 (Wed), 03/27/2024 (Wed) | Big Stream Processing Frameworks: Unified Log via Apache Kafka; Storm; Spark Streaming; Spark Structured Streaming; Lambda & Kappa Architecture
Readings: [Storm@Twitter]; [Heron]; [SparkStreaming]; Ch.8 of [LearnSpark2ndEd];
Additional references: [KafkaBook]; [KleppmannMSSS]; [StormApplied];
-
Lecture | 04/03/2024 (Wed), 04/10/2024 (Wed) | Big Graph Processing Frameworks: Pregel/Giraph and GraphLab; GraphX, GraphFrames
Readings: [GraphLab1]; [PowerGraph]; [GraphX];
Additional references: [GraphChi];
-
Due | 04/05/2024 (Fri), 23:59 | Assignment #3 - SparkSQL, Kafka, and Streaming is due!
-
Lecture | 04/10/2024 (Wed), 04/17/2024 (Wed) | Big Data Stores (aka NoSQL Databases) | [slides]
Readings: [Dynamo]; [BigTable]; [RealtimeHadoopFacebook]; Ch.20 of [Hadoop]; [Cassandra];
Additional references: [HBase]; [CassandraBook];
-
Lecture | 04/24/2024 (Wed) | Spark Machine Learning Support and Beyond (time-permitting) | [slides]
Readings: [SparkMLlib]; Ch.11 of [LearnSpark]; Ch.9 of [LearnSpark2ndEd];
-
Due | 05/02/2024 (Thu), 23:59 | Assignment #4 - GraphFrames, GraphX, HBase is due!
-
Exam | 05/08/2024 (Wed), 19:00 | Final Exam