Schedule
This schedule is tentative and will be updated as the course progresses.
Event | Date | Description | Course Material
Lecture | 01/10/2024 (Wed), 01/17/2024 (Wed) | Course Admin; Era of Big Data; Data-center Architecture [slides]. Readings: Ch.1 of [JLin]; [DataCenter]. Additional references: [Borg]; [Omega]; [Sparrow]; [Apollo]; [Mercury]; [MapReduceFamilySurvey2013]; [Kubernetes2] | -
Due | 01/21/2024 23:59 (Sunday) | Assignment #0 - Hadoop Cluster Setup is due! | -
Lecture | 01/24/2024 (Wed) | Programming Models for Big Data Computing: MapReduce/Hadoop, GFS/HDFS [slides]. Readings: [MapReduce]; [GoogleFileSystem]; Ch.2.1-2.4 of [MMDS]; Ch.2, Ch.3.1-3.4 of [JLin] | -
Lecture | 01/31/2024 (Wed) | Resource Management Platforms for Big Data Processing Systems. Readings: [Hadoop] | -
Lecture | 02/07/2024 (Wed) | High-level Big Data Query Languages: Pig and Hive [slides]. Readings: [PigLatin]; [Hive1]; Ch.16-17 of [Hadoop]. Additional references: [Hive2]; [Hive3]; [HiveAdvances]; [Pig]; [Hive] | -
Holiday | 02/14/2024 (Wed) | Chinese Lunar New Year | -
Due | 02/18/2024 23:59 (Sunday) | Assignment #1 - Community Detection is due! | -
Lecture | 02/21/2024 (Wed), 02/28/2024 (Wed) | BDAS and Spark [slides]. Readings: [Spark2018]. Additional references: [SparkScaling]; [MapReduceVsSpark]; Ch.1, Ch.10 of [LearnSpark]; Appendix A of [SparkAnalytics] | -
Lecture | 02/28/2024 (Wed), 03/13/2024 (Wed) | Spark SQL [slides]. Readings: [SparkSQL]; [LearnSpark2]. Additional references: [SharkSQL]; [SparkMBase]; Ch.3-6 of [LearnSpark2ndEd] | -
Holiday | 03/06/2024 (Wed) | Reading Week | -
Due | 03/11/2024 23:59 (Monday) | Assignment #2 - Pig, Hive and SparkRDD is due! | -
Lecture | 03/20/2024 (Wed), 03/27/2024 (Wed) | Big Stream Processing frameworks: Unified Log via Apache Kafka; Storm; Spark Streaming; Spark Structured Streaming; Lambda & Kappa Architecture. Supplementary Materials: [Apache Beam]; [Flink]. Readings: [Storm@Twitter]; [Heron]; [SparkStreaming]; Ch.8 of [LearnSpark2ndEd]. Additional references: [KafkaBook]; [KleppmannMSSS]; [StormApplied] | -
Lecture | 04/03/2024 (Wed), 04/10/2024 (Wed) | Big Graph Processing frameworks: Pregel/Giraph and GraphLab; GraphX, GraphFrames. Readings: [GraphLab1]; [PowerGraph]; [GraphX]. Additional references: [GraphChi] | -
Due | 04/05/2024 23:59 (Friday) | Assignment #3 - SparkSQL, Kafka, and Streaming is due! | -
Lecture | 04/10/2024 (Wed), 04/17/2024 (Wed) | Big Data Stores (aka NoSQL Databases) [slides]. Supplementary Materials: [Consistency Model]; [CAP vs. PACELC theorem, ACID vs. BASE]. Readings: [Dynamo]; [BigTable]; [RealtimeHadoopFacebook]; Ch.20 of [Hadoop]; [Cassandra]. Additional references: [HBase]; [CassandraBook] | -
Lecture | 04/24/2024 (Wed) | Spark Machine Learning Support and Beyond (time-permitting) [slides]. Readings: [SparkMLlib]; Ch.11 of [LearnSpark]; Ch.9 of [LearnSpark2ndEd] | -
Due | 05/02/2024 23:59 (Thursday) | Assignment #4 - GraphFrames, GraphX, HBase is due! | -
Exam | 05/08/2024 19:00 (Wednesday) | Final Exam | -
Due | 05/12/2024 23:59 (Sunday) | Project is due! | [Project]