~ ESTR4316 ~


Course Assessment for ESTR4316

Your grade will be based on the following components:

  • Homework & Programming assignments (4 sets in total): 45%
  • Mid-term Exam (1 hour): 9%
  • Final Exam (2 hours): 37%
  • Oral Paper Presentations: 9%

Presentation Schedule

| Date of presentation | Paper | Presenter | Link to the presentation |
|---|---|---|---|
| Jan 20 | S. Keshav, “How to Read a Paper,” ACM SIGCOMM Computer Communication Review, July 2007 | Prof. Lau | |
| Feb 10 | A1 [YARN] V.K. Vavilapalli, A.C. Murthy, “Apache Hadoop YARN: Yet Another Resource Negotiator,” ACM Symposium on Cloud Computing (SoCC) 2013 | FAN Junbo | yarn.pdf |
| | B1 [Mesos] B. Hindman et al., “Mesos: A Platform for Fine-Grained Resource Sharing in the Data Center,” NSDI 2011 | BAO Ergute | mesos.pdf, mesos.pptx |
| Feb 17 | C2 [Omega] M. Schwarzkopf, A. Konwinski, M. Abd-El-Malek, J. Wilkes, “Omega: flexible, scalable schedulers for large compute clusters,” EuroSys 2013 | Allen Zhong | omega.pdf |
| | D1 [Apollo] E. Boutin et al., “Apollo: Scalable and Coordinated Scheduling for Cloud-Scale Computing,” OSDI 2014 | SUN Weize | apollo.pptx |
| Feb 24 | A2 [Sparrow] K. Ousterhout et al., “Sparrow: Distributed, Low Latency Scheduling,” ACM SOSP 2013 | FAN Junbo | sparrow.pdf |
| | B2 [DRF] A. Ghodsi et al., “Dominant Resource Fairness: Fair Allocation of Multiple Resource Types,” NSDI 2011 | BAO Ergute | DRF.pptx |
| Mar 3 | **No meeting due to the instructor’s conference trip. An extra make-up hour will be held on Apr 21.** | | |
| Mar 10 | C1 [Borg] A. Verma, L. Pedrosa, “Large-scale cluster management at Google with Borg,” EuroSys 2015 | Allen Zhong | Borg.pdf |
| | D2 [Mercury] K. Karanasos et al., “Mercury: Hybrid Centralized and Distributed Scheduling in Large Shared Clusters,” USENIX ATC 2015 | SUN Weize | Mercury.pptx |
| Mar 17 | [PowerGraph] Joseph Gonzalez et al., “PowerGraph: Distributed Graph-Parallel Computation on Natural Graphs,” OSDI 2012 | Fan Junbo | powergraph.pdf |
| | [Heron] Sanjeev Kulkarni et al., “Twitter Heron: Stream Processing at Scale,” SIGMOD 2015 | Bao Ergute | HERON.pptx |
| Mar 24 | [KafkaSamza] Martin Kleppmann and Jay Kreps, “Kafka, Samza and the Unix philosophy of distributed data,” IEEE Data Engineering Bulletin, 38(4), Dec 2015 | Sun Weize | kafka&samza.pptx |
| | [HiveAdvances] Yin Huai et al., “Major Technical Advancements in Apache Hive,” ACM SIGMOD 2014 | Allen Zhong | Hive.pdf |
| Mar 31 | [FlumeJava] Craig Chambers et al., “FlumeJava: Easy, Efficient Data-Parallel Pipelines,” PLDI 2010 | Bao Ergute | |
| | [MillWheel] Tyler Akidau et al., “MillWheel: Fault-Tolerant Stream Processing at Internet Scale,” VLDB 2013 | Fan Junbo | millwheel.pdf |
| Apr 7 | [SummingBird] Oscar Boykin et al., “Summingbird: A Framework for Integrating Batch and Online MapReduce Computations,” VLDB 2014 | Allen Zhong | summingbird.pdf |
| | [TwitterExperience] Jimmy Lin and Dmitriy Ryaboy, “Scaling Big Data Mining Infrastructure: The Twitter Experience,” ACM SIGKDD Explorations, Vol. 14, Issue 2, 2013 | Sun Weize | Twitter.pptx |
| Apr 21 (2-hr section) | [StratosphereFlink] Alexander Alexandrov et al., “The Stratosphere platform for Big Data Analytics,” VLDB Journal 2014. [Stratosphere is the basis of the Apache Flink platform] | Bao Ergute | stratosphere.pptx |
| | [Naiad1] Derek G. Murray et al., “Naiad: A Timely Dataflow System,” ACM SOSP 2013 (A gentler introduction to this paper can be found in “Incremental, Iterative Data Processing with Timely Dataflow,” Communications of the ACM, Oct 2016) | Fan Junbo | naiad.pdf |
| | [TensorFlow] Martin Abadi et al., “TensorFlow: A System for Large-Scale Machine Learning,” OSDI 2016 | Sun Weize | TensorFlow– A system for large-scale machine learning.pptx |
| | [Petuum] Eric P. Xing et al., “Strategies and Principles of Distributed Machine Learning on Big Data,” Engineering (The Journal of Chinese Academy of Engineering), 2016 | Allen Zhong | Petuum.pdf |