Additional References

General Readings

The Netflix Challenge

Similar/Relevant Courses offered Elsewhere

Cloud Computing

Data Stream Algorithms

  • [ChakDataStream] CS49: Data Stream Algorithms, Amit Chakrabarti, Dartmouth College, Fall 2011.
  • [AggaDataStream] Charu C. Aggarwal (Ed.), Data Stream Models and Algorithms, Springer, 2007.
  • [PIndykStreaming] Streaming etc., by Piotr Indyk, a short course given at Rice University, 2009.
  • [PIndykSvH] Sketching via Hashing: from Heavy Hitters to Compressive Sensing to Sparse Fourier Transform, by Piotr Indyk, slides and writeup, PODS 2013.
  • [IndykPhDOpen12] Piotr Indyk, MIT, “Data Stream Algorithms,” Open Lectures for PhD Students in Computer Science, 2012.
  • [JXu] A Tutorial on Network Data Streaming, by Jun (Jim) Xu, ACM SIGMETRICS 2007.
  • [GGR] Querying and Mining Data Streams – You Only Get One Look: A Tutorial, by Minos Garofalakis, Johannes Gehrke, and Rajeev Rastogi, VLDB 2002.
  • [GaroRamaUCB] CS286: Implementation of Database Systems, Minos Garofalakis and Raghu Ramakrishnan, UC Berkeley.
  • [SmolaUCB] Stat 260: Scalable Machine Learning, UC Berkeley, by Alex Smola (CMU).
  • [FM85] Probabilistic Counting Algorithms for Data Base Applications, Philippe Flajolet and G. Nigel Martin, Journal of Computer and System Sciences (JCSS), 1985.
  • [DF03] Loglog Counting of Large Cardinalities, M. Durand and P. Flajolet, European Symposium on Algorithms, 2003.
  • [FFGM07] HyperLogLog: The Analysis of a Near-Optimal Cardinality Estimation Algorithm, P. Flajolet, E. Fusy, O. Gandouet, and F. Meunier, Conference on Analysis of Algorithms, 2007.
  • [Whang90] A Linear-Time Probabilistic Counting Algorithm for Database Applications, K.-Y. Whang, B. T. Vander-Zanden, and H. M. Taylor, ACM Transactions on Database Systems (TODS), 1990.
  • [AMS96] The Space Complexity of Approximating the Frequency Moments, Noga Alon, Yossi Matias, and Mario Szegedy, ACM STOC 1996; JCSS 1999.
  • [CH08] “Finding Frequent Items in Data Streams,” G. Cormode and M. Hadjieleftheriou, VLDB 2008.
  • [CM05] What’s Hot and What’s Not: Tracking Most Frequent Items Dynamically, Graham Cormode and S. Muthukrishnan, ACM TODS, 2005.
  • [ACHPWY12] Mergeable Summaries, Agarwal, Cormode, Huang, Phillips, Wei, and Yi, PODS 2012.
  • [SpaceBound] Space-Optimal Heavy Hitters with Strong Error Bounds, Radu Berinde, Piotr Indyk, Graham Cormode, and Martin J. Strauss, ACM TODS, 2010.
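As a concrete taste of the distinct-counting line of work above ([FM85], [DF03], [FFGM07]), here is a minimal single-machine Python sketch of hash-based cardinality estimation. It is a rough illustration, not any paper's exact algorithm: the hash construction, sketch count, and missing bias correction are all simplifications made for readability.

```python
import hashlib
import random

def rho(x: int) -> int:
    """1-indexed position of the lowest set bit of x (x > 0)."""
    return (x & -x).bit_length()

def fm_estimate(items, num_sketches=32):
    """Rough distinct-count estimate in the spirit of [FM85]: for each
    of several hash functions, remember the largest rho seen; 2**rho
    grows roughly like the true cardinality. Averaging the exponents
    across sketches reduces variance (the idea refined in [DF03] and
    [FFGM07]). The result is correct only up to a small constant
    factor here; the papers derive the exact bias corrections."""
    max_rho = [0] * num_sketches
    for item in items:
        for i in range(num_sketches):
            digest = hashlib.blake2b(f"{i}:{item}".encode(), digest_size=8).digest()
            v = int.from_bytes(digest, "big")
            if v:  # skip the (vanishingly unlikely) all-zero hash
                max_rho[i] = max(max_rho[i], rho(v))
    mean_r = sum(max_rho) / num_sketches
    return 2 ** mean_r

random.seed(1)
stream = [random.randrange(5_000) for _ in range(20_000)]
true_n = len(set(stream))    # close to 5,000 distinct values
est = fm_estimate(stream)    # within a small constant factor of true_n
```

Note the key property the papers exploit: the sketch uses O(num_sketches · log log n) bits of state regardless of stream length, since each counter only stores a max bit position.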

MapReduce and other Big Data Processing Platforms

Mining Massive Graphs and Graph-based Processing Platforms

  • [GraphLab2] Carlos Guestrin et al, “GraphLab 2: Parallel Machine Learning for Large-Scale Natural Graphs,” NIPS Big Learning Workshop 2011.
  • [GraphLab1] Yucheng Low, Joseph Gonzalez et al, “GraphLab: A New Framework for Parallel Machine Learning,” UAI 2010.
  • [PowerGraph] Joseph Gonzalez et al, “PowerGraph: Distributed Graph-Parallel Computation on Natural Graphs,” OSDI 2012.
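The vertex-centric programming model behind [GraphLab1], [GraphLab2], and [PowerGraph] can be illustrated with a toy single-machine PageRank written in gather-apply-scatter style. Everything here (function name, structure) is an illustrative sketch, not the actual GraphLab API, and it ignores the distributed partitioning that is the papers' real contribution.

```python
# A toy, single-machine sketch of the gather-apply-scatter (GAS)
# vertex-program pattern from the GraphLab/PowerGraph papers.
# Names and structure are illustrative, not the real API.

def pagerank_gas(edges, num_iters=30, d=0.85):
    """edges: list of (src, dst) pairs; returns a vertex -> rank dict."""
    vertices = {v for e in edges for v in e}
    out_deg = {v: 0 for v in vertices}
    in_nbrs = {v: [] for v in vertices}
    for s, t in edges:
        out_deg[s] += 1
        in_nbrs[t].append(s)
    rank = {v: 1.0 / len(vertices) for v in vertices}
    for _ in range(num_iters):
        new_rank = {}
        for v in vertices:
            # Gather: sum contributions from in-neighbors.
            acc = sum(rank[u] / out_deg[u] for u in in_nbrs[v])
            # Apply: combine the gathered value with the teleport term.
            new_rank[v] = (1 - d) / len(vertices) + d * acc
        # Scatter is implicit here: updated ranks feed the next round.
        rank = new_rank
    return rank

ranks = pagerank_gas([("a", "b"), ("a", "c"), ("b", "c"), ("c", "a")])
# "c" ends up with the highest rank in this tiny graph (two in-links)
```

The point of the abstraction is that "gather" and "apply" are the only vertex-local operations a user writes; PowerGraph's contribution is splitting the gather over machines for high-degree vertices of natural graphs.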

Locality-Sensitive Hashing

Dimension Reduction

Recommendation Systems

  • [Netflix09] Yehuda Koren, Robert Bell, and Chris Volinsky, “Matrix Factorization Techniques for Recommender Systems,” IEEE Computer, August 2009.

  • [KorenTalk] Yehuda Koren, “Chasing $1,000,000: How We Won the Netflix Progress Prize,” pages 4 to 12.

  • [Mahout] Apache Mahout: Scalable Machine Learning and Data Mining.
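The matrix-factorization approach surveyed in [Netflix09] boils down to fitting R ≈ PQᵀ by stochastic gradient descent over the observed ratings only. A minimal sketch follows; the hyperparameters and the tiny example are illustrative, and the paper's bias terms and temporal effects are omitted.

```python
import random

def mf_sgd(ratings, n_users, n_items, k=2, lr=0.02, reg=0.02, epochs=500):
    """Plain SGD matrix factorization in the spirit of [Netflix09]:
    approximate the sparse rating matrix by P @ Q^T, taking one
    gradient step per observed rating. Hyperparameters are
    illustrative, not tuned."""
    random.seed(0)
    P = [[random.uniform(0.1, 0.9) for _ in range(k)] for _ in range(n_users)]
    Q = [[random.uniform(0.1, 0.9) for _ in range(k)] for _ in range(n_items)]
    for _ in range(epochs):
        for u, i, r in ratings:
            pred = sum(P[u][f] * Q[i][f] for f in range(k))
            err = r - pred
            for f in range(k):
                pu, qi = P[u][f], Q[i][f]
                # Step on the L2-regularized squared error for this rating.
                P[u][f] += lr * (err * qi - reg * pu)
                Q[i][f] += lr * (err * pu - reg * qi)
    return P, Q

# Tiny example: 2 users x 2 items, all four ratings observed.
ratings = [(0, 0, 5.0), (0, 1, 3.0), (1, 0, 4.0), (1, 1, 1.0)]
P, Q = mf_sgd(ratings, n_users=2, n_items=2)

def predict(u, i):
    return sum(P[u][f] * Q[i][f] for f in range(2))
```

In a real recommender the same loop runs over millions of sparse ratings, which is exactly why the SGD formulation (one rating per step, no dense matrix ever materialized) won out at Netflix scale.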

Gradient Descent

  • [Pedregosa18] Fabian Pedregosa, “A birds-eye view of optimization algorithms”, November 2018.
    • This webpage provides a nice interactive visualization of how GD and SGD behave under different settings, e.g., learning rate/step size.
  • [Sra18] Suvrit Sra, Lecture 25: Stochastic Gradient Descent, 2018.
    • Prof. Sra gives a nice one-dimensional example (starting at 22:50/53:03) illustrating why SGD works so well in the early stage: it keeps moving in the right direction toward the optimal point even though only ONE random data point is used to compute the required direction of movement.
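That effect can be reproduced in a few lines. Minimizing f(x) = mean((x − aᵢ)²) in one dimension, a single randomly chosen data point yields a gradient with the correct sign whenever x lies outside the range of the data, so early SGD steps all move toward the optimum; only near the optimum do the single-point gradients start to disagree and the iterate merely jitters. (The data, starting point, and step size below are illustrative choices.)

```python
import random

# 1-D illustration of why one random sample is enough early on:
# minimizing f(x) = mean((x - a_i)^2), the full gradient is
# 2*(x - mean(a)), while SGD uses 2*(x - a_i) for ONE random a_i.
# Far from the data every a_i gives the correctly signed gradient.

random.seed(0)
data = [random.gauss(0.0, 1.0) for _ in range(100)]
opt = sum(data) / len(data)   # the minimizer of f is the sample mean

x = 50.0                      # start far from all data points
lr = 0.1
path = [x]
for _ in range(200):
    a = random.choice(data)   # ONE random data point
    grad = 2 * (x - a)        # its single-sample gradient
    x -= lr * grad
    path.append(x)

# Early iterations close almost all of the gap to the optimum;
# late iterations only oscillate in a small neighborhood of it.
early_progress = abs(path[0] - opt) - abs(path[30] - opt)
final_gap = abs(path[-1] - opt)
```

The same run also shows the flip side Prof. Sra discusses: with a constant step size, SGD does not settle exactly at the optimum but bounces around it, which is why step-size decay matters in the later stage.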