Additional References

General Readings

The Netflix Challenge

Similar/Relevant Courses offered Elsewhere

Cloud Computing

Data Stream Algorithms

MapReduce and other Big Data Processing Platforms

Mining Massive Graphs and Graph-based Processing Platforms

  • [GraphLab2] Carlos Guestrin et al., “GraphLab 2: Parallel Machine Learning for Large-Scale Natural Graphs,” NIPS Big Learning Workshop 2011.

  • [GraphLab1] Yucheng Low, Joseph Gonzalez et al., “GraphLab: A New Framework for Parallel Machine Learning,” UAI 2010.

  • [PowerGraph] Joseph Gonzalez et al., “PowerGraph: Distributed Graph-Parallel Computation on Natural Graphs,” OSDI 2012.

Locality-Sensitive Hashing

Dimension Reduction

Recommendation Systems

  • [Netflix09] Yehuda Koren, Robert Bell and Chris Volinsky, “Matrix Factorization Techniques for Recommender Systems,” IEEE Computer, August 2009.
    • A toy code sketch of the basic matrix-factorization model from this article appears after this list.

  • [KorenTalk] Yehuda Koren, “Chasing $1,000,000: How We Won the Netflix Progress Prize,” pages 4 to 12.

  • [Mahout] Apache Mahout: Scalable Machine Learning and Data Mining, http://mahout.apache.org
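
  For readers who want to see the idea in code: below is a toy Python sketch of the basic matrix-factorization model surveyed in [Netflix09]. It is not code from the article; the data, factor dimension and hyper-parameters are made up for illustration. Each rating r_ui is approximated by the inner product of a user factor vector p_u and an item factor vector q_i, and both are learned by stochastic gradient descent with L2 regularization, following the update rules given in the article.

    # Toy matrix factorization by SGD (illustrative only; the toy data and
    # hyper-parameters are made up, not taken from [Netflix09]).
    import numpy as np

    rng = np.random.default_rng(0)

    # Hypothetical (user, item, rating) triples.
    ratings = [(0, 0, 5.0), (0, 1, 3.0), (1, 0, 4.0), (1, 2, 1.0), (2, 1, 2.0)]
    n_users, n_items, k = 3, 3, 2                  # k = number of latent factors

    P = 0.1 * rng.standard_normal((n_users, k))    # user factor vectors p_u
    Q = 0.1 * rng.standard_normal((n_items, k))    # item factor vectors q_i

    lr, lam = 0.02, 0.05          # step size and regularization weight
    for epoch in range(500):
        for u, i, r in ratings:
            err = r - P[u] @ Q[i]                  # e_ui = r_ui - p_u . q_i
            # Simultaneous updates, so each step uses the old factor values:
            #   p_u <- p_u + lr * (e_ui * q_i - lam * p_u)
            #   q_i <- q_i + lr * (e_ui * p_u - lam * q_i)
            P[u], Q[i] = (P[u] + lr * (err * Q[i] - lam * P[u]),
                          Q[i] + lr * (err * P[u] - lam * Q[i]))

    # Observed entries of the reconstruction should now be close to the
    # training ratings (regularization keeps them from matching exactly).
    print(P @ Q.T)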

Gradient Descent

  • [Pedregosa18] Fabian Pedregosa, “A birds-eye view of optimization algorithms,” November 2018.
    • This webpage provides a nice, interactive visualization of how GD and SGD behave under different settings, e.g. learning rate/step size (see the first sketch below).
  • [Sra18] Suvrit Sra, Lecture 25: Stochastic Gradient Descent, 2018.
    • Prof. Sra gives a nice one-dimensional example (starting at 22:50 of 53:03) to illustrate why SGD works so well in the early stage: it moves in the right direction towards the optimal point even though only ONE random data point is used to “compute” the required direction of movement (see the second sketch below).
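
  As a small numerical companion to the interactive visualization in [Pedregosa18], the following Python sketch compares full-batch GD with SGD at a few step sizes on a one-dimensional quadratic objective. Everything here (the objective, the data and the step sizes) is made up for illustration and is not taken from the webpage itself.

    # Compare GD and SGD step sizes on f(x) = (1/n) * sum_i (x - a_i)^2 / 2.
    # The gradient is (1/n) * sum_i (x - a_i); the minimizer is the mean of a.
    import random

    random.seed(1)
    a = [random.gauss(4.0, 1.0) for _ in range(50)]
    n = len(a)

    def gd(lr, steps=50):
        x = 0.0
        for _ in range(steps):
            x -= lr * sum(x - ai for ai in a) / n   # exact (full) gradient
        return x

    def sgd(lr, steps=50):
        x = 0.0
        for _ in range(steps):
            x -= lr * (x - random.choice(a))        # one-sample gradient estimate
        return x

    # The curvature here is 1, so GD is stable only for lr < 2: at lr = 2.1 it
    # diverges, while a tiny lr converges slowly and a moderate lr converges fast.
    for lr in (0.01, 0.1, 1.0, 2.1):
        print(f"lr={lr}: GD -> {gd(lr):10.3f}   SGD -> {sgd(lr):10.3f}")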
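
  And a rough Python sketch inspired by (but not copied from) the one-dimensional example in [Sra18]. Far away from the data, every per-sample gradient points in the same direction, so a step computed from ONE random data point still moves towards the optimum; near the optimum the per-sample gradients disagree in sign, which is why plain SGD makes rapid progress early on and then jitters.

    # 1-D SGD on f(x) = (1/n) * sum_i (x - a_i)^2 / 2 with data a_i in [3, 5].
    # Starting from x = 0, every per-sample gradient (x - a_i) is negative,
    # so each single-sample step moves x in the correct direction.
    import random

    random.seed(0)
    a = [random.uniform(3.0, 5.0) for _ in range(100)]
    opt = sum(a) / len(a)           # the minimizer is the sample mean

    x, lr = 0.0, 0.1
    for step in range(1, 201):
        ai = random.choice(a)       # ONE random data point
        x -= lr * (x - ai)          # SGD step on the single-sample gradient
        if step in (1, 5, 20, 200):
            print(f"step {step:3d}: x = {x:.3f} (optimum ~ {opt:.3f})")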