Additional References

General Readings

The Netflix Challenge

Similar/Relevant Courses offered Elsewhere

Cloud Computing

Data Stream Algorithms

MapReduce and other Big Data Processing Platforms

Mining Massive Graphs and Graph-based Processing Platforms

  • [GraphLab2] Carlos Guestrin et al., “GraphLab 2: Parallel Machine Learning for Large-Scale Natural Graphs,” NIPS Big Learning Workshop 2011.

  • [GraphLab1] Yucheng Low, Joseph Gonzalez et al., “GraphLab: A New Framework for Parallel Machine Learning,” UAI 2010.

  • [PowerGraph] Joseph Gonzalez et al., “PowerGraph: Distributed Graph-Parallel Computation on Natural Graphs,” OSDI 2012.

Locality-Sensitive Hashing

Dimension Reduction

Recommendation Systems

  • [Netflix09] Yehuda Koren, Robert Bell and Chris Volinsky, “Matrix Factorization Techniques for Recommender Systems,” IEEE Computer, August 2009.
    • A toy code sketch of the basic matrix-factorization model from this article appears after this list.

  • [KorenTalk] Yehuda Koren, “Chasing $1,000,000: How We Won the Netflix Progress Prize,” pages 4 to 12.

  • [Mahout] Apache Mahout: Scalable Machine Learning and Data Mining, http://mahout.apache.org
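
  For readers who want to see the idea in code: below is a toy Python sketch of the basic matrix-factorization model surveyed in [Netflix09]. It is not code from the article; the data, factor dimension and hyper-parameters are made up for illustration. Each rating r_ui is approximated by the inner product of a user factor vector p_u and an item factor vector q_i, and both are learned by stochastic gradient descent with L2 regularization, following the update rules given in the article.

    # Toy matrix factorization by SGD (illustrative only; the toy data and
    # hyper-parameters are made up, not taken from [Netflix09]).
    import numpy as np

    rng = np.random.default_rng(0)

    # Hypothetical (user, item, rating) triples.
    ratings = [(0, 0, 5.0), (0, 1, 3.0), (1, 0, 4.0), (1, 2, 1.0), (2, 1, 2.0)]
    n_users, n_items, k = 3, 3, 2                  # k = number of latent factors

    P = 0.1 * rng.standard_normal((n_users, k))    # user factor vectors p_u
    Q = 0.1 * rng.standard_normal((n_items, k))    # item factor vectors q_i

    lr, lam = 0.02, 0.05          # step size and regularization weight
    for epoch in range(500):
        for u, i, r in ratings:
            err = r - P[u] @ Q[i]                  # e_ui = r_ui - p_u . q_i
            # Simultaneous updates, so each step uses the old factor values:
            #   p_u <- p_u + lr * (e_ui * q_i - lam * p_u)
            #   q_i <- q_i + lr * (e_ui * p_u - lam * q_i)
            P[u], Q[i] = (P[u] + lr * (err * Q[i] - lam * P[u]),
                          Q[i] + lr * (err * P[u] - lam * Q[i]))

    # Observed entries of the reconstruction should now be close to the
    # training ratings (regularization keeps them from matching exactly).
    print(P @ Q.T)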

Gradient Descent

  • [Pedregosa18] Fabian Pedregosa, “A birds-eye view of optimization algorithms,” November 2018.
    • This webpage provides a nice, interactive visualization of how GD and SGD behave under different settings, e.g. learning rate/step size (see the first sketch below).
  • [Sra18] Suvrit Sra, Lecture 25: Stochastic Gradient Descent, 2018.
    • Prof. Sra gives a nice one-dimensional example (starting at 22:50 of 53:03) to illustrate why SGD works so well in the early stage: it moves in the right direction towards the optimal point even though only ONE random data point is used to “compute” the required direction of movement (see the second sketch below).
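
  As a small numerical companion to the interactive visualization in [Pedregosa18], the following Python sketch compares full-batch GD with SGD at a few step sizes on a one-dimensional quadratic objective. Everything here (the objective, the data and the step sizes) is made up for illustration and is not taken from the webpage itself.

    # Compare GD and SGD step sizes on f(x) = (1/n) * sum_i (x - a_i)^2 / 2.
    # The gradient is (1/n) * sum_i (x - a_i); the minimizer is the mean of a.
    import random

    random.seed(1)
    a = [random.gauss(4.0, 1.0) for _ in range(50)]
    n = len(a)

    def gd(lr, steps=50):
        x = 0.0
        for _ in range(steps):
            x -= lr * sum(x - ai for ai in a) / n   # exact (full) gradient
        return x

    def sgd(lr, steps=50):
        x = 0.0
        for _ in range(steps):
            x -= lr * (x - random.choice(a))        # one-sample gradient estimate
        return x

    # The curvature here is 1, so GD is stable only for lr < 2: at lr = 2.1 it
    # diverges, while a tiny lr converges slowly and a moderate lr converges fast.
    for lr in (0.01, 0.1, 1.0, 2.1):
        print(f"lr={lr}: GD -> {gd(lr):10.3f}   SGD -> {sgd(lr):10.3f}")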
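
  And a rough Python sketch inspired by (but not copied from) the one-dimensional example in [Sra18]. Far away from the data, every per-sample gradient points in the same direction, so a step computed from ONE random data point still moves towards the optimum; near the optimum the per-sample gradients disagree in sign, which is why plain SGD makes rapid progress early on and then jitters.

    # 1-D SGD on f(x) = (1/n) * sum_i (x - a_i)^2 / 2 with data a_i in [3, 5].
    # Starting from x = 0, every per-sample gradient (x - a_i) is negative,
    # so each single-sample step moves x in the correct direction.
    import random

    random.seed(0)
    a = [random.uniform(3.0, 5.0) for _ in range(100)]
    opt = sum(a) / len(a)           # the minimizer is the sample mean

    x, lr = 0.0, 0.1
    for step in range(1, 201):
        ai = random.choice(a)       # ONE random data point
        x -= lr * (x - ai)          # SGD step on the single-sample gradient
        if step in (1, 5, 20, 200):
            print(f"step {step:3d}: x = {x:.3f} (optimum ~ {opt:.3f})")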