This course covers data-intensive analytics and the automated processing of very large amounts of structured and unstructured information. We focus on leveraging MapReduce and related paradigms to create parallel algorithms that scale to massive data sets, such as those collected from the World Wide Web or other Internet systems and applications. The course is organized around a list of large-scale data analytics problems encountered in practice, and the theories and methodologies required to tackle each problem are introduced along the way. As such, the course only expects students to have solid knowledge of probability, statistics and linear algebra, together with computer programming skills. Topics to be covered include:

  • The MapReduce computational model, its system architecture and its realization in practice
  • Finding Frequent Item-sets and Association Rules
  • Finding Similar Items in high-dimensional data
  • Dimensionality Reduction techniques
  • Clustering
  • Recommendation systems
  • Analysis of Massive Graphs and its applications to the World Wide Web
  • Large-scale supervised machine learning
  • Processing and mining of Data Streams and their applications to large-scale network and online-activity monitoring

Course Information

Lecture time and venue:

  • MON 11:30 - 13:15, LSB LT3
  • THU 10:30 - 11:15, ERB 407

Lecture time and venue (ESTR 4300):

  • Time: THU 11:30 - 12:15, ERB 407


  • Time: THU 9:30 - 10:15
  • Venue: ERB 407

TA Office Hours: (To ask a TA for help outside these periods, please email the TA in advance to make an appointment.)

  • Liu Yang: WED 15:30 - 16:30, SHB 802
  • Zhang Bowen: THU 15:30 - 16:30, SHB 802
  • Huang Huaiyi: TUE 15:30 - 16:30, SHB 702


Instructor:

  • Prof. Wing Cheong Lau: wclau [at] ie [dot] cuhk [dot] edu [dot] hk
  • Office hours: MON 10:00 - 11:00, SHB 818

Teaching Assistants:

  • Liu Yang ly016 [at] ie [dot] cuhk [dot] edu [dot] hk
  • Zhang Bowen zb016 [at] ie [dot] cuhk [dot] edu [dot] hk
  • Huang Huaiyi hh016 [at] ie [dot] cuhk [dot] edu [dot] hk

Website account:

User: engg4030
Password: fall2016engg

Highly Recommended Textbooks

Tentative Timetable

Week | Lecture Date | Topic | Period | Recommended Readings | Additional References
1 | Jan 8, 11 | Course Admin ; Era of Big Data Analytics | M4-5, H2-3 | [JLin]Ch1 ; [MMDS]Ch1 | -
2 | Jan 15, 18 | Computing as a Utility ; Data-center Architecture | M4-5, H3 | - | [DataCenter]
3 | Jan 22, 25 | MapReduce | M4-5, H3 | [MMDS]Ch2.1-2.4 ; [JLin]Ch2 | -
4 | Jan 29, Feb 1 | MapReduce (cont'd) ; The Big Data Processing stack | M4-5, H3 | [JLin]Ch3.1-3.4 | [CloudData]
5 | Feb 5, 8 | Frequent Item-Set Mining and Association Rules | M4-5, H3 | [MMDS]Ch6.1-6.4 | -
6-7 | Feb 12, 22 | Finding Similar Items and LSH | M4-5, H3 | [MMDS]Ch3.1-3.5 | [ZG]
**Feb 15 - 21 Chinese New Year Holiday**
8 | Feb 26, Mar 1 | Finding Similar Items and LSH (cont'd) | M4-5, H3 | [ZG] | -
9 | Mar 5, 8 | Clustering and GMM | M4-5, H3 | [MMDS] Ch7.1-7.4, [MMDS] Ch11, [CBishop] Ch.9, [MLE/MAP] | -
**An in-class Mid-term will be held on Mar 12 (Mon)**
10-11 | Mar 15, 19 | Clustering and GMM (cont'd) | H3, M4-5 | [MMDS] Ch7.1-7.4, [MMDS] Ch11, [CBishop] Ch.9, [MLE/MAP] | -
11-12 | Mar 19, 22, 26 | Dimension Reduction | M4-5, H3, M4-5 | [MMDS] Ch11 | [PCA], [GuruswamiKannan]
**Mar 30 - Apr 2 Easter Holidays & Apr 5 Public Holiday: Ching Ming Festival**
14 | Apr 9, 12 | Recommendation Systems | M4-5, H3 | [SVDPCA], [ANgCS229PCA], [ShaliziADAEPV]Ch17 | -
15 | Apr 16, 19 | Recommendation Systems (cont'd) ; Regression and Gradient Descent | M4-5, H3 | [MMDS] Ch9 | [Netflix09]; [KorenTalk]; [ANg]

Course Assessment

Your grade will be based on the following components:

  • Homework & programming assignments (5 sets in total): 50%
  • Mid-term: 10%
  • Final Exam: 40% (2-hour final examination)

Student/Faculty Expectations on Teaching and Learning


Academic Honesty

You are expected to do your own work and acknowledge the use of anyone else's words or ideas. You MUST put down in your submitted work the names of people with whom you have had discussions.

Refer to http://www.cuhk.edu.hk/policy/academichonesty for details.

When scholastic dishonesty is suspected, the matter will be turned over to the University authority for action.

You MUST include the following signed statement in all of your submitted homework, project assignments and examinations. Submissions without a signed statement will not be graded.

I declare that the assignment here submitted is original except for source material explicitly acknowledged, and that the same or related material has not been previously submitted for another course. I also acknowledge that I am aware of University policy and regulations on honesty in academic work, and of the disciplinary guidelines and procedures applicable to breaches of such policy and regulations, as contained in the website http://www.cuhk.edu.hk/policy/academichonesty/.

Course Collaborators