~ Home ~


Description

This course covers data-intensive analytics and the automated processing of very large amounts of structured and unstructured information. We focus on using MapReduce and related paradigms to build parallel algorithms that scale to massive data sets, such as those collected from the World Wide Web or other Internet systems and applications. The course is organized around a list of large-scale data analytics problems encountered in practice, and the theories and methodologies needed to tackle each problem are introduced along the way. As such, students are only expected to have solid knowledge of probability, statistics and linear algebra, together with computer programming skills.

Topics to be covered include:

  • The MapReduce computational model, its system architecture and its realization in practice
  • Finding Frequent Item-sets and Association Rules
  • Finding Similar Items in high-dimensional data
  • Dimensionality Reduction techniques
  • Clustering
  • Recommendation systems
  • Analysis of Massive Graphs and its applications on the World Wide Web
  • Large-scale supervised machine learning
  • Processing and mining of Data Streams and their applications in large-scale network and online-activity monitoring

Course Information

Lecture time and venue:

  • TUE 9:30am - 10:15am, ERB 703
  • THUR 9:30am - 11:15am, ERB 703

Tutorial:

  • TUE 6:30pm - 7:15pm

Instructor:

  • Prof. Wing Cheong Lau. wclau [at] ie [dot] cuhk [dot] edu [dot] hk
  • Office hours: Tue 10:45am - 12:15pm, or by appointment

Teaching Assistant:

  • YANG Ronghai
    • yr013 [at] ie [dot] cuhk [dot] edu [dot] hk
    • Office hours: Tue 7:15pm - 8:15pm

Website account:

User: engg4030
Password: fall4030engg

Highly Recommended Textbooks

Tentative Timetable

Lecture Date | Topic | Period | Recommended Readings | Additional References
Sep 2, 4 | Course Admin ; Overview of Big Data and Era of Cloud Computing | T2, H2-3 | [JLin] Ch1 ; [MMDS] Ch1 | [DataCenter]
**Sep 9 the Chinese Mid-Autumn Festival**
Sep 11 | MapReduce | H2-3 | [MMDS] Ch2.1-2.4 | -
**Sep 16: Class cancelled due to Typhoon**
Sep 18, 23, 25 | MapReduce (cont'd) | T2, H2-3 | [JLin] Ch2 ; [JLin] Ch3.1-3.4 | [CloudData]
Sep 25, Oct 7, 9 | Frequent Item-Set Mining and Association Rules | T2, H2-3 | [MMDS] Ch6.1-6.4 | -
**No Class for Sep 30. Instructor will be on conference leave, make-up class is scheduled for Dec 1 9:30am to 12:30pm**
**Oct 2 Chung Yeung Festival**
Oct 14, 16 | Data Stream Algorithms | T2, H2-3 | [MMDS] Ch4.1-4.5 | -
**An in-class Mid-term will be held on Oct 21 (Tue)**
Oct 23 | Data Stream Algorithms (cont'd) | H2-3 | [ChakDataStream] Ch0, Ch1, Ch4.4, Ch6 | -
Oct 28, 30 | Finding Similar Items and Locality Sensitive Hash (LSH) | T2, H2-3 | [MMDS] Ch3.1-3.5 | [ZG]
Nov 4, 6 | Clustering and GMM | T2, H2-3 | [MMDS] Ch7.1-7.4, [CBishop] Ch.9 | -
Nov 11, 13 | Dimension Reduction | T2, H2-3 | [MMDS] Ch11 ; [SVDPCA], [ANgCS229PCA], [ShaliziADAEPV] Ch17 | [PCA], [GuruswamiKannan]
Nov 18 | Recommendation Systems | T2, H2-3 | [MMDS] Ch9 | [Netflix09] ; [KorenTalk]
**Class Suspended for Nov 20 (Thu) due to University Congregation**
Nov 25, 27 | Analyzing Massive Graphs | T2, H2-3 | [JLin] Ch5 | -
Dec 1 | Graph-based Distributed Proc. Systems | Mon 9:30am to 12:30pm at ERB1009 | [GraphLabPapers] | -
Dec 1 | If time permits: Supervised Learning Overview ; Decision Tree | Mon 9:30am to 12:30pm at ERB1009 | [ShaprioStockman] Ch4.2-4.9 | [ANg], [AMoore]

Course Assessment

Your grade will be based on the following components:

  • Homework and programming assignments (4-5 sets in total): 50%
  • Mid-term: 15%
  • Final Exam: 35% (2-hour final examination)

Student/Faculty Expectations on Teaching and Learning

http://www.erg.cuhk.edu.hk/Student-Faculty-Expectations

Academic Honesty

You are expected to do your own work and acknowledge the use of anyone else's words or ideas. You MUST put down in your submitted work the names of people with whom you have had discussions.

Refer to http://www.cuhk.edu.hk/policy/academichonesty for details.

When scholastic dishonesty is suspected, the matter will be turned over to the University authority for action.

You MUST include the following signed statement in all of your submitted homework, project assignments and examinations. Submissions without a signed statement will not be graded.

I declare that the assignment here submitted is original except for source material explicitly acknowledged, and that the same or related material has not been previously submitted for another course. I also acknowledge that I am aware of University policy and regulations on honesty in academic work, and of the disciplinary guidelines and procedures applicable to breaches of such policy and regulations, as contained in the website http://www.cuhk.edu.hk/policy/academichonesty/.

Course Collaborators