~ Home ~


Latest News

  • The project deadline (IEMS 5709 only) is 11:59 AM on Dec 28 (not midnight)!
  • Homework 4 is released! The due date is Dec 07!
  • Homework 3 is released! The due date is Nov 20!

Description

The course covers data-intensive analytics and the automated processing of very large amounts of structured and unstructured information. We focus on leveraging MapReduce and related paradigms to create parallel algorithms that scale to massive data sets, such as those collected from the World Wide Web or other Internet systems and applications. The course is organized around a list of large-scale data analytics problems that arise in practice, and the theories and methodologies required to tackle each problem will be introduced. As such, the course only expects students to have solid knowledge of probability, statistics and linear algebra, together with computer programming skills. Topics to be covered include: the MapReduce computational model, its system architecture and its realization in practice; Finding Frequent Item-sets and Association Rules; Finding Similar Items in high-dimensional data; Dimensionality Reduction techniques; Clustering; Recommendation systems; Analysis of Massive Graphs and its applications on the World Wide Web; Large-scale supervised machine learning; Processing and mining of Data Streams and their applications to large-scale network/online-activity monitoring.

Course Information

The lectures and tutorials will be conducted on Zoom:

  • [IMPORTANT] ZOOM meeting ID: 994 7183 4002
  • Students who have not registered for the course but want to access the meeting should email a TA to obtain the meeting password. (See below for the TAs' email addresses.)

Lecture time:

  • MON 12:30 - 14:15 (Location: LSK_LT4)
  • THU 11:30 - 13:15 (Location: LSK_LT2)

Lecture time (ESTR4300):

  • WED 16:30 - 17:30 (Location: SHB 833)
  • Starts from Week 2 (Sept 15)

Tutorials (starting from Week 2):

  • Time: MON 16:30 - 17:15 (Location: ERB 706)
  • Time: TUE 15:30 - 16:15 (Location: BMS 2)
  • Zoom Meeting ID: same as the lecture ID

TA Office Hours: (If you need help from a TA outside these periods, please email the TA in advance to make an appointment.)

  • Hongyu Deng: THU 15:00 - 16:00 (Office: SHB 825)
  • Yuhang Cao: FRI 11:00 - 12:00 (Office: SHB 702)
  • Siyue Xie: MON 9:30 - 10:30 (Office: SHB 803)

You can visit the TA's office, or reserve a meeting on Zoom in advance.

Instructor:

  • Prof. Wing Cheong Lau. wclau [at] ie [dot] cuhk [dot] edu [dot] hk
  • Office hours: MON 15:00 - 16:00 (Office: SHB 818)

Teaching Assistants:

  • Hongyu Deng dh021 [at] ie [dot] cuhk [dot] edu [dot] hk
  • Yuhang Cao cy020 [at] ie [dot] cuhk [dot] edu [dot] hk
  • Siyue Xie xiesiyue [at] link [dot] cuhk [dot] edu [dot] hk

Website account:

User: ierg4300
Password: fall2021ierg

Highly Recommended Textbooks

Tentative Timetable

Week | Lecture Date | Topic | Period | Recommended Readings | Additional References
1 | Sept 6, 9 | Course Admin; Era of Big Data Analytics | M5-6, H4-5 | [JLin] Ch1 | [DataCenter]
2 | Sept 13, 15 | Computing as a Utility; Data-center Architecture | M5-6, H4-5 | [MMDS] Ch1 | -
3, 4 | Sept 20, 23, 27, 30 | MapReduce/Hadoop; The Big Data Processing stack | M5-6, H4-5, M5-6, H4-5 | [MMDS] Ch2.1-2.4; [JLin] Ch2; [JLin] Ch3.1-3.4 | [CloudData]
5 | Oct 4, 7 | Frequent Item-Set Mining and Association Rules | M5-6, H4-5 | [MMDS] Ch6.1-6.4 | -
6 | Oct 11 | Finding Similar Items and LSH | M5-6 | [MMDS] Ch3.1-3.5 | [ZG]
**Oct 14 Public holiday: Chung Yeung Festival**
7 | Oct 18, 21 | Clustering and GMM | M5-6, H4-5 | [MMDS] Ch7.1-7.4; [MMDS] Ch11; [CBishop] Ch9; [MLE/MAP] | -
8 | Oct 25, 28 | Dimension Reduction | M5-6, H4-5 | [MMDS] Ch11 | [PCA]; [GuruswamiKannan]
9 | Nov 1, 4 | Recommendation Systems | M5-6, H4-5 | [SVDPCA]; [ANgCS229PCA]; [ShaliziADAEPV] Ch17 | -
10-11 | Nov 8, 11, 15, 18 | Regression and Gradient Descent; Recommendation Systems (cont'd) | M5-6, H4-5, M5-6, H4-5 | [MMDS] Ch9 | [Netflix09]; [KorenTalk]; [ANg]
12 | Nov 22, 25 | Data Stream Algorithms | M5-6, H4-5 | [MMDS] Ch4.1-4.5 | -
13 | Nov 29, Dec 2 | Data Stream Algorithms (cont'd) | M5-6, H4-5 | [ChakDataStream] Ch0, Ch1, Ch4.4, Ch6 | -

Course Assessment

Your grade will be based on the following components:

For IERG4300/ESTR4300:

  • Homework (5 sets in total): 65%
  • Final Exam: 35%

For IEMS5709:

  • Homework (5 sets in total): 50%
  • Final Exam: 35%
  • Project: 15%

Student/Faculty Expectations on Teaching and Learning

http://mobitec.ie.cuhk.edu.hk/StaffStudentExpectations.pdf

Academic Honesty

You are expected to do your own work and acknowledge the use of anyone else's words or ideas. You MUST put down in your submitted work the names of people with whom you have had discussions.

Refer to http://www.cuhk.edu.hk/policy/academichonesty for details

When scholastic dishonesty is suspected, the matter will be turned over to the University authority for action.

You MUST include the following signed statement in all of your submitted homework, project assignments and examinations. Submissions without a signed statement will not be graded.

I declare that the assignment here submitted is original except for source material explicitly acknowledged, and that the same or related material has not been previously submitted for another course. I also acknowledge that I am aware of University policy and regulations on honesty in academic work, and of the disciplinary guidelines and procedures applicable to breaches of such policy and regulations, as contained in the website http://www.cuhk.edu.hk/policy/academichonesty/.

Course Collaborators