
Feed


IMPORTANT: Revised Course Assessment Scheme for IERG4300

Dear all,

Due to the University's early cancellation of classes and of all centralized examinations, we have revised the overall assessment scheme of IERG4300 as follows:

  1. The weighting of the previous 4 sets of homework, i.e. HW#0 to HW#3, will remain unchanged: together they contribute 40% of the overall grade, i.e. 10% from each.

  2. The weighting of the midterm will remain unchanged, i.e. it contributes 10% of the overall grade.

  3. We will release one more set of programming-oriented homework (HW#4) by next Wed (Nov 27), due on Dec 20, which will contribute 25% of the overall grade.

  4. We will add a "Q&A" assignment, also due on Dec 20, which carries 25% of the overall grade. This assignment asks each student to design and submit a set of questions AND model answers/suggested solutions for a future 2-hr final examination of IERG4300. To avoid trivial questions that merely test the exam takers' memorization, you should assume the exam is open-book/open-note, or something similar to our midterm, which allowed a student to bring pages of cheat sheets into the examination venue.

Your submission will be graded according to its:

  • ORIGINALITY and thoughtfulness of the questions, i.e. they should be non-trivial and should highlight and test/promote the most important concepts/ideas/techniques taught in our class so far.

  • Correctness of the suggested solutions/ model answers.

  • Comprehensiveness (or the lack thereof), i.e. your set of questions, taken together, should cover multiple (the more, the better) key concepts/ideas/techniques taught in our class so far. In other words, setting a single MapReduce question to take up the entire 2-hr exam period would not be a good choice.

  • Suitability of the overall set of questions for a time-limited 2-hr exam. In other words, it should be reasonable for a student to complete your proposed set of questions within the 2-hr limit.

  • Diversity of the questions in terms of difficulty, so as to differentiate students with different levels of competence in the subject being tested.

Since the originality and thoughtfulness of the proposed questions are the key considerations, you MUST NOT copy or merely adapt/re-phrase questions found elsewhere (e.g. from past papers of IERG4300 or similar courses, or from textbooks) and submit them as your own work. Instead, study our course materials, ask yourself which are the most important concepts you have learned from this course, and then design a related question for each (or some) of those concepts to promote/strengthen a student's understanding of that concept (i.e. view your questions as training exercises for the exam taker).

Lastly, I understand some of you may have had difficulty meeting the already-extended deadline of HW#3 (which was Nov. 18) due to special circumstances. We have therefore decided to postpone the deadline of HW#3 one last time, to Nov. 26 11:59pm. Those who have already submitted HW#3 are free to re-submit improved versions as many times as they wish until Nov. 26.

It is important for you to complete HW#3, particularly because part of the to-be-released HW#4 will build upon your HW#3 code.

Please stay tuned for the release of HW#4. In the meantime, you can start working on the "Q&A" assignment now.

If you have further questions, please do not hesitate to email me.

Best Regards,

Wing

Description

The course discusses data-intensive analytics and the automated processing of very large amounts of structured and unstructured information. We focus on leveraging MapReduce and other related paradigms to create parallel algorithms that can be scaled up to handle massive data sets, such as those collected from the World Wide Web or other Internet systems and applications. We organize the course around a list of large-scale data analytic problems in practice. The required theories and methodologies for tackling each problem will be introduced. As such, the course only expects students to have solid knowledge of probability, statistics and linear algebra, along with computer programming skills. Topics to be covered include: the MapReduce computational model and its system architecture and realization in practice; Finding Frequent Item-sets and Association Rules; Finding Similar Items in high-dimensional data; Dimensionality Reduction techniques; Clustering; Recommendation systems; Analysis of Massive Graphs and its applications on the World Wide Web; Large-scale supervised machine learning; Processing and mining of Data Streams and their applications on large-scale network/online-activity monitoring.

Course Information

Lecture time and venue:

  • MON 09:30 - 11:15, ERB 407
  • WED 09:30 - 11:15, KKB 101

Lecture time and venue (ESTR4300):

  • WED 14:30 - 15:15, ERB 407

Tutorial:

  • Time: WED 11:30 - 12:15, KKB 101
  • Time: THU 16:30 - 17:15, ERB 406

TA Office Hours: (If you want to ask TAs for help beyond those periods, please send an email to make reservations with the TA in advance.)

  • Da Sun Handason Tam: TUE 14:30 - 15:30 (SHB803)
  • Siyue Xie: THU 11:30 - 12:30 (SHB803)

Instructor:

  • Prof. Wing Cheong Lau. wclau [at] ie [dot] cuhk [dot] edu [dot] hk
  • Office hours: WED 13:00 - 14:00 (SHB 818)

Teaching Assistants:

  • Da Sun Handason Tam tds019 [at] ie [dot] cuhk [dot] edu [dot] hk
  • Siyue Xie xs019 [at] ie [dot] cuhk [dot] edu [dot] hk

Website account:

User: ierg4300
Password: fall2019ierg

Highly Recommended Textbooks

Tentative Timetable

| Week | Lecture Date | Topic | Period | Recommended Readings | Additional References |
|---|---|---|---|---|---|
| 1 | Sept 4 | Course Admin; Era of Big Data Analytics; Computing as a Utility; Data-center Architecture | W2-3 | [JLin]Ch1 ; [MMDS]Ch1 | [DataCenter] |
| 2 | Sept 9, 11 | MapReduce | M2-3, W2-3 | [MMDS]Ch2.1-2.4 ; [JLin]Ch2 | - |
| 3-4 | Sept 16, 18, 23 | MapReduce (cont'd) ; The Big Data Processing stack | M2-3, W2-3, M2-3 | [JLin]Ch3.1-3.4 | [CloudData] |
| 4-5 | Sept 25, 30 | Frequent Item-Set Mining and Association Rules | W2-3, M2-3 | [MMDS]Ch6.1-6.4 | - |
| | Oct 7 | **Public holiday: Chung Yeung Festival** | | | |
| 5-7 | Oct 2, 9, 14 | Finding Similar Items and LSH | W2-3, W2-3, M2-3 | [MMDS]Ch3.1-3.5 | [ZG] |
| 7-8 | Oct 16, 21, 23 | Clustering and GMM | W2-3, M2-3, W2-3 | [MMDS] Ch7.1-7.4 | [MMDS] Ch11, [CBishop] Ch.9, [MLE/MAP] |
| | Oct 23 (Wed) | **An in-class Mid-term will be held on Oct 23 (Wed)** | | | |
| 9 | Oct 28, 30 | Dimension Reduction | M2-3, W2-3 | [MMDS] Ch11 | [PCA], [GuruswamiKannan] |
| 10 | Nov 4, 6 | Recommendation Systems | M2-3, W2-3 | [SVDPCA], [ANgCS229PCA], [ShaliziADAEPV]Ch17 | - |
| 11 | Nov 11, 13 | Recommendation Systems (cont'd) ; Regression and Gradient Descent | M2-3, W2-3 | [MMDS] Ch9 | [Netflix09]; [KorenTalk]; [ANg] |
| 12 | Nov 18 | Data Stream Algorithms | M2-3 | [MMDS] Ch4.1-4.5 | - |
| | Nov 20 (Wed) | **The lecture will be cancelled as the course instructor will be out of town (tutorial will still go on)** | | | |
| 13 | Nov 25, 27 | Data Stream Algorithms (cont'd) | M2-3, W2-3 | [ChakDataStream] Ch0, Ch1, Ch4.4, Ch6 | - |
| | Nov 26 23:59 | Homework 3 Deadline | - | - | - |
| | Dec 20 23:59 | Homework 4 Deadline, Q&A Assignment Deadline | - | - | - |

Course Assessment

Your grade will be based on the following components:

NEW course assessment scheme:

  • Homework 0: 10%
  • Homework 1: 10%
  • Homework 2: 10%
  • Homework 3: 10%
  • Homework 4: 25%
  • Mid-term: 10%
  • Q&A Assignment: 25%
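
As a rough illustration of how the weights above combine, here is a minimal Python sketch. It assumes every component is scored on a 0-100 scale; the names `WEIGHTS` and `overall_grade` are hypothetical, and this is not the official grading procedure.

```python
# Weights (in percent) copied from the NEW course assessment scheme.
WEIGHTS = {
    "Homework 0": 10,
    "Homework 1": 10,
    "Homework 2": 10,
    "Homework 3": 10,
    "Homework 4": 25,
    "Mid-term": 10,
    "Q&A Assignment": 25,
}


def overall_grade(scores):
    """Return the weighted overall grade.

    Assumption: each entry of `scores` is on a 0-100 scale.
    """
    missing = set(WEIGHTS) - set(scores)
    if missing:
        raise ValueError(f"missing scores for: {sorted(missing)}")
    # Weights are percentages, so divide the weighted sum by 100.
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS) / 100
```

For example, calling `overall_grade` with 80 on each of HW#0 to HW#3, 70 on the mid-term, 90 on HW#4 and 60 on the Q&A assignment weights the two 25% components most heavily.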

OLD course assessment scheme:

  • Homework & Programming assignments (5 sets in total): 50%
  • Mid-term: 10%
  • Final Exam: 40% (2-hour final examination)

Student/Faculty Expectations on Teaching and Learning

http://mobitec.ie.cuhk.edu.hk/StaffStudentExpectations.pdf

Academic Honesty

You are expected to do your own work and acknowledge the use of anyone else's words or ideas. You MUST put down in your submitted work the names of people with whom you have had discussions.

Refer to http://www.cuhk.edu.hk/policy/academichonesty for details

When scholastic dishonesty is suspected, the matter will be turned over to the University authority for action.

You MUST include the following signed statement in all of your submitted homework, project assignments and examinations. Submissions without a signed statement will not be graded.

I declare that the assignment here submitted is original except for source material explicitly acknowledged, and that the same or related material has not been previously submitted for another course. I also acknowledge that I am aware of University policy and regulations on honesty in academic work, and of the disciplinary guidelines and procedures applicable to breaches of such policy and regulations, as contained in the website http://www.cuhk.edu.hk/policy/academichonesty/.

Course Collaborators