IMPORTANT: Revised Course Assessment Scheme for IERG4300
Dear all,
Due to the University's early cancellation of classes and of all centralized examinations,
we have revised the overall assessment scheme of IERG4300 as follows:
The weighting of the previous four homework sets, HW#0 to HW#3, will remain unchanged: together they contribute 40% of the overall grade, i.e. 10% each.
The weighting of the midterm will remain unchanged at 10% of the overall grade.
We will release one more programming-oriented homework set (HW#4) by next Wed (Nov 27), due on Dec 20, which will contribute 25% of the overall grade.
We will add a "Q&A" assignment, also due on Dec 20, which carries 25% of the overall grade. This assignment asks each student to design and submit a set of questions AND model answers/suggested solutions for a future 2-hour final examination of IERG4300. To avoid trivial questions that merely test the memorization ability of the exam takers, you should assume the exam to be open-book/open-note, or similar to our midterm, which allowed students to bring pages of cheat sheets into the examination venue.
Your submission will be graded according to its:
ORIGINALITY and thoughtfulness of the questions, i.e. the questions should be non-trivial and should highlight and test/promote the most important concepts/ideas/techniques taught in our class so far.
Correctness of the suggested solutions/ model answers.
Comprehensiveness (or the lack of it), i.e. your set of questions, taken together, should cover multiple (the more, the better) key concepts/ideas/techniques taught in our class so far. In other words, setting a single MapReduce question to take up the entire 2-hr exam period won't be a good choice.
Suitability of the overall set of questions for a time-limited 2-hr exam. In other words, it should be reasonable for a student to complete your proposed set of questions within the 2-hr limit.
Diversity of the questions in terms of difficulty, so as to differentiate students with different levels of competence in the subject being tested.
Since the originality and thoughtfulness of the proposed questions are the key considerations, you MUST NOT copy or merely adapt/re-phrase questions found elsewhere (e.g. from past papers of IERG4300 or similar courses, or from textbooks) and submit them as your own work. Instead, study our course materials, ask yourself which are the most important concepts you have learned from this course, and then try to design a related question for each (or some) of those concepts to promote/strengthen a student's understanding of that concept (i.e. view your questions as training exercises for the exam taker).
Lastly, I understand some of you may have difficulty meeting the already-extended deadline of HW#3 (which was Nov. 18) due to special circumstances. We have therefore decided to postpone the deadline of HW#3 one last time, to Nov. 26 11:59pm. Those who have already submitted HW#3 are free to re-submit improved versions as often as they like until Nov. 26.
It is important for you to complete HW#3. This is particularly true as part of the to-be-released HW#4 will build upon your code from HW#3.
Please stay tuned for the release of HW#4. In the meantime, you can start working on the "Q&A" assignment now.
If you have further questions, please do not hesitate to send me an email.
Best Regards,
Wing
Description
The course discusses data-intensive analytics and the automated processing of very large amounts of structured and unstructured information. We focus on leveraging MapReduce and other related paradigms to create parallel algorithms that can be scaled up to handle massive data sets, such as those collected from the World Wide Web or other Internet systems and applications. We organize the course around a list of large-scale data analytics problems encountered in practice, and introduce the theories and methodologies required to tackle each problem. As such, the course only expects students to have solid knowledge of probability, statistics and linear algebra, plus computer programming skills. Topics to be covered include: the MapReduce computational model, its system architecture and its realization in practice; Finding Frequent Item-sets and Association Rules; Finding Similar Items in high-dimensional data; Dimensionality Reduction techniques; Clustering; Recommendation systems; Analysis of Massive Graphs and its applications on the World Wide Web; Large-scale supervised machine learning; Processing and mining of Data Streams and their applications to large-scale network/online-activity monitoring.
Course Information
Lecture time and venue:
MON
09:30 - 11:15, ERB 407
WED
09:30 - 11:15, KKB 101
Lecture time and venue (ESTR4300):
WED
14:30 - 15:15, ERB 407
Tutorial:
- Time:
WED
11:30 - 12:15, KKB 101
- Time:
THU
16:30 - 17:15, ERB 406
TA Office Hours: (If you want to ask TAs for help beyond those periods, please send an email to make reservations with the TA in advance.)
- Da Sun Handason Tam:
TUE
14:30 - 15:30 (SHB803)
- Siyue Xie:
THU
11:30 - 12:30 (SHB803)
Instructor:
- Prof. Wing Cheong Lau.
wclau [at] ie [dot] cuhk [dot] edu [dot] hk
- Office hours:
WED
13:00 - 14:00 (SHB 818)
Teaching Assistant:
- Da Sun Handason Tam
tds019 [at] ie [dot] cuhk [dot] edu [dot] hk
- Siyue Xie
xs019 [at] ie [dot] cuhk [dot] edu [dot] hk
Website account:
User: ierg4300
Password: fall2019ierg
Highly Recommended Textbooks
[MMDS] Mining of Massive Datasets (Download version 1.3) by Anand Rajaraman, Jeff Ullman and Jure Leskovec, Cambridge University Press. Latest version can be downloaded from Mining of Massive Datasets.pdf
[JLin] Data-Intensive Text Processing with MapReduce by Jimmy Lin and Chris Dyer, Morgan and Claypool Publishers, 2010, can be freely downloaded from http://lintool.github.io/MapReduceAlgorithms/
[CBishop] Pattern Recognition and Machine Learning by Christopher M. Bishop, Published by Springer Science and Business, 2007.
[MLE/MAP] Estimating Probabilities: MLE and MAP http://www.cs.cmu.edu/~tom/mlbook/Joint_MLE_MAP.pdf
[HTF] Elements of Statistical Learning 2nd Edition by Trevor Hastie, Robert Tibshirani, Jerome H. Friedman, Published by Springer, 2009. Ebook version can be downloaded from: http://link.springer.com/book/10.1007/978-0-387-84858-7 via a CUHK IP address
[JWHT] An Introduction to Statistical Learning with Applications in R, by Gareth James, Daniela Witten, Trevor Hastie and Robert Tibshirani, Published by Springer, 2013. Ebook version can be downloaded from: http://link.springer.com/book/10.1007/978-1-4614-7138-7 via a CUHK IP address
[PCA] Principal Component Analysis, 2nd Edition, by I.T. Jolliffe, Published by Springer, 2002. Ebook version can be downloaded from: http://www.springerlink.com/content/h41v76/?p=e8e028e1c9ba414690c9179ee7c0e388&pi=3 via a CUHK IP address
[ShaliziADAEPV] Cosma Rohilla Shalizi, "Advanced Data Analysis from an Elementary Point of View", Cambridge University Press, 2014. Draft available for download from: http://www.stat.cmu.edu/~cshalizi/ADAfaEPoV/
[ShaprioStockman] Shapiro and Stockman, Computer Vision, 2000, Chapter 4.2-4.9, https://courses.cs.washington.edu/courses/cse576/book/ch4.pdf
[Blum] Blum, Avrim, John Hopcroft, and Ravindran Kannan. "Foundations of Data Science." (2017): https://www.cs.cornell.edu/jeh/book.pdf
Tentative Timetable
Week | Lecture Date | Topic | Period | Recommended Readings | Additional References |
---|---|---|---|---|---|
1 | Sept 4 | Course Admin; Era of Big Data Analytics; Computing as a Utility; Data-center Architecture | W2-3 | [JLin] Ch1 ; [MMDS] Ch1 | [DataCenter] |
2 | Sept 9, 11 | MapReduce | M2-3, W2-3 | [MMDS]Ch2.1-2.4 ; [JLin]Ch2 | - |
3-4 | Sept 16, 18, 23 | MapReduce (cont'd) ; The Big Data Processing stack | M2-3, W2-3, M2-3 | [JLin]Ch3.1-3.4 | [CloudData] |
4-5 | Sept 25, 30 | Frequent Item-Set Mining and Association Rules | W2-3, M2-3 | [MMDS]Ch6.1-6.4 | - |
**Oct 7 Public holiday: Chung Yeung Festival** | | | | | |
5-7 | Oct 2, 9, 14 | Finding Similar Items and LSH | W2-3, W2-3, M2-3 | [MMDS]Ch3.1-3.5 | [ZG] |
7-8 | Oct 16, 21, 23 | Clustering and GMM | W2-3, M2-3, W2-3 | [MMDS] Ch7.1-7.4 ; [MMDS] Ch11, [CBishop] Ch.9, [MLE/MAP] | - |
**An in-class Mid-term will be held on Oct 23 (Wed)** | | | | | |
9 | Oct 28, 30 | Dimension Reduction | M2-3, W2-3 | [MMDS] Ch11 | [PCA], [GuruswamiKannan] |
10 | Nov 4, 6 | Recommendation Systems | M2-3, W2-3 | [SVDPCA], [ANgCS229PCA], [ShaliziADAEPV] Ch17 | - |
11 | Nov 11, 13 | Recommendation Systems (cont'd) ; Regression and Gradient Descent | M2-3, W2-3 | [MMDS] Ch9 | [Netflix09]; [KorenTalk]; [ANg] |
12 | Nov 18 | Data Stream Algorithms | M2-3 | [MMDS] Ch4.1-4.5 | - |
**Nov 20 (Wed) The lecture will be cancelled as the course instructor will be out of town (tutorial will still go on)** | | | | | |
13 | Nov 25, 27 | Data Stream Algorithms (cont'd) | M2-3, W2-3 | [ChakDataStream] Ch0, Ch1, Ch4.4, Ch6 | - |
- | Nov 26 23:59 | Homework 3 Deadline | - | - | - |
- | Dec 20 23:59 | Homework 4 Deadline, Q&A Assignment Deadline | - | - | - |
Course Assessment
Your grade will be based on the following components:
NEW course assessment scheme:
- Homework 0: 10%
- Homework 1: 10%
- Homework 2: 10%
- Homework 3: 10%
- Homework 4: 25%
- Mid-term: 10%
- Q&A Assignment: 25%
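As a quick sanity check of the weights above, the following is a minimal Python sketch of how an overall grade could be computed from the new scheme. The weights are copied from the list above; the function name `overall_grade`, the assumption that every component is scored out of 100, and the example scores are hypothetical and are not part of the official grading procedure.

```python
# Weights (in percent) copied from the NEW course assessment scheme above.
NEW_SCHEME = {
    "Homework 0": 10,
    "Homework 1": 10,
    "Homework 2": 10,
    "Homework 3": 10,
    "Homework 4": 25,
    "Mid-term": 10,
    "Q&A Assignment": 25,
}


def overall_grade(scores, weights=NEW_SCHEME):
    """Return the weighted overall grade on a 0-100 scale.

    `scores` maps each component name to a score out of 100. This scaling
    is an assumption; the page does not say how components are scored.
    """
    if sum(weights.values()) != 100:
        raise ValueError("weights must sum to 100%")
    missing = set(weights) - set(scores)
    if missing:
        raise KeyError(f"missing scores for: {sorted(missing)}")
    # Each component contributes (weight / 100) * score to the total.
    return sum(weights[name] * scores[name] for name in weights) / 100


# Hypothetical scores, for illustration only.
example_scores = {
    "Homework 0": 90,
    "Homework 1": 80,
    "Homework 2": 85,
    "Homework 3": 75,
    "Homework 4": 80,
    "Mid-term": 70,
    "Q&A Assignment": 88,
}

print(overall_grade(example_scores))  # prints 82.0
```

In this sketch, HW#4 and the Q&A assignment together account for half of the total, so they move the overall grade as much as all other components combined.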
OLD course assessment scheme:
- Homework & Programming assignments (5 sets in total): 50%
- Mid-term: 10%
- Final Exam: 40% (2-hour final examination)
Student/Faculty Expectations on Teaching and Learning
http://mobitec.ie.cuhk.edu.hk/StaffStudentExpectations.pdf
Academic Honesty
You are expected to do your own work and acknowledge the use of anyone else's words or ideas. You MUST put down in your submitted work the names of people with whom you have had discussions.
Refer to http://www.cuhk.edu.hk/policy/academichonesty for details.
When scholastic dishonesty is suspected, the matter will be turned over to the University authority for action.
You MUST include the following signed statement in all of your submitted homework, project assignments and examinations. Submission without a signed statement will not be graded.
I declare that the assignment here submitted is original except for source material explicitly acknowledged, and that the same or related material has not been previously submitted for another course. I also acknowledge that I am aware of University policy and regulations on honesty in academic work, and of the disciplinary guidelines and procedures applicable to breaches of such policy and regulations, as contained in the website http://www.cuhk.edu.hk/policy/academichonesty/.