~ Announcements ~



Major announcements will be made on both this page and your registered email on Blackboard. Please make sure you can receive emails from Blackboard.

IMPORTANT: Revised Course Assessment Scheme for IERG4300

Due to the early cancellation of classes and of all centralized examinations by the University, we have revised the overall assessment scheme of IERG4300 as follows:

  1. The weighting of the previous 4 sets of homework, i.e. HW#0 to HW#3, will remain unchanged: together they contribute 40% of the overall grade, i.e. 10% from each homework.

  2. The weighting of the midterm will remain unchanged, i.e. it contributes 10% of the overall grade.

  3. We will release one more set of programming-oriented homework (HW#4) by next Wed (Nov 27), due on Dec 20, which will contribute 25% of the overall grade.

  4. We will add another "Q&A" assignment, also due on Dec 20, which carries 25% of the overall grade. This assignment asks each student to design and submit a set of questions AND model answers/suggested solutions for a future 2-hr-long final examination of IERG4300. To avoid trivial questions which merely test the exam takers' memorization ability, you should assume the exam to be an open-book/open-note exam, or something similar to our midterm, which allowed a student to bring a cheat-sheet into the examination venue.

Your submission will be graded according to its:

a. ORIGINALITY and thoughtfulness of the questions, i.e. they should be non-trivial and should highlight and test/promote the most important concepts/ideas/techniques taught in our class so far.

b. Correctness of the suggested solutions/model answers.

c. Comprehensiveness (or the lack of it), i.e. your set of questions together should cover multiple (the more, the better) key concepts/ideas/techniques taught in our class so far. In other words, setting a single MapReduce question to take up the entire 2-hr exam period won't be a good choice.

d. Suitability of the overall set of questions for a time-limited 2-hr exam. In other words, it should be reasonable for a student to complete your proposed set of questions within the 2-hr limit.

e. Diversity of the questions in terms of difficulty, so as to differentiate students with different levels of competence in the subject being tested.

Since the originality and thoughtfulness of the proposed questions are the key considerations, you MUST NOT copy or merely adapt/re-phrase questions found elsewhere (e.g. from past papers of IERG4300 or similar courses, or from textbooks) and submit them as your own work. Instead, study our course materials, ask yourself which are the most important concepts you have learned from this course, and then try to design a related question for each (or some) of those concepts to promote/strengthen a student's understanding of that concept. In other words, view your questions as training exercises for the exam taker.

Lastly, I understand some of you may have difficulty meeting the already-extended deadline of HW#3 (which was Nov. 18) due to special circumstances. We have therefore decided to postpone the deadline of HW#3 one last time, to Nov. 26 11:59pm. Those who have already submitted HW#3 are free to re-submit improved versions as many times as they like until Nov. 26.

It is important for you to complete HW#3, particularly because part of the to-be-released HW#4 will build upon your code from HW#3.

Please stay tuned for the release of HW#4. In the meantime, you can start working on the "Q&A" assignment now.
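For reference, the revised weighting announced above (HW#0 to HW#3 at 10% each, midterm 10%, HW#4 25%, Q&A 25%) can be sanity-checked with a minimal Python sketch. It assumes each component is scored as a percentage from 0 to 100 and that components are combined linearly; the announcement does not state how marks are normalized, and the names `WEIGHTS` and `overall_grade` as well as the sample scores are hypothetical.

```python
# Weights in percentage points, copied from the revised assessment scheme.
WEIGHTS = {
    "HW0": 10, "HW1": 10, "HW2": 10, "HW3": 10,  # 40% in total
    "midterm": 10,
    "HW4": 25,
    "QA": 25,
}

def overall_grade(scores):
    """Combine per-component scores (each 0-100) into an overall percentage.

    Assumption: a plain weighted sum; the actual grading procedure may differ.
    """
    missing = WEIGHTS.keys() - scores.keys()
    if missing:
        raise ValueError(f"missing scores for: {sorted(missing)}")
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS) / 100

if __name__ == "__main__":
    # The weights should account for the whole grade.
    assert sum(WEIGHTS.values()) == 100
    # Made-up scores, for illustration only.
    sample = {"HW0": 100, "HW1": 90, "HW2": 80, "HW3": 70,
              "midterm": 60, "HW4": 80, "QA": 90}
    print(overall_grade(sample))
```

Under these assumptions, HW#4 and the "Q&A" assignment together decide half of the overall grade.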

Postponing the due date of HW#3 to Nov 18 (Mon) 11:59pm

Due to the special situation on campus, let's postpone the due date of HW#3 of IERG4300 to next Monday (Nov 18) 11:59pm.

Homework 3 is posted. It is due on November 13, 2019

Homework 3 is posted to the course web page. It is due on November 13, 2019.

Midterm of IERG4300 will be held on Oct 23 (Wed) from 9:30am to 10:30am

The midterm will be held from 9:30am to 10:30am at our normal lecture time and venue. It will cover everything from the beginning of the semester up to (and including) the topic of "Finding Similar Items and Locality Sensitive Hashing". Each student can bring one A4-sized sheet of paper with notes on BOTH sides into the midterm. Besides short questions on different topics, the midterm will definitely contain a question that asks you to write MapReduce pseudocode to formulate/solve some computing task.

On a separate note, we will postpone the deadline of Homework#2 to Oct 28 (Mon) 11:59pm to avoid clashing with the midterm. However, since doing HW#2 will help your preparation for the midterm, I strongly encourage you to seriously attempt the problems in HW#2 (even if not completely) before the midterm on Oct 23. If you have further questions, please do not hesitate to contact us.

Homework 2 is posted. It is due on October 28, 2019

Homework 2 is posted to the course web page. It is due on October 28, 2019.

Homework 1 is posted. It is due on October 2, 2019 [posted on Sept 20]

Homework 1 is posted to the course web page. It is due on October 2, 2019. Please start doing this homework as early as possible. According to past experience, it requires more than 10 hours to finish. Please feel free to ask TAs questions.

IE DIC Cluster [posted on Sept 20]

We have set up the IE DIC (Data-Intensive Cluster) accounts for you. Students who cannot set up the single-node Hadoop cluster in HW#0 should contact the TAs. You can either set up a single-node Hadoop cluster with the TAs' help or use your IE DIC Cluster account to run MapReduce programs. Hadoop has already been installed on the IE DIC cluster. Please note that the Hadoop on the IE DIC cluster is based on Hortonworks, and it is a little different from the standard Hadoop system. For more information, please refer to https://hortonworks.com/. You can log in to the cluster to submit jobs via the following command:

ssh student_id@dic14.ie.cuhk.edu.hk

where student_id is your student ID number. You can find the password on the grading board of the elearning system. Note that this machine can only be accessed from within the IE network; you can follow the instructions at http://mobitec.ie.cuhk.edu.hk/ierg4300Fall2019/homework/vpn_setup.html to set up the IE VPN using your IE account. Those who are from other departments and would like to use the DIC cluster should contact the TAs to get a temporary IE account.

In the next tutorial, we will focus on how to use the IE DIC cluster. An overview of the DIC cluster is given below. You can bring your laptop and follow the instructions step by step in the next tutorial.

Cluster Overview:

  • 28 nodes,
  • Memory: 47 GB*2 + 30 GB*4 + 63 GB*3 + 16 GB*19 = 707 GB
  • Virtual CPU Cores: 24*2 + 16*7 + 8*19 = 312 cores
  • Disk: 44.2TB
  • Resource management platform: YARN
  • Installed applications: MapReduce, Hive, Pig, Spark
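The memory and core totals in the overview can be double-checked with a short Python sketch. It assumes the memory bullet is read as four groups of nodes (47 GB x 2, 30 GB x 4, 63 GB x 3, 16 GB x 19) and the core bullet as three groups (24 x 2, 16 x 7, 8 x 19); how the memory groups map onto the core groups is not stated in the overview, so the two are kept separate here. The helper names are hypothetical.

```python
# (per-node amount, number of nodes) pairs, read off the overview bullets.
MEMORY_GB_GROUPS = [(47, 2), (30, 4), (63, 3), (16, 19)]
VCORE_GROUPS = [(24, 2), (16, 7), (8, 19)]

def total(groups):
    """Sum of per-node amount times node count over all groups."""
    return sum(amount * count for amount, count in groups)

def node_count(groups):
    """Total number of nodes covered by the groups."""
    return sum(count for _, count in groups)

if __name__ == "__main__":
    print("memory (GB):", total(MEMORY_GB_GROUPS))      # 707
    print("virtual cores:", total(VCORE_GROUPS))        # 312
    print("nodes (by memory groups):", node_count(MEMORY_GB_GROUPS))  # 28
    print("nodes (by core groups):", node_count(VCORE_GROUPS))        # 28
```

Both groupings add up to the 28 nodes listed in the overview, which supports this reading of the two bullets.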

Cluster login:

Find the logs of applications:

  • Users can find the logs of all applications in the cluster via the web UI: http://dicvm1.ie.cuhk.edu.hk:19888/
  • Users can find the details of a particular application via the web UI: http://dicvm1.ie.cuhk.edu.hk:19888/jobhistory/job/job_1473396442288_0004 where job_1473396442288_0004 is the ID of the job you created.
  • The log information of an application includes
    • How many containers are allocated
    • The scheduling time and the completion time of each container
    • The stderr file, which can help you find bugs in your code.
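The per-job URL pattern described above can be sketched as a small Python helper that turns a job ID into its job history page address. The function name `job_history_url` is hypothetical, and the ID check assumes the usual `job_<timestamp>_<sequence>` shape seen in the example ID; it only builds the string and does not contact the cluster.

```python
import re

# Base address of the job history web UI, as given in the announcement.
JOBHISTORY_BASE = "http://dicvm1.ie.cuhk.edu.hk:19888"

# Assumed shape of a job ID, based on the example job_1473396442288_0004.
_JOB_ID_PATTERN = re.compile(r"job_\d+_\d+")

def job_history_url(job_id: str) -> str:
    """Return the job history page URL for the given job ID."""
    job_id = job_id.strip()
    if not _JOB_ID_PATTERN.fullmatch(job_id):
        raise ValueError(f"not a job ID of the form job_<digits>_<digits>: {job_id!r}")
    return f"{JOBHISTORY_BASE}/jobhistory/job/{job_id}"

if __name__ == "__main__":
    print(job_history_url("job_1473396442288_0004"))
```

The page is only reachable from within the IE network (or over the IE VPN), so open the resulting URL from a machine that has that access.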

Venue and time for additional weekly tutorial determined [posted on Sept 11]

The other tutorial section is scheduled on Thursday, 16:30 - 17:15, every week in ERB 406.

Venue and time for the ESTR4300 determined [posted on Sept 9]

ESTR4300 is scheduled for 14:30-15:15 every Wednesday in ERB 407.

Tutorial time and venue determined [posted on Sept 9]

The tutorial is now scheduled for 11:30 a.m. - 12:15 p.m. every Wednesday in KKB 101. Please take note and plan your time accordingly.

If you want to attend the weekly tutorial but cannot attend Wednesday's session due to a time clash with another registered course, please send your timetable (screenshot from CUSIS) to tds019@ie.cuhk.edu.hk by 23:59 tomorrow (10th, Sept). We can then pick a time slot to cover everyone else.

New Doodle poll for tutorial arrangement [posted on Sept 7]

Since we could not reach a consensus from the first Doodle poll, we have created a new poll which includes more options (e.g. before 9:30 a.m. and after 6:30 p.m.). Please indicate your availability in ALL time slots. The deadline of the Doodle poll is 11:30 a.m., 9th, Sept (Monday).

If you have completed the previous poll, please submit the poll again. Sorry for the inconvenience caused.

https://doodle.com/poll/v4t4idx3gh7xdqcc#calendar

Special Tutorial for Homework#0 [posted on Sept 6]

The related MSc course (IEMS5730) will hold a tutorial on Sept 6, which matches our Homework#0.

For more details, please follow the listed information:

2019R1 Big Data Systems and Information Processing (IEMS5730): [IEMS5730] First Tutorial

*Date: September 6, 2019 (Friday)

*Time: 5:30 pm - 6:30 pm

*Venue: SHB801

In this tutorial, the TAs will help you complete Homework#0 (e.g. by providing hints if you have run into unusual problems). To make full use of the time, you are advised to start Homework#0 AS SOON AS POSSIBLE. In particular, please set up your account on Google Cloud Platform (the suggested platform) to get $300 of free credit, or apply for the AWS Education program for students. Note that such an application can take several hours, so you should start it at your earliest convenience. Once the account is set up, you can follow Homework#0 to launch instances and set up the cluster accordingly.

Important: doodle poll for our tutorial arrangement [posted on Sept 4]

We are scheduling our tutorials. Please follow the doodle link below and indicate your availability in different time slots and enter your SID as your Name. The deadline of the anonymous doodle poll is 23:00, 6th, Sep (Friday). https://doodle.com/poll/pdanue8pzcwsncd5