IERG4300 / ESTR4300 Web-Scale Information Analytics / Spring 2023

Announcements

  • A lively and insightful book related to what we learned in this course: [Algorithms to Live By].

  • Released: [Homework 4]. Due: Tue, April 25, 23:59.
  • Released: [Homework 3]. Due: Wed, April 5, 23:59.
  • Released: [Homework 2]. Due: Fri, March 10, 23:59.
  • Tutorial notes for Tutorial 3 (Feb. 1), Tutorial 4 (Feb. 8) and Tutorial 5 (Feb. 15) are posted here.

    You may refer to the contents while doing HW#1.

  • Tips on running MapReduce jobs within the DIC resource limits are posted here.

    If you do not follow these tips, you may run into killed tasks, stuck tasks, extremely slow tasks, or a full disk while doing HW#1.

  • Useful URLs and paths for completing HW#1 on IE DIC are posted here.

  • Course and tutorial recordings will not be provided from Feb. 1 onward.

  • IE DIC is now available for your homework. For detailed information, refer to here.

  • Released: [Homework 1]. Due: Tue, February 21, 23:59.
  • Released: [Homework 0]. Due: Sat, January 21, 11:59AM.
  • Website account: bigdata, password: spring2023bigdata


Course Description

The course discusses data-intensive analytics and the automated processing of very large amounts of structured and unstructured information. We focus on leveraging MapReduce and related paradigms to create parallel algorithms that can be scaled up to handle massive data sets, such as those collected from the World Wide Web or other Internet systems and applications. We organize the course around a list of large-scale data analytics problems that arise in practice, and introduce the theories and methodologies required to tackle each problem. As such, the course only expects students to have solid knowledge of probability, statistics and linear algebra, together with computer programming skills. Topics to be covered include:

  • The MapReduce computational model, its system architecture and its realization in practice;
  • Finding frequent item-sets and association rules; finding similar items in high-dimensional data;
  • Dimensionality reduction techniques; clustering; recommendation systems;
  • Analysis of massive graphs and its applications to the World Wide Web;
  • Large-scale supervised machine learning;
  • Processing and mining of data streams and their applications to large-scale network and online-activity monitoring.
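
To give a flavour of the first topic, the map, shuffle and reduce stages of a word count can be sketched in plain Python. This is a single-process toy under stated assumptions, not course material and not Hadoop code: the function names (map_phase, shuffle, reduce_phase, word_count) and the sample documents are hypothetical, and a real framework would run the map and reduce calls in parallel across many machines.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(doc_id: str, text: str) -> Iterator[Tuple[str, int]]:
    """Map: emit a (word, 1) pair for every word in one document."""
    for word in text.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group all emitted values by key, as the framework would."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(word: str, counts: List[int]) -> Tuple[str, int]:
    """Reduce: combine all values for one key into a single total."""
    return word, sum(counts)


def word_count(docs: Dict[str, str]) -> Dict[str, int]:
    """Run map, shuffle and reduce in sequence on an in-memory corpus."""
    mapped = (
        pair
        for doc_id, text in docs.items()
        for pair in map_phase(doc_id, text)
    )
    grouped = shuffle(mapped)
    return dict(reduce_phase(word, counts) for word, counts in grouped.items())


if __name__ == "__main__":
    # Hypothetical two-document corpus, used only for illustration.
    sample = {"d1": "big data big", "d2": "data web"}
    print(word_count(sample))
```

Because each map call sees only one document and each reduce call sees only one key, both stages can be distributed without coordination; only the shuffle needs to move data between workers.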

Please check Blackboard for important announcements, assignment submissions, grades, etc.

Course Assessment

  • Homework (5 sets in total): 65%
  • Final Exam: 35%

Student/Faculty Expectations on Teaching and Learning

http://mobitec.ie.cuhk.edu.hk/StaffStudentExpectations.pdf

Academic Honesty

You are expected to do your own work and acknowledge the use of anyone else’s words or ideas. You MUST put down in your submitted work the names of people with whom you have had discussions.

Refer to http://www.cuhk.edu.hk/policy/academichonesty for details.

When scholastic dishonesty is suspected, the matter will be turned over to the University authority for action.

You MUST include the following signed statement in all of your submitted homework, project assignments and examinations. Submission without a signed statement will not be graded.

I declare that the assignment here submitted is original except for source material explicitly acknowledged, and that the same or related material has not been previously submitted for another course. I also acknowledge that I am aware of University policy and regulations on honesty in academic work, and of the disciplinary guidelines and procedures applicable to breaches of such policy and regulations, as contained in the website http://www.cuhk.edu.hk/policy/academichonesty/.

Academic Honesty Slides from Associate Dean of Faculty of Engineering

Previous Offerings


Time and Venue

Lectures:
  • Mon 09:30AM - 11:15AM
    ICS_L1 @ Institute of Chinese Studies
  • Wed 09:30AM - 10:15AM
    SHB_801 @ Ho Sin Hang Engineering Building
  • Tue 12:30PM - 01:30PM (ESTR4300 Only)
    SHB_833 @ Ho Sin Hang Engineering Building
Tutorials:
  • Wed 10:30AM - 11:15AM
    SHB_801 @ Ho Sin Hang Engineering Building

Instructor

Email: wclau [at] ie.cuhk.edu.hk

Office hours: TBD (SHB 818)

Teaching Assistants

Siyue Xie

Email: xiesiyue [at] link.cuhk.edu.hk

Office hours: Thur 9:30 - 10:30AM (SHB 803)

Kaixuan Luo

Email: luokaixuan [at] link.cuhk.edu.hk

Office hours: Mon 3:00 - 4:00PM (SHB 803)

Da Sun Handason Tam

Email: tds019 [at] ie.cuhk.edu.hk

Office hours: Thur 3:30 - 4:30PM (SHB 803)