IERG4300 / ESTR4300 Web-Scale Information Analytics / Fall 2024
Announcements
- Released: [Homework 4]. Due: Mon, December 9, 11:59 PM.
- [Final Exam] The final exam will be held on Dec 18 (Wed), from 7:00 PM to 10:00 PM at YIA LT5. Please mark it in your calendar.
- Released: [Homework 3]. Due: Mon, November 25, 11:59 PM.
- Released: [Homework 2]. Due: Sat, November 9, 11:59 PM.
- Tips on running MapReduce jobs within the DIC resource limits are posted Here. Without them, you may encounter tasks being killed, tasks getting stuck, extremely slow runs, or a full disk while doing HW#1.
- Useful URLs and paths for completing HW#1 on IE DIC can be found Here.
- IE DIC is now available for your homework. For detailed information, refer Here.
- Released: [Homework 1]. Due: Sat, October 19, 11:59 PM.
- The time and venue of the weekly tutorial have been released. Please check the details on the course homepage.
- The time and venue of the ESTR4300 weekly 1-hour extra lecture have been released. Please check the details on the course homepage.
- [Special Arrangement for First Tutorial] Our first tutorial will take place on Thursday, September 12, from 4:30 PM to 5:30 PM in SHB801 (Ho Sin-Hang Engineering Building, Room 801). This session is a one-time arrangement; the regular schedule will be announced next week based on your feedback.
- Please select the tutorial time slots you are available for through this link by Sunday, Sept. 8. Each tutorial will last 45 minutes.
- Released: [Homework 0]. Due: Sat, September 14, 11:59 PM.
- Please note that late-add students MUST attend the lectures/tutorials from the very beginning of the semester.
- The due date for HW0 is strict for everyone. Late-add students will NOT be granted an extension for submission.
- Website account: bigdata, password: Fall2024bigdata
Course Description
The course discusses data-intensive analytics and the automated processing of very large amounts of structured and unstructured information. We focus on leveraging MapReduce and other related paradigms to create parallel algorithms that scale up to handle massive data sets, such as those collected from the World Wide Web or other Internet systems and applications. We organize the course around a list of large-scale data analytics problems encountered in practice, and introduce the theories and methodologies required to tackle each problem. As such, the course only expects students to have a solid background in probability, statistics, and linear algebra, as well as computer programming skills. Topics to be covered include:
- The MapReduce computational model, its system architecture, and its realization in practice;
- Finding Frequent Itemsets and Association Rules; Finding Similar Items in high-dimensional data;
- Dimensionality Reduction techniques; Clustering; Recommendation systems;
- Analysis of Massive Graphs and their applications on the World Wide Web;
- Large-scale supervised machine learning;
- Processing and mining of Data Streams and their applications in large-scale network/online-activity monitoring.
Please check Blackboard for important announcements, assignment submissions, grades, etc.
Course Assessment
- Homework/Programming Assignments (5 sets in total): 65%
- Final Exam: 35%
Student/Faculty Expectations on Teaching and Learning
http://mobitec.ie.cuhk.edu.hk/StaffStudentExpectations.pdf
Academic Honesty
You are expected to do your own work and acknowledge the use of anyone else’s words or ideas. You MUST put down in your submitted work the names of people with whom you have had discussions.
Refer to http://www.cuhk.edu.hk/policy/academichonesty for details.
When scholastic dishonesty is suspected, the matter will be turned over to the University authority for action.
You MUST include the following signed statement in all of your submitted homework, project assignments and examinations. Submission without a signed statement will not be graded.
I declare that the assignment here submitted is original except for source material explicitly acknowledged, and that the same or related material has not been previously submitted for another course. I also acknowledge that I am aware of University policy and regulations on honesty in academic work, and of the disciplinary guidelines and procedures applicable to breaches of such policy and regulations, as contained in the website http://www.cuhk.edu.hk/policy/academichonesty/.
Academic Honesty Slides from Associate Dean of Faculty of Engineering
Generative AI Policy
You are NOT allowed to use any Generative AI (e.g., ChatGPT, Claude, etc.) in this course. Anyone who uses LLMs to complete the homework will be treated as having cheated.
Previous Offerings
Time and Venue
- Wed 07:00PM - 10:00PM
LT5 @ Yasumoto Int'l Acad Park
- Tue 05:00PM - 06:00PM (ESTR4300. From Sept. 10 to Dec. 3)
SHB728 @ Ho Sin-Hang Engg Bldg
- Wed 07:00PM - 10:00PM, December 4th (Make-up lecture)
TBD
- Thur 02:30PM - 03:15PM
ERB713 @ William M.W. Mong Engg Bldg
Teaching Assistants
Weiheng Tang
Email: tangweiheng [at] link.cuhk.edu.hk
Office hours: Tue 03:00PM - 04:00PM (SHB 803)
Muyi Wang
Email: muyi.wang [at] link.cuhk.edu.hk
Office hours: Tue 04:30PM - 05:30PM (SHB 803)
Jiazhi Yang
Email: jzyang [at] link.cuhk.edu.hk
Office hours: TBD (SHB 702A)