IEMS5730 Big Data Systems and Information Processing / Spring 2022
Announcements
- Released: [Assignment #5 - GraphFrames, GraphX, HBase and SparkML]. Due: Tue, May 10, 23:59.
- Released: [Project]. Due: Mon, May 23, 12:00PM.
- Released: [Q&A Assignment]. Due: Tue, May 17, 12:00PM.
- Released: [Assignment #4 - Kafka]. Due: Tue, April 19, 23:59.
- Released: [Assignment #3 - Spark (updated on 14 Mar)]. Due: Mon, March 28, 23:59.
- The list of suggested topics for the Project is available. The topic registration form will be available on Blackboard (under Course Contents) on Mar 16 (Wed) at 20:00, on a first-come-first-served basis.
- Released: [Assignment #2 - Pig and Hive]. Due: Tue, March 8, 23:59.
- The movielens_large dataset in [Assignment #1 - Similar Users Detection via MapReduce] has been updated.
- Released: [Assignment #1 - Similar Users Detection via MapReduce (updated on 30 Jan)]. Due: Wed, February 16, 23:59.
- Released: [Assignment #1 - Hadoop over Kubernetes]. Due: Sun, February 13, 23:59.
- The Zoom meeting ID for lectures and tutorials is 96033060088 (Zoom meeting link). The password can be found in the Blackboard announcement.
- From January 24, the tutorial session will be held in Ho Sin-Hang Engineering Building 833.
- We are now scheduling the course tutorials. Please log in to Blackboard, check the announcement, and use the dedicated Google Form link to submit your unavailable time slots (your class schedule for this semester). The submission deadline is 11:59am (noon), January 16 (Sunday).
- Released: [Assignment #0 - Hadoop Cluster Setup]. Due: Mon, January 24, 11:59AM.
- Website account: bigdata, password: spring2022bigdata
Course Description
This course aims to give students an understanding of the operating principles of mainstream Big Data computing systems, together with hands-on experience in using them. Open-source platforms for Big Data processing and analytics will be discussed. Topics to be covered include:
- Programming models and design patterns for mainstream Big Data computational frameworks;
- System architecture and resource management for data-center-scale computing;
- System architecture and programming interfaces of distributed Big Data stores;
- High-level Big Data query languages and their processing systems;
- Operational and programming tools for different stages of the Big Data processing pipeline, including data collection/ingestion, serialization and migration, and workflow coordination.
Prerequisite: This course contains substantial hands-on components which require a solid background in programming and hands-on experience with operating systems. If you have never used a command-line interface to install, configure, or manage an operating system (e.g. a Linux-based one), you will need to pick up the skills yourself and IT CAN BE VERY TIME-CONSUMING for you to complete the homework assignments. (Students without the aforementioned background may take tens of hours to finish EACH homework assignment.)
Please check Blackboard for important announcements, assignment submissions, grades, etc.
Course Assessment
The grade is based on the following components (tentative):
- Homework & Programming Assignments (5 sets): 60%
- Project with Presentation: 10%
- Final Exam: 30%
If a face-to-face exam cannot be held, the course assessment scheme will be revised as follows:
- Homework & Programming Assignments (5 sets): 65%
- Project with Presentation: 10%
- Q&A Assignment: 25%
Student/Faculty Expectations on Teaching and Learning
http://mobitec.ie.cuhk.edu.hk/StaffStudentExpectations.pdf
Academic Honesty
You are expected to do your own work and acknowledge the use of anyone else’s words or ideas. You MUST put down in your submitted work the names of people with whom you have had discussions.
Refer to http://www.cuhk.edu.hk/policy/academichonesty for details.
When scholastic dishonesty is suspected, the matter will be turned over to the University authority for action.
You MUST include the following signed statement in all of your submitted homework, project assignments and examinations. Submission without a signed statement will not be graded.
I declare that the assignment here submitted is original except for source material explicitly acknowledged, and that the same or related material has not been previously submitted for another course. I also acknowledge that I am aware of University policy and regulations on honesty in academic work, and of the disciplinary guidelines and procedures applicable to breaches of such policy and regulations, as contained in the website http://www.cuhk.edu.hk/policy/academichonesty/.
Academic Honesty Slides from Associate Dean of Faculty of Engineering
Previous Offerings
Lecture Time and Venue
First 4 weeks (01/12, 01/19, 01/26, 02/09):
- Wed 7:00PM - 10:00PM (Lee Shau Kee Building LT2)
From 02/16 onwards:
- Wed 7:00PM - 9:00PM (Lee Shau Kee Building LT2)
- Fri 7:00PM - 9:00PM (Science Centre L3)
Tutorial Time and Venue
- Mon 6:30PM - 7:15PM (Ho Sin-Hang Engineering Building 833)
Instructor
Email: wclau [at] ie.cuhk.edu.hk
Office hours: Thu 4:30 - 5:30PM (SHB 818)
Teaching Assistants
Da Sun Handason Tam
Email: tds019 [at] ie.cuhk.edu.hk
Office hours: Tue 2:00 - 3:00PM (SHB 803)