IERG4330/ESTR4316 Programming Big Data Systems / Spring 2024

Announcements

  • The final exam will be held on May 8 (Wed), 2024, from 7:00pm to 10:00pm, at YIA LT5 (which is next door to our usual lecture room on Wed).

  • Released: [Assignment #4 - GraphFrames, GraphX, HBase]. Due: Thu, May 2, 23:59.
  • Released: [Assignment #3 - SparkSQL, Kafka, and Streaming]. Due: Fri, April 5, 23:59.
  • The IE DIC Hadoop Cluster is now available for your homework. Details are available here.

  • Due to time constraints this semester, we have reduced the number of homework assignments from 5 to 4 to lighten the workload. We hope this will let everyone focus on learning the content rather than rushing to finish homework. Note: this also means that the weight of each homework increases from 13% to 16.25% (65% divided over 4 assignments instead of 5).

  • Released: [Assignment #2 - Pig, Hive and SparkRDD]. Due: Mon, March 11, 23:59.
  • Released: [Assignment #1 - Hadoop over Kubernetes]. Due: Sun, February 18, 23:59.
  • An additional 1-hour lecture for ESTR4316 will be held every Monday from 17:30 to 18:30 at SHB833.

  • Tutorials will be held every Monday from 18:30 to 19:15 at SHB833, starting Jan 22.

  • Please vote for your preferred tutorial time before 2024-01-12 using the following link: Google Form

  • Website account: bigdata, password: spring2024bigdata


Course Description

This course aims to give students an understanding of the operating principles of, and hands-on experience with, mainstream Big Data computing systems. Open-source platforms for Big Data processing and analytics will be discussed. Topics to be covered include:

  • Programming models and design patterns for mainstream Big Data computational frameworks;
  • System Architecture and Resource Management for Data-center-scale Computing;
  • System Architecture and Programming Interface of Distributed Big Data stores;
  • High-level Big Data Query languages and their processing systems;
  • Operational and Programming tools for different stages of the Big Data processing pipeline, including data collection/ingestion, serialization and migration, and workflow coordination.

Prerequisite: This course contains substantial hands-on components that require a solid background in programming and hands-on experience with operating systems. If you have never used a command-line interface to install, configure, or manage an operating system (e.g., a Linux-based one), you will need to pick up these skills yourself, and IT CAN BE VERY TIME-CONSUMING for you to complete the homework. (Students without this background may need several tens of hours to finish EACH homework assignment.)

Please check Blackboard for important announcements, assignment submissions, grades, etc.

Course Assessment

The grade is based on the following components (tentative):

  • Homework & Programming Assignments (4 sets, originally 5): 65%
  • Final Exam: 35%

Student/Faculty Expectations on Teaching and Learning

http://mobitec.ie.cuhk.edu.hk/StaffStudentExpectations.pdf

Academic Honesty

You are expected to do your own work and acknowledge the use of anyone else’s words or ideas. You MUST list in your submitted work the names of the people with whom you have discussed it.

Refer to http://www.cuhk.edu.hk/policy/academichonesty for details

When scholastic dishonesty is suspected, the matter will be turned over to the University authority for action.

You MUST include the following signed statement in all of your submitted homework, project assignments and examinations. Submission without a signed statement will not be graded.

I declare that the assignment here submitted is original except for source material explicitly acknowledged, and that the same or related material has not been previously submitted for another course. I also acknowledge that I am aware of University policy and regulations on honesty in academic work, and of the disciplinary guidelines and procedures applicable to breaches of such policy and regulations, as contained in the website http://www.cuhk.edu.hk/policy/academichonesty/.

Academic Honesty Slides from the Associate Dean of the Faculty of Engineering

Large Language Models (LLMs) Policy

You are NOT allowed to use any LLMs (e.g., ChatGPT, Claude, etc.) in this course. Anyone who uses an LLM to complete the homework will be treated as having cheated.

Previous Offerings


Lecture Time and Venue

First 4 weeks (until 02/01):

  • Thu 6:30PM - 9:30PM (Ho Sin Hang Engineering Building, SHB 801)

From 02/07 onwards:

  • Wed 7:00PM - 10:00PM (Yasumoto International Academic Park, YIA LT4)

Tutorial Time and Venue

  • Mon 6:30PM - 7:15PM (Ho Sin Hang Engineering Building, SHB 833)

Instructor

Email: wclau [at] ie.cuhk.edu.hk

Office hours: By Appointment (SHB 818)

Teaching Assistants

Kaicheng Xiao

Email: xk023 [at] ie.cuhk.edu.hk

Office hours: By Appointment (SHB 828)

Qun Yang

Email: yangqun [at] link.cuhk.edu.hk

Office hours: By Appointment (SHB 729)