~ Tutorial 1 ~


A Tour in the Cloud & Basic Linux Operations

A first journey in the cloud. Amazon EC2 (or Google Compute Engine, Windows Azure): launch a VM, try both Windows and Linux, and set up port mapping.

Linux 101: navigate around, upload/download files, run programs. In this tutorial, students warm up with the working environment used throughout the semester.

Tutorial Logistics

  • Organization of tutorial session.
  • Structure of tutorial notes.
  • Notations: optional, NOTE, TIP, EXERCISE

NOTE: You are strongly encouraged to bring your own laptop, since we will use a normal classroom (NOT the IE lab) for later tutorials. Choose the desktop environment that suits you best.

Register Amazon Web Services (AWS) - Cloud Computing Services

Note:

  1. We have posted instructions for applying for an AWS student account on the CUHK Blackboard. Feel free to follow them.

  2. Throughout the course, only instructions for the major steps are provided. You need to work out the details on your own. Ask the Internet first, then your peers, and the TA last.

  3. You have two options. (a) Bind your credit card to an AWS account; if you then apply for a student account, you will get a promotional code worth $100. (b) If you do not want to bind a credit card, you will get a student account without the promotional code, which comes with $75 of credit. Either option works: both accounts can launch a t2.large instance in AWS, which is enough to finish the homework. If you bind a credit card to your account, billing automatically switches to the card on file once your credit is used up. To avoid unexpected charges to your card, set up a billing alert as instructed in the AWS Usage.

  4. Given the limited resources, we will only use AWS for proof-of-concept (POC) experiments. We will use our private DIC cluster for large jobs.

A Tour in the Cloud

EXERCISE: Find other introductory materials. Get a feel for what "IaaS" provides in the cloud environment. Try other components in AWS, subject to our academic pass subscription limit, e.g. website, mobile app, etc.

Windows VM (optional for Azure)

Try one if you are interested. Use a remote desktop connection to access it, e.g. mstsc on Windows or the rdesktop package on Linux.

TIP: In the dashboard, you can find a CONNECT button to download an .rdp file. This file can be used directly by the remote desktop client on Windows.

TIP: On Mac, you may need to download the latest Microsoft Remote Desktop from iTunes.

Launch a Linux VM

Launch a Linux Virtual Machine (VM). Ubuntu 14.04 is preferred. The "quick create" default options will do. Fill in the required information, e.g.:

Detailed Instructions including other platforms (e.g., Azure, Google Cloud Computing Engine and OpenStack)

Verify it's working in the dashboard. Navigate around to explore configurations and statistics.

Connect to Linux VM

  • Mac or Linux: Launch a terminal and issue a command like ssh ec2-user@public-ip-address-of-instance
  • Windows: Find an SSH client, e.g. PuTTY and Some Others.

NOTE: For GUI clients, you need to fill in some connection information, e.g. port 22, remote address public-ip-address, user ec2-user. In PuTTY, for example, click Connection > SSH > Auth in the left-hand navigation pane, then click Browse under Private key file for authentication to select the private key to use.
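On Mac or Linux, the same connection details can be stored in an SSH client configuration so that they do not have to be retyped. The following is a minimal sketch, not the course's official setup: the alias cloud and the key file name my-key.pem are hypothetical, and the HostName is the placeholder used above. It writes to a scratch directory so that it does not touch an existing ~/.ssh/config.

```shell
# Work in a scratch directory; a real setup would edit ~/.ssh/config instead.
cd "$(mktemp -d)"

# "cloud" and my-key.pem are hypothetical names; the HostName is the
# placeholder used above and must be replaced with your instance's address.
cat > ssh_config.example <<'EOF'
Host cloud
    HostName public-ip-address-of-instance
    User ec2-user
    Port 22
    IdentityFile ~/.ssh/my-key.pem
EOF

# -G prints the settings ssh would use for "cloud" without connecting.
ssh -F ssh_config.example -G cloud | grep -E '^(hostname|user|port|identityfile) '
```

With an entry like this in ~/.ssh/config, ssh cloud is enough to connect. Note that ssh refuses private keys that other users can read, so restrict the key file first, e.g. chmod 400 on the key.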

A sample output in the terminal:

$ ssh ec2-user@54.85.206.188
Warning: Permanently added the RSA host key for IP address '54.85.206.188' to the list of known hosts.
ec2-user@54.85.206.188's password: 
Welcome to Ubuntu 12.04.3 LTS (GNU/Linux 3.2.0-57-virtual x86_64)

 * Documentation:  https://help.ubuntu.com/

  System information as of Mon Jan 13 08:31:10 UTC 2014

  System load:  0.54              Processes:           94
  Usage of /:   3.5% of 28.83GB   Users logged in:     0
  Memory usage: 1%                IP address for eth0: 10.62.144.17
  Swap usage:   0%

  Graph this data and manage this system at https://landscape.canonical.com/

  Get cloud support with Ubuntu Advantage Cloud Guest:
    http://www.ubuntu.com/business/services/cloud

38 packages can be updated.
20 updates are security updates.

Last login: Mon Jan 13 04:56:45 2014 from ip-123-255-102-243.wlan.cuhk.edu.hk
ec2-user@ip-172-30-0-64:~$

Now you are on a Linux machine (more precisely, you have a shell called bash). This is the default remote environment you will use throughout the course.

NOTE: The terminal text above is the "screenshot" of the Linux world. You will see a lot of these in this class, so get used to them.

Passwordless SSH

Install SSH:

SSH (“Secure SHell”) is a protocol for securely accessing one machine from another. Hadoop uses SSH to access the slave nodes in order to start and manage all HDFS and MapReduce daemons. An SSH server should already be installed. If it is not, install one with:

ubuntu@master:~$ sudo apt-get install openssh-server

We do not want to run Hadoop as root, so we create a new user and group for Hadoop-related jobs.

ubuntu@master:~$ sudo addgroup hadoop
Adding group `hadoop' (GID 1001) ...
Done.
ubuntu@master:~$ sudo adduser --ingroup hadoop hduser

Now log in as hduser. The rest of the tutorial acts as hduser unless explicitly stated otherwise.

ubuntu@master:~$ su hduser

#Go to home directory of hduser
hduser@master:/home/ubuntu$ cd ~

Generate SSH key pairs for hduser.

#Use empty passphrase
hduser@master:~$ ssh-keygen -t rsa -f id_rsa
hduser@master:~$ mkdir .ssh
hduser@master:~$ mv id_rsa* .ssh/
hduser@master:~$ cat .ssh/id_rsa.pub >> .ssh/authorized_keys
hduser@master:~$ chmod 700 .ssh/
hduser@master:~$ chmod 600 .ssh/authorized_keys 

#Verify by ssh to localhost
hduser@master:~$ ssh localhost
The authenticity of host 'localhost (127.0.0.1)' can't be established.
ECDSA key fingerprint is c1:7b:f2:19:f0:fb:5d:a1:ee:a6:18:6b:df:6a:85:f5.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'localhost' (ECDSA) to the list of known hosts.
Welcome to Ubuntu 14.04.3 LTS (GNU/Linux 3.13.0-74-generic x86_64)
...
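The key setup above can also be done without answering any prompts, which is handy when repeating it on several machines. The following is a sketch under stated assumptions: it writes into a scratch directory (the variable key_dir is introduced here for illustration) so that trying it out cannot overwrite an existing key. For real use, key_dir would be the .ssh directory in the home of hduser.

```shell
# Scratch directory for a safe dry run; for real use, set key_dir="$HOME/.ssh".
key_dir=$(mktemp -d)

# -N "" sets an empty passphrase and -q suppresses the progress output,
# so ssh-keygen asks no questions.
ssh-keygen -q -t rsa -N "" -f "$key_dir/id_rsa"

# Authorize the new public key and tighten permissions, as in the steps above.
cat "$key_dir/id_rsa.pub" >> "$key_dir/authorized_keys"
chmod 700 "$key_dir"
chmod 600 "$key_dir/authorized_keys"
ls -l "$key_dir"
```

The permissions matter: the SSH server typically ignores an authorized_keys file that other users can write to.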

Security group

  1. Open the Amazon EC2 console

  2. Choose one instance, then choose the Description option, where you can find the Security groups:

  3. Click the launch-wizard, and then in the security group page, find the Inbound option.

  4. Click Edit, set the security group to the following configuration:

Modifying the Volume

You can set the volume size when you launch an instance. The following shows how to modify the volume of an existing instance.

  1. Open the Amazon EC2 console
  2. Choose Volumes, select the volume to modify, and then choose Actions, Modify Volume.
  3. The Modify Volume window displays the volume ID and the volume's current configuration, including type, size, and IOPS. For this course, you only need to change the size, e.g. to 30 GiB.
  4. Connect to your instance via SSH and expand the Linux file system:
# First use lsblk to find the device name (the name displayed on your machine may differ from the tutorial's)
ubuntu@master:~$ lsblk
NAME    MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
xvda    202:0    0  30G  0 disk
└─xvda1 202:1    0   8G  0 part /
# Expand the modified partition using growpart (and note the unusual syntax of separating the device name from the partition number)
ubuntu@master:~$ sudo growpart  /dev/xvda 1
CHANGED: partition=1 start=16065 old: size=16755795 end=16771860 new: size=62894475,end=62910540
# A look at the lsblk output confirms that the partition /dev/xvda1 now fills the available space on the volume /dev/xvda 
ubuntu@ip-172-31-23-27:~$ lsblk
NAME    MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
xvda    202:0    0  30G  0 disk
└─xvda1 202:1    0  30G  0 part /
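Note that growpart enlarges the partition only; the file system inside it usually has to be grown as a separate step. The sketch below looks up the file system type of the root mount and prints (without running) the matching command. It assumes GNU df, and the mapping covers only ext2/3/4 and XFS; on other file systems it makes no suggestion.

```shell
# Look up the device and filesystem type behind the root mount point.
root_dev=$(df --output=source / | awk 'NR==2 {print $1}')
root_fs=$(df --output=fstype / | awk 'NR==2 {print $1}')

# Pick the matching resize command. It is only printed here, not executed.
case "$root_fs" in
  ext2|ext3|ext4) resize_cmd="sudo resize2fs $root_dev" ;;
  xfs)            resize_cmd="sudo xfs_growfs /" ;;
  *)              resize_cmd="(no suggestion for filesystem type $root_fs)" ;;
esac
echo "To grow the filesystem: $resize_cmd"

# After resizing, the Size column should match the enlarged partition.
df -h /
```

On the instance from the transcript above, an ext4 root would give sudo resize2fs /dev/xvda1 as the suggested command.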

Clone Instances from Image

The steps above (passwordless SSH, security group, and volume expansion) apply to a single instance. To save time, you can create an image of this instance and launch new instances from the image. See the AWS documentation for detailed instructions.

Linux 101

Main reference: http://linuxcommand.org/learning_the_shell.php

Concepts to be delivered during tutorial:

Navigation

ec2-user@ip-172-30-0-64:~$ pwd
/home/ec2-user
ec2-user@ip-172-30-0-64:~$ ls
ec2-user@ip-172-30-0-64:~$ cd ..
ec2-user@ip-172-30-0-64:/home$ ls
ec2-user  ubuntu
ec2-user@ip-172-30-0-64:/home$ cd ..
ec2-user@ip-172-30-0-64:/$ ls
bin   dev  home        lib    lost+found  mnt  proc  run   selinux  sys  usr  vmlinuz
boot  etc  initrd.img  lib64  media       opt  root  sbin  srv      tmp  var
ec2-user@ip-172-30-0-64:/$ pwd
/

NOTE: $ is the command prompt, after which you can type a command. After you enter a command, there will be some output. Rule of thumb: read the output carefully, especially during the more complex operations later. Linux command outputs are usually self-documenting.

Seeking help

ec2-user@ip-172-30-0-64:/$ ls --help
Usage: ls [OPTION]... [FILE]...
List information about the FILEs (the current directory by default).
Sort entries alphabetically if none of -cftuvSUX nor --sort is specified.

Mandatory arguments to long options are mandatory for short options too.
  -a, --all                  do not ignore entries starting with .
...

NOTE: ... in the last line denotes omitted console output. See the complete version by entering the corresponding command.

ec2-user@ip-172-30-0-64:/$ man ls

LS(1)                                               User Commands                                               LS(1)

NAME
       ls - list directory contents

SYNOPSIS
       ls [OPTION]... [FILE]...

DESCRIPTION
       List  information  about the FILEs (the current directory by default).  Sort entries alphabetically if none of
       -cftuvSUX nor --sort is specified.

...

       --block-size=SIZE
              scale sizes by SIZE before printing them.  E.g., `--block-size=M' prints sizes in  units  of  1,048,576
              bytes.  See SIZE format below.
 Manual page ls(1) line 1 (press h for help or q to quit)

TIP: Use arrow keys or j/k to move around the manual page. Use / to search for a keyword. Press q to end.

What is the man command? Try man man

Basic file/dir operations

Make directories:

ec2-user@ip-172-30-0-64:~$ mkdir mydir
ec2-user@ip-172-30-0-64:~$ ls
mydir
ec2-user@ip-172-30-0-64:~$ cd mydir/
ec2-user@ip-172-30-0-64:~/mydir$ pwd
/home/ec2-user/mydir
ec2-user@ip-172-30-0-64:~/mydir$ cd ..
ec2-user@ip-172-30-0-64:~$ ls
mydir
ec2-user@ip-172-30-0-64:~$ rmdir mydir
ec2-user@ip-172-30-0-64:~$ ls
ec2-user@ip-172-30-0-64:~$ 

Create and view text files:

ec2-user@ip-172-30-0-64:~$ echo "this is my first file" > myfile
ec2-user@ip-172-30-0-64:~$ ls
myfile
ec2-user@ip-172-30-0-64:~$ cat myfile 
this is my first file

echo prints the string to STDOUT. > redirects STDOUT to a file. More on IO redirection.
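A few more redirection operators are worth trying alongside >. The following sketch runs in a scratch directory; the file names notes.txt and errors.txt are made up for the example.

```shell
cd "$(mktemp -d)"

echo "first line" > notes.txt      # > creates the file or overwrites it
echo "second line" >> notes.txt    # >> appends instead of overwriting
cat notes.txt

# 2> redirects STDERR (error messages) rather than STDOUT.
ls no-such-file 2> errors.txt || echo "ls failed; the message is in errors.txt"
cat errors.txt

# < feeds a file to a command as its STDIN.
wc -l < notes.txt                  # prints: 2
```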

What is cat then? Try man or --help.

Move/Copy/Remove file:

ec2-user@ip-172-30-0-64:~$ ls
myfile
ec2-user@ip-172-30-0-64:~$ cat myfile 
this is my first file
ec2-user@ip-172-30-0-64:~$ cp myfile myfile2
ec2-user@ip-172-30-0-64:~$ ls
myfile  myfile2
ec2-user@ip-172-30-0-64:~$ cat myfile2
this is my first file
ec2-user@ip-172-30-0-64:~$ mv myfile myfile.moved
ec2-user@ip-172-30-0-64:~$ ls
myfile2  myfile.moved
ec2-user@ip-172-30-0-64:~$ rm myfile2
ec2-user@ip-172-30-0-64:~$ ls
myfile.moved

TIP: Do more experiments like this. Use ls (probably with options like -a, -l) to inspect a dir. Use cat or less/more to inspect the content of a text file.

About filename:

  • Basically a flat string. There is no concept of an "extension name", though people sometimes follow naming conventions.
  • Files whose names start with . are "hidden". Use ls -a to see them.
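Both points can be checked quickly in a scratch directory. The file names below are arbitrary examples.

```shell
cd "$(mktemp -d)"

# Dots are ordinary characters; this is just one flat name.
touch report.final.v2
# A leading dot makes the file hidden from a plain ls.
touch .secret

ls        # prints: report.final.v2
ls -a     # additionally lists .secret, plus the entries . and ..
```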

File transfer

Major methods:

  • scp/sftp under Linux/Mac
  • An open source and cross-platform SFTP client: FileZilla
  • An SFTP client under Windows: WinSCP

Pick the one that suits you best.

Tasks:

  • Create a text file on your desktop. Upload it to the server. Verify that it is the same as the file you created locally.
  • Create a text file on your server. Download it to your desktop. Verify that it is the same as the file you created remotely.
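One way to do the verification step is to compare checksums on both sides. The sketch below shows the idea; the scp lines are left as comments because they need your own instance address, and a local cp stands in for the round trip so that the rest runs anywhere. The file names are made up, and sha256sum is the GNU tool (on a Mac, shasum -a 256 plays the same role).

```shell
cd "$(mktemp -d)"
echo "hello from my desktop" > upload-test.txt

# Upload, then download again (replace the placeholder address with your own):
#   scp upload-test.txt ec2-user@public-ip-address-of-instance:~/
#   scp ec2-user@public-ip-address-of-instance:~/upload-test.txt roundtrip.txt
# A local copy stands in for the round trip here so the sketch runs anywhere.
cp upload-test.txt roundtrip.txt

# Identical files give identical checksums, and cmp prints nothing for them.
sha256sum upload-test.txt roundtrip.txt
cmp upload-test.txt roundtrip.txt && echo "files are identical"
```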

File download from the Internet

ec2-user@ip-172-30-0-64:~$ mkdir try-wget
ec2-user@ip-172-30-0-64:~$ cd try-wget/
ec2-user@ip-172-30-0-64:~/try-wget$ wget 'https://github.com/hupili/agile-ir/raw/master/data/Shakespeare.tar.gz'
--2014-01-14 03:02:09--  https://github.com/hupili/agile-ir/raw/master/data/Shakespeare.tar.gz
Resolving github.com (github.com)... 192.30.252.131

...

ec2-user@ip-172-30-0-64:~/try-wget$ ls
Shakespeare.tar.gz

Now you have downloaded Shakespeare's works, all in one compressed archive, Shakespeare.tar.gz. The following is a shortcut to uncompress it:

ec2-user@ip-172-30-0-64:~/try-wget$ tar -xzvf Shakespeare.tar.gz 
data/
data/sonnet-59.txt
data/sonnet-139.txt
data/sonnet-88.txt
data/sonnet-123.txt
data/sonnet-137.txt
data/play-twogents.txt

...

data/sonnet-134.txt
data/sonnet-93.txt
data/sonnet-24.txt
data/sonnet-3.txt
data/play-juliuscaesar.txt

What's -xzvf? Try man or --help.

NOTE: Some commands have shorthand notation for multiple options. In the above example, tar -xzvf YOUR_FILE is equivalent to tar -x -z -v -f YOUR_FILE. Try the latter yourself.
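The same option letters combine in other ways. As a small illustration, the sketch below creates an archive and lists its contents without extracting it; the directory and file names are made up for the example.

```shell
cd "$(mktemp -d)"
mkdir data
echo "sample text" > data/a.txt

# -c create an archive, -z compress with gzip, -f name of the archive file
tar -czf sample.tar.gz data/

# -t lists the contents without extracting anything
tar -tzf sample.tar.gz
```

Listing with -t first is a cheap way to see where an unfamiliar archive will put its files before you extract it.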

EXERCISE: Navigate the data dir and operate on those files, e.g. cp, mv.

EXERCISE: Get familiar with tar, zip, gzip, bzip2. You are very likely to get others' data in those formats.

EXERCISE: Get familiar with wget options. A simple crawler can be obtained by wget -r START_URL.

EXERCISE: Try to use curl to download the same file. Most Linux distributions have wget and/or curl by default.

Suppose you have finished processing the data. Clean up as follows:

ec2-user@ip-172-30-0-64:~/try-wget$ ls
data  Shakespeare.tar.gz
ec2-user@ip-172-30-0-64:~/try-wget$ ls data/
play-12night.txt         play-titus.txt              sonnet-122.txt  sonnet-152.txt  sonnet-42.txt  sonnet-72.txt

...

play-tempest.txt         sonnet-120.txt              sonnet-150.txt  sonnet-40.txt   sonnet-70.txt
play-timonathens.txt     sonnet-121.txt              sonnet-151.txt  sonnet-41.txt   sonnet-71.txt
ec2-user@ip-172-30-0-64:~/try-wget$ rm -rf data/
ec2-user@ip-172-30-0-64:~/try-wget$ ls
Shakespeare.tar.gz

rm -rf is a powerful command. Use with great care.
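Two habits reduce the risk, especially once the path comes from a variable in a script. The sketch below is one possible pattern, not a complete safeguard; the variable name target is introduced for illustration.

```shell
cd "$(mktemp -d)"
mkdir -p data/old
touch data/old/a.txt

target="data"
ls -R "$target"       # preview exactly what is about to be deleted

# ${target:?...} aborts with an error if target is unset or empty, so the
# path cannot silently collapse to "/". The -- ends option parsing.
rm -rf -- "${target:?target is empty}/"
ls -A                 # prints nothing: the directory is gone
```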

Execute an executable file

Write your first shell script

ec2-user@ip-172-30-0-64:~$ cat > hello.sh
echo "hello world. My first shell script!"
ec2-user@ip-172-30-0-64:~$ ls
hello.sh
ec2-user@ip-172-30-0-64:~$ cat hello.sh 
echo "hello world. My first shell script!"

cat > reads STDIN and redirects all of the content to hello.sh. The second line, echo "hello world. My first shell script!", is typed by you. After that, press ctrl+d to finish typing.

EXERCISE: Try this way to create more files. This is the simplest way to write small text files without using a text-based editor.
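A related technique is the here-document, which supplies the content inline and so needs no ctrl+d. That makes it usable inside scripts as well. A minimal sketch, with hello2.sh as a made-up file name:

```shell
cd "$(mktemp -d)"

# Everything between the two EOF markers becomes the content of hello2.sh.
# Quoting the first marker ('EOF') keeps $ and backticks from being expanded.
cat > hello2.sh <<'EOF'
echo "hello again. Written with a here-document!"
EOF

cat hello2.sh
```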

Make it executable:

ec2-user@ip-172-30-0-64:~$ ls -l hello.sh 
-rw-rw-r-- 1 ec2-user ec2-user 43 Jan 14 07:26 hello.sh
ec2-user@ip-172-30-0-64:~$ chmod a+x hello.sh 
ec2-user@ip-172-30-0-64:~$ ls -l hello.sh 
-rwxrwxr-x 1 ec2-user ec2-user 43 Jan 14 07:26 hello.sh

The x character indicates that the file is executable. Read more.
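Permissions can also be written as three octal digits for user, group, and others, where r=4, w=2, and x=1 are added up. The sketch below shows both notations side by side; it assumes GNU stat (as found on Ubuntu), and demo.sh is a made-up file name.

```shell
cd "$(mktemp -d)"
touch demo.sh

chmod 644 demo.sh               # rw- for the owner, r-- for group and others
stat -c '%a %A %n' demo.sh      # prints: 644 -rw-r--r-- demo.sh

chmod a+x demo.sh               # add x for user, group, and others
stat -c '%a %A %n' demo.sh      # prints: 755 -rwxr-xr-x demo.sh

chmod 700 demo.sh               # rwx for the owner only
stat -c '%a %A %n' demo.sh      # prints: 700 -rwx------ demo.sh
```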

Execute it:

ec2-user@ip-172-30-0-64:~$ ./hello.sh 
hello world. My first shell script!
ec2-user@ip-172-30-0-64:~$ /home/ec2-user/hello.sh 
hello world. My first shell script!

NOTE: An often overlooked piece of syntax: if the executable is in the current working directory, prefix it with ./. Otherwise, the system will try to locate the command in PATH.
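The difference is easy to observe. The sketch below creates a tiny script named greet (a made-up name), runs it by explicit path, and then makes it reachable by bare name by adding its directory to PATH for the current shell.

```shell
cd "$(mktemp -d)"
printf '#!/bin/sh\necho "found via PATH"\n' > greet
chmod a+x greet

./greet                  # explicit path: runs the file in this directory
PATH="$PWD:$PATH"        # prepend this directory, for the current shell only
greet                    # now found by searching PATH
command -v greet         # prints the full path that was resolved
```

Before the PATH line, typing greet alone would fail with a "command not found" error, even though the file is in the current directory.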

About shell commands (optional)

The commands you use, e.g. ls, cd, mkdir, are just some pre-installed executables in the system. You can find their location and verify that they are executable:

ec2-user@ip-172-30-0-64:~$ which ls
/bin/ls
ec2-user@ip-172-30-0-64:~$ ls -l /bin/ls
-rwxr-xr-x 1 root root 105840 Nov 19  2012 /bin/ls

which itself is an executable file:

ec2-user@ip-172-30-0-64:~$ which which 
/usr/bin/which
ec2-user@ip-172-30-0-64:~$ ls -l /usr/bin/which
lrwxrwxrwx 1 root root 10 Mar 29  2012 /usr/bin/which -> /bin/which
ec2-user@ip-172-30-0-64:~$ ls -l /bin/which 
-rwxr-xr-x 1 root root 946 Mar 29  2012 /bin/which

Automate your work by shell

Create a script, download.sh, with the following content.

# Clean previously downloaded data
rm -f Shakespeare.tar.gz
rm -rf data/
# Download
wget 'https://github.com/hupili/agile-ir/raw/master/data/Shakespeare.tar.gz'
# Uncompress 
tar -xzvf Shakespeare.tar.gz 
# list files
ls data/

TIP: No need to type it in. Use cat > and copy-paste the content into your terminal. The paste operation differs across terminals.

Content after # is a comment.

Now execute the script:

ec2-user@ip-172-30-0-64:~$ chmod a+x download.sh 
ec2-user@ip-172-30-0-64:~$ ./download.sh 

...

The result is the same as when you type those commands into the shell one by one. By writing scripts, you can automate tedious daily jobs. You will see some of this in the later part of this course.

EXERCISE: Shell scripts also support common programming constructs, e.g. conditions, loops, etc. Try to learn them on your own from the Internet. Google "bash script" or something similar.
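As a starting point, here is a small sketch that combines a loop with a condition. It builds its own toy data directory (the file names imitate the Shakespeare archive but are made up) and counts the files whose content mentions "sonnet".

```shell
cd "$(mktemp -d)"
mkdir data
for n in 1 2 3; do
  echo "sonnet number $n" > "data/sonnet-$n.txt"
done
echo "a play" > data/play-demo.txt

count=0
for f in data/*.txt; do
  # grep -q prints nothing; its exit status drives the if.
  if grep -q "sonnet" "$f"; then
    count=$((count + 1))
  else
    echo "skipping $f"
  fi
done
echo "found $count sonnets"     # prints: found 3 sonnets
```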

Linux 102 (optional)

These topics are optional in the first tutorial due to the time limit, but we will encounter them in the following tutorials. Just-in-time instructions will be given, but it is strongly recommended that you warm up at your earliest convenience.

Text editor -- VIM and others

With Linux 101, you can at least operate in the following way:

  • Write code locally on your desktop with your favourite GUI editor.
  • Upload code and data to the Linux server.
  • Execute.
  • Download the results and analyze them.

This upload/download cycle causes considerable overhead when you need to frequently modify code or configuration files.

VIM is a powerful text editor. There are many tutorials and guides online.

Emacs is also a widely available and highly customizable text editor. It's interesting to learn some Emacs basic operations and concepts.

Sometimes, nano will be fired up to input certain information.

Text editors are just tools. Pick the one that is most convenient for you.

Package management

Cheatsheet for Ubuntu:

  • sudo apt-get install PACKAGE
  • sudo apt-get purge PACKAGE
  • sudo apt-file search FILE_NAME

Ubuntu helpfully prompts you to install a missing package, e.g. when installing Git:

ec2-user@ip-172-30-0-64:~$ git
The program 'git' is currently not installed.  You can install it by typing:
sudo apt-get install git
ec2-user@ip-172-30-0-64:~$ sudo apt-get install git
Reading package lists... Done
Building dependency tree

...

After this operation, 15.2 MB of additional disk space will be used.
Do you want to continue [Y/n]? y
Get:1 http://azure.archive.ubuntu.com/ubuntu/ precise/main liberror-perl all 0.17-1 [23.8 kB]

...

Setting up git-man (1:1.7.9.5-1) ...
Setting up git (1:1.7.9.5-1) ...
ec2-user@ip-172-30-0-64:~$ git
usage: git [--version] [--exec-path[=<path>]] [--html-path] [--man-path] [--info-path]
           [-p|--paginate|--no-pager] [--no-replace-objects] [--bare]

...

   rm         Remove files from the working tree and from the index
   show       Show various types of objects
   status     Show the working tree status
   tag        Create, list, delete or verify a tag object signed with GPG

See 'git help <command>' for more information on a specific command.
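Before installing, it can be useful to check which tools are already present. The sketch below defines a helper, check_tools (a name made up for this example), that reports each command's location or an install hint. The hint assumes the package is named after the command, which holds for git, wget, and curl on Ubuntu but not for every tool.

```shell
# Report whether each named command is installed, with an install hint if not.
check_tools() {
  for tool in "$@"; do
    if command -v "$tool" >/dev/null 2>&1; then
      echo "$tool: installed at $(command -v "$tool")"
    else
      echo "$tool: missing (try: sudo apt-get install $tool)"
    fi
  done
}

check_tools git wget curl
```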

dotfiles

Use ls -al and you will see many dotfiles (e.g. .bashrc). .XXXrc is the convention for customized configurations. You can build the working environment that suits you best via those configuration files, e.g. change colors, add command aliases, etc.

Try to search the Internet and do some customization. Many people put their own configs online, example. You can get some pointers from those repos.
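For a first customization, lines like the following could be appended to ~/.bashrc. This is only a sketch: every alias name and value here is an example rather than a recommended setting.

```shell
# Lines one might append to ~/.bashrc; all names and values are examples.
alias ll='ls -alF'        # long listing that includes dotfiles
alias la='ls -A'          # all entries except . and ..
export EDITOR=vim         # editor launched by tools such as git
export HISTSIZE=10000     # keep more commands in the shell history
```

Changes take effect in new shells, or immediately after running source ~/.bashrc in the current one.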

parallel setup

You will manage a list of machines in this course. It would be cumbersome to ssh into every machine and execute the same setup command. One way is to use parallel-ssh to configure them in parallel.

sudo apt-get install pssh

parallel-ssh -i -h hosts.txt echo "hello, world"

#Run a long command without timing out:
parallel-ssh -i -h hosts.txt -t 0 sleep 10000
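The hosts.txt file used above lists one host per line in the form [user@]host[:port]. The sketch below writes such a file and, as a dry run, prints the plain ssh command that corresponds to each entry. The host names slave1 and slave2 are hypothetical.

```shell
cd "$(mktemp -d)"

# One host per line, in the form [user@]host[:port].
# slave1 and slave2 are hypothetical host names.
cat > hosts.txt <<'EOF'
hduser@master
hduser@slave1
hduser@slave2
EOF

# Dry run: print the equivalent plain ssh command for each host.
# Drop the echo to execute; -n stops ssh from swallowing the rest of hosts.txt.
while read -r host; do
  echo ssh -n "$host" hostname
done < hosts.txt
```

The loop runs the hosts one after another, which is fine for a handful of machines; parallel-ssh contacts them concurrently.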

A more elegant way is to use Ansible. Ansible gives teams the power to scale IT automation, manage complex deployments and speed productivity.

Outcome of This Tutorial

  • You have a feel of IaaS via AWS.
  • You can operate a Linux server using shell.
  • You can upload/download code/data to/from the remote Linux server.
  • You have a basic idea of shell scripts and their use for automation.
  • You feel comfortable reading CLI examples.

Further Pointers

  • The MOOC Startup Engineering has useful materials for getting into the Linux world. See Week 2: Linux, Command Line, SSJS, Emacs, Git, Dotfiles. (Robin Lee)