A Tour in the Cloud & Basic Linux Operations
A first journey in the cloud. Amazon EC2 (or Google Compute Engine, Windows Azure): launch a VM, try both Windows and Linux, port mapping.
Linux 101: navigate around, upload/download files, run programs. In this tutorial, students warm up with the working environment used throughout the semester.
Tutorial Logistics
- Organization of tutorial session.
- Structure of tutorial notes.
- Notations: optional, NOTE, TIP, EXERCISE
NOTE:
You are strongly encouraged to bring your own laptop, since we will use a normal classroom (NOT the IE lab) for further tutorials.
Choose the desktop environment that suits you best.
Register Amazon Web Services (AWS) - Cloud Computing Services
Note:
1. We have posted the instructions for applying for an AWS student account on the CUHK Blackboard. Feel free to follow them.
2. Throughout the course, only instructions for the major steps are provided. You need to work out the details on your own. Ask the Internet first, then your peers, and the TA last.
You have two options:
1. Bind your credit card to an AWS account and then apply for a student account; you will get a promotional code worth $100.
2. If you don't want to bind a credit card, you will get a student account without the promotional code, which comes with $75 of credit.
You can choose either option; both accounts can launch t2.large instances in AWS and will let you finish the homework. If you bind a credit card to the account you created, billing automatically switches to the card on file once your credit is used up. To avoid unexpected charges to your credit card, set up a billing alert as instructed in the AWS Usage.
Given the limited resources, we will only use it for proof-of-concept (POC) experiments. We will use our private DIC cluster for large jobs.
A Tour in the Cloud
- Management portal of AWS: https://console.aws.amazon.com/console/home
- The official introduction: https://aws.amazon.com/getting-started/
EXERCISE: Find other introductory materials. Get a feel of what "IaaS" provides in the cloud environment. Try other components in AWS subject to our academic pass subscription limit, e.g. website, mobile App, etc.
Windows VM (optional for Azure)
Try one if you are interested.
Use a remote desktop connection to access it, e.g. mstsc in Windows or the rdesktop package in Linux.
TIP:
In the dashboard, you can find a CONNECT button to download an .rdp file.
This file can be used directly by the remote desktop client on Windows.
TIP: On a Mac, you may need to download the latest Microsoft Remote Desktop from the Mac App Store.
Launch a Linux VM
Launch a Linux Virtual Machine (VM). Ubuntu 14.04 is preferred. The "quick create" default options will do. Fill in the required information, e.g.:
Verify it's working in the dashboard. Navigate around to explore configurations and statistics.
Connect to Linux VM
- Mac or Linux: Launch a terminal and issue a command like
ssh ec2-user@public-ip-address-of-instance
- Windows: Find an SSH client, e.g. PuTTY or another of your choice.
NOTE:
For the GUI clients, you need to fill in some connection information, e.g. port is 22, remote address is public-ip-address, user is ec2-user. Below shows one example.
In PuTTY, click Connection > SSH > Auth in the left-hand navigation pane and configure the private key to use by clicking Browse under Private key file for authentication.
A sample output in the terminal:
$ ssh ec2-user@54.85.206.188
Warning: Permanently added the RSA host key for IP address '54.85.206.188' to the list of known hosts.
ec2-user@54.85.206.188's password:
Welcome to Ubuntu 12.04.3 LTS (GNU/Linux 3.2.0-57-virtual x86_64)
* Documentation: https://help.ubuntu.com/
System information as of Mon Jan 13 08:31:10 UTC 2014
System load: 0.54 Processes: 94
Usage of /: 3.5% of 28.83GB Users logged in: 0
Memory usage: 1% IP address for eth0: 10.62.144.17
Swap usage: 0%
Graph this data and manage this system at https://landscape.canonical.com/
Get cloud support with Ubuntu Advantage Cloud Guest:
http://www.ubuntu.com/business/services/cloud
38 packages can be updated.
20 updates are security updates.
Last login: Mon Jan 13 04:56:45 2014 from ip-123-255-102-243.wlan.cuhk.edu.hk
ec2-user@ip-172-30-0-64:~$
Now you are on a Linux machine (more precisely, you are in a shell called bash). This is the default remote environment you will use throughout the course.
NOTE: The terminal text above is the "screenshot" of the Linux world. You will see a lot of these in this class, so get used to them.
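The connection settings mentioned above (address, port 22, user, private key) can also be stored once in an SSH client configuration file so that they need not be retyped. The following is only a sketch: the alias myvm and the key file name my-aws-key.pem are made-up placeholders, and the file is written to a scratch directory rather than to the real ~/.ssh/config.

```shell
# Sketch only: the host alias, key path, and address below are placeholders.
DEMO_SSH_DIR="$(mktemp -d)"   # scratch dir; on a real machine this would be ~/.ssh

cat > "$DEMO_SSH_DIR/config" <<'EOF'
Host myvm
    HostName public-ip-address-of-instance
    User ec2-user
    Port 22
    IdentityFile ~/.ssh/my-aws-key.pem
EOF

cat "$DEMO_SSH_DIR/config"

# With such an entry in ~/.ssh/config, "ssh myvm" would be equivalent to:
#   ssh -i ~/.ssh/my-aws-key.pem -p 22 ec2-user@public-ip-address-of-instance
```

Replace the placeholders with the public address of your own instance and the key file you downloaded when launching it.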
Passwordless SSH
Install SSH:
SSH (“Secure SHell”) is a protocol for securely accessing one machine from another. Hadoop uses SSH to access the slave nodes in order to start and manage all HDFS and MapReduce daemons. An SSH server should already be installed. Otherwise, install one with:
ubuntu@master:~$ sudo apt-get install openssh-server
We do not want to run Hadoop as root, so we will create a new user/group for Hadoop-related jobs.
ubuntu@master:~$ sudo addgroup hadoop
Adding group `hadoop' (GID 1001) ...
Done.
ubuntu@master:~$ sudo adduser --ingroup hadoop hduser
Now we log in as hduser; the rest of the tutorial is carried out as hduser unless specified otherwise.
ubuntu@master:~$ su hduser
#Go to home directory of hduser
hduser@master:/home/ubuntu$ cd ~
Generate SSH key pairs for hduser.
#Use empty passphrase
hduser@master:~$ ssh-keygen -t rsa -f id_rsa
hduser@master:~$ mkdir .ssh
hduser@master:~$ mv id_rsa* .ssh/
hduser@master:~$ cat .ssh/id_rsa.pub >> .ssh/authorized_keys
hduser@master:~$ chmod 700 .ssh/
hduser@master:~$ chmod 600 .ssh/authorized_keys
#Verify by ssh to localhost
hduser@master:~$ ssh localhost
The authenticity of host 'localhost (127.0.0.1)' can't be established.
ECDSA key fingerprint is c1:7b:f2:19:f0:fb:5d:a1:ee:a6:18:6b:df:6a:85:f5.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'localhost' (ECDSA) to the list of known hosts.
Welcome to Ubuntu 14.04.3 LTS (GNU/Linux 3.13.0-74-generic x86_64)
...
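The two chmod commands above matter: in its default configuration, OpenSSH may refuse key-based login when .ssh or authorized_keys is accessible to other users. The sketch below reproduces only the permission settings in a scratch directory (a stand-in for the home directory of hduser, so nothing real is touched) and shows how to read them back with ls.

```shell
# Scratch directory standing in for the home directory (use ~ on a real machine)
PERM_DEMO="$(mktemp -d)"
mkdir -p "$PERM_DEMO/.ssh"
: > "$PERM_DEMO/.ssh/authorized_keys"      # create an empty file

chmod 700 "$PERM_DEMO/.ssh"                    # owner: read/write/enter, others: nothing
chmod 600 "$PERM_DEMO/.ssh/authorized_keys"    # owner: read/write, others: nothing

# The first ten characters of ls -l show the file type and permission bits
ls -ld "$PERM_DEMO/.ssh" | cut -c1-10                    # drwx------
ls -l "$PERM_DEMO/.ssh/authorized_keys" | cut -c1-10     # -rw-------
```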
Security group
Open the Amazon EC2 console
Choose one instance and open its Description tab, where you can find its Security groups:
Click the launch-wizard group and then, on the security group page, find the Inbound tab.
Click Edit, set the security group to the following configuration:
Modifying the Volume
You can set the volume size when you launch an instance. The following shows how to modify the volume of an existing instance.
- Open the Amazon EC2 console
- Choose Volumes, select the volume to modify, and then choose Actions, Modify Volume.
- The Modify Volume window displays the volume ID and the volume's current configuration, including type, size, and IOPS. For this course, you need to increase the size, e.g. to 30 GiB.
- Connect to your instance via SSH and expand the Linux file system:
# First use the lsblk command to find the device name (the name displayed on your machine may differ from the tutorial's)
ubuntu@master:~$ lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
xvda 202:0 0 30G 0 disk
└─xvda1 202:1 0 8G 0 part /
# Expand the modified partition using growpart (and note the unusual syntax of separating the device name from the partition number)
ubuntu@master:~$ sudo growpart /dev/xvda 1
CHANGED: partition=1 start=16065 old: size=16755795 end=16771860 new: size=62894475,end=62910540
# A look at the lsblk output confirms that the partition /dev/xvda1 now fills the available space on the volume /dev/xvda
ubuntu@ip-172-31-23-27:~$ lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
xvda 202:0 0 30G 0 disk
└─xvda1 202:1 0 30G 0 part /
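Note that growpart enlarges the partition only; the file system inside it may still report the old size until it is grown as well. A minimal check follows, assuming an ext4 root file system on /dev/xvda1 as in the lsblk output above (both are assumptions; adjust to your own machine). The privileged command is left commented out on purpose.

```shell
# Show the size that the root file system currently reports
df -h /

# If the size is still the old one (e.g. about 8G), the file system has not been grown yet.
# Assuming an ext4 file system on /dev/xvda1 (check your own lsblk output), run:
#   sudo resize2fs /dev/xvda1
# and then check again with: df -h /
```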
Clone Instances from Image
The steps above (passwordless SSH, security group, and volume expansion) apply to a single instance. To save time, you can create an image of this instance and launch new instances from the image. Refer to the AWS documentation for detailed instructions.
Linux 101
Main reference: http://linuxcommand.org/learning_the_shell.php
Concepts to be delivered during tutorial:
- System, Process, Executable, Shell, Commands, Command-Line-Interface
- Filesystem, directory tree, Filesystem Hierarchy Standard
- Standard input/output, pipe
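To make the last concept concrete before the hands-on part, here is a small sketch of standard output being passed through pipes. The three fruit names are arbitrary sample data.

```shell
# printf writes three lines to its standard output;
# the pipe (|) connects that output to the standard input of sort.
printf 'pear\napple\nfig\n' | sort

# Pipes can be chained: sort the lines, then keep only the first one.
printf 'pear\napple\nfig\n' | sort | head -n 1

# wc -l counts the lines arriving on its standard input.
printf 'pear\napple\nfig\n' | wc -l
```

Each command in the chain is an ordinary program; the shell only wires the output of one to the input of the next.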
Navigation
ec2-user@ip-172-30-0-64:~$ pwd
/home/ec2-user
ec2-user@ip-172-30-0-64:~$ ls
ec2-user@ip-172-30-0-64:~$ cd ..
ec2-user@ip-172-30-0-64:/home$ ls
ec2-user ubuntu
ec2-user@ip-172-30-0-64:/home$ cd ..
ec2-user@ip-172-30-0-64:/$ ls
bin dev home lib lost+found mnt proc run selinux sys usr vmlinuz
boot etc initrd.img lib64 media opt root sbin srv tmp var
ec2-user@ip-172-30-0-64:/$ pwd
/
NOTE:
$ is the command prompt, after which you can type a command.
After entering a command, there will be some output.
Rule of thumb: read the output carefully, especially during the more complex operations later.
Linux command outputs are usually self-documenting.
Seeking help
ec2-user@ip-172-30-0-64:/$ ls --help
Usage: ls [OPTION]... [FILE]...
List information about the FILEs (the current directory by default).
Sort entries alphabetically if none of -cftuvSUX nor --sort is specified.
Mandatory arguments to long options are mandatory for short options too.
-a, --all do not ignore entries starting with .
...
NOTE:
... in the last line denotes omitted console output.
Find the complete version by entering the corresponding command.
ec2-user@ip-172-30-0-64:/$ man ls
LS(1) User Commands LS(1)
NAME
ls - list directory contents
SYNOPSIS
ls [OPTION]... [FILE]...
DESCRIPTION
List information about the FILEs (the current directory by default). Sort entries alphabetically if none of
-cftuvSUX nor --sort is specified.
...
--block-size=SIZE
scale sizes by SIZE before printing them. E.g., `--block-size=M' prints sizes in units of 1,048,576
bytes. See SIZE format below.
Manual page ls(1) line 1 (press h for help or q to quit)
TIP:
Use the arrow keys or j/k to move around the manual page.
Use / to search for a keyword.
Press q to quit.
What is the man command? Try man man.
Basic file/dir operations
Make directories:
ec2-user@ip-172-30-0-64:~$ mkdir mydir
ec2-user@ip-172-30-0-64:~$ ls
mydir
ec2-user@ip-172-30-0-64:~$ cd mydir/
ec2-user@ip-172-30-0-64:~/mydir$ pwd
/home/ec2-user/mydir
ec2-user@ip-172-30-0-64:~/mydir$ cd ..
ec2-user@ip-172-30-0-64:~$ ls
mydir
ec2-user@ip-172-30-0-64:~$ rmdir mydir
ec2-user@ip-172-30-0-64:~$ ls
ec2-user@ip-172-30-0-64:~$
Create and view text files:
ec2-user@ip-172-30-0-64:~$ echo "this is my first file" > myfile
ec2-user@ip-172-30-0-64:~$ ls
myfile
ec2-user@ip-172-30-0-64:~$ cat myfile
this is my first file
echo prints the string to STDOUT.
> redirects STDOUT to a file.
More on IO redirection.
What is cat then? Try man or --help.
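Besides >, the shell offers a few closely related redirection operators. The sketch below tries them in a scratch directory; the file names are arbitrary.

```shell
REDIR_DEMO="$(mktemp -d)"          # scratch directory for the experiment

echo "line one" > "$REDIR_DEMO/notes.txt"    # > creates or overwrites the file
echo "line two" >> "$REDIR_DEMO/notes.txt"   # >> appends to the file
cat "$REDIR_DEMO/notes.txt"

# < feeds a file into a command's standard input
wc -l < "$REDIR_DEMO/notes.txt"

# 2> redirects error messages (STDERR) rather than STDOUT
ls "$REDIR_DEMO/no-such-file" 2> "$REDIR_DEMO/errors.txt" || true
cat "$REDIR_DEMO/errors.txt"
```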
Move/Copy/Remove file:
ec2-user@ip-172-30-0-64:~$ ls
myfile
ec2-user@ip-172-30-0-64:~$ cat myfile
this is my first file
ec2-user@ip-172-30-0-64:~$ cp myfile myfile2
ec2-user@ip-172-30-0-64:~$ ls
myfile myfile2
ec2-user@ip-172-30-0-64:~$ cat myfile2
this is my first file
ec2-user@ip-172-30-0-64:~$ mv myfile myfile.moved
ec2-user@ip-172-30-0-64:~$ ls
myfile2 myfile.moved
ec2-user@ip-172-30-0-64:~$ rm myfile2
ec2-user@ip-172-30-0-64:~$ ls
myfile.moved
TIP:
Do more experiments like this.
Use ls (probably with options like -a, -l) to inspect a dir.
Use cat or less/more to inspect the content of a text file.
About filenames:
- Basically a flat string. There is no concept of an "extension name", though people often follow naming conventions.
- Files whose names start with . are "hidden". Use ls -a to see them.
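A quick way to see the effect of a leading dot, using two made-up file names in a scratch directory:

```shell
HIDDEN_DEMO="$(mktemp -d)"
touch "$HIDDEN_DEMO/visible.txt" "$HIDDEN_DEMO/.hidden"

ls "$HIDDEN_DEMO"       # lists only visible.txt
ls -a "$HIDDEN_DEMO"    # also lists . and .. and .hidden
```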
File transfer
Major methods:
- scp/sftp under Linux/Mac
- An open source and cross-platform SFTP client: FileZilla
- An SFTP client under Windows: WinSCP
Pick the one that suits you most.
Tasks:
- Create a text file on your desktop. Upload it to the server. Verify that it is the same as the file you created locally.
- Create a text file on your server. Download it to your desktop. Verify that it is the same as the file you created remotely.
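One possible way to do the "verify" part of these tasks is sketched below. Since the real transfer needs a live server, a plain cp stands in for it here; the scp line in the comment uses the same placeholder address as earlier and is not executed.

```shell
XFER_DEMO="$(mktemp -d)"
echo "hello from my desktop" > "$XFER_DEMO/local.txt"

# On a real setup the upload would be something like (placeholders, needs a live server):
#   scp local.txt ec2-user@public-ip-address-of-instance:~/
# Here a plain cp stands in for the transfer.
cp "$XFER_DEMO/local.txt" "$XFER_DEMO/remote-copy.txt"

# cmp exits with status 0 only when the two files are byte-for-byte identical
if cmp -s "$XFER_DEMO/local.txt" "$XFER_DEMO/remote-copy.txt"; then
    echo "files are identical"
else
    echo "files differ"
fi

# Checksums let you compare files that live on different machines:
# run the same command on both sides and compare the printed values.
cksum "$XFER_DEMO/local.txt"
```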
File download from the Internet
ec2-user@ip-172-30-0-64:~$ mkdir try-wget
ec2-user@ip-172-30-0-64:~$ cd try-wget/
ec2-user@ip-172-30-0-64:~/try-wget$ wget 'https://github.com/hupili/agile-ir/raw/master/data/Shakespeare.tar.gz'
--2014-01-14 03:02:09-- https://github.com/hupili/agile-ir/raw/master/data/Shakespeare.tar.gz
Resolving github.com (github.com)... 192.30.252.131
...
ec2-user@ip-172-30-0-64:~/try-wget$ ls
Shakespeare.tar.gz
Now you have downloaded Shakespeare's works, all in one compressed archive Shakespeare.tar.gz.
The following is a shortcut to uncompress it:
ec2-user@ip-172-30-0-64:~/try-wget$ tar -xzvf Shakespeare.tar.gz
data/
data/sonnet-59.txt
data/sonnet-139.txt
data/sonnet-88.txt
data/sonnet-123.txt
data/sonnet-137.txt
data/play-twogents.txt
...
data/sonnet-134.txt
data/sonnet-93.txt
data/sonnet-24.txt
data/sonnet-3.txt
data/play-juliuscaesar.txt
What's -xzvf? Try man or --help.
NOTE:
Some commands have shorthand notation for multiple options.
In the above example, tar -xzvf YOUR_FILE is equivalent to tar -x -z -v -f YOUR_FILE.
Try the latter one yourself.
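To experiment with the individual options without downloading anything, you can build a tiny archive of your own. This is a sketch with made-up file names in a scratch directory; it assumes tar and gzip are installed, as they are on typical Ubuntu systems.

```shell
TAR_DEMO="$(mktemp -d)"
mkdir "$TAR_DEMO/data"
echo "sample text" > "$TAR_DEMO/data/sample.txt"

# -c create, -z gzip, -f archive name; -C runs tar from inside the given directory
tar -c -z -f "$TAR_DEMO/sample.tar.gz" -C "$TAR_DEMO" data

# -t lists the archive contents without extracting
tar -t -z -f "$TAR_DEMO/sample.tar.gz"

# -x extracts; the separate options behave the same as the combined -xzvf
mkdir "$TAR_DEMO/out"
tar -x -z -v -f "$TAR_DEMO/sample.tar.gz" -C "$TAR_DEMO/out"
cat "$TAR_DEMO/out/data/sample.txt"
```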
EXERCISE:
Navigate the data dir and operate on those files, e.g. cp, mv.
EXERCISE:
Get familiar with tar, zip, gzip, bzip2.
You are very likely to get others' data in those formats.
EXERCISE:
Get familiar with wget options.
A simple crawler can be obtained by wget -r START_URL.
EXERCISE:
Try to use curl to download the same file.
Most Linux distributions have wget and/or curl by default.
Suppose you have finished processing data.
Clean up as follows:
ec2-user@ip-172-30-0-64:~/try-wget$ ls
data Shakespeare.tar.gz
ec2-user@ip-172-30-0-64:~/try-wget$ ls data/
play-12night.txt play-titus.txt sonnet-122.txt sonnet-152.txt sonnet-42.txt sonnet-72.txt
...
play-tempest.txt sonnet-120.txt sonnet-150.txt sonnet-40.txt sonnet-70.txt
play-timonathens.txt sonnet-121.txt sonnet-151.txt sonnet-41.txt sonnet-71.txt
ec2-user@ip-172-30-0-64:~/try-wget$ rm -rf data/
ec2-user@ip-172-30-0-64:~/try-wget$ ls
Shakespeare.tar.gz
rm -rf is a powerful command.
Use it with great care.
Execute an executable file
Write your first shell script
ec2-user@ip-172-30-0-64:~$ cat > hello.sh
echo "hello world. My first shell script!"
ec2-user@ip-172-30-0-64:~$ ls
hello.sh
ec2-user@ip-172-30-0-64:~$ cat hello.sh
echo "hello world. My first shell script!"
cat > hello.sh reads STDIN and redirects all the content to hello.sh.
The second line, echo "hello world. My first shell script!", is typed by you.
After that, press Ctrl+D to end typing.
EXERCISE: Try this way to create more files. This is the simplest way to write small text files without using a text-based editor.
Make it executable:
ec2-user@ip-172-30-0-64:~$ ls -l hello.sh
-rw-rw-r-- 1 ec2-user ec2-user 43 Jan 14 07:26 hello.sh
ec2-user@ip-172-30-0-64:~$ chmod a+x hello.sh
ec2-user@ip-172-30-0-64:~$ ls -l hello.sh
-rwxrwxr-x 1 ec2-user ec2-user 43 Jan 14 07:26 hello.sh
The x character indicates that the file is executable.
Read more.
Execute it:
ec2-user@ip-172-30-0-64:~$ ./hello.sh
hello world. My first shell script!
ec2-user@ip-172-30-0-64:~$ /home/ec2-user/hello.sh
hello world. My first shell script!
NOTE:
One often-overlooked piece of syntax: if the executable is in the current working directory, prefix it with ./.
Otherwise, the system will try to locate the command in PATH.
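The difference between a bare command name and a ./ path can be tried out as follows. This is a sketch in a scratch directory; hello-demo.sh is a made-up name chosen so that it does not clash with anything already in PATH.

```shell
PATH_DEMO="$(mktemp -d)"
printf '%s\n' '#!/bin/sh' 'echo "hello from demo script"' > "$PATH_DEMO/hello-demo.sh"
chmod a+x "$PATH_DEMO/hello-demo.sh"
cd "$PATH_DEMO"

# Without ./ the shell searches only the directories listed in PATH
hello-demo.sh 2>/dev/null || echo "hello-demo.sh: not found via PATH"

# With ./ the shell runs the file in the current directory
./hello-demo.sh

# Adding the directory to PATH makes the bare name work as well
PATH="$PATH_DEMO:$PATH"
hello-demo.sh
```

The PATH change lasts only for the current shell session.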
About shell commands (optional)
Most of the commands you use, e.g. ls and mkdir, are just pre-installed executables in the system (a few, such as cd, are built into the shell itself).
You can find their location and verify that they are executable:
ec2-user@ip-172-30-0-64:~$ which ls
/bin/ls
ec2-user@ip-172-30-0-64:~$ ls -l /bin/ls
-rwxr-xr-x 1 root root 105840 Nov 19 2012 /bin/ls
which itself is an executable file:
ec2-user@ip-172-30-0-64:~$ which which
/usr/bin/which
ec2-user@ip-172-30-0-64:~$ ls -l /usr/bin/which
lrwxrwxrwx 1 root root 10 Mar 29 2012 /usr/bin/which -> /bin/which
ec2-user@ip-172-30-0-64:~$ ls -l /bin/which
-rwxr-xr-x 1 root root 946 Mar 29 2012 /bin/which
Automate your work by shell
Create a script, download.sh, with the following content.
# Clean previously downloaded data
rm -f Shakespeare.tar.gz
rm -rf data/
# Download
wget 'https://github.com/hupili/agile-ir/raw/master/data/Shakespeare.tar.gz'
# Uncompress
tar -xzvf Shakespeare.tar.gz
# list files
ls data/
TIP:
No need to type it in.
Use cat > download.sh and copy-paste the content into your terminal.
The paste operation differs across terminals.
Content after # is a comment.
Now execute the script:
ec2-user@ip-172-30-0-64:~$ chmod a+x download.sh
ec2-user@ip-172-30-0-64:~$ ./download.sh
...
The result is the same as when you type those commands into the shell one by one. By writing scripts, you can automate tedious daily jobs. You will see some of this in the later part of this course.
EXERCISE: Shell scripts also support common programming constructs, e.g. conditions, loops, etc. Try to self-learn them from the Internet. Google "bash script" or something similar.
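As a starting point for this exercise, here is a small sketch that combines a loop, a pattern match, and a condition. The three empty files are made up for the demonstration and only imitate the naming style of the Shakespeare archive.

```shell
LOOP_DEMO="$(mktemp -d)"
touch "$LOOP_DEMO/play-hamlet.txt" "$LOOP_DEMO/sonnet-1.txt" "$LOOP_DEMO/sonnet-2.txt"

sonnets=0
# Loop over every .txt file in the scratch directory
for f in "$LOOP_DEMO"/*.txt; do
    name="$(basename "$f")"
    # case matches the name against shell patterns
    case "$name" in
        sonnet-*) sonnets=$((sonnets + 1)); echo "$name is a sonnet" ;;
        *)        echo "$name is something else" ;;
    esac
done

# if runs a test command and branches on its exit status
if [ "$sonnets" -gt 1 ]; then
    echo "found $sonnets sonnets"
fi
```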
Linux 102 (optional)
These topics are optional in the first tutorial due to the time limit, but we will encounter them in the following tutorials. Just-in-time instructions will be given, but it is strongly recommended that you warm up at your earliest convenience.
Text editor -- VIM and others
With Linux 101, you can at least operate in the following way:
- Write code on your desktop locally with your favourite GUI editor.
- Upload code and data to the Linux server.
- Execute.
- Download the results and analyze them.
This upload/download cycle causes considerable overhead when you need to modify code or configuration files frequently.
VIM is a powerful text editor. There are many tutorials and guides online.
Emacs is also a widely available and highly customizable text editor. It's interesting to learn some Emacs basic operations and concepts.
Sometimes nano will be launched for you to enter certain information.
Text editors are just tools. Pick one that is most convenient to you.
Package management
Cheatsheet for Ubuntu:
sudo apt-get install PACKAGE
sudo apt-get purge PACKAGE
sudo apt-file search FILE_NAME
Ubuntu helpfully prompts you to install a missing package, e.g. when installing Git:
ec2-user@ip-172-30-0-64:~$ git
The program 'git' is currently not installed. You can install it by typing:
sudo apt-get install git
ec2-user@ip-172-30-0-64:~$ sudo apt-get install git
Reading package lists... Done
Building dependency tree
...
After this operation, 15.2 MB of additional disk space will be used.
Do you want to continue [Y/n]? y
Get:1 http://azure.archive.ubuntu.com/ubuntu/ precise/main liberror-perl all 0.17-1 [23.8 kB]
...
Setting up git-man (1:1.7.9.5-1) ...
Setting up git (1:1.7.9.5-1) ...
ec2-user@ip-172-30-0-64:~$ git
usage: git [--version] [--exec-path[=<path>]] [--html-path] [--man-path] [--info-path]
[-p|--paginate|--no-pager] [--no-replace-objects] [--bare]
...
rm Remove files from the working tree and from the index
show Show various types of objects
status Show the working tree status
tag Create, list, delete or verify a tag object signed with GPG
See 'git help <command>' for more information on a specific command.
dotfiles
Use ls -al and you will see many dotfiles (e.g. .bashrc).
.XXXrc is the convention for customized configurations.
You can build the best working environment for yourself via those configuration files, e.g. change colors, add command aliases, etc.
Try to search the Internet and do some customization. Many people put their own configs online; you can get some pointers from those repos.
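A minimal sketch of what such a customization might look like. The alias and the editor choice are arbitrary examples, and the lines are written to a scratch file standing in for ~/.bashrc so that your real configuration is left untouched.

```shell
DEMO_RC="$(mktemp)"   # scratch file standing in for ~/.bashrc

cat > "$DEMO_RC" <<'EOF'
# shorter name for a detailed listing
alias ll='ls -al'
# preferred editor for programs that honour EDITOR
export EDITOR=vim
EOF

# "." (source) reads the file into the current shell,
# much as an interactive bash reads ~/.bashrc when it starts
. "$DEMO_RC"
echo "EDITOR is $EDITOR"
```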
parallel setup
You will manage a list of machines in this course. It would be cumbersome to ssh to every machine and execute the same setup command. One way is to use parallel-ssh for the parallel configuration.
sudo apt-get install pssh
parallel-ssh -i -h hosts.txt echo "hello, world"
#Run a long command without timing out:
parallel-ssh -i -h hosts.txt -t 0 sleep 10000
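The file passed with -h lists the target machines, one per line, in the form [user@]host[:port]. A sketch of such a file follows; the host names master, slave1, and slave2 are placeholders. The loop at the end only prints the equivalent sequential commands rather than running them.

```shell
PSSH_DEMO="$(mktemp -d)"

# One machine per line, in the form [user@]host[:port]; these names are placeholders
cat > "$PSSH_DEMO/hosts.txt" <<'EOF'
hduser@master
hduser@slave1
hduser@slave2
EOF

# The same list can drive a plain sequential loop;
# echo prints each command instead of running it
while read -r host; do
    echo ssh "$host" hostname
done < "$PSSH_DEMO/hosts.txt"
```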
A more elegant way is to use Ansible. Ansible gives teams the power to scale IT automation, manage complex deployments and speed productivity.
Outcome of This Tutorial
- You have a feel for IaaS via AWS.
- You can operate a Linux server using the shell.
- You can upload/download code/data to/from the remote Linux server.
- You have a basic idea of shell scripts and their use for automation.
- You feel comfortable reading CLI examples.
Further Pointers
- The MOOC Startup Engineering has useful material for getting into the Linux world. See Week 2: Linux, Command Line, SSJS, Emacs, Git, Dotfiles. (Robin Lee)