Tutorial Logistics
- Static Sites
- Build Hadoop with Docker
- Run jobs
Static Sites
The image that we are going to use is a single-page website that I've already created for the purpose of this demo and hosted on the registry - prakhar1989/static-site. We can download and run the image directly in one go using docker run.
$ docker run prakhar1989/static-site
Digest: sha256:48ca6254c16b81a7960ad874e901f027fbcaac66509cfd79ce8f3f6da424d3b1
Since the image doesn't exist locally, the client first fetches the image from the registry and then runs it. If all goes well, you should see an "Nginx is running..." message in your terminal. Now that the server is running, how do we see the website? What port is it running on? And more importantly, how do we access the container directly from our host machine?
Well in this case, the client is not exposing any ports so we need to re-run the docker run command to publish ports. While we're at it, we should also find a way so that our terminal is not attached to the running container. This way, you can happily close your terminal and keep the container running. This is called detached mode.
$ docker run -d -P --name static-site prakhar1989/static-site
In the above command, -d detaches our terminal, -P publishes all exposed ports to random host ports, and --name assigns a name to the container. Now we can see the ports by running the docker port [CONTAINER] command:
$ docker port static-site
443/tcp -> 0.0.0.0:32768
80/tcp -> 0.0.0.0:32769
You can open http://localhost:32769 in your browser.
Note: If you're using docker-toolbox, then you might need to use docker-machine ip default to get the IP.
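Because -P picks a different host port on each run, it can be handy to look the port up from a script instead of reading it by eye. The following is a minimal sketch that assumes output in the format shown above; host_port_for is a hypothetical helper name, not part of Docker.

```shell
# Hypothetical helper: print the host port mapped to a container port.
# Expects `docker port <container>` output on stdin, for example:
#   docker port static-site | host_port_for 80
host_port_for() {
  awk -v want="$1/tcp" '
    $1 == want {
      n = split($3, parts, ":")   # "0.0.0.0:32769": the last piece is the port
      print parts[n]
      exit
    }'
}

# Demonstration using the sample output from this tutorial.
printf '%s\n' '443/tcp -> 0.0.0.0:32768' '80/tcp -> 0.0.0.0:32769' | host_port_for 80
```

With the sample output, the demonstration prints 32769, which is the port used in the URL above.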
You can also specify a custom port to which the client will forward connections to the container.
$ docker run -p 8888:80 prakhar1989/static-site
Nginx is running...
To stop a detached container, run docker stop followed by the container ID or name (static-site in the example above).
I'm sure you agree that was super simple. To deploy this on a real server you would just need to install Docker, and run the above Docker command. Now that you've seen how to run a webserver inside a Docker image, you must be wondering - how do I create my own Docker image? This is the question we'll be exploring in the next section.
Build Hadoop with Docker
We've looked at images before, but in this section we'll dive deeper into what Docker images are and build our own image!
Docker images are the basis of containers. To see the list of images that are available locally, use the docker images command.
# docker images
REPOSITORY                TAG             IMAGE ID       CREATED         SIZE
<none>                    <none>          aef319ff5fb2   2 days ago      643MB
ubuntu                    spark           7f1e3360a281   2 days ago      2.26GB
ubuntu                    hadoop          507c6b330971   2 days ago      933MB
ubuntu                    16.04           7e87e2b3bf7a   4 days ago      117MB
mesosphere/spark          latest          5c25c7985707   14 months ago   1.42GB
java                      openjdk-8-jdk   d23bdf5b1b1b   2 years ago     643MB
prakhar1989/static-site   latest          f01030e1dcf3   3 years ago     134MB
The above gives a list of images that I've pulled from the registry, along with ones that I've created myself (we'll shortly see how). The TAG refers to a particular snapshot of the image and the IMAGE ID is the corresponding unique identifier for that image.
For simplicity, you can think of an image as akin to a git repository: images can be committed with changes and have multiple versions. If you don't provide a specific version number, the client defaults to latest. For example, you can pull a specific version of the ubuntu image:
# docker pull ubuntu:16.04
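To make the default-tag rule concrete, here is a small sketch that prints the reference the client would use when the tag is left out. The function with_default_tag is a hypothetical helper for illustration only; it does not call Docker.

```shell
# Hypothetical helper: show the image reference with the default tag filled in.
# Only the last path component is checked, so a registry port such as
# "localhost:5000/ubuntu" is not mistaken for a tag.
with_default_tag() {
  ref=$1
  last=${ref##*/}
  case $last in
    *:*) printf '%s\n' "$ref" ;;          # tag (or digest) already present
    *)   printf '%s:latest\n' "$ref" ;;   # no tag given: "latest" is assumed
  esac
}

with_default_tag ubuntu
with_default_tag ubuntu:16.04
with_default_tag prakhar1989/static-site
```

The three calls print ubuntu:latest, ubuntu:16.04, and prakhar1989/static-site:latest.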
To get a new Docker image you can either get it from a registry (such as the Docker Hub) or create your own. There are tens of thousands of images available on Docker Hub. You can also search for images directly from the command line using docker search.
After getting a Docker image, you can run it as follows.
# docker run -ti ubuntu:16.04
root@adb26705d7f1:/#
Install Java
root@adb26705d7f1:/# apt update
root@adb26705d7f1:/# apt install software-properties-common python-software-properties
root@adb26705d7f1:/# add-apt-repository ppa:webupd8team/java
root@adb26705d7f1:/# apt update
root@adb26705d7f1:/# apt install oracle-java8-installer
root@adb26705d7f1:/# java -version
After installing Java in the ubuntu:16.04 container, we need to install the following tools in the same container.
root@adb26705d7f1:/# apt update
root@adb26705d7f1:/# apt install wget
root@adb26705d7f1:/# apt install vim
root@adb26705d7f1:/# apt install net-tools # ifconfig
root@adb26705d7f1:/# apt install iputils-ping # ping
To set up the cluster environment, we need SSH and rsync.
root@adb26705d7f1:/# apt install ssh
root@adb26705d7f1:/# apt install rsync
Set up passphraseless SSH. Inside a Docker container, setting up SSH is a little different. First, create the following directory and start the SSH daemon:
root@adb26705d7f1:/# mkdir /var/run/sshd
root@adb26705d7f1:/# /usr/sbin/sshd
This is necessary because the SSH server does not start automatically when the container runs, so we have to start it manually. Then, to set up passphraseless SSH, we use the following commands.
root@adb26705d7f1:/# cd ~/
root@adb26705d7f1:/# ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
root@adb26705d7f1:/# cd .ssh
root@adb26705d7f1:/# cat id_rsa.pub >> authorized_keys
Test ssh and exit:
root@adb26705d7f1:/# ssh localhost
root@adb26705d7f1:/# exit
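If ssh localhost still asks for a password, one possible cause is file permissions: with its default StrictModes setting, sshd ignores an authorized_keys file that other users can write to. The sketch below tightens the permissions; secure_ssh_dir is a hypothetical helper name, and whether this is needed depends on the umask in your container.

```shell
# Hypothetical helper: restrict an SSH directory to its owner.
# Usage inside the container:
#   secure_ssh_dir ~/.ssh
secure_ssh_dir() {
  dir=$1
  mkdir -p "$dir"
  touch "$dir/authorized_keys"      # create the file if it is missing
  chmod 700 "$dir"                  # only the owner may enter the directory
  chmod 600 "$dir/authorized_keys"  # only the owner may read or write the keys
}
```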
Install Hadoop:
root@adb26705d7f1:/# wget https://www-us.apache.org/dist/hadoop/common/hadoop-2.9.2/hadoop-2.9.2.tar.gz
root@adb26705d7f1:/# tar -zxvf hadoop-2.9.2.tar.gz
Set environment variables:
root@adb26705d7f1:/# vim ~/.bashrc
export HADOOP_HOME=/hadoop-2.9.2
export JAVA_HOME=/usr/lib/jvm/java-8-oracle
export PATH=$HADOOP_HOME/bin:$PATH:$JAVA_HOME/bin
root@adb26705d7f1:/# source ~/.bashrc
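Editing ~/.bashrc in vim is interactive, which gets in the way if the setup is ever scripted. As a rough sketch, the same lines can be appended from the shell, skipping any line that is already present. append_once is a hypothetical helper name; the exported values are the ones used above.

```shell
# Hypothetical helper: append a line to a file unless it is already there,
# so the setup can be re-run without duplicating entries.
append_once() {
  file=$1
  line=$2
  touch "$file"
  grep -qxF -- "$line" "$file" || printf '%s\n' "$line" >> "$file"
}

# Usage inside the container (same values as above):
#   append_once ~/.bashrc 'export HADOOP_HOME=/hadoop-2.9.2'
#   append_once ~/.bashrc 'export JAVA_HOME=/usr/lib/jvm/java-8-oracle'
#   append_once ~/.bashrc 'export PATH=$HADOOP_HOME/bin:$PATH:$JAVA_HOME/bin'
```

The single quotes in the usage lines keep $HADOOP_HOME and $PATH unexpanded, so they are written to the file literally and evaluated when ~/.bashrc is sourced.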
Configure the Hadoop environment by setting JAVA_HOME in hadoop-env.sh:
root@adb26705d7f1:/# cd hadoop-2.9.2
root@adb26705d7f1:/# vim etc/hadoop/hadoop-env.sh
export JAVA_HOME=/usr/lib/jvm/java-8-oracle
Configure core-site.xml:
root@adb26705d7f1:/# vim etc/hadoop/core-site.xml
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://localhost:9000</value>
</property>
</configuration>
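The same file can also be written without opening an editor. The following is a minimal sketch using a here-document; write_core_site is a hypothetical helper name, and note that it replaces the whole file rather than editing it in place.

```shell
# Hypothetical helper: write a minimal core-site.xml to the given path.
# The second argument is optional and defaults to the URI used above.
write_core_site() {
  path=$1
  uri=${2:-hdfs://localhost:9000}
  cat > "$path" <<EOF
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>$uri</value>
  </property>
</configuration>
EOF
}

# Usage from the Hadoop directory:
#   write_core_site etc/hadoop/core-site.xml
```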
Configure hdfs-site.xml:
root@adb26705d7f1:/# vim etc/hadoop/hdfs-site.xml
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
</configuration>
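Before formatting the filesystem, it can be worth checking that the edits were saved as intended. The sketch below reads a property value back from a config file; get_hadoop_property is a hypothetical helper, and it assumes each name and value element sits on its own line, as in the files above.

```shell
# Hypothetical helper: print the value of a property in a Hadoop XML config.
# Assumes <name> and <value> each sit on their own line, as in this tutorial.
# Usage from the Hadoop directory:
#   get_hadoop_property etc/hadoop/hdfs-site.xml dfs.replication
get_hadoop_property() {
  awk -v name="$2" '
    /<name>/ {
      gsub(/.*<name>|<\/name>.*/, "")    # keep only the text between the tags
      current = $0
      next
    }
    /<value>/ {
      gsub(/.*<value>|<\/value>.*/, "")
      if (current == name) { print; exit }
    }
  ' "$1"
}
```

If the property is not set in the file, the function prints nothing.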
Format the filesystem:
root@adb26705d7f1:/# bin/hdfs namenode -format
Start hdfs:
root@adb26705d7f1:/# ./sbin/start-dfs.sh
Check the dfshealth page. In this example the container's hostname is adb26705d7f1 and its IP address is 172.17.0.4, so open http://172.17.0.4:50070 in a browser.
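The IP address differs from container to container. As a sketch, it can be looked up from the host with docker inspect and turned into the URL to open; namenode_url is a hypothetical helper name, and the container ID in the usage line is the one from this example.

```shell
# Hypothetical helper: print the NameNode web UI URL for a container.
# Needs a running Docker daemon; 50070 is the port used in this tutorial.
namenode_url() {
  ip=$(docker inspect -f '{{range .NetworkSettings.Networks}}{{.IPAddress}}{{end}}' "$1") || return 1
  printf 'http://%s:50070\n' "$ip"
}

# Usage on the host:
#   namenode_url adb26705d7f1
```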
After setting up the whole environment in the container, we can commit it as a new image and use that image as the base for other images.
root@adb26705d7f1:/hadoop-2.9.2# exit
root@wutong:~# docker commit -m "hadoop install" adb26705d7f1 ubuntu:hadoop0127
root@wutong:~# docker images
REPOSITORY                TAG             IMAGE ID       CREATED          SIZE
ubuntu                    hadoop0127      41b876015594   29 seconds ago   2.17GB
<none>                    <none>          aef319ff5fb2   2 days ago       643MB
ubuntu                    spark           7f1e3360a281   2 days ago       2.26GB
ubuntu                    hadoop          507c6b330971   2 days ago       933MB
ubuntu                    16.04           7e87e2b3bf7a   4 days ago       117MB
mesosphere/spark          latest          5c25c7985707   14 months ago    1.42GB
java                      openjdk-8-jdk   d23bdf5b1b1b   2 years ago      643MB
prakhar1989/static-site   latest          f01030e1dcf3   3 years ago      134MB
Run jobs
Start a container from the saved image:
root@wutong:~# docker run -it ubuntu:hadoop0127 /bin/bash
root@1676378bda6f:/#
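Make the HDFS directories required to execute MapReduce jobs. Run the command from the Hadoop directory (/hadoop-2.9.2), after starting sshd and HDFS again as shown earlier, since running processes do not carry over into a new container: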
bin/hdfs dfs -mkdir /user
Generate input data with TeraGen (100000 rows, written to terasort/input in HDFS):
# bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.9.2.jar teragen 100000 terasort/input
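The first TeraGen argument is a number of rows, not a size in bytes. Each row is 100 bytes, so the two convert directly. A small sketch of the arithmetic, with hypothetical helper names:

```shell
# TeraGen rows are 100 bytes each, so size and row count convert directly.
teragen_bytes() { echo $(( $1 * 100 )); }   # rows  -> bytes
teragen_rows()  { echo $(( $1 / 100 )); }   # bytes -> rows (rounded down)

teragen_bytes 100000        # size of the dataset generated above
teragen_rows 1000000000     # rows needed for about 1 GB
```

The command above therefore generates 10,000,000 bytes (about 10 MB), and a dataset of about 1 GB would need 10,000,000 rows.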
Supplements
List, run, and delete Docker images:
(Linux) # docker images
REPOSITORY    TAG      IMAGE ID       CREATED             SIZE
ubuntu        hadoop   c4947022e03e   About an hour ago   2.18GB
ubuntu        java     71a751b9551c   3 hours ago         933MB
ubuntu        16.04    7e87e2b3bf7a   12 hours ago        117MB
hello-world   latest   fce289e99eb9   3 weeks ago         1.84kB
busybox       latest   3a093384ac30   3 weeks ago         1.2MB
(Linux) # docker run -it ubuntu:java /bin/bash
(Linux) # docker rmi fce289e99eb9
List, stop, and start Docker containers:
(Linux) $ sudo su -
(Linux) # docker ps -a
CONTAINER ID   IMAGE          COMMAND       CREATED             STATUS                      PORTS   NAMES
996b7da00089   ubuntu:java    "/bin/bash"   11 minutes ago      Up 11 minutes                       thirsty_bose
448e80221a66   ubuntu:16.04   "/bin/bash"   About an hour ago   Up About an hour                    inspiring_chatelet
75b6b0563555   ubuntu:java    "/bin/bash"   3 hours ago         Exited (0) 14 minutes ago           elastic_elbakyan
860d21c19e1b   ubuntu:java    "/bin/bash"   3 hours ago         Exited (0) 2 hours ago              suspicious_panini
aa9d3c99c3f3   ubuntu:16.04   "/bin/bash"   3 hours ago         Exited (127) 3 hours ago            condescending_rosalind
f2c276cb124a   ubuntu:16.04   "/bin/bash"   5 hours ago         Exited (0) 3 hours ago              admiring_yalow
(Linux) # docker stop $CONTAINER_ID
(Linux) # docker start $CONTAINER_ID
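The commands above expect CONTAINER_ID to be set already. One way to fill it in, sketched here with a hypothetical helper name, is to look the ID up by container name. The name filter matches on part of a name, so the sketch keeps only the first result.

```shell
# Hypothetical helper: print the ID of a container whose name matches.
# Needs a running Docker daemon. The filter can match several containers,
# so only the first ID is kept.
container_id_by_name() {
  docker ps -aq --filter "name=$1" | head -n 1
}

# Usage:
#   CONTAINER_ID=$(container_id_by_name thirsty_bose)
#   docker stop "$CONTAINER_ID"
```

docker stop and docker start also accept the container name directly, so the lookup is only needed when a script works with IDs.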
References
- Docker installation: https://www.docker.com/get-started
- Previous Tutorial: http://mobitec.ie.cuhk.edu.hk/ierg4330Spring2017/tutorial/tutorial3/tutorial3.html