Getting Started: Docker

Background

Docker is the new buzzword in the industry. This is the first in a series of posts documenting my learning about Docker. It will be a living post for quite some time as I explore more and update it here. So, let's get started.

What is Virtualization?

Virtualization, in computing terms, is the process of emulating something that does not exist physically. Virtualization came into existence in order to increase the utilization of a (computer) system's resources. In general, the resources of a system refer to the following.

  • Memory/RAM
  • CPU Cycles/Time
  • Storage/Disk Space
  • Network Bandwidth

Most of the time, not all of the above-mentioned resources are utilized equally by the programs executing in the system. There is always a good chance that some of the resources are under-utilized or simply wasted. Hence, virtualization came into existence to improve the utilization percentage and make more effective use of the resources of any given system. Virtualization can be categorised into the following types.

  • Hardware Virtualization - Virtualizes hardware such as the CPU, RAM, hard disk, network card, graphics card, etc., allowing users to install multiple systems and share the hardware resources among them.
    • Virtual Machine Eg., VMware, VirtualBox, Hyper-V
    • Hypervisor Eg., ESXi
  • Software Virtualization
    • Operating System Virtualization - Virtualizes the kernel of the operating system so that it can be shared among multiple user spaces. A typical operating system contains one kernel and one user space (ie., where all other user applications reside).
    • Application Virtualization - Virtualizes and abstracts the underlying operating system. Eg., Sandboxie
    • Service Virtualization - Mocking or creating virtual services that act as test doubles for real ones. Eg., LISA, Mountebank
  • Memory Virtualization - Aggregates memory from a distributed cluster or virtualizes memory using disk space. Eg., Virtual Memory
  • Database Virtualization - Aggregating data storage and management in a distributed cluster
  • Network Virtualization - Virtualizes network addressing spaces Eg., VLAN, VPN

In this post we are more interested in hardware and software virtualization, especially application and operating-system-level virtualization, as that is where Docker fits in.

What is Docker?

Docker is a software-based virtualization engine for Linux/Unix-based operating systems that virtualizes the application and operating system at the filesystem level, so that any application targeting Docker can bring its own libraries, dependencies and environment along with it. Docker achieves application virtualization using cgroups (Control Groups) and kernel namespaces, and filesystem virtualization using UnionFS (union filesystem, eg., AUFS). Cgroups is a kernel feature used to manage and limit system resources for one or more processes running in the system. UnionFS is a type of filesystem which allows two or more filesystem structures (directories and files) to be superimposed and treated as one single image. Each of these filesystem structures is called a "layer" in the Docker world. This layering architecture not only allows a single layer to be shared between multiple Docker images, it also allows applications to bring their own layers containing libraries, dependencies and environment along with them. This in turn keeps Docker images much smaller in size, makes them portable across multiple platforms/operating systems, and enables incremental development without polluting the existing development environment.
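A quick way to see these layers is the docker history command, which lists the layers an image is made of. A minimal sketch, assuming an image such as centos:latest has already been pulled (pulling images is covered later in this post):

$ docker history centos:latest

Each row in the output corresponds to one layer of the image's union filesystem.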

Docker vs VMware/VirtualBox/Hyper-V?

Traditional virtualization systems like VMware/VirtualBox/Hyper-V do hardware virtualization, ie., they emulate the physical processor (CPU), memory (RAM) and disk (hard disk) for multiple operating systems so that each of them thinks it has access to the real resources. An operating system running within such a system cannot distinguish whether it is running on real hardware or virtual hardware. In contrast, Docker takes a different approach: it shares a single kernel among all containers while giving each its own user space. By sharing a single kernel, it avoids the hassle of installing and managing a separate operating system (ie., kernel) for each application or application instance.
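A simple way to convince yourself that containers share the host's kernel is to compare the kernel version reported on the host with the one reported inside a container. A rough sketch, assuming a Linux host with the centos:latest image available locally:

$ uname -r
$ docker run centos:latest uname -r

Both commands report the same kernel version, because the container is just a set of isolated processes running on the host's kernel.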

Installing Docker

Docker can now be installed on both Windows and Linux/Unix operating systems, though it was initially supported only on Linux/Unix. Although it appears to run natively on Windows through a PowerShell interface, under the hood Docker spawns a Linux virtual machine (VM), boot2docker (a lightweight Linux distribution based on Tiny Core Linux made specifically to run Docker containers), using VirtualBox or Hyper-V to run its core service. Hence, I personally recommend installing Docker on a Linux/Unix-based real or virtual machine to avoid confusion when creating complex network topologies. Installing Docker is pretty straightforward: just head to https://www.docker.com/products/docker and download the installer package/executable for your operating system.
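Once installed, a quick sanity check is to print the client/daemon versions and run the tiny hello-world image that Docker publishes for exactly this purpose:

$ docker version
$ docker run hello-world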

Docker Images and Containers

At the onset, these two words may make your head spin. They did for me too. In most places it seems like these words are used interchangeably; maybe not, once you've understood the essence of them. To put it simply, if images are the concrete entities, containers are their instances: images in action or execution. To make it clearer, a single image can be used to create multiple containers, each in a different state of existence. To be precise, a container can be started, paused or stopped, but an image can't be. All set! Let's get into the command-line interface, the various commands, their usage and purposes.
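As a quick illustration, the sketch below creates two containers from the same centos:latest image and puts them in different states (the names web1 and web2 and the sleep command are purely illustrative; the -d switch runs a container in the background, and --name is explained with the run command later):

$ docker run -d --name web1 centos:latest sleep 1000
$ docker run -d --name web2 centos:latest sleep 1000
$ docker pause web1
$ docker stop web2
$ docker ps -a

Both containers came from the one image, yet one is now paused and the other stopped; the image itself never changes state.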

Commands

Downloading Images and Tags: The pull command pulls (downloads) images from Docker Hub, which is hosted at hub.docker.com.

$ docker pull <image-name>:<tag>
$ docker pull centos:latest

Image names may also be in the format of <namespace>/<image>. A tag may be used to differentiate between different versions or flavors of a specific image.

Listing Images: The images command lists the images present in the local system (local Docker repository).

$ docker images
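The images command also accepts an optional repository name (and tag) to narrow the listing down, which becomes handy once the local repository grows. For example, to list only the centos images pulled earlier:

$ docker images centos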

Deleting Images: The rmi (remove image) command removes the image specified by its image ID from the local system.

$ docker rmi <image-id>
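An image can also be removed by its <name>:<tag> instead of its ID, and the -f (force) switch forces the removal, for example when the image is still referenced by stopped containers:

$ docker rmi centos:latest
$ docker rmi -f <image-id>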

Creating/Running Image Containers: The run command executes an image as a container. As said earlier, a container is an image in execution, living in the memory of the system. Below is the basic format of a vanilla run command. The <command> part in the construct below is the name of the process to execute after starting the container, which also serves as an entry point into the container. For an operating system image (Eg., CentOS, Linux Mint) this is the shell program (sh or bash) present in the system. Note that a random name is assigned to the container when it is created, unless the --name <container-name> switch (placed before the image name) is used to give it a user-defined name.

$ docker run <image-name> <command>
$ docker run centos:latest /bin/bash
$ docker run --name mylinux centos:latest /bin/bash
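Two other run switches worth knowing right away are -i/-t (attach an interactive terminal) and -d (detach and run in the background). A shell like /bin/bash exits immediately unless it gets an interactive terminal, so the typical forms look roughly like this:

$ docker run -it centos:latest /bin/bash
$ docker run -d <image-name> <command>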

Listing Image Containers: The ps (process status) command lists the running/alive containers in the system. Use the --all (or -a) switch to list all containers, including the ones that have been stopped or have died.

$ docker ps
$ docker ps -a
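Adding the -q (quiet) switch prints only the container IDs, which is handy for feeding the list into other commands:

$ docker ps -a -q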

Deleting Image Containers: The rm (remove) command deletes a container present in the system.

$ docker rm <container-id>
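Combining rm with the quiet listing shown above gives the usual one-liner for cleaning up every container at once (running containers are skipped with an error unless they are stopped first):

$ docker rm $(docker ps -a -q)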

Network Configuration

One of the most common requirements that arises once an image has been downloaded and executed is to provide network access to the running container. This is usually done using command-line switches and configuration while creating a container. In order to expose a TCP/UDP port of the container to the external world, a port mapping is done between Docker's network interface (internal) and the host's network interface (external). The internal/external references to the network interfaces are relative to Docker and may not necessarily correspond to private/public IP addresses.

$ docker run -p <host-network-ip>:<host-port>:<container-port> <image-name> <command>
$ docker run -p 192.168.1.123:8080:80 <image-name> <command>
$ docker run -p 192.168.1.123:8080:80/udp <image-name> <command>
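Docker also accepts a shorter mapping that omits the host IP, in which case the port is bound on all of the host's network interfaces:

$ docker run -p 8080:80 <image-name> <command>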

The 192.168.1.123:8080:80 mapping above exposes port 80 of the container via the host's network interface with IP 192.168.1.123 at port 8080. Hence, all traffic to IP address 192.168.1.123 on port 8080 will be redirected to port 80 inside the container. The host may have more than one network interface and may also have a public IP address if it is an internet-facing system.

Filesystem Volumes

Volumes are used to mount a directory/file from the host filesystem into the container. A volume is a persistent storage layer which can be shared across multiple containers and can live through container deletion. Volumes are created/attached using the -v switch while creating a container from its image.

$ docker run -v <host-path>:<container-path> <image-name> <command>
$ docker run -v /usr/home/foo:/mnt/foo centos:latest /bin/bash
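A quick way to see a volume in action is to create a file under the host path and read it back from inside a container that mounts the same path (the paths and file name here are purely illustrative):

$ echo "hello from the host" > /usr/home/foo/hello.txt
$ docker run -v /usr/home/foo:/mnt/foo centos:latest cat /mnt/foo/hello.txt

Anything written to /mnt/foo from inside the container lands in /usr/home/foo on the host and survives even after the container is removed.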
