Tutorial¶
This document goes through the steps needed to configure a Docker Swarm environment and run a toy eHive pipeline.
Tip
In this document, worker (in lower case) refers to a compute node of the Docker Swarm cluster, whereas Worker (capitalized) refers to the eHive process that runs Jobs.
Set up a swarm¶
Tip
To set up Docker Swarm in the cloud, follow these instructions
- For Amazon Web Services: https://docs.docker.com/docker-cloud/cloud-swarm/create-cloud-swarm-aws/
- For Microsoft Azure: https://docs.docker.com/docker-cloud/cloud-swarm/create-cloud-swarm-azure/
Caution
Make sure you don’t have any Docker Swarm left over from previous attempts, or from sessions when the dynamically allocated IPs were different.
On the worker nodes run:
docker swarm leave
On the manager node run:
docker swarm leave --force
Attention
On Linux, one of the default network layers needed for the swarm to work (the ingress network) overlaps with the WTGC wireless network (IP range 10.255.0.0/16 vs 10.255.0.0/21). You will have to either disconnect from WTGC or re-configure the ingress network following the instructions found at https://docs.docker.com/engine/swarm/networking/#customize-the-ingress-network
Important
These instructions have been fully tested on a swarm composed of Ubuntu 14.04 machines. macOS machines can be added to the swarm, but with a few restrictions due to the way the network is configured on that OS. For instance, macOS machines cannot be swarm managers, or host the blackboard.
Pick a master node for your swarm and make sure its Docker engine is listening on a TCP interface. Under Ubuntu, this is configured in /etc/default/docker, by adding this to DOCKER_OPTS:
-H tcp://0.0.0.0 -H unix:///var/run/docker.sock
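Put together, the resulting line in /etc/default/docker would look like the following sketch (keep any options you may already have in DOCKER_OPTS):

```shell
# /etc/default/docker (Ubuntu): make the engine listen on TCP on all
# interfaces, while keeping the standard UNIX socket for the docker CLI.
DOCKER_OPTS="-H tcp://0.0.0.0 -H unix:///var/run/docker.sock"
```

Restart the Docker daemon (e.g. sudo service docker restart) for the change to take effect.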
The first -H enables the TCP server on all network interfaces; the second -H keeps the engine listening on its standard UNIX socket, which is needed by the docker CLI.
Note
-H tcp:// alone makes the TCP server listen on the local interface only, which would make it invisible from the other nodes.
Initialise a fresh swarm on the master:
docker swarm init
Copy the join command printed by docker swarm init and run it on each worker node.
Tip
If you lose the command, just run this to get it back:
docker swarm join-token worker
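The join command has this general shape (a sketch: the token and the manager address below are placeholders; use the exact values printed by docker swarm init or docker swarm join-token worker):

```shell
# Run on each worker node. SWMTKN-1-<token> and 123.45.67.89:2377 are
# placeholders for the values printed on your manager node.
docker swarm join --token SWMTKN-1-<token> 123.45.67.89:2377
```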
Take note of the last part of that connection command (the manager’s address), change the port number from 2377 to 2375, and use this value to set DOCKER_MASTER_ADDR in all the terminals in which you will run beekeeper.pl or the debug script dev/docker_jobs.pl:
export DOCKER_MASTER_ADDR=123.45.67.89:2375
Note
By default, Docker containers use the Google Public DNS servers 8.8.8.8 and 8.8.4.4. If you want to use DNS names to refer to servers (DOCKER_MASTER_ADDR and the MySQL server below) and those names are not publicly advertised, you will need to add --dns my.dns.ip.address parameter(s) to the Docker daemon’s options (/etc/default/docker under Ubuntu).
Check that the swarm has all the nodes you want (from the manager node):
docker node ls
Note
Docker services can only be created from the master node. This affects all the docker service create commands listed below.
Create the pipeline database¶
The easiest option is to set up a database server outside of the swarm. You need to make sure it is visible from all the swarm nodes. Do check that:
- you use the public IP address in the URL for that database (neither “localhost” nor “127.0.0.1” will do)
- the server allows external access (check both MySQL server’s config and your firewalls)
- the swarm nodes are able to access the server’s network
You can also submit the database as a Docker service, for instance:
docker service create --name blackboard --publish 8306:3306 --reserve-cpu 1 --env MYSQL_RANDOM_ROOT_PASSWORD=1 --env MYSQL_USER=ensrw --env MYSQL_PASSWORD=ensrw_password --env 'MYSQL_DATABASE=%' mysql/mysql-server:5.5
This will create a MySQL 5.5 server with the user/password credentials you wish. MYSQL_DATABASE=% is a trick to make this image grant the user permissions on all (%) databases.
The server will run on any node, but its local port 3306 (MySQL’s default) will be mapped to the manager node’s port 8306. Hence, the MySQL server URL would be on the manager’s IP address and port 8306.
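For instance, the pipeline URL used in the rest of this document could then be set as follows (a sketch: the IP address and the database name long_mult are placeholders; substitute your manager’s IP address and your own pipeline database name):

```shell
# Hypothetical example: user/password match the MYSQL_USER and
# MYSQL_PASSWORD values given to the blackboard service above.
export EHIVE_URL=mysql://ensrw:ensrw_password@123.45.67.89:8306/long_mult
```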
Caution
Be aware that this way of running MySQL is considered unreliable since the database files only exist within the container, and won’t be kept upon restart (if the server crashes) or when the service ends.
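One way to mitigate this is to store the data in a named Docker volume, by adding a --mount option to the docker service create command above (a sketch: the volume name blackboard_data is arbitrary; note that a named volume is local to the node that runs the container, so the data will still not follow the service if it is rescheduled on another node):

```shell
# Add to the "docker service create --name blackboard ..." command line.
# blackboard_data is an arbitrary volume name; /var/lib/mysql is the
# data directory of the official MySQL images.
--mount type=volume,source=blackboard_data,destination=/var/lib/mysql
```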
The init_pipeline.pl command itself is the same as usual:
init_pipeline.pl Bio::EnsEMBL::Hive::Examples::LongMult::PipeConfig::LongMult_conf -pipeline_url $EHIVE_URL -hive_force_init 1
If the pipeline and its dependencies are available on the host machine, you can run the command directly. Otherwise, run the Docker image as a service:
docker service create --name=init_pipeline --restart-condition=none ensemblorg/ensembl-hive-docker-swarm init_pipeline.pl (...)
Tip
Docker will automatically pull the latest image before starting the containers; you don’t need to update the image yourself.
Run the pipeline¶
If you are restarting a pipeline, you may need to delete the services created by the previous attempt, as service names have to be unique. Find out which services are still registered with docker service ls (see below) and delete the ones you don’t need any more:
$ docker service rm long_mult-Hive-default-2_1 long_mult-Hive-default-1_2 long_mult-Hive-default-1_3
Beekeeper
You can run beekeeper.pl on any of the machines participating in the Swarm as long as you have set the DOCKER_MASTER_ADDR variable there: it doesn’t have to be the master node!
You can also submit the Beekeeper to the Swarm as a service:
docker service create --name long_mult_beekeeper1 --replicas 1 --restart-condition none --env DOCKER_MASTER_ADDR=$DOCKER_MASTER_ADDR --reserve-cpu 1 ensemblorg/ensembl-hive-docker-swarm \
    beekeeper.pl -url $EHIVE_URL -loop
For debugging, you may have to share a directory with the container. Add this to the command-line before the image name:
--mount type=bind,source=/tmp/leo,destination=/tmp/leo
Make sure that the source directory exists on all the nodes, since you cannot control on which node the service will be executed.
Remember that LOCAL analyses will be run on the Beekeeper’s environment, and won’t be submitted.
You can also run the Beekeeper with the -run option instead of -loop. The Beekeeper service will scale down to zero when the Beekeeper ends, and you’ll need to rescale it to one every time you want another iteration:
docker service scale long_mult_beekeeper1=1
This can be useful when debugging the Beekeeper, but when everything works, just switch it to -loop and enjoy.
In parallel, open a database connection and watch the pipeline being worked on!
Monitor the Workers (services) submitted by the Beekeeper with docker service:
$ docker service ls
ID            NAME                        MODE        REPLICAS  IMAGE                                 PORTS
quqiykcjmnhk  long_mult-Hive-default-2_1  replicated  0/4       ensemblorg/ensembl-hive-docker-swarm
t0eundxn55m6  long_mult-Hive-default-1_2  replicated  0/4       ensemblorg/ensembl-hive-docker-swarm
xi9f3ffbid5e  long_mult-Hive-default-1_3  replicated  0/2       ensemblorg/ensembl-hive-docker-swarm

$ docker service ps long_mult-Hive-default-1_2
ID            NAME                          IMAGE                                 NODE     DESIRED STATE  CURRENT STATE          ERROR                             PORTS
ekx78eij8veb  long_mult-Hive-default-1_2.1  ensemblorg/ensembl-hive-docker-swarm  mattxps  Shutdown       Failed 19 hours ago    "starting container failed: oc…"
m13t6brngmwl  long_mult-Hive-default-1_2.2  ensemblorg/ensembl-hive-docker-swarm  matttop  Shutdown       Complete 19 hours ago
nb3pvz5daep4  long_mult-Hive-default-1_2.3  ensemblorg/ensembl-hive-docker-swarm  mattxps  Shutdown       Failed 19 hours ago    "starting container failed: oc…"
j3j4vlm9b4m3  long_mult-Hive-default-1_2.4  ensemblorg/ensembl-hive-docker-swarm  matttop  Shutdown       Complete 19 hours ago

$ docker service logs long_mult-Hive-default-1_2
long_mult-Hive-default-1_2.1.ekx78eij8veb@mattxps | container_linux.go:262: starting container process caused "exec: \"/repo/ensembl-hive/scripts/dev/simple_init.py\": stat /repo/ensembl-hive/scripts/dev/simple_init.py: no such file or directory"
long_mult-Hive-default-1_2.3.nb3pvz5daep4@mattxps | container_linux.go:262: starting container process caused "exec: \"/repo/ensembl-hive/scripts/dev/simple_init.py\": stat /repo/ensembl-hive/scripts/dev/simple_init.py: no such file or directory"

$ docker service logs ekx78eij8veb
long_mult-Hive-default-1_2.1.ekx78eij8veb@mattxps | container_linux.go:262: starting container process caused "exec: \"/repo/ensembl-hive/scripts/dev/simple_init.py\": stat /repo/ensembl-hive/scripts/dev/simple_init.py: no such file or directory"
Tip
When given a service name, docker service logs will print the logs of all the tasks of that service. When given a task ID (the first column of docker service ps), the output is restricted to that task. This is the only way of getting the output of a specific Worker, as docker service logs doesn’t accept “task names” (e.g. long_mult-Hive-default-1_2.2).
Note
docker service logs dumps the standard-output logs onto your standard output, and the standard-error logs onto your standard error.
We also provide a script, docker_jobs.pl, located in ensembl-hive/scripts/dev/ (which is not in the default PATH), to list either all the service replicas, or only the replicas of the service of your choice. The script uses Docker’s REST API on DOCKER_MASTER_ADDR, and is a good way of checking that the information available to the DockerSwarm meadow is the same as on the command line.
$ ensembl-hive/scripts/dev/docker_jobs.pl
Service_ID                 Service_name_and_index         Task_ID                    Status    Node_ID                    Node_name
0cjyvrg56e6a4qt666b161oky  init_pipeline[1]               mxibbp4s5mjxf2x9i8y2rt9fu  complete  hw7a5jd8tx20e51istjp3dp1i  172.22.70.252/matttop
kldfgtvg6lehifcz7ggggw7cy  long_mult_beekeeper1[1]        9ifvq4os3b8jm69ogngmck6jo  complete  hw7a5jd8tx20e51istjp3dp1i  172.22.70.252/matttop
mwtzqypba2tnrrmfi4lg7wc43  long_mult-Hive-default-1_2[1]  v96yhbbv7yli4xr3855d18x1y  complete  hw7a5jd8tx20e51istjp3dp1i  172.22.70.252/matttop
mwtzqypba2tnrrmfi4lg7wc43  long_mult-Hive-default-1_2[2]  0448t1akalt8coak7vj1q2d9l  complete  9m8hh96du7220yxtv65a8840q  172.22.68.27/mattxps
mwtzqypba2tnrrmfi4lg7wc43  long_mult-Hive-default-1_2[3]  mf2oev5kcltklz9hgenas1xc4  complete  hw7a5jd8tx20e51istjp3dp1i  172.22.70.252/matttop
mwtzqypba2tnrrmfi4lg7wc43  long_mult-Hive-default-1_2[4]  36a7uxdqc0l6m0kxkunp6rjn9  complete  9m8hh96du7220yxtv65a8840q  172.22.68.27/mattxps
z7nz4ivyhnvja1o7ndobvqd26  long_mult-Hive-default-1_3[1]  7bofm0n7kp2d9dv5cy4hudg6w  complete  hw7a5jd8tx20e51istjp3dp1i  172.22.70.252/matttop
z7nz4ivyhnvja1o7ndobvqd26  long_mult-Hive-default-1_3[2]  tgk2hddhbuxiaxi6lsjzjnavf  complete  9m8hh96du7220yxtv65a8840q  172.22.68.27/mattxps

$ ensembl-hive/scripts/dev/docker_jobs.pl long_mult-Hive-default-1_2
Service_ID                 Service_name_and_index         Task_ID                    Status    Node_ID                    Node_name
mwtzqypba2tnrrmfi4lg7wc43  long_mult-Hive-default-1_2[1]  v96yhbbv7yli4xr3855d18x1y  complete  hw7a5jd8tx20e51istjp3dp1i  172.22.70.252/matttop
mwtzqypba2tnrrmfi4lg7wc43  long_mult-Hive-default-1_2[2]  0448t1akalt8coak7vj1q2d9l  complete  9m8hh96du7220yxtv65a8840q  172.22.68.27/mattxps
mwtzqypba2tnrrmfi4lg7wc43  long_mult-Hive-default-1_2[3]  mf2oev5kcltklz9hgenas1xc4  complete  hw7a5jd8tx20e51istjp3dp1i  172.22.70.252/matttop
mwtzqypba2tnrrmfi4lg7wc43  long_mult-Hive-default-1_2[4]  36a7uxdqc0l6m0kxkunp6rjn9  complete  9m8hh96du7220yxtv65a8840q  172.22.68.27/mattxps
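The Docker REST API that docker_jobs.pl relies on can also be queried directly. The following Python sketch (not part of eHive; the helper names are ours) lists the state of every swarm task via the Engine API’s /tasks endpoint, reading the address from DOCKER_MASTER_ADDR as set earlier:

```python
# Minimal sketch of querying the Docker Engine REST API for swarm tasks.
# Assumes the daemon listens on tcp://0.0.0.0:2375 as configured above and
# that DOCKER_MASTER_ADDR is exported; tasks_url/list_tasks are our names.
import json
import os
import urllib.request

def tasks_url(master_addr):
    """Build the URL of the Engine API endpoint listing all swarm tasks."""
    return "http://%s/tasks" % master_addr

def list_tasks(master_addr):
    """Return the decoded JSON list of tasks (one entry per replica)."""
    with urllib.request.urlopen(tasks_url(master_addr)) as response:
        return json.load(response)

if __name__ == "__main__" and "DOCKER_MASTER_ADDR" in os.environ:
    for task in list_tasks(os.environ["DOCKER_MASTER_ADDR"]):
        # Each task carries its service ID and a nested status structure.
        print(task["ServiceID"], task["Status"]["State"])
```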
You can submit new Workers to the swarm yourself by creating a service that runs runWorker.pl:
docker service create --name=worker --replicas=1 --restart-condition=none ensemblorg/ensembl-hive-docker-swarm runWorker.pl -url $EHIVE_URL