A Spark microservice deployed on Docker 1.13 with Stacks and Compose

I have been playing a little with the new Compose v3 syntax and found it intuitive and easy to use. Let me share some notes on how to deploy an Apache Spark microservice as a Docker Stack with Compose, on a Docker 1.13 Swarm.

Apache Spark is a big-data processing engine, used to analyze raw data, work with structured data, and launch distributed compute jobs.

The initial topology of this microservice consists of 1 Spark master and 3 Spark workers. The master is the cluster controller, while the workers are the workhorses.

First, I usually drain the Swarm managers, a best practice that keeps the managers from running containers (and from overloading Raft):

$ docker node update --availability drain node-1
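
If the Swarm has more than one manager, the same can be done in a small loop (a sketch, assuming a Bash shell pointed at a manager node):

$ for node in $(docker node ls --filter role=manager -q); do docker node update --availability drain $node; done

A plain docker node ls afterwards should show Drain in the AVAILABILITY column for every manager.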

Now, I create an overlay (VXLAN) network for Spark container communication and call it spark. I assign a specific subnet, the default one used by my images:

$ docker network create --driver overlay --subnet 10.0.0.0/22 spark
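
Before deploying anything on it, a quick sanity check of the driver and subnet doesn't hurt:

$ docker network inspect spark --format '{{.Driver}} {{json .IPAM.Config}}'

The output should report the overlay driver and the 10.0.0.0/22 subnet just assigned.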

Now, let’s go through the Compose v3 YAML file that models the microservice. It’s fairly straightforward and introduces a nice deploy block, useful for specifying extra Docker Swarm options:

version: '3'

services:
  spark-master:
    image: fsoppelsa/spark-master
    networks:
      - spark
    ports:
      - 8080:8080   # master web UI
      - 7077:7077   # master port, where workers and drivers connect
      - 6066:6066   # standalone REST submission port
    deploy:
      replicas: 1
      update_config:
        parallelism: 1
        delay: 10s

  spark-worker:
    image: fsoppelsa/spark-worker
    networks:
      - spark
    deploy:
      replicas: 3
      update_config:
        parallelism: 1
        delay: 10s

networks:
  spark:
    external: true
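
The update_config section will pay off later: when a service in this Stack is updated, for example to roll out a new image, Swarm replaces its tasks one at a time, waiting 10 seconds between each. A minimal sketch of such a rolling update, assuming an updated fsoppelsa/spark-worker image has been pushed to the Hub, would be:

$ docker service update --image fsoppelsa/spark-worker:latest spark_spark-worker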

Let’s save this file as spark.yml and deploy a Stack:

$ docker stack deploy -c spark.yml spark

After a few minutes, needed to download the images from the Hub, we can check the status of the services:

Stacks:

$ docker stack ls
NAME   SERVICES
spark  2

Swarm services:

$ docker service ls
ID            NAME                MODE        REPLICAS  IMAGE
p08gv8jecnr3  spark_spark-master  replicated  1/1       fsoppelsa/spark-master:latest
shnxps5x2ev5  spark_spark-worker  replicated  3/3       fsoppelsa/spark-worker:latest

Nodes running the spark-worker service tasks:

$ docker service ps spark_spark-worker
ID            NAME                  IMAGE                          NODE    DESIRED STATE  CURRENT STATE           ERROR  PORTS
r7skx21oukv9  spark_spark-worker.1  fsoppelsa/spark-worker:latest  node-3  Running        Running 11 minutes ago         
ksrck05kf3hr  spark_spark-worker.2  fsoppelsa/spark-worker:latest  node-4  Running        Running 11 minutes ago         
pwptx8dm1tkx  spark_spark-worker.3  fsoppelsa/spark-worker:latest  node-2  Running        Running 11 minutes ago

We can now connect to port 8080 of any Swarm node and verify on the Spark master web UI that the 3 workers have joined the master.
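
For a scriptable check, the standalone master also exposes its status as JSON on the same port (a sketch, assuming the Spark version in these images serves the /json endpoint and that node-1 is one of the Swarm nodes):

$ curl -s http://node-1:8080/json/

The returned document contains a workers array, which should list 3 entries at this point.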

Now, we can interact with the services directly, for instance to scale the workers:

$ docker service scale spark_spark-worker=10
spark_spark-worker scaled to 10
$ docker service ls
ID            NAME                MODE        REPLICAS  IMAGE
p08gv8jecnr3  spark_spark-master  replicated  1/1       fsoppelsa/spark-master:latest
shnxps5x2ev5  spark_spark-worker  replicated  10/10     fsoppelsa/spark-worker:latest

After a few minutes, we can connect again to the Spark web interface and verify that there are now 10 workers.
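
At this point the cluster can also be exercised with a real job. A minimal sketch, assuming a local Spark installation whose version matches the one in the images (the examples jar path depends on the distribution), and that node-1 publishes the master port 7077:

$ spark-submit --master spark://node-1:7077 --class org.apache.spark.examples.SparkPi $SPARK_HOME/examples/jars/spark-examples_*.jar 100

The SparkPi example should be scheduled on the workers and print an approximation of Pi when it completes.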

Of course, with the new Compose v3, rather than interacting with the services manually, the typical workflow becomes keeping the Stack YAML files in a revision control system and updating the Stack with docker stack deploy.
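
For example, scaling the workers through the file instead of the CLI would look like this (a sketch, assuming spark.yml is kept in a git repository):

$ sed -i 's/replicas: 3/replicas: 10/' spark.yml
$ git commit -am "Scale Spark workers to 10"
$ docker stack deploy -c spark.yml spark

Re-running docker stack deploy against an existing Stack reconciles the running services with the file.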

Happy Docker!