A few things that I would like to see in Docker

I’ve mentioned before that I’ve rather taken a liking to Docker. However, there are a few things that I wish were better.

Docker Doc

Javadoc did something extremely well (other than getting people to hate Checkstyle): it made sure that there was a standard way to write documentation for your code base, and that it could be formatted in a proper, consistent way.

It would be great to see a similar tool for Docker that produces consistent documentation on what the Dockerfile produces. It would also be great to have registry support for this.

Some of the things that I would expect to see mentioned on there:

  1. Required environment variables for the application
  2. Documentation on maintenance to be done on the container (e.g. run particular commands via the exec command)
  3. Ports that should be shared
  4. What volumes are assumed to be there
  5. How the configuration should be set (is it a file, etc.)
  6. How to connect the application with other services (databases, message queues, etc)
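
As a rough illustration of the idea, part of this listing could already be derived from the Dockerfile itself. The sketch below is a hypothetical helper (dockerfile_doc is a name I made up, not an existing Docker tool) that summarizes the ENV, EXPOSE, and VOLUME instructions. It assumes uppercase instructions, one per line, with no line continuations.

```shell
# Hypothetical helper, not an existing Docker tool: list what a Dockerfile
# declares about environment variables, ports, and volumes.
# Assumes uppercase instructions, one per line, without line continuations.
dockerfile_doc() {
  # $1: path to the Dockerfile to summarize
  awk '
    $1 == "ENV"    { sub(/^[ \t]*ENV[ \t]+/, "");    print "Environment variable: " $0 }
    $1 == "EXPOSE" { sub(/^[ \t]*EXPOSE[ \t]+/, ""); print "Port: " $0 }
    $1 == "VOLUME" { sub(/^[ \t]*VOLUME[ \t]+/, ""); print "Volume: " $0 }
  ' "$1"
}
```

Running dockerfile_doc Dockerfile prints one line per declared variable, port, and volume. The maintenance notes and service connections from the list above would still have to be written by hand.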

Docker Templates

It would be great to generate best-practice Dockerfiles based on whether you’re working with a Tomcat application or a standard Jar. This would get rid of the time needed to create a Dockerfile. Along with Dockerfile templates, it would be nice to see better Jenkins publishing support for Docker. However, that’s a subject for another day.

Docker Traits

Multiple inheritance has a lot of issues that Java avoids and C++ has to deal with. Python works around them with mixins, and Scala and Groovy with the addition of traits. For Docker, traits would be a way to side-load content into the Dockerfile, after the base image, without having to use inheritance. That means that you could create a Docker container with a set of utilities rather than baking them into the base image.

Application Profiles

Applications typically have a limited set of configurations. They are either run-once applications or web applications. There are sometimes deviations from this, but they’re not common.

For the most part, run-once applications require a set of configuration to go into the application and a location to write out the data. Web applications mostly need an open port, a connection to a data storage system (e.g. MySQL, MongoDB, etc.), and configuration information.

The best way that I can foresee this being set up is to have standard mount locations for these folders outside of the container, and to have them set as required inputs for starting the container.

One more item on configuration: I would love to see the starter configuration copied from the container to its outside storage. This would mean that an image with a complex configuration, such as the Docker Registry, would start with a new mounted volume outside of the container that holds a configuration ready to be changed.
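
Until something like this exists in Docker itself, the behavior can be approximated in an image’s entrypoint script. The following is only a sketch of that idea: seed_config is a hypothetical function, and both directory paths are whatever the image author chooses.

```shell
# Hypothetical entrypoint helper: seed a mounted configuration directory
# with the defaults shipped in the image the first time a container starts.
seed_config() {
  defaults_dir="$1"   # defaults baked into the image (assumed location)
  config_dir="$2"     # directory mounted from outside the container

  # Treat the mount as a required input: refuse to continue without it.
  if [ ! -d "$config_dir" ]; then
    echo "error: $config_dir is not mounted" >&2
    return 1
  fi

  # Copy the defaults only when the mounted directory is empty, so that
  # changes made outside the container are never overwritten.
  if [ -z "$(ls -A "$config_dir")" ]; then
    cp -R "$defaults_dir"/. "$config_dir"/
  fi
}
```

An entrypoint could then call something like seed_config /opt/app/defaults /config before starting the application (both paths are made up for the example).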

Project: Music Organization System

Before iTunes and online music stores and streaming options came around, you had to build up a digital music collection if you wanted to load an MP3 player. I always preferred this option. It meant that I could manage my own collection, that I wasn’t tied to a service that would eat up all of my data, and that I could listen to what I wanted. For example, in most US music services you can’t find the band Die Toten Hosen. They’re a great German band, but they haven’t hit the US market. Also, with your own collection it’s a lot easier to move the music to other devices that lack direct integration with those services (such as my car stereo).

The downside to managing your own music collection is that you have to do all of the work yourself. That means that a large collection can get unwieldy very quickly. Thankfully there are a few tools to help with that. I found the JThink tools (SongKong and Jaikoz) to be very helpful with keeping a music collection organized.

What is it?

This project is intended to automatically standardize files into a human-friendly collection without user intervention.

What technologies are used?

  • Bash
  • SongKong (Jthink.net), for file formatting, metadata correction, and metadata improvements (from an online source)
  • FFmpeg (for media conversion)
  • Docker
  • Docker Registry

How did I solve it?

To solve this issue I did the following:

  1. Created the Dockerfile and outlined the general steps used.
  2. Identified the software dependencies.
  3. Opened up X forwarding to test out SongKong (it’s mainly an X application that can also be used as a command line tool)
  4. Ensured that SongKong could operate from within the Docker container
  5. Moved over the Ogg2MP3 and Flac2Mp3 scripts (which can be found at Github.com/monksy)
  6. Created a Docker registry so that I can keep the Docker image local (SongKong is a licensed, paid product)
  7. Set up the CI pipeline with Jenkins
  8. Created a script to run on the box managing the music collection. This uses the Docker Registry to pull down the image and run the organization utility
  9. Set up the crontab entries to run the container
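
The last two steps can be sketched roughly as follows. This is not the actual script: the registry host, image name, and paths are placeholders, and the /music mount point inside the container is an assumption.

```shell
# Hypothetical wrapper for the machine that manages the collection.
# The registry host, image name, and paths are placeholders.
DOCKER="${DOCKER:-docker}"   # set DOCKER=echo to print the commands instead
IMAGE="${IMAGE:-registry.example.com/music-organizer:latest}"
MUSIC_DIR="${MUSIC_DIR:-/mnt/music}"

organize_music() {
  # Pull the current image from the private registry, then run it once
  # with the collection mounted. --rm removes the container on exit.
  "$DOCKER" pull "$IMAGE" &&
    "$DOCKER" run --rm -v "$MUSIC_DIR:/music" "$IMAGE"
}
```

A script that defines this and then calls organize_music can be scheduled with a crontab entry such as 0 3 * * * /usr/local/bin/organize-music.sh, where the schedule and path are again made up.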

Some of the challenges that I had while doing all of this included:

  1. The difference between the RUN instruction and ENTRYPOINT. ENTRYPOINT sets the command that runs when the container is started. RUN executes only while the image is being built.
  2. The Jenkins Docker plugins are a little difficult to use and set up. I tried using the Docker-build-step plugin; however, it had very little documentation, was very unhelpful about invalid input, and made it difficult to build and publish. Fortunately the CloudBees Docker Build and Publish plugin was just what I was looking for.
  3. Debugging the Docker Registry was a pain. For the most part you’ll have to depend on its standard output. If that doesn’t help, do a docker exec -ti <container id> /bin/bash and look for the log files.
    1. This really needs to be improved to output what is broken and why
    2. Bad logins to the Docker registry from the client fall back from Version 2 of the API to Version 1 if something goes wrong on the Version 2 request (e.g. a bad certificate file). This is frustrating.
  4. If you have a LetsEncrypt certificate to use on the Docker registry, it’s not very well documented that you should be using the fullchain certificate file. Without it, clients can fail to verify the registry’s certificate.
    1. Another note on this: it should be a lot easier to create users on the registry than having to generate htpasswd files.
    2. If you do generate a user access file, you have to use bcrypt as the hashing option. Otherwise, the Docker registry won’t accept your credentials.
  5. The storage option that I used for the collection was a network mount point. Not having the proper permissions on the server side for the child folders caused a wild goose chase. That led to studying up on the options of the mount.cifs tool (for example the file_mode, dir_mode, noperm, and forceuid options).
  6. Reproducing the application’s environment was a little difficult because it wasn’t clear where its private files were located.
  7. The id3 tagging command originally used no longer exists. I had to upgrade to the id3v2 software and rework the command usage.
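
Two of the fixes above are easy to forget, so here is a small sketch of each. The function names are my own, the permission modes are example values, and the first function assumes that Apache’s htpasswd utility is installed.

```shell
# Print an htpasswd entry whose password is hashed with bcrypt (-B).
# -n writes the entry to standard output and -b takes the password from the
# command line, which is convenient here but exposes it to the process list.
registry_user_entry() {
  # $1: user name, $2: password
  htpasswd -Bbn "$1" "$2"
}

# Build mount.cifs options that make one local user own every file and
# skip client-side permission checks. The modes are example values.
cifs_mount_options() {
  # $1: numeric uid, $2: numeric gid
  echo "uid=$1,gid=$2,forceuid,forcegid,file_mode=0664,dir_mode=0775,noperm"
}
```

The output of registry_user_entry can be appended to the password file that the registry is configured to read, and the result of cifs_mount_options 1000 1000 can be passed to mount -t cifs through its -o option along with a credentials file.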

I really like Docker

Two years ago I wrote an article about my disdain for the popularity of creating virtual machines to host applications. I was discouraged by the attitude of creating images that weren’t easy to rebuild, and felt that it encouraged bad practices around application setup and maintenance. It also seemed incredibly wasteful in terms of storage, memory, and CPU cycles to do a full OS emulation. I was a fan of the idea of OpenVZ and LXC at the time. I didn’t know a lot about those options and didn’t go forward with them. However, through cheap hosting providers I learned about the downsides of oversubscribed OpenVZ hosts.

Then Docker rose to popularity. It was an approach that went the LXC/cgroups route and made it easier to use. Docker doesn’t attempt to virtualize the entire stack, but it does attempt to reproduce the Linux environment within the container and to isolate the process running therein. It’s basically a sandbox for the filesystem, processes, and network: all of the benefits of a VM with none of the full hardware emulation.

Why do I like it? There are a few reasons:

It’s portable: Most of the internal structure of a container is based on prebuilt images. To start up an app, just pull it down from its online repository. For example, starting up an instance of Couchbase is a matter of running the following command:

docker run -d --name db -p 8091-8094:8091-8094 -p 11210:11210 couchbase

In the case of another NoSQL server, Riak, you previously had to fight with bad platform support and Erlang installs. With Docker, all of this is configured in the container and doesn’t negatively affect the host OS. The portability of Docker means that you no longer have to figure out a new install procedure if the product ships as a Docker image. On top of that you can configure where the persistent storage will be located. The application within Docker will have no knowledge of where its data is stored, nor will it care. The same goes for the networking configuration.

It’s open source/free: Even the Docker repository, registry, and base images are freely available via Docker Hub. When you want to move away from this model you can reproduce it within your own environment. On top of this, the Docker registry is a Docker container itself, and it allows for versioning.

It’s social and collaborative: With Docker Hub you can build your images on top of existing images. If you want to upgrade the underlying infrastructure (say, Ubuntu 14.04 to 16.04) it’s just a matter of changing the base image in your Dockerfile. That allows for testing and debugging in an isolated and repeatable manner (as opposed to making golden images). The organizations responsible for creating the products, e.g. Couchbase and Ubuntu, have their own official images that are frequently updated.

It’s easy to track changes: Everything about the build of a Docker image is based on a declarative script known as a Dockerfile. There are a few nuances in the script (e.g. how a RUN command is run versus an entry point). However, it’s fairly easy to create, update, and track changes (via source control).
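
To make that nuance concrete, here is a minimal sketch that writes out a small Dockerfile contrasting the two instructions. The base image and the curl package are arbitrary choices for illustration, and write_example_dockerfile is just a name I picked.

```shell
# Write a small example Dockerfile that contrasts RUN and ENTRYPOINT.
# The base image and package are arbitrary choices for illustration.
write_example_dockerfile() {
  # $1: path of the Dockerfile to create
  cat > "$1" <<'EOF'
FROM ubuntu:16.04
# RUN executes once, while the image is built; the result is stored in the image.
RUN apt-get update && apt-get install -y curl
# ENTRYPOINT is only recorded at build time and executes whenever a container starts.
ENTRYPOINT ["curl", "--version"]
EOF
}
```

After write_example_dockerfile Dockerfile, a docker build -t run-vs-entrypoint . installs curl once, and every docker run --rm run-vs-entrypoint afterwards prints the curl version. The generated file is plain text, so it can be committed to source control like any other script.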


So far the only downside to Docker that I’ve seen is more of a creator issue: there is a tendency to try to containerize everything in the application environment. That includes a container for the database and another for the storage, both of which are frequently hard-linked to the application container. I realize that is helpful for cases where you need to have a particular version; however, I would rather have a single database install on the host OS to share between containers and maintain the security for that separately.