Dockerizing Development at Domino, Part II

If you've been following my blog for a while, you probably saw my post on using Docker Compose to improve Domino Data Lab's development environment (titled "Docker Composing for Fun and Profit"). This post is the sequel.

Over the last few weeks we prioritized closing the loop on these development environment improvements because we had three new developers joining the fray. Historically, our machine bootstrapping process has been a disaster. Getting my machine fully up and running when I joined last year took two days, if my memory serves me correctly. The good news is that it seems like we've successfully tamed the beast.

Domino now has a working, containerized development environment and a simple Python-based bootstrap script for OS X and Ubuntu to help facilitate setting up a new environment. Together these tools cut the time it takes to get a working environment down to a few hours or less. Yesterday I actually took both of these out for a spin on an AWS server I was setting up "just to see." Using them, I got to a working environment in right around an hour.

Today I want to share a few things about what we're doing to make this successful. But first, let's review where we started and then look at where we're going.

The setup I described in my last post had roughly this structure:

  • A container for the database and other supporting services.
  • A container holding the application under development (the "AUD").
  • On Linux, mount the AUD's source into its container directly. On OS X, use rsync to push changes into the Docker VM on your machine.
  • docker-compose to orchestrate all of these containers.

If you need more detail than that, feel free to go through the previous post. :) Now, a bit about where we are today.

OS X Volume Mounting with Speed!

I made some pretty important changes to the way I set things up on OS X compared to my first post. In particular, you may note that in my first post I recommended using rsync to move changes from my host system (OS X) into the VirtualBox VM. This hack needed to exist because of the horrible performance of VirtualBox's own file sharing system.

Since then I found a much easier setup in the form of Dinghy. Dinghy is a wrapper around docker-machine that sets up NFS sharing of the /Users folder between OS X and the VM that Docker is actually running in. NFS is much, much more performant than VirtualBox's file sharing - enough so that we can mount our source directly into the container without issue. So things now work much closer to how they do on Linux-based systems, modulo one important issue: some of our containers expected to be able to chown and chmod things as root, and Dinghy's default settings prevent this over NFS.
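
For the curious, getting going with Dinghy is only a few commands. This is a sketch from memory rather than a guide (the image and path in the last command are purely illustrative), so defer to Dinghy's README for the real instructions:

# Install Dinghy via Homebrew and create a VirtualBox-backed VM.
brew tap codekitchen/dinghy
brew install dinghy
dinghy create --provider virtualbox

# Dinghy exports /Users over NFS, so plain volume mounts are fast:
docker run -v /Users/me/src/app:/app some-image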

For most folks these default settings are fine, but occasionally people like us need an escape hatch to do what we want to do. I proposed codekitchen/dinghy#170 (which is merged but not yet released) to provide just such an escape hatch and permit overriding the default file sharing settings provided by Dinghy. With that change in place, things are mostly smooth in the file sharing department.

That said, a word of advice: NFS has some quirks. Some applications (such as pip) will blow up if the wrong folder turns out to be an NFS share. (In our case we had one container where /tmp was mounted from the host. Boom!)

Bare Metal is Sometimes Best

The most important thing we learned was that the container for the Application Under Development was a bad idea for our system.

In our case, our AUD has to do a good bit of I/O on the file system and cares a lot about file ownership and permissions for security reasons. These reasons largely only apply in the production system, but minimizing differences between production and development is a priority for us. It's also not uncommon for sbt to do some I/O on our behalf that an IDE like IntelliJ will want to inspect in order to provide helpful features like autocompletion.

Anyone who has done even a little work with Docker will tell you that user id and group id management between containers and the host system is a bit of a dumpster fire. After a long, winding adventure in trying to make the UIDs/GIDs automatically line up regardless of your actual host system, we took a step back and re-evaluated the sanity of trying to do this in the first place. We landed on the conclusion that, although Docker was really useful for ensuring that the services we needed were set up and configured correctly, the cost of getting all of the moving pieces we wanted to use (IntelliJ IDEA, various other editors, sbt, other tools, etc.) working consistently wasn't worth the benefit. Linux and OS X systems both experienced a good deal of pain trying to do that.
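
If you haven't hit this yourself, here's a minimal demonstration of the problem on a Linux host (the image and file name are just for illustration):

# Most images run as root by default, so files written into a
# bind-mounted volume show up on the host owned by root.
docker run --rm -v "$(pwd)":/work busybox touch /work/created-in-container
ls -l created-in-container
# -rw-r--r-- 1 root root 0 ... created-in-container

# Now your editor, sbt, and shell can't touch that file without sudo,
# and "fixing" it means remapping UIDs per host machine.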

We concluded that bare metal is sometimes best. With the number of metaphorical coconuts in the air that depend on proper UIDs, GIDs, and permissions, it just made more sense for us to keep the actual application we are building on the host system and use containers for the services it needs.
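
Concretely, that means our docker-compose.yml trends toward a services-only shape, something like this sketch (the services and versions here are illustrative rather than our exact stack):

# Backing services run in containers; the app itself and sbt run
# directly on the host and connect over published ports.
mongo:
  image: mongo:3.0
  ports:
    - "27017:27017"
redis:
  image: redis:2.8
  ports:
    - "6379:6379"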

Bootstrap Scripts Feel Painful, but Do Work

Another part of this project of ours was the bootstrap script for OS X and Ubuntu machines. I don't have a lot to say about this except that, as the author, it felt like a really burdensome thing to write. I was convinced at various moments that this wouldn't be worth the time I was spending on it.

Yesterday I set up a brand new Ubuntu environment using that script. It was worth it.

Looking Forward

We've come a long way since my first blog post, and we're continuing to improve.

Since my initial work with Dinghy, I've started participating in the Docker for Mac Beta, which is based on xhyve virtualization and a custom file sharing layer that is both a) performant and b) not subject to some of the peculiarities of NFS. I think that once this is released to the general public it'll be a much better experience than what we have today.

As always, I would love feedback on what you thought of this post! Positive feedback is what convinces me to write more frequently, so if you enjoyed this let me know!

Docker Composing for Fun and Profit

Our application at Domino is complex, to say the least. I think one of the best accomplishments of the engineering team so far is that the interface into our product is deceptively simple. But the mechanics of getting that experience right - as you might imagine - consistently require a lot of deep thought.

As our very own jobs page states: "This isn't your run-of-the-mill CRUD app." And that's true.

One of the difficulties that falls out of this complexity is that our development environment has deviated a good deal from our production environment. Pretty much anyone on the team can boot the app, run some basic jobs, and generally tinker around with it. But there are a number of things that won't work quite correctly. Or worse, they work, but very differently than they do in a live environment.

Luckily we have the capability to spin up a live environment pretty easily, but ideally we want to catch things (and generally be able to develop against the product) without involving AWS whenever possible. It's a lot quicker to save a file and reload the application locally than it is to generate artifacts and deploy them to a server.

So I decided to go on an adventure with docker-compose yesterday to see how close I could get us to that reality.

It turns out I got pretty damn close. As of today, I can docker-compose up in our application and get an array of Docker containers that have all the crucial bits of our application. Along the way I learned some interesting bits that I thought I would share.

VirtualBox file sharing is not sbt's friend

One of the most painful bits about my attempts to set this up was the abysmal performance of VirtualBox's file sharing system. One of the nicest things about docker-machine is that it will mount /Users into the VirtualBox VM for you, so anything under /Users is directly mountable into a Docker container as if you were on a Linux machine. Unfortunately, the fact that it's going through VirtualBox's file sharing causes all sorts of unhappy performance characteristics.

To get around this I picked a folder on the VirtualBox VM that I wanted all our files to go to, and wrote a script that executes rsync over an SSH connection to the VM to push file changes to the root of the VM file system.

Using the environment variables that docker-machine sets for me (plus one that I added), I was able to make it pretty generic:

# Sync the project into the docker-machine VM over SSH, skipping
# .git and sbt's target directories.
rsync --archive \
  --rsh="ssh -i $DOCKER_CERT_PATH/id_rsa -o StrictHostKeyChecking=no" \
  --exclude ".git" \
  --exclude "*/target" \
  $(pwd) \
  docker@$DOCKER_HOST_IP:/

Executed from the root of my project, this will package and upload all the relevant files to the VirtualBox VM. Note that DOCKER_HOST_IP isn't a default environment variable. That's one I defined like so:

DOCKER_HOST_IP=$(docker-machine ip $DOCKER_MACHINE_NAME)

All other references to the Docker host, like DOCKER_HOST itself, include "tcp://..." and other garbage like that when, sometimes, you just want the IP address.
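
For example, on a stock docker-machine setup you'd see something like:

echo $DOCKER_HOST
# tcp://192.168.99.100:2376
echo $DOCKER_HOST_IP
# 192.168.99.100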

This pretty much eliminated all of my file-system-related performance issues and had the added bonus of putting an end to the ridiculous Play auto-reload that annoys me to no end. (I'm officially declaring my intent to burn our Play app to the ground and replace it with Lift - but that's going to be an entirely different blog post.)

Give it More Juice!

In order to get things humming nicely I had to give the VirtualBox VM some more juice. I upped its memory allowance to 4GB and gave it 4 cores to play with. You are, after all, running sbt in there.
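
If you want to do the same, it looks roughly like this (assuming your machine is named "default", the docker-machine convention):

# Option 1: bake the resources in when creating the machine.
docker-machine create -d virtualbox \
  --virtualbox-memory 4096 \
  --virtualbox-cpu-count 4 \
  default

# Option 2: resize an existing VM; it has to be stopped first.
docker-machine stop default
VBoxManage modifyvm default --memory 4096 --cpus 4
docker-machine start default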

Yo Dawg, I Heard You Like Docker

I'm using Docker to run an app that wants to use Docker. YO DAWG.

But seriously, nesting Dockers, while possible, probably isn't the best idea for active development, because then you've got to shell into the container running your application to take a look at what containers it's running and tinker around with them. It's a much nicer experience to have all of that available from my OS X shell with my normal docker commands. So, I did just that.

Turns out docker-compose is quite clever. If, in the environment section of your docker-compose.yml file, you define an environment variable without giving it a value, docker-compose will pull the value from the currently running shell. So if you're composing a container that has the docker CLI installed, it's easy to point that CLI at the very same docker daemon that is running the container.

In the context of an entire compose file, that looks something like the sketch below (the "app" service and its image are placeholders rather than our actual configuration):
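
# Compose v1 format, as it existed at the time.
app:
  image: example/app-dev
  environment:
    # No values given: docker-compose copies these variables from the
    # shell that ran `docker-compose up`, so the docker CLI inside the
    # container talks to the very daemon that launched it.
    - DOCKER_HOST
    - DOCKER_TLS_VERIFY
    - DOCKER_CERT_PATH
  volumes:
    # The TLS certs DOCKER_CERT_PATH points at must be readable at the
    # same path inside the container.
    - $DOCKER_CERT_PATH:$DOCKER_CERT_PATH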

Next Steps

For our situation at least, we have a bit of duplication between what I've done to get our system running locally and the devops stack that deploys our servers. We're going to be looking at de-duplicating some of that moving forward. The worst offender at the moment is a particular config file that's over 200 lines long and has to be manually altered when someone wants to run this setup on their machine.

That aside, I'm thrilled that I got to play with docker-compose a bit, and it has solved the very real problem of not having a realistic environment on OS X to test my code in.


Write Code That is Easy To Delete

This blog post has some genuinely good advice:

Every line of code written comes at a price: maintenance. To avoid paying for a lot of code, we build reusable software. The problem with code re-use is that it gets in the way of changing your mind later on.

We're always super eager to DRY up our code. The truth is, not all code needs to be DRY. I think in my career I've developed a few different philosophies for how I interact with a codebase depending on the situation that I'm in.

I think the separation between library code and application code, and the separation between long-lived and frequently-changing code, are the two that most shape my engineering style.

If I'm writing a library I'm going to be aggressive about providing consumers of the library levers to pull to make it do what they want. If I'm writing an application, I'd prefer to repeat some code or a pattern a few times before promoting it to an abstraction of its own.

If I'm writing long-lived code, I'd prefer to sit and think it through intentionally. Choose good names, think about the relationships between the moving parts. If I'm writing something that's probably going to change next week I (for better or worse) will likely concern myself less with those things.

That's the tradeoff, I think. It's inevitable, right? Either way, this article was thought-provoking for me about how I write code. Hopefully it'll do the same for you.

All the Life Changes. At Once.

I suspect that October and November 2015 will go down in the record books. I seem to have happened upon several life changes all at once. The first, and biggest, of these is getting engaged. Truthfully, if you had told me at this point last year that I'd be writing that sentence today, I would have had a good laugh at you. Secondary to that, I left my role at Elemica at the end of October, and on Monday I'm starting a new position slinging code for Domino Data Lab.

With these changes I felt that this was an appropriate time to do an introspective blog post. So, here is what has been on my mind for the past week arranged in no particular order.

Leave the Team Better Off

I was at Elemica for nearly three years - two months shy of that marker, actually. That's the longest position I've held since graduation. It was also one of the most rewarding positions I've held.

Now is the right time to make a move, but I can't help but take a bit of pause at the fact that there are some things and people I'm going to miss every day. I helped build some stuff that's pretty damn impressive. You can actually watch Arun, the VP of the Engineering Department, presenting it at Elemica's conference this year. Maybe the full impact of it doesn't translate outside the world of supply chain. If not then just trust me. It's awesome.

I'm hoping to be at Domino much longer than three years. But however long I'm there - however long I'm anywhere - I've realized that I have as much of a responsibility to improve the team I'm on as I do to improve the product we're building. Whether I'm somewhere a year or ten, I have an opportunity to improve the team.

At Elemica, that looked like being honest when I had a perspective that was contrary to the prevailing one on the team. It looked like being willing to slog through writing and rewriting Selenium tests to try and make them reliable. Sometimes it looked like taking a moment to tweak a Jenkins plugin or our Hubot. It could also look like walking someone through Lift's form handling step by step.

These are all things that are a little tangential to my primary goal of delivering functionality. They are, however, equally (if not more) important.

Commutes are Toxic

During my week off I've been working on passion projects. Mostly stuff like moving the Georgia General Assembly API to a free Heroku instance, working on making TravisCI build the Lift Framework snapshot releases, restructuring the finances for Crazy Goat Creative a bit, and working on a pet project that I hope to share soon.

Meanwhile, I went ahead and moved into a coworking space that's not far from home. I debated a lot about actually joining one; going to the office was formerly this incredibly depressing proposition for me, and I wasn't sure whether or not I wanted to pay for one.

This week I've seen that it's now largely a non-event. On Monday I was able to leave the space, go get gas, pick up some stuff from Publix, and go home. This took the same amount of time my commute home from Elemica would have taken. And guess what: I was in a much better mood at the end of it! I didn't have to sit in stop-and-go traffic down 400 and then the Downtown Connector to get home. Regardless of the time of day, I can normally get home in about 10 minutes.

Net-net: I don't think humans were designed to sit in cars. I realized commuting was having a huge impact on my mood, and with that element removed from my regular weekly routine I'm already much happier. Here's hoping that the sun comes out and I can start biking - and then become much healthier.

Life Things Matter

Getting engaged has been a whirlwind of an experience. That, timed with DHH's Medium post "RECONSIDER", got me thinking again about the whole work/life balance thing. David said it pretty well, so I'll just quote him:

I wanted a life beyond work. Hobbies, family, and intellectual stimulation and pursuits beyond Hacker News, what the next-next-next JavaScript framework looks like, and how we can optimize our signup funnel.

I wanted to embrace the constraints of a roughly 40-hour work week and feel good about it once it was over. Not constantly thinking I owed someone more of my precious twenties and thirties. I only get those decades once, shit if I'm going to sell them to someone for a bigger buck a later day.

Minus the (what I think is a) typo at the end, this largely articulates my feelings on the matter: "Shit if I'm going to give that time away." I only get each year once and there are no do-overs.

I want to put a dent in the universe with my work. But while there will be periods of long days, I will not allow that to become a permanent lifestyle. It will be the exception to the rule and it will be time boxed and for a specific reason (e.g. "We broke the production system and it has to be fixed ASAP" or "We're backlogged and working on hiring more people or reducing scope to fix that problem."). If the day comes where I'm not holding to that standard, then I will move to rectify that as quickly as possible.

Maybe that means I'll never be a founder of a venture-backed startup or an Executive Vice President of anything. If so, I'm okay with that. My nights sitting next to Katie on the couch drinking wine and binging on How I Met Your Mother are far more precious to me than either of those distinctions.

Honestly, companies, investors, and everyone else should be encouraging that ordering of priorities. My last post was on the importance of empathy in software engineering. The truth is that if I'm building software for humans to use, I'm probably going to build software more empathetically if I'm a complete, healthy human - physically, emotionally, and spiritually. At the end of the day, that translates to better software for the organization that writes my paycheck. That, in turn, translates to software that is more valuable to the customer and capable of bearing a higher price tag.

Also, I am guaranteed one thing in this life: I will affect others. Software comes and goes. It's unlikely my name will ever be listed alongside Alan Turing's for contributions to computer science. But how I impact the people who fill my life is an effect that will ripple for decades. It is my responsibility to enjoy and be present with those people and to impact them well. In the long term that will mean being a present husband, a present father, and a present friend. The challenge is that impacting people isn't something that can be put on autopilot. If I mentally check out of those relationships because work becomes too demanding, I don't stop affecting those people. I start affecting them negatively.

No paycheck or dent in the universe matters if, to achieve it, I end up hurting the people closest to me, missing out on the beauty and wonder that life has to offer, and breaking myself in the process.

Conclusion

Hopefully you've enjoyed reading this somewhat random collection of thoughts. Things are pretty exciting in my life right now and that's prompted a lot of thinking. If you made it this far, thank you for entertaining my little outlet.