I do not love transitive dependencies. However, they seem almost unavoidable. Consider HTTP libraries. If you're writing software in 2016, there's a decent chance it will want to talk to some other software on the Internet, and a decent chance it will use HTTP to do so.
The good news is that you don't have to keep reinventing the wheel. You have plenty of reusable libraries to choose from for this kind of communication. Pull one into your Super Cool Open Source Project™, and your project can then speak HTTP with relatively little effort on your part! Being able to reuse code in this manner is one of the major strengths of today's software world. It's also a significant weakness.
I've spent some of my free time lately developing a project I've named Giterrific. As a part of that project I've had to wrestle with some of the problems inherent in transitive dependencies and, in the process, have developed a solution I'm calling the Pluggable Dependency Pattern that I believe more open source projects should consider adopting. This blog post will outline the issues we ran into attempting to integrate Giterrific into a proof of concept project at Domino, my employer, and how I used this pattern to overcome those issues.
This pattern may not in fact be new to you. I've seen it appear sporadically in other projects. Yet in implementing it myself I learned some useful guidelines for integrating it into my projects, and became convinced this is something other projects should be discussing.
To set up the story that I'm about to tell you, it's helpful to ensure we all start with an understanding of transitive dependencies.
A transitive dependency is a dependency that exists because of transitivity. Suppose you're writing project A, and A depends on project B. In turn, B depends on project C. Therefore, C is a transitive dependency of A.
If this sounds like the transitive property from math class, you're thinking of the right thing. It works the same way!
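To make the definition concrete, here is a toy model in Scala, the language giterrific-client is written in. The graph and the `allDeps`/`transitiveOnly` helpers are hypothetical illustrations rather than part of any build tool; in practice sbt or Maven performs this resolution for you.

```scala
// A toy dependency graph: each project maps to the projects it depends on
// directly. The names mirror the A -> B -> C example above.
object TransitiveDeps {
  val direct: Map[String, Set[String]] = Map(
    "A" -> Set("B"),
    "B" -> Set("C"),
    "C" -> Set.empty
  )

  // Everything `project` depends on, directly or transitively.
  // A depth-first walk that remembers what it has already visited,
  // so a cycle in the graph cannot cause infinite recursion.
  def allDeps(project: String, graph: Map[String, Set[String]]): Set[String] = {
    def visit(p: String, seen: Set[String]): Set[String] =
      graph.getOrElse(p, Set.empty).foldLeft(seen) { (acc, dep) =>
        if (acc.contains(dep)) acc else visit(dep, acc + dep)
      }
    visit(project, Set.empty)
  }

  // Transitive dependencies are the ones the project never asked for directly.
  def transitiveOnly(project: String, graph: Map[String, Set[String]]): Set[String] =
    allDeps(project, graph) -- graph.getOrElse(project, Set.empty)
}
```

For the example above, `allDeps("A", direct)` yields B and C, and `transitiveOnly` narrows that to just C.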
Dumpster Fires, the Mother of Invention
Giterrific aims to do one thing well: expose information about private Git repositories through a simple, straightforward JSON API. It also aims to be easy to integrate with your existing code. To that end, I've provided a library named giterrific-client that provides Scala API bindings for calling out to the Giterrific server.
The abstraction here is pretty simple. You call a function I've defined that retrieves commit information for a particular Git branch. Behind the scenes, giterrific-client makes an HTTP request to your installation of Giterrific and returns the result. To accomplish this, I used my favorite HTTP abstraction: Databinder Dispatch. This was a simple 48-hour project over a weekend. The following week I handed it off to a coworker who I thought could make use of it at Domino.
Can you guess what happened?
If you guessed dumpster fire, then you guessed correctly.
At Domino, we're using the Play Framework for a lot of our development work - much like a lot of other Scala shops. The Play Framework version we're using depends on a library called "Async HTTP Client." It's a Java library that makes speaking HTTP easier.
For giterrific-client, I used Databinder Dispatch to make speaking HTTP easier. Dispatch, in turn, also uses Async HTTP Client. The problem: Dispatch depends on a different version of Async HTTP Client than Play does, and the two versions are completely incompatible. With the way I'd architected giterrific-client, there was no way any relatively recent Play application could use it.
This is why transitive dependencies are so obnoxious. It's trivially easy on any project to end up with conflicting dependencies. To put it another way, consider the following example:
- Your project A1 depends on B1 and C1.
- B1 depends on X1.
- C1 depends on X2.
- A1 therefore depends on X1 and X2.
- X1 and X2 will attempt to occupy the exact same space (on the JVM, the same class names on the classpath) and conflict. As a result, either B1 or C1 will not behave as intended.
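The conflict in that list can be modeled the same way. This is a hypothetical sketch, assuming each module is just a name plus a version number, so X1 and X2 become versions 1 and 2 of a single artifact `X`. Real resolvers work with full coordinates and their own conflict-resolution rules, but the shape of the problem is the same.

```scala
// Hypothetical model: a module is a name plus a version number.
final case class Module(name: String, version: Int)

object ConflictCheck {
  val a1 = Module("A", 1)
  val b1 = Module("B", 1)
  val c1 = Module("C", 1)
  val x1 = Module("X", 1)
  val x2 = Module("X", 2)

  // A1 -> {B1, C1}, B1 -> X1, C1 -> X2, exactly as in the list above.
  val graph: Map[Module, Set[Module]] = Map(
    a1 -> Set(b1, c1),
    b1 -> Set(x1),
    c1 -> Set(x2)
  )

  // Every module reachable from `root`, not counting root itself.
  def reachable(root: Module): Set[Module] = {
    def visit(m: Module, seen: Set[Module]): Set[Module] =
      graph.getOrElse(m, Set.empty).foldLeft(seen) { (acc, dep) =>
        if (acc.contains(dep)) acc else visit(dep, acc + dep)
      }
    visit(root, Set.empty)
  }

  // Artifacts requested at more than one version. On a flat JVM classpath
  // only one version of a class can be loaded, so one request will lose.
  def conflicts(root: Module): Map[String, Set[Int]] =
    reachable(root)
      .groupBy(_.name)
      .map { case (name, mods) => name -> mods.map(_.version) }
      .filter { case (_, versions) => versions.size > 1 }
}
```

Running `ConflictCheck.conflicts(ConflictCheck.a1)` reports that `X` is wanted at both version 1 and version 2, which is the situation A1 has no direct say over.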
So what do we do to fix this problem? Typically, in this scenario project A1 gets no say in what B1 or C1 pull in transitively. However, if it could, it could easily tell B1: "Hey, I want you to use X2 instead of X1, and here's how you talk to it."
To resolve the issue, I tweaked the architecture of Giterrific to use what I'm calling The Pluggable Dependency Pattern.
I created an interface I've named `HttpDriver`. Previously, Giterrific's client code would talk directly to Dispatch when it wanted to make an HTTP request. Now, it'll talk to an `HttpDriver`. By default, this will still talk to Dispatch under the hood, but it can be configured to talk to whatever HTTP library you'd prefer to use.
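A minimal sketch of the idea, assuming a driver only needs to perform a GET and hand back the response body. The method signature, `GiterrificClient`, `commitsFor`, and the URL path are all hypothetical; the real giterrific-client API may look different.

```scala
import scala.concurrent.Future

// Hypothetical shape of the abstraction: the client only needs
// "GET this URL and give me the body back, eventually".
trait HttpDriver {
  def get(url: String): Future[String]
}

// Hypothetical client: it never mentions Dispatch, Play, or Finagle.
// It only knows about whichever HttpDriver it was handed.
class GiterrificClient(baseUrl: String, driver: HttpDriver) {
  // Illustrative endpoint path only, not Giterrific's real route.
  def commitsFor(repo: String, branch: String): Future[String] =
    driver.get(s"$baseUrl/repos/$repo/commits/$branch")
}
```

Because the client receives its driver through the constructor, a test can hand it a stub that returns `Future.successful(...)` and never touch the network.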
For Giterrific 0.2.0, I ship four pre-built drivers: the default Dispatch driver, a driver for Play 2.4, a driver for Play 2.5, and a driver for Finagle HTTP. If one of them doesn't work for you, it's quite trivial to swap out which driver is being used, or to implement your own `HttpDriver` from scratch. Since making this change, we've successfully integrated a Giterrific POC into our Play application at Domino with minimal fuss.
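To illustrate swapping drivers, here are two hypothetical implementations of the same assumed `get`-only trait: one built solely on the JDK's `HttpURLConnection`, and one that returns canned responses for tests. Neither is one of Giterrific's shipped drivers; they only show that the calling code stays the same when the driver changes. The trait is repeated so the snippet compiles on its own.

```scala
import java.net.{HttpURLConnection, URL}
import scala.concurrent.{ExecutionContext, Future}
import scala.io.Source

// Same hypothetical trait as in the earlier sketch.
trait HttpDriver {
  def get(url: String): Future[String]
}

// A driver that needs no HTTP library at all, only the JDK.
// The blocking I/O runs inside a Future on the supplied execution context.
class JdkHttpDriver(implicit ec: ExecutionContext) extends HttpDriver {
  def get(url: String): Future[String] = Future {
    val conn = new URL(url).openConnection().asInstanceOf[HttpURLConnection]
    try {
      conn.setRequestMethod("GET")
      val source = Source.fromInputStream(conn.getInputStream, "UTF-8")
      try source.mkString finally source.close()
    } finally conn.disconnect()
  }
}

// A canned-response driver: handy in unit tests, no network required.
class CannedHttpDriver(responses: Map[String, String]) extends HttpDriver {
  def get(url: String): Future[String] =
    responses.get(url) match {
      case Some(body) => Future.successful(body)
      case None       => Future.failed(new NoSuchElementException(s"no canned response for $url"))
    }
}
```

Swapping is then just a matter of which instance you construct and pass along; nothing downstream of the trait needs to know.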
The caveat is that the underlying HTTP libraries are no longer declared as transitive dependencies. A project that wants to use Giterrific has to pull in Dispatch, Finagle, or Play itself.
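In sbt, one way a library can compile a driver against Dispatch without forcing Dispatch on its users is to mark that dependency as `provided`. This is a sketch, not necessarily how Giterrific's build is set up, and the version number is only illustrative.

```scala
// build.sbt of a library following the pattern (sketch).
libraryDependencies ++= Seq(
  // Needed to compile the bundled Dispatch driver, but scoped "provided"
  // so it is NOT passed on transitively to downstream projects.
  "net.databinder.dispatch" %% "dispatch-core" % "0.11.3" % "provided"
)
```

The consuming application then adds `dispatch-core` (or Play WS, or Finagle) to its own `libraryDependencies`. If it selects a driver whose library it never added, the failure shows up at runtime as a missing class rather than at compile time.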
From Transitivity to Pluggability
Surely, making everything pluggable in every open source project would become very annoying. Transitive dependencies are a huge convenience, right up until they blow up in some unexpected way. Arguably, this problem can be avoided by using a microservice architecture instead of a monolithic one. Even so, I think there's room for reasonable people to disagree about "the best fit" for a given underlying service.
Some folks make heavy use of core Finagle features throughout their architecture, for example. If they benefit from that and want to use Giterrific, why should my personal preference for Databinder Dispatch handicap them?
I think pluggability works especially well in a few specific cases. Namely, cases where:
- The task you're trying to accomplish is fairly common with a lot of angles.
- The task you're trying to accomplish is sufficiently complex to warrant concern over what transitive dependencies you'll pull in.
Two areas strike me as ripe for pluggability: HTTP interaction and JSON serialization/deserialization. Both tend to be common tasks with different angles to them (which socket library gets used, which reflection method the serialization algorithm uses), and both can require complex dependencies in their own right to accomplish their end goal.
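The same shape carries over to JSON. A hypothetical sketch, assuming the library only ever needs to write a `Commit`: the library defines a minimal `JsonDriver`, and a user could back it with whichever JSON library they already depend on. The tiny hand-rolled implementation here stands in for a real one.

```scala
// Hypothetical JSON counterpart to HttpDriver.
final case class Commit(sha: String, message: String)

// The library declares only the operation it actually needs.
trait JsonDriver {
  def writeCommit(commit: Commit): String
}

// Stand-in implementation. It escapes only backslashes and double quotes;
// a real JSON library also handles control characters and Unicode escapes.
object HandRolledJsonDriver extends JsonDriver {
  private def quote(s: String): String =
    "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\""

  def writeCommit(commit: Commit): String =
    s"""{"sha":${quote(commit.sha)},"message":${quote(commit.message)}}"""
}
```

A user already on, say, Play JSON or json4s would write a few lines implementing `JsonDriver` with that library instead, and the core project would never need to depend on it.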
Furthermore, I think it's important that a project wishing to be pluggable keep its abstraction for pluggability self-contained. For example, based on my current thinking, I'm unlikely to ever make `HttpDriver` its own library. Doing so would just move the transitive dependency risk to a different link in the chain. Instead, each project I implement this pattern in will have its own `HttpDriver` that defines the minimum interface that project needs in order to speak HTTP.
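A sketch of what "self-contained" means in practice, using hypothetical names throughout: two unrelated libraries each declare their own minimal driver trait, and the application writes one adapter that satisfies both on top of whatever HTTP stack it already uses. The function parameters stand in for that real stack (Play WS, Finagle, Dispatch, and so on).

```scala
import scala.concurrent.Future

// Hypothetical library 1: only needs GET.
object GiterrificLib {
  trait HttpDriver { def get(url: String): Future[String] }
}

// Hypothetical library 2: only needs POST, so its trait is just that.
object OtherLib {
  trait HttpDriver { def post(url: String, body: String): Future[Int] }
}

// The application owns the single real HTTP stack and adapts it to both
// traits. `fetch` and `send` are placeholders for calls into that stack.
class AppHttpAdapter(fetch: String => String, send: (String, String) => Int)
    extends GiterrificLib.HttpDriver with OtherLib.HttpDriver {
  def get(url: String): Future[String] = Future.successful(fetch(url))
  def post(url: String, body: String): Future[Int] = Future.successful(send(url, body))
}
```

Because neither trait lives in a shared artifact, there is no common `HttpDriver` version for the two libraries to disagree about; the only code that knows about both is the application's adapter.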
This was a useful pattern for fixing a real problem I ran into. As always, your mileage may vary. Either way, I would love to hear what you think. Reach out to me on Twitter at @farmdawgnation and let me know. I'll update this post with any interesting exchanges that result.