
Extend Drake (Make for Data) with a Simple Clojure Project

Earlier this year we released Drake, an open source data workflow tool. It was exciting to see the interest in Drake. We were especially pleased by the quality of outside contributions made to the project (as one example: S3 support – thanks @howech!).

But we weren’t happy with how long it took us to accept these contributions. Reviewing them and including them in an “official release” took longer than we’d like.

We also worried that Drake could become bloated over time if we bundled every extension into Drake’s core.

To help fix this, we’ve added support for plugins!

As of version 0.1.4, developers can extend Drake with a simple Clojure project and publish that project to other Drake users immediately. No need to wait for us to review, merge, and release. And users can easily pick and choose which plugins they run with.

Drake’s plugin mechanism allows modular extension of 1) Drake’s filesystem and 2) Drake’s step protocols.

Drake’s filesystem layer determines how Drake handles inputs and outputs, based on their prefix. For example, Drake has a filesystem implementation that recognizes the prefix hdfs (e.g., “hdfs:/myfile.txt”), so Drake knows to interact with HDFS when dealing with such a file. Now any developer can publish a plugin that recognizes a custom prefix and teaches Drake how to interact with the corresponding filesystem.
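To make the prefix idea concrete, here is a minimal, self-contained sketch of prefix-based dispatch. It is not Drake's actual implementation: the `filesystems` registry, the keyword values standing in for real filesystem implementations, the function names, and the choice to treat unprefixed paths as local files are all assumptions made for illustration.

```clojure
(ns example.prefix-dispatch
  (:require [clojure.string :as str]))

;; Hypothetical registry mapping a prefix to a handler.
;; Keywords stand in for real filesystem implementations.
(def filesystems
  {"file" :local-filesystem
   "hdfs" :hdfs-filesystem})

(defn split-prefix
  "Splits a path such as hdfs:/myfile.txt into [prefix rest-of-path].
  Paths without a prefix fall back to file (an assumption of this sketch)."
  [path]
  (if-let [idx (str/index-of path ":")]
    [(subs path 0 idx) (subs path (inc idx))]
    ["file" path]))

(defn filesystem-for
  "Looks up the handler registered for the path's prefix."
  [path]
  (let [[prefix _] (split-prefix path)]
    (or (get filesystems prefix)
        (throw (ex-info "Unknown filesystem prefix" {:prefix prefix})))))
```

In this picture, a filesystem plugin amounts to one more entry in the registry, for example a hypothetical `"myfs"` prefix mapped to its own handler.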

Drake’s step protocols define how Drake runs the commands in any given step. The default protocol is shell, meaning step commands are run in the shell. Other built-in protocols include ruby, python and R. Now any developer can publish a plugin that defines a new custom step protocol and teaches Drake how to run commands in any step that uses that protocol.

Here’s a quick illustration of the Clojure code you’d write to support a brand new Drake protocol called myprotocol:

(defn myprotocol []
  ;; Protocol is the step protocol interface that Drake exposes to plugins
  (reify Protocol
    (cmds-required? [_] true)
    (run [_ step]
      ;; What you'd really do here is process the
      ;; step's commands in some new and awesome way
      (println "Hello Step!"))))

Publish that plugin to Clojars as a regular Clojure project, and now anyone in the Drake community can make use of your cool new protocol, e.g.:

out.txt <- in.txt [myprotocol]
 some commands
 go here
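To show how a step like this could reach the plugin code, here is a self-contained sketch that runs at a plain Clojure REPL without Drake on the classpath. Everything beyond the method names `cmds-required?` and `run` is an assumption: the `Protocol` defined here is a stand-in for the interface Drake provides, and the `protocols` registry, the `run-step` function, and the `:protocol` and `:cmds` keys of the step map are hypothetical.

```clojure
(ns example.protocol-sketch)

;; Stand-in for the interface Drake provides to plugins, declared here
;; only so the sketch runs on its own. Method names mirror the post.
(defprotocol Protocol
  (cmds-required? [this])
  (run [this step]))

(defn myprotocol []
  (reify Protocol
    (cmds-required? [_] true)
    (run [_ step]
      ;; :cmds is a hypothetical key holding the step's command lines.
      (doseq [cmd (:cmds step)]
        (println "myprotocol would run:" cmd)))))

;; Hypothetical registry from protocol name to constructor function.
(def protocols {"myprotocol" myprotocol})

(defn run-step
  "Finds the protocol named by the step and hands the step to it."
  [step]
  (if-let [ctor (get protocols (:protocol step))]
    (run (ctor) step)
    (throw (ex-info "Unknown protocol" {:protocol (:protocol step)}))))

;; Usage, modeling the workflow step above as a plain map:
;; (run-step {:protocol "myprotocol"
;;            :cmds ["some commands" "go here"]})
```

The sketch keeps lookup by name separate from execution, which is the property that lets a protocol ship in its own project.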

Documentation for creating and using Drake plugins is on the Plugins wiki page.

To see a complete example of building a plugin, take a look at the drake-echostep project, which provides a working “Hello World” style demo.

For a more real-world example, have a look at the drake-honeyql project, which demonstrates integrating a legacy Java project in order to support a new Drake step protocol.

If you’re interested in extending Drake with your own plugin and you’d like some help, feel free to contact us through Drake’s issue tracker or newsgroup.

Of course, you’re also more than welcome to publish a Drake plugin without any input from us! If you release a plugin for Drake, please add it to the list of known plugins on the Plugins wiki page.

Go make some great plugins!

Aaron Crow
Lead Software Engineer, Factual