Why the Data team loves Clojure

Where we use Clojure

Here at Sakay, the Data team’s tools are primarily written in Clojure (and ClojureScript). The biggest codebase written in this language is hOTPot. What exactly is hOTPot? If you have experience working with transport data, the spelling may have already given you a hint.

hOTPot is our API proxy: it normalizes results from OTP (OpenTripPlanner, hence the name hOTPot), adds additional information related to the route, and orchestrates the flow of this data. Whenever you search for a trip from one place to another, the client makes a request to the server, and that server is hOTPot. hOTPot receives the request, calculates and transforms the data, routes it to the appropriate components, and returns the itinerary to the client.

In the midst of all these actions and calculations, one important thing to highlight is that what flows all throughout is plain ol’ data: specifically, Clojure’s common data structures like maps, lists, and sets. No classes, objects, structs, or the like. This drastically simplifies things, because that data is easy to manipulate. Pair it with REPL-driven development, and the data is literally at your fingertips at every step of the transformation.
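To make “plain ol’ data” concrete, here is a minimal sketch of a transformation step built only from core functions on maps. The itinerary shape and its keys (:legs, :mode, :duration-min, :fare) are hypothetical, chosen for illustration, and are not hOTPot’s actual schema.

```clojure
;; Hypothetical, simplified itinerary: just maps, vectors, strings, and numbers.
;; These keys are illustrative assumptions, not hOTPot's real schema.
(def itinerary
  {:legs [{:mode "WALK" :duration-min 5}
          {:mode "BUS"  :duration-min 20 :fare 13}
          {:mode "WALK" :duration-min 3}]})

(defn tag-transit-legs
  "Marks each leg with :transit? (true for any mode other than WALK)."
  [itin]
  (update itin :legs
          (fn [legs] (mapv #(assoc % :transit? (not= "WALK" (:mode %))) legs))))

(defn add-totals
  "Adds total duration and total fare computed from the legs."
  [itin]
  (let [legs (:legs itin)]
    (assoc itin
           :total-duration-min (reduce + (map :duration-min legs))
           ;; legs without a :fare key count as 0 via get's default value
           :total-fare (reduce + (map #(get % :fare 0) legs)))))

;; Each step is a plain function from map to map, so they thread naturally.
(def enriched (-> itinerary tag-transit-legs add-totals))
```

Because each step is just a function from map to map, any intermediate value can be evaluated in the REPL and read as-is.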

REPL-Driven Development

One of Clojure’s strengths is REPL-driven development (hereafter RDD). Put succinctly, it’s a set of tools and practices for programming that emphasizes fast, rich feedback. It encourages working on small bits of code, getting realtime feedback on their output, iterating based on that output, and repeating the cycle.

A big enabler for this is your dev tooling, which has also often been a blocker for people getting into Clojure. For a long time the best free option was Emacs, which meant that most people had to learn not only a new language but also a new text editor. The easier option was Cursive, but it wasn’t, and still isn’t, free for paid work.

Fortunately, there is now Calva, which makes VS Code one of the best Clojure and ClojureScript editors out there. Its website has excellent documentation that makes setup a breeze, and it even provides a Getting Started project that introduces you to the Clojure language and RDD.

One thing that’s so great about Calva is how it views REPL-driven development:

Mainly, I think Stuart Halloway is right about the REPL being best used from inside the files you are editing rather than from the prompt. It doesn’t mean that Calva’s REPL window should be neglected, but efforts should be directed such that the file editor REPL is our first way to improve the experience. Expect the Calva REPL window to get much less “in your face”, than it is today, as the editor REPL gets stronger.

In practice, this means support for rich comments. There’s an excellent video that gives you an idea of the workflow: Peter Strömberg, the creator of Calva, walks through the FizzBuzz problem using Clojure and RDD. The main takeaway is that it’s like having a souped-up Go Playground in your editor, in the exact file you want to change. Benefits include less context switching and access to your locally defined vars and functions. There is less need to play computer in your head, because you get instant feedback from the REPL instead, which lets you rapidly iterate on your problem.
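As a minimal sketch of the workflow (not the code from the video), here is a FizzBuzz function followed by a rich comment. The comment macro discards its body and returns nil, so nothing inside it runs when the file is loaded, but you can evaluate each form in place from the editor.

```clojure
(defn fizzbuzz
  "Returns \"Fizz\", \"Buzz\", \"FizzBuzz\", or the number itself as a string."
  [n]
  (cond
    (zero? (mod n 15)) "FizzBuzz"
    (zero? (mod n 3))  "Fizz"
    (zero? (mod n 5))  "Buzz"
    :else              (str n)))

;; Rich comment: evaluate each form from the editor while iterating.
(comment
  (fizzbuzz 3)   ;; => "Fizz"
  (fizzbuzz 10)  ;; => "Buzz"
  (fizzbuzz 15)  ;; => "FizzBuzz"
  (map fizzbuzz (range 1 16))
  )
```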

For example, let’s say you’re thinking of using mod in a function, but you’re not sure exactly how it works. No need to context switch to a browser and search. No need to ask your coworker beside you. Just ask the REPL.
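For instance, a rich comment like the one below lets you check how mod treats negative numbers, and how it differs from rem, by evaluating each form in place. The results noted in the comments follow Clojure’s documented behavior: mod truncates toward negative infinity, so its result takes the sign of the divisor, while rem takes the sign of the dividend.

```clojure
;; "Asking the REPL" about mod: evaluate each form in place.
(comment
  (require '[clojure.repl :refer [doc]])
  (doc mod)    ;; prints the docstring
  (mod 10 3)   ;; => 1
  (mod -10 3)  ;; => 2, sign of the divisor
  (rem -10 3)  ;; => -1, sign of the dividend
  (mod 10 -3)  ;; => -2
  )
```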

REPL-Driven Debugging

Now, while RDD is great once you understand it, there is quite a learning curve. It took a while to click for me because it wasn’t easy to find material on REPL-driven development in real-world projects, especially with regard to debugging. Toy projects? Easy; who debugs toy projects anyway? Real-world projects? It was “let’s just console.log all the things” for a while, until I came across this article and the concept of inline def debugging.

In a nutshell, since vars created with def are global to the current namespace, you have access to the value that was last bound to them. This makes inline def perfect for RDD debugging: no need to switch to the browser to see what the last console.log statement printed. It’s also perfect for debugging inputs that are hard to simulate manually in the REPL.
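Here is a minimal sketch of the technique, using a hypothetical summarize-feed function and feed shape (neither comes from our codebase). The def inside the function body interns a namespace-level var named feed; each call rebinds it to the latest argument, while the rest of the body still sees the local parameter, because locals shadow vars.

```clojure
;; Hypothetical function whose real inputs are awkward to build by hand.
(defn summarize-feed
  [feed]
  ;; Inline def: capture the latest argument in a namespace-level var `feed`.
  ;; The init expression `feed` refers to the local parameter.
  (def feed feed)
  {:route-count (count (:routes feed))
   :stop-count  (count (:stops feed))})

;; After the app (or anything else) calls summarize-feed,
;; evaluate `feed` to see exactly what was passed in.
(comment
  (summarize-feed {:routes [:r1 :r2] :stops [:s1 :s2 :s3]})
  feed
  )
```

Inline defs are a debugging aid, so it’s worth removing them before committing.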

For example, I recently pushed out a feature in one of our Data tools that turned out to break something: clicking “Export” for certain types of transport feeds no longer produced a .json file for download. The console showed this error:

Inline def debugging to the rescue! I referred to the relevant lines of code in the error, and arrived at this function:

For those unfamiliar with Clojure, update calls the accumulate-passthroughs function on the value bound to the :waypoints key of a map. Since the value under :waypoints was what was being transformed, that’s what I captured with the inline def expression:
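A hypothetical sketch of how an inline def can be wedged into an update call like this one without restructuring the surrounding code. Here prepare-export and the body of accumulate-passthroughs are stand-ins invented for illustration; the real functions aren’t reproduced. Since (update m k f) returns a new map whose value under k is replaced by (f old-value), the function passed to update is a natural place to capture waypoints.

```clojure
;; Hypothetical stand-in for the real accumulate-passthroughs:
;; it just counts the points in each waypoint's :shape.
(defn accumulate-passthroughs
  [waypoints]
  (mapv #(assoc % :point-count (count (:shape %))) waypoints))

;; Hypothetical export step that transforms the :waypoints value of a map.
(defn prepare-export
  [feed]
  (update feed :waypoints
          (fn [waypoints]
            ;; Inline def: keep the last :waypoints value for REPL inspection.
            (def waypoints waypoints)
            (accumulate-passthroughs waypoints))))
```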

Next, I click “Export” in the Data tool GUI so that the function runs and waypoints gets bound. If I then enter waypoints in my REPL, it returns the value last bound to it:

I won’t go into further detail with regard to the problem’s scope and context, but the cause of the error was a missing :shape key in the second object. What I want you to focus on is that debugging this was easy, because I could quickly see the input being passed with the help of inline def debugging.
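Once the captured value is in the REPL, finding the culprit can be a one-liner. The sample data and the missing-key-indexes helper below are hypothetical; they only mirror the situation described, a map without a :shape key.

```clojure
;; Hypothetical sample shaped like a captured `waypoints` value.
(def waypoints
  [{:id "a" :shape [[0 0] [1 1]]}
   {:id "b"}
   {:id "c" :shape [[2 2]]}])

(defn missing-key-indexes
  "Returns the indexes of the maps in coll that lack key k entirely."
  [k coll]
  (keep-indexed (fn [i m] (when-not (contains? m k) i)) coll))

(comment
  (missing-key-indexes :shape waypoints) ;; => (1)
  (remove :shape waypoints)              ;; => ({:id "b"})
  )
```

Note the difference: remove with the :shape keyword also catches waypoints whose :shape is present but nil, while contains? only flags a key that is absent.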

You may even want to continue working with that waypoints variable to fix the issue. Since you already have sample data for an error-causing input, you can quickly spin up a rich comment and interactively work with that data.
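A sketch of such a rich comment, assuming the waypoints var from the inline def is already bound. The ensure-shape function is a hypothetical candidate fix for illustration only, not the fix that was actually shipped.

```clojure
;; Hypothetical candidate fix: default any missing :shape to an empty vector.
;; merge keeps the waypoint's own :shape when it has one.
(defn ensure-shape
  [waypoints]
  (mapv #(merge {:shape []} %) waypoints))

(comment
  ;; 1. The captured input that triggered the error (from the inline def).
  waypoints
  ;; 2. Which waypoints lack :shape?
  (remove :shape waypoints)
  ;; 3. Try the candidate fix and confirm every waypoint now has a :shape.
  (every? :shape (ensure-shape waypoints))
  )
```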

Here’s what a real-world rich comment may look like. This one served both as documentation and as a record of iterating through a problem:

Now when another dev comes across the code, they can just call each expression to figure out how and why certain parts of the code exist. They don’t need to ask a person…they can just ask the REPL.

Additional Resources

I hope that gave you some insight into why Clojure and RDD can be such a great environment to work in. If you’d like to start learning Clojure, here are some great (free and not free) resources that may be of help: