MoviePass is huge in our engineering department. We love movies. We love schemes. We’re walking distance from the movie theater in the newly refurbished Century City mall. It’s all a movie match made in heaven (or was).
So we’re seeing a ton of movies. And that means we’re constantly learning about how, at the end of the day, it’s what’s on the inside that matters the most — whether we’re talking about a stuffed bear, strangely attractive fish monster, or strangely attractive prince monster.
And it’s a lesson that we’re learning on the engineering team here at Factual, too.
For the past five years, we relied on a data system that gave us lots of scale and automation, which is exactly how our scrappy team can maintain such high-quality, fresh data across 52 countries, 130MM POIs, and 475 categories. That was great, but we needed finer-grained tools that would let us balance that scale with precise, targeted improvements.
Late last year, our places team shipped a top-to-bottom overhaul of our Global Places data processing pipeline. Externally, not much has changed: we still have the same great, fresh global POI dataset. But on the inside, we’ve completely changed how we think about and build data. And unlike a marmalade sandwich Paddington forgets outside, this change won’t go stale anytime soon.
We call our new processing pipeline Neutronic: a big data processing platform built by customizing HBase’s snapshot capability so that it gives us Git-like flow controls over our big data (there’s also a healthy amount of Solr, Elasticsearch, and Kafka thrown in). It’s “Git for big data,” as we like to say.
Neutronic has changed the way we build our places datasets. It’s changed how we think about our datasets. It’s a big deal. To break it down further, here are the three ways it’s changing our data game:
- Neutronic is independent – Neutronic borrows the idea of branches from Git — allowing engineers and teams to run workflows that change, mutate, update, and refresh our data independently from each other. Whereas all changes had to be packaged together into a single release in the past, Neutronic will allow our category team to ship changes in isolation from our machine learning teams’ experiments.
- Neutronic can be incremental – Our old data system forced us to adopt an “all aboard” approach to updating our data. When we ran workflows, not only did every team have to contribute its changes to a single release, but our default workflows ran for all releases. Now, if we just need to update a chain to keep it perfectly accurate, we can do that and no more!
- Neutronic is flexible – These days, we have a pretty good idea of what we do well here at Factual (hint: it’s location data). That focus allowed us to build Neutronic to specifically attack the problem of aggregating, cleaning, and optimizing an enormous POI dataset. We were able to sneak in tons of features that will allow us to more naturally express things like relationships between places and the different ways to geographically represent a building. It’s like pouring a foundation when you know exactly what kind of house you want to build.
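To make the branch idea a little more concrete, here is a toy, in-memory sketch of branch-style isolation and incremental merging. It is only an illustration under our own assumptions, not how Neutronic or HBase snapshots actually work: the `Branch` class, the `merge` function, and the sample place records are all hypothetical names invented for this example.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional

Record = Dict[str, str]


@dataclass
class Branch:
    """Hypothetical branch: a shared base snapshot plus a private overlay of edits."""
    name: str
    base: Dict[str, Record]  # shared snapshot, never mutated by the branch
    overlay: Dict[str, Record] = field(default_factory=dict)

    def get(self, place_id: str) -> Optional[Record]:
        # Branch-local edits shadow the base snapshot.
        if place_id in self.overlay:
            return self.overlay[place_id]
        return self.base.get(place_id)

    def update(self, place_id: str, **changes: str) -> None:
        # Copy-on-write: only the touched record is copied into the overlay.
        current = dict(self.get(place_id) or {})
        current.update(changes)
        self.overlay[place_id] = current


def merge(main: Dict[str, Record], branch: Branch) -> Dict[str, Record]:
    """Return a new snapshot in which only the branch's touched records are replaced."""
    merged = dict(main)
    merged.update(branch.overlay)
    return merged


# Made-up sample data for illustration only.
main = {
    "p1": {"name": "Cafe Uno", "category": "Restaurant"},
    "p2": {"name": "Acme Gas", "category": "Gas Station"},
}

# Two teams work on separate branches of the same snapshot.
categories = Branch("category-fixes", main)
experiment = Branch("ml-experiment", main)

categories.update("p1", category="Coffee Shop")
experiment.update("p2", name="ACME Gas")

# Ship the category fix alone; the experiment stays on its own branch.
released = merge(main, categories)
```

In this sketch each branch only stores the records it touched, so shipping the category fix rewrites one record and leaves the experiment's edits (and everything else) alone. That is the "independent" and "incremental" behavior described above, shrunk down to a couple of dictionaries.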
So we’ve upgraded our tools big time, but that’s probably only visible from the inside (at least, until we publish this post). Neutronic gives us more flexibility to improve our data, frees up development time to work on new features, and shortens iteration time. As a partner or customer of Factual, you can expect us to become even better at keeping our POI dataset accurate and up-to-date, so that it is ultimately a better reflection of the physical world.
And that’s our key job right now: to translate these internal improvements into undeniable external results. Doing that will prove once again what we’ve come to learn from stuffed bears, shut-in princes, and sensual fish monsters: it’s what’s on the inside that really matters.
By the way, if you’re interested in coming to work on our data or infrastructure, check out our job listings! Neutronic may have shipped, but we aren’t done improving it. To extend the Git metaphor, we’re in the process of building “continuous integration for data” to sit on top of our “Git for data,” among many, many other projects.