
Perfect Data Does Not Exist

In today’s high-achieving, Insta-worthy culture, the quest for perfection is more palpable than ever. Whether it’s obtaining the “perfect” look, attending the top schools and then working at only the “top” firms, or being the model employee, we are encouraged to strive for perfection. However, true perfection is an illusion, and beauty (or perfection) is in the eye of the beholder. The same expectations of perfection are often applied to location data.

This is especially true when businesses seek POI (point of interest) location datasets to power products or drive business intelligence and analytics. The demand is for a location dataset that is always an exact replica of the physical world, consistently reflecting every change that happens in the brick-and-mortar space. It is also expected to suit every individual use case exactly. However, much like in our quest for a picture-perfect life, there are unique obstacles in creating and measuring the quality of a “perfect” location dataset.

We at Factual are constantly thinking about data quality and how to weave it into our processes. This is the first piece in a series addressing data quality. Let’s start by understanding marketplace challenges and the different methods available to measure data quality.

UNDERSTANDING MARKETPLACE CHALLENGES
There are three primary challenges that data providers face when building high-quality datasets.

1. Recency

Brick-and-mortar spaces have become more volatile and movable, with changes in the business landscape occurring more quickly than ever. From pop-up shops, to co-working spaces, to restaurant churn, POIs can change in what seems like the blink of an eye. It can be difficult to capture each of these changes in real time because public records, social sites and apps, and even a company’s own website often lag behind what has happened on the ground.

2. Universal standards & availability of data

While there is some standardization in the format of place records (address, phone number, etc.) in Western countries, there is no universal standard that all countries adhere to when recording POIs. Each country has its own format, and in some cases, particularly in emerging countries, there is little or no regulation of format at all. Additionally, a country’s level of development (technological and otherwise) can affect how much place record data is available for aggregation. Countries where digital signals are fewer and more scattered (e.g., India) pose an inherent challenge for POI mapping.

This leads to the third and overarching challenge:

3. Perfect data does not exist

The accuracy and availability of data from any source depend on how quickly companies are able to update their records. This requires constant iteration on a dataset that may never be considered “finalized.” The result is a lack of absolute ground truth: there is no “gold” set of data that is updated in real time and accurate at all times. While various sources may serve as proxies for “gold” data and benchmarking, there is no practical way to gather and maintain absolute ground truth for worldwide POI data.

DATA DRIVEN APPROACH TO QUALITY
With these challenges in mind, Factual has taken a methodical approach to measuring data quality. It is vital to understand our data’s strengths and weaknesses in order to ensure that the dataset we build is high quality. To do this, Factual uses two high-level concepts: Comprehensiveness and Accuracy. Both are intrinsic to our data-building process and are applied at both the POI entity level and the POI attribute level. They are defined as:

  • POI Comprehensiveness: how much of the real world our dataset covers, or the percent of real world POIs that are found in our dataset.
  • POI Accuracy: how precise our records are, or what percent of our records are real places.
  • Attribute Comprehensiveness: for our records that correspond to real places, what percentage have a value for a given attribute (e.g., address)?
  • Attribute Accuracy: for our records that correspond to real places and that have a value for an attribute (e.g., address), what percentage have the correct value?
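
To make the four definitions above concrete, here is a minimal Python sketch of how they could be computed against a small, hand-verified sample of places. This is only an illustration, not a description of Factual’s actual pipeline: the record layout, the `place_id` field, the function names, and the sample data are all hypothetical, and the verified sample stands in for a ground truth that, as noted above, does not exist in absolute form.

```python
# Illustrative sketch only: field names, function names, and data are hypothetical.

def ratio(numerator, denominator):
    """Return numerator / denominator, or 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0

def real_records(records, verified):
    """Records whose place_id matches a place in the verified sample."""
    return [r for r in records if r["place_id"] in verified]

def poi_comprehensiveness(records, verified):
    # Share of verified real-world places that appear in the dataset.
    covered = {r["place_id"] for r in real_records(records, verified)}
    return ratio(len(covered), len(verified))

def poi_accuracy(records, verified):
    # Share of dataset records that correspond to a real place.
    return ratio(len(real_records(records, verified)), len(records))

def attribute_comprehensiveness(records, verified, attr):
    # Among records for real places, share that have any value for attr.
    real = real_records(records, verified)
    return ratio(sum(1 for r in real if r.get(attr)), len(real))

def attribute_accuracy(records, verified, attr):
    # Among records for real places that have a value, share with the correct value.
    with_value = [r for r in real_records(records, verified) if r.get(attr)]
    correct = sum(1 for r in with_value if r[attr] == verified[r["place_id"]][attr])
    return ratio(correct, len(with_value))

# Hypothetical verified sample: four real places with known addresses.
verified = {
    "a": {"address": "1 Main St"},
    "b": {"address": "2 Oak Ave"},
    "c": {"address": "3 Pine Rd"},
    "d": {"address": "4 Elm St"},
}

# Hypothetical dataset: one wrong address, one missing address,
# one record ("x") that matches no real place, and place "d" missing entirely.
records = [
    {"place_id": "a", "address": "1 Main St"},
    {"place_id": "b", "address": "9 Wrong St"},
    {"place_id": "c", "address": None},
    {"place_id": "x", "address": "5 Ghost Ln"},
]

print(poi_comprehensiveness(records, verified))                   # 3 of 4 real places covered
print(poi_accuracy(records, verified))                            # 3 of 4 records are real
print(attribute_comprehensiveness(records, verified, "address"))  # 2 of 3 real records have an address
print(attribute_accuracy(records, verified, "address"))           # 1 of 2 addresses is correct
```

Note how the denominators narrow from one metric to the next: attribute comprehensiveness only counts records that are real places, and attribute accuracy only counts those that also have a value, which mirrors the way the definitions are phrased.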

By using measures such as comprehensiveness and accuracy, Factual applies a data-driven approach to our data-building process. These two primary metrics (in addition to others) are at the core of how we build and measure the quality of our data.

Interested in learning more about Factual’s global data? Contact our team of location experts today.
