One of the product management interview questions I’ve been asking recently is this: “How tall was Napoleon?” The question requires a prop: a printout of search results for *how tall was napoleon*.

There are a couple of assumptions I ask candidates to make. First, that his height is one of many that I want them to be able to determine algorithmically, so the results can all be assembled into a large table of how tall every famous statesman was. Their solution should therefore generalize. Second, their algorithm can access anything on the search results page, anything linked from the search results page, and anything else they think would be useful, so long as it can be mapped back to the search results page in some way.

The most basic answers tend to involve strategies like extracting all the numbers on the search results page and averaging them or taking their mode.
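To make that baseline concrete, here is a minimal sketch of the "grab every number and summarize" strategy. The function name `naive_height_guess` and the sample text are hypothetical, invented purely for illustration; the sketch treats every number on the page as a candidate height, which is exactly the weakness discussed next.

```python
import re
from statistics import mean, mode

def naive_height_guess(page_text: str) -> dict:
    """Treat every number on the page as a candidate height (in inches)."""
    # Pull out integers and decimals; no attempt to check what they measure.
    numbers = [float(n) for n in re.findall(r"\d+(?:\.\d+)?", page_text)]
    # mean() and mode() raise StatisticsError if no numbers were found.
    return {"mean": mean(numbers), "mode": mode(numbers)}

# Hypothetical snippet of page text, not real search results.
sample = "Born in 1769, Napoleon was 62 inches tall. Some say 62, others 67."
print(naive_height_guess(sample))
```

On this sample the birth year drags the mean far away from any plausible height, while the mode happens to land on one of the reported values.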

Just to gather the numbers, there are lots of problems that need to be solved. If you try to grab his height off a web page, how does your algorithm know which of the many numbers on that page might actually represent his height? Once your algorithm grabs a bunch of potential heights (*5’2.5*, *5 foot 2½ inches*, *5’2 1/2”*, *five-foot-two*), how does it canonicalize the many representations? How do you know the subject referenced by each height you find (*Napoleon I* versus *Napoleon II* versus *Napoleon III* versus *Napoleon Bonaparte*, etc.) is actually the one you are looking for? Eventually, you might deal with things like weighting the confidence you have in each height you retrieved based on the sites you sourced the heights from. The list goes on.
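As one illustration of the canonicalization step, the sketch below converts the four example representations above into total inches. Everything here is an assumption made for the example: the helper name `to_inches`, the small number-word table, and the single regular expression. It handles only feet-and-inches strings, and real pages would need far more cases.

```python
import re

# Hypothetical lookup table; only covers the small numbers heights need.
NUMBER_WORDS = {
    "one": 1, "two": 2, "three": 3, "four": 4, "five": 5, "six": 6,
    "seven": 7, "eight": 8, "nine": 9, "ten": 10, "eleven": 11,
}
_WORD_RE = re.compile(r"\b(" + "|".join(NUMBER_WORDS) + r")\b")
_VULGAR_FRACTIONS = {"½": " 1/2", "¼": " 1/4", "¾": " 3/4"}
_HEIGHT_RE = re.compile(
    r"(?P<feet>\d+)[\s-]*(?:'|ft\b|foot\b|feet\b)[\s-]*"  # feet and unit
    r"(?P<inches>\d+(?:\.\d+)?)?"                          # whole/decimal inches
    r"(?:\s+(?P<num>\d+)/(?P<den>\d+))?"                   # optional fraction
)

def to_inches(raw: str):
    """Return the height in total inches, or None if nothing parses."""
    # Normalize curly quotes, fraction glyphs, and number words first.
    text = raw.lower().replace("’", "'").replace("”", '"')
    for glyph, ascii_form in _VULGAR_FRACTIONS.items():
        text = text.replace(glyph, ascii_form)
    text = _WORD_RE.sub(lambda m: str(NUMBER_WORDS[m.group(1)]), text)

    match = _HEIGHT_RE.search(text)
    if match is None:
        return None
    total = 12.0 * int(match["feet"])
    if match["inches"]:
        total += float(match["inches"])
    if match["num"] and match["den"] and int(match["den"]) != 0:
        total += int(match["num"]) / int(match["den"])
    return total

for candidate in ["5’2.5", "5 foot 2½ inches", "5’2 1/2”", "five-foot-two"]:
    print(candidate, "->", to_inches(candidate))
```

Normalizing the text before matching keeps the pattern itself small. Note that the sketch says nothing about which inch is meant or whose height it is; those remain separate problems.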

But then there’s a subtlety to the problem in the case of my having chosen Napoleon: there are actually two different inch standards commonly used to report Napoleon’s height!

Napoleon is commonly considered short because he is often cited at around 5’2”. That 5’2”, however, was measured under the old French standard for an inch (*pouce*). Translated to the current inch most of us are familiar with (the English standard), he was actually closer to 5’7”. Typical search results will therefore have a smattering of data that clusters around these two separate points. Averaging the numbers from both clusters will therefore give a very wrong answer. Similarly, picking one of the two clusters by mode may well yield a very incomplete answer.
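Here is a rough sketch of both halves of that point: the unit conversion, and a crude way to notice that the data splits into two clusters. The conversion assumes roughly 2.707 cm per *pouce*, a commonly quoted approximation rather than a figure from this post, and with it 5’2” French comes out around 66 English inches. The sample heights, the helper names, and the 2-inch gap threshold are all hypothetical.

```python
from statistics import mean

CM_PER_ENGLISH_INCH = 2.54
CM_PER_FRENCH_POUCE = 2.707  # approximate value, assumed for this sketch

def pouces_to_english_inches(pouces: float) -> float:
    """Convert old French inches to modern English inches."""
    return pouces * CM_PER_FRENCH_POUCE / CM_PER_ENGLISH_INCH

def split_at_largest_gap(values, min_gap=2.0):
    """Split sorted values in two at the widest gap, if it is wide enough."""
    ordered = sorted(values)
    if len(ordered) < 2:
        return [ordered]
    # Pair each gap with the index of the value on its left side.
    gaps = [(ordered[i + 1] - ordered[i], i) for i in range(len(ordered) - 1)]
    gap, idx = max(gaps)
    if gap < min_gap:
        return [ordered]  # looks like a single cluster
    return [ordered[: idx + 1], ordered[idx + 1 :]]

# Hypothetical canonicalized heights (in "inches" of unknown standard).
heights = [62.0, 62.5, 62.0, 66.5, 67.0, 66.0]
clusters = split_at_largest_gap(heights)

print("5'2\" French in English inches:", round(pouces_to_english_inches(62), 1))
print("clusters:", clusters)
print("overall mean:", round(mean(heights), 2))
```

With this sample the overall mean lands in the gap between the two clusters, matching neither reported height. A split like this is only a signal that something is unusual about a subject; deciding that the cause is a difference in units still takes more evidence.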

**This, in my mind, is where the question can lead to a more interesting conversation.**

Things I look for in a good answer:

- Did the candidate have adequate solutions for extracting the numbers off the page, canonicalizing, etc.?
- Did the candidate realize that Napoleon might be an outlier relative to the general problem of figuring out how tall any given statesman is? If so, how would their algorithm determine this?
- If they did realize the outlier nature, did they have a solution in mind for establishing a definitive height? Alternately, can they come up with a way to present the search for this fact to end users in order to take advantage of a broader audience of problem solvers?

The latter gets to the crux of what Factual delivers. We’re trying to weave together every fact we can (and not just from the internet!) into databases that developers can feel confident about building off of and contributing back into. That means encountering variations on the Napoleon problem over and over, and figuring out how to solve them systematically. It also means considering how the presentation of data to end users might incorporate even more helpful inputs to determine fact from fiction. It’s a pretty interesting journey so far.

Incidentally, Napoleon himself gets some credit for enabling one of the better solutions to this problem: searching for his height in centimeters. It was during his rule that the metric system was adopted in France.
