Factual is an open data platform for application developers that leverages large-scale data aggregation and community exchange. Our focus is on making data more accessible (i.e. cheaper, higher quality, less encumbered) for machines and developers, to drive and accelerate innovation in an unprecedented way. We take on the dirty work of data management and data curation, letting developers focus on higher value and more productive tasks. We provide clean, structured data with complete source transparency to developers on liberal terms.
Factual was founded in 2007 by Gil Elbaz, co-founder of Applied Semantics (which launched the AdSense product). Applied Semantics was acquired by Google in 2003, and Gil stayed at Google until 2007, when he left to found Factual. Gil has long believed that making data accessible will enable more innovative tools and applications. To that end, he set out to develop an open data platform that maximizes data accuracy, transparency, and accessibility. He has attracted a great team to help build out his vision.
Factual offers thousands of datasets across a variety of topics (with a deep focus on local data), aggregated from multiple sources and made easily accessible for developers building web and mobile apps. For example, you will find datasets covering millions of U.S. and international local businesses and points of interest, as well as datasets on entertainment, education, government, and health. Unlike other data providers, Factual believes that data aggregation and curation should be a community effort, an approach that drastically lowers the cost of data ownership and management.
Here are a few reasons a developer might use Factual:
Our APIs are free to everyone up to a certain volume. Our download fees vary based on a variety of factors. Please see our pricing page for more details.
Factual’s datasets are open for edits, and we put incentives in place for users of the data to provide quality contributions in exchange for discounted (or free!) access. This feeds into an open data ecosystem – similar to contributing back to the code base of an open source project.
While Factual has unique tools to improve the quality of the data over time, we cannot guarantee any data table will be 100% comprehensive and accurate. We hope that with the help of the community and our powerful inferencing and curation tools, our data will become better and more reliable over time.
No. There are many ways for Factual to know that a specific fact appears on a company’s website: users often cite websites as sources of facts when they are making inputs, edits, and corrections; third-party web caches and search engines independently document the content of web pages; and, less frequently, the conclusion that a fact appears on a specific site may be verified by our own crawl. All of our methodologies abide by robots.txt and other standard practices that dictate good behavior on the web.
Factual aggregates data from many sources, including partners, user community, and the web, and applies a sophisticated machine-learning technology stack to:
1. Learn facts from millions of structured and unstructured sources
2. Clean, standardize, and canonicalize the data
3. Merge, de-dupe, and map entities across multiple sources
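To make the merge and de-dupe step concrete, here is a minimal sketch in Python. It is not Factual's pipeline (which the answer above describes as machine-learning based); it swaps in exact matching on a normalized name-and-address key, and the record fields (`name`, `address`) and source labels are hypothetical. Each merged entity keeps all of its contributing records so that a later step can still weigh the individual inputs.

```python
from collections import defaultdict

def canonical_key(record):
    # Hypothetical match key: lowercased name and address, whitespace collapsed.
    name = " ".join(record["name"].lower().split())
    address = " ".join(record["address"].lower().split())
    return (name, address)

def merge_sources(*sources):
    """Group records from several (source_name, records) pairs by match key.

    Duplicates across sources end up under the same key, each tagged with
    the source it came from.
    """
    entities = defaultdict(list)
    for source_name, records in sources:
        for record in records:
            entities[canonical_key(record)].append({**record, "source": source_name})
    return dict(entities)

partner = ("partner", [{"name": "Joe's Diner", "address": "1 Main St."}])
web = ("web", [
    {"name": "joe's  diner", "address": "1 main st."},  # same place, messier text
    {"name": "Cafe Luna", "address": "9 Oak Ave."},
])
merged = merge_sources(partner, web)
print(len(merged))  # 2 distinct entities
```

Exact key matching is deliberately simple; it misses near-duplicates such as "Street" versus "St.", which is why canonicalization has to happen before matching.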
We encourage our partners to provide edits and contributions back to the data ecosystem to reduce the overall transaction costs via exchange. A majority of our partners embrace our open model by providing edits and contributions, enriching the data for everyone.
On the surface, Factual Tables are very much like relational database tables: they're organized into rows and columns, and you can apply basic operations, such as filters and joins, to them. But Factual Tables differ from database tables in one very important way: each cell in a Factual Table can incorporate multiple inputs entered by users or extracted from the web. These inputs are used to establish a consensus value for the cell. For example, a quick web search for “Napoleon height” returns two inconsistent answers: 5'2” and 5'6½”. A Factual Table of the heights of historical figures would collect inputs on Napoleon's height from various sources, and the value displayed would be determined by a consensus algorithm. (By the way, the more accurate answer is 5'6½”. The 5'2” figure was recorded in French units, whose foot and inch (the pied and pouce) were longer than their Imperial counterparts.)
Furthermore, Factual Tables support operations at the underlying 'input' level. For example, an input filter can exclude a set of sources or other users who have been deemed unreliable, and the entire table can be re-rendered, ignoring those unwanted inputs entirely. It's a bit like a source control system that lets you view a document as it was at a past point in time, except that in addition to the time dimension, Factual can filter on user, source, and other pieces of metadata.
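As an illustration of how several inputs can resolve to one displayed value, the sketch below uses a weighted vote: each input adds its source's reliability weight to the value it asserts, and the value with the highest total wins. Factual's actual consensus algorithms are not described here; the weighted vote, the source names, and the weights are all assumptions for the example.

```python
from collections import Counter

def consensus(inputs, source_weights, default_weight=1.0):
    """Return the value with the greatest total weight across a cell's inputs.

    inputs: list of (source, value) pairs for one cell.
    source_weights: hypothetical per-source reliability scores; sources not
    listed count with default_weight.
    """
    totals = Counter()
    for source, value in inputs:
        totals[value] += source_weights.get(source, default_weight)
    if not totals:
        return None  # no inputs, no consensus
    return totals.most_common(1)[0][0]

# Hypothetical inputs for one cell: Napoleon's height.
napoleon_height = [
    ("site_a", "5'2\""),
    ("site_b", "5'2\""),
    ("encyclopedia", "5'6½\""),
    ("museum", "5'6½\""),
]
weights = {"site_a": 0.5, "site_b": 0.5, "encyclopedia": 2.0, "museum": 1.5}
print(consensus(napoleon_height, weights))  # 5'6½"
```

Here "5'2\"" totals 1.0 and "5'6½\"" totals 3.5, so the better-supported value is displayed even though both appear twice.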
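The input-level filtering described above can be sketched by keeping every raw input per cell and recomputing each displayed value with some sources excluded. This is a minimal illustration, not Factual's implementation: the table layout (a dict from `(row_id, column)` to `(source, value)` inputs), the plain majority vote, and the source names are assumptions. Filtering by user or by time would work the same way on additional input metadata.

```python
from collections import Counter

def render_cell(inputs, excluded_sources=frozenset()):
    """Majority vote over one cell's inputs, ignoring excluded sources."""
    votes = Counter(value for source, value in inputs if source not in excluded_sources)
    if not votes:
        return None
    return votes.most_common(1)[0][0]

def render_table(table, excluded_sources=frozenset()):
    """Re-render every cell from its raw inputs.

    table: dict mapping (row_id, column) -> list of (source, value) inputs.
    """
    return {cell: render_cell(inputs, excluded_sources) for cell, inputs in table.items()}

table = {
    ("row1", "phone"): [
        ("spam_bot", "555-0000"),
        ("spam_bot", "555-0000"),
        ("partner", "555-1234"),
    ],
    ("row1", "name"): [("partner", "Joe's Diner"), ("user_42", "Joe's Diner")],
}

print(render_table(table)[("row1", "phone")])                  # 555-0000
print(render_table(table, {"spam_bot"})[("row1", "phone")])    # 555-1234
```

Because the raw inputs are never overwritten, excluding a source is non-destructive: dropping it from the exclusion set restores the original rendering.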
Here are some of the things you can do with Factual Tables:
Data with a time dimension can be dealt with in a couple of ways. An appropriate field may be defined as a component of a schema, e.g. a historical table of average temperature per city per day. Or, it can be left out, as in a table which simply stores the current temperature for each city. In this latter case, it would be appropriate to limit historical inputs to recent history, or to change the aggregation function to "Wiki", which means only the last "edit" is displayed.
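The two options for a "current value" table can be compared in a short sketch. The "Wiki" name comes from the answer above; the function names, the `(timestamp, value)` input shape, and the sample readings are assumptions. `wiki_aggregate` shows only the latest input, while `recent_majority` votes only over inputs inside a time window.

```python
from collections import Counter
from datetime import datetime, timedelta

def wiki_aggregate(inputs):
    """'Wiki'-style aggregation: display only the most recent input."""
    if not inputs:
        return None
    return max(inputs, key=lambda inp: inp[0])[1]

def recent_majority(inputs, now, window):
    """Majority vote restricted to inputs no older than now - window."""
    cutoff = now - window
    votes = Counter(value for ts, value in inputs if ts >= cutoff)
    return votes.most_common(1)[0][0] if votes else None

# Hypothetical (timestamp, value) inputs for one city's current temperature.
current_temp = [
    (datetime(2012, 1, 1, 9, 0), "12C"),
    (datetime(2012, 1, 1, 9, 5), "12C"),
    (datetime(2012, 1, 2, 14, 0), "19C"),
]
now = datetime(2012, 1, 2, 15, 0)

print(wiki_aggregate(current_temp))                               # 19C
print(recent_majority(current_temp, now, timedelta(hours=6)))     # 19C
print(recent_majority(current_temp, now, timedelta(days=7)))      # 12C
```

The last line shows the problem the answer warns about: with a long window, stale readings outvote the newest one.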
You may not use the website or the API to create a copy of Factual's data. If you would like a full copy of the data you may contact us regarding a download.
Many Factual datasets can be downloaded simply by clicking on the "Download" button. Some tables may require that you contact us before downloading the data.
You can send us a CSV file with your corrections or additions and we will update the data. Write functionality will be added to our new V3 API in Q2 2012.
When new rows of data are added to a Factual Table, they typically appear instantly in the table. Making read calls through the API, requesting a fresh download of the data, or previewing the data in the Factual workbench (/t/table_id) will immediately reflect the additions. There are some exceptions to this rule on a table-by-table basis. When you correct facts in existing rows, your corrections may not be reflected in API read calls or downloads unless the Factual system determines that your fact is indeed the most correct fact for that subject.
Factual provides options for white lists, black lists, custom filtering, and malicious user reporting. Behind the scenes, we also comb for bad data regularly and try to ensure that sources of bad data carry little weight when we build the best data.
Factual does not currently support "private" fields within Factual Tables. All data in a single Factual Table is visible by anyone with access to the table.
Yes. However, for performance reasons, joins are currently impractical for some of the larger tables at Factual. In general, Factual will provide views of tables that are commonly joined together for you so that all the data you require will be in a single "view" of the data. If you don't see what you need, ask for help in our support forum.
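For small tables you have already downloaded, a join can also be done client-side on the shared entity ID. This sketch performs an inner join of two lists of row dicts; the `factual_id` column name and the sample rows are assumptions, and it assumes each ID appears at most once in the right-hand table. It is not a substitute for the pre-joined views Factual provides for large tables.

```python
def join_on_id(left, right, key="factual_id"):
    """Inner join two lists of row dicts on a shared ID column.

    Assumes `key` is unique within `right`; right-hand values win on
    column-name collisions.
    """
    index = {row[key]: row for row in right}
    return [{**row, **index[row[key]]} for row in left if row[key] in index]

# Hypothetical rows; real IDs would be UUIDs.
places = [
    {"factual_id": "id-1", "name": "Joe's Diner"},
    {"factual_id": "id-2", "name": "Cafe Luna"},
]
ratings = [{"factual_id": "id-1", "rating": 4.5}]

print(join_on_id(places, ratings))
# [{'factual_id': 'id-1', 'name': "Joe's Diner", 'rating': 4.5}]
```

Building a dict index on one side keeps the join linear in the total number of rows, which is fine for modest downloads but is exactly the kind of work that becomes impractical at the scale of Factual's largest tables.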
When you submit a correction to Factual, we weigh your correction against other data we may have for facts about the same subject. If our algorithms judge your submission to be less reliable than some of the other data we have, it may not show up. However, these things change over time, and your corrections may appear as more data arrives to corroborate them.
Different algorithms are used for each field of a Factual Table, and tables often contain algorithms that decide table-wide whether to include or exclude data based on a variety of factors. For example, tables may consider whether or not there are corroborating inputs for a fact or row, whether the provider of the data has been flagged as suspicious, etc. Also, data provided to Factual is typically transformed to a "canonical" representation before it is weighed against other data. You may give Factual "1 Main Street", which Factual may weigh as "1 Main St." to find agreement among other sources.
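A minimal sketch of the canonicalization step, assuming a tiny, hypothetical suffix table: it rewrites the final word of a street address to one standard abbreviation so that "1 Main Street", "1 Main St", and "1 Main St." compare equal. Real address standardization (for example, the USPS Publication 28 suffix list) covers far more cases, and Factual's own rules are not shown here.

```python
# Hypothetical, deliberately small suffix table.
SUFFIXES = {
    "street": "St.", "st": "St.",
    "avenue": "Ave.", "ave": "Ave.",
    "boulevard": "Blvd.", "blvd": "Blvd.",
}

def canonicalize_address(address):
    """Normalize whitespace and the trailing street suffix of an address."""
    words = address.split()
    if words:
        last = words[-1].lower().rstrip(".")
        if last in SUFFIXES:
            words[-1] = SUFFIXES[last]
    return " ".join(words)

inputs = ["1 Main Street", "1 Main St", "1  Main St."]
print({canonicalize_address(a) for a in inputs})  # {'1 Main St.'}
```

Once all three inputs share one canonical form, a consensus step sees three agreeing inputs instead of three different values.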
UUIDs (also called Factual IDs) uniquely identify entities at Factual. They are a standards-compliant implementation of globally unique identifiers. As such, you can generate them on your own or rely on Factual to generate them when submitting data to Factual. A couple of specific things about how Factual uses UUIDs make them especially useful. First, if an entity exists in more than one table, we work to ensure that the same UUID identifies that entity in each table so that the tables can be mapped or joined. Second, when you submit data without a UUID to a Factual table that identifies entities by UUID, Factual will attempt to determine whether the entity already exists at Factual and reuse the existing UUID.
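Python's standard `uuid` module generates RFC 4122 UUIDs, so generating your own IDs is a one-liner with `uuid.uuid4()`. The reuse behavior described above is sketched with a hypothetical lookup: the `factual_id` field, the name-and-address match key, and the `known_entities` dict stand in for Factual's actual entity resolution, which is not documented here.

```python
import uuid

def assign_uuid(record, known_entities):
    """Return a UUID for `record`, reusing an existing one when possible.

    known_entities: hypothetical dict mapping a match key -> UUID string.
    """
    if record.get("factual_id"):
        return record["factual_id"]  # caller supplied its own UUID
    key = (record["name"].lower(), record["address"].lower())
    if key in known_entities:
        return known_entities[key]  # entity already known: reuse its UUID
    new_id = str(uuid.uuid4())  # random (version 4) UUID
    known_entities[key] = new_id
    return new_id

known = {}
first = assign_uuid({"name": "Joe's Diner", "address": "1 Main St."}, known)
again = assign_uuid({"name": "JOE'S DINER", "address": "1 main st."}, known)
print(first == again)  # True: the second submission reuses the first UUID
```

A real matcher would canonicalize more aggressively (as in the address example above) before looking up the key.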
We are currently working on an API for discovering and downloading updates to Factual data that you have cached. Until it is ready, the only way to update your data is to re-download the dataset.
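Until an update API exists, one way to apply a fresh download to a local cache is to diff the old and new snapshots by entity ID. This sketch assumes each row carries a stable ID column (named `factual_id` here as an assumption) and that rows are plain dicts; it reports added, removed, and changed rows in ID order.

```python
def diff_snapshots(old_rows, new_rows, key="factual_id"):
    """Compare two full downloads of a table keyed by a stable row ID."""
    old = {row[key]: row for row in old_rows}
    new = {row[key]: row for row in new_rows}
    added = [new[k] for k in sorted(new.keys() - old.keys())]
    removed = [old[k] for k in sorted(old.keys() - new.keys())]
    changed = [new[k] for k in sorted(new.keys() & old.keys()) if new[k] != old[k]]
    return added, removed, changed

# Hypothetical snapshots of the same table, downloaded at different times.
old_download = [
    {"factual_id": "id-1", "phone": "555-0000"},
    {"factual_id": "id-2", "phone": "555-1111"},
]
new_download = [
    {"factual_id": "id-1", "phone": "555-1234"},
    {"factual_id": "id-3", "phone": "555-2222"},
]
added, removed, changed = diff_snapshots(old_download, new_download)
print(len(added), len(removed), len(changed))  # 1 1 1
```

Keying on the ID rather than on row position means reordered downloads do not show up as spurious changes.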
You can see documentation and code samples on the Factual developer site.
We do our best to provide outstanding uptime, but occasionally you may see periods of slower performance. We will try to announce on the developer forums when scheduled updates will result in downtime. If you need an SLA or uptime guarantees, please contact us.
Please refer to our support page.
Raw data, facts, and general ideas are not protected by copyright law. If you've legally gathered a bunch of data and post it to Factual, it is in most cases free for the world to use and build upon. If you'd like to place conditions on certain people's use of your data, or if you're contractually obligated (by someone who gave you the data) to display certain conditions, you should consider adding enforceable terms and conditions to the table you contribute. Whether or not those terms are "enforceable" is your responsibility; courts and lawyers often disagree on what makes online terms enforceable, so if you have questions you should consult a lawyer or a do-it-yourself legal resource.
Don't upload data that are confidential, restricted from publication by contract (e.g. your employment contract), or that you've obtained by illegal means (e.g. cracking a protected database). This would violate Factual's terms of service and could make you personally liable for any damages that result from your actions.
If you're thinking of uploading content (as opposed to raw data and facts) to Factual, consider that Factual is a data organization tool, not a content hosting service. Many of Factual's most exciting capabilities, like the mashing together and sorting of previously unrelated data sets, don't really make sense when applied to content. To the extent you do contribute original expression (as opposed to facts) to the site -- e.g., when you contribute editorial comments to certain data sets -- you can keep the copyright in your expression but understand that Factual and all of its users may make use of it under the Factual Terms of Service.
Finally, you should take the time to read the Factual Terms of Service. (Really!) They're human-friendly (as opposed to just lawyer-friendly) and very important to understand, as they lay out your rights, privileges, and responsibilities while using the site.
Pretty much! Most of the data on Factual is free to re-use, analyze, download, manipulate, build upon, and mash with other data. In some cases, whoever posted the table (including, in some cases, Factual itself!) may apply some usage guidelines because they're obligated by contract to other parties, or just because they want you to abide by certain uses. Where you see those special terms, it is your responsibility to abide by them. If you have any doubts or questions about it, contact us or just turn to data that's free to use -- which, by default, most of the data on Factual is.
Send us a notice that complies with the DMCA requirements and we'll follow the DMCA process faithfully.