Updates to the Factual iOS SDK

We are pleased to announce an updated release of our officially supported SDK for iOS! This update adds support for the many new API features Factual offers. Here are some of the main updates:

New API support:

Resolve: starts with incomplete data for an entity, and returns potential entity matches
Facets: returns row counts for Factual tables, grouped by facets of data
Geopulse: provides point-based access to geographic attributes
Reverse Geocode: converts coordinates into addresses
Monetize: finds affiliate deals for places in Factual’s Global Places database
Match: maps your data to Factual places
Submit: submits edits to existing rows, or submits new rows of data to Factual tables
Flag: flags problematic rows in Factual tables

Opened up the source code (accessible via the main project on GitHub).
Updated the sample iOS project to use this new SDK and illustrate new features.

Under the hood, we’ve also made the following improvements:

Removed dependency on OAConsumer Library.
Switched to Automatic Reference Counting.

Simply follow the directions on our GitHub page to get the latest version and start using it in your project.

Eva Ho Speaking at DataWeek 2012

Eva Ho, our VP of Marketing and Business Operations, will be speaking tomorrow at DataWeek on a panel titled “Challenges of Building on Geo Data”. She will be joined by Ben Standefer (Urban Airship), Ankit Agarwal (Micello), and Peter Davie (TomTom). The panel runs from 12:00 pm to 12:40 pm in the SPUR Urban Center, 4th floor, Room 2. The session description is:

New US Release & Enhancements to Global Places

We are very pleased to announce a number of significant enhancements to our US Places dataset, soon to be followed by similar improvements to the rest of the world.

Enrichments to US Data

We’ve added a boatload of new entities to the US including 80K landmarks (parks, memorials, historic buildings, and other monuments), 25K transport hubs (airports, rail stations and a handful of ports), and 190K new ATM locations. We’ve also included over 50 million additional references and edits from our partners to improve both coverage and accuracy. This brings us up to just over 23 million entities in the US alone, and over 63 million places in 50 countries worldwide.

Category Enhancements Globally

Our categories have taken an increasingly central role in the distribution and management of our data, so we’ve made our categorization framework more friendly to humans and more efficient for machines. These improvements include:

50 new categories for better, more granular classification
Numeric category IDs for more structured search and data management
Category translations available in Italian, German, French, Spanish, Korean, Japanese, and Chinese

We’ve made the entire category hierarchy available as a Factual table so you can query it in all languages, and also published it as a JSON file on GitHub so you can build category logic directly into client-side code. See more information on categories here.
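
To give a feel for the kind of client-side category logic a JSON hierarchy makes possible, here is a minimal Python sketch. The JSON layout (a map from category ID to a `parent` ID and per-language `labels`), the IDs, and the category names are all assumptions made for illustration; the actual file on GitHub may be organized differently.

```python
import json

# Hypothetical sample data: the layout, IDs, and labels are invented for
# illustration and do not reflect the real category file.
SAMPLE = """
{
  "1": {"parent": null, "labels": {"en": "Food", "es": "Comida"}},
  "2": {"parent": "1", "labels": {"en": "Restaurants", "es": "Restaurantes"}},
  "3": {"parent": "2", "labels": {"en": "Pizza", "es": "Pizza"}}
}
"""


def category_path(categories, category_id, lang="en"):
    """Return the labels from the root category down to category_id."""
    path = []
    current = category_id
    # Walk up the parent links until we reach a root (parent is None).
    while current is not None:
        node = categories[current]
        path.append(node["labels"][lang])
        current = node["parent"]
    return list(reversed(path))


categories = json.loads(SAMPLE)
print(" > ".join(category_path(categories, "3")))
print(" > ".join(category_path(categories, "2", lang="es")))
```

With numeric IDs as keys, the same lookup works in any of the translated languages by changing only the `lang` argument.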

150 Chains

Chains, meaning stores that represent both local and national brands, are often included in Places data sets but can rarely be managed as distinct entities. Factual now manages a table of chains that connects directly to our Places. Developers can query by explicit chain ID to get the complete list of our first 150 authoritative chains, sourced from our partners Location3 and Universal Business Listings (many more coming), which connect to over 333K places. We also have an additional 775 auto-generated chains produced by machine clustering. These are experimental and won’t have the same coverage or precision, so use them with care. We’re testing these features out in the US before expanding globally. See more on chains here.

Factual Place Rank

With over 23MM Places in the US, developers of Local applications often find that there are too many records to present to the user, and that it is difficult to pick out the ones most meaningful for their app. Factual Place Rank aims to provide a relative metric by which developers can sort places by their informatic and social footprint, to ensure the most prominent places rise to the top of the stack. We’re using Factual Place Rank as the default ranking for searches. The feature is in beta, so we’re testing it in the US only. See more on Factual Place Rank and all Global Places Attributes here.
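
If you merge or post-filter results on the client, you may want to re-apply a rank ordering yourself. The sketch below shows one way to do that in Python. The field name `placerank`, the assumption that a higher value means a more prominent place, and the sample records are all hypothetical; check the Global Places attribute documentation for the real field and its semantics.

```python
def top_places(places, limit=10, rank_field="placerank"):
    """Return up to `limit` places, highest rank first.

    Records that lack the rank field sort last.
    """
    ranked = sorted(
        places,
        key=lambda place: place.get(rank_field, float("-inf")),
        reverse=True,
    )
    return ranked[:limit]


# Illustrative records only; the names and scores are made up.
places = [
    {"name": "Corner Deli", "placerank": 41},
    {"name": "City Museum", "placerank": 93},
    {"name": "Unranked Kiosk"},
    {"name": "Main Street Cafe", "placerank": 67},
]

for place in top_places(places, limit=2):
    print(place["name"])
```

Treating a missing rank as the lowest possible value keeps unranked records available without letting them crowd out ranked ones.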

Going Global

Taken together, these changes are not insignificant and could bork existing code. We’re therefore releasing this US dataset as a new, versioned resource. We’ll follow with new revs of our US Restaurants and US Hotels data. All other countries will follow shortly, and this will become the production Global Places dataset. We’ve posted a migration overview online that describes the changes in more detail and helps you minimize disruption.

Advice I’d Send Myself Before Starting My Machine Learning Internship at Factual

I spent this summer as a Data Specialist Intern at Factual, and was tasked with improving our Global Places categorization. Factual employs a wide variety of strategies at every stage of its data pipeline, and categorization is just one part of that. To clarify, every Factual Place belongs to one category from our 400+ node taxonomy. My job was to ensure that the existing process was producing data of high quality, and explore alternate means of improving category accuracy and coverage. Here are some things I wish I’d known before I started out.

The Wisdom of Crowds: Using Ensembles for Machine Learning

Whether it’s computing Point of Interest similarities for our Resolve service, or predicting business categories from business names, we at Factual use Machine Learning to solve a wide variety of problems in our pursuit of quality data and products. One of the most powerful Machine Learning techniques we turn to is ensembling. Ensemble methods build surprisingly strong models out of a collection of weak models called base learners, and typically require far less tuning when compared to models like Support Vector Machines.
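
To see why combining weak models can pay off, consider the simplest ensemble: a majority vote. The Python sketch below is an idealized illustration, not a description of Factual's models. It assumes a two-class problem and base learners whose errors are independent, which real learners trained on the same data rarely are, so treat the numbers as an upper bound on the benefit.

```python
from collections import Counter
from math import comb


def majority_vote(predictions):
    """Return the label predicted by the most base learners."""
    return Counter(predictions).most_common(1)[0][0]


def ensemble_accuracy(p, n):
    """Probability that a majority of n base learners is correct.

    Assumes a two-class problem and independent learners, each correct
    with probability p. n must be odd so that ties cannot occur.
    """
    if n % 2 == 0:
        raise ValueError("use an odd number of learners to avoid ties")
    needed = n // 2 + 1  # smallest number of correct votes that wins
    return sum(
        comb(n, k) * p**k * (1 - p) ** (n - k)
        for k in range(needed, n + 1)
    )


if __name__ == "__main__":
    print(majority_vote(["restaurant", "bar", "restaurant"]))
    # Each learner is only 60% accurate; watch the vote improve with size.
    for n in (1, 5, 25, 101):
        print(n, round(ensemble_accuracy(0.6, n), 3))
```

Practical ensemble methods such as bagging and boosting exist largely to make base learners disagree in useful ways, since the gain shrinks as their errors become more correlated.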

Introducing Factual Global Products

At Factual, our goal is to provide access to definitive global data.  We started with Factual Global Places, our flagship product which combines data on over 58 million local businesses and points of interest with rich APIs to help bring context to every point on the globe.  Today, we’re excited to announce the release of our second major data vertical, Factual Global Products.  Factual Global Products currently provides detailed product data for over 500,000 of the most popular consumer packaged goods in the US, including your favorite health, beauty, food, beverage, and household products.  With Global Products, you can easily access key product attributes, find products using powerful search tools or UPC lookup, and connect to product pages across the web.