The next time you’re at a party thrown by people working in big data, remember this simple trick to fit in immediately. All you have to do is start talking about the importance of transparency in the location data sphere, and you’ll be greeted with open arms. Some of them will probably hire you.
I don’t mean to sound cynical. As the location field continues to grow and more companies enter it, emphasizing transparency of data is admirable. You may have noticed, though, that I’ve already used the word “transparency” three times in the last two paragraphs without providing any context. The way that most people use the term right now, it could probably be replaced with any professional-sounding buzzword.
That’s my real problem with the emphasis on “transparency” right now. Instead of actually being honest about the sources and quality of their data, many companies promote the abstract idea of transparency just because it seems like the right thing to say.
This kind of posturing doesn’t benefit anyone. It glosses over some very real problems in the location industry, and leaves no room for more pragmatic solutions. We are not going to be able to ensure that all data sources are transparent and high quality overnight -- it’s simply not realistic. In this post, I intend to talk frankly about some real ways to make improvements in the way we talk about data sources. These solutions may not be as polished as the ones we are used to promoting, but that is because they exist in the real world.
(A quick disclaimer: it is highly likely that the opinions of many people in the industry will differ from my own in this article. X-Mode is still a young company, and these conclusions are based on our own observations, as well as our vision for the burgeoning field. I welcome disagreement, and would feel gratified if this article prompted further dialogue.)
For the sake of everyone in the room, I’m going to define exactly what I mean by transparency. Quite simply: where is a company getting its data from? This is information that many companies hesitate to share, and that is understandable. It’s important to protect the identity of providers, especially if the provider is a third-party vendor with scale. At the end of the day this is a business, and full transparency may not always make sense.
So don’t worry: you certainly don’t need to give out every detail about your data providers. But you should provide some information. In order to provide high-quality data and be truly competitive in the field, it’s important for 50-70% of your data to come from your own first-party SDK inventory, rather than from third-party vendors. This gives your company more direct control over data, and allows you to monitor its quality more closely.
It’s also important not to embellish your numbers, or to be deceptive about the quality of your data.
Lots of companies are guilty of this, but it’s definitely a bad idea to exaggerate your numbers by more than 10 or 20%. Don’t be a company that claims to have 200 million IDs running location data -- this is obviously not true, and it just makes it look like you don’t trust your own data enough to be honest about it.
(The X-Mode team has actually downloaded every app in the app store, and we can confirm that only about 5,000 of them run always-on location. So even if you think nobody else sees your lies -- we do!)
Along with the number of IDs, it’s important to be honest about the kind of data you’re collecting. There are essentially four different categories of sources for location data right now, each one providing higher quality data than the last.
Bidstream and Cell Tower Data are easy to scale, but provide low quality data. You may be able to grow your company quickly with this kind of data, but in the long run you won’t be able to compete with companies using higher quality data.
Foreground data is also good for scaling, but only provides about 20 points a day on average.
The best source, and the one you should ideally draw the majority of your data from, is Always On Location data collected using app-side SDKs. This kind of data provides 50+ points a day, more than half of them accurate to within twenty meters. If you are getting the majority of your data from an SDK, and are using no more than 5 sources or methodologies, your data will be of higher quality. You shouldn’t just be transparent about that: you should be proud.
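If you want to check your own feed against the rough benchmarks above (points per day, share of points within twenty meters, first-party share, number of sources), something like the following Python sketch would do it. To be clear, this is only an illustration: the `Ping` record, its field names, the source labels, and the `summarize` function are all hypothetical, not an X-Mode schema or tool, and the thresholds you compare against are the rules of thumb from this post rather than any industry standard.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Ping:
    """One location point. Field names are illustrative assumptions."""
    device_id: str
    source: str        # e.g. "sdk_always_on", "foreground", "bidstream", "cell_tower"
    accuracy_m: float  # reported horizontal accuracy, in meters


def summarize(pings, days=1, first_party_sources=("sdk_always_on",)):
    """Summarize a batch of pings against the rough benchmarks in the text."""
    if not pings:
        raise ValueError("no pings to summarize")

    by_source = Counter(p.source for p in pings)
    devices = {p.device_id for p in pings}
    total = len(pings)
    # Counter returns 0 for labels that never appear, so unknown labels are safe.
    first_party = sum(by_source[s] for s in first_party_sources)

    return {
        # Average points per device per day (the post suggests 50+ for always-on SDK data).
        "points_per_device_per_day": total / (len(devices) * days),
        # Fraction of points with reported accuracy of 20 m or better.
        "share_within_20m": sum(p.accuracy_m <= 20 for p in pings) / total,
        # Fraction of points coming from your own SDK (the post suggests 50-70%).
        "first_party_share": first_party / total,
        # Number of distinct sources or methodologies (the post suggests no more than 5).
        "distinct_sources": len(by_source),
    }
```

Calling `summarize(pings, days=7)` on a week of data returns a small dictionary you can compare against the numbers you are quoting publicly. If the two don’t match, that is the gap worth being transparent about.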
A lot of words are being spilled right now about how the location data industry is consolidating, and how transparency is increasing. While this may be true, it is also the case that a lot of companies are focusing more on scalability than quality right now. To these companies, I would just like to offer a bit of advice from the X-Mode Team. Focus on quality, and scalability will come. In the end this will lead not just to a stronger company, but to a stronger industry in general.