Re: Why isn’t collaborative geodata a bigger deal already?

First off, thanks everyone for the great responses; it’s great to have different perspectives refine my thinking on this subject. In this post I’m going to attempt to respond to many of the comments and questions. Some of my responses won’t be complete, and will beg a full post to themselves – indeed many of the issues raised are things I’ve thought about and have future posts planned for. But I like a conversation much more than a monologue, so it makes sense to address what comes up now.

The ‘FOSS vs. Commercial divide’ definitely needs its own post, but I will echo Arnulf and say that FOSS can be commercial, and the proper divide is FOSS vs. proprietary, which I will address in a future post.

Alan, thanks a ton for the thoughtful insight. It’s great to get feedback from someone who’s been thinking about these things far longer than I. In future posts I’m hoping to more fully explore how we can bootstrap an architecture of participation – this post was to just posit reasons as to why we haven’t seen collaborative geospatial data emerge already. But it’s great to hear that the ‘priesthood’ has no real power – as in much of life the power is truly with people, and we’ve really just got to organize right to exercise the power.

I agree that data is not software, and that the challenges are going to be different, but I’d be interested in your thoughts on how software and data are fundamentally different such that an architecture of participation could not be formed. I ask because a number of your points still hold for open source software if I swap ‘software’ in for ‘data’:

– the great desire is to have software, not to create it
– creating software requires intellectual effort (not just technological & physical)
– the more effort it takes the more valuable it is
– if software is valuable people want to steal it

As for your assertion that if data is valuable then people want to sabotage it, I’d refer to one of Weber’s points about what leads to a successful collaborative project: ‘The product benefits from widespread peer attention and review, and can improve through creative challenge and error correction (that is, the rate of error correction exceeds the rate of error introduction)’. So the issue is not whether people want to sabotage it; it’s whether the architecture of participation can handle error correction at a rate greater than error introduction. Of course he’s not just referring to malicious error introduction, but that is necessarily a part of it. So the question is whether the commons can resist the sabotage. If a true commons of value is established, then people who find value there will want to protect it. If there are tools that make this easy, then it’s more likely the commons will be protected. You can have easy rollbacks, you can let people sign up to ‘watch this area of the map’, like ‘watch this page’ on Wikipedia, looking out for vandals in the areas they care about, and you can limit commit rights.

I’ll go into these in more depth in a full post in the future, but note that open source software suffers very little from sabotage, as those who contribute directly are vetted beforehand. Wikipedia is more prone to it, but is also able to correct itself. So we won’t pretend that potential sabotage of data won’t increase as the dataset grows popular; we just need to figure out the proper architecture such that the commons will be protected and fixed in a timely manner. One should also note that some datasets are very valuable to a few people, but not that valuable to everyone. So bike enthusiasts who want to map their favorite paths likely won’t have their data vandalised.
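To make the three safeguards mentioned above a bit more concrete (easy rollbacks, ‘watch this area of the map’, and limited commit rights), here is a minimal Python sketch. Everything in it is hypothetical: the `CommonsMap` class, its method names, and the bounding-box format are my own illustration, not the API of any existing tool, and a real system would need far more (geometry, authentication, conflict handling).

```python
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical bounding box: (min_lon, min_lat, max_lon, max_lat)
BBox = tuple[float, float, float, float]


@dataclass
class Edit:
    author: str
    feature_id: str
    lon: float
    lat: float
    new_value: str
    old_value: Optional[str]  # None means the feature did not exist before


@dataclass
class CommonsMap:
    committers: set[str]  # limited commit rights
    features: dict[str, str] = field(default_factory=dict)
    history: list[Edit] = field(default_factory=list)
    watches: list[tuple[str, BBox]] = field(default_factory=list)

    def watch(self, user: str, bbox: BBox) -> None:
        """Sign a user up to 'watch this area of the map'."""
        self.watches.append((user, bbox))

    def commit(self, author: str, feature_id: str,
               lon: float, lat: float, value: str) -> list[str]:
        """Apply an edit and return the watchers who should be notified."""
        if author not in self.committers:
            raise PermissionError(f"{author} has no commit rights")
        edit = Edit(author, feature_id, lon, lat, value,
                    self.features.get(feature_id))
        self.features[feature_id] = value
        self.history.append(edit)
        return [
            user
            for user, (min_lon, min_lat, max_lon, max_lat) in self.watches
            if user != author
            and min_lon <= lon <= max_lon
            and min_lat <= lat <= max_lat
        ]

    def rollback(self, feature_id: str) -> Edit:
        """Undo the most recent edit to a feature and return it."""
        for i in range(len(self.history) - 1, -1, -1):
            if self.history[i].feature_id == feature_id:
                edit = self.history.pop(i)
                if edit.old_value is None:
                    del self.features[feature_id]
                else:
                    self.features[feature_id] = edit.old_value
                return edit
        raise KeyError(feature_id)


# Usage: a watcher is told about an edit in her area and reverts it.
bike_map = CommonsMap(committers={"alice", "bob"})
bike_map.watch("alice", (-71.2, 42.2, -70.9, 42.5))
bike_map.commit("alice", "path-1", -71.1, 42.35, "Riverside bike path")
to_notify = bike_map.commit("bob", "path-1", -71.1, 42.35, "zzz")
bike_map.rollback("path-1")
```

The point of the sketch is only that correcting an error is a single call once history is kept, which is what keeps the rate of correction ahead of the rate of introduction.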

As for people wanting to steal the valuable data, that shouldn’t be an issue, just like it’s not for open source software – the commons must be guaranteed to remain open. I take this to be a base condition for collaborative geospatial data to really succeed. I do concede that there could be other incentive structures that allow substantial collaboration around geospatial data. But at this point I’m not so much thinking about them, I’m thinking of something similar to the open source software movement, where the base case is that the collaborative data is open to all.

On the subject of Spatial Data Infrastructures, I’ve got another whole thread on SDIs, geospatial webs, and applying architectures of participation to SDIs and the like. I think Dave’s point was mostly about SDIs as well, the interconnected content. For now I’m really just focusing on the micro level, creating and maintaining geospatial datasets. I certainly don’t think that all, or even the majority, of data on a true public SDI/geospatial web will be built collaboratively – we’re just talking about a small piece of the content puzzle. But I do believe that it can play an important role in helping to bootstrap a true public SDI, and it will be combined with sensors and real-time data services and the like, and I think the discovery piece that Jo points out is quite important. Dave, I’d actually disagree that historically the open source community has led the charge, but I do think it will lead the charge for collaboration on open geodata. Are the surprises from proprietary software you’re thinking of SDI related or specifically for open geospatial collaboration?

The topic of public SDIs segues to Geoff’s great point that we’re likely going to see collaborative mapping emerge in places like Asia where governments have restrictive terms for access to geospatial information. Thanks for the link to malsingmaps.com; I’d not seen it before, and am attempting to gather examples of proto-collaborative mapping. Hopefully I can get in touch with them and learn more about how the community works and what the motivations of individuals are, but this is really one of the most advanced collaborative mapping examples that I’ve seen, and I’m quite excited about it. Previously I had actually been thinking that innovation might first come from countries with less restrictive mapping policies, that we’d first see perhaps a massive project to improve TIGER data, since you have such a jump start with over 90% of a basemap for the US complete. And that you’d have forward-looking mapping agencies collaborating with citizens on more ‘fun’ datasets, like nature areas and bike paths (MassGIS, my favorite mapping agency, has had some experiments with those layers). But in some places the need is great and a small group of motivated individuals could make just enough of a difference to start. It looks like they’re making use of MapBuilder, MapServer and probably PostGIS, using strong open source projects as the base on top of which they innovate, which is definitely the path to take.

For Sean’s point, I completely agree that it’s going to take a lot of time and effort. But I actually think open geodata falls closer to software than Wikipedia (though Wikipedia is great for proving that the root concept may work even better for domains other than software). The GNU effort couldn’t reuse any existing tools; the legal constraints forced them to build it all from scratch. And it took many years before it got critical mass, and even more until Linux supplied the kernel that made those tools a complete operating system.

I also think the ‘snowball point’ will be more like that of open source software – Wikipedia snowballs right when you’re past the notion that only professionals are qualified to write an encyclopedia. But software certainly doesn’t snowball at a similar point. It snowballs when the existing open source software is close enough to the needs of commercial companies that it costs them less to invest in the open source software than to buy proprietary licenses. Of course this point is different for each company, depending on many, many factors. But as one company invests in open source and gets it good enough for their needs, it may become advanced enough for other companies to invest for the next step, and thus a snowball is born. I believe the point when mapping data will snowball is when it makes economic sense for a company to invest in a collaboratively built map, improving it for their needs, instead of licensing a proprietary map. And yes, this too will be different for each company – some only need general context to overlay their specific geospatial information, others need exact info and routing and the like.
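The snowball argument above can be played out as a toy model in Python. This is only a sketch under loud assumptions: map ‘quality’ is a single number, closing a quality gap costs a fixed amount per point, and every company name and figure is invented for illustration. A company invests in the shared map only when closing the gap to its own needs costs less than its proprietary license, and each investment raises the quality for everyone who comes after.

```python
def snowball(companies, quality, cost_per_point):
    """Return (names of investors in order, final map quality).

    companies: list of (name, needed_quality, license_fee) tuples.
    All values are hypothetical; the linear cost model is an assumption.
    """
    invested = []
    remaining = list(companies)
    progress = True
    while progress:
        progress = False
        for company in list(remaining):
            name, needed, license_fee = company
            # Cost for this company to bring the shared map up to its needs.
            gap_cost = max(0, needed - quality) * cost_per_point
            if gap_cost < license_fee:
                # Investing is cheaper than licensing, so the company
                # improves the commons and everyone else benefits.
                quality = max(quality, needed)
                invested.append(name)
                remaining.remove(company)
                progress = True
    return invested, quality


# Invented figures: only the first company can afford to invest at the
# starting quality; each investment then brings the next one within reach.
companies = [
    ("routing", 95, 30_000),
    ("delivery", 70, 25_000),
    ("overlay", 50, 15_000),
]
investors, final_quality = snowball(companies, quality=40, cost_per_point=1_000)
```

With these made-up numbers, removing the low-needs ‘overlay’ company leaves nobody willing to invest at all, which is the sense in which the first investor starts the snowball.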

But I agree, it’s going to take time and energy, both at the meta level to make it easier to overcome the logistical problems, and at the down and dirty level of going out to re-survey, well, just about everything. Just attempting to identify what’s held it up in the past is by no means the same as building, and that is the much bigger challenge. It’s going to be an uphill battle for a while, but I do believe eventually we too will see a snowball. And I’m keeping a firm eye on you and the Pleiades project for some brilliant techno-cultural inventions.