GeoServer Tech Talk at Google

Last week I gave a ‘tech talk’ on GeoServer at Google, which was a fun experience. The talk is for Google employees, but they make it available to all on Google Video: GeoServer and Architectures of Participation for Geospatial Information. If you like it, give it a nice rating on the page. It touches on The Open Planning Project (my employer), how we leveraged standards to implement KML output in GeoServer to talk to Google Earth, and goes a bit into architectures of participation for geodata, which I’m in the midst of writing about in this blog.

It’s 54 minutes long; perhaps in the future I’ll link to specific chunks of it. I’m also making the slides available (Creative Commons Attribution license) in PowerPoint and OpenDocument formats.

I also said I’d post some links in my blog. Many of them are already on the sidebar, and are also now referenced in the slides. But for completeness’ sake:

GeoServer homepage

GeoServer Google Earth info

GeoServer Google Maps info

The Open Planning Project

Coase’s Penguin, Yochai Benkler

Public Commons of Geographic Data, Harlan Onsrud et al. (PDF)

Open Geospatial Consortium

Architecture of Participation by Tim O’Reilly

My take on Architectures of Participation (you can also follow the tag).

Why isn’t collaborative geodata a big deal already?

So if, by Weber’s criteria, geospatial data has a high chance of a true architecture of participation forming around it, why don’t we see more collaboratively built maps? Why do governments and commercial ventures dominate the landscape of geospatial data? Though there are several emerging examples, shouldn’t they be a bigger deal?

As I posit that a successful architecture of participation is made up of both a social component and a technical component, let’s examine the hold-ups in each.

On the social side of the fence I believe the problem is mostly historical. Maps have traditionally been viewed as a source of power, something to be kept secret. King Philip II of Spain kept his maps under lock and key – state secrets to be protected. Maps are historically the tools of conquerors and rulers, and thus kept private to retain an advantage. Though maps obviously have a huge value for a large number of civil society uses, there seems to be a legacy of maps as a competitive advantage instead of a base of cooperation. The workflows for the creation of maps thus aren’t seen as having the potential for wider contributions, for opening themselves up.

In contrast, the tradition with software is one of sharing. Indeed, it wasn’t until the late 1970s that anyone even thought of software as something to keep protected, to attempt to sell. Software’s roots are academic and hobbyist – two groups for whom sharing is natural. Contributing to the sharing ethos is the fact that in the early days there were so many different chips and computers that the only way to distribute a piece of software to more than one brand of computer was to include the source code. Binary distributions just weren’t a viable option if you wanted many people to make use of your software.

In the early 80s, however, computer manufacturers started providing their own operating systems and requiring license agreements to use them. Indeed Microsoft got their first big break by re-selling an operating system to IBM. This was a huge boon for computer software in general, as it enabled more people to work on software for a living, as software itself was seen to have value. But the sharing roots of software could not be denied, as Richard Stallman soon formed the GNU Project and the Free Software Foundation to counter the commodification of software and to create open structures of sharing.

While the Free and Open Source Software movement has grown to become a huge success, mapping data, with no tradition of sharing, has become increasingly locked down, commodified, and managed by a bureaucratic-informational complex. Sharing is weighed down by the legacy of paper maps, which could not be copied infinitely, and searching for mapping data is similarly bogged down in library metaphors. While software just needed to get back to its roots to share, geospatial information has no similar roots. Not surprisingly, the forefront of the fledgling open geodata movement is dominated by people with roots in software, not traditional GIS types.

As for the technical side of the fence, the main problem as I see it is that the software to create maps and do GIS is bloody expensive. Desktop software runs thousands of dollars, up to tens of thousands. Some low-cost options have emerged, but those still run at least a couple hundred dollars. Server software is even more expensive. Additionally, there are training costs – super expensive software seems to beg ‘training’ to justify itself, forming an ‘elect’ of those who are able to operate it. If it’s hard enough to operate that someone needs to be specially trained, then it must be worth all that money (see, for example, Oracle vs. Postgres: the latter is almost as powerful, yet far, far easier to set up). Past that there is surveying equipment and GPSes, which have been quite expensive.

So in the past, the tools to create maps were too expensive. But this is changing. Commercial GPSes are quite cheap, and open source GIS software is emerging. Of course the technical in turn feeds back into the social – I don’t believe that if existing GIS software were suddenly freely available it would be sufficient to spark a true architecture of participation around geospatial data. It is still too tied to a culture of the GIS elect – GIS as a special skill that must be learned, that only a few have. While the first wave of open source desktop GIS tools only apes the traditional tools, the great thing is that they are open, and for the most part designed in a modular fashion. I’ll try to hit on this more in a future post, but I believe these open source desktop GIS tools will be reused to open the way to specialized tools that allow most anyone to create geospatial data. They may not be recognized as ‘real’ GIS – indeed I think the ‘elect’ will try hard to defend their position – but they will be the key to making true architectures of participation around geospatial data.

So the seeds of change are there, and indeed projects like GeoNames, Wikimapia, OpenStreetMap, and WifiMaps, plus all the KML layers available, are already showing results. I believe right now we’re in a time like Free Software before Linux: building some small-scale stuff, some of which will have value. But we’ve not yet hit on the appropriate social and technical architectures to make a huge success like Linux. Until then we’ve just got to keep experimenting, and even after that I’m sure more experimentation will take place; it will be different for every project, just as we’ve learned much from Linus’s social innovations, yet no project functions exactly like Linux. But we will gather a suite of tools and social structures that allow replicable success in collaborative mapping, and before long it will snowball into something far bigger than most can imagine.

AoP for Geospatial Data, via Weber

In ‘The Success of Open Source’, Weber spends his last chapter looking at the more generic parts of the open source process and their potential to be applied to other domains. He lays out the properties of tasks that are suited to the open source process. To greater or lesser degrees, all are relevant to geospatial data, and I think they point to geospatial data as having huge potential for a true ‘Architecture of Participation’ around it.

* Disaggregated contributions can be derived from knowledge that is accessible under clear, nondiscriminatory conditions, not proprietary or locked up.

Currently this decision is left up to the governments who are the main providers of geographic data. In the United States most geographic data is available; in Europe this is not the case. But geographic data need not rely on governments: a number of commercial providers have proved it is possible to build up a valuable cache of geographic data. Their primary ways of gathering the data are using GPS devices and overlaying lines on satellite imagery. One could easily imagine a group of enthusiasts (or indeed a consortium of companies, as the reality of FOSS stands today) doing the same, gathering and agreeing to share their geographic knowledge under clear, nondiscriminatory conditions. A key for this will likely be good licensing, a topic I hope to work on in the future. I believe we need a set of clear licenses aimed at geospatial data, indeed a range of licenses like those available for open source software or creative works.

* The product is perceived as important and valuable to a critical mass of users.

Geographic data is incredibly important and valuable. Everyone uses maps. There are private companies who make huge amounts of money selling up-to-date geographic data. Indeed, there is a lot of geographic information that is overlooked by traditional mapping departments, such as bike trails, kayak routes, or wi-fi spots. Though small, these communities could provide a critical mass of passionate users and potential contributors. And the value of good geospatial data is becoming more and more obvious to wider groups of people.

* The product benefits from widespread peer attention and review, and can improve through creative challenge and error correction (that is, the rate of error correction exceeds the rate of error introduction)

This is a question that needs more research. The first condition is definitely true: maps can improve from widespread peer attention and review. Many traditional surveying types would argue that it takes incredible training to be able to create a map, and that opening up the data would cause the rate of error introduction to exceed the rate of error correction. Two thoughts in rebuttal to this.

The first is that cheap GPS devices and satellite imagery are making training on traditional surveying tools less necessary. Most anyone can operate a GPS, and we all have the ability to look at a photo of a city and draw out where the roads are. Well-designed software tools could simplify much of the complexity, such as topologies and other validations.
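To give a sense of what such a tool could do, here is a minimal sketch of automatic validation that could run on traced geometry before it ever reaches a human reviewer. The checks and the function name are my own illustration, not taken from any existing GIS package:

```python
def validate_ring(ring):
    """Return a list of problems found in a traced polygon ring.

    `ring` is a list of (lon, lat) tuples; the first and last
    points are expected to be equal (a closed ring).
    """
    problems = []
    # A closed triangle is the smallest valid ring: 3 corners + repeat.
    if len(ring) < 4:
        problems.append("too few points for a closed ring")
    # The trace must end where it started.
    if ring and ring[0] != ring[-1]:
        problems.append("ring is not closed")
    # Accidental double-clicks produce repeated consecutive points.
    for a, b in zip(ring, ring[1:]):
        if a == b:
            problems.append(f"repeated consecutive point {a}")
    # Coordinates must be plausible longitude/latitude values.
    for lon, lat in ring:
        if not (-180 <= lon <= 180 and -90 <= lat <= 90):
            problems.append(f"coordinate out of range: ({lon}, {lat})")
    return problems

# A cleanly closed square passes; an unclosed trace is flagged.
square = [(0, 0), (1, 0), (1, 1), (0, 1), (0, 0)]
```

Checks like these are exactly the kind of complexity a contributor should never have to think about: the tool catches the mistake at the moment of tracing.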

Second is that computer programming, the skill required to make FOSS, also takes incredible training. Allowing just anyone to modify the source code of the main distribution is certainly a bad idea, as one little typo can break everything. This is why there are complex governance structures and tools to manage the process. One could easily imagine a similar situation with geographic data – only core ‘committers’ would have complete access, around them some users would submit patches (from image tracing or GPSes) for the committers to review, and past that ‘bug reporters’ would just report that there are problems with the data. This is a topic I hope to examine more in the future: how to make tools that lead to greater error correction than error introduction.

* There are strong positive network effects to the use of the product

In many ways we are only just now starting to see the positive network effects of spatial data. A big advance is that it is much easier to spatially locate ourselves, with cell phones, cheap GPSes, and other location-aware devices. The recent Google Maps and the ‘mashups’ built on top of their base data also point to the positive effects and innovation that become possible once base mapping data is available to give context to other interesting applications. Google Maps Mania points to hundreds of these ‘mashups’, where users create new maps displaying anything from crime in Chicago, to available apartments, free wifi spots, or real-time network failures. And if we move to a real services vision with WMS and WFS, the data built on top of the basemap can further be combined with other data, creating new networks of spatial data. Maps are inherently made up of different layers of information, and get more valuable when combined with other sources.
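Part of why a services vision makes combining layers so easy is that a WMS GetMap request is just a URL built from a handful of standard parameters, so anything that can construct a URL can fetch and overlay a basemap. A minimal sketch in Python (the server endpoint and layer name below are hypothetical; the query parameters are the standard ones from the WMS specification):

```python
from urllib.parse import urlencode

def getmap_url(base, layers, bbox, width=512, height=512,
               srs="EPSG:4326", fmt="image/png"):
    """Build a WMS 1.1.1 GetMap request URL.

    `bbox` is (minx, miny, maxx, maxy) in the units of `srs`.
    """
    params = {
        "service": "WMS",
        "version": "1.1.1",
        "request": "GetMap",
        "layers": ",".join(layers),
        "bbox": ",".join(str(c) for c in bbox),
        "width": width,
        "height": height,
        "srs": srs,
        "format": fmt,
    }
    return base + "?" + urlencode(params)

# Hypothetical endpoint and layer, roughly the continental US.
url = getmap_url("http://example.com/geoserver/wms",
                 ["topp:states"], (-125, 24, -66, 50))
```

Because every WMS server answers the same parameters, a map built against one basemap can be pointed at another by changing only the base URL and layer names – that interchangeability is where the network effect comes from.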

* An individual or a small group can take the lead and generate a substantive core that promises to evolve into something truly useful.

This depends on the size of the geographic area covered. But most traditional surveying and mapping crews are not incredibly large; a small group of people can cover a fairly wide area. The Mapchester and Isle of Wight workshops show how a small group of devoted people can make substantial progress on a map in a weekend.

* A voluntary community of iterated interaction can develop around the process of building the product.

There is no reason why this could not happen, and indeed we are already seeing it with the OpenStreetMap project. The emerging academic sub-field of Public Participation GIS also points to communities of iterated interaction using GIS to assist in planning and other decisions that affect their interests. This is likely where the most work is needed, as different social dynamics work better with different tools, and indeed with different groups of people. Open Source Software does not let just anyone modify the code, but Wikipedia basically does. And indeed with Open Source Software it took many years before the right social dynamics enabled an operating system to be built, when Linus successfully established the bazaar method of delegation and decision making. But since almost all the other factors point to geospatial data being able to have an architecture of participation built around it, I would posit that we need only spend the energy on evolving the right community mechanisms, and we’ll then hit on something very big.