New GeoServer Blog

So I've been an extraordinarily negligent blogger of late, mostly due to my creative juices and writing energy going into Openplans.org – the collaboration software we're working on at TOPP. But I'm planning to spend some good time over the holidays writing up another queue of blog posts, and hopefully finishing off some of the ideas I started.

The main reason I'm writing now, though, is to point people at our new GeoServer blog, at http://blog.geoserver.org. We're hoping it can find its voice as a useful resource for information, and I've seen a couple of project blogs of late that I've liked. The main motivation is to give people a chance to follow what we're up to on GeoServer without having to dive into the mailing lists. Our lists are getting quite a bit of traffic, and even there it's sometimes hard to figure out the big things people are working on, since it's much more the details that get discussed. So we'll be getting developers and users to give updates on what they're working on, in and around GeoServer. We're also hoping to use the blog to give useful hints and tricks and to highlight portions of the documentation. The docs have become quite detailed, to the point where it can be easy to miss interesting features. We'll also announce releases and post new tutorials on the blog. It should serve as a low-commitment way to keep up with what our community is up to. If anyone has suggestions for topics they'd like to see covered in the blog, let us know.

As for what else is going on, the thing that's exciting me is Andrea's work on Versioning WFS. It's still in a very experimental stage, but I think it's got great potential to become a very important piece in bringing architectures of participation to geospatial data (a thread which I hope to finish up, at least in blog drafts, over Christmas). I've also really been digging the work of the guys at OpenLayers. TileCache is particularly nice, a great specialized tool that takes on one job and does it well. We've just released GeoServer 1.4.0, with the goal of moving to a modular, plug-in type architecture – my dream is that we'd be able to make something like TileCache that could be both standalone and a couple of clicks to install as a plug-in on the GeoServer base. There has also been some exciting work around the edges of OpenLayers with a vector editing demo backed by WFS-T, which would be really great, especially if we could combine it with our versioning improvements.

I take your S3 and raise you an EC2

Just read Chris's post about using Amazon's S3 as a home for caches. The Amazon service I've actually been contemplating for tiling purposes is their Elastic Compute Cloud (EC2). But before we get into it, a bit on S3 and tiles. I'd still like the distributed peer-to-peer tile cache, as I talked about in my post on geodata distribution. But it makes a lot of sense to bootstrap existing services on the way to getting there. S3 could certainly help out as a 'node of last resort' – it's nice to know that the tiles will definitely be available somewhere, if the cache isn't yet popular enough to be distributed to someone else's p2p cache on your more local network. I agree that bittorrent and coral aren't up to snuff, but I do believe that distributing mapping tiles will work as p2p technology evolves. But first we have to get our act together with tiling in the geospatial community, so we can go to the p2p guys with something concrete. Which is why I'm excited about the work being done to figure this out.

As for EC2, I've been thinking about it in the context of doing caching with GeoServer. We've got some caching working with OpenLayers or Google Maps combined with either OSCache (with a tutorial for GeoServer) or Squid. I want to get it to the point where there's not even a separate download – you just turn caching on, and then have a big 'go' button that walks the whole tile area, caching it all along the way. The problem, though, is that huge datasets can take days, weeks, or even months to fully process. So this is where I think it could be kickass to use EC2 – provide a service where the 'go' button links to EC2, which can throw tens, hundreds, or even thousands of servers at churning out the tiles. Then return those to the server GeoServer's on, or leave them on S3 – indeed this would save on the tile upload costs that Chris writes about, as you'd just send an SLD and the vector data in some nice compressed format. I imagine you could save on upload costs for rasters too, as you'd just upload the non-tiled images and do the tiling with EC2.
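To make that fan-out concrete, here's a minimal sketch of how a 'go' button might split a tile pyramid across a pool of worker instances. The grid arithmetic assumes the TMS global-geodetic profile (two tiles at zoom 0) purely for illustration, and the worker count and the idea of handing out whole rows are my own illustrative choices, not how GeoServer or TileCache actually work.

```python
# Sketch: divide a tile pyramid's rows round-robin across N workers.
# Grid math assumes the TMS global-geodetic profile (2 x 1 tiles at zoom 0).

def grid_size(zoom):
    """(columns, rows) of the global-geodetic tile grid at a zoom level."""
    return 2 ** (zoom + 1), 2 ** zoom

def partition_rows(max_zoom, num_workers):
    """Map worker index -> list of (zoom, row, columns) chunks to render."""
    work = {w: [] for w in range(num_workers)}
    counter = 0
    for zoom in range(max_zoom + 1):
        cols, rows = grid_size(zoom)
        for row in range(rows):
            work[counter % num_workers].append((zoom, row, cols))
            counter += 1
    return work

if __name__ == "__main__":
    # e.g. seed zooms 0-8 across a hypothetical pool of 100 EC2 instances;
    # each worker renders every column of the rows it was handed
    for worker, chunks in partition_rows(8, 100).items():
        print("worker %d gets %d row chunks" % (worker, len(chunks)))
```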

A next step for this tiling stuff would be to make a caching engine that can both pre-walk a tile set and expire a spatial area of a tile set. The caching engine should store the cache according to the tile map service specification, but with the additional smarts the engine could be uploaded onto EC2 along with the tile creation software (GeoServer or MapServer), and just pre-walk the tiles, iterating through all the possible requests. It could then also listen to a WFS-Transactional server that operates against the data used to generate the tiles in the first place. If a transaction takes place against a feature in a certain area, then that part of the cache would be expired, and could be either automatically or lazily regenerated (either send all the expired requests to the server right away, or wait until a user comes along and checks out that area again).
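A minimal sketch of those two operations – pre-walking the pyramid and expiring the tiles that intersect a changed area – might look something like this. It again assumes the global-geodetic grid and a z/x/y.png cache layout; both are illustrative choices on my part, not part of the tile map service spec.

```python
import os

# A sketch of the caching engine's two jobs: pre-walking a tile pyramid and
# expiring the tiles that intersect an area changed by a WFS-T transaction.
# Assumes the TMS global-geodetic grid (2 x 1 tiles at zoom 0) and a
# z/x/y.png cache layout; both are illustrative, not part of a spec.

TILE_DIR = "/var/cache/tiles"          # hypothetical cache location

def grid_size(z):
    """(columns, rows) of the global-geodetic tile grid at zoom z."""
    return 2 ** (z + 1), 2 ** z

def walk_tiles(max_zoom):
    """Yield every (z, x, y) up to max_zoom -- the 'pre-walk' pass."""
    for z in range(max_zoom + 1):
        cols, rows = grid_size(z)
        for x in range(cols):
            for y in range(rows):
                yield z, x, y

def tiles_covering(bbox, z):
    """Inclusive tile index range covering a lon/lat bbox at zoom z."""
    minx, miny, maxx, maxy = bbox
    cols, rows = grid_size(z)
    deg = 180.0 / (2 ** z)             # tile width and height in degrees
    x0 = max(0, int((minx + 180.0) // deg))
    x1 = min(cols - 1, int((maxx + 180.0) // deg))
    y0 = max(0, int((miny + 90.0) // deg))
    y1 = min(rows - 1, int((maxy + 90.0) // deg))
    return x0, y0, x1, y1

def expire(changed_bbox, max_zoom):
    """Drop cached tiles touching the changed area, to be regenerated later."""
    for z in range(max_zoom + 1):
        x0, y0, x1, y1 = tiles_covering(changed_bbox, z)
        for x in range(x0, x1 + 1):
            for y in range(y0, y1 + 1):
                path = os.path.join(TILE_DIR, str(z), str(x), "%d.png" % y)
                if os.path.exists(path):
                    os.remove(path)    # next request regenerates it lazily
```

The lazy option falls out naturally: once a tile is deleted, the next request for that area misses the cache and goes back to GeoServer or MapServer; the eager option would simply re-request each expired tile right away.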

I like Paul's WebMapServer href attribute in the tile map service spec, but I wonder if it's sufficient… It might be nice if it had enough information for one to formulate enough of a GetMap request to replicate a given tile map service repository. I'm thinking the name of the layer and the 'style' (a named style or a link to an SLD). Maybe I'm missing something, but all the other information seems to be there. With that information, perhaps a smart tile map service client could look at multiple repositories and realize that they were generated from the same base WMS service in the same way. Then it could swarm from multiple repositories simultaneously. This starts to hint at the way forward for p2p distribution – for each WMS service just keep a global index of where tile map service repositories live and let clients figure out which is fastest, or hit all of them at once – including potentially other clients. A catalog that has metadata plus information on where to get even faster tiles would definitely be popular – especially if registering there automatically put a caching tile map service in front of your WMS. You could also register, say, the feed of latest changes (or even just the bounding boxes of latest changes) of the WFS-T that people use to update the WMS, and smart clients could just listen and expire the tiles in a given area when they get notification from the feed.
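To illustrate why the layer name and style would be enough, here's a sketch of the GetMap request a client could rebuild for any tile, given that extra metadata. The GetMap parameters are the standard WMS 1.1.1 ones; the wms_href, layer, and style fields are the hypothetical additions to the spec that I'm talking about.

```python
from urllib.parse import urlencode

# Sketch: with the layer name and style added to the tile map service
# metadata, any client could rebuild the WMS 1.1.1 GetMap request that
# produced a given 256x256 tile. The wms_href, layer, and style fields are
# hypothetical additions; the GetMap parameters themselves are standard.

def getmap_url(wms_href, layer, style, tile_bbox, size=256, fmt="image/png"):
    """Build the GetMap request that would reproduce one tile."""
    params = {
        "service": "WMS",
        "version": "1.1.1",
        "request": "GetMap",
        "layers": layer,
        "styles": style,                  # named style, or "" for the default
        "srs": "EPSG:4326",
        "bbox": ",".join("%f" % c for c in tile_bbox),  # minx,miny,maxx,maxy
        "width": size,
        "height": size,
        "format": fmt,
    }
    return wms_href + "?" + urlencode(params)

# Two repositories advertising the same wms_href, layer, and style could be
# treated as interchangeable -- and swarmed from simultaneously.
print(getmap_url("http://example.org/geoserver/wms", "topp:states", "",
                 (-180.0, -90.0, 0.0, 90.0)))
```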

I work for a dot-org

So back in May I wrote a post that touched on the need for a name for the kinds of hybrid organizations that don't fit nicely into the non-profit vs. for-profit binary view of the world. The best we came up with was 'non-corporation', which still suffers from the problem of being defined by what it's not, not by what it is. Since then I've heard 'for-benefit', which I liked a bit more, but am not in love with. And when introducing TOPP I generally just say 'high tech non-profit'. But I think I've finally come upon the name to use, and it was sitting right under my nose all along.

‘I work for a dot-org’.

Try it out, let me know what you think. It's obviously a play on 'dot-com' – which has been well established as something other than working for a big corporation (even though many dot-coms have since become big corporations). It is not a narrow definition, which I like, as I think it's far too soon to define what a 'dot-org' is and what isn't one. This parallels 'dot-com', which seemed to mean any company that was doing internet stuff. I like that it softly emphasizes a high tech nature, but the only real criterion is that the organization classifies itself online with a .org top level domain name. But I do think that non-profits doing the traditional non-profit thing should not be considered 'dot-orgs', just like Citibank didn't become a 'dot-com' when they put up a site at citibank.com.

OK, so now that we've got a name, the next thing to do is to spread the meme. We probably need a nice manifesto, or at least some concrete definition of what a dot-org is, even if it is something broadly inclusive. Then spread it widely, get the organizations that would be obvious dot-orgs to start identifying themselves as such, and then get our friends in the media to start writing up stories. I think success will be when a kid graduating from college can tell her parents that she's going to work for a dot-org, and have them not only know what that is, but be psyched that their daughter is going to be doing something good for the world and will be able to pay off her student loans before she's 40. Or at least that will be the first success; the final success will be when the standard way to set up a new venture is something more just and better structured to do good than the corporations running rampant today.

Proprietary vs. FOSS in the Geospatial Web

Thanks for the prod, Chris – an ideal world that brings open source collaboration to geospatial data does raise the question of what the software will look like. I strongly suspect that the core components of an architecture of participation for geospatial information would need to be open source – see my post on holarchies of participation. But I think the edges will likely be proprietary. So the core collaboration server components will be open source, and the easy-to-use software pieces that aren't whole-hog GIS will be open source. But there will still be proprietary desktop GIS systems that just have integration with the collaboration components. There is a lot of advanced functionality that it just won't make sense for the open source community to tackle.

Weber provides a good lens to examine the implications of open source taken further:

The notion of ‘open-sourcing’ as a strategic organizational decision can be seen as an efficiency choice around distributed innovation, just as ‘outsourcing’ as an efficiency choice around transaction costs.

The simple logic of open sourcing would be a choice to pursue ad hoc distributed development of solutions for a problem that exists within an organization, is likely to exist elsewhere as well, and is not the key source of competitive advantage or differentiation for the organization.

So the pieces of the stack that aren't a source of competitive advantage for anyone will be those most likely to be open sourced. We see this with Frank Warmerdam's GDAL library, which the All Points blog reports is included in ESRI's crown jewel software. Why would the most proprietary software of the GIS world start using open source software? Because the task of reading in a variety of different formats isn't a competitive advantage for them, so it makes more sense to cooperate than compete. How will this play out in the longer run? Data formats make the most sense, and along the same lines are projection libraries. The next step I see past that is the basic user interfaces.

This is starting to happen with new pluggable GIS systems like uDig. I see it as quite likely that such toolkits, which handle the reading and writing of formats and the basic UIs, will have proprietary functionality built on top of them relatively soon. There will continue to be innovations in GIS analysis, new operations to be performed on data, better automatic extraction from vectors, etc., as well as innovations in visualization and more compelling user interfaces. These will be sold as proprietary software that integrates with the open source systems. The cool thing about this is it lowers the barrier to entry for new innovations in GIS, since a new company won't have to write a full GIS system, and they won't have to be dependent on a single company (like the current ArcGIS component sellers, who are hosed if ESRI decides to replicate their functionality). And you will likely still have proprietary databases for advanced functionality – Oracle has great topology and versioning support that is not yet there in PostGIS. PostGIS will catch up in a couple of years, but by that time Oracle should have even more advanced functionality.

Another place we might see proprietary software at the edges is around open standards. We'll likely see the basic standards – WMS, WFS, WCS – mostly fulfilled by open source. But proprietary software will likely do the more interesting analysis, the real web service chaining thing. Just like you'll have proprietary plug-ins to uDig, so too will there be plug-ins for the Web Processing Service specification. One will be able to take an open source WFS or WCS and pass it to a proprietary WPS for some special processing (generalization, feature extraction, etc.), displaying the results on an open source WMS. I also suspect geospatial search will be best done by proprietary services, as is the trend in the wider web world. Of course Google and Yahoo run open source software extensively, but they keep their core search logic private. So geospatial web services that require massive processing power will likely keep their core logic proprietary, but will base it on open source software. This again follows Weber's point – the basic functionality isn't a core differentiator, so there will be collaboration on basic functionality – returning WMS and WFS of processed data or search results, for example – and proprietary innovation on the edges (more advanced processing algorithms on huge clusters of computers).
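As a rough illustration of that chaining, here's how a client might string the three services together. The endpoints, the 'topp:roads' layer, and the 'generalize' process identifier are all made up; the WFS parameters are the standard GetFeature ones, and the WPS request is shown loosely in key-value form rather than the spec's exact encoding.

```python
from urllib.parse import urlencode

# Sketch: pull features from an open source WFS, hand them to a (possibly
# proprietary) WPS process by reference, and the processed result could then
# be styled and displayed through a WMS. All endpoints and the 'generalize'
# process name are hypothetical.

def wfs_getfeature(wfs_url, typename, bbox):
    """Standard WFS 1.0.0 GetFeature request for one layer and bbox."""
    params = {
        "service": "WFS", "version": "1.0.0", "request": "GetFeature",
        "typename": typename,
        "bbox": ",".join(str(c) for c in bbox),
    }
    return wfs_url + "?" + urlencode(params)

def wps_execute(wps_url, process, features_href):
    """WPS Execute request, passing the feature data by reference (shown
    loosely in key-value form; a real client would follow the spec)."""
    params = {
        "service": "WPS", "version": "1.0.0", "request": "Execute",
        "identifier": process,
        "datainputs": "features=@href=%s" % features_href,
    }
    return wps_url + "?" + urlencode(params)

# open source WFS -> proprietary WPS -> results rendered by a WMS
features = wfs_getfeature("http://example.org/geoserver/wfs",
                          "topp:roads", (-74.1, 40.6, -73.9, 40.9))
print(wps_execute("http://vendor.example.com/wps", "generalize", features))
```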

In short, proprietary software will continue to exist; it just won't play the central role. It will be forced to push the edges of innovation even more to stay afloat, but I suspect it will always be at the leading edge. Of course I believe open source will innovate as well, especially in this geospatial collaboration area. But the ideal is a hybrid world with the right balance of cooperation and competition to push things forward faster than either could alone.

On Framing

So there have been a couple of points in recent responses that have got me thinking again about the power of framing an argument to use certain terms that benefit one side. An example of this is the two sides of the abortion debate – pro-life and pro-choice. If either were to take the implied opposite of the other side they would instantly lose the debate – imagine rallying behind 'pro-death' or 'anti-choice'. Digging into postmodern thought in college definitely got me too deep into all sorts of meta debates. But while I've given up much of the meta thinking – realizing that living in the present moment leads to a richer and happier existence than constantly dissecting multiple levels of the world around me – I've still retained a huge respect for the power of words and ideas.

Often the world appears to me as a battlefield of memes. Castoriadis's ideas of the Imaginary really got me thinking about how much society is shaped by ideas and words, and how one can potentially change society by changing how people think and dream about themselves and their world.

Though I am utterly clueless as to how we might go about changing broader society with just a few well placed ideas, I am confident that there are a few things we can do to help frame the debate over sharing ideas, code, and data. In the world of open information, the Free Software Foundation is great at identifying certain phrases and recommending that they be avoided, out of the recognition that they frame the debates in a way that puts the other side at an extreme advantage.

The two terms that have come up recently on this blog are 'commercial' and 'intellectual property'. All my thoughts on these are based on others' work, but I feel it's important to remind myself and others about the framings of these debates.

'Commercial' – people often draw a comparison between 'commercial' and 'open source' software. This is a framing that companies whose livelihood is based on proprietary software seek to propagate, as it makes open source software seem like something that can't be used in a commercial environment, that doesn't have sound business models. Free and Open Source Software has never included any clauses that prevent people from making money off of it; indeed one can charge whatever license fees one wants. The difference is that the source code of the software must always be included.

I am less strict about this framing than others, perhaps because I work for a non-profit instead of a commercial company making big money off of open source software. But I still feel it's incredibly important that we frame open source software as commercial software, and recognize proprietary software as just one way to make money producing software. There are now many commercial companies that are quite profitable making open source software, and this is great for the open source community. I believe open source can go much farther, but a key is that it's thought of as a viable commercial alternative, not some kind of opposite.
'Intellectual Property' – SteveC brings this up in the comments, pointing out that one can't steal data, one can only infringe copyright. Alan responds that it can be stolen, as lawyers call copyright 'intellectual property'. I'm going to have to side with Steve on this one, though I know 'intellectual property' is the popular term. Unfortunately it's a poor reflection of the reality of digital information, and it's an attempt on the part of lawyers and the corporations who pay their bills to put a square peg in a round hole. To quote Barlow:

Intellectual property law cannot be patched, retrofitted, or expanded to contain the gasses of digitized expression any more than real estate law might be revised to cover the allocation of broadcasting spectrum.

(his whole 'Economy of Ideas' essay is really quite excellent on the subject.)

It's a travesty that the powers that be have successfully framed information such as data, software, music and the like as 'property' instead of 'ideas' – the latter of which we accept should be spread, remixed, and recombined. A couple of years ago I thought up an alternate expansion of IP – Intellectual Production. Code and data are still things you create, and of value, but the term doesn't try to constrain them as 'property'; it recognizes that the ideas, information, and information products that you produce are not really yours, they are the world's. A couple of weeks ago I ran into another take from the FSF – 'Intellectual Wealth': they are pushing to create a World Intellectual Wealth Organization instead of WIPO. They also have one of the best takes on the pitfalls of using the term 'intellectual property', which they call a 'seductive mirage'.

In conclusion, I think it’s of utmost importance that we frame the open geodata debate in the right way, that we learn from the lessons of the free and open source software movement. Open geodata need not be non-commercial, and it is not creating property. It is creating information that will make the world a more open, collaborative, and better place.

Re: Why isn’t collaborative geodata a bigger deal already?

First off, thanks everyone for the great responses – it's great to have different perspectives refine my thinking on this subject. In this post I'm going to attempt to respond to many of the great comments and questions. Some of my responses won't be complete and will call for a full post of their own – indeed many of the issues raised are things I've thought about and have future posts planned for. But I like a conversation much more than a monologue, so it makes sense to address what comes up now.

The 'FOSS vs. commercial' divide definitely needs its own post, but I will invoke Arnulf and say that FOSS can be commercial – the proper divide is FOSS vs. proprietary, which I will address in a future post.

Alan, thanks a ton for the thoughtful insight. It's great to get feedback from someone who's been thinking about these things far longer than I have. In future posts I'm hoping to more fully explore how we can bootstrap an architecture of participation – this post was just to posit reasons why we haven't seen collaborative geospatial data emerge already. But it's great to hear that the 'priesthood' has no real power – as in much of life, the power is truly with the people, and we've really just got to organize right to exercise it.

I agree that data is not software, and that the challenges are going to be different, but I'd be interested in your thoughts on how software and data are fundamentally different such that an architecture of participation could not be formed. Because I'd point to open source software and simply replace 'software' with 'data' in a number of your points:

– the great desire is to have software, not to create it
– creating software requires intellectual effort (not just technological & physical)
– the more effort it takes the more valuable it is
– if software is valuable people want to steal it

As for your assertion that if data is valuable then people will want to sabotage it, I'd refer to one of Weber's points about what leads to a successful collaborative project: 'The product benefits from widespread peer attention and review, and can improve through creative challenge and error correction (that is, the rate of error correction exceeds the rate of error introduction)'. So the issue is not whether people want to sabotage it, it's whether the architecture of participation can handle error correction at a rate greater than error introduction. Of course he's not just referring to malicious error introduction, but it's necessarily a part of it. So the question is whether the commons can resist the sabotage. If a true commons of value is established, then the people who find value there will want to protect it. If there are tools that make this easy, then it's more likely the commons will be protected. You can have easy rollbacks, you can have people sign up to 'watch this area of the map', like 'watch this page' on Wikipedia, looking out for vandals in the areas you care about, and you can limit commit rights. I'll go into these in more depth in a full post in the future, but note that open source software suffers very little from sabotage, as those who contribute directly are vetted beforehand. Wikipedia is more prone to it, but is also able to correct itself. So we won't pretend that potential sabotage of data won't increase as a dataset grows popular; we just need to figure out the proper architecture such that the commons will be protected and fixed in a timely manner. One should also note that some datasets are very valuable to a few people, but not that valuable to everyone. So bike enthusiasts who want to map their favorite paths likely won't have their data vandalized.
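As a tiny illustration of how cheap the 'watch this area' tooling could be, here's a sketch of the core check: keep the bounding boxes people have asked to watch and notify them whenever a transaction's extent overlaps one. The Watch record and the print-based notification are hypothetical stand-ins; a real version would hang off a WFS-T change feed.

```python
from dataclasses import dataclass

# Sketch of the 'watch this area of the map' idea: keep the bounding boxes
# people have asked to watch and notify them when a transaction's extent
# overlaps one. The Watch record and the print-based notification are
# hypothetical stand-ins for real accounts and real alerts.

@dataclass
class Watch:
    user: str
    bbox: tuple          # (minx, miny, maxx, maxy) in lon/lat

def overlaps(a, b):
    """True if two lon/lat bounding boxes intersect."""
    return a[0] < b[2] and a[2] > b[0] and a[1] < b[3] and a[3] > b[1]

def notify_watchers(watches, changed_bbox, summary):
    """Tell everyone watching an overlapping area that something changed."""
    for watch in watches:
        if overlaps(watch.bbox, changed_bbox):
            print("notify %s: %s" % (watch.user, summary))

# e.g. a bike-path enthusiast watching their neighborhood
watches = [Watch("cyclist", (-71.12, 42.35, -71.05, 42.40))]
notify_watchers(watches, (-71.10, 42.37, -71.09, 42.38), "path geometry edited")
```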

As for people wanting to steal the valuable data, that shouldn’t be an issue, just like it’s not for open source software – the commons must be guaranteed to remain open. I take this to be a base condition for collaborative geospatial data to really succeed. I do concede that there could be other incentive structures that allow substantial collaboration around geospatial data. But at this point I’m not so much thinking about them, I’m thinking of something similar to the open source software movement, where the base case is that the collaborative data is open to all.

On the subject of Spatial Data Infrastructures, I've got another whole thread planned on SDIs – geospatial webs, and applying architectures of participation to SDIs and the like. I think Dave's point was mostly about SDIs as well, the interconnected content. For now I'm really just focusing on the micro level, creating and maintaining geospatial datasets. I certainly don't think that all, or even the majority, of the data on a true public SDI/geospatial web will be built collaboratively – we're just talking about a small piece of the content puzzle. But I do believe that it can play an important role in helping to bootstrap a true public SDI, that it will be combined with sensors and real-time data services and the like, and I think the discovery piece that Jo points out is quite important. Dave, I'd actually disagree that historically the open source community has led the charge, but I do think it will lead the charge for collaboration on open geodata. Are the surprises from proprietary software you're thinking of SDI related, or specifically about open geospatial collaboration?

The topic of public SDIs segues to Geoff's great point that we're likely going to see collaborative mapping emerge in places like Asia, where governments have restrictive terms for access to geospatial information. Thanks for the link to malsingmaps.com – I'd not seen it before, and am attempting to gather examples of proto-collaborative mapping. Looks like they're using MapBuilder for their online map. Hopefully I can get in touch with them and learn more about how the community works and what the motivations of individuals are, but this is really one of the most advanced collaborative mapping examples that I've seen, and I'm quite excited about it. Previously I had actually been thinking that innovation might first come from countries with less restrictive mapping policies – that we'd first see perhaps a massive project to improve TIGER data, since you have such a jump start with over 90% of a basemap for the US complete, and that you'd have forward-looking mapping agencies collaborating with citizens on more 'fun' datasets, like nature areas and bike paths (MassGIS, my favorite mapping agency, has had some experiments with those layers). But in some places the need is great, and a small group of motivated individuals could make enough of a difference to get things started. It looks like they're making use of MapBuilder, MapServer and probably PostGIS, using strong open source projects as the base on top of which they innovate, which is definitely the path to take.

For Sean's point, I completely agree that it's going to take a lot of time and effort. But I actually think open geodata falls closer to software than to Wikipedia (though Wikipedia is great for proving that the root concept may work even better for domains other than software). The GNU effort couldn't reuse any existing tools; the legal constraints forced them to build it all from scratch. And it took many years before it got critical mass, and even more until Linux completed an operating system for the tools.

I also think the 'snowball point' will be more like open source software. Wikipedia snowballs right when you get past the notion that only professionals are qualified to write an encyclopedia. But software certainly doesn't snowball at a similar point. It snowballs when the existing open source software is close enough to the needs of commercial companies that it costs them less to invest in the open source software than to buy proprietary licenses. Of course this point is different for each company, depending on many, many factors. But as one company invests in open source and gets it good enough for their needs, it may become advanced enough for other companies to invest in the next step, and thus a snowball is born. I believe mapping data will snowball when it makes economic sense for a company to invest in a collaboratively built map, improving it for their needs, instead of licensing a proprietary map. And yes, this too will be different for each company – some only need general context to overlay their specific geospatial information, others need exact info and routing and the like.

But I agree, it's going to take time and energy, both at the meta level to make it easier to overcome the logistical problems, and at the down and dirty level of going out to re-survey, well, just about everything. Just attempting to identify what's held it up in the past is by no means the same as building, and that is the much bigger challenge. It's going to be an uphill battle for a while, but I do believe eventually we too will see a snowball. And I'm keeping a firm eye on you and the Pleiades project for some brilliant techno-cultural inventions.

GeoServer Tech Talk at Google

Last week I gave a 'tech talk' on GeoServer at Google, which was a fun experience. It's for employees of Google, but they make the talk available to all on Google Video: GeoServer and Architectures of Participation for Geospatial Information. If you like it, give it a nice rating on the page. It touches on The Open Planning Project (my employer), how we leveraged standards to implement KML output in GeoServer to talk to Google Earth, and goes a bit into architectures of participation for geodata, which I'm in the midst of writing about in this blog.

It's 54 minutes long; perhaps in the future I'll link to specific chunks of it. I'm also making the slides available (Creative Commons Attribution license) in PowerPoint and OpenDocument formats.

I also said I'd post some links in my blog. Many of them are already on the sidebar, and are also now referenced in the slides. But for completeness' sake:

GeoServer homepage

GeoServer Google Earth info

GeoServer Google Maps info

The Open Planning Project

Coase’s Penguin, Yochai Benkler

Public Commons of Geographic Data, Harlan Onsrud et al. (PDF)

Open Geospatial Consortium

Architecture of Participation by Tim O’Reilly

My take on Architectures of Participation – you can also follow the tag.

Why isn’t collaborative geodata a big deal already?

So if, by Weber's criteria, geospatial data has a high chance of having a true architecture of participation form around it, then why don't we see more collaboratively built maps? Why do governments and commercial ventures dominate the landscape of geospatial data? Though there are several emerging examples, shouldn't they be a bigger deal?

As I posit that a successful architecture of participation is made up of both a social component and a technical component, let's examine the hold-ups in each.

On the social side of the fence I believe the problem is mostly historical. Maps have traditionally been viewed as a source of power, something to be kept secret. King Philip II of Spain kept his maps under lock and key – state secrets to be protected. Maps are historically the tools of conquerors and rulers, and thus kept private to retain an advantage. Though maps obviously have huge value for a large number of civil society uses, there seems to be a legacy of maps as a competitive advantage instead of a base for cooperation. The workflows for the creation of maps thus aren't seen as having the potential for wider contributions, for opening themselves up.

In contrast, the tradition with software is one of sharing. Indeed, it wasn't until the late 1970s that anyone even thought of software as something to keep protected, to attempt to sell. Software's roots are academic and hobbyist – two groups for whom sharing is natural. Contributing to the sharing ethos is the fact that in the early days there were so many different chips and computers that the only way one could distribute a piece of software to more than one brand of computer was to include the source code. Binary distributions just weren't a viable option if you wanted many people to make use of your software.

In the early 80s, however, computer manufacturers started providing their own operating systems and requiring license agreements to use them. Indeed, Microsoft got its first big break by re-selling an operating system to IBM. This was a huge boon for computer software in general, as it enabled more people to work on software for a living, since software itself was now seen to have value. But the sharing roots of software could not be denied, as Richard Stallman soon formed the GNU Project and the Free Software Foundation to counter the commodification of software and to create open structures of sharing.

While the Free and Open Source Software movement has grown to become a huge success, mapping data, with no tradition of sharing, has become increasingly locked down, commodified, and managed by a bureaucratic-informational complex. Sharing is weighed down by the legacy of paper maps, which could not be copied infinitely, and searching for mapping data is similarly bogged down in library metaphors. While software just needed to get back to its roots to share, geospatial information has no similar roots. Not surprisingly, the forefront of the fledgling open geodata movement is dominated by people with roots in software, not traditional GIS types.

As for the technical side of the fence, the main problem as I see it is that the software to create maps and do GIS is bloody expensive. Desktop software runs thousands of dollars, up to tens of thousands of dollars. Some low-cost options have emerged, but those still run at least a couple of hundred dollars. Server software is even more expensive. There are also training costs – super expensive software seems to demand 'training' to justify itself, by forming an 'elect' of those who are able to operate it. If it's hard enough to operate that someone needs to be specially trained, then it must be worth all that money (see for example Oracle vs. Postgres: the latter is almost as powerful, yet far, far easier to set up). Past that there is surveying equipment and GPSes, which have been quite expensive.

So in the past, the tools to create maps were too expensive. But this is changing. Commercial GPSes are quite cheap, and open source GIS software is emerging. Of course the technical in turn feeds back into the social – I don't believe that if existing GIS software were suddenly freely available it would be sufficient to spark a true architecture of participation around geospatial data. It is still too tied to a culture of the GIS elect – GIS as a special skill that must be learned, that only a few have. While the first wave of open source desktop GIS is only aping the traditional tools, the great thing is that they are open, and for the most part designed in a modular fashion. I'll try to hit on this more in a future post, but I believe that these open source desktop GIS tools will be reused to open the way to specialized tools that allow most anyone to create geospatial data. They may not be recognized as 'real' GIS – indeed I think the 'elect' will try hard to defend their position – but they will be the key to making true architectures of participation around geospatial data.

So the seeds of change are there, and indeed projects like GeoNames, Wikimapia, OpenStreetMap, and wifimaps, plus all the KML layers available, are already showing results. I believe right now we're in a time like Free Software before Linux: building some small-scale stuff, some of which will have value, but we've not yet hit on the appropriate social and technical architectures to make a huge success like Linux. Until then we've just got to keep experimenting – and even after that I'm sure more experimentation will take place, and it will be different for every project, just as we've learned much from Linus's social innovations, yet no project functions exactly like the Linux project. But we will gather a suite of tools and social structures to allow replicable success in collaborative mapping, and before long it will snowball into something far bigger than most can imagine.

AoP for Geospatial Data, via Weber

In 'The Success of Open Source', Weber spends his last chapter looking at the more generic parts of the open source process, and their potential to be applied to other domains. He lays out the properties of the types of tasks that lend themselves to an open source process. To greater or lesser degrees, all are relevant to geospatial data, and I think they point to geospatial data as having huge potential for a true 'Architecture of Participation' to form around it.

* Disaggregated contributions can be derived from knowledge that is accessible under clear, nondiscriminatory conditions, not proprietary or locked up.
Currently this is a decision that is left up to the governments who are the main providers of geographic data. In the United States most geographic data is available; in Europe this is not the case. But geographic data need not necessarily rely on governments – there are a number of commercial providers who have proved it is possible to build up a valuable cache of geographic data. Their primary ways of gathering the data are using GPS devices and tracing lines over satellite imagery. One could easily imagine a group of enthusiasts doing the same (or indeed a consortium of companies, as the reality of FOSS stands today), gathering and agreeing to share their geo knowledge under clear, nondiscriminatory conditions. A key for this will likely be good licensing, a topic I hope to work on in the future. I believe we need a set of clear licenses aimed at geospatial data – indeed a range of licenses like those available for open source software or creative works.

* The product is perceived as important and valuable to a critical mass of users.

Geographic data is incredibly important and valuable. Everyone uses maps. There are private companies who make huge amounts of money selling up-to-date geographic data. Indeed, there is a lot of geographic information that is overlooked by traditional mapping departments, such as bike trails, kayak routes, or wi-fi spots. Though small, these communities could provide a critical mass of passionate users and potential contributors. But the value of good geospatial data is becoming more and more obvious to wider groups of people.

* The product benefits from widespread peer attention and review, and can improve through creative challenge and error correction (that is, the rate of error correction exceeds the rate of error introduction)
This is a question that needs more research. The first condition is definitely true – maps can improve from widespread peer attention and review. Many traditional surveying types would argue that it takes incredible training to be able to create a map, and that with the data opened up, the rate of error introduction would exceed the rate of error correction. Two thoughts in rebuttal to this.

The first is that cheap GPS devices and satellite imagery are making training on traditional surveying tools less necessary. Most anyone can operate a GPS, and we all have the ability to look at a photo of a city and draw out where the roads are. Well-designed software tools could simplify much of the complexity, such as topologies and other validations.

Second is that computer programming, the skill required to make FOSS, also takes incredible training. Allowing just anyone to modify the source code of the main distribution is certainly a bad idea, as one little typo can break everything. This is why there are complex governance structures and tools to manage the process. One could easily imagine a similar situation with geographic data – only core 'committers' would have complete access, around them some users would submit patches (from image tracing or GPS traces) for the committers to review, and past that 'bug reporters' would just flag that there are problems with the data. This is a topic I hope to examine more in the future: how to make tools that lead to greater error correction than error introduction (a rough sketch of one such check is below).
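As a small example of the kind of tooling that could keep error introduction down, here's a sketch of an automated sanity check that could run on a contributed trace before it ever reaches a committer's review queue. The specific rules are illustrative only, not a proposal for the 'right' set of validations.

```python
# Sketch of an automated sanity check on a contributed (lon, lat) trace --
# say, a GPS-traced road -- before it enters the committers' review queue.
# The specific rules are illustrative only.

def validate_trace(coords):
    """Return a list of problems found in a contributed line."""
    problems = []
    if len(coords) < 2:
        problems.append("a line needs at least two points")
    for lon, lat in coords:
        if not (-180.0 <= lon <= 180.0 and -90.0 <= lat <= 90.0):
            problems.append("coordinate out of range: %r" % ((lon, lat),))
    for a, b in zip(coords, coords[1:]):
        if a == b:
            problems.append("duplicate consecutive point: %r" % (a,))
    return problems

# A submitted patch only reaches a human reviewer if this comes back empty.
print(validate_trace([(-71.06, 42.36), (-71.06, 42.36), (-200.0, 42.37)]))
```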

* There are strong positive network effects to the use of the product
In many ways we are only just now starting to see the positive network effects of spatial data. A big advance is that it is much easier to spatially locate ourselves, with cell phones, cheap GPSes, and other location-aware devices. The recent launch of Google Maps and the 'mashups' built on top of its base data also point to the positive effects and innovation that can happen once base mapping data is available to give context to other interesting applications. Google Maps Mania points to hundreds of these 'mashups', where users create new maps displaying anything from crime in Chicago to available apartments, free wifi spots, or real-time network failures. And if we move to a real services vision with WMS and WFS, the data built on top of the basemap can further be combined with other data, creating new networks of spatial data. Maps are inherently made up of different layers of information, and get more valuable when combined with other sources.

* An individual or a small group can take the lead and generate a substantive core that promises to evolve into something truly useful.
This depends on the size of the geographic area covered. But most traditional surveying and mapping crews are not incredibly large, and a small group of people can cover a fairly wide area. The Mapchester and Isle of Wight workshops show how a small group of devoted people can make substantial progress on a map in a weekend.

* A voluntary community of iterated interaction can develop around the process of building the product.
There is no reason why this could not happen, and indeed we are already seeing it with the OpenStreetMap project. The emerging academic sub-field of Public Participation GIS also points to communities of iterated interaction using GIS to assist in planning and other decisions that affect their interests. This is likely where the most work is needed, as different social dynamics work better with different tools, and indeed with different groups of people. Open source software does not let just anyone modify the code, but Wikipedia basically does. And indeed with open source software it took many years before the right social dynamics emerged to enable an operating system to be built, when Linus successfully established the bazaar method of delegation and decision making. But since most all the other factors point to the fact that geospatial data can have an architecture of participation built around it, I would posit that we need only spend the energy on evolving the right community mechanism, and we'll then hit on something very big.

Architectures of Participation for Geospatial Data (intro)

To me the most interesting thread in bringing Architectures of Participation to the geospatial world is the creation and maintenance of geographic information itself. I believe it has the greatest potential to have a true open source type movement form around it, and indeed the first signs have already emerged: the mash-ups we've seen around Google Maps, OpenStreetMap, and others are pointing the way forward. These thoughts aren't new to anyone who has seriously thought about mapping and open source/'web 2.0' – it's the logical next step. But my posts on this are going to attempt to present the ideas to those who may not have been embedded in these thought streams, and I will ground the thoughts in Weber and Benkler, the two leading thinkers in my mind on bringing the 'open source process' to domains other than software. I will point to examples of how this is already happening in the geospatial realm, and I'll also articulate my technical vision for the next wave, building on standards and existing GIS technologies. And I'll touch on where I hope to see some of this stuff end up, and any related things I want to bring up along the way, as that's the luxury I get with a blog ;).

Weber argues in 'The Success of Open Source' that the most interesting thing about open source is the process, and that it theoretically could be applied to any digital information, as it is all infinitely copyable at no cost to the owner. Benkler similarly sees a broader social-economic model in open source in his 'Coase's Penguin'. He calls it a third mode of production, "commons-based peer production", characterized by groups of individuals collaborating on large-scale projects with motivations that are not drawn from either the pricing of markets or the directions of managers (the market and firm modes of production, respectively).

Digitized geographic data certainly is infinitely copyable, but there are few examples of people using an open source process with geographic data. One can start to understand how open source geographic data might work by re-examining my metaphor of legos for open source software in the context of geospatial data instead of source code. Just as source code is a number of small files that fit together to make a program, so too does geographic data (points, lines, polygons) fit together to make a map. The 'instructions' in the case of geodata are not the human readable source code, but instead the raw data that can be used to make maps. Just as a binary program is a pre-assembled lego car, so too is a printed or online map un-modifiable. If someone wanted to change the map, to remix it for their purposes, or even just fix a street they know to be wrong, they would need the raw data (raw data = source code = instruction booklet of the lego metaphor). Most users are fine with the pre-assembled version, the actual map, but motivated users could likely do much more with the raw data – such as generate new maps, change the 'style' (the colors and data displayed) to emphasize different aspects, and make corrections to errors – that they could then share with others. A license stipulating that users of the data must also make their modifications open to others would certainly be possible, just like the GPL does for software.

In a future post I'll explore the criteria Weber speculates are needed to build an open source process around domains other than software, and compare them against geospatial data. But for now we'll hold off. I just want to start by raising the point that when information is digital, and is a 'non-rival' good – that is, it doesn't cost me anything if you have a copy of it – then 'scarcity' becomes much more of an artificial construct. The only thing enforcing that scarcity is intellectual property laws, and the open source software movement has shown that an initially small group of motivated people can turn that scarcity on its head. I'd like us to take a similar approach, to cooperate to build maps that are even more accurate and up to date than commercial providers and spy agencies can provide, taking that traditional source of power and putting it in the hands of all. It sounded silly with open source software – to build a better operating system than one of the most dominant companies in the world can – but just as that is coming to pass, with many huge players rushing in to help out, so too I think we could see the biggest buyers of commercial data flock to a solution that has them cooperate in a more economically efficient mode.