Architectures of Participation for Geospatial Data (intro)

To me the most interesting thread in bringing Architectures of Participation to the geospatial world is the creation and maintenance of geographic information itself. I believe it has the greatest potential to have a true open source type movement around it, and indeed the first signs of it have already emerged: the mash-ups we’ve seen around Google Maps, Open Street Map, and others are pointing the way forward. These thoughts aren’t new to anyone who has seriously thought about mapping and open source/‘web 2.0’; it’s the logical next step. But my posts on this are going to attempt to present the ideas to those who may not have been embedded in these thought streams, and I will ground the thoughts in Weber and Benkler, the two leading thinkers in my mind on bringing the ‘open source process’ to domains other than software. I will point to examples of how this is already happening in the geospatial realm, and I’ll also articulate my technical vision for the next wave, building on standards and existing GIS technologies. And I’ll touch on where I hope to see some of this stuff end up, and any related things I want to bring up along the way, as that’s the luxury I get with a blog ;).

Weber argues in ‘The Success of Open Source’ that the most interesting thing about open source is the process, and that it theoretically could be applied to any digital information, as it is all infinitely copyable at no cost to the owner. Benkler similarly sees a broader social-economic model in open source in his ‘Coase’s Penguin’. He calls it a third mode of production, “commons-based peer-production”, characterized by groups of individuals collaborating on large scale projects with motivations that are not drawn from either the pricing of markets or the directions of managers (the market and firm modes of production, respectively).

Digitized geographic data certainly is infinitely copyable, but there are few examples of people using an open source process with geographic data. One can start to understand how open source geographic data might work by re-examining my metaphor of legos for open source software in the context of geospatial data instead of source code. Just as source code is a number of small files that fit together to make a program, so too does geographic data (points, lines, polygons) fit together to make a map. The ‘instructions’ in the case of geodata are not the human readable source code, but instead the raw data that can be used to make maps. Just as a binary program is a pre-assembled lego car, so too is a printed or online map un-modifiable. If someone wanted to change the map, to remix it for their purposes, or even just fix a street they know to be wrong, they would need the raw data (raw data = source code = instruction booklet of the lego metaphor). Most users are fine with the pre-assembled version, the actual map, but motivated users could likely do much more with the raw data – such as generate new maps, change the ‘style’ (the colors and data displayed) to emphasize different aspects, and make corrections to errors – that they could share with others. A license that stipulates that users of the data must also make their modifications open to others would certainly be possible, just like the GPL does for software.

In a future post I’ll explore the criteria Weber speculates as needed to build an open source process around domains other than software, and compare it against geospatial data. But for now we’ll hold off. I just want to start with raising the point that when information is digital, and is a ‘non-rival’ good, that is it doesn’t cost me anything if you have a copy of it, then ‘scarcity’ becomes much more of an artificial construct. The only thing enforcing that scarcity is intellectual property laws, and the open source software movement has shown that an initially small group of motivated people can turn that scarcity on its head. I’d like us to take a similar approach, to cooperate to build maps that are even more accurate and up to date than commercial providers and spy agencies can provide, taking that traditional source of power and putting it in the hands of all. It sounded silly with open source software – to build a better operating system than one of the most dominant companies in the world can – but just as that is coming to pass as many huge players rush in to help out, so too I think we could see the biggest buyers of commercial data flock to a solution that has them cooperate in a more economically efficient mode.

Why Castoriadis is a revolutionary

A great passage from Cornelius Castoriadis, in The Imaginary Institution of Society, on why he is a revolutionary.

I desire and I feel the need to live in a society other than the one surrounding me. Like most people, I can live in this one and adapt to it, at any rate, I do live in it. However critically I may try to look at myself, neither my capacity for adaptation, nor my assimilation of reality seems to me to be inferior to the sociological average. I am not asking for immortality, ubiquity or omniscience. I am not asking society to ‘give me happiness’ I know this is not a ration that can be handed out by City Hall or my neighborhood Workers‘ Council and that, if this thing exists, I have to make it for myself, tailored to my own needs, as this has happened to me already and as this will probably happen to me again. In life, however, as it comes to me and to others, I run up against a lot of unacceptable things, I say they are not inevitable and that they stem from the organization of society. I desire, and I ask, first that my work be meaningful, that I may approve what it is used for and the way in which it is done, that it allow me genuinely to expend myself, to make use of my faculties and at the same time to enrich and develop myself. And I say that this is possible, with a different organization of society, possible for me and for everyone. I say that it would already be a basic change in this direction if I were allowed to decide, together with everyone else, what I had to do, and, with my fellow workers, how to do it

I should like, together with everyone else, to know what is going on in society, to control the extent and the quality of the information I receive. I ask to be able to participate directly in all the social decisions that may affect my existence, or the general course of the world in which I live. I do not accept the fact that my lot is decided, day after day, by people whose projects are hostile to me or simply unknown to me, and for whom we, that is I and everyone else, are only numbers in a general plan or pawns on a chessboard, and that, ultimately, my life and death are in the hands of people whom I know to be, necessarily, blind.

I know perfectly well that realizing another social organization, and the life it would imply, would by no means be simple, that difficult problems would arise at every step. But I prefer contending with real problems rather than with the consequences of de Gaulle’s delirium, Johnson’s schemes or Krushchev’s intrigues. Even if I and the others should fail along this path, I prefer failure in a meaningful attempt to a state that falls short of either failure or non-failure, and which is merely ridiculous.

I wish to be able to meet the other person as a being like myself and yet absolutely different, not like a number or a frog perched on another level (higher or lower, it matters little) of the hierarchy of revenues and powers. I wish to see the other, and for the other to see me, as another human being. I want our relationships to be something other than a field for the expression of aggressivity, our competition to remain within the limits of play, our conflicts – to the extent that they cannot be resolved or overcome – to concern real problems and real stakes, carrying with them the least amount of unconsciousness possible, and that they be as lightly loaded as possible with the imaginary. I want the other to be free, for my freedom begins where the other’s freedom begins, and, all alone, I can at best be merely ‘virtuous in misfortune’. I do not count on people changing into angels, nor on their souls becoming as pure as mountain lakes – which, moreover, I have always found deeply boring. But I know how much present culture aggravates and exasperates their difficulty to be and to be with others, and I see that it multiplies to infinity the obstacles placed in the way of their freedom.

I know, of course, that this desire cannot be realized today, nor even were the revolution to take place tomorrow, could it be fully realized in my lifetime. I know that one day people will live, for whom the problems that cause us the most anguish today will no longer even exist. This is my fate, which I have to assume and which I do assume. But this cannot reduce me to despair or to catatonic ruminations. Possessing this desire, which indeed is mine, I can only work to realize it. And already in the choice of my main interest in life, in the work I devote to it, which for me is meaningful (even when I encounter, and accept, partial failure, delays, detours and tasks that have no sense in themselves), in the participation in a group of revolutionaries which is attempting to go beyond the reified and alienated relations of current society – I am in a position partially to realize this desire. If I had been born in a communist society, would happiness have been easier to attain – I really do not know, and at any rate can do nothing about it. I am not, under this pretext, going to spend my free time watching television or reading detective novels.’

Distribution of Geodata

A big problem with geographic data is distributing it. Geographic data is huge. In GML the roads of Los Angeles are 300 megabytes. The Google street map for just the US is terabytes of data. We have compelling 3d mapping applications, like Google Earth and WorldWind, and indeed a very high degree of responsiveness is now expected of flat web maps thanks to Google Maps. Traditionally one would just use a desktop GIS system to display new geographic information, but increasingly we want to be able to just view layers online.

Unfortunately these applications make incredibly high demands on existing interoperable servers. The tradition has been to dynamically generate a new map for every request – the server would get a request and go to its database and draw it. I believe the JRC in Italy banned WorldWind from hitting their server because the load was too much, and the bandwidth was costing them.

The first step is simple. We’re hoping to get things going in GeoServer very soon, and it is already being done in clients like ka-map (though I wish it worked with all WMS and not just mapserver) and openlayers (which I love). You divide up the world into tiles, just like Google Maps, and cache the results on the server. That way the server has a saved version of the results, and can just return it rather than generate it dynamically. What is needed past what happens now is a standard, so that clients tile up the earth in the same way and make the same requests. This way a request that one client had previously made would be available to another client looking at the same area, as long as the server is caching. But as long as there’s an agreed way of dividing up the earth, you can get better performance using simply the WMS protocol and Squid. See our initial experiment (we’re working on better styling): the initial views and zoom levels will be fast as they are cached by Squid. As you move to other areas of the map it may slow down a bit, but when someone else looks at them some other time they will be cached, generating no additional load on the dynamic server. This is a first step, as it eases the processing load on the server, and makes things faster, but unfortunately it wouldn’t help with the bandwidth costs associated with standing up a popular geospatial server.
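To make the shared tiling idea concrete, here is a minimal sketch in Python. Everything about the grid is an assumption for illustration (256 pixel tiles, EPSG:4326, two tiles covering the world at zoom 0, rows counted up from the south), and the function names are hypothetical, not anything GeoServer, ka-map or openlayers define. The only point is that when every client snaps to the same grid and builds the WMS GetMap URL the same way, identical views produce identical URLs, which is what a URL-keyed cache like Squid needs to get a hit.

```python
from urllib.parse import urlencode

TILE_SIZE = 256  # pixels per tile side (assumed)


def tile_bbox(zoom, col, row):
    """Bounding box (minx, miny, maxx, maxy) in degrees for one grid tile.

    Assumed grid: at zoom 0 two 180-degree tiles cover the world, and each
    zoom level halves the tile span. Rows count up from the south pole.
    """
    span = 180.0 / (2 ** zoom)
    minx = -180.0 + col * span
    miny = -90.0 + row * span
    return (minx, miny, minx + span, miny + span)


def tile_for_point(zoom, lon, lat):
    """Snap a lon/lat to the (col, row) of the tile that contains it."""
    span = 180.0 / (2 ** zoom)
    cols, rows = 2 ** (zoom + 1), 2 ** zoom
    col = min(int((lon + 180.0) / span), cols - 1)  # clamp lon == 180
    row = min(int((lat + 90.0) / span), rows - 1)   # clamp lat == 90
    return col, row


def getmap_url(base_url, layer, zoom, col, row):
    """Build a WMS 1.1.1 GetMap URL with a fixed parameter order and format."""
    bbox = tile_bbox(zoom, col, row)
    params = [
        ("SERVICE", "WMS"),
        ("VERSION", "1.1.1"),
        ("REQUEST", "GetMap"),
        ("LAYERS", layer),
        ("STYLES", ""),
        ("SRS", "EPSG:4326"),
        ("BBOX", ",".join(f"{v:.6f}" for v in bbox)),
        ("WIDTH", str(TILE_SIZE)),
        ("HEIGHT", str(TILE_SIZE)),
        ("FORMAT", "image/png"),
    ]
    return base_url + "?" + urlencode(params)
```

Two clients looking at roughly the same spot would call tile_for_point with slightly different coordinates, land on the same (col, row), and so ask the server (or the cache in front of it) for exactly the same URL.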

This is where an architecture of participation can come into play. Peer to Peer technology has been evolving significantly, and it could likely handle the same tiles that a server would cache. So instead of asking the server to return a set of tiles that represents an area, a client could ask a p2p network. The problem that may initially look hard is how you get the tiles on the p2p network in the first place. This could actually be incredibly easy, if built into thick 3d clients like Google Earth and WorldWind. These clients are already caching gigabytes of data, in tiles. We just need to make sure that the tiles meet a standard for how they divide up the earth. But the p2p technology could easily be built into the clients and automatically start up and start sharing (though it may need some configuration help to get past firewalls). Hopefully you’d also make it easy for people with extra server resources to help out: they could install a simple program and give it a certain amount of hard drive space, and it’d fill up with tiles for users – a voluntary supernode of sorts.

We could actually likely do one better as well, and allow thin clients like google maps/openlayers/ka-map to talk to a p2p network. There are p2p clients that are web servers in their own right. So one could download and install a p2p client, and it would act as a ‘map accelerator’ for when one uses the lightweight services. The web based client would ask the local p2p program if it had the tiles requested, which in turn would ask the whole p2p network. If any tiles weren’t there it could ask the real server (which is likely caching them anyways). And it would then insert the tiles into the p2p network for use by other clients. Ideally this would work if users already had a p2p client, or they could download a generic p2p client that would be rebranded as ‘map accelerator’. This would also provide more persistent caching for web maps, since if you visited an area you previously visited, you’d have a fast local copy. Currently you only get local caching in each browser session. And those on your LAN would also greatly benefit from your caching.
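The lookup order described above (local copy, then the p2p network, then the real server, then publish back to the network) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class name TileAccelerator and the three callables it takes are hypothetical, and the p2p network is reduced to a pair of lookup/publish functions rather than any real p2p library.

```python
class TileAccelerator:
    """Sketch of a local 'map accelerator' sitting in front of a tile server.

    peer_lookup(key)        -> tile bytes, or None if no peer has it (assumed)
    peer_publish(key, data) -> offers the tile to the p2p network (assumed)
    origin_fetch(key)       -> fetches the tile from the real server (assumed)
    """

    def __init__(self, peer_lookup, peer_publish, origin_fetch):
        self.local = {}  # persistent local cache, here just a dict
        self.peer_lookup = peer_lookup
        self.peer_publish = peer_publish
        self.origin_fetch = origin_fetch

    def get_tile(self, key):
        """Return (tile_bytes, source), trying the cheapest source first."""
        if key in self.local:
            return self.local[key], "local"

        data = self.peer_lookup(key)
        source = "p2p"
        if data is None:
            # Nobody on the network has it yet: hit the real server once,
            # then insert the tile into the network for other clients.
            data = self.origin_fetch(key)
            source = "origin"
            self.peer_publish(key, data)

        self.local[key] = data
        return data, source
```

With a plain dict standing in for the network, two accelerators sharing it show the intended effect: the first request for a tile goes to the origin, a repeat request on the same machine is served locally, and a request from a second machine is served from the network without touching the server again.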

As for implementations, Schuyler presented many of these ideas at the last OSGeo conference (see his presentation (powerpoint warning)), and had an interesting thought for actually building it: use a Distributed Hash Table based on location, under the assumption that those who are physically near you are more likely to be looking at the same tiles as you, since people tend to look more at maps of the area they are in. Another idea I hadn’t thought of that John Graham brought up is ‘GeoTorrents’. I had always just thought of torrents being useful for distributing large datasets efficiently, but that you’d have to get the whole dataset. The idea John pointed to is that since bittorrent already divides up large files into much smaller pieces, you could bootstrap on that, just having the way it divides up a large map image be into the pre-set tiles. I’ve never looked extensively at how bittorrent works, but if one is allowed to request only a small portion of the whole then this could be an ideal solution. Many p2p clients seem to be not so optimized for smaller files, but most do split up large files and ‘swarm’ them in a bunch of parts. So we could consider the full set of tiles the large file, and individual tiles the parts that make it up. The difference for us is that being able to just view a coherent set of the parts is probably more useful than the whole. But the point is that there should be some way to leverage the p2p concept so that those who have already viewed an area of the map can serve it to those who haven’t, instead of everyone just swamping a central service.
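One way a location-aware key could look is a quadkey, sketched below in Python. To be clear about assumptions: this is not what Schuyler or John proposed, just one known scheme (the interleaved-bits quadkey used by some web map tile systems), and it assumes a square grid of 2^zoom by 2^zoom tiles. The useful property is that a tile’s key starts with its parent tile’s key, so tiles in the same area tend to share a prefix that a location-based hash table could route on.

```python
def quadkey(zoom, col, row):
    """Quadkey string for a tile in a square 2**zoom x 2**zoom grid (assumed).

    Each digit (0-3) picks one quadrant at one zoom level, most significant
    level first, by interleaving the bits of col and row.
    """
    digits = []
    for level in range(zoom, 0, -1):
        mask = 1 << (level - 1)
        digit = 0
        if col & mask:
            digit += 1
        if row & mask:
            digit += 2
        digits.append(str(digit))
    return "".join(digits)


def shared_prefix_len(a, b):
    """Number of leading digits two quadkeys have in common."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n
```

For example, quadkey(3, 3, 5) gives "213" and its left-hand neighbour quadkey(3, 2, 5) gives "212", so the two share a two-digit prefix. The locality is not perfect: two adjacent tiles that straddle a high-level quadrant boundary can have no prefix in common at all, so this would be a starting point rather than a complete answer.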

The alternative to going this route is just letting the big guys dominate the mapping world. Not only are they the only ones who can afford the data, they are the only ones who can stand up a service that can handle millions of users. The damn brilliant thing about using an architecture of participation for geospatial information is that as a layer gets more popular it scales perfectly, since more people downloading and checking out the layer means that more people are serving it up to others. So even the smallest provider can afford to stand up a server and not have it slow to a crawl if it gets suddenly popular.

Freedom of Information and Geodata

So we’ve been having an interesting discussion on the geodata committee of OSGeo, on the topic of VMAP1 and getting access to it with the Freedom of Information Act (FOIA). I’ve been talking with Dave recently, and they’re setting up a non-profit that will be ideally suited to go after this type of stuff in the courts. Their current focus is more on information the police keep private, but they are quite into the idea of going after geographic data. So Mike and I met with him on Friday, discussing both metrocard FOIA and VMAP1 FOIA. I shot off an email to the geodata osgeo list about it, mostly expecting people to be excited about the fact that I had a great resource to use (Dave and his soon to be organization) to help out with this stuff. The response, however, ranged from ‘been there, done that, gave up’ to fairly negative – litigation is a dirty word, we should just ask more nicely for it. I concede that there should be an organization that plays nice and makes contacts and gets data that way. And OSGeo is easily one of the ideal organizations to do this. But I’m still interested in using another organization, perhaps TOPP, to actually go after some of this data in a more ‘aggressive’ manner.

What’s interesting to me about all of this is looking at my root assumptions about government, relative to those who feel that litigation for FOIA stuff is ‘highly controversial and antagonistic’. It probably comes down to the fact that I don’t really trust the government. Though that statement simplifies things far too much. I don’t think the government’s out to get us, or that it’s an evil institution that must be smashed to the ground. Or that our system of government is bad. On the contrary, I think that our system of government is pretty good. But I also believe that it could be better, that there are potentially more just systems of organizing and decision making for human beings, that our current form of democracy is not the end all and be all (indeed at times I fear it’s taken a turn for the worse). But let’s focus for now on our current system of government, instead of utopian futures.

I believe that it is my responsibility to do things like pursue FOIA litigation. I feel that institutions have a tendency towards stagnation and even corruption. At times I despair and feel the current massive influence of corporations on the state, the fact that money rules, is just an incredibly advanced form of corruption and propaganda. Given this tendency, there need to be forces that keep the institutions honest and rise up if things get too bad. Thankfully, the founders of our current system of government were downright brilliant, able to build a set of documents which helped usher in the more worldcentric values of freedom and equality, which existed in only a minority of the population. Wilber elaborates:

The brilliance of the Founding Fathers was that they found a way to take this rare, elite stance–demanding equality and freedom for all–and force it on an entire population as the backbone of a series of legal and behavioral codes that demanded that, even if individuals are not at moral-stage 5 in their own interiors, they must conform their exterior behavior to rules consistent with a moral-stage-5 act (e.g., you do not have to love me, but if you shoot me they will lock you up). Thus, at their best, the laws of America embodied an attempt to encode higher, postconventional, worldcentric responses–regardless of race, sex, color, or creed–implemented with the consent of the governed (the moral-stage-5 social contract), even if those laws were developmentally ahead of most of the governed. (read more at ‘The Deconstruction of the World Trade Center’, part 2, though the whole thing is worth reading)

Throughout our history various documents have pointed the way to a more just world, and I firmly believe the Freedom of Information Act was one of them. My lawyer friends say it’s an incredibly solid piece of law, one that really clearly states that just about everything that a government does should be open and available to its citizens. Which makes infinite sense when viewed through the lens of what a truly democratic society should look like. But we’ve become used to a government often antagonistic to its people, and doing all that it can to keep certain things secret. This can be for downright malicious reasons, but we need to remember not to attribute to malice what can be explained by stupidity or ignorance. Often it’s the attitude that politicians feel they know what’s best, or even silly things like fear of reprisal if their work’s not perfect, as I hit on in ‘The Metadata Problem’. Indeed I feel the same thing about corporations, that they aren’t evil or controlled by evil capitalists, they’re just a weird institution that has followed its own logic too far and gotten out of hand.

The nice thing about government though is that it was designed with checks and balances that are ultimately in the hands of citizens (unlike corporations, which are only checked by shareholders, which often just stand in for ‘profit’). And so I feel it is the responsibility of conscious and informed citizens to make use of those checks and balances for a more open society. This is what I feel about FOIA litigation: it is the best way to engage with government. The law is crystal clear on the fact that data should be open. Yet institutions constantly offer up a number of excuses as to why it should not be, the majority of which aren’t in line with the spirit or the letter of the FOIA. They are institutions that seek to propagate themselves, and/or are staffed by people who are ignorant and/or seeking to cover their stupidity. So they will naturally be hesitant to turn over information – if you’re doing a poor job you wouldn’t want to turn over evidence of that. But the point of FOIA is so that citizens can be aware if people in their government are doing a poor job. And of course it often ends up antagonistic, people will fight tooth and nail if they know there’s some information that can really harm them. Which makes me want to go after the information even more. I mean, I’m all for asking nicely initially. But when they constantly say they’re working with you, while giving away nothing and offering up any available excuse (say money for processing before 9/11 and ‘security’ after), then it gets a bit old.

The interesting thing about all FOIA litigation is that it always starts with a polite request for the information. Which is then denied, and an appeal is even made. Litigation is the last step, when reasonable requests have already been denied. I’m not sure how much asking more nicely can help. Of course there could be some backroom deal to let someone get some information, but I feel for a more open society that information needs to be truly open. It doesn’t help society as a whole if an architect can get access to the data for their newest plans, it only helps when the data is open for remixing, for citizen analysis, for reuse for other purposes.

A final note: the current situation is downright shitty. There must exist high quality imagery – satellite and aerial photos – that the public has never been given access to because of ‘security’ reasons. And there was nothing about that statement you could really argue with. But now Google, Microsoft, Yahoo! and others have made incredibly high quality imagery available to anyone with an internet connection. So it’s obvious that releasing the government’s similar data is not going to increase the security risk in the slightest, since some terrorist can already get access to the same data. So the only ones being hurt are those who are not interested in only getting their maps through big commercial providers, who don’t want to be advertised to, who want to build new interesting applications on their own.

But yes, in conclusion, I feel it’s the responsibility of citizens to demand and exercise the freedoms stated as given by the government. I’m not for a rampant litigation society, and don’t jump at the chance to sue someone. But in this case the courts are the branch of government which holds the power in check. Indeed lately the courts have done the best work of the three branches of government to protect freedom, currently laying the smack down on the Bush administration to curtail the broad powers they’ve been taking for themselves (see NYTimes (though probably will expire soon)). But for the courts to work, cases must be raised to engage them if the normal processes fail. They cannot try and judge what does not come before them. If everyone just accepts things the way they are, then little will change. And if we are to cease to care, if apathy overtakes, then the future does not look bright – we will be dominated by the institutions that we made, instead of we the people leading the way to a better tomorrow.

Against Catalogs

So there are some open standards I like a lot, such as WMS and WFS, getting a map or raw geographic data from a server on the web. But there are some that I’m less of a fan of.

The Catalog (CS-W) specification is one. Past the fact it feels too heavyweight, I really have a hard time figuring out the ultimate vision… Everyone’s supposed to set up a catalog that can respond to queries about what they have? It analyzes the metadata and returns the best dataset. So like the city of New York is supposed to set up a catalog? But then probably the state should as well? And then there will be a national catalog? And a global one? Am I supposed to register on all of them if I have an NYC related dataset? I mean, is that the logical conclusion? I get the logical conclusion of WMS and WFS – everyone connects their spatial data to the web, and any client can ask them for a map or for the raw data. But this catalog thing is how I’m supposed to find them?

This is one of the cases where the library metaphor is hopelessly entrenched. The problem is you don’t know where to look in the first place. Having lots of catalogs begs for some sort of meta catalog, or else all the catalogs need to talk to each other in some way. Instead I think we should look to the web and search engines. They routinely _crawl_ the web and have complex algorithms to figure out what data is most relevant for a given search. As geospatial data gets on the web we can make use of much of the same web crawling technology to discover WMS and WFS services. Refractions is doing this with the help of Google in their OGC Survey, and MapDex is doing something similar (though it doesn’t seem to work all that well).

Unfortunately it’s definitely a chicken and egg problem. Organizations don’t share their spatial information since there’s no demand and no one will find it. And search engines won’t be built to organize information that’s just not there.

But I do believe that as valuable geospatial information becomes available, there will be an opportunity to search the spatial data, which the market will eventually fill. So how do we bootstrap out of this? I have two ideas, one which can be done now, the other which would take an organization with a lot of resources, or some creativity to pull together lots of idle resources.

The first is to just make a wiki-ish directory of useful services. Unfortunately information is available in a number of formats. The older stuff is usually ArcIMS, as they won the first round of spatial web services, as they win most every GIS related thing. Since then WMS has caught on quite a bit, as an open standard to accomplish the same thing. But more recently KML, the Google Earth format, has made a big splash. Ideally the infrastructure would also at least list shapefiles. And even better there would be a browser based viewer of the spatial information, so that search results could be combined and overlaid on one another. My preference would be WMS; the others could have reflector scripts that make them accessible as WMS. This directory could be organized like the open directory project – there are several examples of similar things being done with Google Maps mash-ups and Google Earth. But it’d be nice to have them in one place, and ideally even able to be overlaid on one another. And hopefully if the information was all listed in one place, it might motivate people to start to standardize on one format or another.

But regardless of format, a place where people know they can find links to useful information would be great to have. It would have to be a neutral location, so that no one feels someone else is gaining an advantage of some sort. And it’d be great if anyone could add new links, and comment on the usefulness of the data. They should be able to add ‘metadata’ that may not be in the realm of traditional metadata, but which is useful to anyone else who may be investigating the dataset. Like what current users of the data are using it for, other datasets that might be similar, etc. One great location for this could be the OSGeo geodata committee. Which I just decided to join, since it’s really one of the most interesting things going in OSGeo. I’ll likely transition much of my effort there as Incubation settles.

The second is more complex, and bleeds in to other areas, so I’ll put it in its own post at a later date. But the point for me is that catalogs aren’t the answer. I can accept them if they’re just organizing one institution’s documents, but there’s no bigger vision than that, except for rumblings about having them link up in some way. Which strikes me as silly, one should just let a web service crawl the metadata documents themselves, which point to the services, instead of forcing it to figure out catalog protocols to get at information. Or just skip straight to the services and open the door for user generated metadata. We should make use of all the advances in general web search, and then just add the geospatial component, instead of reverting to an older way of searching that never really worked.
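As a rough illustration of what ‘skipping straight to the services’ could mean for a crawler, here is a small Python sketch that pulls layer names and titles out of a WMS capabilities document. The function names are hypothetical, and the sample document is made up for the example (a trimmed-down WMS 1.1.1 style response, not output from any real server). It ignores XML namespaces so the same loop copes with both namespaced and un-namespaced capabilities documents.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip an '{namespace}' prefix from an ElementTree tag, if present."""
    return tag.rsplit("}", 1)[-1]


def list_layers(capabilities_xml):
    """Return [(name, title), ...] for every named Layer in the document."""
    root = ET.fromstring(capabilities_xml)
    layers = []
    for element in root.iter():
        if local_name(element.tag) != "Layer":
            continue
        name = title = None
        for child in element:  # direct children only
            if local_name(child.tag) == "Name":
                name = child.text
            elif local_name(child.tag) == "Title":
                title = child.text
        if name:  # grouping layers without a Name are not requestable
            layers.append((name, title))
    return layers


# Made-up sample, loosely shaped like a WMS 1.1.1 capabilities response.
SAMPLE = """<WMT_MS_Capabilities version="1.1.1">
  <Capability>
    <Layer>
      <Title>Example server</Title>
      <Layer><Name>nyc:roads</Name><Title>NYC roads</Title></Layer>
      <Layer><Name>nyc:parks</Name><Title>NYC parks</Title></Layer>
    </Layer>
  </Capability>
</WMT_MS_Capabilities>"""
```

A crawler that found a link to a WMS endpoint could fetch its capabilities document, run something like list_layers over it, and index the names and titles like any other web page text, with user comments layered on top.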

I recently talked to Peter Vretanos, the editor of the WFS spec, and it turns out he’s been thinking along the same lines: he wrote a thesis that examines the potential of Google plus WMS and WFS as a complete global SDI. I think the one thing he said needed adding was bounding box queries for search engines, so you could spatially constrain your queries. Which again, is just adding the spatial component to what’s already out there. Metacarta could do some amazing stuff with this, returning not just explicitly spatial results, but also implicitly spatial results, if their GeoParser took a spatial constraint, so I could search for bookstores in New York. I guess Google Local does this, but it doesn’t seem to be all that good. And you can’t seem to plug in to the API anywhere, though they must do it internally somewhere.
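A bounding box constraint on top of ordinary keyword search is a small addition, as this Python sketch tries to show. It is a toy under stated assumptions: the index is a plain list of dicts with made-up entries, boxes are (minx, miny, maxx, maxy) in degrees, and the function names are hypothetical rather than any search engine’s API.

```python
def bbox_intersects(a, b):
    """True if two (minx, miny, maxx, maxy) boxes overlap or touch."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def spatial_search(index, keyword, bbox):
    """Keyword match first, then keep only results whose box meets bbox."""
    keyword = keyword.lower()
    return [
        record["title"]
        for record in index
        if keyword in record["text"].lower()
        and bbox_intersects(record["bbox"], bbox)
    ]


# Made-up index entries for illustration only.
INDEX = [
    {"title": "Village Books", "text": "Independent bookstore",
     "bbox": (-74.01, 40.72, -73.99, 40.74)},
    {"title": "Bay Books", "text": "Used bookstore and cafe",
     "bbox": (-122.45, 37.75, -122.40, 37.80)},
    {"title": "Hudson Bakery", "text": "Bread and pastries",
     "bbox": (-74.02, 40.70, -74.00, 40.72)},
]

# Rough box around New York City (assumed coordinates).
NEW_YORK = (-74.3, 40.4, -73.6, 41.0)
```

Here spatial_search(INDEX, "bookstore", NEW_YORK) keeps the bookstore whose box falls inside the New York box and drops both the bookstore elsewhere and the non-matching bakery. A real engine would use a spatial index rather than a linear scan, but the query shape stays the same: the usual text query plus one box.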