Into The Pudding

thoughts on geospatial, augmenting capitalism, architectures of participation, and more

Archive for the 'geospatial' Category


Collaborative Mapping: Tools

Posted by cholmes on July 11, 2007

Continuing the collaborative mapping thread, I’d like to think a bit about the tools to make this happen: do a bit of dreaming, and maybe think through how we can get there. As soon as I start to talk about this, people want to do all kinds of crazy synchronization and distributed editing of features. I do think we’ll get there, but I fear going for too much too soon, getting loaded down by over-designing and not addressing the immediate problems. Indeed, Open Street Map has proven that if the energy is there, the tools just need to do the very basics. I have been putting my energy into getting a standards-based implementation on top of WFS-T, but that’s more because I know it and I like standards. I don’t think it’s the best way to do things, and I don’t even think it should be the default way to do things – at this point I’d prefer something more RESTful. But I believe in being compatible with as much as possible, and there are already nice clients written against WFS-T. So it should always be a route into collaborative editing.
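
For readers who haven’t seen WFS-T, a transaction is just an XML document POSTed to the server. Below is a minimal sketch, using only Python’s standard library, that builds a WFS 1.0.0 Insert for a single line feature. The roads feature type, the the_geom and name properties, and the use of GeoServer’s sample topp namespace are assumptions for illustration; a real server’s feature types and accepted options will differ.

```python
import xml.etree.ElementTree as ET

WFS_NS = "http://www.opengis.net/wfs"
GML_NS = "http://www.opengis.net/gml"
# Assumption: the target feature type lives in GeoServer's sample 'topp' namespace.
TOPP_NS = "http://www.openplans.org/topp"

# Register prefixes so the serialized XML reads wfs:, gml: and topp:.
ET.register_namespace("wfs", WFS_NS)
ET.register_namespace("gml", GML_NS)
ET.register_namespace("topp", TOPP_NS)


def insert_line_feature(type_name, geom_prop, coords, attrs):
    """Build a WFS 1.0.0 Transaction that inserts one LineString feature.

    coords: list of (x, y) pairs, e.g. (lon, lat) in EPSG:4326.
    attrs:  extra non-geometry properties, e.g. {"name": "Main St"}.
    """
    tx = ET.Element(f"{{{WFS_NS}}}Transaction", service="WFS", version="1.0.0")
    insert = ET.SubElement(tx, f"{{{WFS_NS}}}Insert")
    feature = ET.SubElement(insert, f"{{{TOPP_NS}}}{type_name}")
    geom = ET.SubElement(feature, f"{{{TOPP_NS}}}{geom_prop}")
    line = ET.SubElement(geom, f"{{{GML_NS}}}LineString", srsName="EPSG:4326")
    # GML 2 coordinates: "x,y" tuples separated by single spaces.
    ET.SubElement(line, f"{{{GML_NS}}}coordinates").text = " ".join(
        f"{x},{y}" for x, y in coords
    )
    for key, value in attrs.items():
        ET.SubElement(feature, f"{{{TOPP_NS}}}{key}").text = str(value)
    return ET.tostring(tx, encoding="unicode")


if __name__ == "__main__":
    print(insert_line_feature("roads", "the_geom",
                              [(-73.99, 40.75), (-73.98, 40.76)],
                              {"name": "Main St"}))
```

The resulting string would be POSTed to the server’s WFS endpoint, which answers with a transaction response saying whether the insert succeeded.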

First off, I think we need more user-friendly options for collaborative editing. Not just putting some points on a map, but being able to get a sense of the history of the map, with logs of changes and diffs of specific actions. Editing should be a breeze, and there should be a number of tools that enable this. Google’s MyMaps starts to get at the ease of editing, but I want it collaborative, able to track the history of edits and give you a visual diff of what’s changed. Rollbacks should also be a breeze – if you have really easy tools to edit, it’s also going to be easier for people to vandalize. So you need to make tools that make rolling back even easier. On the GeoServer extended WFS-T Versioning API we’ve got a rollback operation that can work against an area of the map, a certain property, or a certain user (or combinations of those). Soon we hope to be working on some tools built on top of OpenLayers to handle those operations in a nice editing environment.
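
To illustrate the idea of a filtered rollback, here is a minimal sketch assuming a hypothetical in-memory change log, not GeoServer’s actual Versioning WFS API. Each change records who made it, which feature and property it touched, the old and new values, and a representative location; rollback reverts the changes that match an optional area, property and user filter, newest first.

```python
from dataclasses import dataclass


@dataclass
class Change:
    """One edit in a hypothetical change log (not GeoServer's schema)."""
    revision: int
    user: str
    feature_id: str
    prop: str
    old: object
    new: object
    x: float  # representative location of the edited feature
    y: float


def matches(change, bbox=None, prop=None, user=None):
    """True if the change satisfies every filter that is set (None = ignore)."""
    if bbox is not None:
        minx, miny, maxx, maxy = bbox
        if not (minx <= change.x <= maxx and miny <= change.y <= maxy):
            return False
    if prop is not None and change.prop != prop:
        return False
    if user is not None and change.user != user:
        return False
    return True


def rollback(features, log, bbox=None, prop=None, user=None):
    """Undo matching changes newest-first by restoring their old values.

    Returns the list of revisions that were undone.
    """
    undone = []
    for change in sorted(log, key=lambda c: c.revision, reverse=True):
        if matches(change, bbox, prop, user):
            features[change.feature_id][change.prop] = change.old
            undone.append(change.revision)
    return undone


# Example: a vandal renames a road; roll back only that user's edits.
features = {"road1": {"name": "Chris Rulez"}}
log = [
    Change(1, "alice", "road1", "name", None, "Main St", 1.0, 1.0),
    Change(2, "vandal", "road1", "name", "Main St", "Chris Rulez", 1.0, 1.0),
]
print(rollback(features, log, user="vandal"), features)
```

Blindly restoring old values like this can clobber later edits by other users to the same property, so a real versioning system would need to detect and report such conflicts rather than silently overwrite them.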

The next step on user-friendly options will be desktop applications that aren’t full GIS, but that let users easily edit. These can leverage the tools of existing open source GIS desktop environments, like uDig and QGIS, but can strip down the interface to just be simple editing environments with a few hard-coded background layers. You could have branded environments for specific layers of information. And ideally we could build other kinds of reporting tools that also leverage the same GIS tools, but in an interface geared towards the task at hand, like search and rescue or tracking birds. The other thing I hope to work on is getting some of the editing hooked up with Google Earth. I just learned there’s a COM API that might allow us to hack something in, or we can try to get Google Earth to support POSTing of KML to arbitrary URLs, as Sean suggests.

Next I’d like to see integration with ‘power tools’, the full-on, expensive-ass GIS applications that are the realm of ‘professionals’. Not that I have a huge love for those tools, but I’d really like to engage as many people as possible in collaborative mapping. GIS professionals are a great target audience, since most of them are already passionate about mapping, and they have a lot of expertise to bring to the table. And while some of them can be elitist about collaborative mapping and ‘lesser’ tools, so too can many of the amateurs turn up their noses at people who aren’t DIY. At the extremes it can obviously be a major divide, but I think both could have a lot to teach each other if they’re willing to listen. I believe the first step to get there is to make the ‘power tools’ compatible with the collaborative mapping protocols, so you start them off in collaboration. This is one reason I’m an advocate of the WFS-T approach, as there are plugins for ArcGIS and other heavy desktop GISs. I think we could see some professionals get really excited about collaborative mapping, as it could become the thing they are passionate about and do in their free time, something that is fun and helps boost their resume. This is how many open source contributions work now; it’s a complex interplay that includes professional development. Perhaps one’s collaborative mapping contributions could help land jobs in the future.

I’d also like to see more automation available in the process. This is an area that could use a lot of experimentation: how much to automate, how much to let humans collaborate on. But I think there’s an untapped opportunity in figuring out vector geometries from the aggregated tracks of GPS, cell phones and wifi positioning data. People are generating tons of data every single day, and most of it is not even recorded. It’s great when people take a GPS, decide explicitly to map an area, and then go online and digitize it. But we could potentially get even more accurate than one person’s GPS by aggregating all the data over a road. Good algorithms could extract the vector information, including one-way and turn restriction data, since they could figure out that 99% of fast-moving tracks are going in the same direction. Of course we’ll still need people to add the valuable attribute information, but this way they’d have a nice geometry already in place.
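
As a toy illustration of the direction-inference idea, the sketch below assumes the hard part (matching raw tracks to a road segment) has already been done, so each traversal is reduced to a direction and a speed. It ignores slow traversals, which may be pedestrians, and calls the segment one-way when the fast ones overwhelmingly go the same way. The 99% threshold echoes the text; the speed cutoff and function name are made up for illustration, and averaging the geometry itself is not shown.

```python
def infer_one_way(traversals, min_speed_kmh=20.0, threshold=0.99):
    """Guess whether a road segment is one-way from aggregated traversals.

    traversals: list of (direction, speed_kmh) where direction is +1 for
    travel from the segment's start node to its end node and -1 for the
    reverse. Traversals slower than min_speed_kmh are ignored.

    Returns +1 or -1 for an inferred one-way direction, 0 for two-way,
    or None if there is no fast-moving data at all.
    """
    fast = [d for d, speed in traversals if speed >= min_speed_kmh]
    if not fast:
        return None
    forward_share = fast.count(1) / len(fast)
    if forward_share >= threshold:
        return 1
    if forward_share <= 1 - threshold:
        return -1
    return 0
```

A segment flagged this way would still be worth a human check; one wrong-way driver should not be enough to turn a one-way street into a two-way one, which is why the threshold is a share rather than a strict "all tracks".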

You could also do feature extraction from satellite and aerial imagery. This is obviously a tough problem that many people are working on, but perhaps it could also be improved by leveraging human collaboration. In a system with good feedback, people could perhaps help train the feature extraction to improve over time. It also could be valuable to do automated change detection, which notifies people that something’s changed in an area so they can figure out the proper action.
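
The notification side of change detection can be sketched with naive image differencing. This assumes two co-registered grayscale tiles of the same area, given as equal-sized lists of 0–255 values; it flags pixels whose brightness changed by more than a threshold and asks for human review when enough of the tile changed. Real change detection has to cope with misregistration, seasons and lighting, and both thresholds here are arbitrary.

```python
def changed_pixels(before, after, threshold=40):
    """Return (row, col) for pixels whose brightness changed by more than threshold.

    before/after: co-registered grayscale tiles as equal-sized lists of rows (0-255).
    """
    if len(before) != len(after) or any(len(b) != len(a) for b, a in zip(before, after)):
        raise ValueError("tiles must have the same shape")
    hits = []
    for r, (row_b, row_a) in enumerate(zip(before, after)):
        for c, (vb, va) in enumerate(zip(row_b, row_a)):
            if abs(va - vb) > threshold:
                hits.append((r, c))
    return hits


def needs_review(before, after, threshold=40, min_fraction=0.05):
    """Flag a tile for human review if more than min_fraction of its pixels changed."""
    total = sum(len(row) for row in before)
    if total == 0:
        return False
    return len(changed_pixels(before, after, threshold)) / total > min_fraction
```

The point of the split is that the machine only narrows down where to look; deciding whether a bright patch is a new road or a parked truck stays with the people watching the area.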

The final area I think we could improve with automation is prevention of vandalism and silly mistakes. Refractions did work on GeoServer a few years ago to build an automatic validation engine. Unfortunately this has languished with no documentation, but it’s still part of GeoServer. One can define arbitrary rules to automatically reject bad transactions: geometries that intersect badly, roads without names, etc. This could also reject things like ‘Chris Rulez’ scrawled over the whole of the US, as it could know that no real roads run in completely straight lines for over 200 miles. I could imagine a whole nice chain of rules to ensure that all edits meet certain quality criteria. And perhaps, instead of being rejected outright, any edit that doesn’t follow all the rules could go into a sandbox. I could also imagine some sort of continuous integration system, once there is topology, to check network validity and handle other quality assurance pieces that can’t take place instantly.
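
A rule chain with a sandbox might look roughly like the sketch below. It is not the API of GeoServer’s validation engine; every name is hypothetical, and features are assumed to be plain dicts with a kind, an optional name and a list of (lon, lat) vertices. The two rules mirror the examples above: roads need names, and no single straight segment may run longer than 200 miles (measured as great-circle distance). Edits that break any rule are routed to a sandbox instead of being rejected.

```python
import math

EARTH_RADIUS_MILES = 3958.8


def haversine_miles(p, q):
    """Great-circle distance in miles between two (lon, lat) points in degrees."""
    lon1, lat1, lon2, lat2 = map(math.radians, (*p, *q))
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


def road_has_name(feature):
    """Rule: roads must carry a name."""
    if feature.get("kind") == "road" and not feature.get("name"):
        return "road without a name"
    return None


def no_long_straight_segment(feature, max_miles=200):
    """Rule: no single straight segment may be longer than max_miles."""
    pts = feature["coords"]
    for p, q in zip(pts, pts[1:]):
        if haversine_miles(p, q) > max_miles:
            return f"straight segment longer than {max_miles} miles"
    return None


RULES = [road_has_name, no_long_straight_segment]


def triage(feature, rules=RULES):
    """Run every rule; accept clean edits, send the rest to a sandbox."""
    problems = []
    for rule in rules:
        message = rule(feature)
        if message is not None:
            problems.append(message)
    return ("sandbox", problems) if problems else ("accept", [])
```

A vandal could slip past the segment rule by splitting a long straight line into many short vertices, so a sturdier version would also measure how straight a long run of vertices is, not just the length of each segment.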

Ok, I’ll wrap this post up for now, will continue this thread soon.

Posted in architectures of participation, geospatial, geospatial web | No Comments »

Collaborative Mapping: The Business Thread, cont.

Posted by cholmes on July 5, 2007

So if there is a future where collaborative mapping could be economically competitive, how do we go about actually getting there? I actually think we’re further along than many might think, though there is still a lot of work to be done innovating with the tools, communities and workflows to make this happen. But I’ll address that in another post; for now I just want to present a possible path for collaborative mapping to bootstrap into the mainstream. I’m going to focus on street maps, since that’s the information that people pay big money for, and there is already early success with Open Street Map. Later I’ll examine how the lessons learned there can feed into other domains and back.

So step 0 is proving that it’s possible for a diverse group of people to collaborate on an openly licensed map.  I’d be hard pressed to entertain any arguments that Open Street Map has not already accomplished this.  Of course in its current state you can’t navigate a car on it, you’re not going to do emergency vehicle response with it.  But their driving principle has been they ‘just want a fscking map’, and a map they do have.  There are many contributors running around with GPS’s and creating a map.

The next point in the evolution is when the map is good enough for basic ‘context’.  Again, OSM is already there for several parts of the world.  If you’re doing a mashup of your favorite neighborhoods you don’t really care if all the streets are there.  You just need enough that it looks about like your neighborhood on other maps.  Many mashups use google maps and others in this way – which is sorta like using the same quality water to flush your toilet as comes out of your kitchen sink (USA!).  Which is to say a bit of a waste, but who really cares if someone else is paying for it.

Which speaks to another tipping point, which is when the big portals start putting ads on their maps, or when they start charging to use their APIs. I concede now that this may never happen, that it’s a good loss leader to have people using your API for free as long as they put their maps out in public. But a part of me feels like we may be in a period of the GeoWeb like the first web bubble, when you could get $10-off coupons from CDNow and B+N, allowing you to buy any CD you wanted for a few bucks. It wasn’t going to last, but it sure was fun while it did. At some point there may be a shift when they need to make some money, which could drive more energy to collaborative maps as people look to get the ads off their services.

The next step starts to get fun, which would be once a collaborative street map gets good enough for basic routing and navigation. Right now it seems (though I could be wrong, I don’t know the OSM community intimately) that people who set out to add data to the map want to get their own area mapped. If they go to new areas they’ll bring a GPS along, but it’s often to a totally unmapped area. I think once large areas start to get close to completion we’ll have people cobble together ghetto car navigation kits: a laptop with a GPS and the collaborative map, either connected over some kind of wireless internet or downloaded to the car. One can drive around with this and it will show one’s place on the map, and directions to the end point as well. Note that this kind of usage is currently prohibited by the terms of use of Google Maps and any of the others who get their data from commercial providers. From the API agreement: ‘In addition, the Service may not be used: (a) for or with real time route guidance (including without limitation, turn-by-turn route guidance and other routing that is enabled through the use of a sensor’. This is because the commercial mapping providers make big money off of car navigation, and license the (exact same) data for that at a higher price.
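
Under the hood, basic routing over a collaborative street map comes down to a shortest-path search. Here is a minimal Dijkstra sketch over a tiny made-up street graph, where a one-way street is simply an edge in one direction only. The node names, lengths and helper names are invented for illustration; real navigation engines layer turn restrictions, speed-based costs and much more on top.

```python
import heapq


def add_street(edges, a, b, length, one_way=False):
    """Add a street between nodes a and b; one-way streets only allow a -> b."""
    edges.setdefault(a, []).append((b, length))
    if not one_way:
        edges.setdefault(b, []).append((a, length))


def shortest_path(edges, start, goal):
    """Dijkstra's algorithm. Returns (cost, path), or (inf, []) if unreachable."""
    best = {start: 0.0}
    prev = {}
    heap = [(0.0, start)]
    while heap:
        cost, node = heapq.heappop(heap)
        if node == goal:
            path = [node]
            while node in prev:
                node = prev[node]
                path.append(node)
            return cost, path[::-1]
        if cost > best.get(node, float("inf")):
            continue  # stale heap entry; a cheaper route was already found
        for nxt, length in edges.get(node, []):
            new_cost = cost + length
            if new_cost < best.get(nxt, float("inf")):
                best[nxt] = new_cost
                prev[nxt] = node
                heapq.heappush(heap, (new_cost, nxt))
    return float("inf"), []


streets = {}
add_street(streets, "A", "B", 1)
add_street(streets, "B", "C", 1, one_way=True)
add_street(streets, "A", "C", 5)
print(shortest_path(streets, "A", "C"), shortest_path(streets, "C", "A"))
```

Going from A to C takes the short way through B, while coming back has to use the longer direct street because B to C is one-way; this is exactly the kind of data that makes a map usable for navigation rather than just context.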

With basic navigation on a collaborative map in place, you can get people excited about going off into a ‘new frontier’, going off the map and tapping into their inner Lewis and Clark. Actively encourage people to dérive (though I’m not sure how much the Situationists would really like the idea of people using cars to dérive) into uncharted areas of the map.

On other fronts, I believe that we’ll see niche areas getting high quality mapping. Governments and companies will realize that if there’s a map that’s 80% done, they just need to fund the last 20%; and if owning the map is not their key value proposition, they’ll look to fund the collaborative map instead of doing it themselves. Those that can think long term will realize that this will almost always be cheaper, since they won’t have to keep paying to keep it up to date. With a good collaborative structure much of that will happen on its own, and they may put a bit extra in each year. In areas where a few different organizations all partner up, it will definitely be cheaper. Already we’re seeing some enlightened folks fund Open Street Map contributors to hold a mapping party and map an area.

We’ll also likely see collaborative maps for niche verticals. If you’re doing walking maps, for example, you don’t need the turn restriction information needed for car routing. Someone may offer a map of the best drives in Southern California, which would be a subset of the main map. Or a detailed map of which roads need to be plowed after a snowstorm, leaving out the roads that don’t.

After that I think you’ll see people hacking commercial nav systems to make use of the collaborative map, and then navigation companies offering low price versions of their systems that don’t rely on the commercial data.  Already we’re seeing navigation companies start to ‘leverage user contributions’, with TomTom’s ‘MapShare‘ to let people update points of interest and the like, and Dash Navigation’s ability to leverage GPS from other cars to see if a new road has opened up.  I think you may see people even more excited about this if they knew their work was going to a common good instead of just to the advantage of one company.

Once people are able to ‘correct’ the map that they’re driving on, I believe we’ll see a really big tipping point. Build in some voice recognition so drivers can call out the name of a street while they’re driving. This could be billed as the ‘mapping game’, where one gets points for driving new areas. One could even imagine a company that sets up a business with a sort of ‘bounty navigation’, where you can actually make money if you drive new areas of the map and do good reporting of road names and the like. This could be one of the decoupled functions of the economics around collaborative map making: the navigation company partners with the company that guarantees the map is up to date, and instead of contracting another company to drive the roads they just put money rewards on driving in new areas. People could make it so their navigation is free, or even have it be like the electrical grid, where if you generate a lot of extra navigation information they pay you. I haven’t thought through all the details of this, but I think it could work, and it would be super cool for helping people think of geospatial data as a commons that we can all contribute to, that we’re all responsible for and can be a part of, not just consumers of a service.
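
The ‘mapping game’ can be made concrete with a toy scoring rule: divide the world into a grid, and award points for every cell a driver’s GPS trace covers that the map doesn’t cover yet. The grid size, point values and function names are all hypothetical, and a real bounty scheme would also have to check that the reported road data is actually any good.

```python
import math


def cell_of(lon, lat, cell_deg=0.01):
    """Snap a (lon, lat) position to the index of a grid cell cell_deg degrees wide."""
    return (math.floor(lon / cell_deg), math.floor(lat / cell_deg))


def score_trace(trace, mapped_cells, cell_deg=0.01, points_per_cell=10):
    """Award points for grid cells a GPS trace visits that aren't mapped yet.

    trace: list of (lon, lat) fixes; mapped_cells: set of already-covered cells.
    Returns (points, newly_covered_cells).
    """
    visited = {cell_of(lon, lat, cell_deg) for lon, lat in trace}
    new_cells = visited - mapped_cells
    return len(new_cells) * points_per_cell, new_cells
```

Scoring by newly covered cells rather than by distance driven rewards exploring the blank parts of the map instead of circling the same well-mapped block.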

Which speaks a bit to a further point, which is when governments realize that they can tap into and contribute to this as well. The Census Bureau spends a ton of money keeping its road information up to date. But their data is not entirely accurate, and it doesn’t include any turn restrictions. Instead of maintaining their own database they could combine with an open map, and plug into that workflow. Indeed, in the US such a map likely would have started from their TIGER/Line maps anyway. So government organizations can join the ecosystem, likely just as funders contracting out other companies to perform the work, as they are starting to do more and more with open source software. Some may want to try to do it themselves, but the smart ones will plug into existing ecosystems.

The other tipping point towards the end will be when the big mapping providers decide to invest in collaborative maps. I had initially been thinking that things would need to be really far along worldwide before they’d make the switch, but a more likely scenario might be that they use it in conjunction with their commercial maps. They already make use of Tele Atlas and NAVTEQ in different places. So as long as the collaborative map didn’t have a restriction on combining with other sources, they could just use it in places that have poor coverage from the major providers. And they could see where areas of the map are close to being done and strategically fund those. Another potential source of investment in this kind of mapping could be aid agencies in areas that commercial providers haven’t mapped. They could hook up their GPS’s to gather information, and then employ a few people to help process and QA it to make maps they can use. Since it’s not a core value proposition to them, they can share it with others, and start to build really good street maps in areas that no one has touched because the effort isn’t worth the money the providers would get. I would love to try a start-up in Africa that hooks up the correcting car navigation systems to a bunch of vehicles and just starts building the living map. It’d be quite ironic if Africa ended up with more up-to-date maps than Europe.

The key with all this for me is the evolution of viewing mapping data as a public good that we all collaborate on to make better. As GPS’s become more and more prevalent, we are all just emitting maps as we go through our lives. All that’s really needed is a structure to turn that into useful information: getting the tools better and setting up the economic reward structure. I’m not a business person, so I don’t have much more to throw out in terms of economic ideas. But I believe it is possible to set the levers right to encourage this. And I’m going to do my best to get the tools better and better, to show what is possible and get us all moving towards a future where an up-to-date, accurate map is a commons available to all, and that all are a part of.

Posted in architectures of participation, geospatial, geospatial web | 8 Comments »

Google and the Geospatial Web: A smaller piece of a much, much larger pie.

Posted by cholmes on June 18, 2007

Well, I’ve been back from Where 2.0 for a while, and as usual blogging hasn’t been the highest priority, but there was one topic that I’ve been really wanting to write about.

And that is that Google seems to be legitimately moving in a more open direction with regards to geospatial. I’ve rarely been overtly critical of their lack of openness, but it’s always been a source of frustration for me. And as I got to know more people there, I definitely realized that their lack of collaboration wasn’t the result of any malicious intent; it was simply a perceived lack of resources - they felt they didn’t have time to put effort into standards and working with others. And I’ll be the first to admit that it takes more work to be open and collaborative.

But I do say ‘perceived’, because the thing I’ve found again and again doing the ‘open’ thing is that it’s an investment that pays off in the medium and long term. Working alone definitely allows you to move faster in the short term, but working with others leaves you much better off in the longer term.

With regards to Google’s geo portfolio, the way I’ve always termed it is they could have a huge piece of a small pie or a sizeable piece of a much, much bigger pie. What pie is it we’re talking about? What I’ve referred to as the Geospatial Web, though I’m trying to call it the geoweb, since that term seems to be taking off more. They are obviously the clear leader, with Google Earth and Maps, specifically KML and mashups - as those both allow more geospatial data to get out in the world. And they could just push KML and their platform and do quite well. But it would be a silo. It wouldn’t be like the web, it’d be a greatly expanded and easier to use Geography Network. Much, much better and bigger, but still a single platform. It could potentially even become a platform like Windows, truly dominant, but the point for me is it still wouldn’t be as big as it could be. It wouldn’t be the World Wide Web, where innovation comes from all over building something far bigger than any single company could possibly make on their own.

The bigger pie is the vision of a true Geospatial Web, that diverse individuals and organizations all contribute to, and where technical innovations come from all over. To achieve this there must be an underpinning of open standards that others can contribute to. There must be an ecosystem of companies and services, business models and startups and dot-orgs. The ecosystem can be dominated by an entity, but can’t be entirely dependent on a single entity, as would be the case if Google defines the software and the format and the search engine. But if this open geoweb is nurtured and encouraged the right way, we’ll get exponential growth. Citizens will start demanding that governments and organizations put their data on it, just like we’ve seen happen with eGovernment on the WWW. It will become a default, and people will look at you weird if you have geospatial data that’s not on it.

I think it’s not crazy to aim for the majority of all spatial information to be available on it. It will be a much bigger pie than one that Google owns, as more and more people will feel comfortable making their data available, since it’s a public resource instead of clearly benefiting a single company. And it also allows further innovations to come from the outside. Google has a ton of smart people, but they don’t have all the smart people in the world. They can afford to let innovation come from elsewhere (though I’m sure they’ll probably just buy up the best ones), because they’ll get to do what the company does best: search. There’s no reason to own a geoweb when you can own the way most people find information on the geoweb.

Of course, even with search Google could constrain it to their web, as they did when geo search came out - it was called KML Search and only could find KML. What they are going for now is much more ambitious, and indeed a bit more risky. And so I applaud them for it - they are putting a stake in the ground that says ‘our best ideas are not behind us’. They are going to be a leading force in a much bigger pie, and turn this open collaboration in to a really good long term investment.

Ok, I’ve gone on speculating about things, I probably should give a bit of evidence. I admit that it’s pretty subtle, but based on it and a few conversations my gut tells me that they are legitimately on the level. At least for now, that’s not to say that some corporate decision could move things in an opposite direction: such is the fate of a publicly traded company. But they seem to be trying to do some work that will be hard to undo.

First, KML Search is now referred to as ‘geo search’, and is crawling not just KML but also GeoRSS, with more formats likely coming soon. This is one of the most important pieces to me, and was the announcement that excited me much more than StreetView. It is admitting that it’s ok for people to use other formats, even though KML is super nice and easy to use. Yes, more formats may confuse my grandmother (one of the eloquent arguments used by Google folks in the past for why we should all just use KML), but more formats also means extending an olive branch that says you can work with others.
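
To make the ‘more formats’ point concrete: GeoRSS Simple adds a location to an ordinary feed entry with a georss:point element whose text is latitude then longitude, separated by a space. A crawler-side sketch using only Python’s standard library might pull those points out of an Atom feed as below; the sample feed and function name are made up, and this says nothing about how Google’s crawler actually works.

```python
import xml.etree.ElementTree as ET

NS = {
    "atom": "http://www.w3.org/2005/Atom",
    "georss": "http://www.georss.org/georss",
}


def georss_points(feed_xml):
    """Return (title, lat, lon) for each Atom entry that carries a georss:point."""
    root = ET.fromstring(feed_xml)
    found = []
    for entry in root.findall("atom:entry", NS):
        point = entry.find("georss:point", NS)
        if point is None or not point.text:
            continue  # entry has no location; skip it
        lat, lon = (float(v) for v in point.text.split())
        title = entry.findtext("atom:title", default="", namespaces=NS)
        found.append((title, lat, lon))
    return found


SAMPLE_FEED = """<feed xmlns="http://www.w3.org/2005/Atom"
      xmlns:georss="http://www.georss.org/georss">
  <title>Example feed</title>
  <entry>
    <title>Sample place</title>
    <georss:point>37.78 -122.41</georss:point>
  </entry>
  <entry>
    <title>No location here</title>
  </entry>
</feed>"""

print(georss_points(SAMPLE_FEED))
```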

Second, Google is an active sponsor of OGC’s OWS-5. I had been a bit skeptical of their throwing KML over the fence to OGC. Yes, it’s nice the copyright is with OGC, but it’s kind of meaningless to me unless KML actually aligns with the other open standards. And OGC would likely try to do that, but then it remains a question whether Google would actually support the new standard, or whether they’d have covert control over it, with the ability to exclude any decisions they didn’t like by just not including an implementation in Google Earth and Maps. But they are sponsoring OWS-5, which will fund several server and client implementations to flesh out a new KML spec that incorporates other OGC standards. The OWS testbeds are the best way to develop specs in the OGC, and putting real money up for this definitely indicates to me a commitment to making KML a true open standard, not just a rubber-stamped pseudo-standard. The one piece that I’m not sure about is how much they’ll have engineers working with OWS-5 to try out the new spec ideas on Google Earth and Maps. If they have a couple of people show up at the kickoff meeting who are set to work on it for the next few months, I will be very happy.

Third, John Hanke’s speech at Where 2.0 was the first time I had heard the Google geo team really tell the world that they want to work with others. Some of it was subtle, but there was definitely a flavor of openness and collaboration that I’d not felt before. Previous speeches would always come back to the innovations they’re doing, how great KML is, etc. There was little acknowledgment of an outside world, which could come across as fairly arrogant: that not only are we doing things the best way, we haven’t even looked into how anyone else might do it, since we must be doing things the best.

And finally, in private conversations many googlers have talked about a more open shift in the past 6-9 months. There were always a few voices for that, but it sounds like a tipping point has been reached and there is now a critical mass. The voices are heard and effort is being oriented in that direction. I think it’s an investment that will really pay off for Google, and though I’m going to continue to work to push them in to ever more open directions (maybe even to be able to talk to them about what they’ve got in the pipeline without signing an NDA? Ah, to dream ;), count me as a skeptic who is becoming more and more convinced that we’re going to build a true, open, collaborative geoweb.

Posted in architectures of participation, geospatial, geospatial web | 4 Comments »

Public domain imagery from iCubed for WorldWind and beyond?

Posted by cholmes on May 14, 2007

So I’m watching this video about the new Java WorldWind.

And there’s a couple quotes of interest from Patrick Hogan, NASA’s lead on the project:

That’s access to different NASA datasets that you can leverage, public domain, so you can use and abuse that information as you like, do anything you want with it, but mostly have fun have fun with innovating, kind of going places we haven’t even dreamed of yet.

I should point out that the iLandsat is from a company called iCubed and they have provided that kind of, that dataset for the earth that typically costs about a quarter of a million plus just for internal use, and they have donated it to WorldWind for use by the public.

Public domain imagery from iCubed? Sounds like a dream come true to me. Of course this just opens up lots more questions, like what resolution, what part of the world, what year it’s from, etc. But if it’s truly public domain, that’s really great news for any collaborative mapping projects that are unsure about deriving their information from commercial imagery.

I’m hoping that someone will be able to hack in and figure out if the imagery is really available. But the server referred to in the source code seems to return ‘Server is too busy’ errors, and when I use WorldWind here I’m not getting any tiles. When I get some time I’ll try to dig into the source a bit more and maybe get some links to the imagery.

Looking at the source code does seem to reveal some references to GeoServer, for their placename layer, which we always like to see :) I will encourage them to change the namespace prefix from ‘topp’ (which is the default and refers to the organization I work for), to something more appropriate like ‘nasa’ (though keeping it does make it easier for me to know it’s a GeoServer, which is nice…). And I’m curious about their ASPX cache - if you guys let me know what/how you’re caching I’d be happy to try to build it as a module for GeoServer.

Posted in architectures of participation, geospatial, geospatial web | 4 Comments »

I, for one, welcome our new proprietary SDI overlords…

Posted by cholmes on May 13, 2007

So I’ve been slow on the uptake, as the geo blogosphere’s conversation about Google’s KML Search took place a while ago. Mostly that’s because I’m only about halfway through my meandering series of essays to make several points, one of which is that SDIs are crap, and that once there’s enough geospatial information of real value out there, a company will come along and let us search it. Well, reality has caught up with me as I’ve lost any kind of consistency with blog posts.

Allan, for one, asks the question of whether Google’s KML search is a Spatial Data Infrastructure or not. For me the whole SDI question is irrelevant, as SDI builders have been working at it for close to twenty years, dumped millions upon millions of dollars into it and have come up with nothing. I read a paper that pointed out that the Geospatial One Stop (GOS), a top-dollar SDI portal and the center of US SDI development, had an ‘average 5622 user visits per week’ in April 2004. And lest you think this was them bemoaning that so few people were using it, this was an example of ’success’. One of my non-profit’s blogs, focused on New York City traffic problems, gets more traffic than that. The GOS’s current Alexa rank is 491,396. Blogs about Google Earth get far, far more traffic. The paper goes on to point out that SDIs take years, if not decades, before they are ‘fully operational’.

In less than two and a half years Google has built a better ‘SDI’ than anyone else in the world. The whole point of SDIs is to ‘facilitate the availability of and access to spatial data’. In my opinion they accomplished this before they even had search, since you could find more spatial data browsing the Keyhole BBS than you could with a search box on any SDI, and could certainly access it more easily. You can pick just about any definition of SDIs (of which there are many) and Google KML + Maps mashups succeed on all their criteria except for explicit agreements between organizations (which is sort of just overhead, since the Web doesn’t need these kinds of agreements to have lots of information available).

Like everyone else, I have major concerns about putting all our eggs in the basket of a single company, no matter how much they promise to not be evil. A ‘true’ SDI is obviously not one controlled by a corporation (even if a GIS monopolist was trying and never really got anywhere). But I think they’re taking steps by looking to standardize KML within the OGC. I disagree with Raj, though, that the intellectual property matters all that much without the corresponding harmonization with current or at least future OGC standards. And I really hope he’s not hinting that such a harmonization is not on the table and that OGC’s willing to rubber-stamp KML because Google was so kind as to donate it. The CTO of Opera explained the problems of this quite well in a nice article on CNET. He was talking about Microsoft’s attempts to standardize its Office Open XML spec, which is a direct competitor of Open Document Format. His point can be summed up with:

While it’s healthy to have competition between different standards, it’s rarely productive to have competing standards within an organization.

He points to a couple cases where there have been competing standards and both have suffered. I believe it will be to the detriment of all if KML does not align its geometry objects with GML, and eventually separate data from presentation, as one does with SLD.

I’ll try to dig a bit more into my feelings on KML itself in a future post, but the main thing I think they need to do is get a clear separation of concerns. GeoRSS/GML should not be viewed as a competitor, but instead as a more standard way to handle the issue of geometries. KML is great as a package for delivering content to Google Earth, but Google Earth is a special environment, and it’d be nice if pieces of it were re-usable in other environments, and if there were clear profiles on what makes sense to support in a 2D environment, what makes sense if you want to be crawled by KML search, which elements you need for 3D, etc.
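
As a small illustration of why shared geometry conventions matter, KML writes a point as longitude, latitude and optional altitude separated by commas, while GeoRSS Simple writes latitude then longitude separated by a space, so even a single point has to be reordered when moving between them. A minimal sketch, assuming the KML 2.2 namespace and a document containing a plain Point:

```python
import xml.etree.ElementTree as ET

# Assumption: the document uses the KML 2.2 namespace.
KML_NS = "http://www.opengis.net/kml/2.2"


def kml_point_to_georss(kml_xml):
    """Convert the first KML Point ('lon,lat[,alt]') to GeoRSS Simple text ('lat lon')."""
    root = ET.fromstring(kml_xml)
    coords = root.find(f".//{{{KML_NS}}}Point/{{{KML_NS}}}coordinates")
    if coords is None or not coords.text:
        raise ValueError("no KML Point found")
    lon, lat = coords.text.strip().split(",")[:2]  # altitude, if present, is dropped
    return f"{float(lat)} {float(lon)}"


SAMPLE_KML = """<kml xmlns="http://www.opengis.net/kml/2.2">
  <Placemark>
    <name>Sample place</name>
    <Point><coordinates>-122.41,37.78,0</coordinates></Point>
  </Placemark>
</kml>"""

print(kml_point_to_georss(SAMPLE_KML))
```

If KML’s geometry were expressed with GML, as argued above, this kind of per-format translation would largely go away.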

But the main point of ‘KML search’ (could you please add GeoRSS, WMS and WFS and do ‘geospatial search’?) to me is that this raises the bar of what availability and access to geospatial data means. Though SDI builders have been dreaming of it for years, it will now be unacceptable if they provide anything less capable than the GoogSDI. So though I don’t trust even a ‘non-evil’ publicly traded corporation, I think they’re pushing the bar in a very positive way. We just need to be sure we build a truly open ‘geoweb‘ together, with innovations coming from all corners by leveraging and crafting open standards and platforms.

Posted in geospatial, geospatial web | 3 Comments »

New GeoServer Blog

Posted by cholmes on December 18, 2006

So I’ve been an extraordinarily negligent blogger of late, mostly because my creative juices and writing energy have been going into Openplans.org, the collaboration software we’re working on at TOPP. But I’m planning to spend some good time over the holidays writing up another queue of blog posts, and hopefully finishing off some of the ideas I started.

The main reason I’m writing now, though, is to point people at our new GeoServer blog, at http://blog.geoserver.org. I’ve seen a couple of project blogs that I’ve liked of late, and we’re hoping ours can find its voice as a useful resource for information. The main motivation is to give people a chance to follow what we’re up to on GeoServer without having to dive into the mailing lists. Our lists are getting quite a bit of traffic, and even there it’s sometimes hard to figure out the big things people are working on, since it’s mostly the details that get discussed. So we’ll be getting developers and users to give updates on what they’re working on, in and around GeoServer. We’re also hoping to use the blog to share hints and tricks and to highlight portions of the documentation; the docs have become detailed enough that it’s easy to miss interesting features. We’ll also announce releases and post new tutorials on the blog. It should serve as a low-commitment way to keep up with what our community is up to. If anyone has suggestions for topics they’d like to see covered, let us know.

As for what else is going on, the thing that’s exciting me is Andrea’s work on Versioning WFS. It’s still very experimental, but I think it has great potential to become an important piece in bringing architectures of participation to geospatial data (a thread I hope to finish up, at least in blog drafts, over Christmas). I’ve also really been digging the work of the guys at OpenLayers. TileCache is particularly nice: a great specialized tool that takes on one job and does it well. We’ve just released GeoServer 1.4.0, with the goal of moving to a modular, plug-in type architecture. My dream is that we’d be able to make something like TileCache that could run standalone or be installed as a plug-in on the GeoServer base with a couple of clicks. There has also been some exciting work around the edges of OpenLayers, with a vector editing demo that uses WFS-T as its backend, which would be really great, especially if we could combine it with our versioning improvements.

Posted in geospatial, geospatial web | 55 Comments »

I take your S3 and raise you an EC2

Posted by cholmes on October 29, 2006

Just read Chris’s post about using Amazon’s S3 as a home for caches. The Amazon service I’ve been contemplating for tiling purposes is actually their Elastic Compute Cloud (EC2). But before we get into it, a bit on S3 and tiles. I’d still like a distributed peer-to-peer tile cache, as I talked about in my post on geodata distribution, but it makes a lot of sense to bootstrap with existing services on the way there. S3 could certainly help out as a ‘node of last resort’: it’s nice to know that the tiles will definitely be available somewhere if the cache isn’t yet popular enough to be distributed to someone else’s p2p cache on a network closer to you. I agree that BitTorrent and Coral aren’t up to snuff, but I do believe that distributing map tiles will work as p2p technology evolves. First, though, we have to get our act together on tiling in the geospatial community, so we can go to the p2p folks with something concrete. Which is why I’m excited about the work being done to figure this out.
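The ‘node of last resort’ idea boils down to a fetch order: ask nearby peers first, and fall back to a central store like S3 only if none of them has the tile. Here is a minimal sketch, assuming each source is a plain callable that returns tile bytes or `None`; the `Fetcher` type and `fetch_tile` function are hypothetical, not part of any existing tile cache.

```python
from typing import Callable, Iterable, Optional, Tuple

# Hypothetical source interface: given (z, x, y), return the tile's bytes,
# or None if this source doesn't have it.
Fetcher = Callable[[int, int, int], Optional[bytes]]


def fetch_tile(z: int, x: int, y: int,
               peers: Iterable[Fetcher],
               last_resort: Fetcher) -> Tuple[str, bytes]:
    """Try each p2p peer in order, then fall back to the node of last resort."""
    for peer in peers:
        try:
            data = peer(z, x, y)
        except OSError:
            continue  # unreachable peer: move on to the next one
        if data is not None:
            return ("peer", data)
    # Nobody nearby has it, so go to the always-available store (e.g. S3).
    data = last_resort(z, x, y)
    if data is None:
        raise LookupError(f"tile {z}/{x}/{y} is not available anywhere")
    return ("last_resort", data)
```

In a real client the peer list would come from some index of who is caching which tiles; here it is simply passed in.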

As for EC2, I’ve been thinking about it in the context of caching with GeoServer. We’ve got some caching working with OpenLayers or Google Maps combined with either OSCache (there’s a tutorial for GeoServer) or Squid. I want to get it to the point where there’s not even a separate download: you just turn caching on, and then hit a big ‘go’ button that walks the whole tile area, caching it all on the way. The problem is that huge datasets can take days, weeks, or even months to fully process. This is where I think it could be kickass to use EC2: provide a service where a user’s ‘go’ button links to EC2, which can throw tens, hundreds, or even thousands of servers at churning out the tiles. Then return those to the server GeoServer is on, or leave them on S3. Indeed this would save on the tile upload costs that Chris writes about, as you’d just send an SLD and the vector data in some nice compressed format. I imagine you could save on upload costs for rasters too, as you’d just upload the untiled images and do the tiling on EC2.
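To see why a full pre-walk needs many machines, it helps to count tiles and split them up. This sketch assumes a TMS-style global-geodetic grid (EPSG:4326) in which zoom level z has 2^(z+1) columns and 2^z rows; the function names and the round-robin split are illustrative choices, not a GeoServer or EC2 feature.

```python
from typing import Iterator, Tuple

# (zoom, column, row) in a TMS-style grid.
Tile = Tuple[int, int, int]


def tiles_at_zoom(z: int) -> Iterator[Tile]:
    """Every tile at zoom z, assuming a global-geodetic (EPSG:4326) grid
    with 2**(z + 1) columns and 2**z rows."""
    for x in range(2 ** (z + 1)):
        for y in range(2 ** z):
            yield (z, x, y)


def total_tiles(max_zoom: int) -> int:
    """Tiles from zoom 0 through max_zoom: each level has 2**(2z + 1)."""
    return sum(2 ** (2 * z + 1) for z in range(max_zoom + 1))


def tiles_for_worker(max_zoom: int, worker: int, workers: int) -> Iterator[Tile]:
    """Lazily yield this worker's share of the pre-walk.

    Tiles are dealt round-robin by their position in the walk, so every
    worker gets a near-equal share and no tile is rendered twice.
    """
    index = 0
    for z in range(max_zoom + 1):
        for tile in tiles_at_zoom(z):
            if index % workers == worker:
                yield tile
            index += 1
```

Under these assumptions `total_tiles(20)` is about 2.9 trillion, which is why a single server could churn for months. Each worker here still enumerates the whole walk and skips other workers’ tiles; for very deep zoom levels you would more likely hand each machine a contiguous block of columns per zoom level instead.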

A next step for this tiling work would be a caching engine that can both pre-walk a tile set and expire a spatial area of it. The engine should store the cache according to the Tile Map Service specification, but with the additional smarts it could be uploaded onto EC2 along with the tile creation software (GeoServer or MapServer) and just pre-walk the tiles, iterating through all the possible requests. It could also listen to a transactional WFS (WFS-T) that operates against the data used to generate the tiles in the first place. If a transaction touches a feature in a certain area, then that part of the cache would be expired, and could be regenerated either eagerly or lazily (either send all the expired requests to the server right away, or wait until a user comes along and views that area again).
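Spatial expiry comes down to working out which tiles, at every zoom level, sit under the bounding box of a changed feature. This is a minimal sketch assuming the same global-geodetic grid as a TMS profile, with bounding boxes in lon/lat and rows counted from the bottom; `tiles_for_bbox` and `TileCache` are hypothetical names, and a dict stands in for real tile storage.

```python
import math
from typing import Dict, List, Optional, Tuple

Tile = Tuple[int, int, int]  # (zoom, column, row), rows counted from the bottom as in TMS


def tiles_for_bbox(minx: float, miny: float, maxx: float, maxy: float, z: int) -> List[Tile]:
    """Tiles at zoom z touched by a lon/lat bounding box (global-geodetic grid)."""
    size = 180.0 / (2 ** z)            # tile width and height in degrees
    cols, rows = 2 ** (z + 1), 2 ** z

    def col(lon: float) -> int:
        return min(max(math.floor((lon + 180.0) / size), 0), cols - 1)

    def row(lat: float) -> int:
        return min(max(math.floor((lat + 90.0) / size), 0), rows - 1)

    return [(z, x, y)
            for x in range(col(minx), col(maxx) + 1)
            for y in range(row(miny), row(maxy) + 1)]


class TileCache:
    """In-memory stand-in for a tile store that supports spatial expiry."""

    def __init__(self, max_zoom: int) -> None:
        self.max_zoom = max_zoom
        self.tiles: Dict[Tile, bytes] = {}

    def put(self, tile: Tile, data: bytes) -> None:
        self.tiles[tile] = data

    def get(self, tile: Tile) -> Optional[bytes]:
        # None means "not cached": a lazy cache would render it now.
        return self.tiles.get(tile)

    def expire_bbox(self, minx: float, miny: float, maxx: float, maxy: float) -> List[Tile]:
        """Drop cached tiles under a changed area at every zoom level.

        Returns the expired tiles so an eager cache can re-request them
        right away; a lazy cache can ignore the list.
        """
        expired: List[Tile] = []
        for z in range(self.max_zoom + 1):
            for tile in tiles_for_bbox(minx, miny, maxx, maxy, z):
                if self.tiles.pop(tile, None) is not None:
                    expired.append(tile)
        return expired
```

The bounding box passed to `expire_bbox` would come from the WFS-T transaction (the old and new extents of the edited feature); a box that lands exactly on a tile edge expires the neighbouring tile too, which errs on the side of freshness.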

I like Paul’s WebMapServer href attribute in the Tile Map Service spec, but I wonder if it’s sufficient. It might be nice if it had enough information to formulate a GetMap request that replicates a given tile map service repository. I’m thinking of the name of the layer and the ‘style’ (a named style or a link to an SLD). Maybe I’m missing something, but all the other information seems to be there. With that information, a smart tile map service client could look at multiple repositories and realize that they were generated from the same base WMS in the same way, and then swarm from several of them simultaneously. This starts to hint at the way forward for p2p distribution: for each WMS, keep a global index of where tile map service repositories live and let clients figure out which is fastest, or hit all of them at once, potentially including other clients. A catalog that has metadata plus information on where to get even faster tiles would definitely be popular, especially if registering there automatically put a caching tile map service in front of your WMS. You could also register, say, the feed of latest changes (or even just the bounding boxes of the latest changes) of the WFS-T that people use to update the data behind the WMS, and smart clients could listen and expire the tiles in a given area when they get a notification from the feed.
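With a layer name and style alongside the WebMapServer href, a client can rebuild the WMS GetMap request behind any tile, and can tell when two repositories are mirrors of each other. The sketch below assumes WMS 1.1.1, a global-geodetic tile grid, and identical tile size and format across repositories; the `layer` and `style` fields on `TileRepo` are the hypothetical additions proposed here, not part of the current spec.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple, Tuple
from urllib.parse import urlencode


class TileRepo(NamedTuple):
    url: str              # where the pre-rendered tiles live
    wms_href: str         # the WebMapServer href from the TMS metadata
    layer: str            # proposed addition: the WMS layer name
    style: str            # proposed addition: named style (or SLD reference)
    srs: str = "EPSG:4326"


def tile_bbox(z: int, x: int, y: int) -> Tuple[float, float, float, float]:
    """Lon/lat bounds of a tile in a global-geodetic grid (rows from the bottom)."""
    size = 180.0 / (2 ** z)
    minx, miny = -180.0 + x * size, -90.0 + y * size
    return (minx, miny, minx + size, miny + size)


def getmap_url(repo: TileRepo, z: int, x: int, y: int,
               tile_size: int = 256, fmt: str = "image/png") -> str:
    """Rebuild the WMS 1.1.1 GetMap request that would regenerate one tile."""
    params = {
        "SERVICE": "WMS", "VERSION": "1.1.1", "REQUEST": "GetMap",
        "LAYERS": repo.layer, "STYLES": repo.style, "SRS": repo.srs,
        "BBOX": ",".join(str(v) for v in tile_bbox(z, x, y)),
        "WIDTH": tile_size, "HEIGHT": tile_size, "FORMAT": fmt,
    }
    sep = "&" if "?" in repo.wms_href else "?"
    return repo.wms_href + sep + urlencode(params)


def mirror_groups(repos: List[TileRepo]) -> Dict[Tuple[str, str, str, str], List[str]]:
    """Group repositories rendered from the same WMS, layer, style and SRS.

    Repositories in one group should hold identical tiles, so a client could
    fetch different tiles from each of them in parallel.
    """
    groups: Dict[Tuple[str, str, str, str], List[str]] = defaultdict(list)
    for r in repos:
        groups[(r.wms_href, r.layer, r.style, r.srs)].append(r.url)
    return dict(groups)
```

A catalog like the one described could key its index on the same `(wms_href, layer, style, srs)` tuple.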

Posted in geospatial, geospatial web | 9 Comments »

Proprietary vs. FOSS in the Geospatial Web

Posted by cholmes on October 1, 2006

Thanks for the prod, Chris. An ideal world that brings open source collaboration to geospatial data does raise the question of what the software will look like. I strongly suspect that the core components of an architecture of participation for geospatial information would need to be open source (see my post on holarchies of participation). But I think the edges will likely be proprietary. So the core collaboration server components will be open source, and so will the easy-to-use software pieces that aren’t whole-hog GIS. But there will still be proprietary desktop GIS systems that simply integrate with the collaboration components. There is a lot of advanced functionality that it just won’t make sense for the open source community to take on.

Weber provides a good lens to examine the implications of open source taken further:

The notion of ‘open-sourcing’ as a strategic organizational decision can be seen as an efficiency choice around distributed innovation, just as ‘outsourcing’ as an efficiency choice around transaction costs.

The simple logic of open sourcing would be a choice to pursue ad hoc distributed development of solutions for a problem that exists within an organization, is likely to exist elsewhere as well, and is not the key source of competitive advantage or differentiation for the organization.

So the pieces of the stack that aren’t a source of competitive advantage for anyone will be the most likely to be open sourced. We see this with Frank Warmerdam’s GDAL library, which the All Points blog reports is included in ESRI’s crown jewel software. Why would the most proprietary software in the GIS world start using open source software? Because reading in a variety of formats isn’t a competitive advantage for them, so it makes more sense to cooperate than compete. How will this play out in the longer run? Data format libraries make the most sense, and along the same lines are projection libraries. The next step I see past that is basic user interfaces.

This is starting to happen with new pluggable GIS systems like uDig. I think it quite likely that toolkits that handle the reading and writing of formats and basic UIs will soon have proprietary functionality built on top of them. There will continue to be innovations in GIS analysis, new operations to be performed on data, better automatic extraction from vectors, etc., as well as innovations in visualization and more compelling user interfaces. These will be sold as proprietary software that integrates with the open source systems. The cool thing is that this lowers the barrier to entry for new innovations in GIS, since a new company won’t have to write a full GIS system, and won’t have to depend on a single company (unlike the current ArcGIS component sellers, who are hosed if ArcGIS decides to replicate their functionality). And you will likely still have proprietary databases for advanced functionality: Oracle has great topology and versioning support that is not yet there in PostGIS. PostGIS will catch up in a couple of years, but by then Oracle should have even more advanced functionality.

Another place we might see proprietary software at the edges is open standards. We’ll likely see the basic standards (WMS, WFS, WCS) mostly fulfilled by open source, while proprietary software does the more interesting analysis, the real web service chaining. Just as there will be proprietary plug-ins to uDig, so too will there be plug-ins for the Web Processing Service (WPS) specification. One will be able to take an open source WFS or WCS and pass it to a proprietary WPS for special processing (generalization, feature extraction, etc.), displaying the results on an open source WMS. I also suspect geospatial search will be best done by proprietary services, as is the trend in the wider web. Of course Google and Yahoo run open source software extensively, but they keep their core search logic private. So geospatial web services that require massive processing power will likely keep their core logic proprietary while building on open source software. This again follows Weber’s point: the basic functionality isn’t a core differentiator, so there will be collaboration on it (returning WMS and WFS of processed data or search results, for example) and proprietary innovation at the edges (more advanced processing algorithms on huge clusters of computers).

In short, proprietary software will continue to exist, it just won’t play the central role. It will be forced to push the edges of innovation even more to stay afloat, but I suspect it will always be a leading edge. Of course I believe open source will innovate as well, especially in this geospatial collaboration area. But the ideal is a hybrid world with the right balance of cooperation and competition to push things forward faster than we could alone.

Posted in architectures of participation, geospatial, geospatial web | 3 Comments »

On Framing

Posted by cholmes on September 24, 2006

So a couple of points in recent responses have got me thinking again about the power of framing an argument with terms that benefit one side. An example is the two sides of the abortion debate: pro-life and pro-choice. If either side took on the implied opposite of the other side’s label it would instantly lose the debate; imagine rallying behind ‘pro-death’ or ‘anti-choice’. Digging into postmodern thought in college definitely got me too deep into all sorts of meta debates. But though I’ve given up much of the meta thinking (realizing that living in the present moment leads to a richer and happier existence than constantly dissecting multiple levels of the world around me), I have retained a huge respect for the power of words and ideas.

Often the world appears to me as a battlefield of memes. Castoriadis’s ideas of the Imaginary really got me thinking about how much society is shaped by ideas and words, and that one can potentially change society by changing how people think and dream about themselves and their world.

Though I am utterly clueless as to how we might go about changing broader society with just a few well-placed ideas, I am confident that there are a few things we can do to help frame the debate over sharing ideas, code, and data. In the world of open information, the Free Software Foundation is good at identifying certain phrases and recommending that people avoid them, because those phrases frame the debate in a way that gives the other side a big advantage.

The two terms that have come up recently in this blog are ‘commercial’ and ‘intellectual property’. All my thoughts on these are based on others, but I feel it’s important to remind myself and others about the framings of the debates.

‘Commercial’ - many people make a comparison between ‘commercial’ and ‘open source’ software. This is a framing that companies whose livelihood is based on proprietary software seek to propagate, as it makes open source software seem like something that can’t be used in a commercial environment and doesn’t have sound business models. Free and Open Source Software licenses have never included clauses that prevent people from making money off the software; indeed one can charge whatever license fees one wants. The difference is that the source code must always be included.

I am less strict about this framing than others, perhaps because I work for a non-profit instead of a commercial company making big money off of open source software. But I still feel it’s incredibly important that we frame open source software as commercial software, and recognize proprietary software as just one way to make money producing software. There are now many commercial companies that are quite profitable making open source software, and this is great for the open source community. I believe open source can go much farther, but a key is that it’s thought of as a viable commercial alternative, not some kind of opposite.

‘Intellectual Property’ - SteveC brings this up in the comments, pointing out that one can’t steal data, one can only infringe copyright. Alan responds that it can be stolen, as lawyers call copyright ‘intellectual property’. I’m going to side with Steve on this one, though I know ‘intellectual property’ is the popular term. Unfortunately it’s a poor reflection of the reality of digital information, and it’s an attempt on the part of lawyers, and the corporations who pay their bills, to put a square peg in a round hole. To quote Barlow:

Intellectual property law cannot be patched, retrofitted, or expanded to contain the gasses of digitized expression any more than real estate law might be revised to cover the allocation of broadcasting spectrum.

(His whole essay, ‘The Economy of Ideas’, is really quite excellent on the subject.)

It’s a travesty that the powers that be have successfully framed information such as data, software, music and the like as ‘property’ instead of ‘ideas’, the latter of which we accept should be spread, remixed, and recombined. A couple of years ago I thought up an alternate definition for IP: Intellectual Production. Code and data are still things you create, and are of value, but the term doesn’t try to constrain them as ‘property’; it recognizes that the ideas, information, and information products you produce are not really yours, they are the world’s. A couple of weeks ago I ran into another take from the FSF, ‘Intellectual Wealth’: they are pushing to create a World Intellectual Wealth Organization instead of WIPO. They also have one of the best takes on the pitfalls of using the term ‘intellectual property’, which they call a ‘seductive mirage’.

In conclusion, I think it’s of utmost importance that we frame the open geodata debate in the right way, that we learn from the lessons of the free and open source software movement. Open geodata need not be non-commercial, and it is not creating property. It is creating information that will make the world a more open, collaborative, and better place.

Posted in geospatial | 1 Comment »

Re: Why isn’t collaborative geodata a bigger deal already?

Posted by cholmes on September 4, 2006

First off, thanks, everyone, for the great responses; it’s great to have different perspectives refine my thinking on this subject. In this post I’m going to attempt to respond to many of the comments and questions. Some of my responses won’t be complete and will deserve a full post of their own; indeed many of the issues raised are things I’ve thought about and have future posts planned for. But I like a conversation much more than a monologue, so it makes sense to address what comes up now.

The ‘FOSS vs. Commercial divide’ definitely needs its own post, but for now I will invoke Arnulf and say that FOSS can be commercial; the proper divide is FOSS vs. proprietary.

Alan, thanks a ton for the thoughtful insight. It’s great to get feedback from someone who’s been thinking about these things far longer than I have. In future posts I’m hoping to more fully explore how we can bootstrap an architecture of participation; this post was just meant to posit reasons why we haven’t seen collaborative geospatial data emerge already. But it’s great to hear that the ‘priesthood’ has no real power. As in much of life, the power truly lies with people, and we just have to organize the right way to exercise it.

I agree that data is not software, and that the challenges are going to be different, but I’d be interested in your thoughts on how software and data differ so fundamentally that an architecture of participation could not be formed around data. Because I could point to open source software and substitute ‘software’ for ‘data’ in a number of your points:

- the great desire is to have software not to create it
- creating software requires intellectual effort (not just technological & physical)
- the more effort it takes the more valuable it is
- if software is valuable people want to steal it

As for your assertion that if data is valuable then people will want to sabotage it, I’d refer to one of Weber’s conditions for a successful collaborative project: ‘The product benefits from widespread peer attention and review, and can improve through creative challenge and error correction (that is, the rate of error correction exceeds the rate of error introduction)’. So the issue is not whether people want to sabotage it, but whether the architecture of participation can correct errors faster than they are introduced. Of course he’s not just referring to malicious error introduction, but it’s necessarily a part of it. So the question is whether the commons can resist the sabotage. If a true commons of value is established, then people who find value there will want to protect it, and if there are tools that make this easy, the commons is more likely to be protected. You can have easy rollbacks; you could have people sign up to ‘watch this area of the map’, like ‘watch this page’ on Wikipedia, looking out for vandals in the areas they care about; and you can limit commit rights. I’ll go into these in more depth in a future post, but note that open source software suffers very little from sabotage, as those who contribute directly are vetted first. Wikipedia is more prone to it, but is also able to correct itself. So we won’t pretend that potential sabotage won’t increase as a dataset grows popular; we just need to figure out the architecture that lets the commons be protected and fixed in a timely manner. One should also note that some datasets are very valuable to a few people but not that valuable to everyone, so bike enthusiasts who want to map their favorite paths likely won’t have their data vandalized.
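Two of the protection tools mentioned, ‘watch this area of the map’ and easy rollbacks, can be sketched over a simple edit log. This is a minimal illustration under stated assumptions: `MapCommons`, `commit` and `rollback_user` are hypothetical names, features are plain dicts keyed by id, and rollbacks are not themselves recorded in the history. It is not the GeoServer Versioning WFS API.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

BBox = Tuple[float, float, float, float]  # (minx, miny, maxx, maxy)


def intersects(a: BBox, b: BBox) -> bool:
    """True if two bounding boxes overlap or touch."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


@dataclass
class Edit:
    user: str
    feature_id: str
    bbox: BBox
    old: Optional[dict]  # None if the edit created the feature
    new: Optional[dict]  # None if the edit deleted the feature


@dataclass
class MapCommons:
    features: Dict[str, dict] = field(default_factory=dict)
    history: List[Edit] = field(default_factory=list)
    watches: Dict[str, List[BBox]] = field(default_factory=dict)

    def watch(self, user: str, area: BBox) -> None:
        """'Watch this area of the map', like 'watch this page' on a wiki."""
        self.watches.setdefault(user, []).append(area)

    def _apply(self, feature_id: str, state: Optional[dict]) -> None:
        if state is None:
            self.features.pop(feature_id, None)
        else:
            self.features[feature_id] = state

    def commit(self, user: str, feature_id: str, bbox: BBox,
               new: Optional[dict]) -> List[str]:
        """Record an edit and return the other users watching that area."""
        self.history.append(Edit(user, feature_id, bbox,
                                 self.features.get(feature_id), new))
        self._apply(feature_id, new)
        return sorted(w for w, areas in self.watches.items()
                      if w != user and any(intersects(a, bbox) for a in areas))

    def rollback_user(self, user: str) -> List[str]:
        """Undo all of one user's edits, newest first.

        An edit is only undone if nobody has changed the feature since;
        otherwise its feature id is reported as a conflict for a human.
        """
        conflicts: List[str] = []
        for edit in reversed([e for e in self.history if e.user == user]):
            if self.features.get(edit.feature_id) == edit.new:
                self._apply(edit.feature_id, edit.old)
            else:
                conflicts.append(edit.feature_id)
        return conflicts
```

Refusing to undo an edit that someone else has since built on is a deliberately conservative choice: it keeps a vandal rollback from silently discarding good-faith corrections.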

As for people wanting to steal the valuable data, that shouldn’t be an issue, just like it’s not for open source software - the commons must be guaranteed to remain open. I take this to be a base condition for collaborative geospatial data to really succeed. I do concede that there could be other incentive structures that allow substantial collaboration around geospatial data. But at this point I’m not so much thinking about them, I’m thinking of something similar to the open source software movement, where the base case is that the collaborative data is open to all.

On the subject of Spatial Data Infrastructures, I’ve got a whole other thread on SDIs, covering geospatial webs and applying architectures of participation to SDIs and the like. I think Dave’s point was also mostly about SDIs, the interconnected content. For now I’m really just focusing on the micro level: creating and maintaining geospatial datasets. I certainly don’t think that all, or even the majority, of the data on a true public SDI/geospatial web will be built collaboratively; we’re just talking about a small piece of the content puzzle. But I do believe it can play an important role in helping to bootstrap a true public SDI, where it will be combined with sensors, real-time data services and the like, and I think the discovery piece that Jo points out is quite important. Dave, I’d actually disagree that historically the open source community has led the charge, but I do think it will lead the charge for collaboration on open geodata. Are the surprises from proprietary software you’re thinking of SDI related, or specifically about open geospatial collaboration?

The topic of public SDIs segues to Geoff’s great point that we’re likely to see collaborative mapping emerge in places like Asia, where governments have restrictive terms for access to geospatial information. Thanks for the link to malsingmaps.com; I’d not seen it before, and am attempting to gather examples of proto-collaborative mapping. Hopefully I can get in touch with them and learn more about how the community works and what motivates individuals, but this is really one of the most advanced collaborative mapping examples I’ve seen, and I’m quite excited about it. Previously I had actually been thinking that innovation might first come from countries with less restrictive mapping policies, and that we’d first see perhaps a massive project to improve TIGER data, since you’d have such a jump start with over 90% of a basemap for the US complete. And that you’d have forward-looking mapping agencies collaborating with citizens on more ‘fun’ datasets, like nature areas and bike paths (MassGIS, my favorite mapping agency, has had some experiments with those layers). But in some places the need is great, and a small group of motivated individuals could make enough of a difference to get started. It looks like they’re making use of MapBuilder, MapServer and probably PostGIS, using strong open source projects as the base on top of which they innovate, which is definitely the path to take.

For Sean’s point, I completely agree that it’s going to take a lot of time and effort. But I actually think open geodata falls closer to software than to Wikipedia (though Wikipedia is great for proving that the root concept may work even better for domains other than software). The GNU effort couldn’t reuse any existing tools; the legal constraints forced them to build it all from scratch. And it took many years before it reached critical mass, and even more before Linux completed an operating system around those tools.

I also think the ‘snowball point’ will be more like open source software’s. Wikipedia snowballs right when you’re past the notion that only professionals are qualified to write an encyclopedia, but software certainly doesn’t snowball at a similar point. It snowballs when the existing open source software is close enough to the needs of commercial companies that it costs them less to invest in the open source software than to buy proprietary licenses. Of course this point is different for each company, depending on many, many factors. But as one company invests in open source and gets it good enough for its needs, it may become advanced enough for other companies to invest in the next step, and thus a snowball is born. I believe mapping data will snowball when it makes economic sense for a company to invest in a collaboratively built map, improving it for its needs, instead of licensing a proprietary map. And yes, this too will be different for each company: some only need general context to overlay their specific geospatial information, while others need exact info, routing and the like.
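The snowball argument can be made concrete with a toy model. It is purely illustrative: the single ‘capability’ scale, the numbers and the `snowball` function are hypothetical, and real investment decisions depend on many more factors. Each company invests in the open project only when closing the gap to its own needs costs less than a proprietary license, and its investment raises the project for everyone after it.

```python
from typing import List, Tuple

# (company name, capability it needs, what a proprietary license would cost it)
Company = Tuple[str, float, float]


def snowball(capability: float, companies: List[Company],
             cost_per_unit: float) -> List[str]:
    """Return companies in the order they switch to investing in the open project.

    Toy assumptions: capability is a single number, closing a gap costs
    cost_per_unit per unit, and an investing company lifts the shared
    project up to exactly what it needs.
    """
    adopters: List[str] = []
    remaining = list(companies)
    changed = True
    while changed:
        changed = False
        for company in list(remaining):
            name, need, license_cost = company
            gap_cost = max(need - capability, 0.0) * cost_per_unit
            if gap_cost < license_cost:
                adopters.append(name)
                capability = max(capability, need)
                remaining.remove(company)
                changed = True
    return adopters
```

With `snowball(10, [("C", 60, 25), ("B", 30, 20), ("A", 15, 10)], 1.0)`, company B only invests after A has closed part of its gap, so the result is `["A", "B"]`; drop A from the list and nobody invests. C never does, since its remaining gap stays larger than its license cost.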

But I agree, it’s going to take time and energy, both at the meta level, to make it easier to overcome the logistical problems, and at the down-and-dirty level of going out to re-survey, well, just about everything. Just identifying what’s held it up in the past is by no means the same as building, and building is the much bigger challenge. It’s going to be an uphill battle for a while, but I do believe we too will eventually see a snowball. And I’m keeping a firm eye on you and the Pleiades project for some brilliant techno-cultural inventions.

Posted in architectures of participation, geospatial, geospatial web | 11 Comments »