Collaborative Mapping: Tools, cont.

The next major area of tool improvement I see is expanding the wiki notion of editing into more of a revision control model, with patches, versions, branches and merging, eventually expanding into distributed repositories.  A ‘patch’ is a small file describing a set of changes that can be applied to a program’s source code.  Patches are widely used in the open source software world, both to get the latest improvements and to allow those who have commit rights to a source repository to review outside improvements before putting them in.  This helps create the meritocracy around projects: you don’t let just anyone into the repository, since they might break the build.  Breaking the build is less likely with maps, but core contributors might still want to see a couple sample patches before letting a new member in.  In the GeoServer versioning WFS work we have a GetDiff operation that returns a WFS Transaction that can then be applied to another WFS.  This fits the technical part of how a patch works – it’s really easy to apply to one’s dataset.  Unfortunately a WFS Transaction is not as easy to read as a code patch.  The other great thing about patches is that when leaf nodes are updating their data they can just request the change set – the patches – instead of having to do a full check out.  So I’m still not sure how to solve this problem; the WFS Transaction is the best I’ve got, but I think we can do better and come up with a nice little format that just describes what changed.
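
To make the ‘geo patch’ idea concrete, here’s a rough sketch of what a change set expressed as a plain WFS Transaction looks like.  The wfs: and ogc: elements below are standard WFS-T; the feature type and property names are just made up for illustration, and the actual payload GetDiff returns may differ in its details:

<wfs:Transaction service="WFS" version="1.1.0"
    xmlns:wfs="http://www.opengis.net/wfs"
    xmlns:ogc="http://www.opengis.net/ogc"
    xmlns:topp="http://www.openplans.org/topp">
  <!-- the patch carries just the changed property of one feature -->
  <wfs:Update typeName="topp:major_roads">
    <wfs:Property>
      <wfs:Name>name</wfs:Name>
      <wfs:Value>Interstate 5</wfs:Value>
    </wfs:Property>
    <ogc:Filter>
      <ogc:FeatureId fid="major_roads.5"/>
    </ogc:Filter>
  </wfs:Update>
</wfs:Transaction>

Easy for a WFS to apply, but you can see why it’s harder for a human to skim than a unified diff.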

Once we’ve got patches people are going to want the ability to merge changes.  If you made a patch and I made a patch and we both submit them then we need a way to see if they’re compatible.  Ideally you could merge at the feature level – if you change the road type and I change the road length of Interstate 5 then we shouldn’t get a conflict.  Even better would be merging at the geometry level: if we changed different points on the road then those should merge nicely.  This will become important as people start to ‘check out’ their geo repositories, do edits, and then try to submit back in.  We could just do locking, which is what WFS-T does, but concurrent versioning is so much nicer – we just have to be able to pull off merging.
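
Sticking with plain WFS-T elements (no merge format actually exists yet, so this is just a sketch with made-up property names), two patches like these touch different properties of the same feature and so ought to merge cleanly:

<!-- your patch: changes the road type of major_roads.5 -->
<wfs:Update typeName="topp:major_roads">
  <wfs:Property><wfs:Name>road_type</wfs:Name><wfs:Value>highway</wfs:Value></wfs:Property>
  <ogc:Filter><ogc:FeatureId fid="major_roads.5"/></ogc:Filter>
</wfs:Update>

<!-- my patch: changes the length of the same feature -->
<wfs:Update typeName="topp:major_roads">
  <wfs:Property><wfs:Name>length_km</wfs:Name><wfs:Value>22.4</wfs:Value></wfs:Property>
  <ogc:Filter><ogc:FeatureId fid="major_roads.5"/></ogc:Filter>
</wfs:Update>

A merge-aware server could apply both; only two updates to the same property, or overlapping geometry edits, would need real conflict resolution.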

Right past merging is full on branches, which of course are much easier to pull off if you’ve got nice merging in place.  Branches will let people try out new geographic updates in their own sandbox before putting them on the mainline.  This can lead to better reviews of the updates.  And with nice branching and merging you would be able to let a number of people work concurrently on their own area of the map, merging them seamlessly.  This is obviously a really hard problem, one that even ArcSDE has trouble with for the things people actually want to do.  I do think we’ll be able to get there in the open source world – indeed I believe we have a better chance of achieving it, since once we get close there will be a lot of interest from people who want it completed and meeting their needs, funding the iterative improvements.

The final piece, that I sort of don’t even want to think of yet, since it’s damn hard, is distributed versioning.  I do think it’s extremely important though, to let everyone have their own editing repository, which can flow back into the main one.  I like the model a lot, and think it has great wins for geospatial.  But since we’ve barely got an SVN equivalent I think it’s wiser to wait a bit on these issues till we sort out what a patch should look like.  Indeed SVK was only possible because SVN already existed.  But I’m definitely excited by the possibilities, for every node of the map to have the potential to be edited.  This can be a big win for areas with low bandwidth.

The next category of tool improvements is granular security settings.  Right now there’s not even a way to limit editing the map to only some users.  I think that many maps will flourish with the open to all editing style, making use of rollbacks to prevent vandalism.  But some will likely want to restrict the map to a set group of committers.  That way one could get commit rights after doing a number of good patches, perhaps ensuring higher quality for some maps.  You also might have different permissions for different users on different layers.  We should be able to get all of that with our current GeoServer security system, we just need to hook up a UI for it.  The trickier piece, which would be a nice feature and I think is possible, is limiting users to certain geospatial areas or to features with specific properties.  Since the security system is integrated at the code level, and lets us use aspects, this should be doable – it will just take a bit of work to figure out.

Another area where I see a lot of potential for innovation is distributed processing of tiles.  Tiles are the clear winner for how to display geospatial information; Google Maps has raised the bar so that anything that isn’t tiled just feels out of date.  But tiling takes a ton of processing power.  Google is all set up to do it, but the rest of us aren’t.  To fully cache http://sigma.openplans.org to zoom level 17 would have taken me about 5 months.  Open Street Map has been making tremendous strides on this with their Tiles@Home initiative, which I am very impressed by.  OSM is lucky in many ways, in that they have a project that people want to devote their spare CPU cycles to.  It could be cool to set up marketplaces for processing of tiles, where companies that are going to keep their data private, or that just don’t have the reputation of OSM, can engage other nodes and give them micropayments for their work.  Another area of potential innovation is leveraging Amazon’s EC2 to process huge amounts of tiles.  We’re also going to need to have the collaborative mapping stuff hook up with the tiling efforts, so that when there are massive edits the tiles can expire themselves and get processors started on generating new ones.  We can likely leverage HTTP’s Conditional GET functionality to let browsers and others cache geospatial data, but also get the most up to date data when it’s available.
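
The Conditional GET mechanics are already well established in HTTP; roughly, the exchange would look like this (the tile path and ETag values here are invented, and the lines after the # are just commentary):

GET /geoserver/tiles/major_roads/12/1205/1539.png HTTP/1.1
Host: sigma.openplans.org
If-None-Match: "a1b2c3"
# "a1b2c3" is the ETag the client cached the last time it fetched this tile

HTTP/1.1 304 Not Modified
ETag: "a1b2c3"
# tile unchanged: no body is sent, the client keeps using its cached copy

HTTP/1.1 200 OK
ETag: "d4e5f6"
Content-Type: image/png
# after a big edit expires the tile, the same request gets the regenerated tile and a new ETag

The expiry piece is the part we’d have to build – wiring the versioning transactions up so they invalidate the right tiles.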

The last area I’d like to see improvement on is more granular notification mechanisms.  GeoRSS output is the obvious choice, but one could also do email or SMS notifications.  Speaking of which, I’d love more innovation on mobile clients, and even super low tech versions like being able to SMS in a new or updated location by just entering cross streets or reading a position from a GPS.  But one should be able to have the notifications based on very granular rules – ‘send updates for highways in this bounding box’, or ’email all occurrences of the brown spotted pigeon along this river bank’.  This would be useful not only for preventing vandalism, but also to enable people to take action on up to date reports.  The map becomes not just an artifact of what has happened, but a living thing that can help create more up to date information.  If the brown spotted pigeon is seen in one area then it will alert more people, who can then add updates on its location and build a more detailed map of its path.
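
Those granular rules could probably just be OGC Filter expressions attached to a feed.  Here’s a sketch of the ‘highways in this bounding box’ rule – the subscription wrapper and the property names are entirely made up, but the ogc: and gml: filter elements are standard:

<subscription output="georss" notify="mailto:watcher@example.org">
  <ogc:Filter xmlns:ogc="http://www.opengis.net/ogc"
              xmlns:gml="http://www.opengis.net/gml">
    <ogc:And>
      <ogc:PropertyIsEqualTo>
        <ogc:PropertyName>road_class</ogc:PropertyName>
        <ogc:Literal>highway</ogc:Literal>
      </ogc:PropertyIsEqualTo>
      <ogc:BBOX>
        <ogc:PropertyName>the_geom</ogc:PropertyName>
        <gml:Envelope srsName="EPSG:4326">
          <gml:lowerCorner>-123.5 45.2</gml:lowerCorner>
          <gml:upperCorner>-122.0 46.0</gml:upperCorner>
        </gml:Envelope>
      </ogc:BBOX>
    </ogc:And>
  </ogc:Filter>
</subscription>

Any edit whose transaction matches the filter would generate a feed entry, an email, or an SMS.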

I’m sure there are many more innovations to be had with tools, but this is just a start – the things we’re beginning to work on and the things I’d like to work on in the future.  At TOPP we’re doing this stuff when we don’t have paid client work (or have met revenue targets for the year, since we’re a non-profit), but if there’s anyone out there who wants to see specific areas accelerate we’d be very excited to take on paid work to do any of the things talked about here <end shameless plug/>.

Collaborative Mapping: Tools

Continuing the collaborative mapping thread, I’d like to think a bit about tools to make this happen. Do a bit of dreaming, and maybe think through how we can get there. Definitely as soon as I start to talk about this people want to do all kinds of crazy synchronization and distributed editing of features. I do think we’ll get there, but I fear going for too much too soon, getting loaded down by over-designing and not addressing the immediate problems. Indeed Open Street Map has proven that if the energy is there the tools just need to do the very basics. I have been putting my energy in to getting a standards based implementation, on top of WFS-T, but that’s more because I know it and I like standards. I don’t think it’s the best way to do things, and I don’t even think it should be the default way to do things – at this point I’d prefer something more RESTful. But I believe in being compatible with as much as possible, and there are already nice clients written against WFS-T. So it should always be a route in to collaborative editing.

First off, I think we need more user friendly options for collaborative editing. Not just putting some points on a map, but being able to get a sense of the history of the map, getting logs of changes and diffs of certain actions. Editing should be a breeze, and there should be a number of tools that enable this. Google’s MyMaps starts to get at the ease of editing, but I want it collaborative, able to track the history of edits and give you a visual diff of what’s changed. Rollbacks should also be a breeze – if you have really easy tools to edit it’s also going to be easier for people to vandalize, so you need to make tools that are even easier to roll back with. On the GeoServer extended WFS-T versioning API we’ve got a rollback operation that can work against an area of the map, a certain property, or a certain user (or combinations of those). Soon we hope to be working on some tools built on top of OpenLayers to handle those operations in a nice editing environment.
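
To give a flavor of it, a rollback request in the extended protocol looks roughly like the following. Treat the element and attribute names as illustrative rather than gospel – the exact schema in GeoServer may differ – but the idea is a single operation scoped by user, property and/or area:

<!-- roll back one user's edits within a bounding box; names are illustrative only -->
<wfsv:Rollback service="WFS" version="1.1.0"
    xmlns:wfsv="http://www.opengis.net/wfsv"
    xmlns:ogc="http://www.opengis.net/ogc"
    xmlns:gml="http://www.opengis.net/gml"
    typeName="topp:major_roads"
    user="some_vandal"
    toFeatureVersion="1523">
  <ogc:Filter>
    <ogc:BBOX>
      <ogc:PropertyName>the_geom</ogc:PropertyName>
      <gml:Envelope srsName="EPSG:4326">
        <gml:lowerCorner>-74.1 40.6</gml:lowerCorner>
        <gml:upperCorner>-73.7 40.9</gml:upperCorner>
      </gml:Envelope>
    </ogc:BBOX>
  </ogc:Filter>
</wfsv:Rollback>

The web editing tools would just build requests like this behind a ‘revert’ button.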

The next step on user friendly options will be desktop applications that aren’t full GIS, but that let users easily edit. These can leverage the tools of existing open source GIS desktop environments, like uDig and qgis, but strip down the interface to just be a simple editing environment with a few hard coded background layers. You could have branded environments for specific layers of information. And ideally build other kinds of reporting tools that also leverage the same GIS tools, but in an interface geared towards the task at hand, like search and rescue or tracking birds. The other thing I hope to work on is getting some of the editing hooked up with Google Earth. I just learned there’s a COM API that might allow us to hack something in, or we can try to get Google Earth to support POSTing of KML to arbitrary URLs, as Sean suggests.

Next I’d like to see integration with ‘power tools’, the full on, expensive ass GIS applications that are the realm of ‘professionals’. Not that I have a huge love for those tools, but I’d really like to engage as many people as possible in collaborative mapping. GIS professionals are a great target audience, since most of them are already passionate about mapping. They have a lot of expertise to bring to the table. And while some of them can be elitist about collaborative mapping and ‘lesser’ tools, so too can many of the amateurs raise their noses at people who aren’t DIY. At the extremes it can obviously be a major divide, but I think both could have a lot to teach each other if they’re willing to listen. But I believe the first step to get there is to get the ‘power tools’ compatible with the collaborative mapping protocols, so you start them off in collaboration. This is one reason I’m an advocate of the WFS-T approach, as there are plugins for ArcGIS and other heavy desktop GIS’s. I think we could see some professionals get really excited about collaborative mapping, as it could become the thing they are passionate about and do in their free time – fun work that also helps boost their resume. This is how many open source contributions work now: it’s a complex interplay that includes professional development. Perhaps one’s collaborative mapping contributions could help land jobs in the future.

I’d also like to see more automation available in the process. This is an area that could use a lot of experimentation – how much to automate, how much to let humans collaborate on. But I think there’s an untapped area of figuring out vector geometries from the aggregated tracks of GPS, cell phone and wifi positioning data. People are generating tons of data every single day, and most of it is not even recorded. It’s great when people take a GPS and decide explicitly to map an area and then go online and digitize it. But we could potentially get even more accurate than just one person’s GPS by aggregating all the data over a road. Good algorithms could extract the vector information, including turn restriction data, since it could figure out that 99% of fast moving tracks are going in the same direction. Of course we’ll still need people to add the valuable attribute information, but this way they’d have a nice geometry already in place.

You could also do feature extraction from satellite and aerial imagery. This is obviously a tough problem that many people are working on, but perhaps it could also be improved by leveraging human collaboration. In a system with good feedback people could perhaps help train the feature extraction to improve over time. It also could be valuable to do automated change detection, which then notifies people that something’s changed in the area so they can figure out the proper action.

The final area I think we could improve with automation is prevention of vandalism and silly mistakes. Refractions did work on GeoServer a few years ago to build an automatic validation engine. Unfortunately this has languished with no documentation, but it’s still part of GeoServer. One can define arbitrary rules to automatically reject bad transactions – geometries that intersect badly, roads without names, etc. This could also reject things like ‘Chris Rulez’ scrawled over the whole of the US, as it could know that no real roads run in completely straight lines for over 200 miles. I could imagine a whole nice chain of rules to ensure that all edits meet certain quality criteria. And perhaps, instead of being rejected straight up, any edit that doesn’t follow all the rules could go into a sandbox. I could also imagine some sort of continuous integration system, once there is topology, to check network validity, plus other quality assurance pieces that can’t take place instantly.
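
I won’t pretend to remember the engine’s actual configuration syntax, but conceptually a rule chain is just a list of named tests with arguments – something like this purely illustrative sketch:

<!-- not the real validation config format, just the shape of the idea -->
<validationSuite name="road-edit-checks">
  <test plugin="IsValidGeometry" typeName="topp:major_roads"/>
  <test plugin="AttributeNotNull" typeName="topp:major_roads">
    <argument name="attribute" value="name"/>
  </test>
  <test plugin="MaxSegmentLength" typeName="topp:major_roads">
    <!-- catches the 'Chris Rulez' scrawl: no straight segment over 200 miles -->
    <argument name="maxLengthMiles" value="200"/>
  </test>
</validationSuite>

Edits that fail a test would either get rejected outright or shunted into the sandbox for human review.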

Ok, I’ll wrap this post up for now, will continue this thread soon.

Collaborative Mapping: The Business Thread, cont.

So if there is a future where collaborative mapping could be economically competitive, how do we go about actually getting there?  I actually think we’re further along than many might think, though I believe there is still a lot of work to be done innovating with the tools, communities and workflows to make this happen.  But I’ll address that in another post; for now I just want to present a possible path for collaborative mapping to bootstrap its way into the mainstream.  I’m going to focus on street maps, since that’s the information that people pay big money for, and there is already early success with Open Street Map.  Later I’ll examine how the lessons learned there can feed into other domains and back.

So step 0 is proving that it’s possible for a diverse group of people to collaborate on an openly licensed map.  I’d be hard pressed to entertain any arguments that Open Street Map has not already accomplished this.  Of course in its current state you can’t navigate a car on it, and you’re not going to do emergency vehicle response with it.  But their driving principle has been that they ‘just want a fscking map’, and a map they do have.  There are many contributors running around with GPS’s and creating a map.

The next point in the evolution is when the map is good enough for basic ‘context’.  Again, OSM is already there for several parts of the world.  If you’re doing a mashup of your favorite neighborhoods you don’t really care if all the streets are there.  You just need enough that it looks about like your neighborhood on other maps.  Many mashups use Google Maps and others in this way – which is sorta like using the same quality water to flush your toilet as comes out of your kitchen sink (USA!).  Which is to say a bit of a waste, but who really cares if someone else is paying for it.

Which speaks to another tipping point, which is when the big portals start putting ads on their maps.  Or when they start charging to use their APIs.  I concede that this may never happen, that it’s a good loss leader to have people using your API for free as long as they put their maps out in the public.  But a part of me feels like we may be in a period of the GeoWeb like the first web bubble, when you could get $10 off coupons from CDNow and B+N, allowing you to buy any cd you wanted for a few bucks.  It wasn’t going to last, but it sure was fun while it did.  At some point there may be a shift when they need to make some money, which could drive more energy to collaborative maps as people look to get ads off their service.

The next step starts to get fun, which would be once a collaborative street map gets good enough for basic routing and navigation.  Right now it seems to be (though I could be wrong, I don’t know the OSM community intimately) mostly people who set out to add data to the map – they want to get their area mapped.  If they go to new areas they’ll bring a GPS along, but it’s often to a totally unmapped area.  I think once large areas start to get close to completion we’ll have people cobble together ghetto car navigation kits: a laptop with a GPS and the collaborative map, either connected over some kind of wireless internet or downloaded to the car.  One can drive around with this and it will show one’s place on the map, and directions to the end point as well.  Note that this kind of usage is currently illegal with Google Maps or any of the others who get their data from commercial providers.  From the API agreement: ‘In addition, the Service may not be used: (a) for or with real time route guidance (including without limitation, turn-by-turn route guidance and other routing that is enabled through the use of a sensor’.  This is because the commercial mapping providers make big money off of car navigation, and license the (exact same) data to do that at a higher price.

With basic navigation on a collaborative map in place you can get people excited about going off into a ‘new frontier’, going off the map and tapping into their inner Lewis and Clark.  Actively encourage people to dérive into uncharted areas of the map (though I’m not sure how much the Situationists really would like the idea of people using cars to dérive).

On other fronts I believe that we’ll see niche areas getting high quality mapping.  Governments and companies will realize that if there’s a map that’s 80% done and they just need to fund the last 20%, and that owning the map is not their key value proposition, then they’ll just look to fund the collaborative map instead of doing it themselves.  Those that can think long term will realize that this will most always be cheaper, since they won’t have to keep paying to get it up to date.  With a good collaborative structure much of that will happen on its own.  And they may put a bit extra in each year.  And in areas where a few different organizations all partner up it will definitely be cheaper.  Already we’re seeing some enlightened folks fund Open Street Map contributors to have a mapping party and map an area.

We’ll also likely see collaborative maps for niche verticals.  If you’re doing walking maps then you don’t need the turn restriction information to do car routing, for example.  Someone may offer a map of the best drives in southern california, which would be a subset of the main map.  Or a detailed map of which roads need to be plowed after a snowstorm, that leaves out the roads that don’t.

After that I think you’ll see people hacking commercial nav systems to make use of the collaborative map, and then navigation companies offering low price versions of their systems that don’t rely on the commercial data.  Already we’re seeing navigation companies start to ‘leverage user contributions’, with TomTom’s ‘MapShare‘ to let people update points of interest and the like, and Dash Navigation‘s ability to leverage GPS from other cars to see if a new road has opened up.  I think you may see people even more excited about this if they knew their work was going to a common good instead of just to the advantage of one company.

Once people are able to ‘correct’ the map that they’re driving on I believe we’ll see a really big tipping point.  Build in some voice recognition to call out the name of a street while you’re driving.  This could be billed as the ‘mapping game’, where one gets points for driving new areas.  One could even imagine a company that sets up a business with sort of ‘bounty navigation’ where you can actually make money if you drive new areas of the map and do good reporting of road names and the like.  This could be one of the decoupled functions of the economics around collaborative map making, the navigation company partners with the company that guarantees the map is up to date, and instead of contracting out another company to drive the roads they just put money rewards on driving in new areas.  People could make it so their navigation is free, or even have it be like the electrical grid where if you generate a lot of extra navigation information they pay you.  I haven’t thought through all the details of this, but I think it could work, and would be super cool for helping people think of geospatial data as a commons that one can contribute to and that we’re all responsible for and can be a part of, not just consumers of a service.

Which speaks a bit to a further point, which is when governments realize that they can tap into and contribute to this as well.  The Census Bureau spends a ton of money keeping road information up to date.  But their data is not entirely accurate, and it doesn’t do any turn restrictions.  Instead of maintaining their own database they could combine with an open map, and plug in to that workflow.  Indeed in the US such a map likely would have started from one of their TIGER/Line files anyways.  So government organizations can join the ecosystem, likely just as funders contracting out other companies to perform the work, as they are starting to do more and more with open source software.  Some may want to try to do it themselves, but the smart ones will plug in to existing ecosystems.

The other tipping point towards the end will be when the big mapping providers decide to invest in collaborative maps.  I had initially been thinking that things would need to be really far along worldwide before they’d make the switch, but a more likely path might be that they use it in conjunction with their commercial maps.  They already make use of Tele Atlas and Navteq in different places.  So as long as the collaborative map didn’t have a restriction about combining with other sources they could just use it in places that have poor coverage from the major providers.  And they could see where areas of the map are close to being done and strategically fund those.  Another potential source of investment in this kind of mapping could be aid agencies in areas that commercial providers haven’t mapped.  They could hook up their GPS’s to gather information, and then employ a few people to help process and QA it to make maps they can use.  Since it’s not a core value proposition to them they can share it with others, and start to build really good street maps in areas that no one has touched because it’s too hard for the money they would get.  I would love to try a start up in Africa that hooks up the correcting car navigation systems to a bunch of vehicles and just starts building the living map.  It’d be quite ironic if Africa ended up with more up to date maps than Europe.

The key with all this for me is the evolution of viewing mapping data as a public good, one that we all collaborate on to make better.  As GPS’s become more and more prevalent we are all just emitting maps as we go through our lives.  All that’s really needed is a structure to turn that into useful information, getting the tools better and setting up the economic reward structure.  I’m not a business person, so I don’t have much more to throw out in terms of economic ideas.  But I believe it is possible to set the levers right to encourage this.  And I’m going to do my best to get the tools better and better, to show what is possible and get us all moving towards a future where an up to date, accurate map is a commons available to all, and that all are a part of.

Collaborative Mapping: The Business Thread

Since I was speaking at Where 2.0, which is a good bit more business oriented than most conferences I attend, I felt that it could be interesting to start to make the business case for how collaborative mapping could succeed.  I didn’t have time to do more than throw out a few ideas, but I think it’s important to start thinking about this.  I fully believe it will happen, though I’m not sure of the time frame at all.  But it’s almost inevitable, since the economic end result is a more efficient allocation of resources.  The only question is whether there will be enough incentives along the way.

At the core is the idea that there will be a series of ‘tipping points’, where it will be cheaper for an organization to fund a collaborative map than it is to buy the data from the commercial provider at the accuracy that they need.  The last clause is important, because there are many cases where one just needs data as a base for other information.  Many Google Maps mashups just need some context to display the information they’re showing.  So if there is a route to get from one tipping point to the next for different sets of organizations, then a collaboratively built map will emerge as competitive if not better than those made by commercial providers.  There will be no middle man who bundles all the functions of mapping together and extracts rent after having done the work.  Instead there will be a diversity of organizations, including private companies, governments, and individuals, who all work together to make accurate, up to date maps.

Currently the idea of a collaboratively built map is still quite radical.  The work is being done mostly by amateurs (in the best sense of the word) and ‘true believers’, who know that it’s the right way to build open maps.  No traditional geodata providers seem to feel threatened at all, at least not yet.  This is very much like the early days of open source software, when it was seen as some purist movement that would never have a big effect on anything.  But people stuck with it and ended up building a huge economic engine of change.  I am a true believer who is sure that collaborative mapping can become a more efficient way to build geospatial data, fully supported by a variety of business models.  So I’ve been thinking about how we can move from being rebels to becoming the default way of getting things done, as open source software has.  It took 20 years to go from starting the movement to mainstream success, and I think we can follow in their footsteps and do it even faster (and indeed leverage open source software as a base to build it upon).

Let’s start by fast forwarding to a future where we have economically successful collaborative maps.  Then from there we can look back and see how we might get there, what tipping points would be involved.  We are currently in a stage where business models around open source software are maturing.  Building software still costs money, even if it’s open source.  But there are a variety of ways to make money even with a collaborative base.  The key for me is that the functions of a traditional software company have been decoupled from one another – you can buy support on open source software from one place, a manual on it from somewhere else, and then get training on how to use it from a third organization.  None of the functions of a software company have gone away; they’ve just been split up into smaller pieces, and there is competition in the market for each of them, making them more efficient and more competitive.

The biggest providers of commercial geospatial data also wrap up a lot of functionality into one package:

  • Pay people to go out and drive the roads to keep the database up to date
  • Find and acquire data from public sources
  • Process the raw data, doing quality assurance
  • Ensure that the information is up to date – i.e. give people someone to sue if it goes wrong
  • Services and consulting – ‘analysis and proprietary research’, ‘business plan reviews and testing services’ (from http://www.navteq.com/developer/index.html)
  • Geolocated yellow pages – ‘placing your business on the map’

Since the commercial databases are not open there is no way to separate out this functionality.  With a collaborative map one could imagine niche companies doing one of them, or new companies that combine some of the processes here with other functions.  Navigation is an obvious one, and indeed TomTom is starting in on this with their MapShare.

So you could have a company that just goes out and drives the roads and turns over the data / adds it to the collaborative map.  Clients who want an area of the map more up to date could pay them directly.  You could also have small businesses who want to be sure that people get accurate directions to them.  We could even imagine ‘a man with a van’, but instead of for moving it’d be for driving roads with a GPS, and perhaps some cameras strapped to the roof.  There then could be a company whose expertise is processing raw GPS data.  GM or FedEx might sell the data from their vehicles under permissive terms, and then a company could do a bunch of algorithmic analysis on the data to extract roads, and contribute those to a collaborative map.

Then there could be a class of companies who just provide guarantees – someone you can call up if there’s a problem, someone to give you a service level agreement.  They in turn would have internal people to drive roads, or contract out with the other companies when they needed a certain area to be at the accuracy they’ve agreed to provide others.  You have this in the open source software world, with companies like SpikeSource that test how things work together and give you a number to call if things go wrong.

And of course the other big open source business model is to provide new services on top of open source software.  See Google and Yahoo! and most new internet companies.  They contribute to the underlying software that runs the parts they keep private.  Indeed those very same companies could become significant contributors to collaborative mapping – they already spend significant money on licensing fees from commercial data providers, and if it made economic sense to put the money into a collaborative map they likely would.

One could also imagine a company whose sole purpose is to do accuracy assessments of collaborative maps.  They would play a very key role, in that they’d be able to answer the question for companies ‘should I invest in collaborative mapping?’.  I maintain there is a tipping point for just about every organization, but it will be very painful if the area they need mapped has very poor coverage and they have a small budget.  So for a small fee there could be a company that lets you know how much investment it would take to get the area of interest to the accuracy that the organization needs.

Ok, this post is already long enough, I’ll continue soon with more on the business case, what steps might evolve us to a place where collaborative mapping is simply the smarter economic choice.  But my main point for this post is that it is possible to decouple the function of ‘ownership’ of a set of geospatial data from the functions that are needed for its upkeep.  Indeed such a decoupling could easily lead to a more efficient market around the upkeep of the data.  One thing we neglected to mention as well is that a collaborative map opens up the potential for non ‘expert’ contributors to do valuable work, as long as the structure is set up to minimize vandalism and the like.

Collaborative Mapping Redux

So after a lengthy hiatus, it’s time for me to get back in to the whole collaborative mapping thing.  I’ve been thinking about such things a lot lately, and didn’t have time to get to most of it at my Where 2.0 presentation.  We’ve also been making progress in GeoServer to support all kinds of collaborative mapping – put the power in the hands of users to determine their workflows and permissioning.  I’ll roughly break things up as I did in my talk – the premise is that the grass roots remapping can’t be stopped, but that there may be ways to help it go faster.  We’ve got Open Street Map, obviously the clear leader with an amazing community.  But while something like a collaborative street map may have wikipedia type properties – where it makes sense to have a single community crystallize around one – there are many, many more maps to be made than just road maps.  And while there are lots of software packages that aren’t MediaWiki (what runs Wikipedia), there are practically no options for collaborative mapping.  So The Open Planning Project (TOPP), my employer, has been putting a lot of effort in to making GeoServer a better platform for collaborative mapping.  We’re looking to make it a flexible platform for different communities to experiment with a variety of workflows, figuring out what works for them.  The 1.6.0-beta1 release should show off some of what we’ve been working on.

The low hanging fruit seems to be simple reporting functionality.  I’ve been talking to several different non-profit groups, and many just want an interface for users to throw some information on the map – a point and a description, like Google Earth.  But they don’t want to be passing KML files all around.  Which speaks to one way I’ve been articulating where we’re going with GeoServer – ‘cvs for the geospatial web’.  We give you a central location where you can do insert, update and delete on your data.  And our latest improvements are to do the versioning – history, diff, rollback – so you can keep editing the geospatial information and not be worried about non-experts corrupting it, since you can always revert changes.  We’re building this on top of the WFS-T standard, and letting normal WFS-T clients use versioning transparently – that is, their edits will go through fine, and will get versioned without them even knowing.  Version aware clients will be able to take advantage of the additional functionality – getting a log and a diff, doing a rollback, etc.  Ideally we have a variety of clients – web based, mobile based, desktop, etc.  Past the basics we’re thinking about a lot of workflow customization.
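
As a sketch of how the ‘cvs’ analogy plays out at the protocol level, a version aware client might ask for the history of a layer between two revisions with something like the request below.  The names come from memory of our extended WFS-T work and may not match the final schema exactly:

<wfsv:GetLog service="WFS" version="1.1.0"
    xmlns:wfsv="http://www.opengis.net/wfsv">
  <wfsv:DifferenceQuery typeName="topp:major_roads"
      fromFeatureVersion="100" toFeatureVersion="150"/>
</wfsv:GetLog>

The response is basically a change log – who edited what, when – and the same query structure drives getting diffs and doing rollbacks.  Plain WFS-T clients never see any of this.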

So the next few posts will explore in more depth some of the economic aspects, some ideas on the technical side – what further tools and functionality would be helpful, and the legal ground we need to clear before collaborative mapping can really take off.

Google and the Geospatial Web: A smaller piece of a much, much larger pie.

Well, I’ve been back from Where 2.0 for awhile, and as usual blogging hasn’t been the highest priority, but there was one topic that I’ve been really wanting to write about.

And that is that Google seems to be legitimately moving in a more open direction with regards to geospatial. I’ve rarely been overtly critical of their lack of openness, but it’s always been a source of frustration for me. And as I got to know more people there I definitely realized that their lack of collaboration wasn’t the result of any malicious intent, it was simply a perceived lack of resources – they felt they didn’t have time to put effort in to standards and working with others. And I’ll be the first to admit that it takes more work to be open and collaborative.

But I do say ‘perceived’, because the thing I’ve found again and again doing the ‘open’ thing is that it’s an investment that pays off in the medium and long term. Working alone definitely allows you to move faster in the short term, but working with others leaves you much better off in the longer term.

With regards to Google’s geo portfolio, the way I’ve always termed it is they could have a huge piece of a small pie or a sizeable piece of a much, much bigger pie. What pie is it we’re talking about? What I’ve referred to as the Geospatial Web, though I’m trying to call it the geoweb, since that term seems to be taking off more. They are obviously the clear leader, with Google Earth and Maps, specifically KML and mashups – as those both allow more geospatial data to get out in the world. And they could just push KML and their platform and do quite well. But it would be a silo. It wouldn’t be like the web, it’d be a greatly expanded and easier to use Geography Network. Much, much better and bigger, but still a single platform. It could potentially even become a platform like Windows, truly dominant, but the point for me is it still wouldn’t be as big as it could be. It wouldn’t be the World Wide Web, where innovation comes from all over building something far bigger than any single company could possibly make on their own.

The bigger pie is the vision of a true Geospatial Web, that diverse individuals and organizations all contribute to, and where technical innovations come from all over. To achieve this there must be an underpinning of open standards that others can contribute to. There must be an ecosystem of companies and services, business models and startups and dot-orgs. The ecosystem can be dominated by an entity, but can’t be entirely dependent on a single entity, as would be the case if Google defines the software and the format and the search engine. But if this open geoweb is nurtured and encouraged the right way we’ll get exponential growth. Citizens will start demanding that governments and organizations put their data on it, just like we’ve seen happen with eGovernment on the WWW. It will become a default, and people will look at you weird if you have geospatial data that’s not on it.

I think it’s not crazy to aim for the majority of all spatial information to be available on it. It will be a much bigger pie than one that Google owns, as more and more people will feel comfortable making their data available, since it’s a public resource instead of something clearly benefiting a single company. And it also allows further innovations to come from the outside. Google has a ton of smart people, but they don’t have all the smart people in the world. They can afford to let innovation come from elsewhere (though I’m sure they’ll probably just buy up the best ones), because they’ll start to do what the company does best: search. There’s no reason to own a geoweb when you can own the way most people find information on the geoweb.

Of course, even with search Google could constrain it to their web, as they did when geo search came out – it was called KML Search and only could find KML. What they are going for now is much more ambitious, and indeed a bit more risky. And so I applaud them for it – they are putting a stake in the ground that says ‘our best ideas are not behind us’. They are going to be a leading force in a much bigger pie, and turn this open collaboration in to a really good long term investment.

Ok, I’ve gone on speculating about things; I probably should give a bit of evidence. I admit that it’s pretty subtle, but based on it and a few conversations my gut tells me that they are legitimately on the level. At least for now – that’s not to say that some corporate decision couldn’t move things in the opposite direction: such is the fate of a publicly traded company. But they seem to be trying to do some work that will be hard to undo.

First, KML Search is now referred to as ‘geo search’, and is crawling not just KML but also GeoRSS, with more formats likely coming soon. This is one of the most important pieces to me, and was the announcement that excited me much more than StreetView. It is admitting that it’s ok for people to use other formats, even though KML is super nice and easy to use. Yes, more formats may confuse my grandmother (one of the eloquent arguments used by Google folks in the past for why we should all just use KML), but more formats also means extending an olive branch that says you can work with others.

Second, Google is an active sponsor in OGC’s OWS-5. I had been a bit skeptical of their throwing KML over the fence to OGC. Yes, it’s nice the copyright is with OGC, but it’s kind of meaningless to me unless KML actually aligns with the other open standards. And OGC would likely try to do that, but then it remains a question if Google would actually support the new standard. Or if they’d have this covert control over it with the ability to exclude any decisions they didn’t like by just not including an implementation in Google Earth and Maps. But they are sponsoring OWS-5 which will fund several server and client implementations to flesh out a new KML spec that incorporates other OGC standards. The OWS testbeds are the best way to develop specs in the OGC, and putting real money up for this definitely indicates for me a commitment to making KML a true open standard, not just a rubber stamped pseudo-standard. The one piece that I’m not sure on is how much they’ll have engineers working with OWS-5 to try out the new spec ideas on Google Earth and Maps. If they have a couple people show up at the kickoff meeting who are set to work on it for the next few months I will be very happy.

Third, John Hanke’s speech at Where 2.0 was the first time I had heard the Google geo team really tell the world that they want to work with others. Some of it was subtle, but there was definitely a flavor of openness and collaboration that I’d not felt before. Previous speeches would always come back to the innovations they’re doing, how great KML is, etc. There was little acknowledgment of an outside world, which could come across as fairly arrogant – not only are we doing things the best way, we haven’t even looked into how anyone else might do them, since we must be doing things the best.

And finally, in private conversations many googlers have talked about a more open shift in the past 6-9 months. There were always a few voices for that, but it sounds like a tipping point has been reached and there is now a critical mass. The voices are heard and effort is being oriented in that direction. I think it’s an investment that will really pay off for Google, and though I’m going to continue to work to push them in to ever more open directions (maybe even to be able to talk to them about what they’ve got in the pipeline without signing an NDA? Ah, to dream ;), count me as a skeptic who is becoming more and more convinced that we’re going to build a true, open, collaborative geoweb.

Where 2.0 slides

Hey, just wanted to throw a quick post up with my Where 2.0 slides. The slides are not too exciting, I need to learn to make prettier slides (or at least to try to spend more time on doing that), but I think the talk ended up decently. I plan to do blog posts of the new thoughts – the main thing I had fun with was thinking through how collaborative mapping will make business sense in the future, and how we’re going to get to tipping points where things will shift in bursts. It wasn’t exactly what I submitted as an abstract originally, but Schuyler hit on many of those thoughts, so I oriented my talk to expand on some of the points he didn’t have time to fully explore in his excellent keynote. Unfortunately I didn’t get to the legal part, time was tight after the lightning talks. But look for blog posts on the newer stuff, and I’m contemplating doing another tech talk, about this and more, which would get it on video for all to see.  Thanks to Brady and the whole O’Reilly crew for letting me talk, and for putting on a great conference.

Public domain imagery from iCubed for WorldWind and beyond?

So I’m watching this video about the new Java WorldWind.

And there’s a couple quotes of interest from Patrick Hogan, NASA’s lead on the project:

That’s access to different NASA datasets that you can leverage, public domain, so you can use and abuse that information as you like, do anything you want with it, but mostly have fun have fun with innovating, kind of going places we haven’t even dreamed of yet.

I should point out that the iLandsat is from a company called iCubed and they have provided that kind of, that dataset for the earth that typically costs about a quarter of a million plus just for internal use, and they have donated it to WorldWind for use by the public.

Public domain imagery from iCubed? Sounds like a dream come true to me. Of course this just opens up lots more questions, like what resolution, what part of the world, what year it is from, etc. But if it’s truly public domain that’s really great news for any collaborative mapping projects that are unsure about deriving their information from commercial imagery.

I’m hoping that someone will be able to hack in and figure out if the imagery is really available. But the server referred to in the source code seems to get ‘Server is too busy’ errors, and when I use WorldWind here I’m not getting any tiles. When I get some time I’ll maybe try to dig in to the source a bit more and maybe get some links to the imagery.

Looking at the source code does seem to reveal some references to GeoServer, for their placename layer, which we always like to see 🙂 I will encourage them to change the namespace prefix from ‘topp’ (which is the default and refers to the organization I work for), to something more appropriate like ‘nasa’ (though keeping it does make it easier for me to know it’s a GeoServer, which is nice…). And I’m curious about their ASPX cache – if you guys let me know what/how you’re caching I’d be happy to try to build it as a module for GeoServer.

I, for one, welcome our new proprietary SDI overlords…

So I’ve been slow on the uptake, as the geo blogosphere’s conversation about Google’s KML Search took place awhile ago. Mostly because I’m only about half way through my meandering series of essays to make several points, one of which is that SDIs are crap, and that once there’s enough geospatial information of real value out there then a company will come along and let us search it. Well, reality has caught up with me as I’ve lost any kind of consistency with blog posts.

Allan, for one, asks the question of whether Google’s KML search is a Spatial Data Infrastructure or not. For me the whole SDI question is irrelevant, as they’ve been working at it for close to twenty years, dumped millions upon millions of dollars into it and have come up with nothing. I read a paper that pointed out that the Geospatial One Stop (GOS), a top dollar SDI portal, the center of US SDI development, had an ‘average 5622 user visits per week’ in April 2004. And lest you think this was them bemoaning that so few people were using it, this was an example of ‘success’. One of my non-profit’s blogs, focused on New York City traffic problems, gets more traffic than that. The GOS’s current Alexa rank is 491,396. Blogs about Google Earth get far, far more traffic. The paper goes on to point out that SDIs take years, if not decades, before they are ‘fully operational’.

In less than two and a half years Google has built a better ‘SDI’ than anyone else in the world. The whole point of SDIs is to ‘facilitate the availability of and access to spatial data’. In my opinion they accomplished this before they even had search, since you could find more spatial data browsing the Keyhole BBS than you could with a search box on any SDI, and could certainly access it more easily. You can pick just about any definition of SDIs (of which there are many) and Google KML + Maps mashups succeed on all their criteria except for explicit agreements between organizations (which is sort of just overhead, since the Web doesn’t need these kinds of agreements to have lots of information available).

Like everyone else, I have major concerns about putting all our eggs in the basket of a single company, no matter how much they promise to not be evil. A ‘true’ SDI is obviously not one controlled by a corporation (even if a GIS monopolist was trying and never really got anywhere). But I think they’re making steps by looking to standardize KML within the OGC. But I disagree with Raj that the intellectual property matters all that much without the corresponding harmonization with current or at least future OGC standards. And I really hope he’s not hinting that such a harmonization is not on the table and that OGC’s willing to rubber stamp KML because Google was so kind to donate it. The CTO of Opera explained the problems of this quite well in a nice article on CNET. He was talking about Microsoft’s attempts to standardize its Office Open XML spec, which is a direct competitor of the Open Document Format. His point can be summed up with:

While it’s healthy to have competition between different standards, it’s rarely productive to have competing standards within an organization.

He points to a couple cases where there have been competing standards and both have suffered. I believe it will be to the detriment of all if KML does not align its geometry objects with GML, and eventually separate data from presentation, as one does with SLD.

I’ll try to dig a bit more into my feelings on KML itself in a future post, but the main thing I think they need to do is get a clear separation of concerns. GeoRSS/GML should not be viewed as a competitor, but instead as a more standard way to handle the issue of geometries. KML is great as a package for delivering content to Google Earth, but Google Earth is a special environment, and it’d be nice if pieces of it were re-usable in other environments, and if there were clear profiles on what makes sense to support in a 2d environment, what makes sense if you want to be crawled by KML search, which elements you need for 3d, etc.

But the main point of ‘KML search’ (could you please add GeoRSS, WMS and WFS and do ‘geospatial search’?) to me is that this raises the bar of what availability and access to geospatial data means. Though SDI builders have been dreaming of it for years, it will now be unacceptable if they provide anything less capable than the GoogSDI. So though I don’t trust even a ‘non-evil’ publicly traded corporation, I think they’re pushing the bar in a very positive way. We just need to be sure we build a truly open ‘geoweb‘ together, with innovations coming from all corners by leveraging and crafting open standards and platforms.

REST feature service sketches

So after much studying at the feet of the geospatial REST master, I’ve finally figured out enough that the master himself asked me to post what I wrote to the wfs simple list about what a restful feature service might look like (any chance of making those archives public, Raj?). For anyone wanting to catch up to where I am, read Sean’s ‘web‘ category, especially the ones talking about WFS and REST (though I imagine the fact that the WFS spec is permanently tattooed on my brain probably helps a bit for figuring out how to apply the REST concepts to WFS).

I’m strongly considering offering a geospatial REST interface to data already available on GeoServer as WFS. The big motivation for me is to be crawled by KML Search (which I’m hoping they’ll rename geospatial search and include GeoRSS), as that’s what I want instead of catalogs. After that I’ll look in to the editing stuff featureserver.org is looking to provide, though we’ll likely back it with our new versioning work (first targeting expanded WFS-T interfaces, but the hard work done should be re-usable).

So what would a REST feature service look like? The main point is that all features should be resources at stable URLs, that can be cached and crawled.

So

http://sigma.openplans.org/geoserver/wfs?request=GetFeature&featureid=major_roads.5

becomes

http://sigma.openplans.org/geoserver/major_roads/5

The results of a query can then just return those urls, which the client may already be caching

http://sigma.openplans.org/geoserver/major_roads?bbox=0,0,10,10

returns something like

<html>
<a href='http://sigma.openplans.org/geoserver/major_roads/5'>5</a>

<a href='http://sigma.openplans.org/geoserver/major_roads/1'>1</a>

<a href='http://sigma.openplans.org/geoserver/major_roads/3'>3</a>

<a href='http://sigma.openplans.org/geoserver/major_roads/8'>8</a>
</html>

If the client is already caching them, then it can just use its local copy. If it’s caching some, it can resolve the ones that it doesn’t know. You also could have a ‘full’ return that resolves the hrefs for clients that want that (ideally just ones who are hitting the service for the first time and know they’ll need to resolve all of them).

The other thing I believe you need is paging. Just a ‘startFeature’ to get the next chunk.

<html>
<a href='http://sigma.openplans.org/geoserver/major_roads/5'>5</a>

<a href='http://sigma.openplans.org/geoserver/major_roads/8'>8</a>

<a href='http://sigma.openplans.org/geoserver/major_roads/?bbox=0,0,10,10&startfeature=10'>more</a>
</html>

That way you can return lots of features without killing clients, as is so easy to do with WFS (I routinely crash Firefox by forgetting to put a maxFeatures). The default should be 100 features – or some number that won’t kill a web browser. If clients can handle more, they can do a larger maxFeatures (or even maxFeatures=-1 for unlimited).

We could also use default maxes at the featureType level resource:

http://sigma.openplans.org/geoserver/major_roads/

would not need to be a list of links to your 5 million features, but to the first 100, with a link to more.

The key with all this is crawl-ability and cache-ability. Everything links to the other resources, so the whole dataset can be crawled. You go to

http://sigma.openplans.org/geoserver/

and then there is an html page that will display in the browser, with links to the feature sets

<html>

<a href='http://sigma.openplans.org/geoserver/major_roads/'>TIGER Major Roads</a>
<a href='http://sigma.openplans.org/geoserver/roads/'>TIGER Roads</a>
<a href='http://sigma.openplans.org/geoserver/water_shorelines/'>GSHHS Shorelines</a>

</html>

(though I would have this page contain much more meta-information, like author, keywords, abstract, date, etc., so a user can read about the datasets.  A more machine readable format would be available by supplying different params, but the default should be something anyone can read in a web browser).

The links can then be followed to the lists of urls of individual features, which can then be crawled, indexed in a search engine, and cached in clients.

With WFS you have to know what to ask for; a naive programmatic bot would not be able to extract all the feature data just by poking around and following links, it’d have to be written specially for WFS services (and thus able to handle millions of features or else somehow divide up the data).

(yes, I know, it’s been a long hiatus.  I hope to be writing more, but we’ll see how that works out, no promises since then I’ll just feel guilty)