The next major area of tool improvement I see is expanding the wiki notion of editing into more of a merging revision control model, with branches, versions, patches and eventually distributed repositories. A ‘patch’ is a small piece of code that can be applied to a computer program to fix something. Patches are widely used in the open source software world, both to get the latest improvements, and to allow those who have commit rights to a source repository to review outside improvements before putting them in. This helps create the meritocracy around projects, as they don’t let just anyone into the repository, since a newcomer might break the build. Such a case is less likely with maps, but sometimes core contributors might want to see a couple of sample patches before letting a new member in. In the GeoServer versioning WFS work we have a GetDiff operation that returns a WFS Transaction that can then be applied to another WFS. This fits in with the technical part of how a patch works – it’s really easy to apply to one’s dataset. But unfortunately a WFS Transaction is not as easy to read as a code patch. The other great thing about patches is that when leaf nodes are updating their data they can just request the change set – the patches – instead of having to do a full check out. So I’m still not sure how to solve this problem. The WFS Transaction is the best I’ve got, but I think we can do better and have a nice little format that just describes what changed.
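To make the idea of a ‘nice little format that just describes what changed’ a bit more concrete, here is a minimal sketch in Python. It is not the GeoServer GetDiff output or any existing format: the record layout (`op`, `id`, `values`, `changes`) and the function names `make_patch` and `apply_patch` are made up for illustration, and it assumes a dataset is just a mapping from feature id to a flat dictionary of attributes, where a missing attribute is treated the same as `None`.

```python
from typing import Any

Feature = dict[str, Any]       # attribute name -> value (hypothetical model)
Dataset = dict[str, Feature]   # feature id -> feature


def make_patch(old: Dataset, new: Dataset) -> list[dict[str, Any]]:
    """Describe the difference between two snapshots as small, readable records."""
    patch: list[dict[str, Any]] = []
    for fid in sorted(old.keys() | new.keys()):
        if fid not in old:
            patch.append({"op": "insert", "id": fid, "values": dict(new[fid])})
        elif fid not in new:
            patch.append({"op": "delete", "id": fid})
        else:
            # Record only the attributes whose value actually differs.
            changes = {
                key: {"old": old[fid].get(key), "new": new[fid].get(key)}
                for key in sorted(old[fid].keys() | new[fid].keys())
                if old[fid].get(key) != new[fid].get(key)
            }
            if changes:
                patch.append({"op": "update", "id": fid, "changes": changes})
    return patch


def apply_patch(data: Dataset, patch: list[dict[str, Any]]) -> Dataset:
    """Return a copy of `data` with the patch applied; the input is not modified."""
    result = {fid: dict(feature) for fid, feature in data.items()}
    for entry in patch:
        if entry["op"] == "insert":
            result[entry["id"]] = dict(entry["values"])
        elif entry["op"] == "delete":
            result.pop(entry["id"], None)
        else:  # "update"
            for key, change in entry["changes"].items():
                if change["new"] is None:
                    # Assumption: None means the attribute was removed.
                    result[entry["id"]].pop(key, None)
                else:
                    result[entry["id"]][key] = change["new"]
    return result
```

Because each update record carries both the old and the new value, a reviewer can read it much like a line in a code diff, and a leaf node can apply the list of records instead of doing a full check out.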
Once we’ve got patches people are going to want the ability to merge changes. If you made a patch and I made a patch and we both submit them then we need a way to see if they’re compatible. Ideally you could merge at the feature level – if you change the road type and I change the road length of Interstate 5 then we shouldn’t get a conflict. Even better, merge at the geometry level, if we changed different points on the road then those should merge nicely. This will become important as people start to ‘check out’ their geo repositories, do edits, and then try to submit back in. We could just do locking, which is what WFS-T does, but concurrent versioning is so much nicer – we just have to be able to pull off merging.
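Feature-level merging can be sketched as a three-way merge over attributes: compare both edits against the common version they started from, and only flag a conflict when both sides changed the same attribute to different values. The code below is a rough illustration under the assumption that a feature is a flat dictionary of attributes; `merge_feature` and the attribute names are hypothetical, and the values are invented.

```python
from typing import Any

Feature = dict[str, Any]  # attribute name -> value (hypothetical model)


def merge_feature(
    base: Feature, mine: Feature, theirs: Feature
) -> tuple[Feature, list[str]]:
    """Three-way merge of one feature's attributes.

    Returns the merged feature and the list of attributes in conflict.
    Conflicting attributes keep their base value so nothing is silently lost.
    A missing attribute is treated as None (an assumption of this sketch).
    """
    merged: Feature = {}
    conflicts: list[str] = []
    for key in sorted(base.keys() | mine.keys() | theirs.keys()):
        b, m, t = base.get(key), mine.get(key), theirs.get(key)
        if m == t:        # both sides agree (or neither changed it)
            value = m
        elif m == b:      # only their side changed it
            value = t
        elif t == b:      # only my side changed it
            value = m
        else:             # both changed it, differently
            conflicts.append(key)
            value = b
        if value is not None:
            merged[key] = value
    return merged, conflicts


base = {"type": "highway", "length_km": 100}
mine = {"type": "interstate", "length_km": 100}   # I changed the road type
theirs = {"type": "highway", "length_km": 120}    # you changed the length
merged, conflicts = merge_feature(base, mine, theirs)
print(merged, conflicts)
```

The same comparison could in principle be run per vertex to merge at the geometry level, provided the vertices of the two edited geometries can be matched back to the base geometry, which is the hard part.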
Right past merging are full-on branches, which of course are much easier to pull off if you’ve got nice merging in place. Branches will let people try out new geographic updates in their own sandbox before putting them into the mainstream. This can lead to better reviews of the updates. And with nice branching and merging you would be able to let a number of people work concurrently on their own area of the map, merging them seamlessly. This is obviously a really hard problem, one that even ArcSDE has trouble with for the things people actually want to do. I do think we’ll be able to get there in the open source world. Indeed I believe we have a better chance of achieving it, since once we get close we’ll get a lot of interest from people who want it completed and meeting their needs, and who will fund the iterative improvements.
The final piece, which I sort of don’t even want to think of yet, since it’s damn hard, is distributed versioning. I do think it’s extremely important though, to let everyone have their own editing repository, which can flow back into the main one. I like the model a lot, and think it has great wins for geospatial. But since we’ve barely got an SVN equivalent I think it’s wiser to wait a bit on these issues till we sort out what a patch should look like. Indeed SVK was possible because SVN already existed. But I’m definitely excited by the possibilities, for every node of the map to have the potential to be edited. This can be a big win for areas with low bandwidth.
The next category of tool improvements is granular security settings. Right now there’s not even a way to limit editing the map to only some users. I think that many maps will flourish with the open-to-all editing style, making use of rollbacks to prevent vandalism. But some will likely want to keep the map to a set group of committers. This way one could get commit rights after doing a number of good patches, perhaps ensuring higher quality for some maps. You also might have different permissions for different users on different layers. We should be able to get all of that with our current GeoServer security system; we just need to hook up a UI for it. The trickier thing, which would be a nice feature and which I think is possible, is limiting users to certain geospatial areas or features with specific properties. Since the security system is integrated at the code level, and lets us use aspects, I think that this should be possible; it will just take a bit of work to figure out.
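As a rough picture of what limiting users to certain geospatial areas could mean in practice, here is a small Python sketch. It has nothing to do with the actual GeoServer security system or its API: `BBox`, `EditRule` and `may_edit` are hypothetical names, and it assumes the simplest possible rule, namely that a user may edit a feature on a layer only if the feature’s bounding box lies entirely inside an area granted to that user.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BBox:
    """Axis-aligned bounding box in the layer's coordinate system."""
    min_x: float
    min_y: float
    max_x: float
    max_y: float

    def contains(self, other: "BBox") -> bool:
        # True only when `other` lies completely inside this box.
        return (
            self.min_x <= other.min_x
            and self.min_y <= other.min_y
            and other.max_x <= self.max_x
            and other.max_y <= self.max_y
        )


@dataclass(frozen=True)
class EditRule:
    """Hypothetical grant: `user` may edit `layer` inside `area`."""
    user: str
    layer: str
    area: BBox


def may_edit(rules: list[EditRule], user: str, layer: str, feature: BBox) -> bool:
    """Allow the edit if any grant for this user and layer covers the feature."""
    return any(
        rule.user == user and rule.layer == layer and rule.area.contains(feature)
        for rule in rules
    )


rules = [EditRule("alice", "roads", BBox(0.0, 0.0, 10.0, 10.0))]
print(may_edit(rules, "alice", "roads", BBox(2.0, 2.0, 3.0, 3.0)))
print(may_edit(rules, "alice", "roads", BBox(9.0, 9.0, 11.0, 11.0)))
```

Restricting by feature properties would be one more predicate in the same `any(...)` check; the open question is where a check like this hooks into the transaction path.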
Another area where I see a lot of potential innovation is distributed processing of tiles. Tiles are the clear winner for how to display geospatial information. Google Maps has raised the bar so that anything that isn’t tiled just feels out of date. But tiling takes a ton of processing power. Google is all set up to do it, but the rest of us aren’t. To fully cache http://sigma.openplans.org to zoom level 17 would have taken me about 5 months. Open Street Map has been making tremendous strides on this with their Tiles@Home initiative, which I am very impressed by. OSM is lucky in many ways, in that they have a project that people want to devote their spare CPU cycles to. It could be cool to set up marketplaces for processing of tiles, where companies that are going to keep their data private, or that just don’t have the reputation of OSM, can engage other nodes and give them micropayments for their work. I think other areas of potential innovation include leveraging Amazon’s EC2 to process huge amounts of tiles. We’re also going to need to have the collaborative mapping stuff hook up with the tiling efforts, so that when there are massive edits the tiles can expire themselves and get processors started on generating new ones. We can likely leverage HTTP’s Conditional GET functionality to let browsers and others cache geospatial data, but also get the most up to date data when it’s available.
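Hooking edits up to tile expiry mostly comes down to working out which tiles an edit touches. The sketch below assumes the common Web Mercator ‘slippy map’ tile numbering (x, y, zoom, with y counted from the top), which is an assumption on my part rather than something any particular cache is known to use here; `tile_for` and `tiles_to_expire` are hypothetical helper names.

```python
import math
from collections.abc import Iterable, Iterator


def tile_for(lon: float, lat: float, zoom: int) -> tuple[int, int]:
    """Tile (x, y) containing a lon/lat point, slippy-map numbering assumed."""
    n = 2 ** zoom
    x = int((lon + 180.0) / 360.0 * n)
    y = int((1.0 - math.asinh(math.tan(math.radians(lat))) / math.pi) / 2.0 * n)
    # Clamp so points on the outer edge still map to a valid tile.
    return min(max(x, 0), n - 1), min(max(y, 0), n - 1)


def tiles_to_expire(
    west: float, south: float, east: float, north: float, zooms: Iterable[int]
) -> Iterator[tuple[int, int, int]]:
    """Yield (zoom, x, y) for every tile overlapping the edited bounding box."""
    for zoom in zooms:
        x_min, y_min = tile_for(west, north, zoom)   # top-left corner
        x_max, y_max = tile_for(east, south, zoom)   # bottom-right corner
        for x in range(x_min, x_max + 1):
            for y in range(y_min, y_max + 1):
                yield zoom, x, y
```

A collaborative editing backend could run something like this over the bounding box of each committed change and hand the resulting tile list to whatever is doing the rendering, so only the affected tiles are regenerated. The number of tiles grows by roughly a factor of four per zoom level, which is why a full cache to a deep zoom level takes so long.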
The last area I’d like to see improvement on is more granular notification mechanisms. GeoRSS output is the obvious choice, but we could also do email or SMS notifications. Speaking of which, I’d love more innovation on mobile clients, and even super low tech versions, like being able to SMS in a new or updated location by just entering cross streets or reading a position from GPS. But one should be able to have the notifications based on very granular rules – ‘send updates for highways in this bounding box’, or ‘email all occurrences of the brown spotted pigeon along this river bank’. This would be useful not only for preventing vandalism, but also to enable people to take action on up to date reports. The map becomes not just an artifact of what has happened, but a living thing that can help create more up to date information. If the brown spotted pigeon is seen in one area then it will alert more people, who can then add updates on its location and get a more detailed map of its path.
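A granular rule like ‘send updates for highways in this bounding box’ is essentially a spatial filter plus an attribute filter. Here is a minimal Python sketch of that idea, assuming edits arrive as a point location with a dictionary of attributes; `NotificationRule` and `subscribers_to_notify` are hypothetical names, and actual delivery (GeoRSS, email, SMS) is left out entirely.

```python
from dataclasses import dataclass, field


@dataclass
class NotificationRule:
    """Hypothetical subscription: notify `subscriber` about matching edits."""
    subscriber: str
    bbox: tuple[float, float, float, float]        # west, south, east, north
    properties: dict[str, str] = field(default_factory=dict)

    def matches(self, lon: float, lat: float, attrs: dict[str, str]) -> bool:
        west, south, east, north = self.bbox
        inside = west <= lon <= east and south <= lat <= north
        # Every property named in the rule must match the edited feature.
        same_props = all(attrs.get(k) == v for k, v in self.properties.items())
        return inside and same_props


def subscribers_to_notify(
    rules: list[NotificationRule], lon: float, lat: float, attrs: dict[str, str]
) -> list[str]:
    """Return the subscribers whose rules match an edit, in rule order."""
    return [rule.subscriber for rule in rules if rule.matches(lon, lat, attrs)]


rules = [
    NotificationRule("roads-team", (-75.0, 40.0, -73.0, 41.0), {"type": "highway"}),
    NotificationRule("birders", (-75.0, 40.0, -73.0, 41.0), {"species": "brown spotted pigeon"}),
]
print(subscribers_to_notify(rules, -74.0, 40.7, {"type": "highway"}))
```

A rule with an empty `properties` dictionary matches every edit inside its box, which would be the ‘watch this area for vandalism’ case.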
I’m sure there are many more innovations to be had with tools, but this is just a start of the things that we’re starting to work on and the things I’d like to work on in the future. At TOPP we’re doing this stuff when we don’t have paid client work (or have met revenue targets for the year, since we’re a non-profit), but if there’s anyone out there who wants to see specific areas accelerate we’d be very excited to take on paid work to do any of the things talked about here <end shameless plug/>.