I take your S3 and raise you an EC2
Posted by cholmes on October 29, 2006
Just read Chris’s post about using Amazon’s S3 as a home for caches. The Amazon service I’ve actually been contemplating for tiling purposes is their Elastic Compute Cloud (EC2). But before we get into it, a bit on S3 and tiles. I’d still like to see the distributed peer-to-peer tile cache I talked about in my post on geodata distribution, but it makes a lot of sense to bootstrap existing services on the way there. S3 could certainly help out as a ‘node of last resort’: it’s nice to know that the tiles will definitely be available somewhere if the cache isn’t yet popular enough to be distributed to someone else’s p2p cache on a network closer to you. I agree that BitTorrent and Coral aren’t up to snuff, but I do believe that distributing map tiles will work as p2p technology evolves. First, though, we have to get our act together on tiling in the geospatial community, so we can go to the p2p guys with something concrete. Which is why I’m excited about the work being done to figure this out.
As for EC2, I’ve been thinking about it in the context of doing caching with GeoServer. We’ve got some caching working with OpenLayers or Google Maps combined with either OSCache (with a tutorial for GeoServer) or Squid. I want to get it to the point where there’s not even a separate download: you just turn caching on, then hit a big ‘go’ button that walks the whole tile area, caching every tile along the way. The problem is that huge datasets can take days, weeks, or even months to fully process. So this is where I think it could be kickass to use EC2: provide a service where people’s ‘go’ button links to EC2, which can throw tens, hundreds, or even thousands of servers at churning out the tiles. Then return those to the server GeoServer is running on, or leave them on S3. Indeed, this would save on the tile upload costs that Chris writes about, since you’d just send an SLD and the vector data in some nice compressed format. I imagine you could save on upload costs for rasters too, as you’d just upload the non-tiled images and do the tiling on EC2.
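A rough sketch of the ‘go’ button’s walk and of how it might be split across many machines, assuming a simple grid that starts with `base_cols` by `base_rows` tiles at zoom 0 and doubles in each direction at every level. `walk_all`, `total_tiles` and `worker_share` are hypothetical names, not GeoServer APIs, and the actual rendering of each tile is left out.

```python
from itertools import islice
from typing import Iterator, Tuple

Tile = Tuple[int, int, int]  # (zoom, x, y)


def walk_all(max_zoom: int, base_cols: int = 1, base_rows: int = 1) -> Iterator[Tile]:
    """Yield every tile from zoom 0 to max_zoom, assuming each level doubles
    the number of columns and rows of the level above it (hypothetical grid)."""
    for z in range(max_zoom + 1):
        for x in range(base_cols * 2 ** z):
            for y in range(base_rows * 2 ** z):
                yield (z, x, y)


def total_tiles(max_zoom: int, base_cols: int = 1, base_rows: int = 1) -> int:
    """Closed form of the walk's length: base * (4^0 + 4^1 + ... + 4^max_zoom)."""
    return base_cols * base_rows * (4 ** (max_zoom + 1) - 1) // 3


def worker_share(worker_id: int, n_workers: int, max_zoom: int,
                 base_cols: int = 1, base_rows: int = 1) -> Iterator[Tile]:
    """Lazily pick every n_workers-th tile of the walk, so each machine
    (e.g. one EC2 instance) can compute its own share with no central list."""
    return islice(walk_all(max_zoom, base_cols, base_rows), worker_id, None, n_workers)
```

The closed form shows why a full walk gets slow: `total_tiles(18)` is 91,625,968,981 tiles for a grid with a single tile at zoom 0, and each extra level roughly quadruples the work. Round-robin is the simplest split; each worker still steps through the whole sequence to find its share, which is cheap next to rendering. A common refinement is to hand out contiguous blocks so a worker can render neighbouring tiles together.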
A next step for this tiling stuff would be to make a caching engine that can both pre-walk a tile set and expire a spatial area of a tile set. The caching engine should store the cache according to the Tile Map Service specification, but with the additional smarts it could be uploaded to EC2 along with the tile creation software (GeoServer or MapServer) and just pre-walk the tiles, iterating through all the possible requests. It could then also listen to a transactional WFS (WFS-T) server that operates against the data used to generate the tiles in the first place. If a transaction touches a feature in a certain area, that part of the cache would be expired and regenerated either automatically or lazily (either send all the expired requests to the server right away, or wait until a user comes along and checks out that area again).
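A minimal sketch of such an engine, assuming a simple grid (a fixed extent split into `base_cols` by `base_rows` tiles at zoom 0, doubling in each direction per level, with the lower-left origin TMS uses) and an in-memory dict standing in for the on-disk TMS layout. `TileGrid`, `TileCache`, `prewalk`, `expire` and the `render` callback are hypothetical names; `render` stands in for a GetMap call to GeoServer or MapServer.

```python
import math
from typing import Callable, Dict, Iterator, Set, Tuple

Tile = Tuple[int, int, int]               # (zoom, x, y), lower-left origin
BBox = Tuple[float, float, float, float]  # (minx, miny, maxx, maxy)


class TileGrid:
    """Hypothetical grid: `extent` split into base_cols x base_rows tiles at
    zoom 0, with columns and rows doubling at each further zoom level."""

    def __init__(self, extent: BBox, base_cols: int = 1, base_rows: int = 1):
        self.extent = extent
        self.base_cols = base_cols
        self.base_rows = base_rows

    def tiles_in_bbox(self, bbox: BBox, zoom: int) -> Iterator[Tile]:
        """Yield the tiles at `zoom` that touch `bbox`, clamped to the grid."""
        minx, miny, maxx, maxy = self.extent
        cols = self.base_cols * 2 ** zoom
        rows = self.base_rows * 2 ** zoom
        w = (maxx - minx) / cols
        h = (maxy - miny) / rows
        x0 = max(0, math.floor((bbox[0] - minx) / w))
        x1 = min(cols - 1, math.floor((bbox[2] - minx) / w))
        y0 = max(0, math.floor((bbox[1] - miny) / h))
        y1 = min(rows - 1, math.floor((bbox[3] - miny) / h))
        for x in range(x0, x1 + 1):
            for y in range(y0, y1 + 1):
                yield (zoom, x, y)


class TileCache:
    """In-memory stand-in for a TMS-layout cache that can pre-walk and expire."""

    def __init__(self, grid: TileGrid, max_zoom: int,
                 render: Callable[[Tile], bytes]):
        self.grid = grid
        self.max_zoom = max_zoom
        self.render = render  # hypothetical: e.g. wraps a WMS GetMap request
        self.store: Dict[Tile, bytes] = {}

    def prewalk(self) -> None:
        """Render every tile at every zoom level (the big 'go' button)."""
        for z in range(self.max_zoom + 1):
            for tile in self.grid.tiles_in_bbox(self.grid.extent, z):
                self.store[tile] = self.render(tile)

    def get(self, tile: Tile) -> bytes:
        """Serve a tile, rendering it on demand if it was expired (lazy mode)."""
        if tile not in self.store:
            self.store[tile] = self.render(tile)
        return self.store[tile]

    def expire(self, bbox: BBox, eager: bool = False) -> Set[Tile]:
        """Drop cached tiles touching `bbox` at all zoom levels; with eager=True,
        re-render them right away instead of waiting for the next request."""
        expired: Set[Tile] = set()
        for z in range(self.max_zoom + 1):
            for tile in self.grid.tiles_in_bbox(bbox, z):
                if tile in self.store:
                    del self.store[tile]
                    expired.add(tile)
        if eager:
            for tile in expired:
                self.store[tile] = self.render(tile)
        return expired
```

A WFS-T listener would call `expire` with the bounding box of each inserted, updated or deleted feature. Both tile-range edges use `floor`, so a box whose upper edge lands exactly on a tile boundary also expires the tile just beyond it, erring toward over-expiry, which is the safe direction for a cache. A real engine would probably also pad the box, since labels and wide symbols can spill into neighbouring tiles.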
I like Paul’s WebMapServer href attribute in the Tile Map Service spec, but I wonder if it’s sufficient. It might be nice if it carried enough information to formulate a GetMap request that replicates a given tile map service repository. I’m thinking of the name of the layer and the style (a named style or a link to an SLD). Maybe I’m missing something, but all the other information seems to be there. With that information, a smart tile map service client could look at multiple repositories and realize that they were generated from the same base WMS in the same way, and then swarm from several of them simultaneously. This starts to hint at the way forward for p2p distribution: for each WMS, just keep a global index of where tile map service repositories live, and let clients figure out which is fastest or hit all of them at once, potentially including other clients. A catalog that has metadata plus information on where to get even faster tiles would definitely be popular, especially if registering there automatically put a caching tile map service in front of your WMS. You could also register, say, the feed of latest changes (or even just the bounding boxes of the latest changes) from the WFS-T that people use to update the data behind the WMS, and smart clients could just listen and expire the tiles in a given area when they get a notification from the feed.
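A sketch of what that extra metadata could buy a client, assuming a TileMap carried a layer name and either a named style or an SLD link alongside the WebMapServer href. `TileMapSource`, `tile_bbox`, `getmap_url` and `group_mirrors` are hypothetical names, and the grid is a simplified one (fixed extent, doubling per level, lower-left origin). The request uses the standard WMS 1.1.1 GetMap parameters plus the `SLD` parameter from the SLD-enabled WMS profile; whether a particular server accepts that exact combination is an assumption to check.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple
from urllib.parse import urlencode

BBox = Tuple[float, float, float, float]  # (minx, miny, maxx, maxy)


@dataclass(frozen=True)
class TileMapSource:
    """Hypothetical extra TileMap metadata: enough to replay the WMS request."""
    wms_href: str                  # the WebMapServer href already in the spec
    layer: str                     # proposed: WMS layer name
    style: str = ""                # proposed: named style ("" = server default)
    sld_url: Optional[str] = None  # proposed: or a link to an SLD document
    srs: str = "EPSG:4326"
    image_format: str = "image/png"
    tile_size: int = 256


def tile_bbox(extent: BBox, base_cols: int, base_rows: int,
              zoom: int, x: int, y: int) -> BBox:
    """Map-unit bounds of tile (zoom, x, y) in a grid doubling per level,
    with a lower-left origin."""
    minx, miny, maxx, maxy = extent
    w = (maxx - minx) / (base_cols * 2 ** zoom)
    h = (maxy - miny) / (base_rows * 2 ** zoom)
    return (minx + x * w, miny + y * h, minx + (x + 1) * w, miny + (y + 1) * h)


def getmap_url(src: TileMapSource, bbox: BBox) -> str:
    """Build a WMS 1.1.1 GetMap URL that should reproduce one tile."""
    params = {
        "SERVICE": "WMS",
        "VERSION": "1.1.1",
        "REQUEST": "GetMap",
        "LAYERS": src.layer,
        "STYLES": src.style,
        "SRS": src.srs,
        "BBOX": ",".join(str(v) for v in bbox),
        "WIDTH": src.tile_size,
        "HEIGHT": src.tile_size,
        "FORMAT": src.image_format,
    }
    if src.sld_url:
        params["SLD"] = src.sld_url  # SLD-enabled WMS parameter
    if src.wms_href.endswith(("?", "&")):
        sep = ""
    else:
        sep = "&" if "?" in src.wms_href else "?"
    return src.wms_href + sep + urlencode(params)


def group_mirrors(repos: Dict[str, TileMapSource]) -> Dict[TileMapSource, List[str]]:
    """Group repository URLs by how their tiles were generated; each group is a
    set of mirrors a client could fetch from in parallel."""
    groups: Dict[TileMapSource, List[str]] = defaultdict(list)
    for url, src in repos.items():
        groups[src].append(url)
    return dict(groups)
```

Because the dataclass is frozen, two repositories land in the same group only when every field matches. A real client would also compare each TileMap’s bounding box, origin and resolutions before treating repositories as interchangeable mirrors.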
Posted in geospatial, geospatial web