So there are some open standards I like a lot, such as WMS and WFS, which let you get a map or raw geographic data from a server on the web. But there are some that I’m less of a fan of.
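For anyone who hasn’t used them, part of what makes WMS so appealing is how little there is to it: a map request is just an HTTP GET with a handful of query parameters. A minimal sketch, assuming a hypothetical server URL and layer name:

```python
from urllib.parse import urlencode

# Hypothetical WMS endpoint -- any WMS 1.1.1 server accepts the same parameters.
base_url = "http://example.com/wms"

params = {
    "SERVICE": "WMS",
    "VERSION": "1.1.1",
    "REQUEST": "GetMap",                  # ask for a rendered map image
    "LAYERS": "roads",                    # hypothetical layer name
    "STYLES": "",
    "SRS": "EPSG:4326",                   # plain lat/lon coordinates
    "BBOX": "-74.3,40.5,-73.7,40.9",      # roughly New York City
    "WIDTH": "512",
    "HEIGHT": "512",
    "FORMAT": "image/png",
}

getmap_url = base_url + "?" + urlencode(params)
print(getmap_url)
```

Any client that can build a URL like this can pull maps from any conforming server, which is exactly why these specs feel right to me.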
The Catalog (CS-W) specification is one. Beyond the fact that it feels too heavyweight, I really have a hard time figuring out the ultimate vision… Everyone’s supposed to set up a catalog that can respond to queries about what they have? It analyzes the metadata and returns the best dataset. So the city of New York is supposed to set up a catalog? But then probably the state should as well? And then there will be a national catalog? And a global one? Am I supposed to register on all of them if I have an NYC-related dataset? I mean, is that the logical conclusion? I get the logical conclusion of WMS and WFS – everyone connects their spatial data to the web, and any client can ask them for a map or for the raw data. But this catalog thing is how I’m supposed to find them?
This is one of the cases where the library metaphor is hopelessly entrenched. The problem is you don’t know where to look in the first place. Lots of catalogs beg for some sort of meta-catalog, or else all the catalogs need to talk to each other in some way. Instead I think we should look to the web and search engines. They routinely _crawl_ the web and have complex algorithms to figure out what data is most relevant for a given search. As geospatial data gets on the web we can make use of much of the same web crawling technology to discover WMS and WFS services. Refractions is doing this with the help of Google in their OGC Survey, and MapDex is doing something similar (though it doesn’t seem to work all that well).
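A crawler doesn’t need any catalog protocol to recognize a WMS endpoint; it can just issue a GetCapabilities request against a candidate URL and see what comes back. A rough sketch of the idea (the probing logic here is my own illustration, not how Refractions or MapDex actually do it):

```python
from urllib.parse import urlencode, urlsplit, parse_qs, urlunsplit

def capabilities_url(service_url):
    """Turn a candidate service URL into a WMS GetCapabilities request."""
    parts = urlsplit(service_url)
    query = parse_qs(parts.query)
    query.update({"SERVICE": ["WMS"], "REQUEST": ["GetCapabilities"]})
    new_query = urlencode(query, doseq=True)
    return urlunsplit((parts.scheme, parts.netloc, parts.path, new_query, ""))

def looks_like_wms(response_body):
    """Heuristic: does the response look like a WMS capabilities document?
    WMS 1.1.1 uses <WMT_MS_Capabilities>, 1.3.0 uses <WMS_Capabilities>."""
    return ("<WMT_MS_Capabilities" in response_body
            or "<WMS_Capabilities" in response_body)

# A real crawler would fetch capabilities_url(u) for every candidate URL u it
# finds on the web and index the ones that pass.  Canned sample response here:
sample = ('<?xml version="1.0"?>'
          '<WMT_MS_Capabilities version="1.1.1">...</WMT_MS_Capabilities>')
print(looks_like_wms(sample))   # True
```

Once a service passes the check, the capabilities document itself lists the layers and their bounding boxes, which is the metadata a search engine actually needs.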
Unfortunately it’s definitely a chicken-and-egg problem. Organizations don’t share their spatial information since there’s no demand and no one would find it anyway. And search engines won’t be built to organize information that just isn’t there.
But I do believe that as valuable geospatial information becomes available, there will be an opportunity to search the spatial data, which the market will eventually fill. So how do we bootstrap out of this? I have two ideas: one that can be done now, the other that would take an organization with a lot of resources, or some creativity to pull together lots of idle resources.
The first is to just make a wiki-ish directory of useful services. Unfortunately the information is available in a number of formats. The older stuff is usually ArcIMS, since ESRI won the first round of spatial web services, as they win most every GIS-related thing. Since then WMS has caught on quite a bit, as an open standard to accomplish the same thing. But more recently KML, the Google Earth format, has made a big splash. Ideally the infrastructure would also at least list shapefiles. And even better, there would be a browser-based viewer of the spatial information, so that search results could be combined and overlaid on one another. My preference would be WMS; the others could have reflector scripts that make them accessible as WMS. This directory could be organized like the Open Directory Project – there are several examples of similar things being done with Google Maps mash-ups and Google Earth. But it’d be nice to have them in one place, and ideally even able to be overlaid on one another. And hopefully if the information was all listed in one place, it might motivate people to start to standardize on one format or another.
But regardless of format, a place where people know they can find links to useful information would be great to have. It would have to be a neutral location, so that no one feels someone else is gaining an advantage of some sort. And it’d be great if anyone could add new links and comment on the usefulness of the data: add ‘metadata’ that may not be in the realm of traditional metadata, but which is useful to anyone else investigating the dataset, like what current users of the data are using it for, other datasets that might be similar, etc. One great location for this could be the OSGeo geodata committee, which I just decided to join, since it’s really one of the most interesting things going on in OSGeo. I’ll likely transition much of my effort there as Incubation settles.
The second is more complex, and bleeds into other areas, so I’ll put it in its own post at a later date. But the point for me is that catalogs aren’t the answer. I can accept them if they’re just organizing one institution’s documents, but there’s no bigger vision than that, except for rumblings about having them link up in some way. That strikes me as silly; one should just let a web service crawl the metadata documents themselves, which point to the services, instead of forcing it to figure out catalog protocols to get at the information. Or just skip straight to the services and open the door for user-generated metadata. We should make use of all the advances in general web search, and then just add the geospatial component, instead of reverting to an older way of searching that never really worked.
I recently talked to Peter Vretanos, the editor of the WFS spec, and it turns out he’s been thinking along the same lines: he wrote a thesis that examines the potential of Google plus WMS and WFS as a complete global SDI. I think the one thing he said needed adding was bounding box queries for search engines, so you could spatially constrain your queries. Which, again, is just adding the spatial component to what’s already out there. MetaCarta could do some amazing stuff with this, returning not just explicitly spatial results but also implicitly spatial ones, if their GeoParser took a spatial constraint, so I could search for, say, bookstores in New York. I guess Google Local does this, but it doesn’t seem to be all that good. And you can’t seem to plug in to the API anywhere, though they must do it internally somewhere.
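The bounding-box constraint is simple to state: a result qualifies if its geographic extent intersects the query box. A toy sketch of what a search engine would layer on top of its text ranking (the result names and boxes are invented for illustration):

```python
def bbox_intersects(a, b):
    """Each bbox is (min_x, min_y, max_x, max_y); True if the two overlap."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]

# Invented index: text-search results annotated with a geographic extent.
results = [
    ("nyc_bookstores", (-74.3, 40.5, -73.7, 40.9)),
    ("sf_bookstores", (-122.6, 37.6, -122.3, 37.9)),
]

query_bbox = (-74.1, 40.6, -73.9, 40.8)   # roughly Manhattan
hits = [name for name, box in results if bbox_intersects(box, query_bbox)]
print(hits)   # ['nyc_bookstores']
```

That one predicate, applied over indexed extents, is really all “spatially constrain your queries” means; everything else is ordinary web search.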