Into The Pudding

thoughts on geospatial, augmenting capitalism, architectures of participation, and more

The Metadata problem. Or, the problem with metadata

Posted by cholmes on June 16, 2006

In the geospatial domain, a big problem that many worry about is 'metadata'. Metadata is the information about the data: who collected it, when it was last updated, how accurate it is, how it was made, who to contact to get it, etc. For many years the FGDC (Federal Geographic Data Committee), the coordinating organization for sharing geospatial information in the US, has primarily focused on getting people to write metadata for their datasets, and to put that metadata in catalogs so that others know what information is out there.
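To make the idea concrete, here is a minimal sketch of what such a record might hold. The field names, the sample values, and the `missing_fields` helper are all made up for illustration; this is not the FGDC standard, which is far more detailed.

```python
from dataclasses import dataclass, asdict
from datetime import date


@dataclass
class DatasetMetadata:
    """Hypothetical minimal metadata record (illustrative field names only)."""
    title: str
    collected_by: str      # who collected it
    last_updated: date     # when it was last updated
    accuracy_note: str     # how accurate it is
    lineage: str           # how it was made
    contact: str           # who to contact to get it


def missing_fields(rec):
    """Return the names of fields that were left empty."""
    return [name for name, value in asdict(rec).items() if value in ("", None)]


# Invented sample data, not a real dataset.
record = DatasetMetadata(
    title="County road centerlines",
    collected_by="Example County GIS office",
    last_updated=date(2006, 5, 1),
    accuracy_note="roughly 5 m horizontal",
    lineage="digitized from aerial photos",
    contact="gis@example.org",
)

print(missing_fields(record))
```

Even this toy version has six things to fill in per dataset, which hints at why people put it off.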

Unfortunately, though millions of dollars have been spent educating people on metadata standards and how to fill them out, there is still shockingly little metadata, let alone actual data, available. Many believe that metadata has to be the basis of the coming 'geospatial web', and that being able to at least search who has what information is the first step towards getting even more data available. If I know who has what information, then I can seek them out and offer to pay them for it; at least I know it exists in some form. Or so the argument goes.

The big counter to this argument, however, is the World Wide Web. When you write a web page, how much metadata are you required to fill out? Absolutely none. Yes, there are some meta tags in HTML, but none are required, and your web page will still be found if there are none. Why isn't this metadata needed? Because a whole industry has been built around helping you search web pages. Indeed, to judge by which sites get the most traffic, it's the most important industry on the web. Why did these search engines come about? Because there was data. Lots of it. And people needed help finding it. In the early days it was Yahoo!, which was able to hire a bunch of people to search the web and categorize it. As the web started growing faster than a team of monkeys clicking all over the place could handle, automated techniques began to be used, with Google emerging as the clear winner.
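As a rough illustration of why no metadata is needed, here is a toy sketch of the automated approach: an index built purely from the words on each page. The URLs, the page text, and the function names are invented for the example, and real search engines are of course far more sophisticated than this.

```python
import re
from collections import defaultdict


def build_index(pages):
    """Map each word to the set of page URLs whose text contains it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(url)
    return index


def search(index, query):
    """Return the URLs that contain every word in the query."""
    words = re.findall(r"[a-z0-9]+", query.lower())
    if not words:
        return set()
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results


# Invented pages with no meta tags at all, just content.
pages = {
    "http://example.org/roads": "Road centerlines for the county, updated yearly",
    "http://example.org/parcels": "Parcel boundaries for the county",
}
index = build_index(pages)
print(search(index, "county parcel"))
```

The authors of those pages filled out nothing; the content alone was enough to make them findable.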

And the web continues to innovate, with blogs that let you follow another person's recommendations of information that may be relevant to you, with community-rated sites like Slashdot and Digg, and with community tagging on sites like Flickr and del.icio.us. Many people are looking to apply such things to geospatial, but what needs to happen first is to put data online.

Unfortunately, many of the largest organizations that have data don't put it online. One argument is technical: that it costs too much and is too hard to set up a server to get the data out there. I hope that GeoServer, my main focus in the last few years, is able to offer a cost-free, easy-to-use alternative that makes that argument less effective. But I believe there's a deeper issue, mostly related to psychology, with individuals being scared to put their data out there. Why? Because the individuals who produce it fear that what they've made isn't good enough, that it has to be perfect, or people will think less of them. And it gets even worse, since there's this whole metadata pressure that says they'd better have good metadata if they want to put things out there.

I understand the fear well. When my boss first asked me to release my code to the public repository, where anyone could look at it, it freaked me out. I asked him for an extra week, and spent it adding more comments, replacing my quicker hacks with cleaner code, etc. At the end of the week he asked me again, and I still didn't feel ready. What if someone read it and realized I was a bad coder? It might hurt my chances of a future job. It was putting a piece of myself out there for others to judge, and it was very scary. But I eventually got over it, because I realized that even very bad code that I wrote is generally better than the alternative, which is nothing.

In the geospatial domain, for the most part, we get nothing. People are afraid others might find errors, or they don't have the time to fill out the appropriate metadata. And past that, they lack the skills to set up a server, or a good place to just post their data. Though there is a Freedom of Information Act in the US that basically requires most any information held by the government to be available to all taxpayers, there is still just a tiny percentage of geospatial information available, let alone accessible to an average user.

I think one of the biggest things needed is a shift in thinking. Metadata needs an architecture of participation, and there needs to be a culture of encouragement. Indeed, we need an architecture of participation around geospatial data itself, so that releasing it isn't opening yourself up to criticism, but instead puts the onus on others to make what you've put out better, or to move on. This is how it works in the Open Source movement: released code is always seen as a good thing, even if it's not what I need. Once the data starts to get out there, I believe it will begin to make economic sense for companies to build search engines and participation-based schemes to organize it. The problem is not a lack of metadata. Instead, it's the focus on metadata that's slowing down getting real data out there for real innovation. I'll write more about what I think can help in a future post.

(For a great piece about metadata in general, see: Is it time for a Moratorium on Metadata?)

Responses to “The Metadata problem. Or, the problem with metadata”

  1. [...] The key here is another kind of convergence – the complementary use of natural language analysis techniques alongside bodies of structured information (such as a gazetteer, set of GPS tracks etc) to complement and enrich each other. Arguments about metadata often overfocus on “human error” and our limited inclination, or capacity, to expend our semantic energy on describing things for unknown, future others. But here are glimpses of a solid “middle way” in automated suggestion that allow humans to focus energy on what they’re actually good at. [...]

  2. [...] Throughout our history various documents have pointed the way to a more just world, and I firmly believe the Freedom of Information Act was one of them. My lawyer friends say it’s an incredibly solid piece of law, that really clearly states that just about everything that a government does should be open and available to its citizens. Which makes infinite sense when viewed through the lens of what a truly democratic society should look like. But we’ve become used to a government often antagonistic to its people, and doing all that it can to keep certain things secret. This can be for downright malicious reasons, but we need to remember not to attribute to malice what can be explained by stupidity or ignorance. Often it’s the attitude that politicians feel they know what’s best, or even silly things like fear of reprisal if their work’s not perfect, as I hit on in ‘The Metadata Problem‘. Indeed I feel the same thing about corporations, that they aren’t evil, controlled by evil capitalists, they’re just a weird institution that has followed its own logic too far and gotten out of hand. [...]
