Mor Naaman from Yahoo! Research Berkeley (Y!BR) wrote an article for October’s IEEE Computer Magazine (Vol. 39, No. 10, Oct 2006), in which he talks about the state of play regarding metadata and location.
Couple’o'bits …
On Extracting useful content
Map-based browsing systems are powerful, but content overload ultimately is unavoidable. In a world where, as Susan Sontag wrote in 1977 (On Photography, Farrar, Straus and Giroux), “everything exists to end up in a photograph,” map-based presentation tools might have to quickly contend with thousands, and possibly millions, of photos; Flickr already sports close to four million at this time. Luckily, we can mine patterns in these collections to extract meaningful content.
On Support for your own collection
If you can’t see how all this data would be useful to your own personal georeferenced photo collection, consider this: If you take a photo near the area marked “Buckingham Palace,” chances are good that the photo is of the British monarch’s modest residence. At the very least, the palace is a reasonable guess for the photo’s content.
Such guesses could be generated from a database of landmarks, but are much more accurate when derived from tags by fellow photographers. Unlike landmark databases, user-supplied tags provide a notion of an object’s priority and importance. In addition, user-contributed tags are highly dynamic, changing quickly to reflect new landmarks and attractions.
In fact, tags can even be event-based: “Changing of the guards” could be a relevant label for your Buckingham Palace picture. This phrase doesn’t appear in any landmark database, yet Flickr contains dozens of public photos from London with this tag.
As the tagging system can now estimate with reasonable accuracy the content of your images, it can help you label the photos (by supplying tag suggestions) or even find photos later without having to annotate them at all.
On Context
Indeed, initial research indicates that a picture’s context—location, time, event detection, and other factors—is a better predictor of which people are likely to appear in an image than content analysis. Combining context and content features might help alleviate the semantic gap between information extracted from an image’s visual features and the human interpretation of that image.
There’s a link to the whole thing on his web page, Eyes on the World (752 KB PDF). It’s worth a download and read if you have a spare 10 mins.