The overdue Places post II - Prototyping Iconicness

One of the main aims of the Places project was to present people with ‘Iconic’ photos that represented a location really well.

Of course one of the other criteria is that we wouldn’t be picking these photos by hand, oh no, we’d be relying on computers and algorithms and all that stuff. In turn ‘all that stuff’ is ultimately the metadata that people add to the photos: titles, tags, location and so on. We’re just trying to make sense of it with computer brains.

To get things started I sketched some pictures and made a prototype. The sketch is below, the prototype detailed under that …

"Places" page, 1st draft concept sketch

Here’s something that we decided was fairly important early on: the photos couldn’t just be the most ‘interesting’ photos, as defined by our interestingness score. When we did that, we didn’t get stuff that we thought was iconic enough. Below is a screenshot of the tag ‘sanfrancisco’ sorted by interestingness, which is good, and many of those photos get used, but they’re not always ‘iconic’.

Although of course now that I take a screenshot the current photos are pretty iconic, so maybe San Francisco is a terrible example. Believe me though, interesting != iconic all the time.

Photos tagged with sanfrancisco

… here’s where it gets fun, well for me anyway. Not wanting to distract the other engineers with questions, requests for PHP code, special database tables or queries to be written and so on, I used the public API, just like anyone else can.

Here’s my three-step process for getting ‘iconic’ photos. And it’s one that I still find myself using now and then …

  1. Request the top 250 photos based on the tag you want, sorted by interestingness (or relevance), with tags as an extra.
  2. Throw the photos away, tally up the tags used for those interesting photos.
  3. Now use the top few tags in conjunction with the original tag to get the actual ‘iconic’ photos.
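The tallying in step 2 can be sketched in a few lines of Python. This is only an illustration, not the original JavaScript prototype: it assumes each photo record carries its tags as a single space-separated string (which is how I understand the public API's `extras=tags` option on `flickr.photos.search` to return them), and `top_co_tags` and `sample_photos` are made-up names with made-up data.

```python
from collections import Counter


def top_co_tags(photos, seed_tag, how_many=3):
    """Tally the tags on a batch of photos and return the most common ones.

    `photos` is assumed to be a list of dicts with a space-separated
    "tags" string; the seed tag itself is left out of the tally.
    """
    tally = Counter()
    for photo in photos:
        # Use a set so a tag repeated on one photo only counts once.
        tags = set(photo.get("tags", "").lower().split())
        tags.discard(seed_tag)
        tally.update(tags)
    return [tag for tag, _count in tally.most_common(how_many)]


# Made-up stand-in for the 250 most interesting 'bacon' photos.
sample_photos = [
    {"id": "1", "tags": "bacon food breakfast egg"},
    {"id": "2", "tags": "bacon food breakfast egg"},
    {"id": "3", "tags": "bacon food breakfast sausage"},
    {"id": "4", "tags": "bacon food sandwich"},
]

print(top_co_tags(sample_photos, "bacon"))  # ['food', 'breakfast', 'egg']
```

The photos themselves are thrown away here, exactly as in step 2; only the tag counts survive.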

Here’s an example with ‘Bacon’ …

Javascript driven prototype of the Places Page (with added bacon)

… the top tags that go along with Bacon according to the results from the API are food, breakfast, egg, sausage, tomato, beans and so on.

This is a poor man’s way of doing Flickr Clusters. In the screenshot below of real Flickr bacon clusters you can see the first cluster has food, breakfast, eggs, egg, toast, sausage, cheese, beans, etc., which aligns quite well with what tallying the tags from the top 250 interesting bacon photos gives us.

Photos tagged with bacon

Anyway, once you have a handful of additional relevant tags you can then go and ask the API for the photos tagged with both one of those and the original tag, i.e. bacon+food, bacon+egg, bacon+beans. I tended to pick three tags and make three different API calls with different timeframes to get a selection of photos to choose from.
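As a rough sketch of that last step, here is how the three follow-up queries might be assembled in Python. It only builds the parameter sets and makes no network calls. The parameter names (`tags`, `tag_mode`, `sort`, `min_upload_date`) are the ones I believe `flickr.photos.search` accepts; pairing each tag with its own time window, and the window lengths themselves, are assumptions made for illustration, as is the helper name `build_searches`.

```python
from datetime import datetime, timedelta, timezone


def build_searches(seed_tag, co_tags, now=None, windows_in_days=(30, 365, 3650)):
    """Build one search per co-tag, each looking back over a different window."""
    now = now or datetime.now(timezone.utc)
    searches = []
    for co_tag, days in zip(co_tags, windows_in_days):
        searches.append({
            "method": "flickr.photos.search",
            "tags": f"{seed_tag},{co_tag}",   # e.g. bacon,food
            "tag_mode": "all",                # photo must carry both tags
            "sort": "interestingness-desc",
            # Assumed: timeframe expressed as a Unix timestamp lower bound.
            "min_upload_date": int((now - timedelta(days=days)).timestamp()),
        })
    return searches


for params in build_searches("bacon", ["food", "egg", "beans"]):
    print(params["tags"], params["min_upload_date"])
```

Each dictionary could then be sent to the API as query parameters, along with an API key.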

And this is roughly how a prototype was written.

Once the basic premise was established, it moved on to far smarter engineers than me (like Kellan), who moved the logic I had worked out in JavaScript further into the backend/database layer. Which roughly translates to being scalable.

Due to how the database worked (at the time, it’s slightly different now, more on that later), the ‘clusters’ for a Place such as San Francisco couldn’t be calculated in real time. That was something else that wasn’t scalable.

Fortunately to our rescue we just so happen to have a big database full of Places.

Now each night (more or less) we crunch through each Place in the database, looking at all the photos we know were taken there, to work out the location cluster tags we can use to find ‘iconic’ photos. We then find the photos based on them; in the case of San Francisco this’ll be things like Golden Gate Bridge, Bay Bridge, Bay, SF, Ocean, City, Fog etc. We chuck the photos we find into some database tables.

The next step is to score all the photos we found for a location, based on a combination of things, including the base interestingness score and extra bonus points for actually being geotagged.
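To make the idea concrete, here is a minimal Python sketch of that kind of scoring. The real combination of factors and their weights aren't given here, so the bonus value, the field names and the function `score_photo` are all hypothetical.

```python
# Hypothetical weight; the actual bonus used in production is not stated.
GEOTAG_BONUS = 10.0


def score_photo(photo):
    """Start from the base interestingness and add a bonus if geotagged."""
    score = photo["interestingness"]
    has_location = (
        photo.get("latitude") is not None and photo.get("longitude") is not None
    )
    if has_location:
        score += GEOTAG_BONUS
    return score


geotagged = {"interestingness": 5.0, "latitude": 37.8, "longitude": -122.4}
untagged = {"interestingness": 5.0}
print(score_photo(geotagged), score_photo(untagged))  # 15.0 5.0
```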

If the total score of the photos is above a threshold we’ve defined then we keep the photos and the Place becomes a ‘Known Place’. When a user looks at the Place page for a known place we can pick out a selection of the photos we tucked away earlier (attempting to have only one photo per photographer for diversity).

If the location didn’t make the cut, when a user looks at the page for it we fall back to photos sorted by interestingness.
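The known-place check and the fallback can be sketched together in Python. Everything specific here is an assumption for illustration: the threshold value, the `owner` field, the function name `pick_photos`, and the choice to take the highest-scoring photo from each photographer (the real selection may well be different).

```python
# Hypothetical cut-off; the real threshold is not given.
KNOWN_PLACE_THRESHOLD = 100.0


def pick_photos(scored_photos, fallback_photos, how_many=3):
    """Pick photos for a Place page.

    `scored_photos` is a list of (score, photo) pairs stored for the place;
    `fallback_photos` is assumed to be already sorted by interestingness.
    """
    total = sum(score for score, _photo in scored_photos)
    if total <= KNOWN_PLACE_THRESHOLD:
        # Not a 'known place': fall back to plain interestingness.
        return fallback_photos[:how_many]

    picked, seen_owners = [], set()
    ranked = sorted(scored_photos, key=lambda pair: pair[0], reverse=True)
    for _score, photo in ranked:
        if photo["owner"] in seen_owners:
            continue  # one photo per photographer, for diversity
        seen_owners.add(photo["owner"])
        picked.append(photo)
        if len(picked) == how_many:
            break
    return picked


scored = [
    (60.0, {"id": "a", "owner": "alice"}),
    (50.0, {"id": "b", "owner": "alice"}),
    (40.0, {"id": "c", "owner": "bob"}),
]
print([p["id"] for p in pick_photos(scored, fallback_photos=[])])  # ['a', 'c']
```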

As we get more photos of a place, or maybe as the interestingness of the current photos for a place goes up, it’ll tip an ‘unknown place’ over the threshold into ‘known places’.

An example of this: when we first started messing around with the nightly crunching of places, 70,000 of them hit our prescribed threshold for being keepers. Just before launch this was up to around 110,000 known places; enough photos had been geotagged to knock an additional 40,000 places into the ‘known places’ bucket.

And it continues to grow.

Luckily, as we were going through this process the backend database underwent a revamp and we had the opportunity to ask for things to be built into the core of it, based on what we’d learnt. Hopefully, when we have a couple more of our ducks in a row we’ll be able to get these related tag clusters for locations coming out on the fly. In turn that means we should be able to grab photos on the fly without it being too expensive on the database. Which in turn means that any place, including neighborhoods, should become fair game.

We’ll see how that plays out.

4 Responses to “The overdue Places post II - Prototyping Iconicness”

  1. [...] Prototyping Iconicness - how flickr went about identifying the iconic photo for Places [geobloggers] I think the work on Places is brilliant - this article describes how they identify ‘iconic’ photos for a location and how they went about working out how to do it: they hacked around with the public API. (tags: flickr geotagging places) « links for 2008-01-23 [...]

  2. [...] geobloggers » The overdue Places post II - Prototyping Iconicness Places is just Yummy. Yummy yummy yummy (tags: places geotagging revdancatt maps flickr geocoding) [...]

  3. I like the iconic computation (excerpt from our first TagMaps paper: “The system computes the top-scoring tags for each flat cluster.. and then picks a photo with tags that best match these top tags” - although we never added this to the World Explorer prototype). It does make sense to pick photos from the “top of the distribution” of each dimension (in this case, the semantic dimension).

  4. [...] Geobloggers [...]
