Jan 04 2012
 

Image matching software has become widely available. If you haven’t yet played with it, try dragging an image, for example, a photo of yourself, into the Google search bar. Or, for fun, you can try Doggelganger, “human to canine pairing software” promoted by pet food manufacturer Pedigree.

Doggelganger at work (Image courtesy printmediacentr.com)

Doggelganger’s purpose is to promote the adoption of homeless dogs in New Zealand by finding dogs that bear some facial resemblance to a potential adopter. And, we’re told, security agencies have sophisticated face recognition systems behind the cameras in airports and at border crossings, programmed to watch out for specific individuals.

Photo comparison brings up one of my favorite questions: what does it mean for two things to be “similar”?

Intuitively, we think we understand the concept well: two faces, for example, are similar if … well … if they look alike! It’s not quite that fuzzy: we can make the definition more rigorous by defining two faces as similar if a significant number of people rate them as looking similar.

Separated at Birth? Secretary of State Hillary Clinton and actress Emma Thompson (from inmirror.com)

Or we can use a performance-based definition and say that two faces are similar if, when asked to find matches among a number of photos, people frequently mistake one for the other.

The problem with these definitions is that, while they are operational, they are not algorithmic. They are of no use to us in programming a computer to judge the similarity between two faces.

Digging into the problem in the naïve ways (looking at pixels, colors, geometry) doesn’t help much. From a raw image perspective, there’s a lot more difference between two photos of you, one in which you’re wearing a hat and standing in front of a plain background in daylight and the other in which you’re hatless at night, than there is between a photo of you and one of me, both taken under similar lighting. Nevertheless, humans (and pigeons, it turns out) are very good at recognizing individual faces despite hats, glasses, lighting, viewing angle, and other distracting features.
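
To make the pixel-level point concrete, here is a minimal sketch. The tiny 3x3 grayscale "photos" and the function name `mean_abs_diff` are made up for illustration; a real comparison would work on full images, but the effect is the same: a lighting change swamps the difference between two people.

```python
def mean_abs_diff(a, b):
    """Mean absolute per-pixel difference between two equal-sized grayscale images."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)

# Hypothetical 3x3 grayscale "faces", flattened row by row (0 = black, 255 = white).
you_daylight = [200, 180, 200, 170, 120, 170, 190, 140, 190]
me_daylight = [200, 175, 205, 165, 125, 175, 185, 150, 185]

# The same face as you_daylight, uniformly darkened to mimic a night-time shot.
you_night = [p - 100 for p in you_daylight]

same_person = mean_abs_diff(you_daylight, you_night)        # same face, different lighting
different_people = mean_abs_diff(you_daylight, me_daylight)  # different faces, same lighting

print(same_person, different_people)
```

By this naïve measure, the two photos of "you" are far less alike than the photos of two different people.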

There’s an entire, well-developed literature on cognitive and computational aspects of image similarity; it’s well beyond the scope of this brief blog entry. Here, I just want to raise two points, each of which I’ll discuss in a future posting:

  1. There’s a lot more to similarity than there at first appears to be: assessing similarity requires deep knowledge. Comparing pictures of faces requires knowledge about people, and about how light and shadow work. Comparing musical recordings requires knowledge about music and musical instruments. Comparing paragraphs of text requires knowledge of meaning.
  2. When we design systems to assess similarity, we need to be very precise about the task. In particular, we need to be clear about what kind of mistakes and near misses we expect the system to make.
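
As a rough illustration of the second point, consider a matcher that outputs a similarity score and declares a match above some threshold. The scores, labels, and the function name `count_errors` below are all hypothetical; the sketch only shows that moving the threshold trades one kind of mistake for another.

```python
# Hypothetical (score, truly_same_person) pairs produced by some face matcher.
scored_pairs = [
    (0.95, True), (0.80, True), (0.75, False),
    (0.60, True), (0.40, False), (0.20, False),
]

def count_errors(pairs, threshold):
    """Return (false_matches, missed_matches) when scores >= threshold count as matches."""
    false_matches = sum(1 for score, same in pairs if score >= threshold and not same)
    missed_matches = sum(1 for score, same in pairs if score < threshold and same)
    return false_matches, missed_matches

strict = count_errors(scored_pairs, 0.9)   # few false alarms, more misses
lenient = count_errors(scored_pairs, 0.5)  # fewer misses, more false alarms

print(strict, lenient)
```

Which setting is "right" depends on the task: a border checkpoint and a dog-adoption site would tolerate very different mistakes.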
Dec 24 2011
 

I went to a well-presented hands-on tutorial the other night on the Semantic Web and how to query it. It was fun to run queries like “Find all the world’s landlocked countries with a population greater than 15 million and list their names in English, French, and Chinese” or, perhaps more demonstrative of the ability to use data from multiple sources, “List the corporations whose founders share a birthday with any of the actors appearing in Star Wars: The Next Generation.”

If you’re not familiar with the Semantic Web, here is how the W3C FAQ describes it. The Semantic Web:

… provides a common framework that allows data to be shared and reused across application, enterprise, and community boundaries.

Alternatively, in the words of Tim Berners-Lee, writing in 1999 in “Weaving the Web”,

“If HTML and the Web made all the online documents look like one huge book, RDF, schema, and inference languages will make all the data in the world look like one huge database”

The key here is that, in contrast to the WWW, which is a web of documents, the Semantic Web is a web of machine-readable data. While, say, an online article about a corporation might mention its founder and its headquarters location, Semantic Web data about the corporation would store the founder and the headquarters location (among other data) as named attributes of a predictable, defined, structured data representation. It is these structured representations that make possible the queries above (as well as queries that are actually useful).
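
Here is a minimal sketch of what "named attributes of a structured representation" can look like in practice. It stores facts as subject-predicate-object triples, which is the general shape of RDF data, but the company names, predicates, and the `match` helper are all invented for illustration; real Semantic Web data uses URIs and is queried with a dedicated query language rather than a Python list comprehension.

```python
# Hypothetical facts, stored as (subject, predicate, object) triples.
triples = [
    ("ExampleCorp", "founder", "Alice Example"),
    ("ExampleCorp", "headquarters", "Boston"),
    ("OtherCorp", "founder", "Bob Example"),
    ("OtherCorp", "headquarters", "Boston"),
    ("Alice Example", "birthDate", "1970-01-04"),
]

def match(pattern, store):
    """Return every triple that fits the pattern; None acts as a wildcard."""
    return [
        t for t in store
        if all(p is None or p == v for p, v in zip(pattern, t))
    ]

# "Which corporations are headquartered in Boston?"
boston_companies = [s for s, _, _ in match((None, "headquarters", "Boston"), triples)]

# "Who founded each corporation?"
founders = {s: o for s, _, o in match((None, "founder", None), triples)}

print(boston_companies, founders)
```

Because every fact has the same predictable shape, a question becomes a pattern to match rather than a passage of prose to interpret.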

From where do we get all this machine readable data? The WWW has grown because pretty much anybody can write and publish pages, most of which contain links to other pages. Reading, writing, and creating links are natural to many people. And, huge amounts of information (for example, books) are easily converted into WWW documents. Building structured data representations is an entirely different matter. Creating data of the type found in the Semantic Web requires significant specialized technical expertise, as well as a pioneer’s commitment to blazing a trail that may or may not be followed by many others. Querying the Semantic Web requires significant technical expertise as well.

Interestingly, one way to get data into the Semantic Web in the proper machine readable form is to use software to extract it from web pages that were originally intended to be read by people. DBpedia is an impressive example of such an effort: it extracts data from Wikipedia articles, and currently contains over a billion facts about over three million things.

I think DBpedia makes it clear that the distinction between “human readable” and “machine readable” data is pretty fuzzy, if it exists at all. Wikipedia articles are clearly written for human consumption, and yet their content is being consistently converted, without much human intervention, into Semantic Web data.

And that’s a good thing: the number of people able and willing to write a Wikipedia article is enormous relative to the number of people able and willing to hand-craft structured knowledge representations. Building the Semantic Web by hand would be folly; but clever re-use of information that people are already creating for other purposes can lead to massive scale in a hurry. Other Web projects have succeeded by following this route; Google search is a significant example.

Now we just need a way to query the Semantic Web that is as simple and intuitive as typing a question into a Google search box…

Nov 15 2011
 

Today, former hospital CEO and blogger Paul Levy takes on the infamously inefficient taxi loading scheme at Boston’s Logan Airport (Not Running a Hospital: Uncool and unLean taxi batching at Logan Airport).

Taxi queue at Logan Airport (Paul Levy photo)

This sort of problem is tailor-made for some kind of simulation, perhaps an agent-based model. A good simulation that modeled the behavior of taxi-seekers, taxi-drivers, and other participants in the airport’s transportation ecosystem would have the potential to identify bottlenecks and opportunities for improvement, and provide an experimental testbed for alternative approaches to the problem of getting people from airplanes into taxis.

One key advantage of an agent-based simulation (as opposed to, for example, a set of linear programming (LP) equations) is that a well-built simulation makes intuitive sense to someone who is intimately familiar with the problem itself (an airport traffic expert, for example) but who might not happen to be a mathematician. That assumes a good set of tools for manipulating the inputs and visualizing the outputs, such as an animated map of the airport showing the formation and movement of queues of people. Giving such a decision-maker some modeling tools with which he or she can interact directly is a big deal.
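
To give a flavor of the kind of experiment such a testbed allows, here is a toy sketch, far simpler than a real agent-based model. Every number and rule in it is an assumption made up for illustration (one passenger per minute, cabs released in fixed batches); it says nothing about how Logan actually operates.

```python
from collections import deque

def average_wait(minutes, batch_size):
    """Simulate a taxi stand and return the passengers' mean wait in minutes.

    Assumptions (all hypothetical): one passenger arrives every minute, and
    the dispatcher releases `batch_size` cabs every `batch_size` minutes, so
    the long-run supply of cabs is the same for every batch size.
    `minutes` should be a multiple of `batch_size` so everyone gets a cab.
    """
    waiting = deque()  # arrival times of passengers still in line
    cabs = 0           # cabs currently at the curb
    waits = []
    for now in range(minutes):
        waiting.append(now)              # a passenger joins the line
        if (now + 1) % batch_size == 0:  # the dispatcher releases a batch of cabs
            cabs += batch_size
        while cabs and waiting:          # cabs load passengers first come, first served
            waits.append(now - waiting.popleft())
            cabs -= 1
    return sum(waits) / len(waits)

continuous = average_wait(60, batch_size=1)  # cabs trickle in one at a time
batched = average_wait(60, batch_size=5)     # cabs arrive five at a time

print(continuous, batched)
```

Even with identical cab supply, batching makes the average passenger wait longer in this toy model. A real study would add variable arrivals, loading times, and driver behavior, which is where the agent-based approach earns its keep.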

But who ought to fund the simulation work? And, ultimately, who are the stakeholders and what are their objectives? There are the travelers themselves, who just want to get to their destination. But they aren’t represented by any organizational entity. The taxi medallion owners? They don’t have any particular incentive to improve airport efficiency; they get a fixed fee per shift for each cab on the road. The taxi drivers? Like the passengers, they stand to benefit; unlike the passengers, there may be an organization that can speak for them.  The airport operator itself, Massport, has an incentive to operate an airport whose efficiency delights travelers. And a variety of entities, such as the hotel industry or the Convention Center operator, have a natural incentive to make travel to Boston as efficient and pleasant as possible.

As with any other kind of process improvement project, success depends upon commitment by someone capable of implementing the changes (Massport, in this case),  meaningful involvement by end users (taxi drivers and arriving travelers, at a minimum) and a clear sense of objectives.  Line up these three preconditions, and you could have a slam-dunk success.

 

Nov 10 2011
 

This two-minute film, by independent London filmmakers Sophie Windsor Clive and Liberty Smith (their site: Islands and Rivers), captures the filmmakers’ chance encounter with a murmuration of starlings during a canoe trip on the River Shannon. The video has circulated widely over the past week or so.

A murmuration (I didn’t know the word until a day ago; it’s a flock of thousands of starlings) is a striking instance of flocking behavior. Flocks of birds and schools of fish are fascinating in part because it seems that a small number of very simple rules governing the behavior of individuals can result in sophisticated aggregate behavior of the flock. Craig Reynolds demonstrated this with his 1986 simulation “Boids”, in which objects on the computer screen, each of which obeys only three simple rules, seem to form into a flock, which then swoops around the screen in a way that is quite realistically reminiscent of a flock of birds. Further research has led to interesting mathematical commonalities between birds in a flock and atoms in a crystal lattice. Brandon Keim mentions some of that research in a recent Wired article.
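
The three rules are usually described as cohesion, alignment, and separation. Below is a minimal sketch of one update step in that spirit; it is not Reynolds’ code. The weights, the `min_dist` value, and the simplification that every other boid counts as a neighbor (real implementations look only at nearby boids) are all assumptions chosen for brevity.

```python
def step(boids, cohesion=0.01, alignment=0.05, separation=0.1, min_dist=1.0):
    """Advance every boid one tick. Each boid is a tuple (x, y, vx, vy)."""
    updated = []
    for i, (x, y, vx, vy) in enumerate(boids):
        others = [b for j, b in enumerate(boids) if j != i]
        n = len(others)
        if n:
            # Rule 1, cohesion: steer toward the average position of the others.
            cx = sum(b[0] for b in others) / n
            cy = sum(b[1] for b in others) / n
            vx += (cx - x) * cohesion
            vy += (cy - y) * cohesion
            # Rule 2, alignment: nudge velocity toward the others' average velocity.
            avx = sum(b[2] for b in others) / n
            avy = sum(b[3] for b in others) / n
            vx += (avx - vx) * alignment
            vy += (avy - vy) * alignment
            # Rule 3, separation: move away from any boid that is too close.
            for ox, oy, _, _ in others:
                if abs(ox - x) < min_dist and abs(oy - y) < min_dist:
                    vx += (x - ox) * separation
                    vy += (y - oy) * separation
        updated.append((x + vx, y + vy, vx, vy))
    return updated

# Two boids at rest, far apart: cohesion should draw them together.
far_apart = step([(0.0, 0.0, 0.0, 0.0), (10.0, 0.0, 0.0, 0.0)])

# Two boids at rest, crowded together: separation should push them apart.
crowded = step([(0.0, 0.0, 0.0, 0.0), (0.5, 0.0, 0.0, 0.0)])

print(far_apart, crowded)
```

No boid knows anything about "the flock"; each one reacts only to the others’ positions and velocities, and the group behavior falls out of repeating this step.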

Flocking, emergent behavior, “the wisdom of crowds,” “hive intelligence,” and a host of related concepts have perhaps been overhyped, and yet, at the core, there is a solid intellectual framework, with quite a bit of sophisticated work taking up where some of the hype left off.

 

 

Nov 09 2011
 

Bostonography.com never fails to impress. In the words of its authors,  “a pair of cartography geeks,” the site is “at least a site for interesting visual representations of life and land in Greater Boston, and at best it exposes and explores the geographical sense of place in the city.”

I like it because it operates at the intersection of art, pedagogy, and geekiness, three things near and dear to my own heart, and because it addresses the question: “How can one process large amounts of data into a visual representation that is both illustrative and beautiful?”

Today’s entry, “An MBTA bus-iness day,” uses over two million data points to display the speeds of city buses. If you look at it for a few moments, I think you’ll agree that it tells a remarkable number of stories.