How-Old.Net Is Basically Training For Robots
Traditionally, researchers focusing on artificial intelligence have kept most of their cutting-edge discoveries locked away, only occasionally tossing the masses a neat iPhone app or an online chess game. But now we finally have access to the mother lode: an age-guessing bot built on Microsoft’s latest machine learning research. How-Old.net is so cool that, even if the entire project is actually one big data-phishing expedition, it’s probably worth it.
The platform is simple and highly shareable (Microsoft says 35,000 users visited the site within the first few hours of its launch): Enter a photo of yourself, a friend, or even a celebrity, and out pop age and gender predictions.
We spoke with Aditya Khosla, a graduate student at MIT who studies computer vision, to find out more about the intricacies of facial recognition software.
So, how does How-Old.net work?
Here’s my guess: First, they have a face detector, very similar to the one on your iPhone or Android camera, where you can click on the photo and it’ll show you where all the faces are. Then, based on each face, it’s probably extracting some set of computer vision features and classifying them by age and gender. Assuming they’re using Deep Learning (and not some older, hand-designed features), they probably collected a lot of data labeled with people’s ages and genders beforehand, and then trained the algorithm to recognize those.
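For the curious, here is a minimal sketch of that detect-then-classify pipeline, using OpenCV’s stock face detector. Since How-Old.net’s actual model isn’t public, the `age_gender_model` below is a hypothetical stand-in for any classifier trained on labeled faces.

```python
# Sketch of a detect-then-classify pipeline like the one described above.
# OpenCV handles face detection; `age_gender_model` is a hypothetical
# pre-trained classifier, represented by any object with a predict() method.
import cv2
import numpy as np

face_detector = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml"
)

def predict_ages(image_path, age_gender_model):
    image = cv2.imread(image_path)
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

    # Step 1: find every face in the photo.
    faces = face_detector.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)

    results = []
    for (x, y, w, h) in faces:
        # Step 2: crop and normalize the face region into a feature vector.
        crop = cv2.resize(gray[y:y + h, x:x + w], (64, 64)).flatten() / 255.0

        # Step 3: hand the features to a trained age/gender classifier
        # (hypothetical here), which returns an (age, gender) pair.
        age, gender = age_gender_model.predict(crop[np.newaxis, :])[0]
        results.append({"box": (x, y, w, h), "age": age, "gender": gender})
    return results
```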
How did facial recognition work before Deep Learning?
It’s a bit technical to explain, but I’ll try to give you an example. HOG, or histograms of oriented gradients, was one of the most popular methods. Essentially, you’d take all of the gradients inside an image (think of gradients as little lines within the image) that mark the boundaries of the face and whatever was around it. So it would detect the eyes, nose, and mouth, and turn the face into a line drawing. Then you could take that image, extract little pieces of it, and calculate the angle and strength of each line. People used to design these features by hand, and it took a whole bunch of steps. You’d experiment carefully and often get something fairly useful, but it ended up being rather complicated.
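As a rough illustration of that hand-designed process (not the exact recipe of any particular face system), the scikit-image library wraps the whole thing in a single `hog` call; the parameters below are the commonly used defaults.

```python
# A small HOG demo with scikit-image: gradient directions in each little
# patch of the image are binned into a histogram of "little lines".
from skimage import color, data
from skimage.feature import hog

image = color.rgb2gray(data.astronaut())  # any grayscale photo works here

features, hog_image = hog(
    image,
    orientations=9,           # number of gradient-angle bins per cell
    pixels_per_cell=(8, 8),   # the "little pieces" of the image
    cells_per_block=(2, 2),   # blocks used to normalize gradient strength
    visualize=True,           # also return the line-drawing-like rendering
)

print(features.shape)  # one long, hand-designed feature vector per image
```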
And then came Deep Learning.
Right. So recently, we’ve been using Deep Learning, where the representation of the image is learned automatically. You give the computer the image itself and tell it that this image corresponds to age 35 and that one to age 60, and then the computer figures out, by itself, how to convert those image pixels into an age value, without needing us to specifically engineer the features. It turns out it’s a lot more effective than what people had been using in the past.
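Here is a minimal sketch of what that looks like in code, assuming PyTorch and a toy set of (face, age) pairs; the network architecture and random data are purely illustrative.

```python
# Learned features instead of hand-designed ones: a tiny PyTorch network
# that maps raw 64x64 face pixels straight to an age value.
import torch
import torch.nn as nn

class AgeNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(          # learned replacement for HOG
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.regressor = nn.Sequential(
            nn.Flatten(), nn.Linear(32 * 16 * 16, 64), nn.ReLU(), nn.Linear(64, 1)
        )

    def forward(self, x):                       # x: batch of 64x64 face crops
        return self.regressor(self.features(x))

model = AgeNet()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

# Training-loop sketch: `faces` is (N, 1, 64, 64); `ages` holds known ages.
faces, ages = torch.rand(8, 1, 64, 64), torch.rand(8, 1) * 80
for _ in range(10):
    optimizer.zero_grad()
    loss = loss_fn(model(faces), ages)          # "this image is age 35..."
    loss.backward()
    optimizer.step()
```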
Is Deep Learning new?
Not at all. Deep Learning has been around for a while, but people only recently started looking at it again. The first few papers on the subject came out between 1990 and 1995, and at that point they weren’t really suggesting the use of Deep Learning for facial recognition. What we were missing was faster computation and the ability to handle a lot of data, neither of which was available in the 1990s. So people really couldn’t see its power at the time, and the method kind of faded out. Then around 2012, everyone began looking at it again, and we were surprised to find that Deep Learning could actually do many different things.
What can you do besides facial recognition?
All sorts of things. For instance, there’s object recognition and place recognition. Some of my lab mates recently used Deep Learning to recognize, based on a photo, whether we were looking at an office, restaurant or bar. And many other applications exist, too. These days, you have a new paper on Deep Learning coming out almost every day.
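One common way to build that kind of scene classifier today is transfer learning: take a network pretrained on a large image collection and retrain only its final layer on a few scene categories. The sketch below assumes torchvision; the three scene labels are illustrative, not the setup Khosla’s lab mates used.

```python
# Transfer-learning sketch for scene recognition: reuse a pretrained CNN's
# features and retrain only a new final layer for a few scene categories.
import torch
import torch.nn as nn
from torchvision import models

scene_classes = ["office", "restaurant", "bar"]   # illustrative labels

model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
for param in model.parameters():
    param.requires_grad = False                   # keep the learned features

# Swap in a new classification head sized for the scene categories.
model.fc = nn.Linear(model.fc.in_features, len(scene_classes))

# Only the new layer gets trained on labeled scene photos (data not shown).
optimizer = torch.optim.Adam(model.fc.parameters(), lr=1e-3)
```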
But it’s not perfect—How-Old.net makes some strange mistakes. Why doesn’t facial recognition always work?
Well, there’s a lot of variation. People have extremely different looks and appearances, and all of this is done with machine learning. So I collect data and train my algorithm as best I can, but the data could always contain certain biases that cause it to predict things wrongly. For example, different races might have different ways in which their faces show age. When the algorithm has to generalize from poor data, that can be extremely challenging. Also, anything it recognizes as a face forces it to make a prediction. So even if it recognizes a cat’s face, it’ll predict how old that cat’s face is, even though, my guess is, it didn’t have any cat faces in its training data. So, not surprisingly, it makes a random prediction.
Besides addictive apps, how else could we use facial recognition?
One obvious application is, if we want to have robots that can interact with humans and exist in the real world, we need them to understand the notion of “objects” to some extent. For example, if I want a robot to get me a red shirt, it needs to know where it can find a shirt, and what a red shirt looks like. So it needs to be able to reason: this is a bedroom, so I should find a closet; now I’m inside the closet, so I should find a red shirt. This kind of object and scene recognition is fairly crucial for robotics, and probably something you’re going to see pretty soon.