I don’t know about you, but my friends and I have always been clueless about the number system people use to rate someone’s attractiveness. What does it mean when someone says, “that girl is an 8”? Tucker and I tried rating someone and found that we had wildly different opinions. So is it true that beauty is in the eye of the beholder? Or can we find a mathematical formula to express someone’s attractiveness?
So for our CS229 final project, Tucker Leavitt, Duncan Wood and I decided to build a model that can rate someone’s attractiveness from their facial photos. I know, this is not exactly politically correct, but it’s fun. Our idea is not new: multiple researchers have attempted this in the past. In 2007, Yael Eisenthal et al. at Tel Aviv University, Israel used K-nearest neighbors and SVM, trained on 92 images of Austrian females and 92 images of Israeli females, and achieved a Pearson correlation of 0.65 with the average human ratings. The same year, Amit Kagian et al., also at Tel Aviv University, used 6972 distance vectors between 84 fiducial point locations and achieved a correlation of 0.82. Most recently, in July 2015, Avi Singh from the Indian Institute of Technology, Kanpur used Gaussian Process Regression, K-nearest neighbors, Random Forest, SVM, and Linear Regression on a dataset of 500 Asian females, achieving correlations of 0.52, 0.60, 0.64, 0.22, and 0.64, respectively. Notice that Eisenthal and Kagian tested their models on very small datasets. Singh used a larger dataset but got a much lower correlation.
We used the same dataset Singh used: the SCUT-FBP dataset. It contains 500 frontal images of 500 different Asian females with neutral expressions, simple backgrounds, and minimal occlusion. It’s also available for download for free. It’s amazing, I know. Each photo was rated independently by 75 users, both male and female, on a discrete scale from 1 to 5. We used the average ratings as the labels.
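Deriving the labels is straightforward. The sketch below uses a made-up in-memory table (the real SCUT-FBP distribution ships its ratings in its own file format, and each image has 75 ratings, not 5); it just illustrates how an average-rating label, or a rounded 5-class label, comes out of the raw scores:

```python
import numpy as np

# Hypothetical ratings: one list of rater scores per image.
# The real dataset has 75 scores per image in its own file layout.
ratings = {
    "img_001": [3, 4, 3, 5, 4],
    "img_002": [2, 2, 3, 1, 2],
}

# Average rating per image, used as the label
labels = {img: float(np.mean(scores)) for img, scores in ratings.items()}

# Rounded to the nearest integer when treated as a 5-class target
class_labels = {img: int(round(m)) for img, m in labels.items()}
```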
We aren’t sophisticated machine learning experts, so we did something quick and dirty. We used a multiclass logistic regression classifier without regularization to predict the average rating of each picture (as an integer between 1 and 5), using Python’s sklearn library. Instead of manually tagging all 500 of our photos, we used two existing face detection APIs, faceplusplus.com and facemarkapi, to tag geometric landmarks on the photos (seen in Fig. 3). These APIs produced a set of 68 landmarks per face, and taking the distance between every pair of landmarks gave us (68 choose 2) = 2278 features. We used Principal Component Analysis (PCA) to reduce the dimensionality of this feature space by an order of magnitude or more, experimenting with reduced feature counts between 10 and 500. We also ran linear regression on this reduced data, this time with L1 regularization.
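The whole pipeline can be sketched in a few lines of sklearn. The landmarks below are random stand-in data (the real ones came from the face detection APIs), and the specific hyperparameters — `n_components=20`, the Lasso `alpha`, and the very large `C` used to make logistic regression effectively unregularized — are illustrative choices, not our actual settings:

```python
from itertools import combinations

import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import Lasso, LogisticRegression

rng = np.random.default_rng(0)

# Stand-in data: 100 faces, 68 (x, y) landmarks each
landmarks = rng.random((100, 68, 2))
y = rng.integers(1, 6, size=100)  # integer ratings 1..5

# One feature per pair of landmarks: the Euclidean distance
# between them -> (68 choose 2) = 2278 features per face
pairs = list(combinations(range(68), 2))
X = np.array([
    [np.linalg.norm(face[i] - face[j]) for i, j in pairs]
    for face in landmarks
])

# PCA to cut the dimensionality by an order of magnitude or more
X_reduced = PCA(n_components=20).fit_transform(X)

# Multiclass logistic regression; a huge C approximates
# "no regularization" across sklearn versions
clf = LogisticRegression(C=1e6, max_iter=1000).fit(X_reduced, y)

# Linear regression with L1 regularization on the same features
reg = Lasso(alpha=0.1).fit(X_reduced, y)
```

In a real run you would of course split the 500 images into train and test sets before fitting; this sketch skips that to keep the shapes easy to follow.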
We found that, based solely on the set of distances between key points on a photo, we could predict the average rating with as much as 74.2% accuracy. This is fascinating considering how little information we actually used. We relied on spatial features alone, with no features such as complexion, hair color, or eye color. This suggests that structure and positioning are the most important factors in what makes a face attractive. So when you choose photos to put online, especially on dating websites, pick the one that shows your face in the most “spatially balanced” way, if that phrase means anything.
Andrew Ng spent almost 15 minutes at our poster. When we asked for his opinion, he laughed: “You should put this online.” We never put the model online for people to use, but maybe we should. Feel free to shoot us an email if you’re interested in reading our full paper.