How attractive are you? Analysing a person's photo with the OpenAI CLIP model
Overview
In this tutorial, we will learn how to use the OpenAI CLIP model to predict the attractiveness of a person in a photo.
The OpenAI CLIP model is a neural network trained to measure how closely textual features match the features of a target image. It is widely used for zero-shot classification (classification over a set of labels that were not seen during training).
In this post I will show how to build such a classifier in a few minutes, without any model training.
As a bonus, every step runs efficiently on a CPU, so you don't need to spend money on GPU resources.
Everything will be done in the following way:
- We use MediaPipe's fast BlazeFace detector to find the face in the image
- Then we pad and crop the detection to keep some background around the face
- After that, the CLIP model predicts the gender, which selects the correct set of captions
- Finally, the same CLIP model predicts how attractive the person in the photo is
Requirements
To install the required packages, run in a terminal:
pip3 install deepface mediapipe
pip3 install git+https://github.com/openai/CLIP.git
Main Class
We are going to implement a single class that performs all the work through its public methods. Let's write the constructor:
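The original constructor listing did not survive extraction, so the sketch below is an assumption: it loads the ViT-B/32 CLIP model via clip.load and initializes MediaPipe's BlazeFace detector directly (the article accesses it through the DeepFace package; the field names here are mine):

```python
class AttractiveMeter:
    """Scores a face photo with CLIP (sketch; field names are assumptions)."""

    def __init__(self, clip_model: str = "ViT-B/32", device: str = "cpu"):
        # Heavy imports are done lazily so this snippet stays importable
        # even when clip/mediapipe are not installed.
        import clip
        import mediapipe as mp

        self.device = device
        # CLIP is fast enough on CPU for single-image inference
        self.model, self.preprocess = clip.load(clip_model, device=device)
        # BlazeFace short-range model (model_selection=0) from MediaPipe
        self.detector = mp.solutions.face_detection.FaceDetection(
            model_selection=0, min_detection_confidence=0.5
        )
```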
Face Detection
The easiest way to add a face detector to your project is the DeepFace package. It contains OpenCV, SSD, Dlib, MTCNN, RetinaFace and MediaPipe detectors.
We are going to use the BlazeFace detector (the one used in the MediaPipe framework). BlazeFace is a fast, lightweight face detector from Google Research: the model weighs less than 1 MB and runs in real time even on mobile devices.
The detector model was already initialized in the __init__(...) method. Let's declare the detect_face_with_padding function:
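The function body is missing from the extracted text, so here is a hedged reconstruction. It assumes the detector is stored in self.detector, reads the first detection's relative bounding box, and enlarges it by a pad fraction; the padded_box helper and the exact return tuple (face crop plus the two box corners) are my guesses:

```python
import numpy as np
from PIL import Image


def padded_box(xmin, ymin, bw, bh, w, h, pad=0.4):
    """Convert a relative box to absolute pixels, enlarged by `pad` on each side."""
    left = max(0, int((xmin - pad * bw) * w))
    top = max(0, int((ymin - pad * bh) * h))
    right = min(w, int((xmin + bw * (1 + pad)) * w))
    bottom = min(h, int((ymin + bh * (1 + pad)) * h))
    return left, top, right, bottom


def detect_face_with_padding(self, image: Image.Image, pad: float = 0.4):
    """Detect a face; return (crop, top_left, bottom_right) or (None, None, None)."""
    results = self.detector.process(np.asarray(image.convert("RGB")))
    if not results.detections:
        return None, None, None
    # MediaPipe returns coordinates relative to the image size
    box = results.detections[0].location_data.relative_bounding_box
    w, h = image.size
    left, top, right, bottom = padded_box(
        box.xmin, box.ymin, box.width, box.height, w, h, pad
    )
    return image.crop((left, top, right, bottom)), (left, top), (right, bottom)
```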
Now we can use the BlazeFace detector to return a cropped and padded face image. If anything goes wrong, we simply return (None, None, None).
Gender classification & Attractiveness prediction
Next, we build predict(…) around gender classification. First, let's define a helper function, predict_clip(image: Image.Image, text: List[str]):
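The body of predict_clip did not survive extraction; a minimal version would encode the image and the candidate captions, let CLIP score them jointly, and return the softmax probabilities. Returning a plain list of floats and running on CPU under torch.no_grad are my choices, not necessarily the author's:

```python
from typing import List

from PIL import Image


def predict_clip(self, image: Image.Image, text: List[str]) -> List[float]:
    """Return one probability per caption, measuring how well it matches the image."""
    import clip
    import torch

    image_input = self.preprocess(image).unsqueeze(0).to(self.device)
    text_input = clip.tokenize(text).to(self.device)
    with torch.no_grad():
        # logits_per_image has shape (1, len(text))
        logits_per_image, _ = self.model(image_input, text_input)
        probs = logits_per_image.softmax(dim=-1)[0].tolist()
    return probs
```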
Now we have everything we need to implement predict(…). We simply score the image against ["man", "woman"], take the argmax of the prediction, and then pick a set of captions that matches the predicted gender index.
As one may note, we use different sets of captions for male and female:
captions = [
    ["handsome", "ugly"],  # this one is for male
    ["beautiful", "ugly"]  # this one is for female
]
The final version of the predict method is below:
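No final listing survived extraction, so the version below is reconstructed from the description above: CLIP first picks the gender with ["man", "woman"], the gender index selects the caption pair, and a second CLIP call scores it. Returning a dict with a 0-100 score is my formatting choice:

```python
def predict(self, image):
    """Full pipeline: detect the face, pick captions by gender, score them."""
    face, top_left, bottom_right = self.detect_face_with_padding(image)
    if face is None:
        return None  # no face found
    # index 0 -> man, index 1 -> woman
    gender_probs = self.predict_clip(face, ["man", "woman"])
    gender_idx = gender_probs.index(max(gender_probs))
    captions = [
        ["handsome", "ugly"],   # this one is for male
        ["beautiful", "ugly"],  # this one is for female
    ]
    probs = self.predict_clip(face, captions[gender_idx])
    # probability of the positive caption, rescaled to 0-100
    return {
        "gender": ["man", "woman"][gender_idx],
        "score": round(probs[0] * 100, 1),
    }
```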
Finally, let's wrap the code into a file. We want to run it via the python3 AttractiveMeter.py --image_path <path-to-image> command. Let's add some code to make everything executable:
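A minimal entry point could look like the sketch below; the build_parser helper and the graceful no-argument fallback are my additions, and the block assumes the AttractiveMeter class above lives in the same file:

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Rate a face photo with CLIP")
    parser.add_argument("--image_path", help="path to the input image")
    return parser


def main(argv=None):
    args = build_parser().parse_args(argv)
    if args.image_path is None:
        build_parser().print_usage()
        return
    from PIL import Image

    meter = AttractiveMeter()  # the class implemented above
    print(meter.predict(Image.open(args.image_path)))


if __name__ == "__main__":
    main()
```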
🤖 That's it. You can test the final result via the Telegram bot
Example:
🌱 The code is available on GitHub
P.S.
As soon as this post reaches 30 claps, I will write a follow-up about wrapping all this code into an asynchronous Telegram bot.