Face recognition models are regular convolutional neural networks. These models are responsible for representing face images as vectors. We then find the distance between the vector representations of two facial photos, and we classify them as the same person if the distance is less than a threshold value. The question is how to determine that threshold value. Most resources skip this determination step. In this post, we will learn how to find the best split point for a threshold.
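To make the idea concrete, here is a minimal sketch of this verification rule. The embedding vectors and the threshold value below are hypothetical and just for illustration; finding a good threshold is the topic of this post.

import numpy as np

def find_cosine_distance(source, target):
    # cosine distance = 1 - cosine similarity
    a = np.dot(source, target)
    b = np.linalg.norm(source) * np.linalg.norm(target)
    return 1 - (a / b)

# hypothetical embeddings of two face photos
img1_embedding = np.array([0.1, 0.3, 0.4])
img2_embedding = np.array([0.1, 0.2, 0.5])

threshold = 0.30 # illustrative value

distance = find_cosine_distance(img1_embedding, img2_embedding)

if distance <= threshold:
    print("they are the same person")
else:
    print("they are different persons")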
Before reading this blog post, you should remember the common stages of a modern face recognition pipeline.
A modern face recognition pipeline consists of 4 common stages: detect, align, represent and verify. This post will mainly focus on the verification stage of the pipeline.
Fine-tuning the threshold for a face recognition pipeline in deep learning is also covered in the following video. You can either watch the vlog or follow this blog tutorial.
We will use the out-of-the-box facial recognition models in deepface. Supported models are VGG-Face, Google FaceNet, OpenFace and Facebook DeepFace, and the default is VGG-Face. We will run our tests with VGG-Face as well, but you can repeat this study for any other model.
The data set collected for the deepface unit tests will be the master data set. There are 25 facial photos of 25 people in this folder.
First, we declare an idendities dictionary (the variable name is kept as in the source code) that maps each person to the list of their facial photos.
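The full content of the dictionary is omitted in this post. A minimal sketch of the expected structure, with hypothetical person names and file names, would look like this:

idendities = {
    "person_1": ["img1.jpg", "img2.jpg", "img3.jpg"],
    "person_2": ["img4.jpg", "img5.jpg"],
    #... remaining identities
}

We can generate 38 pairs for same identities from this dictionary. These are going to be stored in the positives data frame.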
import pandas as pd

positives = []

for key, values in idendities.items():
    for i in range(0, len(values)-1):
        for j in range(i+1, len(values)):
            positive = []
            positive.append(values[i])
            positive.append(values[j])
            positives.append(positive)

positives = pd.DataFrame(positives, columns = ["file_x", "file_y"])
positives["decision"] = "Yes"
We can generate 262 pairs for different identities as well. These are going to be stored in the negatives data frame.
import itertools

samples_list = list(idendities.values())

negatives = []

for i in range(0, len(idendities) - 1):
    for j in range(i+1, len(idendities)):
        cross_product = itertools.product(samples_list[i], samples_list[j])
        cross_product = list(cross_product)

        for cross_sample in cross_product:
            negative = []
            negative.append(cross_sample[0])
            negative.append(cross_sample[1])
            negatives.append(negative)

negatives = pd.DataFrame(negatives, columns = ["file_x", "file_y"])
negatives["decision"] = "No"
Once pair generation is over, we need to merge the positives and negatives data frames.
df = pd.concat([positives, negatives]).reset_index(drop = True)

df.file_x = "dataset/" + df.file_x
df.file_y = "dataset/" + df.file_y
We now have the image pairs and their labels in the data frame. If we pass the image pairs as a Python list, the deepface framework will build the face recognition model only once. This will speed us up dramatically. Otherwise, the framework would build the same face recognition model for each image pair.
from deepface import DeepFace

instances = df[["file_x", "file_y"]].values.tolist()

resp_obj = DeepFace.verify(instances, model_name = "VGG-Face", distance_metric = "cosine")
The result of the verification function is stored in resp_obj. This response object stores the distance value for each image pair. The order of the outputs could differ from the order of the input pairs. That's why we will match each distance value to its pair based on the index in the response.
distances = []

for i in range(0, len(instances)):
    distance = round(resp_obj["pair_%s" % (i+1)]["distance"], 4)
    distances.append(distance)

df["distance"] = distances
As a baseline study, we can monitor the mean and standard deviation values for positive and negative samples. By positives I mean image pairs of the same person, and by negatives image pairs of different people.
tp_mean = round(df[df.decision == "Yes"].distance.mean(), 4)
tp_std = round(df[df.decision == "Yes"].distance.std(), 4)
fp_mean = round(df[df.decision == "No"].distance.mean(), 4)
fp_std = round(df[df.decision == "No"].distance.std(), 4)
We can distinguish the positive and negative classes based on their mean values.
Mean of true positives: 0.2263
Std of true positives: 0.0744
Mean of false positives: 0.6489
Std of false positives: 0.12
Monitoring the distributions of those two classes is interesting.
df[df.decision == "Yes"].distance.plot.kde()
df[df.decision == "No"].distance.plot.kde()
The positive class seems to have a symmetrical distribution, whereas the negative one has a negative skew.
We can obviously classify two faces as the same person if the distance is less than or equal to 0.30. Similarly, image pairs can be classified as different people if the distance is greater than or equal to 0.40.
The maximum value of the positive class is 0.3637, whereas the minimum value of the negative class is 0.3186. This means that both positive and negative samples appear between 0.3186 and 0.3637. This is the gray area. Remember the fuzzy logic days.
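These boundary values can be verified directly on the data frame with a quick sketch:

# boundaries of the gray area
print(df[df.decision == "Yes"].distance.max()) # 0.3637
print(df[df.decision == "No"].distance.min())  # 0.3186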
We should separate those two classes with a threshold value which maximizes the gain.
Remember that 2 standard deviations correspond to 95.45% confidence, and 3 standard deviations correspond to 99.73% confidence. Let's set the threshold to 2 sigma.
sigma = 2 # two standard deviations -> 95.45% confidence
threshold = round(tp_mean + sigma * tp_std, 4)
The mean of the true positives was 0.2263 and their standard deviation was 0.0744, so the 2 sigma threshold is 0.2263 + 2 × 0.0744 = 0.3751. So, we classify a pair as the same person if its distance is less than 0.3751.
As an alternative to setting the threshold to two sigma, finding the threshold with decision trees is a better method, because decision tree algorithms split the data set where the information gain is maximized.
I've used the chefboost framework because it is lightweight and the built decision trees can be read as plain if statements. Here, you can find a tutorial about the framework.
from chefboost import Chefboost as chef

config = {'algorithm': 'C4.5'}

# chefboost expects the target column to be named Decision
tmp_df = df[['distance', 'decision']].rename(columns = {'decision': 'Decision'}).copy()

model = chef.fit(tmp_df, config)
The C4.5 algorithm builds the following decision tree for the data frame we have. Since we only have distance and target columns, a simple decision stump is created, similar to the weak learners in the adaboost algorithm.
def findDecision(distance):
    if distance <= 0.3147:
        return 'Yes'
    elif distance > 0.3147:
        return 'No'
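Since the built tree is exported as a plain Python function, we can feed it a distance value directly. For instance:

print(findDecision(0.25)) # Yes - same person
print(findDecision(0.40)) # No - different persons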
So, I set the threshold to 0.3147 based on the decision tree approach.
I prefer to use decision trees to determine the threshold because I know the information gain is maximized at this split point.
Finally, we classify a pair as the same person if its distance is less than or equal to the threshold.
df["prediction"] = "No" #init idx = df[df.distance <= threshold].index df.loc[idx, 'prediction'] = 'Yes'
This is a classification task, and as you know, accuracy alone is not enough to evaluate a model. The confusion matrix, precision and recall will inform us about the model performance.
from sklearn.metrics import confusion_matrix

cm = confusion_matrix(df.decision.values, df.prediction.values)
print(cm)

tn, fp, fn, tp = cm.ravel()

recall = tp / (tp + fn)
precision = tp / (tp + fp)
accuracy = (tp + tn) / (tn + fp + fn + tp)
f1 = 2 * (precision * recall) / (precision + recall)
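To inspect these metrics, we can print them as percentages with a small sketch:

print("Precision:", 100 * precision, "%")
print("Recall:", 100 * recall, "%")
print("F1 score:", 100 * f1, "%")
print("Accuracy:", 100 * accuracy, "%")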
Threshold = 0.3751 (2 sigma)
Threshold = 0.3147 (C4.5 best split point)
F1 score: 94.44%
The decision tree approach is good at precision, whereas the 2 sigma method is good at recall. Remember the definitions of precision and recall: precision answers how many of my positive predictions are correct, and recall answers how many of the actual positives are predicted correctly.
If you are going to build a security-first application, then precision is more important, because you can be more confident when you say two photos are the same person.
I visualized the data set instances and their distances. Green connections are positives based on the threshold I defined, and red ones are negatives. Two distinct identity groups are clearly visible.
In this post, we've studied the VGG-Face face recognition model and the cosine similarity metric. We actually have other face recognition models and distance metrics as well. What if we ran the same approach for the VGG-Face, Google FaceNet, OpenFace and DeepFace face recognition models and the cosine, euclidean and euclidean_l2 distance metrics?
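Such an experiment could be sketched as nested loops over the model and metric arguments of the verify function. The model names below assume deepface's naming convention (e.g. Google FaceNet is referenced as "Facenet"):

models = ["VGG-Face", "Facenet", "OpenFace", "DeepFace"]
metrics = ["cosine", "euclidean", "euclidean_l2"]

for model_name in models:
    for distance_metric in metrics:
        resp_obj = DeepFace.verify(instances, model_name = model_name, distance_metric = distance_metric)
        #... repeat the threshold analysis above for each configuration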
So, we have covered how to determine the threshold, which plays a key role in face recognition studies. Even though I found a fine-tuned threshold for the VGG-Face model and the cosine similarity metric, this approach can be applied to the other supported models and metrics.
I pushed the source code of this study to GitHub as a notebook. You can support this study by starring the repo.
Support this blog if you like it!