@tamerthamoqa
Hello again! Your pre-trained model was trained on the unaligned VGG2 dataset, so it handles pose variation well. However, many projects pre-process the images to obtain aligned faces, which helps them increase the TAR @ FAR score for a given CNN model.
So I wonder: are you interested in testing what we can get with face alignment?
I implemented face alignment as a transform for torchvision.transforms, which let me test your pre-trained model on the raw LFW images. It obtained TAR: 0.6640±0.0389 @ FAR: 0.0010 without any training and without face stretching, which I think is promising. Unfortunately, it cannot be used with the cropped VGG2 and LFW datasets for training/testing, because the faces there are deformed/stretched (although the transform could be made to stretch the faces as well) and some face detections fail.
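To make the idea concrete, here is a rough sketch of what such a transform could look like. It is not my actual implementation: it only levels the eyes by rotation, `find_eyes` is a hypothetical landmark detector you would have to supply, and the only image method it relies on is PIL's `Image.rotate(angle, center=...)`. Any callable can be placed in `transforms.Compose`, so no torchvision base class is needed.

```python
import math


def eye_angle_degrees(left_eye, right_eye):
    """Angle of the line between the eyes, in degrees, in image coordinates.

    left_eye / right_eye are (x, y) points; "left" means the eye that
    appears on the left side of the image (smaller x).
    """
    dx = right_eye[0] - left_eye[0]
    dy = right_eye[1] - left_eye[1]
    return math.degrees(math.atan2(dy, dx))


class AlignByEyes:
    """Callable transform that rotates a PIL image so the eyes are level.

    find_eyes is a hypothetical function: image -> ((lx, ly), (rx, ry)),
    or None when no face is detected.
    """

    def __init__(self, find_eyes):
        self.find_eyes = find_eyes

    def __call__(self, img):
        eyes = self.find_eyes(img)
        if eyes is None:
            # Detection failed: return the image unchanged.
            return img
        left_eye, right_eye = eyes
        center = (
            (left_eye[0] + right_eye[0]) / 2,
            (left_eye[1] + right_eye[1]) / 2,
        )
        # PIL rotates counter-clockwise for positive angles, which lifts
        # the right eye when it sits lower than the left one.
        return img.rotate(eye_angle_degrees(left_eye, right_eye), center=center)
```

It would be used like any other transform, for example `transforms.Compose([AlignByEyes(my_detector), transforms.ToTensor()])`, where `my_detector` is again a placeholder name.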
The next thing I'm not sure about is whether we can get fewer false positives if the input faces are not stretched but keep their shape. This leads to the next question: why was the input chosen to be a square 224×224? Couldn't we change it to a rectangle (for example 208×240) to better fit the human face instead of stretching the (aligned) faces?
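As an illustration of "not stretched", here is a small sketch of how the resize could keep the aspect ratio and pad the remainder. The function name and the centered padding are my own assumptions, and the 208×240 target is just the example size from above.

```python
def fit_preserving_aspect(src_w, src_h, dst_w=208, dst_h=240):
    """Compute a resize that keeps the aspect ratio, plus the padding
    needed to reach exactly dst_w x dst_h.

    Returns ((new_w, new_h), (left, top, right, bottom)).
    """
    # Use the smaller scale factor so the image fits in both dimensions.
    scale = min(dst_w / src_w, dst_h / src_h)
    new_w = max(1, round(src_w * scale))
    new_h = max(1, round(src_h * scale))

    # Split the leftover space evenly; any odd pixel goes right/bottom.
    pad_x = dst_w - new_w
    pad_y = dst_h - new_h
    left = pad_x // 2
    top = pad_y // 2
    return (new_w, new_h), (left, top, pad_x - left, pad_y - top)


size, padding = fit_preserving_aspect(250, 250)
print(size, padding)
```

The padding tuple is ordered left, top, right, bottom, which is the order `torchvision.transforms.Pad` accepts for a 4-element sequence.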
I also see that the RGB values of the normalized tensors fall in the range [-2; 2]. Is this the best range?
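For context, the range follows directly from the per-channel normalization `(x - mean) / std` applied to values in [0, 1]. The sketch below computes the bounds for the ImageNet statistics that are commonly used with torchvision; these are an assumption on my side, and the mean/std used in this repository may differ.

```python
def normalized_range(mean, std):
    """Smallest and largest value that (x - mean) / std can take
    over all channels when x is in [0, 1]."""
    lo = min((0.0 - m) / s for m, s in zip(mean, std))
    hi = max((1.0 - m) / s for m, s in zip(mean, std))
    return lo, hi


# Assumed values: the widely used ImageNet channel statistics.
IMAGENET_MEAN = (0.485, 0.456, 0.406)
IMAGENET_STD = (0.229, 0.224, 0.225)

lo, hi = normalized_range(IMAGENET_MEAN, IMAGENET_STD)
print(f"range: [{lo:.3f}, {hi:.3f}]")
```

With a mean and std of 0.5 for every channel, the same function gives exactly [-1, 1], so the range is purely a consequence of the chosen statistics.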