
Commit 0122d25

Initial commit
0 parents  commit 0122d25

40 files changed

+1452
-0
lines changed

LICENSE

+7
@@ -0,0 +1,7 @@
Copyright (c) 2019, The MathWorks, Inc.
All rights reserved.
Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:
1. Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.
2. Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.
3. In all cases, the software is, and all modifications and derivatives of the software shall be, licensed to you solely for use in conjunction with MathWorks products and service offerings.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

README.md

+71
@@ -0,0 +1,71 @@
# Face Detection and Alignment MTCNN

## [__Download the toolbox here__](TODO)

This repository implements a deep-learning-based face detection and facial landmark localization model using [multi-task cascaded convolutional neural networks (MTCNNs)](https://kpzhang93.github.io/MTCNN_face_detection_alignment/).

- [📦 Installation](#installation)
- [🏁 Getting Started](#getting-started)
- [🔎😄 Usage](#usage)
- [❓ About](#about)
- [💬 Contribute](#contribute)

_Note: This code supports inference using a pretrained model. Training from scratch is not supported. Weights are imported from the [original MTCNN model](https://kpzhang93.github.io/MTCNN_face_detection_alignment/) trained in Caffe._

## Installation

- Face Detection and Alignment MTCNN requires the following products:
  - MATLAB R2019b or later
  - Deep Learning Toolbox
  - Computer Vision Toolbox
  - Image Processing Toolbox
- Download the [latest release](TODO_add_link) of the Face Detection and Alignment MTCNN. To install, open the .mltbx file in MATLAB.

## Getting Started

To get started using the pretrained face detector, import an image and use the `mtcnn.detectFaces` function:

```matlab
im = imread("visionteam.jpg");
[bboxes, scores, landmarks] = mtcnn.detectFaces(im);
```

This returns the bounding boxes, probabilities, and five-point facial landmarks for each face detected in the image.

![](doc/output1.jpg)

## Usage

The `detectFaces` function supports various optional arguments. For more details, type `help mtcnn.detectFaces` at the command window.

To get the best speed from the detector, first create an `mtcnn.Detector` object, then call its `detect` method on your image. Doing so loads the pretrained weights and options once, up front, instead of on every detection call:

```matlab
detector = mtcnn.Detector();
[bboxes, scores, landmarks] = detector.detect(im);
```

The detector object accepts the same optional arguments as the `mtcnn.detectFaces` function.
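
As a rough sketch of passing optional arguments, the example below builds a detector with a larger minimum face size and a stricter final-stage threshold, then overlays the results on the image. The property names come from `mtcnn.Detector`; the specific values, and the use of `insertShape` and `insertMarker` from Computer Vision Toolbox for drawing, are illustrative assumptions and not recommended settings:

```matlab
% Assumes the toolbox is installed and visionteam.jpg is on the MATLAB path.
im = imread("visionteam.jpg");

% Illustrative option values, not tuned recommendations.
detector = mtcnn.Detector("MinSize", 48, ...
    "ConfidenceThresholds", [0.6, 0.7, 0.9]);
[bboxes, scores, landmarks] = detector.detect(im);

% Draw boxes and landmarks only if something was found.
annotated = im;
if ~isempty(bboxes)
    annotated = insertShape(annotated, "Rectangle", bboxes, "LineWidth", 3);
    points = reshape(landmarks, [], 2);   % stack the nx5x2 array into [x, y] rows
    annotated = insertMarker(annotated, points, "x", "Size", 5);
end
imshow(annotated)
```

Raising `MinSize` reduces the number of pyramid levels the proposal stage has to process, which is usually the simplest way to trade small-face recall for speed.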

Refer to the MATLAB toolbox documentation or [click here](docs/gettings_started.md) for a complete example.

## About

The MTCNN face detector is fast and accurate. Evaluation on the [WIDER face benchmark](http://mmlab.ie.cuhk.edu.hk/projects/WIDERFace/WiderFace_Results.html) shows significant performance gains over non-deep-learning face detection methods. Prediction speed depends on the image dimensions, pyramid scales, and hardware (i.e. CPU or GPU). On a typical CPU, for VGA-resolution images, a frame rate of ~10 fps should be achievable.

In comparison to MATLAB's built-in `vision.CascadeObjectDetector`, the MTCNN detector is more robust to facial pose, as demonstrated in the image below.

![](doc/output2.jpg)

_Face detections from MTCNN in yellow; detections from the built-in vision.CascadeObjectDetector in teal._


## Contribute

Please file any bug reports or feature requests as [GitHub issues](TODO_add_link). In particular, comment on the following two issues if they interest you!

- [Support training MTCNN](TODO_add_link)
- [Support MATLAB versions earlier than R2019b](TODO_add_link)


_Copyright 2019 The MathWorks, Inc._
+13
@@ -0,0 +1,13 @@
function outBboxes = applyCorrection(bboxes, correction)
% applyCorrection    Apply bounding box regression correction.

%   Copyright 2019 The MathWorks, Inc.

    assert(all(bboxes(:, 3) == bboxes(:, 4)), ...
        "mtcnn:util:applyCorrection:badBox", ...
        "Correction assumes square bounding boxes.");

    scaledOffset = bboxes(:, 3).*correction;
    outBboxes = bboxes + scaledOffset;

end
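
To make the correction step concrete, here is a small worked example that repeats the two lines of arithmetic from `applyCorrection` outside the package. The box and the correction values are made up for illustration; in the detector the correction comes from the network outputs:

```matlab
% One square box in [x, y, w, h] format and a hypothetical correction for it.
bboxes = [10, 20, 40, 40];
correction = [0.25, -0.125, 0.5, 0.5];   % made-up values, fractions of the side

% Same two steps as applyCorrection: scale by the side length, then add.
scaledOffset = bboxes(:, 3) .* correction;   % [10, -5, 20, 20]
outBboxes = bboxes + scaledOffset;           % [20, 15, 60, 60]
disp(outBboxes)
```

Because every offset is a fraction of the box side, the same network output moves a large box further than a small one, which is why the function insists on square inputs.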
+39
@@ -0,0 +1,39 @@
function scales = calculateScales(im, minSize, maxSize, defaultScale, pyramidScale)
% calculateScales    Compute image scales for a given size range.
%
%   Args:
%       im              Input image
%       minSize         Minimum pixel size for detection
%       maxSize         Maximum pixel size for detection
%       defaultScale    Pixel size of proposal network
%       pyramidScale    Scale factor for image pyramid; must be
%                       greater than 1 (optional, default = sqrt(2))
%
%   Returns:
%       scales          1xn row vector of calculated pyramid scales

%   Copyright 2019 The MathWorks, Inc.

    if nargin < 5
        pyramidScale = sqrt(2);
    else
        assert(pyramidScale > 1, ...
            "mtcnn:util:calculateScales:badScale", ...
            "PyramidScale must be >1 but it was %f", pyramidScale);
    end

    imSize = size(im);

    % If maxSize is empty, assume the smallest image dimension
    if isempty(maxSize)
        maxSize = min(imSize(1:2));
    end

    % Calculate min and max scaling factors required
    minScale = minSize/defaultScale;
    maxScale = maxSize/defaultScale;

    % Step from minScale in whole powers of pyramidScale, stopping at the
    % last power that does not exceed maxScale
    alpha = floor(log(maxScale/minScale)/log(pyramidScale));
    scales = minScale.*pyramidScale.^(0:alpha);
end
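
As a worked example of the scale calculation, the sketch below repeats the same arithmetic with concrete numbers. The sizes are assumptions chosen for illustration: a 24-pixel minimum face, a 480-pixel maximum (the shorter side of a VGA frame), and the 12-pixel proposal network size from `mtcnn.Detector`:

```matlab
minSize = 24;            % smallest face to look for, in pixels
maxSize = 480;           % e.g. the shorter side of a 640x480 image
defaultScale = 12;       % PnetSize in mtcnn.Detector
pyramidScale = sqrt(2);

minScale = minSize/defaultScale;    % 2
maxScale = maxSize/defaultScale;    % 40
alpha = floor(log(maxScale/minScale)/log(pyramidScale));   % 8
scales = minScale.*pyramidScale.^(0:alpha);

% Face sizes (pixels) that each pyramid level corresponds to.
faceSizes = defaultScale.*scales;
fprintf("%d scales covering faces from %.0f to %.0f pixels\n", ...
    numel(scales), faceSizes(1), faceSizes(end));
```

Because `alpha` is rounded down, the largest level (about 384 pixels here) can fall short of `maxSize`; the pyramid never overshoots the requested range.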

code/mtcnn/+mtcnn/+util/cropImage.m

+13
@@ -0,0 +1,13 @@
function cropped = cropImage(im, bboxes, outSize)
% cropImage    Crop and resize image.

%   Copyright 2019 The MathWorks, Inc.

    nBoxes = size(bboxes, 1);
    cropped = zeros(outSize, outSize, 3, nBoxes, 'like', im);
    for iBox = 1:nBoxes
        thisBox = bboxes(iBox, :);
        cropped(:,:,:,iBox) = imresize(imcrop(im, thisBox), [outSize, outSize]);
    end

end

code/mtcnn/+mtcnn/+util/makeSquare.m

+10
@@ -0,0 +1,10 @@
function outBox = makeSquare(bboxes)
% makeSquare    Make the bounding box square.

%   Copyright 2019 The MathWorks, Inc.

    maxSide = max(bboxes(:, 3:4), [], 2);
    cx = bboxes(:, 1) + bboxes(:, 3)/2;
    cy = bboxes(:, 2) + bboxes(:, 4)/2;
    outBox = [cx - maxSide/2, cy - maxSide/2, maxSide, maxSide];
end
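
A short worked example may help show what squaring does to a non-square box. The input box is made up; the arithmetic is the same as in `makeSquare`, repeated inline so it runs without the package:

```matlab
bboxes = [10, 20, 30, 50];   % a tall box: 30 wide, 50 high

maxSide = max(bboxes(:, 3:4), [], 2);      % 50
cx = bboxes(:, 1) + bboxes(:, 3)/2;        % 25
cy = bboxes(:, 2) + bboxes(:, 4)/2;        % 45
outBox = [cx - maxSide/2, cy - maxSide/2, maxSide, maxSide];
disp(outBox)   % the box is now [0, 20, 50, 50]
```

The box grows around its centre along the shorter dimension, so the original region is always contained in the squared one.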

code/mtcnn/+mtcnn/+util/prelu.m

+8
@@ -0,0 +1,8 @@
function x = prelu(x, params)
% prelu    Parametric ReLU activation.
%
%   Positive inputs pass through unchanged; negative inputs are scaled
%   by params.

%   Copyright 2019 The MathWorks, Inc.

    x = max(0, x) + params.*min(0, x);

end
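
For a quick feel of the activation, the sketch below applies the same expression to a small vector. A single scalar slope is assumed here for simplicity; the value 0.25 is made up and does not come from the pretrained weights:

```matlab
x = [-2, -1, 0, 1, 2];
params = 0.25;   % hypothetical slope applied to negative inputs

y = max(0, x) + params.*min(0, x);
disp(y)   % y is [-0.5, -0.25, 0, 1, 2]
```

With `params` equal to 0 this reduces to a plain ReLU, and with `params` equal to 1 it becomes the identity.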

code/mtcnn/+mtcnn/Detector.m

+171
@@ -0,0 +1,171 @@
classdef Detector < matlab.mixin.SetGet
    % MTCNN Detector class
    %
    %   When creating an mtcnn.Detector object
    %   pass in any of the public properties as Name-Value pairs to
    %   configure the detector. For more details of the available
    %   options see the help for mtcnn.detectFaces.
    %
    %   See also: mtcnn.detectFaces

    % Copyright 2019 The MathWorks, Inc.

    properties
        % Approx. min size in pixels
        MinSize {mustBeGreaterThan(MinSize, 12)} = 24
        % Approx. max size in pixels
        MaxSize = []
        % Pyramid scales for region proposal
        PyramidScale = sqrt(2)
        % Confidence threshold at each stage of detection
        ConfidenceThresholds = [0.6, 0.7, 0.8]
        % Non-max suppression overlap thresholds
        NmsThresholds = [0.5, 0.5, 0.5]
        % Use GPU for processing or not
        UseGPU = false
    end

    properties (Constant)
        % Input sizes (pixels) of the networks
        PnetSize = 12
        RnetSize = 24
        OnetSize = 48
    end

    properties (SetAccess=private)
        % Weights for the networks
        PnetWeights
        RnetWeights
        OnetWeights
    end

    methods
        function obj = Detector(varargin)
            % Create an mtcnn.Detector object

            obj.loadWeights();

            if nargin > 0
                obj.set(varargin{:});
            end

            if obj.UseGPU
                obj.PnetWeights = dlupdate(@gpuArray, obj.PnetWeights);
                obj.RnetWeights = dlupdate(@gpuArray, obj.RnetWeights);
                obj.OnetWeights = dlupdate(@gpuArray, obj.OnetWeights);
            end
        end

        function [bboxes, scores, landmarks] = detect(obj, im)
            % detect    Run the detection algorithm on an image.
            %
            %   Args:
            %       im - RGB input image for detection
            %
            %   Returns:
            %       bboxes      - nx4 array of face bounding boxes in the
            %                     format [x, y, w, h]
            %       scores      - nx1 array of face probabilities
            %       landmarks   - nx5x2 array of facial landmarks
            %
            %   See also: mtcnn.detectFaces

            if obj.UseGPU
                im = gpuArray(single(im));
            end

            bboxes = [];
            scores = [];
            landmarks = [];

            %% Stage 1 - Proposal
            scales = mtcnn.util.calculateScales(im, ...
                obj.MinSize, ...
                obj.MaxSize, ...
                obj.PnetSize, ...
                obj.PyramidScale);

            for scale = scales
                [thisBox, thisScore] = ...
                    mtcnn.proposeRegions(im, scale, ...
                    obj.ConfidenceThresholds(1), ...
                    obj.PnetWeights);
                bboxes = cat(1, bboxes, thisBox);
                scores = cat(1, scores, thisScore);
            end

            if ~isempty(scores)
                [bboxes, ~] = selectStrongestBbox(gather(bboxes), scores, ...
                    "RatioType", "Min", ...
                    "OverlapThreshold", obj.NmsThresholds(1));
            else
                return % No proposals found
            end

            %% Stage 2 - Refinement
            [cropped, bboxes] = obj.prepImages(im, bboxes, obj.RnetSize);
            [probs, correction] = mtcnn.rnet(cropped, obj.RnetWeights);
            [scores, bboxes] = obj.processOutputs(probs, correction, bboxes, 2);

            if isempty(scores)
                return
            end

            %% Stage 3 - Output
            [cropped, bboxes] = obj.prepImages(im, bboxes, obj.OnetSize);

            % Adjust bboxes for the behaviour of imcrop
            bboxes(:, 1:2) = bboxes(:, 1:2) - 0.5;
            bboxes(:, 3:4) = bboxes(:, 3:4) + 1;

            [probs, correction, landmarks] = mtcnn.onet(cropped, obj.OnetWeights);

            % Landmarks are relative to the uncorrected bbox
            landmarks = extractdata(landmarks)';
            x = bboxes(:, 1) + landmarks(:, 1:5).*(bboxes(:, 3));
            y = bboxes(:, 2) + landmarks(:, 6:10).*(bboxes(:, 4));
            landmarks = cat(3, x, y);
            landmarks(extractdata(probs(2, :))' < obj.ConfidenceThresholds(3), :, :) = [];

            % Keep only the landmarks whose boxes survive non-max suppression
            [scores, bboxes, keep] = obj.processOutputs(probs, correction, bboxes, 3);
            landmarks = landmarks(keep, :, :);

            % Gather and cast the outputs
            bboxes = gather(double(bboxes));
            scores = gather(double(scores));
            landmarks = gather(double(landmarks));
        end
    end

    methods (Access=private)
        function loadWeights(obj)
            % loadWeights   Load the network weights from file.
            obj.PnetWeights = load(fullfile(mtcnnRoot(), "weights", "pnet.mat"));
            obj.RnetWeights = load(fullfile(mtcnnRoot(), "weights", "rnet.mat"));
            obj.OnetWeights = load(fullfile(mtcnnRoot(), "weights", "onet.mat"));
        end

        function [cropped, bboxes] = prepImages(obj, im, bboxes, outputSize)
            % prepImages    Pre-process the images and bounding boxes.
            bboxes = mtcnn.util.makeSquare(bboxes);
            bboxes = round(bboxes);
            cropped = mtcnn.util.cropImage(im, bboxes, outputSize);
            % Rescale pixel values from [0, 255] to [-1, 1]
            cropped = dlarray(single(cropped)./255*2 - 1, "SSCB");
        end

        function [scores, bboxes, keep] = ...
                processOutputs(obj, probs, correction, bboxes, netIdx)
            % processOutputs    Post-process the output values.
            %
            %   keep indexes the thresholded boxes that survive non-max
            %   suppression.
            faceProbs = extractdata(probs(2, :))';
            correction = extractdata(correction)';
            bboxes = mtcnn.util.applyCorrection(bboxes, correction);

            % Use one mask for both outputs so boxes and scores stay aligned
            isFace = faceProbs >= obj.ConfidenceThresholds(netIdx);
            bboxes = bboxes(isFace, :);
            scores = faceProbs(isFace);
            keep = zeros(0, 1);
            if ~isempty(scores)
                [bboxes, scores, keep] = selectStrongestBbox(gather(bboxes), scores, ...
                    "RatioType", "Min", ...
                    "OverlapThreshold", obj.NmsThresholds(netIdx));
            end
        end
    end
end
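
The landmark step in `detect` converts network outputs that are relative to each box into image coordinates. The sketch below repeats that mapping for a single box. The box and the relative landmark values are made up for illustration; the column layout (five x fractions followed by five y fractions) follows the indexing used in `detect`:

```matlab
% One face box [x, y, w, h] and hypothetical relative landmarks for it.
bboxes = [100, 50, 80, 80];
relLandmarks = [0.25, 0.75, 0.5, 0.25, 0.75, ...   % x fractions of the width
                0.25, 0.25, 0.5, 0.75, 0.75];      % y fractions of the height

x = bboxes(:, 1) + relLandmarks(:, 1:5).*bboxes(:, 3);
y = bboxes(:, 2) + relLandmarks(:, 6:10).*bboxes(:, 4);
landmarks = cat(3, x, y);   % 1x5x2: page 1 holds x, page 2 holds y

% First landmark (left eye) as an [x, y] pair.
leftEye = squeeze(landmarks(1, 1, :))';   % [120, 70]
disp(leftEye)
```

Indexing `landmarks(i, j, :)` therefore gives the x and y coordinates of landmark `j` on face `i`, in the order listed in the help for `mtcnn.detectFaces`.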

code/mtcnn/+mtcnn/detectFaces.m

+39
@@ -0,0 +1,39 @@
function [bboxes, scores, landmarks] = detectFaces(im, varargin)
% detectFaces   Use a pretrained model to detect faces in an image.
%
%   Args:
%       im - RGB input image for detection
%
%   Returns:
%       bboxes      - nx4 array of face bounding boxes in the
%                     format [x, y, w, h]
%       scores      - nx1 array of face probabilities
%       landmarks   - nx5x2 array of facial landmarks
%
%   Name-Value pairs:
%       detectFaces also takes the following optional Name-Value pairs:
%           - MinSize               - Approx. min size in pixels
%                                     (default=24)
%           - MaxSize               - Approx. max size in pixels
%                                     (default=[])
%           - PyramidScale          - Pyramid scales for region proposal
%                                     (default=sqrt(2))
%           - ConfidenceThresholds  - Confidence threshold at each stage of detection
%                                     (default=[0.6, 0.7, 0.8])
%           - NmsThresholds         - Non-max suppression overlap thresholds
%                                     (default=[0.5, 0.5, 0.5])
%           - UseGPU                - Use GPU for processing or not
%                                     (default=false)
%
%   Note:
%       The 5 detected landmarks are in the order:
%           - Left eye, right eye, nose, left mouth corner, right mouth corner
%       The final dimension (size 2) holds the x and y coordinates.
%
%   See also: mtcnn.Detector.detect

%   Copyright 2019 The MathWorks, Inc.

    % Forward any Name-Value pairs to the detector so the documented
    % options take effect
    detector = mtcnn.Detector(varargin{:});
    [bboxes, scores, landmarks] = detector.detect(im);
end
