matlab-deep-learning
diff --git a/LICENSE b/LICENSE
Lines changed: 7 additions & 0 deletions
diff --git a/README.md b/README.md
Lines changed: 71 additions & 0 deletions
diff --git a/code/mtcnn/+mtcnn/+util/applyCorrection.m b/code/mtcnn/+mtcnn/+util/applyCorrection.m
Lines changed: 13 additions & 0 deletions
diff --git a/code/mtcnn/+mtcnn/+util/calculateScales.m b/code/mtcnn/+mtcnn/+util/calculateScales.m
Lines changed: 39 additions & 0 deletions
diff --git a/code/mtcnn/+mtcnn/+util/cropImage.m b/code/mtcnn/+mtcnn/+util/cropImage.m
Lines changed: 13 additions & 0 deletions
diff --git a/code/mtcnn/+mtcnn/+util/makeSquare.m b/code/mtcnn/+mtcnn/+util/makeSquare.m
Lines changed: 10 additions & 0 deletions
diff --git a/code/mtcnn/+mtcnn/+util/prelu.m b/code/mtcnn/+mtcnn/+util/prelu.m
Lines changed: 8 additions & 0 deletions
diff --git a/code/mtcnn/+mtcnn/Detector.m b/code/mtcnn/+mtcnn/Detector.m
Lines changed: 171 additions & 0 deletions
diff --git a/code/mtcnn/+mtcnn/detectFaces.m b/code/mtcnn/+mtcnn/detectFaces.m
Lines changed: 39 additions & 0 deletions
@@ -0,0 +1,7 @@
+Copyright (c) 2019, The MathWorks, Inc.
+All rights reserved.
+Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:
+1. Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.
+2. Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.
+3. In all cases, the software is, and all modifications and derivatives of the software shall be, licensed to you solely for use in conjunction with MathWorks products and service offerings. 
+THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
@@ -0,0 +1,71 @@
+# Face Detection and Alignment MTCNN
+
+## [__Download the toolbox here__](TODO)
+
+This repository implements a deep-learning based face detection and facial landmark localization model using [multi-task cascaded convolutional neural networks (MTCNNs)](https://kpzhang93.github.io/MTCNN_face_detection_alignment/). 
+
+- [📦 Installation](#installation)
+- [🏁 Getting Started](#getting-started)
+- [🔎😄 Usage](#usage)
+- [❓ About](#about)
+- [💬 Contribute](#contribute)
+
+_Note: This code supports inference using a pretrained model. Training from scratch is not supported. Weights are imported from the [original MTCNN model](https://kpzhang93.github.io/MTCNN_face_detection_alignment/) trained in Caffe._
+
+## Installation
+
+- Face Detection and Alignment MTCNN requires the following products:
+  - MATLAB R2019b or later
+  - Deep Learning Toolbox
+  - Computer Vision Toolbox
+  - Image Processing Toolbox
+- Download the [latest release](TODO_add_link) of the Face Detection and Alignment MTCNN. To install, open the .mltbx file in MATLAB.
+
+## Getting Started
+
+To get started using the pretrained face detector, import an image and use the `mtcnn.detectFaces` function:
+
+```matlab
+im = imread("visionteam.jpg");
+[bboxes, scores, landmarks] = mtcnn.detectFaces(im);
+```
+
+This returns the bounding boxes, probabilities, and five-point facial landmarks for each face detected in the image.
+
+![](doc/output1.jpg)
+
+## Usage
+
+The `detectFaces` function supports various optional arguments. For more details, refer to the help documentation for this function by typing `help mtcnn.detectFaces` at the command window.
+
+For the best speed, first create a `mtcnn.Detector` object, then call its `detect` method on your image. Doing so loads the pretrained weights and options once, when the object is created, rather than on every call to `mtcnn.detectFaces`:
+
+```matlab
+detector = mtcnn.Detector();
+[bboxes, scores, landmarks] = detector.detect(im);
+```
+
+The detector object accepts the same optional arguments as the `mtcnn.detectFaces` function.
+
+Refer to the MATLAB toolbox documentation or [click here](docs/gettings_started.md) for a complete example.
+
+## About
+
+The MTCNN face detector is fast and accurate. Evaluation on the [WIDER face benchmark](http://mmlab.ie.cuhk.edu.hk/projects/WIDERFace/WiderFace_Results.html) shows significant performance gains over non-deep-learning face detection methods. Prediction speed depends on the image, its dimensions, the pyramid scales, and the hardware (i.e. CPU or GPU). On a typical CPU, for VGA-resolution images, a frame rate of ~10 fps should be achievable.
+
+In comparison to MATLAB's built-in `vision.CascadeObjectDetector`, the MTCNN detector is more robust to facial pose, as demonstrated in the image below.
+
+![](doc/output2.jpg)
+
+_Face detections from MTCNN in yellow, detections from the built-in vision.CascadeObjectDetector in teal._
+
+
+## Contribute
+
+Please file any bug reports or feature requests as [GitHub issues](TODO_add_link). In particular, comment on the following two issues if they interest you!
+
+- [Support training MTCNN](TODO_add_link)
+- [Support MATLAB versions earlier than R2019b](TODO_add_link)
+
+
+_Copyright 2019 The MathWorks, Inc._
@@ -0,0 +1,13 @@
+function outBboxes = applyCorrection(bboxes, correction)
+% applyCorrection  Apply bounding box regression correction.
+
+% Copyright 2019 The MathWorks, Inc.
+
+    assert(all(bboxes(:, 3) == bboxes(:, 4)), ...
+        "mtcnn:util:applyCorrection:badBox", ...
+        "Correction assumes square bounding boxes.");
+
+    scaledOffset = bboxes(:, 3).*correction;
+    outBboxes = bboxes + scaledOffset;
+    
+end
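As a worked illustration of the arithmetic in `applyCorrection`, the sketch below applies a made-up correction to a single square box. The numeric values are assumptions chosen so the result is easy to check by hand, not real network outputs, and the computation is inlined so it runs without the toolbox on the path.

```matlab
% One square box in [x, y, w, h] format (hypothetical values)
bboxes = [10, 20, 40, 40];
% Hypothetical regression output, one row per box: offsets for
% x, y, w and h, each expressed as a fraction of the box width
correction = [0.25, -0.125, 0.5, 0];

% Scale the fractional offsets by the box width (implicit expansion)
scaledOffset = bboxes(:, 3) .* correction;   % [10, -5, 20, 0]
outBboxes = bboxes + scaledOffset;           % [20, 15, 60, 40]

disp(outBboxes)
```

Note that the corrected box is generally no longer square (60 wide, 40 high here), which is consistent with `prepImages` calling `makeSquare` again before the next stage.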
@@ -0,0 +1,39 @@
+function scales = calculateScales(im, minSize, maxSize, defaultScale, pyramidScale)
+    % calculateScales Compute image scales for a given size range.
+    %
+    %   Args:
+    %       im              Input image
+    %       minSize         Minimum pixel size for detection
+    %       maxSize         Maximum pixel size for detection
+    %       defaultScale    Pixel size of proposal network
+    %       pyramidScale    Scale factor for image pyramid; must be
+    %                       greater than 1 (optional, default = sqrt(2))
+    %
+    %   Returns:
+    %       scales          1xn row vector of calculated pyramid scales
+    
+    % Copyright 2019 The MathWorks, Inc.
+    
+    if nargin < 5
+        pyramidScale = sqrt(2);
+    else
+        assert(pyramidScale > 1, ...
+            "mtcnn:util:calculateScales:badScale", ...
+            "PyramidScale must be >1 but it was %f", pyramidScale);
+    end
+    
+    imSize = size(im);
+    
+    % if maxSize is empty assume the smallest image dimension
+    if isempty(maxSize)
+        maxSize = min(imSize(1:2));
+    end
+    
+    % Calculate min and max scaling factors required
+    minScale = minSize/defaultScale;
+    maxScale = maxSize/defaultScale;
+    
+    % Number of whole pyramidScale steps that fit between minScale and maxScale
+    alpha = floor(log(maxScale/minScale)/log(pyramidScale));
+    scales = minScale.*pyramidScale.^(0:alpha);
+end
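To make the scale calculation concrete, here is a small worked example that repeats the same formulas outside the function. It assumes the detector defaults (`MinSize = 24`, `PnetSize = 12`, `PyramidScale = sqrt(2)`) and a hypothetical 640x480 image, so `maxSize` falls back to 480.

```matlab
minSize = 24;            % default MinSize of mtcnn.Detector
maxSize = 480;           % assumed: shorter side of a 640x480 image
defaultScale = 12;       % PnetSize
pyramidScale = sqrt(2);  % default PyramidScale

minScale = minSize / defaultScale;   % 2
maxScale = maxSize / defaultScale;   % 40

% Number of whole pyramid steps that fit between minScale and maxScale
alpha = floor(log(maxScale / minScale) / log(pyramidScale));   % 8
scales = minScale .* pyramidScale .^ (0:alpha);

% Pixel size targeted at each pyramid level
faceSizes = scales * defaultScale;
fprintf("%d scales, face sizes from %.0f to %.0f px\n", ...
    numel(scales), faceSizes(1), faceSizes(end));
```

Because `alpha` is rounded down, the largest targeted size here is about 384 px rather than the full 480 px; the pyramid always starts exactly at `minSize` and stops at the last whole step below `maxSize`.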
@@ -0,0 +1,13 @@
+function cropped = cropImage(im, bboxes, outSize)
+% cropImage     Crop and resize image.
+
+% Copyright 2019 The MathWorks, Inc.
+    
+    nBoxes = size(bboxes, 1);
+    cropped = zeros(outSize, outSize, 3, nBoxes, 'like', im);
+    for iBox = 1:nBoxes
+        thisBox = bboxes(iBox, :);
+        cropped(:,:,:,iBox) = imresize(imcrop(im, thisBox), [outSize, outSize]);
+    end
+    
+end
@@ -0,0 +1,10 @@
+function outBox = makeSquare(bboxes)
+% makeSquare    Make the bounding box square.
+    
+% Copyright 2019 The MathWorks, Inc.
+    
+    maxSide = max(bboxes(:, 3:4), [], 2);
+    cx = bboxes(:, 1) + bboxes(:, 3)/2;
+    cy = bboxes(:, 2) + bboxes(:, 4)/2;
+    outBox = [cx - maxSide/2, cy - maxSide/2, maxSide, maxSide];
+end
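A quick numeric check of the squaring step, with the computation inlined so it runs on its own. The input box is a made-up example that is taller than it is wide.

```matlab
bboxes = [10, 20, 30, 50];   % hypothetical box, [x, y, w, h]

maxSide = max(bboxes(:, 3:4), [], 2);   % 50, the longer side
cx = bboxes(:, 1) + bboxes(:, 3)/2;     % 25, box centre in x
cy = bboxes(:, 2) + bboxes(:, 4)/2;     % 45, box centre in y

% Grow the shorter side about the centre so the box becomes square
outBox = [cx - maxSide/2, cy - maxSide/2, maxSide, maxSide];

disp(outBox)   % values: [0, 20, 50, 50]
```

The centre is preserved while the shorter side grows, so a squared box can extend past the image border (here `x` becomes 0); the later cropping step has to cope with such boxes.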
@@ -0,0 +1,8 @@
+function x = prelu(x, params)
+% prelu   Parametric ReLU (PReLU) activation.
+
+% Copyright 2019 The MathWorks, Inc.
+
+    x = max(0, x) + params.*min(0, x);
+    
+end
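The activation can be checked by hand on a few values. In this sketch `params` is a single made-up slope; it is an assumption for illustration, and in practice it could equally be an array that broadcasts against `x` (for example one slope per channel).

```matlab
x = [-2, -1, 0, 1, 2];
params = 0.25;   % hypothetical learned slope for negative inputs

% Positive inputs pass through unchanged; negative inputs are scaled
y = max(0, x) + params .* min(0, x);

disp(y)   % values: [-0.5, -0.25, 0, 1, 2]
```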
@@ -0,0 +1,171 @@
+classdef Detector < matlab.mixin.SetGet
+    % MTCNN Detector class
+    %
+    %   When creating an mtcnn.Detector object
+    %   pass in any of the public properties as Name-Value pairs to
+    %   configure the detector. For more details of the available
+    %   options see the help for mtcnn.detectFaces.
+    %
+    %   See also: mtcnn.detectFaces
+    
+    % Copyright 2019 The MathWorks, Inc.
+    
+    properties
+        % Approx. min size in pixels
+        MinSize {mustBeGreaterThan(MinSize, 12)} = 24
+        % Approx. max size in pixels
+        MaxSize = []
+        % Pyramid scales for region proposal
+        PyramidScale = sqrt(2)
+        % Confidence threshold at each stage of detection
+        ConfidenceThresholds = [0.6, 0.7, 0.8]
+        % Non-max suppression overlap thresholds
+        NmsThresholds = [0.5, 0.5, 0.5]
+        % Use GPU for processing or not
+        UseGPU = false
+    end
+    
+    properties (Constant)
+        % Input sizes (pixels) of the networks
+        PnetSize = 12
+        RnetSize = 24
+        OnetSize = 48
+    end
+    
+    properties (SetAccess=private)
+        % Weights for the networks
+        PnetWeights
+        RnetWeights
+        OnetWeights
+    end
+    
+    methods
+        function obj = Detector(varargin)
+            % Create an mtcnn.Detector object
+            
+            obj.loadWeights();
+            
+            if nargin > 1
+                obj.set(varargin{:});
+            end
+            
+            if obj.UseGPU()
+                obj.PnetWeights = dlupdate(@gpuArray, obj.PnetWeights);
+                obj.RnetWeights = dlupdate(@gpuArray, obj.RnetWeights);
+                obj.OnetWeights = dlupdate(@gpuArray, obj.OnetWeights);
+            end
+        end
+        
+        function [bboxes, scores, landmarks] = detect(obj, im)
+            % detect    Run the detection algorithm on an image.
+            % 
+            %   Args:
+            %       im  - RGB input image for detection
+            %
+            %   Returns:
+            %       bboxes      - nx4 array of face bounding boxes in the
+            %                   format [x, y, w, h]
+            %       scores      - nx1 array of face probabilities
+            %       landmarks   - nx5x2 array of facial landmarks
+            %
+            %   See also: mtcnn.detectFaces
+            
+            if obj.UseGPU()
+                im = gpuArray(single(im));
+            end
+            
+            bboxes = [];
+            scores = [];
+            landmarks = [];
+            
+            %% Stage 1 - Proposal
+            scales = mtcnn.util.calculateScales(im, ...
+                                                obj.MinSize, ...
+                                                obj.MaxSize, ...
+                                                obj.PnetSize, ...
+                                                obj.PyramidScale);
+            
+            for scale = scales
+                [thisBox, thisScore] = ...
+                    mtcnn.proposeRegions(im, scale, ...
+                                            obj.ConfidenceThresholds(1), ...
+                                            obj.PnetWeights);
+                bboxes = cat(1, bboxes, thisBox);
+                scores = cat(1, scores, thisScore);
+            end
+            
+            if ~isempty(scores)
+                [bboxes, ~] = selectStrongestBbox(gather(bboxes), scores, ...
+                    "RatioType", "Min", ...
+                    "OverlapThreshold", obj.NmsThresholds(1));
+            else
+                return % No proposals found
+            end
+            
+            %% Stage 2 - Refinement
+            [cropped, bboxes] = obj.prepImages(im, bboxes, obj.RnetSize);
+            [probs, correction] = mtcnn.rnet(cropped, obj.RnetWeights);
+            [scores, bboxes] = obj.processOutputs(probs, correction, bboxes, 2);
+            
+            if isempty(scores)
+                return
+            end
+            
+            %% Stage 3 - Output
+            [cropped, bboxes] = obj.prepImages(im, bboxes, obj.OnetSize);
+            
+            % Adjust bboxes for the behaviour of imcrop
+            bboxes(:, 1:2) = bboxes(:, 1:2) - 0.5;
+            bboxes(:, 3:4) = bboxes(:, 3:4) + 1;
+            
+            [probs, correction, landmarks] = mtcnn.onet(cropped, obj.OnetWeights);
+            
+            % landmarks are relative to uncorrected bbox
+            landmarks = extractdata(landmarks)';
+            x = bboxes(:, 1) + landmarks(:, 1:5).*(bboxes(:, 3));
+            y = bboxes(:, 2) + landmarks(:, 6:10).*(bboxes(:, 4));
+            landmarks = cat(3, x, y);
+            landmarks(extractdata(probs(2, :))' < obj.ConfidenceThresholds(3), :, :) = [];
+            
+            [scores, bboxes] = obj.processOutputs(probs, correction, bboxes, 3);
+            
+            % Gather and cast the outputs
+            bboxes = gather(double(bboxes));
+            scores = gather(double(scores));
+            landmarks = gather(double(landmarks));
+        end
+    end
+    
+    methods (Access=private)
+        function loadWeights(obj)
+            % loadWeights   Load the network weights from file.
+            obj.PnetWeights = load(fullfile(mtcnnRoot(), "weights", "pnet.mat"));
+            obj.RnetWeights = load(fullfile(mtcnnRoot(), "weights", "rnet.mat"));
+            obj.OnetWeights = load(fullfile(mtcnnRoot(), "weights", "onet.mat"));
+        end
+        
+        function [cropped, bboxes] = prepImages(obj, im, bboxes, outputSize)
+            % prepImages    Pre-process the images and bounding boxes.
+            bboxes = mtcnn.util.makeSquare(bboxes);
+            bboxes = round(bboxes);
+            cropped = mtcnn.util.cropImage(im, bboxes, outputSize);
+            cropped = dlarray(single(cropped)./255*2 - 1, "SSCB");
+            
+        end
+        
+        function [scores, bboxes] = ...
+                processOutputs(obj, probs, correction, bboxes, netIdx)
+            % processOutputs    Post-process the output values.
+            faceProbs = extractdata(probs(2, :))';
+            correction = extractdata(correction)';
+            bboxes = mtcnn.util.applyCorrection(bboxes, correction);
+            bboxes(faceProbs < obj.ConfidenceThresholds(netIdx), :) = [];
+            scores = faceProbs(faceProbs >= obj.ConfidenceThresholds(netIdx)); % >= keeps scores aligned with bboxes
+            if ~isempty(scores)
+                [bboxes, scores] = selectStrongestBbox(gather(bboxes), scores, ... % keep only the scores that survive NMS
+                                "RatioType", "Min", ...
+                                "OverlapThreshold", obj.NmsThresholds(netIdx));
+            end
+        end
+    end
+end
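The thresholding step in `processOutputs` filters the boxes and the scores with two separate comparisons. A minimal sketch of one alternative is shown below: a single logical mask indexes both arrays, so they stay the same length even when a probability is exactly equal to the threshold. All values are made up for illustration, and this is not presented as the toolbox's implementation.

```matlab
faceProbs = [0.95; 0.70; 0.40];   % hypothetical face probabilities
bboxes = [10, 10, 24, 24;
          50, 40, 24, 24;
          90, 70, 24, 24];        % hypothetical boxes, [x, y, w, h]
threshold = 0.7;

% One mask shared by both arrays keeps rows and scores in step
keep = faceProbs >= threshold;
bboxes = bboxes(keep, :);
scores = faceProbs(keep);

fprintf("%d boxes and %d scores kept\n", size(bboxes, 1), numel(scores));
```

The same mask could also be used to index any other per-detection array, such as the landmarks, before non-max suppression is applied.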
@@ -0,0 +1,39 @@
+function [bboxes, scores, landmarks] = detectFaces(im, varargin)
+% detectFaces   Use a pretrained model to detect faces in an image.
+%
+%   Args:
+%       im  - RGB input image for detection
+%
+%   Returns:
+%       bboxes      - nx4 array of face bounding boxes in the
+%                   format [x, y, w, h]
+%       scores      - nx1 array of face probabilities
+%       landmarks   - nx5x2 array of facial landmarks
+%
+%   Name-Value pairs:
+%       detectFaces also takes the following optional Name-Value pairs:
+%            - MinSize              - Approx. min size in pixels
+%                                     (default=24)
+%            - MaxSize              - Approx. max size in pixels
+%                                     (default=[])
+%            - PyramidScale         - Pyramid scales for region proposal
+%                                     (default=sqrt(2))
+%            - ConfidenceThresholds - Confidence threshold at each stage of detection
+%                                     (default=[0.6, 0.7, 0.8])
+%            - NmsThresholds        - Non-max suppression overlap thresholds
+%                                     (default=[0.5, 0.5, 0.5])
+%            - UseGPU               - Use GPU for processing or not
+%                                     (default=false)
+%
+%   Note:
+%       The 5 detected landmarks are in the order:
+%           - Left eye, right eye, nose, left mouth corner, right mouth corner
+%       The final dimension (size 2) holds the x and y coordinates.
+%
+%   See also: mtcnn.Detector.detect
+
+% Copyright 2019 The MathWorks, Inc.
+
+    detector = mtcnn.Detector(varargin{:}); % forward any Name-Value pairs
+    [bboxes, scores, landmarks] = detector.detect(im);
+end
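The `nx5x2` landmark layout described in the help text can be illustrated with the same mapping that `Detector.detect` uses to convert box-relative landmarks to image coordinates. The box and the relative landmark values below are invented for the example; only the indexing pattern is taken from the code above.

```matlab
bbox = [100, 50, 80, 80];   % hypothetical face box, [x, y, w, h]
% Hypothetical box-relative landmarks: five x values, then five y values
rel = [0.25, 0.75, 0.5, 0.375, 0.625, ...
       0.375, 0.375, 0.5, 0.75, 0.75];

% Map from box-relative to image coordinates
x = bbox(:, 1) + rel(:, 1:5)  .* bbox(:, 3);
y = bbox(:, 2) + rel(:, 6:10) .* bbox(:, 4);
landmarks = cat(3, x, y);   % size 1x5x2: face, landmark, [x y]

% Landmark 3 is the nose; pull out its [x, y] for the first face
nose = squeeze(landmarks(1, 3, :))';
disp(nose)   % values: [140, 90]
```

Indexing as `landmarks(iFace, iLandmark, :)` therefore returns the x and y coordinates of one landmark, in the order left eye, right eye, nose, left mouth corner, right mouth corner.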