iOS Black Technology (AVFoundation) Dynamic Face Recognition (II)

Keywords: Session iOS Mobile OpenCV


In the previous article, we covered static face recognition implemented with Core Image. Here we introduce dynamic face recognition, one of the powerful features of AVFoundation.

I. Some Approaches to Face Recognition

1. Core Image: static face recognition for photos, images, and so on

  • Details can be found in the previous blog post

2. Face++

  • A computer vision service platform from Beijing Megvii Technology Co., Ltd., which aims to provide simple, powerful, and versatile vision services.
  • Face++ is a new-generation cloud vision service platform, providing a set of world-leading visual technology services for face detection, face recognition, and face analysis.
  • Introduction to Face++ on Baidu Encyclopedia
  • Face++ official website

3. OpenCV

  • It is composed of a series of C functions and a few C++ classes, and implements many general-purpose algorithms in image processing and computer vision. I'm not very familiar with it beyond that.
  • This description comes from Baidu Encyclopedia.

4. Vision

  • Vision is Apple's image recognition framework built on Core ML, introduced at WWDC 2017 with iOS 11.
  • According to the official Vision documentation, Vision itself provides face detection and recognition, machine-learning image analysis, barcode detection, text detection, and more.
  • Interested readers can consult the documentation to learn more; I won't go into detail here.

5. AVFoundation

  • A framework for using and creating time-based audiovisual media
  • Here we use the AVFoundation framework for face recognition.

II. A Brief Introduction to Key Classes

1. AVCaptureDevice: Represents hardware devices

  • Through this class we can access the phone's hardware, such as the camera and the microphone.
  • When we need to change the properties of a hardware device in the app (e.g. switching the camera, changing the flash mode, changing the camera's focus), we must lock the device first and unlock it after the modification is complete (see the sketch after the snippet below).
  • Example: switching the camera by reconfiguring the session
//4. Remove the old input and add the new input
//4.1 Begin the session configuration
session.beginConfiguration()
//4.2 Remove the old input
session.removeInput(deviceIn)
//4.3 Add the new input
session.addInput(newVideoInput)
//4.4 Commit the session configuration
session.commitConfiguration()
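
Strictly speaking, beginConfiguration()/commitConfiguration() reconfigure the session; locking an individual device is done with lockForConfiguration()/unlockForConfiguration(). Here is a minimal sketch of device locking, using torch mode as an assumed example (not part of the demo):

//Sketch: changing a device-level property requires locking the device first
func setTorch(_ mode: AVCaptureDevice.TorchMode, on device: AVCaptureDevice) {
    guard device.hasTorch, device.isTorchModeSupported(mode) else { return }
    do {
        try device.lockForConfiguration()   //lock the device
        device.torchMode = mode             //modify the hardware property
        device.unlockForConfiguration()     //unlock when done
    } catch {
        print("Failed to lock device: \(error)")
    }
}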

2. AVCaptureDeviceInput: Device Input Data Management Object

  • The corresponding AVCaptureDeviceInput object is created from an AVCaptureDevice.
  • This object is added to an AVCaptureSession, which manages it. It represents the input device and configures the ports of the abstract hardware device. Common input devices are microphones, cameras, etc.

3. AVCaptureOutput: Represents output data

  • The output can be still images (AVCaptureStillImageOutput), movie files (AVCaptureMovieFileOutput), and so on

4. AVCaptureSession: Media (Audio, Video) Capture Session

  • Responsible for coordinating the capture of audio and video data and delivering it to the output objects.
  • An AVCaptureSession can have multiple inputs and outputs.
  • It is the bridge connecting AVCaptureInput and AVCaptureOutput, coordinating the transfer of data between input and output.
  • It has two methods, startRunning() and stopRunning(), to start and stop a session.
  • If you need to change the session's configuration while the app is running (for example, switching the camera), you must call beginConfiguration() first and commitConfiguration() after the changes are complete, as sketched below.
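
A minimal sketch of the session lifecycle (startRunning() and stopRunning() are blocking calls, so they are usually dispatched off the main thread):

let session = AVCaptureSession()
//...create and add inputs and outputs here...

//Start capturing (blocking call, keep it off the main thread)
DispatchQueue.global().async {
    session.startRunning()
}

//Reconfigure atomically while running, e.g. when switching cameras
session.beginConfiguration()
//...swap inputs or outputs here...
session.commitConfiguration()

//Stop capturing when finished
DispatchQueue.global().async {
    session.stopRunning()
}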

5. AVCaptureVideoPreviewLayer: Preview Layer

  • How do the captured photos and videos get displayed on the phone? By adding this object as a sublayer of a UIView's layer

Okay, enough background. So how do we actually implement face recognition? Here comes the practical part.

III. Adding the Scanning Device

  • Get the device (camera)
  • Create the device input
  • Create the scanning output
  • Create the capture session

1. The output object

  • With AVCaptureMetadataOutput we can scan faces, QR codes, barcodes, and other information.
  • The delegate must be set, otherwise the scan results cannot be obtained.
  • You need to specify what kind of data to output: face, qr, etc.
//3. Create the metadata output object
let metadataOutput = AVCaptureMetadataOutput()

//4. Set the delegate to listen for the output object's data, delivered on the main queue.
metadataOutput.setMetadataObjectsDelegate(self, queue: DispatchQueue.main)

//7. Tell the output object what kind of data to output. For faces, up to 10 faces can be recognized.
metadataOutput.metadataObjectTypes = [.face]

The main code is as follows:

fileprivate func addScaningVideo(){
    //1. Get the input device (camera)
    guard let device = AVCaptureDevice.default(for: .video) else { return }

    //2. Create the input object from the device
    guard let deviceIn = try? AVCaptureDeviceInput(device: device) else { return }
    deviceInput = deviceIn

    //3. Create the metadata output object
    let metadataOutput = AVCaptureMetadataOutput()

    //4. Set the delegate to listen for the output object's data, delivered on the main queue.
    metadataOutput.setMetadataObjectsDelegate(self, queue: DispatchQueue.main)
    //4.2 Set the output delegate
    faceDelegate = previewView

    //5. Set the output quality (high resolution)
    session.sessionPreset = .high

    //6. Add input and output to session
    if session.canAddInput(deviceInput!) {
        session.addInput(deviceInput!)
    }
    if session.canAddOutput(metadataOutput) {
        session.addOutput(metadataOutput)
    }

    //7. Tell the output object what kind of data to output. For faces, up to 10 faces can be recognized.
    metadataOutput.metadataObjectTypes = [.face]

    //8. Create a preview layer
    previewLayer = AVCaptureVideoPreviewLayer(session: session)
    previewLayer.videoGravity = .resizeAspectFill
    previewLayer.frame = view.bounds
    previewView.layer.insertSublayer(previewLayer, at: 0)

    //9. Set the effective scan area (defaults to the whole screen). rectOfInterest takes
    //   normalized values (0-1) in the metadata output's coordinate space, so convert
    //   from view coordinates via the preview layer instead of assigning bounds directly.
    metadataOutput.rectOfInterest = previewLayer.metadataOutputRectConverted(fromLayerRect: previewView.bounds)

    //10. Start scanning
    if !session.isRunning {
        DispatchQueue.global().async {
            self.session.startRunning()
        }
    }
}
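
One practical note: camera access requires the NSCameraUsageDescription key in Info.plist, and it is worth checking authorization before configuring the session. A minimal sketch (the call site is hypothetical):

//Sketch: request camera permission before setting up the session
AVCaptureDevice.requestAccess(for: .video) { granted in
    guard granted else { return }
    DispatchQueue.main.async {
        self.addScaningVideo()   //hypothetical call site, e.g. from viewDidLoad
    }
}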

2. Switching Camera

  • Get the current camera position
  • Create a new input
  • Remove the old input and add the new input
  • The code is as follows:
@IBAction func switchCameraAction(_ sender: Any) {
    //1. Flip transition animation ("oglFlip" is an undocumented but commonly used transition type)
    let anima = CATransition()
    anima.type = CATransitionType(rawValue: "oglFlip")   //Swift 4.2+ requires the CATransitionType wrapper
    anima.subtype = .fromLeft
    anima.duration = 0.5
    view.layer.add(anima, forKey: nil)

    //2. Get the current camera
    guard let deviceIn = deviceInput else { return }
    let position: AVCaptureDevice.Position = deviceIn.device.position == .back ? .front : .back

    //3. Create a new input
    let deviceSession = AVCaptureDevice.DiscoverySession(deviceTypes: [.builtInWideAngleCamera], mediaType: .video, position: position)
    guard let newDevice = deviceSession.devices.filter({ $0.position == position }).first else { return }
    guard let newVideoInput = try? AVCaptureDeviceInput(device: newDevice) else { return }

    //4. Remove the old input and add the new input
    //4.1 Begin the session configuration
    session.beginConfiguration()
    //4.2 Remove the old input
    session.removeInput(deviceIn)
    //4.3 Add the new input
    session.addInput(newVideoInput)
    //4.4 Commit the session configuration
    session.commitConfiguration()

    //5. Save the latest input
    deviceInput = newVideoInput
}

3. Processing scanning results

Implement the protocol method of AVCaptureMetadataOutputObjectsDelegate:

//`metadataObjects` is the array of scan results returned
optional public func metadataOutput(_ output: AVCaptureMetadataOutput, didOutput metadataObjects: [AVMetadataObject], from connection: AVCaptureConnection)
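
A minimal sketch of what an implementation might look like. The conforming type `PreviewView` is an assumption for illustration (in the demo, the preview view acts as the delegate via `faceDelegate = previewView`):

extension PreviewView: AVCaptureMetadataOutputObjectsDelegate {
    func metadataOutput(_ output: AVCaptureMetadataOutput,
                        didOutput metadataObjects: [AVMetadataObject],
                        from connection: AVCaptureConnection) {
        //Keep only face objects; other metadata types are ignored
        let faces = metadataObjects.compactMap { $0 as? AVMetadataFaceObject }
        print("Detected \(faces.count) face(s)")
        //Hand the raw objects to the coordinate conversion in section 5.1 below
    }
}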

4. Introduction to AVMetadataFaceObject

  • faceID: the unique identifier of the face
    • Each person scanned has a different faceID.
    • The same person in different states (shaking the head, tilting the head, raising the head, etc.) will get different faceIDs.
  • hasRollAngle: whether a roll angle is available (head tilted left or right) (Bool)
  • rollAngle: the roll angle (CGFloat)
  • hasYawAngle: whether a yaw angle is available (head turned left or right) (Bool)
  • yawAngle: the yaw angle (CGFloat)
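
For illustration, a minimal sketch of reading these properties from a scanned face (the angles are reported in degrees, as the conversion helper in section 5.2 suggests):

if let face = metadataObjects.first as? AVMetadataFaceObject {
    print("faceID: \(face.faceID)")
    if face.hasRollAngle { print("roll: \(face.rollAngle)") }   //head tilt, in degrees
    if face.hasYawAngle  { print("yaw: \(face.yawAngle)")  }    //head turn, in degrees
}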

5. Processing scanning results

5.1 Getting the Face Array in the Preview Layer's Coordinates

  • Traverse the scanned face array and convert it into an array of faces in the preview layer's coordinate space.
  • This is mainly a conversion of each face's coordinates into the layer's coordinate system.
  • Return the converted new array.
fileprivate func transformedFaces(faceObjs: [AVMetadataObject]) -> [AVMetadataObject] {
    var faceArr = [AVMetadataObject]()
    for face in faceObjs {
        //Convert the scanned face object to a face object in the preview layer (mainly a coordinate conversion)
        if let transFace = previewLayer.transformedMetadataObject(for: face){
            faceArr.append(transFace)
        }
    }
    return faceArr
}
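
Usage inside the delegate callback would look something like this (a sketch):

//Inside metadataOutput(_:didOutput:from:)
let faces = transformedFaces(faceObjs: metadataObjects)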

5.2 Add a Red Box at the Face Position

  • Set the frame of the red box
faceLayer?.frame = face.bounds
  • Build a CATransform3D from the yaw and roll angles
    fileprivate func transformDegress(yawAngle: CGFloat) -> CATransform3D {
        let yaw = degreesToRadians(degress: yawAngle)
        //rotate around the Y axis
        let yawTran = CATransform3DMakeRotation(yaw, 0, -1, 0)
        //concatenate to correct the red box's rotation
        return CATransform3DConcat(yawTran, CATransform3DIdentity)
    }

    //Handle the roll angle (head tilt)
    fileprivate func transformDegress(rollAngle: CGFloat) -> CATransform3D {
        let roll = degreesToRadians(degress: rollAngle)
        //rotate around the Z axis
        return CATransform3DMakeRotation(roll, 0, 0, 1)
    }

    //Convert degrees to radians
    fileprivate func degreesToRadians(degress: CGFloat) -> CGFloat{
        return degress * CGFloat(Double.pi) / 180
    }
  • Rotate the red box according to the yaw and roll angles.
//3.4 Apply the yaw angle (head turned left or right)
if face.hasYawAngle{
    let transform3D = transformDegress(yawAngle: face.yawAngle)

    //Matrix concatenation
    faceLayer?.transform = CATransform3DConcat(faceLayer!.transform, transform3D)
}

//3.5 Apply the roll angle (head tilted left or right)
if face.hasRollAngle{
    let transform3D = transformDegress(rollAngle: face.rollAngle)

    //Matrix concatenation
    faceLayer?.transform = CATransform3DConcat(faceLayer!.transform, transform3D)
}
  • At this point, dynamic face recognition is complete: a red box is added at each face's position and adjusted dynamically, in real time, as the face moves. A consolidated sketch of the per-face update follows below.
  • Now pick up your phone and test it.
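
For reference, here is a consolidated sketch of updating the red box for one face, combining the pieces above. The lazy creation of `faceLayer` and the `updateFaceLayer(for:)` name are assumptions for illustration; the demo's actual logic may differ.

fileprivate func updateFaceLayer(for face: AVMetadataFaceObject) {
    //Lazily create a red-bordered layer on first use (assumed setup)
    if faceLayer == nil {
        let layer = CALayer()
        layer.borderColor = UIColor.red.cgColor
        layer.borderWidth = 2
        previewView.layer.addSublayer(layer)
        faceLayer = layer
    }
    //Reset the transform before setting the frame, then re-apply the angles
    faceLayer?.transform = CATransform3DIdentity
    faceLayer?.frame = face.bounds
    if face.hasYawAngle {
        faceLayer?.transform = CATransform3DConcat(faceLayer!.transform, transformDegress(yawAngle: face.yawAngle))
    }
    if face.hasRollAngle {
        faceLayer?.transform = CATransform3DConcat(faceLayer!.transform, transformDegress(rollAngle: face.rollAngle))
    }
}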

GitHub–Demo address

  • Note:
    • Only the core code is listed here; see the demo for the full logic.
    • If anything in this article is not detailed enough, or you have better suggestions, feel free to contact me.

Posted by upperbid on Sun, 19 May 2019 17:13:08 -0700