iOS Black Tech (AVFoundation): Dynamic Face Recognition (II)
In the previous article we covered static face recognition implemented with Core Image. Here we introduce dynamic face recognition, one of the powerful features of AVFoundation.
I. A Few Approaches to Face Recognition
1. Core Image: static face recognition (photos, still images, etc.)
- Details can be found in the previous blog post.
2. Face++
- A vision-service platform from Megvii (Beijing Kuangshi Technology Co., Ltd.), which aims to provide a simple, powerful, and general-purpose vision platform.
- Face++ is a new-generation cloud vision platform, providing world-leading services for face detection, face recognition, and face analysis.
- Face++ on Baidu Encyclopedia
- Face++ official website
3. OpenCV
- It consists of a series of C functions and a small number of C++ classes, implementing many general-purpose algorithms for image processing and computer vision. (Beyond that, the author is not very familiar with it.)
- The summary above is taken from Baidu Encyclopedia.
4. Vision
- Vision is Apple's image-recognition framework built on Core ML, introduced with iOS 11 at WWDC 2017.
- According to the official Vision documentation, the framework provides face detection and recognition, machine-learning image analysis, barcode detection, text detection, and more.
- Interested readers can consult the documentation to learn more; this article will not go into further detail.
5. AVFoundation
- A framework for working with and creating time-based audiovisual media.
- Here we use the AVFoundation framework for face recognition.
II. A Brief Introduction to Key Classes
1. AVCaptureDevice: Represents hardware devices
- From this class we can obtain the phone's hardware devices: the camera, the microphone, and so on.
- When the app needs to change a property of a hardware device (e.g. switching the flash mode or changing the camera's focus), it must lock the device first and unlock it after the change is complete (a device-locking sketch follows the example below).
- Example: switching cameras (note this wraps the change in a session configuration block rather than locking the device):
//4. Remove the old input and add the new input
//4.1 Begin the configuration block
session.beginConfiguration()
//4.2 Remove the old input
session.removeInput(deviceIn)
//4.3 Add the new input
session.addInput(newVideoInput)
//4.4 Commit the configuration
session.commitConfiguration()
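beginConfiguration()/commitConfiguration() locks the session's configuration; changing a property on the device itself (flash, torch, focus, etc.) instead uses lockForConfiguration()/unlockForConfiguration(). A minimal sketch (setTorch is a hypothetical helper, not part of the demo):
fileprivate func setTorch(_ mode: AVCaptureDevice.TorchMode, on device: AVCaptureDevice) {
    //Only proceed if the hardware supports the requested torch mode
    guard device.hasTorch, device.isTorchModeSupported(mode) else { return }
    do {
        //Lock the device before changing its properties
        try device.lockForConfiguration()
        device.torchMode = mode
        //Unlock once the change is complete
        device.unlockForConfiguration()
    } catch {
        print("Failed to lock the device: \(error)")
    }
}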
2. AVCaptureDeviceInput: manages the input data from a device
- An AVCaptureDeviceInput object is created from an AVCaptureDevice.
- The object is added to an AVCaptureSession, which manages it. It represents an input device and configures the ports of the abstract hardware device. Common input devices include microphones and cameras.
3. AVCaptureOutput: Represents output data
- The output can be still images (AVCaptureStillImageOutput, deprecated since iOS 10 in favor of AVCapturePhotoOutput) or video files (AVCaptureMovieFileOutput).
4. AVCaptureSession: Media (Audio, Video) Capture Session
- Responsible for capturing audio and video data and delivering it to the outputs.
- A single AVCaptureSession can have multiple inputs and outputs.
- It is the bridge connecting AVCaptureInput and AVCaptureOutput, coordinating the flow of data between them.
- It provides startRunning() and stopRunning() to start and end a session.
- If you need to change the session's configuration while the app is running (for example, switching cameras), wrap the change between beginConfiguration() and commitConfiguration().
5. AVCaptureVideoPreviewLayer: the preview layer
- How do our photos and videos get displayed on the phone? By adding this layer to a UIView's layer tree.
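Putting the five classes together, a minimal capture pipeline looks like this (an illustrative sketch, not the article's demo code):
import AVFoundation
import UIKit

final class MinimalCaptureController: UIViewController {
    let session = AVCaptureSession()

    override func viewDidLoad() {
        super.viewDidLoad()
        //Device -> input
        guard let device = AVCaptureDevice.default(for: .video),
              let input = try? AVCaptureDeviceInput(device: device) else { return }
        if session.canAddInput(input) { session.addInput(input) }
        //Output (a metadata output, the same kind the face scanning below uses)
        let output = AVCaptureMetadataOutput()
        if session.canAddOutput(output) { session.addOutput(output) }
        //Preview layer displays the camera feed in the view
        let preview = AVCaptureVideoPreviewLayer(session: session)
        preview.videoGravity = .resizeAspectFill
        preview.frame = view.bounds
        view.layer.insertSublayer(preview, at: 0)
        //Start the session off the main thread
        DispatchQueue.global().async { self.session.startRunning() }
    }
}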
Okay, enough background. How do we actually implement face recognition? Here comes the good stuff.
III. Adding the Scanning Device
- Get the device (camera)
- Create the device input
- Create the scanning output
- Create the capture callback
1. The scanning output
- With AVCaptureMetadataOutput we can scan faces, QR codes, barcodes, and other metadata.
- A delegate must be set, otherwise the scan results cannot be received.
- You also need to specify which metadata types to output: .face, .qr, and so on.
//3. Create the metadata output object
let metadataOutput = AVCaptureMetadataOutput()
//4. Set the delegate that listens for the output's data, delivering callbacks on the main queue
metadataOutput.setMetadataObjectsDelegate(self, queue: DispatchQueue.main)
//7. Tell the output object which metadata to emit; face recognition supports up to 10 faces
metadataOutput.metadataObjectTypes = [.face]
The main code is as follows:
fileprivate func addScaningVideo(){
    //1. Get the input device (the camera)
    guard let device = AVCaptureDevice.default(for: .video) else { return }
    //2. Create the input object from the device
    guard let deviceIn = try? AVCaptureDeviceInput(device: device) else { return }
    deviceInput = deviceIn
    //3. Create the metadata output object
    let metadataOutput = AVCaptureMetadataOutput()
    //4. Set the delegate that listens for the output's data, delivering callbacks on the main queue
    metadataOutput.setMetadataObjectsDelegate(self, queue: DispatchQueue.main)
    //4.2 Set the custom output delegate
    faceDelegate = previewView
    //5. Set the output quality (high-resolution output)
    session.sessionPreset = .high
    //6. Add the input and output to the session
    if session.canAddInput(deviceInput!) {
        session.addInput(deviceInput!)
    }
    if session.canAddOutput(metadataOutput) {
        session.addOutput(metadataOutput)
    }
    //7. Tell the output object which metadata to emit; face recognition supports up to 10 faces
    metadataOutput.metadataObjectTypes = [.face]
    //8. Create the preview layer
    previewLayer = AVCaptureVideoPreviewLayer(session: session)
    previewLayer.videoGravity = .resizeAspectFill
    previewLayer.frame = view.bounds
    previewView.layer.insertSublayer(previewLayer, at: 0)
    //9. Set the effective scanning area (defaults to the whole frame). rectOfInterest expects
    //   normalized 0-1 coordinates in the metadata output's space, not view points, so convert
    //   from layer coordinates instead of assigning bounds directly (for a full-frame scan
    //   area, leaving the default is equivalent).
    metadataOutput.rectOfInterest = previewLayer.metadataOutputRectConverted(fromLayerRect: previewView.bounds)
    //10. Start scanning
    if !session.isRunning {
        DispatchQueue.global().async {
            self.session.startRunning()
        }
    }
}
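One prerequisite the code above assumes: camera permission. The app's Info.plist needs an NSCameraUsageDescription entry, and it is good practice to request access before starting the session. A minimal sketch:
//Request camera access before calling the setup method above
AVCaptureDevice.requestAccess(for: .video) { granted in
    guard granted else { return } //the user denied camera access
    DispatchQueue.main.async {
        self.addScaningVideo()
    }
}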
2. Switching Cameras
- Get the current camera's position
- Create the new input
- Remove the old input and add the new input inside a configuration block
- The code is as follows (note that "oglFlip" is not one of the documented CATransitionType values, so the flip animation relies on undocumented behavior):
@IBAction func switchCameraAction(_ sender: Any) {
    //1. Add a flip transition animation
    let anima = CATransition()
    anima.type = "oglFlip"
    anima.subtype = "fromLeft"
    anima.duration = 0.5
    view.layer.add(anima, forKey: nil)
    //2. Get the current camera and compute the opposite position
    guard let deviceIn = deviceInput else { return }
    let position: AVCaptureDevice.Position = deviceIn.device.position == .back ? .front : .back
    //3. Create the new input
    let deviceSession = AVCaptureDevice.DiscoverySession(deviceTypes: [.builtInWideAngleCamera], mediaType: .video, position: position)
    guard let newDevice = deviceSession.devices.filter({ $0.position == position }).first else { return }
    guard let newVideoInput = try? AVCaptureDeviceInput(device: newDevice) else { return }
    //4. Remove the old input and add the new input
    //4.1 Begin the configuration block
    session.beginConfiguration()
    //4.2 Remove the old input
    session.removeInput(deviceIn)
    //4.3 Add the new input
    session.addInput(newVideoInput)
    //4.4 Commit the configuration
    session.commitConfiguration()
    //5. Save the new input
    deviceInput = newVideoInput
}
3. Processing the Scan Results
Implement the protocol method of AVCaptureMetadataOutputObjectsDelegate:
//`metadataObjects` is the array of scan results
optional public func metadataOutput(_ output: AVCaptureMetadataOutput, didOutput metadataObjects: [AVMetadataObject], from connection: AVCaptureConnection)
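A minimal sketch of what an implementation might look like, assuming the transformedFaces(faceObjs:) helper from section 5.1 below and a faceLayer property for the red box (see 5.2); the demo's actual logic handles multiple faces:
func metadataOutput(_ output: AVCaptureMetadataOutput, didOutput metadataObjects: [AVMetadataObject], from connection: AVCaptureConnection) {
    //Convert the scanned faces into preview-layer coordinates
    let faces = transformedFaces(faceObjs: metadataObjects)
    //Hide the red box when no face is on screen
    guard let face = faces.first as? AVMetadataFaceObject else {
        faceLayer?.isHidden = true
        return
    }
    faceLayer?.isHidden = false
    faceLayer?.frame = face.bounds
}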
4. An Introduction to AVMetadataFaceObject
- faceID: the unique identifier of a detected face
- Each person scanned gets a different faceID.
- The same person can also receive a new faceID in different states (shaking, tilting, or raising the head, etc.), presumably because tracking is interrupted and restarts.
- hasRollAngle: whether a roll angle is available (head tilted left or right) (Bool)
- rollAngle: the roll (tilt) angle (CGFloat)
- hasYawAngle: whether a yaw angle is available (head turned left or right)
- yawAngle: the yaw (deflection) angle
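An illustrative sketch (not from the demo) of reading these properties inside the delegate callback:
for case let face as AVMetadataFaceObject in metadataObjects {
    print("faceID: \(face.faceID)")
    //Roll: head tilted toward a shoulder
    if face.hasRollAngle { print("rollAngle: \(face.rollAngle)") }
    //Yaw: head turned left or right
    if face.hasYawAngle { print("yawAngle: \(face.yawAngle)") }
}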
5. Handling the Scan Results
5.1 Converting the faces into the preview layer's coordinates
- Iterate over the scanned face array and convert each face into a face on the preview layer.
- This is mainly a coordinate transformation from the metadata space into the layer's space.
- Return the converted array.
fileprivate func transformedFaces(faceObjs: [AVMetadataObject]) -> [AVMetadataObject] {
    var faceArr = [AVMetadataObject]()
    for face in faceObjs {
        //Convert the scanned face object into a face object in the preview layer (mainly a coordinate transformation)
        if let transFace = previewLayer.transformedMetadataObject(for: face) {
            faceArr.append(transFace)
        }
    }
    return faceArr
}
5.2 Adding a red box at the face's position
- Set the frame of the red box (a sketch of creating the layer itself follows):
faceLayer?.frame = face.bounds
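The red box is just a CALayer with a red border. A minimal sketch of its creation (setupFaceLayer is a hypothetical helper; faceLayer is assumed to be a stored CALayer? property, as in the snippets below):
fileprivate func setupFaceLayer() {
    let layer = CALayer()
    layer.borderWidth = 2
    layer.borderColor = UIColor.red.cgColor
    //Add the box above the preview layer so it stays visible
    previewLayer.addSublayer(layer)
    faceLayer = layer
}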
- Build a CATransform3D from the yaw (deflection) and roll (tilt) angles:
//Handle the yaw (deflection) angle
fileprivate func transformDegress(yawAngle: CGFloat) -> CATransform3D {
    let yaw = degreesToRadians(degress: yawAngle)
    //Rotate around the Y axis
    let yawTran = CATransform3DMakeRotation(yaw, 0, -1, 0)
    //Concatenate with the identity transform as the base for the red box's rotation
    return CATransform3DConcat(yawTran, CATransform3DIdentity)
}
//Handle the roll (tilt) angle
fileprivate func transformDegress(rollAngle: CGFloat) -> CATransform3D {
    let roll = degreesToRadians(degress: rollAngle)
    //Rotate around the Z axis
    return CATransform3DMakeRotation(roll, 0, 0, 1)
}
//Convert degrees to radians
fileprivate func degreesToRadians(degress: CGFloat) -> CGFloat {
    return degress * CGFloat(Double.pi) / 180
}
- Rotate the red box according to the yaw and roll angles:
//3.4 Apply the yaw angle (head turned left/right)
if face.hasYawAngle {
    let transform3D = transformDegress(yawAngle: face.yawAngle)
    //Concatenate onto the layer's current transform
    faceLayer?.transform = CATransform3DConcat(faceLayer!.transform, transform3D)
}
//3.5 Apply the roll angle (head tilted left/right)
if face.hasRollAngle {
    let transform3D = transformDegress(rollAngle: face.rollAngle)
    //Concatenate onto the layer's current transform
    faceLayer?.transform = CATransform3DConcat(faceLayer!.transform, transform3D)
}
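One gotcha worth flagging: because each callback concatenates onto the layer's current transform, the rotations would accumulate across frames unless the transform is reset first. A minimal sketch of the per-callback reset (an assumption about how the demo orders these steps):
//Reset the transform before applying this frame's position and angles
faceLayer?.transform = CATransform3DIdentity
faceLayer?.frame = face.bounds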
- At this point dynamic face recognition is complete: a red box is added at the face's position and adjusted dynamically, in real time, as the face moves.
- Now pick up your phone and test it.
GitHub–Demo address
- Note:
- Only the core code is listed here; see the demo for the full logic.
- If anything in this article is not explained in enough detail, or if you have better suggestions, feel free to contact the author.