What? Browsers Can Do Face Detection?

Keywords: JavaScript OpenCV jQuery socket xml

The Shape Detection API has been available for some time. Its main purpose is to give the front end a directly usable feature detection interface (covering barcode, face, and text detection). This article briefly introduces the API and discusses how feasible face detection is on the front end. (Algorithm details are not covered here.)

1 Background and Scenarios

Face detection is an old topic. It is widely used in many fields, such as finance, security, e-commerce, smartphones, and entertainment photo apps, and the underlying technology keeps evolving. Here are the main approaches:

  1. Feature-based Face Detection

For example, OpenCV has a built-in Haar classifier based on the Viola-Jones object detection framework. By loading a single configuration file (haarcascade_frontalface_alt.xml), you can call detectObject directly to run the detection, and other features (such as nose, mouth, etc.) can be detected in the same way.

  2. Learning-based Face Detection

Learning-based detection also extracts local features from the image with operators. Classification, statistics, regression, and similar methods are then applied to those features to obtain a classifier that is more accurate and responds faster.
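To make the feature-based idea more concrete, here is a minimal sketch of the two building blocks behind Haar-like features in the Viola-Jones framework: an integral image and a two-rectangle feature. It is a simplified illustration, not OpenCV's implementation; the function names (integralImage, rectSum, haarTwoRectHorizontal) are made up for this example.

```javascript
// Build a summed-area table (integral image) for a grayscale image.
// `pixels` is a flat, row-major array of length width * height.
function integralImage(pixels, width, height) {
  // One extra row and column of zeros simplifies the rectangle lookups.
  const stride = width + 1;
  const table = new Float64Array(stride * (height + 1));
  for (let y = 1; y <= height; y++) {
    let rowSum = 0;
    for (let x = 1; x <= width; x++) {
      rowSum += pixels[(y - 1) * width + (x - 1)];
      table[y * stride + x] = table[(y - 1) * stride + x] + rowSum;
    }
  }
  return { table, stride };
}

// Sum of the pixels inside the rectangle (x, y, rw, rh) using four lookups.
function rectSum(ii, x, y, rw, rh) {
  const { table, stride } = ii;
  const a = table[y * stride + x];
  const b = table[y * stride + (x + rw)];
  const c = table[(y + rh) * stride + x];
  const d = table[(y + rh) * stride + (x + rw)];
  return d - b - c + a;
}

// Two-rectangle Haar-like feature: right half minus left half.
function haarTwoRectHorizontal(ii, x, y, rw, rh) {
  const half = Math.floor(rw / 2);
  return rectSum(ii, x + half, y, half, rh) - rectSum(ii, x, y, half, rh);
}

// A 4x2 image whose left half is dark (0) and right half is bright (10).
const demo = integralImage([0, 0, 10, 10, 0, 0, 10, 10], 4, 2);
console.log(haarTwoRectHorizontal(demo, 0, 0, 4, 2)); // prints 40
```

A real cascade evaluates many such features at many positions and scales; the integral image is what keeps each evaluation at a constant number of lookups.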

2 Common Approaches

2.1 Backend Processing

The front end sends resources to the back end over the network, and the back end processes the images or video streams to be detected in a unified way. This puts pressure on the back-end architecture. In addition, network latency often makes it impossible to give users a real-time interactive experience.

2.2 Client Processing

Thanks to OpenCV's cross-language and cross-platform nature, native clients can also offer face detection at a low development cost and expose it to web containers through JsBridge and similar mechanisms. However, once a page runs outside such a container, it loses this capability. Until one day...

2.3 Open Services

At some point, cloud computing and related concepts sprang up and the cost of computing kept falling. Major R&D teams (such as Aliyun and Face++) have been enthusiastically launching face detection services, even with a variety of special-purpose features: face recognition, liveness detection, OCR, face comparison, and so on.

Although these services provide client-side SDKs as well as front-end and back-end APIs, I would still like to talk about a pure front-end solution.

3 What the New Era Brings

Well, face recognition on the front end is still in its slash-and-burn era. However, the infrastructure is already taking shape. I hope the introduction below gives readers some inspiration.

3.1 Shape Detection API

As client hardware grows more powerful, browsers are being granted more and more capabilities. Image processing requires a lot of computing resources, and browsers can now take on part of the image detection work, so the Shape Detection API was created.

The following simple examples illustrate the basic usage. Before editing and running the code, make sure your Chrome version supports the feature and that it is enabled with the flag below (note that the API is also subject to the same-origin policy):

chrome://flags/#enable-experimental-web-platform-features
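Because the feature sits behind a flag, it can help to check at runtime which detectors are actually exposed before using them. The following is a minimal sketch; the helper name shapeDetectionSupport is hypothetical, and it only checks that the three constructors named in this article exist on the given global object.

```javascript
// Report which Shape Detection API constructors exist on a global object.
function shapeDetectionSupport(globalObj) {
  const names = ['BarcodeDetector', 'FaceDetector', 'TextDetector'];
  const support = {};
  for (const name of names) {
    support[name] = typeof globalObj[name] === 'function';
  }
  return support;
}

// In a browser this would typically be called with `window`.
console.log(shapeDetectionSupport(globalThis));
```

Passing the global object as a parameter keeps the check easy to exercise outside a browser.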

  • Barcode Detection (For Chrome 56+)

var barcodeDetector = new BarcodeDetector();
barcodeDetector.detect(image)
  .then(barcodes => {
    // Log the decoded value of each detected barcode.
    barcodes.forEach(barcode => console.log(barcode.rawValue))
  })
  .catch(err => console.error(err));

  • Face Detection (For Chrome 56+)

var faceDetector = new FaceDetector();
faceDetector.detect(image)
  .then(faces => faces.forEach(face => console.log(face)))
  .catch(err => console.error(err));

  • Text Detection (For Chrome 58+)

var textDetector = new TextDetector();
textDetector.detect(image)
  .then(texts => {
    // Each detected text carries the recognized string in rawValue.
    for (let text of texts) {
      speechSynthesis.speak(new SpeechSynthesisUtterance(text.rawValue));
    }
  })
  .catch(err => console.error(err));
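The same face detection call can also be written with async/await and detector options. The sketch below assumes the maxDetectedFaces and fastMode options from the Shape Detection API draft; the wrapper name detectFaceBoxes and its injectable DetectorCtor parameter are hypothetical and only there to make the function usable where FaceDetector is missing.

```javascript
// Detect faces and return plain bounding boxes.
// `DetectorCtor` defaults to the browser's FaceDetector when it exists.
async function detectFaceBoxes(image, options = {}, DetectorCtor = globalThis.FaceDetector) {
  if (typeof DetectorCtor !== 'function') {
    throw new Error('FaceDetector is not available in this environment');
  }
  // Assumed defaults for illustration; callers can override them.
  const detector = new DetectorCtor({ maxDetectedFaces: 5, fastMode: true, ...options });
  const faces = await detector.detect(image);
  // Copy the fields we need so callers get simple, serializable objects.
  return faces.map(({ boundingBox: b }) => ({
    x: b.x,
    y: b.y,
    width: b.width,
    height: b.height,
  }));
}
```

In a page, this could be called as detectFaceBoxes(document.querySelector('#image')) and the result drawn on a canvas.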

3.2 Face Detection in Images

Face detection in images is relatively simple. You only need to pass in an image element and call the API directly to detect faces. The detection results can then be displayed by drawing them on a canvas.

The core code is as follows:

var image = document.querySelector('#image');
var canvas = document.querySelector('#canvas');

var ctx = canvas.getContext("2d");
var scale = 1;

image.onload = function () {
  ctx.drawImage(image,
    0, 0, image.width, image.height,
    0, 0, canvas.width, canvas.height);

  scale = canvas.width / image.width;
};

function detect() {
  if (window.FaceDetector == undefined) {
    console.error('Face Detection not supported');
    return;
  }

  var faceDetector = new FaceDetector();
  console.time('detect');
  return faceDetector.detect(image)
    .then(faces => {
      console.log(faces)
      // Draw the faces on the <canvas>.
      var ctx = canvas.getContext("2d");
      ctx.lineWidth = 2;
      ctx.strokeStyle = "red";
      for (var i = 0; i < faces.length; i++) {
        var item = faces[i].boundingBox;
        // strokeRect draws each box directly, so no path accumulates between calls.
        ctx.strokeRect(Math.floor(item.x * scale),
          Math.floor(item.y * scale),
          Math.floor(item.width * scale),
          Math.floor(item.height * scale));
      }
      console.timeEnd('detect');
    })
    .catch((e) => {
      console.error("Boo, Face Detection failed: " + e);
    });
}
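The code above uses a single scale derived from the widths, which is enough when the canvas and the image share the same aspect ratio. Since drawImage stretches the image to canvas.width by canvas.height, a slightly more general mapping uses separate horizontal and vertical scales. The sketch below assumes the bounding box is reported in the source image's pixel coordinates; the helper name toCanvasRect is hypothetical.

```javascript
// Map a bounding box from image pixel coordinates to canvas coordinates.
function toCanvasRect(box, imageWidth, imageHeight, canvasWidth, canvasHeight) {
  const scaleX = canvasWidth / imageWidth;
  const scaleY = canvasHeight / imageHeight;
  return {
    x: Math.floor(box.x * scaleX),
    y: Math.floor(box.y * scaleY),
    width: Math.floor(box.width * scaleX),
    height: Math.floor(box.height * scaleY),
  };
}

// A 1000x500 image drawn onto a 500x500 canvas.
console.log(toCanvasRect({ x: 100, y: 50, width: 200, height: 100 }, 1000, 500, 500, 500));
```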

3.3 Face Detection in Video

Face detection in video is not much different from face detection in images. getUserMedia opens the camera (and optionally the microphone) and returns a media stream. By grabbing video frames, running detection on each one, and drawing the result, we get face detection in video.

The core code is as follows:

navigator.mediaDevices.getUserMedia({
  video: true,
  // audio: true
})
  .then(function (mediaStream) {
    // Attach the stream directly; createObjectURL(mediaStream) is no longer supported by current browsers.
    video.srcObject = mediaStream;
    video.onloadedmetadata = function (e) {
      video.play();
    };
  })
  .catch(function (error) {
    console.log(error.name);
  });

// Every 60 ms, copy the current video frame into the image and run detection on it.
setInterval(function () {
  ctx.clearRect(0, 0, canvas.width, canvas.height);
  ctx.drawImage(video, 0, 0);
  image.src = canvas.toDataURL('image/png');
  image.onload = function() {
    detect();
  }
}, 60);
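With a 60 ms interval, a new detection can start before the previous one has finished if a single detection takes longer than that. One possible way to avoid piling up work is to skip a tick while a detection is still in flight. The helper below is a hypothetical sketch (createGuardedTick is not part of any API); it assumes the detection function returns a promise, as detect() in section 3.2 does.

```javascript
// Wrap an async detection function so that a new run is skipped
// while the previous one is still in flight.
function createGuardedTick(detectFn) {
  let busy = false;
  return function tick() {
    if (busy) return false; // previous detection has not finished yet
    busy = true;
    Promise.resolve()
      .then(detectFn)
      .catch((err) => console.error(err))
      .then(() => { busy = false; }); // always release the guard
    return true;
  };
}

// Usage sketch: setInterval(createGuardedTick(detect), 60);
```

The return value only reports whether a detection was started, which makes the skipping behavior easy to observe while debugging.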

3.4 Back to the Days Without the API

In fact, many solutions existed long before this API. Limited by hardware conditions and the lack of hardware acceleration, they were never widely used in production.

  1. tracking.js

tracking.js is an image processing library written in JavaScript that brings a rich set of computer vision algorithms and techniques to the browser. It can be used for color tracking, face detection, and other tasks.

  2. jquery.facedetection

jquery.facedetection is a jQuery / Zepto face detection plug-in built on the image classifier and detector of ccv, a library with strong cross-platform capabilities.

3.5 Node.js & OpenCV

The node-opencv module has been available for some years. Although it is not fully compatible with OpenCV v3.x and the API it provides is limited, it works well with OpenCV v2.4.x. The arrival of N-API may bring more surprises.

Imagine an Electron or Node-Webkit container: could we get real-time face detection by running a local WebSocket service? A rough sketch of the idea follows:

  • Backend processing logic

// Use require so the file stays consistent with module.exports below.
const cv = require('opencv');

const detectConfigFile = './node_modules/opencv/data/haarcascade_frontalface_alt2.xml';

// camera properties
const camWidth = 320;
const camHeight = 240;
const camFps = 10;
const camInterval = 1000 / camFps;

// face detection properties
const rectColor = [0, 255, 0];
const rectThickness = 2;

// initialize camera
const camera = new cv.VideoCapture(0);

camera.setWidth(camWidth);
camera.setHeight(camHeight);

const frameHandler = (err, im) => {
  return new Promise((resolve, reject) => {
    if (err) {
      return reject(err);
    }
    im.detectObject(detectConfigFile, {}, (error, faces) => {
      if (error) {
        return reject(error);
      }
      let face;
      for (let i = 0; i < faces.length; i++) {
        face = faces[i];
        im.rectangle([face.x, face.y], [face.width, face.height], rectColor, rectThickness);
      }
      return resolve(im);
    });
  });
};

module.exports = function (socket) {
  const frameSocketHandler = (err, im) => {
    return frameHandler(err, im)
      .then((img) => {
        socket.emit('frame', {
          buffer: img.toBuffer(),
        });
      })
      // Log failed reads or detections instead of leaving the rejection unhandled.
      .catch((error) => console.error(error));
  };
  const handler = () => {
    camera.read(frameSocketHandler);
  };
  setInterval(handler, camInterval);
};

  • Front-end call interface

socket.on('frame', function (data) {
  // Convert the received binary frame into a base64 data URL.
  var uint8Arr = new Uint8Array(data.buffer);
  var str = String.fromCharCode.apply(null, uint8Arr);
  var base64String = btoa(str);

  // img and ctx are assumed to be created elsewhere on the page.
  img.onload = function () {
    ctx.drawImage(this, 0, 0, canvas.width, canvas.height);
  }
  img.src = 'data:image/png;base64,' + base64String;
});
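Passing a whole frame to String.fromCharCode.apply works for small frames, but very large arrays can exceed the engine's limit on the number of function arguments. A possible workaround is to convert the bytes in chunks. The helper below is a hypothetical sketch (bytesToBase64 and its default chunk size are illustrative choices); it expects a Uint8Array.

```javascript
// Convert a Uint8Array to a base64 string in chunks, so that
// String.fromCharCode.apply never receives too many arguments at once.
function bytesToBase64(bytes, chunkSize = 0x8000) {
  let binary = '';
  for (let i = 0; i < bytes.length; i += chunkSize) {
    const chunk = bytes.subarray(i, i + chunkSize);
    binary += String.fromCharCode.apply(null, chunk);
  }
  // btoa exists in browsers; fall back to Buffer when running under Node.js.
  return typeof btoa === 'function'
    ? btoa(binary)
    : Buffer.from(binary, 'binary').toString('base64');
}

// The bytes of "Hello".
console.log(bytesToBase64(new Uint8Array([72, 101, 108, 108, 111]))); // prints "SGVsbG8="
```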

4 Summary

4.1 Future Development

There is no doubt that these cutting-edge technologies will be more widely used and supported on the front end. Image work on the front end will also move from traditional image processing toward learning + image processing. None of this is possible without the gradual enhancement of the underlying infrastructure (hardware, browsers, tools, libraries, etc.), including but not limited to:

  • getUserMedia/Canvas => Image/Video Manipulation

  • Shape Detection API => Image Detection

  • Web Workers => Parallel Computing Ability

  • ConvNetJS => Deep Learning Framework

4.2 Actually, Not That Optimistic

4.2.1 Accuracy

The detection rate for (multiple) frontal faces is relatively high, but the results are not ideal for profile faces or faces that are partially occluded.

4.2.2 Processing Speed

For the image example in section 3.2, detection takes 300+ ms (which in practice cannot support real-time processing of high-resolution video). That is about three times slower than the roughly 100 ms needed when calling OpenCV.
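As a rough back-of-the-envelope check, the two latency figures quoted above can be turned into an upper bound on detections per second, assuming detections run one after another with no other overhead. The helper name maxDetectionsPerSecond is made up for this example.

```javascript
// Upper bound on detections per second when each detection takes `ms`
// milliseconds and detections run strictly one after another.
function maxDetectionsPerSecond(ms) {
  return 1000 / ms;
}

console.log(maxDetectionsPerSecond(300).toFixed(1)); // prints "3.3"
console.log(maxDetectionsPerSecond(100).toFixed(1)); // prints "10.0"
```

At roughly 3 detections per second, boxes drawn over live video would visibly lag behind the face.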

4.2.3 Features

There is still a lot of room for improvement. For example, the API does not support features that mainstream service providers offer, such as glasses detection, gender, age estimation, facial expression recognition, race, smile detection, and blur detection.

4.3 More Than I Can Say Here

The source code has not been tidied up yet because of a full workload. It will be shared later at: https://github.com/x-cold
There is no data yet on how well face detection adapts to different scenarios or on how long detection takes. Later, we plan to run small-scale tests using samples from PASCAL VOC and AT&T.
The team is hiring front-end engineers and designers, including campus recruitment: ais-ued-jobs@list.alibaba-inc.com


Posted by BZorch on Wed, 05 Jun 2019 12:21:52 -0700