ArcSoft face recognition 3.0 - Introduction to the image data structure (Android)

Keywords: Mobile SDK Android Windows Database

Since ArcSoft opened up its 2.0 SDK, our company has used it in face recognition access control applications because it is free, works offline, and recognizes well, so we keep a close eye on ArcSoft's official announcements. The recently released ArcFace 3.0 SDK is a genuinely major update. This article introduces the algorithm updates on the Android platform; the next one will cover the Windows platform.

  • Feature comparison supports model selection, including a life-photo comparison model and an ID-photo (witness) comparison model
  • Significantly improved recognition rate and anti-spoofing performance
  • Updated feature values; face databases must be re-registered after upgrading
  • New 64-bit SDK for the Android platform
  • Image processing tools
  • Face detection supports both all-angle and single-angle modes
  • A new image data input method

The new image data structure can be confusing in actual development, so this article introduces it and its usage in detail, covering the following points:

  1. SDK interface changes

  2. Analysis of ArcSoftImageInfo class

  3. SDK related code analysis

  4. The role of stride

  5. Converting the Image returned by Camera2 to ArcSoftImageInfo

I. SDK interface changes

When integrating the 3.0 SDK, you will find that the FaceEngine methods that take image data, such as detectFaces, process, and extractFaceFeature, now all have overloads, and each overload takes an ArcSoftImageInfo object as the input image data. Taking face detection as an example, the two interfaces are as follows:

Original interface:

public int detectFaces(byte[] data, int width, int height, int format, List<FaceInfo> faceInfoList)

New interface:

public int detectFaces(ArcSoftImageInfo arcSoftImageInfo, List<FaceInfo> faceInfoList)

As you can see, the overload takes an ArcSoftImageInfo object as the image data for detection; ArcSoftImageInfo replaces the original data, width, height, and format parameters.
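
To make the change concrete, here is a minimal sketch (my own, not from the SDK documentation; imports and engine initialization omitted) that runs the same detection through both overloads, assuming faceEngine is an initialized FaceEngine and nv21 is a tightly packed NV21 frame of width * height * 3 / 2 bytes:

    void detectBothWays(FaceEngine faceEngine, byte[] nv21, int width, int height) {
        List<FaceInfo> faceInfoList = new ArrayList<>();

        // Old interface: the raw buffer plus its dimensions and format
        int code = faceEngine.detectFaces(nv21, width, height, FaceEngine.CP_PAF_NV21, faceInfoList);

        // New interface: split the buffer into its two channels first.
        // Y is the first width * height bytes; the interleaved VU data is the rest.
        byte[] planeY = Arrays.copyOfRange(nv21, 0, width * height);
        byte[] planeVU = Arrays.copyOfRange(nv21, width * height, width * height * 3 / 2);
        ArcSoftImageInfo imageInfo = new ArcSoftImageInfo(width, height, FaceEngine.CP_PAF_NV21,
                new byte[][]{planeY, planeVU}, new int[]{width, width});
        faceInfoList.clear();
        code = faceEngine.detectFaces(imageInfo, faceInfoList);
    }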

II. Analysis of ArcSoftImageInfo class

After using it in practice, I found that ArcSoftImageInfo is not just a simple wrapper: the one-dimensional data array becomes a two-dimensional planes array, and a strides array corresponding to planes is added.

A quick introduction to the concept of stride: the stride can be understood as the number of bytes in one row of pixels.
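
For instance (a toy illustration, not SDK code), the byte index of a pixel in a single-channel GRAY image must be computed from the stride rather than the width as soon as the rows are padded:

    // A 1520-pixel-wide GRAY image whose rows are padded to 1536 bytes
    int width = 1520;
    int stride = 1536;   // bytes per row after 16-byte alignment
    int x = 100, y = 200;

    int correct = y * stride + x;   // index computed with the stride
    int wrong = y * width + x;      // drifts 16 bytes further off with every row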

The class structure is as follows:

public class ArcSoftImageInfo {
    private int width;
    private int height;
    private int imageFormat;
    private byte[][] planes;
    private int[] strides;
    ...
}

How the official documentation introduces this class:

  • Member description

    Type     | Variable name | Description
    ---------|---------------|------------------------------
    int      | width         | Image width
    int      | height        | Image height
    int      | imageFormat   | Image format
    byte[][] | planes        | Image channels
    int[]    | strides       | Stride of each image channel
  • Introduction to composition

// For example, an arcSoftImageInfo can be composed as follows:

// NV21-format data has two channels.
// The stride of the Y channel is generally the image width; if the rows are aligned to 8 bytes, 16 bytes, etc., fill in the aligned stride
// The stride of the VU channel is likewise generally the image width; fill in the aligned stride if the rows are padded
ArcSoftImageInfo arcSoftImageInfo = new ArcSoftImageInfo(width, height, FaceEngine.CP_PAF_NV21, new byte[][]{planeY, planeVU}, new int[]{yStride, vuStride});

// GRAY data has only one channel.
// The stride is generally the image width; fill in the aligned stride if the rows are padded
arcSoftImageInfo = new ArcSoftImageInfo(width, height, FaceEngine.CP_PAF_GRAY, new byte[][]{gray}, new int[]{grayStride});

// BGR24 data has only one channel.
// The stride is generally three times the image width; fill in the aligned stride if the rows are padded
arcSoftImageInfo = new ArcSoftImageInfo(width, height, FaceEngine.CP_PAF_BGR24, new byte[][]{bgr24}, new int[]{bgr24Stride});

// DEPTH_U16 data has only one channel.
// The stride is generally twice the image width; fill in the aligned stride if the rows are padded
arcSoftImageInfo = new ArcSoftImageInfo(width, height, FaceEngine.CP_PAF_DEPTH_U16, new byte[][]{depthU16}, new int[]{depthU16Stride});

As you can see, ArcSoftImageInfo stores the image data channel by channel. NV21 data, for example, has two channels, so the two-dimensional planes array holds two arrays: the Y array and the VU array. Here is how NV21 data is laid out:

NV21 belongs to the YUV420SP family of the YUV color space: every four Y components share one pair of U and V components, the Y values are stored contiguously, and the V and U values are interleaved.

The arrangement is as follows (taking an 8x4 image as an example):

Y Y   Y Y   Y Y   Y Y
Y Y   Y Y   Y Y   Y Y
Y Y   Y Y   Y Y   Y Y
Y Y   Y Y   Y Y   Y Y
V U   V U   V U   V U
V U   V U   V U   V U

The data above is split into two channels: first the contiguous Y data, then the interleaved V and U data. If we use the Camera API we have no need for the ArcSoftImageInfo class, because the NV21 data returned by the Camera API is contiguous and can be passed to the old interface directly. Other APIs, however, may hand us non-contiguous data: the android.media.Image objects produced by the Camera2 API and MediaCodec are already split into planes, so we can take the Y-plane and VU-plane data from them and build an NV21-format ArcSoftImageInfo object for processing.
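
As a small illustration of this layout (my own sketch, not SDK code), the byte positions of the components of a given pixel in a tightly packed NV21 buffer can be computed as follows:

    // Locate the components of pixel (x, y) in a tightly packed NV21 buffer (stride == width)
    byte[] nv21 = new byte[8 * 4 * 3 / 2];   // the 8x4 example above
    int width = 8, height = 4;
    int x = 5, y = 2;

    int yIndex = y * width + x;                                  // inside the contiguous Y block
    int vIndex = width * height + (y / 2) * width + (x / 2) * 2; // V in the interleaved VU block
    int uIndex = vIndex + 1;                                     // the shared U follows its V

    byte yValue = nv21[yIndex];
    byte vValue = nv21[vIndex];
    byte uValue = nv21[uIndex];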

III. SDK-related code analysis

Let's look at the SDK's validation code that decides whether the image data is legal:

Note: the decompiled code reads poorly, so I have tidied it up below; among other things, constant values have been replaced with constant names.

  • Validating plain (unseparated) image data

        private static boolean isImageDataValid(byte[] data, int width, int height, int format) {
            return (format == CP_PAF_NV21 && (height & 1) == 0 && data.length == width * height * 3 / 2)
                    || (format == CP_PAF_BGR24 && data.length == width * height * 3)
                    || (format == CP_PAF_GRAY && data.length == width * height)
                    || (format == CP_PAF_DEPTH_U16 && data.length == width * height * 2);
        }
    

    Interpretation: the requirements for each image format are as follows:

    1. NV21: the height must be even, and the data size must be width x height x 3 / 2
    2. BGR24: the data size must be width x height x 3
    3. GRAY: the data size must be width x height
    4. DEPTH_U16: the data size must be width x height x 2
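
    Based on these rules, a small helper (my own sketch, not part of the SDK) can compute the expected size of a tightly packed buffer, i.e. one whose stride equals the default for its format:

        // A sketch mirroring isImageDataValid above; not part of the SDK
        static int expectedDataSize(int width, int height, int format) {
            switch (format) {
                case FaceEngine.CP_PAF_NV21:
                    return width * height * 3 / 2;
                case FaceEngine.CP_PAF_BGR24:
                    return width * height * 3;
                case FaceEngine.CP_PAF_GRAY:
                    return width * height;
                case FaceEngine.CP_PAF_DEPTH_U16:
                    return width * height * 2;
                default:
                    throw new IllegalArgumentException("unsupported format: " + format);
            }
        }
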
  • Verifying ArcSoftImageInfo objects

        private static boolean isImageDataValid(ArcSoftImageInfo arcSoftImageInfo) {
            byte[][] planes = arcSoftImageInfo.getPlanes();
            int[] strides = arcSoftImageInfo.getStrides();
            if (planes == null || strides == null || planes.length != strides.length) {
                return false;
            }
            // Every channel must contain data
            for (byte[] plane : planes) {
                if (plane == null || plane.length == 0) {
                    return false;
                }
            }
            switch (arcSoftImageInfo.getImageFormat()) {
                case CP_PAF_BGR24:
                case CP_PAF_GRAY:
                case CP_PAF_DEPTH_U16:
                    // Single-channel formats: one plane of exactly stride * height bytes
                    return planes.length == 1
                            && planes[0].length == strides[0] * arcSoftImageInfo.getHeight();
                case CP_PAF_NV21:
                    // Two channels: an even height, a Y plane of stride * height bytes,
                    // and a VU plane holding half as much data
                    return (arcSoftImageInfo.getHeight() & 1) == 0
                            && planes.length == 2
                            && planes[0].length == planes[1].length * 2
                            && planes[0].length == strides[0] * arcSoftImageInfo.getHeight()
                            && planes[1].length == strides[1] * arcSoftImageInfo.getHeight() / 2;
                default:
                    return false;
            }
        }
    

    Interpretation:

    1. The data size of each channel is: height x that channel's stride
    2. BGR24, GRAY, and DEPTH_U16 images have only one channel, but as the composition examples above mentioned, their strides differ:
      • BGR24: the stride is generally 3 x width
      • GRAY: the stride is generally width
      • DEPTH_U16: the stride is generally 2 x width
    3. NV21: the height must be even, there are two channels, and channel 0 holds twice as much data as channel 1.
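
    Putting these rules together, here is a sketch (my own, not SDK code) of buffers that satisfy the NV21 branch above, using the padded 1520x760 preview that the next section discusses:

        // NV21 buffers for a 1520x760 image whose rows are padded to a 1536-byte stride
        int width = 1520, height = 760, stride = 1536;
        byte[] planeY = new byte[stride * height];        // == strides[0] * height
        byte[] planeVU = new byte[stride * height / 2];   // == strides[1] * height / 2
        // planeY.length == planeVU.length * 2, so every NV21 condition holds
        ArcSoftImageInfo info = new ArcSoftImageInfo(width, height, FaceEngine.CP_PAF_NV21,
                new byte[][]{planeY, planeVU}, new int[]{stride, stride});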

IV. The role of stride

  • A concrete example of stride

    On one phone, when a preview resolution of 1520x760 is specified with the Camera2 API, the preview data actually comes back at 1536x760. Inspecting the saved image data shows that the 16 pixels of padding on the right side are all 0. If we assemble this group of YUV data into NV21 at 1520x760 and pass a width of 1520 to face detection, the SDK cannot detect the face; if we assemble the NV21 at 1536x760 and pass a width of 1536, the SDK detects the face.

  • The importance of stride

    If all that is missing is those padding pixels, why can't the face be detected? As mentioned above, the stride can be understood as the number of bytes in one row of pixels. If the first row of pixels is read with an offset, the reading of every subsequent row is thrown off as well.

The following shows the result of parsing 1000x554 NV21 image data with different strides:

(Figures: the image parsed with the correct stride vs. parsed with the wrong stride)

As you can see, if an image is parsed with the wrong stride, the correct image content may not be recoverable at all.

Conclusion: introducing the image stride effectively avoids the problems caused by byte-aligned image data.
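
The alternative to passing a stride would be to strip the row padding ourselves before handing the data to the SDK. A sketch of that (my own code, not the SDK's) shows the per-frame copy that the strides field lets us avoid:

    // Strip per-row padding so that stride == width again (a sketch, not SDK code).
    // With ArcSoftImageInfo's strides field this per-frame copy is unnecessary.
    static byte[] removeRowPadding(byte[] plane, int width, int height, int stride) {
        if (stride == width) {
            return plane;   // already tightly packed
        }
        byte[] tight = new byte[width * height];
        for (int row = 0; row < height; row++) {
            // Copy only the width visible bytes of each stride-sized row
            System.arraycopy(plane, row * stride, tight, row * width, width);
        }
        return tight;
    }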

V. Converting the Image returned by Camera2 to ArcSoftImageInfo

  • Processing the data returned by the Camera2 API
    For the scenario above, we can take the Y, U, and V plane data of the android.media.Image object, build an NV21-format ArcSoftImageInfo object from it, and pass that to the SDK for processing. The sample code is as follows:

    • Retrieve the Y, U, and V plane data from the Camera2 API callback
                 private class OnImageAvailableListenerImpl implements ImageReader.OnImageAvailableListener{
                     private byte[] y;
                     private byte[] u;
                     private byte[] v;
      
                     @Override
                     public void onImageAvailable(ImageReader reader) {
                         Image image = reader.acquireNextImage();
                         // In practice the buffer sizes generally satisfy Y:U:V == 4:2:2
                         if (camera2Listener != null && image.getFormat() == ImageFormat.YUV_420_888) {
                             Image.Plane[] planes = image.getPlanes();
                             // Reuse the same batch of byte arrays to reduce gc frequency
                             if (y == null) {
                                 y = new byte[planes[0].getBuffer().limit() - planes[0].getBuffer().position()];
                                 u = new byte[planes[1].getBuffer().limit() - planes[1].getBuffer().position()];
                                 v = new byte[planes[2].getBuffer().limit() - planes[2].getBuffer().position()];
                             }
                             if (image.getPlanes()[0].getBuffer().remaining() == y.length) {
                                 planes[0].getBuffer().get(y);
                                 planes[1].getBuffer().get(u);
                                 planes[2].getBuffer().get(v);
                                 camera2Listener.onPreview(y, u, v, mPreviewSize, planes[0].getRowStride());
                             }
                         }
                         image.close();
                     }
                 }
      
    • Convert to ArcSoftImageInfo object

    Note: the YUV data you get may be YUV422 or YUV420; you need to implement conversion from both to an NV21-format ArcSoftImageInfo object.

           @Override
           public void onPreview(final byte[] y, final byte[] u, final byte[] v, final Size previewSize, final int stride) {
               if (arcSoftImageInfo == null) {
                   arcSoftImageInfo = new ArcSoftImageInfo(previewSize.getWidth(), previewSize.getHeight(), FaceEngine.CP_PAF_NV21);
               }
               // The returned data is YUV422
               if (y.length / u.length == 2) {
                   ImageUtil.yuv422ToNv21ImageInfo(y, u, v, arcSoftImageInfo, stride, previewSize.getHeight());
               }
               // The returned data is YUV420
               else if (y.length / u.length == 4) {
                   ImageUtil.yuv420ToNv21ImageInfo(y, u, v, arcSoftImageInfo, stride, previewSize.getHeight());
               }
               // At this time, arcSoftImageInfo data can be transferred to SDK for use
               if (faceEngine != null) {
                   List<FaceInfo> faceInfoList = new ArrayList<>();
                   int code = faceEngine.detectFaces(arcSoftImageInfo, faceInfoList);
                   if (code == ErrorInfo.MOK) {
                       Log.i(TAG, "onPreview: " + code + "  " + faceInfoList.size());
                   } else {
                       Log.i(TAG, "onPreview: no face detected , code is : " + code);
                   }
               } else {
                   Log.e(TAG, "onPreview: faceEngine is null");
                   return;
               }
               ...
           }
    

In the code above, the data returned by the Camera2 API is converted into an ArcSoftImageInfo object and face detection is run on it. Below is the concrete implementation that assembles the Y, U, and V data into the ArcSoftImageInfo object.

  • Assembling Y, U, and V data into an ArcSoftImageInfo object

The Y plane can be copied directly. For the U and V planes, we must consider whether this group of YUV data is YUV420 or YUV422 and extract the U and V values accordingly:

         /**
          * Convert YUV420 data to an NV21-format ArcSoftImageInfo
          *
          * @param y                Y component of the YUV420 data
          * @param u                U component of the YUV420 data
          * @param v                V component of the YUV420 data
          * @param arcSoftImageInfo the NV21-format ArcSoftImageInfo to fill
          * @param stride           stride of the Y component; because of the YUV layout, it determines the U and V strides as well
          * @param height           image height
          */
         public static void yuv420ToNv21ImageInfo(byte[] y, byte[] u, byte[] v, ArcSoftImageInfo arcSoftImageInfo, int stride, int height) {
             if (arcSoftImageInfo.getPlanes() == null) {
                 arcSoftImageInfo.setPlanes(new byte[][]{new byte[stride * height], new byte[stride * height / 2]});
                 arcSoftImageInfo.setStrides(new int[]{stride, stride});
             }
             System.arraycopy(y, 0, arcSoftImageInfo.getPlanes()[0], 0, y.length);
             // Note: vuLength cannot simply be computed from the stride and height, since the
             // data returned by the Camera2 API may fall short; the real data length must be used
             byte[] vu = arcSoftImageInfo.getPlanes()[1];
             int vuLength = u.length + v.length;
             int uIndex = 0, vIndex = 0;
             // In YUV420 data every U and V byte is used, written interleaved as V, U, V, U, ...
             for (int i = 0; i < vuLength; i += 2) {
                 vu[i] = v[vIndex++];
                 vu[i + 1] = u[uIndex++];
             }
         }
         /**
          * Convert YUV422 data to an NV21-format ArcSoftImageInfo
          *
          * @param y                Y component of the YUV422 data
          * @param u                U component of the YUV422 data
          * @param v                V component of the YUV422 data
          * @param arcSoftImageInfo the NV21-format ArcSoftImageInfo to fill
          * @param stride           stride of the Y component; because of the YUV layout, it determines the U and V strides as well
          * @param height           image height
          */
         public static void yuv422ToNv21ImageInfo(byte[] y, byte[] u, byte[] v, ArcSoftImageInfo arcSoftImageInfo, int stride, int height) {
             if (arcSoftImageInfo.getPlanes() == null) {
                 arcSoftImageInfo.setPlanes(new byte[][]{new byte[stride * height], new byte[stride * height / 2]});
                 arcSoftImageInfo.setStrides(new int[]{stride, stride});
             }
             System.arraycopy(y, 0, arcSoftImageInfo.getPlanes()[0], 0, y.length);
             byte[] vu = arcSoftImageInfo.getPlanes()[1];
             // Note: vuLength cannot simply be computed from the stride and height, since the
             // data returned by the Camera2 API may fall short; the real data length must be used
             int vuLength = u.length / 2 + v.length / 2;
             int uIndex = 0, vIndex = 0;
             // Only every other U/V byte is consumed here to fill NV21's interleaved VU plane
             for (int i = 0; i < vuLength; i += 2) {
                 vu[i] = v[vIndex];
                 vu[i + 1] = u[uIndex];
                 vIndex += 2;
                 uIndex += 2;
             }
         }

VI. Advantages of ArcSoftImageInfo

  1. When the image data source already provides the channels separately, passing the separated data in via an ArcSoftImageInfo object avoids the extra memory consumption that stitching the channels together would require.
  2. The concept of stride is introduced, and the stride of each channel is passed in explicitly, giving developers a clearer picture of the image data when using the SDK.

An Android demo can be downloaded from the ArcSoft face recognition open platform.

Posted by vishi83 on Sun, 08 Dec 2019 19:58:56 -0800