Understanding the Low Memory Efficiency of PHP Arrays and PHP's Weak Typing

Keywords: PHP C Attribute Programming

My tasks for the last two days were finished ahead of schedule, so I had time to catch my breath and study PHP in more depth. I originally wanted to learn about PHP performance optimization, but I was shocked by a sentence I found online: "PHP array memory utilization is low; an array that takes 100 MB in C needs 1 GB in PHP". Is PHP really that memory-hungry? I took this opportunity to find out how PHP implements its data types.
Let's start with a test:

<?php  
    echo memory_get_usage() , '<br>';  
    $start = memory_get_usage();  
    $a = Array();  
    for ($i=0; $i<1000; $i++) {  
      $a[$i] = $i + $i;  
    }  
    $end =  memory_get_usage();  
    echo memory_get_usage() , '<br>';  
    echo 'argv:', ($end - $start)/1000 ,'bytes' , '<br>';  

The results are as follows:

    353352
    437848
    argv:84.416bytes

The 1000-element integer array consumes (437848 - 353352) = 84496 bytes, about 82 KB, or roughly 84 bytes per element. In C an int takes 4 bytes, so the PHP array is about 20 times larger overall.
However, it is also said online that not everything reported by memory_get_usage() belongs to the array; the figure also includes some of PHP's own structures. So let's try another way and build the array with a built-in PHP function:

<?php  
    $start = memory_get_usage();  
    $a = array_fill(0, 10000, 1);  
    $end = memory_get_usage(); //10k elements array;  
    echo 'argv:', ($end - $start )/10000,'byte' , '<br>';  

The output is:

  argv:54.5792byte

That is better than before, but 54 bytes per element is still more than 10 times the 4 bytes of a C int.
The reason lies in PHP's underlying implementation. PHP is a weakly typed language: whether the value is an int, a double or a string, a single '$' variable can hold it. PHP itself is implemented in C, and every variable corresponds to a zval structure, which is defined as follows:
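The comparison with C can be checked with a little arithmetic. The sketch below is only an illustration: the function names are made up for this example, and it assumes a platform where sizeof(int) is 4, as the article does.

```c
#include <stddef.h>

/* Bytes a plain C int array needs for `count` elements. */
size_t c_int_array_bytes(size_t count) {
    return count * sizeof(int);
}

/* How many times larger a measured PHP per-element cost is than one C int. */
double overhead_ratio(double php_bytes_per_element) {
    return php_bytes_per_element / (double)sizeof(int);
}
```

With the two measurements above, overhead_ratio(84.416) comes out at about 21 and overhead_ratio(54.5792) at about 13.6, while a C array of 1000 ints needs only 4000 bytes.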

typedef struct _zval_struct zval;  
struct _zval_struct {  
    /* Variable information */  
    zvalue_value value;     /* The value: 8 bytes on 32-bit, 16 bytes on 64-bit (8 + 4 + 4 padding) */  
    zend_uint refcount__gc; /* The number of references to this value (for GC) 4 byte */  
    zend_uchar type;        /* The active type 1 byte*/  
    zend_uchar is_ref__gc;  /* Whether this value is a reference (&) 1 byte*/  
}; 

PHP stores the value of a variable in a union. The value member of zval has type zvalue_value, a union defined as follows:

typedef union _zvalue_value {  
    long lval;                  /* long value */  
    double dval;                /* double value */  
    struct {                    /* string value */  
        char *val;  
        int len;  
    } str;   
    HashTable *ht;              /* hash table value */  
    zend_object_value obj;      /*object value */  
} zvalue_value;  

A union occupies as much memory as its largest member. In zvalue_value on a 32-bit build, the str structure holds a 4-byte int and a 4-byte char pointer, so zvalue_value takes 8 bytes.
The size of zval is then 8 + 4 + 1 + 1 = 14 bytes, before any alignment padding the compiler adds.
Notice that zvalue_value also contains a HashTable pointer. What is it for? Arrays, strings and objects need extra storage beyond the zval itself, and for arrays that storage structure is the HashTable.
HashTable is defined as:
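The sizes above can be reproduced on any machine by emulating the 32-bit layout with fixed-width types. This is a sketch under assumptions, not the Zend source: the names ptr32, value32 and zval32 are invented here, pointers are replaced by uint32_t, and the two fields of the obj member (a handle and a handlers pointer) are assumed from the PHP 5 headers rather than taken from the article.

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-in for a 32-bit pointer, so the layout does not depend on the host. */
typedef uint32_t ptr32;

typedef union {
    int32_t lval;                                     /* long on a 32-bit build */
    double dval;
    struct { ptr32 val; int32_t len; } str;
    ptr32 ht;
    struct { uint32_t handle; ptr32 handlers; } obj;  /* assumed layout */
} value32;

typedef struct {
    value32 value;
    uint32_t refcount__gc;
    uint8_t type;
    uint8_t is_ref__gc;
} zval32;

/* Sum of the member sizes: the 8 + 4 + 1 + 1 figure from the text. */
size_t zval32_member_bytes(void) {
    return sizeof(value32) + sizeof(uint32_t) + 2 * sizeof(uint8_t);
}

/* What the compiler actually reserves, including trailing padding. */
size_t zval32_sizeof(void) {
    return sizeof(zval32);
}
```

On common compilers the member sum is 14 while sizeof reports 16, because the structure is padded to a multiple of its alignment. The 14-byte figure is therefore a lower bound on what each zval really costs.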

typedef struct _hashtable {  
     uint nTableSize;            // Table length (number of slots), not the number of elements  
     uint nTableMask;            // Table mask, always equal to nTableSize - 1  
     uint nNumOfElements;        // Number of elements stored  
     ulong nNextFreeElement;     // Next free integer index  
     Bucket *pInternalPointer;   // Records the current position during traversal, e.g. in a foreach loop  
     Bucket *pListHead;          // First element in insertion order  
     Bucket *pListTail;          // Last element in insertion order  
     Bucket **arBuckets;         // Array of Bucket pointers that holds the elements  
     dtor_func_t pDestructor;    // Destructor function  
     zend_bool persistent;       // Persistent flag: a PHP array can persist in memory instead of being rebuilt on every request  
     unsigned char nApplyCount;  
     zend_bool bApplyProtection;  
} HashTable; 
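The comment on nTableMask hints at how a slot is chosen: when the table length is a power of two, ANDing the hash with nTableSize - 1 is a cheap replacement for a modulo. A minimal sketch of that idea follows; the function names and the minimum size of 8 are assumptions made for this example.

```c
#include <stdint.h>

/* Round a requested capacity up to a power of two (minimum 8, capped at 2^31). */
uint32_t table_size_for(uint32_t requested) {
    uint32_t size = 8;
    while (size < requested && size < 0x80000000u) {
        size <<= 1;
    }
    return size;
}

/* Pick the slot for hash h; table_size must be a power of two. */
uint32_t slot_for(uint32_t h, uint32_t table_size) {
    uint32_t mask = table_size - 1;   /* plays the role of nTableMask */
    return h & mask;
}
```

This is also why the table length is not the number of elements: a 1000-element array would get a 1024-slot pointer array under this scheme, so some slots are always paid for but unused.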

Besides the fields that record the table size and the number of elements, the Bucket type appears several times. How is Bucket defined?

typedef struct bucket {  
     ulong h;                    // Hash value of the key (the index itself for integer keys)  
     uint nKeyLength;            // Length of the string key  
     void *pData;                // Address of the stored data  
     void *pDataPtr;             // Holds the data directly when the data is itself a pointer  
     struct bucket *pListNext;   // Next element in insertion order  
     struct bucket *pListLast;   // Previous element in insertion order  
     struct bucket *pNext;       // Next bucket in the same slot (doubly linked list)  
     struct bucket *pLast;       // Previous bucket in the same slot (doubly linked list)  
     char arKey[1];              /* Must be last element */  
} Bucket; 

It looks a bit like a linked list. A Bucket is a list node that holds the data together with the pointers, and a HashTable is an array that holds a set of Buckets. A multidimensional array in PHP is simply a Bucket whose value is another HashTable.
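To make the relationship between the slot array and the list nodes concrete, here is a much-reduced sketch of such a table. It is not the Zend implementation: every mini_ name is invented for this example, it supports integer keys only, stores an int instead of a zval, has a fixed size, and uses singly linked lists where PHP uses doubly linked ones.

```c
#include <stdlib.h>

#define TABLE_SIZE 8  /* power of two, so TABLE_SIZE - 1 works as a mask */

typedef struct mini_bucket {
    unsigned long h;                 /* integer key */
    int value;                       /* stands in for the data behind pData */
    struct mini_bucket *pListNext;   /* next bucket in insertion order */
    struct mini_bucket *pNext;       /* next bucket in the same slot */
} mini_bucket;

typedef struct {
    mini_bucket *pListHead;
    mini_bucket *pListTail;
    mini_bucket *arBuckets[TABLE_SIZE];
    unsigned int nNumOfElements;
} mini_table;

void mini_init(mini_table *t) {
    t->pListHead = t->pListTail = NULL;
    t->nNumOfElements = 0;
    for (int i = 0; i < TABLE_SIZE; i++) {
        t->arBuckets[i] = NULL;
    }
}

/* Walk the chain of the key's slot; returns the bucket or NULL. */
mini_bucket *mini_find(const mini_table *t, unsigned long h) {
    mini_bucket *b = t->arBuckets[h & (TABLE_SIZE - 1)];
    while (b != NULL && b->h != h) {
        b = b->pNext;
    }
    return b;
}

/* Insert or overwrite; returns 0 on success, -1 if allocation fails. */
int mini_set(mini_table *t, unsigned long h, int value) {
    mini_bucket *b = mini_find(t, h);
    if (b != NULL) {
        b->value = value;
        return 0;
    }
    b = malloc(sizeof *b);
    if (b == NULL) {
        return -1;
    }
    b->h = h;
    b->value = value;
    /* Push onto the front of the chain of its slot. */
    unsigned long slot = h & (TABLE_SIZE - 1);
    b->pNext = t->arBuckets[slot];
    t->arBuckets[slot] = b;
    /* Append to the insertion-order list, which iteration follows. */
    b->pListNext = NULL;
    if (t->pListTail != NULL) {
        t->pListTail->pListNext = b;
    } else {
        t->pListHead = b;
    }
    t->pListTail = b;
    t->nNumOfElements++;
    return 0;
}

/* Release every bucket by walking the insertion-order list. */
void mini_free(mini_table *t) {
    mini_bucket *b = t->pListHead;
    while (b != NULL) {
        mini_bucket *next = b->pListNext;
        free(b);
        b = next;
    }
    mini_init(t);
}
```

Even in this reduced form each element carries a key, two pointers and a separately allocated node, which gives a feel for where the per-element overhead comes from.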
Counting the fields on a 32-bit build, a HashTable takes 39 bytes and a Bucket takes 33 bytes, so an empty array takes 14 + 39 + 33 = 86 bytes. Each element needs a 33-byte Bucket; key data longer than four bytes is appended to the Bucket, and the element value is usually a zval structure. In addition, each array allocates the array of Bucket pointers that arBuckets points to. That is not exactly one extra pointer per element, and in practice the cost may be worse. Added up this way, an array element takes about 54 bytes, almost the same as the measurement above.
In terms of space, small arrays have a higher average cost per element. Of course a script is rarely filled with large numbers of small arrays, and a small space cost buys programming speed. But using arrays as containers is a different story. In practice we often meet multidimensional arrays with many elements. For example, a one-dimensional array of 10k elements consumes about 540 KB, and a two-dimensional array of 10k x 10 should in theory need only about 6 MB, yet according to memory_get_usage a three-dimensional array of [10k, 5, 2], which holds the same 100,000 values, consumes 23 MB. Small arrays are really not worth it.
That covers the reasons for the low memory efficiency of PHP arrays. The next article will explain how the operations on PHP arrays are implemented.
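The 39-byte and 33-byte figures come from adding up the fields of the two structures. The tally below spells that out as a sketch; it assumes a 32-bit build where uint, ulong and pointers are 4 bytes each and zend_bool is 1 byte, and it ignores alignment padding. The function and constant names are made up for this example.

```c
#include <stddef.h>

/* Assumed field sizes on a 32-bit build. */
enum { UINT_BYTES = 4, ULONG_BYTES = 4, PTR_BYTES = 4, CHAR_BYTES = 1 };

size_t hashtable_bytes(void) {
    return 3 * UINT_BYTES      /* nTableSize, nTableMask, nNumOfElements */
         + 1 * ULONG_BYTES     /* nNextFreeElement */
         + 4 * PTR_BYTES       /* pInternalPointer, pListHead, pListTail, arBuckets */
         + 1 * PTR_BYTES       /* pDestructor */
         + 3 * CHAR_BYTES;     /* persistent, nApplyCount, bApplyProtection */
}

size_t bucket_bytes(void) {
    return 1 * ULONG_BYTES     /* h */
         + 1 * UINT_BYTES      /* nKeyLength */
         + 2 * PTR_BYTES       /* pData, pDataPtr */
         + 4 * PTR_BYTES       /* pListNext, pListLast, pNext, pLast */
         + 1 * CHAR_BYTES;     /* arKey[1] */
}
```

A compiler would normally pad both structures to a multiple of 4 bytes, so the real sizes are likely a little larger than these sums.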
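The theoretical figures in this paragraph follow from multiplying the element count by the 54 bytes measured earlier. The sketch below only restates that arithmetic; the names are invented for this example, and the 23 MB measurement is taken as 23,000,000 bytes for simplicity.

```c
/* Per-element cost measured earlier with array_fill, rounded down. */
#define BYTES_PER_ELEMENT 54UL

/* Cost of `elements` values if they all sat in one flat array. */
unsigned long flat_estimate_bytes(unsigned long elements) {
    return elements * BYTES_PER_ELEMENT;
}

/* How many times larger a measured footprint is than the flat estimate. */
double blowup_factor(double measured_bytes, unsigned long elements) {
    return measured_bytes / (double)flat_estimate_bytes(elements);
}
```

flat_estimate_bytes(10000) gives 540000 and flat_estimate_bytes(100000) gives 5400000, so the measured 23 MB for the nested array is roughly four times the flat estimate for the same number of values.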

Posted by BoukeBuffel on Thu, 06 Jun 2019 12:42:49 -0700