Redis source code learning - 1.8 - ordered integer set (intset)

Keywords: Redis

Source location: intset.h and intset.c

8.1 data structure

/* Note that these encodings are ordered, so:
 * INTSET_ENC_INT16 < INTSET_ENC_INT32 < INTSET_ENC_INT64. */
#define INTSET_ENC_INT16 (sizeof(int16_t))
#define INTSET_ENC_INT32 (sizeof(int32_t))
#define INTSET_ENC_INT64 (sizeof(int64_t))

// All of the following fields are stored in memory in little-endian byte order
typedef struct intset {
    uint32_t encoding;          // Element encoding: one of the three INTSET_ENC_INT* values above
    uint32_t length;            // Number of elements
    int8_t contents[];          // Elements, sorted ascending with no duplicates, each `encoding` bytes wide
} intset;
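
To make the role of the encoding field concrete, here is a minimal sketch of how a value maps to the narrowest encoding and how that width is then used to read an element out of the raw contents bytes. The demo_* names are hypothetical and are not Redis functions; the sketch also assumes the bytes are already in host byte order, so it leaves out the byte-order conversion the real code performs.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical names: the encoding is simply the element width in bytes. */
#define DEMO_ENC_INT16 ((uint8_t)sizeof(int16_t))
#define DEMO_ENC_INT32 ((uint8_t)sizeof(int32_t))
#define DEMO_ENC_INT64 ((uint8_t)sizeof(int64_t))

/* Return the narrowest encoding that can hold v. */
uint8_t demo_value_encoding(int64_t v) {
    if (v < INT32_MIN || v > INT32_MAX) return DEMO_ENC_INT64;
    if (v < INT16_MIN || v > INT16_MAX) return DEMO_ENC_INT32;
    return DEMO_ENC_INT16;
}

/* Read element `pos` from a byte buffer whose elements are `enc` bytes wide.
 * memcpy avoids unaligned reads; host byte order is assumed (see lead-in). */
int64_t demo_get(const int8_t *contents, uint32_t pos, uint8_t enc) {
    assert(contents != NULL);
    const int8_t *p = contents + (size_t)pos * enc;
    if (enc == DEMO_ENC_INT64) { int64_t v; memcpy(&v, p, sizeof(v)); return v; }
    if (enc == DEMO_ENC_INT32) { int32_t v; memcpy(&v, p, sizeof(v)); return v; }
    int16_t v; memcpy(&v, p, sizeof(v)); return v;
}
```

Because the three encodings are defined as byte widths, comparing two encodings with < or > is the same as comparing element sizes, which is what the "ordered" note in the header comment relies on.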

8.2 common functions

// Internal function
static uint8_t intsetSearch(intset *is, int64_t value, uint32_t *pos);     // Binary search for value; *pos receives its index, or the insertion index if absent
// External API functions
intset *intsetNew(void);                                                   // Create an empty intset
intset *intsetAdd(intset *is, int64_t value, uint8_t *success);            // Insert value
intset *intsetRemove(intset *is, int64_t value, int *success);             // Delete value
uint8_t intsetFind(intset *is, int64_t value);                             // Check whether value is present (calls intsetSearch)
int64_t intsetRandom(intset *is);                                          // Return a random element (uses rand())
uint8_t intsetGet(intset *is, uint32_t pos, int64_t *value);               // Get the element at position pos
uint32_t intsetLen(const intset *is);                                      // Number of elements in the set
size_t intsetBlobLen(intset *is);                                          // Total size of the set in bytes
int intsetValidateIntegrity(const unsigned char *is, size_t size, int deep); // Check whether the integer set is valid
// deep = 0: check only encoding and length
// deep = 1: also check that the contents array is sorted and contains no duplicate elements

8.3 underlying implementation

8.3.1: insertion

/* Insert an integer in the intset */
// [OUT] success: 1 if value was inserted, 0 if it was already present
intset *intsetAdd(intset *is, int64_t value, uint8_t *success) {
    uint8_t valenc = _intsetValueEncoding(value);   // Smallest encoding that can hold value
    uint32_t pos;
    if (success) *success = 1;

    // is->encoding is stored little-endian: intrev32ifbe converts it when it is saved,
    // and must be applied again on read to get the host-order value
    if (valenc > intrev32ifbe(is->encoding)) {
        // value does not fit the current encoding: upgrade every element to the wider type, then add value. Complexity: O(N)
        return intsetUpgradeAndAdd(is,value);
    } else {
        // Binary search for value; pos receives its index, or the insertion index if absent
        if (intsetSearch(is,value,&pos)) {
            // value already exists, so nothing is inserted
            if (success) *success = 0;
            return is;
        }

        // Grow the allocation to hold one more element
        is = intsetResize(is,intrev32ifbe(is->length)+1);
        // Shift the elements from pos onward back by one slot. intsetMoveTail does this with a single memmove, which copies up to N elements: O(N).
        if (pos < intrev32ifbe(is->length)) intsetMoveTail(is,pos,pos+1);
    }
    
    // Write value
    _intsetSet(is,pos,value);
    // Update length
    is->length = intrev32ifbe(intrev32ifbe(is->length)+1);
    return is;
}

Upgrade and downgrade:

  • Upgrade: the set starts with the narrowest encoding and widens only when a value does not fit, so sets of small values use less memory while larger values can still be stored
  • Downgrade: not supported. Once an ordered integer set has been upgraded, it keeps the wider encoding even if the large values are later removed
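
The upgrade step can be sketched as an in-place widening of the element array. The version below is a simplified illustration, not the Redis implementation: demo_upgrade_16_to_32 is a hypothetical name, it handles only the int16 to int32 case, it uses plain realloc, and it ignores the byte-order conversion. The part worth noticing is that it walks from the last element to the first, so a widened value never overwrites a narrow value that has not been read yet.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Widen `len` int16_t elements stored in `buf` to int32_t, in place.
 * Returns the (possibly moved) buffer, or NULL if realloc fails, in which
 * case the original buffer is left untouched. Hypothetical helper. */
int8_t *demo_upgrade_16_to_32(int8_t *buf, uint32_t len) {
    assert(buf != NULL);
    if (len == 0) return buf;           /* nothing to widen */

    int8_t *grown = realloc(buf, (size_t)len * sizeof(int32_t));
    if (grown == NULL) return NULL;

    /* Back to front: element i moves from byte offset 2*i to 4*i, which is
     * always at or beyond every narrow element that is still unread. */
    for (uint32_t i = len; i-- > 0; ) {
        int16_t narrow;
        memcpy(&narrow, grown + (size_t)i * sizeof(narrow), sizeof(narrow));
        int32_t wide = narrow;          /* sign-extending conversion */
        memcpy(grown + (size_t)i * sizeof(wide), &wide, sizeof(wide));
    }
    return grown;
}
```

Every element is touched once, which is where the O(N) cost of an upgrade comes from.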

Time complexity:

  • If the new element does not fit the current data type (overflow), every existing element is re-encoded one by one, with complexity O(N)
  • If the new element fits the current data type, the binary search costs O(logN), followed by one memmove call to shift the elements after the insertion point. That memmove copies up to N elements, so the worst case is O(N), not O(1)
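
The second case can be sketched on a plain sorted int64_t array. This is an illustration under simplifying assumptions, not the Redis code: demo_search and demo_insert are hypothetical names, the array has a fixed element width, and the caller guarantees there is room for one more element instead of resizing.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Binary search the sorted array a[0..len). Returns 1 and sets *pos to the
 * index of value if found; otherwise returns 0 and sets *pos to the index
 * where value would have to be inserted to keep the array sorted. */
int demo_search(const int64_t *a, uint32_t len, int64_t value, uint32_t *pos) {
    uint32_t lo = 0, hi = len;              /* half-open range [lo, hi) */
    while (lo < hi) {
        uint32_t mid = lo + (hi - lo) / 2;
        if (a[mid] < value) lo = mid + 1;
        else hi = mid;
    }
    *pos = lo;
    return lo < len && a[lo] == value;
}

/* Insert value into a sorted array that has capacity for one more element.
 * Returns 1 if inserted, 0 if value was already present. */
int demo_insert(int64_t *a, uint32_t *len, int64_t value) {
    assert(a != NULL && len != NULL);
    uint32_t pos;
    if (demo_search(a, *len, value, &pos)) return 0;
    /* Shift the tail back by one slot: up to *len elements are copied. */
    memmove(a + pos + 1, a + pos, (size_t)(*len - pos) * sizeof(a[0]));
    a[pos] = value;
    (*len)++;
    return 1;
}
```

Inserting at the front shifts every element and inserting at the end shifts none, so the memmove cost depends on where the value lands.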

8.3.2: delete

/* Delete integer from intset */
// [OUT] success: 1 if value was removed, 0 if it was not found
intset *intsetRemove(intset *is, int64_t value, int *success) {
    uint8_t valenc = _intsetValueEncoding(value);
    uint32_t pos;
    if (success) *success = 0;

    // Only search if value fits the current encoding; otherwise it cannot be in the set
    if (valenc <= intrev32ifbe(is->encoding) && intsetSearch(is,value,&pos)) {
        uint32_t len = intrev32ifbe(is->length);

        if (success) *success = 1;

        // Shift the elements after pos forward by one slot. intsetMoveTail does this with a single memmove, which copies up to N elements: O(N).
        if (pos < (len-1)) intsetMoveTail(is,pos+1,pos);
        // One element is gone: shrink the allocation and update the length
        is = intsetResize(is,len-1);
        is->length = intrev32ifbe(len-1);
    }
    return is;
}

Time complexity:

  • The element is not found by the binary search: return failure directly, complexity: O(logN)
  • The element is found: after the O(logN) search, memmove is called once to close the gap, copying up to N elements. Complexity: O(N) in the worst case
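
Deletion mirrors insertion, as the following sketch shows on a plain sorted int64_t array. demo_remove is a hypothetical name; unlike intsetRemove it does not shrink the allocation, and it works on a fixed element width.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Remove value from the sorted array a[0..*len).
 * Returns 1 if it was removed, 0 if it was not present. */
int demo_remove(int64_t *a, uint32_t *len, int64_t value) {
    assert(a != NULL && len != NULL);

    /* Binary search for the first index whose element is >= value. */
    uint32_t lo = 0, hi = *len;
    while (lo < hi) {
        uint32_t mid = lo + (hi - lo) / 2;
        if (a[mid] < value) lo = mid + 1;
        else hi = mid;
    }
    if (lo == *len || a[lo] != value) return 0;    /* not found: O(logN) */

    /* Close the gap: move the elements after `lo` forward by one slot. */
    memmove(a + lo, a + lo + 1, (size_t)(*len - lo - 1) * sizeof(a[0]));
    (*len)--;
    return 1;
}
```

Removing the last element moves zero bytes, which matches the pos < (len-1) guard in the listing above.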

8.4 some questions

Why do ordered integer sets store their data in little-endian byte order?

During RDB persistence, intset and ziplist are written to the RDB file as raw bytes: each one lives in a single contiguous memory block, so Redis copies that block into the file without any conversion. Structures such as list, dict, and skiplist link their data with pointers and cannot be dumped this way, so their contents are first converted to strings and then written to the file. As a result, when an RDB file is loaded on another machine to recover data, byte order does not matter for list, dict, and skiplist data, because it was stored as strings. It does matter for intset and ziplist, because they are stored as a binary image of the whole structure. Fixing their in-memory layout to little-endian keeps that image identical across machines.

Author: iEternity
Link: https://www.zhihu.com/question/65629444/answer/693414175
Source: Zhihu
The copyright belongs to the author. For commercial reprint, please contact the author for authorization, and for non-commercial reprint, please indicate the source.

Posted by remmargorp on Fri, 19 Nov 2021 17:58:03 -0800