Redis source code learning: integer set (intset)

Keywords: Database Redis

intset application

  • When a key holds only a small amount of data, Redis replaces its general-purpose internal data structures with compact encodings to save memory. The integer set (intset) and the zip list (ziplist) are two such memory-saving data structures
  • The integer set is one of the underlying implementations of the set type. When a set contains only integer values and has few elements, Redis uses an integer set as its underlying implementation
  • An integer set is built on an array that stores the set elements sorted and without duplicates. When necessary, the program changes the element type of this array to match the type of a newly added element
  • Integer sets support only upgrades, not downgrades: once the encoding has been widened, it is never narrowed again, even if the large elements are later removed
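The two conditions for using an intset can be expressed as a small check over the members' string forms. This is a simplified sketch, not Redis's code: `MAX_INTSET_ENTRIES` stands in for the `set-max-intset-entries` option (512 by default in redis.conf), and `parse_int64` and `can_use_intset` are hypothetical helpers built on `strtoll`, which is more lenient than Redis's own parser (it accepts leading whitespace and a `+` sign). Redis itself makes this decision incrementally as members are added rather than in one pass.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for the set-max-intset-entries option (default 512). */
#define MAX_INTSET_ENTRIES 512

/* Hypothetical helper: 1 if s is a whole base-10 integer that fits in
 * int64_t (strtoll reports overflow through errno = ERANGE), else 0. */
static int parse_int64(const char *s, int64_t *out) {
    assert(s != NULL && out != NULL);
    char *end;
    errno = 0;
    long long v = strtoll(s, &end, 10);
    if (errno != 0 || end == s || *end != '\0') return 0;
    *out = (int64_t)v;
    return 1;
}

/* Hypothetical helper: 1 if a set with these members meets both
 * conditions for the intset encoding, else 0. */
static int can_use_intset(const char **members, size_t n) {
    if (n > MAX_INTSET_ENTRIES) return 0;              /* too many elements */
    for (size_t i = 0; i < n; i++) {
        int64_t ignored;
        if (!parse_int64(members[i], &ignored)) return 0;  /* not an integer */
    }
    return 1;
}
```

For example, {"1", "-42"} qualifies, while {"1", "hello"} does not, and neither does "9223372036854775808", which overflows int64_t.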

Data structure of intset

Redis uses an intset to store a set when both conditions hold: the set contains only integer elements, and it contains only a small number of elements

typedef struct intset {
    uint32_t encoding; // Encoding (width of each element in bytes)
    uint32_t length; // Number of elements in the set
    int8_t contents[]; // Array holding the elements
} intset;
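Because contents is a flexible array member, an intset is a single allocation: an 8-byte header followed by length × encoding bytes of packed integers. The sketch below uses a local copy of the struct to show this; `intset_blob_size`, `make_int16_set`, and `get_at` are hypothetical helpers, and unlike Redis (which keeps these fields in little-endian order via intrev32ifbe and related macros) it uses the host byte order.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Local copy of the struct above, for illustration only. */
typedef struct intset {
    uint32_t encoding;   /* bytes per element: 2, 4 or 8 */
    uint32_t length;     /* number of elements */
    int8_t contents[];   /* packed elements, length * encoding bytes */
} intset;

/* Total allocation size: fixed header plus the packed element bytes. */
static size_t intset_blob_size(uint32_t encoding, uint32_t length) {
    return sizeof(intset) + (size_t)encoding * length;
}

/* Hypothetical builder: values are assumed sorted, unique and in int16_t range. */
static intset *make_int16_set(const int16_t *vals, uint32_t n) {
    assert(vals != NULL || n == 0);
    intset *is = malloc(intset_blob_size(sizeof(int16_t), n));
    if (is == NULL) return NULL;
    is->encoding = sizeof(int16_t);
    is->length = n;
    if (n > 0) memcpy(is->contents, vals, (size_t)n * sizeof(int16_t));
    return is;
}

/* Read element i, interpreting the raw bytes according to the encoding. */
static int64_t get_at(const intset *is, uint32_t i) {
    const int8_t *p = is->contents + (size_t)i * is->encoding;
    if (is->encoding == sizeof(int64_t)) { int64_t v; memcpy(&v, p, sizeof v); return v; }
    if (is->encoding == sizeof(int32_t)) { int32_t v; memcpy(&v, p, sizeof v); return v; }
    int16_t v; memcpy(&v, p, sizeof v); return v;
}
```

For three int16 elements this comes to 8 + 3 × 2 = 14 bytes; the same three elements after an upgrade to int64 would need 8 + 3 × 8 = 32 bytes.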

The encoding field records how the integer set is encoded. Redis defines three encodings as macros:
// Note that although contents is declared as int8_t[], the data is not stored as int8_t values; the array is only a raw byte buffer
// Elements stored as int16_t take 2 bytes each and can hold integers from -32768 to 32767
#define INTSET_ENC_INT16 (sizeof(int16_t))
// Elements stored as int32_t take 4 bytes each and can hold integers from -2^31 to 2^31 - 1
#define INTSET_ENC_INT32 (sizeof(int32_t))
// Elements stored as int64_t take 8 bytes each and can hold integers from -2^63 to 2^63 - 1
#define INTSET_ENC_INT64 (sizeof(int64_t))
  • The length field holds the number of elements in the set
  • The contents field holds the integers themselves. The array contains no duplicates and is sorted in ascending order; elements are read and written using the width given by the encoding field
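Picking the encoding for a value comes down to two range checks against the int16_t and int32_t limits. The function below is a sketch of that rule. Redis has an internal helper for this, _intsetValueEncoding (called in the code that follows), but `value_encoding` and the `ENC_*` macros here are stand-ins, not Redis's definitions.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-ins with the same values as INTSET_ENC_INT16/32/64 above. */
#define ENC_INT16 (sizeof(int16_t))
#define ENC_INT32 (sizeof(int32_t))
#define ENC_INT64 (sizeof(int64_t))

/* Return the narrowest encoding whose range contains v. */
static uint8_t value_encoding(int64_t v) {
    if (v < INT32_MIN || v > INT32_MAX) return ENC_INT64;  /* needs 8 bytes */
    if (v < INT16_MIN || v > INT16_MAX) return ENC_INT32;  /* needs 4 bytes */
    return ENC_INT16;                                      /* fits in 2 bytes */
}
```

So 32767 still fits in 2 bytes, 32768 needs 4, and 2147483648 needs 8.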

upgrade

  • When an integer being added does not fit in the intset's current encoding, the intset is first upgraded to an encoding wide enough to hold it
  • Redis provides the intsetUpgradeAndAdd function, which upgrades the integer set and adds the new value in one step
  • Return value: the integer set after the new element has been added
  • The function upgrades the set's encoding to the one required by value, then adds value to the upgraded set
static intset *intsetUpgradeAndAdd(intset *is, int64_t value) {   
    // Current encoding method
    uint8_t curenc = intrev32ifbe(is->encoding);
    // The encoding method required for the new value
    uint8_t newenc = _intsetValueEncoding(value);
    // Number of elements in the current collection
    int length = intrev32ifbe(is->length);
    // Decide whether value goes at the head or the tail of the underlying array.
    // Because value needs a wider encoding than every existing element,
    // it is either smaller than all of them (negative) or larger than all of them (positive),
    // so it can only be placed at the very beginning or the very end of the array
    int prepend = value < 0 ? 1 : 0;
    // Update the encoding of the set
    is->encoding = intrev32ifbe(newenc);
    // Resize the underlying array for the new encoding plus one extra element
    is = intsetResize(is,intrev32ifbe(is->length)+1);
    // Read each existing element with the old encoding and write it back with the new one.
    // The copy runs from the last element to the first: the wider slots extend past the old data,
    // so copying back to front never overwrites an element that has not been read yet.
    // If prepend is set, every element also shifts one slot to the right to free index 0
    while(length--)
        _intsetSet(is,length+prepend,_intsetGetEncoded(is,length,curenc));
    // Set a new value and decide whether to add to the array header or the array tail according to the value of prepend
    if (prepend)
        _intsetSet(is,0,value);
    else
        _intsetSet(is,intrev32ifbe(is->length),value);
    // Update the number of elements in the integer set
    is->length = intrev32ifbe(intrev32ifbe(is->length)+1);
    return is;
}
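The back-to-front copy is the subtle part of the upgrade. The standalone sketch below widens a packed array in place and then adds the new value at the head or the tail, following the same steps as intsetUpgradeAndAdd. `get_enc`, `set_enc`, and `upgrade_and_add` are hypothetical names; elements are kept in host byte order, and the caller is assumed to pass a value that really needs `newenc` (so it is smaller or larger than every existing element).

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Read element i of a packed array whose elements are enc bytes wide. */
static int64_t get_enc(const uint8_t *buf, uint32_t i, uint8_t enc) {
    const uint8_t *p = buf + (size_t)i * enc;
    if (enc == 8) { int64_t v; memcpy(&v, p, 8); return v; }
    if (enc == 4) { int32_t v; memcpy(&v, p, 4); return v; }
    int16_t v; memcpy(&v, p, 2); return v;
}

/* Write v as element i of a packed array whose elements are enc bytes wide. */
static void set_enc(uint8_t *buf, uint32_t i, uint8_t enc, int64_t v) {
    uint8_t *p = buf + (size_t)i * enc;
    if (enc == 8) { int64_t x = v; memcpy(p, &x, 8); }
    else if (enc == 4) { int32_t x = (int32_t)v; memcpy(p, &x, 4); }
    else { int16_t x = (int16_t)v; memcpy(p, &x, 2); }
}

/* Widen len elements from oldenc to newenc bytes and add value at the head
 * (value < 0) or the tail (value >= 0). Returns the reallocated buffer of
 * len + 1 elements, or NULL if realloc fails (buf is then still valid). */
static uint8_t *upgrade_and_add(uint8_t *buf, uint32_t len, uint8_t oldenc,
                                uint8_t newenc, int64_t value) {
    assert(newenc > oldenc);
    int prepend = value < 0 ? 1 : 0;
    uint8_t *nb = realloc(buf, (size_t)(len + 1) * newenc);
    if (nb == NULL) return NULL;
    /* Back to front: the write to slot i + prepend starts at or after the
     * end of old element i - 1, so no unread element is overwritten. */
    for (uint32_t i = len; i-- > 0; )
        set_enc(nb, i + prepend, newenc, get_enc(nb, i, oldenc));
    set_enc(nb, prepend ? 0 : len, newenc, value);
    return nb;
}
```

For example, upgrading {1, 2, 3} from int16 to int32 while adding 70000 yields {1, 2, 3, 70000}; adding -70000 instead yields {-70000, 1, 2, 3}.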

intset basic operations

  1. Create an intset
    When Redis creates an intset, it uses the int16_t encoding by default
intset *intsetNew(void) {
    intset *is = zmalloc(sizeof(intset));
    is->encoding = intrev32ifbe(INTSET_ENC_INT16);
    is->length = 0;
    return is;
}
  2. Add an element
    When adding an element, intset first checks how wide the new value is. If it exceeds the range that the current encoding can represent, the intsetUpgradeAndAdd function above is called to add it; otherwise it is inserted directly at its sorted position
/*
 Try to add the element value to the integer set.
 *success reports whether the addition succeeded:
 it is set to 1 if value was added,
 and to 0 if the addition failed because value was already present.
 T = O(N)
 */

intset *intsetAdd(intset *is, int64_t value, uint8_t *success) {
    // Calculate the length required for encoding value
    uint8_t valenc = _intsetValueEncoding(value);
    uint32_t pos;
    // Assume the insertion succeeds by default
    if (success) *success = 1;
    // If value needs a wider encoding than the set currently uses,
    // value cannot already be in the set, so it will definitely be added;
    // the set must first be upgraded to an encoding that can hold value
    if (valenc > intrev32ifbe(is->encoding)) {
        /* This always succeeds, so we don't need to curry *success. */
        // T = O(N)
        return intsetUpgradeAndAdd(is,value);
    } else {
        // Reaching here means the set's current encoding can hold value.
        // Search the set for value:
        // - if it exists, set *success to 0 and return the set unchanged;
        // - if not, intsetSearch stores the position where value should be inserted in pos,
        //   for use by the code below
        if (intsetSearch(is,value,&pos)) {
            if (success) *success = 0;
            return is;
        }

        // Reaching here means value is not in the set,
        // so it must be added:
        // grow the set by one element to make room for value
        is = intsetResize(is,intrev32ifbe(is->length)+1);
        // If the new element does not go at the end of the array,
        // shift the existing elements from pos onward one slot to the right to free position pos
        if (pos < intrev32ifbe(is->length)) intsetMoveTail(is,pos,pos+1);
    }

    // Sets the new value to the specified location in the underlying array
    _intsetSet(is,pos,value);

    // Increment the element count of the set
    is->length = intrev32ifbe(intrev32ifbe(is->length)+1);
    // Return the integer set with the new element added
    return is;
}
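Stripped of encodings and upgrades, the intsetAdd flow is: binary-search for the value, give up if it is already present, grow the array by one slot, shift the tail right with memmove, and write the value. The sketch below models only that path on a plain int64_t array; `sorted_set`, `find`, and `add` are hypothetical names, and the real function also handles the upgrade branch and byte-order conversion.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Simplified model: a sorted, duplicate-free array of int64_t (no encodings). */
typedef struct {
    int64_t *vals;
    uint32_t len;
} sorted_set;

/* Lower-bound binary search: returns 1 if v is present, and stores in *pos
 * either its index or the index where it would be inserted. */
static int find(const sorted_set *s, int64_t v, uint32_t *pos) {
    uint32_t lo = 0, hi = s->len;   /* half-open range [lo, hi) */
    while (lo < hi) {
        uint32_t mid = lo + (hi - lo) / 2;
        if (s->vals[mid] < v) lo = mid + 1;
        else hi = mid;
    }
    *pos = lo;
    return lo < s->len && s->vals[lo] == v;
}

/* Same steps as intsetAdd's non-upgrade branch.
 * Returns 1 if added, 0 if already present, -1 if allocation failed. */
static int add(sorted_set *s, int64_t v) {
    assert(s != NULL);
    uint32_t pos;
    if (find(s, v, &pos)) return 0;                        /* duplicate */
    int64_t *nv = realloc(s->vals, ((size_t)s->len + 1) * sizeof *nv);
    if (nv == NULL) return -1;
    s->vals = nv;
    if (pos < s->len)                                      /* open a gap at pos */
        memmove(&s->vals[pos + 1], &s->vals[pos], (s->len - pos) * sizeof *nv);
    s->vals[pos] = v;
    s->len++;
    return 1;
}
```

Adding 5, 1, 3 and then 3 again leaves {1, 3, 5}; the second 3 is rejected as a duplicate.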
/*
 * Find the index of value in the underlying array of the set is.
 * If value is found, return 1 and set *pos to the index where value is located.
 * If value is not found, return 0 and set *pos to the position
 * where value could be inserted while keeping the array sorted.
 * T = O(log N)
 */
static uint8_t intsetSearch(intset *is, int64_t value, uint32_t *pos) {
    int min = 0, max = intrev32ifbe(is->length)-1, mid = -1;
    int64_t cur = -1;
    // Handle the case when is is empty
    if (intrev32ifbe(is->length) == 0) {
        if (pos) *pos = 0;
        return 0;
    } else {

        // Because the underlying array is ordered, if value is larger than the last value in the array
        // Then value must not exist in the set,
        // And you should add value to the end of the underlying array
        if (value > _intsetGet(is,intrev32ifbe(is->length)-1)) {
            if (pos) *pos = intrev32ifbe(is->length);
            return 0;
        // Because the underlying array is ordered, if value is smaller than the first value in the array
        // Then value must not exist in the set,
        // And it should be added to the front of the underlying array
        } else if (value < _intsetGet(is,0)) {
            if (pos) *pos = 0;
            return 0;
        }
    }

    // Binary lookup in ordered array
    // T = O(log N)
    while(max >= min) {
        mid = (min+max)/2;
        cur = _intsetGet(is,mid);
        if (value > cur) {
            min = mid+1;
        } else if (value < cur) {
            max = mid-1;
        } else {
            break;
        }
    }
    // Check if value has been found
    if (value == cur) {
        if (pos) *pos = mid;
        return 1;
    } else {
        if (pos) *pos = min;
        return 0;
    }
}
// Move the elements from index from to the end of the array so that they start at index to (shifting them backward or forward)
static void intsetMoveTail(intset *is, uint32_t from, uint32_t to) {
    void *src, *dst;
    uint32_t bytes = intrev32ifbe(is->length)-from;
    uint32_t encoding = intrev32ifbe(is->encoding);
    // Make corresponding processing according to the coding format
    // src is the initial location of the memory to be moved
    // dst is the initial location of the memory block to be moved to
    // Bytes is the number of bytes to be moved
    if (encoding == INTSET_ENC_INT64) {
        src = (int64_t*)is->contents+from;
        dst = (int64_t*)is->contents+to;
        bytes *= sizeof(int64_t);
    } else if (encoding == INTSET_ENC_INT32) {
        src = (int32_t*)is->contents+from;
        dst = (int32_t*)is->contents+to;
        bytes *= sizeof(int32_t);
    } else {
        src = (int16_t*)is->contents+from;
        dst = (int16_t*)is->contents+to;
        bytes *= sizeof(int16_t);
    }
    memmove(dst,src,bytes);
}
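intsetMoveTail needs the element width only to turn indices into byte offsets, so its three branches can be collapsed into one computation. This is a sketch of the same idea, not Redis code; `move_tail` is a hypothetical name, and memmove (rather than memcpy) is required because the source and destination ranges overlap.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Move the elements at indices [from, len) so they start at index to.
 * to = from + 1 opens a one-slot gap (insertion); to = from - 1 closes
 * one (deletion). enc is the element width in bytes. The buffer must be
 * large enough for the destination range. */
static void move_tail(uint8_t *contents, uint32_t len, uint8_t enc,
                      uint32_t from, uint32_t to) {
    assert(enc == 2 || enc == 4 || enc == 8);
    assert(from <= len);
    size_t bytes = (size_t)(len - from) * enc;
    /* Source and destination overlap, so memmove is required. */
    memmove(contents + (size_t)to * enc, contents + (size_t)from * enc, bytes);
}
```

Opening a gap at index 1 of {10, 20, 30, 40} (with room for a fifth element) gives {10, 20, 20, 30, 40}; the stale 20 at index 1 is what intsetAdd then overwrites with the new value.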
  3. Remove an element
intset *intsetRemove(intset *is, int64_t value, int *success) {
    // Compute the encoding required by value
    uint8_t valenc = _intsetValueEncoding(value);
    uint32_t pos;
    // Assume the deletion fails by default
    if (success) *success = 0;
    // value can only be in the set if its encoding is no wider than the set's encoding;
    // in that case, delete it if intsetSearch finds it
    // T = O(log N)
    if (valenc <= intrev32ifbe(is->encoding) && intsetSearch(is,value,&pos)) {
        // Gets the current number of elements in the collection
        uint32_t len = intrev32ifbe(is->length);
        // Mark the deletion as successful
        if (success) *success = 1;
        // If value is not the last element,
        // shift the elements after it one slot to the left
        if (pos < (len-1)) intsetMoveTail(is,pos+1,pos);
        // Shrink the array to release the slot of the removed element
        // T = O(N)
        is = intsetResize(is,len-1);
        // Update the number of elements in the collection
        is->length = intrev32ifbe(len-1);
    }
    return is;
}
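Removal mirrors insertion: find the value, shift the tail left by one slot, then shrink the allocation. The sketch below does this on a plain int64_t array; `remove_value` is a hypothetical name, and it omits the encoding shortcut in intsetRemove (skipping the search when value needs a wider encoding than the set, since such a value cannot be present).

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Remove v from a sorted, duplicate-free int64_t array of *len elements.
 * Returns the (possibly reallocated) array; *removed is set to 1 or 0. */
static int64_t *remove_value(int64_t *a, uint32_t *len, int64_t v, int *removed) {
    assert(len != NULL && removed != NULL);
    uint32_t lo = 0, hi = *len;
    *removed = 0;
    while (lo < hi) {                       /* lower-bound binary search */
        uint32_t mid = lo + (hi - lo) / 2;
        if (a[mid] < v) lo = mid + 1; else hi = mid;
    }
    if (lo == *len || a[lo] != v) return a; /* not present */
    /* Close the gap: shift everything after lo one slot to the left. */
    memmove(&a[lo], &a[lo + 1], (*len - lo - 1) * sizeof *a);
    (*len)--;
    *removed = 1;
    if (*len == 0) { free(a); return NULL; }
    int64_t *shrunk = realloc(a, (size_t)*len * sizeof *a);
    return shrunk ? shrunk : a;             /* a failed shrink leaves a valid */
}
```

Removing 4 from {1, 4, 9, 16} leaves {1, 9, 16}; removing 5 afterwards reports failure and leaves the array unchanged.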
  4. Summary
    The integer set intset is implemented on top of an array whose elements are stored sorted and without duplicates. To save memory, intset starts with the narrowest encoding and supports upgrading to a wider one when needed, but it never downgrades

Posted by chrbar on Fri, 10 Sep 2021 16:07:30 -0700