Implementing a Lock-Free Ordered Linked List

http://www.cppblog.com/kevinlynx/archive/2015/05/05/210555.html

A lock-free ordered linked list can guarantee that its elements are unique, which makes it usable as a hash table bucket, or even directly as a (less efficient) map. A lock-free unordered linked list is relatively simple to implement, because new elements can always be inserted at the head. In an ordered list, the insertion point can be anywhere.

The implementation in this article is mainly based on the paper High Performance Dynamic Lock-Free Hash Tables.

Main problems

The main operations on a linked list are insert and remove. A naive implementation makes the problems obvious. The following code is for illustration only:

    struct node_t {
        key_t key;
        value_t val;
        node_t *next;
    };

    int l_find(node_t **pred_ptr, node_t **item_ptr, node_t *head, key_t key) {
        node_t *pred = head;
        node_t *item = head->next;
        while (item) {
            int d = KEY_CMP(item->key, key);
            if (d >= 0) {
                *pred_ptr = pred;
                *item_ptr = item;
                return d == 0 ? TRUE : FALSE;
            }
            pred = item;
            item = item->next;
        } 
        *pred_ptr = pred;
        *item_ptr = NULL;
        return FALSE;
    }

    int l_insert(node_t *head, key_t key, value_t val) {
        node_t *pred, *item, *new_item;
        while (TRUE) {
            if (l_find(&pred, &item, head, key)) {
                return FALSE;
            }
            new_item = (node_t*) malloc(sizeof(node_t));
            new_item->key = key;
            new_item->val = val;
            new_item->next = item;
            // A. What if pred itself has been removed by now?
            if (CAS(&pred->next, item, new_item)) {
                return TRUE;
            }
            free(new_item);
        }
    }

    int l_remove(node_t *head, key_t key) {
        node_t *pred, *item;
        while (TRUE) {
            if (!l_find(&pred, &item, head, key)) {
                return FALSE;
            }
            // B. What if pred has been removed? What if item has been removed?
            if (CAS(&pred->next, item, item->next)) {
                haz_free(item);
                return TRUE;
            }
        }
    }

The l_find function returns the matching element and its predecessor. At points A and B, even though pred and item have already been obtained, either of them may be removed by another thread before the CAS executes. Even while l_find is traversing, any element it visits may be removed. The underlying problem is that whenever you hold an element, you cannot be sure it is still valid. Validity here means both that the element is still in the list and that the memory it points to has not been freed.
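The listing above relies on a CAS primitive that the article does not define. One possible definition, assuming a C11 compiler with `<stdatomic.h>`, is sketched below. The names `demo_node` and `cas_ptr` are hypothetical and are not taken from the author's code.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical node type for this sketch; the link field is atomic. */
typedef struct demo_node {
    int key;
    _Atomic(struct demo_node *) next;
} demo_node;

/* Store desired into *addr only if *addr still equals expected.
   Returns true on success, false if another writer changed *addr first. */
static bool cas_ptr(_Atomic(demo_node *) *addr,
                    demo_node *expected, demo_node *desired) {
    return atomic_compare_exchange_strong(addr, &expected, desired);
}
```

A failed call tells the caller only that the link changed after it was read, which is why the listings retry from l_find instead of trying to repair the operation in place.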

Solution

Adding a validity flag to the element pointer, and relying on the mutual exclusion of CAS, solves the problem of determining whether an element is still valid.

Because node_t is aligned in memory, the low bits of a pointer to a node_t are always zero, so they can be used to store flags. A single CAS on such a pointer then has the effect of a DCAS: two logical operations (checking the flag and updating the pointer) become one atomic operation. Compare the thread safety of a reference-counted object: the wrapped pointer is thread-safe, but the object it refers to is not.
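The low-bit marking described above can be sketched as follows. This is a minimal illustration, assuming a mainstream platform where `malloc` returns memory aligned to at least 2 bytes and where a pointer survives a round trip through `uintptr_t`. The names `mnode`, `mark_ptr`, `is_marked` and `strip_mark` are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical node type for this sketch. */
typedef struct mnode {
    int key;
    struct mnode *next;
} mnode;

/* Set the lowest bit: "the node that owns this link is logically deleted". */
static inline uintptr_t mark_ptr(mnode *p) {
    return (uintptr_t)p | 1u;
}

/* Test the deletion mark without touching the pointer bits. */
static inline bool is_marked(uintptr_t w) {
    return (w & 1u) != 0;
}

/* Clear the mark and recover the original pointer. */
static inline mnode *strip_mark(uintptr_t w) {
    return (mnode *)(w & ~(uintptr_t)1);
}
```

Because the flag lives in the same word as the pointer, one CAS on that word checks the flag and replaces the pointer together.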

CAS is mutually exclusive: when several threads CAS the same object, only one of them succeeds, and the threads that fail can conclude that the target object has changed. The improved code follows (for illustration only, not guaranteed to be correct):

    typedef size_t markable_t;
    // The lowest bit set to 1 marks the element as deleted
    #define HAS_MARK(p) ((markable_t)(p) & 0x01)
    #define MARK(p) ((markable_t)(p) | 0x01)
    #define STRIP_MARK(p) ((markable_t)(p) & ~(markable_t)0x01)

    int l_insert(node_t *head, key_t key, value_t val) {
        node_t *pred, *item, *new_item;
        while (TRUE) {
            if (l_find(&pred, &item, head, key)) { 
                return FALSE;
            }
            new_item = (node_t*) malloc(sizeof(node_t));
            new_item->key = key;
            new_item->val = val;
            new_item->next = item;
            // A. Although find returned a valid pred, pred may be deleted before the CAS below.
            //    In that case pred->next is marked, so pred->next != item and the CAS fails; retry after failure.
            if (CAS(&pred->next, item, new_item)) {
                return TRUE;
            }
            free(new_item);
        }
        return FALSE;
    }

    int l_remove(node_t *head, key_t key) {
        node_t *pred, *item;
        while (TRUE) {
            if (!l_find(&pred, &item, head, key)) {
                return FALSE;
            }
            node_t *inext = item->next;
            // B. Mark item->next before deleting item. If the CAS fails then, as in insert,
            //    another thread has changed or deleted item since find returned; retry after failure.
            if (!CAS(&item->next, inext, MARK(inext))) {
                continue;
            }
            // C. When several threads remove the same element, only one of them will succeed.
            if (CAS(&pred->next, item, STRIP_MARK(item->next))) {
                haz_defer_free(item);
                return TRUE;
            }
        }
        return FALSE;
    }

    int l_find(node_t **pred_ptr, node_t **item_ptr, node_t *head, key_t key) {
        node_t *pred = head;
        node_t *item = head->next;
        hazard_t *hp1 = haz_get(0);
        hazard_t *hp2 = haz_get(1);
        while (item) {
            haz_set_ptr(hp1, pred);
            haz_set_ptr(hp2, item);
            /*
             If item->next is marked, item may be unlinked or even freed right after this point, so restart the search.
            */
            if (HAS_MARK(item->next)) { 
                return l_find(pred_ptr, item_ptr, head, key);
            }
            int d = KEY_CMP(item->key, key);
            if (d >= 0) {
                *pred_ptr = pred;
                *item_ptr = item;
                return d == 0 ? TRUE : FALSE;
            }
            pred = item;
            item = item->next;
        } 
        *pred_ptr = pred;
        *item_ptr = NULL;
        return FALSE;
    }

Functions such as haz_get and haz_set_ptr belong to a hazard pointer implementation, which supports memory reclamation in a multi-threaded environment. In the code above, remove marks item->next when deleting item, so the CAS in insert needs no changes. The thread races involved can be summarized as follows:

  • In insert, find returns a normal pred and item with pred->next == item, and then another thread deletes pred before the CAS. At that point pred->next == MARK(item), so the CAS fails and insert retries. Deletion has two cases: a) pred has been marked and removed from the list, but pred can still be accessed; b) pred's memory may have been freed, in which case using pred is an error. To handle case b, a mechanism similar to hazard pointers is introduced, which guarantees that as long as any thread is using a pointer p, its memory is not actually freed.
  • In insert, several threads may be inserting after the same pred at once. The CAS in insert handles this as well, and it needs no further discussion.
  • In remove the situation is the same as in insert: find returns a valid pred and item, but pred is deleted by another thread before the CAS. As in insert, the CAS fails and the operation retries.
  • Any operation that changes the list structure, whether remove or insert, must retry when its CAS fails.
  • While traversing in find, you may encounter an item that has been marked as deleted. Given how remove is implemented, such an item is likely to be unlinked and freed soon, so the traversal has to start again from the head.
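The hazard pointer mechanism mentioned in the first point can be sketched as follows. This is a deliberately reduced illustration, not the author's haz_* implementation: all names (`hp_protect`, `hp_retire`, `hp_scan`, and so on) are hypothetical, the slot table is a single global array, and the retire list is a single unsynchronized array that a real design would keep per thread.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

#define HP_SLOTS   8   /* hazard slots shared by all threads (assumed size) */
#define HP_RETIRED 16  /* retire-list capacity; must exceed HP_SLOTS */

static _Atomic(void *) hp_slot[HP_SLOTS]; /* pointers readers are using */
static void *hp_retired[HP_RETIRED];      /* unlinked but not yet freed */
static size_t hp_nretired;
static size_t hp_freed;                   /* demo counter only */

/* Announce that the caller is about to dereference p (NULL clears the slot). */
static void hp_protect(size_t slot, void *p) {
    atomic_store(&hp_slot[slot], p);
}

/* True if any slot currently announces p. */
static bool hp_is_protected(void *p) {
    for (size_t i = 0; i < HP_SLOTS; i++)
        if (atomic_load(&hp_slot[i]) == p)
            return true;
    return false;
}

/* Free every retired pointer that no slot protects; keep the rest. */
static void hp_scan(void) {
    size_t kept = 0;
    for (size_t i = 0; i < hp_nretired; i++) {
        if (hp_is_protected(hp_retired[i])) {
            hp_retired[kept++] = hp_retired[i];
        } else {
            free(hp_retired[i]);
            hp_freed++;
        }
    }
    hp_nretired = kept;
}

/* Defer freeing p. A scan keeps at most HP_SLOTS entries, so a full
   list always has room again after scanning. */
static void hp_retire(void *p) {
    if (hp_nretired == HP_RETIRED)
        hp_scan();
    hp_retired[hp_nretired++] = p;
}
```

In Michael's hazard pointer scheme, a reader publishes the pointer and then re-checks that the node is still reachable before dereferencing it; the sketch omits that validation step and shows only the deferred-free side.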

ABA problem

The ABA problem still exists. Consider the CAS in insert:

    if (CAS(&pred->next, item, new_item)) {
        return TRUE;
    }

If the item after pred is removed before the CAS, and a new node is then added at the same address but with a different key, the CAS succeeds but the list may no longer be ordered: pred->key < new_item->key > item->key.

To solve this problem, the other low bits freed up by pointer alignment can store a counter that records how many times pred->next has changed. Suppose the count stored in pred->next is 0 when insert obtains pred. If other threads remove pred->next and add item back before the CAS, the count in pred->next has increased by then, so the CAS in insert fails.

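    // Leave the lowest bit as the deletion mark; the remaining low bits hold the tag.
    // Assumes nodes are aligned to sizeof(node_t), which must be a power of two.
    #define MASK ((sizeof(node_t) - 1) & ~(markable_t)0x01)

    // The tag is shifted past the mark bit so that GET_TAG(p) + 1 never sets the mark.
    #define GET_TAG(p) (((markable_t)(p) & MASK) >> 1)
    #define TAG(p, tag) ((markable_t)(p) | (((markable_t)(tag) << 1) & MASK))
    #define MARK(p) ((markable_t)(p) | 0x01)
    #define HAS_MARK(p) ((markable_t)(p) & 0x01)
    #define STRIP_MARK(p) ((node_t*)((markable_t)(p) & ~(MASK | 0x01)))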

Implementation of remove:

    /* Mark before delete */
    if (!CAS(&sitem->next, inext, MARK(inext))) {
        continue;
    }
    int tag = GET_TAG(pred->next) + 1;
    if (CAS(&pred->next, item, TAG(STRIP_MARK(sitem->next), tag))) {
        haz_defer_free(sitem);
        return TRUE;
    }

The count in pred->next can also be updated in insert.
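The effect of the counter can be seen in isolation with a small sketch. It assumes nodes aligned to 16 bytes, so that bit 0 can serve as the deletion mark and bits 1 to 3 as a change counter. The names `tnode`, `pack`, `tag_of`, `ptr_of` and `tagged_cas` are hypothetical and do not come from the author's code.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define TAG_SHIFT 1
#define TAG_MASK  ((uintptr_t)0x0E)  /* bits 1..3; bit 0 is the deletion mark */

/* Hypothetical node type, forced to 16-byte alignment for this sketch. */
typedef struct tnode {
    _Alignas(16) int key;
} tnode;

/* Combine a pointer and a counter into one word; the counter wraps at 8. */
static uintptr_t pack(tnode *p, unsigned tag) {
    return (uintptr_t)p | (((uintptr_t)tag << TAG_SHIFT) & TAG_MASK);
}

static unsigned tag_of(uintptr_t w) {
    return (unsigned)((w & TAG_MASK) >> TAG_SHIFT);
}

static tnode *ptr_of(uintptr_t w) {
    return (tnode *)(w & ~(TAG_MASK | 1));
}

/* Replace the pointer in *link and bump the counter. Fails if the whole
   word, pointer and counter together, differs from what the caller saw. */
static bool tagged_cas(_Atomic uintptr_t *link, uintptr_t seen, tnode *desired) {
    uintptr_t next = pack(desired, tag_of(seen) + 1);
    return atomic_compare_exchange_strong(link, &seen, next);
}
```

If a thread reads the link while it holds A with count 0, and the link then goes from A to B and back to A, the word now carries count 2, so a CAS that still expects count 0 fails even though the pointer bits match. With only three counter bits the count wraps after eight changes, so this narrows the ABA window without closing it completely.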

Summary

Lock-free implementations ultimately depend on the mutual exclusion of CAS. Implementing a lock-free data structure from scratch gives a real sense of how tricky lock-free code is. The final code is available on GitHub. For simplicity, the code implements a less capable hazard pointer; see the earlier blog post for details.

Posted by crouchl on Fri, 29 Mar 2019 11:51:29 -0700