[Java Series 004] Don't Underestimate the Redis Distributed Lock

Keywords: Programming Jedis Redis Zookeeper Database

Hello, I'm miniluo. For positions that require distributed-systems experience, interviewers like to ask about distributed locks. Recently I have had the honor of interviewing candidates for my company, and I often ask about distributed locks myself. Most candidates' answers stop at how to use a lock, without deeper thought about how it can fail. Today, we take the Redis distributed lock as an example and learn from a pit I stepped in.

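Jedis exposes the Redis commands setnx and setex. setnx sets a key only if it does not already exist, but it cannot attach a timeout; setex sets a key with a timeout (note that the unit is seconds), but it does not check whether the key already exists. Neither is enough for a lock on its own, so instead we use the set method with the nxxx and expx arguments, which checks for existence and sets the expiry in a single command. To release the lock, the value comparison and the delete must happen atomically, so we use a Lua script. See the following two pieces of code for details.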

/**
 * NX -- Only set the key if it does not already exist.
 * XX -- Only set the key if it already exists.
 */
private static final String SET_IF_NOT_EXIST = "NX";
/**
 * EX|PX, expire time units: EX = seconds; PX = milliseconds
 */
private static final String SET_WITH_EXPIRE_TIME = "PX";
private static final String LOCK_OK = "OK";
private static final Long RELEASE_SUCCESS = 1L;

/**
 * Try to get the lock
 *
 * @param jedis        Redis connection
 * @param lockKey      key that represents the lock
 * @param value        value stored under the key; identifies the lock holder
 * @param milliseconds lock expiry time in milliseconds
 * @return true if the lock was acquired, false otherwise
 */
private static Boolean tryGetLock(Jedis jedis, String lockKey, String value, int milliseconds) {
    // This overload exists in Jedis 2.x; Jedis 3.x and later take a SetParams argument instead.
    String result = jedis.set(lockKey, value, SET_IF_NOT_EXIST, SET_WITH_EXPIRE_TIME, milliseconds);
    if (LOCK_OK.equals(result)) {
        log.info("Get lock, thread name==" + Thread.currentThread().getName());
        return true;
    }
    log.info("Lock not acquired, thread name==" + Thread.currentThread().getName());
    return false;
}

/**
 * Release lock
 *
 * @param jedis
 * @param lockKey
 * @param value
 * @return
 */
private static Boolean releaseLock(Jedis jedis, String lockKey, String value) {
    String script = "if redis.call('get', KEYS[1]) == ARGV[1] then return redis.call('del', KEYS[1]) else return 0 end";
    Long result = (Long) jedis.eval(script, Collections.singletonList(lockKey), Collections.singletonList(value));
    if (RELEASE_SUCCESS.equals(result)) {
        log.info("Release lock, thread name==" + Thread.currentThread().getName());
        return true;
    }
    log.info("Lock not released, thread name==" + Thread.currentThread().getName());
    return false;
}

Next, we use many threads to compete for the lock concurrently.

public static void main(String[] args) {
    String lockKey = "LF-TEST:DISTRIBUTION:LOCK";
    String value = "123";
    int maxThread = 1000;
    ExecutorService fixedCacheThreadPool = Executors.newFixedThreadPool(maxThread);
    CountDownLatch cdLatch = new CountDownLatch(maxThread);
    for (int i = 0; i < maxThread; i++) {
        fixedCacheThreadPool.execute(() -> {
            RedisUtil redisUtil = RedisUtil.getInstance();
            Jedis jedis = redisUtil.getJedis();
            try {
                if(tryGetLock(jedis, lockKey, value, 200)){
                    TimeUnit.MILLISECONDS.sleep(100);
                    releaseLock(jedis, lockKey, value);
                }
            } catch (Exception ex) {
                log.error("Incorrect lock acquisition or release:", ex);
            } finally {
                if (null != jedis) {
                    jedis.close(); // Return connection
                }
            }
            cdLatch.countDown();
        });
    }
    try {
        cdLatch.await(30, TimeUnit.SECONDS);
    } catch (InterruptedException e) {
        log.error("cdLatch Exception:", e);
    }
    // Stop the pool's worker threads so the JVM can exit
    fixedCacheThreadPool.shutdown();
}

After writing the test, we run the program and count the log lines: 3 threads acquired the lock, but only 2 threads released it.

Did one thread simply fail to release its lock? Not quite. It's a concurrency problem. Because every thread uses the same key and value, the following can happen: just as thread A is about to release the lock, the lock times out and Redis expires the key. Thread B then acquires the lock, and thread A's release deletes the lock that now belongs to thread B. When thread B later tries to release, its lock is already gone, which would explain the missing release in the log. So even if you write a Lua script, don't assume the code is correct (a program that hasn't been unit tested is a rogue).
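The interleaving above is hard to reproduce on demand against a real Redis, so here is a minimal in-memory sketch that replays it step by step. Everything in it is an assumption made for illustration: `LockRaceDemo`, `FakeLockStore`, and their methods are hypothetical names that only mimic the semantics of `SET key value NX PX ttl` and of the compare-and-delete Lua script, and time is advanced by hand instead of by a real clock. It is not Jedis code.

```java
import java.util.HashMap;
import java.util.Map;

class LockRaceDemo {

    /** Hypothetical in-memory stand-in for one Redis instance; time is advanced manually. */
    static class FakeLockStore {
        private final Map<String, String> values = new HashMap<>();
        private final Map<String, Long> expiresAt = new HashMap<>();
        private long nowMillis = 0;

        void advance(long millis) {
            nowMillis += millis;
        }

        private void evictIfExpired(String key) {
            Long deadline = expiresAt.get(key);
            if (deadline != null && nowMillis >= deadline) {
                values.remove(key);
                expiresAt.remove(key);
            }
        }

        /** Mimics SET key value NX PX ttl: succeeds only if the key is absent. */
        boolean setIfAbsent(String key, String value, long ttlMillis) {
            evictIfExpired(key);
            if (values.containsKey(key)) {
                return false;
            }
            values.put(key, value);
            expiresAt.put(key, nowMillis + ttlMillis);
            return true;
        }

        /** Mimics the Lua script: delete the key only if the stored value matches. */
        boolean deleteIfEquals(String key, String value) {
            evictIfExpired(key);
            if (!value.equals(values.get(key))) {
                return false;
            }
            values.remove(key);
            expiresAt.remove(key);
            return true;
        }

        boolean isHeld(String key) {
            evictIfExpired(key);
            return values.containsKey(key);
        }
    }

    /** Replays: A locks, A's lock expires, B locks, A releases late. Returns whether B still holds the lock. */
    static boolean bStillHoldsLock(String valueA, String valueB) {
        FakeLockStore store = new FakeLockStore();
        String key = "LF-TEST:DISTRIBUTION:LOCK";
        store.setIfAbsent(key, valueA, 200); // thread A acquires with a 200 ms expiry
        store.advance(250);                  // A is delayed past the expiry
        store.setIfAbsent(key, valueB, 200); // thread B acquires the now-free lock
        store.deleteIfEquals(key, valueA);   // A finally releases
        return store.isHeld(key);
    }

    public static void main(String[] args) {
        // Same value for both threads: A's release deletes B's lock.
        System.out.println("shared value: " + bStillHoldsLock("123", "123"));       // prints "shared value: false"
        // Different values: the compare in the script fails and B keeps its lock.
        System.out.println("distinct values: " + bStillHoldsLock("tokenA", "tokenB")); // prints "distinct values: true"
    }
}
```

The second call shows why the value matters: the compare-and-delete step can only protect a lock if the value identifies its holder.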

How can we improve this? The goal is to isolate the threads from one another by making sure each one uses a different value. With the goal clear, ThreadLocal and UUID are natural choices. Let's improve the code and try again.

 


private static final ThreadLocal<String> thdLocalLockValue = new ThreadLocal<>();
public static void main(String[] args) {
    String lockKey = "LF-TEST:DISTRIBUTION:LOCK";
    int maxThread = 1000;
    ExecutorService fixedCacheThreadPool = Executors.newFixedThreadPool(maxThread);
    CountDownLatch cdLatch = new CountDownLatch(maxThread);
    for (int i = 0; i < maxThread; i++) {
        fixedCacheThreadPool.execute(() -> {
            if(Objects.isNull(thdLocalLockValue.get())){
                thdLocalLockValue.set(StringUtils.replace(UUID.randomUUID().toString(), "-", ""));
            }
            RedisUtil redisUtil = RedisUtil.getInstance();
            Jedis jedis = redisUtil.getJedis();
            try {
                if(tryGetLock(jedis, lockKey, thdLocalLockValue.get(), 200)){
                    TimeUnit.MILLISECONDS.sleep(100);
                    releaseLock(jedis, lockKey, thdLocalLockValue.get());
                }
            } catch (Exception ex) {
                log.error("Incorrect lock acquisition or release:", ex);
            } finally {
                if (null != jedis) {
                    jedis.close(); // Return connection
                }
                thdLocalLockValue.remove();
            }
            cdLatch.countDown();
        });
    }
    try {
        cdLatch.await(30, TimeUnit.SECONDS);
    } catch (InterruptedException e) {
        log.error("cdLatch Exception:", e);
    }
    // Stop the pool's worker threads so the JVM can exit
    fixedCacheThreadPool.shutdown();
}

After counting the log output, we found that the number of threads acquiring the lock matched the number of threads releasing it, and the result was the same over several runs (screenshots are omitted for length; interested readers can complete the code and try it).

Summary

Today, we learned about the implementation of a Redis distributed lock and the pit we fell into. The pit is caused by two events racing with each other: lock expiration and lock release. We therefore used ThreadLocal to give each thread its own lock value, so that a thread can only release the lock it acquired itself. Of course, Redis is not the only way to implement a distributed lock: ZooKeeper can do it with ephemeral ("temporary") sequential nodes, and with a single database it can also be done through optimistic or pessimistic locking.

 

Thinking and discussion

1. As mentioned above, ZooKeeper and database optimistic and pessimistic locks can also implement a distributed lock. Do you know the differences between them?

2. Why does the comment on jedis.close() in the code say the connection is returned rather than closed?

3. We used ThreadLocal. Why do we need to call remove() explicitly at the end? What goes wrong if we don't?
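
For question 3, a small experiment may help before you answer. The sketch below is not taken from the lock code; `ThreadLocalLeakDemo`, `TOKEN`, and `valueSeenBySecondTask` are hypothetical names chosen for illustration. It runs two tasks one after the other on a single pooled thread and reports what the second task finds in the ThreadLocal.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class ThreadLocalLeakDemo {
    private static final ThreadLocal<String> TOKEN = new ThreadLocal<>();

    /** Runs two tasks on the same pooled thread and returns what the second task reads from TOKEN. */
    static String valueSeenBySecondTask(boolean removeAfterFirst) {
        // A single-thread pool guarantees that both tasks run on the same worker thread.
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            Runnable first = () -> {
                TOKEN.set("token-of-task-1");
                if (removeAfterFirst) {
                    TOKEN.remove();
                }
            };
            Callable<String> second = TOKEN::get;
            pool.submit(first).get();          // wait for the first task to finish
            return pool.submit(second).get();  // the second task only reads the value
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println("without remove(): " + valueSeenBySecondTask(false)); // prints "without remove(): token-of-task-1"
        System.out.println("with remove(): " + valueSeenBySecondTask(true));     // prints "with remove(): null"
    }
}
```

Relating this back to the lock code: the value is set only when the ThreadLocal is empty, so a pooled thread that skipped remove() would carry its old value into the next task it runs.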

Comments and corrections are welcome! You are also welcome to share this article with your friends or colleagues.

Thank you for reading. See you next time!


Posted by Sekka on Sun, 05 Apr 2020 01:39:05 -0700