Alibaba P6 Interviewer: how does Redis implement distributed locks? What if the lock expires?

Keywords: Java Redis Back-end Distributed lock

Principle of Redis implementing distributed lock

Earlier we looked at how Redis is applied in real business scenarios. Now let's look at a functional scenario for Redisson: the distributed lock, which we use frequently.

This article does not cover the concept of a distributed lock itself.

• Add the Redisson dependency

<dependency>
    <groupId>org.redisson</groupId>
    <artifactId>redisson</artifactId>
    <version>3.16.0</version>
</dependency>

• Write simple test code

import java.util.concurrent.TimeUnit;
import org.redisson.Redisson;
import org.redisson.api.RLock;
import org.redisson.api.RedissonClient;
import org.redisson.config.Config;

public class RedissonTest {
    private static RedissonClient redissonClient;

    static {
        Config config = new Config();
        config.useSingleServer().setAddress("redis://192.168.221.128:6379");
        redissonClient = Redisson.create(config);
    }

    public static void main(String[] args) throws InterruptedException {
        RLock rLock = redissonClient.getLock("updateOrder");
        // Wait up to 100 seconds for the lock; it is released automatically after 10 seconds.
        if (rLock.tryLock(100, 10, TimeUnit.SECONDS)) {
            try {
                System.out.println("lock acquisition succeeded");
                Thread.sleep(2000);
            } finally {
                // Unlock only when this thread actually acquired the lock.
                rLock.unlock();
            }
        }
        redissonClient.shutdown();
    }
}

Implementation principle of Redisson distributed lock

As you can see, Redisson makes it easy to implement the functionality we need, and this is only the tip of the iceberg. Redisson's most powerful feature is that it provides distributed versions of common utility classes, so the toolkit originally used to coordinate multithreaded programs on a single machine can now coordinate multithreaded programs across multiple machines. This lowers the difficulty of solving problems in a distributed environment. Let's analyze the implementation of RedissonLock.

RedissonLock.tryLock

@Override
public boolean tryLock(long waitTime, long leaseTime, TimeUnit unit) throws InterruptedException {
    long time = unit.toMillis(waitTime);
    long current = System.currentTimeMillis();
    long threadId = Thread.currentThread().getId();
    // Try to acquire the lock through the tryAcquire method.
    Long ttl = tryAcquire(waitTime, leaseTime, unit, threadId);
    // lock acquired
    if (ttl == null) {
        // A null TTL means the lock was acquired, so return true directly.
        return true;
    }
    // some code omitted...
}

tryAcquire

private <T> RFuture<Long> tryAcquireAsync(long waitTime, long leaseTime, TimeUnit unit, long threadId) {
    RFuture<Long> ttlRemainingFuture;
    // leaseTime is the lease time, i.e. the expiration time of the Redis key.
    if (leaseTime != -1) {
        // An expiration time was specified.
        ttlRemainingFuture = tryLockInnerAsync(waitTime, leaseTime, unit, threadId, RedisCommands.EVAL_LONG);
    } else {
        // No expiration time was specified: take the key timeout from the configuration (30 s by default).
        ttlRemainingFuture = tryLockInnerAsync(waitTime, internalLockLeaseTime, TimeUnit.MILLISECONDS, threadId, RedisCommands.EVAL_LONG);
    }
    // This callback runs after tryLockInnerAsync completes.
    ttlRemainingFuture.onComplete((ttlRemaining, e) -> {
        if (e != null) {
            // An exception occurred; return directly.
            return;
        }
        // lock acquired
        if (ttlRemaining == null) {
            // The lock key was set for the first time.
            if (leaseTime != -1) {
                // A timeout was specified: update internalLockLeaseTime.
                internalLockLeaseTime = unit.toMillis(leaseTime);
            } else {
                // leaseTime == -1: start the watchdog.
                scheduleExpirationRenewal(threadId);
            }
        }
    });
    return ttlRemainingFuture;
}

tryLockInnerAsync

The locking operation is implemented with a Lua script:

1. If the lock key does not exist, call hincrby to store the current thread's information, set the expiration time, and return nil to tell the client that it has obtained the lock.
2. If the lock key exists and is held by the current thread, increase the reentry count by 1, reset the expiration time, and return nil to tell the client that it has obtained the lock.
3. If the lock is held by another thread, return the remaining time to live of the lock to tell the client to wait.

<T> RFuture<T> tryLockInnerAsync(long waitTime, long leaseTime, TimeUnit unit, long threadId, RedisStrictCommand<T> command) {
    return evalWriteAsync(getRawName(), LongCodec.INSTANCE, command,
            "if (redis.call('exists', KEYS[1]) == 0) then " +
                "redis.call('hincrby', KEYS[1], ARGV[2], 1); " +
                "redis.call('pexpire', KEYS[1], ARGV[1]); " +
                "return nil; " +
            "end; " +
            "if (redis.call('hexists', KEYS[1], ARGV[2]) == 1) then " +
                "redis.call('hincrby', KEYS[1], ARGV[2], 1); " +
                "redis.call('pexpire', KEYS[1], ARGV[1]); " +
                "return nil; " +
            "end; " +
            "return redis.call('pttl', KEYS[1]);",
            Collections.singletonList(getRawName()), unit.toMillis(leaseTime), getLockName(threadId));
}

We will explain the Lua script later.
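In the meantime, the three branches of the locking script can be mimicked in plain Java without a Redis server. This is only a sketch under stated assumptions: the class LockScriptSketch, its methods, and the holder names such as "a:1" are hypothetical, a HashMap stands in for the Redis hash, and time is passed in by hand instead of being read from a clock.

```java
import java.util.HashMap;
import java.util.Map;

/** In-memory stand-in for the lock key (hypothetical, not Redisson code). */
class LockScriptSketch {
    // field = holder name, value = reentry count (the hash stored under the lock key)
    private final Map<String, Integer> holders = new HashMap<>();
    // absolute expiry of the key on the simulated clock
    private long expiresAtMillis = 0;

    /** Same contract as the script: null means "acquired", otherwise the remaining TTL in ms. */
    Long tryLock(String holder, long leaseMillis, long nowMillis) {
        if (nowMillis >= expiresAtMillis) {
            holders.clear(); // the key has expired, so it no longer exists
        }
        if (holders.isEmpty() || holders.containsKey(holder)) {
            // Branches 1 and 2: key absent, or already held by this caller (reentry).
            holders.merge(holder, 1, Integer::sum);    // plays the role of hincrby
            expiresAtMillis = nowMillis + leaseMillis; // plays the role of pexpire
            return null;
        }
        // Branch 3: held by someone else, report the remaining TTL (pttl).
        return expiresAtMillis - nowMillis;
    }

    int holdCount(String holder) {
        return holders.getOrDefault(holder, 0);
    }

    public static void main(String[] args) {
        LockScriptSketch lock = new LockScriptSketch();
        System.out.println(lock.tryLock("a:1", 30_000, 0));      // null: acquired
        System.out.println(lock.tryLock("a:1", 30_000, 1_000));  // null: re-entered
        System.out.println(lock.tryLock("b:7", 30_000, 2_000));  // 29000: must wait
        System.out.println(lock.tryLock("b:7", 30_000, 31_000)); // null: previous lease expired
    }
}
```

The point of the sketch is the return contract: the caller never sees true or false, only nil (acquired) or a number of milliseconds to wait.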

unlock: the lock release process

The script that releases the lock looks a little more complicated:

1. If the lock is not held by the current thread (hexists returns 0), nil is returned.
2. Because reentry is supported, unlocking decrements the reentry count by 1.
3. If the resulting reentry count is > 0, the expiration time is reset and 0 is returned.
4. If the resulting reentry count is <= 0, the key is deleted and a message is published to announce that the lock is available.

protected RFuture<Boolean> unlockInnerAsync(long threadId) {
    return evalWriteAsync(getRawName(), LongCodec.INSTANCE, RedisCommands.EVAL_BOOLEAN,
            "if (redis.call('hexists', KEYS[1], ARGV[3]) == 0) then " +
                "return nil;" +
            "end; " +
            "local counter = redis.call('hincrby', KEYS[1], ARGV[3], -1); " +
            "if (counter > 0) then " +
                "redis.call('pexpire', KEYS[1], ARGV[2]); " +
                "return 0; " +
            "else " +
                "redis.call('del', KEYS[1]); " +
                "redis.call('publish', KEYS[2], ARGV[1]); " +
                "return 1; " +
            "end; " +
            "return nil;",
            Arrays.asList(getRawName(), getChannelName()), LockPubSub.UNLOCK_MESSAGE, internalLockLeaseTime, getLockName(threadId));
}
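The release script's three outcomes (nil, 0, 1) can be traced with a small in-memory model. This is a sketch, not Redisson code: UnlockScriptSketch, its methods, and the "UNLOCK" string are hypothetical placeholders, and a list stands in for the Pub/Sub channel.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** In-memory model of the release script (hypothetical, for illustration only). */
class UnlockScriptSketch {
    private final Map<String, Integer> holders = new HashMap<>();
    // Stands in for PUBLISH on the lock's channel; "UNLOCK" is a placeholder message.
    final List<String> published = new ArrayList<>();

    void lock(String holder) {
        holders.merge(holder, 1, Integer::sum);
    }

    /** null: caller does not hold the lock; false: still held after the decrement; true: fully released. */
    Boolean unlock(String holder) {
        if (!holders.containsKey(holder)) {
            return null;                                       // hexists == 0
        }
        int counter = holders.merge(holder, -1, Integer::sum); // hincrby -1
        if (counter > 0) {
            return false;                                      // the real script also resets the TTL here
        }
        holders.remove(holder);                                // del
        published.add("UNLOCK");                               // publish: wakes up waiting clients
        return true;
    }

    public static void main(String[] args) {
        UnlockScriptSketch s = new UnlockScriptSketch();
        s.lock("a:1");
        s.lock("a:1");                       // reentry, count is now 2
        System.out.println(s.unlock("b:7")); // null: not the owner
        System.out.println(s.unlock("a:1")); // false: count dropped to 1
        System.out.println(s.unlock("a:1")); // true: released, message published
        System.out.println(s.published);     // [UNLOCK]
    }
}
```

Only the last unlock of a reentrant holder publishes a message, which is why waiting clients are woken exactly once per full release.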

RedissonLock under contention

Under contention, the Lua script on the Redis side is the same; it simply takes a different branch depending on the state of the key. When tryAcquire finds that the lock is held by another thread, the caller enters the waiting and contention logic:

1. If subscribeFuture.await returns false, the wait has exceeded the maximum time allowed for obtaining the lock. The subscription is cancelled and lock acquisition fails.
2. If subscribeFuture.await returns true, the caller enters the loop and tries to obtain the lock again.

The second half of RedissonLock.tryLock looks like this:

public boolean tryLock(long waitTime, long leaseTime, TimeUnit unit) throws InterruptedException {
    // some code omitted
    time -= System.currentTimeMillis() - current;
    if (time <= 0) {
        acquireFailed(waitTime, unit, threadId);
        return false;
    }
    current = System.currentTimeMillis();
    // Subscribe to the Redis unlock messages and create a RedissonLockEntry.
    RFuture<RedissonLockEntry> subscribeFuture = subscribe(threadId);
    // Block until the subscription completes. If that takes longer than the remaining time,
    // the client's maximum wait time has been exceeded: cancel the subscription,
    // stop applying for the lock, and return false.
    if (!subscribeFuture.await(time, TimeUnit.MILLISECONDS)) {
        if (!subscribeFuture.cancel(false)) {
            // unsubscribe
            subscribeFuture.onComplete((res, e) -> {
                if (e == null) {
                    unsubscribe(subscribeFuture, threadId);
                }
            });
        }
        // The lock could not be obtained.
        acquireFailed(waitTime, unit, threadId);
        return false;
    }
    try {
        // Check for a timeout; if the time is used up, lock acquisition has failed.
        time -= System.currentTimeMillis() - current;
        if (time <= 0) {
            acquireFailed(waitTime, unit, threadId);
            return false;
        }
        // Keep competing for the lock in a loop.
        while (true) {
            long currentTime = System.currentTimeMillis();
            // Compete for the lock; the result is the remaining TTL of the lock.
            ttl = tryAcquire(waitTime, leaseTime, unit, threadId);
            // lock acquired
            if (ttl == null) {
                // A null TTL means the lock was obtained.
                return true;
            }
            // Check for a timeout again.
            time -= System.currentTimeMillis() - currentTime;
            if (time <= 0) {
                acquireFailed(waitTime, unit, threadId);
                return false;
            }
            // Block on the entry's semaphore and wait for the unlock message
            // (this reduces how often the lock request is sent).
            // If the lock's remaining TTL is shorter than our remaining wait time, wait at most ttl;
            // otherwise wait at most the remaining wait time.
            currentTime = System.currentTimeMillis();
            if (ttl >= 0 && ttl < time) {
                subscribeFuture.getNow().getLatch().tryAcquire(ttl, TimeUnit.MILLISECONDS);
            } else {
                subscribeFuture.getNow().getLatch().tryAcquire(time, TimeUnit.MILLISECONDS);
            }
            // Update the remaining wait time (maximum wait time minus the time spent blocking).
            time -= System.currentTimeMillis() - currentTime;
            if (time <= 0) {
                // Failed to acquire the lock.
                acquireFailed(waitTime, unit, threadId);
                return false;
            }
        }
    } finally {
        // unsubscribe
        unsubscribe(subscribeFuture, threadId);
    }
    // return get(tryLockAsync(waitTime, leaseTime, unit));
}
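The core of the waiting loop is "block on a semaphore, but never longer than the lock's TTL or our own remaining time". A minimal sketch of that idea with java.util.concurrent.Semaphore follows; LockWaitSketch and nextWaitMillis are hypothetical names, and a background thread stands in for the unlock message arriving from Redis.

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

/** Sketch of one iteration of the wait loop (hypothetical names, no Redis involved). */
class LockWaitSketch {
    /** Block for the lock's TTL if it ends before our deadline, otherwise for the time we have left. */
    static long nextWaitMillis(long ttlMillis, long remainingMillis) {
        return (ttlMillis >= 0 && ttlMillis < remainingMillis) ? ttlMillis : remainingMillis;
    }

    public static void main(String[] args) throws InterruptedException {
        Semaphore latch = new Semaphore(0); // a permit is released when an unlock message arrives

        // Stand-in for the lock holder: "publishes" the unlock message after 100 ms.
        Thread holder = new Thread(() -> {
            try {
                Thread.sleep(100);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            latch.release();
        });
        holder.start();

        // The lock still lives for 30 s, but we only have 5 s of wait time left.
        long wait = nextWaitMillis(30_000, 5_000);
        boolean woken = latch.tryAcquire(wait, TimeUnit.MILLISECONDS);
        System.out.println("wait=" + wait + " ms, woken by unlock message=" + woken);
        holder.join();
    }
}
```

Waiting on the semaphore instead of polling means a client normally retries only when the lock is released or when its bounded wait runs out.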

What if the lock expires?

Generally, when we acquire a distributed lock we set a timeout on it to avoid deadlock. But if the current thread has not finished its work within that time, the lock is released by the timeout and other threads can acquire it, which can cause failures such as two threads running the protected code at the same time.

To avoid this, Redisson introduces a watchdog mechanism that automatically renews the distributed lock. In short, as long as the thread holding the lock has not finished executing, Redisson keeps extending the timeout of the target key in Redis.

By default the watchdog timeout is 30 s; it can be changed through Config.lockWatchdogTimeout.

@Override
public boolean tryLock(long waitTime, TimeUnit unit) throws InterruptedException {
    return tryLock(waitTime, -1, unit); // leaseTime = -1
}

In fact, when no lease time is passed to tryLock, leaseTime is -1 and a default timeout of 30 s is still set on the key to avoid deadlock.

private <T> RFuture<Long> tryAcquireAsync(long waitTime, long leaseTime, TimeUnit unit, long threadId) {
    RFuture<Long> ttlRemainingFuture;
    if (leaseTime != -1) {
        ttlRemainingFuture = tryLockInnerAsync(waitTime, leaseTime, unit, threadId, RedisCommands.EVAL_LONG);
    } else {
        // When leaseTime is -1, internalLockLeaseTime is used as the lease time.
        // It defaults to 30 s and is the expiration time of the current lock:
        // this.internalLockLeaseTime = commandExecutor.getConnectionManager().getCfg().getLockWatchdogTimeout();
        ttlRemainingFuture = tryLockInnerAsync(waitTime, internalLockLeaseTime, TimeUnit.MILLISECONDS, threadId, RedisCommands.EVAL_LONG);
    }
    ttlRemainingFuture.onComplete((ttlRemaining, e) -> {
        if (e != null) {
            // An exception occurred; return directly.
            return;
        }
        // lock acquired
        if (ttlRemaining == null) {
            // The lock key was set for the first time.
            if (leaseTime != -1) {
                // A timeout was specified: update internalLockLeaseTime.
                internalLockLeaseTime = unit.toMillis(leaseTime);
            } else {
                // leaseTime == -1: start the watchdog.
                scheduleExpirationRenewal(threadId);
            }
        }
    });
    return ttlRemainingFuture;
}

Since the key expires after 30 s by default, a scheduled task renews the expiration time so that the lock does not expire while the current thread is still executing.

• First, check whether EXPIRATION_RENEWAL_MAP already contains an entry for the entry name. The map records whether a lock key held by a client in this service instance already has a renewal entry.
• If the entry already exists, the thread id is simply added to it and no new task is started, because RedissonLock is a reentrant lock.

protected void scheduleExpirationRenewal(long threadId) {
    ExpirationEntry entry = new ExpirationEntry();
    ExpirationEntry oldEntry = EXPIRATION_RENEWAL_MAP.putIfAbsent(getEntryName(), entry);
    if (oldEntry != null) {
        oldEntry.addThreadId(threadId);
    } else {
        // Called when the lock is acquired for the first time: start the watchdog.
        entry.addThreadId(threadId);
        renewExpiration();
    }
}

Define a timed task that calls the renewExpirationAsync method to renew the lease.

private void renewExpiration() {
    ExpirationEntry ee = EXPIRATION_RENEWAL_MAP.get(getEntryName());
    if (ee == null) {
        return;
    }
    // The time wheel mechanism is used to schedule the task.
    Timeout task = commandExecutor.getConnectionManager().newTimeout(new TimerTask() {
        @Override
        public void run(Timeout timeout) throws Exception {
            ExpirationEntry ent = EXPIRATION_RENEWAL_MAP.get(getEntryName());
            if (ent == null) {
                return;
            }
            Long threadId = ent.getFirstThreadId();
            if (threadId == null) {
                return;
            }
            // Renew the lease.
            RFuture<Boolean> future = renewExpirationAsync(threadId);
            future.onComplete((res, e) -> {
                if (e != null) {
                    log.error("Can't update lock " + getRawName() + " expiration", e);
                    EXPIRATION_RENEWAL_MAP.remove(getEntryName());
                    return;
                }
                if (res) {
                    // reschedule itself
                    renewExpiration();
                }
            });
        }
    }, internalLockLeaseTime / 3, TimeUnit.MILLISECONDS); // runs after 1/3 of the lease time
    ee.setTimeout(task);
}

Execute the Lua script to renew the specified key.

protected RFuture<Boolean> renewExpirationAsync(long threadId) {
    return evalWriteAsync(getRawName(), LongCodec.INSTANCE, RedisCommands.EVAL_BOOLEAN,
            "if (redis.call('hexists', KEYS[1], ARGV[2]) == 1) then " +
                "redis.call('pexpire', KEYS[1], ARGV[1]); " +
                "return 1; " +
            "end; " +
            "return 0;",
            Collections.singletonList(getRawName()),
            internalLockLeaseTime, getLockName(threadId));
}
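To see why renewing at one third of the lease keeps the key alive, the schedule can be replayed on a simulated clock. This is a sketch under assumptions: WatchdogSketch and expiryWithWatchdog are hypothetical names, every renewal is assumed to succeed, and the lock is assumed to be acquired at time 0.

```java
/** Replays the watchdog schedule on a simulated clock (hypothetical, no timers or Redis). */
class WatchdogSketch {
    /** Returns the simulated time (ms) at which the key would expire if the work runs for workMillis. */
    static long expiryWithWatchdog(long leaseMillis, long workMillis) {
        long expiresAt = leaseMillis;    // lock acquired at t = 0 with the default lease
        long interval = leaseMillis / 3; // same delay as internalLockLeaseTime / 3
        for (long now = interval; now < workMillis; now += interval) {
            // The holder is still working, so the renewal task resets the TTL (pexpire).
            expiresAt = now + leaseMillis;
        }
        return expiresAt;
    }

    public static void main(String[] args) {
        long lease = 30_000;
        System.out.println(expiryWithWatchdog(lease, 5_000));  // 30000: no renewal was needed
        System.out.println(expiryWithWatchdog(lease, 65_000)); // 90000: renewed six times
    }
}
```

With the default 30 s lease, a renewal fires every 10 s, so while renewals succeed the key always has at least 20 s left and never expires under a thread that is still working.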

Lua script

Lua is an efficient, lightweight scripting language (similar to JavaScript). It is written in standard C and distributed as open source. It is designed to be embedded in applications, giving them flexible extension and customization capabilities. Lua means "moon" in Portuguese, and its logo suggests a "satellite language" that can easily be embedded in other languages. In fact, many common frameworks can embed Lua scripts, such as OpenResty and Redis.

Benefits of using Lua scripts:

1. Reduced network overhead. Multiple commands can run in a single script, in one round trip.
2. Atomicity. Redis executes the whole script as a unit, and no other command is inserted in between. In other words, there is no need to worry about race conditions while writing the script.
3. Reusability. A script sent by a client is cached in Redis, so other clients can reuse it to run the same logic.

Download and installation of Lua

Lua is a standalone scripting language, so it has its own compiler and interpreter. Let's install it.

• Download the Lua source package: https://www.lua.org/download....
• The installation steps are as follows:

tar -zxvf lua-5.4.3.tar.gz
cd lua-5.4.3
make linux
make install

If an error is reported, saying that readline/readline.h cannot be found, you can install it using the yum command

yum -y install readline-devel ncurses-devel

Finally, type lua to enter the Lua console. Lua has its own syntax, variables, logical operators, functions, and so on, which are not covered in detail here. Anyone who has used JavaScript should be able to pick it up in a few hours. Two simple examples follow.

array = {"Lua", "mic"}
-- Lua arrays are 1-based, so iterate from index 1
for i = 1, #array do
    print(array[i])
end

array = {"mic", "redis"}
for key, value in ipairs(array) do
    print(key, value)
end

Redis and Lua

Redis embeds a Lua interpreter, so we can define Lua scripts and have Redis execute them. Inside a Lua script you can call Redis commands directly to operate on the data in Redis.

redis.call('set', 'hello', 'world')
local value = redis.call('get', 'hello')

The return value of the redis.call function is the result of executing the Redis command. A Redis command can return five types of reply (integer, bulk string, multi-bulk, status, and error), and redis.call converts each of them into the corresponding Lua data type.

In many cases we want the script itself to return a value. After all, a script is just a set of commands we have written, and we call it the same way we call built-in Redis commands, so Redis automatically converts the Lua value returned by the script into a Redis reply type. Use a return statement in the script to send a value back to the Redis client; if no return is executed, the script returns nil.

Execute Lua script related commands in Redis

After writing a script, the next step is to execute it from a program. Redis provides the EVAL command, which lets developers call a script the same way they call built-in Redis commands.

EVAL command - execute script

EVAL script numkeys [key ...] [arg ...]

You can pass data to the script through the key and arg parameters; inside the script their values are available in the global tables KEYS and ARGV respectively.

For example, we can implement the SET command with a script and call it from the Redis client.

eval "return redis.call('set',KEYS[1],ARGV[1])" 1 lua hello

The above script is equivalent to using Lua script to call Redis's set command and store a key=lua, value=hello in Redis.

EVALSHA command

When a script is executed with EVAL, the whole script is sent to Redis on every call, which wastes bandwidth when the script is long. To solve this, Redis provides the EVALSHA command, which executes a script by the SHA1 digest of its content. It is used the same way as EVAL, except that the script content is replaced by the SHA1 digest of the script.

1. When EVAL is executed, Redis computes the SHA1 digest of the script and records the script in the script cache.
2. When EVALSHA is executed, Redis looks up the script in the script cache by the given digest. If it is found, the script is executed; otherwise Redis returns "NOSCRIPT No matching script. Please use EVAL."

# Add the script to the cache and get its SHA1 digest
script load "return redis.call('get','lua')"
# ["13bd040587b891aedc00a72458cbf8588a27df90"]
# Pass the SHA1 digest to execute the script
evalsha "13bd040587b891aedc00a72458cbf8588a27df90" 0
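The digest that EVALSHA expects is simply the SHA-1 of the script text written as 40 lowercase hex characters, so a client can compute it locally with the JDK. A minimal sketch, in which the class name ScriptDigestSketch and the method sha1Hex are hypothetical:

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/** Computes the hex SHA-1 digest of a script body (hypothetical helper, JDK only). */
class ScriptDigestSketch {
    static String sha1Hex(String script) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-1");
            byte[] digest = md.digest(script.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                hex.append(String.format("%02x", b)); // two lowercase hex digits per byte
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // Every Java platform is required to provide SHA-1.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Prints a 40-character digest that can be passed to EVALSHA once the script is cached.
        System.out.println(sha1Hex("return redis.call('get','lua')"));
    }
}
```

Because the digest depends on every byte of the script, even a change in whitespace produces a different digest and therefore a NOSCRIPT reply until the new text has been loaded.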

Redisson executes Lua script

Next, implement an access rate limit with a Lua script.

The idea is to define a key that contains the IP address. Its value is the number of accesses within the specified time window; for example, at most 3 accesses are allowed in 10 seconds.

• Define the Lua script.

local times = redis.call('incr', KEYS[1])
-- On the first access, set an expiration time
if times == 1 then
    redis.call('expire', KEYS[1], ARGV[1])
end
-- If the number of accesses within the window exceeds the limit, return 0 to reject the request
if times > tonumber(ARGV[2]) then
    return 0
end
-- Return 1 to allow access
return 1

• Define the controller and provide a test endpoint.

import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ExecutionException;
import org.redisson.api.RFuture;
import org.redisson.api.RScript;
import org.redisson.api.RedissonClient;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.PathVariable;
import org.springframework.web.bind.annotation.RestController;

@RestController
public class RedissonController {
    @Autowired
    RedissonClient redissonClient;

    private final String LIMIT_LUA =
        "local times=redis.call('incr',KEYS[1])\n" +
        "if times == 1 then\n" +
        "   redis.call('expire',KEYS[1],ARGV[1])\n" +
        "end\n" +
        "if times > tonumber(ARGV[2]) then\n" +
        "   return 0\n" +
        "end\n" +
        "return 1";

    @GetMapping("/lua/{id}")
    public String lua(@PathVariable("id") Integer id) throws ExecutionException, InterruptedException {
        List<Object> keys = Arrays.asList("LIMIT:" + id);
        // ARGV[1] = window in seconds (10), ARGV[2] = maximum number of accesses (3)
        RFuture<Object> future = redissonClient.getScript()
            .evalAsync(RScript.Mode.READ_WRITE, LIMIT_LUA, RScript.ReturnType.INTEGER, keys, 10, 3);
        return future.get().toString();
    }
}

Note that the script above will fail as written, because Redisson's default codec serializes the arguments as objects when they are passed to the script. To fix this, modify the redisson.yml file and configure a different codec.

• application.yml

spring:
  redis:
    redisson:
      file: classpath:redisson.yml

• redisson.yml

singleServerConfig:
  address: redis://192.168.221.128:6379
codec: !<org.redisson.codec.JsonJacksonCodec> {}
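The counting logic of the rate-limit script can also be checked without a Redis server. The sketch below mirrors the script's steps (incr, expire on the first hit, compare with the limit) in plain Java; RateLimitSketch and its check method are hypothetical names, two HashMaps stand in for the Redis key and its expiry, and time is passed in by hand.

```java
import java.util.HashMap;
import java.util.Map;

/** In-memory mirror of the rate-limit script (hypothetical, for illustration only). */
class RateLimitSketch {
    private final Map<String, Long> counters = new HashMap<>();
    private final Map<String, Long> expiresAt = new HashMap<>();

    /** Returns 1 to allow the request and 0 to reject it, like the Lua script. */
    int check(String key, long windowSeconds, long maxTimes, long nowMillis) {
        Long deadline = expiresAt.get(key);
        if (deadline != null && nowMillis >= deadline) {
            counters.remove(key);  // the key expired, so the window starts over
            expiresAt.remove(key);
        }
        long times = counters.merge(key, 1L, Long::sum);          // incr
        if (times == 1) {
            expiresAt.put(key, nowMillis + windowSeconds * 1000); // expire on the first hit
        }
        return times > maxTimes ? 0 : 1;
    }

    public static void main(String[] args) {
        RateLimitSketch limiter = new RateLimitSketch();
        for (long t = 0; t <= 3_000; t += 1_000) {
            System.out.println(limiter.check("LIMIT:1", 10, 3, t)); // 1, 1, 1, then 0
        }
        System.out.println(limiter.check("LIMIT:1", 10, 3, 10_000)); // 1: a new window has started
    }
}
```

This is a fixed-window counter: the window starts at the first request and all counts are dropped together when the key expires.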

Atomicity of Lua script

Redis executes scripts atomically: while a script is running, Redis executes no other command, and all other commands wait until the script has finished. To keep a long-running script from making Redis unavailable, Redis provides the lua-time-limit parameter, which limits the maximum running time of a script. The default is 5 seconds.

Non-transactional operations

When a script runs longer than this limit, Redis starts accepting other commands again but does not execute them (to preserve the atomicity of the script); it replies with a BUSY error instead. This is demonstrated below.

Open two client windows and run an infinite Lua loop in the first one:

eval "while true do end" 0

Run get lua in the second window and you will get the following error.

(error) BUSY Redis is busy running a script. You can only call SCRIPT KILL or SHUTDOWN NOSAVE.

The result is BUSY. If we now use the script kill command to terminate the running script, the second window returns to normal.

Transactional operations

If the running Lua script has modified Redis data (SET, DEL, and so on), the SCRIPT KILL command cannot terminate it, because stopping a script halfway would violate its atomicity: a script must either execute completely or not at all.

Again open two windows. In the first window, run the following command:

eval "redis.call('set','name','mic') while true do end" 0

Run in the second window

get lua

The result is the same: Redis still replies BUSY. This time, however, the script kill command reports an error and cannot kill the script.

(error) UNKILLABLE Sorry the script already executed write commands against the dataset. You can either wait the script termination or kill the server in a hard way using the SHUTDOWN NOSAVE command.

In this case, you can only forcibly terminate redis through the shutdown nosave command.

The difference between shutdown nosave and shutdown is that shutdown nosave does not perform persistence, which means that database modifications after the last snapshot will be lost.

Redisson's Lua script

After learning about Lua, it's not difficult to understand Redisson's Lua script.

<T> RFuture<T> tryLockInnerAsync(long waitTime, long leaseTime, TimeUnit unit, long threadId, RedisStrictCommand<T> command) {
    return evalWriteAsync(getRawName(), LongCodec.INSTANCE, command,
            "if (redis.call('exists', KEYS[1]) == 0) then " +
                "redis.call('hincrby', KEYS[1], ARGV[2], 1); " +
                "redis.call('pexpire', KEYS[1], ARGV[1]); " +
                "return nil; " +
            "end; " +
            "if (redis.call('hexists', KEYS[1], ARGV[2]) == 1) then " +
                "redis.call('hincrby', KEYS[1], ARGV[2], 1); " +
                "redis.call('pexpire', KEYS[1], ARGV[1]); " +
                "return nil; " +
            "end; " +
            "return redis.call('pttl', KEYS[1]);",
            Collections.singletonList(getRawName()), unit.toMillis(leaseTime), getLockName(threadId));
}

Pub/Sub mechanism in Redis

The following is Redisson's code for releasing a lock. It contains a publish call, redis.call('publish', KEYS[2], ARGV[1]). What is this call for?

protected RFuture<Boolean> unlockInnerAsync(long threadId) {
    return evalWriteAsync(getRawName(), LongCodec.INSTANCE, RedisCommands.EVAL_BOOLEAN,
            "if (redis.call('hexists', KEYS[1], ARGV[3]) == 0) then " +
                "return nil;" +
            "end; " +
            "local counter = redis.call('hincrby', KEYS[1], ARGV[3], -1); " +
            "if (counter > 0) then " +
                "redis.call('pexpire', KEYS[1], ARGV[2]); " +
                "return 0; " +
            "else " +
                "redis.call('del', KEYS[1]); " +
                "redis.call('publish', KEYS[2], ARGV[1]); " +
                "return 1; " +
            "end; " +
            "return nil;",
            Arrays.asList(getRawName(), getChannelName()), LockPubSub.UNLOCK_MESSAGE, internalLockLeaseTime, getLockName(threadId));
}

Redis provides a set of commands that let developers implement the publish/subscribe pattern. This pattern can also be used to pass messages between processes. It works as follows:

• The publish/subscribe pattern has two roles: publisher and subscriber. A subscriber can subscribe to one or more channels, and a publisher can send messages to a specific channel. Every subscriber of that channel receives the message.
• The command a publisher uses to publish a message is PUBLISH. Its usage is:

PUBLISH channel message

For example, send the message hello to channel.1:

PUBLISH channel.1 "hello"

The message is now sent. The return value of the command is the number of subscribers that received the message. Because no client had subscribed to the channel when the command was executed, it returns 0. Note also that messages are not persisted: a client that subscribes after a message was sent will not receive that earlier message.

The commands for subscribers to subscribe to messages are:

SUBSCRIBE channel [channel ...]

This command can subscribe to one or more channels at once, for example SUBSCRIBE channel.1. After executing the SUBSCRIBE command, the client enters the subscribed state.
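The two properties described above, the subscriber count returned by PUBLISH and the absence of persistence, can be illustrated with a tiny in-memory model. This is a sketch only: PubSubSketch and its methods are hypothetical names and it does not talk to Redis.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

/** Minimal in-memory publish/subscribe model (hypothetical, for illustration only). */
class PubSubSketch {
    private final Map<String, List<Consumer<String>>> channels = new HashMap<>();

    void subscribe(String channel, Consumer<String> subscriber) {
        channels.computeIfAbsent(channel, c -> new ArrayList<>()).add(subscriber);
    }

    /** Delivers to the current subscribers only and returns how many received the message. */
    int publish(String channel, String message) {
        List<Consumer<String>> subscribers = channels.getOrDefault(channel, new ArrayList<>());
        for (Consumer<String> subscriber : subscribers) {
            subscriber.accept(message);
        }
        return subscribers.size();
    }

    public static void main(String[] args) {
        PubSubSketch bus = new PubSubSketch();
        List<String> received = new ArrayList<>();

        System.out.println(bus.publish("channel.1", "hello")); // 0: nobody is subscribed yet
        bus.subscribe("channel.1", received::add);
        System.out.println(bus.publish("channel.1", "again")); // 1
        System.out.println(received);                          // [again]: "hello" was never stored
    }
}
```

The same behavior is why a client waiting for a lock subscribes to the channel first and only then retries: a message published before the subscription exists is lost.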

In general, we do not use Pub/Sub as a messaging mechanism, since there are many dedicated MQ technologies for that purpose.
