Use and Analysis of [Curator] Shared Lock


Shared Lock

Similar to Shared Reentrant Lock, but not reentrant.

> A correct distributed lock guarantees that the same lock never has two holders at the same point in time; at any given moment, a lock has at most one holder.

1. Key APIs

org.apache.curator.framework.recipes.locks.InterProcessSemaphoreMutex

2. Mechanisms

As before, the word "Lock" does not appear in the class name.

The name suggests a mutex between processes, built on top of a semaphore.

Shared Lock is actually a Shared Reentrant Lock that customizes lease management.

3. Usage

> Usage is the same as for Shared Reentrant Lock, so it is not repeated here.
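
For reference, here is a minimal usage sketch (the connection string and lock path are placeholders; the pattern is the same acquire/release-in-finally flow as for Shared Reentrant Lock):

import java.util.concurrent.TimeUnit;

import org.apache.curator.framework.CuratorFramework;
import org.apache.curator.framework.CuratorFrameworkFactory;
import org.apache.curator.framework.recipes.locks.InterProcessSemaphoreMutex;
import org.apache.curator.retry.ExponentialBackoffRetry;

public class SharedLockUsage
{
    public static void main(String[] args) throws Exception
    {
        // Placeholder connection string and lock path
        CuratorFramework client = CuratorFrameworkFactory.newClient("127.0.0.1:2181", new ExponentialBackoffRetry(1000, 3));
        client.start();

        InterProcessSemaphoreMutex lock = new InterProcessSemaphoreMutex(client, "/examples/shared-lock");
        if ( lock.acquire(3, TimeUnit.SECONDS) )    // wait at most 3 seconds for the lock
        {
            try
            {
                // critical section: at most one holder across all processes
            }
            finally
            {
                lock.release();                     // always release in a finally block
            }
        }
        client.close();
    }
}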

4. Error handling

Error handling is also the same as for Shared Reentrant Lock, so it is not repeated here.

5. Source code analysis

5.1 Class Definition

Let's first look at the class definition:

public class InterProcessSemaphoreMutex implements InterProcessLock{}
  • It only implements the org.apache.curator.framework.recipes.locks.InterProcessLock interface (a sketch of the interface follows).
    • The API that defines the lock operations:
      • acquire
      • release
      • isAcquiredInThisProcess
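
For reference, the InterProcessLock interface is roughly the following shape (a simplified sketch; see the Curator source for the authoritative definition):

import java.util.concurrent.TimeUnit;

public interface InterProcessLock
{
    void acquire() throws Exception;

    boolean acquire(long time, TimeUnit unit) throws Exception;

    void release() throws Exception;

    boolean isAcquiredInThisProcess();
}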

5.2 Member Variables

public class InterProcessSemaphoreMutex implements InterProcessLock
{
    private final InterProcessSemaphoreV2 semaphore;
    private volatile Lease lease;
}
  • semaphore
    • final
    • org.apache.curator.framework.recipes.locks.InterProcessSemaphoreV2
    • The semaphore that manages the lease for the inter-process lock
  • lease
    • volatile
    • org.apache.curator.framework.recipes.locks.Lease
    • The lease currently held, obtained from semaphore

5.2.1 InterProcessSemaphoreV2

The internal logic of InterProcessSemaphoreMutex relies heavily on InterProcessSemaphoreV2, so it is necessary to look at this class:

public class InterProcessSemaphoreV2
{
    private final Logger log = LoggerFactory.getLogger(getClass());
    private final InterProcessMutex lock;
    private final CuratorFramework client;
    private final String leasesPath;
    private final Watcher watcher = new Watcher()
    {
        @Override
        public void process(WatchedEvent event)
        {
            notifyFromWatcher();
        }
    };

    private volatile byte[] nodeData;
    private volatile int maxLeases;

    private static final String LOCK_PARENT = "locks";
    private static final String LEASE_PARENT = "leases";
    private static final String LEASE_BASE_NAME = "lease-";
    public static final Set<String> LOCK_SCHEMA = Sets.newHashSet(
            LOCK_PARENT,
            LEASE_PARENT
    );
}
  • log
    • Logger
  • lock
    • final
    • org.apache.curator.framework.recipes.locks.InterProcessMutex
  • client
    • final
    • the ZK client (CuratorFramework)
  • leasesPath
    • final
    • zk node path under which lease nodes are created
  • watcher
    • Watcher used to notify waiting threads when the lease nodes change
  • nodeData
    • volatile
    • Data written into the lease node
  • maxLeases
    • volatile
    • Maximum number of leases
  • LOCK_PARENT
    • Private constant
  • LEASE_PARENT
    • Private constant
  • LEASE_BASE_NAME
    • Private constant
  • LOCK_SCHEMA
    • Public constant

You can see that there is an InterProcessMutex (Shared Reentrant Lock) inside InterProcessSemaphoreV2.

So Shared Lock is, in effect, a Shared Reentrant Lock with customized lease management.

5.3 Constructor

Only one:

public InterProcessSemaphoreMutex(CuratorFramework client, String path)
{
    this.semaphore = new InterProcessSemaphoreV2(client, path, 1);
}

It simply initializes an InterProcessSemaphoreV2 with a maximum of 1 lease, without using org.apache.curator.framework.recipes.shared.SharedCountReader.

5.3.1 InterProcessSemaphoreV2

public InterProcessSemaphoreV2(CuratorFramework client, String path, int maxLeases)
{
    this(client, path, maxLeases, null);
}

public InterProcessSemaphoreV2(CuratorFramework client, String path, SharedCountReader count)
{
    this(client, path, 0, count);
}

private InterProcessSemaphoreV2(CuratorFramework client, String path, int maxLeases, SharedCountReader count)
{
    this.client = client;
    path = PathUtils.validatePath(path);
    lock = new InterProcessMutex(client, ZKPaths.makePath(path, LOCK_PARENT));
    this.maxLeases = (count != null) ? count.getCount() : maxLeases;
    leasesPath = ZKPaths.makePath(path, LEASE_PARENT);

    if ( count != null )
    {
        count.addListener
            (
                new SharedCountListener()
                {
                    @Override
                    public void countHasChanged(SharedCountReader sharedCount, int newCount) throws Exception
                    {
                        InterProcessSemaphoreV2.this.maxLeases = newCount;
                        notifyFromWatcher();
                    }

                    @Override
                    public void stateChanged(CuratorFramework client, ConnectionState newState)
                    {
                        // no need to handle this here - clients should set their own connection state listener
                    }
                }
            );
    }
}
  • Initializes the member variables
  • Initializes the internal distributed lock
  • If the SharedCountReader mode is used, a counter listener is added so that maxLeases follows the shared count (see the sketch after this list)
    • Shared Lock uses the fixed maxLeases mode, so no listener is added here
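
Although Shared Lock always uses the fixed maxLeases mode with a value of 1, the SharedCountReader constructor is how InterProcessSemaphoreV2 supports a dynamically adjustable number of leases. A rough sketch of that mode (paths and the initial count are illustrative):

import org.apache.curator.framework.CuratorFramework;
import org.apache.curator.framework.recipes.locks.InterProcessSemaphoreV2;
import org.apache.curator.framework.recipes.locks.Lease;
import org.apache.curator.framework.recipes.shared.SharedCount;

public class DynamicSemaphoreSketch
{
    public static void demo(CuratorFramework client) throws Exception
    {
        // The shared count node holds the current maximum number of leases (seeded with 3 here)
        SharedCount count = new SharedCount(client, "/examples/semaphore-count", 3);
        count.start();

        // maxLeases tracks the SharedCount; the listener added in the constructor updates it on changes
        InterProcessSemaphoreV2 semaphore = new InterProcessSemaphoreV2(client, "/examples/semaphore", count);

        Lease lease = semaphore.acquire();
        try
        {
            // work while holding one of the (currently 3) leases
        }
        finally
        {
            semaphore.returnLease(lease);   // give the lease back
            count.close();
        }
    }
}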

5.4 Locking

As noted earlier, the locking action is performed by the acquire methods, so let's see how locking works.

public void acquire() throws Exception
{
    lease = semaphore.acquire();
}

public boolean acquire(long time, TimeUnit unit) throws Exception
{
    Lease acquiredLease = semaphore.acquire(time, unit);
    if ( acquiredLease == null )
    {
        return false;   // important - don't overwrite lease field if couldn't be acquired
    }
    lease = acquiredLease;
    return true;
}

The logic is simple: acquiring the lock is essentially acquiring a lease from the semaphore.

  • As the constructor shows, this semaphore has a maximum of only 1 lease, so the result is naturally a non-reentrant lock implementation (see the sketch below).
  • All of the real logic lives in semaphore.
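
A quick sketch of what non-reentrancy means in practice (client and path are placeholders): a second acquire in the same thread cannot succeed until the first lease is released.

import java.util.concurrent.TimeUnit;

import org.apache.curator.framework.CuratorFramework;
import org.apache.curator.framework.recipes.locks.InterProcessSemaphoreMutex;

public class NonReentrantSketch
{
    public static void demo(CuratorFramework client) throws Exception
    {
        InterProcessSemaphoreMutex lock = new InterProcessSemaphoreMutex(client, "/examples/shared-lock");

        lock.acquire();                                        // first acquire: takes the single lease
        boolean again = lock.acquire(1, TimeUnit.SECONDS);     // same thread tries again
        System.out.println(again);                             // expected: false - the lock is not reentrant

        lock.release();                                        // release the original lease
    }
}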

5.4.1 InterProcessSemaphoreV2

Let's look at how this semaphore is applied:

public Lease acquire() throws Exception
{
    Collection<Lease> leases = acquire(1, 0, null);
    return leases.iterator().next();
}

public Collection<Lease> acquire(int qty) throws Exception
{
    return acquire(qty, 0, null);
}

public Lease acquire(long time, TimeUnit unit) throws Exception
{
    Collection<Lease> leases = acquire(1, time, unit);
    return (leases != null) ? leases.iterator().next() : null;
}

public Collection<Lease> acquire(int qty, long time, TimeUnit unit) throws Exception
{
    long startMs = System.currentTimeMillis();
    boolean hasWait = (unit != null);
    long waitMs = hasWait ? TimeUnit.MILLISECONDS.convert(time, unit) : 0;

    Preconditions.checkArgument(qty > 0, "qty cannot be 0");

    ImmutableList.Builder<Lease> builder = ImmutableList.builder();
    boolean success = false;
    try
    {
        while ( qty-- > 0 )
        {
            int retryCount = 0;
            long startMillis = System.currentTimeMillis();
            boolean isDone = false;
            while ( !isDone )
            {
                switch ( internalAcquire1Lease(builder, startMs, hasWait, waitMs) )
                {
                    case CONTINUE:
                    {
                        isDone = true;
                        break;
                    }

                    case RETURN_NULL:
                    {
                        return null;
                    }

                    case RETRY_DUE_TO_MISSING_NODE:
                    {
                        // gets thrown by internalAcquire1Lease when it can't find the lock node
                        // this can happen when the session expires, etc. So, if the retry allows, just try it all again
                        if ( !client.getZookeeperClient().getRetryPolicy().allowRetry(retryCount++, System.currentTimeMillis() - startMillis, RetryLoop.getDefaultRetrySleeper()) )
                        {
                            throw new KeeperException.NoNodeException("Sequential path not found - possible session loss");
                        }
                        // try again
                        break;
                    }
                }
            }
        }
        success = true;
    }
    finally
    {
        if ( !success )
        {
            returnAll(builder.build());
        }
    }

    return builder.build();
}

As you can see, InterProcessSemaphoreV2 has four acquire methods. Essentially all of the logic is implemented by the last one, acquire(int qty, long time, TimeUnit unit); the other three are just convenience wrappers. So let's focus on this method.

Let's first look at what javadoc says about this method:

> Acquire qty leases. If there are not enough leases available, this method blocks until either the maximum number of leases is increased enough or other clients/processes close enough leases. However, this method will only block to a maximum of the time parameters given. If time expires before all leases are acquired, the subset of acquired leases are automatically closed.

> The client must close the leases when it is done with them. You should do this in a finally block. NOTE: You can use returnAll(Collection) for this.

> In other words: this method applies for qty leases. If not enough leases are available, it blocks until either the maximum number of leases is increased sufficiently or other clients/processes release enough leases. It does not block forever, though; the maximum wait time can be given through the time parameters, and if the timeout expires before all leases are acquired, any leases already obtained are released.

> A client that acquires leases must release them when it is done, and this should be done in a finally block; the returnAll(Collection) method can be used for this.
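
For Shared Lock qty is always 1, but for a general semaphore the pattern described in the javadoc looks roughly like this (path, lease counts, and timeout are illustrative):

import java.util.Collection;
import java.util.concurrent.TimeUnit;

import org.apache.curator.framework.CuratorFramework;
import org.apache.curator.framework.recipes.locks.InterProcessSemaphoreV2;
import org.apache.curator.framework.recipes.locks.Lease;

public class MultiLeaseSketch
{
    public static void demo(CuratorFramework client) throws Exception
    {
        // A semaphore that hands out at most 5 leases at any time
        InterProcessSemaphoreV2 semaphore = new InterProcessSemaphoreV2(client, "/examples/semaphore", 5);

        // Try to acquire 2 leases, waiting at most 3 seconds in total
        Collection<Lease> leases = semaphore.acquire(2, 3, TimeUnit.SECONDS);
        if ( leases == null )
        {
            return;   // timed out before all requested leases were available
        }
        try
        {
            // work while holding both leases
        }
        finally
        {
            semaphore.returnAll(leases);   // as the javadoc says: release in a finally block
        }
    }
}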

Back in the source code, let's see how this logic is implemented. Several local variables are defined first:

  • startMs: start time
  • hasWait: whether there is a wait timeout
  • waitMs: the wait time converted to milliseconds from the given unit
  1. An immutable list (com.google.common.collect.ImmutableList.Builder) is created to hold the acquired leases.
  2. The acquisition step is repeated until qty leases have been acquired.
  3. Each round (each single-lease acquisition) initializes several variables:
    • retryCount: number of retries
    • startMillis: start time of this round
    • isDone: whether this round is finished
      1. As long as it is not done, keep trying.
        • A typical state-machine loop
      2. The next action depends on the status returned by internalAcquire1Lease:
        • CONTINUE
          • This round succeeded; the next lease can be acquired.
        • RETURN_NULL
          • Acquisition failed.
          • Return null.
        • RETRY_DUE_TO_MISSING_NODE
          • The lease node could not be found (for example, after a disconnect or session expiry).
          • If the retry policy still allows it, try again.
          • Otherwise throw KeeperException.NoNodeException.
  4. If not everything succeeded
    • The finally block returns (cleans up) the leases that were already obtained.

The acquisition of a single lease is actually done by the internalAcquire1Lease(builder, startMs, hasWait, waitMs) method, so let's see what it does:

private InternalAcquireResult internalAcquire1Lease(ImmutableList.Builder<Lease> builder, long startMs, boolean hasWait, long waitMs) throws Exception
{
    if ( client.getState() != CuratorFrameworkState.STARTED )
    {
        return InternalAcquireResult.RETURN_NULL;
    }

    if ( hasWait )
    {
        long thisWaitMs = getThisWaitMs(startMs, waitMs);
        if ( !lock.acquire(thisWaitMs, TimeUnit.MILLISECONDS) )
        {
            return InternalAcquireResult.RETURN_NULL;
        }
    }
    else
    {
        lock.acquire();
    }

    Lease lease = null;

    try
    {
        PathAndBytesable<String> createBuilder = client.create().creatingParentContainersIfNeeded().withProtection().withMode(CreateMode.EPHEMERAL_SEQUENTIAL);
        String path = (nodeData != null) ? createBuilder.forPath(ZKPaths.makePath(leasesPath, LEASE_BASE_NAME), nodeData) : createBuilder.forPath(ZKPaths.makePath(leasesPath, LEASE_BASE_NAME));
        String nodeName = ZKPaths.getNodeFromPath(path);
        lease = makeLease(path);

        if ( debugAcquireLatch != null )
        {
            debugAcquireLatch.await();
        }

        synchronized(this)
        {
            for(;;)
            {
                List<String> children;
                try
                {
                    children = client.getChildren().usingWatcher(watcher).forPath(leasesPath);
                }
                catch ( Exception e )
                {
                    if ( debugFailedGetChildrenLatch != null )
                    {
                        debugFailedGetChildrenLatch.countDown();
                    }
                    returnLease(lease); // otherwise the just created ZNode will be orphaned causing a dead lock
                    throw e;
                }
                if ( !children.contains(nodeName) )
                {
                    log.error("Sequential path not found: " + path);
                    returnLease(lease);
                    return InternalAcquireResult.RETRY_DUE_TO_MISSING_NODE;
                }

                if ( children.size() <= maxLeases )
                {
                    break;
                }
                if ( hasWait )
                {
                    long thisWaitMs = getThisWaitMs(startMs, waitMs);
                    if ( thisWaitMs <= 0 )
                    {
                        returnLease(lease);
                        return InternalAcquireResult.RETURN_NULL;
                    }
                    wait(thisWaitMs);
                }
                else
                {
                    wait();
                }
            }
        }
    }
    finally
    {
        lock.release();
    }
    builder.add(Preconditions.checkNotNull(lease));
    return InternalAcquireResult.CONTINUE;
}
  1. First check the state of the zk client; if it is not STARTED, return RETURN_NULL
    • Acquisition fails
  2. Acquire the internal distributed lock
    • Which acquire method is called depends on whether there is a wait timeout
    • If the lock cannot be acquired within the given time, return RETURN_NULL
      • Acquisition fails
  3. Once the lock is held, an ephemeral sequential node is created to record the lease.
  4. makeLease wraps the node path from the previous step into a Lease object
    • From then on, operating on the Lease essentially operates on the ephemeral sequential node created in step 3.
  5. Enter a synchronized block
    1. Keep retrying to determine whether a lease is available:
      1. Get the list of lease nodes.
      2. If the list does not contain the current lease node
        • The current session may have expired, or the zk node was deleted by mistake, etc.
        • Return RETRY_DUE_TO_MISSING_NODE so a new attempt is made.
      3. If the number of children is less than or equal to maxLeases
        • The requested lease is available
        • Exit the loop
      4. Otherwise wait; if there is a timeout, compute the remaining wait time first (and return RETURN_NULL if it has run out).
  6. The finally block releases the distributed lock from step 2.
  7. Add the obtained lease to the immutable list.
  8. Return CONTINUE.

Here are a few points to explain.

  1. Locking
    • InterProcessMutex is used internally; as mentioned in Shared Reentrant Lock, that lock is reentrant
      • That is, a thread already holding the lock can re-enter it directly
    • So in step 5 above, even after the distributed lock has been acquired, a synchronized block is still needed to control concurrency among local threads.
  2. zk nodes
    • Nodes for the distributed lock: path + /locks
      • Ephemeral sequential nodes used for locking
      • These nodes are removed (deleted) once the lease acquisition attempt finishes and the lock is released (whether or not the acquisition succeeded).
    • Nodes for leases: path + /leases
      • Ephemeral sequential nodes representing leases
      • The node path is encapsulated in the Lease object
        • When the lease is no longer needed, its close method is called
          • close removes (deletes) the lease node

5.4.2 Summary

The locking process can be summarized briefly:

  1. Locking is controlled by a semaphore with a maximum of 1 lease.
    1. Competition between processes is resolved by a distributed reentrant lock.
    2. Competition between local threads is resolved by a synchronized block.
  2. Since only one lease exists globally, there is only one holder at a time, whether across processes, across threads, or within the same thread.
    • As long as the lock is not released, even the thread that holds it cannot acquire it again.

5.5 Releasing the Lock

public void release() throws Exception
{
    Lease lease = this.lease;
    Preconditions.checkState(lease != null, "Not acquired");
    this.lease = null;
    lease.close();
}

You can see that Shared Lock's release logic is simply to close the lease. So let's look at the implementation of org.apache.curator.framework.recipes.locks.Lease#close.

As described in the previous section, the Lease is created in org.apache.curator.framework.recipes.locks.InterProcessSemaphoreV2#makeLease:

private Lease makeLease(final String path)
{
    return new Lease()
    {
        @Override
        public void close() throws IOException
        {
            try
            {
                client.delete().guaranteed().forPath(path);
            }
            catch ( KeeperException.NoNodeException e )
            {
                log.warn("Lease already released", e);
            }
            catch ( Exception e )
            {
                ThreadUtils.checkInterrupted(e);
                throw new IOException(e);
            }
        }

        @Override
        public byte[] getData() throws Exception
        {
            return client.getData().forPath(path);
        }

        @Override
        public String getNodeName() {
            return ZKPaths.getNodeFromPath(path);
        }
    };
}

You can see that the close method simply deletes the lease node, which releases the lease and thus the lock.

6. Testing

Since usage is otherwise the same as Shared Reentrant Lock, this test focuses on re-acquiring the lock within a single thread.

package com.roc.curator.demo.locks

import org.apache.commons.lang3.RandomStringUtils
import org.apache.curator.framework.CuratorFramework
import org.apache.curator.framework.CuratorFrameworkFactory
import org.apache.curator.framework.recipes.locks.InterProcessSemaphoreMutex
import org.apache.curator.retry.ExponentialBackoffRetry
import org.junit.Before
import org.junit.Test
import java.util.*
import java.util.concurrent.TimeUnit

/**
 * Created by roc on 2017/5/30.
 */
class InterProcessSemaphoreMutexTest {

    val LATCH_PATH: String = "/test/locks/ipsm"

    var client: CuratorFramework = CuratorFrameworkFactory.builder()
            .connectString("0.0.0.0:8888")
            .connectionTimeoutMs(5000)
            .retryPolicy(ExponentialBackoffRetry(1000, 10))
            .sessionTimeoutMs(3000)
            .build()

    @Before fun init() {
        client.start()
    }

    @Test fun runTest() {
        var id: String = RandomStringUtils.randomAlphabetic(10)
        println("id : $id ")
        val time = Date()
        var lock: InterProcessSemaphoreMutex = InterProcessSemaphoreMutex(client, LATCH_PATH)

        while (true) {

            // try to acquire the lock, waiting at most 3 seconds
            if (lock.acquire(3, TimeUnit.SECONDS)) {
                println("$id Successful Locking $time")
                while (lock.isAcquiredInThisProcess) {
                    println("$id implement $time")
                    TimeUnit.SECONDS.sleep(2)
                    if (Math.random() > 0.5) {
                        // re-acquire in the same thread: expected to fail, since the lock is not reentrant
                        if (lock.acquire(3, TimeUnit.SECONDS)) {
                            println("$id Successful lock-up again $time")
                        } else {
                            println("$id Failed to lock again $time")
                        }
                    }
                    if (Math.random() > 0.5) {
                        println("$id Release lock $time")
                        lock.release()
                    }
                }
            } else {
                println("$id Failure to lock $time")
            }
        }
        println("$id End: $time")

    }
}

Operation:

id : xPZcpRyivX 
xPZcpRyivX Successful Locking Tue May 30 16:02:10 CST 2017
xPZcpRyivX implement Tue May 30 16:02:10 CST 2017
xPZcpRyivX Release lock Tue May 30 16:02:10 CST 2017
xPZcpRyivX Successful Locking Tue May 30 16:02:10 CST 2017
xPZcpRyivX implement Tue May 30 16:02:10 CST 2017
xPZcpRyivX implement Tue May 30 16:02:10 CST 2017
xPZcpRyivX Release lock Tue May 30 16:02:10 CST 2017
xPZcpRyivX Successful Locking Tue May 30 16:02:10 CST 2017
xPZcpRyivX implement Tue May 30 16:02:10 CST 2017
xPZcpRyivX Failed to lock again Tue May 30 16:02:10 CST 2017
xPZcpRyivX implement Tue May 30 16:02:10 CST 2017
xPZcpRyivX implement Tue May 30 16:02:10 CST 2017
xPZcpRyivX Release lock Tue May 30 16:02:10 CST 2017
xPZcpRyivX Successful Locking Tue May 30 16:02:10 CST 2017
xPZcpRyivX implement Tue May 30 16:02:10 CST 2017
xPZcpRyivX Failed to lock again Tue May 30 16:02:10 CST 2017
xPZcpRyivX implement Tue May 30 16:02:10 CST 2017
xPZcpRyivX Release lock Tue May 30 16:02:10 CST 2017
xPZcpRyivX Successful Locking Tue May 30 16:02:10 CST 2017
xPZcpRyivX implement Tue May 30 16:02:10 CST 2017
xPZcpRyivX Failed to lock again Tue May 30 16:02:10 CST 2017

As you can see, attempting to re-acquire the lock in the same thread fails every time.

zookeeper node:

ls /test/locks/ipsm
[leases, locks]

ls /test/locks/ipsm/locks
[_c_cad2ad46-127d-4871-a6f8-7c11c0175f9a-lock-0000000014]

get /test/locks/ipsm/locks/_c_1b35ce47-ff75-46bc-8aea-483117fbf803-lock-0000000020
192.168.60.165
cZxid = 0x1e21a
ctime = Tue May 30 16:03:28 CST 2017
mZxid = 0x1e21a
mtime = Tue May 30 16:03:28 CST 2017
pZxid = 0x1e21a
cversion = 0
dataVersion = 0
aclVersion = 0
ephemeralOwner = 0x15156529fae07fa
dataLength = 14
numChildren = 0

ls /test/locks/ipsm/leases
[_c_a3198a00-da20-4d43-8b1e-76c2101ce5ef-lease-0000000043, _c_0dc5f8e5-dc77-4a83-a782-6184d584a014-lease-0000000041]

get /test/locks/ipsm/leases/_c_a774be87-1867-4041-9ad7-ef14dcdfa315-lease-0000000049
192.168.60.165
cZxid = 0x1e290
ctime = Tue May 30 16:05:22 CST 2017
mZxid = 0x1e290
mtime = Tue May 30 16:05:22 CST 2017
pZxid = 0x1e290
cversion = 0
dataVersion = 0
aclVersion = 0
ephemeralOwner = 0x15156529fae07fa
dataLength = 14
numChildren = 0

You can see that:

  • Under /test/locks/ipsm there are two child nodes, one for the lock and one for the leases.
  • Both contain ephemeral sequential nodes.
  • Occasionally you can see two lease nodes.
    • This does not mean that two locks were granted.
    • According to the locking analysis above, the lease node is created first, and only then is the children list queried to check the count against maxLeases.
    • So the occasional appearance of two lease nodes does not mean two leases were handed out; it just means there was contention for the lock at that point in time.
