Snowflake algorithm-native

Keywords: Programming less Java MySQL

Snowflake algorithm is a distributed primary key generation algorithm published by Twitter. It can guarantee the non-repeatability of primary keys in different processes and the orderliness of primary keys in the same process.

In the same process, it is guaranteed not to repeat through the time bit first, and if the time is the same, it is guaranteed by the sequence bit. At the same time, because the time bits are monotonically increasing, and if the servers do time synchronization in general, the generated primary keys in the distributed environment can be considered as the overall order, which ensures the efficiency of inserting index fields. For example, the primary key of MySQL's Innodb storage engine.

The binary representation of the primary key generated by snowflake algorithm consists of four parts: 1 bit symbol bit, 41 bit timestamp bit, 10 bit working process bit and 12 bit sequence number bit.

Symbol bit (1bit)

The reserved symbol bits are always zero.

Time stamp bit (41bit)

The number of milliseconds that a 41-bit timestamp can accommodate is 2.41 power, and the number of milliseconds used in a year is 365*24*60*60*1000. The calculation shows that:

Math.pow(2, 41) / (365 * 24 * 60 * 60 * 1000L);

The result is about 69.73 years. ShardingSphere's snowflake algorithm dates from 0:00 on November 1, 2016, and can be used until 1986. It is believed that it can meet the requirements of most systems.

Work process bit (10bit)

This flag is unique within the Java process, and if deployed in a distributed application, the id of each working process should be different. The default value is 0, which can be set by attributes.

Sequence Number (12bit)

This sequence is used to generate different ID s in the same millisecond. If the number generated in this millisecond exceeds 4096 (the 12th power of 2), the generator will wait until the next millisecond to continue generating.

Clock callback

Server clock callbacks result in repetitive sequences, so the default distributed primary key generator provides a maximum tolerant number of milliseconds of clock callbacks. If the clock callback time exceeds the maximum tolerant millisecond threshold, the program will report an error; if within the tolerant range, the default distributed primary key generator will wait for the clock to synchronize until the last primary key generation time before continuing to work. The default value of the maximum tolerable clock callback milliseconds is 0, which can be set by attributes.

The detailed structure of the snowflake algorithm's primary key is shown in the figure below.

The advantages and disadvantages of this method are:

Advantage:

The number of milliseconds is high, the self-increasing sequence is low, and the whole ID tends to increase.
It does not depend on third-party systems such as databases, and deploys in a service manner. It has higher stability, and the performance of ID generation is also very high.
bit bits can be allocated according to their own business characteristics, which is very flexible.

Disadvantages:

Strong dependence on machine clocks, if the clock on the machine callback, will lead to repeated calls or services will be unavailable.

1. Generate snowflake ID

/**
 * Twitter_Snowflake<br>
 * SnowFlake The structure is as follows: <br>
 * 0 - 0000000000 0000000000 0000000000 0000000000 0 - 00000 - 00000 - 000000000000 <br>
 * 1 Bit identifier, because long basic type is symbolic in Java, the highest bit is symbolic bit, positive number is 0, negative number is 1, so id is generally positive, the highest bit is 0 < br >
 * 41 Bit time cut (in milliseconds), note that the 41-bit time cut is not the time cut for storing the current time, but the difference between the storage time cut (current time cut-start time cut)
 * The starting time cut here is usually the time when our id generator starts to use, which is specified by our program (as follows: the startTime property of the IdWorker class). The time cut of 41 bits can be used for 69 years, year T = 1 L < 41 / (1000L * 60 * 60 * 24 * 365) = 69 < br >
 * 10 Bit data machine bits can be deployed at 1024 nodes, including 5-bit data center Id and 5-bit workerId < br>.
 * 12 Bit sequences, counts in milliseconds, and 12-bit counting sequence numbers support 4096 ID numbers per millisecond (same machine, same time cut) for each node.
 * It adds up to 64 bits, which is a Long type. <br>
 * SnowFlake The advantage of SnowFlake is that it can generate about 260,000 IDs per second, because it can be sorted by time, and there is no ID collision (distinguished by data center ID and machine ID) in the whole distributed system.
 */
public class SnowflakeIdWorker {

    // ==============================Fields===========================================
    /** Start time cut-off (2015-01-01) */
    private final long twepoch = 1420041600000L;

    /** Number of digits occupied by machine id */
    private final long workerIdBits = 5L;

    /** The number of digits occupied by the data identifier id */
    private final long datacenterIdBits = 5L;

    /** Supported maximum machine id, the result is 31 (this shift algorithm can quickly calculate the maximum decimal number represented by several bits of binary number) */
    private final long maxWorkerId = -1L ^ (-1L << workerIdBits);

    /** Supported maximum data identifier id, resulting in 31 */
    private final long maxDatacenterId = -1L ^ (-1L << datacenterIdBits);

    /** Number of digits in id of sequence */
    private final long sequenceBits = 12L;

    /** Machine ID moved 12 bits to the left */
    private final long workerIdShift = sequenceBits;

    /** Data id moved 17 bits to the left (12 + 5) */
    private final long datacenterIdShift = sequenceBits + workerIdBits;

    /** Time truncation moves 22 bits to the left (5 + 5 + 12) */
    private final long timestampLeftShift = sequenceBits + workerIdBits + datacenterIdBits;

    /** The mask of the generated sequence is 4095 (0b111111111111111111111 = 0xfff = 4095) */
    private final long sequenceMask = -1L ^ (-1L << sequenceBits);

    /** Work Machine ID (0-31) */
    private long workerId;

    /** Data Center ID (0-31) */
    private long datacenterId;

    /** Sequences in milliseconds (0-4095) */
    private long sequence = 0L;

    /** Time cut of last ID generation */
    private long lastTimestamp = -1L;

    //==============================Constructors=====================================
    /**
     * Constructor
     * @param workerId Work ID (0-31)
     * @param datacenterId Data Center ID (0-31)
     */
    public SnowflakeIdWorker(long workerId, long datacenterId) {
        if (workerId > maxWorkerId || workerId < 0) {
            throw new IllegalArgumentException(String.format("worker Id can't be greater than %d or less than 0", maxWorkerId));
        }
        if (datacenterId > maxDatacenterId || datacenterId < 0) {
            throw new IllegalArgumentException(String.format("datacenter Id can't be greater than %d or less than 0", maxDatacenterId));
        }
        this.workerId = workerId;
        this.datacenterId = datacenterId;
    }

    // ==============================Methods==========================================
    /**
     * Get the next ID (this method is thread-safe)
     * @return SnowflakeId
     */
    public synchronized long nextId() {
        long timestamp = timeGen();

        //If the current time is less than the time stamp generated by the last ID, the system clock should throw an exception when it falls back.
        if (timestamp < lastTimestamp) {
            throw new RuntimeException(
                    String.format("Clock moved backwards.  Refusing to generate id for %d milliseconds", lastTimestamp - timestamp));
        }

        //If it is generated at the same time, the sequence in milliseconds is performed.
        if (lastTimestamp == timestamp) {
            sequence = (sequence + 1) & sequenceMask;
            //Sequence overflow in milliseconds
            if (sequence == 0) {
                //Blocking to the next millisecond to get a new timestamp
                timestamp = tilNextMillis(lastTimestamp);
            }
        }
        //Time stamp change, sequence reset in milliseconds
        else {
            sequence = 0L;
        }

        //Time cut of last ID generation
        lastTimestamp = timestamp;

        //Shift and assemble 64-bit ID s together by operation or operation
        return ((timestamp - twepoch) << timestampLeftShift) //
                | (datacenterId << datacenterIdShift) //
                | (workerId << workerIdShift) //
                | sequence;
    }

    /**
     * Block to the next millisecond until a new timestamp is obtained
     * @param lastTimestamp Time cut of last ID generation
     * @return Current timestamp
     */
    protected long tilNextMillis(long lastTimestamp) {
        long timestamp = timeGen();
        while (timestamp <= lastTimestamp) {
            timestamp = timeGen();
        }
        return timestamp;
    }

    /**
     * Returns the current time in milliseconds
     * @return Current time (milliseconds)
     */
    protected long timeGen() {
        return System.currentTimeMillis();
    }

    //==============================Test=============================================
    /** test */
    public static void main(String[] args) {
        SnowflakeIdWorker idWorker = new SnowflakeIdWorker(0, 0);
        for (int i = 0; i < 1000; i++) {
            long id = idWorker.nextId();
            System.out.println(Long.toBinaryString(id));
            System.out.println(id);
        }
    }
}

2. Machine ID and Data Center ID for Snowflake ID Resolution

For example, id: 1146667501642584065

SELECT (1146667501642584065>>12)&0x1f as workerId,(1146667501642584065>>17)&0x1f as datacenterId;

Result:

workerId: 1
datacenterId: 18

Reference resources:
https://www.cnblogs.com/yx88/p/11285120.html?tdsourcetag=s_pcqq_aiomsg

https://blog.csdn.net/weixin_38657051/article/details/94713695?tdsourcetag=s_pcqq_aiomsg

Posted by geroid on Thu, 10 Oct 2019 19:51:55 -0700

Programmer Group

Snowflake algorithm-native

Clock callback

1. Generate snowflake ID

2. Machine ID and Data Center ID for Snowflake ID Resolution

Hot Keywords