Is the snowflake algorithm really useful for System.currentTimeMillis() optimization?

Keywords: Back-end, Distributed

As mentioned earlier, the snowflake algorithm uses System.currentTimeMillis() to obtain the time. There is a claim that System.currentTimeMillis() is slow because every call goes through the operating system. Under high concurrency, a large number of concurrent system calls can hurt performance (calling it is said to be even more expensive than new-ing an ordinary object, since in Java a plain new only touches the heap). We can see that it delegates to a native method:

// Returns the current time in milliseconds. Note that although the time unit of
// the returned value is milliseconds, the granularity of the value depends on the
// underlying operating system and may be larger. For example, many operating
// systems measure time in units of tens of milliseconds.
public static native long currentTimeMillis();
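To put a rough number on the "more expensive than new" claim, here is a naive micro-benchmark sketch. This is not JMH-grade (the JIT and escape analysis can distort both loops), the class name `MillisBench` is invented for this example, and the numbers will vary by OS and JVM:

```java
public class MillisBench {
    public static void main(String[] args) {
        int n = 10_000_000;

        // Loop 1: repeated calls into the native currentTimeMillis()
        long sink = 0;
        long t0 = System.nanoTime();
        for (int i = 0; i < n; i++) {
            sink += System.currentTimeMillis();
        }
        long millisNs = System.nanoTime() - t0;

        // Loop 2: repeated plain heap allocations
        Object last = null;
        t0 = System.nanoTime();
        for (int i = 0; i < n; i++) {
            last = new Object();
        }
        long allocNs = System.nanoTime() - t0;

        System.out.println("currentTimeMillis: " + millisNs / 1_000_000 + " ms");
        System.out.println("new Object():      " + allocNs / 1_000_000 + " ms");
        // Use the results so the JIT cannot drop the loops entirely
        if (sink == 42 && last == null) System.out.println("unreachable");
    }
}
```

Treat the output as a rough signal only; a proper JMH benchmark would be far more trustworthy.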

Therefore, it has been proposed to use a background thread to update a cached clock at a fixed interval, wrapped in a singleton, so that callers avoid making a system call on every read. This might improve efficiency.

Does this optimization really work?

Here is the optimized code first:

package snowflake;

import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

public class SystemClock {

    // Refresh interval of the background clock, in milliseconds
    private final int period;

    // Cached current time, written by the background thread, read by everyone else
    private final AtomicLong now;

    private static final SystemClock INSTANCE = new SystemClock(1);

    private SystemClock(int period) {
        this.period = period;
        now = new AtomicLong(System.currentTimeMillis());
        scheduleClockUpdating();
    }

    private void scheduleClockUpdating() {
        ScheduledExecutorService scheduleService = Executors.newSingleThreadScheduledExecutor((r) -> {
            Thread thread = new Thread(r, "System Clock");
            thread.setDaemon(true);
            return thread;
        });
        scheduleService.scheduleAtFixedRate(() -> {
            now.set(System.currentTimeMillis());
        }, 0, period, TimeUnit.MILLISECONDS);
    }

    private long get() {
        return now.get();
    }

    public static long now() {
        return INSTANCE.get();
    }
}


Just replace System.currentTimeMillis() with SystemClock.now().

The SnowFlake algorithm code is also here:

package snowflake;

public class SnowFlake {

    // Data center (machine room) id
    private long datacenterId;
    // Machine ID
    private long workerId;
    // Same time series
    private long sequence;

    public SnowFlake(long workerId, long datacenterId) {
        this(workerId, datacenterId, 0);
    }

    public SnowFlake(long workerId, long datacenterId, long sequence) {
        // Validity checks
        if (workerId > maxWorkerId || workerId < 0) {
            throw new IllegalArgumentException(String.format("worker Id can't be greater than %d or less than 0", maxWorkerId));
        }
        if (datacenterId > maxDatacenterId || datacenterId < 0) {
            throw new IllegalArgumentException(String.format("datacenter Id can't be greater than %d or less than 0", maxDatacenterId));
        }
        System.out.printf("worker starting. timestamp left shift %d, datacenter id bits %d, worker id bits %d, sequence bits %d, workerid %d%n",
                timestampLeftShift, datacenterIdBits, workerIdBits, sequenceBits, workerId);

        this.workerId = workerId;
        this.datacenterId = datacenterId;
        this.sequence = sequence;
    }

    // Start timestamp (2021-10-16 22:03:32)
    private long twepoch = 1634393012000L;

    // Number of bits occupied by the data center (machine room) id: 5 bits, maximum 11111 (binary) -> 31 (decimal)
    private long datacenterIdBits = 5L;

    // Number of bits occupied by the machine id: 5 bits, maximum 11111 (binary) -> 31 (decimal)
    private long workerIdBits = 5L;

    // With 5 bits the maximum value is 31, so the machine id must be in the range 0-31 (32 values)
    private long maxWorkerId = -1L ^ (-1L << workerIdBits);

    // With 5 bits the maximum value is 31, so the machine room id must be in the range 0-31 (32 values)
    private long maxDatacenterId = -1L ^ (-1L << datacenterIdBits);

    // Number of bits occupied by the sequence within one millisecond: 12 bits, 111111111111 = 4095, so at most 4096 ids per millisecond
    private long sequenceBits = 12L;

    // Offset of workerId
    private long workerIdShift = sequenceBits;

    // Offset of datacenter ID
    private long datacenterIdShift = sequenceBits + workerIdBits;

    // Offset of timestampLeft
    private long timestampLeftShift = sequenceBits + workerIdBits + datacenterIdBits;

    // Sequence mask, 4095 (0b111111111111 = 0xfff = 4095)
    // ANDed with the sequence to keep its value in the range 0-4095
    private long sequenceMask = -1L ^ (-1L << sequenceBits);

    // Last timestamp
    private long lastTimestamp = -1L;

    // Get machine ID
    public long getWorkerId() {
        return workerId;
    }

    // Get machine room ID
    public long getDatacenterId() {
        return datacenterId;
    }

    // Get the latest timestamp
    public long getLastTimestamp() {
        return lastTimestamp;
    }
    // Generate the next ID
    public synchronized long nextId() {
        // Get the current timestamp in milliseconds
        long timestamp = timeGen();

        if (timestamp < lastTimestamp) {
            System.err.printf("clock is moving backwards.  Rejecting requests until %d.", lastTimestamp);
            throw new RuntimeException(String.format("Clock moved backwards.  Refusing to generate id for %d milliseconds",
                    lastTimestamp - timestamp));
        }

        // Same millisecond as the previous ID: bump the sequence
        if (lastTimestamp == timestamp) {

            sequence = (sequence + 1) & sequenceMask;

            // The sequence overflowed 4095 within this millisecond
            if (sequence == 0) {
                // Spin until the next millisecond
                timestamp = tilNextMillis(lastTimestamp);
            }
        } else {
            // First ID in this millisecond: reset the sequence to 0
            sequence = 0;
        }

        // Record the last timestamp
        lastTimestamp = timestamp;

        // Compose the ID from the shifted parts
        return ((timestamp - twepoch) << timestampLeftShift) |
                (datacenterId << datacenterIdShift) |
                (workerId << workerIdShift) |
                sequence;
    }

    private long tilNextMillis(long lastTimestamp) {
        // Get the latest timestamp
        long timestamp = timeGen();
        // If the latest timestamp is still not past the millisecond whose sequence overflowed
        while (timestamp <= lastTimestamp) {
            // keep polling until it is
            timestamp = timeGen();
        }
        return timestamp;
    }

    private long timeGen() {
        // return System.currentTimeMillis();
        return SystemClock.now();
    }

    public static void main(String[] args) {
        SnowFlake worker = new SnowFlake(1, 1);
        long timer = System.currentTimeMillis();
        for (int i = 0; i < 10000000; i++) {
            worker.nextId();
        }
        System.out.println(System.currentTimeMillis() - timer);
    }
}
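To make the bit offsets above concrete, here is a self-contained sketch that composes an ID with the same shifts (12/17/22) and decodes it back. The sample values and the class name `IdLayoutDemo` are made up for this example:

```java
public class IdLayoutDemo {
    public static void main(String[] args) {
        // Same layout as the SnowFlake class: 12 sequence bits, 5 worker bits, 5 datacenter bits
        long workerIdShift = 12;
        long datacenterIdShift = 12 + 5;       // 17
        long timestampLeftShift = 12 + 5 + 5;  // 22

        // Arbitrary sample values for illustration
        long deltaMillis = 1000L;  // milliseconds since the epoch offset (twepoch)
        long datacenterId = 3;
        long workerId = 1;
        long sequence = 7;

        long id = (deltaMillis << timestampLeftShift)
                | (datacenterId << datacenterIdShift)
                | (workerId << workerIdShift)
                | sequence;
        System.out.println(id);                              // 4194701319

        // Decode each field back out with shifts and masks
        System.out.println(id >> timestampLeftShift);        // 1000
        System.out.println((id >> datacenterIdShift) & 0x1F); // 3
        System.out.println((id >> workerIdShift) & 0x1F);     // 1
        System.out.println(id & 0xFFF);                       // 7
    }
}
```

Each field occupies its own bit range, which is why the OR of the shifted parts never collides as long as each value stays within its mask.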
Windows: i5-4590, 16G memory, 4 cores, 512G SSD

Mac: Mac Pro 2020, 512G SSD, 16G memory

Linux: deepin, running in a virtual machine, 160G disk, 8G memory

Test System.currentTimeMillis() in a single-threaded environment:

(Table: time per platform for data volumes of 10,000 / 1,000,000 / 10,000,000 / 100,000,000)

Test SystemClock.now() in a single-threaded environment:

(Table: time per platform for data volumes of 10,000 / 1,000,000 / 10,000,000 / 100,000,000)

The single-thread tests above show no advantage for the background clock thread. On the contrary, on Windows it becomes abnormally slow once the data volume is large, and on Linux it is not faster either, but slightly slower.

Multithreaded test code:

    public static void main(String[] args) throws InterruptedException {
        int threadNum = 16;
        CountDownLatch countDownLatch = new CountDownLatch(threadNum);
        int num = 100000000 / threadNum;
        long timer = System.currentTimeMillis();
        thread(num, countDownLatch);
        countDownLatch.await();
        System.out.println(System.currentTimeMillis() - timer);
    }

    public static void thread(int num, CountDownLatch countDownLatch) {
        List<Thread> threadList = new ArrayList<>();
        for (int i = 0; i < countDownLatch.getCount(); i++) {
            Thread cur = new Thread(new Runnable() {
                public void run() {
                    SnowFlake worker = new SnowFlake(1, 1);
                    for (int i = 0; i < num; i++) {
                        worker.nextId();
                    }
                    countDownLatch.countDown();
                }
            });
            threadList.add(cur);
        }
        for (Thread t : threadList) {
            t.start();
        }
    }
Let's test a data volume of 100,000,000 (one hundred million) with System.currentTimeMillis() under different thread counts:

(Table: time per platform for 2 / 4 / 8 / 16 threads)

Test a data volume of 100,000,000 (one hundred million) with SystemClock.now() under different thread counts:

(Table: time per platform for 2 / 4 / 8 / 16 threads)

With multiple threads, there is not much change on the Mac: as the thread count increases, it gets faster until it goes past 8. On Windows, however, it is clearly slower (during that test I started watching short videos while waiting for the results, which may have skewed them). The numbers are also tied to the processor's core count: on Windows it slows down once the thread count exceeds 4, because my machine has only four cores, and going beyond that causes many context switches.

On Linux, because it runs in a virtual machine, adding cores does not help much, and the cached clock is still slower than calling System.currentTimeMillis() directly.

But there is another question: which approach is more likely to return the same timestamp across different calls?

    static AtomicLong atomicLong = new AtomicLong(0);

    private long timeGen() {
        // Count every call so we can compare how often each clock is polled
        atomicLong.incrementAndGet();
        // return SystemClock.now();
        return System.currentTimeMillis();
    }

The following shows how many times timeGen() is called when generating 10 million IDs with eight threads; the extra calls reflect the number of time conflicts:

(Table: timeGen() call counts per platform for System.currentTimeMillis() vs SystemClock.now())

You can see that maintaining your own clock makes it more likely to read the same time repeatedly, which means more repeated calls and more conflicts. This is an adverse factor! Another harsh fact is that the time obtained from a self-refreshed background clock is not that accurate; on Linux the gap is even larger, and there are far too many time conflicts.
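The accuracy point can be checked directly: a cached clock lags the real one by up to its refresh period plus any scheduler delay. Below is a minimal sketch under that assumption (1 ms refresh; the class name `ClockDriftSketch` is invented for this example) that measures the worst lag observed; the actual number depends entirely on the OS scheduler:

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

public class ClockDriftSketch {
    public static void main(String[] args) throws InterruptedException {
        AtomicLong cached = new AtomicLong(System.currentTimeMillis());
        ScheduledExecutorService ses = Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, "clock-updater");
            t.setDaemon(true);
            return t;
        });
        // Refresh the cached time every millisecond, like SystemClock does
        ses.scheduleAtFixedRate(() -> cached.set(System.currentTimeMillis()),
                1, 1, TimeUnit.MILLISECONDS);

        Thread.sleep(5); // let the updater thread get going

        // Sample both clocks and track the largest lag of the cached one
        long maxLag = 0;
        for (int i = 0; i < 1_000_000; i++) {
            long lag = System.currentTimeMillis() - cached.get();
            if (lag > maxLag) maxLag = lag;
        }
        System.out.println("max observed lag: " + maxLag + " ms");
        ses.shutdown();
    }
}
```

On an idle machine the lag is usually a millisecond or two, but under load the scheduled task can be delayed much longer, which is exactly the inaccuracy described above.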


In the actual tests, we did not find that this trick brings any great efficiency gain. On the contrary, it is more likely to produce time conflicts due to contention. The JDK developers are not stupid; System.currentTimeMillis() has surely been tested far longer and far more thoroughly than our own experiments. So, from my personal point of view, these tests show that this optimization is not that reliable.

Don't believe a conclusion too easily. If you have doubts, run experiments or find sufficiently authoritative sources.

[About the author]:
Qin Huai, author of the official account [Qinhuai grocery store]. The road of technology is not about the moment; though slow, it does not stop. I write about Java source code analysis, JDBC, Mybatis, Spring, Redis, distributed systems, Jianzhi Offer (Sword Pointing to Offer), LeetCode, and so on. I write every article seriously, avoid clickbait and flashy titles, and mostly write in series. I cannot guarantee that everything I write is completely correct, but I do promise that what I write has been practiced or researched. Please point out any omissions or errors.

Jianzhi Offer (Sword Pointing to Offer) all-questions PDF

What did I write in 2020?

Open source programming notes

Posted by JessePHP on Tue, 30 Nov 2021 08:51:27 -0800