Snowflake algorithm variant: 53-bit IDs

Most applications never need more than 4 million IDs per second, and they run on far fewer than 1024 machines. So ID generation can be simplified by using shorter IDs:

A 53-bit ID consists of a 32-bit timestamp in seconds + a 16-bit auto-incrementing sequence + a 5-bit machine identifier. This supports up to 32 machines, each generating up to 65,535 sequence numbers per second. Core code:

package com.itranswarp.util;

import java.net.InetAddress;
import java.net.UnknownHostException;
import java.time.LocalDate;
import java.time.ZoneId;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

/**
 * 53 bits unique id:
 *
 * |--------|--------|--------|--------|--------|--------|--------|--------|
 * |00000000|00011111|11111111|11111111|11111111|11111111|11111111|11111111|
 * |--------|---xxxxx|xxxxxxxx|xxxxxxxx|xxxxxxxx|xxx-----|--------|--------|
 * |--------|--------|--------|--------|--------|---xxxxx|xxxxxxxx|xxx-----|
 * |--------|--------|--------|--------|--------|--------|--------|---xxxxx|
 *
 * Maximum ID = 11111_11111111_11111111_11111111_11111111_11111111_11111111
 *
 * Maximum TS = 11111_11111111_11111111_11111111_111
 *
 * Maximum NT = ----- -------- -------- -------- ---11111_11111111_111 = 65535
 *
 * Maximum SH = ----- -------- -------- -------- -------- -------- ---11111 = 31
 *
 * It can generate 64k unique id per IP and up to 2106-02-07T06:28:15Z.
 */
public final class IdUtil {

	private static final Logger logger = LoggerFactory.getLogger(IdUtil.class);

	private static final Pattern PATTERN_LONG_ID = Pattern.compile("^([0-9]{15})([0-9a-f]{32})([0-9a-f]{3})$");

	private static final Pattern PATTERN_HOSTNAME = Pattern.compile("^.*\\D+([0-9]+)$");

	private static final long OFFSET = LocalDate.of(2000, 1, 1).atStartOfDay(ZoneId.of("Z")).toEpochSecond();

	private static final long MAX_NEXT = 0b11111_11111111_111L;

	private static final long SHARD_ID = getServerIdAsLong();

	private static long offset = 0;

	private static long lastEpoch = 0;

	public static long nextId() {
		return nextId(System.currentTimeMillis() / 1000);
	}

	private static synchronized long nextId(long epochSecond) {
		if (epochSecond < lastEpoch) {
			// warning: the clock moved backwards; keep using the last timestamp:
			logger.warn("clock is back: " + epochSecond + " from previous:" + lastEpoch);
			epochSecond = lastEpoch;
		}
		if (lastEpoch != epochSecond) {
			lastEpoch = epochSecond;
			reset();
		}
		offset++;
		long next = offset & MAX_NEXT;
		if (next == 0) {
			logger.warn("maximum id reached in 1 second in epoch: " + epochSecond);
			return nextId(epochSecond + 1);
		}
		return generateId(epochSecond, next, SHARD_ID);
	}

	private static void reset() {
		offset = 0;
	}

	private static long generateId(long epochSecond, long next, long shardId) {
		return ((epochSecond - OFFSET) << 21) | (next << 5) | shardId;
	}

	private static long getServerIdAsLong() {
		try {
			String hostname = InetAddress.getLocalHost().getHostName();
			Matcher matcher = PATTERN_HOSTNAME.matcher(hostname);
			if (matcher.matches()) {
				long n = Long.parseLong(matcher.group(1));
				if (n >= 0 && n < 32) { // the shard id has 5 bits: 0..31
					logger.info("detect server id from host name {}: {}.", hostname, n);
					return n;
				}
			}
		} catch (UnknownHostException e) {
			logger.warn("unable to get host name. set server id = 0.");
		}
		return 0;
	}

}
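
To make the bit layout concrete, here is a minimal sketch that reverses it and splits an ID back into its three fields. IdDecoder and its method names are hypothetical and not part of the original IdUtil. The OFFSET constant is assumed to equal the value IdUtil computes for 2000-01-01T00:00:00Z.

```java
import java.time.Instant;

/** Hypothetical helper (not part of the original IdUtil): reverses the 53-bit layout. */
final class IdDecoder {

    // Assumed to equal OFFSET in IdUtil: the epoch second of 2000-01-01T00:00:00Z.
    static final long OFFSET = 946684800L;

    // Mirrors generateId(): 32-bit seconds | 16-bit sequence | 5-bit shard.
    static long compose(long epochSecond, long next, long shardId) {
        return ((epochSecond - OFFSET) << 21) | (next << 5) | shardId;
    }

    // Upper 32 bits: seconds since OFFSET, converted back to a Unix epoch second.
    static long epochSecond(long id) {
        return (id >>> 21) + OFFSET;
    }

    // Middle 16 bits: the per-second sequence number.
    static long sequence(long id) {
        return (id >>> 5) & 0xFFFFL;
    }

    // Lowest 5 bits: the machine (shard) identifier.
    static long shardId(long id) {
        return id & 0x1FL;
    }

    public static void main(String[] args) {
        long id = compose(1570763645L, 42L, 3L);
        System.out.println("id       = " + id);
        System.out.println("time     = " + Instant.ofEpochSecond(epochSecond(id)));
        System.out.println("sequence = " + sequence(id));
        System.out.println("shard    = " + shardId(id));
    }
}
```

Because the timestamp sits in the highest bits, IDs from one machine sort in generation order, and the creation time can be recovered from an ID without a lookup.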

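Subtracting a fixed offset (the epoch second of 2000-01-01) from the timestamp extends the usable range: an unsigned 32-bit second counter starting in 1970 overflows in 2106, while one starting in 2000 lasts until 2136.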
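
The time range can be checked with a few lines of arithmetic. The sketch below is an illustration only: TimestampRange and lastInstant are hypothetical names, and OFFSET is assumed to match the 2000-01-01 constant in the listing. It computes the last instant a 32-bit second counter can represent, first counted from 1970 and then counted from the offset.

```java
import java.time.Instant;

/** Hypothetical illustration of the range of a 32-bit second counter. */
final class TimestampRange {

    // Assumed to equal OFFSET in IdUtil: the epoch second of 2000-01-01T00:00:00Z.
    static final long OFFSET = 946684800L;

    // Largest value that fits in the 32-bit timestamp field.
    static final long MAX_TS = 0xFFFFFFFFL;

    // Last instant representable when the counter starts at the given epoch second.
    static Instant lastInstant(long startEpochSecond) {
        return Instant.ofEpochSecond(startEpochSecond + MAX_TS);
    }

    public static void main(String[] args) {
        System.out.println("counted from 1970: " + lastInstant(0L));     // 2106-02-07T06:28:15Z
        System.out.println("counted from 2000: " + lastInstant(OFFSET)); // 2136-02-07T06:28:15Z
    }
}
```

With the offset applied, the limit comes out thirty years later than the 2106 date, which corresponds to counting from 1970.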

What if 65,000 sequence numbers per second are not enough? That is not a problem: the generator advances the timestamp by one second and borrows sequence numbers from the next second.

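The same logic also handles the clock being set back: when the current time is earlier than the last timestamp used, the generator keeps using the last timestamp instead of the clock.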
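
Both behaviours, borrowing from the next second and tolerating a clock that moves backwards, can be seen in a reduced model of nextId(). The sketch below is not the original class: TinySequencer is a hypothetical name, the sequence is shrunk to 2 bits so that overflow happens after three IDs, and the clock value is passed in by the caller.

```java
/** Hypothetical reduced model of IdUtil.nextId(); the 2-bit sequence is a demo assumption. */
final class TinySequencer {

    // Demo assumption: only sequence values 1..3 fit into one second.
    static final long MAX_NEXT = 0b11L;

    private long offset = 0;
    private long lastEpoch = 0;

    /** Returns {second actually used, sequence number}. */
    synchronized long[] next(long epochSecond) {
        if (epochSecond < lastEpoch) {
            // Clock moved backwards (or we borrowed ahead): stay on the last second used.
            epochSecond = lastEpoch;
        }
        if (lastEpoch != epochSecond) {
            lastEpoch = epochSecond;
            offset = 0;
        }
        offset++;
        long seqNo = offset & MAX_NEXT;
        if (seqNo == 0) {
            // Sequence exhausted: borrow from the following second.
            return next(epochSecond + 1);
        }
        return new long[] { epochSecond, seqNo };
    }

    public static void main(String[] args) {
        TinySequencer seq = new TinySequencer();
        long[] clock = { 100, 100, 100, 100, 99, 101, 102 };
        for (long now : clock) {
            long[] r = seq.next(now);
            System.out.println("clock=" + now + " -> second=" + r[0] + ", sequence=" + r[1]);
        }
    }
}
```

In this run the fourth call at second 100 is served from second 101, and the call that reports second 99 also continues on second 101, so the (second, sequence) pairs never repeat or go backwards. The trade-off is that a sustained burst lets the generated timestamps run ahead of the real clock.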

Machine identification uses a simple hostname scheme. As long as hostnames follow a pattern such as host-1, host-2, the machine identifier is extracted automatically from the trailing number, with no configuration needed.
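
As an illustration of the hostname scheme, the sketch below applies the same regular expression as PATTERN_HOSTNAME to a few sample names. HostnameShard, parse, and the maxShards parameter are hypothetical names introduced for this example. Unlike the listing, it returns an empty result instead of falling back to 0, so that a failed match is visible.

```java
import java.util.OptionalLong;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: extracts a shard id from the trailing number of a hostname. */
final class HostnameShard {

    // Same expression as PATTERN_HOSTNAME in the listing: at least one non-digit, then digits.
    static final Pattern PATTERN_HOSTNAME = Pattern.compile("^.*\\D+([0-9]+)$");

    // Returns the trailing number if it is a valid shard id in [0, maxShards), else empty.
    static OptionalLong parse(String hostname, long maxShards) {
        Matcher matcher = PATTERN_HOSTNAME.matcher(hostname);
        if (matcher.matches()) {
            long n = Long.parseLong(matcher.group(1));
            if (n >= 0 && n < maxShards) {
                return OptionalLong.of(n);
            }
        }
        return OptionalLong.empty();
    }

    public static void main(String[] args) {
        for (String host : new String[] { "host-1", "host-12", "web10-3", "localhost", "host-99" }) {
            System.out.println(host + " -> " + parse(host, 32));
        }
    }
}
```

A name without a trailing number, or with a number outside the shard range, yields no identifier. In the listing both cases fall back to shard 0, so two such machines would share an identifier and could produce duplicate IDs.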

Finally, why use 53-bit integers instead of 64-bit integers? Most applications are web applications, and JavaScript loses precision on larger values, because the largest integer a JavaScript number can represent exactly is 53 bits wide. A 53-bit ID can therefore be read directly by JavaScript, whereas anything longer must be converted to a string to be handled correctly, which adds complexity to the API. This is why the Sina Weibo API returns both id and idstr.
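
The precision limit can be reproduced without a browser. A JavaScript number is an IEEE 754 double, the same format as Java's double, so a round trip through double serves as a stand-in here. The class and method names below are hypothetical, and the 64-bit sample value is made up for illustration.

```java
/** Hypothetical demo: which long values survive a round trip through an IEEE 754 double. */
final class SafeIntegerDemo {

    // 2^53 - 1, the same value as JavaScript's Number.MAX_SAFE_INTEGER.
    static final long MAX_SAFE_INTEGER = (1L << 53) - 1;

    // True if the value is unchanged after being stored in a double.
    static boolean survivesDouble(long value) {
        return (long) (double) value == value;
    }

    public static void main(String[] args) {
        long largest53BitId = MAX_SAFE_INTEGER;          // all 53 bits set
        long justTooLarge = (1L << 53) + 1;              // first integer a double cannot hold
        long sample64BitId = 1183000000000000001L;       // made-up 64-bit style id

        System.out.println(largest53BitId + " survives: " + survivesDouble(largest53BitId));
        System.out.println(justTooLarge + " survives: " + survivesDouble(justTooLarge));
        System.out.println(sample64BitId + " survives: " + survivesDouble(sample64BitId));
    }
}
```

Only the first value comes back unchanged. The other two are rounded to a neighbouring representable number, which is the corruption a JSON client would see when a 64-bit ID is sent as a number.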

 

References:

https://www.liaoxuefeng.com/article/1280526512029729

https://github.com/michaelliao/itranswarp/blob/master/src/main/java/com/itranswarp/util/IdUtil.java

Posted by yellowzm on Thu, 10 Oct 2019 20:14:05 -0700