Redis Cultivation - cardinality statistics: HyperLogLog

Keywords: Redis Spring SpringBoot Apache

There is no end to learning.

Brief introduction

HyperLogLog is an advanced data structure in Redis. It is mainly used for cardinality estimation (counting the distinct elements) over massive data sets; a single HyperLogLog can count up to 2^64 elements. It is fast and compact: each key uses at most 12 KB. The trade-off is that the count is approximate, with a standard error of 0.81%. HyperLogLog only updates its estimate from the input elements and does not store the elements themselves, so it cannot tell whether a given element has already been added.
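To give a feel for why the structure is so small and why the count is only approximate, here is a toy HyperLogLog in plain Java. It is a sketch under assumptions, not Redis's actual implementation: the class name `ToyHyperLogLog`, the splitmix64-style hash and the simple linear-counting correction are illustrative choices. It does mirror the Redis layout of 16384 registers, which is where the 12 KB (16384 registers of 6 bits) and the 0.81% (1.04 / sqrt(16384)) figures come from.

```java
final class ToyHyperLogLog {
    private static final int P = 14;        // bits of the hash used as register index
    private static final int M = 1 << P;    // 16384 registers, the same count Redis uses

    // One byte per register for simplicity; Redis packs each register into 6 bits
    // (16384 * 6 bits = 12 KB).
    private final byte[] registers = new byte[M];

    /** splitmix64-style bit mixer used as a stand-in hash (an illustrative choice). */
    private static long hash(long x) {
        long z = x + 0x9E3779B97F4A7C15L;
        z = (z ^ (z >>> 30)) * 0xBF58476D1CE4E5B9L;
        z = (z ^ (z >>> 27)) * 0x94D049BB133111EBL;
        return z ^ (z >>> 31);
    }

    /** Records an element: only one register may change, the element itself is not kept. */
    void add(long element) {
        long h = hash(element);
        int index = (int) (h >>> (64 - P));   // top 14 bits select a register
        long rest = h << P;                   // remaining 50 bits
        // Position of the first 1 bit in the remaining bits, capped at 51
        int rank = Math.min(Long.numberOfLeadingZeros(rest) + 1, 64 - P + 1);
        if (rank > registers[index]) {
            registers[index] = (byte) rank;
        }
    }

    /** Estimates the number of distinct elements added so far. */
    long estimate() {
        double sum = 0.0;
        int zeros = 0;
        for (byte r : registers) {
            sum += Math.pow(2.0, -r);
            if (r == 0) {
                zeros++;
            }
        }
        double alpha = 0.7213 / (1.0 + 1.079 / M);
        double raw = alpha * M * (double) M / sum;
        // Small-range correction from the original HyperLogLog paper (linear counting)
        if (raw <= 2.5 * M && zeros > 0) {
            raw = M * Math.log((double) M / zeros);
        }
        return Math.round(raw);
    }

    /** Theoretical standard error for M registers: 1.04 / sqrt(M). */
    static double standardError() {
        return 1.04 / Math.sqrt(M);
    }

    public static void main(String[] args) {
        ToyHyperLogLog hll = new ToyHyperLogLog();
        for (long i = 0; i < 10_000; i++) {
            hll.add(i);
        }
        for (long i = 0; i < 10_000; i++) {
            hll.add(i); // duplicates leave the registers unchanged
        }
        System.out.printf("estimate=%d, standard error=%.4f%n", hll.estimate(), standardError());
    }
}
```

Because only the registers are kept, there is no way to ask this structure whether a particular element was added, which matches the limitation described above.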

Basic instructions

pfadd(key,value...)

Adds the specified elements to the HyperLogLog. Several elements can be added in one call.

    public void pfAdd(String key, String... value) {
        stringRedisTemplate.opsForHyperLogLog().add(key, value);
    }

pfcount(key...)

Returns the cardinality estimate of the given HyperLogLog. When several keys are passed in one call, Redis has to merge the HyperLogLog structures into a temporary HyperLogLog and count the union, which is comparatively slow, so use the multi-key form with caution.

    public Long pfCount(String... key) {
        return stringRedisTemplate.opsForHyperLogLog().size(key);
    }

pfmerge(destkey, sourcekey...)

Merges multiple HyperLogLogs and stores the union in the specified destination HyperLogLog.

    public void pfMerge(String destKey, String... sourceKey) {
        stringRedisTemplate.opsForHyperLogLog().union(destKey, sourceKey);
    }
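The difference between counting several keys and merging them is easier to see with exact sets. The sketch below uses `java.util.Set` as a stand-in for a HyperLogLog, so it is exact and involves no Redis at all. The class `UnionSemantics` and its method names merely mirror the commands and are hypothetical.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

final class UnionSemantics {
    // key -> distinct elements; an exact stand-in for one HyperLogLog per key
    private final Map<String, Set<String>> store = new HashMap<>();

    /** Mirrors pfadd: duplicates are absorbed. */
    void pfAdd(String key, String... values) {
        store.computeIfAbsent(key, k -> new HashSet<>()).addAll(Arrays.asList(values));
    }

    /** Mirrors pfcount: with several keys it counts the union, built in a temporary set. */
    long pfCount(String... keys) {
        Set<String> union = new HashSet<>();
        for (String key : keys) {
            union.addAll(store.getOrDefault(key, Collections.<String>emptySet()));
        }
        return union.size();
    }

    /** Mirrors pfmerge: the union of the sources is stored under destKey. */
    void pfMerge(String destKey, String... sourceKeys) {
        Set<String> dest = store.computeIfAbsent(destKey, k -> new HashSet<>());
        for (String key : sourceKeys) {
            dest.addAll(store.getOrDefault(key, Collections.<String>emptySet()));
        }
    }

    public static void main(String[] args) {
        UnionSemantics redis = new UnionSemantics();
        redis.pfAdd("a", "u1", "u2", "u2");
        redis.pfAdd("b", "u2", "u3");
        redis.pfMerge("merged", "a", "b");
        System.out.println("a=" + redis.pfCount("a")
                + ", a+b=" + redis.pfCount("a", "b")
                + ", merged=" + redis.pfCount("merged"));
    }
}
```

The multi-key count and the merged key describe the same union. The practical difference is that pfmerge pays the merge cost once and keeps the result, while a multi-key pfcount repeats it on every call.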

Error testing

The following Spring Boot test measures the error. It initializes 5 HyperLogLogs, adds 10000 elements to each (a consecutive range starting at a random offset), and then calls pfcount to see the actual error:

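import java.util.Random;

import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RestController;

// RedisService is the project's own wrapper exposing the pfAdd/pfCount/pfMerge methods shown above
@RestController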
@RequestMapping("/redis/hll")
public class HyperController {

    private final RedisService redisService;

    public HyperController(RedisService redisService) {
        this.redisService = redisService;
    }

    @GetMapping("/init")
    public String init() {
        for (int i = 0; i < 5; i++) {
            Thread thread = new Thread(() -> {
                String name = Thread.currentThread().getName();
                Random r = new Random();
                int begin = r.nextInt(100) * 10000;
                int end = begin + 10000;
                for (int j = begin; j < end; j++) {
                    redisService.pfAdd("hhl:" + name, j + "");
                }
                System.out.printf("Thread [%s] completes data initialization, interval [%d, %d)\n", name, begin, end);
            }, String.valueOf(i)); // the thread name doubles as the key suffix
            thread.start();
        }
        return "success";
    }

    @GetMapping("/count")
    public String count() {
        long a = redisService.pfCount("hhl:0");
        long b = redisService.pfCount("hhl:1");
        long c = redisService.pfCount("hhl:2");
        long d = redisService.pfCount("hhl:3");
        long e = redisService.pfCount("hhl:4");
        System.out.printf("hhl:0 -> count: %d, rate: %f\n", a, (10000 - a) * 1.00 / 100);
        System.out.printf("hhl:1 -> count: %d, rate: %f\n", b, (10000 - b) * 1.00 / 100);
        System.out.printf("hhl:2 -> count: %d, rate: %f\n", c, (10000 - c) * 1.00 / 100);
        System.out.printf("hhl:3 -> count: %d, rate: %f\n", d, (10000 - d) * 1.00 / 100);
        System.out.printf("hhl:4 -> count: %d, rate: %f\n", e, (10000 - e) * 1.00 / 100);
        return "success";
    }
}

Initialize the data by calling the endpoint: http://localhost:8080/redis/hll/init

Thread [4] completes data initialization, interval [570000, 580000)
Thread [2] completes data initialization, interval [70000, 80000)
Thread [0] completes data initialization, interval [670000, 680000)
Thread [1] completes data initialization, interval [210000, 220000)
Thread [3] completes data initialization, interval [230000, 240000)

Then check the counts and compute the error (rate is the relative error in percent): http://localhost:8080/redis/hll/count

hhl:0 -> count: 10079, rate: -0.790000
hhl:1 -> count: 9974, rate: 0.260000
hhl:2 -> count: 10018, rate: -0.180000
hhl:3 -> count: 10053, rate: -0.530000
hhl:4 -> count: 9985, rate: 0.150000
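As a quick sanity check, the reported counts can be compared with the 0.81% standard error. The snippet below simply redoes the arithmetic of the count endpoint on the five counts printed above; the class name `ErrorCheck` and its helpers are made up for illustration.

```java
final class ErrorCheck {
    static final int ACTUAL = 10_000;              // elements added to each HyperLogLog
    static final double STANDARD_ERROR_PCT = 0.81; // standard error quoted for HyperLogLog

    /** Relative error in percent: (actual - estimate) / actual * 100. */
    static double ratePercent(long estimate) {
        return (ACTUAL - estimate) * 100.0 / ACTUAL;
    }

    /** Largest absolute relative error (in percent) among the given estimates. */
    static double maxAbsRate(long[] estimates) {
        double max = 0.0;
        for (long estimate : estimates) {
            max = Math.max(max, Math.abs(ratePercent(estimate)));
        }
        return max;
    }

    public static void main(String[] args) {
        long[] counts = {10079, 9974, 10018, 10053, 9985}; // counts reported by the test run
        for (long count : counts) {
            System.out.printf("count: %d, rate: %.2f%%%n", count, ratePercent(count));
        }
        System.out.printf("largest error: %.2f%% (standard error: %.2f%%)%n",
                maxAbsRate(counts), STANDARD_ERROR_PCT);
    }
}
```

In this run every estimate happens to fall within one standard error. Since 0.81% is a standard error rather than a hard bound, an individual key can still deviate by more than that.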

Practical example

Suppose we need to track both the popularity of an article (total clicks) and the number of distinct users who clicked it. Popularity can be tracked with a plain Redis counter by executing the incr instruction on every visit, while HyperLogLog counts the distinct users.
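The distinction between the two numbers can be sketched without Redis. In the snippet below a `long` counter plays the role of incr and an exact `HashSet` stands in for the HyperLogLog; the class `VisitStats` and its methods are hypothetical names used only for illustration.

```java
import java.util.HashSet;
import java.util.Set;

final class VisitStats {
    private long pageViews;                            // what incr would maintain
    private final Set<String> users = new HashSet<>(); // exact stand-in for the HyperLogLog

    /** Records one visit: popularity always grows, the user set only grows for new users. */
    void visit(String userId) {
        pageViews++;
        users.add(userId);
    }

    long pageViews() {
        return pageViews;
    }

    long uniqueVisitors() {
        return users.size();
    }

    public static void main(String[] args) {
        VisitStats stats = new VisitStats();
        stats.visit("u1");
        stats.visit("u1"); // a repeat visit raises popularity but not the visitor count
        stats.visit("u2");
        System.out.println("PV=" + stats.pageViews() + ", UV=" + stats.uniqueVisitors());
    }
}
```

An exact set grows with the number of users, which is the cost HyperLogLog avoids by accepting a small error.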

Implementation ideas

The statistics are collected with AOP and a custom annotation, as follows:

  • Annotate the article endpoints that need statistics
  • Set the annotation value to the key of the corresponding HyperLogLog
  • Use the custom annotation as the AOP pointcut
  • Read the annotation value in the aspect
  • Identify the user by token or cookie in the aspect
  • Increment the popularity counter and add the user to the HyperLogLog

pom

Add the Redis and AOP dependencies:

    <dependency>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-data-redis</artifactId>
    </dependency>
    
    <!-- connection pool used by the Lettuce Redis client -->
    <dependency>
        <groupId>org.apache.commons</groupId>
        <artifactId>commons-pool2</artifactId>
    </dependency>
    
    <dependency>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-aop</artifactId>
    </dependency>

Define custom annotations

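import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

@Target(ElementType.METHOD)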
@Retention(RetentionPolicy.RUNTIME)
public @interface Article {

    /**
     * The value is the key of the corresponding HyperLogLog
     */
    String value() default "";
}

Define AOP

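import java.lang.reflect.Method;

import javax.servlet.http.HttpServletRequest; // jakarta.servlet.http on Spring Boot 3 and later

import org.aspectj.lang.ProceedingJoinPoint;
import org.aspectj.lang.annotation.Around;
import org.aspectj.lang.annotation.Aspect;
import org.aspectj.lang.annotation.Pointcut;
import org.aspectj.lang.reflect.MethodSignature;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.stereotype.Component;
import org.springframework.web.context.request.RequestContextHolder;
import org.springframework.web.context.request.ServletRequestAttributes;
import org.ylc.note.redis.hyperloglog.annotation.Article;

@Aspect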
@Component
public class ArticleAop {

    private static final String PV_PREFIX = "PV:";

    private static final String UV_PREFIX = "UV:";

    @Autowired
    private RedisService redisService;

    /**
     * Pointcut: every method annotated with @Article
     */
    @Pointcut("@annotation(org.ylc.note.redis.hyperloglog.annotation.Article)")
    private void statistics() {
    }

    @Around("statistics()")
    public Object doAround(ProceedingJoinPoint proceedingJoinPoint) throws Throwable {
        // Read the @Article annotation to get the key suffix
        Method method = ((MethodSignature) proceedingJoinPoint.getSignature()).getMethod();
        Article article = method.getAnnotation(Article.class);
        String value = article.value();

        // Popularity: count every call
        redisService.incr(PV_PREFIX + value);

        // Number of users: the current request is needed to identify the user
        ServletRequestAttributes attributes = (ServletRequestAttributes) RequestContextHolder.getRequestAttributes();
        if (attributes != null) {
            HttpServletRequest request = attributes.getRequest();
            // For this demo the user ID is passed in as a request parameter; a real project would derive it from a token or cookie
            String userId = request.getParameter("userId");
            if (userId != null) {
                redisService.pfAdd(UV_PREFIX + value, userId);
            }
        }

        // Proceed with the annotated method
        return proceedingJoinPoint.proceed();
    }
}

Define the endpoints

Add the @Article annotation to each endpoint that needs statistics

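import java.util.HashMap;
import java.util.Map;

import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RestController;
import org.ylc.note.redis.hyperloglog.annotation.Article;

@RestController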
@RequestMapping("/redis/article")
public class ArticleController {

    @Autowired
    private RedisService redisService;

    @Article("it")
    @GetMapping("/it")
    public String it(String userId) {
        String pv = redisService.get("PV:it");
        long uv = redisService.pfCount("UV:it");
        return String.format("Current user: [%s], it popularity: [%s], number of users visited: [%d]", userId, pv, uv);
    }

    @Article("news")
    @GetMapping("/news")
    public String news(String userId) {
        String pv = redisService.get("PV:news");
        long uv = redisService.pfCount("UV:news");
        return String.format("Current user: [%s], news popularity: [%s], number of users visited: [%d]", userId, pv, uv);
    }

    @GetMapping("/statistics")
    public Object statistics() {
        String pvIt = redisService.get("PV:it");
        long uvIt = redisService.pfCount("UV:it");

        String pvNews = redisService.get("PV:news");
        long uvNews = redisService.pfCount("UV:news");

        redisService.pfMerge("UV:merge", "UV:it", "UV:news");
        long uvMerge = redisService.pfCount("UV:merge");

        Map<String, String> result = new HashMap<>();
        result.put("it", String.format("it popularity: [%s], number of users visited: [%d]", pvIt, uvIt));
        result.put("news", String.format("news popularity: [%s], number of users visited: [%d]", pvNews, uvNews));
        result.put("merge", String.format("Number of users visited after merging: [%d]", uvMerge));
        return result;
    }
}

Source code

All of the code has been uploaded to GitHub: Redis practice - hyperloglog

Follow the author

WeChat official account: Yu Daxian

Posted by scavok on Sat, 01 Feb 2020 00:06:32 -0800