How does Redis implement the "people nearby" feature?

For the "people nearby" scenario in location-based services, a common approach is to use the spatial indexes of databases such as PostgreSQL, MySQL, and MongoDB. Redis takes a different route: it combines its sorted set (zset) with geohash encoding to implement spatial search, and it does so with high efficiency.

This article analyzes the algorithm from the perspective of the source code and works out the time complexity of a query.

To provide a complete "people nearby" service, the bare minimum is to implement "add", "delete", and "query". Each is introduced below, with the focus on the query.

Commands

Since Redis 3.2, Redis has provided geolocation features based on geohash and sorted sets. The Redis Geo module contains the following six commands:

 

  • GEOADD: adds the given location objects (longitude, latitude, name) to the specified key;
  • GEOPOS: returns the positions (longitude and latitude) of all given location objects in the key;
  • GEODIST: returns the distance between two given location objects;
  • GEOHASH: returns the geohash representation of one or more location objects;
  • GEORADIUS: with the given longitude and latitude as the center, returns all location objects in the target set whose distance from the center does not exceed the given maximum distance;
  • GEORADIUSBYMEMBER: with the given location object as the center, returns all location objects whose distance from it does not exceed the given maximum distance.

 

Among them, GEOADD and GEORADIUS together cover the basic "add" and "query" functions of "people nearby".

 

To implement something like the "people nearby" feature in WeChat, you can use the GEORADIUSBYMEMBER command directly. The "given location object" is the user, and the objects searched for are other users.

 

In essence, however, GEORADIUSBYMEMBER = GEOPOS + GEORADIUS: first look up the user's position, then search for other users nearby whose distance from it satisfies the condition.


The following will analyze the GEOADD and GEORADIUS commands from the perspective of source code and analyze their algorithm principle.

Redis geo operations only cover "add" and "query"; there is no dedicated "delete" command. This is mainly because Redis stores location objects internally in a sorted set (zset), so ZREM can be used to delete them.

In the Redis source, the file comment of geo.c only states that the file implements GEOADD, GEORADIUS, and GEORADIUSBYMEMBER (in fact, the other three commands are implemented there as well). This suggests that the other three are auxiliary commands.

GEOADD

 

Usage

GEOADD key longitude latitude member [longitude latitude member ...]

Adds the given location objects (longitude, latitude, name) to the specified key.

Here key is the set name and member is the object that the longitude and latitude belong to. In practice, when there are too many objects to store, you can effectively shard the object set by using multiple keys (for example, one key per province) so that no single set grows too large.

 

Return value after successful insertion:

 

(integer) N

 

Where N is the number of successful inserts.

Source code analysis

/* GEOADD key long lat name [long2 lat2 name2 ... longN latN nameN] */
void geoaddCommand(client *c) {
 
//Parameter verification
    /* Check arguments number for sanity. */
    if ((c->argc - 2) % 3 != 0) {
        /* Need an odd number of arguments if we got this far... */
        addReplyError(c, "syntax error. Try GEOADD key [x1] [y1] [name1] "
                         "[x2] [y2] [name2] ... ");
          return;
    }
 
//Parameter extraction
    int elements = (c->argc - 2) / 3;
    int argc = 2+elements*2; /* ZADD key score ele ... */
    robj **argv = zcalloc(argc*sizeof(robj*));
    argv[0] = createRawStringObject("zadd",4);
    argv[1] = c->argv[1]; /* key */
    incrRefCount(argv[1]);
 
//Parameter traversal+transformation
    /* Create the argument vector to call ZADD in order to add all
     * the score,value pairs to the requested zset, where score is actually
     * an encoded version of lat,long. */
    int i;
    for (i = 0; i < elements; i++) {
        double xy[2];
 
    //Extract latitude and longitude
        if (extractLongLatOrReply(c, (c->argv+2)+(i*3),xy) == C_ERR) {
            for (i = 0; i < argc; i++)
                if (argv[i]) decrRefCount(argv[i]);
            zfree(argv);
            return;
        }
     
    //Encode the longitude and latitude as a 52-bit geohash to use as the score, and extract the object name
        /* Turn the coordinates into the score of the element. */
        GeoHashBits hash;
        geohashEncodeWGS84(xy[0], xy[1], GEO_STEP_MAX, &hash);
        GeoHashFix52Bits bits = geohashAlign52Bits(hash);
        robj *score = createObject(OBJ_STRING, sdsfromlonglong(bits));
        robj *val = c->argv[2 + i * 3 + 2];
 
    //Set the object element name and score of the ordered collection
        argv[2+i*2] = score;
        argv[3+i*2] = val;
        incrRefCount(val);
    }
 
//Call the ZADD command to store the converted objects
    /* Finally call ZADD that will do the work for us. */
    replaceClientCommandVector(c,argc,argv);
    zaddCommand(c);
}

Through the source code analysis, it can be seen that Redis uses an ordered set (zset) to save location objects. Each element in the ordered set is an object with location, and the score value of the element is the 52 bit geohash value corresponding to its longitude and latitude.

A double has a 52-bit mantissa, so a 52-bit integer score can be stored in it without loss.
Geohash is base32-encoded (5 bits per character), so 52 bits can hold a geohash of at most 10 characters, which corresponds to a grid cell of roughly 0.6 × 0.6 meters. In other words, a position converted by Redis geo can in theory be off by about 0.3 × 1.414 = 0.424 meters.
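As a quick check of the cell-size arithmetic above, here is a minimal C sketch. It assumes an equatorial circumference of about 40,075,017 m and treats the cell as a square; the names cell_width_at_equator and max_center_offset are made up for this illustration and are not part of Redis.

```c
#include <stdint.h>

/* Assumed value: Earth's equatorial circumference in meters. */
#define EQUATOR_CIRCUMFERENCE_M 40075017.0
#define SQRT2 1.4142135623730951

/* Width in meters, measured along the equator, of one geohash cell when
 * the longitude axis is halved bits_per_axis times (26 for a 52-bit score). */
double cell_width_at_equator(int bits_per_axis) {
    return EQUATOR_CIRCUMFERENCE_M / (double)((uint64_t)1 << bits_per_axis);
}

/* Largest distance from any point of a square cell to the cell center:
 * half of the diagonal. */
double max_center_offset(double side_m) {
    return side_m * SQRT2 / 2.0;
}
```

With 26 bits per axis the width comes out just under 0.6 m, and a 0.6 m square gives a worst-case offset of about 0.42 m, which matches the estimate in the text.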

Algorithm summary

Briefly summarize what the GEOADD command does:
1. Parameter extraction and verification;
2. Convert the longitude and latitude of the incoming parameter into a 52 bit geohash value (score);
3. Call the ZADD command to store the member and its corresponding score in the set key.
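
Step 2 can be sketched roughly as follows. This is a simplified illustration rather than the Redis implementation: geo_score, cell_index, and spread_bits are hypothetical names, the latitude limit is an assumed bound (the source comments later in this article round it to 85), and the bit order (longitude in the odd bits, latitude in the even bits) is also an assumption.

```c
#include <stdint.h>

#define GEO_STEPS 26              /* bits per axis, 52 bits in total */
#define LON_MIN (-180.0)
#define LON_MAX 180.0
#define LAT_MIN (-85.05112878)    /* assumed latitude limit */
#define LAT_MAX 85.05112878

/* Spread the low GEO_STEPS bits of v onto the even bit positions. */
static uint64_t spread_bits(uint32_t v) {
    uint64_t out = 0;
    for (int i = 0; i < GEO_STEPS; i++)
        out |= (uint64_t)((v >> i) & 1u) << (2 * i);
    return out;
}

/* Map a coordinate in [min, max] to a cell index in [0, 2^GEO_STEPS - 1]. */
static uint32_t cell_index(double value, double min, double max) {
    const uint32_t cells = 1u << GEO_STEPS;
    uint32_t idx = (uint32_t)((value - min) / (max - min) * (double)cells);
    return idx >= cells ? cells - 1 : idx;   /* clamp the upper edge */
}

/* Store the 52-bit score in *score and return 0, or return -1 when the
 * coordinates are out of range. */
int geo_score(double lon, double lat, uint64_t *score) {
    if (lon < LON_MIN || lon > LON_MAX || lat < LAT_MIN || lat > LAT_MAX)
        return -1;
    uint64_t lon_bits = spread_bits(cell_index(lon, LON_MIN, LON_MAX));
    uint64_t lat_bits = spread_bits(cell_index(lat, LAT_MIN, LAT_MAX));
    *score = (lon_bits << 1) | lat_bits;     /* interleave the two axes */
    return 0;
}
```

Because the two axes are interleaved bit by bit, points that share a long score prefix fall in the same coarse cell, which is what the GEORADIUS search below relies on.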

 

GEORADIUS

Usage

GEORADIUS key longitude latitude radius m|km|ft|mi [WITHCOORD] [WITHDIST] [WITHHASH] [ASC|DESC] [COUNT count] [STORE key] [STOREDIST key]

Taking the given latitude and longitude as the center, returns all position objects in the target set whose distance from the center does not exceed the given maximum distance.

 

Range units: m | km | ft | mi (meters | kilometers | feet | miles)

 

Additional parameters:

-WITHDIST: returns the distance between the location object and the center while returning the location object. The unit of distance is consistent with the range unit given by the user.

 

-WITHCOORD: also returns the longitude and latitude of the location object.

 

-WITHHASH: also returns the raw geohash-encoded sorted set score of the location object, in the form of a 52-bit unsigned integer. This option is mainly for low-level use or debugging and is of little use in practice.

 

-ASC|DESC: returns location objects from nearest to farthest (ASC) or from farthest to nearest (DESC).

-COUNT count: returns only the first N matching location objects. (If not set, all matching elements are returned.)

-STORE key: saves the geographic location information of the results to the specified key.

-STOREDIST key: saves the distance from each result to the center point to the specified key.

Because of the STORE and STOREDIST options, the GEORADIUS and GEORADIUSBYMEMBER commands are technically flagged as write commands, so they are only sent to the primary instance. If the QPS is too high, this can easily put excessive read and write pressure on the primary.

To solve this problem, Redis 3.2.10 and Redis 4.0.0 added two read-only commands, GEORADIUS_RO and GEORADIUSBYMEMBER_RO.

However, in actual development the author found that the GeoRadiusParam parameter class in the Java package redis.clients.jedis.params.geo does not include the STORE and STOREDIST options, so it is unclear whether a georadius call queries only the primary instance or is wrapped as a read-only call. Interested readers can look into it themselves.

Return value after successful query:

Without any WITH option, a list of members is returned, such as:

 

["member1","member2","member3"]

With WITH options, each member in the list is itself a nested list, such as:

[
  ["member1", distance1, [longitude1, latitude1]],
  ["member2", distance2, [longitude2, latitude2]]
]

Source code analysis

This section of the source code is long. If it is hard to follow, you can read just the comments or jump straight to the summary.

/* GEORADIUS key x y radius unit [WITHDIST] [WITHHASH] [WITHCOORD] [ASC|DESC]
 * [COUNT count] [STORE key] [STOREDIST key]
 * GEORADIUSBYMEMBER key member radius unit ... options ... */
void georadiusGeneric(client *c, int flags) {
    robj *key = c->argv[1];
    robj *storekey = NULL;
    int storedist = 0; /* 0 for STORE, 1 for STOREDIST. */
 
//Look up the sorted set by key
    robj *zobj = NULL;
    if ((zobj = lookupKeyReadOrReply(c, key, shared.null[c->resp])) == NULL ||
        checkType(c, zobj, OBJ_ZSET)) {
        return;
    }
 
//Determine the longitude and latitude of the center point from the user input (coordinates or member)
    int base_args;
    double xy[2] = { 0 };
    if (flags & RADIUS_COORDS) {
    ......
    }
 
//Get query range distance
    double radius_meters = 0, conversion = 1;
    if ((radius_meters = extractDistanceOrReply(c, c->argv + base_args - 2,
                                                &conversion)) < 0) {
        return;
    }
 
//Get optional parameters( withdist,withhash,withcoords,sort,count)
    int withdist = 0, withhash = 0, withcoords = 0;
    int sort = SORT_NONE;
    long long count = 0;
    if (c->argc > base_args) {
        ... ...
    }
 
//Check the STORE and STOREDIST parameters
    if (storekey && (withdist || withhash || withcoords)) {
        addReplyError(c,
            "STORE option in GEORADIUS is not compatible with "
            "WITHDIST, WITHHASH and WITHCOORDS options");
        return;
    }
 
//Set sort
    if (count != 0 && sort == SORT_NONE) sort = SORT_ASC;
 
//Calculate the range of the target area using the center point and radius
    GeoHashRadius georadius =
        geohashGetAreasByRadiusWGS84(xy[0], xy[1], radius_meters);
 
//Search the geohash cell containing the center point and its 8 surrounding cells for elements within range
    geoArray *ga = geoArrayCreate();
    membersOfAllNeighbors(zobj, georadius, xy[0], xy[1], radius_meters, ga);
 
//No match: return an empty reply
    /* If no matching results, the user gets an empty reply. */
    if (ga->used == 0 && storekey == NULL) {
        addReplyNull(c);
        geoArrayFree(ga);
        return;
    }
 
//Setting and returning of some return values
    ......
    geoArrayFree(ga);
}

There are two core steps in the above code: one is "calculate the center point range" and the other is "find the center point and its surrounding 8 geohash grid areas".

 

The corresponding functions are geohashGetAreasByRadiusWGS84 and membersOfAllNeighbors.

 

Let's look at it in turn:

  • Compute the range around the center point:

// geohash_helper.c

GeoHashRadius geohashGetAreasByRadiusWGS84(double longitude, double latitude,
                                           double radius_meters) {
    return geohashGetAreasByRadius(longitude, latitude, radius_meters);
}
 
//Return the 9 geohash boxes that can cover the target area
GeoHashRadius geohashGetAreasByRadius(double longitude, double latitude, double radius_meters) {
//Some parameter settings
    GeoHashRange long_range, lat_range;
    GeoHashRadius radius;
    GeoHashBits hash;
    GeoHashNeighbors neighbors;
    GeoHashArea area;
    double min_lon, max_lon, min_lat, max_lat;
    double bounds[4];
    int steps;
 
//Calculate the longitude and latitude range of the circumscribed rectangle in the target area (the target area is a circle with the target longitude and latitude as the center and the radius as the specified distance)
    geohashBoundingBox(longitude, latitude, radius_meters, bounds);
    min_lon = bounds[0];
    min_lat = bounds[1];
    max_lon = bounds[2];
    max_lat = bounds[3];
 
//From the latitude of the center point and the radius of the target area, compute the geohash precision (number of bits) to use for the 9 search boxes
//Latitude is used here mainly to adjust the precision near the poles (the higher the latitude, the fewer the bits)
    steps = geohashEstimateStepsByRadius(radius_meters,latitude);
 
//Set the minimum and maximum longitude and latitude: -180<=longitude<=180, -85<=latitude<=85
    geohashGetCoordRange(&long_range,&lat_range);
     
//Encode the longitude and latitude to search around as a geohash value at the chosen precision (steps)
    geohashEncode(&long_range,&lat_range,longitude,latitude,steps,&hash);
     
//Expand the geohash value in 8 directions to determine the 8 surrounding boxes (neighbors)
    geohashNeighbors(&hash,&neighbors);
     
//Determine the longitude and latitude range of the area from the hash value
    geohashDecode(long_range,lat_range,hash,&area);
 
//Some special cases
    ......
 
//Build and return results
    radius.hash = hash;
    radius.neighbors = neighbors;
    radius.area = area;
    return radius;
}
  • Search the cell containing the center point and its 8 surrounding geohash cells:

// geo.c

//Collect the matching elements from the 9 hash boxes
int membersOfAllNeighbors(robj *zobj, GeoHashRadius n, double lon, double lat, double radius, geoArray *ga) {
    GeoHashBits neighbors[9];
    unsigned int i, count = 0, last_processed = 0;
    int debugmsg = 0;
 
//Collect the 9 hash boxes to search
    neighbors[0] = n.hash;
    ......
    neighbors[8] = n.neighbors.south_west;
 
//Search each hash box for target points
    for (i = 0; i < sizeof(neighbors) / sizeof(*neighbors); i++) {
        if (HASHISZERO(neighbors[i])) {
            if (debugmsg) D("neighbors[%d] is zero",i);
            continue;
        }
 
  //Skip hash boxes that may be duplicated (this can happen when the search radius exceeds 5000 km)
        if (last_processed &&
            neighbors[i].bits == neighbors[last_processed].bits &&
            neighbors[i].step == neighbors[last_processed].step)
        {
            continue;
        }
 
  //Search the hash box for objects that meet the criteria
        count += membersOfGeoHashBox(zobj, neighbors[i], ga, lon, lat, radius);
        last_processed = i;
    }
    return count;
}
 
 
int membersOfGeoHashBox(robj *zobj, GeoHashBits hash, geoArray *ga, double lon, double lat, double radius) {
//Get the minimum and maximum geohash values (52 bits) of the hash box
    GeoHashFix52Bits min, max;
    scoresOfGeoHashBox(hash,&min,&max);
 
//Filter the points in the zobj set whose geohash value lies between the minimum and maximum
    return geoGetPointsInRange(zobj, min, max, lon, lat, radius, ga);
}
 
 
int geoGetPointsInRange(robj *zobj, double min, double max, double lon, double lat, double radius, geoArray *ga) {
 
//Boundaries of the search range (the score range of one of the 9 hash boxes)
    zrangespec range = { .min = min, .max = max, .minex = 0, .maxex = 1 };
    size_t origincount = ga->used;
    sds member;
 
//The set zobj may use either ZIPLIST or SKIPLIST encoding; SKIPLIST is shown here as the example, and the logic is the same for both
    if (zobj->encoding == OBJ_ENCODING_ZIPLIST) {
        ......
    } else if (zobj->encoding == OBJ_ENCODING_SKIPLIST) {
        zset *zs = zobj->ptr;
        zskiplist *zsl = zs->zsl;
        zskiplistNode *ln;
 
  //Get the first element in the hash box range (a skip list, with efficiency comparable to a binary search tree); if there is none, return 0
        if ((ln = zslFirstInRange(zsl, &range)) == NULL) {
            /* Nothing exists starting at our min. No results. */
            return 0;
        }
 
  //Traverse the collection from the first element
        while (ln) {
            sds ele = ln->ele;
    //Stop once the traversed element falls outside the range
            /* Abort when the node is no longer in range. */
            if (!zslValueLteMax(ln->score, &range))
                break;
    //Element check (compute the distance between the element and the center point)
            ele = sdsdup(ele);
            if (geoAppendIfWithinRadius(ga,lon,lat,radius,ln->score,ele)
                == C_ERR) sdsfree(ele);
            ln = ln->level[0].forward;
        }
    }
    return ga->used - origincount;
}
 
int geoAppendIfWithinRadius(geoArray *ga, double lon, double lat, double radius, double score, sds member) {
    double distance, xy[2];
 
//If decoding fails, return an error
    if (!decodeGeohash(score,xy)) return C_ERR; /* Can't decode. */
 
//Final distance check (compute the spherical distance and see whether it is within radius)
    if (!geohashGetDistanceIfInRadiusWGS84(lon,lat, xy[0], xy[1],
                                           radius, &distance))
    {
        return C_ERR;
    }
 
//Build the element that meets the conditions and append it to the results
    geoPoint *gp = geoArrayAppend(ga);
    gp->longitude = xy[0];
    gp->latitude = xy[1];
    gp->dist = distance;
    gp->member = member;
    gp->score = score;
    return C_OK;
}
Algorithm summary

Setting aside the many optional parameters, here is a brief summary of how the GEORADIUS command uses geohash to find the target location objects:

1. Parameter extraction and verification;

2. Use the center point and the input radius to compute the area to search. The result includes the highest geohash grid level (precision) that satisfies the conditions and the corresponding 3×3 grid of nine cells that covers the target area; (details follow below)

3. Traverse the nine cells and select location objects by the score range of each geohash cell. Then keep the objects whose distance from the center point is less than the input radius and return them.

     

This description alone is not easy to picture, so the algorithm is illustrated with two figures:

In the left figure, the center is the search center, the green circular area is the target area, all points are location objects to be searched, and the red points are the location objects that meet the conditions.

During the actual search, the geohash grid level (the cell size in the right figure) is computed from the search radius, and the position of the nine-cell grid (the red 3×3 grid) is determined. The points inside the nine-cell grid (blue and red points) are then found one by one and their distance to the center point is computed, and finally the points within the distance range (red points) are kept.
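
To make "the grid level is computed from the search radius" concrete, here is a rough sketch of one way to estimate it: keep doubling the radius until it spans the whole map, count the doublings, then back off a little so the nine cells still cover the circle. The constant, the latitude thresholds, and the name estimate_steps are assumptions for illustration and may differ from what geohashEstimateStepsByRadius actually does.

```c
/* Assumed constant: half the width of the projected world map, in meters. */
#define MERCATOR_MAX_M 20037726.37
#define MAX_STEPS 26

/* Estimate how many bits per axis the search cells should use. */
int estimate_steps(double radius_m, double lat) {
    if (radius_m <= 0.0) return MAX_STEPS;
    int step = 1;
    while (radius_m < MERCATOR_MAX_M) {
        radius_m *= 2.0;
        step++;
    }
    step -= 2;   /* coarser cells, so the 3x3 grid covers the circle */

    /* Cells get narrower toward the poles, so use coarser cells there. */
    if (lat > 66.0 || lat < -66.0) {
        step--;
        if (lat > 80.0 || lat < -80.0) step--;
    }
    if (step < 1) step = 1;
    if (step > MAX_STEPS) step = MAX_STEPS;
    return step;
}
```

Under these assumptions a 1 km search at the equator gives step 14, i.e. cells roughly 2.4 km wide, so the circle fits inside the center cell plus its 8 neighbors.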

     

Algorithm analysis

Why query with this strategy, and what are its advantages? Let's work through it in question-and-answer form.

Why find the highest geohash grid level that satisfies the conditions? Why use a nine-cell grid?

These are really the same question. In essence, both are a preliminary screening of all element objects. In a multi-level geohash grid, each lower-level (coarser) cell is made up of four higher-level (finer) cells (as shown in the figure).


The skip list that backs the sorted set has query efficiency similar to a binary search tree, with an average time complexity of O(log(N)), and all elements at its bottom level are kept in order as a linked list.

Therefore, a query only needs to locate the first value in the set that falls in the target geohash cell and then compare the following elements in turn, without searching again.

The nine cells cannot be queried in a single pass because the geohash values of the nine cells are not contiguous. The query is efficient only over a contiguous range; otherwise many unnecessary distance computations would be performed.
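
The contiguous score range of a single cell can be sketched like this, assuming a cell is identified by its geohash bits at a given step (bits per axis) and that scores are 52 bits wide. box_score_range is a hypothetical helper written for this article, not the Redis function scoresOfGeoHashBox.

```c
#include <stdint.h>

/* For a cell whose geohash is `bits` at `step` bits per axis (1..26),
 * compute the half-open range [min, max) of 52-bit scores inside it.
 * Returns 0 on success, -1 if step is out of range. */
int box_score_range(uint64_t bits, int step, uint64_t *min, uint64_t *max) {
    if (step < 1 || step > 26) return -1;
    int shift = 52 - 2 * step;       /* low bits not fixed by the cell */
    *min = bits << shift;            /* all free bits set to 0 */
    *max = (bits + 1) << shift;      /* first score of the next cell */
    return 0;
}
```

Every point in the cell has a score in [min, max), which is why one skip list lookup plus a forward scan is enough per cell, and why each of the nine cells needs its own range.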

     

To sum up, we have analyzed in detail how GEOADD and GEORADIUS work in the Redis Geo module from the perspective of the source code, and from this we can work out the time complexity of using GEORADIUS to find people nearby: O(N+log(M)).

Here N is the number of location elements inside the nine-cell grid, each of which needs a distance computation, and M is the number of elements in the sorted set, where each skip list lookup costs O(log(M)). Combined with Redis's in-memory storage, this gives very high efficiency in practice.

Posted by slobodnium on Sun, 31 Oct 2021 17:35:38 -0700