How does Redis actually implement the "people nearby" feature?

Keywords: Java Redis encoding less MySQL

Preface

For the "people nearby" scenario in location-based services, the spatial indexes of databases such as PostgreSQL (PG), MySQL and MongoDB can all be used. Redis takes a different route: it combines its sorted set (zset) with geohash encoding to implement spatial search, and it does so very efficiently. This article analyzes the algorithm from the source code point of view and derives the time complexity of a query.

A complete "people nearby" service needs at least three operations: "add", "delete" and "query". Each is introduced below, with the focus on the query operation.

Operation commands

Since Redis 3.2, Redis has provided geolocation features based on geohash and sorted sets. The Redis Geo module contains the following six commands:

  • GEOADD: Adds the given location objects (longitude, latitude, name) to the specified key;

  • GEOPOS: Returns the positions (longitude and latitude) of all given location objects stored in the key;

  • GEODIST: Returns the distance between two given locations;

  • GEOHASH: Returns the geohash representation of one or more location objects;

  • GEORADIUS: Takes the given longitude and latitude as the center and returns all location objects in the target set whose distance from the center does not exceed the given maximum distance;

  • GEORADIUSBYMEMBER: Takes a given location object as the center and returns all location objects whose distance from it does not exceed the given maximum distance.

Among them, GEOADD combined with GEORADIUS covers the basic "add" and "query" operations of "people nearby". To implement the "People Nearby" feature of WeChat, the GEORADIUSBYMEMBER command can be used directly: the "given location object" is the user, and the objects being searched are the other users. In essence, however, GEORADIUSBYMEMBER = GEOPOS + GEORADIUS. That is, first look up the user's position, then search for the other users that satisfy the distance condition.

The following sections analyze the GEOADD and GEORADIUS commands from the source code point of view and explain the algorithms behind them.

Redis geo only provides "add" and "query" operations; there is no dedicated "delete" command. The main reason is that Redis stores location objects in a sorted set, so they can be deleted with ZREM.

The file header comment of geo.c in the Redis source describes the file only as the implementation of GEOADD, GEORADIUS and GEORADIUSBYMEMBER (in fact the other three commands are implemented there as well). This indirectly suggests that the other three commands are auxiliary commands.

GEOADD

Usage

GEOADD key longitude latitude member [longitude latitude member ...]

Adds the given location objects (longitude, latitude, name) to the specified key.

The key is the name of the collection, and the member is the object that the longitude and latitude belong to. In practice, when too many objects need to be stored, the object set can effectively be sharded by using multiple keys (for example, one key per province), which keeps any single set from growing too large.

Return value after successful insertion:

(integer) N

N is the number of newly added elements. Members that already existed and only had their position updated are not counted.
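As a rough illustration of the multi-key idea, the Python sketch below routes each user to a per-province key. The key prefix `nearby`, the province codes and both function names are hypothetical choices made for this example; they do not come from Redis or from any client library.

```python
def geo_key(province_code: str, prefix: str = "nearby") -> str:
    """Build the sorted-set key for one province (hypothetical naming scheme)."""
    if not province_code:
        raise ValueError("province_code must not be empty")
    return f"{prefix}:{province_code.lower()}"


def geoadd_args(province_code: str, lon: float, lat: float, member: str) -> list:
    """Return the GEOADD argument list for a user, routed to the user's province key."""
    return ["GEOADD", geo_key(province_code), lon, lat, member]
```

One consequence of this design is that a search close to a province border may have to query the neighboring keys as well, so the shard boundaries are worth choosing with the expected search radius in mind.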

Source code analysis

/* GEOADD key long lat name [long2 lat2 name2 ... longN latN nameN] */
void geoaddCommand(client *c) {
    // Parameter checking
    /* Check arguments number for sanity. */
    if ((c->argc - 2) % 3 != 0) {
        /* Need an odd number of arguments if we got this far... */
        addReplyError(c, "syntax error. Try GEOADD key [x1] [y1] [name1] "
                         "[x2] [y2] [name2] ... ");
        return;
    }

    // Parameter extraction
    int elements = (c->argc - 2) / 3;
    int argc = 2+elements*2; /* ZADD key score ele ... */
    robj **argv = zcalloc(argc*sizeof(robj*));
    argv[0] = createRawStringObject("zadd",4);
    argv[1] = c->argv[1]; /* key */
    incrRefCount(argv[1]);

    // Parameter traversal + conversion
    /* Create the argument vector to call ZADD in order to add all
     * the score,value pairs to the requested zset, where score is actually
     * an encoded version of lat,long. */
    int i;
    for (i = 0; i < elements; i++) {
        double xy[2];

        // Extract longitude and latitude
        if (extractLongLatOrReply(c, (c->argv+2)+(i*3),xy) == C_ERR) {
            for (i = 0; i < argc; i++)
                if (argv[i]) decrRefCount(argv[i]);
            zfree(argv);
            return;
        }

        // Convert longitude and latitude to a 52-bit geohash used as the score, and extract the object name
        /* Turn the coordinates into the score of the element. */
        GeoHashBits hash;
        geohashEncodeWGS84(xy[0], xy[1], GEO_STEP_MAX, &hash);
        GeoHashFix52Bits bits = geohashAlign52Bits(hash);
        robj *score = createObject(OBJ_STRING, sdsfromlonglong(bits));
        robj *val = c->argv[2 + i * 3 + 2];

        // Set the score and member name for the sorted set
        argv[2+i*2] = score;
        argv[3+i*2] = val;
        incrRefCount(val);
    }

    // Call the ZADD command to store the converted objects
    /* Finally call ZADD that will do the work for us. */
    replaceClientCommandVector(c,argc,argv);
    zaddCommand(c);
}

The source code shows that Redis stores location objects in a sorted set. Each element of the sorted set is one location object, and the score of the element is the 52-bit geohash value of its longitude and latitude.
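To make the "52-bit geohash as score" idea concrete, here is a minimal Python sketch of the encoding. It is an illustration rather than the Redis implementation: the function names are invented, the latitude limit of about 85.05 degrees is an assumption taken from the Web Mercator bound, and clamping at the upper edge is a simplification added for the sketch.

```python
# Coordinate limits assumed for this sketch.
LON_MIN, LON_MAX = -180.0, 180.0
LAT_MIN, LAT_MAX = -85.05112878, 85.05112878
STEP_MAX = 26  # 26 bits per axis, 52 bits in total


def interleave(lat_bits: int, lon_bits: int, step: int) -> int:
    """Interleave two step-bit integers: latitude in even bit positions, longitude in odd ones."""
    result = 0
    for i in range(step):
        result |= ((lat_bits >> i) & 1) << (2 * i)
        result |= ((lon_bits >> i) & 1) << (2 * i + 1)
    return result


def geohash_encode(lon: float, lat: float, step: int = STEP_MAX) -> int:
    """Map a coordinate to the integer geohash of the cell that contains it."""
    if not (LON_MIN <= lon <= LON_MAX and LAT_MIN <= lat <= LAT_MAX):
        raise ValueError("coordinates out of range")
    cells = 1 << step
    # Scale each axis to a cell index in [0, cells); the clamp keeps the upper edge inside.
    lat_cell = min(int((lat - LAT_MIN) / (LAT_MAX - LAT_MIN) * cells), cells - 1)
    lon_cell = min(int((lon - LON_MIN) / (LON_MAX - LON_MIN) * cells), cells - 1)
    return interleave(lat_cell, lon_cell, step)


def align_52bits(bits: int, step: int) -> int:
    """Left-align a coarser geohash so that it can be compared with 52-bit scores."""
    return bits << (52 - 2 * step)
```

Because the bits of the two axes alternate, all points that share a coarse cell also share a score prefix, which is what later allows a cell to be searched as one contiguous score range.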

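The score is limited to 52 bits because a sorted-set score is stored as a double, whose mantissa has 52 bits, so a 52-bit integer is represented exactly.

Geohash strings are base-32 encoded, with 5 bits per character. 52 bits can therefore hold at most a 10-character geohash (50 bits), which corresponds to a grid cell of roughly 0.6*0.6 meters. In other words, a position converted by Redis geo will in theory be off by at most about 0.3*1.414=0.424 meters.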
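The figures above can be sanity-checked with a few lines of Python. This is a back-of-the-envelope sketch under stated assumptions: a spherical Earth with a radius of 6372797.560856 m (an assumed value), 26 bits for the longitude axis, and the east-west cell width measured at the equator. The function names are made up for the example.

```python
import math

EARTH_RADIUS_M = 6372797.560856  # assumed mean Earth radius in meters
BITS_PER_AXIS = 26               # 52 bits split evenly between longitude and latitude


def cell_width_at_equator_m(bits: int = BITS_PER_AXIS) -> float:
    """East-west size of one geohash cell at the equator, in meters."""
    circumference = 2 * math.pi * EARTH_RADIUS_M
    return circumference / (1 << bits)


def worst_case_error_m(cell_side_m: float) -> float:
    """Half the diagonal of a square cell: the farthest a point can be from the cell center."""
    return cell_side_m / 2 * math.sqrt(2)
```

Under these assumptions the cell width comes out just below 0.6 meters, and half the diagonal of a 0.6-meter square is about 0.424 meters, which agrees with the estimate in the text.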

Algorithm summary

In brief, the GEOADD command does the following:

1. Parameter extraction and verification;

2. Convert the latitude and longitude of the input data to the geohash value (score) of 52 bits.

3. Call the ZADD command to store member and its corresponding score in the collection key.
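The three steps can be sketched in Python as a pure function that rewrites a GEOADD argument vector into a ZADD one. This is an illustration, not the Redis code: `geoadd_to_zadd` is a hypothetical name, the encoder is passed in as a parameter so that the sketch stays independent of any particular geohash implementation, and the minimum-length check is an addition of the sketch.

```python
from typing import Callable, List, Sequence


def geoadd_to_zadd(argv: Sequence[str], encode: Callable[[float, float], int]) -> List[str]:
    """Rewrite ['GEOADD', key, lon, lat, member, ...] into ['ZADD', key, score, member, ...]."""
    # Step 1: argument check, mirroring (argc - 2) % 3 != 0 in geoaddCommand.
    if len(argv) < 5 or (len(argv) - 2) % 3 != 0:
        raise ValueError("syntax error. Try GEOADD key [x1] [y1] [name1] ...")
    zadd_argv = ["ZADD", argv[1]]
    for i in range(2, len(argv), 3):
        lon, lat, member = float(argv[i]), float(argv[i + 1]), argv[i + 2]
        # Step 2: turn the coordinates into an integer score (encoder supplied by the caller).
        score = encode(lon, lat)
        # Step 3: ZADD expects the score first, then the member.
        zadd_argv += [str(score), member]
    return zadd_argv
```

Any function that maps (longitude, latitude) to an integer can be plugged in as `encode`, which makes the argument rewriting easy to test on its own.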

GEORADIUS

Usage

GEORADIUS key longitude latitude radius m|km|ft|mi [WITHCOORD] [WITHDIST] [WITHHASH] [ASC|DESC] [COUNT count] [STORE key] [STOREDIST key]

Takes the given longitude and latitude as the center and returns all location objects in the target set whose distance from the center does not exceed the given maximum distance.

Radius units: m | km | ft | mi (meters | kilometers | feet | miles)

Additional parameters:

- WITHDIST: Along with each location object, also returns its distance from the center. The distance is expressed in the same unit as the radius given by the user.

- WITHCOORD: Also returns the longitude and latitude of each location object.

- WITHHASH: Also returns the sorted-set score of each location object, that is, its raw geohash encoding, as a 52-bit unsigned integer. This option is mainly useful for low-level applications or debugging and has little use in practice.

- ASC|DESC: Returns the location objects from nearest to farthest | from farthest to nearest.

- COUNT count: Returns only the first N matching location objects.

- STORE key: Saves the geographic location information of the result to the specified key.

- STOREDIST key: Saves the distance from the center of each result to the specified key.

Because of the STORE and STOREDIST options, the GEORADIUS and GEORADIUSBYMEMBER commands are technically flagged as write commands, so they can only be sent to the master instance. When QPS is high, this easily puts too much read and write pressure on the master. To solve this problem, two read-only variants, GEORADIUS_RO and GEORADIUSBYMEMBER_RO, were added in Redis 3.2.10 and Redis 4.0.0.

However, in actual development I found that the GeoRadiusParam parameter class in the Java package redis.clients.jedis.params.geo does not include the STORE and STOREDIST options. So when georadius is called through Jedis, does it really query the master instance, or is it wrapped as a read-only call? Interested readers can look into this themselves.

Return value after successful query:

Without any WITH option, a plain list of members is returned, such as:

["member1","member2","member3"]

With WITH options, each entry in the member list is itself a nested list, such as:

[
	["member1", distance1, [longitude1, latitude1]]
	["member2", distance2, [longitude2, latitude2]]
]
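As an illustration of how a client might flatten these two reply shapes, here is a small Python sketch. The function name and the dictionary keys are hypothetical. It assumes that the extra fields of each entry arrive in the order distance, hash, coordinates, and that the caller knows which WITH options were sent.

```python
from typing import Any, Dict, List


def parse_georadius_reply(reply: List[Any], withdist: bool = False,
                          withhash: bool = False,
                          withcoord: bool = False) -> List[Dict[str, Any]]:
    """Convert a GEORADIUS-style reply into a list of dictionaries."""
    records = []
    for item in reply:
        if not (withdist or withhash or withcoord):
            # Plain form: every entry is just the member name.
            records.append({"member": item})
            continue
        fields = list(item)
        record: Dict[str, Any] = {"member": fields.pop(0)}
        if withdist:
            record["dist"] = float(fields.pop(0))
        if withhash:
            record["hash"] = int(fields.pop(0))
        if withcoord:
            lon, lat = fields.pop(0)
            record["lon"], record["lat"] = float(lon), float(lat)
        records.append(record)
    return records
```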

Source code analysis

The source code in this section is long. If it is hard to follow, you can read just the // comments or skip straight to the summary.

/* GEORADIUS key x y radius unit [WITHDIST] [WITHHASH] [WITHCOORD] [ASC|DESC]
 *                               [COUNT count] [STORE key] [STOREDIST key]
 * GEORADIUSBYMEMBER key member radius unit ... options ... */
void georadiusGeneric(client *c, int flags) {
    robj *key = c->argv[1];
    robj *storekey = NULL;
    int storedist = 0; /* 0 for STORE, 1 for STOREDIST. */

    // Look up the sorted set by key
    robj *zobj = NULL;
    if ((zobj = lookupKeyReadOrReply(c, key, shared.null[c->resp])) == NULL ||
        checkType(c, zobj, OBJ_ZSET)) {
        return;
    }

    // Determine the longitude and latitude of the center point from user input (longitude and latitude, or member)
    int base_args;
    double xy[2] = { 0 };
    if (flags & RADIUS_COORDS) {
        ......
    }

    // Get the query radius
    double radius_meters = 0, conversion = 1;
    if ((radius_meters = extractDistanceOrReply(c, c->argv + base_args - 2,
                                                &conversion)) < 0) {
        return;
    }

    // Get the optional parameters (withdist, withhash, withcoords, sort, count)
    int withdist = 0, withhash = 0, withcoords = 0;
    int sort = SORT_NONE;
    long long count = 0;
    if (c->argc > base_args) {
        ... ...
    }

    // Check the STORE and STOREDIST parameters
    if (storekey && (withdist || withhash || withcoords)) {
        addReplyError(c, "STORE option in GEORADIUS is not compatible with "
                         "WITHDIST, WITHHASH and WITHCOORDS options");
        return;
    }

    // Set the sort order
    if (count != 0 && sort == SORT_NONE) sort = SORT_ASC;

    // Use the center point and radius to compute the target area
    GeoHashRadius georadius =
        geohashGetAreasByRadiusWGS84(xy[0], xy[1], radius_meters);

    // Search the center cell and the eight geohash cells around it for the elements in range
    geoArray *ga = geoArrayCreate();
    membersOfAllNeighbors(zobj, georadius, xy[0], xy[1], radius_meters, ga);

    // No match: return an empty reply
    /* If no matching results, the user gets an empty reply. */
    if (ga->used == 0 && storekey == NULL) {
        addReplyNull(c);
        geoArrayFree(ga);
        return;
    }

    // Build and send the reply
    ......
    geoArrayFree(ga);
}

There are two core steps in the code above: "computing the search area around the center point" and "searching the center cell and the eight geohash cells around it". The corresponding functions are geohashGetAreasByRadiusWGS84 and membersOfAllNeighbors. Let's look at them in turn:

  • Computing the search area around the center point:

// geohash_helper.c

GeoHashRadius geohashGetAreasByRadiusWGS84(double longitude, double latitude,
                                           double radius_meters) {
    return geohashGetAreasByRadius(longitude, latitude, radius_meters);
}

// Returns the 9 geohash boxes that can cover the target area
GeoHashRadius geohashGetAreasByRadius(double longitude, double latitude,
                                      double radius_meters) {
    // Some parameter setup
    GeoHashRange long_range, lat_range;
    GeoHashRadius radius;
    GeoHashBits hash;
    GeoHashNeighbors neighbors;
    GeoHashArea area;
    double min_lon, max_lon, min_lat, max_lat;
    double bounds[4];
    int steps;

    // Compute the longitude and latitude range of the bounding rectangle of the target area
    // (the target area is the circle centered on the target point with the given radius)
    geohashBoundingBox(longitude, latitude, radius_meters, bounds);
    min_lon = bounds[0];
    min_lat = bounds[1];
    max_lon = bounds[2];
    max_lat = bounds[3];

    // From the latitude of the center point and the radius, compute the geohash precision of the nine search boxes.
    // Latitude is used to adjust the precision mainly for the polar regions (the higher the latitude, the fewer the bits)
    steps = geohashEstimateStepsByRadius(radius_meters,latitude);

    // Set the longitude and latitude limits: -180<=longitude<=180, -85<=latitude<=85
    geohashGetCoordRange(&long_range,&lat_range);
    // Encode the query longitude and latitude into a geohash value at the given precision (steps)
    geohashEncode(&long_range,&lat_range,longitude,latitude,steps,&hash);
    // Expand the geohash value in eight directions to determine the eight surrounding boxes (neighbors)
    geohashNeighbors(&hash,&neighbors);
    // Determine the longitude and latitude range of the area from the hash value
    geohashDecode(long_range,lat_range,hash,&area);

    // Some special cases
    ......

    // Build and return the result
    radius.hash = hash;
    radius.neighbors = neighbors;
    radius.area = area;
    return radius;
}
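The precision estimate is the least obvious step in this function, so here is a Python sketch of one way it can work. It is modeled on the idea described in the comments above (coarser cells for larger radii, and coarser again near the poles), but the constant `MERCATOR_MAX`, the latitude thresholds of 66 and 80 degrees and the function name are assumptions of this sketch and should be checked against geohash_helper.c before relying on them.

```python
MERCATOR_MAX = 20037726.37  # about half the Earth's circumference in meters (assumed constant)


def estimate_steps_by_radius(radius_m: float, lat: float) -> int:
    """Pick the finest geohash precision whose 3x3 block of cells still covers the search circle."""
    if radius_m <= 0:
        return 26  # degenerate radius: use the maximum precision
    step = 1
    # Every doubling of the radius that still fits corresponds to one finer level.
    while radius_m < MERCATOR_MAX:
        radius_m *= 2
        step += 1
    step -= 2  # back off so that one cell is at least as wide as the radius
    # Cells get narrower in meters towards the poles, so use coarser cells there.
    if lat > 66 or lat < -66:
        step -= 1
        if lat > 80 or lat < -80:
            step -= 1
    return max(1, min(step, 26))
```

For a 1 km radius at the equator this yields step 14, where a cell is roughly 2.4 km wide, so the circle cannot reach beyond the cells directly adjacent to the center cell.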
  • Search the center point and eight geohash grid areas around it:

// geo.c

// Collect the matching elements from the nine hash boxes
int membersOfAllNeighbors(robj *zobj, GeoHashRadius n, double lon, double lat,
                          double radius, geoArray *ga) {
    GeoHashBits neighbors[9];
    unsigned int i, count = 0, last_processed = 0;
    int debugmsg = 0;

    // Get the 9 search hash boxes
    neighbors[0] = n.hash;
    ......
    neighbors[8] = n.neighbors.south_west;

    // Search for target points in each hash box
    for (i = 0; i < sizeof(neighbors) / sizeof(*neighbors); i++) {
        if (HASHISZERO(neighbors[i])) {
            if (debugmsg) D("neighbors[%d] is zero",i);
            continue;
        }

        // Skip possibly duplicated hash boxes (may occur when the search radius > 5000 km)
        if (last_processed &&
            neighbors[i].bits == neighbors[last_processed].bits &&
            neighbors[i].step == neighbors[last_processed].step)
        {
            continue;
        }

        // Search for eligible objects in the hash box
        count += membersOfGeoHashBox(zobj, neighbors[i], ga, lon, lat, radius);
        last_processed = i;
    }
    return count;
}

int membersOfGeoHashBox(robj *zobj, GeoHashBits hash, geoArray *ga, double lon,
                        double lat, double radius) {
    // Get the minimum and maximum geohash values (52 bits) of the hash box
    GeoHashFix52Bits min, max;
    scoresOfGeoHashBox(hash,&min,&max);

    // Select the matching points in zobj based on the minimum and maximum geohash values
    return geoGetPointsInRange(zobj, min, max, lon, lat, radius, ga);
}

int geoGetPointsInRange(robj *zobj, double min, double max, double lon,
                        double lat, double radius, geoArray *ga) {
    // Boundaries of the search range (that is, one of the nine hash boxes)
    zrangespec range = { .min = min, .max = max, .minex = 0, .maxex = 1 };
    size_t origincount = ga->used;
    sds member;

    // The set zobj may use one of two encodings, ZIPLIST or SKIPLIST; the logic is the same for both
    if (zobj->encoding == OBJ_ENCODING_ZIPLIST) {
        ......
    } else if (zobj->encoding == OBJ_ENCODING_SKIPLIST) {
        zset *zs = zobj->ptr;
        zskiplist *zsl = zs->zsl;
        zskiplistNode *ln;

        // Get the first element in the hash box range (skip list, with efficiency
        // comparable to a binary search tree); return 0 if there is none
        if ((ln = zslFirstInRange(zsl, &range)) == NULL) {
            /* Nothing exists starting at our min. No results. */
            return 0;
        }

        // Traverse the set starting from that first element
        while (ln) {
            sds ele = ln->ele;

            // Break if the element is beyond the range
            /* Abort when the node is no longer in range. */
            if (!zslValueLteMax(ln->score, &range)) break;

            // Element check (compute the distance between the element and the center point)
            ele = sdsdup(ele);
            if (geoAppendIfWithinRadius(ga,lon,lat,radius,ln->score,ele)
                == C_ERR) sdsfree(ele);
            ln = ln->level[0].forward;
        }
    }
    return ga->used - origincount;
}

int geoAppendIfWithinRadius(geoArray *ga, double lon, double lat, double radius,
                            double score, sds member) {
    double distance, xy[2];

    // Decoding failed: return an error
    if (!decodeGeohash(score,xy)) return C_ERR; /* Can't decode. */

    // Final distance check (compute the spherical distance and see whether it is within the radius)
    if (!geohashGetDistanceIfInRadiusWGS84(lon,lat, xy[0], xy[1],
                                           radius, &distance))
    {
        return C_ERR;
    }

    // Build and append the element that satisfies the condition
    geoPoint *gp = geoArrayAppend(ga);
    gp->longitude = xy[0];
    gp->latitude = xy[1];
    gp->dist = distance;
    gp->member = member;
    gp->score = score;
    return C_OK;
}
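The step that turns one hash box into a score range (scoresOfGeoHashBox in the code above) is not shown in the excerpt. The Python sketch below shows one plausible way to derive that range, assuming the box is identified by its integer geohash `bits` at precision `step` and that scores are 52-bit left-aligned geohashes. The function names are hypothetical.

```python
from typing import Tuple


def scores_of_box(bits: int, step: int) -> Tuple[int, int]:
    """Return the half-open score range [min, max) covered by a geohash box."""
    shift = 52 - 2 * step
    # Every 52-bit score inside the box starts with the same 2*step-bit prefix.
    return bits << shift, (bits + 1) << shift


def in_box(score: int, bits: int, step: int) -> bool:
    """True if a 52-bit score lies inside the given box."""
    lo, hi = scores_of_box(bits, step)
    return lo <= score < hi
```

The half-open interval matches the `.minex = 0, .maxex = 1` range specification in geoGetPointsInRange: the lower bound is inclusive and the upper bound, which is the first score of the next box, is excluded.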

Algorithm summary

Leaving the many optional parameters aside, here is a brief summary of how the GEORADIUS command uses geohash to find the target location objects:

1. Parameter extraction and verification;

2. Compute the area to be searched from the center point and the input radius. This area consists of the highest geohash grid level (precision) that meets the criteria, plus the corresponding nine-cell (3x3) grid that can cover the target area; (details are given later)

3. Traverse the nine cells and select the location objects that fall inside the score range of each geohash cell. Among these, the objects whose distance from the center point is less than the input radius are found and returned.

A description in words is not easy to follow, so the algorithm is illustrated with the following two figures:
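The final filtering in step 3 is a great-circle distance test. The Python sketch below uses the haversine formula on a spherical Earth; the radius constant is an assumption of the sketch, and the function names and the `(member, lon, lat)` tuple layout are invented for the example.

```python
import math

EARTH_RADIUS_M = 6372797.560856  # assumed mean Earth radius in meters


def haversine_m(lon1: float, lat1: float, lon2: float, lat2: float) -> float:
    """Great-circle distance between two points in meters (haversine formula)."""
    lat1r, lat2r = math.radians(lat1), math.radians(lat2)
    u = math.sin((lat2r - lat1r) / 2)
    v = math.sin(math.radians(lon2 - lon1) / 2)
    a = u * u + math.cos(lat1r) * math.cos(lat2r) * v * v
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def within_radius(center, points, radius_m):
    """Keep the (member, lon, lat) candidates inside the radius, nearest first."""
    lon0, lat0 = center
    hits = [(m, haversine_m(lon0, lat0, lon, lat)) for m, lon, lat in points]
    return sorted((h for h in hits if h[1] <= radius_m), key=lambda h: h[1])
```

In this picture, `points` plays the role of the candidates collected from the nine cells, and `within_radius` plays the role of the last check that decides which of them are returned.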



In the left figure, the center is the search center, the green circle is the target area, all the points are location objects to be searched, and the red points are the location objects that satisfy the condition.

In the actual search, the geohash grid level (that is, the cell size in the right figure) is first computed from the search radius, and the position of the nine-cell grid (the red 3x3 grid) is determined. Then the points inside the nine cells (blue and red points) are looked up one cell at a time and their distance to the center point is computed. Finally the points within the distance range (red points) are kept.

Algorithm analysis
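To show what "determining the nine-cell grid" amounts to, here is a Python sketch that works on cell indices rather than on interleaved geohash bits. The representation as `(lon_cell, lat_cell)` pairs and the handling of the edges (longitude wraps around, latitude does not) are simplifications chosen for this example, not a description of geohashNeighbors.

```python
from typing import List, Tuple


def nine_cells(lon_cell: int, lat_cell: int, step: int) -> List[Tuple[int, int]]:
    """Return the center cell followed by its distinct neighbors at the given precision."""
    size = 1 << step  # number of cells along each axis
    cells: List[Tuple[int, int]] = []
    for dy in (0, 1, -1):
        y = lat_cell + dy
        if not 0 <= y < size:
            continue  # nothing beyond the latitude limits (simplification)
        for dx in (0, 1, -1):
            x = (lon_cell + dx) % size  # longitude wraps around the antimeridian
            if (x, y) not in cells:     # very coarse grids can produce duplicates
                cells.append((x, y))
    return cells
```

At coarse precisions the same cell can appear more than once among the neighbors, which is why the sketch removes duplicates, in the same spirit as the duplicate check in membersOfAllNeighbors.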

Why query with this strategy, and what are its advantages? Let's analyze it in question-and-answer form.

  • Why find the highest geohash grid level that meets the criteria? Why use a nine-cell grid?

  • These are really the same question. In essence, both are a preliminary screening of all the element objects. In a multi-level geohash grid, each cell at a lower level is made up of four cells at the next higher level (as shown in the figure).


  • In other words, the higher the geohash grid level, the smaller the area each cell covers. Once we have computed, from the input radius and the position of the center point, the highest-level nine-cell grid that can cover the target area, all the elements outside those nine cells have already been screened out. The main reason for using nine cells instead of a single cell is to handle boundary cases while keeping the query area as small as possible. Imagine a search centered at longitude 0 and latitude 0: even for a 1-meter radius, a single cell that covers the whole circle would have to cover the entire Earth. Extending the center cell by one cell in each of the eight directions avoids this problem.

  • How are element objects selected by the range of a geohash cell? How efficient is it?

  • First, the geohash values inside one geohash cell are contiguous and lie in a fixed range, so we only need to find the location objects in the sorted set whose scores fall in that range. The following is the skip list data structure behind a sorted set:


  • Its query efficiency is similar to that of a binary search tree, with an average time complexity of O(log N). All the elements at the bottom level are also linked in order as a linked list. So when querying, we only need to find the first element of the set that falls inside the target geohash cell and then compare sequentially from there; there is no need to search repeatedly. The nine cells cannot be queried in one pass because the geohash values of the nine cells are not contiguous with each other. The query is only efficient when the range is contiguous; otherwise many unnecessary distance computations would be performed.

To sum up, we have analyzed the detailed process of GEOADD and GEORADIUS in the Redis Geo module from the source code point of view. The time complexity of GEORADIUS in Redis is O(N + log(M)), where N is the number of location elements inside the nine cells, whose distance must be computed, and M is the total number of elements in the sorted set. Combined with the in-memory storage of Redis, this makes the feature very efficient in practice.
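The "seek once, then walk forward" pattern can be sketched in Python with a sorted list and the standard `bisect` module standing in for the skip list. This is an analogy rather than the Redis data structure: binary search on a list gives the same O(log M) seek, and the parallel `members` list and the function name are assumptions of the sketch.

```python
import bisect
from typing import List, Sequence


def members_in_score_range(sorted_scores: Sequence[int], members: Sequence[str],
                           lo: int, hi: int) -> List[str]:
    """Return the members whose score s satisfies lo <= s < hi.

    sorted_scores must be in ascending order; members[i] belongs to sorted_scores[i].
    """
    # One logarithmic seek to the first score that is not below the lower bound ...
    start = bisect.bisect_left(sorted_scores, lo)
    found = []
    # ... followed by a sequential walk that stops at the first score outside the range.
    for i in range(start, len(sorted_scores)):
        if sorted_scores[i] >= hi:
            break
        found.append(members[i])
    return found
```

Each of the nine cells needs one such call with its own score range, because the ranges of different cells are generally not adjacent in score order.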


Posted by reidme on Mon, 14 Oct 2019 03:14:34 -0700