How to design automatic routing in the framework of sub database and sub table

Keywords: dotNETCore sharding efcore

ShardingCore

ShardingCore is easy to use, simple, high-performance and universal. It is an extended solution for tables and databases under the efcore ecosystem. It supports all versions of efcore2 +, all databases of efcore2 +, custom routing, dynamic routing, high-performance paging and read-write separation, If you like this component or it is helpful to you, please click distribute star so that more. Neters can see and use it

Github Source Code Help dotnet ecology Gitee Source Code

At present, ShardingCore has supported. net 6.0, because the whole project architecture of ShardingCore only depends on efcore and efcore.relational. Basically, it can be said that it is "zero dependency". Other parsing is self implemented and does not depend on the third-party framework.

As we all know, there is no good automatic table and library component under the. net framework (except ShardingCore). Basically, after looking at the table and library frameworks, they all say that they support automatic table and library, and the codes are semi-automatic or even manual. Today, I'll talk about how to design a true and automatic table / library framework. (don't say anything about encapsulation, then you do.)

So how can we design the sub table and sub database corresponding to the non aware route for the user's query, instead of specifying which tables to query on the full screen. The specified route can have, but should not be used as a query. The real automatic sub table and sub database route should be filtered through the where condition, and then route the data to the corresponding table, Next, I will explain how to design a field routing framework to realize table splitting automation in a simple and understandable way.

present situation

At present, it seems that few people have really done this in the. NET environment. The better thing is that they hit the foot of the real automatic meter and didn't even hit their knee. So I'm going to open a blog to talk about it, by the way ShardingCore The principle of

Definition of sub table

First of all, because of different businesses, the sub tables designed by most people may have different writing differences, but because the basic parts are roughly the same. For example, if the mold is taken, it must be 00,01.... 99 or the time must be 20202021.... 2030, etc.

Simple mold split table

Now let's assume that we take the module according to the cardinality, for example, by pressing 5, we can take out and set the order table as order_00,order_01,order_02,order_03,order_04 we divide the order form into four forms.

Sub table name Split table field Tabulation method All table suffixes
order Id Modulo 4 and left Complement 2 bits '00','01','02','03','04'

We now define our query condition select * from order where Id='12345 '. Through the condition, we can analyze the useful information

select * from order where Id='12345'

Route parse engine = parse = > the following results are obtained

Key Value
Table name order
field Id
Condition judge =
condition '12345'
Conditional connector nothing

Therefore, we can compare the field id and the string "12345" with the equal sign, so we can hash "12345" first, for example, "12345". HashCode() is equal to 9, then 9% 5 = 4. We fill "0" to the left of 4 to get the result "04", so we can draw a conclusion:

select * from order where Id='12345' ==select * from order_04 where Id='12345'

So far, we have known the general process of a simple modular table routing, and draw the following conclusions

  1. Is the order table split
    2. Is the Id after where a split table field
    3. Whether to convert a conditional filter into a table suffix for a split table field

A more complex modular table

As we all know, the advantage of modular split table is that it can maximize the uniformity of data and is relatively simple to implement, but there are also many problems. For example, when migrating data to expand the table in the later stage, in order to minimize the migration data, the table must be multiplied, but even if the minimum migration volume is multiplied, it is 50%.
Of course, this is only one of the advantages and disadvantages of the modular sub table, which is not the focus of this time. Next, let's rewrite the sql SELECT * from order where id ='12345 'or id ='54321' what information can we get through this transformation

Key Value
Table name order
field Id
Condition judge =
condition '12345' and '54321'
Conditional connector or

In this case, how can we perform table splitting routing? First, we can compare the equal symbols between the field ID and the string "12345", so we can hash the value of "12345", such as "12345". HashCode() is equal to 9, then 9% 5 = 4. We fill "0" to the left of 4 to get the result "04", and then we can use the field ID and string "54321" For the comparison of equal symbols, we can hash "54321" first, such as "54321". HashCode() is equal to 8, then 8% 5 = 3. We fill "0" to the left of 3 to get the result "03". Because the conditional connection symbol is or, we want ['03 ',' 04 '], so select * from order where Id='12345' or Id='54321 'will be rewritten as select * from order_ 03 where Id='12345' or Id='54321' + select * from order_ 04 where id ='12345 'or id ='54321' aggregate results of two sql statements,
If it's and, it's order_03 going again order_04 so the result is empty, then we can draw the following conclusion

  1. Is the order table split
    2. Is the Id after where a split table field
    3. Whether to convert a conditional filter into a table suffix for a split table field
    3. How to filter multiple table suffixes

Upgrade the sub table by time

Suppose our current order is monthly_ 202105,order_ 202106,order_ 202107,order_ 202108,order_ 202109 assume that we have five tables at present, and the orders are divided into time tables through the field time,

If we need to parse select * from order where time > '2021 / 06 / 05 00:00:00', we first parse and extract keywords through the program

Key Value
Table name order
field time
Condition judge >
condition '2021/06/05 00:00:00'
Conditional connector nothing

Through keyword extraction and parsing, we can know that the query should be order_202106,order_202107,order_202108,order_2021093 sheets

Let's upgrade again

If we need to parse select * from order where time > '2021 / 06 / 05 00:00:00' and time < '2021 / 08 / 05 00:00:00', we should first parse and extract keywords through the program

Key Value
Table name order
field time
Condition judge >,<
condition '2021/06/05 00:00:00','2021/08/05 00:00:00'
Conditional connector and

We are transforming the existing sql

select * from order where Id='12345 'is rewritten as select * from order where' 12345 '= ID
In this case, how can we judge the existing expression? There must be a conversion: condition on right
Then the = we encounter is actually no different from the actual, but >, < if it is the opposite, it will affect the result, so we need to reverse the corresponding expression, so

condtion on right ? yes no
= = =
!= != !=
>= >= <=
> > <
<= <= >=
< < >

If the condition is on the right, we do not need to convert the condition judge. If it is not on the right, we need to convert it into the corresponding condition judge to simplify the logical judgment of our writing

Through keyword extraction and parsing, we can know that the query should be order_202106,order_202107,order_2021082 sheets

After the above description, we can roughly design an idea and how to design a sub table routing

1. Judge whether the table is divided into tables
2. Judge whether the meaning is divided into table fields
3. Can the split table field narrow the table range
4. All operations are performed by filtering existing table suffixes

After having the above ideas, as dotnet developers, we can consider how to transform orm. Of course, you can also choose to transform ado.net (which is more difficult)

Expression based split table

First, blow a wave c#, with good expression tree design and elegant linq syntax. Through the analysis of expressions, we can divide the design into the following steps

Simply get the expression, and you can convert the expression

                var op = binaryExpression.NodeType switch
                {
                    ExpressionType.GreaterThan => conditionOnRight ? ShardingOperatorEnum.GreaterThan : ShardingOperatorEnum.LessThan,
                    ExpressionType.GreaterThanOrEqual => conditionOnRight ? ShardingOperatorEnum.GreaterThanOrEqual : ShardingOperatorEnum.LessThanOrEqual,
                    ExpressionType.LessThan => conditionOnRight ? ShardingOperatorEnum.LessThan : ShardingOperatorEnum.GreaterThan,
                    ExpressionType.LessThanOrEqual => conditionOnRight ? ShardingOperatorEnum.LessThanOrEqual : ShardingOperatorEnum.GreaterThanOrEqual,
                    ExpressionType.Equal => ShardingOperatorEnum.Equal,
                    ExpressionType.NotEqual => ShardingOperatorEnum.NotEqual,
                    _ => ShardingOperatorEnum.UnKnown
                };

1. Filter table suffix

var list=new List<string>(){"00","01"....};
var filterTails=list.Where(o=>Filter(o)).ToList();

In fact, for routing, what we need to do is to filter out effective suffixes and reduce unnecessary performance consumption

2.Filter can be roughly divided into two types. One is and and the other is or, which means that the internal part of filter should be the filter combination of suffix tail, such as "00"or"01", "00" and "01". How can "00" be reflected? Then it must be through the comparison value, such as' 12345'.HashCode().Convert2Tail()
The compared condition value is converted into the suffix corresponding to the database, and then compared with the existing suffix. If it is the same, it means that it is selected. The expression is existtail = > existtail = = tail. If the suffix passed in to the existing list is compared with the calculated suffix, it means that the suffix of the list needs to be used, so that our single = symbol has been processed, How to handle the syntax for or? We use or to connect the previous expression, which can be rewritten as existstair = > (existstair | existstair 1), so filter = existstair = > (existstair | existstair 1),
In the simple mold table

title content
sql select * from order where Id='12345' or Id='54321'
expression db.where(o=>o.Id"12345" || o.Id"54321")
Suffix Filtering Filter=existsTail=>(existsTailtail || existsTailtail1)
result ["00"... "04"] are substituted into the Filter respectively. Tail is "04" and tail1 is "03", so we can get two table suffixes ["04", "03"]
title content
sql select * from order where time>'2021/06/05 00:00:00' and time <'2021/08/05 00:00:00'
expression db.where(o=>o.time>'2021/06/05 00:00:00' && o.time<'2021/08/05 00:00:00')
Suffix Filtering Filter=existsTail=>(existsTail>=tail && existsTail<=tail1)
result ["202105"... "202109"] are substituted into the Filter respectively. Tail is "202106" and tail1 is "202108", so we can get three table suffixes ["202106", "202107" and "202108"]

Therefore, we can basically complete the whole automatic routing design here. The condition is directly and, and is used to combine many conditions. If it is or or in, or is used to connect.
Here, the basic idea of table routing has been established. Now that the idea has been established, we can officially get to the point.

Custom ShardingCore routing

First, let's take a look at the default modulo routing provided by sharding core

/// <summary>
    ///Modular split table with string field
    /// </summary>
    /// <typeparam name="T"></typeparam>
    public abstract class AbstractSimpleShardingModKeyStringVirtualTableRoute<T>: AbstractShardingOperatorVirtualTableRoute<T,string> where T:class
    {
        protected readonly int Mod;
        protected readonly int TailLength;
        protected readonly char PaddingChar;
        /// <summary>
        /// 
        /// </summary>
        ///< param name = "tailength" > monkey length < / param >
        ///< param name = "mod" > modulo divisor < / param >
        ///< param name = "paddingchar" > what parameter should be added if tailLength is not enough after mold taking < / param >
        protected AbstractSimpleShardingModKeyStringVirtualTableRoute(int tailLength,int mod,char paddingChar='0')
        {
            if(tailLength<1)
                throw new ArgumentException($"{nameof(tailLength)} less than 1 ");
            if (mod < 1)
                throw new ArgumentException($"{nameof(mod)} less than 1 ");
            if (string.IsNullOrWhiteSpace(paddingChar.ToString()))
                throw new ArgumentException($"{nameof(paddingChar)} cant empty ");
            TailLength = tailLength;
            Mod = mod;
            PaddingChar = paddingChar;
        }
        /// <summary>
        ///How to convert shardingkey to corresponding tail
        /// </summary>
        /// <param name="shardingKey"></param>
        /// <returns></returns>
        public override string ShardingKeyToTail(object shardingKey)
        {
            var shardingKeyStr = ConvertToShardingKey(shardingKey);
            return Math.Abs(ShardingCoreHelper.GetStringHashCode(shardingKeyStr) % Mod).ToString().PadLeft(TailLength,PaddingChar);
        }
        /// <summary>
        ///Convert shardingKey to corresponding string
        /// </summary>
        /// <param name="shardingKey"></param>
        /// <returns></returns>
        protected override string ConvertToShardingKey(object shardingKey)
        {
            return shardingKey.ToString();
        }
        /// <summary>
        ///Get all suffixes of the corresponding type in the database
        /// </summary>
        /// <returns></returns>
        public override List<string> GetAllTails()
        {
            return Enumerable.Range(0, Mod).Select(o => o.ToString().PadLeft(TailLength, PaddingChar)).ToList();
        }
        /// <summary>
        ///How do routing expressions route to the correct table
        /// </summary>
        /// <param name="shardingKey"></param>
        /// <param name="shardingOperator"></param>
        /// <returns></returns>
        protected override Expression<Func<string, bool>> GetRouteToFilter(string shardingKey, ShardingOperatorEnum shardingOperator)
        {
            var t = ShardingKeyToTail(shardingKey);
            switch (shardingOperator)
            {
                case ShardingOperatorEnum.Equal: return tail => tail == t;
                default:
                {
#if DEBUG
                    Console.WriteLine($"shardingOperator is not equal scan all table tail");           
#endif
                    return tail => true;
                }
            }
        }
    }

At first glance, there are only four methods, three of which are easy to understand, that is, how to convert the sub table value into a suffix: ShardingKeyToTail, how to convert the sub flag into a string: ConvertToShardingKey, and return all existing suffixes: it is necessary to judge and create a table when GetAllTails is started.
The most complex method of GetRouteToFilter returns a comparison Expression between the suffix and the current sub table value. Many people may wonder why Expression is used, because Expression has and and and or can have multiple combinations to meet our suffix filtering. For modeling, we only need to resolve the case of equal = and return true in other cases It means that all other suffixes need to be related to the query, because you can't judge whether they are in it. Of course, you can throw errors. It means that the route of the current table must be specified, and there can't be cases where you can't judge.

Custom sub table

I talked about module fetching (hash) under user-defined sub table earlier The advantages of this mode are simple and uniform data distribution, but the disadvantages are also obvious. For the data migration required after adding servers, all data needs to be migrated in the most popular case, and half of the data needs to be migrated in the best case. In this case, is there a simple and uniform data distribution similar to hash modeling that will not be used in the process of data migration On the premise of moving too much data, the answer is yes. This route is a simple implementation version of consistent hash.

Consistent Hashing

There are many tutorials and explanations on the consistency hash network, which is an algorithm to prevent the cache avalanche effect caused by the overall failure of the existing cache caused by the increase of the server. Although there are many parsing and examples on the network, the implementation process may not be very simple, and many concepts can not be understood by some beginners, so In fact, there is a simple implementation here, which is basically an algorithm that everyone can understand.

This algorithm is large-scale modular storage. It is to segment the original hash modular again to ensure that the number of servers will not be increased. It needs to migrate data in a large range and directly upload it to the code

            var stringHashCode = ShardingCoreHelper.GetStringHashCode("123");
            var hashCode = stringHashCode % 10000;
            if (hashCode >= 0 && hashCode <= 3000)
            {
                return "A";
            }
            else if (hashCode >= 3001 && hashCode <= 6000)
            {
                return "B";
            }
            else if (hashCode >= 6001 && hashCode < 10000)
            {
                return "C";
            }
            else
                throw new InvalidOperationException($"cant calc hash route hash code:[{stringHashCode}]");

This should be the simplest route that everyone can understand. Take the module 10000 of hashcode and get 0-9999. The probability of dividing it into three segments [0-3000], [3001-6000] and [6001-9999] is about 3, 3 and 4, which are relatively average. Then we still encounter the problem mentioned above. If we need to add a server now, first modify the route

            var stringHashCode = ShardingCoreHelper.GetStringHashCode("123");
            var hashCode = stringHashCode % 10000;
            if (hashCode >= 0 && hashCode <= 3000)
            {
                return "A";
            }
            else if (hashCode >= 3001 && hashCode <= 6000)
            {
                return "B";
            }
            else if (hashCode >= 6001 && hashCode <= 8000)
            {
                return "D";
            }
            else if (hashCode >= 8001 && hashCode < 10000)
            {
                return "C";
            }
            else
                throw new InvalidOperationException($"cant calc hash route hash code:[{stringHashCode}]");

We have added A server to perform data segmentation for [6001-9999] segments and [8001-9999] The table suffix in the interval has not changed. In fact, we only need to modify one-fifth of the data, so we can perfectly migrate the data and evenly distribute the data. If we need to add another one, we only need to score two points for 'A' or 'B', then we can gradually increase the server, and the number of data migrations responds to the data to be migrated with the increase of the server The percentage decreases gradually. In the worst case, the server needs to migrate 50% of the data. Compared with the previous best case, it is very cost-effective to migrate 50% of the data. Moreover, the routing rules are simple and easy to write, which can be written by individuals.

So how do we write this routing rule in sharding core

    public class OrderHashRangeVirtualTableRoute:AbstractShardingOperatorVirtualTableRoute<Order,string>
    {
        //How to convert the value of a sharding key into a corresponding value
        protected override string ConvertToShardingKey(object shardingKey)
        {
            return shardingKey.ToString();
        }

        //How to convert the value of the sharding key into the corresponding table suffix
        public override string ShardingKeyToTail(object shardingKey)
        {
            var stringHashCode = ShardingCoreHelper.GetStringHashCode("123");
            var hashCode = stringHashCode % 10000;
            if (hashCode >= 0 && hashCode <= 3000)
            {
                return "A";
            }
            else if (hashCode >= 3001 && hashCode <= 6000)
            {
                return "B";
            }
            else if (hashCode >= 6001 && hashCode <= 10000)
            {
                return "C";
            }
            else
                throw new InvalidOperationException($"cant calc hash route hash code:[{stringHashCode}]");
        }

        //Returns all existing Order table suffixes
        public override List<string> GetAllTails()
        {
            return new List<string>()
            {
                "A", "B", "C"
            };
        }

        //How to filter suffixes (condition on right has been implemented) users do not need to care about the condition location and how to parse the condition logic judgment, nor do they need to consider and or
        protected override Expression<Func<string, bool>> GetRouteToFilter(string shardingKey, ShardingOperatorEnum shardingOperator)
        {
            //Because hash routing only supports equal, it only needs to write equal
            var t = ShardingKeyToTail(shardingKey);
            switch (shardingOperator)
            {
                case ShardingOperatorEnum.Equal: return tail => tail == t;
                default:
                {
                    return tail => true;
                }
            }
        }
    }

Default route

ShardingCore It provides table routing for some columns and has corresponding index support

abstract Routing rules tail Indexes
AbstractSimpleShardingModKeyIntVirtualTableRoute Take mold 0,1,2... =,contains
AbstractSimpleShardingModKeyStringVirtualTableRoute Take mold 0,1,2... =,contains
AbstractSimpleShardingDayKeyDateTimeVirtualTableRoute By time yyyyMMdd >,>=,<,<=,=,contains
AbstractSimpleShardingDayKeyLongVirtualTableRoute By timestamp yyyyMMdd >,>=,<,<=,=,contains
AbstractSimpleShardingWeekKeyDateTimeVirtualTableRoute By time yyyyMMdd_dd >,>=,<,<=,=,contains
AbstractSimpleShardingWeekKeyLongVirtualTableRoute By timestamp yyyyMMdd_dd >,>=,<,<=,=,contains
AbstractSimpleShardingMonthKeyDateTimeVirtualTableRoute By time yyyyMM >,>=,<,<=,=,contains
AbstractSimpleShardingMonthKeyLongVirtualTableRoute By timestamp yyyyMM >,>=,<,<=,=,contains
AbstractSimpleShardingYearKeyDateTimeVirtualTableRoute By time yyyy >,>=,<,<=,=,contains
AbstractSimpleShardingYearKeyLongVirtualTableRoute By timestamp yyyy >,>=,<,<=,=,contains

Note: contains means o = > IDs. Contains (o.shardingkey)
Note: using the default routing rule by time table will allow you to rewrite a GetBeginTime method. This method must use static values, such as new DateTime(2021,1,1). Dynamic values, such as DateTime.Now, cannot be used, because this method will be called every time you restart. In dynamic cases, it will lead to inconsistency every time

summary

So far, it is unknown. I believe that for ordinary users, they should have known how to implement the routing under the sub table and sub database ShardingCore How to write a user-defined path to realize the processing of sub table and sub database

Table and database component praise star

Blog

QQ group: 771630778

Personal QQ: 326308290 (welcome technical support to provide your valuable opinions)

Personal email: 326308290@qq.com

Posted by DoddsAntS on Tue, 09 Nov 2021 01:04:48 -0800