Optimization of 20 SQL statements with twice the result with half the effort

Keywords: Database MySQL SQL

Optimization of 20 SQL statements with twice the result with half the effort

1. Use the where condition to limit the data to be queried to avoid returning redundant rows

Suppose the business scenario is to query whether a user is a member

Before optimization:

List<Long> userIds = sqlMap.queryList("select userId from user where isVip=1");
boolean isVip = userIds.contains(userId);

After optimization:

Long userId = sqlMap.queryObject("select userId from user where userId='userId' and isVip='1' ")
boolean isVip = userId!=null;

Reason: only query the required data to avoid returning unnecessary data and save overhead.
2. Try to avoid using mysql built-in functions on index columns
Assume that the business scenario is to query users who have logged in in the last seven days (assuming loginTime is indexed)

Before optimization:

select userId,loginTime from loginuser where Date_ADD(loginTime,Interval 7 DAY) >=now();

After optimization:

explain  select userId,loginTime from loginuser where  loginTime >=Date_ADD(NOW(),INTERVAL - 7 DAY);

reason:

  • If mysql's built-in function is used on the index column, the index will become invalid
  • If the index column does not add a built-in function, the index will still go.

3. Try to avoid expression operations on fields in the where clause, which will cause the system to abandon the use of indexes and scan the whole table

Counterexample:

select * from user where price+1 =20;

After optimization:

select * from user where price =19;

Reason: although price is indexed, the index is invalid due to its operation.

4.Inner join, left join and right join are preferred. If it is left join, the results in the left table should be as small as possible

  • Inner join. When two tables are queried, only the exact matching result sets in the two tables are retained.
  • left join when two tables are joined, all rows in the left table will be returned.
  • right join when two tables are joined, all rows in the right table will be returned.

Before optimization:

select * from table1 t1 left join table2 t2  on t1.size = t2.size where t1.id>2;

After optimization

select * from (select * from table1 where id >2) t1 left join table2 t2 on t1.size = t2.size;

Reason: the left join is used. The data results of the left table are as small as possible, and the conditions are processed on the left as much as possible, which means that the number of returned rows may be relatively small.

5. Avoid using in where clause= Or < > operator
Counterexample:

select price,name  from user where price <>55;

After optimization:

select price,name  from user where price <55;
select price,name  from user where price >55;
//You can consider writing two sql statements separately

Reason: use= And < > may invalidate the index, so the engine will reduce the performance of full table scanning.

6. When using a joint index, pay attention to the order of index columns and generally follow the leftmost matching principle
Suppose the table structure is a joint index idx_userid_age, userId first, age last

Create table first:

CREATE TABLE `user` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `userId` int(11) NOT NULL,
  `age` int(11) DEFAULT NULL,
  `name` varchar(255) NOT NULL,
  PRIMARY KEY (`id`),
  KEY `idx_userid_age` (`userId`,`age`) USING BTREE
) ENGINE=InnoDB AUTO_INCREMENT=2 DEFAULT CHARSET=utf8;

Error demonstration:

select * from user where age = 10;

Correct demonstration:

//Conform to the leftmost matching principle
select * from user where userid=10 and age =10;
//Conform to the leftmost matching principle
select * from user where userid =10;

reason:

  • The union index does not meet the leftmost principle, and the index will generally fail, but this is also related to the Mysql optimizer.
  • When creating a joint index, such as (k1,k2,k3), it is equivalent to creating three indexes (k1), (k1,k2) and (k1,k2,k3). This is the leftmost matching principle.

7. Consider establishing indexes on the columns involved in where and order by to avoid full table scanning as far as possible
Error demonstration:

select * from user where address ='Shanghai' order by age ;

Correct demonstration:

//Add index
alter table user add index idx_address_age (address,age)

8. If too much data is inserted, consider batch insertion
Error demonstration:

for(User u :list){
 INSERT into user(name,age) values(#name#,#age#)   
}

Correct demonstration:

insert into user(name,age) values
<foreach collection="list" item="item" index="index" separator=",">
    (#{item.name},#{item.age})
</foreach>
//500 batch insertions at a time, in batches

Reason: batch insertion has good performance and saves more time

9. Use the distinct keyword with caution

The distinct keyword is generally used to filter duplicate records. Although it can be optimized when querying one field or few fields, it will greatly reduce the query efficiency when using it when there are many fields.
Error demonstration:

SELECT DISTINCT * from  user;

Correct demonstration:

select DISTINCT name from user;

Reason: when querying many fields, if distinct is used, the database engine will compare the data and filter out duplicate data. However, this comparison and filtering process will occupy system resources and cpu time.
10. If the amount of data is large, optimize your modify / delete statements
Error demonstration

//Delete 100000 or 1 million +?
delete from user where id <100000;
//Or use a single cycle operation, low efficiency and long time
for(User user: list){
   delete from user; 
}

Correct demonstration:

//Delete in batches, such as 500 each time
delete user where id<500
delete product where id>=500 and id<1000;

Reason: if too much data is deleted at one time, there may be an error of cpu overload, so batch operation is recommended

11. Delete redundant and duplicate indexes

Error demonstration:

  KEY `idx_userId` (`userId`)  
  KEY `idx_userId_age` (`userId`,`age`)

Correct demonstration:

  KEY `idx_userId_age` (`userId`,`age`)

Reason: duplicate indexes need to be maintained, and the optimizer needs to consider them one by one when optimizing queries, which will affect the performance.

12. Optimize like statements

In development, if you need to query fuzzy keywords, it's easy to think of like, but like is likely to invalidate your index.

Error demonstration:

select userId,name from user where userId like '%123';

Correct demonstration:

select userId,name from user where userId like '123%';

Reason: put% in front and don't use index; If you put% after the keyword, you will still go to the index.

13. Optimize limit paging
limit is usually used to implement paging requirements, but when the offset is particularly large, the query efficiency becomes low.

Error demonstration:

select id,name,age from employee limit 10000,10

Correct demonstration:

//Scheme 1: return the maximum record (offset) of the last query
select id,name from employee where id>10000 limit 10.

//Scheme 2: order by + index
select id,name from employee order by id  limit 10000,10

//Scheme 3: limit the number of pages if business allows:

reason:

  • When the offset is the largest, the query efficiency will be lower, because Mysql does not skip the offset to get the subsequent data directly, but first adds the offset + the number of entries to get, and then discards the data in the previous offset section and returns it.
  • If optimization scheme 1 is used, the last maximum query record (offset) is returned, which can skip the offset and improve the efficiency.
  • Scheme 2 uses order by + index, which can also improve query efficiency.
  • For scheme 3, it is recommended to discuss with the business. Is it necessary to check the pagination after this.

14. Avoid using or to join conditions in the where clause
Suppose a user table with a common index userId. The table structure is as follows:

CREATE TABLE `user` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `userId` int(11) NOT NULL,
  `age` int(11) NOT NULL,
  `name` varchar(255) NOT NULL,
  PRIMARY KEY (`id`),
  KEY `idx_userId` (`userId`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

Assuming that you need to query users with userid 2 or age 25, it is easy to have the following SQL statements:

Error demonstration:

select * from user where userid=2 or age =25;

Correct demonstration:

//Using union all 
select * from user where userid=2
union all 
select * from user where age = 25;

//Or write two separate sql statements:
select * from user where userid=2;
select * from user where age = 25;

Cause: using or may invalidate the index, resulting in a full table scan.

In the case of or + age without index, it is assumed that it goes through the index of userId, but when it comes to the age query criteria, it also needs a full table scan, that is, it needs a three-step process: full table scan + index scan + merge. If it goes through a full table scan at the beginning, it will be finished directly. mysql has an optimizer. Considering efficiency and cost, when encountering or conditions, the index may fail, which seems reasonable.

*15. Avoid using select, but select specific fields
Error demonstration:

select * from stu_name;

Correct demonstration:

select id,name from stu_name;
reason:
  • select * full table scanning will be performed when querying.
  • Only the required fields are taken to save resources and reduce network overhead.
    16. If you know that there is only one query result or only one maximum / minimum record, it is recommended to use limit 1

Suppose there is a student list now. Find a person named Mike

CREATE TABLE `student` (
  `id` int(11) NOT NULL,
  `name` varchar(255) DEFAULT NULL,
  `age` int(11) DEFAULT NULL,
  `date` datetime DEFAULT NULL,
  `sex` int(1) DEFAULT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

Error demonstration:

select id,name from student where name='Mike'

Correct demonstration:

select id,name from student where name='Mike' limit 1;

reason:

  • limit 1 means that as long as a corresponding record is found, it will not continue to scan down, and the efficiency will be greatly improved.
  • If name is the only index, it is unnecessary to add limit 1, because limit exists mainly to prevent full table scanning and improve performance.
  • If a statement itself can predict that there is no need for full table scanning, there is no limit, and the performance difference is not big.

17. Do not have more than 5 meter connections

  • If you have to connect many tables to get data, it means that this may be a bad design.
  • The more connected tables, the greater the compilation time and overhead.
  • The connection table is disassembled into smaller execution tables with higher readability.

18. Rational utilization of exist & in
Assuming that table A represents the employee table of an enterprise and table B represents the Department table, it is easy to query all employees of all departments with the following SQL statements:

select * from A where deptId in (select deptId from B);

This is equivalent to:

Query the Department table B select deptId from B first
Then query the employee of A by deptId of department select * from A where A.deptId = B.deptId

The statement can be abstracted into such a loop code:

 List<> resultSet ;
    for(int i=0;i<B.length;i++) {
          for(int j=0;j<A.length;j++) {
          if(A[i].id==B[j].id) {
             resultSet.add(A[i]);
             break;
          }
       }
    }

Besides using in, we can also use exists to implement the same query function:

select * from A where exists (select 1 from B where A.deptId = B.deptId); 

exists query is to execute the main query first, obtain the data, and then put it into the sub query for conditional verification. According to the verification result (true or false), whether to retain the data result of the main query is determined.

Similarly, it can be abstracted into circular code:

   List<> resultSet ;
    for(int i=0;i<A.length;i++) {
          for(int j=0;j<B.length;j++) {
          if(A[i].deptId==B[j].deptId) {
             resultSet.add(A[i]);
             break;
          }
       }
    }

Reason: if the data amount of B is less than A, it is suitable to use in. If the data amount of B is greater than A, it is suitable to select exist.

19. Consider using default value instead of null in the where clause
Error demonstration:

select * from user where name is not null;

Correct demonstration:

select * from user where age>0;
//Set 0 as default

reason:

  • If you don't use is null or is not null, you don't have to use the index. This is related to the mysql version and the query cost.
    Generally, the cost of query is high, and the mysql optimizer automatically gives up.
  • Changing the null value to the default value makes it possible to go to the index many times. At the same time, the meaning will be relatively clear.

20. Indexes should not be too many, generally less than 5

  • The index number of a table should not exceed 5. If there are too many indexes, it is necessary to consider whether some indexes do not exist
  • The index may be rebuilt during insert or update, so it needs to be considered carefully, depending on the specific situation
  • The index is not the more the better. Although the index improves the efficiency of query, it also reduces the efficiency of insertion and update

Posted by edking1 on Sun, 31 Oct 2021 13:51:16 -0700