Ali doesn't let MySQL multi-table Join? I want it!

Keywords: MySQL SQL Oracle Database

Links to the original text: https://mp.weixin.qq.com/s?__biz=MzUzMTA2NTU2Ng==&mid=2247487444&idx=1&sn=5325b3e29504162cd9c882c2c7f0e1d7&chksm=fa497065cd3ef973e229966f3c9c612cca8526d7201790e773ede29b9a855d27eb6d86449776&mpshare=1&scene=23&srcid=&sharer_sharetime=1565145500644&sharer_s

First, the question is: Why is it that more than three tables in the Alibaba JAVA Development Manual prohibit join ing?

2. Question analysis: Do you have any doubts about this conclusion? I don't know which one of the sages said that we should not follow suit. Today I designed sql to verify this conclusion. (The experiment has not been analyzed from the code point of view, so far it can not be achieved. mysql can be used as a black box to validate this conclusion by using angle). When validating the conclusion, there will be a lot of discoveries. Look back.

3. Experimental environment: VMware 10 + centos7.4 + mysql 5.7.22, centos7 memory 4.5G, 4 cores, 50G hard disk. mysql is configured as 2G, which specifies that the hard disk is SSD.

Fourth, I outline my experiment: there are four tables, student student table, teacher table, course curriculum table, sc intermediate relation table, which record students'elective courses and scores. Specific sql script, see the end of the article, I attach. In the middle, I wrote my own script to create data, and it ends.

The purpose of the experiment is to solve a problem: to inquire the names and achievements of the students who take the course given by the "tname553" teacher. Query sql is:

select Student.Sname,course.cname,score
    from Student,SC,Course ,Teacher
    where Student.s_id=SC.s_id and SC.c_id=Course.c_id  and sc.t_id=teacher.t_id
    and Teacher.Tname='tname553'
    and SC.score=(select max(score)from SC where sc.t_id=teacher.t_Id);

Let me analyze this statement: four tables are equivalent join, and one sub-query. It's a relatively simple sql statement (compared with 10 tables of ERP action, it's very simple). I will also decompose this statement into three simple sqls:

 select max(score)  from SC ,Teacher where sc.t_id=teacher.t_Id and Teacher.Tname='tname553';
   select sc.t_id,sc.s_id,score   from SC ,Teacher
   where sc.t_id=teacher.t_Id
   and score=590
   and Teacher.Tname='tname553';
    select Student.Sname,course.cname,score
    from Student,SC ,course
    where Student.s_id=SC.s_id and  sc.s_id in (20769800,48525000,26280200) and course.c_id = sc.c_id;

Let me analyze: the first sentence is to query the highest score, get the highest score 590. The second sentence is to query the student id with the highest score and get it.

20769800,48525000,26280200

. The third sentence is to find out the names and scores of the students. In this way, the three sentences can query the names of the students with the highest scores and their achievements.

Next, I will create data separately: 10 million elective records (one student takes two courses), 5 million students, 1 million teachers (one teacher takes five students, quite high-end bar), 1,000 courses. Query with the above query statement. Among them, sc table I tested the following indexing and no indexing situation, see the table below. Next, I will make a record of 100 million elective courses (one student takes two courses), 50 million students, 10 million teachers and 1,000 courses. Then execute the above statements separately. Finally, I will execute the above statement on the oracle database.

The following two tables are the test results

 

6. Looking at the table carefully, we can find that:

  1. Step 3.1 does not add an index on the join key, and the query is very slow. It shows that when multi-table Association query is carried out, it is necessary to ensure that the associated fields need an index.

  2. Step 6.1, 6.2, 6.3, replaced by simple sql, the query time is still barely acceptable with more than 100 million data. At this point, it shows that mysql query is somewhat laborious, but it is still tender.

  3. Step 5.1: mysql can't query, 4 tables join. For my native mysql, 150 million data exceed the limit. (I tuned this SQL, execution plan and index are gone, no problem, show profile is displayed in send data. This issue is discussed in detail in another article.)

4. Comparing the 1.1 and 5.1 steps of sql query, 4 tables connection, for my local mysql, 15 million data queries are very fluent, which is a watershed of MySQL data volume fluency. (This is just a phenomenon, not very accurate, need to calculate the capacity of the table at the same time).

5. In step 5.1, compared with 6.1, 6.2, 6.3, multi-table join is a bit of a hassle for mysql.

6. More than three tables prohibit join, which is a rule for mysql. Later, I will see that with the same machine, the same amount of data, the same memory, I can calculate 150 million data joins perfectly. For such a rule, for development, it is necessary to put some logic into the application layer to query.

Summary: This rule forbids join in more than three tables. When the amount of data is too large, mysql can't query at all, which leads to such a rule in Ali. (In fact, if the table data volume is small, 10 tables are not a problem, you can try it yourself.) And our company's payment system is designed for large-scale high concurrency goals, so, follow this rule. At the business level, writing simple sql and putting more logic into the application layer will help me understand my needs better, and it will be much easier to implement specific joins in the application layer.

7. Let's take a look at the excellent performance of oracle database.

Look at step 7.1, oracle still queries the results in 26 seconds without an index and many join tables. So I would say that mysql's join is weak. So the question arises. Why are many people using MySQL now? That's another question. I'll talk about my thinking in addition.

After reading this article, I also attached a gift, the so-called huddle grass beat rabbits. That's fast data generation. You can write your own script to create the data and see how I make the data. You will know my skills.

VIII. Attached partial screenshots

8. Attach sql statement and data script

use stu;
drop table if exists student;
create table student
  (  s_id int(11) not null auto_increment ,
     sno    int(11),
     sname varchar(50),
     sage  int(11),
     ssex  varchar(8) ,
     father_id int(11),
      mather_id int(11),
      note varchar(500),
     primary key (s_id),
   unique key uk_sno (sno)
  ) engine=innodb default charset=utf8mb4;
truncate table student;
  delimiter $$
drop function if exists   insert_student_data $$
create function insert_student_data()
 returns  int deterministic
    begin
    declare  i int;
      set i=1;
      while  i<50000000 do
      insert into student  values(i ,i, concat('name',i),i,case when floor(rand()*10)%2=0 then 'f' else 'm' end,floor(rand()*100000),floor(rand()*1000000),concat('note',i) );
      set i=i+1;
      end while;
      return 1;
    end$$
delimiter ;
select  insert_student_data();
select count(*) from student;
use stu;
create table course
  (
     c_id int(11) not null auto_increment ,
     cname varchar(50)
     note varchar(500), primary key (c_id)
  )  engine=innodb default charset=utf8mb4;
truncate table course;
  delimiter $$
drop function if exists   insert_course_data $$
create function insert_course_data()
 returns  int deterministic
    begin
    declare  i int;
      set i=1;
      while  i<=1000 do
      insert into course  values(i , concat('course',i),floor(rand()*1000),concat('note',i) );
      set i=i+1;
      end while;
      return 1;
    end$$
delimiter ;
select  insert_course_data();
select count(*) from course;
use stu;
drop table if exists sc;
create table sc
  (
     s_id    int(11),
     c_id    int(11),
     t_id    int(11),
     score int(11)
  )  engine=innodb default charset=utf8mb4;
truncate table sc;
  delimiter $$
drop function if exists   insert_sc_data $$
create function insert_sc_data()
 returns  int deterministic
    begin
    declare  i int;
      set i=1;
      while  i<=50000000 do
      insert into sc  values( i,floor(rand()*1000),floor(rand()*10000000),floor(rand()*750)) ;
      set i=i+1;
      end while;
      return 1;
    end$$
delimiter ;
select  insert_sc_data();
commit;
select  insert_sc_data();
commit;
create index idx_s_id  on sc(s_id)   ;
create index idx_t_id  on sc(t_id)   ;
create index idx_c_id  on sc(c_id)   ;
select count(*) from sc;
use stu;
drop table if exists teacher;
create table teacher
  (
    t_id  int(11) not null auto_increment ,
     tname varchar(50) ,
     note varchar(500),primary key (t_id)
  )  engine=innodb default charset=utf8mb4;

  truncate table teacher;
  delimiter $$
drop function if exists   insert_teacher_data $$
create function insert_teacher_data()
 returns  int deterministic
    begin
    declare  i int;
      set i=1;
      while  i<=10000000 do
      insert into teacher  values(i , concat('tname',i),concat('note',i) );
      set i=i+1;
      end while;
      return 1;
    end$$
delimiter ;
select  insert_teacher_data();
commit;
select count(*) from teacher;
This is oracle Test and build data scripts
create tablespace scott_data  datafile  '/home/oracle/oracle_space/sitpay1/scott_data.dbf'  size 1024m autoextend on;
create tablespace scott_index   datafile  '/home/oracle/oracle_space/sitpay1/scott_index.dbf'  size 64m  autoextend on;
create temporary tablespace scott_temp  tempfile  '/home/oracle/oracle_space/sitpay1/scott_temp.dbf'  size 64m autoextend on;
drop user  scott cascade;
create user  scott  identified by  tiger  default tablespace scott_data  temporary tablespace scott_temp  ;
grant resource,connect,dba to  scott;
drop table student;
create table student
  (  s_id number(11) ,
     sno    number(11) ,
     sname varchar2(50),
     sage  number(11),
     ssex  varchar2(8) ,
     father_id number(11),
      mather_id number(11),
      note varchar2(500)
  ) nologging;
truncate table student;
create or replace procedure insert_student_data
 is
   q number(11);
    begin
     q:=0;
      for i in  1..50 loop
      insert /*+append*/ into student   select rownum+q as s_id,rownum+q  as sno, concat('sutdent',rownum+q ) as sname,floor(dbms_random.value(1,100)) as sage,'f' as ssex,rownum+q  as father_id,rownum+q  as mather_id,concat('note',rownum+q ) as note from dual connect by level<=1000000;
      q:=q+1000000;
      commit;
      end loop;
end insert_student_data;
/
call insert_student_data();
alter table student  add constraint  pk_student primary key (s_id);
commit;
select count(*) from student;
create table course
  (
     c_id number(11) primary key,
     cname varchar2(50),
     note varchar2(500)
  )  ;
truncate table course;
 create or replace procedure insert_course_data
 is
   q number(11);
    begin

      for i in  1..1000 loop
      insert /*+append*/ into course  values(i , concat('name',i),concat('note',i) );
      end loop;
end insert_course_data;
/
call insert_course_data();
commit;
select count(*) from course;
create table sc
  (
     s_id    number(11),
     c_id    number(11),
     t_id    number(11),
     score number(11)
  ) nologging;
truncate table sc;
 create or replace procedure insert_sc_data
 is
   q number(11);
    begin
     q:=0;
      for i in  1..50 loop
      insert /*+append*/ into sc   select rownum+q as s_id, floor(dbms_random.value(0,1000))  as c_id,floor(dbms_random.value(0,10000000)) t_id,floor(dbms_random.value(0,750)) as score from dual connect by level<=1000000;
      q:=q+1000000;
      commit;
      end loop;
end insert_sc_data;
/
call insert_sc_data();
create index idx_s_id  on sc(s_id)   ;
create index idx_t_id  on sc(t_id)   ;
create index idx_c_id  on sc(c_id)   ;
select count(*) from sc;
create table teacher
  (
    t_id  number(11) ,
     tname varchar2(50) ,
     note varchar2(500)
  )nologging ;
    truncate table teacher;
create or replace procedure insert_teacher_data
 is
   q number(11);
    begin
     q:=0;
      for i in  1..10 loop
      insert /*+append*/ into teacher   select rownum+q as t_id, concat('teacher',rownum+q ) as tname,concat('note',rownum+q ) as note from dual connect by level<=1000000;
      q:=q+1000000;
      commit;
      end loop;
end insert_teacher_data;
/
call insert_teacher_data();
alter table teacher  add constraint  pk_teacher primary key (t_id);
select count(*) from teacher;

Posted by seanpaulcail on Thu, 15 Aug 2019 05:37:27 -0700