First, the question is: Why is it that more than three tables in the Alibaba JAVA Development Manual prohibit join ing?
2. Question analysis: Do you have any doubts about this conclusion? I don't know which one of the sages said that we should not follow suit. Today I designed sql to verify this conclusion. (The experiment has not been analyzed from the code point of view, so far it can not be achieved. mysql can be used as a black box to validate this conclusion by using angle). When validating the conclusion, there will be a lot of discoveries. Look back.
3. Experimental environment: VMware 10 + centos7.4 + mysql 5.7.22, centos7 memory 4.5G, 4 cores, 50G hard disk. mysql is configured as 2G, which specifies that the hard disk is SSD.
Fourth, I outline my experiment: there are four tables, student student table, teacher table, course curriculum table, sc intermediate relation table, which record students'elective courses and scores. Specific sql script, see the end of the article, I attach. In the middle, I wrote my own script to create data, and it ends.
The purpose of the experiment is to solve a problem: to inquire the names and achievements of the students who take the course given by the "tname553" teacher. Query sql is:
select Student.Sname,course.cname,score from Student,SC,Course ,Teacher where Student.s_id=SC.s_id and SC.c_id=Course.c_id and sc.t_id=teacher.t_id and Teacher.Tname='tname553' and SC.score=(select max(score)from SC where sc.t_id=teacher.t_Id);
Let me analyze this statement: four tables are equivalent join, and one sub-query. It's a relatively simple sql statement (compared with 10 tables of ERP action, it's very simple). I will also decompose this statement into three simple sqls:
select max(score) from SC ,Teacher where sc.t_id=teacher.t_Id and Teacher.Tname='tname553'; select sc.t_id,sc.s_id,score from SC ,Teacher where sc.t_id=teacher.t_Id and score=590 and Teacher.Tname='tname553'; select Student.Sname,course.cname,score from Student,SC ,course where Student.s_id=SC.s_id and sc.s_id in (20769800,48525000,26280200) and course.c_id = sc.c_id;
Let me analyze: the first sentence is to query the highest score, get the highest score 590. The second sentence is to query the student id with the highest score and get it.
20769800,48525000,26280200
. The third sentence is to find out the names and scores of the students. In this way, the three sentences can query the names of the students with the highest scores and their achievements.
Next, I will create data separately: 10 million elective records (one student takes two courses), 5 million students, 1 million teachers (one teacher takes five students, quite high-end bar), 1,000 courses. Query with the above query statement. Among them, sc table I tested the following indexing and no indexing situation, see the table below. Next, I will make a record of 100 million elective courses (one student takes two courses), 50 million students, 10 million teachers and 1,000 courses. Then execute the above statements separately. Finally, I will execute the above statement on the oracle database.
The following two tables are the test results
6. Looking at the table carefully, we can find that:
-
Step 3.1 does not add an index on the join key, and the query is very slow. It shows that when multi-table Association query is carried out, it is necessary to ensure that the associated fields need an index.
-
Step 6.1, 6.2, 6.3, replaced by simple sql, the query time is still barely acceptable with more than 100 million data. At this point, it shows that mysql query is somewhat laborious, but it is still tender.
-
Step 5.1: mysql can't query, 4 tables join. For my native mysql, 150 million data exceed the limit. (I tuned this SQL, execution plan and index are gone, no problem, show profile is displayed in send data. This issue is discussed in detail in another article.)
4. Comparing the 1.1 and 5.1 steps of sql query, 4 tables connection, for my local mysql, 15 million data queries are very fluent, which is a watershed of MySQL data volume fluency. (This is just a phenomenon, not very accurate, need to calculate the capacity of the table at the same time).
5. In step 5.1, compared with 6.1, 6.2, 6.3, multi-table join is a bit of a hassle for mysql.
6. More than three tables prohibit join, which is a rule for mysql. Later, I will see that with the same machine, the same amount of data, the same memory, I can calculate 150 million data joins perfectly. For such a rule, for development, it is necessary to put some logic into the application layer to query.
Summary: This rule forbids join in more than three tables. When the amount of data is too large, mysql can't query at all, which leads to such a rule in Ali. (In fact, if the table data volume is small, 10 tables are not a problem, you can try it yourself.) And our company's payment system is designed for large-scale high concurrency goals, so, follow this rule. At the business level, writing simple sql and putting more logic into the application layer will help me understand my needs better, and it will be much easier to implement specific joins in the application layer.
7. Let's take a look at the excellent performance of oracle database.
Look at step 7.1, oracle still queries the results in 26 seconds without an index and many join tables. So I would say that mysql's join is weak. So the question arises. Why are many people using MySQL now? That's another question. I'll talk about my thinking in addition.
After reading this article, I also attached a gift, the so-called huddle grass beat rabbits. That's fast data generation. You can write your own script to create the data and see how I make the data. You will know my skills.
VIII. Attached partial screenshots
8. Attach sql statement and data script
use stu; drop table if exists student; create table student ( s_id int(11) not null auto_increment , sno int(11), sname varchar(50), sage int(11), ssex varchar(8) , father_id int(11), mather_id int(11), note varchar(500), primary key (s_id), unique key uk_sno (sno) ) engine=innodb default charset=utf8mb4; truncate table student; delimiter $$ drop function if exists insert_student_data $$ create function insert_student_data() returns int deterministic begin declare i int; set i=1; while i<50000000 do insert into student values(i ,i, concat('name',i),i,case when floor(rand()*10)%2=0 then 'f' else 'm' end,floor(rand()*100000),floor(rand()*1000000),concat('note',i) ); set i=i+1; end while; return 1; end$$ delimiter ; select insert_student_data(); select count(*) from student; use stu; create table course ( c_id int(11) not null auto_increment , cname varchar(50) note varchar(500), primary key (c_id) ) engine=innodb default charset=utf8mb4; truncate table course; delimiter $$ drop function if exists insert_course_data $$ create function insert_course_data() returns int deterministic begin declare i int; set i=1; while i<=1000 do insert into course values(i , concat('course',i),floor(rand()*1000),concat('note',i) ); set i=i+1; end while; return 1; end$$ delimiter ; select insert_course_data(); select count(*) from course; use stu; drop table if exists sc; create table sc ( s_id int(11), c_id int(11), t_id int(11), score int(11) ) engine=innodb default charset=utf8mb4; truncate table sc; delimiter $$ drop function if exists insert_sc_data $$ create function insert_sc_data() returns int deterministic begin declare i int; set i=1; while i<=50000000 do insert into sc values( i,floor(rand()*1000),floor(rand()*10000000),floor(rand()*750)) ; set i=i+1; end while; return 1; end$$ delimiter ; select insert_sc_data(); commit; select insert_sc_data(); commit; create index idx_s_id on sc(s_id) ; create index idx_t_id on sc(t_id) ; create index idx_c_id on sc(c_id) ; select count(*) from sc; use stu; drop table if exists teacher; create table teacher ( t_id int(11) not null auto_increment , tname varchar(50) , note varchar(500),primary key (t_id) ) engine=innodb default charset=utf8mb4; truncate table teacher; delimiter $$ drop function if exists insert_teacher_data $$ create function insert_teacher_data() returns int deterministic begin declare i int; set i=1; while i<=10000000 do insert into teacher values(i , concat('tname',i),concat('note',i) ); set i=i+1; end while; return 1; end$$ delimiter ; select insert_teacher_data(); commit; select count(*) from teacher;
This is oracle Test and build data scripts create tablespace scott_data datafile '/home/oracle/oracle_space/sitpay1/scott_data.dbf' size 1024m autoextend on; create tablespace scott_index datafile '/home/oracle/oracle_space/sitpay1/scott_index.dbf' size 64m autoextend on; create temporary tablespace scott_temp tempfile '/home/oracle/oracle_space/sitpay1/scott_temp.dbf' size 64m autoextend on; drop user scott cascade; create user scott identified by tiger default tablespace scott_data temporary tablespace scott_temp ; grant resource,connect,dba to scott; drop table student; create table student ( s_id number(11) , sno number(11) , sname varchar2(50), sage number(11), ssex varchar2(8) , father_id number(11), mather_id number(11), note varchar2(500) ) nologging; truncate table student; create or replace procedure insert_student_data is q number(11); begin q:=0; for i in 1..50 loop insert /*+append*/ into student select rownum+q as s_id,rownum+q as sno, concat('sutdent',rownum+q ) as sname,floor(dbms_random.value(1,100)) as sage,'f' as ssex,rownum+q as father_id,rownum+q as mather_id,concat('note',rownum+q ) as note from dual connect by level<=1000000; q:=q+1000000; commit; end loop; end insert_student_data; / call insert_student_data(); alter table student add constraint pk_student primary key (s_id); commit; select count(*) from student; create table course ( c_id number(11) primary key, cname varchar2(50), note varchar2(500) ) ; truncate table course; create or replace procedure insert_course_data is q number(11); begin for i in 1..1000 loop insert /*+append*/ into course values(i , concat('name',i),concat('note',i) ); end loop; end insert_course_data; / call insert_course_data(); commit; select count(*) from course; create table sc ( s_id number(11), c_id number(11), t_id number(11), score number(11) ) nologging; truncate table sc; create or replace procedure insert_sc_data is q number(11); begin q:=0; for i in 1..50 loop insert /*+append*/ into sc select rownum+q as s_id, floor(dbms_random.value(0,1000)) as c_id,floor(dbms_random.value(0,10000000)) t_id,floor(dbms_random.value(0,750)) as score from dual connect by level<=1000000; q:=q+1000000; commit; end loop; end insert_sc_data; / call insert_sc_data(); create index idx_s_id on sc(s_id) ; create index idx_t_id on sc(t_id) ; create index idx_c_id on sc(c_id) ; select count(*) from sc; create table teacher ( t_id number(11) , tname varchar2(50) , note varchar2(500) )nologging ; truncate table teacher; create or replace procedure insert_teacher_data is q number(11); begin q:=0; for i in 1..10 loop insert /*+append*/ into teacher select rownum+q as t_id, concat('teacher',rownum+q ) as tname,concat('note',rownum+q ) as note from dual connect by level<=1000000; q:=q+1000000; commit; end loop; end insert_teacher_data; / call insert_teacher_data(); alter table teacher add constraint pk_teacher primary key (t_id); select count(*) from teacher;