MYSQL 1) Application Scenario of Temporary Table 2) merge Storage Engine Query Application Scenario after Tabulation

Keywords: SQL Session Database MySQL

For practical work, let's sum up here. It's easy to forget. mark it:

1) Temporary table: Search the Internet and compare the temporary table with the memory table.

  • The table structure and data of temporary tables are stored in memory, and session starts and ends its life cycle.
  • The table structure of the memory table is stored in the database, and the data is stored in memory. It is recommended to use the memory table online, but I think my application scenario is more suitable for temporary tables.

My needs:
A table stores vcf data of children and parents (sequencing data, each sample contains millions of rows of individual data). I need to compare primary key of children with primary key of parents (note, primary key is PRIMARY KEY(col1,col2,col3,col4) form, need to compare col1,2,3 equally, col4 equally), match data between children and fathers, and match data with mothers. Pick up the matched data separately, and then inner join. In fact, this sentence of sql can be solved, but, because they are all in a table, and the data is tens of millions of rows level, time-consuming and imaginable.
After various attempts, the provisional table was finally selected:
First, take the vcf data of the child/father/mother from the same table and put it in the temporary table.

querySample ="create temporary table if not exists temp_table" + child +"(INDEX my_index_name(CHROM,POS,REF,ALT)) charset=latin1 as (select CHROM, POS, REF, ALT, round((AO/(AO+RO)),3) as AF from sampleinfor where SAMPLE=:sample and length(REF)=1 and length(ALT)=1)";
querySample ="create temporary table if not exists temp_table" + father +"(INDEX my_index_name(CHROM,POS,REF,ALT)) charset=latin1 as (select CHROM, POS, REF, ALT, round((AO/(AO+RO)),3) as AF from sampleinfor where SAMPLE=:sample and length(REF)=1 and length(ALT)=1)";
querySample ="create temporary table if not exists temp_table" + mother +"(INDEX my_index_name(CHROM,POS,REF,ALT)) charset=latin1 as (select CHROM, POS, REF, ALT, round((AO/(AO+RO)),3) as AF from sampleinfor where SAMPLE=:sample and length(REF)=1 and length(ALT)=1)";

Note that the columns that need to be compared in the future should be indexed to facilitate queries.

ok. Next, match the children with the father and the mother respectively.

String childFather = "create temporary table if not exists temp_fatherDiff" + father + "(PRIMARY KEY(CHROM,POS,REF(32),ALT(32))) charset=latin1 as (select child.CHROM,child.POS,child.REF,child.ALT,child.AF as ChildAF,father.ALT as FatherALT,father.AF as FatherAF, null as MotherALT, null as MotherAF,null as SampleId,null as FatherId,null as MotherId from temp_table" + child + " as child,temp_table" + father + " father where(child.CHROM=father.CHROM and child.POS=father.POS and child.REF=father.REF and child.ALT!=father.ALT))";

Similar to mothers, the focus here is to establish primary key to facilitate the use of index in subsequent inner join and where statements.

Finally, the data of children and parents are internally joined

String finalQuery = "select father.CHROM,father.POS,father.REF,father.ALT,father.ChildAF,father.FatherALT,father.FatherAF,mother.MotherALT,mother.MotherAF, :child as SampleId, :father as FatherId, :mother as MotherId" + " from temp_fatherDiff" + father + " father inner join temp_motherDiff" + mother + " mother on(father.CHROM=mother.CHROM and father.POS=mother.POS and father.REF=mother.REF)";

This temporary table exists in memory, the query efficiency is improved, the memory operation is faster, and the temporary table will disappear after the subsequent session closes, without artificial deletion.

2) merge Storage Engine Queries Data after Tables

Here's one of my scenarios. My gnomad data is based on the table of chromosome creation: table Chr1, table Chr2... ... Each table is at the level of tens of millions of rows, one multi-G size.
The front operation here is actually mentioned above. Comparing the inner join data of children with their parents, we can find out the different data between children and their parents. These data need left join gnomad data. ok, the problem is that gnomad is tabulated. Of course, we can use different tables union, but I have 24 tables, and the autosomal and exons are created according to the chromosomal tables. There are 48 tables in total. How long do sql statements have to be written? So the merge storage engine used here is queried in the form of a summary table.

The structure of the sub-tables here is the same, so the structure of the total table created later should be the same as the structure of the sub-tables, and the engine needs MRG_MYISAM(mysql)
The form of establishing a summary table is
create table gnochrall (CHROM varchar(32) not null, START int not null, END int not null, REF varchar(10000) not null, ALT varchar(10000) not null, AF_GNOMAD varchar(255) default null, AF_AFR varchar(255) default null, AF_AMR varchar(255) default null, AF_ASJ varchar(255) default null, AF_EAS varchar(255) default null, AF_FIN varchar(255) default null, AF_NFE varchar(255) default null, AF_OTH varchar(255) default null, primary key(CHROM(32),START,END,REF(255),ALT(255))) engine=MRG_MYISAM default charset=latin1 union=(gnochr1,gnochr2,gnochr3,gnochr4,gnochr5,gnochr6,gnochr7,gnochr8,gnochr9,gnochr10,gnochr11,gnochr12,gnochr13,gnochr14,gnochr15,gnochr16,gnochr17,gnochr18,gnochr19,gnochr20,gnochr21,gnochr22,gnochrX,gnochrY);

Emphasis is placed on the selection of MRG_MYISAM by engine, which sub-tables need to be merge summary table: union= (table1,tabl2,... ).

ok, so you can write a left join gnochrall on() in sql.
....................................................
end

Posted by GESmithPhoto on Thu, 20 Jun 2019 18:19:14 -0700