[Database series] arrangement of basic knowledge points of database

Keywords: Database Big Data

The content comes from the job hunting Dictionary of kingly programmers

1, Basic concepts

1. Data model

The core and foundation of database system is data model. Generally speaking, a data model is a set of strictly defined concepts. These concepts accurately describe the static characteristics, dynamic characteristics and integrity constraints of the system. Therefore, the data model generally consists of three parts: data structure, data operation and integrity constraints

  1. Data structure: a collection of object types stored in a database. It is used to describe the constituent objects of the database and the relationships between objects
  2. Data operation: refers to the collection of operations allowed for various object instances in the database, including operations and their related operation rules
  3. Integrity constraint: refers to a set of general integrity rules followed by data and its connections in a given data model, which can ensure the correctness and consistency of data

According to the different application purposes of the model, the data model is divided into two categories:

  • first kind
    • 1) Conceptual model: also known as information model. It models data and information according to the user's point of view, which is mainly used in database design
  • Type II
    • 2) Logical model: it mainly includes hierarchical model, mesh model, relational model, object-oriented model and object relational model
    • 3) Physical model: it is the lowest abstraction of data. It describes the representation and access methods of data in the system, and the storage and access methods on disk or tape. It is oriented to the computer system

Relational model is the most important data type at present. Relational database system uses relational model as the organization of data

  • The logical structure of data in the relational model is a two-dimensional table, or the relational data structure is a table
  • The data operations of relational data model mainly include querying, inserting, deleting and updating data
  • The integrity constraints of relational model include three categories: entity integrity, reference integrity and user-defined integrity
    • Entity integrity rule of relationship model: if attribute (refers to one or A group of attributes) A is the primary attribute of basic relationship R, then A cannot be null (A direct conclusion can be drawn from this rule: the primary key cannot be null)
    • Referential integrity rule of relationship model: if attribute (or attribute group) F is the foreign key of a basic relationship R and it corresponds to the primary key of basic relationship R1, the value on each F in R is either null or equal to the primary key value in R1

2. Primary key and foreign key

  • Candidate code: an attribute group that can uniquely identify a tuple in a relationship (two-dimensional table)
  • Primary key: if a table has multiple candidate codes, select one of them as the primary key
  • Foreign key: if an attribute set in relationship mode R is not the primary key of R, but the primary key of another relationship R1, the attribute set is the foreign key of relationship mode R. A foreign key represents the relationship between two relationships (tables). A table with a foreign key of another relationship as the primary key is called the master table, and a table with this foreign key is called the slave table of the master table
  • Primary attribute and non primary attribute: the attributes of candidate codes are called primary attributes. Attributes that are not included in any candidate code are called non primary attributes

3. Services

Transaction refers to a user-defined sequence of database operations. These operations are either done or not done at all. It is an inseparable work unit

Transaction has four characteristics: atomicity, consistency, isolation and persistence. Referred to as ACID property

4. Index

An index is a structure that sorts the values of one or more columns in a database. Using an index, you can quickly access specific information in a database table

Advantages and disadvantages of setting indexes for tables:

  • benefit
    • By creating a unique index, you can ensure the uniqueness of each row of data in the database table
    • It can greatly speed up the retrieval of data (the main reason for creating indexes)
    • When using group by and order by clauses for data retrieval, it can also significantly reduce the time of grouping and sorting in the query
    • It can accelerate the connection between tables, especially in realizing the referential integrity of data
  • Disadvantages
    • First, it increases the storage space of the database
    • Second, it takes more time to insert and delete data (because the index will change accordingly)

Indexes are built on some columns in the database table. When creating an index, you should consider which columns can create an index and which columns cannot:

  • In general, you should create indexes on these columns
    • 1) Creating indexes on columns that often need to be searched can speed up the search
    • 2) Create an index on a column that is the primary key, which forces the uniqueness of the column and the arrangement structure of the data in the organization table
    • 3) Create indexes on columns that are often used for connection. These columns are mainly foreign keys, which can speed up the connection
    • 4) Create an index on a column that often needs to be searched by range because the index is sorted and its specified range is continuous
    • 5) Create an index on the columns that often need to be sorted because the index has been sorted. In this way, the query can use the sorting of the index to speed up the sorting query time
    • 6) Create an index on the column that is often used in the WHERE clause to speed up the judgment of conditions
  • In general, these columns that should not be indexed have the following characteristics
    • 1) Columns that are rarely used in queries should not be indexed. It is rarely used, so even creating an index will not bring much performance improvement. The index will bring the burden of space and maintenance
    • 2) Columns with few data values should not be indexed. For example, the data rows of the result set account for a large proportion of the data rows in the table, that is, a large proportion of the data rows to be searched in the table. Increasing the index does not significantly accelerate the retrieval speed
    • 3) Columns defined as data types such as text and bit should not be indexed. Because the data volume of these columns is either quite large or few values, it is not conducive to the use of indexes
    • 4) When the modification operation is much larger than the retrieval operation, the index should not be created. Because the modification performance and retrieval performance are contradictory. When the index is increased, the retrieval performance will be improved, but the modification performance will be reduced. When the index is reduced, the performance of modification will be improved and the performance of retrieval will be reduced

5. View

A view is a table that is exported (or attempted) from one or more base tables. Unlike the base table, it is a virtual table

The database only stores the definition of the view, not the data corresponding to the view. These data are still stored in the original basic table. Therefore, when the data in the basic table changes, the data queried in the view also changes. In this sense, the view is like a window through which you can see the data you are interested in and its changes in the database

Once a view is defined, it can be queried and deleted like the basic table

2, SQL statement

SQL statements mainly include:

1. Data definition


Define base table

CREATE TABLE <Table name> (<Listing> <data type> [Column level integrity constraints]
                     [, <Listing> <data type> [Column level integrity constraints]]
                     [, <Table level integrity constraints>]);
  • primary key (A1,A2,A3,...): Specifies the primary key attribute set
  • foreign key (A1,A2,A3,...) references T2: declare that the value of any tuple in the relationship on the attribute (A1,A2,A3,...) must correspond to the value of a tuple in T2 on the main code attribute

Data type:

  • int: shaping. Equivalent to full name integer
  • smallint: small integer type
  • real, double precision: floating point number and double precision floating point number (precision is machine related)
  • float(n): a floating-point number with a precision of at least N bits
  • char(n): fixed length string
  • varchar(n): variable length string

Example: create a "Student information" table Student:

 Sname CHAR(20) UNIQUE,
 Ssex CHAR(2),
 Sdept CHAR(20)


Modify basic table

ALTER TABLE <Table name>
[ADD <New column name> <data type> [integrity constraint]]
[DROP <integrity constraint>]
[MODIFY COLUMN <Listing> <data type>];
  • ADD clause: ADD new columns and new integrity constraints
  • DROP clause: deletes the specified integrity constraint
  • MODIFY COLUMN clause: modify the definition of the original column, including column name and data type


ALTER TABLE Student ADD S_entrance DATE;     //Add the "enrollment time" column to the Student table, and its data type is date type
ALTER TABLE Student MODITY COLUMN Sage INT;  //Change the data type of age from character type to integer
ALTER TABLE Student ADD UNIQUE(Sname);       //Add the constraint that the Student table Sname must take a unique value


Delete base table

  • RESTRICT: deletion is restricted. The basic table to be deleted cannot be referenced by other table constraints (such as check, foreign key and other constraints), and cannot have views, triggers, stored procedures or functions. If there are objects that depend on the table, the table cannot be deleted
  • CASCADE: there are no restrictions on deletion. While deleting the table, related dependent objects, such as views, will be deleted together

2. Data query


SELECT [ALL | DISTINCT] <Target list expression> [, <Target list expression>]...
FROM <Table name or view name> [, <Table name or view name>]...
[WHERE <Conditional expression>]
[GROUP BY <Column name 1> [HAVING <Conditional expression>]]
[ORDER BY <Column name 2> [ASC | DESC]]; 
  • ALL: display ALL (without de duplication)
  • DISTINCT: remove duplicates

The meaning of the whole SELECT statement is to find out the tuples that meet the conditions FROM the basic table or view specified in the FROM clause according to the conditional expression of the WHERE clause, and then SELECT the attribute values in the tuples according to the target list expression in the SELECT clause to form the result table
If there is a GROUP BY clause, the resu lt s are grouped according to the value of < column name 1 >, and tuples with equal column values of this attribute are a group. Aggregate functions are usually used in each group. If the GROUP BY clause has a HAVING clause, only groups that meet the specified conditions are output
If there is an ORDER BY clause, the resu lt table is also arranged in ascending or descending order of the values of < column name 2 >


The WHERE clause can be filtered using some of the following conditional expressions:

  • =: Specifies that the value of the property is the given value
  • IS: such as IS NULL. Cannot be replaced by =
  • like: String Matching
    • %: match any substring
    • _: Match any character
  • and,or,not
  • IN(...): Specifies that the value of the attribute is the value given IN
SELECT * from Student WHERE Sname='Bill Gates';                   //The name is Bill Gates
SELECT * from Student WHERE Sname like '%Bill%';                  //The name contains Bill's name
SELECT * from Student WHERE Sage BETWEEN 20 AND 23;               //Aged 20-23
SELECT Sname , Ssex from Student WHERE Sdept IN('CS','IS','MA');  //CS, IS or MA
SELECT * FROM Student WHERE Sage IS NULL;                         //No age information


Sort the query results by ascending (ACS) or descending (DESC) of one or more attribute columns. Ascending is the default


SELECT * FROM Student ORDER BY Sdept, Sage desc; //Sort by major in ascending order, and then sort by age in descending order for the same major


Can be used to force SELECT to return the specified number of records

Accept 1 or 2 numeric parameters. Argument must be an integer constant:

  • 1 Parameter: indicates the number of top record lines returned
  • 2 parameters: the first parameter specifies the offset of the first returned record line (calculated from 0), and the second parameter specifies the maximum number of returned record lines


SELECT * FROM Student LIMIT 5, 10;  //Return record line 6-15
SELECT * FROM Student LIMIT 5;      //Return the first 5 record lines

5) Aggregate function

Aggregation functions include count, sum, avg, max and min

  • Total: select count(*) as totalcount from table1;
  • Sum: select sum(field1) as sumvalue from table1;
  • Average: select avg(field1) as avgvalue from table1;
  • Max: select max(field1) as maxvalue from table1;
  • Min: select min(field1) as minvalue from table1;


Tuples are grouped according to the values of one or more attributes, and those with the same values are a group

After grouping Aggregate function Will act on each group, that is, each group has a function value

If it is required to filter these groups according to certain conditions after grouping, and finally only the groups that meet the specified conditions are output, use the HAVING phrase to specify the filtering conditions


//Group by age, count the number of people at each age, and output (age, number of people at that age)
select Sage, count(*) from Student group by Sage;
//Group by age, count the number of people at each age, select the group whose number is greater than 1, and output (age, number of people at that age)
select Sage, count(*) from Student group by Sage having count(*) > 1;

7) Connection query

A query involves multiple tables

Suppose there are two tables - Student table and SC table (course selection table):

  • Inner connection (natural connection): when inner connection is used, if some students in the Student do not choose courses, there is no corresponding tuple in SC. The final query result discards the information of these students
  • External connection: if you want to list the basic information of each Student and their course selection based on the Student table. Even if a Student does not choose a course, it will still be displayed in the query result (fill in the blank value on the attribute of SC table). You need to use an external connection


//Internal connection: query each student and their elective courses (students who do not choose courses will not be listed)
SELECT Student.*, SC.*
FROM Student , SC
WHERE Student.Sno=SC.Sno;

//External connection: query each student and their elective courses (students who do not choose courses will also be listed)
SELECT Student.*, SC.*
FROM Student LEFT JOIN SC ON(Student.Sno=SC.Sno);

3. Data operation


Insert tuple

INTO table1(field1,field2...)

If the INTO statement does not specify any attribute column name, the newly inserted tuple must have a value on each attribute column


INSERT INTO Student(Sno, Sname, Ssex, Sdept, Sage)
VALUES('201009013', 'Wang Ming', 'M', 'CS', 23);


Modify (update) data

UPDATE table1
SET field1=value1, field2=value2
WHERE Range;

The function is to modify tuples that meet the conditions of WHERE clause in the specified table. If the WHERE clause is omitted, all tuples in the table will be modified


UPDATE Student
SET Sage=22
WHERE Sno='201009013';


Delete element

FROM table1
WHERE Range;

The function is to delete tuples that meet the conditions of WHERE clause in the specified table. If the WHERE clause is omitted, all tuples in the table are deleted. But the table still exists


FROM Student
where Sno='201009013';

3, Examples

    Nickname Varchar(30) NOT NULL
2) select * from tableQQ where Nickname='QQ' order by ID desc;
3) delete from tableQQ where ID=1234;
4) insert into tableQQ values(5555,'1234');
5) drop table tableQQ;

1) SELECT sc.sno from sc , c
    where sc.cno=c.cno and c.cname='db';
2) SELECT sno, avg(grade) as g from sc
    group by sno order by g desc limit 1;
3) SELECT cno, count(sno) from sc
    where grade > 90 group by cno;
4) SELECT s.sno, s.sname from s, (select sc.sno FROM sc, c where sc.cno=c.cno
          and c.cname in ('math', 'english') group by sno having count
          (DISTINCT c.cno)=2)x
    where s.sno=x.sno;
5) SELECT s.sno, s.sname , avg(sc.grade) as avggrade from s, sc, (select sno
        FROM sc where grade<60 group by sno having count(DISTINCT cno)>=2)x
    where s.sno=x.sno and sc.sno=x.sno group by s.sno;
6) SELECT s.sname from s,
  (select sno, grade from sc where cno in (select cno from c where
  (select sno, grade from sc where cno in (select cno from c where
   where s.sno=A.sno and s.sno=B.sno and A.grade>B.grade;

Posted by marcth on Fri, 10 Sep 2021 18:35:40 -0700