Basic knowledge and optimization of MySQL database

zydybaby 2021-01-21 21:13:09
basic knowledge optimization mysql database

MySQL The basic knowledge of database and the knowledge points of optimization , How much do you have ?

Recommended reading :


SQL Sentence basic knowledge and optimization

SQL What are the main types of sentences *

  • Data definition language DDL(Data Definition Language): There are mainly CREATE,DROP,ALTER And so on have operations on the logical structure , Including table structure 、 Views and indexes .
  • Database query language DQL(Data Query Language): Mainly with SELECT Mainly
  • Data manipulation language DML(Data Manipulation Language): It mainly includes INSERT,UPDATE,DELETE
  • Data control function DCL(Data Control Language): The main thing is that permission control can operate , Include GRANT,REVOKE,COMMIT,ROLLBACK etc. .

SQL What are the constraints ? **

  • Primary key constraint : A primary key is a combination of one or more columns in a table , Can uniquely identify each row in the table . A table has only one primary key , And the column of primary key constraint cannot be empty .
  • Foreign key constraints : Foreign key constraints are used to establish a relationship between two tables , You need to specify which column of the main table is referenced . Only the primary key of the primary table can be used as a foreign key by the secondary table , The columns of the constrained slave table may not be primary keys , Therefore, to create a foreign key constraint, you need to define the primary key of the main table first , Then define the foreign key from the table .
  • Unique constraint : Make sure that a column of data in the table does not have the same value , A table can define multiple unique constraints .
  • Default constraint : When new data is inserted , If the row does not specify data , The system assigns the default value to the line , If there is no setting, there is no default value , Then for NULL.
  • Check constraint :Check Will judge the validity of data through logical expression , Used to limit the range of input values for one or more columns . When a column updates data , The input must satisfy Check The constraints .

What is subquery ? **

Subquery : Use the results of one query in another

Subqueries can be divided into the following categories :

  • Scalar subquery : A subquery returns a value , have access to =,>,<,>=,<=,<> The scalar results of the subquery are compared with the scalar results of the subquery , Generally, subqueries are placed on the right side of the comparison .

    SELECT * FROM user WHERE age = (SELECT max(age) from user) // Ask the oldest person
  • Column query : The result of a subquery is n Row by column , It is generally used to query and return a field in a table . have access to IN、ANY、SOME and ALL Wait for the operator , Can't be used directly

    SELECT num1 FROM table1 WHERE num1 > ANY (SELECT num2 FROM table2)
  • Line sub query : A row of results returned by a subquery n Column

    SELECT * FROM user WHERE (age,sex) = (SELECT age,sex FROM user WHERE name="zhangsan")
  • Table sub query : The subquery is n That's ok n A data table of columns

    SELECT * FROM student WHERE (name,age,sex) IN (SELECT name,age,sex FROM class1) // Find the class in the student table 1 The students of the class

understand MySQL Several kinds of connection query ? ***

MySQl The connection query of can be divided into external connection , Internal connection , Cross connect

  • External connection

    External connection is mainly divided into left external connection (LEFT JOIN)、 Right connection (RIGHT JOIN)、 Full outer join .

    The left outer join : Display all the data in the left table and the qualified data in the right table , The unqualified data in the right table is null.

 Insert picture description here

Right connection : Display all the data in the left table and the qualified data in the right table , The unqualified data in the right table is null.

 Insert picture description here

MySQL All external connections are not supported in .

  • Internal connection : Only the data that meets the conditions are displayed

 Insert picture description here

  • Cross connect : A connection using Cartesian product .

    The cartesian product , Baidu Encyclopedia's explanation : Two sets X and Y The Cartesian product of is expressed as X × Y, The first object is X And the second object is Y One of the members of all possible ordered pairs . for example :A={a,b},B={0,1,2},A × B = {(a,0),(a,1),(a,2),(b,0),(b,1),(b,2)}

Examples are as follows : There are two tables divided into L Table and R surface .

L surface

a1 b1
a2 b2
a3 b3

R surface

b1 c1
b2 c2
b4 c3
  • The left outer join :select L.`*`,R.`*` from L left join R on L.b=R.b

    A B B C
    a1 b1 b1 c1
    a2 b2 b2 c2
    a3 b3 null null
  • Right connection :select L.`*`,R.`*` from L right join R on L.b=R.b

    B C A B
    b1 c1 a1 b1
    b2 c2 a2 b2
    b4 c3 null null
  • Internal connection :select L.`*`,R.`*` from L inner join R on L.b=R.b

    A B B C
    a1 b1 b1 c1
    a2 b2 b2 c2
  • Cross connect :select L.`*`,R.`*` from L,R

    A B B C
    a1 b1 b1 c1
    a1 b1 b2 c2
    a1 b1 b4 c3
    a2 b2 b1 c1
    a2 b2 b2 c2
    a2 b2 b4 c3
    a3 b3 b1 c1
    a3 b3 b2 c2
    a3 b3 b4 c3

mysql in in and exists The difference between ? **

in and exists Generally used for subqueries .

  • Use exists The appearance query will be performed first , Bring the data of each row into the inner table query to see if the conditions are met ; Use in Generally, we will first query the inner table to get the result set , Then query the surface to match the result set , Return the data .
  • in Indexes are used in both inner table and outer table queries .
  • exists The index is only used when querying the inner table
  • Generally speaking , When the result set of a subquery is large , Small in appearance, use exist More efficient ; When a subquery finds a smaller result set , When the appearance is bigger , Use in More efficient .
  • about not in and not exists,not exists Efficient than not in The high efficiency , Independent of the result set of the subquery , because not in For both internal and external tables, full table scanning is carried out , Indexes are not used .not exists The index on the table can be used in the subquery of .

varchar and char The difference between ? ***

  • varchar It means longer ,char It means the length is fixed . When the inserted characters exceed their length , In strict mode , The insertion is rejected and an error message is prompted , In general mode , Will intercept and insert . Such as char(5), No matter what the length of the inserted character is , All the lengths are 5, Insert characters less than 5, Then fill in the blanks . about varchar(5), If the length of the inserted character is less than 5, The length of the stored character is the length of the inserted character , It doesn't fill .
  • Different storage capacity , about char Come on , The maximum number of characters that can be stored is 255. about varchar, The maximum number of characters that can be stored is 65532.
  • Different storage speeds ,char Fixed length , Storage speed will be faster than varchar faster , But it takes up extra space in terms of space , It is a strategy of space for time . and varchar Space utilization will be higher , But storage is slow , It's a strategy of time for space .

MySQL in int(10) and char(10) and varchar(10) The difference between ? ***

int(10) Medium 10 Represents the length of the displayed data , and char(10) and varchar(10) Represents the size of the stored data .

drop、delete and truncate The difference between ? **

drop delete truncate
Speed fast Delete line by line , slow Faster
Roll back Cannot roll back Roll back Cannot roll back
Delete content Delete the entire table , data row 、 Indexes will be deleted The structure of the watch is still , Delete some or all of the data in the table The structure of the watch is still , Delete all data from the table

In general , Delete the entire table , Use drop, Delete part of the table data using delete, Keep the table structure and delete all the data of the table truncate.

UNION and UNION ALL The difference between ? **

union and union all The role of both is to merge two result sets together .

  • union The results are de duplicated and sorted ,union all Directly return the merged result , No duplication, no sorting .
  • union all Performance ratio of union Good performance .

What is a temporary watch , When will the temporary table be used , When to delete the temporary table ? *

MySQL In execution SQL Statement will temporarily create some tables to store intermediate result sets , This kind of table is called a temporary table , Temporary tables are only visible to the current connection , After the connection is closed , Temporary tables are deleted and free up space .

Temporary table is mainly divided into memory temporary table and disk temporary table . The memory temporary table uses MEMORY Storage engine , The disk temporary table uses MyISAM Storage engine .

Temporary tables are usually used in the following situations :

  • FROM Subqueries in
  • DISTINCT Query and add ORDER BY
  • ORDER BY and GROUP BY The temporary table will be generated when the clauses are different
  • Use UNION Queries produce temporary tables

How to optimize big table data query ? ***

  • Index optimization
  • SQL Statements to optimize
  • Horizontal split
  • Split Vertically
  • Create an intermediate table
  • Using caching technology
  • Fixed length tables are faster to access
  • The smaller the column, the faster the access

Learn about slow log queries ? Is the statistics too slow ? How to optimize slow queries ? ***

Slow queries are generally used to record queries whose execution time exceeds a critical value SQL Statement log .

Related parameters :

  • slow_query_log: Whether to enable slow log query ,1 Open for indication ,0 Means closing .
  • slow_query_log_file:MySQL Database slow query log storage path .
  • long_query_time: Slow query threshold , When SQL Statement query time is greater than the threshold , It will be recorded in the log .
  • log_queries_not_using_indexes: Queries that are not indexed are logged to the slow query log .
  • log_output: How to store logs .“FILE” Means to save the log to a file .“TABLE” Indicates that the log is stored in the database .

How to optimize slow queries ?

  • Analysis statement execution plan , see SQL Whether the index of the statement hits
  • Optimize the structure of the database , Decompose a table with many fields into multiple tables , Or consider building intermediate tables .
  • Optimize LIMIT Pagination .

Why set the primary key ? **

A primary key is a unique identifier that uniquely distinguishes each row in a table , If there is no primary key , Updating or deleting specific rows in a table can be difficult , Because you can't uniquely and accurately identify a line .

The primary key is usually self incrementing ID still UUID? **

Use self increasing ID The benefits of :

  • The length of the field is longer uuid It will be much smaller .
  • Automatic database numbering , Store in order , Easy to retrieve
  • Don't worry about duplicate primary keys

Use self increasing ID The shortcomings of :

  • Because it's self increasing , In some business scenarios , It's easy for others to find the business volume .
  • When data migration occurs , Or table merging can be very cumbersome
  • In a high concurrency scenario , Competing for self incrementing locks will reduce the throughput of the database

UUID: Universal unique identification code ,UUID It's based on the current time 、 Counter and hardware identification data are generated by calculation .

Use UUID The advantages of :

  • Unique identification , I don't think about repetition , In data splitting 、 Global uniqueness can also be achieved when merging .
  • It can be generated in the application layer , Improve the throughput of the database .
  • No need to worry about the leakage of business volume .

Use UUID The shortcomings of :

  • because UUID It's randomly generated , So random things happen IO, Affects insertion speed , And it will cause low utilization of hard disk .
  • UUID Large space occupation , The more indexes you build , The greater the impact .
  • UUID The size of the comparison between them is more self increasing ID It's a lot slower , Affect query speed .

Finally, let's draw a conclusion , General situation MySQL Self increasing is recommended ID. Because in MySQL Of InnoDB In the storage engine , A primary key index is a clustered index , Primary key index B+ The leaf nodes of the tree store the primary key values and data in order , If the primary key index is self increasing ID, Just put it back in order , If it is UUID,ID It's randomly generated , When data is inserted, it will cause a lot of data movement , Generate a lot of memory fragmentation , It causes the performance degradation of insertion .

Why should the field be set to not null? **

First of all, let's say ,NULL It's not the same as null , A null value does not take up space , and NULL It takes up space , So the field is set to NOT NULL After that, you can still insert null values .

The field is set to not null There are mainly the following reasons :

  • NULL Values affect the statistics of some functions , Such as count, encounter NULL value , This record won't count .

  • B Trees don't store NULL, So the index doesn't use NULL, It will cause the problems mentioned in the first point that can't be counted .

  • NOT IN Subquery in NULL In this case, all the returned results are null .

    for example user The table is as follows

    id username
    0 zhangsan
    1 lisi
    2 null

    select * from `user` where username NOT IN (select username from `user` where id != 0), This query should find zhangsan This data , But it turns out that null.

  • MySQL In comparison ,NULL Will participate in the comparison of fields , because NULL It is a special data type , Special number processing is needed in database processing , Increases the complexity of database processing records .

How to optimize the data access in the query process ? ***

In terms of reducing data access :

  • Use index correctly , Try to cover the index as much as possible
  • Optimize SQL Implementation plan

In terms of returning less data :

  • Data paging
  • Only return the required fields

From reducing servers CPU Cost considerations :

  • Rational use of sorting
  • Reduce the operation of comparison
  • Complex operations are handled on the client side

In terms of increasing resources :

  • Client multi process parallel access
  • Database Parallel Processing

How to optimize long and difficult query statements ? **

  • Decompose a large query into several small queries
  • Decompose associated query , Make caching more efficient

How to optimize LIMIT Pagination ? **

  • stay LIMIT When the offset is large , Query efficiency will be lower , You can record the maximum number of... Taken out each time ID, Next time you query, you can use ID The query

  • Build composite index

How to optimize UNION Inquire about **

If you do not need to de duplicate or sort the result set, it is recommended to use UNION ALL, It will be better .

How to optimize WHERE Clause ***

  • Not in where Used in clauses != and <> Making decisions is not the same as judging , This will result in abandoning the index for a full table scan .
  • Not in where Used in clauses null Or null judgment , Try to set the field to not null.
  • Use as much as possible union all Instead of or
  • stay where and order by Index the columns involved
  • Minimize the use of in perhaps not in, A full scan will be performed
  • stay where Using parameters in Clause results in a full table scan
  • To avoid the where Operation on fields and expressions or functions in clause will cause the storage engine to abandon the import and scan the whole table

SQL What's the reason for the slow execution of statements ? ***

  • If SQL Statements are only occasionally slow , Maybe the lock was encountered during execution , It could be redo log The log is full of , To put redo log Synchronize the data in the database to the disk .
  • If SQL Sentences are always slow , Maybe there is no index on the field, or the field has an index but it doesn't use an index .

SQL The order in which statements are executed ? *

right_table ON join_condition

The order of execution is as follows :

 Insert picture description here

  • FROM: Yes SQL Statement when executing a query , First, join the tables on both sides of the keyword in the form of Cartesian product , And produce a virtual table V1. A virtual table is a view , The data will come from the execution results of multiple tables .

  • ON: Yes FROM The result of the connection is ON Filter , And create a virtual table V2

  • JOIN: take ON Add the filtered left table , And create a new virtual table V3

  • WHERE: For virtual tables V3 Conduct WHERE Screening , Create a virtual table V4

  • GROUP BY: Yes V4 Group the records in , Create a virtual table V5

  • HAVING: Yes V5 To filter , Create a virtual table V6

  • SELECT: take V6 The results in are based on SELECT Screening , Create a virtual table V7

  • DISTINCT: Yes V7 The results in the table are de duplicated , Create a virtual table V8, If used GROUP BY Clause, you don't need to use DISTINCT, Because grouping is to group the only values in a column , And each group returns only one line of records , So all the records are h Is different .

  • ORDER BY: Yes V8 Sort the results in the table .

Database optimization

How to optimize the big watch ? ***

  • Limit the range of data : Avoid query statements without any restrictions on the range of data .
  • Read / write separation : The main database is responsible for writing , Read from the library .
  • Vertical sub table : Divide a table into multiple tables according to fields , Each table stores some of these fields .
  • Horizontal sub table : In the same database , Split the data of a table into multiple tables according to certain rules .
  • Optimize a single table : For fields in the table 、 Indexes 、 Inquire about SQL To optimize .
  • Add cache

What is a vertical sub table 、 Vertical sub database 、 Horizontal sub table 、 Horizontal sub database ? ***

Vertical sub table : Divide a table into multiple tables according to fields , Each table stores some of these fields . Generally, common fields are put into a table , Put the less frequently used fields in another table .

The advantage of vertical scale :

  • avoid IO Competition reduces the probability of locking tables . Because large fields are less efficient , First, there's a lot of data , It takes a long time to read . second , Large fields take up more space , Fewer rows are stored in a single page , Will make IO More operations .

  • It can better improve the query efficiency of popular data .

Vertical sub database : Classify tables by business , Deploy to different databases , Different databases can be put on different servers .

The advantage of vertical sub database :

  • Reduce coupling in the business , It is convenient to manage different businesses at different levels .
  • Can improve IO、 Number of database connections 、 Solve the bottleneck problem of stand-alone hardware resources .

Split Vertically ( sub-treasury 、 table ) The shortcomings of :

  • Primary key redundancy , Need to manage redundant columns
  • The processing of transactions becomes complicated
  • There is still a problem of large amount of data in a single table

Horizontal sub table : In the same database , Split the data of the same table into multiple tables according to certain rules .

The advantage of the horizontal scale :

  • The problem of large amount of data in a single table is solved
  • avoid IO Compete and reduce the probability of locking tables

Horizontal sub database : Split the data of the same table into different databases according to certain rules , Different databases can be put on different servers .

The advantage of horizontal sub pool :

  • It solves the bottleneck problem of large amount of data in a single database
  • IO Less conflict , Less competition for locks , Problems in one database do not affect other databases ( Usability ), Improve the stability and availability of the system

Horizontal split ( table 、 sub-treasury ) The shortcomings of :

  • Fragment transaction consistency is difficult to solve
  • Cross node JOIN Poor performance , The logic gets complicated
  • Data expansion is difficult , Difficult to maintain

In the system design, we should determine the scheme of vertical sub database and vertical sub table according to business coupling , Cache should be considered when data access pressure is not particularly high 、 Read write separation and other methods , If there's a lot of data , Or continuous growth can consider the level of sub library sub table , The logic involved in horizontal splitting is complex , Common solutions include client architecture and proxy architecture .

After the sub-library sub-table ,ID How to handle ? ***

After the database is divided into tables, each table cannot be ID from 1 Start , So we need a whole situation ID, Setting global ID There are mainly the following methods :

  • UUID: advantage : Locally generated ID, No remote call is required ; Global uniqueness, no repetition . shortcoming : Large space , Not suitable for indexing .

  • Database autoincrement ID: Use database auto increment after sub database and sub table ID, You need a library dedicated to generating primary keys , Every time the service receives a request , First insert a meaningless piece of data into this library , Get a database self incrementing ID, Use this ID Write data in sub database and sub table . advantage : Easy to implement . shortcoming : There is a bottleneck in the high parallel development . The system structure is as follows ( This is from the Internet )

 Insert picture description here

  • Redis Generate ID: advantage : Database independent , Good performance . shortcoming : The introduction of new components will increase the complexity of the system

  • Twitter Of snowflake Algorithm : It's a 64 Bit long Type ID, Among them is 1bit No need ,41bit As milliseconds ,10bit As working machine ID,12bit As serial number .

    1bit: first bit The default is 0, Because the first one in binary bit by 1 It's a negative number , however ID It can't be negative .

    41bit: Time stamp , In milliseconds .

    10bit: Record the working machine ID, among 5 individual bit It means the computer room ID,5 individual bit Represents a machine ID.

    12bit: It is used to record the differences generated in the same millisecond ID.

  • US mission Leaf Distributed ID generating system , Meituan comments distributed ID generating system

MySQL The principle and process of replication ? How to realize master-slave replication ? ***

MySQL Copy : In order to ensure the data consistency between master server and slave server , After inserting data into the primary server , The slave server will automatically synchronize the modified data from the master server .

The principle of master-slave replication :

There are three main threads in master-slave replication :binlog Threads ,I/O Threads ,SQL Threads .

  • binlog Threads : Responsible for writing data changes on the master server to the binary log (Binary log) in .
  • I/O Threads : Responsible for reading binary logs from the primary server (Binary log), And write to the relay log of the slave server (Relay log) in .
  • SQL Threads : Responsible for reading relay logs , Resolve the data changes that have been executed in the master server and replay them in the slave server

The replication process is as follows ( This is from the Internet ):

 Insert picture description here

  1. Master Before each transaction updates the data , Write the operation record to binlog in .
  2. Slave Connect from library Master Main library , also Master How many Slave How many binlog dump Threads . When Master Node binlog When something changes ,binlog dump All of them will be informed Slave, And the corresponding binlog Send to Slave.
  3. I/O Thread received binlog After content , Write it to the relay log (Relay log) in .
  4. SQL Thread read relay log , And replay it from the server .

Here is an easy to understand picture .
 Insert picture description here

The role of master-slave replication :

  • High availability and failover
  • Load balancing
  • The data backup
  • Upgrade test

Do you understand the separation of reading and writing ? ***

Read write separation mainly depends on master-slave replication , Master slave replication serves for the separation of read and write .

The advantage of separation of reading and writing :

  • The master server is responsible for writing , Read from the server , Alleviates lock competition
  • From the server you can use MyISAM, Improve query performance and save system overhead
  • Add redundancy , Improve availability

  1. 【计算机网络 12(1),尚学堂马士兵Java视频教程
  2. 【程序猿历程,史上最全的Java面试题集锦在这里
  3. 【程序猿历程(1),Javaweb视频教程百度云
  4. Notes on MySQL 45 lectures (1-7)
  5. [computer network 12 (1), Shang Xuetang Ma soldier java video tutorial
  6. The most complete collection of Java interview questions in history is here
  7. [process of program ape (1), JavaWeb video tutorial, baidu cloud
  8. Notes on MySQL 45 lectures (1-7)
  9. 精进 Spring Boot 03:Spring Boot 的配置文件和配置管理,以及用三种方式读取配置文件
  10. Refined spring boot 03: spring boot configuration files and configuration management, and reading configuration files in three ways
  11. 精进 Spring Boot 03:Spring Boot 的配置文件和配置管理,以及用三种方式读取配置文件
  12. Refined spring boot 03: spring boot configuration files and configuration management, and reading configuration files in three ways
  13. 【递归,Java传智播客笔记
  14. [recursion, Java intelligence podcast notes
  15. [adhere to painting for 386 days] the beginning of spring of 24 solar terms
  16. K8S系列第八篇(Service、EndPoints以及高可用kubeadm部署)
  17. K8s Series Part 8 (service, endpoints and high availability kubeadm deployment)
  18. 【重识 HTML (3),350道Java面试真题分享
  19. 【重识 HTML (2),Java并发编程必会的多线程你竟然还不会
  20. 【重识 HTML (1),二本Java小菜鸟4面字节跳动被秒成渣渣
  21. [re recognize HTML (3) and share 350 real Java interview questions
  22. [re recognize HTML (2). Multithreading is a must for Java Concurrent Programming. How dare you not
  23. [re recognize HTML (1), two Java rookies' 4-sided bytes beat and become slag in seconds
  24. 造轮子系列之RPC 1:如何从零开始开发RPC框架
  25. RPC 1: how to develop RPC framework from scratch
  26. 造轮子系列之RPC 1:如何从零开始开发RPC框架
  27. RPC 1: how to develop RPC framework from scratch
  28. 一次性捋清楚吧,对乱糟糟的,Spring事务扩展机制
  29. 一文彻底弄懂如何选择抽象类还是接口,连续四年百度Java岗必问面试题
  30. Redis常用命令
  31. 一双拖鞋引发的血案,狂神说Java系列笔记
  32. 一、mysql基础安装
  33. 一位程序员的独白:尽管我一生坎坷,Java框架面试基础
  34. Clear it all at once. For the messy, spring transaction extension mechanism
  35. A thorough understanding of how to choose abstract classes or interfaces, baidu Java post must ask interview questions for four consecutive years
  36. Redis common commands
  37. A pair of slippers triggered the murder, crazy God said java series notes
  38. 1、 MySQL basic installation
  39. Monologue of a programmer: despite my ups and downs in my life, Java framework is the foundation of interview
  40. 【大厂面试】三面三问Spring循环依赖,请一定要把这篇看完(建议收藏)
  41. 一线互联网企业中,springboot入门项目
  42. 一篇文带你入门SSM框架Spring开发,帮你快速拿Offer
  43. 【面试资料】Java全集、微服务、大数据、数据结构与算法、机器学习知识最全总结,283页pdf
  44. 【leetcode刷题】24.数组中重复的数字——Java版
  45. 【leetcode刷题】23.对称二叉树——Java版
  46. 【leetcode刷题】22.二叉树的中序遍历——Java版
  47. 【leetcode刷题】21.三数之和——Java版
  48. 【leetcode刷题】20.最长回文子串——Java版
  49. 【leetcode刷题】19.回文链表——Java版
  50. 【leetcode刷题】18.反转链表——Java版
  51. 【leetcode刷题】17.相交链表——Java&python版
  52. 【leetcode刷题】16.环形链表——Java版
  53. 【leetcode刷题】15.汉明距离——Java版
  54. 【leetcode刷题】14.找到所有数组中消失的数字——Java版
  55. 【leetcode刷题】13.比特位计数——Java版
  56. oracle控制用户权限命令
  57. 三年Java开发,继阿里,鲁班二期Java架构师
  58. Oracle必须要启动的服务
  59. 万字长文!深入剖析HashMap,Java基础笔试题大全带答案
  60. 一问Kafka就心慌?我却凭着这份,图灵学院vip课程百度云