Speaking of MySQL Optimized words , Most people must be familiar with it . We have already remembered these key words in our memory reserve : Avoid using SELECT*、 Avoid using NULL Value judgment 、 Index as needed 、 Optimize MySQL Parameters ...... But do you really understand these optimization techniques and how they work ? Can you fully apply it to our development process ? I think it remains to be seen . therefore , This article will introduce in detail MySQL Optimization techniques and their corresponding technical principles , I hope that after reading it , Can understand these optimization schemes more clearly and directly , And apply it to our jobs .


Principle analysis


MySQL Framework principle

Before that, we need to understand MySQL The working structure of each part of the component , It's easy for us to understand MySQL The server . Here's the picture :


MySQL The work structure can be roughly divided into three layers , At the top is the client , such as : Connection processing 、 Authorized certification 、 Security and other functions are handled in this layer .

MySQL Its core services are in the middle layer , Including query parsing 、 analysis 、 Optimize 、 cache 、 Built in functions . All cross storage engine functions are also implemented at this level : stored procedure 、 trigger 、 View etc. .

The bottom layer is responsible for the storage engine MySQL Data storage and extraction in . and Linux The file system under is similar to , Each storage engine has its advantages and disadvantages . The middle service layer passes through API Communicating with the storage engine , these API The interface masks the differences between different storage engines .


MySQL Query principle

We want to make MySQL Get higher query performance , First of all, we need to understand MySQL How to optimize and execute queries . Let's take a look first , When we turn to MySQL When you send a request , What did it do ?


client / Server communication protocol

MySQL client / Server side communication protocols don't happen at the same time , At any moment , Or the server sends data to the client , Either the client sends data to the server , Once one end starts sending messages , The other end has to receive the entire message to respond to it , So we can't and don't have to cut a message into small pieces and send it independently , There's no way to control the flow .

The client sends the query request to the server in a separate packet , So when the query statement is very long , Need to set up max_allowed_packet Parameters . But here's the thing , If the inquiry is really too big , The server will refuse to receive more data and throw an exception .

The opposite is true , There is usually a lot of data that the server responds to the user , It's made up of multiple packets . But when the server responds to a client request , The client must receive the whole returned result completely , Instead of simply taking the first few results , Then let the server stop sending . So in actual development , Try to keep the query simple and return only the necessary data , It's a good habit to reduce the size and number of packets between communications , This is also the query to avoid the use of SELECT* And add LIMIT One of the reasons for the restriction .

The query cache

Before parsing a query statement , If the query cache is turned on , that MySQL Will check whether the query statement hits the data in the query cache . If the current query just hits the query cache , Directly return the result in the cache after checking the user right once . In this case , The query will not be parsed , No execution plan will be generated , Not to carry out .

MySQL Store the cache in a reference table ( Don't understand it as table, It can be thought of as similar to HashMap Data structure of ), Index through a hash value , This hash value passes through the query itself 、 The current database to query 、 The client protocol version number and other information that may affect the result can be calculated . So the two queries are different in any character ( for example : Space 、 notes ), Will cause the cache to miss .

If the query contains any user-defined functions 、 Storage function 、 User variables 、 A temporary table 、mysql The system table in the library , Its query results Will not be cached . Like functions NOW() perhaps CURRENT_DATE() Because of different query time , Return different query results , Another example includes CURRENT_USER perhaps CONNECION_ID() The query statement of will return different results because of different users , There is no point in caching such query results .

Since it's a cache , It will fail. , When does the query cache fail ?MySQL The query caching system of will track every table involved in the query , If these watches ( Data or structure ) change , Then all cached data related to this table will be invalidated . Because of that , In any write operation ,MySQL All caches of the corresponding table must be set to fail . If the query cache is very large or fragmented , This operation may bring a lot of system consumption , It even causes the system to freeze for a while . Moreover, the extra consumption of query cache to the system is not only in the write operation , Read operations are no exception :

  1. Any query statement must be checked before it starts , Even this one SQL Statement will never hit cache

  2. If the query results can be cached , So after execution , The results are cached , There will also be additional system consumption

Based on this , We need to know that query caching does not always improve system performance , Caching and invalidation will bring extra consumption , Only when the resource saving brought by cache is greater than the resource consumed by itself , To improve the performance of the system . But how to evaluate whether opening cache can bring performance improvement is a very difficult thing , It's not in the scope of this article . If the system does have some performance problems , You can try to open the query cache , And make some optimization in database design , such as :

  1. Replace a large watch with several small watches , Be careful not to over design

  2. Batch insert instead of circular single insert

  3. Reasonably control the size of cache space , Generally speaking, its size is set to tens of megabytes

  4. Can pass  SQL_CACHE and  SQL_NO_CACHE To control whether a query statement needs to be cached

The final advice is not to open the query cache easily , Especially write intensive applications . If you can't help it , Can be query_cache_type Set to DEMAND, At this time, only joining in SQL_CACHE The query will go to cache , Other queries do not , This gives you very free control over which queries need to be cached .

Of course, the query caching system itself is very complex , What is discussed here is only a small part , Other deeper topics , such as : How does cache use memory ? How to control the fragmentation of memory ? How does the transaction affect the query cache and so on , Readers can read relevant materials by themselves , Here's the right to throw a brick and draw a jade .

Parsing and preprocessing

MySQL Pass the keyword will SQL Statement parsing , And generate a corresponding parse tree . This process parser mainly validates and parses through syntax rules . such as SQL Whether the wrong keywords are used or the order of keywords is correct, etc . Pretreatment will be based on MySQL The rule further checks that the parse tree is legal . For example, check whether the data table and data column to be queried exist .


Query optimization

The syntax tree generated through the previous steps is considered legal , And the optimizer turns it into a query plan . Most of the time , A query can be executed in many ways , At the end of the day, they all return the corresponding results . The role of the optimizer is to find the best execution plan .

MySQL Use a cost based optimizer , It attempts to predict the cost of a query using an execution plan , And choose the one with the lowest cost . stay MySQL You can query the last_query_cost To calculate the cost of the current query .

mysql> select * from t_message limit 10;... Omit the result set
mysql> show status like 'last_query_cost';+-----------------+-------------+| Variable_name   | Value       |+-----------------+-------------+| Last_query_cost | 6391.799000 |+-----------------+-------------+

The results in the example indicate that the optimizer thinks it needs to do something about 6391 Random search of data pages can complete the above query . This result is based on the statistics of some columns , These statistics include : Number of pages per table or index 、 The cardinality of the index 、 Index and data row length 、 Index distribution, etc .

There are many reasons for MySQL Choose the wrong execution plan , For example, the statistics are not accurate 、 No consideration will be given to operating costs that are not under its control ( User defined functions 、 stored procedure )、MySQL The best we think is different from what we think ( We want the execution time to be as short as possible , but MySQL It's worth choosing what it thinks is less expensive , But a small cost doesn't mean a short execution time ) wait .

MySQL The query optimizer for is a very complex part , It uses a lot of optimization strategies to generate an optimal execution plan :

  • Redefine the order in which tables are associated ( When multiple tables are associated for query , Not necessarily according to SQL In the order specified in , But there are some techniques for specifying the order of Association )

  • Optimize  MIN() and  MAX() function ( Find the minimum value of a column , If the column has an index , Just look for B+Tree The leftmost end of the index , On the contrary, we can find the maximum , See below for details )

  • Terminate query ahead of time ( such as : Use Limit when , The query will be terminated immediately after finding a result set that meets the number )

  • Optimization of sorting ( In the old version MySQL Will use two transfers to sort , That is, first read the row pointer and the fields to be sorted and sort them in memory , Then read the data lines according to the sorting results , The new version uses a single transmission sequence , That is to read all data lines at once , Then sort according to the given column . about I/O Intensive application , It's a lot more efficient )

With MySQL Continuous development , The optimization strategy used by the optimizer is also evolving , Here are just a few very common and easy to understand optimization strategies , Other optimization strategies , Let's check it by ourselves .

Query execution engine

After the parsing and optimization phase ,MySQL The corresponding execution plan will be generated , The query execution engine executes the instructions given by the execution plan step by step to get the result . Most operations of the whole execution process are completed by calling the interface implemented by the storage engine , These interfaces are called handler API. Each table in the query process consists of a handler An example shows . actually ,MySQL In the query optimization phase, we create a handler example , The optimizer can get information about tables based on the interfaces of these instances , Include all column names of the table 、 Index statistics, etc . The storage engine interface provides a lot of functions , But there are only dozens of interfaces at the bottom , These interfaces complete most operations of a query like building blocks .

Returns the result to the client

The last stage of query execution is to return the result to the client . Even if you can't find the data ,MySQL The information about this query will still be returned , For example, the number of rows affected by the query and the execution time .

If the query cache is turned on and the query can be cached ,MySQL The results will also be stored in the cache .

The result set return to the client is an incremental and step-by-step process . There may be MySQL When generating the first result , Start to return the result set to the client step by step . In this way, the server does not need to store too many results and consume too much memory , You can also let the client get the return result in the first time . It should be noted that , Each row in the result set is satisfied with one ① The packet transmission of the communication protocol described in , Re pass TCP Protocol for transmission , During transmission , Is likely to MySQL The packets are cached and sent in bulk .

Let's sum up later MySQL The whole query execution process , Generally speaking, it is divided into 6 A step :

  1. Client to MySQL The server sends a query request

  2. The server first checks the query cache , If hit cache , Returns the result stored in the cache immediately . Otherwise move on to the next stage

  3. Server run SQL analysis 、 Preprocessing 、 Then the optimizer generates the corresponding execution plan

  4. MySQL According to the execution plan , To invoke the storage engine API To execute the query

  5. Returns the result to the client , Also cache query results


Optimization plan


Optimization Suggestions

Looking at the above principles , You must be clear by now MySQL How it works , Next, I will optimize it from the following aspects .

SQL Optimize

We need to pay attention to these points : Choose the one that needs to be optimized first SQL, Usually from Explain and Profile set out , Always drive large result sets with small result sets , Sort in the index , Use the smallest Columns, Use the most effective filtering conditions , Avoid complex JOIN And subquery . The specific optimization is as follows :

analysis SQL Frequency of execution

show status
for example : It's mainly analytical reading , It's mainly about writing 

Inefficient positioning SQL

 Slow query log location
-log-slow-queries = xxx( Specify a filename )
View the current thread in progress , Including thread state 、 Whether to lock the table 

analysis SQL Implementation plan

explain "your sql"
desc "your sql"
-  Partial parameter analysis
SIMPLE  A simple watch , Do not use table joins or subqueries
PRIMARY  Main query , That is, the outer query
SUBQUER  The first subquery select
ALL  Full table scan
index  Index full scan
range  Index range scan
ref  Scan with non unique index or prefix of unique index
eq_ref  similar ref, The index used is the only index
const/system  At most one matching row in a single table
NULL  No access to tables or indexes , Direct result 

show profile analysis SQL

select @@have_profiling  Do you support
select @@profiling  Open or not
perform  "your sql"
show profiles 
show profile block io for QUERY 17

Index optimization

Here I will analyze from three aspects , It's the type , Method , establish :



Examples are as follows :

Storage classification of index

B-TREE Indexes : common , Most of them support
HASH Indexes : Only memory Engine support
R-TREE Indexes : The spatial index is MyISAM A special index type of , Mainly used for geospatial data types
full-text Indexes : Full-text index ,MyISAM A special index type of ,innodb from 5.6 Start supporting 

Index creation and deletion

 Add index
ALTER Table `table_name` ADD PRIMARY KEY(`column`)
ALTER Table `table_name` ADD UNIQUE(`column`)
ALTER Table `table_name` ADD INDEX(`column`)
ALTER Table `table_name` ADD FULLTEXT(`column`)
ALTER Table `table_name` drop index index_name

MySQL The ability to use an index

 Match full value
Matching value range query
Match the leftmost prefix
Just query the index ( Overlay query )
Match column prefix  ( Add prefix index )
Partially accurate + Part of the scope 

When you can't use an index

 With % On off like Inquire about
Implicit conversion of data type
The composite index query condition does not contain the leftmost part
Using an index is still slower than a full table scan
use or The conditions of separation 

Statements to optimize

Periodic optimization table

optimize table table_name  Merge table space fragments , Yes MyISAM、BDB、INNODB It works
If the prompt does not support , It can be used  mysql --skip-new  perhaps  mysql --safe-mode  To restart , So that other engines can support 

Common optimizations

 Try to avoid full scan , Yes where And orderby Index the columns of
Try to avoid where Use  !=  or  <>
Try to avoid where Clause with  or  Connection condition
Misuse % Causes a full table scan
Try to avoid where Clause to perform expression operations on fields
Try to avoid where Clause to perform function operations on fields
Overlay query , Return the required fields
Optimize nested queries , Associative queries are better than subqueries
Composite index or composite index , The leftmost index principle
use exist Instead of in
When the index column has a lot of duplicate data ,SQL Queries may not use indexes 

JOIN The optimization of the

JOIN principle

 stay mysql Use in Nested Loop Join To achieve join;
   A JOIN B: adopt A The result set of the table serves as the basis for the loop , One by one, the data in the result set is used as the filter condition to query the data in the next table , Then merge the results 

JOIN Optimization principle

1, Reduce... As much as possible Join  Statement Nested Loop  The total number of cycles , Driving big result sets with small result sets ;
2, Prioritize Nested Loop  The inner circle of ;
3, Guarantee Join  Statement on the driven table Join  The condition field has been indexed ;
4, expand join buffer Size ;

Database object optimization

Optimize table data type

PROCEDURE ANALYSE (16,256)  Exclude more than 16 individual , Greater than 256 Bytes of ENUM Suggest

Table split

 Split Vertically
For some columns 、 Some columns are not commonly used
Horizontal split
The watch is big
The data in the table is independent , It can be classified simply
Need to store a variety of media in the table 

Anti paradigm

 Add redundant columns 、 Add derived Columns 、 Regroup and split tables 

Use the middle watch

 Large amount of data query
Data statistics 、 Analyze the scene 

Optimize MySQL The server

MySQL Engine comparison :


The command of the table engine :

show engines;  see myql Supported storage engines
show variables like '%storage_engine';  see mysql The default storage engine
show create table table_name  See the storage engine used by the specific table 


1.  Provide transactions 、 Roll back 、 System crash repair capability 、 Multi version concurrent control transactions
2.  Support self adding column
3.  Support foreign keys
4.  Support transactions and their associated functions
5.  Support mvcc Row level lock of 


1.  Unsupported transaction 、 Row level locks are not supported , Only concurrent insert table locks are supported , Mainly used for high load select
2.  Supports three different storage structures : static state 、 dynamic 、 Compress 

MySQL Concurrent parameter adjustment

MySQL Concurrent parameters

max_connections  maximum connection , Default 151
back_log  Deal with a lot of connections in a short time , It can be increased appropriately
table_open_cache  Control all SQL The number of table caches that the execution thread can open , Constrained by other parameters
thread_cache_size  Control the number of cache client service threads , Speed up database connections , according to threads_created/connections To measure the appropriateness of
innodb_lock_wait_timeout  Control transaction waiting time for row lock , Default 50ms

Optimize memory to improve MySQL Background services

MyISAM Memory optimization

# Modify the configuration file of the corresponding server location  my.cnf
decision myisam Size of index block cache , It directly affects the access efficiency of the table , Suggest 1/4 Available memory
read_buffer  Read cache
write_buffer  Write cache 

InnoDB Memory optimization

innodb_buffer_pool_size  Maximum cache size to store engine table data and index data
innodb_old_blocks_pct LRU Algorithm   decision old sublist The proportion of
innodb_old_blocks_time LRU Algorithm   Data transfer interval 

Application optimization

Reason for use

 The importance of data
mysql  Service and its own performance bottleneck
Ensure the stable and reliable operation of large-scale system 

An optimization method

 Use connection pool
Reduce to  mysql  The real connection of
a.  Avoid duplicate execution of the same data ( The query cache )
b.  Use  mysql  cache (sql  cache )
Load balancing
a. LVS  Distributed
b.  Read / write separation ( Master master copy 、 Master slave replication ensures data consistency 

This period of MySQL Here comes the optimization explanation , Of course, the optimization plan is much more than that , You are also welcome to discuss in the message area , I hope this issue of the article can play a positive role in the process of daily development .


If you think that asamura's article will help you , Please search and follow on wechat 「 Shallow feathered IT hut 」 WeChat official account , I'll share my computer information knowledge here 、 Theoretical Technology 、 Tool resources 、 The software is introduced 、 The backend development 、 interview 、 A series of articles, such as thoughts on work and some thoughts on life . What you see and what you get , It's all about life . take your time , Work harder , You and I grew up together ...


I set up a technology group , If you want to know more about IT The technology of the industry and the problems in life , Welcome to join the group , Just add my wechat , Note that you can enter the group , We look forward to your participation .


Previous recommendation


hardcore !15 Zhang diagram Redis Why so soon?


We media people's essential multi platform synchronization 、 More than one article


Chrome Browser plug-in recommended programmer series


Free working platform for programmers


Programmer essential technology website Collection


Things about image processing


Shallow feather

Message area


Point a praise , Prove that you still love me