Python The actual combat community
Java The actual combat community
Long press to identify the QR code below , Add as needed
Scan code, pay attention to add customer service
Into the Python community ▲
Scan code, pay attention to add customer service
Into the Java community ▲
The author of this book is grass kneading
From the grass pinch （ID：chaycao）
Studying recently MySQL Knowledge of optimization . In this paper, data types and schema The optimization of aspects is introduced .
1. Choose the optimized data type
MySQL There are many supported data types , And how to choose the right data type , It's critical for performance . Here are a few principle Can help determine the data type ：
Smaller ones are usually better
The smallest data type that can store data correctly should be used as far as possible , Enough is good . This will take up less disk 、 Memory and cache , It takes less time to process .
Simple is good
When both data types are capable of storing a field , Choose the simple side , It's often the best choice . For example, integers and strings , Because the operation cost of integers is less than that of characters , So when you choose between the two , Choosing integers usually leads to better performance .
Try to avoid NULL
When the column can be NULL when , about MySQL Come on , More work needs to be done on indexes and value comparisons , Although the impact on performance is not great , But we should also try to avoid being designed to be NULL.
In addition to the above principles , When selecting a data type , What to follow step ： First identify the appropriate large type , For example, data 、 character string 、 Time and so on ; Then choose the specific type . We will discuss some specific types under the large type , First of all, the numbers , There are two types of ： Integers and real numbers .
1.1 Integer types
The type of integer and the space occupied are as follows ：
|Integer types||The size （bit）|
The storage range of integer type depends on the size of space ：-2^(N-1) to 2^(N-1)-1, among N Is the number of digits of space size .
Integer types have UNSIGNED Optional properties for , When declared , Negative numbers are not allowed , Then the storage range becomes ：0 to 2^(N)-1, It's doubled .
stay MySQL in , You can also specify the width for the integer type , for example INT(1), But that doesn't mean much , It doesn't limit the legal range of values , It can still store -2^31 to 2^31-1 Value , What's affected is with MySQL Interactive tools for displaying the number of characters .
1.2 Real number type
The comparison of real number types is as follows ：
|Real number type||The size （Byte）||Value range||Calculation accuracy|
|FLOAT||4||negative ：-3.4E+38～-1.17E-38; Nonnegative number ：0、1.17E-38~3.4E+38||Approximate calculation|
|DOUBLE||8||negative ：-1.79E+308～-2.22E-308; Nonnegative number ：0、2.22E-308~1.79E+308||Approximate calculation|
|DECIMAL||It's about precision||Same as DOUBLE||Calculate accurately|
As can be seen from the above ,FLOAT and DOUBLE There's a fixed amount of space , But at the same time because it uses standard floating point operations , So it can only be calculated approximately . and DECIMAL Then the accurate calculation can be realized , At the same time, it will take up more space , It's also more computationally expensive .
DECIMAL The amount of space occupied depends on the specified precision , for example DECIMAL(M,D)：
M Is the maximum length of the whole number , The value range is [1, 65], The default value is 10;
D Is the length after the decimal point , The value range is [0, 30], And D <= M, The default value is 0.
MySQL In the storage DECIMAL Type is stored as a binary string , Every time 4 Bytes of storage 9 A digital , When insufficient 9 When a , The space occupied by the numbers is as follows ：
|Number of numbers||Occupancy space （Byte）|
Before and after the decimal point will be stored separately , At the same time, the decimal point should also take up 1 Bytes . Here are two examples of calculation ：
DECIMAL(18, 9)： The length of the integral part is 9, Occupy 4 Bytes . The length of the decimal part is 9, Occupy 4 Bytes . And add the decimal point 1 Bytes , The total occupation of 9 Bytes .
DECIMAL(20, 9)： The length of the integral part is 14, Occupy 7（4+3） Bytes . The length of the decimal part is 9, Occupy 4 Bytes . And add the decimal point 1 Bytes , The total occupation of 12 Bytes .
It can be seen that DECIMAL It takes up a lot of space , So only when it is necessary to calculate decimals accurately , To use DECIMAL. besides , We can also use BIGINT Instead of DECIMAL, For example, you need to make sure that after the decimal point 5 The calculation of bits , You can multiply the value by 10 Of 5 To the power of BIGINT Storage , In this way, we can avoid the inaccuracy of floating-point storage calculation and DECIMAL The high cost of accurate calculation .
1.3 String type
The most commonly used string type is VARCHAR and CHAR.VARCHAR As Variable length string , Will use 1 or 2 Extra bytes record the length of the string , When the maximum length does not exceed 255 when , just 1 Byte record length , exceed 255, Then 2 Bytes .VARCHAR Of Applicable scenario ：
The maximum length is much larger than the average length ;
Less column updates , Avoid debris ;
Using complex character sets , Such as UTF-8, Each character can be stored in different bytes .
CHAR Then for Fixed length string , Allocate enough space according to the defined string length , Applicable scenario ：
Short in length ;
Similar in length , for example MD5;
Updated frequently .
except VARCHAR and CHAR, For storing large strings , have access to BLOB and TEXT type .BLOB and TEXT The difference is that ,BLOB In order to Binary system Way to store , and TEXT In order to character Way to store . This has also led to ,BLOB Type data has no concept of character set , Can't sort by character , and TEXT Type has the concept of character set , You can sort by character . The use of both scenarios , It's also determined by the storage format , When storing binary data , For example, pictures , You should use BLOB, And when you store text , For example, articles , You should use TEXT type .
1.4 Date and time type
MySQL The minimum time granularity that can be stored in is seconds , The common date types are DATETIME and TIMESTAMP.
|type||Store content||The size （Byte）||Time zone concept|
|DATETIME||The format is YYYYMMDDHHMMSS The integer of||8||nothing|
|TIMESTAMP||from 1970 year 1 month 1 The number of seconds since zero of the day||4||Yes|
TIMESTAMP The values displayed will depend on the time zone , It means that the values queried in different time zones will be different . Except for the differences listed above ,TIMESTAMP It also has a special attribute , When inserting and updating , If you don't specify the first TIMESTAMP The value of the column , The value of this column will be set to the current time .
We are in the process of development , Try to use TIMESTAMP, It's mainly because the size of its space only needs DATETIME Half of , More efficient space .
If we want to store dates and times in seconds , What do I do ？ because MySQL Not provided , So we can use it BIGINT Storing subtle level timestamps , Or use DOUBLE Store the fraction after seconds .
1.5 Select the identifier
Generally speaking, integers are the best choice for identifiers , Mainly because of its simplicity , Fast calculation , And can use AUTO_INCREMENT.
2. Paradigms and paradigms
Simply speaking , A normal form is the level of a design standard that the table structure of a data table conforms to . First normal form , Attributes are inseparable , current RDBMS The tables built by the system conform to the first paradigm . And the second paradigm , It is to eliminate non primary attribute pairs （ It can be understood as a primary key ） Part of it depends on . The third normal form eliminates the transitive dependence of non primary attributes on codes . Specific introduction , You can read the answer from Hu （https://www.zhihu.com/question/24696366/answer/29189700）
Strictly Normalization In the database of , Each fact will appear and only once , There will be no data redundancy , The benefits that this can bring are ：
Update operations are faster ;
Modify less data ;
The watch is smaller , Better in memory , Faster execution ;
Less need for DISTINCT or GROUP BY.
But also because the data is scattered in the tables , When querying, you need to associate tables . and Anti paradigm The advantage of this is There's no need to correlate , Redundant storage of data .
in application , There will be no complete normalization or complete anti normalization , It's often necessary to Mixing paradigms and anti paradigms , Using partially normalized schema, It's often the best choice . About database design , See such a paragraph on the Internet , You can feel it .
Database design should be divided into three realms ：
The first realm ： Just started database design , The importance of paradigms has not been fully understood . This is the anti paradigm design , There are usually problems .
Second realm ： With problems, solve problems , The real benefits of paradigms come to light , So we can design low redundancy quickly 、 Efficient databases .
Third realm ： after N Years of exercise , The limitations of paradigms are bound to be found . At this point to break the paradigm , Design a more rational anti paradigm part .
Paradigms are like tricks in martial arts , Beginners have delusions that they don't follow the moves , It's embarrassing to die . After all, the trick is the essence of master's summing up. . And with the improvement of martial arts , When you're good at it , It must be to discover the limitations of tricks , Or forget the trick , Or create your own tricks .
Just try , Plus a few more years , Always get to the second level , I always think the paradigm is classic . At this point, we can not rely too much on the paradigm , People who quickly break through the limitations of paradigms , Nature is a master .
4. Cache tables and summary tables
In addition to the anti paradigm mentioned above , Storing redundant data in tables , We can also create a completely separate summary table or cache table , To meet the needs of retrieval .
Cache table , Storage can be accessed from schema Other tables to get data from , That is, logically redundant data . and The summary table , It refers to the use of storage GROUP BY And so on to aggregate data , Calculated non redundant data .
Cache table , Can be used for Optimize search and search query statements , The trick to use here is to use different storage engines for cache tables , For example, the main table uses InnoDB, The cache table can use MyISAM, Get smaller index footprint . You can even put the cache table into a dedicated search system , for example Lucene.
The summary table , For the sake of Avoid the high cost of computing statistics in real time , The price comes from two aspects , One is to scan most of the data in the table , The second is to establish a specific index , Would be right UPDATE The operation has an impact . for example , Check the past of wechat 24 Number of friends per hour , Then you can fix every 1 Hour scan the whole table , After statistics, write a record to the summary table , When querying , Just query the latest 24 Bar record , You don't have to scan the whole table for statistics every time you query .
When using cache tables and summary tables , It has to be decided that Maintain data in real time still Rebuild regularly , It depends on our needs . Regular reconstruction compared to real-time maintenance , Can save more resources , There are fewer pieces of the table . And in reconstruction , We still need to make sure that the data is available for operation , Need to pass through “ Shadow table ” To achieve . Create a shadow table after the real table , When you fill in the data , Switch between shadow table and original table by renaming atoms .
5. To speed up the ALTER TABLE Speed of operation
When MySQL In execution ALTER TABLE In operation , It's often a new table , The data is then retrieved from the old table and inserted into the new table , Delete the old table again , If the watch is big , It will take a long time , And will lead to MySQL Service interruption of . In order to avoid service interruption , You can usually use Two techniques ：
On a machine that does not provide services ALTER TABLE operation , Then switch with the main database that provides the service ;
“ Shadow copy ”, Create a new table that has nothing to do with the original table , After the data migration is complete , Switch through the rename operation .
But also Not all ALTER TABLE Operations cause table rebuilding , For example, when changing the default value of a field , Use MODIFY COLUMN It will rebuild the table , While using ALTER COLUMN Table reconstruction is not performed , It's very fast . This is because ALTER COLUMN When changing the default value , The existing table will be modified directly .frm file （ Store the default value of the field ）, Instead of rebuilding the table .
《 High performance MySQL》
MySQL DECIMAL data type
Programmer column Scan code and pay attention to customer service Press and hold to recognize the QR code below to enter the group
Recent highlights are recommended ：
Here's a look Good articles to share with more people ↓↓