Python The actual combat community
Java The actual combat community
Long press to identify the QR code below , Add as needed
Scan code, pay attention to add customer service
Into the Python community ▲
Scan code, pay attention to add customer service
Into the Java community ▲
The author of this book is grass kneading
From the grass pinch (ID:chaycao)
Studying recently MySQL Knowledge of optimization . In this paper, data types and schema The optimization of aspects is introduced .

1. Choose the optimized data type
MySQL There are many supported data types , And how to choose the right data type , It's critical for performance . Here are a few principle Can help determine the data type :
Smaller ones are usually better
The smallest data type that can store data correctly should be used as far as possible , Enough is good . This will take up less disk 、 Memory and cache , It takes less time to process .
Simple is good
When both data types are capable of storing a field , Choose the simple side , It's often the best choice . For example, integers and strings , Because the operation cost of integers is less than that of characters , So when you choose between the two , Choosing integers usually leads to better performance .
Try to avoid NULL
When the column can be NULL when , about MySQL Come on , More work needs to be done on indexes and value comparisons , Although the impact on performance is not great , But we should also try to avoid being designed to be NULL.
In addition to the above principles , When selecting a data type , What to follow step : First identify the appropriate large type , For example, data 、 character string 、 Time and so on ; Then choose the specific type . We will discuss some specific types under the large type , First of all, the numbers , There are two types of : Integers and real numbers .
1.1 Integer types
The type of integer and the space occupied are as follows :
Integer types | The size (bit) |
---|---|
TINYINT | 8 |
SMALLINT | 16 |
MEDIUMINT | 24 |
INT | 32 |
BIGINT | 64 |
The storage range of integer type depends on the size of space :-2^(N-1) to 2^(N-1)-1, among N Is the number of digits of space size .
Integer types have UNSIGNED Optional properties for , When declared , Negative numbers are not allowed , Then the storage range becomes :0 to 2^(N)-1, It's doubled .
stay MySQL in , You can also specify the width for the integer type , for example INT(1), But that doesn't mean much , It doesn't limit the legal range of values , It can still store -2^31 to 2^31-1 Value , What's affected is with MySQL Interactive tools for displaying the number of characters .
1.2 Real number type
The comparison of real number types is as follows :
Real number type | The size (Byte) | Value range | Calculation accuracy |
---|---|---|---|
FLOAT | 4 | negative :-3.4E+38~-1.17E-38; Nonnegative number :0、1.17E-38~3.4E+38 | Approximate calculation |
DOUBLE | 8 | negative :-1.79E+308~-2.22E-308; Nonnegative number :0、2.22E-308~1.79E+308 | Approximate calculation |
DECIMAL | It's about precision | Same as DOUBLE | Calculate accurately |
As can be seen from the above ,FLOAT and DOUBLE There's a fixed amount of space , But at the same time because it uses standard floating point operations , So it can only be calculated approximately . and DECIMAL Then the accurate calculation can be realized , At the same time, it will take up more space , It's also more computationally expensive .
DECIMAL The amount of space occupied depends on the specified precision , for example DECIMAL(M,D):
M Is the maximum length of the whole number , The value range is [1, 65], The default value is 10;
D Is the length after the decimal point , The value range is [0, 30], And D <= M, The default value is 0.
MySQL In the storage DECIMAL Type is stored as a binary string , Every time 4 Bytes of storage 9 A digital , When insufficient 9 When a , The space occupied by the numbers is as follows :
Number of numbers | Occupancy space (Byte) |
---|---|
1、2 | 1 |
3、4 | 2 |
5、6 | 3 |
7、8 | 4 |
Before and after the decimal point will be stored separately , At the same time, the decimal point should also take up 1 Bytes . Here are two examples of calculation :
DECIMAL(18, 9): The length of the integral part is 9, Occupy 4 Bytes . The length of the decimal part is 9, Occupy 4 Bytes . And add the decimal point 1 Bytes , The total occupation of 9 Bytes .
DECIMAL(20, 9): The length of the integral part is 14, Occupy 7(4+3) Bytes . The length of the decimal part is 9, Occupy 4 Bytes . And add the decimal point 1 Bytes , The total occupation of 12 Bytes .
It can be seen that DECIMAL It takes up a lot of space , So only when it is necessary to calculate decimals accurately , To use DECIMAL. besides , We can also use BIGINT Instead of DECIMAL, For example, you need to make sure that after the decimal point 5 The calculation of bits , You can multiply the value by 10 Of 5 To the power of BIGINT Storage , In this way, we can avoid the inaccuracy of floating-point storage calculation and DECIMAL The high cost of accurate calculation .
1.3 String type
The most commonly used string type is VARCHAR and CHAR.VARCHAR As Variable length string , Will use 1 or 2 Extra bytes record the length of the string , When the maximum length does not exceed 255 when , just 1 Byte record length , exceed 255, Then 2 Bytes .VARCHAR Of Applicable scenario :
The maximum length is much larger than the average length ;
Less column updates , Avoid debris ;
Using complex character sets , Such as UTF-8, Each character can be stored in different bytes .
CHAR Then for Fixed length string , Allocate enough space according to the defined string length , Applicable scenario :
Short in length ;
Similar in length , for example MD5;
Updated frequently .
except VARCHAR and CHAR, For storing large strings , have access to BLOB and TEXT type .BLOB and TEXT The difference is that ,BLOB In order to Binary system Way to store , and TEXT In order to character Way to store . This has also led to ,BLOB Type data has no concept of character set , Can't sort by character , and TEXT Type has the concept of character set , You can sort by character . The use of both scenarios , It's also determined by the storage format , When storing binary data , For example, pictures , You should use BLOB, And when you store text , For example, articles , You should use TEXT type .
1.4 Date and time type
MySQL The minimum time granularity that can be stored in is seconds , The common date types are DATETIME and TIMESTAMP.
type | Store content | The size (Byte) | Time zone concept |
---|---|---|---|
DATETIME | The format is YYYYMMDDHHMMSS The integer of | 8 | nothing |
TIMESTAMP | from 1970 year 1 month 1 The number of seconds since zero of the day | 4 | Yes |
TIMESTAMP The values displayed will depend on the time zone , It means that the values queried in different time zones will be different . Except for the differences listed above ,TIMESTAMP It also has a special attribute , When inserting and updating , If you don't specify the first TIMESTAMP The value of the column , The value of this column will be set to the current time .
We are in the process of development , Try to use TIMESTAMP, It's mainly because the size of its space only needs DATETIME Half of , More efficient space .
If we want to store dates and times in seconds , What do I do ? because MySQL Not provided , So we can use it BIGINT Storing subtle level timestamps , Or use DOUBLE Store the fraction after seconds .
1.5 Select the identifier
Generally speaking, integers are the best choice for identifiers , Mainly because of its simplicity , Fast calculation , And can use AUTO_INCREMENT.
2. Paradigms and paradigms
Simply speaking , A normal form is the level of a design standard that the table structure of a data table conforms to . First normal form , Attributes are inseparable , current RDBMS The tables built by the system conform to the first paradigm . And the second paradigm , It is to eliminate non primary attribute pairs ( It can be understood as a primary key ) Part of it depends on . The third normal form eliminates the transitive dependence of non primary attributes on codes . Specific introduction , You can read the answer from Hu (https://www.zhihu.com/question/24696366/answer/29189700)
Strictly Normalization In the database of , Each fact will appear and only once , There will be no data redundancy , The benefits that this can bring are :
Update operations are faster ;
Modify less data ;
The watch is smaller , Better in memory , Faster execution ;
Less need for DISTINCT or GROUP BY.
But also because the data is scattered in the tables , When querying, you need to associate tables . and Anti paradigm The advantage of this is There's no need to correlate , Redundant storage of data .
in application , There will be no complete normalization or complete anti normalization , It's often necessary to Mixing paradigms and anti paradigms , Using partially normalized schema, It's often the best choice . About database design , See such a paragraph on the Internet , You can feel it .
Database design should be divided into three realms :
The first realm : Just started database design , The importance of paradigms has not been fully understood . This is the anti paradigm design , There are usually problems .
Second realm : With problems, solve problems , The real benefits of paradigms come to light , So we can design low redundancy quickly 、 Efficient databases .
Third realm : after N Years of exercise , The limitations of paradigms are bound to be found . At this point to break the paradigm , Design a more rational anti paradigm part .
Paradigms are like tricks in martial arts , Beginners have delusions that they don't follow the moves , It's embarrassing to die . After all, the trick is the essence of master's summing up. . And with the improvement of martial arts , When you're good at it , It must be to discover the limitations of tricks , Or forget the trick , Or create your own tricks .
Just try , Plus a few more years , Always get to the second level , I always think the paradigm is classic . At this point, we can not rely too much on the paradigm , People who quickly break through the limitations of paradigms , Nature is a master .
4. Cache tables and summary tables
In addition to the anti paradigm mentioned above , Storing redundant data in tables , We can also create a completely separate summary table or cache table , To meet the needs of retrieval .
Cache table , Storage can be accessed from schema Other tables to get data from , That is, logically redundant data . and The summary table , It refers to the use of storage GROUP BY And so on to aggregate data , Calculated non redundant data .
Cache table , Can be used for Optimize search and search query statements , The trick to use here is to use different storage engines for cache tables , For example, the main table uses InnoDB, The cache table can use MyISAM, Get smaller index footprint . You can even put the cache table into a dedicated search system , for example Lucene.
The summary table , For the sake of Avoid the high cost of computing statistics in real time , The price comes from two aspects , One is to scan most of the data in the table , The second is to establish a specific index , Would be right UPDATE The operation has an impact . for example , Check the past of wechat 24 Number of friends per hour , Then you can fix every 1 Hour scan the whole table , After statistics, write a record to the summary table , When querying , Just query the latest 24 Bar record , You don't have to scan the whole table for statistics every time you query .
When using cache tables and summary tables , It has to be decided that Maintain data in real time still Rebuild regularly , It depends on our needs . Regular reconstruction compared to real-time maintenance , Can save more resources , There are fewer pieces of the table . And in reconstruction , We still need to make sure that the data is available for operation , Need to pass through “ Shadow table ” To achieve . Create a shadow table after the real table , When you fill in the data , Switch between shadow table and original table by renaming atoms .
5. To speed up the ALTER TABLE Speed of operation
When MySQL In execution ALTER TABLE In operation , It's often a new table , The data is then retrieved from the old table and inserted into the new table , Delete the old table again , If the watch is big , It will take a long time , And will lead to MySQL Service interruption of . In order to avoid service interruption , You can usually use Two techniques :
On a machine that does not provide services ALTER TABLE operation , Then switch with the main database that provides the service ;
“ Shadow copy ”, Create a new table that has nothing to do with the original table , After the data migration is complete , Switch through the rename operation .
But also Not all ALTER TABLE Operations cause table rebuilding , For example, when changing the default value of a field , Use MODIFY COLUMN It will rebuild the table , While using ALTER COLUMN Table reconstruction is not performed , It's very fast . This is because ALTER COLUMN When changing the default value , The existing table will be modified directly .frm file ( Store the default value of the field ), Instead of rebuilding the table .
Reference resources
《 High performance MySQL》
MySQL DECIMAL data type
(https://my.oschina.net/u/559356/blog/3057960)
Programmer column Scan code and pay attention to customer service Press and hold to recognize the QR code below to enter the group
Recent highlights are recommended :
My girlfriend thinks the annual salary is 50 Ten thousand is the average level , What do I do ?
The sexy goddess of the biggest straight men forum in China overturned
IntelliJ IDEA Fully optimized settings , Efficiency bars !
Here's a look Good articles to share with more people ↓↓