Data type and schema optimization in MySQL

European Union 2020-11-10 13:38:59
data type schema optimization mysql

Python The actual combat community

Java The actual combat community

Long press to identify the QR code below , Add as needed

Scan code, pay attention to add customer service

Into the Python community ▲

Scan code, pay attention to add customer service

Into the Java community

The author of this book is grass kneading

From the grass pinch (ID:chaycao)

Studying recently MySQL Knowledge of optimization . In this paper, data types and schema The optimization of aspects is introduced .

1. Choose the optimized data type

MySQL There are many supported data types , And how to choose the right data type , It's critical for performance . Here are a few principle Can help determine the data type :

  1. Smaller ones are usually better

    The smallest data type that can store data correctly should be used as far as possible , Enough is good . This will take up less disk 、 Memory and cache , It takes less time to process .

  2. Simple is good

    When both data types are capable of storing a field , Choose the simple side , It's often the best choice . For example, integers and strings , Because the operation cost of integers is less than that of characters , So when you choose between the two , Choosing integers usually leads to better performance .

  3. Try to avoid NULL

    When the column can be NULL when , about MySQL Come on , More work needs to be done on indexes and value comparisons , Although the impact on performance is not great , But we should also try to avoid being designed to be NULL.

In addition to the above principles , When selecting a data type , What to follow step : First identify the appropriate large type , For example, data 、 character string 、 Time and so on ; Then choose the specific type . We will discuss some specific types under the large type , First of all, the numbers , There are two types of : Integers and real numbers .

1.1 Integer types

The type of integer and the space occupied are as follows :

Integer types The size (bit)
INT 32

The storage range of integer type depends on the size of space :-2^(N-1) to 2^(N-1)-1, among N Is the number of digits of space size .

Integer types have UNSIGNED Optional properties for , When declared , Negative numbers are not allowed , Then the storage range becomes :0 to 2^(N)-1, It's doubled .

stay MySQL in , You can also specify the width for the integer type , for example INT(1), But that doesn't mean much , It doesn't limit the legal range of values , It can still store -2^31 to 2^31-1 Value , What's affected is with MySQL Interactive tools for displaying the number of characters .

1.2 Real number type

The comparison of real number types is as follows :

Real number type The size (Byte) Value range Calculation accuracy
FLOAT 4 negative :-3.4E+38~-1.17E-38; Nonnegative number :0、1.17E-38~3.4E+38 Approximate calculation
DOUBLE 8 negative :-1.79E+308~-2.22E-308; Nonnegative number :0、2.22E-308~1.79E+308 Approximate calculation
DECIMAL It's about precision Same as DOUBLE Calculate accurately

As can be seen from the above ,FLOAT and DOUBLE There's a fixed amount of space , But at the same time because it uses standard floating point operations , So it can only be calculated approximately . and DECIMAL Then the accurate calculation can be realized , At the same time, it will take up more space , It's also more computationally expensive .

DECIMAL The amount of space occupied depends on the specified precision , for example DECIMAL(M,D):

  • M Is the maximum length of the whole number , The value range is [1, 65], The default value is 10;

  • D Is the length after the decimal point , The value range is [0, 30], And D <= M, The default value is 0.

MySQL In the storage DECIMAL Type is stored as a binary string , Every time 4 Bytes of storage 9 A digital , When insufficient 9 When a , The space occupied by the numbers is as follows :

Number of numbers Occupancy space (Byte)
1、2 1
3、4 2
5、6 3
7、8 4

Before and after the decimal point will be stored separately , At the same time, the decimal point should also take up 1 Bytes . Here are two examples of calculation :

  1. DECIMAL(18, 9): The length of the integral part is 9, Occupy 4 Bytes . The length of the decimal part is 9, Occupy 4 Bytes . And add the decimal point 1 Bytes , The total occupation of 9 Bytes .

  2. DECIMAL(20, 9): The length of the integral part is 14, Occupy 7(4+3) Bytes . The length of the decimal part is 9, Occupy 4 Bytes . And add the decimal point 1 Bytes , The total occupation of 12 Bytes .

It can be seen that DECIMAL It takes up a lot of space , So only when it is necessary to calculate decimals accurately , To use DECIMAL. besides , We can also use BIGINT Instead of DECIMAL, For example, you need to make sure that after the decimal point 5 The calculation of bits , You can multiply the value by 10 Of 5 To the power of BIGINT Storage , In this way, we can avoid the inaccuracy of floating-point storage calculation and DECIMAL The high cost of accurate calculation .

1.3 String type

The most commonly used string type is VARCHAR and CHAR.VARCHAR As Variable length string , Will use 1 or 2 Extra bytes record the length of the string , When the maximum length does not exceed 255 when , just 1 Byte record length , exceed 255, Then 2 Bytes .VARCHAR Of Applicable scenario

  1. The maximum length is much larger than the average length ;

  2. Less column updates , Avoid debris ;

  3. Using complex character sets , Such as UTF-8, Each character can be stored in different bytes .

CHAR Then for Fixed length string , Allocate enough space according to the defined string length , Applicable scenario

  1. Short in length ;

  2. Similar in length , for example MD5;

  3. Updated frequently .

except VARCHAR and CHAR, For storing large strings , have access to BLOB and TEXT type .BLOB and TEXT The difference is that ,BLOB In order to Binary system Way to store , and TEXT In order to character Way to store . This has also led to ,BLOB Type data has no concept of character set , Can't sort by character , and TEXT Type has the concept of character set , You can sort by character . The use of both scenarios , It's also determined by the storage format , When storing binary data , For example, pictures , You should use BLOB, And when you store text , For example, articles , You should use TEXT type .

1.4 Date and time type

MySQL The minimum time granularity that can be stored in is seconds , The common date types are DATETIME and TIMESTAMP.

type Store content The size (Byte) Time zone concept
DATETIME The format is YYYYMMDDHHMMSS The integer of 8 nothing
TIMESTAMP from 1970 year 1 month 1 The number of seconds since zero of the day 4 Yes

TIMESTAMP The values displayed will depend on the time zone , It means that the values queried in different time zones will be different . Except for the differences listed above ,TIMESTAMP It also has a special attribute , When inserting and updating , If you don't specify the first TIMESTAMP The value of the column , The value of this column will be set to the current time .

We are in the process of development , Try to use TIMESTAMP, It's mainly because the size of its space only needs DATETIME Half of , More efficient space .

If we want to store dates and times in seconds , What do I do ? because MySQL Not provided , So we can use it BIGINT Storing subtle level timestamps , Or use DOUBLE Store the fraction after seconds .

1.5 Select the identifier

Generally speaking, integers are the best choice for identifiers , Mainly because of its simplicity , Fast calculation , And can use AUTO_INCREMENT.

2. Paradigms and paradigms

Simply speaking , A normal form is the level of a design standard that the table structure of a data table conforms to . First normal form , Attributes are inseparable , current RDBMS The tables built by the system conform to the first paradigm . And the second paradigm , It is to eliminate non primary attribute pairs ( It can be understood as a primary key ) Part of it depends on . The third normal form eliminates the transitive dependence of non primary attributes on codes . Specific introduction , You can read the answer from Hu (

Strictly Normalization In the database of , Each fact will appear and only once , There will be no data redundancy , The benefits that this can bring are :

  1. Update operations are faster ;

  2. Modify less data ;

  3. The watch is smaller , Better in memory , Faster execution ;

  4. Less need for DISTINCT or GROUP BY.

But also because the data is scattered in the tables , When querying, you need to associate tables . and Anti paradigm The advantage of this is There's no need to correlate , Redundant storage of data .

in application , There will be no complete normalization or complete anti normalization , It's often necessary to Mixing paradigms and anti paradigms , Using partially normalized schema, It's often the best choice . About database design , See such a paragraph on the Internet , You can feel it .

Database design should be divided into three realms :

The first realm : Just started database design , The importance of paradigms has not been fully understood . This is the anti paradigm design , There are usually problems .

Second realm : With problems, solve problems , The real benefits of paradigms come to light , So we can design low redundancy quickly 、 Efficient databases .

Third realm : after N Years of exercise , The limitations of paradigms are bound to be found . At this point to break the paradigm , Design a more rational anti paradigm part .

Paradigms are like tricks in martial arts , Beginners have delusions that they don't follow the moves , It's embarrassing to die . After all, the trick is the essence of master's summing up. . And with the improvement of martial arts , When you're good at it , It must be to discover the limitations of tricks , Or forget the trick , Or create your own tricks .

Just try , Plus a few more years , Always get to the second level , I always think the paradigm is classic . At this point, we can not rely too much on the paradigm , People who quickly break through the limitations of paradigms , Nature is a master .

4. Cache tables and summary tables

In addition to the anti paradigm mentioned above , Storing redundant data in tables , We can also create a completely separate summary table or cache table , To meet the needs of retrieval .

Cache table , Storage can be accessed from schema Other tables to get data from , That is, logically redundant data . and The summary table , It refers to the use of storage GROUP BY And so on to aggregate data , Calculated non redundant data .

Cache table , Can be used for Optimize search and search query statements , The trick to use here is to use different storage engines for cache tables , For example, the main table uses InnoDB, The cache table can use MyISAM, Get smaller index footprint . You can even put the cache table into a dedicated search system , for example Lucene.

The summary table , For the sake of Avoid the high cost of computing statistics in real time , The price comes from two aspects , One is to scan most of the data in the table , The second is to establish a specific index , Would be right UPDATE The operation has an impact . for example , Check the past of wechat 24 Number of friends per hour , Then you can fix every 1 Hour scan the whole table , After statistics, write a record to the summary table , When querying , Just query the latest 24 Bar record , You don't have to scan the whole table for statistics every time you query .

When using cache tables and summary tables , It has to be decided that Maintain data in real time still Rebuild regularly , It depends on our needs . Regular reconstruction compared to real-time maintenance , Can save more resources , There are fewer pieces of the table . And in reconstruction , We still need to make sure that the data is available for operation , Need to pass through “ Shadow table ” To achieve . Create a shadow table after the real table , When you fill in the data , Switch between shadow table and original table by renaming atoms .

5. To speed up the ALTER TABLE Speed of operation

When MySQL In execution ALTER TABLE In operation , It's often a new table , The data is then retrieved from the old table and inserted into the new table , Delete the old table again , If the watch is big , It will take a long time , And will lead to MySQL Service interruption of . In order to avoid service interruption , You can usually use Two techniques

  1. On a machine that does not provide services ALTER TABLE operation , Then switch with the main database that provides the service ;

  2. “ Shadow copy ”, Create a new table that has nothing to do with the original table , After the data migration is complete , Switch through the rename operation .

But also Not all ALTER TABLE Operations cause table rebuilding , For example, when changing the default value of a field , Use MODIFY COLUMN It will rebuild the table , While using ALTER COLUMN Table reconstruction is not performed , It's very fast . This is because ALTER COLUMN When changing the default value , The existing table will be modified directly .frm file ( Store the default value of the field ), Instead of rebuilding the table .

Reference resources

  1. 《 High performance MySQL》

  2. MySQL DECIMAL  data type


 Programmer column   Scan code and pay attention to customer service   Press and hold to recognize the QR code below to enter the group

Recent highlights are recommended :  

  My girlfriend thinks the annual salary is 50 Ten thousand is the average level , What do I do ?

  The sexy goddess of the biggest straight men forum in China overturned

 IntelliJ IDEA Fully optimized settings , Efficiency bars !

  Very useful Python skill

Here's a look Good articles to share with more people ↓↓

本文为[European Union]所创,转载请带上原文链接,感谢

  1. 【计算机网络 12(1),尚学堂马士兵Java视频教程
  2. 【程序猿历程,史上最全的Java面试题集锦在这里
  3. 【程序猿历程(1),Javaweb视频教程百度云
  4. Notes on MySQL 45 lectures (1-7)
  5. [computer network 12 (1), Shang Xuetang Ma soldier java video tutorial
  6. The most complete collection of Java interview questions in history is here
  7. [process of program ape (1), JavaWeb video tutorial, baidu cloud
  8. Notes on MySQL 45 lectures (1-7)
  9. 精进 Spring Boot 03:Spring Boot 的配置文件和配置管理,以及用三种方式读取配置文件
  10. Refined spring boot 03: spring boot configuration files and configuration management, and reading configuration files in three ways
  11. 精进 Spring Boot 03:Spring Boot 的配置文件和配置管理,以及用三种方式读取配置文件
  12. Refined spring boot 03: spring boot configuration files and configuration management, and reading configuration files in three ways
  13. 【递归,Java传智播客笔记
  14. [recursion, Java intelligence podcast notes
  15. [adhere to painting for 386 days] the beginning of spring of 24 solar terms
  16. K8S系列第八篇(Service、EndPoints以及高可用kubeadm部署)
  17. K8s Series Part 8 (service, endpoints and high availability kubeadm deployment)
  18. 【重识 HTML (3),350道Java面试真题分享
  19. 【重识 HTML (2),Java并发编程必会的多线程你竟然还不会
  20. 【重识 HTML (1),二本Java小菜鸟4面字节跳动被秒成渣渣
  21. [re recognize HTML (3) and share 350 real Java interview questions
  22. [re recognize HTML (2). Multithreading is a must for Java Concurrent Programming. How dare you not
  23. [re recognize HTML (1), two Java rookies' 4-sided bytes beat and become slag in seconds
  24. 造轮子系列之RPC 1:如何从零开始开发RPC框架
  25. RPC 1: how to develop RPC framework from scratch
  26. 造轮子系列之RPC 1:如何从零开始开发RPC框架
  27. RPC 1: how to develop RPC framework from scratch
  28. 一次性捋清楚吧,对乱糟糟的,Spring事务扩展机制
  29. 一文彻底弄懂如何选择抽象类还是接口,连续四年百度Java岗必问面试题
  30. Redis常用命令
  31. 一双拖鞋引发的血案,狂神说Java系列笔记
  32. 一、mysql基础安装
  33. 一位程序员的独白:尽管我一生坎坷,Java框架面试基础
  34. Clear it all at once. For the messy, spring transaction extension mechanism
  35. A thorough understanding of how to choose abstract classes or interfaces, baidu Java post must ask interview questions for four consecutive years
  36. Redis common commands
  37. A pair of slippers triggered the murder, crazy God said java series notes
  38. 1、 MySQL basic installation
  39. Monologue of a programmer: despite my ups and downs in my life, Java framework is the foundation of interview
  40. 【大厂面试】三面三问Spring循环依赖,请一定要把这篇看完(建议收藏)
  41. 一线互联网企业中,springboot入门项目
  42. 一篇文带你入门SSM框架Spring开发,帮你快速拿Offer
  43. 【面试资料】Java全集、微服务、大数据、数据结构与算法、机器学习知识最全总结,283页pdf
  44. 【leetcode刷题】24.数组中重复的数字——Java版
  45. 【leetcode刷题】23.对称二叉树——Java版
  46. 【leetcode刷题】22.二叉树的中序遍历——Java版
  47. 【leetcode刷题】21.三数之和——Java版
  48. 【leetcode刷题】20.最长回文子串——Java版
  49. 【leetcode刷题】19.回文链表——Java版
  50. 【leetcode刷题】18.反转链表——Java版
  51. 【leetcode刷题】17.相交链表——Java&python版
  52. 【leetcode刷题】16.环形链表——Java版
  53. 【leetcode刷题】15.汉明距离——Java版
  54. 【leetcode刷题】14.找到所有数组中消失的数字——Java版
  55. 【leetcode刷题】13.比特位计数——Java版
  56. oracle控制用户权限命令
  57. 三年Java开发,继阿里,鲁班二期Java架构师
  58. Oracle必须要启动的服务
  59. 万字长文!深入剖析HashMap,Java基础笔试题大全带答案
  60. 一问Kafka就心慌?我却凭着这份,图灵学院vip课程百度云