A detailed introduction to advanced MySQL theory

Xiaoliu 2020-11-06 21:30:02

1. Database partitioning, table splitting, database splitting, and sharding

Hello everyone, I'm Xiao Liu. Long time no see, I've missed you all! Today I'll walk you through the basics of splitting databases and tables.

1.1 Bottlenecks of a single database

  • The more data a single table holds, the more read/write lock contention it sees, and the slower inserts become because the index has to be rebuilt.
  • The amount of data in a single database becomes too large (roughly 1 TB to 2 TB per database is the limit).
  • There is too much load on a single database server.
  • Read and write speed hits a bottleneck (at a few hundred concurrent requests).

1.2 Partitioning

Database partitioning is a physical database design technique. Its goal is to reduce the total amount of data that specific SQL operations read and write, and thereby shorten response time.

Partitioning does not create new tables. Instead, the rows of one table are distributed across different disks, systems, or storage servers, while logically it remains a single table. Spreading a table's data across several locations improves retrieval efficiency and reduces the frequent I/O pressure on the database. The advantages of partitioning are as follows:

  1. Compared with a single file system or disk, partitions can store more data.
  2. Data management is easier. To clean up or discard the data for a given year, you can simply drop the partition for that period.
  3. A query that targets a specific partition does not need a full table scan, which greatly improves retrieval efficiency.
  4. Queries can run across multiple partition disks, which improves query throughput.
  5. For queries involving aggregate functions, per-partition results are easy to merge.

1.2.1 When to consider using partitions

  • A table's queries are slow enough to affect usability.
  • The SQL has already been optimized.
  • The amount of data is large.
  • The data in the table falls into natural segments.
  • Operations usually touch only part of the data rather than all of it.

1.2.2 Horizontal partitioning

This form of partitioning splits a table by rows: rows are assigned to different groups according to the value of some column, and each group is stored as its own physical dataset that can be accessed individually or together with the others. Every column defined in the table is present in each dataset, so the table's structure is preserved.

Example: a table holding ten years of invoice records can be split into 10 partitions, each containing one year's records.

1.2.3 Vertical partitioning

Generally speaking, this method reduces the width of the target table by dividing it vertically: certain columns are assigned to specific partitions, and each partition contains all the rows for the columns it holds.

Example: a table contains large text and blob columns that are not accessed often. Moving these rarely used text and blob columns into another partition keeps the data associated while improving access speed.

1.2.4 How partitioning is implemented

MySQL has supported partitioning since version 5 (starting with 5.1).

Create the table

create table sales(
id int not null auto_increment,
amount double not null,
order_day datetime not null,
primary key(id,order_day)
) engine=InnoDB

Define the partitions (this clause continues the create table statement above)

partition by range(year(order_day))(
partition p_2010 values less than (2010),
partition p_2011 values less than (2011),
partition p_2012 values less than (2012),
partition p_max values less than maxvalue
);
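
To make the range rule concrete, here is a minimal Python sketch of how a row's year would map to a partition under "values less than" semantics. It only mimics the rule outside the database. The function name partition_for, the bounds 2010, 2011, and 2012, and the partition names are assumptions chosen for illustration.

```python
from bisect import bisect_right
from datetime import datetime

# Exclusive upper bounds of the range partitions, in ascending order.
# Bounds and partition names are assumptions for illustration only.
BOUNDS = [2010, 2011, 2012]
NAMES = ["p_2010", "p_2011", "p_2012", "p_max"]  # the last one plays the role of MAXVALUE


def partition_for(order_day: datetime) -> str:
    """Return the first partition whose 'values less than' bound exceeds the year."""
    # bisect_right counts the bounds that are <= year, which is exactly the
    # index of the first partition whose bound is strictly greater.
    return NAMES[bisect_right(BOUNDS, order_day.year)]
```

A row dated in 2010 is not "less than 2010", so it falls into the next partition; any year from 2012 onward lands in the catch-all partition.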

1.3 Table splitting

1.3.1 When to consider splitting a table

  • A table's queries are slow enough to affect usability.
  • The SQL has already been optimized.
  • The amount of data is large.
  • Frequent inserts or join queries have become slow.

1.3.2 Problems solved by table splitting

After a table is split, single-table concurrency improves, disk I/O performance improves, and write operations become more efficient.

  • Each query takes less time.
  • Data is spread across different files, so disk I/O performance improves.
  • Read/write locks affect a smaller amount of data.
  • An insert requires less index data to be rebuilt.

1.3.3 How table splitting is implemented

The business system has to be migrated and upgraded in step, which is a heavy workload.

Commonly used table splitting rules

  • Range
  • Hash
  • Split by time
  • Hash, then take the result modulo the number of sub-tables
  • Store the routing configuration in an authentication database: build a separate DB that holds the mapping from user_id to DB
1.4 Database splitting

1.4.1 When to consider splitting the database

  • A single DB no longer has enough storage space.
  • As the query volume grows, a single database server can no longer support it.

1.4.2 Problems solved by database splitting

Its main purpose is to break through the I/O capacity limits of a single-node database server and to solve the database scalability problem.

1.4.3 How database splitting is implemented

Vertical split

Tables that have no relationship to each other and do not need to be joined can be placed in different databases on different servers. The split follows business lines. For example, a system can be divided by business into three databases: funds, members, and orders.

Problems to be solved: cross-database transactions, join queries, and so on.

Horizontal split

For example, on most websites the data revolves around users, so the data can be split horizontally by user.

The split follows a rule, and horizontal splitting generally comes after vertical splitting. For example, if the number of orders processed each day is huge, the order data can be split horizontally by rule.

Problems to be solved: data routing and result aggregation.
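
The two problems named here, routing and aggregation, can be illustrated with a small Python sketch. In-memory lists stand in for database shards, and the modulo routing rule, the sample rows, and all function names are assumptions made for illustration rather than any particular middleware's behavior.

```python
from heapq import merge

# Two in-memory lists stand in for two database shards (an assumption);
# each holds (order_id, user_id) rows sorted by order_id.
SHARDS = [
    [(1, 10), (4, 12), (5, 10)],  # shard 0: even user ids
    [(2, 11), (3, 13), (6, 11)],  # shard 1: odd user ids
]


def route(user_id: int) -> int:
    """Data routing: choose the shard that owns a user's rows."""
    return user_id % len(SHARDS)


def orders_of(user_id: int) -> list:
    """Single-shard query: only the owning shard is touched."""
    return [row for row in SHARDS[route(user_id)] if row[1] == user_id]


def all_orders() -> list:
    """Cross-shard query: read every shard, then merge the sorted results."""
    return list(merge(*SHARDS))
```

A query that carries the split key (here user_id) touches one shard, while a query without it has to fan out to every shard and merge the partial results, which is where most of the extra cost of a horizontal split comes from.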

Read/write separation

For data that does not need to be fully up to date, read/write separation can relieve pressure on the database.

Problems to be solved: identifying which parts of the business can tolerate some delay, and data synchronization.

1.5 Comparing partitioning, table splitting, and database splitting

Partitioning divides a table's data into N blocks. Logically it is still one table, but underneath it consists of N physical blocks. Table splitting decomposes one table into N physical tables, each with its own storage space. When reading or writing, the system must work out the corresponding sub-table name from the defined rules and then operate on that sub-table. Once tables are split, the number of tables in a database keeps growing.

Priority: vertical database split -> horizontal database split -> read/write separation

1.6 New problems after splitting

  • Transaction support: after databases and tables are split, transactions become distributed transactions.
  • Joins have to span databases and tables.
  • Database splitting, table splitting, and read/write separation all make the system distributed. Guaranteeing strong consistency in a distributed system inevitably introduces delay, which reduces performance and makes the system less responsive.

Solution:

There is no strict boundary between the approaches. Each has its own characteristics and emphasis, so choose according to the situation and play to each one's strengths. Third-party database middleware (Atlas, Mycat, TDDL, DRDS) can be used, and the business system has to be upgraded along with the data storage.

Summary: give priority to partitioning. When partitioning cannot meet the requirements, start considering table splitting; a well-designed table split improves efficiency more than partitioning does.

1.7 Case study: Jingdong (JD.com) product reviews

Current situation

  • Number of product reviews: in the billions
  • Daily service calls: in the billions
  • Both multiply every year

Overall data storage: basic data storage plus text storage

Basic data storage

MySQL stores only the non-text basic information, including review status, user, time, and other basic data, as well as additional information such as pictures, tags, and likes. Data organization (a different split scheme can be chosen for each kind of data):

  • Basic review data is split across databases and tables by user ID.
  • Pictures and tags live in the same database, with tables split by product ID.
  • Other extended data is small and rarely accessed, so it is kept in a single database without table splitting.

Text storage

Text storage (the review content) uses MongoDB and HBase.

  • NoSQL was chosen instead of MySQL.
  • This reduces the storage pressure on MySQL and frees it up, while the huge storage volume is still reliably handled.
  • The high read and write performance of NoSQL greatly improves system throughput and reduces latency.

1.8 Data sharding

In a distributed storage system, data has to be spread across multiple devices. Data sharding is the technique that determines how data is distributed across those storage devices. Sharding has three goals:

  1. Even distribution: the amount of data on each device should be as close as possible.
  2. Load balancing: the number of requests to each device should be as close as possible.
  3. As little data migration as possible when scaling out or in.

Sharding methods

  • Range division
  • Modulo
  • Key list (a lookup table of keys)
  • Consistent hashing: proposed at MIT in 1997 as a distributed hash table (DHT) implementation algorithm, originally to solve the hot spot problem on the Internet. The algorithm is simple and elegant, distributes data evenly with little effort, and its monotonicity ensures that scaling out or in migrates relatively little data.
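
The monotonicity property of consistent hashing is easier to see in code. The following Python sketch is a minimal ring with virtual points per node. The class name HashRing, the use of MD5, and the default of 100 virtual points are assumptions for illustration, not a reference implementation.

```python
import bisect
import hashlib


def _hash(key: str) -> int:
    """Map a string to a position on the ring (MD5 here; any stable hash works)."""
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)


class HashRing:
    def __init__(self, nodes, replicas: int = 100):
        self.replicas = replicas  # virtual points per node, smooths the distribution
        self._points = []         # sorted ring positions
        self._owner = {}          # ring position -> node name
        for node in nodes:
            self.add(node)

    def add(self, node: str) -> None:
        """Place the node's virtual points on the ring."""
        for i in range(self.replicas):
            point = _hash(f"{node}#{i}")
            bisect.insort(self._points, point)
            self._owner[point] = node

    def node_for(self, key: str) -> str:
        """Walk clockwise from the key's position to the first node point."""
        idx = bisect.bisect_right(self._points, _hash(key)) % len(self._points)
        return self._owner[self._points[idx]]
```

If a ring is built with three nodes and a fourth is added afterwards, every key either stays on its old node or moves to the new one. No key moves between the old nodes, which is why scaling out migrates only a fraction of the data, in contrast to a plain modulo rule.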

Virtual server

To make the system more scalable, the storage layer introduces the concept of a VServer (virtual server). A VServer is a logical storage server and the basic storage unit of the distributed storage system. Multiple VServers can be deployed on one physical device, and each VServer supports one write process and multiple read processes.

The VServer approach has several benefits:

  1. Better single-machine performance. To avoid a complex locking mechanism, the design uses a single write process. If a machine had only one write process, its write concurrency would be limited. With VServers, the storage resources of one machine (memory, disk) are divided into multiple storage units, so multiple write processes can work at the same time, which greatly improves write concurrency on a single machine.
  2. Better deployment scalability. VServers are flexible to deploy: the number of VServers is chosen according to each machine's resources, and different machine models are configured with different numbers. Each model can then make full use of its resources, and even a system that mixes several models can keep machine load balanced.

2. Transaction ACID properties and isolation levels

  • Atomicity (Atomic): the operations in a transaction are either all performed or none are; the failure of any operation fails the whole transaction.
  • Consistency (Consistent): after the transaction ends, the system is in a consistent state.
  • Isolation (Isolated): concurrent transactions cannot see each other's intermediate state.
  • Durability (Durable): changes made by a completed transaction are persisted.

Problems caused by concurrent database transactions

  • Dirty read: transaction A reads data that transaction B has not committed.
  • Non-repeatable read: transaction A queries a record, row1; after transaction B commits a change, A's second query still returns row1, but its column values have changed. The emphasis is on modification.
  • Phantom read: transaction A's first query returns one record, row1; after transaction B commits a change, A's second query returns two records, row1 and row2. The emphasis is on insertion.

MySQL provides four isolation levels

  • Serializable: while transaction A repeatedly reads the same rows from a table, no other transaction may insert, update, or delete data in that table.
  • Repeatable read: transaction A gets the same value each time it reads a row; other transactions are prevented from changing the fields it has read.
  • Read committed: transaction A can read only committed data.
  • Read uncommitted: transaction A can read uncommitted data.

Which read problems each level prevents (√ = prevented, × = possible):

Isolation level     Dirty read   Non-repeatable read   Phantom read
Serializable        √            √                     √
Repeatable read     √            √                     ×
Read committed      √            ×                     ×
Read uncommitted    ×            ×                     ×

Oracle provides three isolation levels

Read committed, serializable, and read-only mode. A read-only transaction sees only data that was committed before the transaction began, and it cannot execute insert, update, or delete statements.

3. MySQL locking mechanisms

3.1 Classification of locks

  • By type of data operation (read/write)
    • Read lock (shared lock): multiple read operations on the same data can proceed at the same time without affecting each other.
    • Write lock (exclusive lock): until the current write operation completes, it blocks other write locks and read locks.
  • By granularity of the data operation

To maximize database concurrency, the smaller the range of data locked each time, the better. In theory, locking only the data currently being operated on yields the highest concurrency. But managing locks consumes resources (acquiring, checking, and releasing locks, and so on), so a database system has to balance high-concurrency response against system performance. This is where the concept of "lock granularity" comes from.

One way to improve the concurrency of a shared resource is to make the locked object more selective: lock only the part of the data that needs to be modified rather than the whole resource, and ideally lock exactly the rows that will be modified. At any time, on a given resource, the less data is locked, the higher the system's concurrency, as long as the locks do not conflict with each other.

  • Table locks
  • Row locks

3.2 Table locks

Characteristics: typical of the MyISAM storage engine; low overhead; fast to acquire; no deadlocks; coarse lock granularity; the highest probability of lock conflicts; the lowest concurrency.

Case 1 (read lock):

[session1]
lock table user read;
-- Session 1 can now query only the locked table. Querying other tables, or inserting into or updating the locked table, returns an error.
unlock tables;
[session2]
-- After session 1 locks the table, session 2 can still query or update tables that are not locked and can query the locked table. An insert or update on the locked table waits until the lock is released.

Case 2 (write lock):

[session1]
lock tables user write;
-- Session 1 can query, update, and insert into the locked table.
unlock tables;
[session2]
-- After session 1 locks the table, session 2's queries, updates, and inserts on it wait until the lock is released.

Conclusion:

  1. A read operation on a MyISAM table (read lock) does not block other processes' read requests on the same table, but it does block write requests on it. Other processes can write only after the read lock is released.
  2. A write operation on a MyISAM table (write lock) blocks other processes' reads and writes on the same table. Their reads and writes run only after the write lock is released.
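
The two conclusions amount to a small compatibility matrix between a lock that is already held and a lock that another session requests. The Python sketch below only restates that matrix; the names COMPATIBLE and can_grant are hypothetical and this is not how the server itself is implemented.

```python
# Keys are (lock already held by another session, lock being requested).
# True means the request can be granted right away; False means it waits.
COMPATIBLE = {
    ("read", "read"): True,     # shared locks coexist
    ("read", "write"): False,   # a writer waits for readers to finish
    ("write", "read"): False,   # a reader waits for the writer
    ("write", "write"): False,  # writers are mutually exclusive
}


def can_grant(held: str, requested: str) -> bool:
    """Return whether `requested` can be granted while another session holds `held`."""
    return COMPATIBLE[(held, requested)]
```

Only the read/read combination proceeds without waiting, which matches the behavior of the two table lock cases.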

To see which tables are locked: show open tables;

To analyze table locking: show status like 'table%';

Table_locks_immediate: the number of times a table-level lock was granted immediately; it increases by 1 each time a lock is obtained without waiting.
Table_locks_waited: the number of times a table-level lock could not be granted immediately and a wait was needed; it increases by 1 for each wait. A high value indicates serious table-level lock contention.

MyISAM's read/write lock scheduling gives priority to writes, which is also why MyISAM is not suitable as the engine for write-heavy tables. While a write lock is held, no other thread can do anything with the table, so a large number of updates makes it hard for queries to obtain the lock and can block them indefinitely.

3.3 Row locks

Characteristics:

  1. Typical of the InnoDB storage engine; high overhead; slow to acquire; the finest lock granularity; the lowest probability of lock conflicts; the highest concurrency.
  2. InnoDB differs from MyISAM in two major ways: it supports transactions, and it uses row-level locks.

Case (row lock):

[session1]
set autocommit=0;
-- Update a row here; the row stays locked until the transaction commits.
commit;
[session2]
-- While session 1 holds the lock and has not committed, an update to the same row in session 2 waits until the lock is released.

Without an index, row locks escalate to a table lock

When an indexed column is used improperly, for example by comparing it with a value of the wrong type so that the index cannot be used, the row lock effectively turns into a table lock.

The danger of gap locks

Gap lock: when we retrieve data with a range condition rather than an equality condition and request a shared or exclusive lock, InnoDB locks the index entries of the existing records that match the condition. Key values that fall within the range but do not exist are called "gaps", and InnoDB locks these gaps too. This locking mechanism is called gap locking (Next-Key locking).

Danger: when a range of key values is locked, even key values that do not exist are locked along with it, and no data can be inserted anywhere in the locked key range while the lock is held. In some scenarios this can hurt performance badly.

[Interview question] How do you lock a row?

select * from user for update;

Conclusion:

InnoDB implements row-level locking. Although the locking mechanism itself may cost more than table-level locking, the overall concurrent processing capability is far better than MyISAM's table-level locking. When system concurrency is high, InnoDB's overall performance has a clear advantage over MyISAM's.

However, InnoDB's row-level locking also has a fragile side: when used improperly, InnoDB's overall performance may be no better than MyISAM's, and can even be worse.

Analyzing row locking:

Check the Innodb_row_lock status variables to analyze row lock contention on the system.

Command: mysql> show status like 'innodb_row_lock%';
Innodb_row_lock_current_waits: the number of locks currently being waited for;
Innodb_row_lock_time: the total time spent waiting for locks since system startup;
Innodb_row_lock_time_avg: the average time spent per wait;
Innodb_row_lock_time_max: the longest single wait since system startup;
Innodb_row_lock_waits: the total number of waits since system startup.

Optimization suggestions

  • Retrieve data through indexes whenever possible, so that row locks without an index do not escalate to table locks.
  • Design indexes sensibly to keep the locked range as small as possible.
  • Keep range search conditions as narrow as possible to avoid gap locks.
  • Keep transactions small to reduce the amount of locked resources and how long they are held.
  • Use the lowest transaction isolation level the business allows.

3.4 Page locks

Overhead and locking time fall between those of table locks and row locks; deadlocks can occur; lock granularity falls between table locks and row locks; concurrency is moderate.

4. Practical MySQL problems

4.1 Finding duplicate data


select p1.Email from person p1 where p1.Email in (select p2.Email from person p2 where p1.Id!=p2.Id);
[Better] SELECT email FROM `person` group by email HAVING count(email)>1;
[Extension] Delete the duplicate data
[Idea] Group by the duplicated column, find the smallest id in each group, and delete every row with any other id. A temporary (derived) table is needed here,
because in MySQL a single SQL statement cannot query a table and modify that same table at the same time.
DELETE from person where id not in( select temp.id from (SELECT min(id) id FROM person group by email)as temp);
Note: MySQL 5.7 and later enable ONLY_FULL_GROUP_BY by default, so a query is rejected if it selects columns that are neither listed in group by nor wrapped in an aggregate function.
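
The find-then-delete approach can be tried end to end with a small Python script. To keep the example self-contained, SQLite's in-memory database stands in for MySQL here, which is an assumption: the two statements use only standard SQL, but behavior on a real MySQL server should be checked separately. The sample rows are made up.

```python
import sqlite3

# In-memory SQLite stands in for MySQL (an assumption for a runnable demo).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (id INTEGER PRIMARY KEY, email TEXT)")
conn.executemany(
    "INSERT INTO person (id, email) VALUES (?, ?)",
    [(1, "a@example.com"), (2, "b@example.com"), (3, "a@example.com")],
)

# Find e-mail addresses that occur more than once.
duplicates = [row[0] for row in conn.execute(
    "SELECT email FROM person GROUP BY email HAVING COUNT(email) > 1"
)]

# Keep the row with the smallest id in each group and delete the rest.
# The inner query is wrapped in a derived table, mirroring the MySQL version.
conn.execute(
    "DELETE FROM person WHERE id NOT IN "
    "(SELECT temp.id FROM (SELECT MIN(id) AS id FROM person GROUP BY email) AS temp)"
)
remaining = conn.execute("SELECT id, email FROM person ORDER BY id").fetchall()
```

After the delete, each e-mail address is left with exactly one row, the one with the smallest id.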

4.2 Creating and viewing indexes

Create: create index idx_a_b on table_name(col_a,col_b);

View: show index from table_name;

4.3 The meaning of where 1=1 and where 1=0

where 1=1 is used when concatenating multiple conditions dynamically. With it in place, every condition can be appended as "and ..." without checking whether it is the first one and needs "where" instead.

where 1=0 returns no rows, only the result structure. It is used to quickly create an empty table with the same structure.
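
The where 1=1 trick is easiest to appreciate in the code that assembles the query. Below is a minimal Python sketch; the function name build_query and the person table are hypothetical, and the %s placeholder style is assumed because common Python MySQL drivers use it.

```python
def build_query(filters: dict) -> tuple:
    """Build a parameterized SELECT; every condition is appended as ' AND ...'."""
    sql = "SELECT * FROM person WHERE 1=1"
    params = []
    for column, value in filters.items():
        if value is None:  # skip filters the caller did not set
            continue
        # Column names must come from trusted code, never from user input;
        # only the values are passed as bound parameters.
        sql += f" AND {column} = %s"
        params.append(value)
    return sql, params
```

Because the statement already ends with a valid condition, the loop never has to decide between "where" and "and", and an empty filter set still yields a valid query.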
