Author: Li Du
Source: Lido Programming (ID: LDCldc123095)
1. What is a transaction in MySQL? What are the four properties of a transaction? What problems can transactions cause?
MySQL supports four transaction isolation levels: read uncommitted (READ UNCOMMITTED), read committed (READ COMMITTED), repeatable read (REPEATABLE READ), and serializable (SERIALIZABLE).

A MySQL transaction has four properties: atomicity (Atomicity), consistency (Consistency), isolation (Isolation), and durability (Durability), collectively known as ACID.
Atomicity: all of a transaction's modifications to the data succeed, or none of them do. Atomicity is implemented on top of the log-based redo/undo mechanism.

Consistency: the data must be in a consistent state before and after the transaction; it can be understood as data consistency.

Isolation: transactions are isolated from each other and do not interfere; how strictly depends on the isolation level configured for the transaction.

Durability: once a transaction commits, its state is persisted to the database, i.e. on commit, inserted and updated data is durably written.

In my understanding, atomicity, isolation, and durability all serve consistency; consistency is the ultimate goal.
No isolation level is perfect; you can only evaluate and pick the one that best fits your project's business scenario. Most companies simply use MySQL's default: repeatable read.

From read uncommitted to read committed to repeatable read to serializable, the level gets higher and the isolation stricter. At the serializable level, when a read/write lock conflict occurs, a later transaction can continue only after the earlier one has finished.
Read uncommitted: a transaction can read data another transaction has not yet committed, which produces dirty reads.

Read committed: a transaction only reads data another transaction has already committed, but repeated reads within one transaction can return different values (non-repeatable reads).

Repeatable read: the data a transaction sees during its lifetime is the same as what it saw when it started, but phantom reads can still arise. MySQL solves non-repeatable reads with the consistency views of MVCC (multi-version concurrency control), and solves phantom reads with gap locks.

Serializable: for the same row, when a read/write lock conflict occurs, a later transaction can access it only after the earlier transaction has finished executing.
For example, suppose a user table has two columns, id and age, and one row of test data: (1, 24). Now age is to be incremented by 1 while two transactions execute concurrently:

| Transaction 1 | Transaction 2 |
| --- | --- |
| start transaction; query age (a1) | |
| | start transaction; query age (a2) |
| | update age = age + 1 |
| query age (a3) | |
| | commit |
| query age (a4) | |
| commit; query age (a5) | |
After this runs, what are the values of a1, a2, a3, a4, and a5 under each of the four isolation levels? Let's analyze carefully:

Read uncommitted: a1 and a2 read the initial value, so both are 24. Since the level is read uncommitted, once transaction 2 executes age = age + 1, a3, a4, and a5 are all 25, whether or not transaction 2 has committed.

Read committed: a1 and a2 read the initial value, so both are 24. Since the level is read committed, a3 is still 24, while a4 and a5 are 25 because transaction 2 has committed by then.

Repeatable read: a1 and a2 are 24. Under repeatable read, a3 and a4 read the same value the transaction saw at its start, so both are still 24. a5 is read after transaction 1 has already committed, so its value is 25.

Serializable: a1 and a2 are 24. At the serializable level, when transaction 2 modifies the row it takes the write lock, so transaction 1's reads of age block; from transaction 1's perspective, a3 and a4 read 24, and a5 is 25.

If you can work out a1-a5 under each isolation level in this example, you have a solid understanding of transaction isolation.
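The timeline above can be reproduced by hand in two mysql client sessions. This is a minimal sketch assuming the single-row user table from the example; run each block in the session the comment names:

```sql
-- setup (once)
CREATE TABLE user (id INT PRIMARY KEY, age INT) ENGINE = InnoDB;
INSERT INTO user VALUES (1, 24);

-- session 1: pick the level to test, e.g. repeatable read
SET SESSION TRANSACTION ISOLATION LEVEL REPEATABLE READ;
BEGIN;
SELECT age FROM user WHERE id = 1;   -- a1

-- session 2
BEGIN;
SELECT age FROM user WHERE id = 1;   -- a2
UPDATE user SET age = age + 1 WHERE id = 1;

-- session 1
SELECT age FROM user WHERE id = 1;   -- a3

-- session 2
COMMIT;

-- session 1
SELECT age FROM user WHERE id = 1;   -- a4
COMMIT;
SELECT age FROM user WHERE id = 1;   -- a5
```

Under repeatable read this should print 24 for a1 through a4 and 25 for a5, matching the analysis above; change the SET SESSION line to try the other levels.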
2. Do you understand MVCC? How does it work?
MVCC stands for multi-version concurrency control. It is implemented with consistency views and underpins both the read committed and repeatable read isolation levels.

For a row to support repeatable reads, or to let one transaction still see the original value before another transaction's change is committed, either the original data or the update operations must be preserved; only then can the original value be reconstructed.

In MySQL's MVCC, every row of data has multiple versions: each transactional update produces a new version. This is not a full backup of all the data, because a full backup would cost far too much:
As shown in the figure, if three transactions update the same row of data, there will be three corresponding data versions v1, v2, and v3. Each transaction is assigned a unique, monotonically increasing transaction id when it starts, and that id is written into the row's trx_id, which uniquely identifies that version of the row.

Versions 1 and 2 do not actually exist physically; U1 and U2 in the figure are really undo log (rollback log) records, and the v1 and v2 versions are computed from the current v3 plus the undo log.

The InnoDB engine exploits the fact that every row has multiple versions to create a "snapshot" in an instant, without spending much time at all.
3. What is the difference between MySQL's InnoDB and MyISAM?
(1) InnoDB and MyISAM are both MySQL storage engines, and MyISAM is gradually being replaced by InnoDB, mainly because InnoDB supports transactions and row-level locks while MyISAM supports neither; MyISAM's smallest lock unit is the table. Because MyISAM lacks row-level locks, InnoDB handles concurrency better than MyISAM.
(2) Data storage: MyISAM's indexes are also B+ trees, but the leaf nodes of the tree store the address of the row data; a lookup first finds the leaf node, then follows that address to the data.

In InnoDB, the leaf nodes of the primary key index store the row data directly, so searching the primary key index tree yields the data itself.

For a non-primary-key (secondary) index, the leaf nodes store the index value together with the corresponding primary key value; a composite index stores the composite index values plus the corresponding primary key value.
(3) Data files: a MyISAM table is stored in three files, distinguished by extension: .frm (table definition), .MYD (MYData, the data file), and .MYI (MYIndex, the index file). An InnoDB table, by contrast, is limited only by the operating system's file size, typically 2GB.
(4) Query patterns: MyISAM suits read-heavy, write-light business scenarios, while InnoDB suits update- and insert-heavy ones.

(5) count(*): for SELECT COUNT(*) FROM table with no WHERE condition, MyISAM simply returns a stored row count, while InnoDB must scan the whole table, because InnoDB does not keep an exact row count.

(6) Other differences: InnoDB supports foreign keys but not full-text indexes, while MyISAM supports full-text indexes but not foreign keys; InnoDB also allows a larger primary key range than MyISAM.
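A quick way to see the count(*) difference for yourself is to create the same table under each engine; a hedged sketch with illustrative table names:

```sql
CREATE TABLE t_myisam (id INT, msg VARCHAR(100)) ENGINE = MyISAM;
CREATE TABLE t_innodb (id INT, msg VARCHAR(100)) ENGINE = InnoDB;

-- MyISAM answers from its stored row counter; InnoDB has to scan the table
SELECT COUNT(*) FROM t_myisam;
SELECT COUNT(*) FROM t_innodb;

-- the engine of an existing table can be checked with:
SHOW TABLE STATUS LIKE 't_innodb';
```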
4. Do you know how a query statement is executed?

When MySQL executes a query SQL statement, roughly the following steps happen:
(1) The client sends the query statement to the server.

(2) The server authenticates the user name and password and checks permissions.

(3) It then checks whether the query exists in the query cache; if it does, the cached result is returned, otherwise execution continues. Note: MySQL 8 removed the query cache.

(4) Next come lexical and syntactic analysis: the SQL is parsed, syntax-checked, and preprocessed, and then the optimizer generates the corresponding execution plan.

(5) MySQL calls the storage engine's interface to run the plan generated by the optimizer, and the server returns the query result to the client.
Statement execution in MySQL is layered; each layer performs a different task until the result is finally returned. The two main layers are the service layer and the engine layer.

The service layer contains the connector, parser, optimizer, and executor. The engine layer accommodates different storage engines as plug-ins, chiefly InnoDB and MyISAM. The execution flow is shown in the figure below:
5. Do you know about the redo log and the binlog?
The redo log embodies WAL (Write-Ahead Logging): write the log and update memory first, and update the disk later. This reduces database I/O during SQL execution, and disk updates mostly happen when MySQL is idle, which greatly relieves pressure on MySQL.
The redo log has a fixed size and is a physical log belonging to the InnoDB engine; it is written in a circular fashion.

As the figure above shows, with four redo log files of 1GB each, the total is 4GB. write pos records the current write position; as data is written, write pos moves forward. checkpoint records where to erase from: because the redo log is fixed-size, when it fills up, that is, when write pos catches up with checkpoint, part of the redo log must be freed. The records being freed are persisted to disk first, and then checkpoint moves forward.
The redo log guarantees that even if the database crashes abnormally, previously committed records are not lost after restart; this is the crash-safe capability.

The binlog is the archive log. It is a logical log belonging to MySQL's server layer and records the original logic of the SQL. It has two main formats: statement, which records the original SQL, and row, which records the row contents.

The redo log and binlog differ in both form and content, and each log can recover data from its own records.

Both logs exist at the same time because MySQL's original engine, MyISAM, had no crash-safe capability, and in those days MySQL did not yet have InnoDB; MySQL's built-in binlog was only used for archiving, so the InnoDB engine implemented crash safety itself via its redo log.
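The settings involved can be inspected and changed from a client; a minimal sketch (the variable names are MySQL's, the values illustrative):

```sql
SHOW VARIABLES LIKE 'innodb_log_file_size';       -- size of one redo log file
SHOW VARIABLES LIKE 'innodb_log_files_in_group';  -- how many files form the ring
SHOW VARIABLES LIKE 'binlog_format';              -- STATEMENT, ROW, or MIXED
SET GLOBAL binlog_format = 'ROW';                 -- record row contents instead of raw SQL
```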
6. How do you add a field to a hot table online?

Adding a field to a table causes a full table scan and takes an MDL write lock, so an online operation must be extremely cautious; the database may be brought down before the operation finishes.

In this situation, keeping production stable comes first and adding the field second. You can run ALTER TABLE with a wait timeout, retry if the lock cannot be acquired, and pick a low-traffic window to attempt it.

Getting the lock right away is ideal, but even with the lock held, do not block the business statements queued behind you; the business always comes first.
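As a sketch of the cautious approach above (the table name is hypothetical): cap how long the DDL may wait for the MDL write lock, and simply retry later in a quieter period if it times out:

```sql
SET SESSION lock_wait_timeout = 3;                  -- wait at most 3 seconds for the MDL lock
ALTER TABLE hot_table ADD COLUMN note VARCHAR(64);  -- fails fast instead of blocking the business
```

MariaDB and some MySQL branches also support an inline WAIT n / NOWAIT clause on ALTER TABLE for the same purpose.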
7. How are MySQL indexes implemented? Why not use an ordered array, a hash table, or a binary tree as the index?

A MySQL index is a data structure that speeds up queries; like a book's table of contents, it quickly locates what you are looking for.

MySQL implements indexes with a B+ tree, structured as in the figure below:

An index data page is 16KB. Data is loaded from disk into memory in units of data pages and then queried in memory; if the requested data is not in memory, it is loaded from disk again.
There are many possible index structures, for example a hash table. A hash stores key-value pairs and suits equality queries, with O(1) lookup. But hash storage is unordered, so a range query may have to traverse all the data, and different keys can produce hash collisions, so a hash is not suitable as MySQL's index.

An ordered array performs very well for both equality and range queries, so why not use one as the index? Because updating an array index is too expensive: inserting new data means shifting everything after it back by one position, so ordered arrays are not used as the underlying index implementation.

Finally, the binary tree: each node has only two children and stores very little data, which forces frequent random disk I/O, and with a large data set the tree grows far too tall, seriously hurting performance, so binary trees are not used either.
A B+ tree, by contrast, has many children per node. With 16KB data pages, a tree only 1-3 levels deep can store more than a billion rows, meaning 1-3 disk accesses suffice, and a B+ tree's leaf nodes carry pointers to the next leaf, which makes range queries easy:
8. How do you check whether an index takes effect? When do indexes fail?

To see whether an index works, run the query with the EXPLAIN keyword; if an index is used, the key field in the output shows the index's name.
(1) Using the OR keyword in a WHERE condition may prevent index use. If you must use OR and still want the index, every column in the OR must be indexed.

(2) Using the LIKE keyword in a way that violates the leftmost-prefix principle makes the index fail.

(3) Querying a string column with a numeric literal, e.g. the mistaken WHERE column = 123, also makes the index fail.

(4) A composite-index query that violates the leftmost-prefix principle also fails to use the index, as shown below:
```sql
ALTER TABLE user ADD INDEX union_index(name, age);  -- name is the left column, age the right

SELECT * FROM user WHERE name = 'lidu';  -- uses the index
SELECT * FROM user WHERE age = 18;       -- does not use the index
```
(5) Testing a field for NULL in the WHERE condition can make the index fail; the workaround is to replace NULL with a special value such as 0 or -1:

```sql
SELECT id FROM table WHERE num IS NULL;
```
(6) Using symbols such as != or <> in the WHERE clause can also make the index fail:

```sql
SELECT id FROM table WHERE num != 0;
```
(7) An expression or function applied to the column on the left side of = in the WHERE clause also makes the index fail:

```sql
SELECT id FROM user WHERE age / 2 = 1;
SELECT id FROM user WHERE SUBSTRING(name, 1, 2) = 'lidu';
```
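A common fix for case (7) is to rewrite the predicate so the indexed column stands alone on the left side; a hedged sketch on the same hypothetical user table:

```sql
-- instead of WHERE age / 2 = 1, move the arithmetic to the constant side
SELECT id FROM user WHERE age = 2 * 1;
-- instead of SUBSTRING(name, 1, 2) = 'li', use a leftmost-prefix LIKE
SELECT id FROM user WHERE name LIKE 'li%';
```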
9. What kinds of indexes are there?

By data structure: B+ tree indexes, hash indexes, R-tree indexes, and FULLTEXT indexes.

By physical storage: clustered indexes and non-clustered indexes.

By logical function: primary key, ordinary index, unique index, composite index, and spatial index.
10. How do you usually approach SQL optimization?

SQL optimization mostly means adding indexes to fields. There are four main kinds of index (primary key / unique index / full-text index / ordinary index), and you analyze, against the specific business scenario, which index is the most reasonable to use.

EXPLAIN shows how MySQL would execute a SQL statement without actually running it, so we use it to analyze a statement:
id: the sequence number of the query.
select_type: the query type.
table: the table being queried.
type: the scan type; all means a full table scan.
possible_keys: the indexes that might be used.
key: the index actually used.
rows: how many rows the SQL scanned.
Extra: additional information about the statement, e.g. sorting.
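For example (a sketch assuming the user table and name index used earlier in this article):

```sql
EXPLAIN SELECT * FROM user WHERE name = 'lidu';
-- check `key` for the chosen index, `type` for the scan mode,
-- and `rows` for the estimated number of rows examined
```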
Common SQL optimizations

(1) For conditional queries, first consider indexing the fields after WHERE and ORDER BY. (2) To avoid index failure, avoid NULL tests after the WHERE condition. (3) Avoid the != and <> operators after WHERE. (4) Avoid functions in the WHERE clause. (5) Avoid joining conditions with the OR keyword.

All of the above deserve attention; there are, of course, many more pitfalls that can defeat an index.
Choice of index type

Another angle is which kind of index is more appropriate; take the ordinary index versus the unique index as an example.

If the business scenario is read-heavy and write-light, then when a SQL query arrives and the data is already in memory, it is returned directly; if the data page is not in memory, it is loaded from disk first and then returned. For this scenario there may be no significant performance difference between an ordinary index and a unique index.

However, the ordinary index is generally recommended. In a write-heavy, read-light scenario the choice matters much more: a write to an ordinary index, whether or not the data is in memory, can go into a small in-memory area called the change buffer and be flushed to disk in the background, usually when MySQL is idle.

A unique index is different. Because it must guarantee uniqueness, when the index writes data whose page is not in memory, it must first load the data from disk and check for duplicates. So a unique index cannot use the change buffer optimization and incurs frequent random disk I/O.
11. What are clustered and non-clustered indexes?

The main difference: the leaf nodes of a clustered index are data nodes, while the leaf nodes of a non-clustered index are still index nodes, holding just a pointer to the corresponding data block.
The difference is best seen by contrasting InnoDB's and MyISAM's data structures. Suppose we have a table with the raw data below:

In MyISAM, the index stores the data as follows:

MyISAM's leaf nodes store the row address used to find the corresponding row data; in other words, the leaf node stores a row pointer. Note also that in the MyISAM engine the data file (.MYD) and index file (.MYI) are separate, so after searching the index tree MyISAM needs a second lookup through the row pointer.
InnoDB's primary key index is stored like this:

The leaf nodes of InnoDB's primary key index store the row data itself rather than a row pointer. Its secondary indexes are organized the same way as MyISAM's, except that an InnoDB secondary index leaf stores the current index value plus the corresponding primary key value.

The benefit of InnoDB's secondary index is that it avoids the maintenance cost incurred when row data moves or data pages split, because InnoDB's secondary indexes never need to update a row pointer:
12. What is a table lookup ("back to table")? How does it arise?

As said above, in the InnoDB engine the primary key index stores the row data, and a secondary index's leaf nodes store the index value plus the corresponding primary key. A "back to table" lookup is the process where a conditional query through a secondary index must then go back and search the primary key index tree:

Because such a query has to search the primary key index tree a second time, in practice we should avoid table lookups wherever possible.
13. How do you avoid the table lookup?

The table lookup can be avoided by building a composite index that covers the query. Suppose, as in the figure, querying a user's name and sex by the name field currently involves a table lookup:

We can then build the following composite index to solve it:
```sql
CREATE TABLE user (
  id INT PRIMARY KEY,
  name VARCHAR(20),
  sex VARCHAR(5),
  INDEX(name, sex)
) ENGINE = InnoDB;
```
With the index(name, sex) composite index above, the sex value also appears in the secondary index's leaf nodes. Since every field the query needs can be obtained there, there is no need to go back to the primary key index at all.
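With that definition, a query that only touches the indexed columns is covered and needs no table lookup; a minimal sketch:

```sql
-- both selected columns live in the (name, sex) index leaves,
-- so the primary key index tree is never visited
SELECT name, sex FROM user WHERE name = 'lidu';
```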
14. What is the leftmost-prefix principle?

The leftmost prefix can be the leftmost N fields of a composite index, or the leftmost M characters of a string index. For example, suppose a table has the raw data below:
and a composite index is built on (col3, col2) in that order; the composite index tree is structured as in the figure below:

The leaf nodes are sorted first by the characters of col3; where col3 is equal, rows with the same col3 value are further sorted by col2. If we query WHERE col3 LIKE 'Eri%', the index can quickly locate Eric.

If instead the query condition is WHERE col3 LIKE '%se', the leading characters are unknown and could be anything, so the character comparison degenerates into a full table scan: the index fails.
15. What is index condition pushdown?

Before MySQL 5.6 there was no index condition pushdown; it was added in 5.6 as an optimization to improve performance and avoid unnecessary table lookups.

Suppose a user table has a composite index on its two fields (name, age). Without index condition pushdown, executing the following SQL proceeds as shown in the figure below:
```sql
SELECT * FROM tuser WHERE name LIKE 'Zhang%' AND age = 10 AND ismale = 1;
```
Comparing only the first index field, name LIKE 'Zhang%' filters down to four rows of data. Without pushdown, MySQL does not go on to compare whether age qualifies; it takes the primary key values directly, does the table lookups, and only after going back to the table checks whether age and ismale satisfy the conditions.

Yet the composite index on (name, age) already stores both fields' values in the index tree, so the condition age = 10 could be checked right there. That is exactly what index condition pushdown does:

With pushdown, the age comparison happens inside the index; the records that fail it are filtered out immediately, and only the qualifying rows trigger table lookups, which eliminates unnecessary back-to-table queries.
16. Auto-increment ID or UUID for the primary key? Can you say why?

Choosing between an auto-increment ID and a UUID as the primary key comes down to two considerations: performance, and storage size. Without a specific business requirement, a UUID primary key is generally not recommended.

Inserts with a UUID primary key are not guaranteed to be ordered, so they may move existing data and can even trigger data page splits (a data page is 16KB), making inserts more expensive.

With an auto-increment ID as the primary key, an insert is an append operation: no data movement, no page fragmentation, and better performance.

The other consideration is storage: an auto-increment key is usually a 4-byte integer (8 bytes for a long integer), while a UUID primary key needs 16 bytes. The primary key is also stored in every secondary index, so the extra space is multiplied, and performance drops; hence UUIDs are not recommended.
17. How does MySQL control concurrent access to resources?

MySQL controls concurrent access to resources internally through its locking mechanisms, which keep the data consistent. The available lock types depend on the engine: MyISAM supports two kinds of table-level locks by default, the shared read lock and the exclusive write lock. Table-level locks are supported by both the MyISAM and InnoDB storage engines, but InnoDB supports row locks by default.

MyISAM locking

The following SQL shows explicit locking and unlocking:
```sql
LOCK TABLES table_name READ;   -- explicitly add a table-level read lock
LOCK TABLES table_name WRITE;  -- explicitly add a table-level write lock
UNLOCK TABLES;                 -- explicit unlock (a transaction commit also releases locks)
```
(1) MyISAM table-level write lock: once a thread acquires the table-level write lock, only that thread can read and write the table; other threads must wait for the lock to be released before they can operate.

(2) MyISAM table-level shared read lock: once a thread acquires a table-level read lock, it can only read data, not modify it; other threads may add read locks but cannot add write locks.
InnoDB locking

Unlike MyISAM, InnoDB supports row locks and transactions. Besides table locks and row-level locks, InnoDB also has the gap lock and the next-key lock. Gap locks are used mainly for range queries: they lock the range being queried, and they are also part of the solution to phantom reads.

InnoDB's row-level locks are locks on index entries; when a query uses no index, InnoDB falls back to a table lock.

Even when a query goes through an index, whether the index is actually used depends on MySQL's execution plan; the optimizer judges what the best strategy is for executing the statement.

If MySQL decides an index lookup would be slower than a full table scan, it will use a full table scan, so even a statement that names an index can end up as a full scan with a table lock.
18. How do MySQL deadlocks arise? How do you solve them?

Deadlocks occur in InnoDB; MyISAM cannot deadlock because it uses table locks and acquires all the locks it needs at once, while other threads simply queue and wait.

InnoDB supports row locks by default and acquires locks step by step rather than all at once, so deadlocks can arise during lock contention.

Deadlocks do not stop InnoDB from being the most popular storage engine. MyISAM amounts to serialized operation, reading and writing in order, and therefore supports much lower concurrency.
(1) Deadlock case one:

Suppose the database table employee holds six rows of data, as shown below:

Two of them have name = 'ldc', the name field carries an ordinary index, and those are the rows with id = 2 and id = 3. Now suppose two transactions execute the following two SQL statements at the same time (for session 2's read to take row locks it would have to be a locking read, e.g. SELECT ... FOR UPDATE):

```sql
-- session1
UPDATE employee SET num = 2 WHERE name = 'ldc';
-- session2
SELECT * FROM employee WHERE id = 2 OR id = 3;
```
Session 1's SQL matches two rows. Suppose it first obtains the first row, the one with id = 2, and the CPU is then allocated to the other transaction, whose query obtains the second row, the one with id = 3.

When transaction 2 continues, it locks the id = 3 row; CPU time is then allocated back to the first transaction, which tries to acquire the lock on the second row, finds it already held by the other transaction, and enters a waiting state.

When the CPU is allocated to the second transaction, it tries to acquire the lock on the first row and finds that lock held by the first transaction. That is a deadlock: the two transactions wait for each other.
(2) Deadlock case two

In the second case, one transaction starts and updates the row with id = 1, successfully acquiring its write lock; meanwhile another transaction updates the row with id = 2, likewise successfully acquiring its write lock (id is the primary key).

CPU time then goes to transaction 1, which continues on to update the id = 2 row; since transaction 2 already holds that row's lock, transaction 1 waits.

When transaction 2 gets its time, it likewise tries to update the id = 1 row, whose lock transaction 1 holds, so transaction 2 also waits, and we have a deadlock.
| Transaction 1 | Transaction 2 |
| --- | --- |
| begin; update t set name='test' where id=1; | begin; |
| | update t set name='test' where id=2; |
| update t set name='test' where id=2; | |
| waiting... | update t set name='test' where id=1; |
| waiting... | waiting... |
To prevent deadlocks in the first place, design the application so that when you find high-concurrency access to one table, you try to serialize operations on that table, or upgrade the lock and acquire all lock resources at once.

You can also set the innodb_lock_wait_timeout parameter (the lock wait timeout), and enable innodb_deadlock_detect so that when a deadlock is detected, one of the transactions is rolled back automatically.
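A hedged sketch of those two settings (the values shown are MySQL's usual defaults):

```sql
SET GLOBAL innodb_lock_wait_timeout = 50;  -- seconds a transaction may wait for a row lock
SET GLOBAL innodb_deadlock_detect = ON;    -- on deadlock, roll back one victim immediately
```

With detection off, deadlocked transactions are only broken up by the lock wait timeout, which is why the two parameters are usually tuned together.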
19. Can you talk about MySQL master-slave replication?

Read/write splitting

The prerequisite for MySQL read/write splitting is that master-slave replication has already been configured. It can be implemented by: (1) configuring multiple data sources, or (2) using a MySQL proxy middleware tool.
How master-slave replication and read/write splitting relate

MySQL master-slave replication and read/write splitting are closely related: deploy master-slave replication first, and only once it is working can reads and writes be split on top of it.

How read/write splitting works

Write only on the master, read only on the replicas. The basic principle is to let the master handle transactional statements and the replicas handle SELECT queries; database replication then propagates the changes made by transactional statements to the replicas.
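Setting up a replica follows the standard statements; a hedged sketch in which the host, account, and binlog coordinates are all placeholders:

```sql
-- run on the replica
CHANGE MASTER TO
  MASTER_HOST = 'master-host',
  MASTER_USER = 'repl',
  MASTER_PASSWORD = '***',
  MASTER_LOG_FILE = 'mysql-bin.000001',
  MASTER_LOG_POS = 4;
START SLAVE;
SHOW SLAVE STATUS\G  -- Slave_IO_Running / Slave_SQL_Running should both be Yes
```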
20. Can you talk about splitting databases and tables? How do you split?

First, why split tables? (1) If each record in a table is large, more I/O operations are needed; if a field's value is big and it is rarely used, the large field can be moved into another table, so queries that do not touch it do fewer I/O operations. (2) If the table holds a very, very large amount of data, queries slow down; in other words, the table's data volume affects query performance. (3) The data in the table may be naturally independent, e.g. per-region or per-period records, especially when some data is commonly used and the rest is not. (4) Table splitting comes in two forms: horizontal splitting and vertical splitting.
Vertical splitting splits a table's columns: a table with many columns becomes several tables. It is generally used to split off large fields and rarely accessed fields, separating hot and cold data.

A common example of vertical splitting is the article table in a blog system, say tbl_articles (id, title, summary, content, user_id, create_time). Because the article body content is long, keeping it in tbl_articles would seriously slow queries on that table, so the content goes into tbl_articles_detail (article_id, content); an article list only needs to query fields in tbl_articles.
Advantages of vertical splitting: rows become smaller, fewer blocks are read per query, and I/O frequency drops. Vertical partitioning also simplifies table structure and eases maintenance.

Disadvantages of vertical splitting: the primary key becomes redundant and the redundant columns must be managed, and it creates the need for joins, which can be done in the application layer. Vertical partitioning also makes transactions more complex.
Horizontal splitting splits a table's rows. When a table exceeds 5 million rows or a single table exceeds 10GB, queries slow down, and the data of one table can be split across multiple tables. Horizontal splitting should keep each table's data volume as equal, as even, as possible.

Horizontal splitting adds complexity to the application: queries usually need multiple table names, and querying all the data needs a UNION. In many database applications this complexity outweighs the benefits it brings.

As long as the index keys are not large, growing the data 2-3 times only adds roughly one extra index-level disk read per indexed query, so weigh the data's growth rate and decide, based on the actual situation, whether horizontal splitting is really needed.
The most important part of horizontal splitting is finding the splitting criterion; different tables should choose criteria that fit their business.

A user table can be split by the user's phone-number segment, e.g. user183, user150, user153, user189, with one table per segment.

A user table can also be split by the user's id, e.g. into 3 tables user0, user1, user2: if the user's id % 3 = 0, look in user0; if id % 3 = 1, look in user1.

An order table can be split by the time of the order.
Sharding middleware

The main sharding technologies on the market today are MyCat and Sharding-JDBC; we will cover them in detail next time.