1. Database partitioning, table splitting, database splitting, and data sharding
Hello everyone, I'm Xiao Liu. It's been a long time and I've missed you all! Today I'll walk you through the basics of splitting databases and tables.
1.1 Bottlenecks of a single database
- The larger a single table grows, the more contention there is on read/write locks, and the slower inserts become because the index has to be rebuilt.
- A single database holds too much data (roughly 1 TB to 2 TB per database is the practical limit).
- A single database server is under too much load.
- Read/write throughput hits a bottleneck (at a few hundred concurrent connections).
1.2 Partitioning
Database partitioning is a physical database design technique. Its goal is to reduce the total amount of data that specific SQL statements read and write, and thereby reduce response time.
Partitioning does not create new tables. Instead, it distributes the table's data according to a rule across different disks, file systems, or storage devices of different servers, while logically it remains a single table. Spreading the data out this way improves retrieval efficiency and reduces database I/O pressure. The advantages of partitioning are:
- Compared with a single file system or disk, partitions can store more data.
- Data management is easier: to purge or discard one year's data, you can simply drop the partition for that period.
- Queries can be directed precisely to the relevant partitions instead of scanning the whole table, which greatly improves retrieval efficiency.
- Queries that span partitions on multiple disks can achieve higher throughput.
- Aggregate functions are easy to compute by merging the results from each partition.
1.2.1 When should you consider partitioning?
- Queries on a table are so slow that they affect usability.
- The SQL has already been optimized.
- The table holds a large amount of data.
- The table's data naturally falls into segments (for example, by date).
- Operations usually touch only part of the data, not all of it.
1.2.2 Horizontal partitioning
Horizontal partitioning splits a table by rows: different groups of rows are stored as physically separate data sets, which can be accessed individually or together. Every data set contains all the columns defined in the table, so the table's attributes are preserved.
Example: a table holding ten years of invoice records can be partitioned into 10 partitions, each containing one year's records.
1.2.3 Vertical partitioning
Vertical partitioning reduces the width of the target table by splitting it by columns: certain columns are assigned to specific partitions, and each partition contains the rows for its columns.
Example: a table contains large TEXT and BLOB columns that are rarely accessed. In that case, the rarely used TEXT and BLOB columns can be moved to a separate partition, which keeps them related to the rest of the row while speeding up access to the frequently used columns.
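MySQL's own PARTITION BY syntax only splits rows, so in practice a vertical split like this is usually done by hand with two tables that share a primary key. The sketch below uses Python's built-in sqlite3 module so it runs anywhere; the `article_hot`/`article_cold` tables and their columns are hypothetical.

```python
import sqlite3
from typing import Optional

conn = sqlite3.connect(":memory:")
# Frequently read columns stay in a narrow "hot" table ...
conn.execute("CREATE TABLE article_hot (id INTEGER PRIMARY KEY, title TEXT, views INTEGER)")
# ... while rarely read TEXT/BLOB columns move to a "cold" table with the same key.
conn.execute("CREATE TABLE article_cold (id INTEGER PRIMARY KEY, body TEXT, cover BLOB)")


def insert_article(article_id: int, title: str, body: str, cover: Optional[bytes]) -> None:
    with conn:  # write both halves in one transaction so they stay related
        conn.execute("INSERT INTO article_hot VALUES (?, ?, 0)", (article_id, title))
        conn.execute("INSERT INTO article_cold VALUES (?, ?, ?)", (article_id, body, cover))


insert_article(1, "Partitioning basics", "long text ...", b"\x89PNG")
insert_article(2, "Sharding basics", "more long text ...", None)

# List page: touches only the narrow table.
print(conn.execute("SELECT id, title FROM article_hot ORDER BY id").fetchall())
# [(1, 'Partitioning basics'), (2, 'Sharding basics')]

# Detail page: join the two halves back together on the shared key.
row = conn.execute(
    "SELECT h.title, c.body FROM article_hot h JOIN article_cold c ON c.id = h.id WHERE h.id = 2"
).fetchone()
print(row)  # ('Sharding basics', 'more long text ...')
```

The cost of the split shows up only on the detail page, which needs a join; the frequent list queries never read the wide columns.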
1.2.4 How partitioning is implemented
MySQL supports partitioning starting with version 5.1.
Create the table:
create table sales(
    id int auto_increment,
    amount double not null,
    order_day datetime not null,
    primary key(id, order_day)
) engine=InnoDB;
Set up the partitions (every unique key, including the primary key, must contain the partitioning column, which is why order_day is part of the primary key):
alter table sales
partition by range(year(order_day))(
    partition p_2010 values less than (2011),
    partition p_2011 values less than (2012),
    partition p_2012 values less than (2013),
    partition p_max values less than maxvalue
);
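To see how `VALUES LESS THAN` assigns rows, here is a small Python model of a RANGE scheme with partitions p_2010, p_2011, p_2012 (exclusive upper bounds 2011, 2012, 2013) and a catch-all p_max. It is a sketch, not MySQL's implementation: `partition_for` mimics where an INSERT lands, and `partitions_to_scan` mimics partition pruning for a condition on the year.

```python
from bisect import bisect_right
from datetime import datetime

# Exclusive upper bounds of the VALUES LESS THAN clauses; p_max takes the rest.
BOUNDS = [2011, 2012, 2013]
NAMES = ["p_2010", "p_2011", "p_2012", "p_max"]


def partition_for(order_day: datetime) -> str:
    """Partition a row lands in under RANGE(YEAR(order_day))."""
    # bisect_right counts the bounds <= year, which is the index of the first
    # partition whose "less than" bound is strictly greater than the year.
    return NAMES[bisect_right(BOUNDS, order_day.year)]


def partitions_to_scan(first_year: int, last_year: int) -> list[str]:
    """Partitions a query on YEAR(order_day) BETWEEN first_year AND last_year must read."""
    lo = bisect_right(BOUNDS, first_year)
    hi = bisect_right(BOUNDS, last_year)
    return NAMES[lo : hi + 1]


print(partition_for(datetime(2009, 5, 1)))  # p_2010
print(partition_for(datetime(2011, 1, 1)))  # p_2011
print(partition_for(datetime(2020, 7, 9)))  # p_max
print(partitions_to_scan(2011, 2012))       # ['p_2011', 'p_2012']
```

Note that p_2010 also receives every year before 2010, because the first partition has no lower bound.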
1.3 Table splitting
1.3.1 When should you consider splitting a table?
- Queries on a table are so slow that they affect usability.
- The SQL has already been optimized.
- The table holds a large amount of data.
- Frequent inserts or join queries are slowing down.
1.3.2 Problems solved by table splitting
After splitting, single-table concurrency increases, disk I/O performance improves, and write efficiency improves as well.
- Each query takes less time.
- Data is spread across different files, which improves disk I/O performance.
- Read/write locks affect less data.
- Fewer rows need to be re-indexed when data is inserted.
1.3.3 How to implement table splitting
The business system has to cooperate with the migration and upgrade, which is a heavy workload.
Commonly used table splitting rules:
- Range: for example, split by time.
- Hash: hash the key, then take the result modulo the number of sub-tables.
- Mapping database: keep the routing configuration in a separately built DB that stores the mapping from each user_id to its DB.
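The three rules above can be sketched as routing functions that turn a row's key into a physical table or database name. Everything here is hypothetical: the `orders_YYYYMM` and `user_N` naming schemes, the table count of 4, and the in-memory dict that stands in for the separate mapping DB.

```python
import zlib
from datetime import date

NUM_TABLES = 4  # hypothetical number of sub-tables for the hash rule

# Hypothetical stand-in for the separate mapping DB (user_id -> DB name).
USER_TO_DB = {1001: "db_0", 1002: "db_1", 1003: "db_0"}


def range_route(order_day: date) -> str:
    """Range rule: one sub-table per month, e.g. orders_202403."""
    return f"orders_{order_day:%Y%m}"


def hash_route(user_id: int) -> str:
    """Hash rule: hash the key, then take it modulo the number of sub-tables."""
    # zlib.crc32 gives the same value in every process, unlike hash() on strings.
    return f"user_{zlib.crc32(str(user_id).encode()) % NUM_TABLES}"


def mapping_route(user_id: int) -> str:
    """Mapping rule: look the user up in the routing table."""
    return USER_TO_DB[user_id]


print(range_route(date(2024, 3, 5)))  # orders_202403
print(hash_route(42))                 # one of user_0 .. user_3
print(mapping_route(1002))            # db_1
```

The mapping rule is the most flexible (any user can be moved by updating one row) but adds a lookup to every request; the range and hash rules are pure computations.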
1.4 Database splitting
1.4.1 When should you consider splitting the database?
- A single DB does not have enough storage space.
- As the number of queries grows, a single database server can no longer keep up.
1.4.2 Problems solved by database splitting
The main goal is to break through the I/O capacity limit of a single database server node and solve the database scalability problem.
1.4.3 How to implement database splitting
Vertical split
Tables that have no association with each other and never need to be joined can be placed in different databases on different servers. This split follows business boundaries; for example, a system can be divided into three databases: funds, members, and orders.
Problems to solve: cross-database transactions, join queries, and so on.
Horizontal split
For example, on most sites the data revolves around users, so data can be split horizontally by user.
The split follows rules, and horizontal database splitting usually comes after vertical splitting. For example, if a huge number of orders is processed every day, orders can be split horizontally by a rule.
Problems to solve: data routing and result assembly.
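Data routing and result assembly can be sketched with in-memory shards. Assuming orders are split by `user_id % 2` (a hypothetical rule and schema), a query that carries the shard key touches one shard, while a global "latest orders" query has to fan out to every shard and merge the partial results.

```python
import heapq

N_SHARDS = 2  # hypothetical: orders are split by user_id % 2
shards: list[list[dict]] = [[] for _ in range(N_SHARDS)]  # in-memory stand-ins for the DBs


def route(user_id: int) -> int:
    """Data routing: the shard key decides which database a row lives in."""
    return user_id % N_SHARDS


def insert_order(user_id: int, order_id: int, created_at: int) -> None:
    shards[route(user_id)].append(
        {"user_id": user_id, "order_id": order_id, "created_at": created_at}
    )


def orders_of(user_id: int) -> list[dict]:
    """A query that carries the shard key touches exactly one shard."""
    return [r for r in shards[route(user_id)] if r["user_id"] == user_id]


def latest_orders(limit: int) -> list[int]:
    """A query without the shard key: scatter to every shard, then assemble."""
    # Each shard returns its own newest `limit` rows, sorted newest first ...
    partials = [
        sorted(s, key=lambda r: r["created_at"], reverse=True)[:limit] for s in shards
    ]
    # ... and the sorted partial lists are merged and cut to the global top `limit`.
    merged = heapq.merge(*partials, key=lambda r: r["created_at"], reverse=True)
    return [r["order_id"] for r in merged][:limit]


for uid, oid, ts in [(1, 101, 10), (2, 102, 20), (3, 103, 30), (4, 104, 5)]:
    insert_order(uid, oid, ts)

print(orders_of(3))      # [{'user_id': 3, 'order_id': 103, 'created_at': 30}]
print(latest_orders(2))  # [103, 102]
```

Each shard must return a full `limit` rows, because the global top rows may all come from one shard; this is why cross-shard sorting and paging get expensive as the offset grows.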
Read/write separation
For data with loose freshness requirements, separating reads from writes relieves pressure on the database.
Problems to solve: identifying which business operations can tolerate some replication delay, and data synchronization.
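Read/write separation is usually enforced by middleware or a data-access layer. The hypothetical `ReadWriteRouter` below sketches the core decision: writes and lag-sensitive reads go to the primary, other reads rotate across replicas. The node names are placeholders, and classifying SQL by its first keyword is a deliberate simplification.

```python
import itertools


class ReadWriteRouter:
    """Hypothetical router: writes go to the primary, reads rotate over replicas."""

    def __init__(self, primary: str, replicas: list[str]) -> None:
        self.primary = primary
        self._replicas = itertools.cycle(replicas)

    def pick(self, sql: str, require_fresh: bool = False) -> str:
        stmt = sql.lstrip().lower()
        # Plain SELECTs are reads; locking reads (FOR UPDATE) must hit the primary.
        is_plain_read = stmt.startswith("select") and "for update" not in stmt
        # Writes, and reads that cannot tolerate replication lag, use the primary.
        if not is_plain_read or require_fresh:
            return self.primary
        return next(self._replicas)


router = ReadWriteRouter("primary", ["replica-1", "replica-2"])
print(router.pick("INSERT INTO orders VALUES (1)"))             # primary
print(router.pick("SELECT * FROM orders"))                      # replica-1
print(router.pick("SELECT * FROM orders"))                      # replica-2
print(router.pick("SELECT * FROM orders", require_fresh=True))  # primary
```

The `require_fresh` flag is where the business decision from the text lives: the caller, not the router, knows whether a read can tolerate replication delay.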
1.5 Comparing partitioning, table splitting, and database splitting
Partitioning divides a table's data into N blocks: logically it is still one table, but underneath it consists of N physical blocks. Table splitting breaks one table into N physical tables, each with its own storage space; on every read and write, the system must derive the actual table name from predefined rules and then operate on that table. Once a database starts splitting tables, it holds more and more tables.
Priority: vertical database split -> horizontal database split -> read/write separation
1.6 New problems after splitting
- Transaction support: after splitting databases and tables, a transaction becomes a distributed transaction.
- Joins: queries now span databases and tables.
- Consistency: splitting databases and tables and separating reads from writes make the system distributed. Guaranteeing strong consistency in a distributed system inevitably adds latency, which lowers performance and increases system complexity.
Solution:
There are no strict boundaries between these approaches; each has its own characteristics and emphasis. Choose according to the situation and use each approach where its characteristics fit. Adopt third-party database middleware (Atlas, Mycat, TDDL, DRDS), and have the business system cooperate with the upgrade of the data storage layer.
Summary: prefer partitioning first. When partitioning cannot meet the requirements, consider splitting tables; a well-designed table split improves efficiency more than partitioning does.
1.7 Case study: JD.com product reviews
Current situation
- Number of product reviews: several billion
- Daily service calls: several billion
- Growing by multiples every year
Overall data storage: basic data storage and text storage.
Basic data storage
MySQL stores only non-text basic information, including review status, user, time, and other basic data, as well as additional information such as images, tags, and likes. Data organization (different data can use different database and table splitting schemes):
- Basic review data is split into databases and tables by user ID.
- Images and tags live in the same database, with tables split by product ID.
- Other extended information has a small data volume and low access frequency, so it stays in one database without table splitting.
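The per-data-type choice of split key can be sketched as routing functions. The database and table counts and all names below are hypothetical; only the choice of key (user ID for basic review data, product ID for images and tags, no split for extended data) comes from the case.

```python
N_DBS, N_TABLES_PER_DB = 4, 8   # hypothetical sizes for basic review data
N_MEDIA_TABLES = 16             # hypothetical table count for images and tags


def comment_location(user_id: int) -> tuple[str, str]:
    """Basic review data: split across databases and tables by user ID."""
    db = user_id % N_DBS
    table = (user_id // N_DBS) % N_TABLES_PER_DB  # spread a DB's users over its tables
    return f"comment_db_{db}", f"comment_{table}"


def media_location(product_id: int) -> tuple[str, str]:
    """Images and tags: one shared database, tables split by product ID."""
    return "media_db", f"image_tag_{product_id % N_MEDIA_TABLES}"


def extension_location() -> tuple[str, str]:
    """Low-volume extended data: a single database and a single table."""
    return "ext_db", "comment_ext"


print(comment_location(13))  # ('comment_db_1', 'comment_3')
print(media_location(35))    # ('media_db', 'image_tag_3')
```

Dividing by the DB count before the second modulo keeps every table in a DB in use; computing `user_id % 8` directly would leave six of the eight tables in each DB empty, since all users in one DB share the same `user_id % 4`.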
Text storage
The text (the content of each review) is stored in MongoDB and HBase.
- NoSQL was chosen instead of MySQL.
- This reduces MySQL's storage pressure and frees MySQL up, while the huge volume of text still has reliable storage. NoSQL's high read/write performance greatly improves system throughput and reduces latency.
1.8 Data sharding
In a distributed storage system, data must be spread across multiple devices. Data sharding is the technique that decides how data is distributed across those storage devices. It has three goals:
- Even distribution: the amount of data on each device should be as close as possible.
- Load balancing: the number of requests on each device should be as close as possible.
- Minimal migration: as little data as possible should move when the cluster is scaled out or in.
Data sharding methods
- Range partitioning
- Modulo
- Key list
- Consistent hashing: a distributed hash table (DHT) algorithm proposed at MIT in 1997 to solve the Internet hot spot problem. Consistent hashing is simple and clever: it spreads data evenly, and its monotonicity keeps data migration small when nodes are added or removed.
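The sketch below is a minimal consistent-hash ring (one point per node, MD5 used only to spread values; node and key names are made up). It adds a fifth node and checks the monotonicity property described above: every key that changes owner moves to the new node. For comparison, plain modulo placement moves 80% of the integer keys 0 to 9999 when going from 4 to 5 nodes, because a key stays put only when `k % 4 == k % 5`.

```python
import bisect
import hashlib


def ring_pos(value: str) -> int:
    """Position on a 32-bit ring; MD5 is used only to spread values evenly."""
    return int(hashlib.md5(value.encode()).hexdigest()[:8], 16)


class HashRing:
    def __init__(self, nodes: list[str]) -> None:
        self._points = sorted((ring_pos(n), n) for n in nodes)

    def add(self, node: str) -> None:
        bisect.insort(self._points, (ring_pos(node), node))

    def node_for(self, key: str) -> str:
        # Walk clockwise to the first node at or after the key's position,
        # wrapping around to the start of the ring.
        positions = [p for p, _ in self._points]
        i = bisect.bisect_left(positions, ring_pos(key)) % len(self._points)
        return self._points[i][1]


keys = [f"key-{i}" for i in range(10_000)]
ring = HashRing([f"node-{i}" for i in range(4)])
before = {k: ring.node_for(k) for k in keys}
ring.add("node-4")
after = {k: ring.node_for(k) for k in keys}
moved = [k for k in keys if before[k] != after[k]]

print(all(after[k] == "node-4" for k in moved))  # True: keys only move to the new node
print(len(moved) / len(keys))                    # share taken over by node-4

# Modulo placement for comparison: key k stays only if k % 4 == k % 5.
mod_moved = sum(1 for k in range(10_000) if k % 4 != k % 5)
print(mod_moved / 10_000)                        # 0.8
```

With only one point per node, the share taken by the new node depends on where its hash lands, so real deployments place many points per node to even out the load.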
Virtual servers
To make the system more scalable, the storage layer introduces the concept of a VServer (virtual server). A VServer is a logical storage server, the storage unit of the distributed storage system. Multiple VServers can be deployed on one physical device, and each VServer supports one write process and multiple read processes.
VServers bring the following benefits:
- Better single-machine performance. To avoid complex locking, the design uses a single write process. If a machine had only one write process, its write concurrency would be limited. VServers divide a machine's storage resources (memory, disk) into multiple storage units, so multiple write processes can work at the same time, greatly improving single-machine write concurrency.
- Better deployment scalability. VServers are very flexible to deploy: the number of VServers can be chosen according to a machine's resources, and different machine models can be configured with different VServer counts. This lets each model make full use of its resources, and even a system built from several machine models can keep load balanced across machines.
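A sketch of capacity-weighted VServer placement, under assumptions not in the original: a fixed pool of 12 VServers, keys mapped to a VServer by CRC32 modulo 12, and each machine given VServers in proportion to a capacity weight. Because keys map to VServers rather than machines, rebalancing moves whole VServers instead of rehashing individual keys.

```python
import zlib
from collections import Counter

NUM_VSERVERS = 12  # hypothetical fixed pool of logical storage units


def vserver_for(key: str) -> int:
    """Key -> VServer. This never changes when machines are added or removed."""
    return zlib.crc32(key.encode()) % NUM_VSERVERS


def assign(machines: dict[str, int]) -> dict[int, str]:
    """VServer -> machine, giving each machine VServers in proportion to its weight."""
    total = sum(machines.values())
    placement: dict[int, str] = {}
    vs = 0
    for name, weight in machines.items():
        for _ in range(NUM_VSERVERS * weight // total):
            placement[vs] = name
            vs += 1
    # Integer division can leave a few VServers over; give them to the biggest machine.
    biggest = max(machines, key=machines.get)
    while vs < NUM_VSERVERS:
        placement[vs] = biggest
        vs += 1
    return placement


# A larger machine (weight 2) hosts twice as many VServers as the smaller ones.
placement = assign({"big-box": 2, "small-a": 1, "small-b": 1})
print(Counter(placement.values()))  # Counter({'big-box': 6, 'small-a': 3, 'small-b': 3})
print(placement[vserver_for("user:42")])  # the machine that stores this key
```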
2. Transaction ACID properties and isolation levels
- Atomicity (Atomic): the operations in a transaction are either all performed or none are; the failure of any operation fails the whole transaction.
- Consistency (Consistent): when the transaction ends, the system is in a consistent state.
- Isolation (Isolated): concurrent transactions cannot see each other's intermediate states.
- Durability (Durable): once a transaction completes, its changes are persisted.
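Atomicity is easy to observe with Python's built-in sqlite3 module (used here instead of MySQL so the example is self-contained). In this sketch, a hypothetical transfer credits one account and then debits another; when the debit violates a CHECK constraint, the whole transaction rolls back, including the credit that had already succeeded.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (name TEXT PRIMARY KEY, balance INTEGER CHECK (balance >= 0))")
conn.executemany("INSERT INTO account VALUES (?, ?)", [("alice", 100), ("bob", 0)])
conn.commit()


def transfer(src: str, dst: str, amount: int) -> bool:
    """Move money atomically: both UPDATEs commit together or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute("UPDATE account SET balance = balance + ? WHERE name = ?", (amount, dst))
            conn.execute("UPDATE account SET balance = balance - ? WHERE name = ?", (amount, src))
        return True
    except sqlite3.IntegrityError:
        return False


print(transfer("alice", "bob", 30))   # True
print(transfer("alice", "bob", 500))  # False: CHECK fails, bob's credit is rolled back
print(dict(conn.execute("SELECT name, balance FROM account ORDER BY name")))
# {'alice': 70, 'bob': 30}
```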
Problems caused by concurrent database transactions
- Dirty read: transaction A reads data that transaction B has not yet committed.
- Non-repeatable read: transaction A queries a row, row1; after transaction B commits a change, A's second query returns row1 again but with different column values. The focus is on updates to existing rows.
- Phantom read: transaction A's first query returns one row, row1; after transaction B commits a change, A's second query returns two rows, row1 and row2. The focus is on inserts.
MySQL provides 4 isolation levels:
- Serializable: transactions behave as if they ran one after another; while transaction A reads rows of a table, other transactions cannot modify those rows until A finishes.
- Repeatable read: transaction A reads the same value every time it re-reads a row, even if other transactions change and commit it in the meantime. This is InnoDB's default level.
- Read committed: transaction A can read only committed data.
- Read uncommitted: transaction A can read uncommitted data.
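Isolation level | Dirty read | Non-repeatable read | Phantom read
--- | --- | --- | ---
Serializable | √ | √ | √
Repeatable read | √ | √ | ×
Read committed | √ | × | ×
Read uncommitted | × | × | ×
(√ = prevented, × = can occur)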
Oracle provides 3 isolation levels:
read committed, serializable, and read-only. A read-only transaction sees only data committed before the transaction began, and it cannot execute INSERT, UPDATE, or DELETE statements.
3. MySQL locking mechanism
3.1 Classification of locks
- By type of data operation (read/write):
  - Read lock (shared lock): multiple reads of the same data can proceed at the same time without affecting each other.
  - Write lock (exclusive lock): until the current write completes, it blocks other write locks and read locks.
- By granularity of the locked data:
  To maximize database concurrency, the smaller the range of data locked each time, the better; in theory, locking only the data the current operation touches yields the highest concurrency. But managing locks consumes resources (acquiring, checking, and releasing them), so a database system must balance high-concurrency response against system performance. This is where the concept of "lock granularity" comes from.
  One way to increase concurrency on shared resources is to make locking more selective: lock only the part of the data that needs to be modified rather than the whole resource, and ideally exactly the data that will be modified. At any moment, the less data locked on a given resource, the higher the system's concurrency, as long as the locks do not conflict with each other.
  - Table locks
  - Row locks
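The read/write lock rules and the effect of granularity can be modeled with a toy lock manager. This is a simplified sketch, not how MySQL manages locks (it ignores wait queues, intention locks, and lock upgrades); the resource names `user` and `user:1` are hypothetical stand-ins for a whole table and a single row.

```python
# Compatibility of a requested lock mode with a lock already held by another
# transaction: only shared + shared can coexist. Keys are (held, requested).
COMPATIBLE = {("S", "S"): True, ("S", "X"): False, ("X", "S"): False, ("X", "X"): False}


class LockTable:
    """Toy lock manager; a resource is any string (a table or a row)."""

    def __init__(self) -> None:
        self.holders: dict[str, list[tuple[str, str]]] = {}  # resource -> [(txn, mode)]

    def try_lock(self, txn: str, resource: str, mode: str) -> bool:
        held = self.holders.setdefault(resource, [])
        for other_txn, other_mode in held:
            if other_txn != txn and not COMPATIBLE[(other_mode, mode)]:
                return False  # a real engine would make txn wait here
        held.append((txn, mode))
        return True

    def release_all(self, txn: str) -> None:
        for held in self.holders.values():
            held[:] = [(t, m) for t, m in held if t != txn]


# Table granularity: any two writers of `user` conflict.
tables = LockTable()
print(tables.try_lock("T1", "user", "X"))  # True
print(tables.try_lock("T2", "user", "X"))  # False

# Row granularity: writers of different rows proceed in parallel.
rows = LockTable()
print(rows.try_lock("T1", "user:1", "X"))  # True
print(rows.try_lock("T2", "user:2", "X"))  # True
print(rows.try_lock("T3", "user:1", "S"))  # False: row 1 is exclusively locked
rows.release_all("T1")
print(rows.try_lock("T3", "user:1", "S"))  # True
print(rows.try_lock("T4", "user:1", "S"))  # True: shared locks coexist
```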
3.2 Table locks
Characteristics: favored by the MyISAM storage engine; low overhead and fast locking; no deadlocks; coarse locking granularity, so the highest probability of lock conflicts and the lowest concurrency.
Case 1 [adding a read lock]:
[session1]
lock table user read;
Session1 can query the locked table but no other table; inserting into or updating the locked table returns an error.
unlock tables;
[session2]
After session1 locks the table, session2 can query or update tables that are not locked and can also query the locked table, but inserts or updates on the locked table wait until the lock is released.
Case 2 [adding a write lock]:
[session1]
lock tables user write;
Session1 can query, update, and insert into the locked table.
unlock tables;
[session2]
After session1 locks the table, session2's queries, updates, and inserts on that table wait until the lock is released.
Conclusion:
- A read operation on a MyISAM table (read lock) does not block other processes' reads of the same table, but it does block their writes to it. Only after the read lock is released can other processes write.
- A write operation on a MyISAM table (write lock) blocks other processes' reads and writes on the same table. Only after the write lock is released can other processes read or write.
Check which tables are locked: show open tables; (the In_use column shows the number of table locks or lock requests on each table)
Analyze table locking: show status like 'table%';
- Table_locks_immediate: the number of table-level lock requests that were granted immediately; it increases by 1 each time a lock is acquired without waiting.
- Table_locks_waited: the number of table-level lock requests that could not be granted immediately and had to wait; it increases by 1 for each wait. A high value indicates serious table-level lock contention.
MyISAM's read/write lock scheduling gives writes priority, which is why MyISAM is a poor engine for tables that are mainly written: while a write lock is held, no other thread can do anything, and a large volume of updates makes it hard for queries to get a lock, so they may block indefinitely.
3.3 Row locks
Characteristics:
- Favored by the InnoDB storage engine; high overhead and slow locking; the finest locking granularity, so the lowest probability of lock conflicts and the highest concurrency.
InnoDB differs from MyISAM in two major ways: it supports transactions, and it uses row-level locks.
Case study [adding a row lock]
[session1]
set autocommit=0;
Update a row here; the updated row stays locked until the transaction commits.
commit;
[session2]
After session1 has locked the row but before it commits, an update of the same row in session2 waits until the lock is released.
Row locks escalating to table locks when no index is used
When an indexed column is used improperly, for example when it is compared with a value of the wrong type so that implicit type conversion prevents the index from being used, InnoDB ends up locking every row it scans, which effectively turns the row lock into a table lock.
Gap lock hazards
Gap lock: when we retrieve data with a range condition rather than an equality condition and request a shared or exclusive lock, InnoDB locks the index entries of existing records that satisfy the condition. Key values that fall within the range but have no record are called the "gap" (GAP), and InnoDB locks this gap as well. This mechanism is the gap lock; a record lock combined with the gap lock before it is called a next-key lock (Next-Key).
Hazard: once a key range is locked, even key values that do not exist are locked, so no data can be inserted anywhere in the locked range while the lock is held. In some scenarios this can seriously hurt performance.
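[Interview question] How do you lock a single row?
select * from user where id = 1 for update;
Run it inside a transaction (or with autocommit off); the lock is held until commit or rollback. The WHERE condition on an indexed column (here the table's assumed primary key id) limits the lock to the matching row; without a WHERE clause, as in select * from user for update;, every row would be locked.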
Conclusion:
The InnoDB storage engine implements row-level locking. Although its locking mechanism may cost more performance than table-level locking, its overall concurrency is far better than MyISAM's table-level locking. Under high concurrency, InnoDB's overall performance has a clear advantage over MyISAM's.
However, InnoDB's row-level locking also has a fragile side: used improperly, it can make InnoDB's overall performance no better than MyISAM's, or even worse.
Analyzing row locking:
Check the InnoDB_row_lock status variables to analyze row lock contention on the system.
Command: mysql> show status like 'innodb_row_lock%';
- Innodb_row_lock_current_waits: the number of row lock waits currently in progress;
- Innodb_row_lock_time: the total time spent waiting for row locks since the server started (in milliseconds);
- Innodb_row_lock_time_avg: the average time per wait;
- Innodb_row_lock_time_max: the longest single wait since the server started;
- Innodb_row_lock_waits: the total number of waits since the server started.
Optimization suggestions
- Retrieve data through an index whenever possible, so that row locks do not escalate to table locks.
- Design indexes carefully to narrow the lock range as much as possible.
- Keep range conditions as narrow as possible to avoid gap locks.
- Keep transactions small to reduce the amount of locked resources and the lock duration.
- Use the lowest transaction isolation level the business can accept.
3.4 Page locks
Overhead and locking time fall between table locks and row locks; deadlocks can occur; lock granularity is between table locks and row locks, and concurrency is moderate.
4. Practical MySQL problems
4.1 Duplicate data
Find emails that appear more than once:
select distinct p1.Email from person p1 where p1.Email in (select p2.Email from person p2 where p1.Id != p2.Id);
[Better] SELECT email FROM `person` GROUP BY email HAVING count(email) > 1;
[Extension] Deleting duplicate data
[Idea] Group by the duplicated column, find the smallest id in each group, and delete all other rows. A temporary (derived) table is required here, because in MySQL a single SQL statement cannot modify a table while also selecting from that same table in a subquery (error 1093).
DELETE FROM person WHERE id NOT IN (SELECT temp.id FROM (SELECT min(id) id FROM person GROUP BY email) AS temp);
Note: MySQL 5.7.5 and later enable ONLY_FULL_GROUP_BY by default, which rejects a GROUP BY query that selects columns that are neither grouped nor aggregated (for example SELECT id, email FROM person GROUP BY email). The inner query above selects only min(id), an aggregate, so it is accepted.
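The find-and-delete steps can be tried end to end with Python's sqlite3 module and a few made-up rows. SQLite accepts the same GROUP BY/HAVING query and the derived-table DELETE; the only change is that the derived table is aliased `keep` instead of `temp`, to avoid any clash with SQLite's `temp` schema name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (id INTEGER PRIMARY KEY, email TEXT)")
conn.executemany(
    "INSERT INTO person (id, email) VALUES (?, ?)",
    [(1, "a@x.com"), (2, "b@x.com"), (3, "a@x.com"), (4, "c@x.com"), (5, "b@x.com")],
)

# Find duplicates: group by email and keep groups with more than one row.
dupes = conn.execute(
    "SELECT email FROM person GROUP BY email HAVING count(email) > 1 ORDER BY email"
).fetchall()
print(dupes)  # [('a@x.com',), ('b@x.com',)]

# Delete duplicates, keeping the smallest id per email; the derived table is the
# form MySQL needs to avoid error 1093.
conn.execute(
    "DELETE FROM person WHERE id NOT IN ("
    " SELECT keep.id FROM (SELECT min(id) AS id FROM person GROUP BY email) AS keep)"
)
print(conn.execute("SELECT id, email FROM person ORDER BY id").fetchall())
# [(1, 'a@x.com'), (2, 'b@x.com'), (4, 'c@x.com')]
```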
4.2 Creating and viewing indexes
Create: create index idx_a_b on table_name(col_a, col_b);
View: show index from table_name;
4.3 The meaning of where 1=1 and where 1=0
where 1=1
When concatenating several optional conditions, every condition can be appended as "and ...", whether or not any condition precedes it, so the code never has to decide between writing where and writing and.
where 1=0
Returns no rows, only the result structure; it is used to quickly create an empty table with the same columns.
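Both idioms in a short Python sketch with sqlite3. `build_query` is a hypothetical helper that appends every filter with `AND` after `WHERE 1=1`; column names come from a whitelist and values go through placeholders, so the string building does not open an SQL-injection hole. The last statement copies only the structure of `user` with `WHERE 1=0`. Table and column names are illustrative.

```python
import sqlite3

ALLOWED_COLUMNS = {"name", "age"}  # hypothetical whitelist of filterable columns


def build_query(filters: dict[str, object]) -> tuple[str, list[object]]:
    """Build SELECT ... WHERE 1=1 AND ...; every condition is appended with AND."""
    sql = "SELECT id, name, age FROM user WHERE 1=1"
    params: list[object] = []
    for column, value in filters.items():
        if column not in ALLOWED_COLUMNS:
            raise ValueError(f"unexpected filter column: {column}")
        sql += f" AND {column} = ?"
        params.append(value)
    return sql, params


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE user (id INTEGER PRIMARY KEY, name TEXT, age INTEGER)")
conn.executemany("INSERT INTO user (name, age) VALUES (?, ?)", [("liu", 30), ("wang", 25)])

print(build_query({}))  # ('SELECT id, name, age FROM user WHERE 1=1', [])
sql, params = build_query({"age": 25})
print(conn.execute(sql, params).fetchall())  # [(2, 'wang', 25)]

# WHERE 1=0: copy only the column structure into a new, empty table.
conn.execute("CREATE TABLE user_copy AS SELECT * FROM user WHERE 1=0")
print(conn.execute("SELECT count(*) FROM user_copy").fetchone())  # (0,)
```

In MySQL, a table created this way gets the columns but not the indexes or primary key of the source table, so it is a quick structural copy rather than a full clone.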
Search on WeChat: Xiao Liu in the Full Stack