1. Database partitioning, table splitting, database splitting, and sharding
Hello everyone, I'm Xiao Liu. Long time no see, I've missed you all. Today I'll walk you through the basics of splitting databases and tables.
1.1 Bottlenecks of a single database
- The larger a single table grows, the more data its read/write locks affect, and the less efficient inserts become because more index data has to be rebuilt.
- The amount of data in a single database becomes too large (roughly 1 TB to 2 TB per database is the practical limit)
- A single database server comes under too much pressure
- Read and write speed hits a bottleneck (at a few hundred concurrent connections)
1.2 Partitioning
Database partitioning is a physical database design technique. Its aim is to reduce the total amount of data that specific SQL operations read and write, and so reduce response time.
Partitioning does not create new tables. Instead, the data of one table is distributed across different disks, file systems, or server storage, while logically it remains a single table. Spreading a table's data across different locations also improves retrieval efficiency and relieves the database's I/O pressure. The advantages of partitioning are as follows:
- Compared with a single file system or disk, partitions can store more data.
- Data management is easier: to clean up or discard the data for a certain year, you can drop the partition for that period directly.
- A query can be directed to exactly the partitions that hold its data, so no full table scan is needed, which greatly improves retrieval efficiency.
- Queries can run across partitions on multiple disks, which improves query throughput.
- For aggregate functions, the per-partition results are easy to merge.
1.2.1 When should you consider partitioning?
- Queries on a table are slow enough to affect usability.
- The SQL has already been optimized.
- The amount of data is large.
- The data in the table falls into natural segments.
- Operations usually touch only part of the data, not all of it.
1.2.2 Horizontal partitioning
This form of partitioning splits the rows of a table, so that physically separate groups of rows can be addressed individually or together. Every column defined in the table is present in each partition, so the table's structure is preserved.
Example: a table containing ten years of invoice records can be split into 10 partitions, each holding one year's records.
1.2.3 Vertical partitioning
This method reduces the width of the target table by splitting it vertically, so that certain columns go to specific partitions. Each partition contains all the rows for the columns it holds.
Example: a table contains large BLOB columns that are rarely accessed. These rarely used BLOB columns can be moved into another partition, which keeps the data related while improving access speed.
1.2.4 How partitioning is implemented
MySQL has supported partitioning since version 5.1.
Create the table:
create table sales (
    id int auto_increment,
    amount double not null,
    order_day datetime not null,
    primary key (id, order_day)  -- the partitioning column must be part of the primary key
) engine=InnoDB
Define the partitions (this clause continues the create table statement above):
partition by range (year(order_day)) (
    partition p_2010 values less than (2010),
    partition p_2011 values less than (2011),
    partition p_2012 values less than (2012),
    partition p_catchall values less than maxvalue  -- everything from 2012 onwards
);
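To make the range rule concrete, here is a minimal Python sketch of how a row would be assigned to a partition. It assumes yearly upper bounds of 2010, 2011 and 2012 followed by a catch-all partition; the names `PARTITIONS` and `partition_for` are made up for illustration and are not part of MySQL.

```python
from datetime import datetime

# (partition name, exclusive upper bound on year(order_day)); None stands for maxvalue.
# These bounds are assumptions chosen for the illustration.
PARTITIONS = [
    ("p_2010", 2010),
    ("p_2011", 2011),
    ("p_2012", 2012),
    ("p_catchall", None),
]


def partition_for(order_day: datetime) -> str:
    """Return the first partition whose upper bound is greater than the year."""
    for name, upper in PARTITIONS:
        if upper is None or order_day.year < upper:
            return name
    raise ValueError("no partition accepts this value")


print(partition_for(datetime(2010, 6, 1)))
```

A query that filters on order_day only needs to touch the partitions this rule selects, which is why a full table scan can be avoided.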
1.3 Table splitting
1.3.1 When should you consider splitting a table?
- Queries on a table are slow enough to affect usability
- The amount of data is large
- Frequent inserts or join queries have become slow
1.3.2 Problems solved by table splitting
After a table is split, the concurrency each table can sustain improves, disk I/O performance improves, and write operations become more efficient.
- Each query takes less time
- The data is spread across different files and disks
- Read/write locks affect a smaller amount of data
- Each insert has less index data to rebuild
1.3.3 How table splitting is implemented
The business system has to cooperate in the migration and upgrade, which is a heavy workload.
Commonly used rule strategies for partitioning and table splitting:
- Range: split according to time
- Hash: hash the key, then take it modulo the number of sub-tables
- Mapping: save the database configuration in the authentication database, that is, keep a mapping to the target DB
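The range and hash strategies above can be sketched in a few lines of Python. This is only an illustration under assumptions: the base table name `orders`, the count of 4 sub-tables, the monthly range, and the helper names are all hypothetical.

```python
import zlib
from datetime import date

TABLE_COUNT = 4  # hypothetical number of sub-tables


def table_by_hash(user_id: str, base: str = "orders") -> str:
    """Hash the key, then take it modulo the number of sub-tables."""
    # crc32 is stable across runs, unlike Python's built-in hash() for strings
    slot = zlib.crc32(user_id.encode("utf-8")) % TABLE_COUNT
    return f"{base}_{slot}"


def table_by_range(day: date, base: str = "orders") -> str:
    """Split by time: one sub-table per calendar month."""
    return f"{base}_{day:%Y%m}"


print(table_by_range(date(2021, 3, 15)))
print(table_by_hash("user-42"))
```

Hashing spreads rows evenly but makes range scans touch every sub-table, while a time range keeps recent data together at the cost of uneven load; which trade-off is acceptable depends on the queries.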
1.4 Database splitting
1.4.1 When should you consider splitting the database?
- A single DB does not have enough storage space
- As the number of queries grows, a single database server can no longer cope
1.4.2 Problems solved by database splitting
Its main purpose is to break through the I/O capacity limit of a single-node database server and to solve the problem of database scalability.
1.4.3 How database splitting is implemented
Vertical split
Tables that have no relationship with each other and never need a join can be placed in different databases on different servers. This splits the system vertically by business. For example, it can be divided by business into three databases: funds, members, and orders.
Problems to solve: cross-database transactions, join queries across databases, and so on.
Horizontal split
For example, on most sites the data is all tied to users, so the data can be split horizontally by user.
The split follows a rule, and a horizontal split generally comes after the vertical split. For example, if the number of orders processed every day is huge, the orders can be split horizontally according to a rule.
Problems to solve: data routing and result assembly.
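Data routing and assembly can be illustrated with a small in-memory sketch. It assumes two databases and a simple modulo rule on a numeric user ID; the dictionaries stand in for real database connections, and every name here is hypothetical.

```python
DB_COUNT = 2  # hypothetical number of databases

# In-memory stand-ins for the physical databases: one dict per database,
# mapping user_id -> list of that user's orders
databases = [dict() for _ in range(DB_COUNT)]


def route(user_id: int) -> int:
    """Data routing: pick the database that owns this user."""
    return user_id % DB_COUNT


def save_order(user_id: int, order: str) -> None:
    databases[route(user_id)].setdefault(user_id, []).append(order)


def orders_of(user_id: int) -> list:
    """A single-user query touches exactly one database."""
    return databases[route(user_id)].get(user_id, [])


def all_orders() -> list:
    """Assembly: a query without the routing key must ask every database and merge."""
    merged = []
    for db in databases:
        for orders in db.values():
            merged.extend(orders)
    return sorted(merged)
```

Queries that carry the routing key stay cheap, while queries without it fan out to every database, so the routing key should match the most common access pattern.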
Read/write separation
For data that does not have to be fresh, database pressure can be relieved by separating reads from writes.
Problems to solve: working out which business operations can tolerate some delay, and data synchronization.
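A read/write router might look roughly like the following sketch. It assumes one primary and a list of replicas identified by plain strings, round-robin selection, and a caller-supplied flag for reads that cannot tolerate replication delay; the class name `ReadWriteRouter` and the flag `needs_fresh_data` are invented for this example.

```python
import itertools


class ReadWriteRouter:
    """Send writes to the primary and delay-tolerant reads to replicas."""

    def __init__(self, primary, replicas):
        self.primary = primary
        # Round-robin over replicas; None when there are no replicas
        self._replicas = itertools.cycle(replicas) if replicas else None

    def target_for(self, sql: str, needs_fresh_data: bool = False) -> str:
        is_read = sql.lstrip().lower().startswith("select")
        if is_read and not needs_fresh_data and self._replicas is not None:
            return next(self._replicas)
        # Writes, and reads that cannot tolerate replication delay, go to the primary
        return self.primary


router = ReadWriteRouter("primary", ["replica1", "replica2"])
print(router.target_for("select * from orders"))
```

The `needs_fresh_data` flag is where the business decision from the text shows up: each call site has to state whether it can live with slightly stale data.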
1.5 Comparison of partitioning, table splitting, and database splitting
Partitioning divides the data of one table into N blocks. Logically it is still a single table, but underneath it is made up of N physical blocks. Table splitting decomposes one table into N physical tables, each with its own storage space. When reading and writing, the system has to derive the name of the corresponding sub-table from the defined rule and then operate on it. Once tables are split, the number of tables in one database keeps growing, which is where database splitting comes in.
Priority: vertical database split -> horizontal database split -> read/write separation
1.6 New problems after splitting
- Transaction support: after databases and tables are split, transactions become distributed transactions
- join queries that span databases and tables
- Database splitting, table splitting, and read/write separation all rely on distribution. Guaranteeing strong consistency in a distributed system inevitably introduces delays, which reduces performance and makes the system less responsive.
There are no strict boundaries between the approaches. Each has its own characteristics and emphasis, so choose according to the situation and to what each approach does well. One option is third-party database middleware (DRDS); at the same time the business system has to be upgraded in step with the data storage.
Summary: try partitioning first. When partitioning no longer meets the requirements, start to consider table splitting; a well-designed table split improves efficiency more than partitioning does.
1.7 Case study: JD.com (Jingdong) product reviews
- Number of product reviews: in the billions
- Daily service calls: in the billions
- The volume multiplies every year
Overall data storage: basic data storage plus text storage
Basic data storage
MySQL: stores only the non-text basic information, including basic data such as review status, user, and time, as well as additional information such as pictures, tags, and likes. Data organization (a different splitting scheme can be chosen for each kind of data):
- Basic review data is split across databases and tables by user ID
- Pictures and tags are kept in the same database, with tables split by product ID
- Other extended information has a small data volume and few accesses, so it can stay in a single database without table splitting
Text storage
The text (the content of the reviews) is kept in NoSQL storage:
- This reduces MySQL's storage pressure and frees up MySQL, while the huge volume of stored text still has a reliable guarantee.
- NoSQL's high read/write performance greatly improves system throughput and reduces latency.
1.8 Data sharding
In a distributed storage system, data needs to be spread across multiple devices. Data sharding (Sharding) is the technique used to determine how data is distributed across multiple storage devices. It has three goals:
- Even distribution: the amount of data on each device should be as close as possible
- Load balancing: the number of requests on each device should be as close as possible
- As little data migration as possible when scaling out or in
Data sharding methods
- Range segments
- Key list
- Consistent hashing: the consistent hashing algorithm (Consistent Hashing) was proposed at MIT in 1997 as an implementation of a distributed hash table (DHT), with the goal of solving the Internet hot spot (Hot Spot) problem. The algorithm is simple and clever, spreads data evenly with little effort, and its monotonicity ensures that little data has to migrate when nodes are added or removed.
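The monotonicity property is easier to see in code. Below is a minimal sketch of a consistent hash ring, not a production implementation: it assumes MD5 as the hash function and 100 points per node on the ring, and the names `HashRing`, `add`, `remove` and `node_for` are chosen for this example only.

```python
import bisect
import hashlib


def _hash(key: str) -> int:
    # MD5 is used only as a stable, well-spread hash here, not for security
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)


class HashRing:
    def __init__(self, nodes, points_per_node: int = 100):
        self.points_per_node = points_per_node
        self._points = []  # sorted hash positions on the ring
        self._owner = {}   # hash position -> node name
        for node in nodes:
            self.add(node)

    def add(self, node: str) -> None:
        for i in range(self.points_per_node):
            point = _hash(f"{node}#{i}")
            self._owner[point] = node
            bisect.insort(self._points, point)

    def remove(self, node: str) -> None:
        for i in range(self.points_per_node):
            point = _hash(f"{node}#{i}")
            del self._owner[point]
            self._points.remove(point)

    def node_for(self, key: str) -> str:
        if not self._points:
            raise LookupError("the ring has no nodes")
        # First point clockwise from the key's hash, wrapping around at the end
        idx = bisect.bisect(self._points, _hash(key)) % len(self._points)
        return self._owner[self._points[idx]]
```

When a node is added, the only keys that change owner are those that now fall to the new node; every other key stays where it was, which is what keeps migration small during scaling.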
To make the system more scalable, the storage layer introduces the concept of a VServer (virtual server). A VServer is a logical storage server and the unit of storage in the distributed storage system. Multiple VServers can be deployed on one physical device, and each VServer supports one write process and multiple read processes.
The VServer approach has the following benefits:
- Better single-machine performance. To avoid introducing a complex locking mechanism, the design uses a single write process. If a machine has only one write process, write concurrency is limited. VServers divide the storage resources of one machine (memory, disk) into multiple storage units, so multiple write processes can work at the same time, which greatly improves the machine's write concurrency.
- Better deployment scalability. VServers are very flexible to deploy: the number of VServers can be chosen according to a machine's resources, with a different number configured for each machine model. That way each model makes full use of its resources, and the load stays balanced across machines even when one system mixes several models.
2. Transaction ACID properties and isolation levels
- Atomicity (Atomic): the operations in a transaction are either all done or not done at all; the failure of any operation causes the whole transaction to fail
- Consistency (Consistent): the system is in a consistent state after the transaction ends
- Isolation (Isolated): concurrent transactions cannot see each other's intermediate state
- Durability (Durable): changes made by a completed transaction are persisted
Problems caused by concurrent database transactions
- Dirty read: transaction A reads data that transaction B has not yet committed
- Non-repeatable read: transaction A runs a query and gets the record row1; after transaction B commits a modification, A's second query returns row1 again, but its column contents have changed. The focus is on existing rows being modified.
- Phantom read: transaction A's first query returns one record, row1; after transaction B commits a change, A's second query returns two records, row1 and row2. The focus is on the number of rows changing.
MySQL provides four isolation levels
- Serializable (Serializable): while transaction A reads the same rows of a table repeatedly, no other transaction is allowed to perform CRUD operations on that table
- Repeatable read (Repeatable read): transaction A reads the same value every time; other transactions are prevented from changing the fields it has read
- Read committed (Read committed): transaction A can read only committed data
- Read uncommitted (Read uncommitted): transaction A can read uncommitted data
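Isolation level  | Dirty read | Non-repeatable read | Phantom read
Serializable     | √          | √                   | √
Repeatable read  | √          | √                   | ×
Read committed   | √          | ×                   | ×
Read uncommitted | ×          | ×                   | ×
(√ = the problem is prevented at this level, × = the problem can occur)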
Oracle provides three isolation levels
Read committed, serializable, and read-only mode. A read-only transaction can see only data that was committed before the transaction began, and cannot execute data-modifying statements such as delete inside the transaction.
3. MySQL locking mechanism
3.1 Classification of locks
- By type of data operation (read/write)
- Read lock (shared lock): multiple read operations on the same data can proceed at the same time without affecting each other.
- Write lock (exclusive lock): until the current write operation completes, it blocks other write locks and read locks.
- By granularity of data operation
To maximize database concurrency, the smaller the range of data locked each time the better. In theory, locking only the data of the current operation gives the highest concurrency. But managing locks consumes resources (acquiring, checking, and releasing locks, and so on), so the database system has to balance high-concurrency responsiveness against system performance. This is where the concept of lock granularity (Lock granularity) comes from.
One way to improve concurrency on shared resources is to make locking more selective: lock only the part of the data that needs to be modified rather than the whole resource. Better still, lock precisely the slice of data that will be modified. At any time, on a given resource, the less data is locked, the higher the system's concurrency, as long as the locks do not conflict with one another.
- Table locks
- Row locks
3.2 Table locks
Characteristics: mainly used by the MyISAM storage engine. Low overhead and fast locking; no deadlocks; large lock granularity, the highest probability of lock conflicts, and the lowest concurrency.
Case 1 [adding a read lock]:
[session1]
lock table user read;
-- session1 can now query only the locked table; it cannot query other tables, and inserting into or updating the locked table reports an error
unlock tables;
[session2]
-- After session1 locks the table, session2 can query or update tables that are not locked and can query the locked table; inserting into or updating the locked table waits until the lock is released.
Case 2 [adding a write lock]:
[session1]
lock tables user write;
-- session1 can query, update, and insert into the locked table
unlock tables;
[session2]
-- After session1 locks the table, queries, updates, and inserts on it wait until the lock is released.
A read operation on a MyISAM table (which takes a read lock) does not block other processes' read requests on the same table, but it does block write requests on it. Writes from other processes run only after the read lock is released.
A write operation on a MyISAM table (which takes a write lock) blocks other processes' reads and writes on the same table. Their reads and writes run only after the write lock is released.
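The two rules above amount to a small compatibility check. The sketch below is a simplified model of that rule, not MySQL's actual lock manager; the function name `can_grant` and the string labels are made up for illustration.

```python
def can_grant(requested: str, held: list) -> bool:
    """Decide whether a table lock can be granted right away.

    `requested` and the entries of `held` are either "read" or "write".
    """
    if not held:
        return True
    # Read locks are shared with other read locks; a write lock is exclusive,
    # so it conflicts with every lock already held on the table
    return requested == "read" and all(lock == "read" for lock in held)


print(can_grant("read", ["read"]))
print(can_grant("write", ["read"]))
```

A request that cannot be granted has to wait, which is what session2 experiences in the two cases above.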
See which tables are locked ：
show open tables;
Analyze table locking:
show status like 'table%';
Table_locks_immediate: the number of times a table-level lock was granted immediately; the value increases by 1 each time a lock is obtained without waiting;
Table_locks_waited: the number of times a table-level lock could not be granted immediately and had to wait (the value increases by 1 for each wait); a high value indicates serious table-level lock contention;
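One way to read the two counters together is as a wait ratio. The helper below is only a suggestion, not an official MySQL metric: the function name `table_lock_wait_ratio` is invented, and it assumes the status values have been fetched into a dictionary of strings, as a client library would typically return them.

```python
def table_lock_wait_ratio(status: dict) -> float:
    """Share of table lock requests that had to wait (0.0 when there were none)."""
    immediate = int(status["Table_locks_immediate"])
    waited = int(status["Table_locks_waited"])
    total = immediate + waited
    return waited / total if total else 0.0


# Sample values invented for the illustration
sample = {"Table_locks_immediate": "90", "Table_locks_waited": "10"}
print(table_lock_wait_ratio(sample))
```

A ratio that keeps rising over time points to the table-level contention described above.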
MyISAM's read/write lock scheduling gives priority to writes, which is also why MyISAM is not suitable as the engine for write-heavy tables. Once a write lock is held, no other thread can do anything, so a large number of updates makes it hard for queries to get a lock and can block them for a long time.
3.3 Row lock
Characteristics: mainly used by the InnoDB storage engine. High overhead and slow locking; the smallest lock granularity, the lowest probability of lock conflicts, and the highest concurrency.
InnoDB differs from MyISAM in two major ways: it supports transactions, and it uses row-level locks.
Case [adding a row lock]
[session1]
set autocommit=0;
-- update a row here; the row stays locked until commit
commit;
[session2]
-- While session1 holds the lock and has not committed, an update to the same row here waits until the lock is released.
Without an index, a row lock is escalated to a table lock
When an indexed column is used improperly, for example when it is compared with a value of the wrong type, the index cannot be used and the row lock effectively becomes a table lock.
The hazard of gap locks
Gap lock: when we retrieve data with a range condition rather than an equality condition and request a shared or exclusive lock, InnoDB locks the index entries of the existing records that match the condition. Key values that fall within the range but do not exist are called a "gap". InnoDB locks this gap as well, and this locking mechanism is called a gap lock.
Hazard: when a range of keys is locked, even key values that do not exist are locked, so no data can be inserted within the locked key range while the lock is held. In some scenarios this can seriously hurt performance.
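The effect on inserts can be shown with a toy model. This is a deliberately simplified sketch and not how InnoDB implements gap locks (real gap locks are defined between index records and can extend beyond the range in the query); the class `GapLockedIndex` and its methods are hypothetical.

```python
class GapLockedIndex:
    """Toy index where a range lock also covers keys that do not exist yet."""

    def __init__(self, keys):
        self.keys = set(keys)
        self.locked_ranges = []  # list of (low, high), both inclusive

    def lock_range(self, low: int, high: int) -> None:
        """Model of a locking read with a range condition, e.g. id between low and high."""
        self.locked_ranges.append((low, high))

    def unlock_all(self) -> None:
        """Model of the locking transaction committing."""
        self.locked_ranges.clear()

    def insert(self, key: int) -> bool:
        """Return False when the insert would have to wait for a gap lock."""
        for low, high in self.locked_ranges:
            if low <= key <= high:
                return False
        self.keys.add(key)
        return True


index = GapLockedIndex([1, 3, 5])
index.lock_range(1, 5)
print(index.insert(2))  # key 2 does not exist, but it lies in the locked gap
```

Key 2 was never in the table, yet the insert is refused until the range lock is released, which is exactly the cost the text warns about.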
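[Interview question] How do you lock a single row?
select * from user where id = 1 for update;  -- id = 1 is an example condition; run inside a transaction, the lock is held until commit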
The InnoDB storage engine implements row-level locking. Although its locking mechanism may cost more than table-level locking, its overall concurrent processing ability is far better than MyISAM's table-level locking. When system concurrency is high, InnoDB's overall performance has a clear advantage over MyISAM.
InnoDB's row-level locking also has a fragile side: when it is used improperly, InnoDB's overall performance may be no better than MyISAM's, or even worse.
Analyze row locking:
The Innodb_row_lock status variables are used to analyze row lock contention on the system
mysql> show status like 'innodb_row_lock%';
Innodb_row_lock_current_waits: the number of lock waits currently in progress;
Innodb_row_lock_time: the total time spent waiting for row locks since system startup;
Innodb_row_lock_time_avg: the average time spent per wait;
Innodb_row_lock_time_max: the longest single wait since system startup;
Innodb_row_lock_waits: the total number of waits since system startup;
- As far as possible, retrieve data through indexes, to avoid unindexed row locks being escalated to table locks.
- Design indexes sensibly to keep the range of locks as small as possible
- Keep range search conditions as narrow as possible to avoid gap locks
- Keep transactions small to reduce the amount of locked resources and how long they are held
- Use the lowest transaction isolation level that is acceptable
3.4 Page locks
Overhead and locking time fall between those of table locks and row locks; deadlocks can occur; lock granularity falls between table locks and row locks, and concurrency is moderate.
4. Practical MySQL problems
4.1 Duplicate data problem
select p1.Email from person p1 where p1.Email in (select p2.Email from person p2 where p1.Id != p2.Id);
[Better]
select email from `person` group by email having count(email) > 1;
[Extension] Delete duplicate data
[Idea] Group by the duplicated column, find the smallest id in each group, and delete every row with any other id. A temporary (derived) table has to be created here, because in MySQL a single SQL statement cannot query a table and modify the same table at the same time.
delete from person where id not in (select temp.id from (select min(id) id from person group by email) as temp);
Note: MySQL 5.7 and later reject, by default, a query that selects columns other than the group by columns and aggregate functions, so keep the inner query limited to min(id) as above.
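The same two statements can be tried out without a MySQL server. The sketch below uses Python's built-in sqlite3 module as a stand-in, so it shows the logic rather than MySQL's exact behaviour; the sample rows are invented, and the derived-table alias is renamed to `keepers` to avoid any clash with SQLite's own temp schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table person (id integer primary key, email text)")
conn.executemany(
    "insert into person (id, email) values (?, ?)",
    [(1, "a@example.com"), (2, "b@example.com"), (3, "a@example.com"), (4, "a@example.com")],
)

# Find every email that appears more than once
duplicates = [
    row[0]
    for row in conn.execute(
        "select email from person group by email having count(email) > 1"
    )
]

# Keep the smallest id per email and delete the rest
conn.execute(
    """
    delete from person where id not in (
        select keepers.id
        from (select min(id) as id from person group by email) as keepers
    )
    """
)
remaining = conn.execute("select id, email from person order by id").fetchall()

print(duplicates)
print(remaining)
```

After the delete, each email is left with only its lowest-numbered row.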
4.2 Creating and viewing indexes
create index idx_a_b on table_name(col_a, col_b);
show index from table_name;
4.3 The meaning of where 1=1 and where 1=0
where 1=1 is used when splicing together multiple optional conditions. Whether or not any condition is present, each one can simply be appended with and, so the code never has to decide whether to write where or and first.
where 1=0 returns no data, only the structure, so it is used to create a table quickly.
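Both tricks can be sketched in Python. The example again uses the built-in sqlite3 module as a stand-in for MySQL; the `person` table, the helper name `build_query` and the sample data are assumptions made for the illustration.

```python
import sqlite3


def build_query(filters: dict):
    """Append one `and` clause per filter that is present, starting from where 1=1."""
    sql = "select * from person where 1=1"
    params = []
    for column, value in filters.items():
        if value is not None:
            # Column names must come from trusted code, never from user input;
            # values are passed as bound parameters
            sql += f" and {column} = ?"
            params.append(value)
    return sql, params


conn = sqlite3.connect(":memory:")
conn.execute("create table person (id integer primary key, email text)")
conn.execute("insert into person (id, email) values (1, 'a@example.com')")

# where 1=0 copies the column layout but none of the rows
conn.execute("create table person_copy as select * from person where 1=0")
copy_columns = [row[1] for row in conn.execute("pragma table_info(person_copy)")]
copy_rows = conn.execute("select count(*) from person_copy").fetchone()[0]

print(build_query({"email": "a@example.com", "id": None}))
print(copy_columns, copy_rows)
```

Because the base statement is always valid on its own, the loop never needs a special case for the first condition.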