Write it at the front
A lot of friends left messages saying that let me write some real cases in the process of work , What to write about ？ think it over and over again , Write an article about the evolution of database architecture upgrading from scratch to more than 10 million users in my previous company .
This article records my first visit to a start-up company , From zero to tens of millions of users , How to optimize step by step in case of system pressure surge MySQL Database , And the evolution of database architecture upgrade . The upgrade process is technically challenging , And I've learned a lot from it . Hope to be able to bring substantial help to the partners .
I've worked in a startup before , It's a mall business , Shopping mall business , On the surface, it seems that the business involved is simple , Include ： user 、 goods 、 stock 、 Order 、 The shopping cart 、 payment 、 Logistics and other businesses . however , Subdivision down , It's more complicated . This often involves a lot of potential needs to improve the user experience . for example ： Recommend products for users , This involves user behavior analysis and accurate recommendation of big data . In terms of specific technology , That must include ： User behavior log embedding point 、 collection 、 Report , Real time statistical analysis of big data , User portrait , Big data technology such as product recommendation .
The company's business is growing rapidly , only 2 In less than a year and a half, users have accumulated from zero to tens of millions , Hundreds of millions of visits a day , peak QPS Tens of thousands of times per second . The pressure of data writing comes from the user placing an order , Payment and other operations , Especially during the promotion period of the double 11 National Congress , The system's write pressure will multiply . However , The pressure of reading business is much greater than that of writing , according to an uncompleted statistic , The number of requests for read service is that of write service 50 About times .
Next , Let's take a look at how the database is upgraded .
The initial technology selection
As a startup , The most important thing is agility , Fast product realization , External services , So we chose public cloud services , Ensure fast implementation and scalability , It saves the time of self built computer room . The whole backstage uses Java Language development , The database uses MySQL. The whole picture is as shown in the figure below .
Read / write separation
As the business grows , Rapid growth of visits , The above scheme can not meet the performance requirements very quickly . The response time of each request is longer and longer , Let's say the user H5 Products are constantly updated on the page , Response time from initial 500 Milliseconds increased to 2 More than seconds . Business peak , The system even went down . At this critical moment of life and death , By monitoring , We found high peak MySQL CPU Usage is close 80%, disk IO Close usage 90%,slow query（ The slow query ） From a day 1 100 up to 1 Ten thousand , And it's getting worse every day . Database has become a bottleneck , We have to upgrade the architecture quickly .
When Web When application services have performance bottlenecks , Because the service itself is stateless , We can solve this problem by adding machines horizontally . The database can't be extended by simply adding machines , So we took MySQL Master slave synchronization and application server read-write separation scheme .
MySQL Support master-slave synchronization , Real time incremental copy of the data from the master database to the slave database , And a master database can connect multiple slave databases to synchronize . Use this feature , We make read and write judgments on each request on the application server , If you write a request , Put all the... In this request DB The operation is sent to the main database ; If it's a read request , Put all the... In this request DB The operation is sent to the slave library , As shown in the figure below .
After the separation of reading and writing , The pressure on the database has been reduced a lot ,CPU Utilization rate and sum IO Utilization has dropped to 5% within ,Slow Query（ The slow query ） Also approaching 0. Master slave synchronization 、 The separation of reading and writing brings us the following two benefits ：
Reduced the main library （ Write ） pressure ： The business of mall mainly comes from reading operation , After separation of reading and writing , Read pressure transferred to slave , The pressure in the main reservoir has been reduced dozens of times .
Slave Library （ read ） Scalable horizontally （ Add from library machine ）： Because the system pressure is mainly read request , And from the library can be expanded horizontally , When the pressure from the reservoir is too high , You can add slave machines directly , Relieve the pressure of reading request .
Of course , No plan is all powerful . Read / write separation , For the time being MySQL The pressure problem , It also brings new challenges . Business peak , The user submits the order , I can't see the order information submitted by myself in my order list （ Typical read after write problem ）; There are some exceptions in the system that can not query data occasionally . By monitoring , We found that , Business peak MySQL There may be master-slave replication delays , In extreme cases , Master slave delay up to a few seconds . This has a huge impact on the user experience .
How to monitor the master-slave synchronization status ？ On slave machine , perform show slave status, see Seconds_Behind_Master value , Represents the time when the master-slave synchronization slave database falls behind the master database , The unit is in seconds , If master-slave synchronization has no delay , The value of 0.MySQL One of the important reasons for master-slave delay is that master-slave replication is a single thread serial execution （ Higher version MySQL Support for parallel replication ）.
How to avoid or solve the master-slave delay ？ We did the following optimizations ：
- Optimize MySQL Parameters , For example, increase innodb_buffer_pool_size, Let more operations MySQL Done in memory , Reduce disk operations .
- Use high performance CPU host .
- The database uses the physical host , Avoid using virtual virtual virtual virtual virtual machine , promote IO performance .
- Use SSD disk , promote IO performance .SSD The random IO The performance is about SATA The hard disk 10 Times or more .
- Business code optimization , Some operations with high real-time requirements , Force the use of the master library for read operations .
- Upgrade to a higher version MySQL, Support parallel master-slave replication .
Vertical sub database
The separation of reading and writing solves the problem of reading pressure , The pressure increases with each reading , It can be expanded horizontally by adding from the library . But the pressure of write operation has not been effectively relieved with the explosive growth of business , For example, it's getting slower and slower for users to submit orders . By monitoring MySQL database , We found that , Database writing is getting slower and slower , An ordinary insert operation , It may even be carried out 1 More than seconds .
On the other hand , Business is getting more and more complex , Multiple application systems use the same database , One of the small non core functions is delayed , Often affect other core business functions on the main database . At this time , Main library becomes performance bottleneck , We realize that , We need to upgrade the architecture again , Split the main database , On the one hand, to improve performance , On the other hand, reduce the interaction between systems , To improve system stability . This time, , We split the system vertically by business . As shown in the figure below , Split the original huge database into different business databases by business , Each system only accesses the database of corresponding business , Try to avoid or reduce cross library access .
Vertical sub warehouse process , We also have a lot of challenges , The biggest challenge is ： Unable to cross Library join, At the same time, we need to refactor the existing code . Single database , Easy to use join Association table query ; After dismantling the library , Split database on different instances , Can't be used across Libraries join 了 .
for example , Query all orders of a certain merchant by merchant name , Before vertical branch , Sure join Merchant and order table query , You can also use subqueries directly , As shown below ：
select * from tb_order where supplier_id in (select id from supplier where name=’ Business name ’);
After the sub database , Then refactor the code , First query the merchant by the merchant name id, Through merchants id Query order table , As shown below ：
select id from supplier where name=’ Business name ’ select * from tb_order where supplier_id in (supplier_ids )
The experience and lessons in the process of vertical division , So that we made SQL Best practices , One of them is to disable or use less join, Instead, assemble the data in the program , Give Way SQL It's simpler . On the one hand, prepare for further vertical business split in the future , On the other hand, it avoids MySQL in join The problem of low performance .
After nearly ten days of overtime, the infrastructure has been adjusted , And business code refactoring , Finally completed the vertical split of the database . After splitting , Each application only accesses the corresponding database , On the one hand, the single point database is divided into several , Sharing the write pressure of the main database ; On the other hand , The split database is independent , Achieve business isolation , No more mutual influence .
Horizontal sub database
Read / write separation , By expanding horizontally from the library , Solved reading pressure ; Vertical sub library splitting main library by business , Write pressure cached , But the system still has the following hidden dangers ：
- More and more single table data . Such as the order form , The number of records in a single table soon exceeded 100 million , beyond MySQL The limits of , Affect read / write performance .
- The writing pressure of core business library is increasing , No more vertical split , In this system architecture ,MySQL The main database does not have the ability of horizontal expansion .
here , We need to be right about MySQL Further horizontal split .
The first problem faced by horizontal sub base is , What is the logic for splitting . One solution is to split it by city , All data of a city is in one database ; Another option is to order ID Average split data . The advantage of splitting by city is high data aggregation , It's easy to aggregate queries , The implementation is relatively simple , The disadvantage is that the data is not evenly distributed , Some cities have a huge amount of data , Generate hot spots , And these hot spots may be forced to split up again in the future . By order ID Split is the opposite , The advantage is even data distribution , There will not be a case where the database data is very large or very small , The disadvantage is that the data is too scattered , Not conducive to aggregate query . such as , By order ID After break up , Orders from one merchant may be distributed in different databases , Query all orders of a merchant , Multiple databases may need to be queried . In this case , One solution is to make redundant tables of data that need to be aggregated and queried , Redundant tables are not split , At the same time, in the process of business development , Reduce aggregate queries .
After a lot of thinking , We finally decided to order ID Do horizontal sub database . Architecturally , Divide the system into three layers ：
- application layer ： Various business application systems
- Data access layer ： Unified data access interface , Shielding the reading and writing sub Library of the upper application layer 、 table 、 Cache and other technical details .
- The data layer ： Yes DB Data fragmentation , And can be added dynamically shard Fragmentation .
The key point of horizontal database technology lies in the design of data access layer , The data access layer consists of three parts ：
- Distributed cache
- database middleware
- Data heterogeneous middleware
The database middleware needs to include the following important functions ：
- ID generator ： Generate primary key for each table
- Data source routing ： Will each time DB Operations are routed to different fragmented data sources
ID Generator is the core of the whole horizontal database , It determines how to split the data , And query storage - Retrieving data .ID Need to be globally unique across Libraries , Otherwise, it will cause conflicts at the business level . Besides ,ID Must be numeric and ascending , This is mainly about ascending order ID Can guarantee MySQL Performance of （ if UUID And random strings , In the case of high concurrency and large amount of data , Poor performance ）. meanwhile ,ID Generator must be very stable , Because any failure will affect all database operations .
In our system ID The generator is designed as follows .
- Whole ID The binary length of is 64 position
- front 36 Bit use timestamp , In order to make sure ID It's an ascending order
- middle 13 Bit is the sub library identifier , Used to identify the current ID Which database is the corresponding record in
- after 15 Bits are self increasing sequences , To ensure concurrency in the same second ,ID No repetition . Each fragment library has an auto increment list , When generating an auto increasing sequence , Get the current auto increment sequence value from the auto increment order list , And add 1, As for the present ID After 15 position
- The next second , after 15 The self increasing sequence of bits starts again from 1 Start .
Horizontal database is a challenging project , Our whole team is also growing rapidly in constantly meeting the challenges .
In order to adapt to the continuous development of the company's business , In addition to the MySQL Database on the corresponding architecture upgrade , We also built a complete set of big data real-time analysis and Statistics Platform , Real time analysis of user behavior in the system .
How to build a big data real-time analysis and Statistics Platform , Real time analysis of user behavior , We'll talk about it in detail later .
Okay , That's all for today , I'm glacier , See you next time ！！
Search on wechat 【 Glacier Technology 】 WeChat official account , Focus on this deep programmer , Read hard core technology dry goods every day , Reply in official account 【PDF】 I have prepared the interview materials of the first-line large factories and my original superhard core PDF Technical documentation , And I carefully prepared for you a set of Resume Template （ Constantly updating ）, I hope you can find the job you want , Learning is a matter of depression , The way to laugh now and then , come on. . If you succeed in getting into the company you want , Don't slack off , Workplace growth is the same as learning new technology , move forward , or you 'll fall behind . If we're lucky, we'll see you again ！
in addition , I open source each PDF, I will continue to update and maintain , Thank you for your long-term support for the glacier ！！