From 0 to 10 million users, how do I optimize MySQL database step by step?

glacier 2020-11-10 23:12:19
million users optimize mysql database

Write it at the front

A lot of friends left messages saying that let me write some real cases in the process of work , What to write about ? think it over and over again , Write an article about the evolution of database architecture upgrading from scratch to more than 10 million users in my previous company .

This article records my first visit to a start-up company , From zero to tens of millions of users , How to optimize step by step in case of system pressure surge MySQL Database , And the evolution of database architecture upgrade . The upgrade process is technically challenging , And I've learned a lot from it . Hope to be able to bring substantial help to the partners .

Business background

I've worked in a startup before , It's a mall business , Shopping mall business , On the surface, it seems that the business involved is simple , Include : user 、 goods 、 stock 、 Order 、 The shopping cart 、 payment 、 Logistics and other businesses . however , Subdivision down , It's more complicated . This often involves a lot of potential needs to improve the user experience . for example : Recommend products for users , This involves user behavior analysis and accurate recommendation of big data . In terms of specific technology , That must include : User behavior log embedding point 、 collection 、 Report , Real time statistical analysis of big data , User portrait , Big data technology such as product recommendation .

The company's business is growing rapidly , only 2 In less than a year and a half, users have accumulated from zero to tens of millions , Hundreds of millions of visits a day , peak QPS Tens of thousands of times per second . The pressure of data writing comes from the user placing an order , Payment and other operations , Especially during the promotion period of the double 11 National Congress , The system's write pressure will multiply . However , The pressure of reading business is much greater than that of writing , according to an uncompleted statistic , The number of requests for read service is that of write service 50 About times .

Next , Let's take a look at how the database is upgraded .

The initial technology selection

As a startup , The most important thing is agility , Fast product realization , External services , So we chose public cloud services , Ensure fast implementation and scalability , It saves the time of self built computer room . The whole backstage uses Java Language development , The database uses MySQL. The whole picture is as shown in the figure below .

Read / write separation

As the business grows , Rapid growth of visits , The above scheme can not meet the performance requirements very quickly . The response time of each request is longer and longer , Let's say the user H5 Products are constantly updated on the page , Response time from initial 500 Milliseconds increased to 2 More than seconds . Business peak , The system even went down . At this critical moment of life and death , By monitoring , We found high peak MySQL CPU Usage is close 80%, disk IO Close usage 90%,slow query( The slow query ) From a day 1 100 up to 1 Ten thousand , And it's getting worse every day . Database has become a bottleneck , We have to upgrade the architecture quickly .

When Web When application services have performance bottlenecks , Because the service itself is stateless , We can solve this problem by adding machines horizontally . The database can't be extended by simply adding machines , So we took MySQL Master slave synchronization and application server read-write separation scheme .

MySQL Support master-slave synchronization , Real time incremental copy of the data from the master database to the slave database , And a master database can connect multiple slave databases to synchronize . Use this feature , We make read and write judgments on each request on the application server , If you write a request , Put all the... In this request DB The operation is sent to the main database ; If it's a read request , Put all the... In this request DB The operation is sent to the slave library , As shown in the figure below .

After the separation of reading and writing , The pressure on the database has been reduced a lot ,CPU Utilization rate and sum IO Utilization has dropped to 5% within ,Slow Query( The slow query ) Also approaching 0. Master slave synchronization 、 The separation of reading and writing brings us the following two benefits :

  • Reduced the main library ( Write ) pressure : The business of mall mainly comes from reading operation , After separation of reading and writing , Read pressure transferred to slave , The pressure in the main reservoir has been reduced dozens of times .
  • Slave Library ( read ) Scalable horizontally ( Add from library machine ): Because the system pressure is mainly read request , And from the library can be expanded horizontally , When the pressure from the reservoir is too high , You can add slave machines directly , Relieve the pressure of reading request .

Of course , No plan is all powerful . Read / write separation , For the time being MySQL The pressure problem , It also brings new challenges . Business peak , The user submits the order , I can't see the order information submitted by myself in my order list ( Typical read after write problem ); There are some exceptions in the system that can not query data occasionally . By monitoring , We found that , Business peak MySQL There may be master-slave replication delays , In extreme cases , Master slave delay up to a few seconds . This has a huge impact on the user experience .

How to monitor the master-slave synchronization status ? On slave machine , perform show slave status, see Seconds_Behind_Master value , Represents the time when the master-slave synchronization slave database falls behind the master database , The unit is in seconds , If master-slave synchronization has no delay , The value of 0.MySQL One of the important reasons for master-slave delay is that master-slave replication is a single thread serial execution ( Higher version MySQL Support for parallel replication ).

How to avoid or solve the master-slave delay ? We did the following optimizations :

  • Optimize MySQL Parameters , For example, increase innodb_buffer_pool_size, Let more operations MySQL Done in memory , Reduce disk operations .
  • Use high performance CPU host .
  • The database uses the physical host , Avoid using virtual virtual virtual virtual virtual machine , promote IO performance .
  • Use SSD disk , promote IO performance .SSD The random IO The performance is about SATA The hard disk 10 Times or more .
  • Business code optimization , Some operations with high real-time requirements , Force the use of the master library for read operations .
  • Upgrade to a higher version MySQL, Support parallel master-slave replication .

Vertical sub database

The separation of reading and writing solves the problem of reading pressure , The pressure increases with each reading , It can be expanded horizontally by adding from the library . But the pressure of write operation has not been effectively relieved with the explosive growth of business , For example, it's getting slower and slower for users to submit orders . By monitoring MySQL database , We found that , Database writing is getting slower and slower , An ordinary insert operation , It may even be carried out 1 More than seconds .

On the other hand , Business is getting more and more complex , Multiple application systems use the same database , One of the small non core functions is delayed , Often affect other core business functions on the main database . At this time , Main library becomes performance bottleneck , We realize that , We need to upgrade the architecture again , Split the main database , On the one hand, to improve performance , On the other hand, reduce the interaction between systems , To improve system stability . This time, , We split the system vertically by business . As shown in the figure below , Split the original huge database into different business databases by business , Each system only accesses the database of corresponding business , Try to avoid or reduce cross library access .

Vertical sub warehouse process , We also have a lot of challenges , The biggest challenge is : Unable to cross Library join, At the same time, we need to refactor the existing code . Single database , Easy to use join Association table query ; After dismantling the library , Split database on different instances , Can't be used across Libraries join 了 .

for example , Query all orders of a certain merchant by merchant name , Before vertical branch , Sure join Merchant and order table query , You can also use subqueries directly , As shown below :

select * from tb_order where supplier_id in (select id from supplier where name=’ Business name ’);

After the sub database , Then refactor the code , First query the merchant by the merchant name id, Through merchants id Query order table , As shown below :

select id from supplier where name=’ Business name ’
select * from tb_order where supplier_id in (supplier_ids )

The experience and lessons in the process of vertical division , So that we made SQL Best practices , One of them is to disable or use less join, Instead, assemble the data in the program , Give Way SQL It's simpler . On the one hand, prepare for further vertical business split in the future , On the other hand, it avoids MySQL in join The problem of low performance .

After nearly ten days of overtime, the infrastructure has been adjusted , And business code refactoring , Finally completed the vertical split of the database . After splitting , Each application only accesses the corresponding database , On the one hand, the single point database is divided into several , Sharing the write pressure of the main database ; On the other hand , The split database is independent , Achieve business isolation , No more mutual influence .

Horizontal sub database

Read / write separation , By expanding horizontally from the library , Solved reading pressure ; Vertical sub library splitting main library by business , Write pressure cached , But the system still has the following hidden dangers :

  • More and more single table data . Such as the order form , The number of records in a single table soon exceeded 100 million , beyond MySQL The limits of , Affect read / write performance .
  • The writing pressure of core business library is increasing , No more vertical split , In this system architecture ,MySQL The main database does not have the ability of horizontal expansion .

here , We need to be right about MySQL Further horizontal split .

The first problem faced by horizontal sub base is , What is the logic for splitting . One solution is to split it by city , All data of a city is in one database ; Another option is to order ID Average split data . The advantage of splitting by city is high data aggregation , It's easy to aggregate queries , The implementation is relatively simple , The disadvantage is that the data is not evenly distributed , Some cities have a huge amount of data , Generate hot spots , And these hot spots may be forced to split up again in the future . By order ID Split is the opposite , The advantage is even data distribution , There will not be a case where the database data is very large or very small , The disadvantage is that the data is too scattered , Not conducive to aggregate query . such as , By order ID After break up , Orders from one merchant may be distributed in different databases , Query all orders of a merchant , Multiple databases may need to be queried . In this case , One solution is to make redundant tables of data that need to be aggregated and queried , Redundant tables are not split , At the same time, in the process of business development , Reduce aggregate queries .

After a lot of thinking , We finally decided to order ID Do horizontal sub database . Architecturally , Divide the system into three layers :

  • application layer : Various business application systems
  • Data access layer : Unified data access interface , Shielding the reading and writing sub Library of the upper application layer 、 table 、 Cache and other technical details .
  • The data layer : Yes DB Data fragmentation , And can be added dynamically shard Fragmentation .

The key point of horizontal database technology lies in the design of data access layer , The data access layer consists of three parts :

  • Distributed cache
  • database middleware
  • Data heterogeneous middleware

The database middleware needs to include the following important functions :

  • ID generator : Generate primary key for each table
  • Data source routing : Will each time DB Operations are routed to different fragmented data sources

ID generator

ID Generator is the core of the whole horizontal database , It determines how to split the data , And query storage - Retrieving data .ID Need to be globally unique across Libraries , Otherwise, it will cause conflicts at the business level . Besides ,ID Must be numeric and ascending , This is mainly about ascending order ID Can guarantee MySQL Performance of ( if UUID And random strings , In the case of high concurrency and large amount of data , Poor performance ). meanwhile ,ID Generator must be very stable , Because any failure will affect all database operations .

In our system ID The generator is designed as follows .

  • Whole ID The binary length of is 64 position
  • front 36 Bit use timestamp , In order to make sure ID It's an ascending order
  • middle 13 Bit is the sub library identifier , Used to identify the current ID Which database is the corresponding record in
  • after 15 Bits are self increasing sequences , To ensure concurrency in the same second ,ID No repetition . Each fragment library has an auto increment list , When generating an auto increasing sequence , Get the current auto increment sequence value from the auto increment order list , And add 1, As for the present ID After 15 position
  • The next second , after 15 The self increasing sequence of bits starts again from 1 Start .

Horizontal database is a challenging project , Our whole team is also growing rapidly in constantly meeting the challenges .

In order to adapt to the continuous development of the company's business , In addition to the MySQL Database on the corresponding architecture upgrade , We also built a complete set of big data real-time analysis and Statistics Platform , Real time analysis of user behavior in the system .

How to build a big data real-time analysis and Statistics Platform , Real time analysis of user behavior , We'll talk about it in detail later .

Okay , That's all for today , I'm glacier , See you next time !!

Heavy benefits

Search on wechat 【 Glacier Technology 】 WeChat official account , Focus on this deep programmer , Read hard core technology dry goods every day , Reply in official account 【PDF】 I have prepared the interview materials of the first-line large factories and my original superhard core PDF Technical documentation , And I carefully prepared for you a set of Resume Template ( Constantly updating ), I hope you can find the job you want , Learning is a matter of depression , The way to laugh now and then , come on. . If you succeed in getting into the company you want , Don't slack off , Workplace growth is the same as learning new technology , move forward , or you 'll fall behind . If we're lucky, we'll see you again !

in addition , I open source each PDF, I will continue to update and maintain , Thank you for your long-term support for the glacier !!

At the end

If you think it's pretty good , Please search and pay attention to wechat 「 Glacier Technology 」 WeChat official account , Learn from glacier 、 Distributed 、 Microservices 、 big data 、 Internet and cloud native technology ,「 Glacier Technology 」 The official account of WeChat has updated a lot of technical topics. , Every technical article is full of dry goods ! Many readers have already read 「 Glacier Technology 」 WeChat official account , Condole interviewer , Successful job hopping to big factory ; There are also many readers who have achieved technological leaps , Become the technical backbone of the company ! If you want to improve your abilities like them , Realize the leap of technical ability , Enter the large factory , A promotion and pay increase , Then pay attention 「 Glacier Technology 」 The official account of WeChat. , Daily update of superhard core technology dry goods , Let you no longer be confused about how to improve technical ability !


  1. 【计算机网络 12(1),尚学堂马士兵Java视频教程
  2. 【程序猿历程,史上最全的Java面试题集锦在这里
  3. 【程序猿历程(1),Javaweb视频教程百度云
  4. Notes on MySQL 45 lectures (1-7)
  5. [computer network 12 (1), Shang Xuetang Ma soldier java video tutorial
  6. The most complete collection of Java interview questions in history is here
  7. [process of program ape (1), JavaWeb video tutorial, baidu cloud
  8. Notes on MySQL 45 lectures (1-7)
  9. 精进 Spring Boot 03:Spring Boot 的配置文件和配置管理,以及用三种方式读取配置文件
  10. Refined spring boot 03: spring boot configuration files and configuration management, and reading configuration files in three ways
  11. 精进 Spring Boot 03:Spring Boot 的配置文件和配置管理,以及用三种方式读取配置文件
  12. Refined spring boot 03: spring boot configuration files and configuration management, and reading configuration files in three ways
  13. 【递归,Java传智播客笔记
  14. [recursion, Java intelligence podcast notes
  15. [adhere to painting for 386 days] the beginning of spring of 24 solar terms
  16. K8S系列第八篇(Service、EndPoints以及高可用kubeadm部署)
  17. K8s Series Part 8 (service, endpoints and high availability kubeadm deployment)
  18. 【重识 HTML (3),350道Java面试真题分享
  19. 【重识 HTML (2),Java并发编程必会的多线程你竟然还不会
  20. 【重识 HTML (1),二本Java小菜鸟4面字节跳动被秒成渣渣
  21. [re recognize HTML (3) and share 350 real Java interview questions
  22. [re recognize HTML (2). Multithreading is a must for Java Concurrent Programming. How dare you not
  23. [re recognize HTML (1), two Java rookies' 4-sided bytes beat and become slag in seconds
  24. 造轮子系列之RPC 1:如何从零开始开发RPC框架
  25. RPC 1: how to develop RPC framework from scratch
  26. 造轮子系列之RPC 1:如何从零开始开发RPC框架
  27. RPC 1: how to develop RPC framework from scratch
  28. 一次性捋清楚吧,对乱糟糟的,Spring事务扩展机制
  29. 一文彻底弄懂如何选择抽象类还是接口,连续四年百度Java岗必问面试题
  30. Redis常用命令
  31. 一双拖鞋引发的血案,狂神说Java系列笔记
  32. 一、mysql基础安装
  33. 一位程序员的独白:尽管我一生坎坷,Java框架面试基础
  34. Clear it all at once. For the messy, spring transaction extension mechanism
  35. A thorough understanding of how to choose abstract classes or interfaces, baidu Java post must ask interview questions for four consecutive years
  36. Redis common commands
  37. A pair of slippers triggered the murder, crazy God said java series notes
  38. 1、 MySQL basic installation
  39. Monologue of a programmer: despite my ups and downs in my life, Java framework is the foundation of interview
  40. 【大厂面试】三面三问Spring循环依赖,请一定要把这篇看完(建议收藏)
  41. 一线互联网企业中,springboot入门项目
  42. 一篇文带你入门SSM框架Spring开发,帮你快速拿Offer
  43. 【面试资料】Java全集、微服务、大数据、数据结构与算法、机器学习知识最全总结,283页pdf
  44. 【leetcode刷题】24.数组中重复的数字——Java版
  45. 【leetcode刷题】23.对称二叉树——Java版
  46. 【leetcode刷题】22.二叉树的中序遍历——Java版
  47. 【leetcode刷题】21.三数之和——Java版
  48. 【leetcode刷题】20.最长回文子串——Java版
  49. 【leetcode刷题】19.回文链表——Java版
  50. 【leetcode刷题】18.反转链表——Java版
  51. 【leetcode刷题】17.相交链表——Java&python版
  52. 【leetcode刷题】16.环形链表——Java版
  53. 【leetcode刷题】15.汉明距离——Java版
  54. 【leetcode刷题】14.找到所有数组中消失的数字——Java版
  55. 【leetcode刷题】13.比特位计数——Java版
  56. oracle控制用户权限命令
  57. 三年Java开发,继阿里,鲁班二期Java架构师
  58. Oracle必须要启动的服务
  59. 万字长文!深入剖析HashMap,Java基础笔试题大全带答案
  60. 一问Kafka就心慌?我却凭着这份,图灵学院vip课程百度云