MySQL deep dive: analyzing performance schema memory management

Ali Technology 2021-10-14 07:57:22


One   Introduction

MySQL Performance Schema (PFS) is a powerful performance monitoring and diagnostic tool provided by MySQL. It offers a way to inspect the internal execution of the server at run time. PFS collects information by monitoring events registered inside the server. In principle, an event can be any internal execution behavior or resource usage of the server, such as a function call, a wait on a system call, a parsing or sorting stage of an SQL query, or memory usage.

PFS stores the collected performance data in the performance_schema storage engine, which is an in-memory table engine: all collected diagnostic information is kept in memory. Collecting and storing this information adds overhead, so to keep the impact on the workload as small as possible, the performance and memory management of PFS itself matter a great deal.

This article reads the source code of PFS memory management to explain how PFS allocates and releases memory, analyzes some of its problems in depth, and proposes some ideas for improvement. The source code analysis is based on MySQL 8.0.24.

Two   Memory management model

PFS memory management has several key characteristics:

  • Memory is allocated in units of pages, and one page can hold multiple records

  • Some pages are pre-allocated at system startup and more are added on demand at run time, but pages are only ever added, never reclaimed

  • Records are allocated and released without locks

1  Core data structure

PFS_buffer_scalable_container is the core data structure of PFS memory management. Its overall structure is as follows:


A container holds multiple pages, and each page holds a fixed number of records. Each record corresponds to one event object, such as a PFS_thread. The number of records per page is fixed, but the number of pages grows as the load increases.
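The container, page and record layout can be pictured with a small sketch. It is not the MySQL source: `Record`, `Page` and `Container` are hypothetical names, and the two template parameters only stand in for the page size and page count of the real container.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <memory>

// Hypothetical stand-in for an instrumented object such as PFS_thread.
struct Record {
  bool in_use = false;
};

// A page holds a fixed number of records.
template <std::size_t PageSize>
struct Page {
  std::array<Record, PageSize> records{};
};

// A container holds up to PageCount pages. Pages are created on demand
// and, as in PFS, are never handed back.
template <std::size_t PageSize, std::size_t PageCount>
class Container {
 public:
  // Adds one page; returns false once the page limit has been reached.
  bool add_page() {
    if (page_count_ == PageCount) return false;
    pages_[page_count_++] = std::make_unique<Page<PageSize>>();
    return true;
  }

  std::size_t page_count() const { return page_count_; }

  // Number of records that exist right now.
  std::size_t capacity() const { return page_count_ * PageSize; }

  // Number of records that can exist once every page has been created.
  static constexpr std::size_t max_capacity() { return PageSize * PageCount; }

 private:
  std::array<std::unique_ptr<Page<PageSize>>, PageCount> pages_{};
  std::size_t page_count_ = 0;
};
```

With 256 records per page and at most 256 pages, `Container<256, 256>::max_capacity()` is 65536, and each call to `add_page()` raises `capacity()` by 256.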

2  Page selection strategy during allocation


The key data structures related to memory allocation are as follows:

PFS_PAGE_SIZE   // size of each page; 256 by default for global_thread_container
PFS_PAGE_COUNT  // maximum number of pages; 256 by default for global_thread_container

class PFS_buffer_scalable_container {
  // Monotonically increasing atomic counter, used to pick a page without locking
  PFS_cacheline_atomic_size_t m_monotonic;
  // Index of the highest page allocated so far
  PFS_cacheline_atomic_size_t m_max_page_index;
  // Maximum number of pages; no new page is allocated beyond this
  size_t m_max_page_count;
  // Page array
  std::atomic<array_type *> m_pages[PFS_PAGE_COUNT];
  // Lock taken when a new page is created
  native_mutex_t m_critical_section;
};

m_pages is an array. Each page may still have free records, or may be entirely busy. MySQL uses a fairly simple strategy: it polls the pages one by one, looking for a free record, until an allocation succeeds. If all pages have been polled without success, a new page is created to expand the container, until the maximum number of pages is reached.

Polling does not always start from the first page. It starts from the position recorded in the atomic variable m_monotonic, which is incremented by 1 each time an allocation in a page fails.

The core simplified code is as follows:

value_type *allocate(pfs_dirty_state *dirty_state) {
  current_page_count = m_max_page_index.m_size_t.load();

  monotonic = m_monotonic.m_size_t.load();
  monotonic_max = monotonic + current_page_count;
  while (monotonic < monotonic_max) {
    index = monotonic % current_page_count;
    array = m_pages[index].load();
    pfs = array->allocate(dirty_state);
    if (pfs) {
      // Allocation succeeded
      return pfs;
    } else {
      // Allocation failed, try the next page.
      // m_monotonic is incremented concurrently, so the local monotonic may
      // not grow linearly: it can jump from 1 straight to 3 or more.
      // The loop therefore does not strictly visit every page in turn and may
      // hop around. In other words, under concurrent access the threads poll
      // the pages collectively.
      // This algorithm has a problem: some pages can be skipped, which raises
      // the risk of expanding with new pages. More on that later.
      monotonic = m_monotonic.m_size_t++;
    }
  }

  // All pages were polled without success; expand if below the limit
  while (current_page_count < m_max_page_count) {
    // Concurrent callers must not create the same page at the same time, so a
    // mutex is taken here. It is the only lock in PFS memory allocation.
    native_mutex_lock(&m_critical_section);
    // Lock acquired. If array is no longer null, another thread has already
    // created the page.
    array = m_pages[current_page_count].load();
    if (array == nullptr) {
      // This thread is responsible for creating the page
      array = new array_type();
      m_allocator->alloc_array(array);
      m_pages[current_page_count].store(array);
      ++m_max_page_index.m_size_t;
    }
    native_mutex_unlock(&m_critical_section);

    // Try to allocate in the new page
    pfs = array->allocate(dirty_state);
    if (pfs) {
      // Allocation succeeded
      return pfs;
    }
    // Allocation failed; move on and try to create the next page, up to the limit
    ++current_page_count;
  }

  // The page limit has been reached and nothing could be allocated
  return nullptr;
}

Let us look at the problem with the page polling strategy in detail. Because the atomic variable m_monotonic is incremented concurrently, some pages are skipped during polling, which raises the risk of expanding with new pages.

An extreme example makes the problem easier to see. Suppose there are currently 4 pages: pages 1 and 4 are full, with no available record, while pages 2 and 3 still have available records.

Four threads issue concurrent allocate requests and all read m_monotonic = 0 at the same time:

monotonic = m_monotonic.m_size_t.load();

All four threads then try to allocate a record from page 1 and fail, because page 1 has no available record. Each thread then increments the counter to move on to the next page:

monotonic = m_monotonic.m_size_t++;

This is where the problem appears. The four increments take effect one after another, and the post-increment on an atomic variable returns the value from before the increment, so the four threads receive 0, 1, 2 and 3 in the order in which their increments are applied. The thread that receives 0 retries page 1, fails again, and its next increment returns 4 or more, which reaches monotonic_max and ends its loop. The thread that receives 3 tries page 4, fails, and leaves the loop in the same way. Both threads finish polling without ever trying page 2 or page 3 and move on to creating a new page, even though pages 2 and 3 still have free records.

The example is extreme, but under concurrent access in MySQL, concurrent requests for PFS memory can easily cause some pages to be skipped in this way.
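The four-thread scenario can be replayed deterministically in a single thread. The sketch below is an illustration, not MySQL code: `replay_scan`, `ScanResult` and the fixed interleaving are assumptions made for the example, and the plain variable `shared` stands in for m_monotonic. It mirrors the fact that `++` applied as a post-increment to a `std::atomic` returns the value from before the increment.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct ScanResult {
  std::vector<int> page_used;  // page index each thread got, -1 = would expand
  std::size_t free_left;       // free records still available afterwards
};

// free_per_page[i] is the number of free records in page i (0-based).
ScanResult replay_scan(std::vector<std::size_t> free_per_page,
                       std::size_t threads) {
  const std::size_t pages = free_per_page.size();
  assert(pages > 0);
  std::size_t shared = 0;  // stands in for m_monotonic

  struct Thread {
    std::size_t monotonic;
    std::size_t monotonic_max;
    int page;
  };

  // Consumes one record and returns true if page `index` has a free record.
  auto try_page = [&free_per_page](std::size_t index) {
    if (free_per_page[index] == 0) return false;
    --free_per_page[index];
    return true;
  };

  // Step 1: all threads load the same counter value before anyone moves on.
  std::vector<Thread> ts(threads, Thread{shared, shared + pages, -1});

  // Step 2: every thread tries its first page.
  for (Thread &t : ts) {
    const std::size_t index = t.monotonic % pages;
    if (try_page(index)) t.page = static_cast<int>(index);
  }

  // Step 3: the threads that failed post-increment the shared counter one
  // after another; each receives the value from before its own increment.
  for (Thread &t : ts) {
    if (t.page < 0) t.monotonic = shared++;
  }

  // Step 4: each remaining thread finishes its polling loop.
  for (Thread &t : ts) {
    while (t.page < 0 && t.monotonic < t.monotonic_max) {
      const std::size_t index = t.monotonic % pages;
      if (try_page(index)) {
        t.page = static_cast<int>(index);
      } else {
        t.monotonic = shared++;
      }
    }
  }

  ScanResult result{{}, 0};
  for (const Thread &t : ts) result.page_used.push_back(t.page);
  for (std::size_t n : free_per_page) result.free_left += n;
  return result;
}
```

Calling `replay_scan({0, 10, 10, 0}, 4)` models pages 1 and 4 as full and pages 2 and 3 with ten free records each. Under this interleaving the first and fourth threads end with -1, meaning they would go on to create a new page, while 18 free records remain in the existing pages.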

3  Record selection strategy within a page

PFS_buffer_default_array is the class that manages the group of records in each page.

The key data structure is as follows:

class PFS_buffer_default_array {
  // Monotonically increasing atomic counter, used to pick a free record
  PFS_cacheline_atomic_size_t m_monotonic;
  // Maximum number of records
  size_t m_max;
  // PFS objects backing the records, such as PFS_thread
  T *m_ptr;
};

Each page is a fixed-length array. Each record has one of three states: FREE, DIRTY and ALLOCATED. FREE means the record is available. ALLOCATED means it has been allocated successfully. DIRTY is an intermediate state: the record has been claimed but the allocation has not completed yet.

Selecting a record is essentially a matter of polling for a record in the FREE state and claiming it.

The core simplified code is as follows:

value_type *allocate(pfs_dirty_state *dirty_state) {
  // Start polling from the position recorded in m_monotonic
  monotonic = m_monotonic.m_size_t++;
  monotonic_max = monotonic + m_max;

  while (monotonic < monotonic_max) {
    index = monotonic % m_max;
    pfs = m_ptr + index;
    // m_lock is a pfs_lock; it maintains the free/dirty/allocated states.
    // How the atomic state transition works is described later.
    if (pfs->m_lock.free_to_dirty(dirty_state)) {
      return pfs;
    }
    // The current record is not free; increment the atomic counter and try the next one
    monotonic = m_monotonic.m_size_t++;
  }

  // No free record was found in this page
  return nullptr;
}

Selecting a record follows much the same process as selecting a page. The difference is that the number of records in a page is fixed, so there is no expansion logic.
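A self-contained version of the same polling loop may help when experimenting with it. This is a simplified sketch, not the MySQL class: `FixedPage` is a hypothetical name, each slot is only free or taken (the DIRTY state is left out), and a compare-and-swap on a boolean replaces `free_to_dirty`.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <vector>

// Minimal fixed-size page: each slot is either free (false) or taken (true).
class FixedPage {
 public:
  explicit FixedPage(std::size_t size) : taken_(size) {
    for (auto &slot : taken_) slot.store(false);
  }

  // Returns the index of the claimed slot, or -1 when polling found nothing.
  long allocate() {
    const std::size_t max = taken_.size();
    // fetch_add(1) behaves like the post-increment in the original code:
    // it returns the value from before the increment.
    std::size_t monotonic = cursor_.fetch_add(1);
    const std::size_t monotonic_max = monotonic + max;
    while (monotonic < monotonic_max) {
      const std::size_t index = monotonic % max;
      bool expected = false;
      // Claim the slot only if it is still free.
      if (taken_[index].compare_exchange_strong(expected, true)) {
        return static_cast<long>(index);
      }
      monotonic = cursor_.fetch_add(1);
    }
    return -1;
  }

  // Marks a slot as free again.
  void release(std::size_t index) { taken_[index].store(false); }

 private:
  std::vector<std::atomic<bool>> taken_;
  std::atomic<std::size_t> cursor_{0};
};
```

On a three-slot page, three calls to `allocate()` return 0, 1 and 2, a fourth returns -1, and after `release(1)` the next call returns 1 because the shared cursor happens to land on that slot.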

Since the selection strategy is the same, it has the same problem. The ++ on the atomic variable m_monotonic is executed concurrently by multiple threads, so under high concurrency records are also skipped, and a free record in the page may never be selected.

So even when a page itself is not skipped, the records inside it can still be skipped. This makes matters worse and further aggravates memory growth.

4  pfs_lock

Each record has a pfs_lock that maintains its allocation state in the page (free/dirty/allocated) together with version information.

Key data structure:

struct pfs_lock {
  std::atomic<uint32> m_version_state;
};

pfs_lock uses a single 32-bit unsigned integer to store both the version and the state: the high 30 bits hold the version and the low 2 bits hold the allocation state.

state PFS_LOCK_FREE = 0x00
state PFS_LOCK_DIRTY = 0x01
state PFS_LOCK_ALLOCATED = 0x02
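The packing of version and state into one 32-bit word can be sketched with a few helper functions. The helper names (`pack`, `state_of`, `version_of`) are hypothetical and exist only for this illustration; the mask values are the ones used by the state transition code, and 0x02 is the value of PFS_LOCK_ALLOCATED in MySQL's pfs_lock.h.

```cpp
#include <cassert>
#include <cstdint>

constexpr std::uint32_t kVersionMask = 0xFFFFFFFC;  // high 30 bits: version
constexpr std::uint32_t kStateMask = 0x00000003;    // low 2 bits: state
constexpr std::uint32_t kVersionInc = 4;            // adds 1 to the version field

constexpr std::uint32_t kFree = 0x00;
constexpr std::uint32_t kDirty = 0x01;
constexpr std::uint32_t kAllocated = 0x02;

// Extracts the allocation state from a packed word.
constexpr std::uint32_t state_of(std::uint32_t word) {
  return word & kStateMask;
}

// Extracts the version counter from a packed word.
constexpr std::uint32_t version_of(std::uint32_t word) {
  return (word & kVersionMask) >> 2;
}

// Builds a packed word from a version counter and a state.
constexpr std::uint32_t pack(std::uint32_t version, std::uint32_t state) {
  return (version << 2) | (state & kStateMask);
}
```

For example, `pack(1, kAllocated)` is 6: the version 1 occupies bits 2 and above (value 4) and the state contributes 2. Adding `kVersionInc` to a packed word raises the version by one without touching the state bits.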


The initial version is 0 and it is incremented by 1 on each successful allocation, so the version is the number of times the record has been allocated successfully.
The state transition code is the main part to look at:

// The three macros below are used for bit operations on the state and the version
#define VERSION_MASK 0xFFFFFFFC
#define STATE_MASK 0x00000003
#define VERSION_INC 4

bool free_to_dirty(pfs_dirty_state *copy_ptr) {
  uint32 old_val = m_version_state.load();

  // If the current state is not FREE, fail immediately
  if ((old_val & STATE_MASK) != PFS_LOCK_FREE) {
    return false;
  }

  uint32 new_val = (old_val & VERSION_MASK) + PFS_LOCK_DIRTY;

  // The state is FREE; try to change it to DIRTY.
  // atomic_compare_exchange_strong acts as an optimistic lock: several threads
  // may try to modify the atomic variable at the same time, but only one succeeds.
  bool pass = atomic_compare_exchange_strong(&m_version_state, &old_val, new_val);

  if (pass) {
    // FREE to DIRTY succeeded
    copy_ptr->m_version_state = new_val;
  }

  return pass;
}

void dirty_to_allocated(const pfs_dirty_state *copy) {
  /* Make sure the record was DIRTY. */
  assert((copy->m_version_state & STATE_MASK) == PFS_LOCK_DIRTY);
  /* Increment the version, set the ALLOCATED state */
  uint32 new_val = (copy->m_version_state & VERSION_MASK) + VERSION_INC + PFS_LOCK_ALLOCATED;
  m_version_state.store(new_val);
}

The state transitions are easy to follow. The logic of dirty_to_allocated and allocated_to_free is simpler than that of free_to_dirty, because concurrent writers only contend for a record while it is FREE. Once the state becomes DIRTY, the record is effectively owned by one thread, and other threads no longer try to operate on it.

The version is incremented when the state changes to PFS_LOCK_ALLOCATED.
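The claim that only one contender wins the FREE to DIRTY transition can be checked with a deterministic replay. The sketch below is an assumption-laden miniature, not the MySQL struct: `MiniLock`, `claim_from` and `replay_race` are hypothetical names, and the race is modeled by letting two callers take the same snapshot before either attempts the compare-and-swap.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

constexpr std::uint32_t kStateMask = 0x00000003;
constexpr std::uint32_t kVersionMask = 0xFFFFFFFC;
constexpr std::uint32_t kVersionInc = 4;
constexpr std::uint32_t kFree = 0, kDirty = 1, kAllocated = 2;

struct MiniLock {
  std::atomic<std::uint32_t> version_state{0};

  // CAS half of free_to_dirty, starting from a previously loaded snapshot.
  bool claim_from(std::uint32_t snapshot, std::uint32_t *copy) {
    if ((snapshot & kStateMask) != kFree) return false;
    const std::uint32_t new_val = (snapshot & kVersionMask) + kDirty;
    // Fails if another caller changed the word since the snapshot was taken.
    if (!version_state.compare_exchange_strong(snapshot, new_val)) return false;
    *copy = new_val;
    return true;
  }

  bool free_to_dirty(std::uint32_t *copy) {
    return claim_from(version_state.load(), copy);
  }

  // Only the owner of the DIRTY record calls this, so a plain store suffices.
  void dirty_to_allocated(std::uint32_t copy) {
    version_state.store((copy & kVersionMask) + kVersionInc + kAllocated);
  }

  // Keeps the version and sets the FREE state.
  void allocated_to_free() {
    version_state.store((version_state.load() & kVersionMask) + kFree);
  }
};

// Two callers load the same snapshot, then both attempt the CAS.
// Returns how many of them won the record.
int replay_race(MiniLock &lock) {
  const std::uint32_t snap_a = lock.version_state.load();
  const std::uint32_t snap_b = lock.version_state.load();
  std::uint32_t copy_a = 0, copy_b = 0;
  const bool a = lock.claim_from(snap_a, &copy_a);
  const bool b = lock.claim_from(snap_b, &copy_b);
  if (a) lock.dirty_to_allocated(copy_a);
  if (b) lock.dirty_to_allocated(copy_b);
  return static_cast<int>(a) + static_cast<int>(b);
}
```

Starting from a fresh lock, `replay_race` returns 1 and leaves the word at 6 (version 1, ALLOCATED). After `allocated_to_free()` the word is 4 (version 1, FREE), and a second replay again yields exactly one winner and a word of 10 (version 2, ALLOCATED).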

5  PFS memory release

Releasing PFS memory is relatively simple. Each record stores the container and page it belongs to, so releasing it is a matter of calling the deallocate interface, which finally sets the state back to free.

Underneath, the state is updated through pfs_lock:

struct pfs_lock {
  void allocated_to_free(void) {
    /*
      If this record is not in the ALLOCATED state and the caller is trying
      to free it, this is a bug: the caller is confused, and potentially
      damaging data owned by another thread or object.
    */
    uint32 copy = copy_version_state();
    /* Make sure the record was ALLOCATED. */
    assert(((copy & STATE_MASK) == PFS_LOCK_ALLOCATED));
    /* Keep the same version, set the FREE state */
    uint32 new_val = (copy & VERSION_MASK) + PFS_LOCK_FREE;
    m_version_state.store(new_val);
  }
};

Three   Optimization of memory allocation

As analyzed above, both pages and records can be skipped during polling. An allocation can therefore fail even though free records exist, which leads to more pages being created and more memory being used. The main problem is that this memory is never released once it has been allocated.

To improve the PFS memory hit rate and avoid the problems above as far as possible, some ideas are as follows:

while (monotonic < monotonic_max) {
  index = monotonic % current_page_count;
  array = m_pages[index].load();
  pfs = array->allocate(dirty_state);
  if (pfs) {
    // Remember the index of the successful allocation
    return pfs;
  } else {
    // Increment the local variable, so that concurrent increments cannot skip pages
    monotonic++;
  }
}

One more point: if every search starts from the position of the last successful allocation, concurrent requests will inevitably conflict, because everyone starts from the same position. Some randomness should be added to the starting position to avoid a large number of conflicts.

In summary:

  1. Each allocate starts searching from the index of the most recent successful allocation, or from a random position

  2. Each allocate strictly polls all pages or records
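The two points can be combined in a small single-threaded model. It only illustrates the scan order and is not a concurrent implementation: `PagePool` is a hypothetical name, the per-page free counts replace real pages, and in real code the hint would have to be an atomic variable and the per-page allocation would remain lock-free.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct PagePool {
  std::vector<std::size_t> free_per_page;  // free records in each page
  std::size_t hint = 0;  // index of the page that served the last request

  // Visits every page exactly once, starting at `start`, using a local
  // counter only. Returns the page index used, or -1 if all pages are full.
  int allocate_from(std::size_t start) {
    const std::size_t pages = free_per_page.size();
    for (std::size_t step = 0; step < pages; ++step) {
      const std::size_t index = (start + step) % pages;
      if (free_per_page[index] > 0) {
        --free_per_page[index];
        hint = index;  // remember where the allocation succeeded
        return static_cast<int>(index);
      }
    }
    return -1;
  }

  // Starts from the most recent successful index.
  int allocate() { return allocate_from(hint); }
};
```

With free counts `{0, 10, 10, 0}`, four consecutive `allocate()` calls all succeed in page index 1, and a caller that starts at a different position, such as `allocate_from(3)`, still finds a free record because it walks every page before giving up. Passing a random start to `allocate_from` corresponds to the randomized variant.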

Four   Optimization of memory release

The biggest problem with PFS memory release is that memory, once created, is never released until shutdown. With a bursty workload, many pages are allocated during the business peak and are still not released during the off-peak period.

Implementing a lock-free reclamation mechanism that periodically detects and reclaims memory, without hurting the efficiency of memory allocation, is fairly complicated.

The main points to consider are the following:

  1. Memory must be released in units of pages: every record in a page to be released must be free, and a page that is about to be freed must not be allocated from again

  2. Allocation is spread randomly, so overall there may be memory that could be reclaimed while every page still holds a few busy records; this situation needs to be handled well

  3. How to choose the release threshold, while avoiding frequent allocate and release cycles
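Points 1 and 3 can be made concrete with a small policy sketch. It is purely illustrative and says nothing about how PolarDB implements reclamation: `page_is_empty`, `pick_reclaimable` and the `keep_spare` reserve are hypothetical, and the sketch ignores the concurrency issues that make the real problem hard.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Point 1: a page may only be released when every record in it is free.
// record_busy[i] is true while record i is in use.
bool page_is_empty(const std::vector<bool> &record_busy) {
  for (bool busy : record_busy) {
    if (busy) return false;
  }
  return true;
}

// Point 3: keep `keep_spare` empty pages as a reserve so that a small burst
// does not trigger an allocate/release cycle; return the rest as candidates.
std::vector<std::size_t> pick_reclaimable(
    const std::vector<std::vector<bool>> &pages, std::size_t keep_spare) {
  std::vector<std::size_t> empty;
  for (std::size_t i = 0; i < pages.size(); ++i) {
    if (page_is_empty(pages[i])) empty.push_back(i);
  }
  if (empty.size() <= keep_spare) return {};
  // The lowest-indexed empty pages form the reserve; the others are released.
  return std::vector<std::size_t>(
      empty.begin() + static_cast<std::ptrdiff_t>(keep_spare), empty.end());
}
```

Given four two-record pages where only page 1 has a busy record, `pick_reclaimable(pages, 1)` returns pages 2 and 3 and keeps page 0 in reserve. The sketch does not address point 2: a page with a single busy record is never a candidate.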

To optimize PFS memory release, PolarDB has developed and provides a feature that reclaims PFS memory periodically. Given the length of this article, it will be introduced later.

Five   About us

PolarDB is a cloud-native distributed relational database developed in-house by Alibaba. In 2020 it entered the Leaders quadrant in Gartner's global database evaluation and won the first prize for scientific and technological progress awarded by the Chinese Institute of Electronics. Built on a cloud-native distributed database architecture, PolarDB provides large-scale online transaction processing as well as parallel processing of complex queries. It has reached an internationally leading level in the field of cloud-native distributed databases and has been widely recognized by the market. Within Alibaba Group, PolarDB fully supported the 2020 Tmall Double 11 and set a new database processing peak of 140 million TPS. Talented engineers are welcome to join us. We look forward to working with you to build a world-class, next-generation cloud-native distributed relational database.

