Do you really know MySQL order by

Pretend to know programming 2021-11-25 17:57:39
really know mysql order

Sort the word , My first feeling is that almost all App There are places to sort , Taobao products are sorted according to purchase time 、B The comments of the station are sorted according to the heat ..., Of course, what we are talking about today is not how to sort gracefully under big data , How to improve sorting performance , Let's say MySQL The sorting .

about MySQL, When it comes to sorting , What did you first think of ? keyword order by?order by It's best to have an index on the field ? Leaf nodes are already sequential ? Or try not to MySQL Internal sorting ?

The cause of the matter

Now suppose there is a list of users' friends :

  `id` int(10) AUTO_INCREMENT,
  `user_id` int(10),
  `friend_addr` varchar(1000),
  `friend_name` varchar(100),  
  PRIMARY KEY (`id`),
  KEY `user_id` (`user_id`)

At present, there are two points in the table that need attention :

  1. User user_id , Friend's name friend_name、 Friend's address friend_addr

  2. user_id Yes, there is Indexes Of

one day , There's a junior development engineer, ape , Received a demand from Xiao Wang, the primary product manager :
Xiao Wang : Comrade ape , Now you need to add a function in the background , This function should be supported according to the user id Can find the names and addresses of all his friends , And ask friends' names to be sorted according to the dictionary .
Ape : well , This function is simple , I'll be online soon .

So the little ape wrote this sql:

select friend_name,friend_addr from user where user_id=? order by name

In the moment of lightning , The little ape toe went on the line in high spirits , It's all going well , Until one day, an operation classmate led to such a query :

select friend_name,friend_addr from user where user_id=10086 order by name

However , This query is much slower than usual , The database reported a slow query , The little ape panicked at this time b: What's going on ?user_id There is an index , And wisely, I only used select friend_name,friend_addr, It's no use select * ah . The little ape kept comforting himself , Be calm, be calm , Then it suddenly occurred to me that there was explain command , use explain Let's check that sql Let's go ahead with our plan , When the little ape used explain after , Find out extra There is a dangerous word in the field :using filesort.

“ This query actually uses the legendary file sorting , But if a person doesn't have many friends , Even if you use file sorting , It should be soon ”, Unless this user_id=10086 I have many friends , Later, the little ape went to check , This user's friend has 10w Multiple ~.

Lost in thought, the little ape thought : This pot seems to be fixed ,10w The data is a little big , And this one. using filesort What sort principle is it ?

Anatomical file sorting

One might say that the above problem is 10w The data is too big , It's slow even if you don't sort , This actually makes sense ,10w The data can be found at one time , Whether it's MySQL Occupation of memory buffer , Or the consumption of network bandwidth is very large , Then if I add limit 1000 Well ? The problem of network bandwidth must have been solved , Because the packet is smaller as a whole , however using filesort The problem is still unsolved , Seeing this, you may have questions ,using filesort Is it sorted in the file ? How is it sorted in the file ? Or I ask : What would you do if you were to design a ranking ? With these questions and thoughts, let's take a look using filesort What technical difficulties will be involved and how to solve them ?

  1. First of all, our user_id It's indexed , So I'll be there first user_id Retrieve our target data from the index tree , namely user_id=10086 The data of , But what we're looking for is friend_name and friend_addr Field , Unfortunately , The light on user_id The index cannot find these two field values

  2. So you need to go back to the table , adopt user_id Find the corresponding primary key in the primary key index tree ,ok, We found the first user_id=10086 Of friend_name and friend_addr Field

  3. What to do at this time ? It must be wrong to go straight back , Because I need to know friend_name Sort , How to arrange ? I haven't found all the data yet , Then you have to put the data in one place first , This place is sort_buffer, When you see the name, I think you should guess , you 're right ,sort_buffer This is the buffer used for sorting in this case , It should be noted here that each thread will have a separate sort_buffer, The main purpose of this is to avoid lock contention caused by multiple threads operating on the same block of memory .

  4. When the first piece of data friend_name and friend_addr Has been put in sort_buffer in , Of course it's not over , The synchronization steps will be repeated all the time , Until all user_id=10086 Of friend_name and friend_addr Put them in sort_buffer It only ends in

  5. sort_buffer The data in has been put in , It's time to sort , here MySQL Would be right friend_name Make a quick schedule , After a quick row ,sort_buffer in friend_name It's orderly

  6. Finally back to sort_buffer In front of 1000 strip , end .

Everything looks smooth , however sort_buffer It takes up memory space , This is awkward , Memory itself is not infinite , It must have an upper limit , Of course sort_buffer It can't be too small , Too small , It doesn't make much sense . stay InnoDB In the storage engine , This value defaults to 256K.

mysql> show variables  like 'sort_buffer_size';
| Variable_name    | Value  |
| sort_buffer_size | 262144 |

in other words , If you want to put sort_buffer The data in is greater than 256K Words , Then use in sort_buffer The way of medium and fast platoon will certainly not work , Now , You may ask :MySQL Can't it expand automatically according to the data size ? forehead ,MySQL It's a multithreading model , If each thread expands , Then give it to other functions buffer It's small ( such as change buffer etc. ), It will affect the quality of other functions .

Then you have to sort in another way , you 're right , This is the real file sorting , That is, the temporary file on the disk ,MySQL Will adopt the idea of merging and sorting , Divide the data to be sorted into several parts , After each piece of data is sorted in memory, it will be put into a temporary file , Finally, merge and sort the data of these sorted temporary files again ok 了 , The typical divide and rule principle , Its specific steps are as follows :

  1. First divide the data to be sorted , Each piece of data can be divided into sort_buffer in

  2. For each piece of data in sort_buffer In order , After sorting , Write to a temporary file

  3. When all the data is written to the temporary file , At this time, for each temporary file , The interior is orderly , But they are not a whole , The whole is not orderly , So the next step is to merge the data

  4. Let's assume that there is tmpX and tmpY Two temporary files , It's going to start from tmpX Read some data into memory , And then from tmpY Read part of the data into memory , Here you may wonder why it is a part rather than a whole or a single ? Because first, the disk is slow , So try to read more data into memory every time , But you can't read too much , Because there's more buffer The limits of space .

  5. about tmpX Suppose what comes in is tmpX[0-5] , about tmpY Suppose what comes in is tmpY[0-5], So just compare : If tmpX[0] < tmpY[0], that tmpX[0] It must be the smallest , then tmpX[1] and tmpY[0] Compare , If tmpX[1] > tmpY[0], that tmpY[0] It must be the second smallest ..., In this way, pairwise comparison can finally put tmpX and tmpY Merge into an ordered file tmpZ, Many of them tmpZ Merge again ..., Finally, all the data can be merged into an orderly large file .

File sorting is slow , Is there any other way

Through the sorting process above, we know , If the data to be sorted is large , exceed sort_buffer Size , Then you need to sort the files , File sorting involves batch sorting and merging , Very time consuming , The root cause of this problem is sort_buffer Not enough use , I don't know if you found out there's no us friend_name Need to sort , But they put friend_addr Also stuffed sort_buffer in , such The size of a single row of data is equal to friend_name The length of + friend_addr The length of , Can you make sort_buffer The only known friend_name Field , In this case , The overall utilization space is large , You don't have to use temporary files . you 're right , This is another sort optimization to be discussed next rowid Sort .

rowid The idea of sorting is not to put unnecessary data into sort_buffer in , Give Way sort_buffer Only the necessary data is retained in , So what do you think is the necessary data ? Only put friend_name? It's not going to work , After sorting ,friend_addr What do I do ? Therefore, the primary key id Put it in , After this line , adopt id Go back to the table again , Get friend_addr that will do , Therefore, its general process is as follows :

  1. according to user_id Indexes , Find the target data , Then go back to the table , Only id and friend_name In the sort_buffer in

  2. repeat 1 step , Until all the target data is sort_buffer in

  3. Yes sort_buffer According to the data in friend_name Field to sort

  4. Sort according to id Go back to the table again and find friend_addr return , Until we return to 1000 Data , end .

In fact, there are several points to pay attention to :

  1. In this way, you need to go back to the table twice

  2. sort_buffer Although small , But if the amount of data itself is still large , It should still be sorted by temporary files

So here comes the question , Two ways ,MySQL How to choose ? We have to judge which way to go according to certain conditions , This condition is to enter sort_buffer The length of a single line , If the length is too large (friend_name + friend_addr The length of ), Will adopt rowid This way, , Otherwise, the first , The standard of length is based on max_length_for_sort_data To the , The default value is 1024 byte :

mysql> show variables like 'max_length_for_sort_data';
| Variable_name          | Value |
| max_length_for_sort_data | 1024  |

I don't want to go back to my watch , Don't want to sort again

In fact, no matter which of the above methods , They all need Back to the table + Sort , The reason for returning to the table is that there is no target field on the secondary index , Sorting is because the data is not ordered , If the secondary index has a target field and is already sorted , That's the best of both worlds .

you 're right , It's a joint index , We just need to build a (user_id,friend_name,friend_addr) Joint index of , So I can get the target data through this index , also friend_name It's already sorted , As well as friend_addr Field , One move , There is no need to return the form , No need to sort again . Therefore, for the above sql, Its general process is as follows :

  1. Find... Through the federated index user_id=10086 The data of , Then read the corresponding friend_name and friend_addr Field directly returns , because friend_name It's already sorted , No need for extra treatment

  2. Repeat the first step , Follow the leaf node and look back , Until you find the first one that is not 10086 The data of , end .

Although joint index can solve this problem , However, in practical application, we must not blindly establish , It is necessary to judge whether it is necessary to establish... According to the actual business logic , If you don't often have similar queries , You don't have to create , Because the federated index will take up more storage space and maintenance overhead .


  1. about order by When the index is not used , At this time explain in Extra The field will probably appear using filesort The word

  2. appear using filesort You don't have to be too flustered , If the amount of data is small , For example, there are dozens of data , So in sort buffer It is also fast to use fast platoon in

  3. If the amount of data is large , More than the sort buffer Size , So it is necessary to sort temporary files , That is, merge sort , This part is made up of MySQL The optimizer decides

  4. If there are many fields to query , Want to try to avoid using temporary file sorting , You can try setting max_length_for_sort_data The size of the field , Make it less than the sum of the length of all query fields , This may avoid , But there will be one more operation to return to the table

  5. In real business , We can also create a joint index for the combination of fields that we often query , In this way, there is no need to go back to the table or sort separately , But federated indexing takes up more storage and overhead

  6. When querying a large amount of data , Try to divide into batches , advance explain To observe sql Your execution plan is a good choice .


It's not easy to create , Everyone 「 Three even 」 Is the greatest support for the author , It is also the author's greatest creative power , See you next time .


Search on wechat 【 Pretend to know programming 】, Continue to share technical dry goods  

Past highlights :

本文为[Pretend to know programming]所创,转载请带上原文链接,感谢

  1. 数据结构实验八 领会图的两种主要储存结构和图的基本运算算法设计
  2. Hibernate数据校验简介
  3. Il a dépensé 270 000 yuans pour soulever Xiaopeng p7 et a parcouru 3 627 km. Le propriétaire du véhicule a partagé 6 avantages et inconvénients.
  4. 阿里蚂蚁花呗团队面试题:spring+分布式+jvm+session+redis
  5. 【Java入门100例】14.字符串排序——compareTo()
  6. 【Java入门100例】13.修改文件扩展名——字符串替换
  7. Leetcode 79. Word Search [C + + / java detailed problem]
  8. Introduction à la vérification des données hibernantes
  9. Expérience de la structure des données
  10. Spring cloud gateway practice 2: more routing configuration methods
  11. Java network programming - summary overview
  12. 基于语法树的 Java 代码自动化插桩
  13. 云原生 Spring Boot 应用配置 Prometheus + Grafana 监控(保姆级)
  14. Spring cloud gateway practice 2: more routing configuration methods
  15. Jenkins file one line of code to deploy. Net program to k8s
  16. Java network programming - summary overview
  17. Cloud Native Spring Boot application configuration Prometheus + grafana Monitoring (baby - sitter)
  18. Insertion automatique de code Java basée sur l'Arbre syntaxique
  19. Le SUV phare de Xiaopeng, Xiaopeng G9, a fait ses débuts au salon de l'automobile et s'est tenu en position C dans la nouvelle force?
  20. Docker 从入门到实践系列四 - Docker 容器编排利器 Docker Compose
  21. 6年老猿带你掌握Spring Boot实现定时任务的动态增删启停
  22. disruptor笔记之六:常见场景,java教程从入门到精通pdf百度云
  23. Pourquoi InnoDB n'utilise - t - il pas un cache LRU naïf?
  24. Java Reflection (2): quelques opérations de base de reflection
  25. 6年老猿帶你掌握Spring Boot實現定時任務的動態增删啟停
  26. Les singes âgés vous permettent de maîtriser le démarrage et l'arrêt dynamiques des tâches programmées par Spring boot
  27. Docker From Beginning to Practice Series IV - docker Container chorégraphe Clean docker Composition
  28. 编写 java 程序,为家用电脑 ipv6 自动更新 goddy dns 记录(ddns)
  29. java jvm-old gc耗时几十s,导致系统告警
  30. Disruptor note 6: scénario commun, tutoriel Java de l'introduction à la maîtrise du PDF Baidu Cloud
  31. 编写Java程序启动脚本最佳实践
  32. How to get the correct Linux user's documents, music videos and other directories?
  33. Java JVM Old GC prend des dizaines de s, ce qui provoque une alarme système
  34. Écrivez un programme Java pour mettre à jour automatiquement les enregistrements DNS goddy (ddns) pour l'ordinateur domestique IPv6
  35. 編寫Java程序啟動脚本最佳實踐
  36. Meilleures pratiques pour écrire des scripts de démarrage de programmes Java
  37. Notes sur springcloud Eureka
  38. Ajout, suppression et modification simples de mybatis
  39. MySQL Learning - Logging System Redo log and Bin log
  40. Springboot Common comments | @ configuration
  41. Mécanisme d'expiration du cache redis et d'élimination de la mémoire
  42. Analyse concise du code source redis 01 - configuration de l'environnement
  43. Java - carte mémoire de l'objet
  44. Redis source Concise Analysis 02 - SDS String
  45. Why did docker lose to kubernetes? Docker employee readme!
  46. Spring cloud gateway practice 2: more routing configuration methods
  47. Principe de mise en œuvre ultime du mécanisme de concurrence Java sous - jacent
  48. [démarrer avec Java 100 exemples] 13. Modifier l’extension de fichier - remplacement de chaîne
  49. Java期末作业——王者荣耀的洛克王国版游戏
  50. Elasticsearch聚合学习之五:排序结果不准的问题分析,阿里巴巴java性能调优实战
  51. Java期末作業——王者榮耀的洛克王國版遊戲
  52. Java final work - King's Glory Rock Kingdom Game
  53. 【网络编程】TCP 网络应用程序开发
  54. 【网络编程入门】什么是 IP、端口、TCP、Socket?
  55. 【網絡編程入門】什麼是 IP、端口、TCP、Socket?
  56. [Introduction à la programmation réseau] qu'est - ce que IP, port, TCP et socket?
  57. [programmation réseau] développement d'applications réseau TCP
  58. [Java Basics] comprendre les génériques
  59. Dix outils open source que les architectes de logiciels Java devraient maîtriser!!
  60. java架构之路(多线程)synchronized详解以及锁的膨胀升级过程,mysql数据库实用教程pdf