In depth understanding of netty: viewing netty traffic control from occasional downtime

Vivo Internet technology 2021-10-14 06:58:39
depth understanding netty viewing netty


author :vivo Internet server team -Zhang Lin


One 、 Business background


At present, a large number of message push will be used in the use scenario of the mobile terminal ,push Messages can help operators achieve operational goals more efficiently ( For example, push marketing activities or reminders to users APP new function ).


For the push system, the following two features are required :

  • The message is sent to the user in seconds , No delay , Support million push per second , Single machine million long connection .

  • Support notification 、 Text 、 Customize message transparent and other presentation forms . It is for the above reasons , It brings challenges to the development and maintenance of the system . The following figure is a brief description of the push system (API-> Push module -> mobile phone ).



Two 、 The problem background


Stability test of long connection cluster in push system 、 After the stress test stage runs for a period of time, a process will hang up randomly , Low probability ( The frequency is about once a month ), This will affect the timeliness of some client messages .


Long... In the push system Connecting nodes (Broker System ) Is based on Netty Development , This node maintains the long connection between the server and the mobile terminal , After an online problem occurs , add to Netty Troubleshoot memory leak monitoring parameters , Observed for many days, but no problem was found .


from The long connection node is Netty Development , For the convenience of the reader , Here is a brief introduction Netty.


3、 ... and 、 Netty Introduce


Netty It's a high performance 、 Asynchronous event driven NIO frame , be based on Java NIO Provided API Realization . It provides the right TCP、UDP And file transfer support , As the most popular NIO frame ,Netty In the field of Internet 、 Big data distributed computing 、 Game industry 、 Communication industry has been widely used ,HBase,Hadoop,Bees,Dubbo And other open source components are also based on Netty Of NIO Frame building .


Four 、 Problem analysis


4.1 guess


The initial conjecture was caused by the long connection number , But after checking the log 、 Analysis of the code , It is found that this is not the reason .


Number of long connections :39 ten thousand , Here's the picture :



Every channel Byte size 1456, Press 40 10000 long connection calculation , It will not cause excessive memory .


4.2 see GC journal


see GC journal , Frequently before the process is found to hang up full GC( frequency 5 Minutes at a time ), But the memory is not reduced , Suspected out of heap memory leak .


4.3 analysis heap The memory of


ChannelOutboundBuffer Objects account for nearly 5G Memory , The cause of leakage can be basically determined :ChannelOutboundBuffer Of entry Too many counts lead to , see ChannelOutboundBuffer The source code can analyze , yes ChannelOutboundBuffer Data in .


Didn't write to , Resulting in a backlog ;

ChannelOutboundBuffer Inside is a linked list structure .



4.4 The analysis data from the above figure is not written out , Why does this happen ?


The code actually determines whether the connection is available (Channel.isActive), And the timeout connection will be closed . From historical experience , This happens when the connection is half open ( The client is shut down unexpectedly ) There are many cases --- There is no problem if both parties do not conduct data communication .


According to the above conjecture , The test environment is reproduced and tested .

1) Simulate the client cluster , And establish a connection with the long connection server , Set the firewall of the client node , Simulate the scenario of server and client network exceptions ( That is to simulate Channel.isActive Successful call , But the data can't be actually sent out ).


2) Reduce off heap memory , Continue to send test messages to previous clients . Message size (1K about ).


3) according to 128M Memory to calculate , Actually call 9W It will appear many times .



5、 ... and 、 Problem solving


5.1 Enable autoRead Mechanism


When channel When not writable , close autoRead;

public void channelReadComplete(ChannelHandlerContext ctx) throws Exception { if (!ctx.channel().isWritable()) { Channel channel = ctx.channel(); ChannelInfo channelInfo = ChannelManager.CHANNEL_CHANNELINFO.get(channel); String clientId = ""; if (channelInfo != null) { clientId = channelInfo.getClientId(); }
LOGGER.info("channel is unwritable, turn off autoread, clientId:{}", clientId); channel.config().setAutoRead(false); }}


Turns on when data is writable autoRead;

@Overridepublic void channelWritabilityChanged(ChannelHandlerContext ctx) throws Exception{ Channel channel = ctx.channel(); ChannelInfo channelInfo = ChannelManager.CHANNEL_CHANNELINFO.get(channel); String clientId = ""; if (channelInfo != null) { clientId = channelInfo.getClientId(); } if (channel.isWritable()) { LOGGER.info("channel is writable again, turn on autoread, clientId:{}", clientId); channel.config().setAutoRead(true); }}


explain :


autoRead The function of is more accurate rate control , If you open it Netty Will help us register to read Events . When a read event is registered , If the network is readable , be Netty It will start from channel Reading data . Then if autoread Turn it off , be Netty Will not register read Events .


In this way, even if the data sent by the opposite end comes, the read event will not be triggered , So it won't start from channel Read to data . When recv_buffer Full hour , No more data will be received .


5.2 Set high and low water levels

serverBootstrap.option(ChannelOption.WRITE_BUFFER_WATER_MARK, new WriteBufferWaterMark(1024 * 1024, 8 * 1024 * 1024));
notes : High and low water levels cooperate with the following isWritable Use


5.3 increase channel.isWritable() The judgment of the


channel Whether it is available except for verification channel.isActive() I need to add channel.isWrite() The judgment of the ,isActive Just make sure the connection is active , And whether it can be written by isWrite To decide .

private void writeBackMessage(ChannelHandlerContext ctx, MqttMessage message) { Channel channel = ctx.channel(); // increase channel.isWritable() The judgment of the  if (channel.isActive() && channel.isWritable()) { ChannelFuture cf = channel.writeAndFlush(message); if (cf.isDone() && cf.cause() != null) { LOGGER.error("channelWrite error!", cf.cause()); ctx.close(); } }}
notes : isWritable Can control ChannelOutboundBuffer, Don't let it expand indefinitely . The mechanism is to use the set channel Judging by high and low water levels .


5.4 Problem verification


Test after modification , Send to 27W It doesn't report an error once ;



6、 ... and 、 Solution analysis


commonly Netty The data processing flow is as follows : The read data is processed by the business thread , Send it after processing ( The whole process is asynchronous ),Netty In order to improve the throughput of the network , At the business level and socket There's an extra ChannelOutboundBuffer.


Calling channel.write When , All the written data is not actually written to socket, It's about writing first ChannelOutboundBuffer. When calling channel.flush When you really want to socket Write . Because there's one in the middle buffer, There is a rate match , And this buffer Still unbounded ( Linked list ), That is, if you don't control channel.write The speed of , There will be a lot of data in this buffer Pile up in , If you encounter socket When you can't write data (isActive At this time, the judgment is invalid ) Or write slowly .


The most likely result is the depletion of resources , And if the ChannelOutboundBuffer Deposit is

DirectByteBuffer, This will make the problem more difficult to troubleshoot .


The process can be abstracted as follows :



From the above analysis, we can see that , Step one is written too fast ( Too fast to handle ) Or the downstream can't send data, which will cause problems , This is actually a rate matching problem .


7、 ... and 、Netty Source code description


Above high water level

When ChannelOutboundBuffer After the capacity of exceeds the set threshold of high water level ,isWritable() return false, Set up channel Don't write (setUnwritable), And trigger fireChannelWritabilityChanged().

private void incrementPendingOutboundBytes(long size, boolean invokeLater) { if (size == 0) { return; }
long newWriteBufferSize = TOTAL_PENDING_SIZE_UPDATER.addAndGet(this, size); if (newWriteBufferSize > channel.config().getWriteBufferHighWaterMark()) { setUnwritable(invokeLater); }}private void setUnwritable(boolean invokeLater) { for (;;) { final int oldValue = unwritable; final int newValue = oldValue | 1; if (UNWRITABLE_UPDATER.compareAndSet(this, oldValue, newValue)) { if (oldValue == 0 && newValue != 0) { fireChannelWritabilityChanged(invokeLater); } break; } }}


Below low water level

When ChannelOutboundBuffer After the capacity of is lower than the set threshold of low water level ,isWritable() return true, Set up channel Can write , And trigger fireChannelWritabilityChanged().

private void decrementPendingOutboundBytes(long size, boolean invokeLater, boolean notifyWritability) { if (size == 0) { return; }
long newWriteBufferSize = TOTAL_PENDING_SIZE_UPDATER.addAndGet(this, -size); if (notifyWritability && newWriteBufferSize < channel.config().getWriteBufferLowWaterMark()) { setWritable(invokeLater); }}private void setWritable(boolean invokeLater) { for (;;) { final int oldValue = unwritable; final int newValue = oldValue & ~1; if (UNWRITABLE_UPDATER.compareAndSet(this, oldValue, newValue)) { if (oldValue != 0 && newValue == 0) { fireChannelWritabilityChanged(invokeLater); } break; } }}


8、 ... and 、 summary


When ChannelOutboundBuffer After the capacity of exceeds the set threshold of high water level ,isWritable() return false, Indicates that the message is generated , Need to reduce write speed .


When ChannelOutboundBuffer After the capacity of is lower than the set threshold of low water level ,isWritable() return true, Indicates that there are too few messages , Need to increase write speed . After modification through the above three steps , No problems occurred during the deployment of online observation for half a year .


END

Guess you like

版权声明
本文为[Vivo Internet technology]所创,转载请带上原文链接,感谢
https://javamana.com/2021/10/20211002145641033y.html

  1. Day17 Java Foundation
  2. Day18 Java Foundation
  3. Linux installe JDK 1.8 et configure les variables d'environnement
  4. Tutoriel d'utilisation Maven super détaillé
  5. Spring boot reads project parameter configuration
  6. Docker installing rocketmq
  7. Java Zero Basic small white Beginner must make a summary of issues (recommended Collection) Chapitre 1
  8. Manuel pour vous apprendre à utiliser le développement Java pour générer des documents PDF en ligne
  9. 40 + comment les femmes s'habillent - elles pour montrer leur jeunesse?Un manteau et une jupe vous donnent un look haut de gamme tout au long de l'automne et de l'hiver
  10. Tutoriel d'installation Ubuntu 16.04 / Hadoop 3.1.3Configuration autonome / pseudo - distribuée
  11. L'apprentissage le plus détaillé de springboot à l'échelle du réseau - day01
  12. L'apprentissage le plus détaillé de springboot sur le Web - day02
  13. L'apprentissage le plus détaillé de springboot sur le Web - day03
  14. L'apprentissage le plus détaillé de springboot sur le Web - day04
  15. Tutoriel d'utilisation Maven super détaillé
  16. L'apprentissage le plus détaillé de springboot sur le Web - day05
  17. L'apprentissage le plus détaillé de springboot sur le Web - day06
  18. L'apprentissage le plus détaillé de springboot sur le Web - day07
  19. Introduction to JavaScript - write a photo album for your girlfriend
  20. [Hadoop 3. X] HDFS storage type and storage strategy (V) overview
  21. L'apprentissage le plus détaillé de springboot sur le Web - day08
  22. Introduction à la page Web de rabbitmq (3)
  23. No Converter found for return value of type: class java.util.arraylist Error Problem
  24. (16) , spring cloud stream message driven
  25. Que faut - il apprendre de l'architecture des microservices Spring Cloud?
  26. Résolution: erreur: Java: distribution cible invalide: 11problème d'erreur
  27. Springboot démarre en une minute et sort de l'enfer de la configuration SSM!
  28. Maven - un outil de gestion essentiel pour les grands projets d'usine, de l'introduction à la maîtrise![️ Collection recommandée]
  29. ️ Push to interview in Large Factory ᥧ - - Spring Boot Automatic Assembly Principle
  30. [️ springboot Template Engine] - thymeleaf
  31. Springboot - MVC Automatic configuration Principle
  32. Mybatis reverse engineering and the use of new version mybatisplus 3.4 reverse engineering
  33. Base de données MySQL - transactions et index
  34. Sécurité du printemps - [authentification, autorisation, déconnexion et contrôle des droits]
  35. Moteur de base de données InnoDB diffère de myisam
  36. Swagger - [springboot Integrated Swagger, configure Swagger, configure scan Interface, configure API Group]
  37. Cadre de sécurité Shiro - [QUICKstart, login Block, User Authentication, request Authorization]
  38. [Introduction à Java] installation de l'environnement de développement - Introduction à Java et construction de l'environnement
  39. 【 linux】 notes d'utilisation tmux
  40. MySQL + mybatis paging query - database series learning notes
  41. Usage relations and differences of count (1), count (*) and count (a field) in MySQL
  42. 2021 Ali Java advanced interview questions sharing, Java Architect interview materials
  43. Mybatis - dynamic SQL statement - if usage - MySQL series learning notes
  44. [go to Dachang series] deeply understand the use of where 1 = 1 in MySQL
  45. [secret room escape game theme ranking list] Based on spring MVC + Spring + mybatis
  46. Redis log: the killer mace of fearless downtime and rapid recovery
  47. 5 minutes to build redis cluster mode and sentinel mode with docker
  48. Java小白入门200例106之遍历ArrayList的几种方式
  49. Java小白入门200例105之Java ArrayList类
  50. Java小白入门200例104之JDK自带记录日志类logging
  51. Practice of high availability architecture of Tongcheng travel network based on rocketmq
  52. Chapter 9 - Linux learning will - file archiving and compression tar --- zip
  53. Java小白入門200例104之JDK自帶記錄日志類logging
  54. JDK avec journalisation de classe dans 200 cas 104
  55. Java ArrayList Class for Introduction to Java LITTLE WHITE 200 example 105
  56. Plusieurs façons de traverser ArrayList à partir de 200 exemples 106
  57. Provectus / Kafka UI: open source Apache Kafka's Web GUI Graphical interface management tool
  58. Design pattern series: Singleton pattern
  59. Java小白入門200例105之Java ArrayList類
  60. Understanding Java record types