Deep Understanding of Web Protocols (3): HTTP/2

vivo Internet Technology · 2021-02-23 15:13:44

This article covers the HTTP/2 protocol in detail. The topics are:

  • HTTP/2 connection establishment

  • The relationship between frames and streams in HTTP/2

  • HTTP/2's secret to saving traffic: the HPACK algorithm

  • HTTP/2's Server Push capability

  • Why does HTTP/2 need flow control?

  • Problems with the HTTP/2 protocol

1. HTTP/2 Connection Establishment

Contrary to what many people assume, the HTTP/2 protocol itself does not require TLS/SSL; an HTTP/2 connection can also be established over a plain TCP connection. In practice, however, for security reasons all mainstream browsers only support HTTP/2 over TLS/SSL. In short, HTTP/2 over a plain TCP connection is called h2c, while HTTP/2 over TLS/SSL is called h2.

On the server, enter this command to start capturing packets:

tcpdump -i eth0 port 80 and host <target-host> -w h2c.pcap &

Then use curl to access an HTTP/2 site over a plain TCP connection on port 80 (a browser cannot be used here, because browsers do not allow h2c):

curl -v --http2 http://<target-host>/

You can get a rough idea of the connection establishment process just by reading curl's verbose log:


Copy the pcap file produced by tcpdump to a local machine, then open it with Wireshark to reconstruct the whole HTTP/2 connection establishment exchange:

First, the connection is upgraded from HTTP/1.1 to the HTTP/2 protocol:


Then the client also needs to send a "magic frame":


Finally, it needs to send a SETTINGS frame:
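The "magic" is not really a frame at all, but a fixed 24-byte connection preface that every client must send before any frames. A minimal sketch in Go (the constant value is the one fixed by RFC 7540, Section 3.5):

```go
package main

import "fmt"

// clientPreface is the fixed 24-byte "magic" every HTTP/2 client must send
// first on the connection, before any frames (RFC 7540, Section 3.5).
const clientPreface = "PRI * HTTP/2.0\r\n\r\nSM\r\n\r\n"

func main() {
	fmt.Printf("%d bytes: %q\n", len(clientPreface), clientPreface)
}
```

A server that does not see exactly these bytes first treats the connection as broken, which is why Wireshark shows the magic before any SETTINGS frame.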


Next, let's look at how an HTTP/2 connection over TLS is established. Since encryption is involved, some preparation is needed first. Start by installing an HTTP/2 indicator extension in Chrome.


Then open any web page; if the lightning icon is blue, the site supports HTTP/2, otherwise it does not. As shown:


To have Chrome output its TLS/SSL key information to a log file, a system environment variable (SSLKEYLOGFILE) needs to be configured, as shown:


Then configure the corresponding SSL settings in Wireshark as well:


This way, when the browser performs the TLS handshake, the relevant key material is written to this log file, and Wireshark uses the information in it to decrypt our TLS messages.

With the above in place, we can start analyzing an HTTP/2 connection over TLS. For example, visit the Tmall site and then open Wireshark:


From the capture you can see that after the TLS connection is established, the client continues by sending the magic preface and the SETTINGS frame; only then is the HTTP/2 connection truly established. Let's look at the TLS Client Hello message:


The ALPN extension in it lists which protocols the client can accept. The Server Hello message then makes it clear that the h2 protocol will be used:


This is also one of HTTP/2's most important advantages over the SPDY protocol: SPDY depends strongly on TLS/SSL and leaves the server no choice, whereas HTTP/2 carries the ALPN extension when the client makes its request. In other words, the client tells the server which protocols it supports, so the server gets to choose whether it wants to negotiate h2 over TLS/SSL.

2. The Relationship Between Frames and Streams in HTTP/2

Simply put, HTTP/2 simulates the transport layer's TCP "stream" concept at the application layer, which solves the head-of-line blocking problem of the HTTP/1.x protocol. In HTTP/1.x, the protocol is made of messages: on the same TCP connection, a subsequent message cannot be sent until the response to the previous one has come back. HTTP/2 removes this restriction: what used to be a "message" is now defined as a "stream". Streams may be interleaved in any order, but the frames within a single stream must stay in order. As shown:

In other words, on the same TCP connection there can be multiple streams, and each stream consists of one or more frames. There is no ordering between streams, but the frames within each stream are ordered. In this figure, the numbers 1, 3, 5 are actually stream IDs. WebSocket also has the concept of frames, but because WebSocket has no stream ID, it has no multiplexing capability; HTTP/2 has multiplexing precisely because of the stream ID. With n streams on a single TCP connection, the server can process n requests concurrently and return the responses over that same TCP connection. Of course, the number of streams that one TCP connection can carry is limited, and this limit is communicated in the SETTINGS frame when the HTTP/2 connection is established. For example, when visiting the Tmall site below, the SETTINGS frame carried by the browser states that the browser's HTTP/2 client supports at most 1000 concurrent streams.

In the SETTINGS frame the Tmall server returns, it tells the browser that the maximum number of concurrent streams it supports is 128.


Also note that in the HTTP/2 protocol, an odd stream ID identifies a stream initiated by the client, and an even stream ID identifies one initiated by the server (which can be understood as a server push).
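To make the frame and stream-ID layout concrete, here is a small sketch in Go, assuming the RFC 7540 wire format (the function names are mine): every frame starts with a fixed 9-byte header of 24-bit payload length, 8-bit type, 8-bit flags, and a reserved bit plus a 31-bit stream ID, whose parity tells you who initiated the stream.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// encodeFrameHeader builds the fixed 9-byte HTTP/2 frame header:
// 24-bit payload length, 8-bit type, 8-bit flags, then a reserved bit
// plus a 31-bit stream ID.
func encodeFrameHeader(length uint32, typ, flags byte, streamID uint32) [9]byte {
	var h [9]byte
	h[0] = byte(length >> 16)
	h[1] = byte(length >> 8)
	h[2] = byte(length)
	h[3] = typ
	h[4] = flags
	binary.BigEndian.PutUint32(h[5:], streamID&0x7fffffff)
	return h
}

// initiatedByClient reports whether a stream ID was chosen by the client:
// odd IDs are client-initiated, even IDs are server-initiated (pushes).
func initiatedByClient(streamID uint32) bool {
	return streamID%2 == 1
}

func main() {
	// An 8-byte DATA frame (type 0x0) on stream 13, like the one in the capture
	h := encodeFrameHeader(8, 0x0, 0x0, 13)
	fmt.Printf("% x, client-initiated: %v\n", h, initiatedByClient(13))
}
```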

3. HTTP/2's Secret to Saving Traffic: The HPACK Algorithm

Compared with the HTTP/1.x protocol, HTTP/2 also greatly improves traffic consumption, mainly through three mechanisms: the static table, the dynamic table, and Huffman coding. You can install the following tool to measure the traffic savings:

apt-get install nghttp2-client

Then you can test sites that have HTTP/2 enabled. The traffic saved is basically around 25%, and more for frequently visited sites:


In terms of traffic consumption, HTTP/2's biggest improvement over the HTTP/1.x protocol is that HTTP/2 can compress the HTTP headers, whereas in HTTP/1.x, gzip and the like cannot compress the headers at all, and for the vast majority of requests the headers actually account for the largest share.

Let's first look at the static table, as shown:


This is not hard to understand: the HTTP headers we use every day are represented by fixed numbers, which of course saves traffic. Note that some headers have values too varied to enumerate; for those, the static table contains no value entry. For example, cache-control, the cache control field, has far too many possible values to be covered by a static table; only Huffman coding helps there. The figure below shows the traffic-saving effect of the HPACK compression algorithm:


For example, look at entry 62. user-agent identifies the browser, and this header generally does not change across our requests, so after HPACK optimization, later transmissions only need to send the number 62 to convey its meaning.

Another example is the picture below :


It is the same here: when multiple requests are sent in succession, often the only thing that changes is the path, while the rest of the headers stay the same. Based on this scenario, in the end only the path header needs to be transmitted.

Finally, let's look at the core of the HPACK algorithm: Huffman coding. The core idea of Huffman coding is to use shorter codes for higher-frequency symbols and longer codes for lower-frequency ones. (SPDY, HTTP/2's predecessor, used dynamic Huffman coding, while HTTP/2 chose static Huffman coding.)


Let's go through a few examples:


Take this HEADERS frame, and pay attention to the method: GET header. method: GET has index value 2 in the static table. For headers whose key and value are both in the index table, we use one byte, i.e. 8 bits, to encode them: the first bit is fixed at 1, and the remaining 7 bits hold the value's index. Here the index is 2, so the byte is 1000 0010, which converted to hex is 0x82.
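This one-byte "indexed header field" representation is easy to reproduce. A minimal sketch in Go (the helper name is mine), valid only for indexes that fit in 7 bits:

```go
package main

import "fmt"

// encodeIndexedHeader emits HPACK's "indexed header field" representation:
// one byte whose first bit is 1 and whose remaining 7 bits hold the index
// into the header table. Only valid for indexes that fit in 7 bits.
func encodeIndexedHeader(index byte) byte {
	return 0x80 | index
}

func main() {
	// ":method: GET" is index 2 in the static table
	fmt.Printf("0x%02x\n", encodeIndexedHeader(2)) // prints "0x82", matching the capture
}
```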


Now look at another case: a header whose key is in the index table but whose value is not.


For a header whose key is in the index table but whose value is not, the first byte starts with the fixed bits 01, and the remaining 6 bits (111010, which is 58 in decimal) hold the static index. user-agent has index 58, and prepending the two bits 01 gives the binary 01111010, which in hex is 0x7a. Now look at the second byte, 0xd4. Converted to binary it is 1101 0100: the first bit is the Huffman flag, and the remaining 7 bits give how many bytes this header's value takes. Here 101 0100 converted to decimal is 84, meaning the user-agent value needs 84 bytes. Counting the bytes in the figure below, 16 * 5 plus the 4 bytes after d4 on the first row is exactly 84 bytes.
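The two prefix bytes just described can be sketched in Go as follows. This is a simplified illustration, assuming a name index that fits in 6 bits and a value length that fits in 7 bits (both true for the user-agent example above); the helper names are mine:

```go
package main

import "fmt"

// encodeNameIndex builds the first byte of HPACK's "literal header field
// with incremental indexing": the bit pattern 01 followed by a 6-bit
// index into the table naming the header key.
func encodeNameIndex(index byte) byte {
	return 0x40 | index
}

// encodeValueLength builds the length byte that follows: the first bit
// marks whether the value is Huffman-coded, the remaining 7 bits give
// the value's length in bytes.
func encodeValueLength(n byte, huffman bool) byte {
	b := n
	if huffman {
		b |= 0x80
	}
	return b
}

func main() {
	// user-agent is index 58; its Huffman-coded value occupies 84 bytes
	fmt.Printf("0x%02x 0x%02x\n", encodeNameIndex(58), encodeValueLength(84, true)) // prints "0x7a 0xd4"
}
```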

Finally, one more example, where neither the key nor the value is in the index table.


4. HTTP/2's Server Push Capability

As mentioned earlier, the biggest improvement of h2 over the HTTP/1.x protocol is that h2 can transmit n streams simultaneously over a single TCP connection, avoiding HTTP/1.x's head-of-line blocking problem. For most front-end pages, we can go further and use h2's Server Push capability to speed up page loading. Normally, when a browser requests an HTML page, only after the HTML returns and the browser kernel parses it does the browser discover the CSS or JS tags it contains and send the corresponding CSS or JS requests; only once those come back can the browser continue rendering. This process typically leaves the browser showing a blank screen for a while, hurting the user experience. With h2, when the browser requests an HTML page, the server can proactively push the corresponding CSS and JS content to the browser, skipping the step where the browser has to send those requests itself.

Some people misunderstand Server Push, thinking it lets the server send "notifications", and even compare it with WebSocket. That is not the case. Server Push merely saves the browser the step of sending requests: the browser will only use pushed content when "if this resource were not pushed, the browser would request it anyway". Otherwise, if the browser would never request a resource on its own, pushing it only wastes bandwidth. Of course, if the peer talking to the server is a custom client rather than a browser, then the HTTP/2 protocol can naturally serve as a push channel. So even when both sides use HTTP/2, there are functional differences depending on whether the server's peer is a browser or a custom client.


To demonstrate this process, let's write some code. Since browsers require HTTP/2 sites to run over TLS, we first need to generate a certificate and private key.


Then enable HTTP/2 and, on receiving the request for the HTML, proactively push the CSS file the HTML references.

package main

import (
	"fmt"
	"net/http"

	"github.com/labstack/echo/v4" // import path assumes echo v4; the original listing omitted it
)

func main() {
	e := echo.New()
	e.Static("/", "html")
	// Mainly used to verify that the HTTP/2 environment is set up correctly
	e.GET("/request", func(c echo.Context) error {
		req := c.Request()
		format := `
Protocol: %s<br>
Host: %s<br>
Remote Address: %s<br>
Method: %s<br>
Path: %s<br>
`
		return c.HTML(http.StatusOK, fmt.Sprintf(format, req.Proto, req.Host, req.RemoteAddr, req.Method, req.URL.Path))
	})
	// On receiving the request for the HTML, proactively push the CSS file it
	// references, without waiting for the browser to request it
	e.GET("/h2.html", func(c echo.Context) (err error) {
		pusher, ok := c.Response().Writer.(http.Pusher)
		if ok {
			if err = pusher.Push("/app.css", nil); err != nil {
				println("error push")
			}
		}
		return c.File("html/h2.html")
	})
	e.Logger.Fatal(e.StartTLS(":1323", "cert.pem", "key.pem"))
}

Then visit this page in Chrome and look at the Network panel:


You can see that the CSS file was proactively pushed to us. Now let's look at Wireshark.


You can see that stream ID 13 is a client-initiated request, since the ID is odd. In this stream there is also a PUSH_PROMISE frame, which the server sends to the browser. Take a look at its details.


You can see that this frame tells the browser which resource the server is proactively pushing to it, and that this resource's stream ID is 6. In the capture we also see DATA frames being transmitted on stream 6; that is the CSS file the server pushed. With that, a complete Server Push interaction is finished.

In real online applications, however, Server Push is far more complicated than our demo. First, most CDN vendors (unless you build your own CDN) have limited support for Server Push, and we cannot have every resource request go straight to the origin server; most static resources sit in a CDN in front of it. Second, for static resources we also have to consider caching. If the browser sends a static resource request itself, it can decide from its cache state whether it really needs the resource, but Server Push is initiated by the server, and in most cases the server does not know whether the browser's cached copy of the resource has expired. The browser can, after receiving the PUSH_PROMISE frame, check its own cache state and send an RST_STREAM frame to tell the server it already has this resource cached and there is no need to keep sending, but there is no guarantee that by the time the RST_STREAM reaches the server, the pushed DATA frames have not already gone out, so some bandwidth will still be wasted. Overall, though, Server Push is a very effective way to improve the front-end user experience: with Server Push, the browser's idle performance metric typically improves 3-5 times (after all, the browser no longer has to wait until the HTML is parsed before requesting CSS and JS).

5. Why Does HTTP/2 Need Flow Control?

Many people wonder: TCP already implements flow control at the transport layer, so why does our application-layer HTTP/2 do flow control as well? Let's look at a picture.


In the HTTP/2 protocol, because multiplexing is supported, multiple streams are sent over the same TCP connection. In the figure above, each color represents one stream; you can see four streams, each with n frames. This is dangerous: with multiplexing at the application layer, n frames may be sent to the target server in a continuous burst, and when traffic peaks, TCP's congestion control kicks in, blocking all subsequent frames and making the server's responses slow. HTTP/1.x does not have this problem because it does not support multiplexing. And, as mentioned several times before, a request passes through many proxy servers on its way from client to server, and those proxies may differ in memory size and network conditions, so doing flow control at the application layer, to avoid triggering TCP's flow control as much as possible, is necessary. HTTP/2's flow control strategy follows these principles:

  1. Both the client and the server have flow control capability.

  2. The sender and the receiver can set their flow control parameters independently.

  3. Only DATA frames are subject to flow control; HEADERS, PUSH_PROMISE and other frame types are not.

  4. Flow control applies only between the two endpoints of a TCP connection; even if there is a proxy server in the middle, it is not propagated through to the origin server.
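The frame that carries this flow control is WINDOW_UPDATE, which we will see in the capture below. As a sketch of the wire format (assuming the RFC 7540 layout; the function name is mine), it is simply a 9-byte frame header of type 0x8 followed by a 4-byte window-size increment. Stream ID 0 adjusts the window of the whole connection, while a non-zero ID adjusts a single stream:

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// windowUpdateFrame builds a complete WINDOW_UPDATE frame: the fixed
// 9-byte frame header (payload length 4, type 0x8, no flags, stream ID)
// followed by a 4-byte payload holding the 31-bit window-size increment.
func windowUpdateFrame(streamID, increment uint32) []byte {
	f := make([]byte, 13)
	f[2] = 4   // 24-bit payload length = 4
	f[3] = 0x8 // frame type: WINDOW_UPDATE
	// f[4] stays 0: WINDOW_UPDATE defines no flags
	binary.BigEndian.PutUint32(f[5:9], streamID&0x7fffffff)
	binary.BigEndian.PutUint32(f[9:13], increment&0x7fffffff)
	return f
}

func main() {
	// Grant the peer another 65535 bytes on the connection as a whole
	fmt.Printf("% x\n", windowUpdateFrame(0, 65535))
}
```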

Visit Zhihu's site and take a look at the packets.


The frames marked WINDOW_UPDATE are the flow control frames. Open any of them and you can see the window size the flow control frame advertises.

As you may have guessed, since HTTP/2 can do flow control, it can also do prioritization. In the HTTP/1.x protocol, when we visit an HTML page there are JS, CSS, image and other resources, and we send those requests at the same time, but the requests have no concept of priority: which goes out first and which comes back first is unknown (you also cannot know whether the CSS and JS requests are on the same TCP connection; since they are scattered across different TCP connections, which is faster or slower is indeterminate). From a user-experience perspective, though, CSS should arguably get the highest priority, then JS, and finally images; this can greatly reduce the browser's blank-screen time. HTTP/2 provides this capability. For example, visit the Sina site and capture the packets, and you can see:

Take a look at the priority of the CSS frame:

The priority of JS:

And finally the priority of the GIF image; you can see that this priority is the lowest.

The weight field identifies the priority, so the server knows which requests need to be responded to first and which responses can be sent later. This makes the overall experience in the browser better.
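The weight Wireshark shows travels in a PRIORITY frame (or in the priority fields of a HEADERS frame). As a sketch, assuming the RFC 7540 layout (the function name is mine): the 5-byte payload is an exclusive bit plus a 31-bit stream dependency, then one weight byte, which stores weight minus 1 so that the byte range 0-255 represents weights 1-256:

```go
package main

import "fmt"

// priorityPayload builds the 5-byte payload of a PRIORITY frame: an
// exclusive bit plus a 31-bit stream dependency, then one weight byte.
// The weight byte stores weight-1, so 0-255 represents weights 1-256.
func priorityPayload(dependsOn uint32, exclusive bool, weight int) [5]byte {
	var p [5]byte
	dep := dependsOn & 0x7fffffff
	if exclusive {
		dep |= 0x80000000
	}
	p[0] = byte(dep >> 24)
	p[1] = byte(dep >> 16)
	p[2] = byte(dep >> 8)
	p[3] = byte(dep)
	p[4] = byte(weight - 1)
	return p
}

func main() {
	// e.g. a CSS stream given the maximum weight, depending on the connection root
	fmt.Printf("% x\n", priorityPayload(0, false, 256))
}
```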

6. Problems with the HTTP/2 Protocol

HTTP/2 over TCP, or over TCP+TLS, still has plenty of problems. For example, the handshake takes too long: HTTP/2 over TCP needs at least the three-way handshake, and HTTP/2 over TCP+TLS additionally needs the many TLS handshake round trips on top of TCP's (TLS 1.3 can get this down to a single round trip). Each handshake round requires sending a message and waiting for its ack before the next round, so in a weak network, you can imagine how inefficient establishing such a connection is. Besides that, TCP's inherent head-of-line blocking problem has plagued both the HTTP/1.x protocol and the HTTP/2 protocol. Let's look at Google's SPDY promotional diagram to understand the essence of the blocking more precisely:

Figure 1 is easy to understand: with multiplexing support we send out 3 streams at once, they travel to the server via TCP/IP, and TCP then delivers these packets back up to our application layer. Note the condition here: packets must be delivered in the same order they were sent, and in the figure above the order of the blocks is indeed the same. But consider the situation in the lower figure: suppose the first, red packet happens to get lost. Then even though the subsequent packets have already reached the server's machine, the data cannot be delivered to our application-layer protocol right away, because the TCP protocol requires the receive order to match the send order. Since the red packet is missing, the later packets can only sit blocked on the server, waiting until the red packet is retransmitted and arrives, after which all of these packets are handed to the application-layer protocol together.

Beyond the defects mentioned above, TCP has another problem: its implementer is the operating system. No language, be it Java, C, C++, Go or anything else, implements TCP itself; the socket programming interfaces (APIs) they expose are ultimately implemented by the operating system. Getting operating systems to upgrade their own TCP implementations is very, very hard, and having all the devices of the entire Internet upgrade the TCP protocol in unison is simply unrealistic (this is also one reason IPv6 adoption has been so slow). Because of these problems, Google built the QUIC protocol as a layer on top of UDP (there are in fact many UDP-based application-layer protocols that re-implement parts of TCP's functionality at the application layer) to replace the TCP underneath HTTP/1.x and HTTP/2.

Turn on the QUIC protocol switch in Chrome:

Then visit YouTube (in China, Bilibili actually supports it too).


You can see that the QUIC protocol is in use. Why is this option off by default in Chrome? It is easy to understand: this QUIC protocol was made by Google itself and has not yet been officially adopted into the HTTP/3 specification; everything is still at the draft stage, so the option defaults to off. Now, what does the QUIC protocol improve over raw TCP? In essence, it replaces ordered, queued delivery with unordered delivery, so naturally there is no more head-of-line blocking.

In addition, HTTP/3 also offers the ability to keep reusing an existing connection even after the port number or IP address changes. My understanding is that this feature is aimed largely at the Internet of Things, where devices' IPs may change constantly; reusing the previous connection greatly improves transmission efficiency, avoiding the 1-3 RTTs that would otherwise be needed after a disconnect before data could flow again.


Last but not least: in extremely weak network environments, HTTP/2 may actually perform worse than HTTP/1.x, because HTTP/2 uses only a single TCP connection. Under a weak network with a high packet loss rate, TCP-layer timeout retransmissions are triggered constantly, causing a backlog of TCP messages that cannot be delivered to the application layer above. In HTTP/1.x, because multiple TCP connections can be used, the message backlog is to some extent not as severe as with HTTP/2. That, in my view, is the only place where the HTTP/2 protocol is worse than HTTP/1.x; of course, the blame lies with TCP, not with HTTP/2 itself.


Author: vivo Internet - WuYue

This article was created by [vivo Internet Technology]. Please include a link to the original when reprinting. Thanks.
