"Everything is a socket!"

That is a bit of an exaggeration, but not by much: almost all network programming today uses sockets.

-- Reflections from hands-on programming and studying open-source projects.

We all understand the value of exchanging information, but how do processes on a network communicate with each other? For example, when we open a browser to view a web page, how does the browser process communicate with the web server? When you chat on QQ, how does your QQ process communicate with the server, or with the QQ process on your friend's machine? All of this relies on sockets. So what is a socket? What types of socket are there? And what are the basic socket functions? These are the questions this article sets out to answer. The main contents are as follows:

  • 1、 How do processes communicate over a network?

  • 2、 What is a socket?

  • 3、 Basic socket operations

    • 3.1、 The socket() function

    • 3.2、 The bind() function

    • 3.3、 The listen() and connect() functions

    • 3.4、 The accept() function

    • 3.5、 read(), write(), and related functions

    • 3.6、 The close() function

  • 4、 The TCP three-way handshake in sockets: establishing a connection

  • 5、 The TCP four-way handshake in sockets: releasing a connection

  • 6、 An example (hands-on practice)

  • 7、 A question for you; replies welcome!

1、 How do processes communicate over a network?

There are many ways for processes on the same machine to communicate (IPC), but they fall into the following 4 categories:

  • Message passing (pipes, FIFOs, message queues)

  • Synchronization (mutexes, condition variables, read-write locks, file and record locks, semaphores)

  • Shared memory (anonymous and named)

  • Remote procedure calls (Solaris doors and Sun RPC)

But none of these is the subject of this article! What we want to discuss is how processes communicate across a network. The first problem to solve is how to uniquely identify a process; without that, communication is impossible. Locally, a process can be uniquely identified by its PID, but that does not work across a network. Fortunately, the TCP/IP protocol suite has solved this for us: the network-layer "IP address" uniquely identifies a host on the network, and the transport-layer "protocol + port" uniquely identifies an application (process) on that host. The triple (IP address, protocol, port) therefore identifies a process on the network, and processes can use it to communicate with one another.

Applications that use TCP/IP usually communicate through one of two application programming interfaces: UNIX BSD sockets, or UNIX System V TLI (now obsolete). Today almost all applications use sockets, and since this is the network age and communication between processes over the network is everywhere, that is why I say "everything is a socket".

2、 What is a socket?

We already know that processes on a network communicate through sockets, so what is a socket? Sockets originated in Unix, and one of the basic philosophies of Unix/Linux is "everything is a file": everything can be operated on with the pattern "open -> read/write -> close". My understanding is that a socket is one implementation of this pattern: a socket is a special kind of file, and the socket functions are operations on it (read/write I/O, open, close). These functions are described later.
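To make the "special kind of file" idea concrete, here is a minimal sketch in C. It assumes a POSIX system; the helper names round_trip and demo_everything_is_a_file are made up for illustration. The same helper, which only uses the generic write() and read() calls, is run once over a pipe and once over a connected socket pair.

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Write msg to wfd and read it back from rfd using only the generic
 * file calls write() and read(). Returns the bytes read, or -1 on error. */
ssize_t round_trip(int wfd, int rfd, const char *msg, char *out, size_t outlen)
{
    size_t len = strlen(msg);
    if (len >= outlen || write(wfd, msg, len) != (ssize_t)len)
        return -1;
    ssize_t n = read(rfd, out, outlen - 1);
    if (n >= 0)
        out[n] = '\0';
    return n;
}

/* Run the same helper over two kinds of descriptor: a pipe and a
 * socket pair. Returns 0 if both round trips worked, -1 otherwise. */
int demo_everything_is_a_file(void)
{
    const char *pipe_msg = "via pipe";
    const char *sock_msg = "via socket";
    int p[2], s[2];
    char buf[32];

    if (pipe(p) < 0)
        return -1;
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, s) < 0) {
        close(p[0]);
        close(p[1]);
        return -1;
    }
    int ok = round_trip(p[1], p[0], pipe_msg, buf, sizeof buf) == (ssize_t)strlen(pipe_msg)
          && round_trip(s[0], s[1], sock_msg, buf, sizeof buf) == (ssize_t)strlen(sock_msg);

    /* Both kinds of descriptor are released with the same close(). */
    close(p[0]); close(p[1]);
    close(s[0]); close(s[1]);
    return ok ? 0 : -1;
}
```

The helper never needs to know which kind of descriptor it was given, which is the point of the pattern.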

The origin of the word socket

The first use of the term in networking is found in IETF RFC 33, published on February 12, 1970 and written by Stephen Carr, Steve Crocker, and Vint Cerf. According to the Computer History Museum in the United States, Crocker wrote: "The elements of the name space are called sockets. A socket forms one end of a connection, and a connection is fully specified by a pair of sockets." The Computer History Museum adds: "This is about 12 years earlier than the BSD socket interface definition."

3、 Basic socket operations

Since a socket is an implementation of the "open -> write/read -> close" pattern, the socket API provides a function for each of these operations. Taking TCP as an example, here are the basic socket interface functions.

3.1、 The socket() function

int socket(int domain, int type, int protocol);

The socket() function corresponds to the open operation on an ordinary file. Opening an ordinary file returns a file descriptor, and socket() likewise creates a socket descriptor, which uniquely identifies a socket. A socket descriptor is just like a file descriptor: subsequent operations take it as a parameter and use it to perform reads and writes.

Just as you can pass different arguments to fopen to open different files, you can pass different arguments when creating a socket to get different kinds of socket. The three parameters of socket() are:

  • domain: the protocol domain, also known as the protocol family. Common protocol families include AF_INET, AF_INET6, AF_LOCAL (also called AF_UNIX, the Unix domain socket), AF_ROUTE, and so on. The protocol family determines the socket's address type, and communication must use the corresponding kind of address: AF_INET uses the combination of an IPv4 address (32 bits) and a port number (16 bits), while AF_UNIX uses an absolute pathname as the address.
  • type: specifies the socket type. Common socket types include SOCK_STREAM, SOCK_DGRAM, SOCK_RAW, SOCK_PACKET, SOCK_SEQPACKET, and so on.
  • protocol: as the name suggests, this specifies the protocol. Common protocols include IPPROTO_TCP, IPPROTO_UDP, IPPROTO_SCTP, IPPROTO_TIPC, and so on, corresponding to the TCP, UDP, SCTP, and TIPC transport protocols respectively (I will discuss the last of these in a separate article!).

Note: type and protocol cannot be combined arbitrarily; for example, SOCK_STREAM cannot be combined with IPPROTO_UDP. When protocol is 0, the default protocol for the given type is selected automatically.

When we call socket() to create a socket, the returned socket descriptor exists in a protocol family (address family, AF_XXX) space but has no specific address. To assign it an address you must call bind(); otherwise the system automatically assigns a random port when connect() or listen() is called.
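As a rough illustration of the three parameters, the sketch below creates a socket and asks the kernel which type it recorded. The helper name socket_type_of is hypothetical, and the behavior described for the mismatched pair is what I would expect on Linux; other systems may report a different error code but should still refuse the combination.

```c
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

/* Create a socket and report the type the kernel recorded for it.
 * Returns the SO_TYPE value (for example SOCK_STREAM), or -1 if the
 * socket could not be created or queried. */
int socket_type_of(int domain, int type, int protocol)
{
    int fd = socket(domain, type, protocol);
    if (fd < 0)
        return -1;               /* e.g. an unsupported type/protocol pairing */

    int so_type = -1;
    socklen_t len = sizeof so_type;
    if (getsockopt(fd, SOL_SOCKET, SO_TYPE, &so_type, &len) < 0)
        so_type = -1;

    close(fd);                   /* a socket descriptor is closed like any file */
    return so_type;
}
```

Calling socket_type_of(AF_INET, SOCK_STREAM, 0) lets the system pick the default protocol (TCP) for a stream socket, while socket_type_of(AF_INET, SOCK_STREAM, IPPROTO_UDP) should fail because the pairing is invalid.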

3.2、 The bind() function

As mentioned above, bind() assigns a specific address from an address family to a socket. For AF_INET and AF_INET6, for example, it assigns a combination of an IPv4 or IPv6 address and a port number to the socket.

int bind(int sockfd, const struct sockaddr *addr, socklen_t addrlen);

The three parameters of the function are:

  • sockfd: the socket descriptor, created by socket(), which uniquely identifies a socket. bind() binds a name to this descriptor.
  • addr: a const struct sockaddr * pointer to the protocol address to bind to sockfd. The address structure varies with the address family used when the socket was created. For IPv4 it is:
    struct sockaddr_in {
        sa_family_t    sin_family; /* address family: AF_INET */
        in_port_t      sin_port;   /* port in network byte order */
        struct in_addr sin_addr;   /* internet address */
    };
    /* Internet address. */
    struct in_addr {
        uint32_t       s_addr;     /* address in network byte order */
    };
    For IPv6 it is:
    struct sockaddr_in6 { 
        sa_family_t     sin6_family;   /* AF_INET6 */ 
        in_port_t       sin6_port;     /* port number */ 
        uint32_t        sin6_flowinfo; /* IPv6 flow information */ 
        struct in6_addr sin6_addr;     /* IPv6 address */ 
        uint32_t        sin6_scope_id; /* Scope ID (new in 2.4) */ 
    };
    struct in6_addr { 
        unsigned char   s6_addr[16];   /* IPv6 address */ 
    };
    For the Unix domain it is:
    #define UNIX_PATH_MAX    108
    struct sockaddr_un { 
        sa_family_t sun_family;               /* AF_UNIX */ 
        char        sun_path[UNIX_PATH_MAX];  /* pathname */ 
    };
  • addrlen: the length of the address.

A server usually binds a well-known address (such as IP address + port number) when it starts, so that it can provide its service and clients can connect to it. A client does not need to specify one; the system automatically assigns a port number and combines it with the client's own IP address. That is why a server usually calls bind() before listen(), while a client does not call it and instead lets the system pick an address at random when connect() is called.
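A minimal sketch of binding, assuming IPv4 and the loopback address. The helper name bind_loopback is made up for illustration. Passing port 0 asks the kernel to choose a free port, and getsockname() then reveals which one was assigned, which is essentially what happens implicitly for a client that never calls bind().

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Bind a TCP socket to 127.0.0.1:port. Passing port 0 asks the kernel
 * to choose a free port. On success returns the descriptor and stores
 * the port actually bound in *bound_port; returns -1 on error. */
int bind_loopback(unsigned short port, unsigned short *bound_port)
{
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;

    struct sockaddr_in addr;
    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_port = htons(port);                    /* network byte order */
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);  /* 127.0.0.1 */

    socklen_t len = sizeof addr;
    if (bind(fd, (struct sockaddr *)&addr, sizeof addr) < 0 ||
        getsockname(fd, (struct sockaddr *)&addr, &len) < 0) {
        close(fd);
        return -1;
    }
    *bound_port = ntohs(addr.sin_port);             /* back to host order */
    return fd;
}
```

A server would pass its well-known port here instead of 0, so that clients know where to find it.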

Network byte order and host byte order

Host byte order is what we usually call big-endian and little-endian mode: different CPUs have different byte orders, meaning the order in which the bytes of an integer are stored in memory. This is called host order. The standard definitions of big-endian and little-endian are:

a) Little-endian: the low-order byte is stored at the low memory address, and the high-order byte at the high memory address.

b) Big-endian: the high-order byte is stored at the low memory address, and the low-order byte at the high memory address.

Network byte order: a 4-byte, 32-bit value is transmitted in the following order: first bits 0~7, then bits 8~15, then bits 16~23, and finally bits 24~31, where bit 0 is the most significant bit, as in the header diagrams of the RFCs. This transmission order is called big-endian. Because all binary integers in TCP/IP headers must be transmitted over the network in this order, it is also called network byte order. Byte order, as the name suggests, is the order in which the bytes of a value larger than one byte are stored in memory; a single-byte value has no ordering issue.

Therefore: when binding an address to a socket, convert from host byte order to network byte order first, and do not assume that the host byte order is big-endian like the network byte order. This mistake has caused real disasters! It appeared in our company's project code and caused many baffling problems. So make no assumptions about host byte order, and always convert values to network byte order before assigning them to a socket.
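The conversion itself is done with htons()/htonl() and their inverses ntohs()/ntohl(). The sketch below uses two hypothetical helpers: one detects the host's byte order, the other shows the bytes of a port number as they sit in memory after htons(). For port 6666 (0x1A0A) the bytes come out as 0x1A followed by 0x0A on any host.

```c
#include <arpa/inet.h>
#include <stdint.h>
#include <string.h>

/* Returns 1 on a little-endian host, 0 on a big-endian one. */
int host_is_little_endian(void)
{
    uint16_t probe = 0x0102;
    unsigned char bytes[2];
    memcpy(bytes, &probe, sizeof probe);
    return bytes[0] == 0x02;     /* low-order byte stored first? */
}

/* Convert a port to network byte order and copy out its two bytes in
 * the order they are stored in memory (and sent on the wire). */
void port_wire_bytes(uint16_t host_port, unsigned char out[2])
{
    uint16_t net = htons(host_port);
    memcpy(out, &net, sizeof net);
}
```

On a big-endian host htons() does nothing, and on a little-endian host it swaps the bytes; code that always calls it is correct on both.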

3.3、 The listen() and connect() functions

A server, after calling socket() and bind(), calls listen() to listen on the socket. If a client then calls connect() to issue a connection request, the server receives that request.

int listen(int sockfd, int backlog);
int connect(int sockfd, const struct sockaddr *addr, socklen_t addrlen);

The first parameter of listen() is the descriptor of the socket to listen on, and the second is the maximum number of connections that may be queued for that socket. A socket created by socket() is an active socket by default; listen() turns it into a passive socket that waits for client connection requests.

The first parameter of connect() is the client's socket descriptor, the second is the server's socket address, and the third is the length of that socket address. A client calls connect() to establish a connection with a TCP server.
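The difference between an active and a passive socket can be observed directly. In the sketch below (helper names try_connect and listen_demo are made up, and the ECONNREFUSED result is what I would expect on Linux over loopback), a client tries to connect to a server socket that has been bound but is not yet listening, and then again after listen() has been called.

```c
#include <arpa/inet.h>
#include <errno.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Try to connect a fresh TCP client socket to 127.0.0.1:port.
 * Returns 0 on success, otherwise the errno value from the failure. */
int try_connect(unsigned short port)
{
    struct sockaddr_in addr;
    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_port = htons(port);
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);

    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return errno;
    int err = connect(fd, (struct sockaddr *)&addr, sizeof addr) < 0 ? errno : 0;
    close(fd);
    return err;
}

/* Bind a server socket, then probe it before and after listen().
 * Stores the two connect() results; returns 0 if the setup worked. */
int listen_demo(int *before_listen, int *after_listen)
{
    struct sockaddr_in addr;
    socklen_t len = sizeof addr;
    int srv = socket(AF_INET, SOCK_STREAM, 0);
    if (srv < 0)
        return -1;

    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_port = htons(0);                       /* kernel picks a port */
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    if (bind(srv, (struct sockaddr *)&addr, sizeof addr) < 0 ||
        getsockname(srv, (struct sockaddr *)&addr, &len) < 0) {
        close(srv);
        return -1;
    }
    unsigned short port = ntohs(addr.sin_port);

    *before_listen = try_connect(port);   /* not listening yet: refused */
    if (listen(srv, 5) < 0) {             /* backlog of 5 queued connections */
        close(srv);
        return -1;
    }
    *after_listen = try_connect(port);    /* passive now: request is queued */
    close(srv);
    return 0;
}
```

Before listen() the request is refused; afterwards it succeeds and waits in the queue whose length is bounded by the backlog argument.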

3.4、 The accept() function

After a TCP server has called socket(), bind(), and listen() in turn, it listens on the specified socket address. After a TCP client has called socket() and connect() in turn, a connection request is sent to the TCP server. Once the TCP server detects the request, it calls accept() to accept it, and the connection is established. Network I/O can then begin, which is similar to read/write I/O on an ordinary file.

int accept(int sockfd, struct sockaddr *addr, socklen_t *addrlen);

The first parameter of accept() is the server's socket descriptor, the second is a struct sockaddr * pointer used to return the client's protocol address, and the third is a pointer to the length of that protocol address. If accept() succeeds, its return value is a brand-new descriptor generated by the kernel, representing the TCP connection to the returned client.

Note: the first parameter of accept() is the server's socket descriptor, generated when the server first called socket(); it is called the listening socket descriptor. The descriptor returned by accept() is the connected socket descriptor. A server usually creates only one listening socket descriptor, which exists for the lifetime of the server. The kernel creates a connected socket descriptor for each client connection the server process accepts, and when the server finishes serving a client, the corresponding connected socket descriptor is closed.
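Here is a small sketch of the two kinds of descriptor, run inside a single process over loopback for simplicity. The helper names make_loopback_listener and accept_demo are hypothetical. It checks that accept() returns a descriptor different from the listening one, and that the address it reports is the client's own address.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Create a TCP listener on 127.0.0.1 with a kernel-chosen port and
 * store its full address in *addr. Returns the descriptor or -1. */
int make_loopback_listener(struct sockaddr_in *addr)
{
    socklen_t len = sizeof *addr;
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;
    memset(addr, 0, sizeof *addr);
    addr->sin_family = AF_INET;
    addr->sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    if (bind(fd, (struct sockaddr *)addr, sizeof *addr) < 0 ||
        listen(fd, 5) < 0 ||
        getsockname(fd, (struct sockaddr *)addr, &len) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Connect one client and accept it. Returns 1 if the connected
 * descriptor differs from the listening one and accept() reported the
 * client's own address, 0 if not, -1 on error. */
int accept_demo(void)
{
    struct sockaddr_in srv, peer, local;
    socklen_t peer_len = sizeof peer, local_len = sizeof local;
    int result = -1, conn_fd = -1;

    int listen_fd = make_loopback_listener(&srv);
    if (listen_fd < 0)
        return -1;
    int client_fd = socket(AF_INET, SOCK_STREAM, 0);

    if (client_fd >= 0 &&
        connect(client_fd, (struct sockaddr *)&srv, sizeof srv) == 0 &&
        (conn_fd = accept(listen_fd, (struct sockaddr *)&peer, &peer_len)) >= 0 &&
        getsockname(client_fd, (struct sockaddr *)&local, &local_len) == 0) {
        result = conn_fd != listen_fd &&
                 peer.sin_port == local.sin_port &&
                 peer.sin_addr.s_addr == local.sin_addr.s_addr;
    }
    if (conn_fd >= 0)
        close(conn_fd);          /* done with this client */
    if (client_fd >= 0)
        close(client_fd);
    close(listen_fd);            /* a real server keeps this open */
    return result;
}
```

In a real server the listening descriptor stays open for the server's whole lifetime and only the connected descriptors come and go.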

3.5、 read(), write(), and related functions

Everything is now in place: the server and the client have established a connection. Network I/O calls can now be used to read and write, which is how communication between different processes on a network is achieved! Network I/O operations come in the following groups:

  • read()/write()
  • recv()/send()
  • readv()/writev()
  • recvmsg()/sendmsg()
  • recvfrom()/sendto()

I recommend using recvmsg()/sendmsg(): these two are the most general I/O functions, and in fact every other function above can be replaced by them. Their declarations are as follows:

       #include <unistd.h>
       ssize_t read(int fd, void *buf, size_t count);
       ssize_t write(int fd, const void *buf, size_t count);
       #include <sys/types.h>
       #include <sys/socket.h>
       ssize_t send(int sockfd, const void *buf, size_t len, int flags);
       ssize_t recv(int sockfd, void *buf, size_t len, int flags);
       ssize_t sendto(int sockfd, const void *buf, size_t len, int flags,
                      const struct sockaddr *dest_addr, socklen_t addrlen);
       ssize_t recvfrom(int sockfd, void *buf, size_t len, int flags,
                        struct sockaddr *src_addr, socklen_t *addrlen);
       ssize_t sendmsg(int sockfd, const struct msghdr *msg, int flags);
       ssize_t recvmsg(int sockfd, struct msghdr *msg, int flags);
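Since sendmsg()/recvmsg() take a struct msghdr rather than a plain buffer, a short sketch may help. The helper names send_two_parts and recv_text are made up for illustration. The first sends two separate buffers in one call through an iovec array (the job writev() would otherwise do), and the second receives into a single buffer (the job recv() would otherwise do).

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Send a header and a body with a single sendmsg() call.
 * Returns the number of bytes sent, or -1 on error. */
ssize_t send_two_parts(int fd, const char *head, const char *body)
{
    struct iovec iov[2];
    struct msghdr msg;

    iov[0].iov_base = (void *)head;
    iov[0].iov_len = strlen(head);
    iov[1].iov_base = (void *)body;
    iov[1].iov_len = strlen(body);

    memset(&msg, 0, sizeof msg);     /* no address, no control data */
    msg.msg_iov = iov;
    msg.msg_iovlen = 2;
    return sendmsg(fd, &msg, 0);
}

/* Receive into one buffer with recvmsg() and NUL-terminate it.
 * Returns the number of bytes received, 0 at EOF, or -1 on error. */
ssize_t recv_text(int fd, char *buf, size_t buflen)
{
    struct iovec iov = { .iov_base = buf, .iov_len = buflen - 1 };
    struct msghdr msg;

    memset(&msg, 0, sizeof msg);
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;

    ssize_t n = recvmsg(fd, &msg, 0);
    if (n >= 0)
        buf[n] = '\0';
    return n;
}
```

The msg_name and msg_control fields, left zeroed here, are what let the same pair of calls also cover sendto()/recvfrom() and ancillary data.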

read() reads content from fd. On success, read returns the number of bytes actually read; a return value of 0 means end of file has been reached, and a value less than 0 means an error occurred. If the error is EINTR, the read was interrupted by a signal; if it is ECONNRESET, there is a problem with the network connection.

write() writes up to count bytes from buf to the file descriptor fd. On success it returns the number of bytes written; on failure it returns -1 and sets errno. In a network program there are two possibilities when we write to a socket file descriptor: 1) the return value of write is greater than 0, meaning some or all of the data was written; 2) the return value is less than 0, meaning an error occurred, which must be handled according to its type. If the error is EINTR, the write was interrupted by a signal. If it is EPIPE, there is a problem with the network connection (the peer has closed the connection).
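Because write() may write only part of the data and either call may be interrupted, network code usually wraps them in loops. The following is one common shape for such wrappers, offered as a sketch rather than the only way to do it; the names write_all and read_full are my own.

```c
#include <errno.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Keep calling write() until all len bytes are written. Retries on
 * EINTR; returns 0 on success, -1 on any other error (errno is set). */
int write_all(int fd, const void *buf, size_t len)
{
    const char *p = buf;
    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;        /* interrupted by a signal: try again */
            return -1;           /* e.g. EPIPE: the peer closed the connection */
        }
        p += n;                  /* partial write: advance and loop */
        len -= (size_t)n;
    }
    return 0;
}

/* Read until len bytes have arrived or the peer closes. Returns the
 * number of bytes read (less than len means EOF), or -1 on error. */
ssize_t read_full(int fd, void *buf, size_t len)
{
    char *p = buf;
    size_t got = 0;
    while (got < len) {
        ssize_t n = read(fd, p + got, len - got);
        if (n == 0)
            break;               /* end of file: the peer closed */
        if (n < 0) {
            if (errno == EINTR)
                continue;        /* interrupted by a signal: try again */
            return -1;           /* e.g. ECONNRESET */
        }
        got += (size_t)n;
    }
    return (ssize_t)got;
}
```

One caveat: by default, writing to a connection the peer has closed raises SIGPIPE, which terminates the process, so EPIPE is only seen if SIGPIPE is ignored or send() is used with MSG_NOSIGNAL.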

I will not introduce the other I/O function pairs one by one; see the man pages or search Baidu or Google for details. The example below uses send/recv.

3.6、 The close() function

After the server and the client have established a connection and finished their reads and writes, the corresponding socket descriptors should be closed, just as a file opened with fopen is closed with fclose when you are done with it.

#include <unistd.h>
int close(int fd);

The default behavior of close on a TCP socket is to mark the socket as closed and return to the calling process immediately. The descriptor can no longer be used by the calling process; that is, it can no longer be passed as the first argument to read or write.

Note: close only decrements the reference count of the socket descriptor by 1. Only when the reference count reaches 0 is the TCP client triggered to send a termination request to the server.
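The reference count can be seen in action with dup(), which creates a second descriptor for the same socket (fork() has the same effect for a child process). This sketch uses a Unix domain socketpair() as a stand-in for a TCP connection, which is an assumption made to keep it short; the counting behavior of close() is the same. The function name close_refcount_demo is hypothetical.

```c
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Returns 1 if the peer still sees an open connection after one of two
 * descriptors for the same socket is closed, and sees end of file only
 * after the second one is closed; 0 if not; -1 on setup error. */
int close_refcount_demo(void)
{
    int sv[2];
    char c = 0;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
        return -1;
    int copy = dup(sv[0]);       /* second reference to the same socket */
    if (copy < 0) {
        close(sv[0]);
        close(sv[1]);
        return -1;
    }

    close(sv[0]);                /* count 2 -> 1: connection stays up */
    int still_open = write(copy, "x", 1) == 1 &&
                     read(sv[1], &c, 1) == 1 && c == 'x';

    close(copy);                 /* count 1 -> 0: now it is really closed */
    int saw_eof = read(sv[1], &c, 1) == 0;

    close(sv[1]);
    return still_open && saw_eof;
}
```

This is why a forking server must close the connected descriptor in the parent as well as in the child, or the connection never actually terminates.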

4、 The TCP three-way handshake in sockets: establishing a connection

We know that TCP establishes a connection with a "three-way handshake", that is, by exchanging three segments. The general flow is as follows:

  • The client sends a SYN J to the server.
  • The server responds to the client with a SYN K and acknowledges SYN J with ACK J+1.
  • The client then sends the server an acknowledgment, ACK K+1.

There are only three steps, but in which socket functions does this three-way handshake take place? See the figure below:

[image]

Figure 1: The TCP three-way handshake as sent by the socket calls

As the figure shows, when the client calls connect, a connection request is triggered and a SYN J segment is sent to the server; connect then blocks. The listening server receives the SYN J segment and its TCP replies to the client with SYN K, ACK J+1, while the server's call to accept blocks. When the client receives the server's SYN K, ACK J+1, connect returns and the client acknowledges SYN K. When the server receives ACK K+1, accept returns; the three-way handshake is now complete and the connection is established.

Summary: the client's connect returns at the second step of the three-way handshake, and the server's accept returns at the third step.
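One way to convince yourself that the handshake is carried out by the TCP implementation rather than by accept() itself is the sketch below. It assumes a single process talking to itself over loopback, and the function name handshake_order_demo is made up. The client's blocking connect() is allowed to finish before the server has called accept() at all, and accept() then simply picks up the already-established connection from the listen queue.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Returns 1 if connect() succeeded before accept() was called and
 * accept() then returned the queued connection, 0 otherwise. */
int handshake_order_demo(void)
{
    struct sockaddr_in addr;
    socklen_t len = sizeof addr;
    int connected = 0, conn = -1;

    int srv = socket(AF_INET, SOCK_STREAM, 0);
    int cli = socket(AF_INET, SOCK_STREAM, 0);

    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);  /* port 0: kernel picks */

    if (srv >= 0 && cli >= 0 &&
        bind(srv, (struct sockaddr *)&addr, sizeof addr) == 0 &&
        listen(srv, 5) == 0 &&
        getsockname(srv, (struct sockaddr *)&addr, &len) == 0) {
        /* accept() has not been called yet, but the listening socket's
         * TCP still answers the SYN, so this blocking connect() returns. */
        connected = connect(cli, (struct sockaddr *)&addr, sizeof addr) == 0;

        /* The finished connection is waiting in the listen queue. */
        if (connected)
            conn = accept(srv, NULL, NULL);
    }

    int ok = connected && conn >= 0;
    if (conn >= 0)
        close(conn);
    if (cli >= 0)
        close(cli);
    if (srv >= 0)
        close(srv);
    return ok;
}
```

When the server is already blocked in accept(), as in the figure, the same handshake happens underneath and accept() wakes up once the third segment has arrived.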

5、 The TCP four-way handshake in sockets: releasing a connection

The previous section covered how the three-way handshake establishes a TCP connection in sockets and which socket functions are involved. Now let's look at how the four-way handshake releases a connection. See the figure below:

[image]

Figure 2: The TCP four-way handshake as sent by the socket calls

The process is as follows:

  • An application process first calls close to actively close the connection, at which point its TCP sends a FIN M;

  • When the other end receives FIN M, it performs a passive close and acknowledges the FIN. Receipt of the FIN is also passed to the application process as an end-of-file, because receiving a FIN means the application process will receive no further data on that connection;

  • Some time later, the application process that received the end-of-file calls close to close its socket, which causes its TCP to send a FIN N as well;

  • The TCP of the original sender, on receiving this FIN, acknowledges it.

So there is one FIN and one ACK in each direction.
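From the application's point of view, the only visible part of this exchange is the end-of-file on the passive side. A minimal sketch, again assuming one process talking to itself over loopback, with the made-up function name fin_as_eof_demo: the client side closes first, and the server side's recv() then returns 0.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Returns what recv() reports on the passive side after the active
 * side has closed: 0 means end of file. Returns -2 if setup failed. */
int fin_as_eof_demo(void)
{
    struct sockaddr_in addr;
    socklen_t len = sizeof addr;
    char byte;
    int conn = -1;
    ssize_t n = -2;

    int srv = socket(AF_INET, SOCK_STREAM, 0);
    int cli = socket(AF_INET, SOCK_STREAM, 0);

    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);  /* port 0: kernel picks */

    if (srv >= 0 && cli >= 0 &&
        bind(srv, (struct sockaddr *)&addr, sizeof addr) == 0 &&
        listen(srv, 5) == 0 &&
        getsockname(srv, (struct sockaddr *)&addr, &len) == 0 &&
        connect(cli, (struct sockaddr *)&addr, sizeof addr) == 0 &&
        (conn = accept(srv, NULL, NULL)) >= 0) {
        close(cli);                      /* active close: its TCP sends FIN */
        cli = -1;
        n = recv(conn, &byte, 1, 0);     /* passive side: FIN shows up as 0 */
    }

    if (conn >= 0)
        close(conn);                     /* passive side closes in turn */
    if (cli >= 0)
        close(cli);
    if (srv >= 0)
        close(srv);
    return (int)n;
}
```

The acknowledgments in both directions are generated by the TCP implementation and never appear as return values.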

6、 An example (hands-on practice)

Enough talk; time for some practice. Let's write a simple server and client (using TCP): the server listens on local port 6666 continuously, and when it receives a connection request it accepts it and receives the message sent by the client; the client establishes a connection with the server and sends one message.

Server-side code :

Server side

Client code :

client
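The two listings are not reproduced here, so what follows is only a minimal sketch of what such a server and client might look like, not the original code. The function names make_listener, serve_one, and send_message are my own, and error handling is kept to the bare minimum. It uses send()/recv() as mentioned earlier.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Server: socket() -> bind() -> listen() on the given local port. */
int make_listener(unsigned short port)
{
    struct sockaddr_in addr;
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;
    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);   /* any local interface */
    addr.sin_port = htons(port);
    if (bind(fd, (struct sockaddr *)&addr, sizeof addr) < 0 ||
        listen(fd, 10) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Server: accept one client, receive one message into buf, close.
 * Returns the number of bytes received, or -1 on error. */
ssize_t serve_one(int listen_fd, char *buf, size_t buflen)
{
    int conn = accept(listen_fd, NULL, NULL);
    if (conn < 0)
        return -1;
    ssize_t n = recv(conn, buf, buflen - 1, 0);
    if (n >= 0)
        buf[n] = '\0';
    close(conn);
    return n;
}

/* Client: socket() -> connect() -> send() -> close().
 * Returns 0 if the whole message was handed to TCP, -1 otherwise. */
int send_message(const char *ip, unsigned short port, const char *msg)
{
    struct sockaddr_in addr;
    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_port = htons(port);
    if (inet_pton(AF_INET, ip, &addr.sin_addr) != 1)
        return -1;                              /* not a valid IPv4 address */

    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;
    int ok = connect(fd, (struct sockaddr *)&addr, sizeof addr) == 0 &&
             send(fd, msg, strlen(msg), 0) == (ssize_t)strlen(msg);
    close(fd);
    return ok ? 0 : -1;
}
```

A server program would call make_listener(6666) once and then call serve_one() in an endless loop, printing each message; a client program would call send_message("127.0.0.1", 6666, ...) with the text to send.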

Of course, the code above is very simple and has many shortcomings; it merely demonstrates the basic socket functions. In fact, no matter how complex a network program is, it uses these same basic functions. The server above works in iterative mode: it can only handle the next client request after the current one has been fully processed. A server like this has very weak processing capacity, and real servers need to handle requests concurrently! To do so, the server needs to fork() a new process, or create a thread, to handle each request.

7、 Try it yourself

Here is a question for you; replies are welcome! Are you familiar with Linux network programming? If so, write a program that does the following:

Server side:

Receive messages from the client at address 192.168.100.2; if a message is "Client Query", print "Receive Query".

Client:

Send the messages "Client Query test", "Client Query", and "Client Query Quit" in sequence to the server at address 192.168.100.168, then exit.

The IP addresses in the exercise can be adjusted to suit your actual environment.

-- This article only covers basic socket programming.

More complex topics require digging deeper.

(Unix domain socket) Sending a message of >= 128K over UDP reports an ENOBUFS error (a problem encountered in real socket programming; I hope it helps you)

 

Author: Saylor
Source: http://www.cnblogs.com/skynet/