TCP sticky packets and packet splitting in Swoole

Keywords: PHP

The two examples below show the problems caused by the fact that TCP transmits a byte stream without message boundaries. They lead to the two concepts discussed in this article: sticky packets (several messages read as one) and packet splitting (one message read in several pieces).

The examples use Swoole's client and server.

Example 1: the sender sends multiple pieces of data and the receiver reads them in a single read

//Sender
$client = new swoole_client(SWOOLE_SOCK_TCP);
$client->connect('127.0.0.1', 6001, -1);
for ($i=0; $i < 11; $i++) {
    $client->send("hello!"); // Send a small piece of data, many times
}
$client->close();

The client has sent "hello!" 11 times. What does the server receive?

//Receiver
$serv = new swoole_server("127.0.0.1", 6001);
$serv->on('receive', function ($serv, $fd, $from_id, $data) {
    var_dump($data);
});
$serv->start();

Printed result:

Contrary to expectation, the server's receive callback did not fire 11 times; the 11 "hello!" strings arrived stuck together!

Example 2: the sender sends a large amount of data, and the receiver reads it several times

//Sender
$client = new swoole_client(SWOOLE_SOCK_TCP);
$client->connect('127.0.0.1', 6001, -1);
$client->send(str_repeat('a',32*1024));//Send a large piece of data
$client->close();

The client above sends 32 KB of data in a single call. How does the server receive it?

//Receiver
$serv = new swoole_server("127.0.0.1", 6001);
$serv->on('receive', function ($serv, $fd, $from_id, $data) {
    var_dump(strlen($data));
});
$serv->start();

Printed result:

The server did not receive the data in one read; it took five reads!

TCP

//Create a TCP socket
$socket = socket_create(AF_INET,SOCK_STREAM,SOL_TCP);

The transmission type SOCK_STREAM denotes a connection-oriented (stream) socket, often likened to a "conveyor belt". TCP is built on this kind of stream socket.

Features:

  • Reliable
  • Sequential transmission
  • No data boundary

The conveyor-belt figure referenced here is from the book "TCP/IP Network Programming".

On the left, data packets are placed on the conveyor belt one after another. On the right, the receiver has a buffer to improve efficiency; it may be read once after it fills up, or several times before it is full.

In other words, the sender may send several pieces of data and the receiver may read them all at once, or the sender may send one large piece of data and the receiver may read it in several parts. With connection-oriented sockets, the number of write calls and the number of read calls need not match.
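
The missing boundaries can be seen without Swoole at all. The following is a minimal sketch, not part of the original examples: it uses a connected pair of local stream sockets as a stand-in for a TCP connection, and assumes a Unix-like system where STREAM_PF_UNIX is available. The sender writes three times, yet the receiver sees one undivided run of bytes.

```php
<?php
// Assumption: a local stream socket pair behaves like a TCP connection
// as far as (missing) message boundaries are concerned.
$pair = stream_socket_pair(STREAM_PF_UNIX, STREAM_SOCK_STREAM, STREAM_IPPROTO_IP);
if ($pair === false) {
    exit("could not create socket pair\n");
}
[$sender, $receiver] = $pair;

// Three separate writes on the sending side...
for ($i = 0; $i < 3; $i++) {
    fwrite($sender, "hello!");
}
fclose($sender); // closing lets the receiver read until end of stream

// ...arrive as a single byte stream on the receiving side.
$received = stream_get_contents($receiver);
fclose($receiver);

var_dump($received); // string(18) "hello!hello!hello!"
```

Nothing in the received bytes records where one write ended and the next began, so the application has to define the boundaries itself.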

Handling the lack of data boundaries in the examples above

The Swoole documentation gives two solutions. With either one, Swoole's underlying layer splices the data so that each callback receives one complete packet ($data).

Processing method 1: EOF (end marker)

A specific separator marks the end of each complete message; that is, the separator defines the data boundary.

//Receiver
$serv = new swoole_server("127.0.0.1", 6001);
$serv->set([
    'open_eof_split'=>true,
    'package_eof'=>"\r\n\r\n" 
]);
$serv->on('receive', function ($serv, $fd, $from_id, $data) {
    var_dump(strlen($data));
});
$serv->start();

Once EOF splitting is enabled, the sender must append the agreed terminator to each message; otherwise the receiver's callback never receives the data. Examples 1 and 2 are repeated below, with the sender appending the separator to the data.

Example 1(eof)

//Send multiple pieces of data
for ($i=0; $i < 11; $i++) {
    $client->send("hello!"."\r\n\r\n"); // By convention, each message ends with \r\n\r\n
}

Example 2(eof)

//Send a large piece of data
$client->send(str_repeat('a',32*1024)."\r\n\r\n");

With the terminator in place, the problems from Example 1 and Example 2 at the beginning of the article are solved: each callback receives one complete message.
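
To make the idea concrete, here is a plain-PHP sketch of separator-based splitting. It is not Swoole's implementation; the helper name extractEofMessages and the chunk boundaries are made up for illustration. The sketch also drops the terminator from each message, which is a choice made here and says nothing about what Swoole passes to the callback.

```php
<?php
/**
 * Hypothetical helper: pulls every complete message out of $buffer.
 * Bytes after the last terminator stay in $buffer until more data arrives.
 */
function extractEofMessages(string &$buffer, string $eof): array
{
    $messages = [];
    while (($pos = strpos($buffer, $eof)) !== false) {
        $messages[] = substr($buffer, 0, $pos);
        $buffer = substr($buffer, $pos + strlen($eof));
    }
    return $messages;
}

$eof = "\r\n\r\n";
$buffer = '';
$messages = [];

// Chunks as the network might deliver them: a message cut in half,
// a terminator cut in half, and two messages glued together.
$chunks = ["hel", "lo!\r\n", "\r\nhello!\r\n\r\nhello!\r\n\r\n"];
foreach ($chunks as $chunk) {
    $buffer .= $chunk;
    $messages = array_merge($messages, extractEofMessages($buffer, $eof));
}

var_dump($messages); // three elements, each "hello!"
```

However the bytes are chunked on the way in, the receiver ends up with the same three messages.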

Processing method 2: fixed header + body

The EOF method requires that the data never contain the EOF sequence; otherwise messages are cut in the wrong place, and real-world data cannot guarantee this. In addition, finding the EOF sequence means scanning through the data, which has a performance cost. For these reasons, the fixed header + body method is more commonly used.
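
The first drawback is easy to reproduce. In this small sketch (the message content is invented for illustration), a single message whose body happens to contain the terminator is split into two fragments:

```php
<?php
$eof = "\r\n\r\n";

// One logical message whose body contains the terminator sequence.
$body = "header: value\r\n\r\npayload";
$wire = $body . $eof;

// Splitting on the terminator yields two fragments instead of one message.
$parts = explode($eof, $wire);
array_pop($parts); // drop the empty string after the final terminator

var_dump(count($parts)); // int(2)
var_dump($parts[0]);     // string(13) "header: value"
var_dump($parts[1]);     // string(7) "payload"
```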

Principle:

Put a few bytes in front of the data that store the length of the data; the receiver then cuts out the data according to that length.
For example, with the packet body data = 'aaaaa', a 2-byte packet header stores the data length 5.

On receiving the packet, the receiver first parses the binary header and obtains 5, the length of the body. The whole packet is 7 bytes long, and the 5 bytes of data start at offset 2, because the fixed header occupies the first 2 bytes. substr($package, 2, 5) yields 'aaaaa'.
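
The 2-byte example can be worked through with PHP's pack() and unpack(). This is a sketch of the principle only; the variable names are illustrative.

```php
<?php
$body = 'aaaaa';

// 'n' = unsigned 16-bit integer, big-endian: a 2-byte header.
$header  = pack('n', strlen($body));
$package = $header . $body;

var_dump(strlen($package)); // int(7)
var_dump(bin2hex($header)); // string(4) "0005"

// Receiver side: read the length from the first 2 bytes, then cut out the body.
$length   = unpack('n', substr($package, 0, 2))[1];
$received = substr($package, 2, $length);

var_dump($length);   // int(5)
var_dump($received); // string(5) "aaaaa"
```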

The next question is how to define the header:

  1. The header length must be fixed (that is, how many bytes are used to store the data length), so that the receiver knows at which offset the data begins. The data length itself varies, so the header length must not.

  2. The header should be as small as possible so that it does not waste resources, which makes a binary encoding a good fit.

  3. Different machines may store and parse multi-byte values in a different order (host byte order). For example, the sender may store the integer 1 as 00000000 00000000 00000000 00000001, while a receiver with the opposite host byte order stores it as 00000001 00000000 00000000 00000000. As a loose analogy, suppose the sender transmits 1234 starting with the high-order digit 1 (thousands). A receiver that stores the low-order digit first puts that 1 in the ones place and ends up with 4321. "Network byte order" avoids this by fixing the order in which data is sent: most significant byte first (big-endian). The receiver converts data in this order to its own host byte order with a standard function. In the analogy, the agreed order is 1 (thousands), 2 (hundreds), 3 (tens), 4 (ones), so the receiver knows that the first digit is the high-order one.
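
The byte-order point in item 3 can be checked directly with pack(). In this sketch, 'N' produces network byte order (big-endian) and 'V' produces little-endian; reading network-order bytes with the wrong format gives a different number.

```php
<?php
$value = 1;

$networkOrder = pack('N', $value); // 32-bit unsigned, big-endian (network byte order)
$littleEndian = pack('V', $value); // 32-bit unsigned, little-endian

var_dump(bin2hex($networkOrder)); // string(8) "00000001"
var_dump(bin2hex($littleEndian)); // string(8) "01000000"

// Interpreting network-order bytes as little-endian yields the wrong value.
$misread = unpack('V', $networkOrder)[1];
var_dump($misread); // int(16777216)
```

As long as both sides use the same explicit format code, the result does not depend on the byte order of either host.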

In the Swoole documentation, Server > Configuration Options > package_length_type lists the available header types.

Choose an unsigned type in network byte order, i.e. n or N. N is wider (32 bits instead of 16), so it can represent larger lengths.
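
The difference between the two types shows up as soon as a body is longer than 65535 bytes. A short sketch, assuming 64-bit PHP; the length 70000 is an arbitrary example value:

```php
<?php
var_dump(strlen(pack('n', 1))); // int(2)
var_dump(strlen(pack('N', 1))); // int(4)

// A length of 70000 does not fit into 16 bits: pack('n') keeps only the low 16 bits.
$truncated = unpack('n', pack('n', 70000))[1];
$intact    = unpack('N', pack('N', 70000))[1];

var_dump($truncated); // int(4464)
var_dump($intact);    // int(70000)
```

With n, the receiver would read a length of 4464 and cut the packet in the wrong place, which is why N is the safer choice for packets such as the 32 KB one used in this article's settings.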

//Receiver
$serv = new swoole_server("127.0.0.1", 6001);
$serv->set([
    'open_length_check' => true, // Enable packet length checking
    'package_max_length' => 32*1024, // Maximum packet length; a larger value uses more memory
    'package_length_type' => 'N', // Type of the length field in the header
    'package_length_offset' => 0,  // Offset of the length field within the packet (here the very first byte)
    'package_body_offset' => 4, // Offset from which the length value starts counting (N occupies 4 bytes, so the length covers only the body)
]);
$serv->on('receive', function ($serv, $fd, $from_id, $data) {
   $len = unpack('N',$data);
   $body = substr($data,4,$len[1]);
   var_dump($body);
});
$serv->start();

//Sender
$client = new swoole_client(SWOOLE_SOCK_TCP);
$client->connect('127.0.0.1', 6001, -1);

$body = 'aaaaa';
$head = pack('N',strlen($body)); // 4-byte length header in network byte order
$pack = $head.$body; 

for ($i=0; $i < 6 ; $i++) {
    $client->send($pack);
}
$client->close();

The sender sends 'aaaaa' six times.
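
What length checking has to accomplish can be sketched in plain PHP. This is not Swoole's code: the helper extractLengthPrefixed, the constant HEADER_SIZE, and the 7-byte chunk size are all invented for illustration. The six packets from the sender are fed in as arbitrary chunks, and the parser still recovers six complete bodies.

```php
<?php
const HEADER_SIZE = 4; // 'N' occupies 4 bytes

/** Hypothetical helper: removes and returns every complete packet body in $buffer. */
function extractLengthPrefixed(string &$buffer): array
{
    $bodies = [];
    while (strlen($buffer) >= HEADER_SIZE) {
        $length = unpack('N', $buffer)[1]; // reads the first 4 bytes
        if (strlen($buffer) < HEADER_SIZE + $length) {
            break; // the body has not fully arrived yet
        }
        $bodies[] = substr($buffer, HEADER_SIZE, $length);
        $buffer = substr($buffer, HEADER_SIZE + $length);
    }
    return $bodies;
}

// Six packets, as the sender above produces them, joined into one byte stream.
$stream = str_repeat(pack('N', 5) . 'aaaaa', 6);

// Deliver the stream in 7-byte chunks that ignore packet boundaries.
$buffer = '';
$bodies = [];
foreach (str_split($stream, 7) as $chunk) {
    $buffer .= $chunk;
    $bodies = array_merge($bodies, extractLengthPrefixed($buffer));
}

var_dump(count($bodies)); // int(6)
var_dump($bodies[0]);     // string(5) "aaaaa"
```

Unlike the EOF approach, the parser never inspects the body, so the body may contain any bytes.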

Finally, all examples in this article have the client sending to the server. Data sent by the server to the client must be split into packets in the same way, and the code is the same.

<?php //client
$client = new swoole_client(SWOOLE_SOCK_TCP);

//Note: for the synchronous client, options must be set before connect()!
$client->set(array(
    'open_length_check'     => 1,
    'package_length_type'   => 'N',
    'package_length_offset' => 0,  
    'package_body_offset'   => 4,   
    'package_max_length'    => 100*1024, 
));

$client->connect('127.0.0.1', 6001, -1);

$data = $client->recv();
if($data){
    $len = unpack('N',$data);
    $body = substr($data,4,$len[1]);
    var_dump(strlen($body));
}

$client->close();
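
For the client above to receive complete packets, the server has to prefix what it sends with the same kind of header. A minimal sketch of that step, where frame is a hypothetical helper name and the 32 KB body is just an example:

```php
<?php
/**
 * Hypothetical helper: prefixes $body with a 4-byte network-order length,
 * matching package_length_type 'N', package_length_offset 0, package_body_offset 4.
 */
function frame(string $body): string
{
    return pack('N', strlen($body)) . $body;
}

$packet = frame(str_repeat('a', 32 * 1024));
var_dump(strlen($packet)); // int(32772)

// Inside a Swoole server callback, the framed packet would then be sent with, for example:
// $serv->send($fd, $packet);
```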

Posted by kdoggfunkstah on Sun, 10 Oct 2021 05:08:12 -0700