Implementing a WebSocket server in C#: (02) message frame parsing and code implementation

Keywords: C# http websocket

The previous article introduced the WebSocket handshake: Implementing a WebSocket server in C#: (01) handshake
Once the handshake is complete, both the client and the server can send and receive messages.
WebSocket messages are sent and received as frames.

0. WebSocket frame

Frame type (OpCode)

There are six common frame types:

Value | Type | Description
--- | --- | ---
0x00 | Continuation | Continuation frame. When a frame is not the end frame, the frames that follow it are marked as Continuation. The application keeps reading the next frame until it has read the end frame.
0x01 | Text | Data frame (text): the Payload of the frame is UTF-8 encoded data.
0x02 | Binary | Data frame (binary): the Payload of the frame is binary data.
0x08 | Close | Close frame. A receiver that gets a Close frame normally replies to the sender with a Close frame.
0x09 | Ping | Ping frame. Checks whether the other side can still send and receive data (the RFC's wording is whether it is "responsive").
0x0a | Pong | Pong frame. The side that receives a Ping frame replies to the sender with a Pong frame to confirm that it is "responsive".
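The table above maps naturally onto an enum. The following is only a sketch: WsOpCode and WsOpCodeInfo are hypothetical names chosen for illustration, not the OpCode type of the Frame class linked later in this article. It relies on the fact that the three control frames (Close, Ping, Pong) are the ones whose 4-bit frame type has its highest bit set.

```csharp
using System;

// Hypothetical names for illustration; the linked Frame class defines its own OpCode type.
public enum WsOpCode : byte
{
    Continuation = 0x0,
    Text = 0x1,
    Binary = 0x2,
    Close = 0x8,
    Ping = 0x9,
    Pong = 0xA
}

public static class WsOpCodeInfo
{
    // Control frames (Close, Ping, Pong) have the highest bit of the 4-bit frame type set.
    public static bool IsControl(WsOpCode op) => ((byte)op & 0x8) != 0;

    // Only the six values from the table are defined here; other values are treated as unknown.
    public static bool IsKnown(byte value) => Enum.IsDefined(typeof(WsOpCode), value);
}
```

A parser could call IsKnown on the low 4 bits of the first byte before casting them to the enum.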

Data format of frame

Explanations of data formats are usually rather dry. Fortunately, the format of a WebSocket frame is relatively simple. The following explains it according to my understanding.
First, a frame can be divided into two parts: metadata and Payload. The Payload follows the metadata, and the metadata determines how the Payload is read.

Metadata

1. The first byte, expanded in bits (bit 0 is the most significant bit):

Bit | Description
--- | ---
0 | Whether the current frame is the end frame: 1 = end frame, 0 = non-end frame
1 | Reserved
2 | Reserved
3 | Reserved
4-7 | Frame type, one of the types listed above

When a frame is not the end frame, the frames that follow it have the type Continuation (0x00). The application must check whether each frame is the end frame. If it is not, the application keeps reading the next frame until it reaches the end frame. Concatenating the Payloads of all the frames read gives the complete Payload data.
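The reassembly rule just described can be sketched as follows. This is a simplified illustration, not the article's implementation: Fragment and Reassemble are hypothetical names, the frames are assumed to be already parsed and unmasked, and control frames arriving between the fragments are not handled.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class FragmentSketch
{
    // A parsed frame reduced to the three fields needed here (hypothetical shape).
    public readonly struct Fragment
    {
        public Fragment(bool fin, byte opCode, byte[] payload)
        {
            Fin = fin;
            OpCode = opCode;
            Payload = payload;
        }
        public bool Fin { get; }
        public byte OpCode { get; }
        public byte[] Payload { get; }
    }

    // Concatenates Payloads from the first frame up to and including the first end frame.
    public static byte[] Reassemble(IEnumerable<Fragment> frames)
    {
        using var message = new MemoryStream();
        bool first = true;
        foreach (Fragment f in frames)
        {
            // The first frame carries the real type; every later frame must be Continuation (0x0).
            if (first && f.OpCode == 0x0)
                throw new InvalidDataException("A message cannot start with a Continuation frame.");
            if (!first && f.OpCode != 0x0)
                throw new InvalidDataException("Expected a Continuation frame.");
            first = false;

            message.Write(f.Payload, 0, f.Payload.Length);
            if (f.Fin) return message.ToArray();
        }
        throw new EndOfStreamException("No end frame (FIN = 1) was found.");
    }
}
```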

2. The second byte, expanded in bits:

Bit | Description
--- | ---
0 | Whether the Payload data of the frame is masked: 1 = masked, 0 = not masked
1-7 | Payload length identifier

3. About the Payload length identifier
If the identifier is less than 126, the identifier itself is the Payload length.
If the identifier equals 126, the Payload length is the unsigned integer held in the next 2 bytes.
If the identifier equals 127, the Payload length is the unsigned long integer held in the next 8 bytes.
Note that there is no 4-byte form: the extra length field is either 2 or 8 bytes.
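The three cases can be sketched as a small decoding function. PayloadLengthSketch and Decode are hypothetical names used only for this illustration. The bytes are combined most significant byte first, and the sketch also rejects an 8-byte length whose top bit is set, which RFC 6455 requires to be 0.

```csharp
using System;
using System.IO;

public static class PayloadLengthSketch
{
    // marker:   the low 7 bits of the second byte (the Payload length identifier)
    // extended: the bytes that follow the second byte (2 or 8 of them are used when needed)
    public static long Decode(int marker, byte[] extended)
    {
        if (marker < 0 || marker > 127)
            throw new ArgumentOutOfRangeException(nameof(marker));

        // Identifier below 126: the identifier is the length.
        if (marker < 126) return marker;

        int count = marker == 126 ? 2 : 8;
        if (extended == null || extended.Length < count)
            throw new ArgumentException("Not enough length bytes.", nameof(extended));

        // Combine the bytes in network byte order (most significant byte first).
        long length = 0;
        for (int i = 0; i < count; i++)
            length = (length << 8) | extended[i];

        // A set top bit in the 8-byte form shows up as a negative long.
        if (length < 0)
            throw new InvalidDataException("The most significant bit of the length must be 0.");
        return length;
    }
}
```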

4. About masks
If the frame is masked, the 4 bytes that follow the length information are the mask value.
If the frame is not masked, the Payload follows the length information directly.

5. A few simple examples of the Payload length identifier and the mask.

End frame | Frame type | Masked | Payload length | Metadata bytes (0xXX represents random bytes) | Explanation
--- | --- | --- | --- | --- | ---
yes | text | no | 10 | 0x81 0x0a | The simplest case: two bytes describe the metadata completely
yes | text | yes | 10 | 0x81 0x8a 0xXX 0xXX 0xXX 0xXX | When the frame is masked, the mask occupies the last 4 bytes of the metadata
yes | text | no | 1000 | 0x81 0x7e 0x03 0xe8 | When the length exceeds 125 and is less than 65536, 2 extra bytes are needed for the Payload length
yes | text | yes | 1000 | 0x81 0xfe 0x03 0xe8 0xXX 0xXX 0xXX 0xXX | Masked: the previous form plus the 4 mask bytes
yes | text | no | 100000 | 0x81 0x7f 0x00 0x00 0x00 0x00 0x00 0x01 0x86 0xa0 | When the length exceeds 65535, 8 extra bytes are needed for the Payload length
yes | text | yes | 100000 | 0x81 0xff 0x00 0x00 0x00 0x00 0x00 0x01 0x86 0xa0 0xXX 0xXX 0xXX 0xXX | Masked: the previous form plus the 4 mask bytes

Looking only at the bit layout, a Payload length of 10 could also be written in the 126 or 127 form.
Similarly, a Payload length of 1000 could also be written in the 127 form.
RFC 6455 does not allow this, however: the length must always be encoded in the fewest bytes possible. A length greater than 65535 can only be encoded in the 127 form.
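To check the example rows above, the metadata can be generated in code. This is a sketch under my own naming (FrameHeaderSketch.Build is hypothetical and is not the CreateMetaBytes method of the linked Frame class). It always picks the shortest length form.

```csharp
using System;
using System.Collections.Generic;

public static class FrameHeaderSketch
{
    // Builds the metadata bytes of one frame. Pass a 4-byte maskKey for a masked frame, or null.
    public static byte[] Build(bool fin, byte opCode, long payloadLength, byte[] maskKey = null)
    {
        if (payloadLength < 0)
            throw new ArgumentOutOfRangeException(nameof(payloadLength));
        if (maskKey != null && maskKey.Length != 4)
            throw new ArgumentException("The mask must be 4 bytes.", nameof(maskKey));

        var header = new List<byte>(14);

        // First byte: end-frame bit, three reserved bits left at 0, then the frame type.
        header.Add((byte)((fin ? 0x80 : 0x00) | (opCode & 0x0F)));

        // Second byte: mask bit plus the Payload length identifier.
        int maskBit = maskKey != null ? 0x80 : 0x00;
        if (payloadLength < 126)
        {
            header.Add((byte)(maskBit | (int)payloadLength));
        }
        else if (payloadLength <= ushort.MaxValue)
        {
            header.Add((byte)(maskBit | 126));
            header.Add((byte)((payloadLength >> 8) & 0xFF));
            header.Add((byte)(payloadLength & 0xFF));
        }
        else
        {
            header.Add((byte)(maskBit | 127));
            for (int shift = 56; shift >= 0; shift -= 8)
                header.Add((byte)((payloadLength >> shift) & 0xFF));
        }

        // The mask, if any, comes last.
        if (maskKey != null) header.AddRange(maskKey);
        return header.ToArray();
    }
}
```

For example, Build(true, 0x1, 1000) should give the four bytes 0x81 0x7e 0x03 0xe8 from the third row.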

6. Mask operation
Masking XORs each byte of the Payload with the corresponding byte of the mask.
For example:
Mask: 0x01 0x02 0x03 0x04
Original Payload: 0x01 0x02 0x03 0x04 0x05 0x06 0x07 0x08 0x09
Payload as actually transmitted: (0x01 ^ 0x01) (0x02 ^ 0x02) (0x03 ^ 0x03) (0x04 ^ 0x04) (0x05 ^ 0x01) (0x06 ^ 0x02) (0x07 ^ 0x03) (0x08 ^ 0x04) (0x09 ^ 0x01)
As you can see, the mask is used cyclically: after its fourth byte has been used, encoding continues from its first byte until the whole Payload is encoded.
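In code, the cyclic use of the mask is just an index modulo 4. The sketch below uses a hypothetical helper name, MaskSketch.Apply. Because XOR is its own inverse, the same call both masks and unmasks a Payload.

```csharp
using System;

public static class MaskSketch
{
    // XORs the Payload in place with the 4-byte mask, reusing the mask cyclically.
    public static void Apply(byte[] payload, byte[] maskKey)
    {
        if (payload == null) throw new ArgumentNullException(nameof(payload));
        if (maskKey == null || maskKey.Length != 4)
            throw new ArgumentException("The mask must be 4 bytes.", nameof(maskKey));

        for (int i = 0; i < payload.Length; i++)
            payload[i] ^= maskKey[i % 4];
    }
}
```

Applied to the example above, the transmitted bytes come out as 0x00 0x00 0x00 0x00 0x04 0x04 0x04 0x0c 0x08, and applying the same mask again restores the original Payload.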

Multi-byte numbers are transmitted in network byte order (big-endian), that is, the most significant byte comes first.
This is the end of the metadata. The Payload follows the metadata immediately, and its length is already known from the metadata.
RFC 6455 describes the frame layout with a diagram in section 5.2 (Base Framing Protocol).

Payload

Using the Payload length obtained from the metadata, the whole Payload can be read.
For Text, Binary and Continuation frames, the Payload needs no special handling. The Payload of the other frames is described below.
1. Close frame
The side that sends a Close frame may put a status code and a closing reason in its Payload, or only the status code without a reason.

Status code | Closing reason
--- | ---
Unsigned integer held in 2 bytes, >= 1000 | All Payload bytes after the first two (the status code) are the reason

After receiving a Close frame, the receiver normally replies to the sender with a Close frame, usually returning the sender's status code and reason unchanged.
2. Ping frame
After receiving a Ping frame, the receiver must reply to the sender with a Pong frame that carries the Payload of the Ping frame unchanged.
(It's like table tennis: the ball is always the same ball, and the Payload is always the same Payload ~ ~ ~)
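The Close Payload layout described above can be sketched as a small builder. ClosePayloadSketch.Build is a hypothetical helper, not the CloseFrame class used later in this article. The 123-byte limit on the reason follows from the RFC 6455 rule that a control frame Payload may be at most 125 bytes, 2 of which are taken by the status code.

```csharp
using System;
using System.Text;

public static class ClosePayloadSketch
{
    // Builds a Close Payload: 2-byte status code in network byte order, then the UTF-8 reason.
    public static byte[] Build(ushort code, string reason = "")
    {
        byte[] reasonBytes = Encoding.UTF8.GetBytes(reason ?? "");

        // Control frame Payloads are limited to 125 bytes, so the reason gets at most 123.
        if (reasonBytes.Length > 123)
            throw new ArgumentException("The reason is too long for a Close frame.", nameof(reason));

        byte[] payload = new byte[2 + reasonBytes.Length];
        payload[0] = (byte)(code >> 8);    // most significant byte first
        payload[1] = (byte)(code & 0xFF);
        Buffer.BlockCopy(reasonBytes, 0, payload, 2, reasonBytes.Length);
        return payload;
    }
}
```

Build(1000, "bye") would give 0x03 0xe8 followed by the three bytes of "bye".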

1. Implementing frame parsing

Metadata parsing

Without further ado, here is the source code, which parses the metadata according to the format described above.
Definition and implementation of the Frame class: https://github.com/hooow-does-it-work/http/blob/main/src/WebSocket/Frame.cs
The common control frames are implemented as well: https://github.com/hooow-does-it-work/http/tree/main/src/WebSocket/Frames

public static Frame NextFrame(Stream baseStream)
{
    byte[] buffer = new byte[2];
    ReadPackage(baseStream, buffer, 0, 2);

    Frame frame = new Frame();

    //Process the first byte
    //The first (highest) bit: 1 means the frame is the end frame
    frame.Fin = buffer[0] >> 7 == 1;

    //Three reserved bits; we do not use them
    frame.Rsv1 = (buffer[0] >> 6 & 1) == 1;
    frame.Rsv2 = (buffer[0] >> 5 & 1) == 1;
    frame.Rsv3 = (buffer[0] >> 4 & 1) == 1;

    //Bits 5-8 (the low 4 bits): frame type
    frame.OpCode = (OpCode)(buffer[0] & 0xf);

    //Process the second byte
    //The first (highest) bit: 1 means the Payload is masked
    frame.Mask = buffer[1] >> 7 == 1;

    //Bits 2-8 (the low 7 bits): Payload length identifier
    int payloadLengthMask = buffer[1] & 0x7f;

    //If the value is less than 126, this value represents the actual length of the Payload
    if (payloadLengthMask < 126)
    {
        frame.PayloadLength = payloadLengthMask;
    }
    //126 means that the following two bytes save the Payload length
    else if (payloadLengthMask == 126)
    {
        frame.PayloadLengthBytesCount = 2;

    }
    //127 means that the following 8 bytes hold the Payload length. Note that there is no 4-byte form.
    else if (payloadLengthMask == 127)
    {
        frame.PayloadLengthBytesCount = 8;

    }

    //If there is no mask and no extra bytes are needed for the Payload length, the metadata is complete: return directly
    //The caller then only needs to read the Payload according to PayloadLength
    if (!frame.Mask && frame.PayloadLengthBytesCount == 0)
    {
        return frame;
    }

    //Read out 2 or 8 bytes of the saved length
    //If there is a mask, you need to continue reading the 4-byte mask
    buffer = frame.Mask 
        ? new byte[frame.PayloadLengthBytesCount + 4] 
        : new byte[frame.PayloadLengthBytesCount];

    //Read Payload length data and mask (if any)
    ReadPackage(baseStream, buffer, 0, buffer.Length);

    //If there is a mask, extract it
    if (frame.Mask)
    {
        frame.MaskKey = buffer.Skip(frame.PayloadLengthBytesCount).Take(4).ToArray();
    }

    //Get the length of the Payload from the byte data
    if (frame.PayloadLengthBytesCount == 2)
    {
        frame.PayloadLength = buffer[0] << 8 | buffer[1];

    }
    else if (frame.PayloadLengthBytesCount == 8)
    {
        frame.PayloadLength = ToInt64(buffer);
    }

    //At this point, all of the frame's metadata has been read
    //The Payload data is read through a stream
    //For some special frames, the Payload has its own data format, which is introduced separately later

    return frame;
}
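NextFrame relies on a ReadPackage helper that is not shown in this article. Presumably it reads exactly the requested number of bytes, since a single Stream.Read call may return fewer bytes than asked for. The following is a minimal sketch under that assumption; StreamReadSketch.ReadExactly is a hypothetical name, and the real ReadPackage in the linked repository may behave differently.

```csharp
using System;
using System.IO;

public static class StreamReadSketch
{
    // Reads exactly count bytes into buffer, looping because Read may return fewer bytes.
    public static void ReadExactly(Stream stream, byte[] buffer, int offset, int count)
    {
        while (count > 0)
        {
            int read = stream.Read(buffer, offset, count);

            // Read returns 0 when the stream has ended, for example when the peer disconnected.
            if (read == 0)
                throw new EndOfStreamException("The stream ended before all bytes were read.");

            offset += read;
            count -= read;
        }
    }
}
```

EndOfStreamException derives from IOException, so a caller that catches IOException also sees this case.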

Payload read

After reading the frame metadata, call the static method OpenRead of Frame to open a read stream for the Payload.
This is a general-purpose way of reading the Payload; it does not interpret the Payload data of special frames (such as Close).
FrameReadStream decodes a masked Payload internally and automatically.

Note: the Frame class also has a non-static OpenRead method. The Stream it opens can only read the Payload of the current Frame; it cannot read the data of the Continuation frames that follow.

/// <summary>
/// Static method that opens a read stream from a Frame
/// </summary>
/// <param name="frame"></param>
/// <param name="stream"></param>
/// <returns>If the FIN bit of the frame is 1, a FrameReadStream is returned directly; otherwise, a MultipartFrameReadStream is returned. MultipartFrameReadStream reads all subsequent frames until it has read a frame whose FIN bit is 1.</returns>
public static Stream OpenRead(Frame frame, Stream stream) {
    if (frame.Fin) return frame.OpenRead(stream);
    return new MultipartFrameReadStream(frame, stream, true);
}

2. Frame encapsulation

Encapsulating a frame is the reverse of parsing it, so I won't go through it in detail. The Frame class implements a CreateMetaBytes method that generates the metadata.
When the OpenWrite method of Frame is called, the frame metadata is generated and written to the underlying stream automatically, and a FrameWriteStream is returned for writing the frame's data.

Note: FrameWriteStream does not yet split large data into multiple frames; this will be implemented later.
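As an illustration of what splitting data into frames could look like, here is a sketch. It is not the planned FrameWriteStream change, and FragmentWriterSketch is a hypothetical name. It writes unmasked frames, as a server does: the first frame carries the real frame type, the following ones are Continuation frames, and only the last one is marked as the end frame.

```csharp
using System;
using System.IO;

public static class FragmentWriterSketch
{
    // Writes payload as unmasked frames of at most maxFragmentSize bytes; returns the frame count.
    public static int Write(Stream output, byte opCode, byte[] payload, int maxFragmentSize)
    {
        if (maxFragmentSize <= 0)
            throw new ArgumentOutOfRangeException(nameof(maxFragmentSize));

        int offset = 0, frames = 0;
        do
        {
            int size = Math.Min(maxFragmentSize, payload.Length - offset);
            bool fin = offset + size == payload.Length;

            // Only the first frame carries the real type; the rest are Continuation (0x0).
            byte op = frames == 0 ? opCode : (byte)0x0;
            WriteHeader(output, fin, op, size);
            output.Write(payload, offset, size);

            offset += size;
            frames++;
        } while (offset < payload.Length);
        return frames;
    }

    // Writes the metadata of one unmasked frame using the shortest length form.
    private static void WriteHeader(Stream output, bool fin, byte opCode, int length)
    {
        output.WriteByte((byte)((fin ? 0x80 : 0x00) | (opCode & 0x0F)));
        if (length < 126)
        {
            output.WriteByte((byte)length);
        }
        else if (length <= ushort.MaxValue)
        {
            output.WriteByte(126);
            output.WriteByte((byte)((length >> 8) & 0xFF));
            output.WriteByte((byte)(length & 0xFF));
        }
        else
        {
            output.WriteByte(127);
            for (int shift = 56; shift >= 0; shift -= 8)
                output.WriteByte((byte)(((long)length >> shift) & 0xFF));
        }
    }
}
```

Buffering the whole payload in a byte array keeps the sketch short; a stream-based writer would instead need to know, at the time it writes each frame, whether more data will follow.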

3. Testing

Let's start testing our logic.
Pull the front-end test code directly from Git: https://github.com/hooow-does-it-work/http/tree/main/bin/Release/web
In the test server written earlier, OnWebSocket had no real implementation and simply closed the stream. Now we implement the reading and writing of frames.
We set the WebRoot of the server to the web directory pulled from Git, so that WebSocket and ordinary HTTP are served at the same time.
The test server is simple and crude: it reads and parses frames from the client directly in a while loop. Later, this logic can be encapsulated in the specific frame classes, for example reading the status code and closing reason of a Close frame.
Test server code:

public class HttpServer : HttpServerBase
{
    public HttpServer() : base()
    {
        //Set root directory
        WebRoot = Path.GetFullPath(Path.Combine(AppDomain.CurrentDomain.BaseDirectory, "web"));
    }
    protected override void OnWebSocket(HttpRequest request, Stream stream)
    {
        while (true)
        {
            Frame frame = null;
            try
            {
                frame = Frame.NextFrame(stream);
            }
            catch (IOException)
            {
                Console.WriteLine("Client disconnected");
                break;
            }
            Console.WriteLine($"Frame type: {frame.OpCode}, masked: {frame.Mask}, Payload length: {frame.PayloadLength}");

            //Read all payloads
            byte[] payload = null;
            using (Stream input = Frame.OpenRead(frame, stream))
            {
                using MemoryStream output = new MemoryStream();
                input.CopyTo(output);
                payload = output.ToArray();
            }

            //After receiving a Close frame, reply with a Close frame to the client if necessary.
            //The Close frame is special: the client may send a status code and a reason to the server,
            //and both can be parsed from the payload.
            //The first two bytes are the status code (unsigned integer); the bytes after it are the reason.
            if (frame.OpCode == OpCode.Close)
            {
                int code = 0;
                string reason = null;

                if(payload.Length >= 2) {
                    code = payload[0] << 8 | payload[1];
                    reason = Encoding.UTF8.GetString(payload, 2, payload.Length - 2);

                    Console.WriteLine($"Closing reason:{code},{reason}");
                }

                //Normal closure (status code 1000, or no status code sent): reply with a Close frame
                //For other codes, exit the loop directly and close the underlying stream
                if (code <= 1000)
                {
                    CloseFrame response = new CloseFrame(code, reason);
                    response.OpenWrite(stream);
                }
                break;
            }

            //After receiving a Ping frame, you need to reply a Pong frame to the client.
            //If there is a payload, it will be sent to the client at the same time
            if (frame.OpCode == OpCode.Ping)
            {
                PongFrame response = new PongFrame(payload);
                response.OpenWrite(stream);
                continue;
            }

            //Receive the Binary frame and print the content
            //Here, the frame data can be saved to a file or other applications in the form of stream
            if(frame.OpCode == OpCode.Binary)
            {
                Console.WriteLine(string.Join(", ", payload));

                //In order to test, we send the test content to the client
                TextFrame response = new TextFrame($"The server received binary data, length:{payload.Length}");
                response.OpenWrite(stream);
                continue;
            }

            //Receive the text and print it out
            if (frame.OpCode == OpCode.Text)
            {
                string message = Encoding.UTF8.GetString(payload);
                Console.WriteLine(message);

                //For testing, we send the information back to the client
                TextFrame response = new TextFrame($"The server received text data:{message}");
                response.OpenWrite(stream);
            }

        }
        stream.Close();
    }
}

Run the server and open http://127.0.0.1:4189/websocket.html in a browser.

Click the connect button. After the connection succeeds, the test form is displayed.

Enter some data, click "send in text mode" and "send in binary mode" in turn, and check the console output.

You can see that the server parses the data sent by the browser correctly, and the browser displays the data returned by the server.
Click disconnect.

The server receives a Close frame with a length of 0, which means the browser sent neither a status code nor a reason.

4. Summary

The key to WebSocket is frame parsing. Once the data structure of the frame is fully understood, the rest is actually quite easy.
What we implemented here is the most basic WebSocket. WebSocket has more features, such as compression and other extensions.

Posted by suttercain on Sat, 20 Nov 2021 02:07:45 -0800