Advanced plug-in implementation of GoReplay

GoReplay's most useful capability is non-invasive traffic capture: it copies real traffic from a production server to a file on local disk or to a test machine without affecting the online service. The test machine can then be exercised with real traffic, which helps ensure the quality of a product release.

The previous article, "HTTP traffic copy test artifact goreplay", covered the following four topics.

  • Traffic replication testing
  • Saving traffic to a file and replaying it
  • HTTP request filtering
  • HTTP request rewriting

However, we may also encounter the following problems:

  1. Comparison of traffic test results: for example, comparing the results produced by a new version that is about to be released with the results produced by the current version, to make sure the changed program still behaves correctly and stably. Previously, the only option was to feed the same real traffic into two test machines and then compare the results through logs or similar means. That usually requires code changes, plus a log analysis program for semi-automatic comparison. Is there a way to compare traffic in real time? Yes: GoReplay's middleware programming, which this article calls plug-in programming.
  2. Request rewriting: the GoReplay command line supports some HTTP request changes, but they are limited. For full HTTP request rewriting, you can use GoReplay's plug-in mechanism.

How the GoReplay plug-in works

GoReplay plug-ins run as separate processes and talk to GoReplay through inter-process communication, which means a plug-in can be written in any language. So how does GoReplay communicate with a plug-in? What are the plug-in's inputs and outputs? And what needs attention?

  • A GoReplay plug-in uses standard input and standard output for inter-process communication.
  • On standard input, the plug-in receives the original request from the real traffic, the original response, and the response of the test machine. The last two are exactly what is needed to compare traffic test results. The plug-in can also rewrite the original request and write it to standard output, in which case GoReplay sends the rewritten request to the test machine.
  • Note that the original request, the original response and the test machine's response are not guaranteed to arrive in that order, because GoReplay processes them asynchronously.
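Because the three kinds of message may arrive in any order, a plug-in usually has to group them before it can compare anything. The following is a minimal sketch of one way to do that, assuming each message carries an ID shared by a request and its two responses (the message format is described later in this article). The class name Correlator and its methods are hypothetical and are not part of GoReplay or any helper library.

```python
from collections import defaultdict
from typing import Callable, Dict

# Message kinds as they appear in the first field of the metadata line.
REQUEST, ORIGINAL_RESPONSE, REPLAYED_RESPONSE = "1", "2", "3"


class Correlator:
    """Group messages by ID and call back once all three kinds have arrived.

    Hypothetical helper for illustration only.
    """

    def __init__(self, on_complete: Callable[[str, Dict[str, bytes]], None]):
        self._pending: Dict[str, Dict[str, bytes]] = defaultdict(dict)
        self._on_complete = on_complete

    def add(self, kind: str, msg_id: str, payload: bytes) -> None:
        group = self._pending[msg_id]
        group[kind] = payload
        # Arrival order does not matter: fire only when the set is complete.
        if len(group) == 3:
            self._on_complete(msg_id, self._pending.pop(msg_id))
```

A real plug-in would also want to expire entries whose responses never arrive, otherwise the pending table grows without bound.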

The following figure is from the official GoReplay Wiki.

So what does the content received by the plug-in look like? A test example is shown below. It may look confusing at first: it is a hexadecimal encoding. Encoding each message this way makes it easy for the plug-in to split standard input into messages, because a newline (\n) marks the end of each message. Thinking more broadly, this protocol resembles the framing used by application-layer protocols built on TCP. Many designs can be understood by analogy, and recognizing what different systems have in common makes our own design work easier.

3120303433303233383230303030303030313464343533646533203136333830303236343633393533383930303020300a474554202f20485454502f312e310d0a4163636570743a202a2f2a0d0a486f73743a206c6f63616c686f73743a393039300d0a4163636570742d456e636f64696e673a20677a69702c206465666c6174652c2062720d0a436f6e6e656374696f6e3a206b6565702d616c6976650d0a0d0a

In fact, the decoded results are shown in the figure below:

In the above figure, the contents of the first line are separated by spaces:

  • The first part is a number, which can be 1, 2 or 3, representing the original request, the original response and the test machine's response respectively. This example is an original request.
  • The second part is an ID. An original request, its original response and the test machine's response all share the same ID, while different requests have different IDs. The plug-in can therefore use the ID to match the original response and the test response that belong to the same request.
  • The third part is the timestamp at which the request arrived or the response was received.
  • The fourth part indicates the elapsed time from the beginning to the end of the message. The official documentation notes that this value is not always present. When exactly is it absent? That is worth checking in the source code, and readers who know are welcome to leave a comment.

The remaining lines are easy to read: they are the HTTP message itself.
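The format described above can be parsed in a few lines. The sketch below is an illustration under stated assumptions rather than GoReplay's own code: the names Message and parse_message are hypothetical, and the sample is a shortened version of the message shown earlier (same metadata line, fewer headers), built in code instead of pasted as hex.

```python
import binascii
from typing import NamedTuple, Optional


class Message(NamedTuple):
    kind: int                # 1 = request, 2 = original response, 3 = test machine response
    msg_id: str              # shared by a request and its responses
    timestamp: int           # third field of the metadata line
    latency: Optional[int]   # fourth field; None when it is absent
    payload: bytes           # raw HTTP message (headers + body)


def parse_message(hex_line: str) -> Message:
    """Decode one hex-encoded line and split it into metadata and payload."""
    decoded = bytes.fromhex(hex_line.strip())
    raw_meta, payload = decoded.split(b"\n", 1)
    fields = raw_meta.decode("ascii").split(" ")
    latency = int(fields[3]) if len(fields) > 3 else None
    return Message(int(fields[0]), fields[1], int(fields[2]), latency, payload)


# Shortened version of the sample above, encoded the way the plug-in receives it.
SAMPLE = binascii.hexlify(
    b"1 04302382000000014d453de3 1638002646395389000 0\n"
    b"GET / HTTP/1.1\r\nHost: localhost:9090\r\n\r\n"
).decode("ascii")

msg = parse_message(SAMPLE)
```

Here msg.kind is 1 (an original request) and msg.payload starts with the request line GET / HTTP/1.1.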

Now let's put this into practice and implement a plug-in. The following is the official Python sample, to which the author has added one line: sys.stdout.flush().

#!/usr/bin/env python3
# -*- coding: utf-8 -*-

import sys
import fileinput
import binascii

# Used to find end of the Headers section
EMPTY_LINE = b'\r\n\r\n'


def log(msg):
    """
    Logging to STDERR as STDOUT and STDIN used for data transfer
    @type msg: str or byte string
    @param msg: Message to log to STDERR
    """
    try:
        msg = str(msg) + '\n'
    except:
        pass
    sys.stderr.write(msg)
    sys.stderr.flush()


def find_end_of_headers(byte_data):
    """
    Finds where the header portion ends and the content portion begins.
    @type byte_data: str or byte string
    @param byte_data: Hex decoded req or resp string
    """
    return byte_data.index(EMPTY_LINE) + 4


def process_stdin():
    """
    Process STDIN and output to STDOUT
    """
    for raw_line in fileinput.input():

        line = raw_line.rstrip()

        # Decode the hex-encoded line
        decoded = bytes.fromhex(line)

        # Split into metadata and payload, the payload is headers + body
        (raw_metadata, payload) = decoded.split(b'\n', 1)

        # Split the payload into headers and body
        headers_pos = find_end_of_headers(payload)
        raw_headers = payload[:headers_pos]
        raw_content = payload[headers_pos:]

        log('===================================')
        request_type_id = int(raw_metadata.split(b' ')[0])
        log('Request type: {}'.format({
          1: 'Request',
          2: 'Original Response',
          3: 'Replayed Response'
        }[request_type_id]))
        log('===================================')

        log('Original data:')
        log(line)

        log('Decoded request:')
        log(decoded)

        encoded = binascii.hexlify(raw_metadata + b'\n' + raw_headers + raw_content).decode('ascii')
        log('Encoded data:')
        log(encoded)

        sys.stdout.write(encoded + '\n')
        sys.stdout.flush()

if __name__ == '__main__':
    process_stdin()

This example shows how to read the input, parse it, and write the original content back to standard output. Save it as plugin.py and start GoReplay with the following command line, which also loads the plug-in process:

sudo ./gor --input-raw :9898 --output-http-track-response --input-raw-track-response --middleware "python3 plugin.py" --output-http "http://<target machine IP>:9898"

In the example above, every input is written back to standard output, but that is not actually necessary. Traffic is forwarded to the test machine only when the HTTP request is written back to standard output. If a request is not written to standard output, it is not sent to the test machine. You can also modify the HTTP request before writing it to standard output, in which case the test machine receives the modified request.
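As an illustration of selective forwarding, the sketch below writes back only requests whose path starts with a given prefix and drops everything else. It assumes the message format described earlier; the function names filter_line and run, and the /api/ prefix, are hypothetical choices for this example.

```python
from typing import Iterable, Optional, TextIO


def filter_line(hex_line: str, allowed_prefix: bytes = b"/api/") -> Optional[str]:
    """Return the line to write to stdout, or None to drop the message."""
    line = hex_line.strip()
    decoded = bytes.fromhex(line)
    raw_meta, payload = decoded.split(b"\n", 1)
    # Only requests (type 1) need to be written back to reach the test machine.
    if raw_meta.split(b" ", 1)[0] != b"1":
        return None
    # The request line looks like: b"GET /path HTTP/1.1"
    parts = payload.split(b"\r\n", 1)[0].split(b" ")
    if len(parts) >= 2 and parts[1].startswith(allowed_prefix):
        return line
    return None


def run(lines: Iterable[str], out: TextIO) -> None:
    """Filter every input line; flush after each write so GoReplay sees it promptly."""
    for line in lines:
        if not line.strip():
            continue
        kept = filter_line(line)
        if kept is not None:
            out.write(kept + "\n")
            out.flush()
```

In a plug-in this would be driven with run(sys.stdin, sys.stdout) after importing sys. Any diagnostics should go to stderr, since stdout is reserved for the protocol.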

If you want to rewrite HTTP requests or compare test results, you will inevitably need helpers for working with HTTP messages. Several open source authors have written helper libraries that make GoReplay plug-ins easier to write. For example, the GoReplay author provides goreplay_middleware, a helper library for NodeJS, and the open source author amyangfei has written gor, a helper library for Python 3, which can be installed with pip: pip install gor.

Comparison of traffic test results

We use amyangfei's gor library to compare the original responses with the test machine's responses. The test deployment should look like the figure below: the GoReplay plug-in process receives the original response and the test machine's response from the GoReplay process and then compares them.

This article uses the example of the author amyangfei:

# coding: utf-8
import sys
from gor.middleware import AsyncioGor


def on_request(proxy, msg, **kwargs):
    proxy.on('response', on_response, idx=msg.id, req=msg)

def on_response(proxy, msg, **kwargs):
    proxy.on('replay', on_replay, idx=kwargs['req'].id, req=kwargs['req'], resp=msg)

def on_replay(proxy, msg, **kwargs):
    replay_status = proxy.http_status(msg.http)
    resp_status = proxy.http_status(kwargs['resp'].http)
    if replay_status != resp_status:
        sys.stderr.write('replay status [%s] diffs from response status [%s]\n' % (replay_status, resp_status))
    else:
        sys.stderr.write('replay status is same as response status\n')
    sys.stderr.flush()

if __name__ == '__main__':
    proxy = AsyncioGor()
    proxy.on('request', on_request)
    proxy.run()

Save it as plugin.py and start GoReplay with the following command line, which also loads the plug-in process:

sudo ./gor --input-raw :9898 --output-http-track-response --input-raw-track-response --middleware "python3 plugin.py" --output-http "http://<target machine IP>:9898"

Note that because standard input and standard output are used for inter-process communication, the plug-in must write its own results to a file or to stderr.

  • If the status codes are the same, "replay status is same as response status" is output.
  • If the status codes differ, "replay status [%s] diffs from response status [%s]" is output, with the two status codes filled in.

This is only an example. For real engineering tests you can also compare the HTTP headers or the HTTP body. It is also advisable to write the results to a file and, whenever the results differ, to save the original request, the original response and the test machine's response to a file for later analysis.
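A comparison that goes beyond the status code might look like the sketch below. It uses only the standard library and works on raw HTTP response bytes; it is not part of the gor library. The names split_response and diff_responses are hypothetical, and the set of ignored headers is an assumption that should be adapted to the service under test.

```python
from typing import Dict, List, Tuple

# Headers expected to differ between machines; an assumption for this sketch.
IGNORED_HEADERS = {"date", "set-cookie"}


def split_response(raw: bytes) -> Tuple[str, Dict[str, str], bytes]:
    """Split a raw HTTP response into status code, lower-cased headers and body."""
    head, _, body = raw.partition(b"\r\n\r\n")
    lines = head.decode("iso-8859-1").split("\r\n")
    status_parts = lines[0].split(" ", 2)
    status = status_parts[1] if len(status_parts) > 1 else ""
    headers: Dict[str, str] = {}
    for line in lines[1:]:
        name, sep, value = line.partition(":")
        if sep:
            headers[name.strip().lower()] = value.strip()
    return status, headers, body


def diff_responses(original: bytes, replayed: bytes) -> List[str]:
    """Return human-readable differences; an empty list means the responses match."""
    o_status, o_headers, o_body = split_response(original)
    r_status, r_headers, r_body = split_response(replayed)
    diffs: List[str] = []
    if o_status != r_status:
        diffs.append("status: %s != %s" % (o_status, r_status))
    for name in sorted((set(o_headers) | set(r_headers)) - IGNORED_HEADERS):
        if o_headers.get(name) != r_headers.get(name):
            diffs.append("header %s: %r != %r"
                         % (name, o_headers.get(name), r_headers.get(name)))
    if o_body != r_body:
        diffs.append("body differs (%d vs %d bytes)" % (len(o_body), len(r_body)))
    return diffs
```

When the returned list is not empty, the plug-in could append the request, both responses and the list of differences to a file for later analysis, as recommended above.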

Rewrite request

Sometimes during testing we need to modify certain HTTP requests before importing them into the test machine. Building on the example in the previous section, the following changes the HTTP request path from / to /test, so the request sent to the test machine has the path /test.

def on_request(proxy, msg, **kwargs):
    if proxy.http_path(msg.http) == '/':
        msg.http = proxy.set_http_path(msg.http, '/test')

    proxy.on('response', on_response, idx=msg.id, req=msg)

on_request is a callback function. After it is called, the program writes the updated msg to standard output in the protocol format described earlier. GoReplay reads the new request from the plug-in's standard output and sends it to the test machine.
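To make the mechanics concrete, the same rewrite can be sketched directly against the wire format, without a helper library. This is an illustration under the message format described earlier, not how the gor library is implemented; rewrite_path and rewrite_message are hypothetical names.

```python
def rewrite_path(payload: bytes, old_path: bytes, new_path: bytes) -> bytes:
    """Replace the path in the request line when it equals old_path exactly."""
    request_line, sep, rest = payload.partition(b"\r\n")
    parts = request_line.split(b" ")
    # A request line has three parts: method, path, HTTP version.
    if len(parts) == 3 and parts[1] == old_path:
        parts[1] = new_path
    return b" ".join(parts) + sep + rest


def rewrite_message(hex_line: str,
                    old_path: bytes = b"/",
                    new_path: bytes = b"/test") -> str:
    """Decode one message, rewrite the path of requests, and re-encode it as hex."""
    decoded = bytes.fromhex(hex_line.strip())
    raw_meta, payload = decoded.split(b"\n", 1)
    if raw_meta.startswith(b"1 "):  # type 1 = original request
        payload = rewrite_path(payload, old_path, new_path)
    return (raw_meta + b"\n" + payload).hex()
```

The match is exact, so a request for /?a=1 is left untouched. Handling query strings would need the path and query to be split first.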

Other kinds of request modification work in a similar way, so they are not repeated here.

References

  1. GoReplay Wiki: https://github.com/buger/goreplay/wiki
  2. Python GorMW: https://github.com/amyangfei/GorMW

Posted by renj0806 on Fri, 03 Dec 2021 20:09:58 -0800