How to Run an SSH Service on Port 80 of a Web Server

Keywords: Go ssh network github curl

The network port multiplexing discussed in this article does not refer to socket bind reuse with the SO_REUSEADDR option in network programming. It is closer to a port forwarding tool with a protocol-specific routing function, implemented at the application layer.

Background

The firewall in my network opens only one port, but I want to offer several network services for testing. So I need a solution that can recognize the characteristics of TCP packets and serve HTTP, SSH, MQTT and other services on one open port at the same time.

For example, you can multiplex an SSH service on port 80. Ordinary users only know that their browser reaches http://x.x.x/, while you can log in to the server with ssh user@x.x.x -p 80. This is also a way to hide the SSH service.
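The idea of telling services apart by the first bytes a client sends can be sketched in a few lines of Go. This is only an illustration under simple assumptions: the function name classify and its return labels are made up for this example, and only three HTTP methods are checked.

```go
package main

import (
	"bytes"
	"fmt"
)

// classify guesses the protocol from the first bytes a client sends.
// The labels "ssh", "http" and "unknown" are illustrative only.
func classify(first []byte) string {
	// SSH clients open with an identification string such as "SSH-2.0-...".
	if bytes.HasPrefix(first, []byte("SSH-")) {
		return "ssh"
	}
	// HTTP requests start with a method name followed by a space.
	for _, method := range []string{"GET ", "POST ", "HEAD "} {
		if bytes.HasPrefix(first, []byte(method)) {
			return "http"
		}
	}
	return "unknown"
}

func main() {
	fmt.Println(classify([]byte("SSH-2.0-OpenSSH_8.9\r\n")))
	fmt.Println(classify([]byte("GET / HTTP/1.1\r\n")))
}
```

A multiplexer built this way would forward "ssh" connections to the SSH daemon and everything else to the Web server.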

A handy port multiplexing tool - sslh

sslh is an open source port multiplexer written in C. It currently identifies HTTP, SSL, SSH, OpenVPN, tinc, XMPP and other protocols. It runs mainly on *nix systems, and its source code is hosted on GitHub. According to the official website, it can also be compiled and run on Windows under Cygwin, but I have not tested this.

The compilation process is not complicated: just follow the official documentation, so I will not cover it here. Debian users can install it directly with sudo apt-get install sslh.

Compilation produces two executables, sslh-fork and sslh-select. They differ in their working mode:

sslh-fork uses the *nix fork model: it forks a child process for each TCP connection to forward its packets. For long-lived connections, new connections are not established frequently, so the fork overhead is negligible. For short-lived connections such as HTTP requests, however, forking a child process per connection hurts efficiency under a large number of concurrent requests. On the other hand, the fork mode is well tested and runs stably and reliably.
sslh-select manages all network connections in a single thread, which is a relatively new approach. However, compared with event-based I/O mechanisms such as epoll, the traditional polling model of select is relatively inefficient.

sslh supports regular expressions in its configuration file for custom protocol identification rules, but when I tried to identify MQTT v3.1 this way I ran into a problem. Of course, it is also possible that the regular expression I wrote is not compatible with the regex library sslh uses.

High Performance Load Balancer - HAProxy

HAProxy is an open source, high-performance TCP/HTTP software load balancer. It is very widely used for game back-end services and Web server load balancing. Through configuration, several SSL-based applications such as HTTPS, SSH and OpenVPN can share the same port. Here is a reference article.

Although HAProxy has excellent performance, it is not easy to extend to meet specific needs.

Modern Language for the Internet - Go

Go is one of the excellent programming languages I have studied in recent years. Its simplicity and efficiency appeal to me deeply (I like simple things, such as Python). Go's goroutines provide concurrency support at the language level, and channels provide a convenient and reliable way for goroutines to communicate. Together, they make Go very well suited to writing highly concurrent network applications. I had originally planned to use Python+gevent, but in the end I chose Go over Python for the efficiency of a statically compiled language.

Browsing GitHub, I found Switcher, a project similar to sslh implemented in Go. It has not been updated for a long time and supports very few protocols; in fact, it can only recognize SSH. The implementation of Switcher is very simple, with fewer than 200 lines of core code, so I decided to rebuild on top of it to get the features I need.

DIY

I forked Switcher on GitHub and modified it. I call it a modification, but the result is completely different. The new implementation reworks the original architecture: the hard-coded SSH support is removed in favor of a more general protocol recognition scheme, so that most protocols can be supported through simple configuration without changing the program, which makes it more versatile.

First, the most common way to match a protocol is to compare the first few bytes of the incoming data against the signature of the target protocol. If we simply store the first N bytes of each protocol and compare them one by one without any further processing, efficiency may suffer. On the one hand, every pattern has to be traversed and compared against the received data; on the other hand, if network latency prevents enough bytes from arriving at once, the comparison has to be repeated. Take an extreme example: suppose there are 100 target protocols to match, each with a pattern of more than 10 bytes. If I connect to the server with telnet/netcat and send one byte at a time, the server may have to perform 10*100 string comparisons.

To solve this problem, I designed a simple tree structure: every pattern is inserted into the tree byte by byte until its end, and the destination IP address and port for the protocol are stored on the leaf node.

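func (t *MatchTree) Add(p *PREFIX) {
    for _, patternStr := range p.Patterns {
        pattern := []byte(patternStr)
        node := t.Root
        for i, b := range pattern {
            nodes := node.ChildNodes
            // Follow the existing branch while the pattern shares a prefix
            // with one that was added earlier.
            if next_node, ok := nodes[b]; ok {
                node = next_node
                continue
            }

            if nodes == nil {
                nodes = make(map[byte]*MatchTreeNode)
                node.ChildNodes = nodes
            }

            // First byte that is not in the tree yet: build the rest of the
            // pattern as a new branch and store the target address on its leaf.
            root, leaf := createSubTree(pattern[i+1:])
            leaf.Address = p.Address
            nodes[b] = root

            break
        }
    }
}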
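How a lookup in such a tree might work is sketched below. This is a self-contained illustration, not Switcher's actual code: the type trieNode and the methods add and probe are hypothetical names, and the three result values simply mirror the MATCH, UNMATCH and TRYAGAIN results used by the Probe methods in this article. The point is that each received byte is examined once, however many patterns are registered.

```go
package main

import "fmt"

// ProbeResult mirrors the three outcomes used by the Probe methods.
type ProbeResult int

const (
	MATCH ProbeResult = iota
	UNMATCH
	TRYAGAIN
)

// trieNode is a hypothetical node type for this sketch.
type trieNode struct {
	children map[byte]*trieNode
	address  string // non-empty only on the node where a pattern ends
}

// add inserts one pattern byte by byte, creating nodes as needed.
func (n *trieNode) add(pattern, address string) {
	node := n
	for i := 0; i < len(pattern); i++ {
		if node.children == nil {
			node.children = make(map[byte]*trieNode)
		}
		next, ok := node.children[pattern[i]]
		if !ok {
			next = &trieNode{}
			node.children[pattern[i]] = next
		}
		node = next
	}
	node.address = address
}

// probe walks the tree once over the bytes received so far.
func (n *trieNode) probe(header []byte) (ProbeResult, string) {
	node := n
	for _, b := range header {
		next, ok := node.children[b]
		if !ok {
			return UNMATCH, "" // no pattern continues with this byte
		}
		node = next
		if node.address != "" {
			return MATCH, node.address // reached the end of a pattern
		}
	}
	return TRYAGAIN, "" // still on a valid path, more bytes are needed
}

func main() {
	root := &trieNode{}
	root.add("SSH", "127.0.0.1:22")
	root.add("GET ", "127.0.0.1:8080")
	root.add("POST ", "127.0.0.1:8080")

	for _, h := range []string{"S", "SSH-2.0", "GET / HTTP/1.1", "PUT /"} {
		res, addr := root.probe([]byte(h))
		fmt.Printf("%q -> result=%d address=%q\n", h, res, addr)
	}
}
```

With this layout, the telnet example above costs at most one map lookup per received byte instead of one string comparison per pattern.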

Maybe I am overthinking it: when only a few protocols need to be compared, such a design may not bring any fundamental efficiency gain. But I enjoy this constant push for efficiency. ^^

Regular expressions are more flexible than packet prefix matching, so I took an approach similar to sslh and added support for them. For reasons of efficiency and implementation, a few restrictions apply to regular expression rules, such as having to know the minimum and maximum length of the target string. The regular expressions are tried one by one only once the packet buffer has reached the minimum length, and matching is abandoned once the buffer has reached the maximum length without a match.

func (p *REGEX) Probe(header []byte) (result ProbeResult, address string) {
    if p.MinLength > 0 && len(header) < p.MinLength {
        return TRYAGAIN, ""
    }
    for _, re := range p.regexpList {
        if re.Match(header) {
            return MATCH, p.Address
        }
    }

    if p.MaxLength > 0 && len(header) >= p.MaxLength {
        return UNMATCH, ""
    }

    return TRYAGAIN, ""
}

Based on the two simple matching rules above, it is easy to build matchers for common protocols such as SSH and HTTP. In the implementation, I added built-in support for some common protocols to save users the trouble of defining them.

    case "ssh":
        service = "prefix"
        p = &PREFIX{ps.BaseConfig, []string{"SSH"}}
    case "http":
        service = "prefix"
        p = &PREFIX{ps.BaseConfig, []string{"GET ", "POST ", "PUT ", "DELETE ", "HEAD ", "OPTIONS "}}

Special protocols need their own implementation. For example, the MQTT protocol I need cannot be recognized by simple string comparison or by a regular expression, because it has no fixed signature and its structure is not of fixed length. MQTT recognition is implemented as follows:

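func (s *MQTT) Probe(header []byte) (result ProbeResult, address string) {
    // Wait until at least the packet type byte has arrived.
    if len(header) == 0 {
        return TRYAGAIN, ""
    }

    // A client's first packet must be CONNECT, whose fixed header starts with 0x10.
    if header[0] != 0x10 {
        return UNMATCH, ""
    }

    // 1 type byte + up to 4 length bytes + 8 bytes of "\x00\x06MQIsdp".
    if len(header) < 13 {
        return TRYAGAIN, ""
    }

    // Skip the variable-length "remaining length" field (1 to 4 bytes);
    // the high bit of each byte marks a continuation.
    i := 1
    for ; ; i++ {
        if header[i]&0x80 == 0 {
            break
        }

        if i == 4 {
            return UNMATCH, ""
        }
    }

    i++

    // The protocol name is "MQIsdp" in MQTT 3.1 and "MQTT" in MQTT 3.1.1.
    if bytes.Equal(header[i:i+8], []byte("\x00\x06MQIsdp")) || bytes.Equal(header[i:i+6], []byte("\x00\x04MQTT")) {
        return MATCH, s.Address
    }

    return UNMATCH, ""
}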
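The reason a fixed prefix does not work for MQTT is the variable-length "remaining length" field that sits between the packet type byte and the protocol name. The sketch below shows that encoding and how the offset of the protocol name shifts with it. The helper names encodeRemainingLength and protocolNameOffset are made up for this example and are not part of Switcher.

```go
package main

import "fmt"

// encodeRemainingLength encodes n with MQTT's variable-length scheme:
// 7 data bits per byte, with the high bit set when more bytes follow.
func encodeRemainingLength(n int) []byte {
	var out []byte
	for {
		b := byte(n % 128)
		n /= 128
		if n > 0 {
			b |= 0x80
		}
		out = append(out, b)
		if n == 0 {
			return out
		}
	}
}

// protocolNameOffset returns the index at which the protocol name field
// starts, or -1 if the length field is incomplete or longer than 4 bytes.
func protocolNameOffset(header []byte) int {
	for i := 1; i <= 4 && i < len(header); i++ {
		if header[i]&0x80 == 0 {
			return i + 1
		}
	}
	return -1
}

func main() {
	for _, n := range []int{12, 200, 20000} {
		header := append([]byte{0x10}, encodeRemainingLength(n)...)
		header = append(header, "\x00\x04MQTT"...)
		off := protocolNameOffset(header)
		fmt.Printf("remaining length %d: header % x, name at offset %d\n", n, header, off)
	}
}
```

Because the protocol name can start at offset 2, 3, 4 or 5, neither a byte prefix nor a simple anchored pattern describes every valid CONNECT packet.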

The configuration file uses JSON, mainly for convenience and flexibility. Here is an example:

{
    "listen": ":80",
    "default": "127.0.0.1:80",
    "timeout": 1,
    "connect_timeout": 1,
    "protocols": [
        {
            "service": "ssh",
            "addr": "127.0.0.1:22"
        },
        {
            "service": "mqtt",
            "addr": "127.0.0.1:1883"
        },
        {
            "name": "custom_http",
            "service": "regex",
            "addr": "127.0.0.1:8080",
            "patterns": [
                "^(GET|POST|PUT|DELETE|HEAD|\\x4FPTIONS) "
            ]
        },
        {
            "service": "prefix",
            "addr": "127.0.0.1:8081",
            "patterns": [
                "GET ",
                "POST "
            ]
        }
    ]
}
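A configuration of this shape can be loaded with Go's encoding/json package. The sketch below is an assumption about how that might look: the JSON keys come from the example above, but the struct and function names are invented here and may differ from Switcher's own types. The unit of the two timeout values is not stated in the example, so they are kept as plain integers.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// protocolConfig describes one entry of the "protocols" array.
type protocolConfig struct {
	Name     string   `json:"name"`
	Service  string   `json:"service"`
	Addr     string   `json:"addr"`
	Patterns []string `json:"patterns"`
}

// config is the top-level document (hypothetical type name).
type config struct {
	Listen         string           `json:"listen"`
	Default        string           `json:"default"`
	Timeout        int              `json:"timeout"`
	ConnectTimeout int              `json:"connect_timeout"`
	Protocols      []protocolConfig `json:"protocols"`
}

// parseConfig decodes a JSON document into a config value.
func parseConfig(data []byte) (*config, error) {
	var c config
	if err := json.Unmarshal(data, &c); err != nil {
		return nil, err
	}
	return &c, nil
}

func main() {
	raw := []byte(`{
		"listen": ":80",
		"default": "127.0.0.1:80",
		"timeout": 1,
		"connect_timeout": 1,
		"protocols": [
			{"service": "ssh", "addr": "127.0.0.1:22"},
			{"service": "prefix", "addr": "127.0.0.1:8081", "patterns": ["GET ", "POST "]}
		]
	}`)

	cfg, err := parseConfig(raw)
	if err != nil {
		fmt.Println("invalid configuration:", err)
		return
	}
	for _, p := range cfg.Protocols {
		fmt.Printf("%s -> %s %q\n", p.Service, p.Addr, p.Patterns)
	}
}
```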

Performance testing

Preparation

First, prepare a simple Web service application. I had previously written a simple script with Python+bjoern to test network bandwidth, but I could not find it after searching for quite a while, so I wrote a new one in Go. Given a value N in the URL, it returns N characters.

package main

import (
    "bytes"
    "flag"
    "fmt"
    "log"
    "net/http"
    "regexp"
    "strconv"
    "strings"
)

func defaultHandler(w http.ResponseWriter, r *http.Request) {
    if r.URL.Path == "/" {
        fmt.Fprintln(w, "It works.")
        return
    }

    myHandler(w, r)
}

func myHandler(w http.ResponseWriter, r *http.Request) {
    re := regexp.MustCompile(`^/(\d+)([kKmMgGtT]?)$`)
    match := re.FindStringSubmatch(r.URL.Path)
    if match == nil {
        http.NotFound(w, r)
        return
    }

    buffSize := 20480
    buff := bytes.Repeat([]byte{'X'}, buffSize)

    size, _ := strconv.ParseInt(match[1], 10, 64)
    switch strings.ToLower(match[2]) {
    case "k":
        size *= 1 << 10
    case "m":
        size *= 1 << 20
    case "g":
        size *= 1 << 30
    case "t":
        size *= 1 << 40
    }

    w.Header().Set("Content-Length", strconv.FormatInt(size, 10))
    for buffSize := int64(buffSize); size >= buffSize; size -= buffSize {
        w.Write(buff)
    }
    if size > 0 {
        w.Write(bytes.Repeat([]byte{'X'}, int(size)))
    }

}

func main() {
    portPtr := flag.Int("port", 8080, "Port to listen on")

    flag.Parse()

    http.HandleFunc("/", defaultHandler)
    err := http.ListenAndServe(fmt.Sprintf(":%d", *portPtr), nil)
    if err != nil {
        log.Fatal("ListenAndServe: ", err)
    }
}

Compile and run it, then check that the Web server works properly.

$ go build test.go
$ ./test -port 9999 &
$ curl localhost:9999/1
X
$ curl localhost:9999/10
XXXXXXXXXX
$ curl -o /dev/null localhost:9999/10g
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 10.0G  100 10.0G    0     0  1437M      0  0:00:07  0:00:07 --:--:-- 1469M

As in the demonstration above, I used curl to download a large file to test the network I/O rate. The test traffic did not go through a physical network card but directly through the loopback interface. This allows a more objective comparison of how much the rate drops after going through a proxy.

In addition, a stress testing tool similar to Apache ab is used to test Web response speed under high concurrency. Here I use boom, which is even more aggressive than ab. It is an open source load testing tool written in Go and was recently renamed hey. According to its home page, the rename was due to a name conflict with a Python load testing tool also called Boom!. Installation and use are simple:

$ go get -u github.com/rakyll/hey
$ $GOPATH/bin/hey http://localhost:9999/1
......
All requests done.

Summary:
  Total:        0.0223 secs
  Slowest:      0.0182 secs
  Fastest:      0.0002 secs
  Average:      0.0039 secs
  Requests/sec: 8962.9371
  Total data:   200 bytes
  Size/request: 1 bytes
......

Download and install my modified version of Switcher:

$ go get github.com/jackyspy/switcher

sslh is started with the following command:

$ sudo sslh-select -n -p 127.0.0.1:9998 --ssh 127.0.0.1:22 --http 127.0.0.1:9999

Test Switcher with the following configuration file default.cfg:

{
    "listen": ":9997",
    "default": "127.0.0.1:22",
    "timeout": 1,
    "connect_timeout": 1,
    "protocols": [
        {
            "service": "ssh",
            "addr": "127.0.0.1:22"
        },
        {
            "service": "http",
            "addr": "127.0.0.1:9999"
        }
    ]
}

The tests mainly use the following two commands. To test sslh and Switcher, just change the port number.

$ curl -o /dev/null localhost:9999/10g
$ $GOPATH/bin/hey -n 100000 http://localhost:9999/1

OK, everything is ready; all that remains is to run the tests.

Start testing

The test has two parts. The first measures the download rate of a large file; to avoid being limited by the network card, it runs locally. The second measures how many concurrent Web requests can be handled, with the tests initiated from another computer.

To reduce manual work, I wrote a short Python script that measures the speed several times and prints the results.

# coding=utf-8
from __future__ import print_function
import itertools
from subprocess import check_output


def get_speed(port):
    cmd = 'curl -o /dev/null -s -w %{{speed_download}} localhost:{}/10g'.format(port)  # noqa
    speed = check_output(cmd.split())
    return float(speed)


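def test_multi_times(port, times):
    # Build a real list so the caller can append the average
    # (map returns a lazy iterator on Python 3).
    return list(map(get_speed, itertools.repeat(port, times)))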


def format_speed(speed):
    return str(int(0.5 + speed / 1024 / 1024))


def main():
    testcases = {
        'Direct': 9999,
        'sslh': 9998,
        'switcher': 9997
    }

    count = 10

    print('| Target | {} | Avg | '.format(
        ' | '.join(str(x) for x in range(1, count + 1))))
    print(' --: '.join('|' * (count + 3)))
    for name, port in testcases.items():
        speed_list = test_multi_times(port, count)
        speed_list.append(sum(speed_list) / len(speed_list))
        print('|{}|{}|'.format(name, '|'.join(map(format_speed, speed_list))))


if __name__ == '__main__':
    main()

The results are as follows (speed unit is MB/s):

Target	1	2	3	4	5	6	7	8	9	10	Avg
switcher	870	876	924	915	885	928	904	880	909	898	899
sslh	866	865	860	880	865	861	866	863	864	856	865
Direct	1446	1505	1392	1362	1423	1419	1395	1492	1412	1427	1427

The download rate drops significantly behind a proxy: the average falls by about 37% with switcher and about 39% with sslh compared with the direct connection. sslh is slightly slower than switcher, but the difference is small.

Similarly, to make the concurrent request test easier, I wrote another script:

# coding=utf-8
from __future__ import print_function
import itertools
from subprocess import check_output


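def get_speed(url):
    cmd = "hey -n 100000 -c 50 {}  | grep 'Requests/sec'".format(url)  # noqa
    # universal_newlines=True makes check_output return text on Python 3 too.
    output = check_output(cmd, shell=True, universal_newlines=True)
    return float(output.partition(':')[2])


def test_multi_times(url, times):
    # Build a real list so the caller can append the average
    # (map returns a lazy iterator on Python 3).
    return list(map(get_speed, itertools.repeat(url, times)))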


def main():
    testcases = {
        'Direct': 'http://x.x.x.x:9999/1',
        'sslh': 'http://x.x.x.x:9998/1',
        'switcher': 'http://x.x.x.x:9997/1'
    }

    count = 10

    print('| Target | {} | Average | '.format(
        ' | '.join(str(x) for x in range(1, count + 1))))
    print(' --: '.join('|' * (count + 3)))
    for name, url in testcases.items():
        speed_list = test_multi_times(url, count)
        speed_list.append(sum(speed_list) / len(speed_list))
        # '{:.0f}' already rounds to the nearest integer.
        print('|{}|{}|'.format(name, '|'.join('{:.0f}'.format(x)
                                              for x in speed_list)))


if __name__ == '__main__':
    main()

The results are as follows (the unit of speed is Requests/s):

Target	1	2	3	4	5	6	7	8	9	10	Average
switcher	14367	14886	15144	14289	15456	14834	14871	14951	14610	14865	14827
sslh	13892	14281	14469	14352	14468	14132	14510	14565	14633	14555	14386
Direct	20494	20110	20558	19519	19467	19891	19777	19682	20737	20396	20063

As in the previous test, the request rate also drops significantly behind a proxy: the average falls by about 26% with switcher and about 28% with sslh. sslh is again slightly slower than switcher, but the difference is small.
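The relative cost of each proxy can be worked out from the averages in the two result tables. The short program below does that arithmetic; the function name dropPercent is chosen for this example only.

```go
package main

import "fmt"

// dropPercent returns how far proxied falls below direct, in percent.
func dropPercent(direct, proxied float64) float64 {
	return (direct - proxied) / direct * 100
}

func main() {
	// Average download rates in MB/s from the first table.
	fmt.Printf("download, switcher: %.1f%%\n", dropPercent(1427, 899))
	fmt.Printf("download, sslh:     %.1f%%\n", dropPercent(1427, 865))

	// Average request rates in Requests/s from the second table.
	fmt.Printf("requests, switcher: %.1f%%\n", dropPercent(20063, 14827))
	fmt.Printf("requests, sslh:     %.1f%%\n", dropPercent(20063, 14386))
}
```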

More application scenarios?

The network port multiplexing described in this article is essentially a TCP application proxy. Building on this, many other application scenarios are possible.
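The proxy step itself, once a protocol has been recognized, can be sketched as follows. This is a simplified illustration rather than Switcher's code: the function names relay and demo are made up, and the demo uses in-memory net.Pipe connections in place of real sockets, with an echo goroutine standing in for the backend. In a real proxy the backend connection would come from net.Dial with the address chosen by the probe.

```go
package main

import (
	"fmt"
	"io"
	"net"
)

// relay replays the bytes already consumed for protocol detection to the
// backend, then copies data in both directions until one side finishes.
func relay(client, backend net.Conn, header []byte) error {
	defer client.Close()
	defer backend.Close()

	if _, err := backend.Write(header); err != nil {
		return err
	}
	done := make(chan struct{}, 2)
	go func() { io.Copy(backend, client); done <- struct{}{} }()
	go func() { io.Copy(client, backend); done <- struct{}{} }()
	<-done // the deferred Close calls unblock the other direction
	return nil
}

// demo sends one message through the relay to an echo backend.
func demo() (string, error) {
	clientSide, proxyClient := net.Pipe()
	proxyBackend, backendSide := net.Pipe()
	defer clientSide.Close()

	// Echo backend: sends back whatever it receives.
	go func() {
		defer backendSide.Close()
		io.Copy(backendSide, backendSide)
	}()

	// Proxy: read a few bytes for detection, then hand over to relay.
	go func() {
		header := make([]byte, 4)
		if _, err := io.ReadFull(proxyClient, header); err != nil {
			proxyClient.Close()
			return
		}
		relay(proxyClient, proxyBackend, header)
	}()

	msg := "SSH-2.0-demo\n"
	go clientSide.Write([]byte(msg))

	reply := make([]byte, len(msg))
	if _, err := io.ReadFull(clientSide, reply); err != nil {
		return "", err
	}
	return string(reply), nil
}

func main() {
	reply, err := demo()
	fmt.Printf("reply=%q err=%v\n", reply, err)
}
```

The important detail is that the detection bytes must be written to the backend before the copy loops start; otherwise the backend would see a truncated stream.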

One scenario I came up with is dynamic IP authentication. HTTP and SSH share one port. By default, HTTP is open to everyone, but SSH packets are forwarded only after the client's IP address has been authorized. Unlike IP access rules implemented by firewalls such as iptables, the restriction is applied at the application level, which makes it very flexible: entries can be added and removed dynamically by a program. For example, I open a specific authentication page in a mobile browser. After verification, the system automatically adds my current IP address to the access list, and I can then connect to the server through SSH. Once the connection is established, the temporary IP address can be removed from the access list, which improves the security of the server to a certain extent.
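A minimal sketch of such an access list is shown below. Everything in it is hypothetical: the type allowList, its methods and the one-minute defaultTTL are assumptions for illustration, and the article does not describe an actual implementation. The authentication page would call allow, and the multiplexer would call consume before forwarding an SSH connection.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// defaultTTL is an assumed validity period for one authorization.
const defaultTTL = time.Minute

// allowList is a hypothetical in-memory store of temporarily authorized IPs.
type allowList struct {
	mu      sync.Mutex
	entries map[string]time.Time // IP address -> expiry time
}

func newAllowList() *allowList {
	return &allowList{entries: make(map[string]time.Time)}
}

// allow authorizes ip until ttl has elapsed.
func (a *allowList) allow(ip string, ttl time.Duration) {
	a.mu.Lock()
	defer a.mu.Unlock()
	a.entries[ip] = time.Now().Add(ttl)
}

// consume reports whether ip is currently authorized and removes the entry,
// so each authorization admits a single connection.
func (a *allowList) consume(ip string) bool {
	a.mu.Lock()
	defer a.mu.Unlock()
	expiry, ok := a.entries[ip]
	if !ok {
		return false
	}
	delete(a.entries, ip)
	return time.Now().Before(expiry)
}

func main() {
	list := newAllowList()
	ip := "203.0.113.7" // documentation address used as a placeholder

	fmt.Println("before authentication:", list.consume(ip))
	list.allow(ip, defaultTTL)
	fmt.Println("after authentication: ", list.consume(ip))
	fmt.Println("second attempt:       ", list.consume(ip))
}
```

In the multiplexer, the client address could be obtained with net.SplitHostPort(conn.RemoteAddr().String()) before calling consume.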

Posted by jon23d on Thu, 21 Mar 2019 03:54:52 -0700
