Garbled Chinese characters in GET request parameters under Tomcat 8.0.0

Keywords: encoding, Tomcat, Apache, Attribute

Problem Description:

Even though UTF-8 has been set as the character encoding in the application's filter, Chinese characters in requests sent with GET are still decoded as garbled text, while Chinese characters in requests sent with POST come through correctly.

Reason:

Tomcat handles the parameters of the two request methods inconsistently, and this should be treated as a bug. Tomcat versions after 8.0.0 fix the problem to some extent.

Code analysis:

Tomcat uses the class org.apache.tomcat.util.http.Parameters to parse parameters sent by the front end. The method invoked when parsing GET parameters is as follows:

    /**
     * Process the query string into parameters
     */
    public void handleQueryParameters() {
        if( didQueryParameters ) {
            return;
        }

        didQueryParameters=true;

        if( queryMB==null || queryMB.isNull() ) {
            return;
        }

        if(log.isDebugEnabled()) {
            log.debug("Decoding query " + decodedQuery + " " +
                    queryStringEncoding);
        }

        try {
            decodedQuery.duplicate( queryMB );
        } catch (IOException e) {
            // Can't happen, as decodedQuery can't overflow
            e.printStackTrace();
        }
        processParameters( decodedQuery, queryStringEncoding );
    }

Note the last call, processParameters( decodedQuery, queryStringEncoding ): the character encoding comes from the queryStringEncoding property of the Parameters object, and if this property is null, the default encoding ISO-8859-1 is used. Application code cannot set this field directly; Tomcat assigns it when it creates the request object:
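To see why a null queryStringEncoding garbles Chinese text, the sketch below percent-decodes the UTF-8 form of "中文" twice: once with the ISO-8859-1 fallback and once with UTF-8. The helper decodeParam is hypothetical; it only mimics the "null means ISO-8859-1" rule described above using the JDK's URLDecoder, not Tomcat's own decoder.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

public class QueryDecodingDemo {

    // Hypothetical helper mimicking the fallback described above:
    // a null encoding is treated as ISO-8859-1.
    static String decodeParam(String raw, String encoding) {
        String charset = (encoding == null) ? "ISO-8859-1" : encoding;
        try {
            return URLDecoder.decode(raw, charset);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unknown encoding: " + charset, e);
        }
    }

    public static void main(String[] args) {
        // "中文" as a browser sends it in a GET query string (UTF-8, percent-encoded).
        String raw = "%E4%B8%AD%E6%96%87";
        System.out.println(decodeParam(raw, null));    // garbled: one Latin-1 char per byte
        System.out.println(decodeParam(raw, "UTF-8")); // the two original Chinese characters
    }
}
```

With the fallback, the six UTF-8 bytes become six separate Latin-1 characters instead of two Chinese characters, which is exactly the garbling seen on GET requests.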

        Request request = (Request) req.getNote(ADAPTER_NOTES);
        Response response = (Response) res.getNote(ADAPTER_NOTES);

        if (request == null) {

            // Create objects
            request = connector.createRequest();
            request.setCoyoteRequest(req);
            response = connector.createResponse();
            response.setCoyoteResponse(res);

            // Link objects
            request.setResponse(response);
            response.setRequest(request);

            // Set as notes
            req.setNote(ADAPTER_NOTES, request);
            res.setNote(ADAPTER_NOTES, response);

            // Set query string encoding
            req.getParameters().setQueryStringEncoding
                (connector.getURIEncoding());

        }

The code above is part of org.apache.catalina.connector.CoyoteAdapter. Its last statement, req.getParameters().setQueryStringEncoding(connector.getURIEncoding()), shows that queryStringEncoding is supplied by the Connector. However, the Connector's constructor leaves URIEncoding as null (notably in Tomcat 8.0.0-RC1), so all parameters passed via GET are decoded as ISO-8859-1 and Chinese characters come out garbled.
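Because ISO-8859-1 maps every byte 0x00-0xFF to exactly one character, decoding UTF-8 bytes as ISO-8859-1 loses nothing: the original bytes can be recovered and decoded again. The sketch below shows this with a hypothetical repair helper. This re-decoding trick is a commonly used workaround, not one of the solutions from this post, and it assumes the value was decoded as ISO-8859-1 exactly once.

```java
import java.nio.charset.StandardCharsets;

public class Latin1Repair {

    // Hypothetical helper: turn the garbled string back into its original
    // bytes (ISO-8859-1 is a one-byte-per-char mapping), then decode as UTF-8.
    static String repair(String garbled) {
        byte[] original = garbled.getBytes(StandardCharsets.ISO_8859_1);
        return new String(original, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] utf8 = "\u4e2d\u6587".getBytes(StandardCharsets.UTF_8); // "中文" in UTF-8
        // Simulate the ISO-8859-1 decoding applied when URIEncoding is null.
        String garbled = new String(utf8, StandardCharsets.ISO_8859_1);
        System.out.println(garbled.length());                        // 6
        System.out.println(repair(garbled).equals("\u4e2d\u6587"));  // true
    }
}
```

Apply this only when the server is known to decode GET parameters as ISO-8859-1. On a Tomcat that already decodes them as UTF-8, the Chinese characters cannot be mapped to ISO-8859-1 and the same call would turn correct text into question marks.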

POST requests do not have this problem, because the method that parses POST parameters is:

    public void processParameters( byte bytes[], int start, int len ) {
        processParameters(bytes, start, len, getCharset(encoding));
    }

Here the encoding comes from the encoding property of the Parameters object. Its value is resolved in this order: the encoding set in user code (via request.setCharacterEncoding), then the charset in the request's Content-Type header, then Tomcat's default. As long as the encoding is set explicitly before the parameters are parsed, the text is not garbled.
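The lookup of the body encoding can be modelled as below. This is a hypothetical sketch, not Tomcat's code: resolve and charsetFromContentType are illustrative names, and the order assumes, following the Servlet API documentation of setCharacterEncoding (it "overrides" the body encoding), that an explicitly set encoding wins over the Content-Type charset, with ISO-8859-1 as the final fallback.

```java
import java.util.Locale;

public class BodyEncodingResolver {

    // Fallback used when neither user code nor the header names a charset.
    static final String DEFAULT_ENCODING = "ISO-8859-1";

    // Hypothetical model of the lookup order; not Tomcat's actual API.
    static String resolve(String userEncoding, String contentType) {
        if (userEncoding != null) {
            return userEncoding;                 // set via request.setCharacterEncoding(...)
        }
        String fromHeader = charsetFromContentType(contentType);
        if (fromHeader != null) {
            return fromHeader;                   // e.g. "...; charset=UTF-8"
        }
        return DEFAULT_ENCODING;
    }

    // Extracts the charset parameter from a Content-Type value, or returns null.
    static String charsetFromContentType(String contentType) {
        if (contentType == null) {
            return null;
        }
        for (String part : contentType.split(";")) {
            String p = part.trim();
            if (p.toLowerCase(Locale.ENGLISH).startsWith("charset=")) {
                String value = p.substring("charset=".length()).trim();
                // Strip optional surrounding quotes, e.g. charset="UTF-8".
                if (value.length() >= 2 && value.startsWith("\"") && value.endsWith("\"")) {
                    value = value.substring(1, value.length() - 1);
                }
                return value.isEmpty() ? null : value;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        System.out.println(resolve("UTF-8", "text/plain; charset=GBK"));                   // UTF-8
        System.out.println(resolve(null, "application/x-www-form-urlencoded; charset=GBK")); // GBK
        System.out.println(resolve(null, "application/x-www-form-urlencoded"));              // ISO-8859-1
    }
}
```

Browsers usually submit forms without a charset parameter, which is why an explicit setCharacterEncoding call in a filter is what makes POST parameters decode correctly.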

Solution:

1. Upgrade Tomcat. Later versions avoid this problem by setting the character encoding in the Connector constructor:

        if (!Globals.STRICT_SERVLET_COMPLIANCE) {
            URIEncoding = "UTF-8";
            URIEncodingLower = URIEncoding.toLowerCase(Locale.ENGLISH);
        }

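2. In a filter, use reflection to change the value of request.request.coyoteRequest.parameters.queryStringEncoding before any parameters are read, since handleQueryParameters returns early once didQueryParameters is set. (request.request is not a typo: the first request is the object passed to the filter, and the second is the request field inside it.)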
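The reflective approach in item 2 can be sketched with a small helper that walks a dotted field path and sets the last field. This is a hedged sketch under stated assumptions: setByPath, readField, findField and the nested stand-in classes are hypothetical, modelled on the field names in the path above, and they assume the filter receives Tomcat's RequestFacade, whose request field wraps org.apache.catalina.connector.Request. These fields are Tomcat internals and may differ between versions; if the request has been wrapped by another filter, or a SecurityManager denies setAccessible, the call will fail.

```java
import java.lang.reflect.Field;

public class FieldPathSetter {

    // Sets the field at the end of a dotted path, starting from root,
    // e.g. "request.coyoteRequest.parameters.queryStringEncoding".
    static void setByPath(Object root, String path, Object value) throws ReflectiveOperationException {
        String[] names = path.split("\\.");
        Object target = root;
        for (int i = 0; i < names.length - 1; i++) {
            target = readField(target, names[i]);
        }
        Field last = findField(target.getClass(), names[names.length - 1]);
        last.setAccessible(true);
        last.set(target, value);
    }

    // Reads a (possibly non-public) field by name.
    static Object readField(Object obj, String name) throws ReflectiveOperationException {
        Field f = findField(obj.getClass(), name);
        f.setAccessible(true);
        return f.get(obj);
    }

    // Walks up the class hierarchy, since a field may be declared in a superclass.
    static Field findField(Class<?> type, String name) throws NoSuchFieldException {
        for (Class<?> c = type; c != null; c = c.getSuperclass()) {
            try {
                return c.getDeclaredField(name);
            } catch (NoSuchFieldException ignored) {
                // not declared here; keep searching in the superclass
            }
        }
        throw new NoSuchFieldException(name);
    }

    // Hypothetical stand-ins for RequestFacade -> Request -> coyote Request -> Parameters.
    static class Parameters { private String queryStringEncoding; }
    static class CoyoteRequest { private final Parameters parameters = new Parameters(); }
    static class CatalinaRequest { protected CoyoteRequest coyoteRequest = new CoyoteRequest(); }
    static class RequestFacade { protected CatalinaRequest request = new CatalinaRequest(); }

    public static void main(String[] args) throws Exception {
        RequestFacade facade = new RequestFacade();
        setByPath(facade, "request.coyoteRequest.parameters.queryStringEncoding", "UTF-8");
        Object params = readField(readField(readField(facade, "request"), "coyoteRequest"), "parameters");
        System.out.println(readField(params, "queryStringEncoding")); // UTF-8
    }
}
```

In a real filter the root would be the ServletRequest passed to doFilter, and the call has to happen before the first getParameter call, otherwise the query string has already been decoded with the old encoding.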

Posted by cbullock on Fri, 22 Mar 2019 09:57:53 -0700