上一篇文章中介绍了tomcat的连接处理,在连接建立完毕后,tomcat通过Nio读取数据然后交给Http11Processor的service方法进行处理,整个处理过程大致分为协议解析和请求处理两步,这里大致介绍一下协议解析的过程。协议的解析过程在Http11InputBuffer中,主要工作由parseRequestLine和parseHeaders两个函数完成,前者负责解析request line(包括method, url, protocol),后者负责读取request header。

parseRequestLine

parseRequestLine比较长,不过代码结构相对来说比较清晰。rfc协议规定request line由method, request-uri和http-version三部分组成,中间通过分隔符进行分割。

The Request-Line begins with a method token, followed by the Request-URI and the protocol version, and ending with CRLF. The elements are separated by SP characters. No CR or LF is allowed except in the final CRLF sequence. Request-Line = Method SP Request-URI SP HTTP-Version CRLF

parseRequestLine基本上可以按照rfc协议划分为几个阶段,每个阶段有对应的parsingRequestLinePhase值来标示,每一个阶段结束后parsingRequestLinePhase都会被赋值进入到下一个阶段。如果某个阶段没有读取到数据,跳出解析后,下一次读取到数据再次进入到parseRequestLine则会根据上一次退出时parsingRequestLinePhase的状态直接进入到对应的阶段。

阶段1 跳过所有空行

        if (parsingRequestLinePhase < 2) {
            byte chr = 0;
            do {

                // Read new bytes if needed
                if (byteBuffer.position() >= byteBuffer.limit()) {
                    if (keptAlive) {
                        // Haven't read any request data yet so use the keep-alive
                        // timeout.
                        wrapper.setReadTimeout(wrapper.getEndpoint().getKeepAliveTimeout());
                    }
                    if (!fill(false)) {
                        // A read is pending, so no longer in initial state
                        parsingRequestLinePhase = 1;
                        return false;
                    }
                    // At least one byte of the request has been received.
                    // Switch to the socket timeout.
                    wrapper.setReadTimeout(wrapper.getEndpoint().getSoTimeout());
                }
                if (!keptAlive && byteBuffer.position() == 0 && byteBuffer.limit() >= CLIENT_PREFACE_START.length - 1) {
                    boolean prefaceMatch = true;
                    for (int i = 0; i < CLIENT_PREFACE_START.length && prefaceMatch; i++) {
                        if (CLIENT_PREFACE_START[i] != byteBuffer.get(i)) {
                            prefaceMatch = false;
                        }
                    }
                    if (prefaceMatch) {
                        // HTTP/2 preface matched
                        parsingRequestLinePhase = -1;
                        return false;
                    }
                }
                // Set the start time once we start reading data (even if it is
                // just skipping blank lines)
                if (request.getStartTime() < 0) {
                    request.setStartTime(System.currentTimeMillis());
                }
                chr = byteBuffer.get();
            } while ((chr == Constants.CR) || (chr == Constants.LF));
            byteBuffer.position(byteBuffer.position() - 1);

            parsingRequestLineStart = byteBuffer.position();
            parsingRequestLinePhase = 2;
            if (log.isDebugEnabled()) {
                log.debug("Received ["
                        + new String(byteBuffer.array(), byteBuffer.position(), byteBuffer.remaining(), StandardCharsets.ISO_8859_1) + "]");
            }
        }

首先会判断当前bytebuffer的position是否大于limit,如果大于等于则表示buffer中的所有数据已经处理完成,需要进一步读取数据。这时候则调用fill方法读取数据,如果读取到的数据是0字节,则将parsingRequestLine标记为1,并且退出parseRequestLine,service方法会根据当前状态进行下一步处理。如果读取到数据,则设置socket的timeout,并且逐个字节的从bytebuffer中读取数据,判断字符是否是\r或者\n,如果是则开始下一次循环。否则跳出循环,将position设置为position - 1,这是因为上一个读取到的字符是合法字符,后续还需要用到,同时标记parsingRequestLin为2,进入阶段2

阶段2 获取方法名 GET/POST等

        if (parsingRequestLinePhase == 2) {
            //
            // Reading the method name
            // Method name is a token
            //
            boolean space = false;
            while (!space) {
                // Read new bytes if needed
                if (byteBuffer.position() >= byteBuffer.limit()) {
                    if (!fill(false)) // request line parsing
                        return false;
                }
                // Spec says method name is a token followed by a single SP but
                // also be tolerant of multiple SP and/or HT.
                int pos = byteBuffer.position();
                byte chr = byteBuffer.get();
                if (chr == Constants.SP || chr == Constants.HT) {
                    space = true;
                    request.method().setBytes(byteBuffer.array(), parsingRequestLineStart,
                            pos - parsingRequestLineStart);
                } else if (!HttpParser.isToken(chr)) {
                    byteBuffer.position(byteBuffer.position() - 1);
                    throw new IllegalArgumentException(sm.getString("iib.invalidmethod"));
                }
            }
            parsingRequestLinePhase = 3;
        }

阶段2会一直读取字符如果字符不是空格或者\t,则进入下一次循环,否则根据parsingRequestLineStart和当前position从bytebuffer中读取对应的方法名,parsingRequestLineStart作为起始位置,position作为结束位置。

阶段3 跳过空格和\t

        if (parsingRequestLinePhase == 3) {
            // Spec says single SP but also be tolerant of multiple SP and/or HT
            boolean space = true;
            while (space) {
                // Read new bytes if needed
                if (byteBuffer.position() >= byteBuffer.limit()) {
                    if (!fill(false)) // request line parsing
                        return false;
                }
                byte chr = byteBuffer.get();
                if (!(chr == Constants.SP || chr == Constants.HT)) {
                    space = false;
                    byteBuffer.position(byteBuffer.position() - 1);
                }
            }
            parsingRequestLineStart = byteBuffer.position();
            parsingRequestLinePhase = 4;
        }

协议规定method后可以出现多个空格或者\t,所以在阶段3对这种情况做特殊处理,跳过空格和\t。

阶段4 读取url

        if (parsingRequestLinePhase == 4) {
            // Mark the current buffer position

            int end = 0;
            //
            // Reading the URI
            //
            boolean space = false;
            while (!space) {
                // Read new bytes if needed
                if (byteBuffer.position() >= byteBuffer.limit()) {
                    if (!fill(false)) // request line parsing
                        return false;
                }
                int pos = byteBuffer.position();
                byte chr = byteBuffer.get();
                if (chr == Constants.SP || chr == Constants.HT) {
                    space = true;
                    end = pos;
                } else if (chr == Constants.CR || chr == Constants.LF) {
                    // HTTP/0.9 style request
                    parsingRequestLineEol = true;
                    space = true;
                    end = pos;
                } else if (chr == Constants.QUESTION && parsingRequestLineQPos == -1) {
                    parsingRequestLineQPos = pos;
                } else if (HttpParser.isNotRequestTarget(chr)) {
                    throw new IllegalArgumentException(sm.getString("iib.invalidRequestTarget"));
                }
            }
            if (parsingRequestLineQPos >= 0) {
                request.queryString().setBytes(byteBuffer.array(), parsingRequestLineQPos + 1,
                        end - parsingRequestLineQPos - 1);
                request.requestURI().setBytes(byteBuffer.array(), parsingRequestLineStart,
                        parsingRequestLineQPos - parsingRequestLineStart);
            } else {
                request.requestURI().setBytes(byteBuffer.array(), parsingRequestLineStart,
                        end - parsingRequestLineStart);
            }
            parsingRequestLinePhase = 5;
        }

当读取的字符串是问号并且parsingRequestLineQPos是-1的时候,设置parsingRequestLineQPos为当前位置,这种情况下表明请求的url中带有参数需要记录下来。当读取的字符是空格或者\t,\r或者\n(兼容HTTP/0.9)的时候则设置end为当前位置,并且标记跳出循环。如果标记了有查询条件(parsingRequestLineQPos >= 0), parsingRequestLineStart到parsingRequestLineQPos为url,parsingRequestLineQPos到end位查询条件,否则parsingRequestLineStart到end为url。

阶段5 跳过空格和\t

阶段6 读取协议protocol

        if (parsingRequestLinePhase == 6) {
            //
            // Reading the protocol
            // Protocol is always "HTTP/" DIGIT "." DIGIT
            //
            while (!parsingRequestLineEol) {
                // Read new bytes if needed
                if (byteBuffer.position() >= byteBuffer.limit()) {
                    if (!fill(false)) // request line parsing
                        return false;
                }

                int pos = byteBuffer.position();
                byte chr = byteBuffer.get();
                if (chr == Constants.CR) {
                    end = pos;
                } else if (chr == Constants.LF) {
                    if (end == 0) {
                        end = pos;
                    }
                    parsingRequestLineEol = true;
                } else if (!HttpParser.isHttpProtocol(chr)) {
                    throw new IllegalArgumentException(sm.getString("iib.invalidHttpProtocol"));
                }
            }

当读取到\n的时候结束protocol的解析

parseHeaders

首先我们需要了解下rfc中对header的定义,

message-header = field-name “:” [ field-value ] field-name = token field-value = ( field-content | LWS ) field-content = <the OCTETs making up the field-value and consisting of either *TEXT or combinations of token, separators, and quoted-string> LWS = [CRLF] 1( SP | HT )

header由field-name和field-value两部分,中间使用”:”分割。field-value由field-content组成,中间会出现LWS(空格,\t,\r,\n)等。

    /**
     * Parse the HTTP headers.
     */
    boolean parseHeaders() throws IOException {
        if (!parsingHeader) {
            throw new IllegalStateException(sm.getString("iib.parseheaders.ise.error"));
        }

        HeaderParseStatus status = HeaderParseStatus.HAVE_MORE_HEADERS;

        do {
            status = parseHeader();
            // Checking that
            // (1) Headers plus request line size does not exceed its limit
            // (2) There are enough bytes to avoid expanding the buffer when
            // reading body
            // Technically, (2) is technical limitation, (1) is logical
            // limitation to enforce the meaning of headerBufferSize
            // From the way how buf is allocated and how blank lines are being
            // read, it should be enough to check (1) only.
            if (byteBuffer.position() > headerBufferSize || byteBuffer.capacity() - byteBuffer.position() < socketReadBufferSize) {
                throw new IllegalArgumentException(sm.getString("iib.requestheadertoolarge.error"));
            }
        } while (status == HeaderParseStatus.HAVE_MORE_HEADERS);
        if (status == HeaderParseStatus.DONE) {
            parsingHeader = false;
            end = byteBuffer.position();
            return true;
        } else {
            return false;
        }
    }

parseHeaders主要由一个循环构成,循环内部不断读取header,如果出现以下两种情况则会抛出异常,

  1. request line的大小加上当前已经读取的header的大小超过了headerBufferSize
  2. byteBuffer的剩余容量比socketReadBufferSize要小

上面两种情况都会导致没有足够的空间存储后续的请求数据。 读取的工作主要由parseHeader来完成,解析过程和request line类似,也可以划分为几个步骤,

阶段1 跳过空行

        //
        // Check for blank line
        //

        byte chr = 0;
        while (headerParsePos == HeaderParsePosition.HEADER_START) {

            // Read new bytes if needed
            if (byteBuffer.position() >= byteBuffer.limit()) {
                if (!fill(false)) {// parse header
                    headerParsePos = HeaderParsePosition.HEADER_START;
                    return HeaderParseStatus.NEED_MORE_DATA;
                }
            }

            chr = byteBuffer.get();

            if (chr == Constants.CR) {
                // Skip
            } else if (chr == Constants.LF) {
                return HeaderParseStatus.DONE;
            } else {
                byteBuffer.position(byteBuffer.position() - 1);
                break;
            }

        }

        if (headerParsePos == HeaderParsePosition.HEADER_START) {
            // Mark the current buffer position
            headerData.start = byteBuffer.position();
            headerParsePos = HeaderParsePosition.HEADER_NAME;
        }

解析的过程中使用headerParsePos来标示当前状态,

    private static enum HeaderParsePosition {
        /**
         * 准备读取下一个HEADER, 如果在此状态下读取到\n\r则表示没有需要继续处理的HEADER了,
         * 如果读取到其他字符则进入到HEADER_NMAE阶段
         */
        HEADER_START,
        /**
         * 读取HEADER NAME。":"会出现在HEADER NAME之后,如果在":"之前出现任何non-HTTP_TOKEN_CHAR(包含)
         * 任意空白字符都会导致当前行被忽略
         */
        HEADER_NAME,
        /**
         * 该阶段用于跳过HEADER VALUE之前的空白字符,这个字符有可能出现在HEADER VALUE的首行,也可能出现在后续的
         * 行中。
         */
        HEADER_VALUE_START,
        /**
         * 读取数据。当状态处于HEADER_VALUE_START时,如果遇到第一个非空格和\t,则进入到HEADER_VALUE阶段
         */
        HEADER_VALUE,

        HEADER_MULTI_LINE,

        HEADER_SKIPLINE
    }

阶段2 读取header name

        //
        // Reading the header name
        // Header name is always US-ASCII
        //

        while (headerParsePos == HeaderParsePosition.HEADER_NAME) {

            // Read new bytes if needed
            if (byteBuffer.position() >= byteBuffer.limit()) {
                if (!fill(false)) { // parse header
                    return HeaderParseStatus.NEED_MORE_DATA;
                }
            }

            int pos = byteBuffer.position();
            chr = byteBuffer.get();
            if (chr == Constants.COLON) {
                headerParsePos = HeaderParsePosition.HEADER_VALUE_START;
                headerData.headerValue = headers.addValue(byteBuffer.array(), headerData.start,
                        pos - headerData.start);
                pos = byteBuffer.position();
                // Mark the current buffer position
                headerData.start = pos;
                headerData.realPos = pos;
                headerData.lastSignificantChar = pos;
                break;
            } else if (!HttpParser.isToken(chr)) {
                // If a non-token header is detected, skip the line and
                // ignore the header
                headerData.lastSignificantChar = pos;
                byteBuffer.position(byteBuffer.position() - 1);
                return skipLine();
            }

            // chr is next byte of header name. Convert to lowercase.
            if ((chr >= Constants.A) && (chr <= Constants.Z)) {
                byteBuffer.put(pos, (byte) (chr - Constants.LC_OFFSET));
            }
        }

如协议所示当读取到的字符是”:”的时候进入到下一个阶段HEADER_VALUE_START。果在”:”之前读取到了非法字符则进入skipLine,skipLine首先会设置当前状态为HEADER_SKIPLINE,然后不断跳过\r直到读取到\n,然后重新进入到HEADER_START阶段,返回HAVE_MORE_HEADERS,在这种情况下parseHeaders会进入到下一个header的读取

阶段3 读取value

在三种状态下都可以进入到阶段3,分别是HEADER_VALUE_START, HEADER_VALUE和HEADER_MULTI_LINE。如果当前阶段是HEADER_VALUE_START,则会不停的读取字符直到遇到非空格或者\t,然后进入到HEADER_VALUE阶段。

            if (headerParsePos == HeaderParsePosition.HEADER_VALUE_START) {
                // Skipping spaces
                while (true) {
                    // Read new bytes if needed
                    if (byteBuffer.position() >= byteBuffer.limit()) {
                        if (!fill(false)) {// parse header
                            // HEADER_VALUE_START
                            return HeaderParseStatus.NEED_MORE_DATA;
                        }
                    }

                    chr = byteBuffer.get();
                    if (!(chr == Constants.SP || chr == Constants.HT)) {
                        headerParsePos = HeaderParsePosition.HEADER_VALUE;
                        byteBuffer.position(byteBuffer.position() - 1);
                        break;
                    }
                }
            }

如果当前阶段是HEADER_VALUE,则读取数据直到遇到\n,然后将状态设置为HEADER_MULTI_LINE。

            if (headerParsePos == HeaderParsePosition.HEADER_VALUE) {

                // Reading bytes until the end of the line
                boolean eol = false;
                while (!eol) {

                    // Read new bytes if needed
                    if (byteBuffer.position() >= byteBuffer.limit()) {
                        if (!fill(false)) {// parse header
                            // HEADER_VALUE
                            return HeaderParseStatus.NEED_MORE_DATA;
                        }
                    }

                    chr = byteBuffer.get();
                    if (chr == Constants.CR) {
                        // Skip
                    } else if (chr == Constants.LF) {
                        eol = true;
                    } else if (chr == Constants.SP || chr == Constants.HT) {
                        byteBuffer.put(headerData.realPos, chr);
                        headerData.realPos++;
                    } else {
                        byteBuffer.put(headerData.realPos, chr);
                        headerData.realPos++;
                        headerData.lastSignificantChar = headerData.realPos;
                    }
                }

                // Ignore whitespaces at the end of the line
                headerData.realPos = headerData.lastSignificantChar;

                // Checking the first character of the new line. If the character
                // is a LWS, then it's a multiline header
                headerParsePos = HeaderParsePosition.HEADER_MULTI_LINE;
            }

如果当前状态是HEADER_MULTI_LINE,则会读取下一个字符,然后判断这个字符是否是空格或者\t,如果不是则表示进入到了下一个header,将当前转改标记为HEADER_START,退出循环。如果字符是空格或者\t,则表示当前header的value还没有读取完毕,重新设置状态为HEADER_VALUE_START,进入下一次循环继续读取数据。

            chr = byteBuffer.get(byteBuffer.position());
            if (headerParsePos == HeaderParsePosition.HEADER_MULTI_LINE) {
                if ((chr != Constants.SP) && (chr != Constants.HT)) {
                    headerParsePos = HeaderParsePosition.HEADER_START;
                    break;
                } else {
                    // Copying one extra space in the buffer (since there must
                    // be at least one space inserted between the lines)
                    byteBuffer.put(headerData.realPos, chr);
                    headerData.realPos++;
                    headerParsePos = HeaderParsePosition.HEADER_VALUE_START;
                }
            }
        }

当上面三步都处理完成的时候,将bytebuffer中的数据设置到header-value中。到这里tomcat解析协议的流程就差不多走完了。接下来将进入到请求处理阶段。