上一篇文章中介绍了tomcat的连接处理,在连接建立完毕后,tomcat通过Nio读取数据然后交给Http11Processor的service方法进行处理,整个处理过程大致分为协议解析和请求处理两步,这里大致介绍一下协议解析的过程。协议的解析过程在Http11InputBuffer中,主要工作由parseRequestLine和parseHeaders两个函数完成,前者负责解析request line(包括method, url, protocol),后者负责读取request header。
parseRequestLine
parseRequestLine比较长,不过代码结构相对来说比较清晰。rfc协议规定request line由method, request-uri和http-version三部分组成,中间通过分隔符进行分割。
The Request-Line begins with a method token, followed by the Request-URI and the protocol version, and ending with CRLF. The elements are separated by SP characters. No CR or LF is allowed except in the final CRLF sequence. Request-Line = Method SP Request-URI SP HTTP-Version CRLF
parseRequestLine基本上可以按照rfc协议划分为几个阶段,每个阶段有对应的parsingRequestLinePhase值来标示,每一个阶段结束后parsingRequestLinePhase都会被赋值进入到下一个阶段。如果某个阶段没有读取到数据,跳出解析后,下一次读取到数据再次进入到parseRequestLine则会根据上一次退出时parsingRequestLinePhase的状态直接进入到对应的阶段。
阶段1 跳过所有空行
if (parsingRequestLinePhase < 2) {
byte chr = 0;
do {
// Read new bytes if needed
if (byteBuffer.position() >= byteBuffer.limit()) {
if (keptAlive) {
// Haven't read any request data yet so use the keep-alive
// timeout.
wrapper.setReadTimeout(wrapper.getEndpoint().getKeepAliveTimeout());
}
if (!fill(false)) {
// A read is pending, so no longer in initial state
parsingRequestLinePhase = 1;
return false;
}
// At least one byte of the request has been received.
// Switch to the socket timeout.
wrapper.setReadTimeout(wrapper.getEndpoint().getSoTimeout());
}
if (!keptAlive && byteBuffer.position() == 0 && byteBuffer.limit() >= CLIENT_PREFACE_START.length - 1) {
boolean prefaceMatch = true;
for (int i = 0; i < CLIENT_PREFACE_START.length && prefaceMatch; i++) {
if (CLIENT_PREFACE_START[i] != byteBuffer.get(i)) {
prefaceMatch = false;
}
}
if (prefaceMatch) {
// HTTP/2 preface matched
parsingRequestLinePhase = -1;
return false;
}
}
// Set the start time once we start reading data (even if it is
// just skipping blank lines)
if (request.getStartTime() < 0) {
request.setStartTime(System.currentTimeMillis());
}
chr = byteBuffer.get();
} while ((chr == Constants.CR) || (chr == Constants.LF));
byteBuffer.position(byteBuffer.position() - 1);
parsingRequestLineStart = byteBuffer.position();
parsingRequestLinePhase = 2;
if (log.isDebugEnabled()) {
log.debug("Received ["
+ new String(byteBuffer.array(), byteBuffer.position(), byteBuffer.remaining(), StandardCharsets.ISO_8859_1) + "]");
}
}
首先会判断当前bytebuffer的position是否大于limit,如果大于等于则表示buffer中的所有数据已经处理完成,需要进一步读取数据。这时候则调用fill方法读取数据,如果读取到的数据是0字节,则将parsingRequestLine标记为1,并且退出parseRequestLine,service方法会根据当前状态进行下一步处理。如果读取到数据,则设置socket的timeout,并且逐个字节的从bytebuffer中读取数据,判断字符是否是\r或者\n,如果是则开始下一次循环。否则跳出循环,将position设置为position - 1,这是因为上一个读取到的字符是合法字符,后续还需要用到,同时标记parsingRequestLin为2,进入阶段2
阶段2 获取方法名 GET/POST等
if (parsingRequestLinePhase == 2) {
//
// Reading the method name
// Method name is a token
//
boolean space = false;
while (!space) {
// Read new bytes if needed
if (byteBuffer.position() >= byteBuffer.limit()) {
if (!fill(false)) // request line parsing
return false;
}
// Spec says method name is a token followed by a single SP but
// also be tolerant of multiple SP and/or HT.
int pos = byteBuffer.position();
byte chr = byteBuffer.get();
if (chr == Constants.SP || chr == Constants.HT) {
space = true;
request.method().setBytes(byteBuffer.array(), parsingRequestLineStart,
pos - parsingRequestLineStart);
} else if (!HttpParser.isToken(chr)) {
byteBuffer.position(byteBuffer.position() - 1);
throw new IllegalArgumentException(sm.getString("iib.invalidmethod"));
}
}
parsingRequestLinePhase = 3;
}
阶段2会一直读取字符如果字符不是空格或者\t,则进入下一次循环,否则根据parsingRequestLineStart和当前position从bytebuffer中读取对应的方法名,parsingRequestLineStart作为起始位置,position作为结束位置。
阶段3 跳过空格和\t
if (parsingRequestLinePhase == 3) {
// Spec says single SP but also be tolerant of multiple SP and/or HT
boolean space = true;
while (space) {
// Read new bytes if needed
if (byteBuffer.position() >= byteBuffer.limit()) {
if (!fill(false)) // request line parsing
return false;
}
byte chr = byteBuffer.get();
if (!(chr == Constants.SP || chr == Constants.HT)) {
space = false;
byteBuffer.position(byteBuffer.position() - 1);
}
}
parsingRequestLineStart = byteBuffer.position();
parsingRequestLinePhase = 4;
}
协议规定method后可以出现多个空格或者\t,所以在阶段3对这种情况做特殊处理,跳过空格和\t。
阶段4 读取url
if (parsingRequestLinePhase == 4) {
// Mark the current buffer position
int end = 0;
//
// Reading the URI
//
boolean space = false;
while (!space) {
// Read new bytes if needed
if (byteBuffer.position() >= byteBuffer.limit()) {
if (!fill(false)) // request line parsing
return false;
}
int pos = byteBuffer.position();
byte chr = byteBuffer.get();
if (chr == Constants.SP || chr == Constants.HT) {
space = true;
end = pos;
} else if (chr == Constants.CR || chr == Constants.LF) {
// HTTP/0.9 style request
parsingRequestLineEol = true;
space = true;
end = pos;
} else if (chr == Constants.QUESTION && parsingRequestLineQPos == -1) {
parsingRequestLineQPos = pos;
} else if (HttpParser.isNotRequestTarget(chr)) {
throw new IllegalArgumentException(sm.getString("iib.invalidRequestTarget"));
}
}
if (parsingRequestLineQPos >= 0) {
request.queryString().setBytes(byteBuffer.array(), parsingRequestLineQPos + 1,
end - parsingRequestLineQPos - 1);
request.requestURI().setBytes(byteBuffer.array(), parsingRequestLineStart,
parsingRequestLineQPos - parsingRequestLineStart);
} else {
request.requestURI().setBytes(byteBuffer.array(), parsingRequestLineStart,
end - parsingRequestLineStart);
}
parsingRequestLinePhase = 5;
}
当读取的字符串是问号并且parsingRequestLineQPos是-1的时候,设置parsingRequestLineQPos为当前位置,这种情况下表明请求的url中带有参数需要记录下来。当读取的字符是空格或者\t,\r或者\n(兼容HTTP/0.9)的时候则设置end为当前位置,并且标记跳出循环。如果标记了有查询条件(parsingRequestLineQPos >= 0), parsingRequestLineStart到parsingRequestLineQPos为url,parsingRequestLineQPos到end位查询条件,否则parsingRequestLineStart到end为url。
阶段5 跳过空格和\t
阶段6 读取协议protocol
if (parsingRequestLinePhase == 6) {
//
// Reading the protocol
// Protocol is always "HTTP/" DIGIT "." DIGIT
//
while (!parsingRequestLineEol) {
// Read new bytes if needed
if (byteBuffer.position() >= byteBuffer.limit()) {
if (!fill(false)) // request line parsing
return false;
}
int pos = byteBuffer.position();
byte chr = byteBuffer.get();
if (chr == Constants.CR) {
end = pos;
} else if (chr == Constants.LF) {
if (end == 0) {
end = pos;
}
parsingRequestLineEol = true;
} else if (!HttpParser.isHttpProtocol(chr)) {
throw new IllegalArgumentException(sm.getString("iib.invalidHttpProtocol"));
}
}
当读取到\n的时候结束protocol的解析
parseHeaders
首先我们需要了解下rfc中对header的定义,
message-header = field-name “:” [ field-value ] field-name = token field-value = ( field-content | LWS ) field-content = <the OCTETs making up the field-value and consisting of either *TEXT or combinations of token, separators, and quoted-string> LWS = [CRLF] 1( SP | HT )
header由field-name和field-value两部分,中间使用”:”分割。field-value由field-content组成,中间会出现LWS(空格,\t,\r,\n)等。
/**
* Parse the HTTP headers.
*/
boolean parseHeaders() throws IOException {
if (!parsingHeader) {
throw new IllegalStateException(sm.getString("iib.parseheaders.ise.error"));
}
HeaderParseStatus status = HeaderParseStatus.HAVE_MORE_HEADERS;
do {
status = parseHeader();
// Checking that
// (1) Headers plus request line size does not exceed its limit
// (2) There are enough bytes to avoid expanding the buffer when
// reading body
// Technically, (2) is technical limitation, (1) is logical
// limitation to enforce the meaning of headerBufferSize
// From the way how buf is allocated and how blank lines are being
// read, it should be enough to check (1) only.
if (byteBuffer.position() > headerBufferSize || byteBuffer.capacity() - byteBuffer.position() < socketReadBufferSize) {
throw new IllegalArgumentException(sm.getString("iib.requestheadertoolarge.error"));
}
} while (status == HeaderParseStatus.HAVE_MORE_HEADERS);
if (status == HeaderParseStatus.DONE) {
parsingHeader = false;
end = byteBuffer.position();
return true;
} else {
return false;
}
}
parseHeaders主要由一个循环构成,循环内部不断读取header,如果出现以下两种情况则会抛出异常,
- request line的大小加上当前已经读取的header的大小超过了headerBufferSize
- byteBuffer的剩余容量比socketReadBufferSize要小
上面两种情况都会导致没有足够的空间存储后续的请求数据。 读取的工作主要由parseHeader来完成,解析过程和request line类似,也可以划分为几个步骤,
阶段1 跳过空行
//
// Check for blank line
//
byte chr = 0;
while (headerParsePos == HeaderParsePosition.HEADER_START) {
// Read new bytes if needed
if (byteBuffer.position() >= byteBuffer.limit()) {
if (!fill(false)) {// parse header
headerParsePos = HeaderParsePosition.HEADER_START;
return HeaderParseStatus.NEED_MORE_DATA;
}
}
chr = byteBuffer.get();
if (chr == Constants.CR) {
// Skip
} else if (chr == Constants.LF) {
return HeaderParseStatus.DONE;
} else {
byteBuffer.position(byteBuffer.position() - 1);
break;
}
}
if (headerParsePos == HeaderParsePosition.HEADER_START) {
// Mark the current buffer position
headerData.start = byteBuffer.position();
headerParsePos = HeaderParsePosition.HEADER_NAME;
}
解析的过程中使用headerParsePos来标示当前状态,
private static enum HeaderParsePosition {
/**
* 准备读取下一个HEADER, 如果在此状态下读取到\n\r则表示没有需要继续处理的HEADER了,
* 如果读取到其他字符则进入到HEADER_NMAE阶段
*/
HEADER_START,
/**
* 读取HEADER NAME。":"会出现在HEADER NAME之后,如果在":"之前出现任何non-HTTP_TOKEN_CHAR(包含)
* 任意空白字符都会导致当前行被忽略
*/
HEADER_NAME,
/**
* 该阶段用于跳过HEADER VALUE之前的空白字符,这个字符有可能出现在HEADER VALUE的首行,也可能出现在后续的
* 行中。
*/
HEADER_VALUE_START,
/**
* 读取数据。当状态处于HEADER_VALUE_START时,如果遇到第一个非空格和\t,则进入到HEADER_VALUE阶段
*/
HEADER_VALUE,
HEADER_MULTI_LINE,
HEADER_SKIPLINE
}
阶段2 读取header name
//
// Reading the header name
// Header name is always US-ASCII
//
while (headerParsePos == HeaderParsePosition.HEADER_NAME) {
// Read new bytes if needed
if (byteBuffer.position() >= byteBuffer.limit()) {
if (!fill(false)) { // parse header
return HeaderParseStatus.NEED_MORE_DATA;
}
}
int pos = byteBuffer.position();
chr = byteBuffer.get();
if (chr == Constants.COLON) {
headerParsePos = HeaderParsePosition.HEADER_VALUE_START;
headerData.headerValue = headers.addValue(byteBuffer.array(), headerData.start,
pos - headerData.start);
pos = byteBuffer.position();
// Mark the current buffer position
headerData.start = pos;
headerData.realPos = pos;
headerData.lastSignificantChar = pos;
break;
} else if (!HttpParser.isToken(chr)) {
// If a non-token header is detected, skip the line and
// ignore the header
headerData.lastSignificantChar = pos;
byteBuffer.position(byteBuffer.position() - 1);
return skipLine();
}
// chr is next byte of header name. Convert to lowercase.
if ((chr >= Constants.A) && (chr <= Constants.Z)) {
byteBuffer.put(pos, (byte) (chr - Constants.LC_OFFSET));
}
}
如协议所示当读取到的字符是”:”的时候进入到下一个阶段HEADER_VALUE_START。果在”:”之前读取到了非法字符则进入skipLine,skipLine首先会设置当前状态为HEADER_SKIPLINE,然后不断跳过\r直到读取到\n,然后重新进入到HEADER_START阶段,返回HAVE_MORE_HEADERS,在这种情况下parseHeaders会进入到下一个header的读取
阶段3 读取value
在三种状态下都可以进入到阶段3,分别是HEADER_VALUE_START, HEADER_VALUE和HEADER_MULTI_LINE。如果当前阶段是HEADER_VALUE_START,则会不停的读取字符直到遇到非空格或者\t,然后进入到HEADER_VALUE阶段。
if (headerParsePos == HeaderParsePosition.HEADER_VALUE_START) {
// Skipping spaces
while (true) {
// Read new bytes if needed
if (byteBuffer.position() >= byteBuffer.limit()) {
if (!fill(false)) {// parse header
// HEADER_VALUE_START
return HeaderParseStatus.NEED_MORE_DATA;
}
}
chr = byteBuffer.get();
if (!(chr == Constants.SP || chr == Constants.HT)) {
headerParsePos = HeaderParsePosition.HEADER_VALUE;
byteBuffer.position(byteBuffer.position() - 1);
break;
}
}
}
如果当前阶段是HEADER_VALUE,则读取数据直到遇到\n,然后将状态设置为HEADER_MULTI_LINE。
if (headerParsePos == HeaderParsePosition.HEADER_VALUE) {
// Reading bytes until the end of the line
boolean eol = false;
while (!eol) {
// Read new bytes if needed
if (byteBuffer.position() >= byteBuffer.limit()) {
if (!fill(false)) {// parse header
// HEADER_VALUE
return HeaderParseStatus.NEED_MORE_DATA;
}
}
chr = byteBuffer.get();
if (chr == Constants.CR) {
// Skip
} else if (chr == Constants.LF) {
eol = true;
} else if (chr == Constants.SP || chr == Constants.HT) {
byteBuffer.put(headerData.realPos, chr);
headerData.realPos++;
} else {
byteBuffer.put(headerData.realPos, chr);
headerData.realPos++;
headerData.lastSignificantChar = headerData.realPos;
}
}
// Ignore whitespaces at the end of the line
headerData.realPos = headerData.lastSignificantChar;
// Checking the first character of the new line. If the character
// is a LWS, then it's a multiline header
headerParsePos = HeaderParsePosition.HEADER_MULTI_LINE;
}
如果当前状态是HEADER_MULTI_LINE,则会读取下一个字符,然后判断这个字符是否是空格或者\t,如果不是则表示进入到了下一个header,将当前转改标记为HEADER_START,退出循环。如果字符是空格或者\t,则表示当前header的value还没有读取完毕,重新设置状态为HEADER_VALUE_START,进入下一次循环继续读取数据。
chr = byteBuffer.get(byteBuffer.position());
if (headerParsePos == HeaderParsePosition.HEADER_MULTI_LINE) {
if ((chr != Constants.SP) && (chr != Constants.HT)) {
headerParsePos = HeaderParsePosition.HEADER_START;
break;
} else {
// Copying one extra space in the buffer (since there must
// be at least one space inserted between the lines)
byteBuffer.put(headerData.realPos, chr);
headerData.realPos++;
headerParsePos = HeaderParsePosition.HEADER_VALUE_START;
}
}
}
当上面三步都处理完成的时候,将bytebuffer中的数据设置到header-value中。到这里tomcat解析协议的流程就差不多走完了。接下来将进入到请求处理阶段。