首页 > 代码库 > Network | TCP

Network | TCP

Transmission Control Protocol, TCP是一种面向连接的、可靠的、基于字节流的传输层通信协议.

应用层向TCP层发送用于网间传输的、用8位字节表示的数据流,然后TCP把数据流分区成适当长度的报文段(通常受该计算机连接的网络的数据链路层的最大传输单元(MTU:Maximum Transmission Unit)的限制)。之后TCP把结果包传给IP层,由它来通过网络将包传送给接收端实体的TCP层。TCP为了保证不发生丢包,就给每个包一个序号,同时序号也保证了传送到接收端实体的包的按序接收。然后接收端实体对已成功收到的包发回一个相应的确认(ACK);如果发送端实体在合理的往返时延(RTT:Round-Trip Time)内未收到确认,那么对应的数据包就被假设为已丢失将会被进行重传。TCP用一个校验和函数来检验数据是否有错误;在发送和接收时都要计算校验和。

TCP连接包括三个状态:连接创建、数据传送和连接终止。

MSS

The maximum segment size (MSS) is the largest amount of data, specified in bytes, that TCP is willing to receive in a single segment. For best performance, the MSS should be set small enough to avoid IP fragmentation, which can lead to packet loss and excessive retransmissions. To try to accomplish this, typically the MSS is announced by each side using the MSS option when the TCP connection is established, in which case it is derived from the maximum transmission unit (MTU) size of the data link layer of the networks to which the sender and receiver are directly attached. 

MSS announcement is also often called "MSS negotiation". Strictly speaking, the MSS is not "negotiated" between the originator and the receiver, because that would imply that both originator and receiver will negotiate and agree upon a single, unified MSS that applies to all communication in both directions of the connection. In fact, two completely independent values of MSS are permitted for the two directions of data flow in a TCP connection.

Connection establishment连接创建

TCP用三路握手(three-way handshake)过程创建一个连接。在连接创建过程中,很多参数要被初始化,例如序号被初始化以保证按序传输和连接的强壮性。

一对终端同时初始化一个它们之间的连接是可能的。但通常是由一端打开一个套接字(socket)然后监听来自另一方的连接,这就是通常所指的被动打开(passive open)。服务器端被被动打开以后,用户端就能开始创建主动打开(active open)。

  1. SYN: The active open is performed by the client sending a SYN to the server. The client sets the segment‘s sequence number to a random value A.
  2. SYN-ACK: In response, the server replies with a SYN-ACK. The acknowledgment number is set to one more than the received sequence number i.e. A+1, and the sequence number that the server chooses for the packet is another random number, B.
  3. ACK: Finally, the client sends an ACK back to the server. The sequence number is set to the received acknowledgement value i.e. A+1, and the acknowledgement number is set to one more than the received sequence number i.e. B+1.

Data transfer数据传输

在TCP的数据传送状态,很多重要的机制保证了TCP的可靠性和强壮性。它们包括:使用序号,对收到的TCP报文段进行排序以及检测重复的数据;使用校验和来检测报文段的错误;使用确认和计时器来检测和纠正丢包或延时。

在TCP的连接创建状态,两个主机的TCP层间要交换初始序号(ISN:initial sequence number)。这些序号用于标识字节流中的数据,并且还是对应用层的数据字节进行记数的整数。通常在每个TCP报文段中都有一对序号和确认号。TCP报文发送者认为自己的字节编号为序号,而认为接收者的字节编号为确认号。TCP报文的接收者为了确保可靠性,在接收到一定数量的连续字节流后才发送确认。这是对TCP的一种扩展,通常称为选择确认(Selective Acknowledgement)。选择确认使得TCP接收者可以对乱序到达的数据块进行确认。每一个字节传输过后,ISN号都会递增1。

通过使用序号和确认号,TCP层可以把收到的报文段中的字节按正确的顺序交付给应用层。序号是32位的无符号数,在它增大到232-1时,便会回绕到0。对于ISN的选择是TCP中关键的一个操作,它可以确保强壮性和安全性。

TCP数据传输不同于UDP之处

  1. Ordered data transfer — the destination host rearranges according to sequence number
  2. Retransmission of lost packets — any cumulative stream not acknowledged is retransmitted
  3. Error-free data transfer
  4. Flow control — limits the rate a sender transfers data to guarantee reliable delivery. The receiver continually hints the sender on how much data can be received (controlled by the sliding window). When the receiving host‘s buffer fills, the next acknowledgment contains a 0 in the window size, to stop transfer and allow the data in the buffer to be processed.
  5. Congestion control

Connection termination通路的终结

连接终止使用了四路握手过程(four-way handshake),在这个过程中每个终端的连接都能独立地被终止。因此,一个典型的拆接过程需要每个终端都提供一对FIN和ACK。

端口

TCP使用了端口号(Port number)的概念来标识发送方和接收方的应用层。对每个TCP连接的一端都有一个相关的16位的无符号端口号分配给它们。

Port numbers are categorized into three basic categories: well-known, registered, and dynamic/private. The well-known ports are assigned by the Internet Assigned Numbers Authority (IANA) and are typically used by system-level or root processes. Well-known applications running as servers and passively listening for connections typically use these ports. Some examples include: FTP (20 and 21), SSH (22), TELNET (23), SMTP (25), SSL (443) and HTTP (80). Registered ports are typically used by end user applications as ephemeral source ports when contacting servers, but they can also identify named services that have been registered by a third party. Dynamic/private ports can also be used by end user applications, but are less commonly so. Dynamic/private ports do not contain any meaning outside of any particular TCP connection.

Flow control流量控制

TCP uses an end-to-end flow control protocol to avoid having the sender send data too fast for the TCP receiver to receive and process it reliably. Having a mechanism for flow control is essential in an environment where machines of diverse network speeds communicate. For example, if a PC sends data to a smartphone that is slowly processing received data, the smartphone must regulate the data flow so as not to be overwhelmed.

TCP uses a sliding window flow control protocol. In each TCP segment, the receiver specifies in the receive window field the amount of additionally received data (in bytes) that it is willing to buffer for the connection. The sending host can send only up to that amount of data before it must wait for an acknowledgment and window update from the receiving host.

If a receiver is processing incoming data in small increments, it may repeatedly advertise a small receive window. This is referred to as the silly window syndrome, since it is inefficient to send only a few bytes of data in a TCP segment, given the relatively large overhead of the TCP header.

Congestion control拥塞控制

Modern implementations of TCP contain four intertwined algorithms: Slow-start, congestion avoidance, fast retransmit, and fast recovery.

总共只有两种模式:Slow-start, congestion avoidance.

Basic slow-start

The algorithm begins in the exponential growth phase initially with a Congestion Window Size (CWND) of 1, 2 or 10 segments and increases it by one Segment Size (SS) for each new ACK received. If the receiver sends an ACK for every segment, this behavior effectively doubles the window size each round trip of the network. If the receiver supports delayed ACKs, the rate of increase is lower, but still increases by a minimum of one MSS each round-trip time. This behavior continues until the congestion window size (CWND) reaches the size of the receiver‘s advertised window or until a loss occurs.

When a loss occurs, half of the current CWND is saved as a Slow Start Threshold (SSThresh) and slow start begins again from its initial CWND. Once the CWND reaches the SSThresh, TCP goes into congestion avoidance mode where each new ACK increases the CWND by SS × SS / CWND. This results in a linear increase of the CWND.

慢启动->loss occur->set ssthresh -> 慢启动->congestion avoidance,线性增

通过half threshold来实现乘性减。

Fast recovery

There is a variation to the slow-start algorithm known as Fast Recovery, which uses fast retransmit followed by Congestion Avoidance. In the Fast Recovery algorithm, during Congestion Avoidance mode, when packets (detected through 3 duplicate ACKs) are not received, the congestion window size is reduced to the slow-start threshold, rather than the smaller initial value.

Fast Recovery也是一种慢启动->loss occur->set ssthresh

这个快一点,就是直接half。

congestion avoidance

When the congestion window exceeds SSThresh the algorithm enters a new state, called congestion avoidance.

Transmission Control Protocol (TCP) uses a network congestion-avoidance algorithm that includes various aspects of an additive increase/multiplicative decrease (AIMD) scheme, with other schemes such as slow-start to achieve congestion avoidance.

AIMD有许多变种实现。

As long as non-duplicate ACKs are received, the congestion window is additively increased by one MSS every round trip time. When a packet is lost, the likelihood of duplicate ACKs being received is very high (it‘s possible though unlikely that the stream just underwent extreme packet reordering, which would also prompt duplicate ACKs). The behavior of Tahoe and Reno differ in how they detect and react to packet loss:

Tahoe: Triple duplicate ACKS are treated the same as a timeout. Tahoe will perform "fast retransmit", set the slow start threshold to half the current congestion window, reduce congestion window to 1 MSS, and reset to slow-start state. (同Basic slow-start)
Reno: If three duplicate ACKs are received (i.e., four ACKs acknowledging the same packet, which are not piggybacked on data, and do not change the receiver‘s advertised window), Reno will halve the congestion window (instead of setting it to 1 MSS like Tahoe), set the slow start threshold equal to the new congestion window, perform a fast retransmit, and enter a phase called Fast Recovery. If an ACK times out, slow start is used as it is with Tahoe.
Fast Recovery. (Reno Only) In this state, TCP retransmits the missing packet that was signaled by three duplicate ACKs, and waits for an acknowledgment of the entire transmit window before returning to congestion avoidance. If there is no acknowledgment, TCP Reno experiences a timeout and enters the slow-start state.

Both algorithms reduce congestion window to 1 MSS on a timeout event.

这两种方式的区别在于怎么处理loss。slow start是一种状态,fast recovery是Reno在处理loss时的策略。