David Piscitello

The transport layer is the basic end-to-end (host-to-host) building block of
the Internet (TCP/IP) architecture and communications. Protocols above the
transport layer concentrate on distributed applications processing, and
protocols below the transport layer concentrate on the transmission, routing
and forwarding of application data. The Transmission Control Protocol (TCP,
RFC 793) is a reliable delivery service that provides “robustness in
spite of unreliable communications media” and “data transfer that is reliable,
ordered, full-duplex, and flow controlled.”
To provide reliable data delivery, TCP must:
· Deliver data submitted by sending application processes without loss,
· Prevent duplication of data,
· Preserve the order of bytes of data submitted,
· Detect and correct[1] corruption (e.g., bit-level errors) introduced into the data stream by the network, and
· Regulate the flow of data across the TCP connection (flow control, to help prevent network congestion, which often results in packet loss).
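The corruption check in the list above rests on the TCP checksum. As a rough illustration, here is a minimal Python sketch of the 16-bit ones' complement sum that the checksum is built on (the arithmetic is described in RFC 1071). The function name internet_checksum is hypothetical, and the sketch leaves out the pseudo-header that a real TCP includes in the computation, so treat it as a teaching aid rather than a drop-in implementation.

```python
def internet_checksum(data: bytes) -> int:
    """Return the 16-bit ones' complement of the ones' complement sum of data.

    Assumption: only the bytes passed in are summed; a real TCP also sums a
    pseudo-header built from the IP addresses, protocol, and TCP length.
    """
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]  # add each 16-bit big-endian word
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)  # fold carries back in
    return ~total & 0xFFFF


def verifies(data_with_checksum: bytes) -> bool:
    """A receiver recomputes over data plus checksum; zero means no error seen."""
    return internet_checksum(data_with_checksum) == 0
```

A sender would place the returned value in the Checksum field; a receiver that recomputes over the same bytes plus that field gets zero when no error is detected. As the footnote observes, this catches most but not all errors.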
TCP has some additional features that help certain applications perform well:
· Push, which allows a sending application to signal to both sending and receiving TCP processes (hereafter simply called “TCP”) that the data in this send call must be delivered immediately to the receiving application process. In the absence of “push,” TCP waits (up to a configurable timeout) to fill a segment before actually sending the data.
· Urgent data, an interrupt data service whereby a sending application process may request that data marked “urgent” be processed quickly by the receiving upper-layer protocol process. Note that Urgent is a signal by the sending TCP process to the receiving TCP process: it’s not an application-to-application signal.
TCP treats a data transfer as a continuous stream of bytes (octets), delimited into segments.
All TCP segments have the same format,
illustrated below:
Figure 1: TCP Header Format
 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|          Source Port          |       Destination Port        |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                        Sequence Number                        |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                    Acknowledgment Number                      |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|  Data |           |U|A|P|R|S|F|                               |
| Offset| Reserved  |R|C|S|S|Y|I|            Window             |
|       |           |G|K|H|T|N|N|                               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|           Checksum            |         Urgent Pointer        |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                    Options                    |    Padding    |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                         Data Segment                          |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
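To make the header layout concrete, here is a small Python sketch that unpacks the fixed 20-byte portion of a TCP header with the standard struct module. The names TcpHeader and parse_tcp_header are hypothetical, and the sketch ignores Options and Padding; it is meant only to show how the fields in Figure 1 map onto bytes.

```python
import struct
from typing import FrozenSet, NamedTuple

# Flag bit values, reading the six flag bits in Figure 1 from right to left.
FLAG_BITS = {"FIN": 0x01, "SYN": 0x02, "RST": 0x04,
             "PSH": 0x08, "ACK": 0x10, "URG": 0x20}


class TcpHeader(NamedTuple):
    source_port: int
    destination_port: int
    sequence_number: int
    acknowledgment_number: int
    data_offset: int          # header length in 32-bit words
    flags: FrozenSet[str]
    window: int
    checksum: int
    urgent_pointer: int


def parse_tcp_header(segment: bytes) -> TcpHeader:
    """Decode the fixed part of a TCP header (network byte order)."""
    if len(segment) < 20:
        raise ValueError("a TCP header is at least 20 bytes long")
    src, dst, seq, ack, offset_and_flags, window, checksum, urgent = \
        struct.unpack("!HHIIHHHH", segment[:20])
    data_offset = offset_and_flags >> 12   # top 4 bits
    flag_bits = offset_and_flags & 0x3F    # bottom 6 bits
    flags = frozenset(n for n, bit in FLAG_BITS.items() if flag_bits & bit)
    return TcpHeader(src, dst, seq, ack, data_offset, flags,
                     window, checksum, urgent)
```

Feeding it the first 20 bytes of a captured segment yields the port numbers, sequence and acknowledgment numbers, and the set of flags discussed in the rest of this article.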
Let’s consider Internet addressing for a moment, and how it affects TCP. IP addresses are 32-bit numbers that uniquely identify a network interface of a host on an IP network. Port numbers commonly identify applications. RFC 793 explains that the port, paired with an IP address, forms a TCP connection endpoint identifier, also known as a socket address. Sockets are the two endpoints of the connection. Sockets are typically accessed through a sockets API, used by applications to communicate with the network stack at some exposed layer.
Internet port number assignment commonly follows a client/server paradigm.
Hosts that support Internet services such as FTP, DNS, HTTP, SSL, etc. listen
on well-known port numbers, 16-bit values permanently assigned to
identify a registered Internet application. Originally, these values were
documented in an Internet standard called Assigned Numbers; more
recently, they are maintained at an online registry. A client application
associates with, or binds to, a TCP port number that is typically allocated
from the unused and unassigned (ephemeral) ports.
When a client application opens
a connection to a server application, the socket addresses involved are {client’s
IP address, client’s ephemeral TCP port number} and {server’s
IP address, well-known TCP port number for desired Internet service}.
The client will always send to the well-known destination port, but the server
may create a new socket to listen (again) for another incoming connection. This
is where the difference between the address and the socket seems to be throwing
you. Look at this FTP/TCP dialog and you’ll see that the Destination Port
remains the same for all messages from the client to the server.
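If you want to see the two socket addresses for yourself, the following Python sketch opens a TCP connection over the loopback interface and reports both endpoints from each side. It is only an illustration under assumptions: the function name show_socket_addresses is hypothetical, and the server binds to port 0 (letting the operating system pick a free port) as a stand-in for a well-known port.

```python
import socket


def show_socket_addresses() -> dict:
    """Connect a client to a loopback listener and return both endpoint pairs."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as listener:
        # Assumption: port 0 stands in for a well-known service port.
        listener.bind(("127.0.0.1", 0))
        listener.listen(1)
        server_address = listener.getsockname()

        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as client:
            # The OS assigns the client an ephemeral port at connect time.
            client.connect(server_address)
            # accept() hands back a new socket; the listener keeps listening.
            connection, _ = listener.accept()
            with connection:
                return {
                    "listener": server_address,
                    "client_local": client.getsockname(),
                    "client_remote": client.getpeername(),
                    "server_local": connection.getsockname(),
                    "server_remote": connection.getpeername(),
                }
```

The accepted socket reports the same local port as the listening socket, which is why the destination port never changes in the client's segments even though the server is using a second socket.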
TCP has three phases of operation: connection establishment, data transfer,
and connection release. I’ve posted a LAN packet analysis of an FTP session I
captured so the masochists among you can follow along in bit-level detail: the
FTP application uses TCP for reliable delivery.
TCP Connection Establishment
TCP operates as a pair of independent byte
streams of data between upper-layer protocols. The Synchronize stream (SYN) establishes the beginning of the byte
stream in each direction of information flow. TCP segments involved in
connection establishment have the SYN flag set to one
(1). During Synchronize, TCPs encode
the following information in a TCP SYN header:
A responding TCP acknowledges receipt of the SYN segment and commonly attempts to synchronize a byte stream in the
return (responder-to-initiator) direction in a single TCP segment by generating
a TCP segment with both the SYN and ACK flags set to one (1). This process, called piggybacking, improves protocol efficiency.
When composing a SYN/ACK segment, the responding TCP:
· Sets the acknowledgment number to the value of the next sequence number the responder expects to receive (in this case, the initiator’s ISN+1);
When
the initiating TCP receives the SYN/ACK segment, it
knows the SYN it sent was delivered. With these two messages alone,
however, the responding TCP can’t
know for certain that the SYN/ACK segment was
delivered. TCP uses an additional
message from the initiator – a TCP segment with the ACK flag set – to confirm that the data stream has been synchronized in the
responder-to-initiator direction. This message sequence is referred to as a three-way handshake.
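The numbering in the three-way handshake is easy to get wrong by one, so here is a short Python sketch that lists the three segments for a given pair of initial sequence numbers. The Segment class and three_way_handshake function are hypothetical names for illustration; no packets are sent, and the model covers only flags, sequence numbers, and acknowledgment numbers.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional

SEQUENCE_SPACE = 2 ** 32  # sequence numbers are 32 bits and wrap around


@dataclass(frozen=True)
class Segment:
    sender: str
    flags: FrozenSet[str]
    seq: int
    ack: Optional[int]  # None when the ACK flag is not set


def three_way_handshake(initiator_isn: int, responder_isn: int) -> List[Segment]:
    """Return the SYN, SYN/ACK, and ACK segments of a successful handshake."""
    syn = Segment("initiator", frozenset({"SYN"}), initiator_isn, None)
    # The responder acknowledges the SYN (ISN+1) and sends its own SYN.
    syn_ack = Segment("responder", frozenset({"SYN", "ACK"}), responder_isn,
                      (initiator_isn + 1) % SEQUENCE_SPACE)
    # The initiator confirms the responder's SYN, completing the handshake.
    ack = Segment("initiator", frozenset({"ACK"}),
                  (initiator_isn + 1) % SEQUENCE_SPACE,
                  (responder_isn + 1) % SEQUENCE_SPACE)
    return [syn, syn_ack, ack]
```

For example, with an initiator ISN of 100 and a responder ISN of 300, the SYN/ACK carries acknowledgment number 101 and the final ACK carries sequence number 101 and acknowledgment number 301.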
Any TCP segment having the ACK flag set
may also contain application data. For example, if the responder indicated a
non-zero initial Window in the SYN/ACK segment, the initiator can piggyback up
to the responder’s initial window number
of bytes of data in the ACK segment that completes the handshake.
In non-loss situations, an application
submits data to the local, sending TCP.
The sending TCP sends those data “at its convenience” (seriously, that’s what
RFC 793 says . . .). Typically, the sender attempts to fill a maximum segment
size (MSS) packet before sending (unless a PUSH is invoked).
TCP uses the 32-bit sequence number to
reconstruct the application data stream. The sequence number identifies the
relative position of the first byte contained in a TCP segment, and hence the
position of the data in this segment, with respect to other data segments of
the application stream. The push flag, if
set to one (1), signals the communicating TCPs to immediately deliver all data
processed prior to and including the segment containing the push to the
application (basically, push overrides TCP’s attempt to fill a maximum segment
sized packet before sending).
When piggybacking is used, TCP must also attend to data being transmitted in the return direction. The acknowledgment flag, if set to one (1), indicates that the acknowledgment sequence number and window are significant, and must be used to process data segments of the return stream. Here, the TCP acknowledgment number identifies the next expected byte in the data stream, and the 16-bit TCP window indicates the amount of data the receiver is willing to accept in future TCP segment(s); this value is added to the acknowledgment sequence number to determine the send window.
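The send window arithmetic can be sketched in a few lines of Python. The function names below are hypothetical; the only facts taken from the text are that the window is added to the acknowledgment number, and that sequence numbers are 32-bit values, so the sketch does its arithmetic modulo 2^32.

```python
SEQUENCE_SPACE = 2 ** 32  # sequence numbers are 32 bits and wrap around


def send_window(ack_number: int, window: int) -> tuple:
    """Return (first, limit): the sender may send bytes first up to limit - 1."""
    first = ack_number % SEQUENCE_SPACE
    limit = (ack_number + window) % SEQUENCE_SPACE
    return first, limit


def in_send_window(seq: int, ack_number: int, window: int) -> bool:
    """True if byte number seq falls inside the window the receiver advertised."""
    # Measuring the distance from ack_number modulo 2**32 handles wraparound.
    return (seq - ack_number) % SEQUENCE_SPACE < window
```

With an acknowledgment number of 1000 and a window of 500, the sender may transmit bytes 1000 through 1499; byte 1500 has to wait for a later acknowledgment to open the window further.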
To detect and recover from packet loss,
duplication, and modification during its transfer across an IP network, TCP
relies on a mechanism called positive
acknowledgment and retransmission upon timeout.
A sending TCP runs a separate retransmission timer for each TCP data
segment. If the retransmission timer expires and no acknowledgment packet has
arrived indicating successful delivery of the segment to the receiver, TCP assumes
the data segment is lost, arrived corrupted (failed checksum), or was delivered
to the wrong address. In such cases, TCP resends the data segment and restarts
the retransmission timer for this segment. RFC 793 suggests two resend
strategies: if the retransmission timer expires, the sending TCP may resend the
unacknowledged segment (first-only
retransmission), or it may resend all the data segments on the retransmission
queue (batch retransmission).
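The two resend strategies differ only in how much of the retransmission queue goes back on the wire when a timer expires. Here is a hedged Python sketch of that choice; the queue representation (a list of (sequence number, data) pairs in send order) and both function names are assumptions made for illustration, and sequence-number wraparound is ignored to keep it short.

```python
from typing import List, Tuple

QueuedSegment = Tuple[int, bytes]  # (sequence number of first byte, data)


def segments_to_resend(queue: List[QueuedSegment],
                       strategy: str = "first-only") -> List[QueuedSegment]:
    """Pick what to retransmit when the retransmission timer expires."""
    if not queue:
        return []
    if strategy == "first-only":
        return [queue[0]]      # only the oldest unacknowledged segment
    if strategy == "batch":
        return list(queue)     # everything still awaiting acknowledgment
    raise ValueError("strategy must be 'first-only' or 'batch'")


def acknowledge(queue: List[QueuedSegment],
                ack_number: int) -> List[QueuedSegment]:
    """Drop segments whose every byte is below the acknowledgment number."""
    return [(seq, data) for seq, data in queue if seq + len(data) > ack_number]
```

With three segments queued, a timeout under first-only retransmission resends one segment, while batch retransmission resends all three.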
The receiving TCP may apply one of two
acceptance strategies. If an in-order
data-acceptance strategy is used, the receiving TCP accepts only data that
arrive in byte-sequence order and discards all other data. The receiving TCP
returns an acknowledgment to the sender and makes the byte stream available to
the upper-layer protocol process as it arrives. If an in-window data-acceptance strategy is employed, the receiving TCP
maintains segments containing bytes that arrive out of order separately from those
that have arrived in order and examines newly arrived data segments to
determine whether the next expected byte in the ordered stream of bytes has
arrived. If so, the receiving TCP adds this segment’s worth of bytes to the end
of the byte stream that had previously arrived in order and looks at the
out-of-order stream to see whether additional bytes may now be appended to the
end of the in-order stream. The TCP returns an acknowledgment and makes the
accumulated stream of in-order bytes available to the application.
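Here is a rough Python sketch of the in-window acceptance strategy just described. The class name InWindowReceiver and its fields are hypothetical, and the sketch simplifies in ways a real TCP would not: it ignores sequence-number wraparound, and it only joins buffered segments that begin exactly at the next expected byte.

```python
class InWindowReceiver:
    """Hold out-of-order segments and release bytes to the application in order."""

    def __init__(self, next_expected: int, window: int = 65535):
        self.next_expected = next_expected  # next in-order byte we want
        self.window = window                # bytes we are willing to accept
        self.in_order = bytearray()         # stream available to the application
        self.out_of_order = {}              # sequence number -> buffered data

    def receive(self, seq: int, data: bytes) -> int:
        """Process one data segment; return the acknowledgment number to send."""
        end = seq + len(data)
        if end <= self.next_expected or end > self.next_expected + self.window:
            return self.next_expected       # duplicate or outside the window
        if seq > self.next_expected:
            self.out_of_order[seq] = data   # a gap precedes it: set it aside
            return self.next_expected
        # The segment contains the next expected byte; skip any bytes we have.
        self._deliver(data[self.next_expected - seq:])
        # See whether buffered segments can now be appended.
        while self.next_expected in self.out_of_order:
            self._deliver(self.out_of_order.pop(self.next_expected))
        return self.next_expected

    def _deliver(self, data: bytes) -> None:
        self.in_order += data
        self.next_expected += len(data)
```

An in-order receiver would be the same class without the out_of_order buffer: any segment that does not start at next_expected is simply discarded and must be resent.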
An explicit acknowledgment is returned in
a TCP segment (potentially “piggybacked” with data flowing in the opposite
direction). The acknowledgment sequence
number X indicates that all bytes up to but not including X have been received, and the next byte
expected is at sequence number X. The
segment window indicates the number
of bytes the receiver is willing to accept (beginning with sequence number X).
Acknowledgment packets reflect only what
has been received in sequence; they do not acknowledge data packets that
arrived successfully but out of sequence.
When an acknowledgment packet arrives, the
sending TCP may choose to resend all unacknowledged data from sequence number X up to the maximum permitted by the
segment window. In theory, applying this “batch” retransmission strategy results
in more traffic but possibly less delay. Alternatively, the sending TCP may
resend only the data segment containing the first unacknowledged byte. This
negates the benefit of a large window and may increase delay, but it is
preferred because it introduces less traffic into the network. Batch
retransmission strategies are generally regarded as bad ideas, since their
excessive retransmission of segments is likely to contribute to network
congestion.
Connection Release (Refusal) in TCP
TCP offers two forms of connection release: graceful and abrupt.
Graceful
close in TCP is an orderly shutdown
process, requiring that all data transmitted in both directions be acknowledged
before the connection may be closed. When an application has finished sending
data and wishes to close the TCP connection, its local TCP sends a TCP segment
with the FIN flag set to one (1). The sequence number is set to the value of the last byte transmitted. The
receiving TCP must acknowledge
receipt of the last byte but is not required to close its half of the
connection; it may continue to transfer data, and the initiator of the FIN segment must dutifully acknowledge all data received until it receives
a TCP segment with the FIN and ACK flags set to one (1) and the
acknowledgment set to the sequence number of the last byte received
from the FIN segment initiator. Upon receiving the FIN/ACK segment, the FIN initiator
returns an ACK segment, completing a three-way “good-bye” handshake.
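You can watch the half-close behavior from an application's point of view with the sockets API. The Python sketch below is an illustration under assumptions: the function name half_close_demo is hypothetical, both ends run in one process over the loopback interface, and the FIN and ACK segments themselves are generated by the operating system's TCP rather than by this code.

```python
import socket


def half_close_demo() -> tuple:
    """Close one direction of a connection while the other keeps carrying data."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as listener:
        listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
        listener.listen(1)
        with socket.create_connection(listener.getsockname()) as client:
            server, _ = listener.accept()
            with server:
                client.sendall(b"last request")
                # The client is done sending: its TCP sends a FIN.
                client.shutdown(socket.SHUT_WR)
                received = bytearray()
                while chunk := server.recv(1024):  # b"" marks the peer's FIN
                    received += chunk
                # The server's half of the connection is still open.
                server.sendall(b"late reply")
            # Leaving the block closes the server socket: FIN the other way.
            reply = bytearray()
            while chunk := client.recv(1024):
                reply += chunk
    return bytes(received), bytes(reply)
```

The client still receives the reply after it has shut down its sending side, which is the behavior the graceful close is designed to allow.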
Abrupt release indicates that something seriously wrong has
occurred. TCP composes a segment with the RST flag set to one
(1), returns this to the peer for this TCP stream, and shuts down.
Connection
refusal occurs in TCP when the
responding TCP cannot establish a TCP connection or when the SYN packet received is in error. To refuse a TCP connection, the called
TCP sets the RST and ACK flags to one (1), and sets
the acknowledgment sequence number to the initiator’s ISN+1.
Much of the success of the Internet can be attributed to TCP (with no slight intended to its companion network protocol, IP). A large part of the reason TCP has been so successful is that a steady stream of incredibly smart people has used TCP as the basis for research into efficient communications. During the course of 20+ years of research, TCP has been used for reliable delivery over everything from amateur radio frequencies and water droplets to multi-gigabit optical links. As a result of this extensive and elaborate work, the original RFC is thus complemented with extensions that have made TCP a remarkably flexible and adaptable transport protocol. Early work by David Clark (Silly Window Syndrome, RFC 813) and landmark work by Van Jacobson (Congestion Avoidance and Control) improved TCP’s efficiency. Subsequent standards work culminated in extensions for selective acknowledgment, MTU discovery, and window management. TCP remains under scrutiny by the research community.
I can only scratch the surface of the features of TCP here. You might find Chapter 12 of my book, Open Systems Networking (out of print, available for download), helpful, along with dozens of equally good books on TCP/IP.
[1] Lyman Chapin made the following observation during his review: “The TCP checksum detects most (but not all) errors, and the correction is retransmission, which means that some underlying network failures or defects can’t be corrected by TCP (for example, a link-level driver that always fails when it receives a particular string of 0 and 1 bits). There’s no forward error correction or other mechanism for recovering from failures that can’t be overcome by simple retransmission.”