How direct memory access speeds Ethernet

With gigabit networks and fast storage, even sending and receiving packets puts servers under strain. RDMA short-circuits the problem.

As buses, memory and networks get faster, the search for higher data throughput keeps uncovering new bottlenecks. One of the basic tenets of high-speed data transfer is 'thou shalt copy data only once', a commandment broken frequently and thoroughly by today's complex mesh of protocols and processor-mediated transfers. When a packet arrives from the network it is copied into a temporary memory buffer; the main system processor then breaks it down and moves different parts of the packet to different places in memory, where they are analysed and acted on according to TCP, IP and the higher-level protocols. This often involves copying data between kernel buffers and application memory, sometimes several times.

While these overheads are acceptable on single-user systems and under light application loads, the processor time they consume becomes significant on servers coping with multiple, data-intensive requests, most typically in networked storage. One answer is to move as much of the protocol processing as possible into dedicated hardware on the network adaptor, and to give that hardware the ability to read and write directly into the host computer's main memory, bypassing the kernel and the CPU's own data reads and writes altogether.

These are the basic ideas behind Remote Direct Memory Access (RDMA), a new standard recently completed by the RDMA Consortium (the usual suspects: Cisco, IBM, Microsoft, Dell, HP and so on) and a development of work done by the InfiniBand interconnect designers. As far as possible, RDMA lets applications on one computer read and write memory on another by issuing commands directly to the network interface, bypassing the kernel. Any movement of data or commands into or out of the kernel involves context switches and other time-consuming operations, so keeping the data transfers and link management entirely in user memory space avoids these overheads.
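To make the copy count concrete, here is a toy Python model of a single received frame. It is a sketch under stated assumptions, not a real network stack: the function names are hypothetical, the conventional path is reduced to one adaptor-to-kernel copy plus one kernel-to-application copy (real stacks may copy more or less), and the 54-byte header assumes Ethernet, IPv4 and TCP without options.

```python
# Toy model of one received frame: count the bytes the host CPU has to copy.
# Hypothetical, simplified paths; real network stacks differ in detail.

HEADER_LEN = 14 + 20 + 20   # assumption: Ethernet + IPv4 + TCP headers, no options


def conventional_receive(frame: bytes, app_buf: bytearray, offset: int) -> int:
    """Kernel path: adaptor buffer -> kernel buffer -> application buffer."""
    kernel_buf = bytearray(frame)                     # copy 1: into a kernel buffer
    payload = memoryview(kernel_buf)[HEADER_LEN:]     # CPU parses headers in place
    app_buf[offset:offset + len(payload)] = payload   # copy 2: kernel -> application
    return len(kernel_buf) + len(payload)


def direct_placement_receive(frame: bytes, app_buf: bytearray, offset: int) -> int:
    """RDMA-style path: the adaptor writes the payload straight into app memory."""
    payload = memoryview(frame)[HEADER_LEN:]
    # This slice assignment stands in for the adaptor's DMA write; in real
    # hardware it costs the host CPU no copying at all.
    app_buf[offset:offset + len(payload)] = payload
    return 0


frame = bytes(HEADER_LEN) + b"x" * 1460   # 1460-byte payload: a full segment on a 1500-byte MTU
buf_a, buf_b = bytearray(1460), bytearray(1460)
print(conventional_receive(frame, buf_a, 0))      # → 2974
print(direct_placement_receive(frame, buf_b, 0))  # → 0
```

Both paths leave the same bytes in the application buffer; the difference is who moves them, and at gigabit rates the CPU's share of that work adds up.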
The remote computer that is servicing the requests just needs to tell its network card where its buffers are; everything else is handled by the hardware. There are security implications in all this: by bypassing the kernel, you also bypass the normal protection mechanisms that stop applications from overwriting memory they don't control. RDMA therefore introduces the idea of key values: before an application can access remote memory, it must send the right key value along with its request. The key value is generated when the remote computer asks its network controller to register a memory buffer, and the remote computer then tells the other computer what memory is available and what the keys are before any data transfer can take place.

Out-of-order packets are another potential problem. IP does not guarantee that packets arrive at the receiver in the same sequence as they left the sender, and higher-level protocols are designed to spot and correct that. However, when the delayed (or lost) packet contains the information about where the subsequent data is to be placed, the network card doesn't know what to do with that data and has to drop it, asking for a retransmission later. This is known as the framing problem, and it has serious bandwidth and latency implications.

RDMA copes with this through a technique called Marker PDU Aligned Framing, or MPA, where a PDU is a protocol data unit. Here, each segment of TCP data carries its placement information immediately after the TCP header, so each chunk arrives with its destination address intact. In addition, a marker is inserted into the byte stream every 512 bytes, pointing back to the start of the current PDU, so the placement information can be found from a known position even when a segment does not begin with it. Additional logic detects when an intermediate device, such as a firewall or router, has reassembled and re-segmented the byte stream, which can move the markers relative to the start of each segment.
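The key-value scheme can be modelled as a registration table on the adaptor. In this minimal Python sketch the class and method names are hypothetical, and keys are random 32-bit numbers (an assumption; the iWARP specifications call these keys steering tags, or STags, and real adaptors generate and check them in hardware). A remote write is refused unless it carries a registered key and stays inside the registered buffer.

```python
import secrets


class ToyAdaptor:
    """Hypothetical model of an RDMA adaptor's memory-registration table."""

    def __init__(self) -> None:
        self._regions = {}   # key -> registered bytearray

    def register(self, buf: bytearray) -> int:
        """Register a local buffer and return the key a peer must present."""
        key = secrets.randbits(32)
        while key in self._regions:       # never hand out a key that is still live
            key = secrets.randbits(32)
        self._regions[key] = buf
        return key

    def deregister(self, key: int) -> None:
        """Revoke a key; later writes carrying it are refused."""
        self._regions.pop(key, None)

    def remote_write(self, key: int, offset: int, data: bytes) -> None:
        """Model the adaptor handling an incoming RDMA write (no host CPU in real hardware)."""
        buf = self._regions.get(key)
        if buf is None:
            raise PermissionError("unknown or revoked key")
        if offset < 0 or offset + len(data) > len(buf):
            raise PermissionError("write falls outside the registered buffer")
        buf[offset:offset + len(data)] = data


adaptor = ToyAdaptor()
buf = bytearray(16)
key = adaptor.register(buf)            # the remote side would now advertise buffer and key
adaptor.remote_write(key, 4, b"data")  # accepted: right key, within bounds
print(buf[4:8])                        # → bytearray(b'data')

try:
    adaptor.remote_write(key ^ 1, 0, b"evil")   # wrong key
except PermissionError as err:
    print("refused:", err)                      # → refused: unknown or revoked key
```

Checking both the key and the bounds matters: a valid key alone would still let a peer write past the end of the buffer it was granted.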
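The marker idea can also be sketched. The Python below is deliberately simplified and is not the RFC 5044 wire format: it keeps MPA's 512-byte marker spacing, but models each marker as two zero bytes followed by a 16-bit back-pointer to the header of the PDU the marker interrupts (with 0 meaning, by this sketch's own convention, that a new PDU starts right after the marker), and it omits MPA's length field, padding and CRC. The function names are hypothetical. What it shows is that a receiver holding any segment that contains a marker can locate a PDU header without having seen the preceding segments.

```python
import struct

MARKER_INTERVAL = 512   # MPA's marker spacing in the byte stream
MARKER_LEN = 4          # simplified marker: 2 zero bytes + 16-bit back-pointer


def add_markers(pdus):
    """Serialise PDUs into one stream, inserting a marker every 512 bytes.

    Returns (stream, starts), where starts holds each PDU's offset in the
    stream. Assumes non-empty PDUs of at most 60 KB so pointers fit in 16 bits.
    """
    stream = bytearray()
    starts = []
    for pdu in pdus:
        i, pdu_start = 0, None
        while i < len(pdu):
            pos = len(stream)
            if pos % MARKER_INTERVAL == 0:
                # Back-pointer to this PDU's header, or 0 if the PDU begins
                # immediately after the marker.
                back = 0 if pdu_start is None else pos - pdu_start
                stream += struct.pack("!HH", 0, back)
                continue
            if pdu_start is None:
                pdu_start = pos
                starts.append(pos)
            room = MARKER_INTERVAL - pos % MARKER_INTERVAL   # bytes until the next marker
            chunk = pdu[i:i + room]
            stream += chunk
            i += len(chunk)
    return bytes(stream), starts


def locate_pdu_header(segment, seg_offset):
    """Return the stream offset of the PDU header named by the first marker in
    a segment starting at stream offset seg_offset, or None if it holds no
    complete marker."""
    first = -(-seg_offset // MARKER_INTERVAL) * MARKER_INTERVAL  # next marker position
    rel = first - seg_offset
    if rel + MARKER_LEN > len(segment):
        return None
    _, back = struct.unpack_from("!HH", segment, rel)
    return first + MARKER_LEN if back == 0 else first - back


def strip_markers(stream):
    """Remove the markers, recovering the original concatenated PDUs."""
    return b"".join(stream[p + MARKER_LEN:p + MARKER_INTERVAL]
                    for p in range(0, len(stream), MARKER_INTERVAL))


pdus = [b"A" * 600, b"B" * 300, b"C" * 1000]
stream, starts = add_markers(pdus)
print(starts)                                      # → [4, 608, 908]
print(locate_pdu_header(stream[1100:1700], 1100))  # → 908, start of the "C" PDU
assert strip_markers(stream) == b"".join(pdus)
```

Because markers sit at fixed positions in the stream rather than in each segment, a middlebox that re-segments the stream does not invalidate them; the receiver only needs to know where its segment begins in the stream, which the TCP sequence number tells it.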
One of the first uses of RDMA will be in TCP/IP Offload Engines, or TOEs: chips designed to take as much of the grunt work of network communication as possible away from central processors. Although some companies have already produced TOEs, the existence of a standard is expected to increase their acceptance rapidly. In particular, RDMA is agnostic about higher-level protocols and about the exact nature of the network fabric, as long as it's Ethernet, and so is expected eventually to percolate through to everywhere that high-speed networking is used. With near-universal support from its consortium members (Microsoft is expected to support RDMA in all versions of Windows), it's only a matter of time before it becomes fully mainstream.

More enterprise IT news in ZDNet UK's Tech Update Channel.
