Latency Contribution Factors
Given the importance of latency reduction, it is worth analyzing the various contributors to end-to-end trade-flow latency in more detail. From a network perspective, latency falls into two broad categories: the latency attributable to the network platform and "all-other" latency.
The "all-other" latency category is necessarily broad and outside the scope of this paper. Its key components include the latency contributions of the application, middleware, OS, NIC, and, most importantly, the application architecture. For example, a high-performance server platform with extended memory technology can enable customers to consolidate the trade flow onto one server, so that applications work at memory speeds rather than network speeds. If customers have the flexibility to do so, re-architecting the application may provide the biggest performance benefit.
From a network perspective, there are five latency contributors, listed in increasing order of importance.
● Serialization delay: This is the delay to place a packet on the wire, and it is tied to the interface speed. At 10 Gigabit Ethernet, the serialization delay for a 128-byte frame is 0.1 μsec; at 1 Gigabit Ethernet, it is 1 μsec for the same frame. While this delta is not large, the delay accumulates at every port. Further, 1 Gigabit Ethernet switches generally have larger nominal latencies and operate in store-and-forward mode, where latency increases with packet size. In general, firms that want to deploy low-latency infrastructures should implement 10 Gigabit Ethernet wherever possible.
● Propagation delay: Light takes about 5 μsec to traverse 1 km of fiber. To reduce this contribution, firms place their environments as close to the liquidity pool as possible, either at the exchange data center itself using a co-location service or at a service provider's data center. However, it is useful to note that for some businesses, such as best execution, the algorithms need to check multiple venues, so co-locating centrally among multiple liquidity venues will be more important than proximity to any single exchange.
● Nominal switch latency: This is the latency to traverse a switch hop. Many 10 Gigabit Ethernet switches can operate in cut-through mode (rather than store-and-forward), where the nominal latency is the same regardless of packet size. This latency is measured in first-in-first-out (FIFO) mode and can be less than 5 μsec per hop. It is what standard RFC tests (RFC 2544, RFC 2889, etc.) measure, and it is usually what the industry uses to judge the suitability of a network platform for HFT environments. However, this turns out to be a faulty conclusion because of the much larger contribution of the following two delays.
● Queuing latency: This is the latency incurred when packets are queued within a network platform, generally due to egress port congestion. Larger buffers allow more traffic to be queued, but simply having larger buffers does not increase latency: a buffer is used only when traffic must be queued, and if buffer space is unavailable, the traffic is dropped. (See the next point.) The efficiency of the queuing algorithm is a key switch attribute that benchmarks rarely measure. Queuing latency ranges from tens to thousands of microseconds depending on the traffic patterns, so it can completely dwarf the nominal latency. (Please see the section on microbursts for a longer discussion of the traffic patterns that result in queuing delays.)
● Retransmission delay: This is the delay incurred when an application needs to resend a packet, typically because the network dropped it. If the packet buffers are not deep enough, traffic may be dropped instead of queued. The application design determines the latency impact of a dropped packet. If the drop occurs during a TCP session, TCP will take care of the retransmission, but the delay before retransmission begins is usually 200 milliseconds (the minimum RTO timer in Linux). Also, a substantial loss rate can result in congestion collapse across TCP sessions. In the case of UDP, the application must either retransmit itself or simply lose the dropped data. Application developers therefore consistently prefer that the network queue and deliver packets rather than drop traffic.
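The relative magnitudes of these contributors can be illustrated with a back-of-the-envelope calculation. The sketch below is purely illustrative: the hop count, per-hop nominal latency, and queuing figure are assumptions, not measurements of any particular platform.

```python
LINK_SPEED_BPS = 10e9          # 10 Gigabit Ethernet
FIBER_DELAY_US_PER_KM = 5.0    # ~5 us/km for light in fiber

def serialization_us(frame_bytes, link_bps=LINK_SPEED_BPS):
    """Time to place one frame on the wire, in microseconds."""
    return frame_bytes * 8 / link_bps * 1e6

def propagation_us(km):
    """One-way fiber propagation delay, in microseconds."""
    return km * FIBER_DELAY_US_PER_KM

# Assumed example: 128-byte frame, 3 switch hops at a nominal 5 us each,
# 1 km of fiber, and one microburst-induced queuing event.
hops, nominal_per_hop_us = 3, 5.0
budget = {
    "serialization (3 ports)": serialization_us(128) * hops,
    "propagation (1 km)":      propagation_us(1),
    "nominal switch latency":  nominal_per_hop_us * hops,
    "queuing (microburst)":    100.0,  # assumed; can reach 1000s of us
}
for name, us in budget.items():
    print(f"{name:>24}: {us:8.2f} us")
```

Even with a modest assumed queuing figure, queuing dominates the serialization, propagation, and nominal switch contributions combined, which is why benchmarking on nominal latency alone is misleading.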
In implementing architectures for ultra low-latency infrastructures, it is important to consider all five latency contributors above. Most importantly, it is critical that benchmarks match the application's traffic characteristics; relying solely on RFC results, for instance, would lead to faulty conclusions.
Measuring Latency Accurately
To paraphrase a popular business adage: "You can only improve what you can measure." This is true of HFT infrastructures. In particular, since firms care about microsecond-level performance, they need the ability to measure latency accurately at the microsecond level. Ideally, latency is measured on an end-to-end trade-flow basis. However, since the data is transformed en route from tick data to an actual order, it may only be possible to measure latency in segregated functional units, such as the market data infrastructure and the order management system. It is also important for the servers to be synchronized to accurate, microsecond-granular clocks. ARISTA offers an implementation of the Precision Time Protocol (IEEE 1588) for this purpose.
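One-way latency measurement of this kind can be sketched as follows. This is a minimal illustration, assuming both hosts' clocks are already synchronized to microsecond accuracy (e.g., via PTP/IEEE 1588); the function names are hypothetical, not part of any real measurement API.

```python
import time

def stamp_ns():
    """Read the local clock with nanosecond resolution."""
    return time.time_ns()

def one_way_latency_us(send_ts_ns, recv_ts_ns):
    """Latency from sender timestamp to receiver timestamp, in microseconds.

    Only meaningful if the sender and receiver clocks are synchronized;
    otherwise the clock offset dominates the result.
    """
    return (recv_ts_ns - send_ts_ns) / 1_000.0

# Usage: the sender embeds stamp_ns() in the outgoing message; the
# receiver computes one_way_latency_us(embedded_ts, stamp_ns()) on arrival.
send_ts = stamp_ns()
# ... message traverses the network ...
recv_ts = stamp_ns()
print(f"one-way latency: {one_way_latency_us(send_ts, recv_ts):.1f} us")
```

Because each functional unit (market data infrastructure, order management system) can timestamp on ingress and egress against the same synchronized clock, per-segment latencies measured this way can be summed into an end-to-end picture even when the data is transformed en route.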