The PCIe® 6.0 Specification Webinar Q&A: A Deeper Dive into FLIT Mode, PAM4, and Forward Error Correction (FEC)
The upcoming PCIe® 6.0 specification introduces Flow Control Unit (FLIT) encoding, which enables the specification to provide low latency with high efficiency. The PCIe 6.0 architecture adopts FLIT mode because error correction needs to operate on fixed-size packets. Once a link has enabled FLIT mode, it stays in FLIT mode regardless of speed, so any later change to a lower data rate also uses FLIT mode. Amortizing the fixed per-FLIT overhead across each FLIT keeps that overhead low, which allows for high bandwidth efficiency, low latency and reduced area.
The PCIe 6.0 specification also introduces PAM4 (Pulse Amplitude Modulation with 4 levels) signaling and Forward Error Correction (FEC), allowing the PCIe 6.0 specification to achieve low latency, low complexity, and low bandwidth overhead.
FLIT Mode Questions
- What is the definition of failure in FLIT?
Each FLIT is protected by a Cyclic Redundancy Check (CRC) and a 3-way interleaved FEC. After a FLIT is received, the receiving device performs the FEC decode, which corrects any correctable error within each FEC group. After the decode, the CRC check is performed. If the CRC check fails, the receiving device can indicate that the FLIT has not been successfully received by sending a NAK (negative acknowledgment) back to the transmitting device. The NAK triggers a replay, so the FLIT is retransmitted and delivered without errors. An optimization is possible if the subsequent FLIT indicates that the FLIT in error had only No Operation (NOP) transaction layer packets, which makes the replay unnecessary.
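The receive-side decision described above can be sketched in a few lines. This is only an illustrative model under simplifying assumptions, not the specification's state machine: the function name `replay_decision`, its parameters, and the returned labels are hypothetical, and the FEC decode and CRC computation are abstracted into a single boolean.

```python
def replay_decision(crc_ok_after_fec: bool,
                    next_flit_says_nop_only: bool = False) -> str:
    """Decide what the receiver does with one FLIT.

    crc_ok_after_fec: result of the CRC check performed after FEC decode.
    next_flit_says_nop_only: True if the subsequent FLIT indicates that the
        FLIT in error carried only NOP TLPs (the optimization case).
    """
    if crc_ok_after_fec:
        # FEC corrected whatever it could and the CRC confirms the result.
        return "accept"
    if next_flit_says_nop_only:
        # Nothing useful was lost, so a replay is unnecessary.
        return "drop_without_replay"
    # CRC failed on a FLIT that may have carried real traffic.
    return "nak_and_replay"


for crc_ok, nop_only in [(True, False), (False, False), (False, True)]:
    print(crc_ok, nop_only, replay_decision(crc_ok, nop_only))
```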
“Failure in Time” (FIT) is a metric used to measure reliability, or the failure rate. It is the number of “failures” we get in 10^9 hours. A failure is defined as the CRC passing even in the presence of bit error(s), resulting in potential data integrity issues. This is the reason why we have always deployed a strong CRC with very low aliasing probability, even in the presence of multiple errors. In the context of FLIT, if a FLIT remains erroneous even after the FEC correction and the subsequent CRC check still passes (i.e. the CRC fails to recognize the error, “aliasing” in signature to a correct code), it is considered a failure. We want the FIT to be significantly less than 1 for any link width. Our analysis shows that we expect the FIT to be around 5 x 10^-10, which is almost 0. In that regard, the PCIe 6.0 specification is a very robust interconnect, as were the prior generations.
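To put the FIT figure in perspective, the short calculation below converts a FIT value into an average time between failures. It is a back-of-the-envelope sketch that assumes a constant failure rate; the function names are illustrative and not taken from the specification.

```python
HOURS_PER_FIT_WINDOW = 1e9      # FIT counts failures per 10^9 hours
HOURS_PER_YEAR = 24 * 365       # calendar approximation, ignoring leap years


def mean_hours_between_failures(fit: float) -> float:
    """Average hours between failures for a given FIT (constant-rate assumption)."""
    return HOURS_PER_FIT_WINDOW / fit


def mean_years_between_failures(fit: float) -> float:
    """Same quantity expressed in years."""
    return mean_hours_between_failures(fit) / HOURS_PER_YEAR


for fit in (1.0, 5e-10):
    print(f"FIT={fit:g}: {mean_hours_between_failures(fit):.3g} hours, "
          f"{mean_years_between_failures(fit):.3g} years")
```

Under this assumption, a FIT of 1 corresponds to one failure per 10^9 hours, while a FIT of 5 x 10^-10 corresponds to roughly 2 x 10^18 hours between failures on average.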
- Is there a handshake between every FLIT? This would add significant delay for long channels where Retimers would be required.
Every transmitted FLIT has a dedicated slot (the 2 Bytes in DLP 0 and 1) to ACK (acknowledge) or NAK (negative acknowledge) the received FLIT, so it is bandwidth matched. In addition, we have the ability to ACK multiple FLITs simultaneously by ACK’ing the latest sequence number of the received FLIT. Since we have a dedicated slot in each FLIT for the management of ACK/NAK, we do not incur any additional delay beyond the delay of getting to the slot in the transmitted FLIT.
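The effect of ACK'ing only the latest sequence number can be illustrated with a small transmitter-side model. This is a sketch under stated assumptions: the `ReplayBuffer` class and its methods are hypothetical, and sequence-number wraparound, which a real implementation must handle, is ignored.

```python
from collections import OrderedDict


class ReplayBuffer:
    """Holds transmitted FLITs until the link partner acknowledges them."""

    def __init__(self):
        self._unacked = OrderedDict()  # seq -> FLIT contents, in transmit order

    def on_transmit(self, seq, flit):
        self._unacked[seq] = flit

    def on_ack(self, acked_seq):
        """A single ACK releases every FLIT up to and including acked_seq."""
        released = [s for s in self._unacked if s <= acked_seq]
        for s in released:
            del self._unacked[s]
        return released

    def pending(self):
        return list(self._unacked)


buf = ReplayBuffer()
for seq in range(1, 6):
    buf.on_transmit(seq, b"flit")
print(buf.on_ack(3))   # FLITs 1, 2 and 3 are released by one ACK
print(buf.pending())   # FLITs 4 and 5 are still awaiting acknowledgment
```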
- Does this mean each FLIT will transfer one Transaction Layer Packet (TLP) (one type of TLP, for example: completion, or completion with data payload)?
Not necessarily. One TLP can span over multiple FLITs and one FLIT can have multiple TLPs, depending on the size of the TLP. The 236 Bytes in each FLIT of 256 Bytes can be used to transfer a partial TLP, as well as one or more TLPs.
- Why go for different FLIT sizes for the PCIe 6.0 specification?
We have only one FLIT size for the PCIe 6.0 specification: 256 Bytes. We did consider different sizes and settled on the 256 Byte FLIT size as the right tradeoff between bandwidth efficiency and latency.
- What was the motivation for keeping the FLIT size at 256B?
We considered various FLIT sizes and settled on 256 Bytes with 236 bytes of TLP payload and a TLP efficiency of 92%.
We evaluated larger FLIT sizes such as 740 Bytes, where 20 Bytes would have been used for Data Link Layer Payload (DLP), CRC, and FEC and 720 bytes would have been for the TLP payload, with a TLP payload efficiency of about 97%. Although this option would have improved efficiency over the current FLIT size, the latency would have been 3X due to the resulting FLIT accumulation (e.g. a x4 link would have added an extra latency of 16 ns beyond where we are today).
We also considered smaller FLIT sizes such as 64 Bytes, with 44 Bytes for the TLP payload, with a resulting TLP payload efficiency of about 69%. However, this option would have resulted in meager latency savings of about 6 ns.
A FLIT size of 256 bytes is an optimal choice that allows us to exceed the PCIe 6.0 specification requirements around key metrics like bandwidth efficiency and latency.
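The tradeoff above can be checked with simple arithmetic. The sketch below assumes 20 overhead bytes per FLIT (consistent with the three sizes discussed) and a raw rate of 64 Gb/s per lane for the accumulation-time estimate; the function names are illustrative, and the model ignores every other source of latency.

```python
OVERHEAD_BYTES = 20      # DLP + CRC + FEC bytes per FLIT (256 - 236)
LANE_RATE_GBPS = 64      # assumed raw per-lane rate at 64 GT/s


def tlp_efficiency(flit_bytes, overhead=OVERHEAD_BYTES):
    """Fraction of the FLIT available for TLP bytes."""
    return (flit_bytes - overhead) / flit_bytes


def accumulation_ns(flit_bytes, lanes, lane_rate_gbps=LANE_RATE_GBPS):
    """Time to receive one whole FLIT, in ns (bits divided by Gb/s gives ns)."""
    return flit_bytes * 8 / (lanes * lane_rate_gbps)


for size in (64, 256, 740):
    print(f"{size:>3} B FLIT: efficiency {tlp_efficiency(size):.1%}, "
          f"x4 accumulation {accumulation_ns(size, lanes=4):.1f} ns")
```

Under these assumptions a x4 link needs 8 ns to accumulate a 256-byte FLIT and 2 ns for a 64-byte FLIT, which matches the roughly 6 ns saving quoted above. The 740-byte case comes out near 23 ns, about 15 ns more than the 256-byte FLIT and close to the 3X (16 ns) figure.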
- FLIT is 256 bytes – is that the smallest unit of transfer? For example, will a memory read TLP of only 16 bytes still need 1 FLIT?
A FLIT can have multiple TLPs in the first 236 Bytes of a 256-byte FLIT. For example, we can have fourteen 16-byte read requests on different Virtual Channels, using up 224 of the 236 Bytes available, and can also fit the first 12 bytes of the 15th read request in the remaining space of the same FLIT. The remaining 4 bytes of the 15th read will occupy the first 4 bytes of the next FLIT.
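The packing in this example can be reproduced with a small model that fills the 236 TLP bytes of each FLIT back to back. This is a simplified sketch: `pack_tlps` is a hypothetical helper, and it ignores the additional packing rules the specification imposes.

```python
TLP_BYTES_PER_FLIT = 236  # TLP region of a 256-byte FLIT


def pack_tlps(tlp_sizes, capacity=TLP_BYTES_PER_FLIT):
    """Return, for each FLIT, a list of (tlp_index, bytes_placed_in_this_flit)."""
    flits = [[]]
    free = capacity
    for idx, size in enumerate(tlp_sizes):
        remaining = size
        while remaining > 0:
            if free == 0:
                # Current FLIT is full, so continue in the next one.
                flits.append([])
                free = capacity
            chunk = min(remaining, free)
            flits[-1].append((idx, chunk))
            remaining -= chunk
            free -= chunk
    return flits


layout = pack_tlps([16] * 15)   # fifteen 16-byte read requests
print(len(layout))              # number of FLITs used
print(layout[0][-1])            # last entry in the first FLIT
print(layout[1])                # contents of the second FLIT
```

With fifteen 16-byte requests, the first FLIT holds requests 0 through 13 in full plus 12 bytes of request 14 (the 15th), and the second FLIT begins with its remaining 4 bytes.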
- For FLIT mode, if one TLP's payload is larger than 256 bytes, it will be sent in 2 FLITS, correct?
Yes, it will need at least 2 FLITs. Even with a 256-byte payload, a TLP can span 3 FLITs if it starts toward the end of the first FLIT.
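How many FLITs a TLP touches depends on where it starts within the 236-byte TLP region. The sketch below counts only the bytes passed in; header bytes are left out for simplicity, which is an assumption, so a real TLP would reach the 3-FLIT case at an earlier starting offset. The helper name `flits_spanned` is illustrative.

```python
TLP_BYTES_PER_FLIT = 236  # TLP region of a 256-byte FLIT


def flits_spanned(tlp_bytes, start_offset, capacity=TLP_BYTES_PER_FLIT):
    """Number of FLITs a TLP occupies when it starts at start_offset
    (0-based) within the TLP region of its first FLIT."""
    if not 0 <= start_offset < capacity:
        raise ValueError("start_offset must lie inside the TLP region")
    total = start_offset + tlp_bytes
    return (total + capacity - 1) // capacity   # integer ceiling division


for offset in (0, 216, 217):
    print(offset, flits_spanned(256, offset))
```

In this model, 256 bytes starting at offset 0 occupy 2 FLITs, while the same 256 bytes starting at offset 217 or later spill into a third FLIT.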
- If the data payload is less than 242, is it filled by 0 instead in the FLIT?
If we have a subsequent TLP, then it can start on the same FLIT. The only time we will fill a FLIT with NOP is if a TLP is not available to be sent. A TLP does not need to start or end at any boundary within a FLIT. TLPs can be packed in a FLIT, subject to some rules.
- Since DLPs can't be sent standalone, do they need to be sent along with TLP to populate a FLIT?
Every FLIT has 6 bytes dedicated to DLPs. While there is no direct dependency between TLP and DLP, the rationale for having dedicated bytes in each FLIT for TLP and DLP is to increase packet efficiency and reduce latency, while reducing the area overhead.
Additional PCIe 6.0 Specification Resources
The PCIe 6.0 specification webinar, which explores multiple new features in the upcoming specification, is available for on-demand viewing on the PCI-SIG YouTube channel. In addition, you can read the previous webinar Q&A blogs covering PAM4 signaling, L0p, FEC and other supported features. Subscribe to the PCI-SIG Channel on BrightTALK for upcoming webinars and previous talks on demand.