Seamless Transition to PCIe® 5.0 Technology in System Implementations Webinar Q&A

Posted on: 22 January 2021
by Casey Morrison, Jonathan Bender, and Liang Liu, Astera Labs

Standards & Compliance

With the widespread adoption of compute-intensive workloads, such as artificial intelligence and machine learning, in enterprise and cloud data centers, high-speed, low-latency interconnects like PCI Express® architecture are required to connect high-performance nodes. The upgrade from PCIe® 4.0 to PCIe 5.0 technology doubles the per-lane data rate from 16 GT/s to 32 GT/s, but it also shortens signal reach and introduces new system topology challenges.
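As a quick sanity check on what "doubling" means in practice, here is a small Python sketch. It computes per-direction x16 throughput and the NRZ Nyquist frequency for each data rate. It assumes 128b/130b line encoding (used by PCIe 3.0 through 5.0) and ignores packet and protocol overhead, so the figures are upper bounds, not measured performance. The 32 GT/s Nyquist frequency of 16 GHz is the frequency at which the PCIe 5.0 channel loss budget is usually quoted.

```python
# Per-direction link throughput, assuming 128b/130b encoding and
# ignoring TLP/DLLP/protocol overhead (so these are upper bounds).
ENCODING_EFFICIENCY = 128 / 130


def throughput_gbytes(data_rate_gts, lanes):
    """Return per-direction throughput in GB/s after line encoding."""
    bits_per_second = data_rate_gts * 1e9 * ENCODING_EFFICIENCY * lanes
    return bits_per_second / 8 / 1e9


def nyquist_ghz(data_rate_gts):
    """NRZ Nyquist frequency: one clock cycle spans two unit intervals."""
    return data_rate_gts / 2


for gen, rate in (("PCIe 4.0", 16.0), ("PCIe 5.0", 32.0)):
    print(f"{gen}: x16 = {throughput_gbytes(rate, 16):.1f} GB/s per direction, "
          f"Nyquist = {nyquist_ghz(rate):.0f} GHz")
```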

The recent Seamless Transition to PCIe 5.0 Technology in System Implementations webinar, presented by Astera Labs, explored changes between the PCIe 4.0 and PCIe 5.0 specifications, including signal integrity and system design challenges, where the right balance must be found between PCB materials, connector types and the use of signal conditioning devices for practical compute topologies. This blog post provides answers to questions about PCIe Retimers and Redrivers, the PCIe 5.0 specification, PCB materials, RX detection and precoding that were asked during the webinar.

Retimer/Redriver

If equalization can be bypassed in Retimers in PCIe 5.0 architecture, then how would an Endpoint (EP) detect if there is a Retimer present?

Even when equalization is bypassed, a Retimer will still assert the Retimer Present bit (TS2 Symbol 5, bit 4) at the 2.5 GT/s data rate so that the Root Complex and Endpoint can learn that a Retimer is present in the link.
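A minimal Python sketch of how a port's logic might test that bit. It assumes the 16 symbols of one TS2 ordered set have already been received at 2.5 GT/s and 8b/10b-decoded into bytes, Symbol 0 first. The function name is hypothetical, and the sketch does not model the surrounding LTSSM rules about when the bit is sampled.

```python
# Bit position of the Retimer Present bit within TS2 Symbol 5,
# as stated in the answer above.
RETIMER_PRESENT_BIT = 4
TS2_SYMBOL_COUNT = 16  # a TS1/TS2 ordered set is 16 symbols at 2.5 GT/s


def retimer_present(ts2_symbols):
    """Return True if the Retimer Present bit is set in a decoded TS2.

    ts2_symbols: bytes-like object holding the 16 decoded symbols of one
    TS2 ordered set (hypothetical input format for this sketch).
    """
    if len(ts2_symbols) != TS2_SYMBOL_COUNT:
        raise ValueError("a TS2 ordered set has 16 symbols")
    return bool((ts2_symbols[5] >> RETIMER_PRESENT_BIT) & 1)


# Usage: a TS2 whose Symbol 5 has only bit 4 set.
ts2 = bytearray(16)
ts2[5] = 0x10
print(retimer_present(ts2))
```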

Are there special considerations during link training to avoid timeouts when using Retimers?

There are no “special” considerations per se. During Equalization.Phase2, the Retimer’s upstream pseudo port (USPP) and the Endpoint will simultaneously train their receivers, and they have 24 ms (total) to do this. During Equalization.Phase3, the Retimer’s downstream pseudo port (DSPP) and the root complex will simultaneously train their receivers, and they likewise have 24 ms (total) to do this. The timeouts are the same regardless of whether a Retimer is present or not.

Is a Retimer essentially a two-port PCIe packet switch?

Not really. Each port of a packet switch has a full PCIe protocol stack: Physical Layer, Data Link Layer, and Transaction Layer. A packet switch has one upstream port and at least one downstream port. A Retimer, by contrast, has an upstream-facing Physical Layer and a downstream-facing Physical Layer but no Data Link or Transaction Layer, which is why a Retimer's ports are called pseudo ports. Because a Retimer does not have (nor does it need) these higher layers, the latency through a Retimer is much lower than the latency through a packet switch.

Is there a difference in Retimer functionality in the PCIe 5.0 specification compared to the PCIe 4.0 specification?

The only notable differences are:

As with all PCIe 5.0 transmitters, the Retimer’s transmitters must support 32 GT/s precoding when requested by the link partner.
As with all PCIe 5.0 receivers, the Retimer’s receivers must support Lane Margining in both time and voltage.

Other than keeping the same throughput, is a Retimer required to support different link widths for its upstream/downstream ports?

A Retimer is required to have the same link width on its upstream-facing port and on its downstream-facing port. In other words, the link widths must match. A Retimer must also support down-configured link widths, but the width must always be the same on both ports.
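The width rule can be expressed as a one-line check. In this Python sketch, the set of supported widths is a hypothetical example; a real Retimer advertises its own maximum width and supported down-configured widths.

```python
# Hypothetical set of widths supported by this example Retimer,
# including down-configured widths below its maximum of x16.
SUPPORTED_WIDTHS = frozenset({1, 2, 4, 8, 16})


def retimer_link_width_ok(upstream_width, downstream_width,
                          supported=SUPPORTED_WIDTHS):
    """Both pseudo ports must run at the same width; a down-configured
    width is fine as long as both sides agree and it is supported."""
    return upstream_width == downstream_width and upstream_width in supported


print(retimer_link_width_ok(16, 16))  # full width on both sides
print(retimer_link_width_ok(8, 8))    # down-configured, still matched
print(retimer_link_width_ok(16, 8))   # mismatched widths are not allowed
```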

Are there any typical applications where Redrivers are preferred over Retimers? Why is a Redriver not recommended for the PCIe 5.0 and PCIe 4.0 Specifications?

Redrivers are not defined or specified within the PCIe Base Specification, so there are no formal guidelines for using a Redriver versus using a Retimer. This topic is covered in the webinar slides, and more information is available in this PCI-SIG® blog: PCI Express® Retimers vs. Redrivers: An Eye-Popping Difference.

I would like to understand Retimers that can support 30 dB. Do you suggest putting the Retimer close to the receiver? How do you calculate insertion loss (IL) budget before and after the Retimer?

A Retimer’s transmitters and receivers, on both pseudo ports, must meet the PCIe Base Specification. This means that a Retimer can support the full channel budget (nominally 36 dB at 16 GHz) on each side, before and after the Retimer. The IL budget should therefore be calculated separately for each side of the Retimer, and channel compliance should be checked for each side as well, just as you would for a Retimer-less Root Complex-to-Endpoint link.
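To illustrate the "each side separately" approach, the Python sketch below sums per-segment insertion loss at 16 GHz for each side of a Retimer and reports the margin against the nominal 36 dB budget. Every segment name and loss value here is hypothetical and for illustration only; real numbers come from your own simulations or measurements, and a real analysis would also include channel compliance simulation, not just an IL sum.

```python
# Nominal per-side channel budget at 16 GHz, as quoted in the answer.
CHANNEL_BUDGET_DB = 36.0


def check_side(name, segment_losses_db):
    """Sum the insertion loss of one Retimer side and return the margin."""
    total = sum(segment_losses_db.values())
    margin = CHANNEL_BUDGET_DB - total
    status = "OK" if margin >= 0 else "OVER BUDGET"
    print(f"{name}: {total:.1f} dB total, {margin:+.1f} dB margin ({status})")
    return margin


# Hypothetical loss values at 16 GHz, for illustration only.
upstream_side = {
    "Root Complex package": 8.0,
    "motherboard trace": 20.0,
    "Retimer package": 4.0,
}
downstream_side = {
    "Retimer package": 4.0,
    "motherboard trace": 6.0,
    "CEM connector": 1.5,
    "add-in card trace and Endpoint package": 12.0,
}

check_side("Root Complex -> Retimer", upstream_side)
check_side("Retimer -> Endpoint", downstream_side)
```

Budgeting each side independently mirrors the fact that the Retimer fully regenerates the signal, so loss on one side does not eat into the other side's budget.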

If a Redriver or Retimer is present, is there any way to enable or disable the Redriver or Retimer?

Yes, but perhaps not in the sense that you are inquiring about. Redrivers and Retimers are active components that affect the data stream: their package introduces signal attenuation, their active circuits apply boost, and (in the case of Retimers) they perform clock and data recovery. As such, there is no way to truly disable these components and still have data pass through. When disabled, no data will pass through a Redriver or Retimer.

PCIe 5.0 Specification

PCIe 3.0 and PCIe 4.0 architectures both support an embedded clock. Why does PCIe 5.0 architecture not support this clock architecture?

PCIe 5.0 architecture, like PCIe 4.0 and 3.0 architectures, supports two clock architectures (see Section 8.6.4):

Common REFCLK (CC): The same 100-MHz reference clock source is distributed to all components in the PCIe link: Root Complex, Retimer, and Endpoint. Because the REFCLK is distributed via PCB routing, fanout buffers, cables, etc., its phase will differ at each component.
Independent REFCLK (IR): The Root Complex and Endpoint use independent reference clocks, and the Tx and Rx must meet more stringent specifications when operating in IR mode than in CC mode. The PCIe Base Specification does not specify the properties of independent reference clocks.

How is Burst Error Reporting considered in the PCIe 5.0 specification?

Burst errors are not reported any differently than regular correctable/uncorrectable errors. In fact, burst errors may cause silent data corruption: multiple bits in error can escape detection and lead to an undetected error event. Therefore, it is incumbent on system designers and PCIe component providers to consciously enable precoding if there is a concern or risk of burst errors in a system.

Does recovery loopback capability still have to be supported only on the edge Lane (i.e., lane 0/15 for a x16 link), or does it now need to be supported on any lane (e.g., with enhanced link behavior control)?

According to the PCIe Base Specification, Section 4.2.6.10, the Lane that received two consecutive TS1 Ordered Sets with the Enhanced Link Behavior Control bits set to 01b in Configuration.Linkwidth.Start is the lane under test for the purposes of loopback and Recovery.Equalization, and this need not be an edge lane.
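The lane-selection rule can be sketched as a scan over each lane's received TS1s. This Python sketch assumes the 2-bit Enhanced Link Behavior Control field has already been extracted from each TS1 received in Configuration.Linkwidth.Start (where that field sits in the ordered set is not modeled here). If more than one lane qualified, the sketch returns the lowest-numbered one; that tie-break is an assumption of the example, not specified behavior.

```python
from typing import Dict, List, Optional

ELBC_LANE_UNDER_TEST = 0b01  # field value named in the answer above


def lane_under_test(elbc_by_lane: Dict[int, List[int]]) -> Optional[int]:
    """Return the lane that received two consecutive TS1s with the
    Enhanced Link Behavior Control field equal to 01b, or None.

    elbc_by_lane maps a lane number to the sequence of 2-bit field
    values extracted from that lane's TS1s, in arrival order
    (hypothetical input format for this sketch).
    """
    for lane in sorted(elbc_by_lane):
        fields = elbc_by_lane[lane]
        for prev, cur in zip(fields, fields[1:]):
            if prev == ELBC_LANE_UNDER_TEST and cur == ELBC_LANE_UNDER_TEST:
                return lane
    return None


# Lane 5 (not an edge lane) saw two consecutive 01b values; lane 7 did not.
print(lane_under_test({0: [0, 0], 5: [1, 1], 7: [1, 0, 1]}))
```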

Is there a standard host and root complex channel sNp model published by PCI-SIG?

PCI-SIG does not publish official or “standard” channel models; however, the Electrical Workgroup (EWG) does post example channel models. For the PCIe 5.0 specification, the reference package models are posted here: https://members.pcisig.com/wg/PCIe-Electrical/document/folder/885. You can also find example pad-to-pad channel models shared by a few member companies during the specification development by searching for *.s24p files in the following folder: https://members.pcisig.com/wg/PCIe-Electrical/document.

Does PCI-SIG provide a tool for interoperability tests?

PCI-SIG defines the specifications, but not a tool for the purpose of interoperability testing. ASIC vendors and OEMs/ODMs generally provide or have their own tools for testing and stressing the PCIe link to make sure there are no interoperability issues.

Other than add-in-card (CEM connector), are other connectors like M.2 supported in the PCIe 5.0 interface?

There are multiple connector types and form factors in development, which are targeting PCIe 5.0 signal speeds, including M.2, U.2, U.3, mezzanine connectors, and others.

For clock jitter, 0.5-ps RMS is specified for PCIe 4.0 and 0.15-ps RMS is specified for PCIe 5.0. What are the real criteria for a system? (Note that 0.7-ps RMS / 0.25-ps RMS is to be used in channel simulations to account for additional noise in a real system for PCIe 4.0 / 5.0).

The design must ensure that the PCIe 5.0 reference clock RMS jitter does not exceed 0.25 ps. This is a design requirement and there is no compliance test for this.

You mentioned the SSC downgrades to 3000 ppm; how do you get this value?

This value comes from the parameter T_SSC-FREQ-DEVIATION_32G_SRIS in Table 8-17 of the PCIe Base Specification, which defines a minimum frequency deviation of -0.3%, or -3000 ppm, for Separate RefClk Independent SSC (SRIS) mode.
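The percent-to-ppm conversion, and what a -3000 ppm down-spread means for a 100-MHz REFCLK, can be checked with a few lines of Python. This sketch covers the arithmetic only: it treats the deviation as a simple offset from the nominal frequency and does not model the SSC modulation profile or rate.

```python
REFCLK_NOMINAL_HZ = 100e6  # 100-MHz PCIe reference clock


def ppm_from_percent(percent):
    """1% corresponds to 10,000 ppm (parts per million)."""
    return percent * 1e4


def deviated_freq_hz(nominal_hz, deviation_ppm):
    """Frequency after applying a deviation given in ppm."""
    return nominal_hz * (1 + deviation_ppm * 1e-6)


dev_ppm = ppm_from_percent(-0.3)
low_hz = deviated_freq_hz(REFCLK_NOMINAL_HZ, dev_ppm)
print(f"deviation = {dev_ppm:.0f} ppm, lowest REFCLK = {low_hz / 1e6:.1f} MHz")
```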

PCB Material

What ultra-low-loss PCB material do you recommend for PCIe 5.0 technology?

There is no industry-standard definition of mid-loss, low-loss, and ultra-low-loss. It is good practice to start from a loss budget analysis to determine which type of PCB material the system needs. Megtron-6, or other PCB materials with similar performance, are commonly used in PCIe 5.0 server systems where the distance from the Root Complex pin to the CEM connector exceeds 10”.
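One way to turn a loss budget into a material requirement is to solve for the highest trace loss per inch that still fits the budget. In the Python sketch below, the 36 dB budget is the nominal PCIe 5.0 figure at 16 GHz, while the 20 dB allocated to non-trace elements and the 12-inch trace length are hypothetical inputs. The result is meant to be compared against your PCB vendor's loss data; this sketch does not classify any specific material.

```python
def max_trace_loss_per_inch(budget_db, other_losses_db, trace_length_in):
    """Highest PCB trace loss (dB/inch at 16 GHz) that fits the budget."""
    remaining_db = budget_db - other_losses_db
    if remaining_db <= 0:
        raise ValueError("non-trace losses already consume the whole budget")
    return remaining_db / trace_length_in


# Hypothetical inputs: 36 dB channel budget, 20 dB allocated to packages,
# vias, connector and add-in card, and 12 inches of system-board trace.
limit = max_trace_loss_per_inch(36.0, 20.0, 12.0)
print(f"Trace loss must stay below {limit:.2f} dB/inch at 16 GHz")
```

Rearranging the same relation as (budget - other losses) / loss-per-inch gives the maximum reach for a material whose loss per inch is already known.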

Is the PCB loss from measurement or simulation? Mid-loss means DF close to 0.01, low-loss DF close to 0.005, and ultra-low loss close to 0.0025.

The loss-per-inch figures noted in the webinar presentation are simulated figures and are intended for reference purposes only.

RX Detection

How does the RX detect feature work for LTSSM?

The purpose of the LTSSM Detect state is to determine whether a far-end receiver termination is present. If a receiver is detected, the next state is Polling. See Section 4.2.6.1 of the PCIe Base Specification for more details.
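The exit decision from Detect.Active can be summarized as a small state-transition function. This Python sketch is a simplified reading of the Detect rules, not a complete implementation: it takes per-lane detection results as booleans and omits the electrical mechanism, timers, and the handling of lanes that do not detect a receiver. Consult the Base Specification for the exact rules.

```python
from enum import Enum


class LtssmState(Enum):
    DETECT_QUIET = "Detect.Quiet"
    POLLING = "Polling"


def detect_active_next(first_pass, second_pass=None):
    """Choose the state after Detect.Active (simplified sketch).

    first_pass[i] is True if lane i detected a far-end receiver termination.
    second_pass holds the results of a repeated detection, needed only
    when some but not all lanes detected a receiver.
    """
    if all(first_pass):
        return LtssmState.POLLING
    if not any(first_pass):
        return LtssmState.DETECT_QUIET
    # Partial detection: detection is repeated after a wait, and the port
    # proceeds only if exactly the same lanes detect a receiver again.
    if second_pass is None:
        raise ValueError("partial detection requires a second detection pass")
    if list(second_pass) == list(first_pass):
        return LtssmState.POLLING
    return LtssmState.DETECT_QUIET


print(detect_active_next([True] * 4))
print(detect_active_next([False] * 4))
```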

Have there been changes to the CEM add-in card for RX compliance testing?

Test methodology is similar to that of CEM 4.0. See the PCIe 5.0 PHY Test Spec v0.5 for details.

In PCIe 4.0 technology, the RX test allows one bit error. If precoding is enabled and the BER increases by 2x, is a bit error still allowed?

The BER requirement for 16 GT/s and 32 GT/s is 1E-12 or less. If one bit error happens, the PCIe protocol will handle this accordingly depending on where the bit error is in the frame. The purpose of precoding is the mitigation of burst errors that may be introduced from the DFE circuit. Whether precoding is enabled or disabled, the maximum BER requirement must still be met.
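To put a 1E-12 BER in perspective, the Python sketch below computes the mean time between bit errors on a single 32 GT/s lane, with precoding disabled (1E-12) and with the effective 2E-12 that precoding can produce. It also computes how many error-free bits are needed to claim BER ≤ 1E-12 at 95% confidence, using the common statistical rule N = -ln(1 - CL) / BER. That rule is a general rule of thumb, not the PHY Test Spec's procedure; the test specification defines its own test duration and error allowance.

```python
import math


def mean_seconds_between_errors(ber, data_rate_gts):
    """Average time between bit errors on one lane."""
    return 1.0 / (ber * data_rate_gts * 1e9)


def zero_error_bits_needed(ber_target, confidence):
    """Bits to observe with zero errors to claim BER <= target at the
    given confidence level (Poisson approximation)."""
    return -math.log(1.0 - confidence) / ber_target


for label, ber in (("precoding off", 1e-12), ("precoding on", 2e-12)):
    t = mean_seconds_between_errors(ber, 32.0)
    print(f"{label}: one error every {t:.2f} s per lane on average")

bits = zero_error_bits_needed(1e-12, 0.95)
print(f"95% confidence: {bits:.3e} error-free bits "
      f"(~{bits / 32e9:.0f} s at 32 GT/s)")
```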

Is there a difference in system-level TX/RX compliance testing with a Retimer in the system compared to without?

There is no difference.

Is NEXT/FEXT going to be a required or optional test?

At this moment, these are not specified in the PCIe 5.0 PHY Test Spec v0.5.

Is RX Lane Margin a must for PCIe 5.0 specification compliance?

The Lane Margin Test (LMT) is defined in PCIe 5.0 PHY Test Spec v0.5, and RX Lane Margining in time and voltage is required for all PCIe 5.0 receivers. However, according to the test specification, LMT checks whether the add-in card under test implements the lane margining capability. The margin values reported are not checked against any pre-defined pass/fail criteria.

What is the scope bandwidth for PCIe 5.0 TX testing?

The PCIe 5.0 TX test uses a 33 GHz scope bandwidth. See the PCIe 5.0 PHY Test Spec v0.5 for more details.

Do you suggest that ODM vendors implement this LTSSM test? Or is it OK to just pass TX compliance and RX JBERT tests?

Passing TX compliance and RX BER tests does not guarantee system-level interoperability. It is advisable to perform separate tests that exercise the LTSSM, as well as application-specific tests such as hot unplug/hot plug, to demonstrate system-level robustness.

Precoding

How do you enable precoding? Is precoding a feature specific to PCIe 5.0 specification?

The enabling or disabling of Precoding is negotiated during link training. Whether Precoding is needed is largely dependent on the specific receiver implementation. For example, receivers that rely heavily on DFE tap-1 may choose to request Precoding during link training. Each receiver therefore makes its own determination, based on its architecture, as to whether it should request Precoding. Precoding is defined in the PCIe 5.0 specification but not in the PCIe 4.0 specification.
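The key point is that precoding is requested per receiver and applied by the transmitter at the other end of each lane, so the two directions of a link decide independently. This Python sketch models only that decision. The `relies_on_dfe_tap1` flag is a hypothetical stand-in for whatever architecture-specific criterion a real receiver uses, and the TS1/TS2 fields that carry the request are not modeled.

```python
from dataclasses import dataclass


@dataclass
class Receiver:
    name: str
    relies_on_dfe_tap1: bool  # hypothetical architecture flag

    def requests_precoding(self):
        # Each receiver decides for itself; using DFE tap-1 reliance as
        # the criterion is just the example given in the text.
        return self.relies_on_dfe_tap1


def transmitter_precodes(link_partner_rx, data_rate_gts):
    """In this PCIe 5.0-only sketch, a transmitter precodes at 32 GT/s
    only when the receiver at the other end of the lane requested it."""
    return data_rate_gts == 32.0 and link_partner_rx.requests_precoding()


rc_rx = Receiver("Root Complex", relies_on_dfe_tap1=True)
ep_rx = Receiver("Endpoint", relies_on_dfe_tap1=False)
# The Endpoint's transmitter serves the Root Complex's receiver, and vice versa.
print("Endpoint Tx precodes:", transmitter_precodes(rc_rx, 32.0))
print("Root Complex Tx precodes:", transmitter_precodes(ep_rx, 32.0))
```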

Does precoding impact performance?

The PCIe 5.0 specification introduces selectable Precoding. Precoding breaks an error burst into two errors: an entry error and an exit error. However, a random single-bit error would also be converted to two errors, and therefore a net 1E-12 BER with precoding disabled would effectively become 2E-12 BER with precoding enabled.
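The entry/exit behavior can be demonstrated with a differential (1/(1+D)) precoder, where each transmitted bit is the XOR of the data bit and the previous transmitted bit, and the receiver undoes it by XORing adjacent received bits. The Python sketch below models a DFE burst, in simplified form, as a run of consecutive flipped bits on the line. The data pattern and error positions are arbitrary examples.

```python
def precode(bits):
    """1/(1+D) precoder: y[n] = x[n] XOR y[n-1], with y[-1] = 0."""
    out, prev = [], 0
    for b in bits:
        prev = b ^ prev
        out.append(prev)
    return out


def decode(bits):
    """Receiver-side inverse: x[n] = y[n] XOR y[n-1], with y[-1] = 0."""
    out, prev = [], 0
    for b in bits:
        out.append(b ^ prev)
        prev = b
    return out


def flip(bits, positions):
    """Invert the bits at the given positions (simulated line errors)."""
    return [b ^ 1 if i in positions else b for i, b in enumerate(bits)]


def count_errors(a, b):
    return sum(x != y for x, y in zip(a, b))


data = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 0, 1]
tx = precode(data)
burst = set(range(3, 8))  # five consecutive line errors

print(count_errors(data, flip(data, burst)))        # no precoding: 5 errors
print(count_errors(data, decode(flip(tx, burst))))  # precoded burst: 2 errors
print(count_errors(data, decode(flip(tx, {6}))))    # precoded single error: 2
```

The last line shows the cost noted above: an isolated line error also becomes two data errors, which is why the effective BER doubles when precoding is enabled.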

Learn more about PCIe 5.0 technology implementation

The recording of “Seamless Transition to PCIe 5.0 Technology in System Implementations” webinar is available to watch anytime on the PCI-SIG BrightTALK channel. The webinar slides are available to view here. Subscribe to our BrightTALK channel and follow PCI-SIG on Twitter and LinkedIn to receive updates about upcoming webinars.