How PCIe® Technology Enables Machine Learning and Artificial Intelligence

By Dong Wei, PCI-SIG Board Member, Arm

According to Tractica, the Artificial Intelligence (AI) and Machine Learning (ML) markets are set to grow to $118.6 billion by 2025 as these technologies become the heart of our digital lives. AI can be found in consumer applications such as Google newsfeeds, in computer vision for vehicles, and even in smart factories. ML is the backbone of Amazon, issuing buying recommendations and significantly reducing time-to-ship. To deliver these capabilities, AI and ML workloads often require high-bandwidth, low-latency transport channels. The new PCI Express® (PCIe®) 5.0 architecture is filling that demand.

A modern server designed to handle AI/ML workloads follows a decentralized architecture: a compute platform that includes CPUs, graphics processing units (GPUs), field-programmable gate arrays (FPGAs), and specialized accelerators, all connected internally by high-throughput links and connected to other compute platforms by high-speed networking.

For certain usage models and markets, the accelerators need so many computational elements and so much on-chip memory that combining the CPU and the AI/ML accelerator on the same die is economically infeasible. To meet strict performance demands, the market needs a separate accelerator chip that attaches to the compute chip via a standard interconnect protocol.

PCIe Architecture: A High-Bandwidth, Low-Latency Solution

To successfully support ML and AI applications, a compute or accelerator solution must offer the following features:

The solution must be able to attach to as many compute chips and network devices as possible from various vendors. Using a widely adopted interconnect protocol such as PCIe technology ensures that the accelerator solution will attach to most compute chips.
The solution should be easily discoverable, programmable and manageable using standard software. PCIe architecture automatically enables an accelerator solution to be discovered and configured using standard software and popular programming models.
The solution needs easy-to-develop software. Making the accelerator a PCIe device means that well-known, robust methods of accessing and using PCIe devices can be deployed immediately.
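
As one illustration of discovery through standard software, the sketch below lists PCI and PCIe functions on a Linux host by reading the kernel's sysfs tree. It assumes Linux with sysfs mounted at /sys. The helper names (read_hex, list_pci_devices, is_processing_accelerator) are hypothetical and chosen for this example. Base class code 0x12 is the "Processing accelerators" class, but many real accelerators report other classes (GPUs, for example, commonly report a display controller class), so treat the filter as a starting point rather than a complete test.

```python
from pathlib import Path

# Linux exposes every enumerated PCI/PCIe function under this sysfs directory.
SYSFS_PCI_ROOT = Path("/sys/bus/pci/devices")


def read_hex(path):
    """Return the integer stored as hex text (e.g. '0x10de') in a file, or None."""
    try:
        return int(Path(path).read_text().strip(), 16)
    except (OSError, ValueError):
        return None


def list_pci_devices(root=SYSFS_PCI_ROOT):
    """Return one dict per PCI function found under `root` (empty list if absent)."""
    root = Path(root)
    if not root.is_dir():
        return []  # not Linux, or sysfs is not mounted
    devices = []
    for entry in sorted(root.iterdir()):
        devices.append({
            "address": entry.name,                 # domain:bus:device.function
            "vendor": read_hex(entry / "vendor"),  # 16-bit vendor ID
            "device": read_hex(entry / "device"),  # 16-bit device ID
            "class": read_hex(entry / "class"),    # 24-bit class code
        })
    return devices


def is_processing_accelerator(dev):
    """True if the top byte of the class code is 0x12 (processing accelerators)."""
    return dev["class"] is not None and (dev["class"] >> 16) == 0x12


if __name__ == "__main__":
    for dev in list_pci_devices():
        tag = "accelerator" if is_processing_accelerator(dev) else ""
        print(dev["address"], dev["vendor"], dev["device"], dev["class"], tag)
```

No vendor-specific driver is needed for this step: the operating system has already enumerated the bus, so the accelerator shows up alongside every other PCIe device.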

For certain usage models, such as those built around traditional CPUs, a collaborative CPU-accelerator computational model can be useful because of the demand for a high-bandwidth, low-latency communication channel between the CPU and the accelerator. For such applications, the PCIe 5.0 specification natively supports carrying additional protocols over its low-latency non-return-to-zero (NRZ) physical layer. PCIe 5.0 technology can help CPUs keep up with the ever-increasing flow of data from edge devices for most traditional businesses, while helping to offload processors.
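
To put a rough number on "high bandwidth", the sketch below estimates per-direction link throughput from the transfer rate and lane count. The figures it uses are not stated in this article: PCIe 5.0 signals at 32 GT/s per lane and PCIe 4.0 at 16 GT/s, both with 128b/130b line encoding. The function name is hypothetical, and the result is an upper bound because packet headers and other protocol overhead are ignored.

```python
def pcie_bandwidth_gbytes(gt_per_s, lanes, payload_bits=128, encoded_bits=130):
    """Approximate per-direction bandwidth in GB/s after line encoding.

    Ignores packet headers and flow-control traffic, so real application
    throughput will be lower than the value returned here.
    """
    raw_gbits = gt_per_s * lanes                          # raw bits on the wire
    data_gbits = raw_gbits * payload_bits / encoded_bits  # strip encoding overhead
    return data_gbits / 8                                 # bits -> bytes


if __name__ == "__main__":
    for name, rate in (("PCIe 4.0", 16), ("PCIe 5.0", 32)):
        print(f"{name} x16: {pcie_bandwidth_gbytes(rate, 16):.1f} GB/s per direction")
```

Under these assumptions a x16 PCIe 5.0 link carries roughly 63 GB/s in each direction, double the figure for a x16 PCIe 4.0 link.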

Adopting PCIe 5.0 architecture for a CPU-accelerator communication channel brings with it the well-established PCIe technology ecosystem of high-quality, reliable physical-layer and system-level solutions.

Chip Vendors Should Prioritize PCIe Technology in AI/ML

Adopting PCIe architecture immediately gives ASIC/FPGA vendors the benefit of ready-made solutions. PCIe technology is ideal for chip development for the following reasons:

There is a rich ecosystem providing PCIe technology IP for chip design and verification.
The vendor can easily access compliance testing services to ensure that their chip will connect to all PCIe architecture compatible compute systems.
Because PCIe technology is the de facto interconnect standard, adopting PCIe specifications helps the vendor achieve a shorter, more cost-efficient time to market.

If you would like to learn more about how PCI Express technology enables the AI and ML industries, view my recent video on the PCI-SIG® YouTube channel.