NVIDIA Ampere Architecture

The Heart of the World’s Highest-Performing, Elastic Data Centers

The Core of AI and HPC in the
Modern Data Center

Scientists, researchers, and engineers—the da Vincis and Einsteins of our time—are working to solve the world’s most important scientific, industrial, and big data challenges with AI and high-performance computing (HPC). Meanwhile businesses and even entire industries seek to harness the power of AI to extract new insights from massive data sets, both on-premises and in the cloud. The NVIDIA Ampere architecture, designed for the age of elastic computing, delivers the next giant leap by providing unmatched acceleration at every scale, enabling these innovators to do their life’s work.

Groundbreaking Innovations

Crafted with 54 billion transistors, NVIDIA Ampere is the largest 7 nanometer (nm) chip ever built and features six key groundbreaking innovations.

Third-Generation Tensor Cores

First introduced in the NVIDIA Volta architecture, NVIDIA Tensor Core technology has brought dramatic speedups to AI, bringing down training times from weeks to hours and providing massive acceleration to inference. The NVIDIA Ampere architecture builds upon these innovations by bringing new precisions—Tensor Float 32 (TF32) and Floating Point 64 (FP64)—to accelerate and simplify AI adoption and extend the power of Tensor Cores to HPC.

TF32 works just like FP32 while delivering speedups of up to 20X for AI, without requiring any code changes. Using NVIDIA Automatic Mixed Precision, researchers can gain an additional 2X performance with FP16 by adding just a couple of lines of code. And with support for bfloat16, INT8, and INT4, Tensor Cores in NVIDIA A100 Tensor Core GPUs create an incredibly versatile accelerator for both AI training and inference. Bringing the power of Tensor Cores to HPC, A100 also enables matrix operations in full, IEEE-compliant FP64 precision.
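Why TF32 needs no code changes becomes clearer from its bit layout: it keeps FP32's 8-bit exponent (so the numeric range is identical) but shortens the mantissa from 23 bits to 10. The minimal sketch below illustrates the effect by clearing the low mantissa bits of an FP32 value; note that `tf32_round` is an illustrative name, not an NVIDIA API, and real Tensor Cores round to nearest rather than truncate as this simplified version does.

```python
import struct


def tf32_round(x: float) -> float:
    """Illustrate TF32's reduced precision: keep FP32's 8-bit
    exponent but only the top 10 of the 23 mantissa bits.
    (Sketch only: truncates instead of rounding to nearest.)"""
    # Reinterpret the float's IEEE-754 single-precision bits as an int.
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    # Clear the low 13 mantissa bits, leaving a 10-bit mantissa.
    bits &= 0xFFFFE000
    return struct.unpack("<f", struct.pack("<I", bits))[0]


# Values representable in 10 mantissa bits pass through unchanged;
# finer detail below 2**-10 of the leading bit is dropped.
print(tf32_round(1.0 + 2**-10))  # representable: kept
print(tf32_round(1.0 + 2**-11))  # below TF32 precision: dropped
```

Because the exponent field is untouched, any value that fits in FP32 also fits in TF32, which is why FP32 code can run on TF32 Tensor Cores unmodified.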

Multi-Instance GPU (MIG)

Every AI and HPC application can benefit from acceleration, but not every application needs the performance of a full A100 GPU. With MIG, each A100 can be partitioned into as many as seven GPU instances, fully isolated and secured at the hardware level with their own high-bandwidth memory, cache, and compute cores. Now, developers can access breakthrough acceleration for all their applications, big and small, and get guaranteed quality of service. And IT administrators can offer right-sized GPU acceleration for optimal utilization and expand access to every user and application across both bare-metal and virtualized environments.
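On a system with an A100, MIG partitioning is typically driven through the `nvidia-smi mig` subcommands. The sketch below shows one plausible workflow; it requires admin rights and an actual A100, and the numeric profile IDs (such as `9` for a `3g.20gb` instance) vary by GPU model and driver version, so list the supported profiles first rather than relying on the IDs shown here.

```shell
# Enable MIG mode on GPU 0 (requires admin rights; may need a GPU reset).
sudo nvidia-smi -i 0 -mig 1

# List the GPU instance profiles this card and driver support,
# including their memory size and compute-slice counts.
sudo nvidia-smi mig -lgip

# Create two GPU instances from a chosen profile (ID 9 here is
# illustrative) and create default compute instances inside them (-C).
sudo nvidia-smi mig -cgi 9,9 -C

# Verify the resulting isolated GPU instances.
sudo nvidia-smi mig -lgi
```

Each instance then appears to CUDA applications as its own GPU, with its own memory and compute slices, which is what enables the guaranteed quality of service described above.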

Structural Sparsity

Modern AI networks are big and getting bigger, with millions and in some cases billions of parameters. Not all of these parameters are needed for accurate predictions and inference, and some can be converted to zeros to make the models “sparse” without compromising accuracy. Tensor Cores in A100 can provide up to 2X higher performance for sparse models. While the sparsity feature more readily benefits AI inference, it can also be used to improve the performance of model training.
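The sparsity A100 accelerates is a specific structured pattern, often called 2:4 sparsity: in every contiguous group of four weights, at most two are nonzero. A plain-Python sketch of magnitude-based 2:4 pruning is below; `prune_2_to_4` is a hypothetical helper for illustration, not an NVIDIA API, and in practice this pruning is done in a deep learning framework with NVIDIA's sparsity tooling.

```python
def prune_2_to_4(weights):
    """2:4 structured pruning sketch: in each contiguous group of
    four weights, zero out the two with the smallest magnitude,
    producing the pattern A100's sparse Tensor Cores accelerate."""
    assert len(weights) % 4 == 0, "weight count must be a multiple of 4"
    pruned = []
    for i in range(0, len(weights), 4):
        group = weights[i:i + 4]
        # Indices of the two largest-magnitude entries in this group.
        keep = sorted(range(4), key=lambda j: abs(group[j]), reverse=True)[:2]
        pruned.extend(w if j in keep else 0.0 for j, w in enumerate(group))
    return pruned


print(prune_2_to_4([0.9, -0.1, 0.05, -0.8]))  # small entries zeroed
```

Because exactly half the entries in every group are zero at known positions, the hardware can skip those multiplications entirely, which is where the up-to-2X speedup comes from.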

Smarter and Faster Memory

A100 brings massive amounts of compute to data centers. To keep those compute engines fully utilized, it has a class-leading 1.6 terabytes per second (TB/sec) of memory bandwidth, a 67 percent increase over the previous generation. In addition, A100 has significantly more on-chip memory, including a 40 megabyte (MB) level 2 cache—7X larger than the previous generation—to maximize compute performance.

Converged Acceleration at the Edge

The combination of the NVIDIA Ampere architecture and NVIDIA Mellanox's ConnectX-6 Dx SmartNIC in NVIDIA EGX A100 brings unprecedented compute and network acceleration capabilities to process the massive amounts of data being generated at the edge. The Mellanox SmartNIC includes security offloads that decrypt at line rates up to 200 gigabits per second (Gb/s) and GPUDirect, which transfers video frames directly into GPU memory for AI processing. With the EGX A100, businesses can accelerate AI deployment at the edge more securely and efficiently.

Inside the NVIDIA Ampere Architecture

Learn what’s new with the NVIDIA Ampere architecture and its implementation in the NVIDIA A100 GPU.