NVIDIA A100 TENSOR CORE GPU

2022-08-22

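Computer Vision Research Institute column

Author: Edison_G

NVIDIA® GPUs are the main computing engines driving the AI revolution, delivering large speedups for AI training and inference workloads. NVIDIA GPUs also accelerate many types of HPC and data analytics applications and systems, enabling customers to analyze and visualize data and turn it into insight. NVIDIA's accelerated computing platform is at the core of many of the world's most important and fastest-growing industries.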


1. Unprecedented Acceleration at Every Scale

The NVIDIA A100 Tensor Core GPU delivers unprecedented acceleration at every scale for AI, data analytics, and HPC to tackle the world’s toughest computing challenges. As the engine of the NVIDIA data center platform, A100 can efficiently scale up to thousands of GPUs or, using new Multi-Instance GPU (MIG) technology, can be partitioned into seven isolated GPU instances to accelerate workloads of all sizes. A100’s third-generation Tensor Core technology now accelerates more levels of precision for diverse workloads, speeding time to insight as well as time to market.
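
As a rough illustration of how the seven-way partitioning adds up, the sketch below checks whether a set of requested MIG instances fits on a single A100 40 GB. The profile names and slice counts are assumptions taken from NVIDIA's published MIG profiles for this GPU, not from this datasheet, and `fits_on_a100` is a hypothetical helper. It only checks the total compute and memory slice budgets; the real driver also enforces placement rules, so a layout that passes here may still be rejected.

```python
# Sketch: capacity check for MIG instance requests on one A100 40 GB.
# Assumption: slice counts follow NVIDIA's published MIG profiles;
# actual placement rules are stricter than this budget check.

# profile name -> (compute slices, memory slices of 5 GB each)
MIG_PROFILES = {
    "1g.5gb": (1, 1),
    "2g.10gb": (2, 2),
    "3g.20gb": (3, 4),
    "4g.20gb": (4, 4),
    "7g.40gb": (7, 8),
}
TOTAL_COMPUTE_SLICES = 7  # the GPU exposes seven compute slices
TOTAL_MEMORY_SLICES = 8   # 8 x 5 GB = 40 GB


def fits_on_a100(requested):
    """Return True if the requested profiles stay within both slice budgets."""
    compute = sum(MIG_PROFILES[name][0] for name in requested)
    memory = sum(MIG_PROFILES[name][1] for name in requested)
    return compute <= TOTAL_COMPUTE_SLICES and memory <= TOTAL_MEMORY_SLICES


if __name__ == "__main__":
    print(fits_on_a100(["1g.5gb"] * 7))           # seven smallest instances
    print(fits_on_a100(["4g.20gb", "3g.20gb"]))   # one large plus one medium
    print(fits_on_a100(["1g.5gb"] * 8))           # exceeds the compute budget
```

Seven `1g.5gb` instances consume all seven compute slices, which matches the "up to seven isolated GPU instances" figure quoted above.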

2. SYSTEM SPECIFICATIONS (PEAK PERFORMANCE)

3. GROUNDBREAKING INNOVATIONS

The NVIDIA A100 Tensor Core GPU is the flagship product of the NVIDIA data center platform for deep learning, HPC, and data analytics. The platform accelerates over 700 HPC applications and every major deep learning framework. It’s available everywhere, from desktops to servers to cloud services, delivering both dramatic performance gains and cost-saving opportunities.

To learn more about the NVIDIA A100 Tensor Core GPU, visit www.nvidia.com/a100

1  BERT pre-training throughput using PyTorch, including (2/3) Phase 1 and (1/3) Phase 2 | Phase 1 Seq Len = 128, Phase 2 Seq Len = 512 | V100: NVIDIA DGX-1™ server with 8x NVIDIA V100 Tensor Core GPU using FP32 precision | A100: NVIDIA DGX™ A100 server with 8x A100 using TF32 precision.

2  BERT large inference | NVIDIA T4 Tensor Core GPU: NVIDIA TensorRT™ (TRT) 7.1, precision = INT8, batch size 256 | V100: TRT 7.1, precision FP16, batch size 256 | A100 with 7 MIG instances of 1g.5gb; pre-production TRT, batch size 94, precision INT8 with sparsity.

3  V100 used is single V100 SXM2. A100 used is single A100 SXM4. AMBER based on PME-Cellulose, LAMMPS with Atomic Fluid LJ-2.5, FUN3D with dpw, Chroma with szscl21_24_128.

SPECIFICATIONS

Specification                      NVIDIA A100 for HGX                               NVIDIA A100 for PCIe
Peak FP64                          9.7 TF                                            9.7 TF
Peak FP64 Tensor Core              19.5 TF                                           19.5 TF
Peak FP32                          19.5 TF                                           19.5 TF
Peak TF32 Tensor Core              156 TF | 312 TF*                                  156 TF | 312 TF*
Peak BFLOAT16 Tensor Core          312 TF | 624 TF*                                  312 TF | 624 TF*
Peak FP16 Tensor Core              312 TF | 624 TF*                                  312 TF | 624 TF*
Peak INT8 Tensor Core              624 TOPS | 1,248 TOPS*                            624 TOPS | 1,248 TOPS*
Peak INT4 Tensor Core              1,248 TOPS | 2,496 TOPS*                          1,248 TOPS | 2,496 TOPS*
GPU Memory                         40 GB                                             40 GB
GPU Memory Bandwidth               1,555 GB/s                                        1,555 GB/s
Interconnect                       NVIDIA NVLink 600 GB/s**; PCIe Gen4 64 GB/s       NVIDIA NVLink 600 GB/s**; PCIe Gen4 64 GB/s
Multi-instance GPUs                Various instance sizes with up to 7 MIGs @ 5 GB   Various instance sizes with up to 7 MIGs @ 5 GB
Form Factor                        4/8 SXM on NVIDIA HGX™ A100                       PCIe
Max TDP Power                      400 W                                             250 W
Delivered Performance of Top Apps  100%                                              90%

* With sparsity
** SXM GPUs via HGX A100 server boards; PCIe GPUs via NVLink Bridge for up to 2 GPUs
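
The relationships between the peak figures in the specifications can be checked with a little arithmetic. The sketch below copies the dense and with-sparsity Tensor Core peaks from the specifications and computes the ratios between them. The function names are hypothetical, and the results are ratios of theoretical peaks only; they should not be read as measured application speedups.

```python
# Sketch: ratios between the peak figures listed in the specifications.
# Values are (dense, with sparsity), as given in the datasheet.
PEAK = {
    "TF32 Tensor Core": (156, 312),        # TFLOPS
    "BFLOAT16 Tensor Core": (312, 624),    # TFLOPS
    "FP16 Tensor Core": (312, 624),        # TFLOPS
    "INT8 Tensor Core": (624, 1248),       # TOPS
    "INT4 Tensor Core": (1248, 2496),      # TOPS
}
PEAK_FP32_TFLOPS = 19.5  # non-Tensor-Core FP32 peak
GPU_MEMORY_GB = 40


def sparsity_ratio(mode):
    """Ratio of the with-sparsity peak to the dense peak for one mode."""
    dense, sparse = PEAK[mode]
    return sparse / dense


def tf32_over_fp32():
    """Ratio of the dense TF32 Tensor Core peak to the plain FP32 peak."""
    return PEAK["TF32 Tensor Core"][0] / PEAK_FP32_TFLOPS


def mig_memory_gb(instances=7, gb_per_instance=5):
    """Total memory covered by the smallest MIG instances (7 x 5 GB)."""
    return instances * gb_per_instance


if __name__ == "__main__":
    for mode in PEAK:
        print(mode, sparsity_ratio(mode))
    print("TF32 / FP32:", tf32_over_fp32())
    print("MIG memory:", mig_memory_gb(), "of", GPU_MEMORY_GB, "GB")
```

Every Tensor Core mode shows the same factor of two between the dense and with-sparsity peaks, and seven 5 GB instances account for 35 GB of the 40 GB of GPU memory.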


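The Computer Vision Research Institute works mainly in deep learning, focusing on face detection, face recognition, multi-object detection, object tracking, and image segmentation. The institute will keep sharing the latest papers, algorithms, and frameworks; what is different this time is the emphasis on research. Hands-on walkthroughs for each area will follow, so that readers can work through real scenarios beyond theory and build the habit of coding and thinking for themselves.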
