Building On-prem GPU Training Infrastructure By Stephen Balaban
Building On-prem GPU Training Infrastructure By Stephen Balaban CEO, Lambda
Lambda Customers
About Me ● Started using CNNs for face recognition in 2012. ● First employee at Perceptio. We developed image recognition CNNs that ran locally on the iPhone. Acquired by Apple in 2015. ● Published in SPIE and NeurIPS.
Workshop Structure ● Audience survey ● Presentation w/ Q&A ● Q&A + Workshop
5 Stages of GPU Cloud Grief
It all starts with the Shock of an expensive AWS bill.
Stage 1 - Denial “This won’t happen again next month.”
Stage 2 - Anger “The bill doubled again!”
Stage 3 - Bargaining with your account manager.
Stage 4 - Depression “Spot instances and reserved instances aren’t enough, this is hopeless.”
Stage 5 - Acceptance “GPU cloud services are expensive. Managing hardware is scary.”
Hardware: A Quick Rundown 1. GPUs 2. CPUs 3. GPU-GPU Bandwidth & PCIe Topology
GPUs
GPU Speed Comparisons Source: https://lambdalabs.com/blog/titan-rtx-tensorflow-benchmarks/
Performance / $ Source: https://lambdalabs.com/blog/best-gpu-tensorflow-2080-ti-vs-v100-vs-titan-v-vs-1080-ti-benchmark/
CPUs
What to look for 1. Number of PCIe lanes. (Affects total bandwidth.) 2. NUMA Node Topology. (Affects GPU peering.) Source: https://lambdalabs.com/blog/best-gpu-tensorflow-2080-ti-vs-v100-vs-titan-v-vs-1080-ti-benchmark/
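A rough back-of-the-envelope sketch of why the lane count matters. It assumes PCIe 3.0 signalling (8 GT/s per lane, 128b/130b encoding); the 8-GPU and 48-lane figures in the usage lines are illustrative placeholders, not numbers from the slides.

```python
# Back-of-the-envelope PCIe lane budget.
# Assumption: PCIe 3.0 links. All example inputs below are illustrative.

PCIE3_GT_PER_S = 8.0        # giga-transfers per second, per lane (PCIe 3.0)
PCIE3_ENCODING = 128 / 130  # 128b/130b line-encoding efficiency


def pcie3_bandwidth_gb_s(lanes: int) -> float:
    """Theoretical one-direction bandwidth in GB/s for a PCIe 3.0 link."""
    return lanes * PCIE3_GT_PER_S * PCIE3_ENCODING / 8  # 8 bits per byte


def oversubscription(num_gpus: int, cpu_lanes: int, lanes_per_gpu: int = 16) -> float:
    """Lanes the GPUs would like divided by lanes the CPU provides.

    A value above 1.0 means the GPUs must share upstream bandwidth,
    for example through PCIe switches.
    """
    return (num_gpus * lanes_per_gpu) / cpu_lanes


# Hypothetical system: 8 GPUs at 16x each behind a CPU exposing 48 lanes.
print(f"16x link: {pcie3_bandwidth_gb_s(16):.2f} GB/s per direction")
print(f"oversubscription: {oversubscription(8, 48):.2f}x")
```

An oversubscription ratio above 1.0 does not rule a system out; it only indicates that all GPUs cannot talk to the CPU at full 16x speed at the same time.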
GPU Peering & PCIe Topology
PCIe Topology [Diagram: links labeled 16x.]
Dual Root PCIe Topology [Diagram: two CPUs joined by a CPU-CPU interconnect, four PEX 8748 switches, GPUs 0-7. Arrow is 16x PCIe Connection.] Source: Lambda
Single Root PCIe Topology [Diagram: one CPU, two PEX 8796 switches, GPUs 0-7. Arrow is 16x PCIe Connection.] Source: Lambda
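A toy Python model of the dual-root and single-root layouts, to show how the root topology changes the path between GPU pairs. The GPU-to-switch wiring in both tables is an assumption made for illustration (two GPUs per switch for dual root, four per switch for single root); the names `DUAL_ROOT`, `SINGLE_ROOT`, `path`, and `cross_cpu_pairs` are hypothetical.

```python
# Toy model of dual-root vs. single-root PCIe layouts.
# Assumption: the GPU-to-switch wiring below is illustrative only.

from typing import Dict, Tuple

# GPU index -> (CPU id, PCIe switch id)
Topology = Dict[int, Tuple[int, int]]

# Dual root: two CPUs, four switches, assumed two GPUs per switch.
DUAL_ROOT: Topology = {
    0: (0, 0), 1: (0, 0), 2: (0, 1), 3: (0, 1),
    4: (1, 2), 5: (1, 2), 6: (1, 3), 7: (1, 3),
}

# Single root: one CPU, two switches, assumed four GPUs per switch.
SINGLE_ROOT: Topology = {gpu: (0, gpu // 4) for gpu in range(8)}


def path(topo: Topology, a: int, b: int) -> str:
    """Classify the PCIe path between GPU a and GPU b."""
    cpu_a, switch_a = topo[a]
    cpu_b, switch_b = topo[b]
    if cpu_a != cpu_b:
        return "crosses CPU-CPU interconnect"
    if switch_a != switch_b:
        return "through the CPU root complex"
    return "same PCIe switch"


def cross_cpu_pairs(topo: Topology) -> int:
    """Count GPU pairs whose traffic must cross the CPU-CPU interconnect."""
    gpus = sorted(topo)
    return sum(
        1
        for i, a in enumerate(gpus)
        for b in gpus[i + 1:]
        if topo[a][0] != topo[b][0]
    )


print("dual root, GPU 0 <-> GPU 4:", path(DUAL_ROOT, 0, 4))
print("single root, GPU 0 <-> GPU 7:", path(SINGLE_ROOT, 0, 7))
print("cross-CPU pairs:", cross_cpu_pairs(DUAL_ROOT), "vs", cross_cpu_pairs(SINGLE_ROOT))
```

On a real machine the actual layout should be read from the system rather than assumed; this sketch only illustrates why a single root keeps every GPU pair under one CPU.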
Cascaded PCIe Topology [Diagram: one CPU, two PEX 8796 switches, GPUs 0-7. Arrow is 16x PCIe Connection.] Source: Lambda
NVLink System Topology [Diagram: two CPUs joined by a CPU-CPU interconnect, four PEX 8748 switches, GPUs 0-7. Open Circle is CPU-CPU Comm. Green Double Arrow is NVLink. Arrow is 16x PCIe Connection.] Source: Lambda
Real Life Examples
Source: ASUS
Single Root Complex vs Dual Root Complex: Single Root Complex (4029GP-TRT2), Dual Root Complex (4028GR-TRT). Source: Supermicro
1080 Ti GPUDirect Peer-to-Peer Bandwidth Benchmark [Diagram: links labeled 16x.] Source: Lambda
No Peering on the new 2080 Ti. Topology used in this experiment. (For the 1080 Ti, no NVLink.) Source: Lambda
Lambda Stack = GPU-enabled Frameworks For Ubuntu 16.04 or 18.04. One command:
    LAMBDA_REPO=$(mktemp) && \
    wget -O${LAMBDA_REPO} https://lambdalabs.com/static/misc/lambda-stack-repo.deb && \
    sudo dpkg -i ${LAMBDA_REPO} && rm -f ${LAMBDA_REPO} && \
    sudo apt-get update && sudo apt-get install -y lambda-stack-cuda
Also comes as a Docker Container. Source: https://lambdalabs.com/lambda-stack-deep-learning-software
Cost Comparison: On-prem vs. Cloud (p3dn.24xlarge Instance). Lambda Hyperplane: $109,008 once. (Add $15,000 / year if you want to co-locate instead.) AWS: $160,308/year with reserved pricing.
Cost Comparison: On-prem vs. Cloud (p3.16xlarge Instance). Lambda Blade: $28,389 once. (Add $15,000 / year if you want to co-locate instead.) AWS: $139,371/year with reserved pricing.
Cost Comparison: On-prem vs. Cloud (p3.8xlarge Instance). Lambda Quad: $12,472 once. AWS: $69,729/year with reserved pricing.
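The three cost slides can be turned into a rough break-even estimate. The Python sketch below uses only the prices quoted above; it assumes constant cloud pricing and ignores power, staffing, and resale value, so treat the output as a rough guide rather than a full cost model. The function name `break_even_months` is hypothetical.

```python
# Break-even sketch using the prices quoted on the cost slides.
# Assumptions: constant cloud pricing; power, staffing, and resale value ignored.


def break_even_months(hardware_cost: float, cloud_per_year: float,
                      colo_per_year: float = 0.0) -> float:
    """Months until cumulative cloud spend matches the one-time hardware cost."""
    yearly_saving = cloud_per_year - colo_per_year
    if yearly_saving <= 0:
        raise ValueError("on-prem never breaks even with these inputs")
    return 12 * hardware_cost / yearly_saving


# (comparison, one-time hardware cost in USD, AWS reserved cost in USD per year)
SYSTEMS = [
    ("Lambda Hyperplane vs p3dn.24xlarge", 109_008, 160_308),
    ("Lambda Blade vs p3.16xlarge", 28_389, 139_371),
    ("Lambda Quad vs p3.8xlarge", 12_472, 69_729),
]

for name, hardware, cloud in SYSTEMS:
    print(f"{name}: {break_even_months(hardware, cloud):.1f} months")

# Hyperplane again, adding the optional $15,000 / year co-location fee.
print(f"Hyperplane with co-location: "
      f"{break_even_months(109_008, 160_308, colo_per_year=15_000):.1f} months")
```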
Thank You! Tweet @LambdaAPI @stephenbalaban LAMBDALABS.COM/BLOG