Load Balancing in AI Data Centers

AI/ML workloads in data centers generate a distinctive traffic pattern known as "elephant flows": large volumes of remote direct memory access (RDMA) traffic, typically produced by the graphics processing units (GPUs) in AI servers. It is essential to ensure that fabric bandwidth is utilized efficiently, even with low-entropy workloads where a small number of large flows dominates. Juniper's Arun Gandhi, Mahesh Subramaniam, and Himanshu Tambakuwala discuss efficient load balancing techniques, and their pros and cons, within the AI data center fabric.
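To see why low-entropy workloads are hard to load-balance, consider how static ECMP spreads traffic: each flow is hashed on its 5-tuple to pick an uplink. With thousands of small flows the hash spreads load evenly, but with only a handful of RDMA elephant flows, collisions can pile several large flows onto one link while others sit idle. The sketch below illustrates this (it is not Juniper code; the link count, addresses, and hash choice are hypothetical):

```python
# Illustrative sketch: static ECMP hashing vs. a low-entropy workload.
# With only a few RDMA elephant flows, a 5-tuple hash may map several
# large flows onto the same uplink, leaving fabric bandwidth stranded.
import hashlib
from collections import Counter

NUM_UPLINKS = 8  # hypothetical leaf-to-spine uplink count


def ecmp_link(five_tuple: tuple) -> int:
    """Pick an uplink by hashing the flow's 5-tuple (static ECMP)."""
    digest = hashlib.sha256(repr(five_tuple).encode()).digest()
    return int.from_bytes(digest[:4], "big") % NUM_UPLINKS


# A GPU pair exchanging RDMA traffic over a few queue pairs:
# very few flows, each one an "elephant". 4791 is the RoCEv2 UDP port.
flows = [("10.0.0.1", "10.0.1.1", 50000 + i, 4791, "UDP") for i in range(4)]

usage = Counter(ecmp_link(f) for f in flows)
print(f"uplinks carrying traffic: {len(usage)} of {NUM_UPLINKS}")
print("flows per uplink:", dict(usage))
```

With only four flows, at most four of the eight uplinks carry any traffic at all, and hash collisions can concentrate multiple elephants on a single link. This is the motivation for the flow-aware and dynamic load balancing techniques discussed in the video.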

Managing the Elephant in the Room for AI Data Centers:
https://blogs.juniper.net/en-us/industry-solutions-and-trends/managing-the-elephant-in-the-room-for-ai-data-centers

RDMA Over Converged Ethernet Version 2 for AI Data Centers:
https://www.juniper.net/us/en/the-feed/topics/ai-and-machine-learning/rdma-over-converged-ethernet-version-2-for-ai-data-centers.html

AI Data Center Networking:
https://www.juniper.net/us/en/solutions/data-center/ai-infrastructure.html
Category: Juniper Networks
Tags: AI data center, load balancing, elephant flows