This is one of the most exciting posts I’ve written in my 15+ years of existence in the High Performance Computing (HPC) industry.
AMD EPYC Zen 3 “Milan” is integrated, tested and available on the HPCBOX platform on Microsoft Azure, on LAUNCH DAY!
This is something which seldom happens in the HPC world, very few users ever get to start production on day one of the release of a new and very impressive processor generation upgrade.
I will not be doing any technical comparison between Zen 3 and Zen 2 and neither will I present any specific application benchmark numbers in this post. I am sure there’ll be many posts by AMD, Microsoft and other ISVs who’ll be publicly sharing information on the performance boost they see for their codes with EPYC Zen 3 “Milan”. This post will be more about HPCBOX and how we could deliver this upgrade experience for our users, on launch day and without having to set our hands on the physical hardware or physically be present in a datacenter(s) (actually multiple regions)! Awesome work and support from the Azure team!
Drizti was a launch partner for the new HBv3 instance size on Microsoft Azure and these instances are powered by the new EPYC 7xx3/Milan CPUs, HDR InfiniBand and sport very impressive dual NVMe drives which give a big performance boost to applications which use local scratch, specifically when they are striped.
Read more about HBv3 here
Unbelievably quick CPU generation upgrade
It all started with Drizti getting access to 1000+ cores of HBv3 instances for functional and compatibility testing, to make sure we are ready for GA availability of HBv3 on launch day of AMD EPYC ”Milan” CPUs. We went through testing, adding necessary support within the HPCBOX platform to make sure we are able to use the cool new features available on the instances, like automate striping of NVMe, testing out MPI compatibility, testing the workflow component of HPCBOX, auto-scaling/shutdown/start etc. and all other functions which are offered by the HPCBOX platform. In addition to this, we also did some performance tests with different ISV and open-source codes, mainly to make sure the HPCBOX workflow engine can correctly handle the new instances and automatically optimize application pipelines to take advantage of the new hardware.
Some of the applications we tested were ANSYS CFX, ANSYS Fluent, OpenFOAM. At a high level, we can share that we are seeing impressive performance benefits of using EPYC “Milan”, particularly for large runs. Also, for applications which are local scratch bound, we expect users to get a good boost due to the possibility of having striped NVMes in HBv3.
We also did a test to see how easy it would be to upgrade HPC for our users and were really impressed that we could just switch our users from HBv2 to HBv3, meaning from “Rome” to “Milan” and to upgraded local scratch in under 30 minutes! This is mainly because of the design of the HPCBOX platform, it is a self-contained platform, and this gives us the ability to fine tune platform capabilities quickly without depending on external services to get upgraded first.
Agile and Impressive!