Is Cloud Too Expensive for HPC?

Is cloud too expensive for HPC?

Enquiring minds want to know, as does the HPC community, whose single-minded obsession with maximum price-performance is notorious and legendary. The Radio Free team looks at actual cloud pricing based on available data and Dan’s research, which fuels a hearty discussion.

They look at configurations, compare prices, talk about the costs that are not included, segment the market, and then segment the applications.

Catch of the Week

Henry:

Henry highlights the importance of having external third-party teams and defined processes (FIPS, Common Criteria, GDPR, etc.) test your equipment. This follows the detection of vulnerabilities in a data-center-class SSD. Nobody can disagree with that, of course.

Shahin:

Reflections on Trusting Trust, Turing Award Lecture by Ken Thompson

To what extent should one trust a statement that a program is free of Trojan horses? Perhaps it is more important to trust the people who wrote the software.

[…]

In college, before video games, we would amuse ourselves by posing programming exercises. One of the favorites was to write the shortest self-reproducing program. Since this is an exercise divorced from reality, the usual vehicle was FORTRAN. Actually, FORTRAN was the language of choice for the same reason that three-legged races are popular.
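
For readers who have not seen one, here is a minimal sketch of a self-reproducing program in Python. It is not Thompson's version (his lecture used FORTRAN in college and C in the paper), and the names `QUINE_TEMPLATE`, `QUINE_SOURCE`, and `run_and_capture` are hypothetical helpers added only to show that the program prints its own text.

```python
import contextlib
import io

# Template of the program: %r is filled with the template's own repr,
# and %% collapses to a single % when the template is formatted.
QUINE_TEMPLATE = 's = %r\nprint(s %% s)'

# The actual two-line program, built by formatting the template with itself.
QUINE_SOURCE = QUINE_TEMPLATE % QUINE_TEMPLATE


def run_and_capture(source):
    """Execute source in a fresh namespace and return everything it prints."""
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        exec(source, {})
    return buffer.getvalue()


if __name__ == "__main__":
    # Show the program text itself.
    print(QUINE_SOURCE)
    # print() appends one trailing newline to the program text it emits.
    print(run_and_capture(QUINE_SOURCE) == QUINE_SOURCE + "\n")
```

Saving the two lines held in `QUINE_SOURCE` to a file and running that file should print the same two lines back.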

 

Listen in to hear the full conversation.

Download the MP3 * Subscribe on iTunes * RSS Feed

Sign up for our insideHPC Newsletter

 

TOP500 Jun2019, Facebook Coin

The new TOP500 list of most powerful supercomputers is out and we do our usual quick analysis. Not much changed in the TOP10 but a lot is changing further down the list. Here is a quick take:

  • There are 65 new entries in 2019.
  • US science is receiving support via DOE sites and academic sites like TACC.
  • 26 countries are represented. China continues to widen its lead, now with 219 entries, followed by the US with 116, Japan with 29, France with 19, the UK with 18, Germany with 14, Ireland and the Netherlands with 13 each, and Singapore with 10.
  • Vendors substantially reflect the country standings. Lenovo has 175 entries, Inspur 71, and Sugon 63, all based in China. Cray has 42 and HPE 40 (the two will combine when their deal closes), followed by Bull at 21, Dell at 17, and IBM at 16.
  • There are a lot of “accidental supercomputers” on the list. These are systems that probably are not doing much science or AI work, but they could; the vendors counted them, and listing them seems to be within the rules. It’s controversial but not a new practice.
  • There are several systems listed as “Internet” companies. Hard to tell what that means but it points to the existence of very large clusters in the cloud for whatever purpose. Last year, there was one system listed as Amazon EC2, which remains on the list. This time, there is also one at Facebook. Usually the big social/cloud players don’t care to participate, though they obviously could summon the resources to run the benchmarks.
  • Just over half of the systems use Ethernet as a fabric. A quarter use InfiniBand, nearly 50 use Intel’s OmniPath, and the rest, 55, use custom interconnects like the ones Cray provides. The team talks about whether Cray+HPE will enter the interconnect business for real; if so, they will be formidable.
  • The majority of entries, 367, do not have any accelerators. 125 use Nvidia GPUs.
  • The overwhelming majority of the systems, 478 of them, are based on Intel CPUs. 13 are IBM, and there is 1 system based on Arm provided by Cavium, now part of Marvell.
  • So when it comes to chips, it’s an Intel game with a respectable showing by Nvidia when GPUs are used. Alternatives are bound to appear as the dozens of AI chips in the works become available and as Arm, AMD, and IBM build on their presence. The recently announced system at Oak Ridge will be all AMD, and that will point to an alternative as well.
  • Notably, Intel is listed as the vendor for 2 entries and Nvidia is listed for 4. While Intel has stayed largely away from looking like a system vendor, Nvidia is going for it with its usual alacrity. That, and the pending acquisition of Mellanox by Nvidia, should serve as a warning to all system vendors who might feel stuck between treating Nvidia as an important supplier and as an up-and-coming competitor.
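
The raw counts in the list above are easier to compare as shares of the 500 entries. The following is a small sketch that uses only the counts quoted in these notes; the dictionary keys and the `share` helper are hypothetical names chosen for illustration.

```python
LIST_SIZE = 500  # the TOP500 list always has 500 entries

# Counts as quoted in the notes above.
counts = {
    "Intel CPUs": 478,
    "IBM CPUs": 13,
    "Arm CPUs": 1,
    "No accelerator": 367,
    "Nvidia GPUs": 125,
    "Entries in China": 219,
    "Entries in the US": 116,
}


def share(count, total=LIST_SIZE):
    """Return count as a percentage of total."""
    return 100.0 * count / total


if __name__ == "__main__":
    for label, count in counts.items():
        print(f"{label:<20} {count:>4}  {share(count):5.1f}%")
```

Seen this way, Intel CPUs power roughly 19 out of every 20 systems, while about one system in four uses Nvidia GPUs.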

CryptoSuper500

Shahin mentions the 2nd edition of the CryptoSuper500 list (really 50 for now), a list developed by his colleague Dr. Stephen Perrenod, which was launched last November and is being released at the same time as the TOP500. The TOP500 has spawned variations that look at different workloads and attributes, for example, the Green500, Graph500, and IO500 lists. CryptoSuper500 was inspired by those lists. The material for the inaugural edition of the CryptoSuper500 list is here.

Cryptocurrency mining operations are often pooled and are very much supercomputing class, typically using accelerator technologies such as custom ASICs, FPGAs, or GPUs. Bitcoin is the most notable of such currencies. Scroll down for the top-10 list and see the slides for the full list and the methodology.

Catch of the Week

Henry:

Henry talks about check-out lanes at Target all being down for unknown reasons, though he hesitates to call that a cybersecurity breach. It turns out he was right: the company blamed an “internal technology issue”.

Target down (then back up) as cash registers fail and leave long lines

Target’s payment systems appeared to be missing the mark the day before Father’s Day, as terminals went AWOL for a couple of hours in a number of the company’s US retail outlets. The outage caused long lines but prompted an encouraging show of sympathy for Target employees from people on Twitter. And there were some jokes too, of course.

Shahin:

Facebook is expected to release a new cryptocurrency that is already impacting the crypto market.

Here’s what we know so far about the secretive Facebook coin

Facebook is likely to release information about its secretive cryptocurrency project, codenamed Libra, as soon as June 18, TechCrunch reports.

As is traditional with new cryptocurrencies, the social networking giant is expected to release a so-called “white paper” outlining how the currency works and the company’s plans for it.

 

Dan:

Dan reminds us all of the inimitable Erich Anton Paul von Däniken and his ancient astronauts hypotheses!

Listen in to hear the full conversation.

Amdahl’s Law and GPUs, Asian Student Cluster Competition

Results of the Asian Student Cluster Competition

In this episode, Dan has just come back from China and reviews the results of the Asian Student Cluster Competition and HPC workshop.
For the first time, a non-mainland-Chinese team wins the top spot. Taiwan takes the gold thanks in part to its stellar performance on the HPCG benchmark, where it achieved 2 TFlops, some 25% better than the second-best team. The system was a 5-node cluster with an InfiniBand FDR interconnect. Other interesting info is shared on various codes and configurations.

GPUs and Amdahl’s Law

Dan also mentions that reports from some of the TOP500 sites suggest that GPUs are doing 93-97% of the computation. This sounds very impressive, but Shahin points out that since GPUs have hundreds of cores, they should be doing much better: at that scale of system and problem size, 93-97% is in fact not as good as it should be. He is still waiting for some actual utilization data on GPUs too.
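
One way to see why 93-97% can be underwhelming is Amdahl’s Law. Assume the reported figure is the fraction p of the work that runs on the GPUs, and that this portion is sped up by a factor s. That is an interpretation for illustration, not something the reported numbers confirm. The overall speedup is then 1 / ((1 - p) + p / s), which can never exceed 1 / (1 - p) no matter how fast the GPUs are. The function names below are hypothetical.

```python
def amdahl_speedup(accelerated_fraction, accelerator_speedup):
    """Overall speedup when a fraction of the work is accelerated by a given factor."""
    if not 0.0 <= accelerated_fraction <= 1.0:
        raise ValueError("accelerated_fraction must be between 0 and 1")
    if accelerator_speedup <= 0:
        raise ValueError("accelerator_speedup must be positive")
    serial_part = 1.0 - accelerated_fraction
    return 1.0 / (serial_part + accelerated_fraction / accelerator_speedup)


def amdahl_limit(accelerated_fraction):
    """Upper bound on speedup as the accelerator becomes infinitely fast."""
    if not 0.0 <= accelerated_fraction <= 1.0:
        raise ValueError("accelerated_fraction must be between 0 and 1")
    if accelerated_fraction == 1.0:
        return float("inf")
    return 1.0 / (1.0 - accelerated_fraction)


if __name__ == "__main__":
    for p in (0.93, 0.97):
        # s = 100 is an arbitrary illustrative accelerator speedup.
        print(f"p={p:.2f}  speedup at s=100: {amdahl_speedup(p, 100):.1f}x  "
              f"limit: {amdahl_limit(p):.1f}x")
```

Under these assumptions, the remaining 3-7% of the work caps the overall speedup at roughly 14x to 33x, which is small next to the raw parallelism of the hardware.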

Catch of the Week

Henry:

Henry points out that many security cameras, offered under several brands but all manufactured by the same vendor in China, have big-time vulnerabilities, so he’s staying away from all of them until further notice. Shahin wonders why they are called “security” cameras!

P2P Weakness Exposes Millions of IoT Devices

A peer-to-peer (P2P) communications technology built into millions of security cameras and other consumer electronics includes several critical security flaws that expose the devices to eavesdropping, credential theft and remote compromise, new research has found.

Shahin:

Shahin talks about Jaguar Land Rover planning to offer a cryptocurrency wallet to reward drivers who participate in providing traffic and other types of data. He likes their catchphrase: zero emissions, zero accidents, zero congestion.

Drivers will be able to earn cryptocurrency and make payments on the move using innovative connected car services being tested by Jaguar Land Rover.

 

Dan:

Dan laments the confiscation of his external camera battery at the airport in China because the spec label was a little worn off and the authorities could not read it to ascertain its safety, despite his willingness to get a note from the airline, etc. It was a nice, expensive battery, but at the size of a medium-sized paperback book, maybe following the rules strictly is not a bad idea.

Listen in to hear the full conversation.

Weather Forecasting Goes Crowdsourcing, Q means Quantum

In this episode of Radio Free HPC, Dan, Henry, and Shahin start with a spirited discussion of IBM’s recent announcement regarding their crowdsourced weather prediction application. Henry was dubious as to whether Big Blue could get access to the data they need in order to truly put out a valuable product. Dan had questions about the value of the crowdsourced data and how it could be scrubbed in order to be useful. Shahin was pretty favorable towards IBM’s plans and believes that they will solve the problems that Henry and Dan raised.

IBM came up again in the show as the boys kicked around IBM’s commercial quantum computing system. Shahin made the point that, for a market with few applications and success stories, quantum computing has attracted nearly every big vendor in the business.

Catch of the Week:

Henry told the guys about a new security flaw as pointed out by Krebs, this one concerning an exploit of credit cards.

Shahin talked about the newly proposed Deep500 benchmark, designed to compare deep learning and inference performance.

Dan discussed a recent interview with a VC who believed that by 2035, more than 40% of jobs worldwide would be taken over by AI. This prompted a discussion of how technology has impacted employment and the economy in the past and how the pace of economic displacement in the era of AI is much quicker than at any other time.

We end the episode by denouncing attorneys.

RFHPC180: How NVIDIA’s new EULA Bans Consumer GPUs in the Datacenter

In this podcast, the Radio Free HPC team looks at NVIDIA’s new GeForce software end-user license agreement (EULA), which prohibits the use of consumer GPUs in the datacenter. We are not looking to beat up on anyone; our focus is on what limitations this might mean for the industry:

No Datacenter Deployment. The SOFTWARE is not licensed for datacenter deployment, except that blockchain processing in a datacenter is permitted.

When we purchase hardware, aren’t we free to use it any way we please? And why do Bitcoin miners get a pass during a worldwide GPU shortage?

Plus, what do we mean by the word “datacenter” anyway? Shahin predicts that the imminent proliferation of 5G networking capabilities will move computing closer to the edge, thereby changing what we mean by the term “datacenter.”

After that, we do our Catch of the Week:

  • Shahin likes the news about the Hasselblad 400 Megapixel camera with sensor-shift technology.
  • Rich points us to a D-Wave Seminar from SC17 that does a great job of explaining quantum computing and what type of applications can be adapted to take advantage of it.
  • Dan is not swayed by the news that the city of Barcelona is switching from Windows to Linux. He reminds us that Munich tried the same thing years ago and ended up switching back.
