HPC Market Eyes $44B in 5 Years

A new report from Hyperion Research projects the HPC+AI market will grow to $44B, with a B, in 5 years. The industry is hitting on all cylinders, benefiting from:

  • the Exascale race,
  • AI coming to the enterprise only to find that it needs, or really is, HPC, depending on your point of view, and
  • its usual growth, sometimes slow but always steady.

The big news continues to be AI fundamentally bringing HPC closer to the mainstream of enterprise computing whether it is on-prem, in a co-location facility, or in a public cloud.

All of this is driving big changes in the industry. We see them in mergers and acquisitions (which effectively create new companies), new technologies, new architectures, and new business models. An example of the latter is the loosening of chip licensing, with open source models starting to get attention. Unlike open source software, however, silicon needs a fab, and the electronic design automation (EDA) software needed to design it has no equivalent open source alternatives.

Catch of the Week

Henry:

Following a supply chain security breach, Henry predicts that standards bodies like NIST and ISO will become even more active in this area with guidelines for hardware, software, and processes.

Shahin:

Shahin talks about Apple’s design chief, Jony Ive, leaving the company and shares some jokes from social media that fall flat with Dan and Henry, who would probably claim that has nothing to do with their being such PC aficionados.

Jony Ive, Designer Who Made Apple Look Like Apple, Is Leaving to Start a Firm

Jony Ive, Apple’s chief design officer and one of the most influential executives in the history of the Silicon Valley giant, is leaving the company. Mr. Ive will depart this year to start his own design company, Apple said on Thursday. Through his new firm, LoveFrom, Mr. Ive will continue to work on a wide range of Apple products, the company said.

Dan:

Dan concludes the show without a “catch” this week!

Listen in to hear the full conversation.

Download the MP3 * Subscribe on iTunes * RSS Feed

Sign up for our insideHPC Newsletter

Why did HPE buy Cray?

The RFHPC team tackles the HPE-Cray acquisition, reviewing the two companies’ recent moves, their strengths, and market conditions in the context of:

  • the 5-tier data center application architecture: Embedded, Mobile, Desktop, On-premises, Off-premises
  • the emergence of AI as a must-do enterprise app, and
  • increasing commonality between supercomputers and enterprise servers.

Catch of the Week

Henry:

Another week, another breach!

Massive Quest Diagnostics data breach impacts 12 million patients

A massive data breach has struck Quest Diagnostics and the information of up to 11.9 million patients has potentially been compromised. On Monday, the US clinical laboratory said that American Medical Collection Agency (AMCA), a billing collections provider that works with Quest, informed the company that an unauthorized user had managed to obtain access to AMCA systems.

Dan:

Dan points out that the new Apple Mac Pro can be configured to cost tens of thousands of dollars. The counterargument goes that since he and Henry are PC people, the nuances of Apple’s value proposition are obviously lost on them.

Apple’s top spec Mac Pro will likely cost at least $35,000

That’s before you count the GPUs or a Pro Display XDR screen.
Apple announced today that its new Mac Pro starts at an already pricey $6,000, but the company neglected to mention how much the top-of-the-line model will cost. So we shopped around for equivalent parts to the top-end spec that Apple’s promising. As it turns out: $33,720.88 is likely the bare minimum — and that’s before factoring in the four GPUs, which could easily jack that price up to around $45,000.

Listen in to hear the full conversation.

TOP500 June 2019, Facebook Coin

The new TOP500 list of most powerful supercomputers is out and we do our usual quick analysis. Not much changed in the TOP10 but a lot is changing further down the list. Here is a quick take:

  • There are 65 new entries in 2019.
  • US science is receiving support via DOE sites and academic sites like TACC.
  • 26 countries are represented. China continues to widen its lead, now with 219 entries, followed by the US with 116, Japan with 29, France with 19, the UK with 18, Germany with 14, Ireland and the Netherlands with 13 each, and Singapore with 10.
  • Vendors substantially reflect the country standings. Chinese vendors lead: Lenovo has 175 entries, Inspur 71, and Sugon 63. They are followed by Cray with 42 and HPE with 40 (which will combine when their deal closes), then Bull with 21, Dell with 17, and IBM with 16.
  • There are a lot of “accidental supercomputers” on the list. These are systems that are probably not doing much science or AI work but could; the vendors counted them, and listing them seems to be within the rules. It’s controversial but not a new practice.
  • Several systems are listed under “Internet” companies. It is hard to tell what that means, but it points to the existence of very large clusters in the cloud for whatever purpose. Last year, there was one system listed as Amazon EC2, which remains on the list. This time, there is also one at Facebook. The big social media and cloud players usually don’t care to participate, though they could obviously summon the resources to run the benchmarks.
  • Just over half of the systems use Ethernet as their fabric. A quarter use InfiniBand, nearly 50 use Intel’s Omni-Path, and the rest, 55, use custom interconnects like the ones Cray provides. The team discusses the prospect of Cray+HPE entering the interconnect business for real; if they do, they will be formidable.
  • The majority of entries, 367, do not have any accelerators. 125 use Nvidia GPUs.
  • The overwhelming majority of the systems, 478 of them, are based on Intel CPUs. 13 are IBM, and there is 1 system based on Arm provided by Cavium, now part of Marvell.
  • So when it comes to chips, it’s an Intel game, with a respectable showing by Nvidia when GPUs are used. Alternatives are bound to appear as the dozens of AI chips in the works become available and as Arm, AMD, and IBM build on their positions. The recently announced system at Oak Ridge will be all AMD, and that will point to an alternative as well.
  • Notably, Intel is listed as the vendor for 2 entries and Nvidia for 4. While Intel has largely stayed away from looking like a system vendor, Nvidia is going for it with its usual alacrity. That, along with Nvidia’s pending acquisition of Mellanox, should serve as a warning to system vendors who may feel caught between treating Nvidia as an important supplier and as an up-and-coming competitor.
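Here is a quick Python sketch that converts a few of those counts into shares of the 500-entry list, which makes them easier to compare. The counts are the ones quoted above. The `share` helper and the choice of which counts to show are ours, for illustration only.

```python
# Counts quoted in the June 2019 TOP500 bullets above; the list always has 500 entries.
LIST_SIZE = 500

counts = {
    "China (systems)": 219,
    "US (systems)": 116,
    "Intel CPUs": 478,
    "No accelerator": 367,
    "Nvidia GPUs": 125,
    "Custom interconnect": 55,
}


def share(count: int, total: int = LIST_SIZE) -> float:
    """Return count as a percentage of the total number of list entries."""
    return 100.0 * count / total


for label, n in counts.items():
    print(f"{label:<20} {n:>4}  {share(n):5.1f}%")
```

Run as-is, the script shows, for example, that Intel-based systems make up 95.6% of the list and Nvidia-accelerated systems exactly a quarter.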

CryptoSuper500

Shahin mentions the 2nd edition of the CryptoSuper500 list (really a top 50 for now), developed by his colleague Dr. Stephen Perrenod. The list was launched last November and is released at the same time as the TOP500. The TOP500 has spawned variations that look at different workloads and attributes, for example, the Green500, Graph500, and IO500 lists, and CryptoSuper500 was inspired by them. The material for the inaugural edition of the CryptoSuper500 list is available here.

Cryptocurrency mining operations are often pooled and are very much supercomputing class, typically using accelerator technologies such as custom ASICs, FPGAs, or GPUs. Bitcoin is the most notable of these currencies. See the slides for the full list and the methodology.
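Mining counts as a supercomputing workload because proof-of-work is brute-force hashing. Miners try nonce after nonce until a hash falls below a target, and each extra bit of difficulty doubles the expected work. The Python sketch below is a toy version of the idea, assuming a made-up header and a small difficulty. Bitcoin does apply SHA-256 twice, but its real header layout, byte ordering, and target encoding are not modeled here.

```python
import hashlib
from typing import Optional


def double_sha256(data: bytes) -> bytes:
    """SHA-256 applied twice, as Bitcoin does for block headers."""
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()


def mine(header: bytes, difficulty_bits: int, max_nonce: int = 2**32) -> Optional[int]:
    """Return the first nonce whose hash, read as a 256-bit integer, is below the target.

    Expected attempts grow as 2**difficulty_bits, which is why real mining
    operations turn to GPUs, FPGAs, and custom ASICs.
    """
    target = 1 << (256 - difficulty_bits)
    for nonce in range(max_nonce):
        digest = double_sha256(header + nonce.to_bytes(4, "little"))
        if int.from_bytes(digest, "big") < target:
            return nonce
    return None


# Hypothetical header bytes; a 12-bit difficulty needs about 4,096 tries on average.
nonce = mine(b"toy block header", difficulty_bits=12)
print("found nonce:", nonce)
```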

Catch of the Week

Henry:

Henry talks about the checkout lanes at Target all being down for unknown reasons, though he hesitates to call it a cybersecurity breach. It turned out he was right: the company blamed an “internal technology issue.”

Target down (then back up) as cash registers fail and leave long lines

Target’s payment systems appeared to be missing the mark the day before Father’s Day, as terminals went AWOL for a couple of hours in a number of the company’s US retail outlets. The outage caused long lines but prompted an encouraging show of sympathy for Target employees from people on Twitter. And there were some jokes too, of course.

Shahin:

Facebook is expected to release a new cryptocurrency that is already impacting the crypto market.

Here’s what we know so far about the secretive Facebook coin

Facebook is likely to release information about its secretive cryptocurrency project, codenamed Libra, as soon as June 18, TechCrunch reports.

As is traditional with new cryptocurrencies, the social networking giant is expected to release a so-called “white paper” outlining how the currency works and the company’s plans for it.

Dan:

Dan reminds us all of the inimitable Erich Anton Paul von Däniken and his ancient astronaut hypotheses!

Listen in to hear the full conversation.

Forty+ different AI chips

What are we going to do with 40+ different AI chips?

This week, the team looks at AI chips again, this time prompted by an EE Times article about one such chip, from Graphcore, which the article touts as “the most complex processor” ever, at some 20 billion transistors. The VC-backed company, based in Bristol, UK, is valued on paper at $1.7B, earning it the coveted “unicorn” status; it is apparently the “only western semi-conductor unicorn”.

With this being one of 40+ such AI chips (and that count may be conservative), the odds are long and the task formidable. But even if only 2 or 3 of these chips succeed, that alone would be a significant disruption to the market.

The Graphcore chip is built on a 16nm process, runs at 1.6GHz, and comes on a 300W PCIe card. You can fit 8 of these cards in a 4U chassis, which adds up to 2.4 kW for the cards alone.
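The power math is simple, but it sits at the heart of the cooling concern the team raises. Below is a minimal Python sketch using the numbers above. The 10-chassis rack is a hypothetical configuration added for scale. It ignores host CPUs, memory, networking, and power-conversion losses.

```python
CARD_POWER_W = 300       # per PCIe card, as quoted above
CARDS_PER_CHASSIS = 8    # cards in one 4U chassis, as quoted above
CHASSIS_PER_RACK = 10    # hypothetical: ten 4U chassis in one rack


def accelerator_power_kw(cards: int, watts_per_card: float) -> float:
    """Power drawn by the accelerator cards alone, in kilowatts."""
    return cards * watts_per_card / 1000.0


chassis_kw = accelerator_power_kw(CARDS_PER_CHASSIS, CARD_POWER_W)
rack_kw = accelerator_power_kw(CARDS_PER_CHASSIS * CHASSIS_PER_RACK, CARD_POWER_W)
print(f"per chassis: {chassis_kw:.1f} kW, per rack: {rack_kw:.1f} kW (cards only)")
```

That works out to 2.4 kW per chassis and 24 kW per hypothetical rack, before anything else in the system is counted.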

After a mini-rant about respected publications succumbing to clickbait, the team talks about how cooling will be an issue. It also calls again for more clarity in performance metrics: the chip is rated at 125 TFlops, but we don’t know at what precision. Shahin reminds the team of his suggestion to make this explicit by including the precision in the metric: DFlops for double precision, SFlops for single, HFlops for half, and QFlops for quarter precision.
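One way to read Shahin’s suggestion is that a rating should never travel without its precision. The Python sketch below assumes a simple record type and a "tera-HFlops" style label. Both the label format and the `faster` helper are our own illustration, not an established convention. The precision attached to the example value is hypothetical, since the precision behind the 125 TFlops figure is unknown.

```python
from dataclasses import dataclass
from enum import Enum


class Precision(Enum):
    """Prefix letters from Shahin's suggestion."""
    DOUBLE = "D"   # 64-bit
    SINGLE = "S"   # 32-bit
    HALF = "H"     # 16-bit
    QUARTER = "Q"  # 8-bit


@dataclass(frozen=True)
class Rating:
    tflops: float
    precision: Precision

    def __str__(self) -> str:
        # The precision letter travels with the unit, e.g. "125 tera-HFlops".
        return f"{self.tflops:g} tera-{self.precision.value}Flops"


def faster(a: Rating, b: Rating) -> Rating:
    """Return the higher rating, refusing to compare across precisions."""
    if a.precision is not b.precision:
        raise ValueError(f"cannot compare {a} with {b}")
    return a if a.tflops >= b.tflops else b


# Hypothetical: the precision behind the quoted 125 TFlops figure is unknown.
print(Rating(125, Precision.HALF))
```

Because the comparison raises an error instead of silently picking a winner, apples-to-oranges claims become impossible to make by accident.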

Henry talks about how hard it is to build and test the complex software these chips need. Shahin is more optimistic. The modern software stack is tall enough that a chip need only be concerned with a couple of layers; codes are new and open to being recompiled; the stack is increasingly open source; cloud providers and large customers have the wherewithal to do the job; and traditional HPC customers are willing to do the work if the performance gains are there.

No “Catch of the Week” this time since Henry had a hard stop. We’re used to it!

Listen in to hear the full conversation.

Enterprises go HPC, Chips go Open Source, China goes for the top spot

We keep meaning to make these introductions brief, but not this time, apparently! Here’s this week’s synopsis.

Nvidia GTC 2019 announcements

We discuss the recent GTC conference, which Dan has been attending since well before it became the big and important event it is today. We get a quick update on what was covered: the long keynote, automotive and robotics, the Mellanox acquisition, and how a growing fraction of enterprise applications will be AI.

In agreement with the message from GTC, Shahin reiterates his long-held belief that the future of enterprise applications will be HPC and once again asserts that AI as we know it today is a subset of HPC. Not everyone agrees. Henry brings up varying precisions in AI, and a discussion ensues about what counts as HPC. There seems to be agreement that, whatever label you put on it, it is the same (HPC) industry and community driving this new trend. That leads to a discussion of selling into the enterprise and the need for new models, vocabulary, and such.

Speaking of varying precision, there is also Nvidia’s new automatic mixed precision capability for TensorFlow, which gets a bit of discussion.
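Mixed-precision training commonly pairs fast FP16 arithmetic with loss scaling, because small gradient values can underflow to zero in half precision. The standard-library sketch below shows only that loss-scaling step. It is not Nvidia’s or TensorFlow’s API. The scale factor of 1024 is an arbitrary assumption; real implementations often adjust it dynamically.

```python
import struct


def to_half(x: float) -> float:
    """Round x to IEEE 754 binary16 (struct format 'e') and back to a Python float."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


LOSS_SCALE = 1024.0  # hypothetical static scale factor

grad = 1e-8  # a tiny gradient value, below binary16's smallest subnormal (~6e-8)

naive = to_half(grad)                             # stored directly in half precision
scaled = to_half(grad * LOSS_SCALE) / LOSS_SCALE  # scaled up, stored in half, unscaled in full precision

print("FP16 without scaling:", naive)     # 0.0: the gradient underflowed
print("FP16 with loss scaling:", scaled)  # nonzero and close to 1e-8
```

The scaled value survives the trip through half precision with well under 1% error. The unscaled one is lost entirely.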

China plans multibillion dollar investment in supercomputing

On the heels of the Aurora announcement, the South China Morning Post reported that China is investing to win the top spot in supercomputing. No surprise, but interesting to see, and consistent with the general view that supercomputing drives competitive strength.

Catch of the Week

Henry:

Facebook Stored Hundreds of Millions of User Passwords in Plain Text for Years

Hundreds of millions of Facebook users had their account passwords stored in plain text and searchable by thousands of Facebook employees — in some cases going back to 2012, KrebsOnSecurity has learned. Facebook says an ongoing investigation has so far found no indication that employees have abused access to this data.
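The standard remedy is to store only a salted, deliberately slow hash of each password. That way, even insiders with database access never see the passwords themselves. Below is a minimal sketch using Python’s standard-library PBKDF2. It is not Facebook’s system, and the iteration count is an illustrative assumption that should be tuned to current guidance and hardware.

```python
import hashlib
import hmac
import os

ITERATIONS = 200_000  # illustrative work factor; tune to current guidance and hardware


def hash_password(password: str) -> tuple[bytes, bytes]:
    """Return (salt, key) to store in place of the plain-text password."""
    salt = os.urandom(16)  # unique random salt per password
    key = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, key


def verify_password(password: str, salt: bytes, key: bytes) -> bool:
    """Re-derive the key from the login attempt and compare in constant time."""
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return hmac.compare_digest(candidate, key)


salt, key = hash_password("correct horse battery staple")
print(verify_password("correct horse battery staple", salt, key))  # True
print(verify_password("wrong guess", salt, key))                   # False
```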

Shahin:

MIPS R6 Architecture Now Available for Open Use

MIPS 32-bit and 64-bit architecture – the most recent version, release 6 – will become available Thursday (March 28) for anyone to download at MIPS Open web page. Under the MIPS Open program, participants have full access to the MIPS R6 architecture free of charge – with no licensing or royalty fees.

Dan:

Vengeful sacked IT bod destroyed ex-employer’s AWS cloud accounts. Now he’ll spend rest of 2019 in the clink

An irate sacked techie who rampaged through his former employer’s AWS accounts with a purloined login, nuking 23 servers and triggering a wave of redundancies, has been jailed.

Dead LAN’s hand: IT staff ‘locked out’ of data center’s core switch after the only bloke who could log into it dies

‘We can replace it but we have no idea what the config is on the device’

Listen in to hear the full conversation.

Multicore Scaling Slowdown, and Fooling AI

The team has an animated discussion about multicore scaling, how easy it seems to be to mislead AI systems, and some good-sized catches of the week. As is often the case these days, a common thread is data.

Dan makes a couple of important announcements.

First, an idea is brewing to revamp the podcast.

Second, to the dismay of his vast number of supporters, he has decided not to run for the highest office in 2020!

We continue to keep these introductions brief. This time, we include not only the links but also the first paragraph of each linked page as a block quote, so you have a bit more information about what is discussed.

Specialized Chips Won’t Save Us From Impending ‘Accelerator Wall’

As CPU performance improvements have slowed down, we’ve seen the semiconductor industry move towards accelerator cards to provide dramatically better results. Nvidia has been a major beneficiary of this shift, but it’s part of the same trend driving research into neural network accelerators, FPGAs, and products like Google’s TPU. These accelerators have delivered tremendous performance boosts in recent years, raising hopes that they present a path forward, even as Moore’s law scaling runs out. A new paper suggests this may be less true than many would like.

Nice ‘AI solution’ you’ve bought yourself there. Not deploying it direct to users, right? Here’s why maybe you shouldn’t

Top tip: Ask your vendor what it plans to do about adversarial examples.

RSA It’s trivial to trick neural networks into making completely incorrect decisions, just by feeding them dodgy input data, and there are no foolproof ways to avoid this, a Googler warned today.
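The Googler’s point is easy to demonstrate on a toy model. The Python sketch below applies the fast gradient sign method (FGSM), a well-known way to build adversarial examples, to a hypothetical linear classifier. Each of 100 features is nudged by only 0.1, yet the score moves by 0.1 times the sum of the absolute weights, and the decision flips. Real attacks on neural networks get the gradient from the framework’s automatic differentiation. The weights and input here are made up for illustration.

```python
def score(w: list[float], x: list[float], b: float = 0.0) -> float:
    """Linear classifier score; predict class 1 when it is positive."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def sign(v: float) -> int:
    """Return 1, -1, or 0 according to the sign of v."""
    return (v > 0) - (v < 0)


def fgsm_linear(w: list[float], x: list[float], eps: float) -> list[float]:
    """One fast-gradient-sign step that pushes the score down.

    For a linear model the gradient of the score with respect to x is just w,
    so every feature moves by eps against the sign of its weight.
    """
    return [xi - eps * sign(wi) for wi, xi in zip(w, x)]


# Hypothetical 100-feature model and input.
n = 100
w = [0.1 if i % 2 == 0 else -0.1 for i in range(n)]
x = [0.1 if i % 2 == 0 else 0.0 for i in range(n)]
x_adv = fgsm_linear(w, x, eps=0.1)

print("clean score:      ", score(w, x))      # positive: class 1
print("adversarial score:", score(w, x_adv))  # negative: class 0
print("largest change to any feature:", max(abs(a - c) for a, c in zip(x, x_adv)))
```

Many tiny, coordinated changes add up across dimensions, which is one reason there are no foolproof defenses.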

Catch of the Week

MyEquifax.com Bypasses Credit Freeze PIN

Most people who have frozen their credit files with Equifax have been issued a numeric Personal Identification Number (PIN) which is supposed to be required before a freeze can be lifted or thawed. Unfortunately, if you don’t already have an account at the credit bureau’s new myEquifax portal, it may be simple for identity thieves to lift an existing credit freeze at Equifax and bypass the PIN armed with little more than your name, Social Security number and birthday.

Announcing the Open Sourcing of Windows Calculator

Today, we’re excited to announce that we are open sourcing Windows Calculator on GitHub under the MIT License. This includes the source code, build system, unit tests, and product roadmap. Our goal is to build an even better user experience in partnership with the community. We are encouraging your fresh perspectives and increased participation to help define the future of Calculator.

Huawei Sues The US, Prodding It to Prove Suspicions

The world’s largest telecommunications-equipment company, China’s Huawei, is suing the US government. But the suit isn’t just about US law. It’s part of Huawei’s larger campaign to defend its role as a global provider of telecom gear amid fears that its technology is or could be used by the Chinese government for spying. In essence, Huawei is challenging the US government to prove its suspicions.

Listen in to hear the full conversation.

AI: Realness and Bias

Starting with this episode, we’ll get a bit more efficient in describing the episodes. Please let us know if you prefer the long format. If you just subscribe on iTunes and never see these words, well, that tells us something too!

In this episode, the team discusses AI, bias in AI, and just how real the AI out there actually is. Ethics in AI, policy, and legal frameworks are all big threads here. The trigger is the rather funny article “Artificial Intelligence, You Know it isn’t real, yeah?”

Catch of the Week

Shahin applauds NIST’s new Risk Management Framework, and especially the inclusion of supply chain security, something he and Henry keep bringing up.

Henry discusses sensationalism in technical coverage, citing an article that says blockchains can be hacked but lacks depth and so fails to impress. As expected, Shahin comes to the defense of the technology, explaining that security depends on the consensus algorithm, the level of participation, and so on, not on blockchain per se. A discussion ensues about all manner of blockchains and the spectrum forming between permissioned and permissionless chains.
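Shahin’s point that security depends on consensus and participation can be made concrete for proof-of-work chains. The Bitcoin white paper gives the probability that an attacker controlling a fraction q of the hash power ever catches up from z blocks behind: 1 if q ≥ p, otherwise (q/p)^z, where p = 1 − q. Below is a minimal Python sketch of that closed form. The white paper also gives a fuller calculation that accounts for the attacker’s progress while the honest chain advances, which is not modeled here.

```python
def catch_up_probability(q: float, z: int) -> float:
    """Chance an attacker with hash-power share q ever overtakes the honest
    chain from z blocks behind (the gambler's-ruin result in the Bitcoin paper)."""
    if not 0.0 <= q <= 1.0:
        raise ValueError("q must be a fraction between 0 and 1")
    p = 1.0 - q  # honest share of the hash power
    if q >= p:
        return 1.0  # a majority attacker always catches up eventually
    return (q / p) ** z


for q in (0.1, 0.3, 0.45, 0.5):
    print(q, [round(catch_up_probability(q, z), 4) for z in (1, 6)])
```

A chain whose hash power is thin or concentrated is far easier to overtake than one with broad participation. That is the nuance Shahin argues the sensational coverage misses.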

Dan: In a switch from uplifting stories to scary ones, Dan shares that Kalashnikov has rolled out a weaponized suicide drone.

Listen in to hear the full conversation.

Nvidia, Mellanox: Married!

Big news in the industry today: Nvidia is buying Mellanox for $6.9B. This called for an emergency session of our crack panel.

While it will be several months before the full impact of this merger is felt, the RFHPC team believes it will change both the HPC and the data center markets. It also signals Nvidia’s journey toward becoming more of a systems company and gives it a better shot at the enterprise AI market.

This is also good news for all the alternatives in the market, Shahin and Henry believe. There are a large number of AI chips in the works around the globe, and a growing number of interconnect options on the market. They will now have a chance to present themselves as a more neutral option.

Because the combined company will represent a bigger portion of the total bill of materials, it has a stronger hand in the face of growing competition. On the other hand, it also becomes a more visible part of the total system cost, which invites new competition.

Listen in to hear the full conversation.

What’s an AI Supercomputer? What’s up with software SMP?

We start our discussion by contemplating the fact that Shahin doesn’t have a middle name (he says he never needed one) and touching on why Henry has picked up the nickname ‘Gator’ Newman.

What’s an AI supercomputer?

Our first topic is whether there is such a thing as an “AI supercomputer.” It was prompted by France (along with HPE) unveiling a new AI system that will double the capacity of French supercomputing. So what are the differences between a traditional super and an AI super? According to Dan, it mostly comes down to how many GPUs the system is configured with, while Shahin and Henry think it has something to do with the datasets. Send us a note or a tweet if you have an opinion on this.

Software SMP hits 10k

The guys also discuss ScaleMP and its announcement of record results, with close to 10,000 customers as of the end of 2018. This leads to talk about SMP vs. MPP from a performance standpoint. Henry asserts that a clustered approach will always be superior to a big SMP approach, all things being equal. Dan doesn’t agree, and Shahin confesses his love of ‘fat node’ clustering. Dan agrees with Shahin but wonders why no one is doing it.

We also note that Mellanox got a nice design win with the Finns, as they’ll be installing 200 Gb/s HDR InfiniBand interconnect in a new Finnish supercomputer to be deployed in 2019 and 2020. The interconnect will be used in a Dragonfly topology.
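For readers unfamiliar with Dragonfly: routers are organized into groups, every router in a group connects to every other router in that group, and groups are joined by global links. As a result, minimal routes need at most one global hop. The Python sketch below computes the maximum size of such a network under the canonical formulation from the original Dragonfly paper (Kim, Dally, Scott, and Abts). The parameters are hypothetical and are not the configuration of the Finnish system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Dragonfly:
    p: int  # terminals (nodes) per router
    a: int  # routers per group
    h: int  # global links per router

    @property
    def radix(self) -> int:
        # Ports per router: terminals + global links + local links to the other a-1 routers.
        return self.p + self.h + (self.a - 1)

    @property
    def max_groups(self) -> int:
        # Each group has a*h global links, one to every other group.
        return self.a * self.h + 1

    @property
    def max_nodes(self) -> int:
        return self.a * self.p * self.max_groups


cfg = Dragonfly(p=4, a=8, h=4)  # hypothetical "balanced" sizing: a = 2p = 2h
print(cfg.radix, cfg.max_groups, cfg.max_nodes)
```

With these hypothetical parameters (a = 2p = 2h, the balanced ratio the paper recommends), each router needs 15 ports, and the network tops out at 33 groups and 1,056 nodes.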

Catch of the Week

  1. Shahin’s catch of the week is a mathematical puzzle titled “The most unexpected answer to a counting puzzle.” Here’s a link to the video.
  2. Dan likes a good comeback story, so his catch of the week is AMD nabbing a design win at Nikhef.
  3. Henry HAS NO CATCH OF THE WEEK. This makes him the “RF-HPC Villain of the Week” 🙂

RFHPC213: Running Down the TOP500 at SC18

In this podcast, the Radio Free HPC team looks back on the highlights of SC18 and the newest TOP500 list of the world’s fastest supercomputers.

Buddy Bland shows off Summit, the world’s fastest supercomputer at ORNL.

The latest TOP500 list of the world’s fastest supercomputers is out, a remarkable ranking that shows five Department of Energy supercomputers in the top 10, with the first two captured by Summit at Oak Ridge and Sierra at Livermore. With the number one and number two systems on the planet, the “Rebel Alliance” vendors of IBM, Mellanox, and NVIDIA stand far and tall above the others.

Summit widened its lead as the number one system, improving its High Performance Linpack (HPL) performance from 122.3 to 143.5 petaflops since its debut on the previous list in June 2018. Sierra also added to its HPL result from six months ago, going from 71.6 to 94.6 petaflops, enough to bump it from the number three position to number two. Both are IBM-built supercomputers, powered by Power9 CPUs and NVIDIA V100 GPUs.
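Those before-and-after numbers are easier to compare as percentages. Here is a quick Python calculation using only the HPL figures quoted above; the `pct_gain` helper is ours.

```python
def pct_gain(before_pf: float, after_pf: float) -> float:
    """Percentage improvement between two HPL results given in petaflops."""
    return 100.0 * (after_pf - before_pf) / before_pf


# HPL results quoted above: (June 2018, November 2018), in petaflops.
hpl = {"Summit": (122.3, 143.5), "Sierra": (71.6, 94.6)}

for system, (june, november) in hpl.items():
    print(f"{system}: {june} -> {november} PF, +{pct_gain(june, november):.1f}%")
```

Summit improved by about 17.3% and Sierra by about 32.1% over those six months.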

Sierra’s ascendance pushed China’s Sunway TaihuLight supercomputer, installed at the National Supercomputing Center in Wuxi, into third place. Prior to last June, it had held the top position on the TOP500 list for two years with its HPL performance of 93.0 petaflops. TaihuLight was developed by China’s National Research Center of Parallel Computer Engineering & Technology (NRCPC).

In this video from ISC 2018, Yan Fisher from Red Hat and Buddy Bland from ORNL discuss Summit, the world’s fastest supercomputer. Red Hat teamed with IBM, Mellanox, and NVIDIA to provide users with a new level of performance for HPC and AI workloads.

Tianhe-2A (Milky Way-2A), deployed at the National Supercomputer Center in Guangzhou, China, is now in the number four position with a Linpack score of 61.4 petaflops. It was upgraded earlier this year by China’s National University of Defense Technology (NUDT), replacing the older Intel Xeon Phi accelerators with the proprietary Matrix-2000 chips.

“Top-500, Green-500, IO-500, HPCG, and now CryptoSuper-500 all point to growing versatility of supercomputers,” said Shahin Khan from OrionX. “It’s time to more explicitly recognize that. Counting systems which are capable of doing Linpack but in fact are doing something else continues to be an issue. We need additional info about systems so we can tally them correctly and make this less of a game.”

At number five is Piz Daint, a Cray XC50 system installed at the Swiss National Supercomputing Centre (CSCS) in Lugano, Switzerland. At 21.2 petaflops, it maintains its standing as the most powerful system in Europe. It is powered by a combination of Intel Xeon processors and NVIDIA Tesla P100 GPUs.

Trinity, a Cray XC40 system operated by Los Alamos National Laboratory and Sandia National Laboratories improved its performance to 20.2 petaflops, enough to move it up one position to the number six spot. It uses Intel Xeon Phi processors, the only top ten system to do so.

The AI Bridging Cloud Infrastructure (ABCI) installed in Japan at the National Institute of Advanced Industrial Science and Technology (AIST) is listed at number seven with a Linpack mark of 19.9 petaflops. The Fujitsu-built system is powered by Intel Xeon Gold processors, along with NVIDIA Tesla V100 GPUs.

Germany provided a new top ten entry with SuperMUC-NG, a Lenovo-built supercomputer installed at the Leibniz Supercomputing Centre (Leibniz-Rechenzentrum) in Garching, near Munich. With 311,040 Intel Xeon cores and an HPL performance of 19.5 petaflops, it captured the number eight position.

Titan, a Cray XK7 installed at the DOE’s Oak Ridge National Laboratory, and previously the most powerful supercomputer in the US, is now the number nine system. It achieved 17.6 petaflops using NVIDIA K20x GPU accelerators.

Sequoia, an IBM BlueGene/Q supercomputer installed at DOE’s Lawrence Livermore National Laboratory, is the 10th-ranked TOP500 system. It was first delivered in 2011, achieving 17.2 petaflops on HPL.

See our complete coverage of SC18