On the eve of Computex, Taiwan’s big showpiece event where PC makers roll out the latest and best implementations of Intel CPUs, mobile rival ARM is announcing its own big news with the unveiling of a new generation of ARM CPUs and GPUs. Official today, the ARM Cortex-A75 is the new flagship-tier mobile processor design, with a claimed 22 percent improvement in performance over the incumbent A73. It’s joined by the new Cortex-A55, which has the highest power efficiency of any mid-range CPU ARM’s ever designed, and the Mali-G72 graphics processor, which also comes with a 25 percent improvement in efficiency relative to its predecessor G71.
The efficiency improvements are evolutionary and predictable, but the revolutionary aspects of this new lineup relate to artificial intelligence: this is the first set of processing components designed specifically to tackle the challenges of onboard AI and machine learning. Plus, last year’s updates to improve performance in the power-hungry tasks of augmented and virtual reality are being extended and elaborated.
Before we dive into the detail of this year’s changes, it’s worth recapping what ARM does and why it’s important. This English company, now owned by Japan’s SoftBank, is responsible for designing the processor architecture of practically every mobile device — you’ll have heard of Qualcomm’s Snapdragon, Samsung’s Exynos, and Apple’s A-series of mobile chips, all of which are built using ARM’s instruction sets and based on ARM’s design blueprints. When we talk about the oncoming wave of mobile AI, mobile VR, and smartphones that can perform machine-learning tasks without sending them off to processor farms up in the cloud, developing the capabilities for those tasks starts with ARM.
The new Cortex-A75 and A55 are the first Dynamiq CPUs from ARM. Dynamiq is the branding chosen to describe a much more flexible set of design options for silicon vendors like Qualcomm. Where previously ARM allowed for designs that paired a cluster of so-called big CPUs (from its A7x class) and a matched number of little CPUs (from the A5x series), the new design makes it possible to spec a single, mixed-up cluster composed of both big and little CPUs, to a maximum of eight. Thus, chip makers can now have, for example, seven little A55 cores and just one big A75 core, for a favorable mix of long battery life, cost efficiency, and a high ceiling of single-threaded performance when it’s called for.
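To make the mixed-cluster idea concrete, here is a minimal sketch in Python. The only rule it encodes is the one stated above, that big and little cores can share a single cluster of up to eight. The names `ClusterConfig` and `possible_mixes` are hypothetical, invented for illustration, and real Dynamiq designs may carry further constraints that this sketch does not model.

```python
from dataclasses import dataclass

# The eight-core cap comes from the article; everything else is illustrative.
MAX_CORES_PER_CLUSTER = 8


@dataclass(frozen=True)
class ClusterConfig:
    """Hypothetical description of one mixed big/little cluster."""
    big: int     # number of big cores, e.g. Cortex-A75
    little: int  # number of little cores, e.g. Cortex-A55

    def __post_init__(self):
        # Reject negative counts, empty clusters, and clusters over the cap.
        if self.big < 0 or self.little < 0:
            raise ValueError("core counts cannot be negative")
        if not 1 <= self.big + self.little <= MAX_CORES_PER_CLUSTER:
            raise ValueError(
                f"a cluster needs 1 to {MAX_CORES_PER_CLUSTER} cores in total"
            )

    @property
    def total(self) -> int:
        return self.big + self.little


def possible_mixes(total: int) -> list[ClusterConfig]:
    """Enumerate every big/little split for a cluster of the given size."""
    return [ClusterConfig(big=b, little=total - b) for b in range(total + 1)]


if __name__ == "__main__":
    # The article's example: one big A75 core alongside seven little A55 cores.
    print(ClusterConfig(big=1, little=7))
```

Under the older scheme described above, the big and little counts would have to match; here any split that fits under the cap is accepted.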
ARM marketing chief John Ronco says he anticipates a "50x improvement in AI performance over the next three to five years thanks to better architecture, micro-architecture, and software optimizations." ARM’s Dynamiq changes include a redesigned memory subsystem and tweaks to how CPU caches work — which has led to a doubling of memory streaming performance on the A55 relative to the A53 preceding it. Given that the A53 has shipped on 1.7 billion devices over the past three years, it’s truly the A55 that will make the biggest difference in achieving Ronco’s ambitious forecast. In most applications, the new mid-range core will be 10 to 30 percent better than previously, offering up to 15 percent better power efficiency and 18 percent better single-thread performance. But it’s the fact that the new chip designs will be 10 times more configurable, with up to 3,000 different configurations, that will allow chipmakers far greater flexibility to make the most of them by tailoring them to specific tasks.
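For a sense of scale, Ronco’s forecast can be turned into a yearly rate. The sketch below assumes, purely for illustration, that the gain compounds at a constant pace each year, which ARM did not claim; the function name `annual_factor` is hypothetical.

```python
def annual_factor(total_gain: float, years: float) -> float:
    """Constant yearly multiplier that compounds to total_gain over years."""
    return total_gain ** (1.0 / years)


if __name__ == "__main__":
    # 50x is the figure quoted by ARM; three and five years bound the forecast.
    for years in (3, 5):
        print(f"{years} years: about {annual_factor(50, years):.2f}x per year")
```

On that assumption, a 50x gain works out to roughly 3.7x per year over three years, or roughly 2.2x per year over five.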
Interestingly, ARM won’t just be powering machine learning with its new chips; it’ll benefit from ML too. The new designs include an improved branch predictor that uses neural network algorithms to improve data prefetching and overall performance.
The Cortex-A75 makes double-figure performance improvements across the board, with ARM claiming it’s on average 22 percent better than the A73, with 16 percent higher memory throughput, and a 34 percent improvement in its Geekbench score. Single-threaded performance, according to ARM’s Ronco, is up by 20 percent, purely by improving the instructions-per-clock efficiency. The A75 core is roughly 2.5x the size of the A55, and it’s intended for infrastructure, automotive, and rich mobile applications. Yes, that means VR, AR, and high-fidelity games, the last of which, according to ARM’s research, has been rapidly increasing in popularity.
A major architectural change with the A75 is the opening of a larger power envelope for chips using this core, scaling up to 2W of power consumption, and thus offering up to 30 percent extra performance on larger-screen devices. This is entirely targeted at the upcoming Windows on ARM reboot, expected later this year. It’s worth noting that in ARM’s world a "large" screen basically amounts to a laptop, and the company set up a dedicated Large Screen Compute division a year and a half ago to more aggressively target the clamshell devices where Intel has been dominant.
As to the new Mali GPU, it has up to 32 shader cores, 25 percent higher energy efficiency, and 20 percent better performance density (aka performance per mm² of space). The Mali-G72 is at the heart of ARM’s push toward improving machine learning efficiency, and ARM claims it’s showing itself to be 17 percent better than the G71 in ML benchmarks. The design optimizations from the company are tailored to accelerate inference engines rather than training engines. That’s to say ARM chips will be best at using accumulated ML capabilities rather than developing them, which makes perfect sense for mobile applications. Training AI will be a task better left to Nvidia and AMD graphics cards or Google’s custom Tensor Processing Units (TPUs).
The Cortex-A75 and A55 designs were released to ARM’s partners at the end of 2016, so by this point, they’ve all had a few months to decide what to do with them. ARM says a "realistic time window" for new mobile devices powered by its latest designs would be the first quarter of 2018, though the company is also conscious of a new phenomenon it describes as "China speed," where Chinese phone vendors will put its designs into products almost immediately. The Huawei Mate 9, for example, was released just eight months after ARM distributed the Mali-G71 to partners. This faster Chinese cadence could lead to some A75- and A55-based designs this year, but the bulk of them are likely to arrive with the usual smartphone refresh cycle early next year.