This post was originally published by Kyle Wiggers at VentureBeat.
Arm today announced a suite of technologies intended to make it easier for autonomous car developers to bring their designs to market. According to the company, integrating three new processors onto a system-on-chip — the Arm Cortex-A78AE processor, Mali-G78AE graphics processor, and Mali-C71AE image signal processor — provides the power-efficient and safety-enabled processing required to achieve the potential of autonomous decision-making.
While fully autonomous vehicles or driverless cars might be years away from commercial deployment, automation features built into advanced driver assistance systems (ADAS) could help reduce the number of accidents by up to 40%. That’s critical, given that 94% of road traffic accidents occur due to human error, according to the U.S. National Highway Traffic Safety Administration, and it’s perhaps why the global ADAS market is projected to grow from $27 billion in 2020 to $83 billion by 2030. (Arm estimates automation in automotive and industrial sectors will be an $8 billion silicon opportunity in 2030.)
Arm says the Cortex-A78AE, Mali-G78AE, and Mali-C71AE — specialized versions of the existing Cortex-A78, Mali-G78, and Mali-C71 — are engineered to work in combination with supporting software and tools to handle autonomous vehicle workloads. On the software front, Arm offers Arm Fast Models, which can be used to build functionally accurate virtual platforms that enable software development and validation ahead of hardware availability. There’s also Arm Development Studio, which includes the Arm Compiler for Safety qualified by TÜV SÜD, one of the nationally recognized German testing laboratories providing vehicular inspection and product certification services.
The Cortex-A78AE is the successor to the Cortex-A76AE (which was announced a little less than two years ago), and Arm says the microarchitecture has been revamped on a number of fronts. It features additional fetch bandwidth, improved branch detection, and a memory subsystem with 50% higher bandwidth than the previous generation. But the Cortex-A78AE’s standout feature is perhaps the macro-operation cache, a structure designed to hold decoded instructions that decouples the fetch engines and execution to support dynamic code sequence optimizations.
Arm says these innovations together drive a performance improvement of more than 30% on the SPEC2006 synthetic benchmark suite across both integer and floating-point routines. Moreover, they contribute to the Cortex-A78AE’s power efficiency. The Cortex-A78AE achieves its targeted performance at 60% lower power on a 7-nanometer implementation and delivers a 25% performance boost at the same power envelope.
Arm is touting the Cortex-A78AE’s security and privacy features as major platform advances. Pointer Authentication (PAC) is designed to defend against return-oriented programming (statistically, the most common form of software exploit) by performing a cryptographic check of stack addresses before they’re loaded into the program counter. Temporal diversity guards against common-cause failures, while line lockout support avoids hitting bad locations in the cache structures. And a hybrid mode allows shared DSU-AE logic to continue operating in a “lock mode” while the processors remain independent, permitting individual processors to be taken offline for testing while the cluster itself remains available for compute.
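To make the pointer authentication idea concrete, here is a minimal Python sketch of the general sign-then-verify pattern. It is an illustration only, not Arm's hardware implementation: the HMAC-SHA256 tag, the 16-bit tag width, the 48-bit address width, and the function names `sign_pointer` and `authenticate_pointer` are all assumptions made for this example.

```python
import hashlib
import hmac

ADDR_BITS = 48  # assumed virtual address width for this sketch
PAC_BITS = 16   # assumed tag width, stored in the unused upper bits


def _tag(address: int, context: int, key: bytes) -> int:
    """Compute a short keyed tag over the address and a context value."""
    message = address.to_bytes(8, "little") + context.to_bytes(8, "little")
    digest = hmac.new(key, message, hashlib.sha256).digest()
    return int.from_bytes(digest[: PAC_BITS // 8], "little")


def sign_pointer(address: int, context: int, key: bytes) -> int:
    """Pack a tag into the upper bits of a return address."""
    if not 0 <= address < (1 << ADDR_BITS):
        raise ValueError("address does not fit in ADDR_BITS")
    return (_tag(address, context, key) << ADDR_BITS) | address


def authenticate_pointer(signed: int, context: int, key: bytes) -> int:
    """Check the tag and return the bare address, or raise if it was altered."""
    address = signed & ((1 << ADDR_BITS) - 1)
    tag = signed >> ADDR_BITS
    if tag != _tag(address, context, key):
        raise ValueError("pointer authentication failed")
    return address


if __name__ == "__main__":
    key = b"example-key-0001"     # hypothetical per-process key
    stack_pointer = 0x7FFF_0000   # used here as the context value
    signed = sign_pointer(0x4000_1234, stack_pointer, key)
    print(hex(authenticate_pointer(signed, stack_pointer, key)))
```

In this model, a return address is signed when it is saved to the stack and checked before it is used. An attacker who overwrites the saved value without knowing the key is very unlikely to produce a matching tag, so the check fails before control flow is redirected.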
The Cortex-A78AE can be scaled in processor clusters up to a maximum of four cores and in a variety of cache sizes across L1, L2, and L3. Multiple clusters can be grouped together to offer a many-core implementation (including a Cortex-A78AE and Cortex-A65AE), optionally with accelerators over the chip’s Accelerator Coherence Port.
Complementing the Cortex-A78AE is the new Mali-G78AE, a graphics component Arm says addresses the need for heterogeneous compute in autonomous systems. The Mali-G78AE GPU offers a new approach for resource allocation with a feature called flexible partitioning, which enables graphics resources to be dedicated to different workloads while remaining separate from each other. Basically, the Mali-G78AE can be split to look like multiple GPUs within a system, with up to four dedicated partitions for workload separation that can be individually powered up, powered down, and reset with separate memory interfaces for transactions.
The Mali-G78AE scales from one shader core — the fundamental building block of Mali GPUs — to 24 shader cores. With the new architecture, this means scaling from one slice with one shader core up to eight slices, each with three shader cores. Slices come with independent memory interfaces, job control, and L2 cache to ensure separation for safety and security, and the slices can be grouped together in up to four partitions configurable in software. (The Mali-G78AE can be assembled as one large partition with eight slices and 24 shader cores or four smaller partitions sized according to workload needs.)
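The slice and partition arithmetic can be sketched in a few lines of Python. This is a simplified model rather than Arm's driver interface: the function name `plan_partitions` is hypothetical, and the sketch assumes the largest configuration described above, with three shader cores in every slice.

```python
from typing import Dict, List

MAX_SLICES = 8        # from the article: up to eight slices
MAX_PARTITIONS = 4    # from the article: up to four partitions
CORES_PER_SLICE = 3   # assumption: every slice has three shader cores


def plan_partitions(slices_per_partition: List[int]) -> List[Dict]:
    """Assign consecutive slices to partitions and count shader cores."""
    if not 1 <= len(slices_per_partition) <= MAX_PARTITIONS:
        raise ValueError("need between 1 and 4 partitions")
    if any(count < 1 for count in slices_per_partition):
        raise ValueError("every partition needs at least one slice")
    if sum(slices_per_partition) > MAX_SLICES:
        raise ValueError("more slices requested than the GPU has")

    plan = []
    next_slice = 0
    for index, count in enumerate(slices_per_partition):
        plan.append({
            "partition": index,
            "slices": list(range(next_slice, next_slice + count)),
            "shader_cores": count * CORES_PER_SLICE,
        })
        next_slice += count
    return plan


if __name__ == "__main__":
    for entry in plan_partitions([4, 2, 1, 1]):
        print(entry)
```

Under these assumptions, `plan_partitions([8])` models the single large partition with 24 shader cores, while `plan_partitions([4, 2, 1, 1])` models four partitions with 12, 6, 3, and 3 shader cores.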
The Mali-G78AE also includes dedicated hardware virtualization, meaning that the GPU as a whole (i.e., each individual partition) can be virtualized between multiple virtual machines. Beyond this, it comes with safety features, including lock-step, built-in self-testing, interface parity, isolation checks, and read-only memory protection.
The last of the three chips unveiled today — the Mali-C71AE — leverages hardware safety mechanisms and diagnostic software to prevent and detect faults and ensure “every-pixel reliability.” In fact, Arm says the Mali-C71AE is the first product in the Mali camera series of ISPs with built-in features for functional safety applications.
The Mali-C71AE supports up to four real-time camera inputs or 16 camera streams from memory. Camera inputs can be processed in a range of ways, including in as-received order, in a programmed order, or in various other software-defined patterns. Advanced spatial noise reduction, per-exposure noise profiling, and chromatic aberration correction deliver optimized data for computer vision applications. Real-time safety features for ADAS and human-machine interface applications, including more than 400 dedicated fault-detection circuits and built-in self-test, enable system-level functional safety compliance. Moreover, with its 24-bit processing of ultra-wide dynamic range, the Mali-C71AE offers independent dynamic range management, region-of-interest crops, and planar histograms for further analysis.
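The difference between as-received and programmed ordering can be illustrated with a small Python sketch. It is a conceptual model only, not the ISP's actual scheduling interface: the function name `processing_order`, the `(camera_id, arrival_sequence)` frame representation, and the interpretation of "programmed order" as a fixed camera priority list are assumptions made for this example.

```python
from typing import List, Optional, Tuple

MAX_REALTIME_INPUTS = 4  # from the article: up to four real-time camera inputs

Frame = Tuple[int, int]  # (camera_id, arrival_sequence), a hypothetical model


def processing_order(
    frames: List[Frame], programmed: Optional[List[int]] = None
) -> List[Frame]:
    """Order one batch of pending frames for processing.

    With no programmed order, frames are handled as received. Otherwise
    they follow the camera sequence given in `programmed`.
    """
    cameras = {camera for camera, _ in frames}
    if len(cameras) > MAX_REALTIME_INPUTS:
        raise ValueError("too many real-time camera inputs")
    if programmed is None:
        return sorted(frames, key=lambda frame: frame[1])

    rank = {camera: position for position, camera in enumerate(programmed)}
    if cameras - rank.keys():
        raise ValueError("programmed order does not cover every camera")
    return sorted(frames, key=lambda frame: (rank[frame[0]], frame[1]))


if __name__ == "__main__":
    pending = [(0, 2), (1, 0), (2, 1)]
    print(processing_order(pending))             # as-received order
    print(processing_order(pending, [2, 0, 1]))  # programmed order
```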
Arm says all of the new hardware is available to partners as of today.