In a recent report in Real World Technologies, chip guru David Kanter dissects the new 64-bit ARM design (ARMv8) and what it might mean to the IT landscape. His take on the architecture is almost uniformly positive, noting that not only did the designers manage to develop an elegant instruction set that was backwardly compatible with the existing ISA, but they also took the extra step to jettison a few of the poorly designed features of the 32-bit architecture.
Announced in October 2011, 64-bit ARM is the biggest makeover the processor architecture has received in its 26-year history. The first implementation in 1985, ARM1, was a 32-bit chip developed for Acorn Computers (ARM = Acorn RISC Machine). Although the architecture never caught on in the PC biz, its simple, low-power RISC design made it a natural for embedded/mobile SoC applications and microcontrollers.
While the server and personal computer world moved on to 64 bits, ARM was safely ensconced in the embedded/mobile space where 32 bits of addressing (basically 4 GB) was plenty. But now that devices like tablets and other mobile gadgets are pushing up against this limit, a larger address reach will soon become necessary. Also, the expanded address reach will allow ARM chips for the first time to enter the server market and compete against the x86, the processor architecture that has dominated the datacenter for decades.
In a sense, ARM is trying to duplicate the success of the x86 when it made its own jump from 32 to 64 bits in 2000. In that case, the 64-bit Intel Xeons and AMD Opterons ended up displacing a lot of their high-end RISC-based competition, especially SPARC and Power. If 64-bit ARM ends up cutting into the x86 share of the server market, it would be fitting revenge for the RISC faithful.
As mentioned before, the most critical enabling feature for 64-bit ARM is extending the address space. Although 64 bits could reach 16 exabytes, there’s little application demand to access data at that scale. For the time being, only 48 bits will be used to form an address, which gives software a 256 TB address reach. Presumably, additional address bits can be tacked on in the future as applications scale up.
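The address-reach figures are just powers of two, so they are easy to check. The short Python sketch below is an illustration only: the helper names `format_binary_size` and `address_reach` are made up for this example, and it assumes byte-addressable memory and binary units (1 KB = 1024 bytes).

```python
def format_binary_size(num_bytes):
    """Express a power-of-two byte count in the largest binary unit that fits."""
    units = ["B", "KB", "MB", "GB", "TB", "PB", "EB"]
    value = num_bytes
    for unit in units:
        # Stop at the first unit where the value drops below 1024,
        # or at the largest unit in the table.
        if value < 1024 or unit == units[-1]:
            return f"{value} {unit}"
        value //= 1024


def address_reach(bits):
    """Bytes reachable with an address of the given width (hypothetical helper)."""
    return format_binary_size(2 ** bits)


if __name__ == "__main__":
    for width in (32, 48, 64):
        print(f"{width}-bit address: {address_reach(width)}")
```

Run as a script, this reports 4 GB for 32-bit addressing, 256 TB for the 48 bits used initially, and 16 EB for a full 64-bit address.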
With the ARMv8 design, integer and floating point structures are also being enhanced, with all general purpose registers being extended to 64 bits. The floating point design has been tweaked to support IEEE 754-2008, including additional instructions to make the architecture compliant with the standard.
For vector operations, the changes are more extensive. In the 32-bit spec, the SIMD design (known as NEON) already contained 32 64-bit registers, which could be aliased to 16 128-bit pseudo-registers. For the 64-bit design, that’s been extended to 32 128-bit registers, with the lower half being used if only 64-bit values are needed. Not only does that double the capacity of the vector unit, it makes for a somewhat cleaner arrangement. The SIMD design also adds full IEEE support and double precision floating point operations.
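To make the register arithmetic concrete, here is a rough Python sketch of the two layouts. The function names are hypothetical and exist only for this illustration. The one architectural detail it relies on is that, in 32-bit NEON, each 128-bit register Qn overlays the pair of 64-bit registers D(2n) and D(2n+1).

```python
def register_file_bits(count, width_bits):
    """Total storage, in bits, of a register file (illustrative helper)."""
    return count * width_bits


def armv7_q_to_d(q_index):
    """Return the two 64-bit D registers that 128-bit register Qn overlays
    in 32-bit NEON: D(2n) is the low half, D(2n+1) the high half."""
    if not 0 <= q_index < 16:
        raise ValueError("32-bit NEON has only Q0 through Q15")
    return (2 * q_index, 2 * q_index + 1)


# 32-bit NEON: 32 registers of 64 bits each.
armv7_bits = register_file_bits(32, 64)
# 64-bit design: 32 registers of 128 bits each.
armv8_bits = register_file_bits(32, 128)

if __name__ == "__main__":
    print("32-bit NEON register file:", armv7_bits, "bits")
    print("64-bit SIMD register file:", armv8_bits, "bits")
    print("Q3 overlays D registers:", armv7_q_to_d(3))
```

The totals come to 2,048 bits versus 4,096 bits, which is the doubling in capacity described above. The pairing function also shows why the older aliasing scheme was the less tidy of the two: one 128-bit name maps onto two 64-bit names.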
Curiously missing from ARMv8 is multi-threading support, a feature common to all other major server CPUs: x86, SPARC, Power, and even HPC processors like the Blue Gene/Q ASIC (PowerPC A2). Kanter speculates that the ARM designers decided to forgo multi-threading for now since it is notoriously difficult to validate, and the new design already encapsulated a lot of changes. Although the jury is still out on the aggregate benefit of this feature, for certain classes of software, the lack of multi-threading support could turn out to be a decided disadvantage.
Overall though, Kanter likes what ARM developers have come up with, which he says is “clearly a sound design that was well thought out and should enable reasonable implementations.” As he notes though, there are currently no chip implementations around to judge the architecture’s performance in the field.
But within a couple of years, we should see multiple 64-bit ARM SoCs at various segments of the market, everything from high performance computers to workstations. Applied Micro already has an FPGA implementation of ARMv8, which the company unveiled in October 2011 and subsequently demonstrated running an Apache web server. Samsung, Qualcomm, Calxeda, Microsoft, Marvell and NVIDIA have either stated plans to implement a chip or have already bought licenses. At this point, NVIDIA is the only one that has specifically talked about a 64-bit ARM implementation (Project Denver) aimed at HPC, but Calxeda also has high performance computing on its radar.
Samsung is a particularly interesting entrant to the market. The Korean firm is mostly in the consumer electronics business and its involvement in the server space is currently confined to supplying DRAM and flash components. But Samsung would make a formidable competitor against Intel in the server chip arena if the company funneled its resources there. While Intel has more than twice Samsung’s revenue today, the latter company is growing at a much faster rate.
That led industry analyst firm IC Insights to project that Samsung would eclipse Intel as the world’s largest supplier of semiconductor parts by 2014. Coincidentally, that’s the same year the company plans to roll out its first 64-bit ARM server chips. As Kanter concluded: “Certainly, the next few years should be very interesting.”