
Changes to NEON in AArch64 (ARMv8)



  • Access to a larger general-purpose register file with 31 unbanked registers (0-30), with each register extended to 64 bits.

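    There are 31 general-purpose registers. Register number 31 is not a general-purpose register: depending on the instruction, it encodes either the zero register or the stack pointer.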

  • Floating point and Advanced SIMD processing share a register file, in a similar manner to AArch32, but extended to thirty-two 128-bit registers. Smaller registers are no longer packed into larger registers, but are mapped one-to-one to the low-order bits of the 128-bit register.

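    There are now 32 NEON v registers, each a full 128 bits wide, double the previous 16. As a result, the old packing (4×32 = 2×64 = 128 bits) no longer applies: each register stands on its own. For example, S1 is no longer the upper half of D0; it is the low 32 bits of V1.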
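The difference between the packed AArch32 layout and the one-to-one AArch64 layout can be sketched as a small model in C. This is only an illustration: the function names (`aarch32_offset`, `aarch64_offset`, `overlaps`) are hypothetical, and the "byte offset inside the register file" is a way of picturing the aliasing, not something software can observe directly.

```c
#include <assert.h>

/* Hypothetical model: byte offset of a scalar register view inside the
 * SIMD/FP register file. width_bytes is 4 for Sn, 8 for Dn, 16 for Qn. */

/* AArch32: narrower views are packed into wider ones,
 * so Sn starts at byte 4*n, Dn at 8*n and Qn at 16*n. */
static unsigned aarch32_offset(unsigned n, unsigned width_bytes) {
    return n * width_bytes;
}

/* AArch64: every view of register n is the low-order part of the
 * 128-bit register Vn, so the offset no longer depends on the width. */
static unsigned aarch64_offset(unsigned n, unsigned width_bytes) {
    assert(n < 32 && width_bytes <= 16);
    (void)width_bytes; /* unused when assertions are disabled */
    return n * 16u;
}

/* True if the byte ranges [a, a+alen) and [b, b+blen) overlap. */
static int overlaps(unsigned a, unsigned alen, unsigned b, unsigned blen) {
    return a < b + blen && b < a + alen;
}
```

In this model, `overlaps(aarch32_offset(1, 4), 4, aarch32_offset(0, 8), 8)` is true (S1 lives inside D0), while the same query with `aarch64_offset` is false, which is why assembly that relies on S/D/Q aliasing has to be rewritten when it is ported.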

  • Unaligned addresses are permitted for most loads and stores, including paired register accesses, floating point and SIMD registers, with the exception of exclusive and ordered accesses.

    Unaligned access is now permitted for most loads and stores, including loads and stores of register pairs.
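From C, a portable way to read from an address with no alignment guarantee is `memcpy` into a local variable. The helper below is a minimal sketch; the name `load_u64_unaligned` is hypothetical. Because AArch64 permits unaligned ordinary loads, an optimizing compiler will usually turn this into a single load instruction, but that is an assumption about code generation and depends on the compiler and its flags.

```c
#include <stdint.h>
#include <string.h>

/* Read a 64-bit value from an address that may not be 8-byte aligned.
 * memcpy avoids the undefined behaviour of dereferencing a misaligned
 * uint64_t pointer in C. */
static uint64_t load_u64_unaligned(const unsigned char *p) {
    uint64_t v;
    memcpy(&v, p, sizeof v);
    return v;
}
```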

  • There are no multiple register LDM, STM, PUSH and POP instructions, but load-store of a non-contiguous pair of registers is available.

  • The A64 instruction set does not include the concept of predicated or conditional execution. Benchmarking shows that modern branch predictors work well enough that predicated execution of instructions does not offer sufficient benefit to justify its significant use of opcode space, and its implementation cost in advanced implementations.

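    Because branch predictors are now good enough, A64 drops predicated (conditional) execution of ordinary instructions. This confused me at first, since conditional branch instructions clearly still exist. The point is that what was removed is the per-instruction condition field of A32, where almost any instruction could be made conditional. Conditional branches remain, along with a small set of conditional instructions such as conditional select.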
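A small C function shows what typically replaces predicated execution. On A32 a compiler might use a conditionally executed `mov`; on A64 the usual result is a compare followed by a conditional select. The assembly in the comment is a typical, not guaranteed, compiler output, and the function name `max_long` is only an illustration.

```c
/* Returns the larger of a and b.
 *
 * Typical A64 code for this function (assumption, depends on the
 * compiler and optimization level):
 *     cmp  x0, x1
 *     csel x0, x0, x1, gt
 *     ret
 * No instruction here is predicated; csel always executes and simply
 * picks one of its two source registers based on the flags. */
static long max_long(long a, long b) {
    return (a > b) ? a : b;
}
```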

  • The first eight registers, r0-r7, are used to pass argument values into a subroutine and to return result values from a function. They may also be used to hold intermediate values within a routine (but, in general, only between subroutine calls).

    The number of general-purpose registers used for passing arguments increases from 4 to 8 (r0-r7).

  • The first eight registers, v0-v7, are used to pass argument values into a subroutine and to return result values from a function. They may also be used to hold intermediate values within a routine (but, in general, only between subroutine calls).

    There are likewise 8 vector registers (v0-v7) for passing arguments.
    Registers v8-v15 must be preserved by a callee across subroutine calls; the remaining registers (v0-v7, v16-v31) do not need to be preserved (or should be preserved by the caller). Additionally, only the bottom 64-bits of each value stored in v8-v15 need to be preserved; it is the responsibility of the caller to preserve larger values.

    v8-v15 must be preserved by the callee across subroutine calls, but only their low 64 bits.
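The argument-passing rules above can be pictured with a few C prototypes. The register assignments in the comments describe what AAPCS64 [1] specifies for simple integer and floating-point arguments; the function names (`sum8`, `sum9`, `scale`) are hypothetical, and on other platforms the code still runs but the register assignments differ.

```c
/* Eight integer arguments: under AAPCS64 these go in x0-x7,
 * and the result is returned in x0. */
static long sum8(long a0, long a1, long a2, long a3,
                 long a4, long a5, long a6, long a7) {
    return a0 + a1 + a2 + a3 + a4 + a5 + a6 + a7;
}

/* A ninth integer argument no longer fits in x0-x7,
 * so a8 is passed on the stack. */
static long sum9(long a0, long a1, long a2, long a3,
                 long a4, long a5, long a6, long a7, long a8) {
    return sum8(a0, a1, a2, a3, a4, a5, a6, a7) + a8;
}

/* Integer and floating-point arguments are counted separately:
 * n goes in x0, while d goes in v0 (viewed as d0).
 * The double result is returned in v0. */
static double scale(long n, double d) {
    return (double)n * d;
}
```

When hand-written assembly uses v8-v15 in a routine like these, it has to save and restore the low 64 bits (d8-d15) itself, whereas v0-v7 and v16-v31 may be clobbered freely.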

  • Floating point support is similar to AArch32 VFP but with some extensions.


Official standard documents

[1]: Procedure Call Standard for the ARM 64-bit Architecture http://infocenter.arm.com/help/topic/com.arm.doc.ihi0055b/IHI0055B_aapcs64.pdf

If you need to port assembly code, this is a useful reference:

[2]: http://www.slideshare.net/linaroorg/lce13-gwggfxonarmv8


