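Contents
1 History of ARM
2 ARM Versions
3 ARM Series Overview
3.1 ARM7 Series
3.2 ARM9 Series
3.3 ARM11 Series
3.4 Cortex-R Series
3.5 Cortex-M Series
3.6 Cortex-A Series
4 ARM Core Timeline
5 Third-Party ARM Design Companies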
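1 History of ARM
ARM is short for Advanced RISC Machine; it was originally called Acorn RISC Machine. It is a reduced instruction set computer (RISC) processor architecture, originally 32-bit (ARMv8 later added 64-bit support). There are also derivative designs based on ARM, most notably Marvell's XScale architecture (originally developed by Intel) and Texas Instruments' OMAP series. The ARM family accounts for about 75% of all 32-bit embedded processors, and because of its low power consumption it is widely used in mobile communications, portable devices and similar fields.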
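In 1983, Acorn Computers Ltd began developing a processor of its own. A team led by Roger Wilson and Steve Furber set out to design a new architecture, conceived as something like a more advanced MOS Technology 6502, the processor on which many of Acorn's existing computers were built. The team produced ARM1 samples in 1985 and put the ARM2 into volume production the following year. The ARM2 has a 32-bit data bus and a 26-bit address space, giving a 64 MB address range, along with sixteen 32-bit registers.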
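The 64 MB figure follows directly from the 26-bit address width: n address bits reach 2^n bytes, so 2^26 bytes is 64 MiB, and the later move to full 32-bit addressing (ARMv3, per the detailed table in section 2) raises the range to 4 GiB. A minimal Python sketch of that arithmetic; `address_space_bytes` is an illustrative name, not an existing API:

```python
def address_space_bytes(address_bits: int) -> int:
    """Number of distinct byte addresses reachable with the given address width."""
    return 1 << address_bits


MIB = 1024 * 1024
GIB = 1024 * MIB

# ARM2 (ARMv2): 26-bit addressing
arm2_range = address_space_bytes(26)
# ARMv3 and later: 32-bit addressing
armv3_range = address_space_bytes(32)

print(f"26-bit: {arm2_range // MIB} MiB")   # 26-bit: 64 MiB
print(f"32-bit: {armv3_range // GIB} GiB")  # 32-bit: 4 GiB
```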
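In the late 1980s, Apple Computer began working with Acorn on a new version of the ARM core. In 1990 the design team was spun off into a new company, Advanced RISC Machines Ltd. The ARM6 first sampled in 1991, and Apple then used the ARM6-based ARM610 as the basis of its Apple Newton PDA. In 1994, Acorn used the ARM610 as the CPU of its Risc PC computers.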
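ARM, the company, is a well-known firm in the microprocessor industry. It has designed a large number of high-performance, low-cost, low-power RISC processors, but it only designs chips and does not manufacture them. Its business model is to sell intellectual property cores (IP cores): it licenses its technology to many of the world's leading semiconductor, software and OEM companies and provides technical services.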
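2 ARM Versions
ARM versions fall into two categories: architecture (core) versions and processor versions. Architecture versions are the ARM instruction set architectures, such as ARMv1, ARMv2, ARMv3, ARMv4, ARMv5, ARMv6, ARMv7 and ARMv8. Processor versions are the actual processor designs, such as ARM1, ARM9, ARM11, ARM Cortex-A (A7, A9, A15), ARM Cortex-M (M1, M3, M4) and ARM Cortex-R; this is what people usually mean by an "ARM version".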
A simplified mapping of ARM versions is shown in the table below.
Architecture (core) version | Processor version |
---|---|
ARMv1 | ARM1 |
ARMv2 | ARM2, ARM3 |
ARMv3 | ARM6, ARM7 |
ARMv4 | StrongARM, ARM7TDMI, ARM9TDMI |
ARMv5 | ARM7EJ, ARM9E, ARM10E, XScale |
ARMv6 | ARM11, ARM Cortex-M (M0, M0+, M1) |
ARMv7 | ARM Cortex-A, ARM Cortex-M (M3, M4, M7), ARM Cortex-R |
ARMv8 | ARM Cortex-A30 series (A32, A35), ARM Cortex-A50 series (A53, A57), ARM Cortex-A70 series (A72, A73) |
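The distinction between architecture and processor versions becomes clearer when the simplified table is used as a lookup in both directions: one architecture covers several processors, and one processor family can span several architectures. The sketch below transcribes the table using family names only; `ARCH_TO_PROCESSORS` and `architectures_for` are hypothetical names chosen for illustration.

```python
from typing import Dict, List

# Simplified architecture -> processor mapping, transcribed from the table above
# (family names only; individual cores are omitted).
ARCH_TO_PROCESSORS: Dict[str, List[str]] = {
    "ARMv1": ["ARM1"],
    "ARMv2": ["ARM2", "ARM3"],
    "ARMv3": ["ARM6", "ARM7"],
    "ARMv4": ["StrongARM", "ARM7TDMI", "ARM9TDMI"],
    "ARMv5": ["ARM7EJ", "ARM9E", "ARM10E", "XScale"],
    "ARMv6": ["ARM11", "Cortex-M"],
    "ARMv7": ["Cortex-A", "Cortex-M", "Cortex-R"],
    "ARMv8": ["Cortex-A30 series", "Cortex-A50 series", "Cortex-A70 series"],
}


def architectures_for(processor: str) -> List[str]:
    """Reverse lookup: every architecture version the table lists for a processor."""
    return [arch for arch, procs in ARCH_TO_PROCESSORS.items() if processor in procs]


print(architectures_for("Cortex-M"))  # ['ARMv6', 'ARMv7']: one family, two architectures
print(ARCH_TO_PROCESSORS["ARMv4"])    # ['StrongARM', 'ARM7TDMI', 'ARM9TDMI']: one architecture, several processors
```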
A more detailed ARM version table follows (source: https://en.wikipedia.org/wiki/List_of_ARM_microarchitectures).
ARM family | ARM architecture | ARM core | Feature | Cache (I / D), MMU | Typical MIPS @ MHz | Reference |
---|---|---|---|---|---|---|
ARM1 | ARMv1 | ARM1 | First implementation | None | ||
ARM2 | ARMv2 | ARM2 | ARMv2 added the MUL (multiply) instruction | None | 4 MIPS @ 8 MHz, 0.33 DMIPS/MHz | |
ARMv2a | ARM250 | Integrated MEMC (MMU), graphics and I/O processor. ARMv2a added the SWP and SWPB (swap) instructions | None, MEMC1a | 7 MIPS @ 12 MHz | ||
ARM3 | ARMv2a | ARM3 | First integrated memory cache | 4 KB unified | 12 MIPS @ 25 MHz, 0.50 DMIPS/MHz | |
ARM6 | ARMv3 | ARM60 | ARMv3 first to support 32-bit memory address space (previously 26-bit). ARMv3M first added long multiply instructions (32x32=64). | None | 10 MIPS @ 12 MHz | |
ARM600 | As ARM60, cache and coprocessor bus (for FPA10 floating-point unit) | 4 KB unified | 28 MIPS @ 33 MHz | |||
ARM610 | As ARM60, cache, no coprocessor bus | 4 KB unified | 17 MIPS @ 20 MHz, 0.65 DMIPS/MHz | [4] |
ARM7 | ARMv3 | ARM700 | 8 KB unified | 40 MHz | ||
ARM710 | As ARM700, no coprocessor bus | 8 KB unified | 40 MHz | [5] | ||
ARM710a | As ARM710 | 8 KB unified | 40 MHz, 0.68 DMIPS/MHz | |
ARM7T | ARMv4T | ARM7TDMI(-S) | 3-stage pipeline, Thumb, ARMv4 first to drop legacy ARM 26-bit addressing | None | 15 MIPS @ 16.8 MHz, 63 DMIPS @ 70 MHz | |
ARM710T | As ARM7TDMI, cache | 8 KB unified, MMU | 36 MIPS @ 40 MHz | |||
ARM720T | As ARM7TDMI, cache | 8 KB unified, MMU with FCSE (Fast Context Switch Extension) | 60 MIPS @ 59.8 MHz | |||
ARM740T | As ARM7TDMI, cache | MPU | ||||
ARM7EJ | ARMv5TEJ | ARM7EJ-S | 5-stage pipeline, Thumb, Jazelle DBX, enhanced DSP instructions | None | ||
ARM8 | ARMv4 | ARM810 | 5-stage pipeline, static branch prediction, double-bandwidth memory | 8 KB unified, MMU | 84 MIPS @ 72 MHz, 1.16 DMIPS/MHz | [6][7] |
ARM9T | ARMv4T | ARM9TDMI | 5-stage pipeline, Thumb | None | ||
ARM920T | As ARM9TDMI, cache | 16 KB / 16 KB, MMU with FCSE (Fast Context Switch Extension) | 200 MIPS @ 180 MHz | [8] | ||
ARM922T | As ARM9TDMI, caches | 8 KB / 8 KB, MMU | ||||
ARM940T | As ARM9TDMI, caches | 4 KB / 4 KB, MPU | ||||
ARM9E | ARMv5TE | ARM946E-S | Thumb, enhanced DSP instructions, caches | Variable, tightly coupled memories, MPU | ||
ARM966E-S | Thumb, enhanced DSP instructions | No cache, TCMs | ||||
ARM968E-S | As ARM966E-S | No cache, TCMs | ||||
ARMv5TEJ | ARM926EJ-S | Thumb, Jazelle DBX, enhanced DSP instructions | Variable, TCMs, MMU | 220 MIPS @ 200 MHz | ||
ARMv5TE | ARM996HS | Clockless processor, as ARM966E-S | No caches, TCMs, MPU | |||
ARM10E | ARMv5TE | ARM1020E | 6-stage pipeline, Thumb, enhanced DSP instructions, (VFP) | 32 KB / 32 KB, MMU | ||
ARM1022E | As ARM1020E | 16 KB / 16 KB, MMU | ||||
ARMv5TEJ | ARM1026EJ-S | Thumb, Jazelle DBX, enhanced DSP instructions, (VFP) | Variable, MMU or MPU | |||
ARM11 | ARMv6 | ARM1136J(F)-S | 8-stage pipeline, SIMD, Thumb, Jazelle DBX, (VFP), enhanced DSP instructions, unaligned memory access | Variable, MMU | 740 @ 532–665 MHz (i.MX31 SoC), 400–528 MHz | [9] |
ARMv6T2 | ARM1156T2(F)-S | 9-stage pipeline, SIMD, Thumb-2, (VFP), enhanced DSP instructions | Variable, MPU | [10] | ||
ARMv6Z | ARM1176JZ(F)-S | As ARM1136EJ(F)-S | Variable, MMU + TrustZone | 965 DMIPS @ 772 MHz, up to 2,600 DMIPS with four processors | [11] | |
ARMv6K | ARM11MPCore | As ARM1136EJ(F)-S, 1–4 core SMP | Variable, MMU | |||
SecurCore | ARMv6-M | SC000 | 0.9 DMIPS/MHz | |||
ARMv4T | SC100 | |||||
ARMv7-M | SC300 | 1.25 DMIPS/MHz | ||||
Cortex-M | ARMv6-M | Cortex-M0[12] | Microcontroller profile, most Thumb + some Thumb-2,[13] hardware multiply instruction (optional small), optional system timer, optional bit-banding memory | Optional cache, no TCM, no MPU | 0.84 DMIPS/MHz | |
Cortex-M0+[14] | Microcontroller profile, most Thumb + some Thumb-2,[13] hardware multiply instruction (optional small), optional system timer, optional bit-banding memory | Optional cache, no TCM, optional MPU with 8 regions | 0.93 DMIPS/MHz | |||
Cortex-M1[15] | Microcontroller profile, most Thumb + some Thumb-2,[13] hardware multiply instruction (optional small), OS option adds SVC / banked stack pointer, optional system timer, no bit-banding memory | Optional cache, 0–1024 KB I-TCM, 0–1024 KB D-TCM, no MPU | 136 DMIPS @ 170 MHz,[16] (0.8 DMIPS/MHz FPGA-dependent)[17] | |||
ARMv7-M | Cortex-M3[18] | Microcontroller profile, Thumb / Thumb-2, hardware multiply and divide instructions, optional bit-banding memory | Optional cache, no TCM, optional MPU with 8 regions | 1.25 DMIPS/MHz | ||
ARMv7E-M | Cortex-M4[19] | Microcontroller profile, Thumb / Thumb-2 / DSP / optional VFPv4-SP single-precision FPU, hardware multiply and divide instructions, optional bit-banding memory | Optional cache, no TCM, optional MPU with 8 regions | 1.25 DMIPS/MHz (1.27 w/FPU) | ||
Cortex-M7[20] | Microcontroller profile, Thumb / Thumb-2 / DSP / optional VFPv5 single and double precision FPU, hardware multiply and divide instructions | 0−64 KB I-cache, 0−64 KB D-cache, 0–16 MB I-TCM, 0–16 MB D-TCM (all these w/optional ECC), optional MPU with 8 or 16 regions | 2.14 DMIPS/MHz | |||
ARMv8-M | Cortex-M23[21] | Microcontroller profile, Thumb-1 (most), Thumb-2 (some), Divide, TrustZone | Optional cache, no TCM, optional MPU with 16 regions | 0.99 DMIPS/MHz | ||
Cortex-M33[22] | Microcontroller profile, Thumb-1, Thumb-2, Saturated, DSP, Divide, FPU (SP), TrustZone, Co-processor | Optional cache, no TCM, optional MPU with 16 regions | 1.50 DMIPS/MHz | |||
Cortex-M35P[23] | Microcontroller profile, Thumb-1, Thumb-2, Saturated, DSP, Divide, FPU (SP), TrustZone, Co-processor | Built-in cache (with option 2–16 KB), I-cache, no TCM, optional MPU with 16 regions | 1.50 DMIPS/MHz | |||
Cortex-R | ARMv7-R | Cortex-R4[24] | Real-time profile, Thumb / Thumb-2 / DSP / optional VFPv3 FPU, hardware multiply and optional divide instructions, optional parity & ECC for internal buses / cache / TCM, 8-stage pipeline dual-core running lockstep with fault logic | 0–64 KB / 0–64 KB, 0–2 of 0–8 MB TCM, opt. MPU with 8/12 regions | 1.67 DMIPS/MHz[25] | |
Cortex-R5[26] | Real-time profile, Thumb / Thumb-2 / DSP / optional VFPv3 FPU and precision, hardware multiply and optional divide instructions, optional parity & ECC for internal buses / cache / TCM, 8-stage pipeline dual-core running lock-step with fault logic / optional as 2 independent cores, low-latency peripheral port (LLPP), accelerator coherency port (ACP)[27] | 0–64 KB / 0–64 KB, 0–2 of 0–8 MB TCM, opt. MPU with 12/16 regions | 1.67 DMIPS/MHz[25] | |||
Cortex-R7[28] | Real-time profile, Thumb / Thumb-2 / DSP / optional VFPv3 FPU and precision, hardware multiply and optional divide instructions, optional parity & ECC for internal buses / cache / TCM, 11-stage pipeline dual-core running lock-step with fault logic / out-of-order execution / dynamic register renaming / optional as 2 independent cores, low-latency peripheral port (LLPP), ACP[27] | 0–64 KB / 0–64 KB, ? of 0–128 KB TCM, opt. MPU with 16 regions | 2.50 DMIPS/MHz[25] | |||
Cortex-R8[29] | TBD | TBD | 2.50 DMIPS/MHz[25] | |||
ARMv8-R | Cortex-R52[30] | TBD | TBD | 2.16 DMIPS/MHz[31] | ||
Cortex-A (32-bit) | ARMv7-A | Cortex-A5[32] | Application profile, ARM / Thumb / Thumb-2 / DSP / SIMD / Optional VFPv4-D16 FPU / Optional NEON / Jazelle RCT and DBX, 1–4 cores / optional MPCore, snoop control unit (SCU), generic interrupt controller (GIC), accelerator coherence port (ACP) | 4−64 KB / 4−64 KB L1, MMU + TrustZone | 1.57 DMIPS/MHz per core | |
Cortex-A7[33] | Application profile, ARM / Thumb / Thumb-2 / DSP / VFPv4 FPU / NEON / Jazelle RCT and DBX / Hardware virtualization, in-order execution, superscalar, 1–4 SMP cores, MPCore, Large Physical Address Extensions (LPAE), snoop control unit (SCU), generic interrupt controller (GIC), architecture and feature set are identical to A15, 8–10 stage pipeline, low-power design[34] | 8−64 KB / 8−64 KB L1, 0–1 MB L2, MMU + TrustZone | 1.9 DMIPS/MHz per core | |||
Cortex-A8[35] | Application profile, ARM / Thumb / Thumb-2 / VFPv3 FPU / NEON / Jazelle RCT and DAC, 13-stage superscalar pipeline | 16–32 KB / 16–32 KB L1, 0–1 MB L2 opt. ECC, MMU + TrustZone | Up to 2000 (2.0 DMIPS/MHz in speed from 600 MHz to greater than 1 GHz) | |||
Cortex-A9[36] | Application profile, ARM / Thumb / Thumb-2 / DSP / Optional VFPv3 FPU / Optional NEON / Jazelle RCT and DBX, out-of-order speculative issue superscalar, 1–4 SMP cores, MPCore, snoop control unit (SCU), generic interrupt controller (GIC), accelerator coherence port (ACP) | 16–64 KB / 16–64 KB L1, 0–8 MB L2 opt. parity, MMU + TrustZone | 2.5 DMIPS/MHz per core, 10,000 DMIPS @ 2 GHz on Performance Optimized TSMC 40G (dual-core) | |||
Cortex-A12[37] | Application profile, ARM / Thumb-2 / DSP / VFPv4 FPU / NEON / Hardware virtualization, out-of-order speculative issue superscalar, 1–4 SMP cores, Large Physical Address Extensions (LPAE), snoop control unit (SCU), generic interrupt controller (GIC), accelerator coherence port (ACP) | 32−64 KB | 3.0 DMIPS/MHz per core | |||
Cortex-A15[38] | Application profile, ARM / Thumb / Thumb-2 / DSP / VFPv4 FPU / NEON / integer divide / fused MAC / Jazelle RCT / hardware virtualization, out-of-order speculative issue superscalar, 1–4 SMP cores, MPCore, Large Physical Address Extensions (LPAE), snoop control unit (SCU), generic interrupt controller (GIC), ACP, 15-24 stage pipeline[34] | 32 KB w/parity / 32 KB w/ECC L1, 0–4 MB L2, L2 has ECC, MMU + TrustZone | At least 3.5 DMIPS/MHz per core (up to 4.01 DMIPS/MHz depending on implementation)[39] | |||
Cortex-A17[40] | Application profile, ARM / Thumb / Thumb-2 / DSP / VFPv4 FPU / NEON / integer divide / fused MAC / Jazelle RCT / hardware virtualization, out-of-order speculative issue superscalar, 1–4 SMP cores, MPCore, Large Physical Address Extensions (LPAE), snoop control unit (SCU), generic interrupt controller (GIC), ACP | 32 KB L1, 256 KB–8 MB L2 w/optional ECC | 2.8 DMIPS/MHz | |||
ARMv8-A | Cortex-A32[41] | Application profile, AArch32, 1–4 SMP cores, TrustZone, NEON advanced SIMD, VFPv4, hardware virtualization, dual issue, in-order pipeline | 8–64 KB w/optional parity / 8−64 KB w/optional ECC L1 per core, 128 KB–1 MB L2 w/optional ECC shared | |||
Cortex-A (64-bit) | ARMv8-A | ARM Cortex-A34[42] | Application profile, AArch64, 1–4 SMP cores, TrustZone, NEON advanced SIMD, VFPv4, hardware virtualization, 2-width decode, in-order pipeline | 8−64 KB w/parity / 8−64 KB w/ECC L1 per core, 128 KB–1 MB L2 shared, 40-bit physical addresses | ||
Cortex-A35[43] | Application profile, AArch32 and AArch64, 1–4 SMP cores, TrustZone, NEON advanced SIMD, VFPv4, hardware virtualization, 2-width decode, in-order pipeline | 8−64 KB w/parity / 8−64 KB w/ECC L1 per core, 128 KB–1 MB L2 shared, 40-bit physical addresses | 1.78 DMIPS/MHz | |||
Cortex-A53[44] | Application profile, AArch32 and AArch64, 1–4 SMP cores, TrustZone, NEON advanced SIMD, VFPv4, hardware virtualization, 2-width decode, in-order pipeline | 8−64 KB w/parity / 8−64 KB w/ECC L1 per core, 128 KB–2 MB L2 shared, 40-bit physical addresses | 2.3 DMIPS/MHz | |||
Cortex-A57[45] | Application profile, AArch32 and AArch64, 1–4 SMP cores, TrustZone, NEON advanced SIMD, VFPv4, hardware virtualization, 3-width decode superscalar, deeply out-of-order pipeline | 48 KB w/DED parity / 32 KB w/ECC L1 per core; 512 KB–2 MB L2 shared w/ECC; 44-bit physical addresses | 4.1–4.5 DMIPS/MHz[46][47] | |||
Cortex-A72[48] | Application profile, AArch32 and AArch64, 1–4 SMP cores, TrustZone, NEON advanced SIMD, VFPv4, hardware virtualization, 3-width superscalar, deeply out-of-order pipeline | 48 KB w/DED parity / 32 KB w/ECC L1 per core; 512 KB–2 MB L2 shared w/ECC; 44-bit physical addresses | 4.7 DMIPS/MHz | |||
Cortex-A73[49] | Application profile, AArch32 and AArch64, 1–4 SMP cores, TrustZone, NEON advanced SIMD, VFPv4, hardware virtualization, 2-width superscalar, deeply out-of-order pipeline | 64 KB / 32−64 KB L1 per core, 256 KB–8 MB L2 shared w/ optional ECC, 44-bit physical addresses | 4.8 DMIPS/MHz[50] | |||
ARMv8.2-A | Cortex-A55[51] | Application profile, AArch32 and AArch64, 1–8 SMP cores, TrustZone, NEON advanced SIMD, VFPv4, hardware virtualization, 2-width decode, in-order pipeline[52] | 16−64 KB / 16−64 KB L1, 256 KB L2 per core, 4 MB L3 shared | |||
Arm Cortex-A65AE[53] | Application profile, AArch64, 1–8 SMP cores, TrustZone, NEON advanced SIMD, VFPv4, hardware virtualization, 2-wide decode superscalar, 3-width issue, out-of-order pipeline, SMT | 64 / 64 KB L1, 256 KB L2 per core, 4 MB L3 shared | ||||
Cortex-A75[54] | Application profile, AArch32 and AArch64, 1–8 SMP cores, TrustZone, NEON advanced SIMD, VFPv4, hardware virtualization, 3-width decode superscalar, deeply out-of-order pipeline[55] | 64 / 64 KB L1, 512 KB L2 per core, 4 MB L3 shared | ||||
Cortex-A76[56] | Application profile, AArch32 (non-privileged level or EL0 only) and AArch64, 1–4 SMP cores, TrustZone, NEON advanced SIMD, VFPv4, hardware virtualization, 4-width decode superscalar, 8-way issue, 13 stage pipeline, deeply out-of-order pipeline[57] | 64 / 64 KB L1, 256−512 KB L2 per core, 512 KB−4 MB L3 shared | ||||
Cortex-A77[58] | Application profile, AArch32 (non-privileged level or EL0 only) and AArch64, 1–4 SMP cores, TrustZone, NEON advanced SIMD, VFPv4, hardware virtualization, 4-width decode superscalar, 6-width instruction fetch, 12-way issue, 13 stage pipeline, deeply out-of-order pipeline[57] | 1.5K L0 MOPs cache, 64 / 64 KB L1, 256−512 KB L2 per core, 512 KB−4 MB L3 shared | ||||
Neoverse | Neoverse N1[59] | Application profile, AArch32 (non-privileged level or EL0 only) and AArch64, 1–4 SMP cores, TrustZone, NEON advanced SIMD, VFPv4, hardware virtualization, 4-width decode superscalar, 8-way dispatch/issue, 13 stage pipeline, deeply out-of-order pipeline[57] | 64 / 64 KB L1, 512−1024 KB L2 per core, 2−128 MB L3 shared, 128 MB system level cache | |||
Neoverse E1 | Application profile, AArch64, 1–8 SMP cores, TrustZone, NEON advanced SIMD, VFPv4, hardware virtualization, 2-wide decode superscalar, 3-width issue, 10 stage pipeline, out-of-order pipeline, SMT | 32−64 KB / 32−64 KB L1, 256 KB L2 per core, 4 MB L3 shared | ||||
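The performance column mixes two forms of the same number: an absolute Dhrystone figure at a given clock (e.g. "965 DMIPS @ 772 MHz") and a per-clock rating (e.g. "1.25 DMIPS/MHz"). They are related by multiplication, which is handy for cross-checking rows. A minimal sketch, assuming throughput scales linearly with clock and core count (real multi-core scaling is usually lower); the function names are illustrative, and the rows quoted as plain MIPS rather than DMIPS are not directly comparable.

```python
def total_dmips(dmips_per_mhz: float, clock_mhz: float, cores: int = 1) -> float:
    """Aggregate Dhrystone MIPS, assuming linear scaling with clock and core count."""
    return dmips_per_mhz * clock_mhz * cores


def dmips_per_mhz(dmips: float, clock_mhz: float) -> float:
    """Per-clock Dhrystone rating for a single core."""
    return dmips / clock_mhz


# Cortex-A9 row: 2.5 DMIPS/MHz per core, 10,000 DMIPS @ 2 GHz (dual-core)
print(total_dmips(2.5, 2000, cores=2))  # 10000.0
# ARM1176JZ(F)-S row: 965 DMIPS @ 772 MHz
print(dmips_per_mhz(965, 772))          # 1.25
# Cortex-M1 row: 136 DMIPS @ 170 MHz, quoted as 0.8 DMIPS/MHz
print(dmips_per_mhz(136, 170))          # 0.8
```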
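3 ARM Series Overview
3.1 ARM7 Series
The ARM7 series targets simple 32-bit devices. As one of the older series, ARM7 processors are no longer recommended for new designs. The main cores are the ARM7TDMI-S (ARMv4T architecture) and the ARM7EJ-S (ARMv5TEJ architecture).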
3.2 ARM9 Series
The ARM9 series mainly targets embedded real-time applications. The main cores are the ARM926EJ-S, ARM946E-S and ARM968E-S.
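3.3 ARM11 Series
The ARM11 series is mainly used in high-reliability and real-time embedded applications. The main cores are the ARM11MPCore, ARM1176, ARM1156 and ARM1136.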
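3.4 Cortex-R Series
In Cortex-R, the R stands for Real-time: the series targets real-time task processing, with main application areas including automotive, cameras, industrial and medical equipment.
The series mainly includes the Cortex-R4, Cortex-R5, Cortex-R7, Cortex-R8 and Cortex-R52.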
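3.5 Cortex-M Series
In Cortex-M, the M stands for Microcontroller: the series targets the most energy-efficient embedded devices, with main application areas including automotive, energy grids, medical, embedded systems, smart cards, smart devices, sensor fusion and wearables.
The series mainly includes the Cortex-M0, Cortex-M0+, Cortex-M3, Cortex-M4, Cortex-M7, Cortex-M23, Cortex-M33 and Cortex-M35P.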
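Several Cortex-M rows in the detailed table list "optional bit-banding memory". On Cortex-M3/M4 devices that implement it, each bit in the first 1 MB of the SRAM region (starting at 0x20000000) and of the peripheral region (starting at 0x40000000) is mirrored as a whole 32-bit word in an alias region (starting at 0x22000000 and 0x42000000), so a single bit can be set or cleared with one word write. The sketch below only computes the alias address; `bitband_alias` is a hypothetical helper, and whether bit-banding is present should be confirmed in the specific device's reference manual.

```python
# Bit-band regions of the ARMv7-M default memory map (Cortex-M3/M4, when implemented).
SRAM_BITBAND_BASE = 0x2000_0000
SRAM_ALIAS_BASE = 0x2200_0000
PERIPH_BITBAND_BASE = 0x4000_0000
PERIPH_ALIAS_BASE = 0x4200_0000
REGION_SIZE = 0x10_0000  # each bit-band region covers 1 MB


def bitband_alias(byte_addr: int, bit: int) -> int:
    """Return the alias word address that maps one bit of the byte at byte_addr."""
    if not 0 <= bit <= 7:
        raise ValueError("bit must be in 0..7")
    for region_base, alias_base in ((SRAM_BITBAND_BASE, SRAM_ALIAS_BASE),
                                    (PERIPH_BITBAND_BASE, PERIPH_ALIAS_BASE)):
        offset = byte_addr - region_base
        if 0 <= offset < REGION_SIZE:
            # Every bit gets its own 4-byte word, so each byte expands to 8 * 4 = 32 bytes.
            return alias_base + offset * 32 + bit * 4
    raise ValueError(f"0x{byte_addr:08X} is outside the bit-band regions")


print(hex(bitband_alias(0x2000_0300, 2)))  # 0x22006008
print(hex(bitband_alias(0x200F_FFFF, 7)))  # 0x23fffffc
```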
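3.6 Cortex-A Series
In Cortex-A, the A stands for Application (the application profile): the series aims to deliver the highest performance at the best power efficiency, with main application areas including automotive, industrial, medical, modems and storage. Cortex-A is also the most widely used processor series today.
The series mainly includes the Cortex-A5, Cortex-A7, Cortex-A8, Cortex-A9, Cortex-A15, Cortex-A17, Cortex-A32, Cortex-A35, Cortex-A53, Cortex-A57, Cortex-A72 and Cortex-A73. Of these, the Cortex-A8 supports only single-core configurations. The Cortex-A5, A7, A8, A9, A15 and A17 are based on the ARMv7-A architecture; the Cortex-A32, A35, A53, A57, A72 and A73 are based on ARMv8-A. Apart from the Cortex-A32, which is 32-bit only, the ARMv8-A cores support 64-bit execution.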
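The architecture names in the detailed table, such as ARMv7-R, ARMv7E-M and ARMv8.2-A, carry the same profile letters as the three Cortex series: -A for application, -R for real-time and -M for microcontroller. As an illustration, the hypothetical helper below splits such a name into major/minor version, extension letters and profile. The naming pattern it accepts is inferred from the names in the table rather than taken from an official ARM grammar, so anything outside that pattern is rejected.

```python
import re
from typing import NamedTuple, Optional

# Profile letters as described in sections 3.4-3.6.
PROFILES = {"A": "Application", "R": "Real-time", "M": "Microcontroller"}

# Assumed pattern: "ARMv" + major [+ "." + minor] [+ extension letters] [+ "-" + profile],
# e.g. "ARMv7E-M", "ARMv8.2-A", "ARMv5TEJ", "ARMv6T2".
_ARCH_RE = re.compile(r"ARMv(\d+)(?:\.(\d+))?([A-Z0-9]*?)(?:-([ARM]))?")


class ArchVersion(NamedTuple):
    major: int
    minor: int              # 0 when the name has no ".N" part
    extensions: str         # e.g. "TEJ", "T2", "E"; empty if none
    profile: Optional[str]  # "Application", "Real-time", "Microcontroller" or None


def parse_arch(name: str) -> ArchVersion:
    """Split an architecture name into version, extension letters and profile."""
    match = _ARCH_RE.fullmatch(name)
    if match is None:
        raise ValueError(f"unrecognised architecture name: {name!r}")
    major, minor, extensions, profile = match.groups()
    return ArchVersion(
        major=int(major),
        minor=int(minor) if minor else 0,
        extensions=extensions,
        profile=PROFILES[profile] if profile else None,
    )


for arch in ("ARMv7-A", "ARMv7E-M", "ARMv8-R", "ARMv8.2-A", "ARMv5TEJ"):
    print(f"{arch:<10} -> {parse_arch(arch)}")
```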
Roughly ordered from high end to low end, the Cortex-A processors are: Cortex-A73, Cortex-A72, Cortex-A57, Cortex-A53, Cortex-A35, Cortex-A32, Cortex-A17, Cortex-A15, Cortex-A7, Cortex-A9, Cortex-A8, Cortex-A5.
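The ordering above places all ARMv8-A cores ahead of the ARMv7-A ones. Ranking by a single metric gives a somewhat different picture; the sketch below sorts the Cortex-A cores by the per-core DMIPS/MHz figures in the detailed table. Dhrystone measures only integer throughput per clock and ignores clock speed, 64-bit support and power, so this is one view rather than a definitive order. Where the table gives a range or a lower bound (Cortex-A57: 4.1–4.5; Cortex-A15: at least 3.5) the lower value is used, and the Cortex-A32 is left out because the table gives no figure for it.

```python
# Per-core DMIPS/MHz values taken from the detailed table in section 2.
DMIPS_PER_MHZ = {
    "Cortex-A5": 1.57,
    "Cortex-A7": 1.9,
    "Cortex-A8": 2.0,
    "Cortex-A9": 2.5,
    "Cortex-A15": 3.5,
    "Cortex-A17": 2.8,
    "Cortex-A35": 1.78,
    "Cortex-A53": 2.3,
    "Cortex-A57": 4.1,
    "Cortex-A72": 4.7,
    "Cortex-A73": 4.8,
}

# Highest per-clock integer throughput first.
ranking = sorted(DMIPS_PER_MHZ, key=DMIPS_PER_MHZ.get, reverse=True)
for core in ranking:
    print(f"{core:<11} {DMIPS_PER_MHZ[core]:.2f} DMIPS/MHz")
```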
Company | Core | Released | Revision | Decode | Pipeline depth | Out-of-order execution | Branch prediction | big.LITTLE role | Exec. ports | Fab (in nm) | Simult. MT | L0 cache | L1 cache Instr + Data (in KiB) | L2 cache | L3 cache | Core configurations | DMIPS/MHz |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
ARM Holdings | Cortex-A32 (32-bit) | 2017 | ARMv8.0-A (only 32-bit) | 2-wide | 8 | No | LITTLE | ? | 28 | No | No | 8–64 + 8–64 | 0–1 MiB | No | 1-4+ | ||
Cortex-A34 (64-bit) | 2019 | ARMv8.0-A (only 64-bit) | 2-wide | 8 | No | LITTLE | ? | No | No | 8–64 + 8–64 | 0–1 MiB | No | 1-4+ | ||||
Cortex-A35 | 2017 | ARMv8.0-A | 2-wide | 8 | No | Yes | LITTLE | ? | 28 / 16 / 14 / 10 | No | No | 8–64 + 8–64 | 0 / 128 KiB–1 MiB | No | 1–4+ | 1.78 | |
Cortex-A53 | 2014 | ARMv8.0-A | 2-wide | 8 | No | Conditional + Indirect branch prediction | big/LITTLE | 2 | 28 / 20 / 16 / 14 / 10 | No | No | 8–64 + 8–64 | 128 KiB–2 MiB | No | 1–4+ | 2.24 | |
Cortex-A55 | 2017 | ARMv8.2-A | 2-wide | 8 | No | big/LITTLE | 2 | 28 / 20 / 16 / 14 / 10 | No | No | 16–64 + 16–64 | 0–256 KiB/core | 0–4 MiB | 1–8+ | 2.65[8] | ||
Cortex-A57 | 2013 | ARMv8.0-A | 3-wide | 15 | Yes, 3-wide dispatch | Two-level | big | 8 | 28 / 20 / 16[10] / 14 | No | No | 48 + 32 | 0.5–2 MiB | No | 1–4+ | 4.6 | |
Cortex-A65AE | 2019 | ARMv8.2-A | ? | ? | Yes | Two-level | ? | 2 | ? | SMT2 | No | 16-64 + 16-64 | 64-256 KiB | 0-4 MB | 1–8 | ? | |
Cortex-A72 | 2015 | ARMv8.0-A | 3-wide | 15 | Yes, 5-wide dispatch | Two-level | big | 8 | 28 / 16 | No | No | 48 + 32 | 0.5–4 MiB | No | 1–4+ | 4.72 | |
Cortex-A73 | 2016 | ARMv8.0-A | 2-wide | 11–12 | Yes, 4-wide dispatch | Two-level | big | 7 | 28 / 16 / 10 | No | No | 64 + 32/64 | 1–8 MiB | No | 1–4+ | ~6.35 | |
Cortex-A75 | 2017 | ARMv8.2-A | 3-wide | 11–13 | Yes, 6-wide dispatch | Two-level | big | 8? | 28 / 16 / 10 | No | No | 64 + 64 | 256–512 KiB/core | 0–4 MiB | 1–8+ | ? | |
Cortex-A76 | 2018 | ARMv8.2-A | 4-wide | 11–13 | Yes, 8-wide dispatch | Two-level | big | 8 | 10 / 7 | No | No | 64 + 64 | 256–512 KiB/core | 1–4 MiB | 1–4 | ? | |
Cortex-A77 | 2019 | ARMv8.2-A | 4-wide | 11–13 | Yes, 10-wide dispatch | Two-level | big | 12 | 7 | No | 1.5K entries | 64 + 64 | 256–512 KiB/core | 1–4 MiB | 1-4 | ? | |
Apple Inc. | Cyclone | 2013 | ARMv8.0-A | 6-wide | 16 | Yes | Yes | No | 9 | 28 | No | No | 64 + 64 | 1 MiB | 4 MiB | 2 | ? |
Typhoon | 2014 | ARMv8.0‑A | 6-wide | 16 | Yes | Yes | No | 9 | 20 | No | No | 64 + 64 | 1 MiB | 4 MiB | 2, 3 (A8X) | ? | |
Twister | 2015 | ARMv8.0‑A | 6-wide | 16[20] | Yes | Yes | No | 9 | 16 / 14 | No | No | 64 + 64 | 3 MiB | 4 MiB, No (A9X) | 2 | ? | |
Hurricane | 2016 | ARMv8.1‑A | 6-wide | 16 | Yes | Yes | "big" (in A10/A10X paired with "LITTLE" Zephyr cores) | 9 | 16 (A10), 10 (A10X) | No | No | 64 + 64 | 3 MiB (A10), 8 MiB (A10X) | 4 MiB (A10), No (A10X) | 2x Hurricane + 2x Zephyr (A10), 3x Hurricane + 3x Zephyr (A10X) | ? | |
Zephyr | 2016 | ARMv8.1‑A | 3-wide | 12 | Yes | Yes | LITTLE | 5 | 16 (A10), 10 (A10X) | No | No | 32 + 32 | 1 MiB | 4 MiB[22] (A10), No (A10X) | 2x Hurricane + 2x Zephyr (A10), 3x Hurricane + 3x Zephyr (A10X) | ? | |
Monsoon | 2017 | ARMv8.2‑A | 7-wide | 16 | Yes | Yes | "big" (in Apple A11 paired with "LITTLE" Mistral cores) | 13 | 10 | No | No | 64 + 64 | 8 MiB | No | 2x Monsoon + 4× Mistral | ? | |
Mistral | 2017 | ARMv8.2‑A | 3-wide | 12 | Yes | Yes | LITTLE | 5 | 10 | No | No | 32 + 32 | 1 MiB | No | 2x Monsoon + 4× Mistral | ? | |
Vortex | 2018 | ARMv8.3‑A | 7-wide | 16 | Yes | Yes | "big" (in Apple A12/Apple A12X/Apple A12Z paired with "LITTLE" Tempest cores) | 13 | 7 | No | No | 128 + 128 | 8 MiB | No | 2x Vortex + 4x Tempest (A12), 4x Vortex + 4x Tempest (A12X/A12Z) | ? | |
Tempest | 2018 | ARMv8.3‑A | 3-wide | 12 | Yes | Yes | LITTLE | 5 | 7 | No | No | 32 + 32 | 2 MiB | No | 2x Vortex + 4x Tempest (A12), 4x Vortex + 4x Tempest (A12X/A12Z) | ? | |
Lightning | 2019 | ARMv8.4‑A | 7-wide | 16 | Yes | Yes | "big" (in Apple A13 paired with "LITTLE" Thunder cores) | 13 | 7 | No | No | 128 + 128 | 8 MiB | No | 2x Lightning + 4x Thunder | ? | |
Thunder | 2019 | ARMv8.4‑A | 3-wide | 12 | Yes | Yes | LITTLE | 5 | 7 | No | No | 32 + 48 | 4 MiB | No | 2x Lightning + 4x Thunder | ? | |
Nvidia | Denver | 2014 | ARMv8‑A | 2-wide hardware decoder, up to 7-wide variable-length VLIW micro-ops | 13 | Not if the hardware decoder is in use. Can be provided by dynamic software translation into VLIW. | Direct + Indirect branch prediction | No | 7 | 28 | No | No | 128 + 64 | 2 MiB | No | 2 | ? |
Denver 2 | 2016 | ARMv8‑A | ? | 13 | Not if the hardware decoder is in use. Can be provided by dynamic software translation into VLIW. | Direct + Indirect branch prediction | "Super" (Nvidia's own implementation) | ? | 16 | No | No | 128 + 64 | 2 MiB | No | 2 | ? | |
Carmel | 2018 | ARMv8.2‑A | ? | Direct + Indirect branch prediction | ? | 12 | No | No | 128 + 64 | 2 MiB | (4 MiB @ 8 cores) | 2 (+ 8) | ? | ||||
Cavium | ThunderX | ARMv8-A | 2-wide | ? | No | Two-level | ? | 28 | No | No | 78 + 32 | 16 MiB | No | 8–16, 24–48 | ? | ||
ThunderX2 (ex. Broadcom Vulcan) | May 2018 | ARMv8.1-A | 4-wide "4 μops" | ? | Yes | Multi-level | ? | ? | 16 | SMT4 | No | 32 + 32 (data 8-way) | 256 KB per core | 1 MB per core | 16-32 | ? | |
Applied Micro | Helix | ? | ? | ? | ? | ? | ? | ? | ? | 40 / 28 | No | No | 32 + 32 (per core; write-through w/parity) | 256 KiB shared per core pair (with ECC) | 1 MiB/core | 2, 4, 8 | ? |
X-Gene | ? | 4-wide | 15 | Yes | ? | ? | ? | 40 | No | No | 8 MiB | 8 | 4.2 | ||||
X-Gene 2 | ? | 4-wide | 15 | Yes | ? | ? | ? | 28 | No | No | 8 MiB | 8 | 4.2 | ||||
X-Gene 3 | ? | ? | ? | ? | ? | ? | ? | 16 | No | No | ? | ? | 32 MiB | 32 | ? | ||
Qualcomm | Kryo | 2016 | ARMv8-A | ? | ? | Yes | Two-level? | "big" or "LITTLE" (Qualcomm's own similar implementation) | ? | 14 | No | No | 32+24 | 0.5–1 MiB | 2, 4 | 6.3 | |
Kryo 2XX | 2017 | ARMv8-A | 2-wide | 11–12 | Yes, 7-wide dispatch | Two-level | big | 7 | 14 / 11 / 10 [51] | No | No | 64 + 32/64? | 512 KiB/Gold Core | No | 4 | ? | |
2-wide | 8 | No | Conditional + Indirect branch prediction | ? | 2 | No | No | 8–64? + 8–64? | 256 KiB/Silver Core | 4 | ? | ||||||
Kryo 3XX | 2018 | ARMv8.2-A | 3-wide | 11–13 | Yes, 8-wide dispatch | Two-level | big | 8 | 10[51] | No | No | 64+64[51] | 256 KiB/Gold Core | 2 MiB | 4 | ? | |
2-wide | 8 | No | Conditional + Indirect branch prediction | ? | 28 | No | No | 16–64? + 16–64? | 128 KiB/Silver | 4 | ? | ||||||
Kryo 4XX | 2019 | ARMv8.2-A | 4-wide | 11–13 | Yes, 8-wide dispatch | Yes | big | 8 | 11 / 8 / 7 | No | No | 64 + 64 | 512 KiB/Gold Prime, 256 KiB/Gold | 2 MiB | 1+3 | ? | |
2-wide | 8 | No | Conditional + Indirect branch prediction | ? | 2 | No | No | 16–64? + 16–64? | 128 KiB/Silver | 4 | ? | ||||||
Falkor | 11-8-2017 | "ARMv8.1-A features"; AArch64 only (not 32-bit) | 4-wide | 10–15 | Yes, 8-wide dispatch | Yes | ? | 8 | 10 | No | 24 KiB | 88[53] + 32 | 500KiB | 1.25MiB | 40-48 | ? | |
Samsung | M1/M2 | 2015 | ARMv8-A | 4-wide | 13 | Yes, 9-wide dispatch | Two-level | big | 8 | 14 / 10 | No | No | 64 + 32 | 2 MiB[59] | No | 4 | ? |
M3 | 2018 | ARMv8.2-A | 6-wide | 15 | Yes, 12-wide dispatch | Two-level | big | 12 | 10 | No | No | 64 + 64 | 512 KiB per core | 4096 KB | 4 | ? | |
M4 | 2019 | ARMv8.2-A | 6-wide | 15 | Yes, 12-wide dispatch | Two-level | big | 12 | 8 / 7 | No | No | 64 + 64 | 512 KiB per core | 4096 KB | 2 | ? | |
Fujitsu | A64fx | 2019 | ARMv8.2-A | 4/2-wide | 7+ | Yes, 5-way? | Yes | n/a | 8+ | 7 | No | No | 64 + 64 | 8 MiB per 12+1 cores | No | 48+4 | 1.9GHz+; 15GF/W+. |
HiSilicon | TaiShan V110 | 2019 | ARMv8.2-A | 4-wide | ? | Yes | Yes | n/a | 8 | 7 | No | No | 64 + 64 | 512 KiB per core | 1 MiB per core | ? | ? |
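Many domestic Chinese processors, including Huawei's HiSilicon chips such as the Kirin mobile SoCs, are based on the ARMv8 architecture and use Cortex-A-series cores. In mobile and portable devices, ARM is close to ubiquitous.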
4 ARM Core Timeline
Year | ARM7 (classic) | ARM8 (classic) | ARM9 (classic) | ARM10 (classic) | ARM11 (classic) | Cortex-M (microcontroller) | Cortex-R (real-time) | Cortex-A (32-bit application) | Cortex-A (64-bit application) | Neoverse (64-bit application) |
---|---|---|---|---|---|---|---|---|---|---|
1993 | ARM700 | | | | | | | | | |
1994 | ARM710, ARM7DI, ARM7TDMI | | | | | | | | | |
1995 | ARM710a | | | | | | | | | |
1996 | | ARM810 | | | | | | | | |
1997 | ARM710T, ARM720T, ARM740T | | | | | | | | | |
1998 | | | ARM9TDMI, ARM940T | | | | | | | |
1999 | | | ARM9E-S, ARM966E-S | | | | | | | |
2000 | | | ARM920T, ARM922T, ARM946E-S | ARM1020T | | | | | | |
2001 | ARM7TDMI-S, ARM7EJ-S | | ARM9EJ-S, ARM926EJ-S | ARM1020E, ARM1022E | | | | | | |
2002 | | | | ARM1026EJ-S | ARM1136J(F)-S | | | | | |
2003 | | | ARM968E-S | | ARM1156T2(F)-S, ARM1176JZ(F)-S | | | | | |
2004 | | | | | | Cortex-M3 | | | | |
2005 | | | | | ARM11MPCore | | | Cortex-A8 | | |
2006 | | | ARM996HS | | | | | | | |
2007 | | | | | | Cortex-M1 | | Cortex-A9 | | |
2008 | | | | | | | | | | |
2009 | | | | | | Cortex-M0 | | Cortex-A5 | | |
2010 | | | | | | Cortex-M4(F) | | Cortex-A15 | | |
2011 | | | | | | | Cortex-R4, Cortex-R5, Cortex-R7 | Cortex-A7 | | |
2012 | | | | | | Cortex-M0+ | | | Cortex-A53, Cortex-A57 | |
2013 | | | | | | | | Cortex-A12 | | |
2014 | | | | | | Cortex-M7(F) | | Cortex-A17 | | |
2015 | | | | | | | | | Cortex-A35, Cortex-A72 | |
2016 | | | | | | Cortex-M23, Cortex-M33(F) | Cortex-R8, Cortex-R52 | Cortex-A32 | Cortex-A73 | |
2017 | | | | | | | | | Cortex-A55, Cortex-A75 | |
2018 | | | | | | Cortex-M35P(F) | | | Cortex-A65AE, Cortex-A76, Cortex-A76AE | |
2019 | | | | | | | | | Cortex-A77 | Neoverse E1, Neoverse N1 |
5 Third-Party ARM Design Companies
Core family | Instruction set | Microarchitecture | Feature | Cache (I / D), MMU | Typical MIPS @ MHz |
---|---|---|---|---|---|
StrongARM (Digital) | ARMv4 | SA-110 | 5-stage pipeline | 16 KB / 16 KB, MMU | 100–233 MHz, 1.0 DMIPS/MHz |
StrongARM (Digital) | ARMv4 | SA-1100 | Derivative of the SA-110 | 16 KB / 8 KB, MMU | |
Faraday[60] (Faraday Technology) | ARMv4 | FA510 | 6-stage pipeline | Up to 32 KB / 32 KB cache, MPU | 1.26 DMIPS/MHz, 100–200 MHz |
Faraday (Faraday Technology) | ARMv4 | FA526 | | Up to 32 KB / 32 KB cache, MMU | 1.26 MIPS/MHz, 166–300 MHz |
Faraday (Faraday Technology) | ARMv4 | FA626 | 8-stage pipeline | 32 KB / 32 KB cache, MMU | 1.35 DMIPS/MHz, 500 MHz |
Faraday (Faraday Technology) | ARMv5TE | FA606TE | 5-stage pipeline | No cache, no MMU | 1.22 DMIPS/MHz, 200 MHz |
Faraday (Faraday Technology) | ARMv5TE | FA626TE | 8-stage pipeline | 32 KB / 32 KB cache, MMU | 1.43 MIPS/MHz, 800 MHz |
Faraday (Faraday Technology) | ARMv5TE | FMP626TE | 8-stage pipeline, SMP | | 1.43 MIPS/MHz, 500 MHz |
Faraday (Faraday Technology) | ARMv5TE | FA726TE | 13-stage pipeline, dual issue | | 2.4 DMIPS/MHz, 1000 MHz |
XScale (Intel / Marvell) | ARMv5TE | XScale | 7-stage pipeline, Thumb, enhanced DSP instructions | 32 KB / 32 KB, MMU | 133–400 MHz |
XScale (Intel / Marvell) | ARMv5TE | Bulverde | Wireless MMX, wireless SpeedStep added | 32 KB / 32 KB, MMU | 312–624 MHz |
XScale (Intel / Marvell) | ARMv5TE | Monahans[61] | Wireless MMX2 added | 32 KB / 32 KB L1, optional L2 cache up to 512 KB, MMU | Up to 1.25 GHz |
Sheeva (Marvell) | ARMv5 | Feroceon | 5–8 stage pipeline, single-issue | 16 KB / 16 KB, MMU | 600–2000 MHz |
Sheeva (Marvell) | ARMv5 | Jolteon | 5–8 stage pipeline, dual-issue | 32 KB / 32 KB, MMU | |
Sheeva (Marvell) | ARMv5 | PJ1 (Mohawk) | 5–8 stage pipeline, single-issue, Wireless MMX2 | 32 KB / 32 KB, MMU | 1.46 DMIPS/MHz, 1.06 GHz |
Sheeva (Marvell) | ARMv6 / ARMv7-A | PJ4 | 6–9 stage pipeline, dual-issue, Wireless MMX2, SMP | 32 KB / 32 KB, MMU | 2.41 DMIPS/MHz, 1.6 GHz |
Snapdragon (Qualcomm) | ARMv7-A | Scorpion[62] | 1 or 2 cores. ARM / Thumb / Thumb-2 / DSP / SIMD / VFPv3 FPU / NEON (128-bit wide) | 256 KB L2 per core | 2.1 DMIPS/MHz per core |
Snapdragon (Qualcomm) | ARMv7-A | Krait[62] | 1, 2, or 4 cores. ARM / Thumb / Thumb-2 / DSP / SIMD / VFPv4 FPU / NEON (128-bit wide) | 4 KB / 4 KB L0, 16 KB / 16 KB L1, 512 KB L2 per core | 3.3 DMIPS/MHz per core |
Snapdragon (Qualcomm) | ARMv8-A | Kryo[63] | 4 cores. | ? | Up to 2.2 GHz (6.3 DMIPS/MHz) |
Ax (Apple) | ARMv7-A | Swift[64] | 2 cores. ARM / Thumb / Thumb-2 / DSP / SIMD / VFPv4 FPU / NEON | L1: 32 KB / 32 KB, L2: 1 MB | 3.5 DMIPS/MHz per core |
Ax (Apple) | ARMv8-A | Cyclone[65] | 2 cores. ARM / Thumb / Thumb-2 / DSP / SIMD / VFPv4 FPU / NEON / TrustZone / AArch64 | L1: 64 KB / 64 KB, L2: 1 MB, L3: 4 MB | 1.3 or 1.4 GHz |
Ax (Apple) | ARMv8-A | Typhoon[65][66] | 2 or 3 cores. ARM / Thumb / Thumb-2 / DSP / SIMD / VFPv4 FPU / NEON / TrustZone / AArch64 | L1: 64 KB / 64 KB, L2: 1 MB or 2 MB, L3: 4 MB | 1.4 or 1.5 GHz |
Ax (Apple) | ARMv8-A | Twister[67] | 2 cores. ARM / Thumb / Thumb-2 / DSP / SIMD / VFPv4 FPU / NEON / TrustZone / AArch64 | L1: 64 KB / 64 KB, L2: 2 MB, L3: 4 MB or 0 MB | 1.85 or 2.26 GHz |
Ax (Apple) | ARMv8.1-A | Hurricane[68] | 2 or 3 cores. AArch64, 6-decode, 6-issue, 9-wide, superscalar, out-of-order | L1: 64 KB / 64 KB, L2: 3 MB or 8 MB, L3: 4 MB or 0 MB | 2.34 or 2.38 GHz |
Ax (Apple) | ARMv8.2-A | Monsoon[69] | 2 cores. AArch64, 7-decode, ?-issue, 11-wide, superscalar, out-of-order | L1I: 128 KB, L1D: 64 KB, L2: 8 MB, L3: 4 MB | 2.39 GHz |
Ax (Apple) | ARMv8.3-A | Vortex[70] | 2 or 4 cores. AArch64, 7-decode, ?-issue, 11-wide, superscalar, out-of-order | L1: 128 KB / 128 KB, L2: 8 MB, L3: 8 MB | 2.5 GHz |
Ax (Apple) | ARMv8.4-A | Lightning[71] | 2 cores. AArch64, 7-decode, ?-issue, 11-wide, superscalar, out-of-order | L1: 128 KB / 128 KB, L2: 8 MB, L3: 16 MB | 2.66 GHz |
X-Gene (Applied Micro) | ARMv8-A | X-Gene | 64-bit, quad issue, SMP, 64 cores[72] | Cache, MMU, virtualization | 3 GHz (4.2 DMIPS/MHz per core) |
Denver (Nvidia) | ARMv8-A | Denver[73][74] | 2 cores. AArch64, 7-wide superscalar, in-order, dynamic code optimization, 128 MB optimization cache, Denver1: 28 nm, Denver2: 16 nm | 128 KB I-cache / 64 KB D-cache | Up to 2.5 GHz |
Carmel (Nvidia) | ARMv8 (t.b.d.) | Carmel[75][76] | 2 cores. AArch64, 10-wide superscalar, in-order, dynamic code optimization, ? MB optimization cache, functional safety, dual execution, parity & ECC | ? KB I-cache / ? KB D-cache | Up to ? GHz |
ThunderX (Cavium) | ARMv8-A | ThunderX | 64-bit, with two models with 8–16 or 24–48 cores (×2 w/two chips) | ? | Up to 2.2 GHz |
K12 (AMD) | ARMv8-A | K12[77] | ? | ? | ? |
Exynos (Samsung) | ARMv8-A | M1/M2 ("Mongoose")[78] | 4 cores. AArch64, 4-wide, quad-issue, superscalar, out-of-order | 64 KB I-cache / 32 KB D-cache, L2: 16-way shared 2 MB | 5.1 DMIPS/MHz (2.6 GHz) |
Exynos (Samsung) | ARMv8-A | M3 ("Meerkat")[79] | 4 cores, AArch64, 6-decode, 6-issue, 6-wide, superscalar, out-of-order | 64 KB I-cache / 32 KB D-cache, L2: 8-way private 512 KB, L3: 16-way shared 4 MB | ? |
Exynos (Samsung) | ARMv8.2-A | M4 ("Cheetah") | 2 cores, AArch64, 6-decode, 6-issue, 6-wide, superscalar, out-of-order | 64 KB I-cache / 32 KB D-cache, L2: 8-way private 512 KB, L3: 16-way shared 4 MB | ? |
Reference: https://en.wikipedia.org/wiki/List_of_ARM_microarchitectures