01-18-2018, 02:07 PM
So, I want to know what your thoughts are on architectures. Not just limited to programming.
Reasons why I like ARM:
1. Fixed length instruction set
In ARM, every instruction is 32 bits long. This speeds up instruction decoding, and it also gives you (the programmer) greater control over your code: you can modify your own code at runtime without having to fetch and decode instructions just to figure out how long they are.
2. Streamlined pipeline
Another feature largely restricted to the RISC world is a multistage, offset pipeline. This means an ARM core can completely control what gets executed on what timeline. An x86 pipeline works by using simple edge triggers, delays, and locks to determine when to execute things; ARM does it differently. The clock rate is fixed, so offsets can be made such that multiple clock signals can be derived from a single one, letting instructions execute in a single cycle (rather than as many as 28 cycles with x86).
3. Conditional execution
Here is a pretty simple C program I want to show you:
Ok, now let me translate that to the absolute most basic x86 program I can create. It's not going to have frame pointers, all the data will be held in registers (the fastest way), and no stack operations.
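(The actual x86 listing is hidden. As a rough, hypothetical illustration of what a register-only, no-stack, no-frame translation of a countdown loop might look like in 8086-style mnemonics:)

```asm
; Hypothetical sketch only -- not the author's hidden listing.
; No frame pointer, no stack operations, everything in registers.
        mov  ax, 0          ; total = 0
        mov  cx, 1000       ; i = 1000
loop_top:
        add  ax, cx         ; total += i
        dec  cx             ; i--  (sets ZF when i reaches 0)
        jnz  loop_top       ; conditional branch: taken while i != 0
        ; result left in AX
```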
I've gone to the trouble of looking up the instruction cycle times. I treated everything as if it were PROC NEAR (which it likely wouldn't be for all code paths) just to keep the numbers from being massive (a CALL is 2 cycles if near, but 26 if far).
The total cycle count for this program is 7,012 to run to completion, which on a 250 MHz core would take 0.028 ms.
Now, here's the same program for ARM:
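(Again, the real listing is hidden; here's a hypothetical ARM sketch of the same countdown loop, showing the "S" suffix and conditional execution the next paragraph talks about:)

```asm
; Hypothetical sketch only -- not the author's hidden listing.
        MOV   R0, #0          ; total = 0
        MOV   R1, #1000       ; i = 1000
loop:
        ADD   R0, R0, R1      ; total += i
        SUBS  R1, R1, #1      ; i-- AND set flags (the "S" suffix)
        BNE   loop            ; conditional branch: taken while Z is clear
        ; any instruction can carry a condition, e.g.
        ; ADDNE R0, R0, R1    ; executes only if the Z flag is clear
```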
The total cycle count is 3,007, and it will take 0.012 ms to complete this program on a 250 MHz ARM core. That's 2.3x faster for the exact same program. The reason: in ARM, all instructions take exactly 1 cycle, and every instruction is conditional. I actually used the same instructions as in the x86 version; you'll see I added the "S", "Z", and "NE" suffixes to some instructions. These aren't actually different instructions. They just set bits in the condition field (which all instructions have) that tell it how to operate, with the exception of the "S" suffix, which sets a bit elsewhere in the instruction and is only present on data-processing instructions.
Why did I choose 250 MHz? Well, that's 1/3 the speed of the original iPhone, so it makes a good comparison, and a 1/3 value means the numbers aren't impossibly tiny. On the original iPhone, these programs would have executed in 0.0093 ms and 0.004 ms respectively, which isn't even noticeable to the user. But then, this is a particularly small and useless program (its end result is identical every time unless the status register is being interfered with or has failed). Imagine, though, that there were a hardware flaw that allowed certain CPUs to speculatively execute code based on a branch-prediction result, and assume both processors were equally vulnerable to such a bug. You could use this program to determine whether the kernel had been compromised: on an Intel machine, it would preemptively jump out of the loop, since I don't have any sort of frame around it. On an ARM core, however, the program would execute normally. This comes down to how often the CPSR bits are set: 30% of the instructions executed are supposed to set the status register, meaning they can't be executed ahead of their branches without interfering with the logic of the program, whereas in the Intel program only 13% of instructions set those bits.
Let me know what you think, and vote in the poll