On Sat, Apr 29, 2017 at 2:57 PM, Hendrik Boom <hendrik@topoi.pooq.com> wrote:
Let me remind you of a real-world situation. The hardware designers were working on the second version of their successful CPU. They attached some counters to measure how many times the various instructions were being executed. They discovered that the most common instructions were certain test and branch instructions. So they worked hard on making sure the next model had the most efficient implementation of those test and branch instructions they could achieve.
But when they finally put the new machine together and tried it out, they found no improvement at all.
Investigating, they discovered they had optimized the wait loop.
that is frickin' funny. but also relevant, as i am aware of, for example, the ICT's efforts to add x86-accelerating instructions to the Loongson 2G architecture. although it's a MIPS64, they added hardware emulation of the "top" 200 x86 instructions, getting qemu emulation up to 70% of actual x86 clock rates.
which got me thinking: how the heck would you gauge which instructions actually were "top"? would it be better instead to measure _power_ consumption per instruction, aiming for better performance/watt?
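to make the measurement question concrete, here's a minimal sketch in Python, assuming you have a dynamic trace of (program counter, mnemonic) pairs, the kind of thing hardware counters or an instrumented emulator could give you. the trace, the addresses, the IDLE_LOOP range and the "mul costs 4x" weight are all made-up placeholders, not real Loongson or qemu data, and top_instructions is a hypothetical helper. it just shows how raw counts get dominated by a wait loop, and how filtering out the idle loop, then optionally weighting each instruction by an estimated per-instruction energy cost, changes what counts as "top".

```python
from collections import Counter

# hypothetical dynamic trace: (program counter, mnemonic) pairs.
# the first chunk is an OS wait loop spinning on test/branch,
# the second chunk is the "real" work. all values are made up.
TRACE = (
    [(0x100, "test"), (0x104, "bne")] * 5
    + [(0x200, "ld"), (0x204, "add"), (0x208, "add"),
       (0x20C, "st"), (0x210, "mul")] * 2
)

# assumed address range of the idle loop (in practice you'd take it
# from the kernel's symbol table, or sample only when not idle)
IDLE_LOOP = range(0x100, 0x108)


def top_instructions(trace, exclude=(), weights=None, n=3):
    """Rank mnemonics by dynamic count, skipping PCs in `exclude`.

    `weights` optionally maps mnemonic -> relative cost (e.g. an
    estimated energy per execution); unlisted mnemonics count as 1.
    """
    weights = weights or {}
    counts = Counter()
    for pc, op in trace:
        if pc in exclude:
            continue  # don't let the wait loop vote
        counts[op] += weights.get(op, 1)
    # ties keep first-seen order, per Counter.most_common
    return counts.most_common(n)


print(top_instructions(TRACE))                     # [('test', 5), ('bne', 5), ('add', 4)]
print(top_instructions(TRACE, exclude=IDLE_LOOP))  # [('add', 4), ('ld', 2), ('st', 2)]
print(top_instructions(TRACE, exclude=IDLE_LOOP,
                       weights={"mul": 4}))        # [('mul', 8), ('add', 4), ('ld', 2)]
```

the raw ranking is exactly the wait-loop trap: test and branch "win" while contributing nothing useful. swapping the unit weight for a per-instruction energy estimate is one way to rank by performance/watt instead of by frequency, though the estimates themselves would have to come from real power measurements.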
l.