One more idea that just popped up (I promise I won't spam the list too much with this kind of junk).
How much of a performance penalty would be incurred by "pulling a Transmeta" -- by which I mean emulating x86 hardware on, in this case, ARM? Say, convincing an A20 to emulate something like a Pentium III?
The idea comes from the Transmeta Crusoe CPU -- which is NOT x86 at all! It's a VLIW (very long instruction word) design, which is something I'm really unfamiliar with, and it has what amounts to a software translation layer between it and the rest of the system, so that it can run x86 code. Of course that incurs a lot of overhead, because the CPU is basically doing its job twice -- translating the instructions and then executing them -- although as far as I know the Crusoe caches translated code, so a hot loop only pays the translation cost once. It's a remarkable kludge if you ask me, but it works.
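
To make the "translate, then execute" cost a bit more concrete, here is a toy sketch in Python. Everything in it is made up for illustration: the three-instruction guest "ISA" is not x86, the names (PROGRAM, interpret, translate_block, run_translated) are hypothetical, and it is not how Transmeta's software or any real emulator is actually written. It only contrasts decoding every guest instruction every time it runs with translating a block once and reusing the result from a cache.

```python
# Toy guest program (hypothetical ISA, NOT x86): acc = cnt + (cnt-1) + ... + 1
PROGRAM = [
    ("add", "acc", "cnt"),  # 0: acc += cnt
    ("dec", "cnt"),         # 1: cnt -= 1
    ("jnz", "cnt", 0),      # 2: if cnt != 0, jump to 0
    ("halt",),              # 3: stop
]


def interpret(program, regs):
    """Decode and execute one guest instruction at a time.

    Returns the number of guest instructions decoded.
    """
    pc, decodes = 0, 0
    while True:
        op, *args = program[pc]
        decodes += 1
        if op == "halt":
            return decodes
        if op == "add":
            regs[args[0]] += regs[args[1]]
            pc += 1
        elif op == "dec":
            regs[args[0]] -= 1
            pc += 1
        elif op == "jnz":
            pc = args[1] if regs[args[0]] != 0 else pc + 1


def make_add(dst, src):
    def step(regs):
        regs[dst] += regs[src]
    return step


def make_dec(dst):
    def step(regs):
        regs[dst] -= 1
    return step


def translate_block(program, start):
    """Translate guest code from `start` up to the next jnz/halt.

    Returns (host_function, decode_count). The host function runs the
    whole block and returns the next guest pc, or None on halt.
    """
    steps, pc, decodes = [], start, 0
    while True:
        op, *args = program[pc]
        decodes += 1
        if op == "add":
            steps.append(make_add(args[0], args[1]))
        elif op == "dec":
            steps.append(make_dec(args[0]))
        elif op in ("jnz", "halt"):
            break
        pc += 1

    if op == "halt":
        def block(regs):
            for step in steps:
                step(regs)
            return None
    else:
        reg, target, fallthrough = args[0], args[1], pc + 1

        def block(regs):
            for step in steps:
                step(regs)
            return target if regs[reg] != 0 else fallthrough
    return block, decodes


def run_translated(program, regs):
    """Run using a translation cache keyed by guest pc.

    Returns the number of guest instructions decoded.
    """
    cache, pc, decodes = {}, 0, 0
    while pc is not None:
        if pc not in cache:
            cache[pc], n = translate_block(program, pc)
            decodes += n
        pc = cache[pc](regs)
    return decodes


if __name__ == "__main__":
    for runner in (interpret, run_translated):
        regs = {"acc": 0, "cnt": 1000}
        print(runner.__name__, "decodes:", runner(PROGRAM, regs), "acc:", regs["acc"])
```

With cnt starting at 1000, the interpreter decodes 3001 guest instructions while the cached version decodes 4, and both end with acc = 500500. The decode count is only a stand-in: a real translator spends far more effort per translated instruction and has to manage its cache, so this says nothing about actual slowdown on real hardware. It just shows why the "doing its job twice" cost hurts most on code that runs once and much less on loops.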