What are some hidden features of x86 assembly language? What tips and tricks do you have for working with x86 assembly language?
Assembly rocked the world of its era because it freed programmers from manually writing machine code. We got lots of complex instruction sets to help programmers do multiple things in one instruction. There isn't anything hidden or powerful that cannot be accomplished with a compiler. We are at billions of cycles per second, so an instruction that does something in 1 cycle instead of 2 or 3 is not very exciting anymore.
Almost every processor has undocumented instructions and/or registers. But they are usually undocumented for a reason, so it's often unwise to use them.
One of the interesting things about assembly language is that the smallest and/or fastest instructions are not necessarily intuitive. For example, to set the EAX register to zero, instead of mov eax,0 you use xor eax,eax which is fewer bytes and zeroes the register at the same speed. Unlike mov, xor also modifies the flags.
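To make the size difference concrete, here is a minimal Python sketch that simply holds the standard machine-code encodings of the two instructions and compares their lengths. The constant and function names (MOV_EAX_0, XOR_EAX_EAX, size_saving) are made up for this illustration; only the byte values come from the x86 instruction encoding.

```python
# Machine-code bytes for two ways of zeroing EAX (32-bit or 64-bit mode).
# The names below are hypothetical; only the byte values are x86 facts.
MOV_EAX_0 = bytes([0xB8, 0x00, 0x00, 0x00, 0x00])  # mov eax, 0   -> opcode B8 + imm32
XOR_EAX_EAX = bytes([0x31, 0xC0])                  # xor eax, eax -> opcode 31, ModRM C0


def size_saving(long_form: bytes, short_form: bytes) -> int:
    """Return how many bytes the shorter encoding saves."""
    return len(long_form) - len(short_form)


if __name__ == "__main__":
    print("mov eax, 0   :", MOV_EAX_0.hex(" "), f"({len(MOV_EAX_0)} bytes)")
    print("xor eax, eax :", XOR_EAX_EAX.hex(" "), f"({len(XOR_EAX_EAX)} bytes)")
    print("bytes saved  :", size_saving(MOV_EAX_0, XOR_EAX_EAX))
```

Most of the mov encoding is the 32-bit immediate zero, which the xor form does not need.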
See Any reason to do a “xor eax, eax”? [1] for more details on this one.
[1] http://stackoverflow.com/questions/1396527/any-reason-to-do-a-xor-eax-eax
The XOR %EAX, %EAX instruction was the fastest way to set a register to zero in the early generations of the x86, but most code is generated by compilers, and compilers rarely generated the XOR instruction. So the IA designers decided to move the frequently occurring compiler-generated instructions up to the front of the combinational decode logic, making the literal MOVL $0, %EAX instruction execute faster than the XOR instruction. - Nick Dandoulakis
Formerly secret hidden-feature revealed...
Now that computers are so fast, they are hard to actually stop. A single halt instruction is unreliable, and so just calling halt() in a high-level language isn't necessarily going to work if it's an old library routine.
Therefore, the following only-in-assembler design pattern is suggested:
_halt::
halt
halt
halt
halt
jmp _halt
halt ; fill branch delay slot
Set mm7 to 0xFF00FF00FF00FF00:
pcmpeqd mm7, mm7 // 0xFF FF FF FF FF FF FF FF
psllq mm7, 8 // 0xFF FF FF FF FF FF FF 00
pshufw mm7, mm7, 0x0 // 0xFF 00 FF 00 FF 00 FF 00
Each instruction takes two clock cycles to complete. The whole operation will finish in six clock cycles.
Faster:
pxor mm7, mm7 // 0x00 00 00 00 00 00 00 00
pcmpeqd mm0, mm0 // 0xFF FF FF FF FF FF FF FF
punpcklbw mm7, mm0 // 0xFF 00 FF 00 FF 00 FF 00
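To check that both sequences really leave the same value in mm7, here is a small Python model of the four MMX instructions involved, operating on 64-bit integers. It is only a sketch of the documented data movement, not a timing model, and the function names (pcmpeqd_self, slow_sequence, fast_sequence and so on) are made up for this illustration.

```python
MASK64 = (1 << 64) - 1  # an MMX register holds 64 bits


def pcmpeqd_self() -> int:
    """pcmpeqd r, r: each dword equals itself, so every bit is set."""
    return MASK64


def psllq(value: int, count: int) -> int:
    """psllq: logical left shift of the whole quadword."""
    return (value << count) & MASK64 if count < 64 else 0


def pshufw(src: int, imm8: int) -> int:
    """pshufw: each result word is picked from src by a 2-bit selector."""
    words = [(src >> (16 * i)) & 0xFFFF for i in range(4)]
    out = 0
    for i in range(4):
        selector = (imm8 >> (2 * i)) & 0x3
        out |= words[selector] << (16 * i)
    return out


def punpcklbw(dst: int, src: int) -> int:
    """punpcklbw: interleave the low four bytes of dst and src (dst first)."""
    out = 0
    for i in range(4):
        d = (dst >> (8 * i)) & 0xFF
        s = (src >> (8 * i)) & 0xFF
        out |= d << (16 * i)
        out |= s << (16 * i + 8)
    return out


def slow_sequence() -> int:
    mm7 = pcmpeqd_self()    # pcmpeqd mm7, mm7
    mm7 = psllq(mm7, 8)     # psllq mm7, 8
    return pshufw(mm7, 0)   # pshufw mm7, mm7, 0x0


def fast_sequence() -> int:
    mm7 = 0                 # pxor mm7, mm7
    mm0 = pcmpeqd_self()    # pcmpeqd mm0, mm0
    return punpcklbw(mm7, mm0)  # punpcklbw mm7, mm0


if __name__ == "__main__":
    print(f"slow: 0x{slow_sequence():016X}")
    print(f"fast: 0x{fast_sequence():016X}")
```

In the second version the pxor and pcmpeqd write different registers and do not depend on each other, whereas every step of the first version needs the result of the previous one.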