instruction-analysis API(s)

Tue Mar 10 19:57:11 UTC 2009

Hi Jim,

Jim Keniston wrote:
> Quoting Masami Hiramatsu <mhiramat at redhat.com>:
> 
>> Hi Jim and Sriker,
>>
>> Here, I almost rewrote my patch.
>>
>> Changelog:
>> - rewrite decoding logic based on Intel' manual.
>> - supoort insn_get_sib(),insn_get_displacement()
>>   and insn_get_immediate() too.
>> - support 3 bytes opcode and 64bit immediate.
>> - introduce some bitmaps.
>>
>> Thank you,
> 
> Well, I didn't do much of a code review -- it looks like you addressed
> all my concerns -- but as I mentioned on IRC, I hacked together a test
> rig whereby you can disassemble a designated elf file (e.g., vmlinux,
> libc, libm) and then compare insn_get_length()'s results with objdump's
> results.  The comment in distill.awk shows how to use objdump, awk, and
> test_get_len together.

Thank you for review and test!

> I also hacked up insn_x86.h and insn_x86.c to work in user space.  Most
> of that is accomplished via insn_x86_user.h, but it certainly isn't
> necessary to do it that way.  In particular, __u8, __s8, __u16, etc. are
> versions of u8, s8, u16, etc. that can be used in both kernel and user
> code, so maybe we should switch to those.
> 
> I tested with vmlinux, libc, and libm on both an i686 system and an
> x86_64 system.  I found and fixed a few bugs.  Here are the ones that
> come to mind (all fixed):
> - shrd/shld, which we discussed
> - missing support for weird nops with modrm bytes (0f 1f ...).
> - neglected to include the REX prefix in prefixes.nbytes
> - missing static decl in an inline function in insn_x86.h

Thank you for fixing  it.
BTW, it might have to support vm86 mode(especially, for user code).

> There are some other cases where insn_get_length() doesn't match up with
> the disassembly, but I don't consider them bugs:
> - 0x9b is an instruction (fwait), but the disassembler treats it as a
> prefix.  For example 9b df ... can be disassembled as
>     fstsw ...    // wait, then store status word
> or
>     fwait        // wait
>     fnstsw ...    // store status word without waiting
> Perhaps it's relevant to investigate whether a single-step of 9b df ...
> would execute just the fwait or the whole fstsw.  Anyway, this explains
> the "failures" of finit and fstsw that I mentioned to you.  I also saw
> this with fstcw and fclex.

FYI, there is a single wait/fwait instruction described at Intel software
developers manual vol.2B p.399.

> - Illegal instruction sequences, such as an x86_64 instruction that
> starts with 0x40, or a misplaced 0x65 prefix.  Typically, we see these
> when disassembling data.  I just filtered out (via egrep) instructions
> whose disassembly starts with "rex" or includes "(bad)".

Sure, I think insn_* should return -EINVAL or set insn.invalid = 1
if we found those invalid ops. E.g. kernel use BUG() macro, it adds
some raw numbers after ud2, in that case, those raw numbers might
be decoded as an illegal instruction.

> We could address the above by filtering them out in distill.awk or
> test_get_len.c.  I think we're clean otherwise.
> 
> There's a little more housecleaning to do -- e.g., adding Hitachi (?)
> copyright to IBM copyright, discarding insn_field_exists() and
> insn_extract_reg(), putting this all in git somewhere.  But not tonight.
> 
> Pull all the attached files into a directory and have a go -- e.g.,
> $ make
> $ objdump -d vmlinux | awk -f distill.awk | ./test_get_len [x86_64]
> 
> Jim
> 

-- 
Masami Hiramatsu

Software Engineer
Hitachi Computer Products (America) Inc.
Software Solutions Division

e-mail: mhiramat at redhat.com