2- "skip syscall" management.

Renzo Davoli renzo at cs.unibo.it
Tue Jun 3 14:56:11 UTC 2008


arch/x86/kernel/entry_32.S provides two ways to skip the call:
> syscall_trace_entry:
>   movl $-ENOSYS,PT_EAX(%esp)
>   movl %esp, %eax
>   xorl %edx,%edx
>   call do_syscall_trace
>   cmpl $0, %eax
*** this:
>   jne resume_userspace    # ret != 0 -> running under PTRACE_SYSEMU,
>    # so must skip actual syscall
>   movl PT_ORIG_EAX(%esp), %eax
>   cmpl $(nr_syscalls), %eax
*** or this:
>   jnae syscall_call
>   jmp syscall_exit

Old ptrace used a non-zero return value from do_syscall_trace to skip the
call (also skipping the second do_syscall_trace on exit). Alternatively, if
orig_eax (the syscall number) is -1, the jnae is not taken, because -1 is seen
as the largest unsigned number, and control falls through to syscall_exit.

Now PTRACE_SYSEMU is implemented using this latter method in kernel/ptrace.c.

IMHO the former is better.

In all architectures the code uses the following layers:
1-assembly code layer (entry_*.S for x86)
2-arch/*/kernel/ptrace.c
3-kernel/utrace.c
4-utrace module
or
4-kernel/ptrace.c when backward ptrace compatibility is required

Syscall skipping is a useful feature that many utrace modules may require.
Thus my proposal is to use a return value through all the interfaces
to skip the call.
More precisely:
- interface 1-2 is already in place for x86_32: when do_syscall_trace
returns nonzero, the syscall gets skipped. Similar handling should be
coded for the other architectures. I have already written the
fix for ppc, ppc64 and (untested) x86_64 (I needed this for my
PTRACE_SYSVM patch).
- interface 2-3: tracehook_report_syscall_entry should return an integer;
the call gets skipped when it is non-zero.
- interface 3-4: I propose to add an action flag to skip the call.
report_syscall_entry can have one extra ACTION_FLAG, say:
#define UTRACE_SYSCALL_SKIP 0x0100
It is also possible to ask the lower level to abort the syscall; the
arch-dependent part of the kernel decides how to implement it:
#define UTRACE_SYSCALL_ENOSYS 0x0200

My proposal has some pros:
- SYSEMU management becomes architecture-independent.
Statements like these can be eliminated:
    unsigned long *scno = &regs->orig_ax; /* XXX */
    unsigned long *retval = &regs->ax;    /* XXX */
- The boundary between arch-independent and arch-dependent sections of the
kernel is more consistent.
- It can be ported to different architectures. kernel/ptrace.c becomes
independent of strange syscall and return value encodings.

(BTW: I continue to say that my PTRACE_SYSVM is more flexible than PTRACE_SYSEMU
and at least as performant.
With PTRACE_SYSEMU the next system call is always virtualized (skipped);
with PTRACE_SYSVM it is possible to process the system call parameters
and decide on the fly whether the call has to be virtualized or not.
PTRACE_SYSEMU supports only global virtualization (like User-Mode Linux),
while PTRACE_SYSVM supports *also* partial virtualization (like my
umview/kmview).)

renzo




More information about the utrace-devel mailing list