security - Disable Linux vsyscall vdso vvar
I am implementing a Linux security sandbox for a custom bytecode interpreter through seccomp mode. To minimize the attack surface as much as possible, I want to run it in a completely clean virtual address space. I only need the code and data segments plus the stack to be available; I do not need vsyscall, vdso, or vvar.
Is there any way to disable the allocation of those pages for a given process?
Basically, no: you have to disable vsyscall/vdso globally if you want the mapping to be unavailable. If you only want the program to be unable to make syscalls through vsyscall/vdso, then seccomp should be able to do it. There are some caveats, though:
see https://www.kernel.org/doc/documentation/prctl/seccomp_filter.txt
On x86-64, vsyscall emulation is enabled by default. (vsyscalls are legacy variants on vDSO calls.) Currently, emulated vsyscalls will honor seccomp, with a few oddities:
A return value of SECCOMP_RET_TRAP will set a si_call_addr pointing to the vsyscall entry for the given call and not the address after the 'syscall' instruction. Any code which wants to restart the call should be aware that (a) a ret instruction has been emulated and (b) trying to resume the syscall will again trigger the standard vsyscall emulation security checks, making resuming the syscall mostly pointless.
A return value of SECCOMP_RET_TRACE will signal the tracer as usual, but the syscall may not be changed to another system call using the orig_rax register. It may only be changed to -1 in order to skip the currently emulated call. Any other change MAY terminate the process. The rip value seen by the tracer will be the syscall entry address; this is different from normal behavior. The tracer MUST NOT modify rip or rsp. (Do not rely on other changes terminating the process. They might work. For example, on some kernels, choosing a syscall that only exists in future kernels will be correctly emulated (by returning -ENOSYS).)
To detect this quirky behavior, check for addr & ~0x0C00 == 0xFFFFFFFFFF600000. (For SECCOMP_RET_TRACE, use rip. For SECCOMP_RET_TRAP, use siginfo->si_call_addr.) Do not check any other condition: future kernels may improve vsyscall emulation, and current kernels in vsyscall=native mode will behave differently, but the instructions at 0xF...F600{0,4,8,C}00 will not be system calls in these cases.
Note that modern systems are unlikely to use vsyscalls at all; they are a legacy feature and they are considerably slower than standard syscalls. New code will use the vDSO, and vDSO-issued system calls are indistinguishable from normal system calls.
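The address check quoted above could be written in C roughly as follows. This is only a sketch: the function name is_vsyscall_entry and the macro VSYSCALL_BASE are made up for illustration, and the caller is assumed to pass rip (for SECCOMP_RET_TRACE) or si_call_addr (for SECCOMP_RET_TRAP) cast to a 64-bit integer.

```c
#include <stdbool.h>
#include <stdint.h>

/* Base of the fixed x86-64 vsyscall page, taken from the quoted text. */
#define VSYSCALL_BASE 0xFFFFFFFFFF600000ULL

/*
 * Returns true if addr is one of the vsyscall entry points
 * 0xF...F600{0,4,8,C}00. Clearing bits 10 and 11 (mask 0x0C00) folds
 * all four entries onto the base address.
 * The parentheses matter: in C, == binds tighter than &.
 * (Hypothetical helper name, not a kernel or libc API.)
 */
bool is_vsyscall_entry(uint64_t addr)
{
    return (addr & ~0x0C00ULL) == VSYSCALL_BASE;
}
```

In a SIGSYS handler this would be called as is_vsyscall_entry((uint64_t)(uintptr_t)info->si_call_addr), where info is the siginfo_t pointer passed to the handler.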
So emulated vsyscalls can be confined by seccomp, and vDSOs are likewise confined by seccomp. If you disable gettimeofday(), the confined program will not be able to call that syscall through an emulated vsyscall, a vDSO, or a regular syscall. If you confine them this way with seccomp, you shouldn't have to worry about the attack surface they create.
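As a rough illustration of disabling gettimeofday() with a seccomp filter, something like the following might work on x86-64. It is a minimal sketch, not the setup from the question: the names deny_gettimeofday and install_filter are invented here, returning EPERM is an arbitrary choice, and a real sandbox would more likely use an allow-list of the few syscalls the interpreter needs rather than denying a single one.

```c
#include <errno.h>
#include <stddef.h>
#include <linux/audit.h>
#include <linux/filter.h>
#include <linux/seccomp.h>
#include <sys/prctl.h>
#include <sys/syscall.h>

/*
 * Classic-BPF seccomp program (illustrative name):
 *   - kill the process if the syscall ABI is not x86-64,
 *   - fail gettimeofday with EPERM,
 *   - allow everything else.
 */
struct sock_filter deny_gettimeofday[] = {
    /* [0] load the architecture token */
    BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, arch)),
    /* [1] x86-64? skip the kill : fall through to the kill */
    BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, AUDIT_ARCH_X86_64, 1, 0),
    /* [2] foreign architecture */
    BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_KILL),
    /* [3] load the syscall number */
    BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, nr)),
    /* [4] gettimeofday? fall through to the deny : skip to the allow */
    BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, SYS_gettimeofday, 0, 1),
    /* [5] deny with errno EPERM */
    BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ERRNO | (EPERM & SECCOMP_RET_DATA)),
    /* [6] allow */
    BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW),
};

/* Installs the filter on the calling thread; returns 0 on success, -1 on error. */
int install_filter(void)
{
    struct sock_fprog prog = {
        .len = sizeof(deny_gettimeofday) / sizeof(deny_gettimeofday[0]),
        .filter = deny_gettimeofday,
    };

    /* Needed so that an unprivileged process may install a filter. */
    if (prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0) != 0)
        return -1;
    return prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &prog);
}
```

The filter only sees calls that actually enter the kernel. A vDSO routine that answers from user space without issuing a syscall is not visible to seccomp, which is one reason the mapping itself is a separate concern.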
If you are worried about an attacker exploiting the vDSO mapping itself (which doesn't require calling a syscall), then I don't believe there's a way to disable it reliably on a per-process basis. You can prevent it from being linked in, but it would be hard to prevent a compromised bytecode interpreter from allocating the memory and putting it back. You can, however, boot with the vdso=0 kernel parameter, which will disable it globally, so linking it in would do nothing.
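To see which of these mappings a process actually has (for example, to confirm the effect of vdso=0 after a reboot), one option is to scan /proc/self/maps for the labels the kernel gives them. The sketch below assumes procfs is mounted and is run before any seccomp filter blocks file access; the function names are made up for illustration, and the substring match is deliberately simple.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/*
 * Returns true if one line of /proc/<pid>/maps names a [vdso], [vvar...]
 * or [vsyscall] mapping. Simplification: a file path that happens to
 * contain one of these strings would also match.
 */
bool is_kernel_special_mapping(const char *maps_line)
{
    return strstr(maps_line, "[vdso]") != NULL ||
           strstr(maps_line, "[vvar") != NULL ||
           strstr(maps_line, "[vsyscall]") != NULL;
}

/*
 * Counts such mappings in the calling process.
 * Returns -1 if /proc/self/maps cannot be opened.
 */
int count_kernel_special_mappings(void)
{
    FILE *f = fopen("/proc/self/maps", "r");
    char line[512];
    int count = 0;

    if (f == NULL)
        return -1;
    while (fgets(line, sizeof line, f) != NULL) {
        if (is_kernel_special_mapping(line))
            count++;
    }
    fclose(f);
    return count;
}
```

A result of 0 would suggest that none of the three mappings is present in the process; any positive count means at least one of them is still mapped.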