[Libguestfs] Issue with downloading files whose path contains multi-byte utf-8 characters

Yonatan Shtarkman shtarkman at google.com
Sun Feb 12 13:31:08 UTC 2023


Hey,
When downloading a file whose path contains multi-byte utf-8, libguestfs
sometimes crashes.
This reproduces when using python, and not when using guestfish.

Code to reproduce:
for i in range(2000):
    g.download ('/xxxó', '/tmp/1')

#0  raise (sig=<optimized out>) at ../sysdeps/unix/sysv/linux/raise.c:50
#1  0x00007ffff7fac140 in <signal handler called> () at
/lib/x86_64-linux-gnu/libpthread.so.0
#2  0x00007ffff6f77701 in _Py_INCREF (op=<optimized out>) at
/usr/include/python3.9/object.h:408
#3  guestfs_int_py_event_callback_wrapper
    (g=<optimized out>, flags=<optimized out>, array_len=0, array=0x0,
buf_len=47, buf=0x113b8a0 "gs=0x0\r\ncommandrvf: udevadm --debug settle -E
\303by", event_handle=0, event=16, callback=0x7ffff2516600) at handle.c:137
#4  guestfs_int_py_event_callback_wrapper
    (g=<optimized out>, callback=0x7ffff2516600, event=16, event_handle=0,
flags=<optimized out>, buf=0x113b8a0 "gs=0x0\r\ncommandrvf: udevadm --debug
settle -E \303by", buf_len=47, array=0x0, array_len=0) at handle.c:104
#5  0x00007ffff6e0076a in guestfs_int_call_callbacks_message (g=0xf31290,
event=16, buf=0x113b8a0 "gs=0x0\r\ncommandrvf: udevadm --debug settle -E
\303by", buf_len=47)
    at events.c:117
#6  0x00007ffff6e1702e in guestfs_int_log_message_callback
    (g=g at entry=0xf31290, buf=0x113b8a0 "gs=0x0\r\ncommandrvf: udevadm
--debug settle -E \303by", len=len at entry=47) at proto.c:145
#7  0x00007ffff6dfb759 in handle_log_message (g=g at entry=0xf31290,
conn=conn at entry=0x110e280) at conn-socket.c:395
#8  0x00007ffff6dfbd63 in read_data (len=4, bufv=<optimized out>,
connv=<optimized out>, g=<optimized out>) at conn-socket.c:179
#9  read_data (g=0xf31290, connv=0x110e280, bufv=<optimized out>, len=4) at
conn-socket.c:142
#10 0x00007ffff6e1764a in recv_from_daemon (buf_rtn=0x7fffffffd858,
size_rtn=0x7fffffffd854, g=0xf31290) at proto.c:545
#11 guestfs_int_recv_from_daemon (g=g at entry=0xf31290,
size_rtn=size_rtn at entry=0x7fffffffd854, buf_rtn=buf_rtn at entry=0x7fffffffd858)
at proto.c:623
#12 0x00007ffff6e17a5a in guestfs_int_recv
    (g=g at entry=0xf31290, fn=fn at entry=0x7ffff6e3b3e8 "download",
hdr=hdr at entry=0x7fffffffd920, err=err at entry=0x7fffffffd8f0,
xdrp=xdrp at entry=0x0,
ret=ret at entry=0x0)
    at proto.c:668

I debugged this issue and noticed that the appliance logs
from commandrvf are truncated, leading to parse failure (missing utf-8
additional bytes):
https://github.com/libguestfs/libguestfs/blob/master/python/handle.c#L134
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x84 in position 0:
invalid start byte
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://listman.redhat.com/archives/libguestfs/attachments/20230212/973ad8ab/attachment.htm>


More information about the Libguestfs mailing list