[libvirt] Found mem leak in libvirtd, need help to debug

Tue Feb 9 15:59:01 UTC 2016

On 09.02.2016 16:34, Piotr Rybicki wrote:
> 
> 
> On 2016-02-09 at 16:12, Michal Privoznik wrote:
>> On 09.02.2016 13:36, Piotr Rybicki wrote:
>>> Hi guys.
>>>
>>> On 2015-11-20 at 11:29, Piotr Rybicki wrote:
>>>>
>>>>> I've seen some of these already. The bug is actually not in libvirt
>>>>> but in gluster's libgfapi library, so any change in libvirt won't help.
>>>>>
>>>>> This was tracked in gluster as:
>>>>>
>>>>> https://bugzilla.redhat.com/show_bug.cgi?id=1093594
>>>>>
>>>>> I suggest you update the gluster library to resolve this issue.
>>>>>
>>>
>>> I've tested this issue further.
>>>
>>> I have to report that the mem leak still exists in the latest versions:
>>> gluster: 3.7.6
>>> libvirt 1.3.1
>>>
>>> The mem leak occurs even when just starting a domain (virsh start DOMAIN)
>>> which accesses its drive via libgfapi (although the leak is much smaller
>>> than with gluster 3.5.X).
>>>
>>> When the drive is accessed as a file (gluster fuse mount), there is no mem
>>> leak when starting the domain.
>>>
>>> my drive definition (libgfapi):
>>>
>>>      <disk type='network' device='disk'>
>>>        <driver name='qemu' type='raw' cache='writethrough' iothread='1'/>
>>>        <source protocol='gluster' name='pool/disk-sys.img'>
>>>          <host name='X.X.X.X' transport='rdma'/>
>>>        </source>
>>>        <blockio logical_block_size='512' physical_block_size='32768'/>
>>>        <target dev='vda' bus='virtio'/>
>>>        <address type='pci' domain='0x0000' bus='0x00' slot='0x04' function='0x0'/>
>>>      </disk>
>>>
>>>
>>> valgrind details (libgfapi):
>>>
>>> # valgrind --leak-check=full --show-reachable=yes --child-silent-after-fork=yes libvirtd --listen 2> libvirt-gfapi.log
>>>
>>> ==6532== Memcheck, a memory error detector
>>> ==6532== Copyright (C) 2002-2015, and GNU GPL'd, by Julian Seward et al.
>>> ==6532== Using Valgrind-3.11.0 and LibVEX; rerun with -h for copyright info
>>> ==6532== Command: libvirtd --listen
>>> ==6532==
>>> ==6532== Warning: noted but unhandled ioctl 0x89a2 with no size/direction hints.
>>> ==6532==    This could cause spurious value errors to appear.
>>> ==6532==    See README_MISSING_SYSCALL_OR_IOCTL for guidance on writing a proper wrapper.
>>> 2016-02-09 12:20:26.732+0000: 6535: info : libvirt version: 1.3.1
>>> 2016-02-09 12:20:26.732+0000: 6535: info : hostname: adm-office
>>> 2016-02-09 12:20:26.732+0000: 6535: warning : qemuDomainObjTaint:2223 : Domain id=1 name='gentoo-intel' uuid=f9fd934b-cbda-af4e-cc98-0dd2c8dd6c2c is tainted: host-cpu
>>> 2016-02-09 12:21:29.924+0000: 6532: error : qemuMonitorIO:689 : internal error: End of file from monitor
>>> ==6532==
>>> ==6532== HEAP SUMMARY:
>>> ==6532==     in use at exit: 3,726,573 bytes in 15,324 blocks
>>> ==6532==   total heap usage: 238,573 allocs, 223,249 frees, 1,020,776,752 bytes allocated
>>>
>>> (...)
>>>
>>> ==6532== LEAK SUMMARY:
>>> ==6532==    definitely lost: 19,760 bytes in 97 blocks
>>> ==6532==    indirectly lost: 21,098 bytes in 122 blocks
>>> ==6532==      possibly lost: 2,698,764 bytes in 67 blocks
>>> ==6532==    still reachable: 986,951 bytes in 15,038 blocks
>>> ==6532==         suppressed: 0 bytes in 0 blocks
>>> ==6532==
>>> ==6532== For counts of detected and suppressed errors, rerun with: -v
>>> ==6532== ERROR SUMMARY: 96 errors from 96 contexts (suppressed: 0 from 0)
>>>
>>> full log:
>>> http://195.191.233.1/libvirt-gfapi.log
>>> http://195.191.233.1/libvirt-gfapi.log.bz2
>>>
>>
>> I still think these are libgfapi leaks; all of the definitely lost bytes
>> come from that library.
>>
>> ==6532== 3,064 (96 direct, 2,968 indirect) bytes in 1 blocks are definitely lost in loss record 1,106 of 1,142
>> ==6532==    at 0x4C2C0D0: calloc (vg_replace_malloc.c:711)
>> ==6532==    by 0x10701279: __gf_calloc (mem-pool.c:117)
>> ==6532==    by 0x106CC541: xlator_dynload (xlator.c:259)
>> ==6532==    by 0xFC4E947: create_master (glfs.c:202)
>> ==6532==    by 0xFC4E947: glfs_init_common (glfs.c:863)
>> ==6532==    by 0xFC4EB50: glfs_init@@GFAPI_3.4.0 (glfs.c:916)
>> ==6532==    by 0xF7E4A33: virStorageFileBackendGlusterInit (storage_backend_gluster.c:625)
>> ==6532==    by 0xF7D56DE: virStorageFileInitAs (storage_driver.c:2788)
>> ==6532==    by 0xF7D5E39: virStorageFileGetMetadataRecurse (storage_driver.c:3048)
>> ==6532==    by 0xF7D6295: virStorageFileGetMetadata (storage_driver.c:3171)
>> ==6532==    by 0x1126A2B0: qemuDomainDetermineDiskChain (qemu_domain.c:3179)
>> ==6532==    by 0x11269AE6: qemuDomainCheckDiskPresence (qemu_domain.c:2998)
>> ==6532==    by 0x11292055: qemuProcessLaunch (qemu_process.c:4708)
>>
>> Care to report it to them?
> 
> Of course, I will.
> 
> But are you sure there is no need to call glfs_fini() after the qemu
> process is launched? Are all of those resources still needed in libvirt?
> 
> I understand that libvirt needs to check the presence (and other
> properties) of the storage, but does it still need them after qemu is
> launched?

We call glfs_fini(). And that's the problem. It does not free everything
that glfs_init() allocated. Hence the leaks. Actually every time we call
glfs_init() we print a debug message from
virStorageFileBackendGlusterInit() which wraps it. And then another
debug message from virStorageFileBackendGlusterDeinit() when we call
glfs_fini(). So if you set up debug logs, you can check whether our init
and finish calls match.
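
A rough sketch of how that check could be scripted, in case it helps. It
assumes debug logging is already being written to a file, that debug lines use
the same "level : function:line : message" layout as the warning and error
lines quoted above, and that each wrapper logs exactly one debug line per call.
The helper name count_gluster_calls and the script itself are made up for
illustration; they are not part of libvirt.

```python
import re
import sys
from collections import Counter

INIT = "virStorageFileBackendGlusterInit"
DEINIT = "virStorageFileBackendGlusterDeinit"

# Assumption: debug lines look like
#   <timestamp>: <pid>: debug : <function>:<line> : <message>
# which mirrors the warning/error lines quoted earlier in this thread.
PATTERN = re.compile(r"debug : (%s|%s):\d+ :" % (INIT, DEINIT))


def count_gluster_calls(lines):
    """Return (init_count, deinit_count) for the given log lines."""
    counts = Counter()
    for line in lines:
        match = PATTERN.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts[INIT], counts[DEINIT]


if __name__ == "__main__":
    # Pass the path to your libvirtd debug log as the first argument.
    if len(sys.argv) > 1:
        with open(sys.argv[1], errors="replace") as log:
            inits, deinits = count_gluster_calls(log)
        print("init calls:   %d" % inits)
        print("deinit calls: %d" % deinits)
        print("balanced" if inits == deinits else "MISMATCH")
```

If the two counts are equal, libvirt is pairing every glfs_init() with a
glfs_fini(), and whatever valgrind still reports as lost is memory that
glfs_fini() did not release.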

Michal