[dm-devel] Deadlock when swapping a table with a dm-era target

Nikos Tsironis ntsironis at arrikto.com
Fri Dec 3 14:42:36 UTC 2021


On 12/2/21 5:41 PM, Zdenek Kabelac wrote:
> On 01. 12. 21 at 18:07, Nikos Tsironis wrote:
>> Hello,
>>
>> Under certain conditions, swapping a table that includes a dm-era
>> target with a new table causes a deadlock.
>>
>> This happens when a status (STATUSTYPE_INFO) or message IOCTL is blocked
>> in the suspended dm-era target.
>>
>> dm-era executes all metadata operations in a worker thread, which stops
>> processing requests when the target is suspended, and resumes again when
>> the target is resumed.
>>
>> So, running 'dmsetup status' or 'dmsetup message' for a suspended dm-era
>> device blocks, until the device is resumed.
>>
>> This seems to be a problem on its own.
>>
>> If we then load a new table to the device, while the aforementioned
>> dmsetup command is blocked in dm-era, and resume the device, we
>> deadlock.
>>
>> The problem is that the 'dmsetup status' and 'dmsetup message' commands
>> hold a reference to the live table, i.e., they hold an SRCU read lock on
>> md->io_barrier, while they are blocked.
>>
>> When the device is resumed, the old table is replaced with the new one
>> by dm_swap_table(), which ends up calling synchronize_srcu() on
>> md->io_barrier.
>>
>> Since the blocked dmsetup command is holding the SRCU read lock, and the
>> old table is never resumed, 'dmsetup resume' blocks too, and we have a
>> deadlock.
>>
>> Steps to reproduce:
>>
>> 1. Create device with dm-era target
>>
>>    # dmsetup create eradev --table "0 1048576 era /dev/datavg/erameta /dev/datavg/eradata 8192"
>>
>> 2. Suspend the device
>>
>>    # dmsetup suspend eradev
>>
>> 3. Load new table to device, e.g., to resize the device
>>
>>    # dmsetup load eradev --table "0 2097152 era /dev/datavg/erameta /dev/datavg/eradata 8192"
>>
> 
> Your sequence is faulty - you must always preload the new table before suspending.
> 
> Suspend & resume should be absolutely minimal in its timing.
> 
> Also, nothing should allocate memory while the device is suspended, which is why suspend has to be issued only after the new table is fully loaded.
> 

Hi Zdenek,

Thanks for the feedback. There doesn't seem to be any documentation
mentioning that loading the new table should happen before suspend, so
thanks a lot for explaining it.

Unfortunately, this isn't what causes the deadlock. The following
sequence, which loads the table before suspend, also results in a
deadlock:

1. Create device with dm-era target

    # dmsetup create eradev --table "0 1048576 era /dev/datavg/erameta /dev/datavg/eradata 8192"

2. Load new table to device, e.g., to resize the device

    # dmsetup load eradev --table "0 2097152 era /dev/datavg/erameta /dev/datavg/eradata 8192"

3. Suspend the device

    # dmsetup suspend eradev

4. Retrieve the status of the device. As explained in my previous
    email, this blocks because dm-era's worker thread stops processing
    requests while the target is suspended.

    # dmsetup status eradev

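5. Resume the device. As explained in my previous email, this
    deadlocks: dm_swap_table() calls synchronize_srcu() on
    md->io_barrier, while the blocked 'dmsetup status' still holds the
    SRCU read lock.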

    # dmsetup resume eradev

6. The dmesg logs are the same as the ones I included in my previous
    email.
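
For convenience, the steps above could be collected into a small
script. This is only a sketch: the function names (era_table,
repro_cmds) are made up for illustration, and the backgrounding of
'dmsetup status' together with the 1-second sleep is my assumption
about how to get step 4 blocked before step 5 runs from a single
shell. The script only prints the commands; it does not execute them.

```shell
#!/bin/sh
# Prints the reproduction sequence from the steps above, one command per line.
# Nothing is executed here. To actually run it, pipe the output to "sh" as
# root on a disposable test machine (the final resume is expected to hang).

DEV=eradev                      # device name from the steps above
META=/dev/datavg/erameta        # metadata device from the steps above
DATA=/dev/datavg/eradata        # data device from the steps above

# era_table SECTORS: build a dm-era table line of the given length,
# with the same trailing 8192 argument as in the steps above.
era_table() {
    printf '0 %s era %s %s 8192' "$1" "$META" "$DATA"
}

# repro_cmds: emit the commands in the order of steps 1-5.
repro_cmds() {
    printf 'dmsetup create %s --table "%s"\n' "$DEV" "$(era_table 1048576)"
    printf 'dmsetup load %s --table "%s"\n' "$DEV" "$(era_table 2097152)"
    printf 'dmsetup suspend %s\n' "$DEV"
    # Step 4 blocks, so it is backgrounded. The sleep is an assumption:
    # it gives the status call time to enter the target before the resume.
    printf 'dmsetup status %s &\n' "$DEV"
    printf 'sleep 1\n'
    printf 'dmsetup resume %s\n' "$DEV"
}

repro_cmds
```

Printing instead of executing keeps the sketch safe to try anywhere;
the emitted lines are the same commands as in steps 1-5, in the same
order.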

I have explained the reasons for the deadlock in my previous email, but
I would be more than happy to discuss them further.

I would also like your feedback on the solutions I proposed there, so I
can work on a fix.

Thanks,
Nikos.



