[dm-devel] Deadlock when swapping a table with a dm-era target
Nikos Tsironis
ntsironis at arrikto.com
Fri Dec 3 14:42:36 UTC 2021
On 12/2/21 5:41 PM, Zdenek Kabelac wrote:
> On 01. 12. 21 at 18:07, Nikos Tsironis wrote:
>> Hello,
>>
>> Under certain conditions, swapping a table that includes a dm-era
>> target with a new table causes a deadlock.
>>
>> This happens when a status (STATUSTYPE_INFO) or message IOCTL is blocked
>> in the suspended dm-era target.
>>
>> dm-era executes all metadata operations in a worker thread, which stops
>> processing requests when the target is suspended, and resumes again when
>> the target is resumed.
>>
>> So, running 'dmsetup status' or 'dmsetup message' for a suspended dm-era
>> device blocks until the device is resumed.
>>
>> This seems to be a problem on its own.
>>
>> If we then load a new table to the device, while the aforementioned
>> dmsetup command is blocked in dm-era, and resume the device, we
>> deadlock.
>>
>> The problem is that the 'dmsetup status' and 'dmsetup message' commands
>> hold a reference to the live table, i.e., they hold an SRCU read lock on
>> md->io_barrier, while they are blocked.
>>
>> When the device is resumed, the old table is replaced with the new one
>> by dm_swap_table(), which ends up calling synchronize_srcu() on
>> md->io_barrier.
>>
>> Since the blocked dmsetup command is holding the SRCU read lock, and the
>> old table is never resumed, 'dmsetup resume' blocks too, and we have a
>> deadlock.
>>
>> Steps to reproduce:
>>
>> 1. Create device with dm-era target
>>
>> # dmsetup create eradev --table "0 1048576 era /dev/datavg/erameta /dev/datavg/eradata 8192"
>>
>> 2. Suspend the device
>>
>> # dmsetup suspend eradev
>>
>> 3. Load new table to device, e.g., to resize the device
>>
>> # dmsetup load eradev --table "0 2097152 era /dev/datavg/erameta /dev/datavg/eradata 8192"
>>
>
> Your sequence is faulty - you must always preload new table before suspend.
>
> Suspend&Resume should be absolutely minimal in its timing.
>
> Also nothing should be allocating memory in suspend so that's why suspend has to be used after table line is fully loaded.
>

Hi Zdenek,

Thanks for the feedback. There doesn't seem to be any documentation
mentioning that loading the new table should happen before suspend, so
thanks a lot for explaining it.

Unfortunately, this isn't what causes the deadlock. The following
sequence, which loads the table before suspend, also results in a
deadlock:

1. Create device with dm-era target

# dmsetup create eradev --table "0 1048576 era /dev/datavg/erameta /dev/datavg/eradata 8192"

2. Load new table to device, e.g., to resize the device

# dmsetup load eradev --table "0 2097152 era /dev/datavg/erameta /dev/datavg/eradata 8192"

3. Suspend the device

# dmsetup suspend eradev

4. Retrieve the status of the device. This blocks for the reasons I
   explained in my previous email.

# dmsetup status eradev

5. Resume the device. This deadlocks for the reasons I explained in my
   previous email.

# dmsetup resume eradev

6. The dmesg logs are the same as the ones I included in my previous
   email.
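
For convenience, the steps above could be collected into a small script
along the following lines. This is only a sketch: the DRY_RUN switch and
the run, run_bg and reproduce helpers are illustrative names, not part of
dmsetup, and the two-second pause is an assumption about how long the
status ioctl needs before it blocks. Only the dmsetup invocations
themselves are taken from the steps above.

```shell
#!/bin/sh
# Sketch of the load -> suspend -> status -> resume sequence listed above.
# DRY_RUN, run, run_bg and reproduce are illustrative names for this sketch.

DRY_RUN="${DRY_RUN:-1}"      # 1 (default): only print the commands
DEV=eradev
META=/dev/datavg/erameta
DATA=/dev/datavg/eradata

# Run a command in the foreground, or just print it in dry-run mode.
# Note: the printed form drops the quotes around the table argument.
run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Same, but in the background: 'dmsetup status' is expected to block while
# the dm-era target is suspended, so it must not stall the script itself.
run_bg() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "+ $* &"
    else
        "$@" &
    fi
}

reproduce() {
    run dmsetup create "$DEV" --table "0 1048576 era $META $DATA 8192"
    run dmsetup load "$DEV" --table "0 2097152 era $META $DATA 8192"
    run dmsetup suspend "$DEV"
    run_bg dmsetup status "$DEV"
    # Assumption: a short pause is enough for the status ioctl to block.
    [ "$DRY_RUN" = 1 ] || sleep 2
    # Expected to hang if the deadlock described above is triggered.
    run dmsetup resume "$DEV"
}

reproduce
```

Run as-is, the script only prints the five commands. With DRY_RUN=0 (as
root, on a scratch device) it executes them, and the final resume would be
expected to hang in the same way as step 5.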

I have explained the reasons for the deadlock in my previous email, but
I would be more than happy to discuss them more.

I would also like your feedback on the solutions I proposed there, so I
can work on a fix.

Thanks,
Nikos.