[dm-devel] A hang bug of dm on s390x
Ming Lei
ming.lei at redhat.com
Thu Feb 16 00:08:35 UTC 2023
On Wed, Feb 15, 2023 at 07:23:40PM +0800, Pingfan Liu wrote:
> Hi guys,
>
> I encountered a hang issue on an s390x system. The tested kernel is
> not preemptible and is booted with "nr_cpus=1".
>
> The test steps:
> umount /home
> lvremove /dev/rhel_s390x-kvm-011/home
> ## uncomment "snapshot_autoextend_threshold = 70" and
> "snapshot_autoextend_percent = 20" in /etc/lvm/lvm.conf
>
> systemctl enable lvm2-monitor.service
> systemctl start lvm2-monitor.service
>
> lvremove -y rhel_s390x-kvm-011/thinp
> lvcreate -L 10M -T rhel_s390x-kvm-011/thinp
> lvcreate -V 400M -T rhel_s390x-kvm-011/thinp -n src
> mkfs.ext4 /dev/rhel_s390x-kvm-011/src
> mount /dev/rhel_s390x-kvm-011/src /mnt
> for((i=0;i<4;i++)); do dd if=/dev/zero of=/mnt/test$i.img bs=100M count=1; done
>
> And the system hangs with the console log [1]
>
> The related kernel config
>
> CONFIG_PREEMPT_NONE_BUILD=y
> CONFIG_PREEMPT_NONE=y
> CONFIG_PREEMPT_COUNT=y
> CONFIG_SCHED_CORE=y
>
> It turns out that when hanging, the kernel is stuck in the dead-loop
> in the function dm_wq_work()
> 	while (!test_bit(DMF_BLOCK_IO_FOR_SUSPEND, &md->flags)) {
> 		spin_lock_irq(&md->deferred_lock);
> 		bio = bio_list_pop(&md->deferred);
> 		spin_unlock_irq(&md->deferred_lock);
>
> 		if (!bio)
> 			break;
> 		thread_cpu = smp_processor_id();
> 		submit_bio_noacct(bio);
> 	}
> where dm_wq_work()->__submit_bio_noacct()->...->dm_handle_requeue()
> keeps generating new bios, so the condition "if (!bio)" is never
> met.
>
>
> After applying the following patch, the issue is gone.
>
> diff --git a/drivers/md/dm.c b/drivers/md/dm.c
> index e1ea3a7bd9d9..95c9cb07a42f 100644
> --- a/drivers/md/dm.c
> +++ b/drivers/md/dm.c
> @@ -2567,6 +2567,7 @@ static void dm_wq_work(struct work_struct *work)
>  			break;
>
>  		submit_bio_noacct(bio);
> +		cond_resched();
>  	}
>  }
>
> But I think it is not a proper solution. Without this patch, if
> nr_cpus=1 is removed (the system has two CPUs), the issue cannot be
> triggered. That suggests that with more than one CPU, the above loop
> can exit through the condition "if (!bio)".
>
> Any ideas?
I think the patch is correct.

For a kernel built without CONFIG_PREEMPT running on a single CPU
core, if the dm target (such as dm-thin) needs another wq or kthread
to handle IO, the dm target side is blocked because dm_wq_work()
holds the only CPU. Sooner or later the dm target may have no
resources left to handle new IO from dm core, and it returns REQUEUE.
dm_wq_work() then becomes a dead loop.
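
To illustrate the reasoning above, here is a minimal user-space sketch.
It is a toy model, not kernel code: struct sim, submit_bio(),
target_worker_run() and run_dm_wq_work() are hypothetical names made up
for this example, and the "yield" is a simplification of what
cond_resched() does (the real call only reschedules when another task
needs the CPU). The model assumes the target has a fixed number of
resource slots that only its helper thread can free.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of the single-CPU case; all names here are hypothetical. */
struct sim {
	int deferred;   /* bios waiting on the deferred list */
	int free_slots; /* target resources available for new IO */
	int in_flight;  /* bios accepted, waiting for the target's helper */
};

/* Models the target's wq/kthread: finishes accepted IO, frees slots.
 * On one non-preemptible CPU it runs only when the loop yields. */
static void target_worker_run(struct sim *s)
{
	s->free_slots += s->in_flight;
	s->in_flight = 0;
}

/* Models a submission: accepted if a slot is free, otherwise the bio
 * is requeued onto the deferred list (the REQUEUE case). */
static void submit_bio(struct sim *s)
{
	if (s->free_slots > 0) {
		s->free_slots--;
		s->in_flight++;
	} else {
		s->deferred++;
	}
}

/* Models the dm_wq_work() loop. Returns the number of iterations
 * needed to drain the list, or -1 if max_iters is reached first. */
static long run_dm_wq_work(int bios, int slots, bool yield_cpu,
			   long max_iters)
{
	struct sim s = { .deferred = bios, .free_slots = slots,
			 .in_flight = 0 };
	long iters = 0;

	assert(bios >= 0 && slots >= 0);

	while (s.deferred > 0) {
		if (iters == max_iters)
			return -1;      /* still spinning: dead loop */
		s.deferred--;           /* stands in for bio_list_pop() */
		submit_bio(&s);
		if (yield_cpu)
			target_worker_run(&s); /* cond_resched() analogue */
		iters++;
	}
	return iters;
}
```

In this model, run_dm_wq_work(8, 2, false, 1000000) hits the iteration
cap and returns -1, because every bio after the first two is requeued
and nothing ever frees a slot. The same call with yield_cpu set to true
returns 8, since the helper gets to run after each submission.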
Thanks,
Ming