[vfio-users] Soft lockup on archlinux 4.1.10-1-vfio-lts kernel

Dan Ziemba zman0900 at gmail.com
Fri Oct 23 17:54:18 UTC 2015


Well, the older systemd and dbus didn't help.  The system was locked up
again this morning.  I left the screen on tailing dmesg, but nothing
interesting was printed.  I've got a PKGBUILD for 4.1.11 coming later
today, so maybe that will help.
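
For anyone sifting through saved kernel logs for these reports, here is a
minimal Python sketch (not something from my setup, just an illustration)
that pulls the "soft lockup" lines out of log text and counts them per CPU.
It assumes the message format shown in Lucas's trace below; the function
names are made up for this example.

```python
import re
from collections import Counter

# Matches watchdog lines such as:
#   "NMI watchdog: BUG: soft lockup - CPU#4 stuck for 22s! [swapper/4:0]"
# The exact wording is assumed from the trace quoted in this thread.
SOFT_LOCKUP_RE = re.compile(
    r"BUG: soft lockup - CPU#(?P<cpu>\d+)\s+"
    r"stuck for (?P<secs>\d+)s!\s*\[(?P<task>[^\]]+)\]"
)


def find_soft_lockups(log_text):
    """Return a list of (cpu, seconds_stuck, task) tuples found in log_text."""
    # Terminals and mail clients may wrap long lines, so collapse all
    # whitespace (including newlines) before matching.
    joined = " ".join(log_text.split())
    return [
        (int(m.group("cpu")), int(m.group("secs")), m.group("task"))
        for m in SOFT_LOCKUP_RE.finditer(joined)
    ]


def lockups_per_cpu(log_text):
    """Count soft lockup reports per CPU number."""
    return Counter(cpu for cpu, _, _ in find_soft_lockups(log_text))


def format_report(log_text):
    """Build a short human-readable summary, one line per affected CPU."""
    counts = lockups_per_cpu(log_text)
    return "\n".join(
        f"CPU#{cpu}: {count} soft lockup report(s)"
        for cpu, count in sorted(counts.items())
    )
```

To try it, save the kernel log to a text file (for example the output of
"journalctl -k", if the journal from the affected boot was kept), read it
in, and print format_report() on the contents.  It will not match lines
that still carry "> " quote markers from a mail reply.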

Dan
On Oct 22, 2015 10:53 PM, "Dan Ziemba" <zman0900 at gmail.com> wrote:

> Hey,
>
> I maintain that PKGBUILD.  I think I've been having the same problem,
> but it also seems to happen if I reinstall the older linux-vfio 4.1.6.
> Here's the latest stack trace I was able to capture:
> https://i.imgur.com/FZkj4ib.jpg
> I had to disable the screen timeout so it would stay on all night with
> dmesg tailing, and I found it like this in the morning.  Mouse and caps
> lock still worked, but I couldn't actually do anything and the clock
> was frozen.
>
> I was also noticing that booting my system was unreliable.  If I
> rebooted several times in a row, about once every two to three times it
> would hang while starting various services and then never start gdm.
>
> Today I tried downgrading systemd and dbus to just before the change
> that switched to user buses (see
> https://www.archlinux.org/news/d-bus-now-launches-user-buses/).  I
> rebooted a whole bunch of times using 4.1.10 linux-vfio-lts and it
> seems reliable.  I have been using the computer pretty much all day for
> work and it hasn't had any soft lockups yet, but it may be too soon to
> tell.  Most of the time in the past the lockup would happen while idle.
>
> These are the downgrades I made; everything else is up to date as of
> this morning.
>
> [2015-10-22 12:22] [ALPM] transaction started
> [2015-10-22 12:22] [ALPM] downgraded libsystemd (227-1 -> 225-1)
> [2015-10-22 12:22] [ALPM] downgraded libdbus (1.10.0-4 -> 1.10.0-2)
> [2015-10-22 12:22] [ALPM] downgraded dbus (1.10.0-4 -> 1.10.0-2)
> [2015-10-22 12:22] [ALPM] downgraded systemd (227-1 -> 225-1)
> [2015-10-22 12:22] [ALPM] downgraded lib32-systemd (227-1 -> 225-1)
> [2015-10-22 12:22] [ALPM] downgraded systemd-sysvcompat (227-1 -> 225-1)
> [2015-10-22 12:22] [ALPM] transaction completed
>
> I will follow up tomorrow with whether or not it locks up tonight.  If
> we can isolate the problem to systemd or dbus, maybe that's at least
> good enough for a bug report.
>
> Dan
>
> -----Original Message-----
> From: Lucas Kückelhaus <lucas at kuckelhaus.com>
> To: vfio-users at redhat.com
> Subject: [vfio-users] Soft lockup on archlinux 4.1.10-1-vfio-lts kernel
> Date: Thu, 22 Oct 2015 23:00:37 -0200
> Mailer: Roundcube Webmail/1.0.2
>
> Hi,
>
> I'm trying to run an Archlinux host on kernel 4.1.10-1-vfio-lts (Mark
> Weiman's custom repo) because I'm unable to boot a GPU-assigned VM on
> 4.2.3-1-vfio.
>
> The VM boots fine and works for a while, but the computer sporadically
> crashes with the following:
>
>
> Oct 22 21:43:37 kvmhost kernel: NMI watchdog: BUG: soft lockup - CPU#4
> stuck for 22s! [swapper/4:0]
> Oct 22 21:43:39 kvmhost kernel: Modules linked in: veth vhost_net vhost
> macvtap macvlan tun bridge stp llc nls_iso8859_1 nls_cp437 vfat fat
> iTCO_wdt iTCO_vendor_support nouveau snd_hda_codec_hdmi intel_rapl
> iosf_mbi x86_pkg_temp_thermal intel_powerclamp coretemp mxm_wmi snd_hda_
> Oct 22 21:43:39 kvmhost kernel:  sch_fq_codel fuse nfsd nfs auth_rpcgss
> oid_registry nfs_acl lockd grace sunrpc fscache ip_tables x_tables ext4
> crc16 mbcache jbd2 dm_mod hid_logitech_hidpp hid_logitech_dj hid_generic
> usbhid hid sd_mod uas usb_storage atkbd libps2 crc32c_intel ah
> Oct 22 21:43:39 kvmhost kernel: CPU: 4 PID: 0 Comm: swapper/4 Tainted: G
>              L  4.1.10-1-vfio-lts #1
> Oct 22 21:43:39 kvmhost kernel: Hardware name: To Be Filled By O.E.M. To
> Be Filled By O.E.M./Z77 Extreme4, BIOS P2.30 09/21/2012
> Oct 22 21:43:39 kvmhost kernel: task: ffff88080b119460 ti:
> ffff88080b124000 task.ti: ffff88080b124000
> Oct 22 21:43:39 kvmhost kernel: RIP: 0010:[<ffffffff810f6770>]
> [<ffffffff810f6770>] try_to_del_timer_sync+0x0/0xa0
> Oct 22 21:43:39 kvmhost kernel: RSP: 0018:ffff88082f303db0  EFLAGS:
> 00000286
> Oct 22 21:43:39 kvmhost kernel: RAX: 00000000ffffffff RBX:
> 0000000000000286 RCX: 0000000000000000
> Oct 22 21:43:39 kvmhost kernel: RDX: 00000000000000bf RSI:
> 0000000000000286 RDI: ffff880270fa8428
> Oct 22 21:43:39 kvmhost kernel: RBP: ffff88082f303dc8 R08:
> 0000000000002710 R09: ffff88082f30e780
> Oct 22 21:43:39 kvmhost kernel: R10: 0000000000000000 R11:
> 0000000000000004 R12: ffff88082f303d28
> Oct 22 21:43:39 kvmhost kernel: R13: ffffffff815f13de R14:
> ffff88082f303dc8 R15: ffff880270fa8428
> Oct 22 21:43:39 kvmhost kernel: FS:  0000000000000000(0000)
> GS:ffff88082f300000(0000) knlGS:0000000000000000
> Oct 22 21:43:39 kvmhost kernel: CS:  0010 DS: 0000 ES: 0000 CR0:
> 0000000080050033
> Oct 22 21:43:39 kvmhost kernel: CR2: 00007fc2d6f6da28 CR3:
> 000000029c65c000 CR4: 00000000001426e0
> Oct 22 21:43:39 kvmhost kernel: Stack:
> Oct 22 21:43:39 kvmhost kernel:  ffffffff810f6872 ffff88082f303e38
> ffff880270fa8390 ffff88082f303df8
> Oct 22 21:43:39 kvmhost kernel:  ffffffff8152a16f ffff880270fa8390
> ffff8805b3bab800 ffff880270d20000
> Oct 22 21:43:39 kvmhost kernel:  0000000000000001 ffff88082f303e38
> ffffffff8152a3e7 ffff88082f3107e0
> Oct 22 21:43:39 kvmhost kernel: Call Trace:
> Oct 22 21:43:39 kvmhost kernel:  <IRQ>
> Oct 22 21:43:39 kvmhost kernel:  [<ffffffff810f6872>] ?
> del_timer_sync+0x62/0x70
> Oct 22 21:43:39 kvmhost kernel:  [<ffffffff8152a16f>]
> inet_csk_reqsk_queue_drop+0xbf/0x240
> Oct 22 21:43:39 kvmhost kernel:  [<ffffffff8152a3e7>]
> reqsk_timer_handler+0xf7/0x2e0
> Oct 22 21:43:39 kvmhost kernel:  [<ffffffff8152a2f0>] ?
> inet_csk_reqsk_queue_drop+0x240/0x240
> Oct 22 21:43:39 kvmhost kernel:  [<ffffffff810f64c8>]
> call_timer_fn+0x48/0x160
> Oct 22 21:43:39 kvmhost kernel:  [<ffffffff8152a2f0>] ?
> inet_csk_reqsk_queue_drop+0x240/0x240
> Oct 22 21:43:39 kvmhost kernel:  [<ffffffff810f6bd4>]
> run_timer_softirq+0x284/0x330
> Oct 22 21:43:39 kvmhost kernel:  [<ffffffff81086711>]
> __do_softirq+0xf1/0x2e0
> Oct 22 21:43:39 kvmhost kernel:  [<ffffffff81086acd>] irq_exit+0xbd/0xc0
> Oct 22 21:43:39 kvmhost kernel:  [<ffffffff815f31d5>]
> smp_apic_timer_interrupt+0x55/0x70
> Oct 22 21:43:39 kvmhost kernel:  [<ffffffff815f13de>]
> apic_timer_interrupt+0x6e/0x80
> Oct 22 21:43:39 kvmhost kernel:  <EOI>
> Oct 22 21:43:39 kvmhost kernel:  [<ffffffff81021c1d>] ?
> native_sched_clock+0x2d/0xa0
> Oct 22 21:43:39 kvmhost kernel:  [<ffffffff81490c81>] ?
> cpuidle_enter_state+0xa1/0x250
> Oct 22 21:43:39 kvmhost kernel:  [<ffffffff81490c53>] ?
> cpuidle_enter_state+0x73/0x250
> Oct 22 21:43:39 kvmhost kernel:  [<ffffffff81490e8a>]
> cpuidle_enter+0x2a/0x30
> Oct 22 21:43:39 kvmhost kernel:  [<ffffffff810cb36c>]
> cpu_startup_entry+0x32c/0x460
> Oct 22 21:43:39 kvmhost kernel:  [<ffffffff81055f7e>]
> start_secondary+0x19e/0x1e0
> Oct 22 21:43:39 kvmhost kernel: Code: 4d d8 65 48 33 0c 25 28 00 00 00
> 44 89 e0 75 0b 48 83 c4 18 5b 41 5c 41 5d 5d c3 e8 1b b8 f8 ff 90 66 2e
> 0f 1f 84 00 00 00 00 00 <0f> 1f 44 00 00 55 48 89 e5 41 54 53 48 81 ec
> 30 10 00 00 48 83
>
>
>
> This happens for all cores and it locks up the entire system. I don't
> know what to do. On 4.2.3-1-vfio I have no hangups and all my non-vfio
> VMs work perfectly fine.
>
> Thank you,
> Lucas Kückelhaus
>
> _______________________________________________
> vfio-users mailing list
> vfio-users at redhat.com
> https://www.redhat.com/mailman/listinfo/vfio-users
>