[vfio-users] Soft lockup on archlinux 4.1.10-1-vfio-lts kernel

Jon Panozzo jonp at lime-technology.com
Fri Oct 23 17:57:33 UTC 2015


Gents,

I think I may have been able to recreate this issue on one of my systems that is also running the 4.1.10 kernel (not Arch, though).  4.1.11 was released today, which apparently has a patch (or maybe a few?) for some CPU crashes.  I’m going to try that on my system soon to see if it fixes the problem.  I don’t think this is a systemd or dbus issue.

- Jon

> On Oct 23, 2015, at 12:54 PM, Dan Ziemba <zman0900 at gmail.com> wrote:
> 
> Well, old systemd and dbus didn't help. System was locked up again this morning.  Left the screen on tailing dmesg, but there was no interesting output.  I've got a PKGBUILD for 4.1.11 coming later today, so maybe that will help.
> 
> Dan
> 
> On Oct 22, 2015 10:53 PM, "Dan Ziemba" <zman0900 at gmail.com> wrote:
> Hey,
> 
> I maintain that PKGBUILD.  I think I've been having the same problem,
> but it seems to also happen if I reinstall the older linux-vfio 4.1.6.
> Here's the latest stack trace I was able to capture:
> https://i.imgur.com/FZkj4ib.jpg  I had to disable the screen timeout so
> it would stay on all night with dmesg tailing, and I found it like this
> in the morning.  Mouse and caps lock still worked, but I couldn't
> actually do anything and the clock was frozen.
> 
> I was also noticing that booting my system was unreliable.  If I
> rebooted several times in a row, about once every two or three times it
> would hang while starting various services and then never start gdm.
> 
> Today I tried downgrading systemd and dbus to just before the change
> that switched to user buses (see here:
> https://www.archlinux.org/news/d-bus-now-launches-user-buses/).  I
> rebooted a whole bunch of times using 4.1.10 linux-vfio-lts and it seems
> reliable.  I have been using the computer pretty much all day for work
> and it hasn't had any soft lockups yet, but it may be too soon to tell.
> Most of the time in the past the lockup would happen while idle.
> 
> These are the downgrades I made; everything else is up to date as of
> this morning.
> 
> [2015-10-22 12:22] [ALPM] transaction started
> [2015-10-22 12:22] [ALPM] downgraded libsystemd (227-1 -> 225-1)
> [2015-10-22 12:22] [ALPM] downgraded libdbus (1.10.0-4 -> 1.10.0-2)
> [2015-10-22 12:22] [ALPM] downgraded dbus (1.10.0-4 -> 1.10.0-2)
> [2015-10-22 12:22] [ALPM] downgraded systemd (227-1 -> 225-1)
> [2015-10-22 12:22] [ALPM] downgraded lib32-systemd (227-1 -> 225-1)
> [2015-10-22 12:22] [ALPM] downgraded systemd-sysvcompat (227-1 -> 225-1)
> [2015-10-22 12:22] [ALPM] transaction completed
> 
> I will follow up tomorrow with whether or not it locks up tonight.  If
> we can isolate the problem to systemd or dbus, maybe that's at least
> good enough for a bug report.
> 
> Dan
> 
> -----Original Message-----
> From: Lucas Kückelhaus <lucas at kuckelhaus.com>
> To: vfio-users at redhat.com
> Subject: [vfio-users] Soft lockup on archlinux 4.1.10-1-vfio-lts kernel
> Date: Thu, 22 Oct 2015 23:00:37 -0200
> Mailer: Roundcube Webmail/1.0.2
> 
> Hi,
> 
> I'm trying to run an Archlinux host on kernel 4.1.10-1-vfio-lts (Mark
> Weiman's custom repo) because I'm unable to boot a GPU-assigned VM on
> 4.2.3-1-vfio.
> 
> The VM boots fine and works for a while, but the computer sporadically
> crashes with the following:
> 
> 
> Oct 22 21:43:37 kvmhost kernel: NMI watchdog: BUG: soft lockup - CPU#4
> stuck for 22s! [swapper/4:0]
> Oct 22 21:43:39 kvmhost kernel: Modules linked in: veth vhost_net vhost
> macvtap macvlan tun bridge stp llc nls_iso8859_1 nls_cp437 vfat fat
> iTCO_wdt iTCO_vendor_support nouveau snd_hda_codec_hdmi intel_rapl
> iosf_mbi x86_pkg_temp_thermal intel_powerclamp coretemp mxm_wmi snd_hda_
> Oct 22 21:43:39 kvmhost kernel:  sch_fq_codel fuse nfsd nfs auth_rpcgss
> oid_registry nfs_acl lockd grace sunrpc fscache ip_tables x_tables ext4
> crc16 mbcache jbd2 dm_mod hid_logitech_hidpp hid_logitech_dj hid_generic
> usbhid hid sd_mod uas usb_storage atkbd libps2 crc32c_intel ah
> Oct 22 21:43:39 kvmhost kernel: CPU: 4 PID: 0 Comm: swapper/4 Tainted: G
>              L  4.1.10-1-vfio-lts #1
> Oct 22 21:43:39 kvmhost kernel: Hardware name: To Be Filled By O.E.M. To
> Be Filled By O.E.M./Z77 Extreme4, BIOS P2.30 09/21/2012
> Oct 22 21:43:39 kvmhost kernel: task: ffff88080b119460 ti:
> ffff88080b124000 task.ti: ffff88080b124000
> Oct 22 21:43:39 kvmhost kernel: RIP: 0010:[<ffffffff810f6770>]  
> [<ffffffff810f6770>] try_to_del_timer_sync+0x0/0xa0
> Oct 22 21:43:39 kvmhost kernel: RSP: 0018:ffff88082f303db0  EFLAGS:
> 00000286
> Oct 22 21:43:39 kvmhost kernel: RAX: 00000000ffffffff RBX:
> 0000000000000286 RCX: 0000000000000000
> Oct 22 21:43:39 kvmhost kernel: RDX: 00000000000000bf RSI:
> 0000000000000286 RDI: ffff880270fa8428
> Oct 22 21:43:39 kvmhost kernel: RBP: ffff88082f303dc8 R08:
> 0000000000002710 R09: ffff88082f30e780
> Oct 22 21:43:39 kvmhost kernel: R10: 0000000000000000 R11:
> 0000000000000004 R12: ffff88082f303d28
> Oct 22 21:43:39 kvmhost kernel: R13: ffffffff815f13de R14:
> ffff88082f303dc8 R15: ffff880270fa8428
> Oct 22 21:43:39 kvmhost kernel: FS:  0000000000000000(0000)
> GS:ffff88082f300000(0000) knlGS:0000000000000000
> Oct 22 21:43:39 kvmhost kernel: CS:  0010 DS: 0000 ES: 0000 CR0:
> 0000000080050033
> Oct 22 21:43:39 kvmhost kernel: CR2: 00007fc2d6f6da28 CR3:
> 000000029c65c000 CR4: 00000000001426e0
> Oct 22 21:43:39 kvmhost kernel: Stack:
> Oct 22 21:43:39 kvmhost kernel:  ffffffff810f6872 ffff88082f303e38
> ffff880270fa8390 ffff88082f303df8
> Oct 22 21:43:39 kvmhost kernel:  ffffffff8152a16f ffff880270fa8390
> ffff8805b3bab800 ffff880270d20000
> Oct 22 21:43:39 kvmhost kernel:  0000000000000001 ffff88082f303e38
> ffffffff8152a3e7 ffff88082f3107e0
> Oct 22 21:43:39 kvmhost kernel: Call Trace:
> Oct 22 21:43:39 kvmhost kernel:  <IRQ>
> Oct 22 21:43:39 kvmhost kernel:  [<ffffffff810f6872>] ?
> del_timer_sync+0x62/0x70
> Oct 22 21:43:39 kvmhost kernel:  [<ffffffff8152a16f>]
> inet_csk_reqsk_queue_drop+0xbf/0x240
> Oct 22 21:43:39 kvmhost kernel:  [<ffffffff8152a3e7>]
> reqsk_timer_handler+0xf7/0x2e0
> Oct 22 21:43:39 kvmhost kernel:  [<ffffffff8152a2f0>] ?
> inet_csk_reqsk_queue_drop+0x240/0x240
> Oct 22 21:43:39 kvmhost kernel:  [<ffffffff810f64c8>]
> call_timer_fn+0x48/0x160
> Oct 22 21:43:39 kvmhost kernel:  [<ffffffff8152a2f0>] ?
> inet_csk_reqsk_queue_drop+0x240/0x240
> Oct 22 21:43:39 kvmhost kernel:  [<ffffffff810f6bd4>]
> run_timer_softirq+0x284/0x330
> Oct 22 21:43:39 kvmhost kernel:  [<ffffffff81086711>]
> __do_softirq+0xf1/0x2e0
> Oct 22 21:43:39 kvmhost kernel:  [<ffffffff81086acd>] irq_exit+0xbd/0xc0
> Oct 22 21:43:39 kvmhost kernel:  [<ffffffff815f31d5>]
> smp_apic_timer_interrupt+0x55/0x70
> Oct 22 21:43:39 kvmhost kernel:  [<ffffffff815f13de>]
> apic_timer_interrupt+0x6e/0x80
> Oct 22 21:43:39 kvmhost kernel:  <EOI>
> Oct 22 21:43:39 kvmhost kernel:  [<ffffffff81021c1d>] ?
> native_sched_clock+0x2d/0xa0
> Oct 22 21:43:39 kvmhost kernel:  [<ffffffff81490c81>] ?
> cpuidle_enter_state+0xa1/0x250
> Oct 22 21:43:39 kvmhost kernel:  [<ffffffff81490c53>] ?
> cpuidle_enter_state+0x73/0x250
> Oct 22 21:43:39 kvmhost kernel:  [<ffffffff81490e8a>]
> cpuidle_enter+0x2a/0x30
> Oct 22 21:43:39 kvmhost kernel:  [<ffffffff810cb36c>]
> cpu_startup_entry+0x32c/0x460
> Oct 22 21:43:39 kvmhost kernel:  [<ffffffff81055f7e>]
> start_secondary+0x19e/0x1e0
> Oct 22 21:43:39 kvmhost kernel: Code: 4d d8 65 48 33 0c 25 28 00 00 00
> 44 89 e0 75 0b 48 83 c4 18 5b 41 5c 41 5d 5d c3 e8 1b b8 f8 ff 90 66 2e
> 0f 1f 84 00 00 00 00 00 <0f> 1f 44 00 00 55 48 89 e5 41 54 53 48 81 ec
> 30 10 00 00 48 83
> 
> 
> 
> This happens for all cores and it locks up the entire system. I don't
> know what to do. On 4.2.3-1-vfio I have no hangups and all my non-vfio
> VMs work perfectly fine.
> 
> Thank you,
> Lucas Kückelhaus
> 
> _______________________________________________
> vfio-users mailing list
> vfio-users at redhat.com
> https://www.redhat.com/mailman/listinfo/vfio-users
