[vfio-users] Soft lockup on archlinux 4.1.10-1-vfio-lts kernel

Dan Ziemba zman0900 at gmail.com
Sat Oct 24 00:50:03 UTC 2015


I just released the 4.1.11 PKGBUILD.  So far so good for me, but it's
only been running for a few hours - not really long enough to tell.  
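
For anyone else watching for a recurrence, here is a rough sketch of scanning saved kernel log text for the watchdog message. The pattern is based only on the log line quoted further down in this thread, so other kernel versions may word it differently, and the helper name find_soft_lockups is made up for illustration.

```python
import re

# Pattern modeled on the watchdog line quoted in this thread, e.g.
# "NMI watchdog: BUG: soft lockup - CPU#4 stuck for 22s! [swapper/4:0]".
# Assumption: other kernel versions may format this message differently.
SOFT_LOCKUP_RE = re.compile(
    r"BUG: soft lockup - CPU#(?P<cpu>\d+) stuck for (?P<secs>\d+)s! "
    r"\[(?P<task>[^\]]+)\]"
)


def find_soft_lockups(log_text):
    """Return (cpu, seconds, task) tuples, one per soft lockup report.

    Hypothetical helper for illustration; not part of any existing tool.
    """
    # Long log lines are often wrapped, so collapse every run of
    # whitespace (including newlines) to one space before matching.
    flat = " ".join(log_text.split())
    return [
        (int(m.group("cpu")), int(m.group("secs")), m.group("task"))
        for m in SOFT_LOCKUP_RE.finditer(flat)
    ]


if __name__ == "__main__":
    import sys

    # Reads log text from standard input and prints one line per report.
    for cpu, secs, task in find_soft_lockups(sys.stdin.read()):
        print(f"CPU {cpu} stuck for {secs}s in {task}")
```

Saved under a name of your choosing, it could be fed the kernel messages from the journal, for example with the output of `journalctl -k` piped into it.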

I have an ASRock board too, but it is on nearly the latest UEFI firmware.
There is one newer version, but its changelog says the only change is the
servers used for online update.

I never got around to setting up the Intel microcode updates, so that
should probably be my next step.
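
To confirm that an update took effect, one option is to compare the reported microcode revision before and after. This is a minimal sketch, assuming an x86 Linux host where /proc/cpuinfo carries a "microcode" field for each CPU; the function name microcode_revisions is chosen for illustration only.

```python
import re


def microcode_revisions(cpuinfo_text):
    """Return the set of microcode revision strings found in cpuinfo text.

    Assumes lines of the form "microcode : 0x1c", as printed by
    /proc/cpuinfo on x86 Linux. Hypothetical helper for illustration.
    """
    return set(re.findall(r"^microcode\s*:\s*(\S+)", cpuinfo_text, re.MULTILINE))


if __name__ == "__main__":
    with open("/proc/cpuinfo") as f:
        revisions = microcode_revisions(f.read())
    # All cores normally report the same revision, so one value is expected.
    print("microcode revisions:", ", ".join(sorted(revisions)) or "none reported")
```

Running it once before installing the update and once after rebooting shows whether the revision changed.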

Dan

-----Original Message-----
From: Mark Weiman <mark.weiman at markzz.com>
To: vfio-users at redhat.com
Subject: Re: [vfio-users] Soft lockup on archlinux 4.1.10-1-vfio-lts
kernel
Date: Fri, 23 Oct 2015 18:56:39 -0400

To be honest, ASRock BIOS upgrades are fairly painless because they can
be done outside of the operating system, so there is no need to get a
FreeDOS image ready.  If you do not want to do that, though, I still
recommend installing the intel-ucode package if you haven't already.  As
of right now, I have no issues running my repository's 4.1.11-1 package.

Mark Weiman

On Fri, 2015-10-23 at 16:51 -0200, Lucas Kückelhaus wrote:
> One thing I noticed is that we all seem to have ASRock motherboards,
> as Mark mentioned. I am hesitant to perform a BIOS upgrade, however.
> VT-d is finicky enough as is. I can try 4.1.11 later tonight and see
> if it helps.
> 
> Regards,
> Lucas Kückelhaus
> 
> On 2015-10-23 15:54, Dan Ziemba wrote:
> > Well, old systemd and dbus didn't help. The system was locked up again
> > this morning.  I left the screen on tailing dmesg, but there was no
> > interesting output.  I've got a PKGBUILD for 4.1.11 coming later
> > today, so maybe that will help.
> > 
> > Dan
> > On Oct 22, 2015 10:53 PM, "Dan Ziemba" <zman0900 at gmail.com> wrote:
> > 
> > > Hey,
> > > 
> > > I maintain that PKGBUILD. I think I've been having the same problem,
> > > but it seems to also happen if I reinstall the older linux-vfio 4.1.6.
> > > Here's the latest stack trace I was able to capture:
> > > https://i.imgur.com/FZkj4ib.jpg [1]
> > > I had to disable the screen timeout so it would stay on all night
> > > with dmesg tailing, and I found it like this in the morning.
> > > The mouse and caps lock still worked, but I couldn't actually do
> > > anything and the clock was frozen.
> > > 
> > > I was also noticing that booting my system was unreliable. If I
> > > rebooted several times in a row, once every two to three times it
> > > would hang while starting various services and then never start gdm.
> > > 
> > > Today I tried downgrading systemd and dbus to just before the change
> > > that switched to user buses (see
> > > https://www.archlinux.org/news/d-bus-now-launches-user-buses/).
> > > I rebooted a whole bunch of times using 4.1.10 linux-vfio-lts and it
> > > seems reliable. I have been using the computer pretty much all day for
> > > work and it hasn't had any soft lockups yet, but it may be too soon to
> > > tell. Most of the time in the past the lockup would happen while idle.
> > > 
> > > These are the downgrades I made; everything else is up to date as of
> > > this morning.
> > > 
> > > [2015-10-22 12:22] [ALPM] transaction started
> > > [2015-10-22 12:22] [ALPM] downgraded libsystemd (227-1 -> 225-1)
> > > [2015-10-22 12:22] [ALPM] downgraded libdbus (1.10.0-4 -> 1.10.0-2)
> > > [2015-10-22 12:22] [ALPM] downgraded dbus (1.10.0-4 -> 1.10.0-2)
> > > [2015-10-22 12:22] [ALPM] downgraded systemd (227-1 -> 225-1)
> > > [2015-10-22 12:22] [ALPM] downgraded lib32-systemd (227-1 -> 225-1)
> > > [2015-10-22 12:22] [ALPM] downgraded systemd-sysvcompat (227-1 -> 225-1)
> > > [2015-10-22 12:22] [ALPM] transaction completed
> > > 
> > > I will follow up tomorrow with whether or not it locks up tonight.
> > > If we can isolate the problem to systemd or dbus, maybe that's at
> > > least good enough for a bug report.
> > > 
> > > Dan
> > > 
> > > -----Original Message-----
> > > From: Lucas Kückelhaus <lucas at kuckelhaus.com>
> > > To: vfio-users at redhat.com
> > > Subject: [vfio-users] Soft lockup on archlinux 4.1.10-1-vfio-lts
> > > kernel
> > > Date: Thu, 22 Oct 2015 23:00:37 -0200
> > > Mailer: Roundcube Webmail/1.0.2
> > > 
> > > Hi,
> > > 
> > > I'm trying to run an Arch Linux host on kernel 4.1.10-1-vfio-lts (Mark
> > > Weiman's custom repo) because I'm unable to boot a GPU-assigned VM on
> > > 4.2.3-1-vfio.
> > > 
> > > The VM boots fine and works for a while, but the computer sporadically
> > > crashes with the following:
> > > 
> > > Oct 22 21:43:37 kvmhost kernel: NMI watchdog: BUG: soft lockup - CPU#4 stuck for 22s! [swapper/4:0]
> > > Oct 22 21:43:39 kvmhost kernel: Modules linked in: veth vhost_net vhost macvtap macvlan tun bridge stp llc nls_iso8859_1 nls_cp437 vfat fat iTCO_wdt iTCO_vendor_support nouveau snd_hda_codec_hdmi intel_rapl iosf_mbi x86_pkg_temp_thermal intel_powerclamp coretemp mxm_wmi snd_hda_
> > > Oct 22 21:43:39 kvmhost kernel: sch_fq_codel fuse nfsd nfs auth_rpcgss oid_registry nfs_acl lockd grace sunrpc fscache ip_tables x_tables ext4 crc16 mbcache jbd2 dm_mod hid_logitech_hidpp hid_logitech_dj hid_generic usbhid hid sd_mod uas usb_storage atkbd libps2 crc32c_intel ah
> > > Oct 22 21:43:39 kvmhost kernel: CPU: 4 PID: 0 Comm: swapper/4 Tainted: G L 4.1.10-1-vfio-lts #1
> > > Oct 22 21:43:39 kvmhost kernel: Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./Z77 Extreme4, BIOS P2.30 09/21/2012
> > > Oct 22 21:43:39 kvmhost kernel: task: ffff88080b119460 ti: ffff88080b124000 task.ti: ffff88080b124000
> > > Oct 22 21:43:39 kvmhost kernel: RIP: 0010:[<ffffffff810f6770>] [<ffffffff810f6770>] try_to_del_timer_sync+0x0/0xa0
> > > Oct 22 21:43:39 kvmhost kernel: RSP: 0018:ffff88082f303db0 EFLAGS: 00000286
> > > Oct 22 21:43:39 kvmhost kernel: RAX: 00000000ffffffff RBX: 0000000000000286 RCX: 0000000000000000
> > > Oct 22 21:43:39 kvmhost kernel: RDX: 00000000000000bf RSI: 0000000000000286 RDI: ffff880270fa8428
> > > Oct 22 21:43:39 kvmhost kernel: RBP: ffff88082f303dc8 R08: 0000000000002710 R09: ffff88082f30e780
> > > Oct 22 21:43:39 kvmhost kernel: R10: 0000000000000000 R11: 0000000000000004 R12: ffff88082f303d28
> > > Oct 22 21:43:39 kvmhost kernel: R13: ffffffff815f13de R14: ffff88082f303dc8 R15: ffff880270fa8428
> > > Oct 22 21:43:39 kvmhost kernel: FS: 0000000000000000(0000) GS:ffff88082f300000(0000) knlGS:0000000000000000
> > > Oct 22 21:43:39 kvmhost kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > > Oct 22 21:43:39 kvmhost kernel: CR2: 00007fc2d6f6da28 CR3: 000000029c65c000 CR4: 00000000001426e0
> > > Oct 22 21:43:39 kvmhost kernel: Stack:
> > > Oct 22 21:43:39 kvmhost kernel: ffffffff810f6872 ffff88082f303e38 ffff880270fa8390 ffff88082f303df8
> > > Oct 22 21:43:39 kvmhost kernel: ffffffff8152a16f ffff880270fa8390 ffff8805b3bab800 ffff880270d20000
> > > Oct 22 21:43:39 kvmhost kernel: 0000000000000001 ffff88082f303e38 ffffffff8152a3e7 ffff88082f3107e0
> > > Oct 22 21:43:39 kvmhost kernel: Call Trace:
> > > Oct 22 21:43:39 kvmhost kernel: <IRQ>
> > > Oct 22 21:43:39 kvmhost kernel: [<ffffffff810f6872>] ? del_timer_sync+0x62/0x70
> > > Oct 22 21:43:39 kvmhost kernel: [<ffffffff8152a16f>] inet_csk_reqsk_queue_drop+0xbf/0x240
> > > Oct 22 21:43:39 kvmhost kernel: [<ffffffff8152a3e7>] reqsk_timer_handler+0xf7/0x2e0
> > > Oct 22 21:43:39 kvmhost kernel: [<ffffffff8152a2f0>] ? inet_csk_reqsk_queue_drop+0x240/0x240
> > > Oct 22 21:43:39 kvmhost kernel: [<ffffffff810f64c8>] call_timer_fn+0x48/0x160
> > > Oct 22 21:43:39 kvmhost kernel: [<ffffffff8152a2f0>] ? inet_csk_reqsk_queue_drop+0x240/0x240
> > > Oct 22 21:43:39 kvmhost kernel: [<ffffffff810f6bd4>] run_timer_softirq+0x284/0x330
> > > Oct 22 21:43:39 kvmhost kernel: [<ffffffff81086711>] __do_softirq+0xf1/0x2e0
> > > Oct 22 21:43:39 kvmhost kernel: [<ffffffff81086acd>] irq_exit+0xbd/0xc0
> > > Oct 22 21:43:39 kvmhost kernel: [<ffffffff815f31d5>] smp_apic_timer_interrupt+0x55/0x70
> > > Oct 22 21:43:39 kvmhost kernel: [<ffffffff815f13de>] apic_timer_interrupt+0x6e/0x80
> > > Oct 22 21:43:39 kvmhost kernel: <EOI>
> > > Oct 22 21:43:39 kvmhost kernel: [<ffffffff81021c1d>] ? native_sched_clock+0x2d/0xa0
> > > Oct 22 21:43:39 kvmhost kernel: [<ffffffff81490c81>] ? cpuidle_enter_state+0xa1/0x250
> > > Oct 22 21:43:39 kvmhost kernel: [<ffffffff81490c53>] ? cpuidle_enter_state+0x73/0x250
> > > Oct 22 21:43:39 kvmhost kernel: [<ffffffff81490e8a>] cpuidle_enter+0x2a/0x30
> > > Oct 22 21:43:39 kvmhost kernel: [<ffffffff810cb36c>] cpu_startup_entry+0x32c/0x460
> > > Oct 22 21:43:39 kvmhost kernel: [<ffffffff81055f7e>] start_secondary+0x19e/0x1e0
> > > Oct 22 21:43:39 kvmhost kernel: Code: 4d d8 65 48 33 0c 25 28 00 00 00 44 89 e0 75 0b 48 83 c4 18 5b 41 5c 41 5d 5d c3 e8 1b b8 f8 ff 90 66 2e 0f 1f 84 00 00 00 00 00 <0f> 1f 44 00 00 55 48 89 e5 41 54 53 48 81 ec 30 10 00 00 48 83
> > > 
> > > This happens for all cores and it locks up the entire system. I don't
> > > know what to do. On 4.2.3-1-vfio I have no hangups and all my non-vfio
> > > VMs work perfectly fine.
> > > 
> > > Thank you,
> > > Lucas Kückelhaus
> > > 
> > > _______________________________________________
> > > vfio-users mailing list
> > > vfio-users at redhat.com
> > > https://www.redhat.com/mailman/listinfo/vfio-users [2]
> > 
> > 
> > Links:
> > ------
> > [1] https://i.imgur.com/FZkj4ib.jpg
> > [2] https://www.redhat.com/mailman/listinfo/vfio-users
> 