[dm-devel] AMD-Vi IO_PAGE_FAULTs and ata3.00: failed command: READ FPDMA QUEUED errors since Linux 4.0

Mike Snitzer snitzer at redhat.com
Tue Jul 28 21:24:41 UTC 2015


On Tue, Jul 28 2015 at  4:08pm -0400,
Andreas Hartmann <andihartmann at freenet.de> wrote:

> On 07/28/2015 at 21:31 PM, Mike Snitzer wrote:
> > On Tue, Jul 28 2015 at  3:23pm -0400,
> >Andreas Hartmann <andihartmann at freenet.de> wrote:
> >
> >>On 07/28/2015 at 08:58 PM, Mike Snitzer wrote:
> >>>On Tue, Jul 28 2015 at  2:20pm -0400,
> >>>Andreas Hartmann <andihartmann at freenet.de> wrote:
> >>>
> >>>>On 07/28/2015 at 07:50 PM, Mike Snitzer wrote:
> >>>>[..]
> >>>>>Are your SATA devices using NCQ?
> >>>>
> >>>>Yes. It's enabled:
> >>>>
> >>>>dmesg| grep -i ncq
> >>>>ahci 0000:00:11.0: flags: 64bit ncq sntf ilck pm led clo pmp pio slum part
> >>>>ata2.00: 5860533168 sectors, multi 0: LBA48 NCQ (depth 31/32), AA
> >>>>ata3.00: 5860533168 sectors, multi 0: LBA48 NCQ (depth 31/32), AA
> >>>>ata1.00: 468862128 sectors, multi 16: LBA48 NCQ (depth 31/32), AA
> >>>>
> >>>>As the errors already come up on boot (during mount of partitions or
> >>>>even before the password for the disk has been provided): How can I
> >>>>disable NCQ during boot of the kernel? Is there a kernel option?
> >>>
> >>>See:
> >>>https://ata.wiki.kernel.org/index.php/Libata_FAQ#Enabling.2C_disabling_and_checking_NCQ
> >>>
> >>>alternatively, and likely easier, set this on the kernel commandline:
> >>>  libata.force=noncq
> >>
> >>ata2.00: FORCE: horkage modified (noncq)
> >>ata2.00: 5860533168 sectors, multi 0: LBA48 NCQ (not used)
> >>ata3.00: FORCE: horkage modified (noncq)
> >>ata3.00: 5860533168 sectors, multi 0: LBA48 NCQ (not used)
> >>ata5.00: FORCE: horkage modified (noncq)
> >>ata1.00: FORCE: horkage modified (noncq)
> >>ata1.00: 468862128 sectors, multi 16: LBA48 NCQ (not used)
> >>
> >>
> >>Perfectly. Seems to work w/ 3.19.8 and your mentioned patches. But now,
> >>I'm getting another error, which I didn't see before w/ 3.x-kernels:
> >>
> >>[drm:btc_dpm_set_power_state [radeon]] *ERROR*
> >>rv770_restrict_performance_levels_before_switch failed
> >>
> >>It seems that your patches do have some unwanted side effects :-).
> >
> >That is a completely different issue.  drm and radeon is a graphics
> >issue.
> 
> Nothing changed in the radeon code. I just applied your patches. Nothing
> more. Why should radeon suddenly be broken if I apply your patches
> to stable 3.19.8 code?
> 
> These patches trigger tons of AMD-Vi IO_PAGE_FAULTs w/ ncq enabled
> and the IOMMU developers say that it is not a problem of the iommu
> code.
> 
> >>Could you please reexamine your patch "dm crypt: don't allocate
> >>pages for a partial request" - after applying this patch all the
> >>problems are coming up here.
> >
> >More likely than not your hardware isn't very good.
> 
> Maybe - maybe not. The only thing I know for sure is: with these
> patches applied, the machine doesn't work reliably any more. W/ ncq
> disabled, the AMD-Vi IO_PAGE_FAULTs are gone, but a radeon error
> never seen before came up instead. Most probably that is chance; any
> other error could just as well have shown up.
> 
> I am willing to run tests if you have any ideas to test - I can
> reproduce it quite easily.

You can try disabling dm-crypt's parallelization by specifying these 2
features: same_cpu_crypt submit_from_crypt_cpus

It is my understanding that these can be set using the cryptsetup tool.
Milan, can you clarify how these features can be set at a high level
(on an existing crypt device)?
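
For reference, a rough sketch of how the two features might be applied,
not a confirmed procedure. It assumes a kernel whose dm-crypt target
accepts the optional parameters same_cpu_crypt and submit_from_crypt_cpus,
and a cryptsetup build that has the matching --perf-* options. The helper
name add_crypt_perf_flags, the mapping name "cryptroot" and the device
/dev/sdX are placeholders made up for illustration. The helper only edits
a table line as text; the commands that touch a live device are left as
comments because they need root.

```shell
#!/bin/sh
# Hypothetical helper: append the two dm-crypt performance flags to a
# table line that has no optional parameters yet.
# Expected input (8 fields), e.g. from `dmsetup table --showkeys <name>`:
#   <start> <length> crypt <cipher> <key> <iv_offset> <device> <offset>
add_crypt_perf_flags() {
    line=$1
    # Split the line into words to count its fields.
    set -- $line
    if [ "$#" -ne 8 ]; then
        echo "table line already has optional parameters; edit by hand" >&2
        return 1
    fi
    # "2" is the number of optional parameters that follow.
    printf '%s 2 same_cpu_crypt submit_from_crypt_cpus\n' "$line"
}

# Possible use on a live mapping (requires root, placeholder names):
#   new=$(add_crypt_perf_flags "$(dmsetup table --showkeys cryptroot)")
#   dmsetup reload cryptroot --table "$new"
#   dmsetup resume cryptroot
#
# Or, when opening the device with cryptsetup:
#   cryptsetup open --perf-same_cpu_crypt --perf-submit_from_crypt_cpus \
#       /dev/sdX cryptroot
```

Afterwards, `dmsetup table cryptroot` should list both flags at the end of
the line if the kernel accepted them.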



